OPTiCAL Benchmark

This repository contains the data and code for the OPTiCAL positional reasoning benchmark for vision-language models (VLMs). Across a set of images, models are asked which shape lies furthest in a given direction, and the primary performance metric is accuracy.

To start, get the dataset used in our paper from OSF or generate your own data using the Shape Maker. See the Accessing Data and Shape Maker sections for details.

With the data in hand, label it using the data labeler. See the Labeling Data section for details.

Finally, run inference on all models with run_models.sh, or on an individual model with call_model.py. See the Running Inference section for details.

Setup

Setup has two steps.

First, copy the env file provided with this repository to the hidden name .env. On Linux, run

	cp env .env

Second, create a mamba environment for each of the models used in our experiment. For each file in the dependencies directory, run

	mamba create -f <env>

replacing <env> with the appropriate yml file.
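Repeating the command by hand for every file is tedious, so the following minimal sketch builds one command per environment file. It assumes the files in dependencies end in .yml; the helper name env_create_commands is hypothetical and not part of this repository.

```python
from pathlib import Path


def env_create_commands(dep_dir="dependencies"):
    """Return one `mamba create -f <file>` command per .yml file in dep_dir.

    Assumption: environment files use the .yml extension.
    """
    root = Path(dep_dir)
    if not root.is_dir():
        return []
    files = sorted(root.glob("*.yml"))
    return [["mamba", "create", "-f", str(f)] for f in files]


# Dry run: print the commands instead of executing them.
for cmd in env_create_commands():
    print(" ".join(cmd))
```

Each printed command can be run in a shell, or each list can be passed to subprocess.run(cmd, check=True) once the output looks right.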

Accessing Data

Users can download the data (n=30,000) used in our paper from OSF, generate novel samples with the Shape Maker script available in this repo, or use the sample data (n=300) included in the repository. If you use the original data, place them in the data/imgs directory, then continue to the Labeling Data section. In all cases, adjust the N_IMG and N_IMG_GEN parameters in the env file to match the size of your dataset.

Note that the data on OSF are compressed with the Linux zip utility. Extract them with unzip before attempting to use them.
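To find the value to enter for N_IMG, it can help to count the images actually present on disk. The sketch below assumes the images are PNG files (the format the Shape Maker is described as rendering) and searches subdirectories as well, in case extraction created a nested folder. The function name count_images is hypothetical.

```python
from pathlib import Path


def count_images(img_dir="data/imgs", suffixes=(".png",)):
    """Count files under img_dir (recursively) with one of the given extensions.

    Assumption: dataset images are PNG files; pass other suffixes if not.
    """
    root = Path(img_dir)
    if not root.is_dir():
        return 0
    return sum(
        1
        for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in suffixes
    )


print("images found:", count_images())
```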

Shape Maker

As an alternative to downloading the original data, this repository contains a Python script that generates a customizable grid of geometric shapes using matplotlib. Each shape can be given a distinct color. The resulting plot is either rendered as a PNG or shown in a GUI window.

Capabilities

- Supports the following shapes:
  - circle, square, triangle, rectangle, pentagon
  - Directional wedges (partial circles): upper_wedge, lower_wedge, left_wedge, right_wedge
- Custom colors for each shape
- Easily scalable grid layout
- CLI interface for integration with other tools or batch generation
- Blank spaces using the keyword none

Run Example

	python shape_maker.py \
	  --shapes "circle,square,triangle;none,rectangle,pentagon" \
	  --colors "red,green,blue;none,orange,purple"

Note that you will not be able to see the plot unless you include plt.show() at the end of the script. Also, the parameter N_IMG_GEN in the env file controls the number of shapes generated.
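The --shapes and --colors arguments in the run example appear to describe a grid in which semicolons separate rows and commas separate cells within a row. That reading is an assumption based on the example, not a documented rule. Under that assumption, a minimal sketch of how such strings could be parsed and paired is shown here; parse_grid and pair_shapes_with_colors are hypothetical helpers, not functions from shape_maker.py.

```python
def parse_grid(spec):
    """Split 'a,b;c,d' into [['a', 'b'], ['c', 'd']]; 'none' becomes None.

    Assumption: ';' separates rows and ',' separates cells in a row.
    """
    rows = []
    for row in spec.split(";"):
        cells = [cell.strip().lower() for cell in row.split(",")]
        rows.append([None if cell == "none" else cell for cell in cells])
    return rows


def pair_shapes_with_colors(shapes_spec, colors_spec):
    """Return (row, col, shape, color) for every non-blank grid cell."""
    shapes = parse_grid(shapes_spec)
    colors = parse_grid(colors_spec)
    if [len(r) for r in shapes] != [len(r) for r in colors]:
        raise ValueError("shape and color grids must have the same layout")
    cells = []
    for r, (shape_row, color_row) in enumerate(zip(shapes, colors)):
        for c, (shape, color) in enumerate(zip(shape_row, color_row)):
            if shape is not None:  # skip blank cells
                cells.append((r, c, shape, color))
    return cells


cells = pair_shapes_with_colors(
    "circle,square,triangle;none,rectangle,pentagon",
    "red,green,blue;none,orange,purple",
)
for cell in cells:
    print(cell)
```

For the run example, this yields five cells, since the first cell of the second row is blank.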

Labeling Data

Whenever the data change, whether downloaded from OSF or generated with the Shape Maker, they must be relabeled. Otherwise, inference results will be scored against labels from the previous data samples and will be meaningless. To relabel the data, place them in a subdirectory inside data and adjust the IMG_DIR parameter to match that location. Then run

	python src/relabel_data.py
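The internals of the labeling script are not described here. Purely to illustrate what a ground-truth label means for this task, the sketch below picks the shape that lies furthest in a given direction. It assumes each shape has a known (x, y) center with y increasing upward. The direction names, the tie handling, and the function furthest_shape are all hypothetical and may differ from what relabel_data.py does.

```python
def furthest_shape(positions, direction):
    """Return the name of the shape furthest in `direction`, or None on a tie.

    positions: dict mapping shape name -> (x, y), with y increasing upward.
    direction: one of 'left', 'right', 'up', 'down' (hypothetical names).
    """
    axis, sign = {
        "left": (0, -1),
        "right": (0, 1),
        "down": (1, -1),
        "up": (1, 1),
    }[direction]
    scores = {name: sign * xy[axis] for name, xy in positions.items()}
    best = max(scores.values())
    winners = [name for name, score in scores.items() if score == best]
    # An ambiguous question has no single correct answer.
    return winners[0] if len(winners) == 1 else None


positions = {"circle": (0, 1), "square": (2, 1), "triangle": (1, 3)}
print(furthest_shape(positions, "left"))
```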

Running Inference

Once the data are in place and labeled, reproduce the experiment in our paper by running the run_models.sh script.

	chmod +x run_models.sh
	./run_models.sh

To run inference with an individual model, activate the appropriate mamba environment and run the call_model.py Python script directly.

	mamba activate <model_env>
	python src/call_model.py -m <model>

To run the desired model, replace <model> and <model_env> with the entries from the table above that correspond to the model's HF tag. Results are written to files in the data directory named <model>.tsv, <model>_counts.tsv, and <model>_metrics.tsv. These contain, respectively, the raw results, the total number of questions for each category of shape and direction, and the metrics (accuracy) for each direction and shape.
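As a rough illustration of the per-direction and per-shape accuracy reported in the metrics file, the sketch below groups results by a chosen field and divides correct answers by total questions. The column layout of the TSV files is not documented here, so the field names direction, shape, label, and prediction are assumptions, and accuracy_by is a hypothetical helper.

```python
from collections import defaultdict


def accuracy_by(records, key):
    """Return {group: accuracy}, grouping records by records[i][key].

    records: iterable of dicts with `key`, 'label', and 'prediction' fields
    (hypothetical field names; adapt them to the actual TSV columns).
    """
    total = defaultdict(int)
    correct = defaultdict(int)
    for rec in records:
        total[rec[key]] += 1
        if rec["prediction"] == rec["label"]:
            correct[rec[key]] += 1
    return {group: correct[group] / total[group] for group in total}


records = [
    {"direction": "left", "shape": "circle", "label": "circle", "prediction": "circle"},
    {"direction": "left", "shape": "square", "label": "square", "prediction": "circle"},
    {"direction": "right", "shape": "circle", "label": "circle", "prediction": "circle"},
]
print(accuracy_by(records, "direction"))
print(accuracy_by(records, "shape"))
```

Rows read from a results file with csv.DictReader(f, delimiter="\t") could be passed in as records once the field names are matched to the real columns.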
