Code, metrics, and models for the paper Outcome-supervised Verifiers for Planning in Mathematical Reasoning
The key technical implementations (`utils/sampling.py`):
- Value-guided beam search: step-level beam search guided by a value model
- Batched generation with a calculator, using the cache (2-3x faster than a naive implementation)
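The step-level search above can be illustrated with a minimal sketch (the actual implementation is in `utils/sampling.py`); `generate_step` and `value_model` are hypothetical stand-ins for the generator and the value model:

```python
import heapq
from typing import Callable, List

def value_guided_beam_search(
    question: str,
    generate_step: Callable[[str], List[str]],  # proposes next-step candidates
    value_model: Callable[[str], float],        # scores a partial solution
    n_beam: int = 3,
    max_steps: int = 10,
) -> str:
    beams = [question]
    for _ in range(max_steps):
        # Expand every beam with its candidate next steps.
        candidates = [b + step for b in beams for step in generate_step(b)]
        if not candidates:
            break
        # Keep the n_beam partial solutions the value model rates highest.
        beams = heapq.nlargest(n_beam, candidates, key=value_model)
    return max(beams, key=value_model)
```

Unlike token-level beam search, the unit of expansion here is a whole reasoning step, and ranking uses the value model rather than generator likelihood.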
| Model | Dataset | Link |
|---|---|---|
| OVM-Llama2-7B | GSM8K | parameters |
| OVM-Mistral-7B | GSM8K | parameters |
- Directories
  - `configs`: configs for model training with `accelerate`
  - `data`: benchmarks, and generator-created data for training the value model
  - `eval_results`: metrics and responses
    - `generator`: generator-only (greedy, self-consistency, or pass@k)
    - `verifier`: ORM accuracy
    - `generator_with_verifier`: guided beam search, i.e. OVM and PRM
  - `scripts`: scripts for training and inference
  - `utils`: functions and classes
- `target_set`
  - GSM8K: there are `train` and `test`, corresponding to the training set and test set respectively
  - Game of 24: there are `train` and `mid`
    - `train`: the first 900 problems
    - `mid`: problems with indices 901-1000
- The scripts for GSM8K and Game of 24 are similar. For simplicity, we only use GSM8K as the example below. You can run the same pipeline on Game of 24 by replacing `gsm8k` with `game24`.
Training data for the generator:
- GSM8K: `data/gsm8k/train.jsonl`, from OpenAI GSM8K
- Game of 24: `data/game24/train.jsonl`, the first 900 problems in `data/game24/24.csv` (from ToT) with enumerated solutions
To run the script `train_generator.sh` (under `scripts/gsm8k` or `scripts/game24`), first set `WANDB_API_KEY`, `WANDB_ENTITY`, `model_name_or_path`, and `save_dir`. The generator is named by `save_generator_id`.
```
cd OVM
bash scripts/gsm8k/train_generator.sh
```
First, use the generator `generator_id` to generate `n_solutions` solutions for each question in the training set:
```
cd OVM
bash scripts/gsm8k/generate.sh
```
You should first configure the path of your generator checkpoint `model_name_or_path`, and set `--target_set train`.
The output will be saved to `data/gsm8k/model_generation/`.
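These sampled solutions become training data for the value model: each one receives a single outcome label (final answer correct or not), which every step prefix inherits. A hedged sketch of that labeling, with illustrative names rather than the repo's actual API:

```python
from typing import List, Tuple

def build_ovm_examples(
    question: str,
    solution_steps: List[str],  # one sampled solution, split into steps
    final_answer: str,
    gold_answer: str,
) -> List[Tuple[str, int]]:
    # Outcome supervision: one binary label for the whole solution...
    label = int(final_answer == gold_answer)
    examples = []
    prefix = question
    for step in solution_steps:
        prefix = prefix + "\n" + step
        # ...shared by every step prefix, so the trained model estimates
        # the probability that a partial solution leads to a correct answer.
        examples.append((prefix, label))
    return examples
```

No step-level human annotation is needed; only the final-answer check supervises the intermediate values.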
Train the OVM using `train_verifier.sh`. First set `WANDB_API_KEY`, `WANDB_ENTITY`, `save_dir`, and `checkpoint_dir` (the path of the generator checkpoint). The verifier is named with `save_verifier_id`.
```
cd OVM
bash scripts/gsm8k/train_verifier.sh
```
Configure your generator checkpoint path `model_name_or_path` and verifier checkpoint path `verifier_model_name_or_path` in `eval_step_beam.sh`:
```
cd OVM
bash scripts/gsm8k/eval_step_beam.sh
```
(When `dedup_mode=1`, it prioritizes linguistically different candidates: if the sorted candidates are `['a', 'a', 'b', 'b', 'c']` and `n_beam=3`, it selects `['a', 'b', 'c']` rather than `['a', 'a', 'b']`.)
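The deduplicating selection described above can be sketched as follows (a simplified illustration, not the repo's exact code):

```python
from typing import List

def dedup_select(sorted_candidates: List[str], n_beam: int) -> List[str]:
    # Candidates arrive sorted by value, best first. Take textually
    # distinct ones first, then fill any remaining slots with duplicates.
    seen = set()
    distinct, duplicates = [], []
    for cand in sorted_candidates:
        (duplicates if cand in seen else distinct).append(cand)
        seen.add(cand)
    return (distinct + duplicates)[:n_beam]
```

This keeps the beam diverse without ever discarding high-value candidates when there are fewer than `n_beam` distinct ones.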
The output will be saved to `eval_results/gsm8k/generator_with_verifier/test`
(or `eval_results/game24/generator_with_verifier/mid`).
- First sample the data: configure the generator checkpoint `model_name_or_path`, and set `--target_set test`
  ```
  cd OVM
  bash scripts/gsm8k/generate.sh
  ```
- Then call the ORM to score and rerank the samples: configure the verifier checkpoint `verifier_model_name_or_path`
  ```
  cd OVM
  bash scripts/gsm8k/eval_with_verifier.sh
  ```
The output will be saved to `eval_results/gsm8k/generator_with_verifier/test`.
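Conceptually, the rerank step reduces to sorting the sampled solutions by their verifier scores; a minimal sketch, with `verifier_score` standing in for the actual verifier model call:

```python
from typing import Callable, List

def orm_rerank(
    solutions: List[str],
    verifier_score: Callable[[str], float],  # placeholder for the verifier
) -> List[str]:
    # Score every complete sampled solution; best first. The final answer
    # is taken from the top-ranked solution.
    return sorted(solutions, key=verifier_score, reverse=True)
```

In contrast to the guided beam search above, the verifier here only scores complete solutions after sampling, rather than steering the generation step by step.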
Configure your generator checkpoint path `model_name_or_path`:
```
cd OVM
bash scripts/gsm8k/greedy_eval.sh
```
The output will be saved to `eval_results/gsm8k/generator/test`.