Derivative-Free Guidance in Diffusion Models with Soft Value-Based Decoding (DNAs, RNAs)

This code accompanies the paper on soft value-based decoding in diffusion models, where the objective is to maximize downstream reward functions. This implementation focuses on designing biological sequences, such as DNA (enhancers) and RNA (5'UTRs). For images, refer to here. For molecular tasks, refer to here.

We will make the molecule/protein generation part publicly available soon. The algorithm is summarized in the following table/figure.
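
As a rough illustration of the decoding idea, the toy sketch below proposes several candidates at each denoising step, scores them with a value function, and keeps one candidate sampled in proportion to exp(value / alpha), without ever differentiating the value function. It is not the code in decode.py: the denoiser, the value function, and the names guided_decode, sample_m, and alpha are hypothetical stand-ins chosen for this example.

```python
import math
import random

ALPHABET = "ACGT"
MASK = "_"


def toy_value(seq):
    """Hypothetical stand-in for a value function: counts 'G' tokens."""
    return seq.count("G")


def toy_denoise_step(seq, rng):
    """Hypothetical stand-in for one step of a pre-trained masked diffusion
    model: fill the left-most masked position with a uniformly random token."""
    i = seq.index(MASK)
    return seq[:i] + rng.choice(ALPHABET) + seq[i + 1:]


def guided_decode(length, sample_m, alpha, rng):
    """Decode a sequence by value-weighted selection among sample_m candidates."""
    seq = MASK * length
    for _ in range(length):
        # Propose sample_m candidate next states from the (toy) denoiser.
        candidates = [toy_denoise_step(seq, rng) for _ in range(sample_m)]
        values = [toy_value(c) for c in candidates]
        # Softmax-style weights exp(v / alpha); subtract the max for stability.
        vmax = max(values)
        weights = [math.exp((v - vmax) / alpha) for v in values]
        # Keep one candidate, sampled in proportion to its weight.
        seq = rng.choices(candidates, weights=weights, k=1)[0]
    return seq


if __name__ == "__main__":
    rng = random.Random(0)
    print(guided_decode(length=20, sample_m=10, alpha=0.1, rng=rng))
```

With sample_m=1 the loop reduces to unguided sampling from the toy denoiser; a larger sample_m and a smaller alpha bias the result more strongly toward high-value sequences.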

Design of Enhancers

We prepared the pre-trained model using the masked diffusion model of Sahoo et al., 2024 and the dataset of Gosai et al., 2023. We aim to generate natural enhancers with higher activity in HepG2 using soft value-based decoding. Of the two commands below, the first runs SVDD-MC and the second runs SVDD-PM. The generated $r$'s are then saved in the log folder. To evaluate the generated samples, refer to eval_simple.ipynb.

CUDA_VISIBLE_DEVICES=1 python decode.py --load_checkpoint_path artifacts/DNA_value:v0/human_enhancer_diffusion_enformer_7_11_1536_16_ep10_it3500.pt --task dna --sample_M 10

CUDA_VISIBLE_DEVICES=2 python decode_tweedie.py --load_checkpoint_path artifacts/DNA_value:v0/human_enhancer_diffusion_enformer_7_11_1536_16_ep10_it3500.pt --task dna --sample_M 10 --tweedie True

Design of 5'UTRs

We prepared the pre-trained model using the masked diffusion model of Sahoo et al., 2024 and the dataset of Sample et al., 2019. We aim to generate natural 5'UTRs with higher mean ribosome load (MRL) using soft value-based decoding. Of the two commands below, the first runs SVDD-MC and the second runs SVDD-PM. The generated $r$'s are then saved in the log folder. To evaluate the generated samples, refer to eval_simple.ipynb.

CUDA_VISIBLE_DEVICES=3 python decode.py --load_checkpoint_path artifacts/RNA_MRL_value:v0/rna_MRL_diffusion_convgru_6_64_512_ep10_it2800.pt --reward_name MRL --task rna --sample_M 10

CUDA_VISIBLE_DEVICES=1 python decode_tweedie.py --load_checkpoint_path artifacts/RNA_MRL_value:v0/rna_MRL_diffusion_convgru_6_64_512_ep10_it2800.pt --reward_name MRL --task rna --sample_M 10 --tweedie True
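
The four commands above differ only in the script, the checkpoint, the task, and a few flags. The sketch below assembles them from Python, which may be convenient when sweeping over settings. It uses only the scripts and flags that appear in the commands above; the helper names build_decode_command and run_decode are hypothetical and are not part of this repository.

```python
import os
import subprocess


def build_decode_command(method, task, checkpoint, sample_m=10, reward_name=None):
    """Build the argument list for one decoding run.

    method: "SVDD-MC" (decode.py) or "SVDD-PM" (decode_tweedie.py --tweedie True).
    """
    if method == "SVDD-MC":
        cmd = ["python", "decode.py"]
    elif method == "SVDD-PM":
        cmd = ["python", "decode_tweedie.py"]
    else:
        raise ValueError(f"unknown method: {method}")
    cmd += ["--load_checkpoint_path", checkpoint]
    if reward_name is not None:
        cmd += ["--reward_name", reward_name]
    cmd += ["--task", task, "--sample_M", str(sample_m)]
    if method == "SVDD-PM":
        cmd += ["--tweedie", "True"]
    return cmd


def run_decode(cmd, gpu="0"):
    """Run a decoding command on the given GPU via CUDA_VISIBLE_DEVICES."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpu)
    return subprocess.run(cmd, env=env, check=True)


if __name__ == "__main__":
    ckpt = "artifacts/RNA_MRL_value:v0/rna_MRL_diffusion_convgru_6_64_512_ep10_it2800.pt"
    print(" ".join(build_decode_command("SVDD-PM", "rna", ckpt, reward_name="MRL")))
```

Calling run_decode on the returned list launches the same process as the corresponding shell command, so it should be run from the repository root after the checkpoints have been downloaded.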

Installation & Preparation

Run

conda create -n biodif python=3.9
conda activate biodif
pip install -r requirements.txt

Then, to get the pre-trained diffusion models/oracles from W&B, run

python allmodels/model_load.py
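
After the download, it can help to confirm that the checkpoints referenced by the decoding commands are present. The sketch below checks for the two checkpoint paths used in this README. The helper name missing_checkpoints is hypothetical, and the sketch assumes the files end up under artifacts/ relative to the repository root, as the paths in the decoding commands suggest.

```python
from pathlib import Path

# Checkpoint paths as they appear in the decoding commands of this README.
CHECKPOINTS = [
    "artifacts/DNA_value:v0/human_enhancer_diffusion_enformer_7_11_1536_16_ep10_it3500.pt",
    "artifacts/RNA_MRL_value:v0/rna_MRL_diffusion_convgru_6_64_512_ep10_it2800.pt",
]


def missing_checkpoints(root="."):
    """Return the checkpoint paths that do not exist as files under root."""
    root = Path(root)
    return [p for p in CHECKPOINTS if not (root / p).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        for path in missing:
            print(f"missing: {path}")
    else:
        print("all checkpoints found")
```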

References

If you find this work useful in your research, please cite:

@article{li2024derivative,
  title={Derivative-Free Guidance in Continuous and Discrete Diffusion Models with Soft Value-Based Decoding},
  author={Li, Xiner and Zhao, Yulai and Wang, Chenyu and Scalia, Gabriele and Eraslan, Gokcen and Nair, Surag and Biancalani, Tommaso and Regev, Aviv and Levine, Sergey and Uehara, Masatoshi},
  journal={arXiv preprint arXiv:2408.08252},
  year={2024}
}
