Dynamic Chunking for End-to-End Hierarchical Sequence Modeling
Sukjun Hwang, Brandon Wang, Albert Gu
Paper: https://arxiv.org/abs/2507.07955
This repository contains the code for the H-Net architecture. Most of the code lies in `hnet/`, which has the following structure:
```
configs/                 # Configs for the pretrained models
hnet/
├── models/              # Directory for H-Net
│   ├── config_hnet.py   # defines the config for the H-Net
│   ├── hnet.py          # H-Net as a (B, L, D) -> (B, L, D) sequence model
│   └── mixer_seq.py     # wrapper to turn H-Net into a language model
└── modules/             # Directory of model components
    ├── dc.py            # modeling code for the dynamic chunking mechanism
    └── isotropic.py     # code for isotropic, i.e. non-hierarchical, components
generate.py              # Script for inference/generation
```
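The key interface noted above for `hnet.py` is the shape contract: the H-Net backbone is a plain `(B, L, D) -> (B, L, D)` sequence model, which `mixer_seq.py` then wraps into a language model. Below is a minimal sketch of that contract using a stand-in module; the actual class names and constructor signatures in this repository may differ.

```python
import torch
import torch.nn as nn

class SequenceModel(nn.Module):
    """Stand-in for the (B, L, D) -> (B, L, D) contract of hnet/models/hnet.py.
    The real H-Net compresses the sequence internally via dynamic chunking, but
    its outer interface preserves the input shape."""
    def __init__(self, d_model: int):
        super().__init__()
        self.mixer = nn.Linear(d_model, d_model)  # placeholder for the hierarchy

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, L, D = x.shape
        y = self.mixer(x)
        assert y.shape == (B, L, D)  # shape-preserving end to end
        return y

x = torch.randn(2, 256, 512)   # (batch, length, model width)
y = SequenceModel(512)(x)      # same shape out: (2, 256, 512)
```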
Requirements:

- PyTorch >= 2.5.1

Clone the repository and install the package:
```bash
git clone https://github.com/goombalab/hnet
cd hnet
pip install -e .
```
We strongly recommend building the `mamba_ssm` package from the latest source, as follows:
```bash
git clone https://github.com/state-spaces/mamba
cd mamba
pip install .
```
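A quick smoke test to verify the build (the `Mamba` block is part of the public `mamba_ssm` API; a CUDA GPU is required for its fused kernels):

```python
import torch
from mamba_ssm import Mamba

# A single Mamba block maps (B, L, D) -> (B, L, D).
model = Mamba(d_model=64).cuda()
x = torch.randn(2, 128, 64, device="cuda")
y = model(x)
assert y.shape == x.shape
print("mamba_ssm build OK")
```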
Pretrained models are uploaded to Hugging Face: `hnet_1stage_L`, `hnet_2stage_L`, `hnet_1stage_XL`, `hnet_2stage_XL`.
We trained our models on the 100B-token subset of FineWeb-Edu. Large and XL are compute-matched to GPT-3 Large and XL, respectively.
We also provide model weights for Chinese and code, each trained on a 46B-token subset of FineWeb-Edu Chinese V2.1 and Pile GitHub, respectively: `hnet_2stage_XL_chinese`, `hnet_2stage_XL_code`.
You can find the specifics of these models in `configs/`, and more details in the paper.
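To fetch a checkpoint programmatically, `huggingface_hub` works; note that the repo id and filename below are assumptions based on the model names above, so verify them against the actual Hugging Face pages:

```python
from huggingface_hub import hf_hub_download

# Assumed repo id/filename -- check the Hugging Face pages for the exact paths.
ckpt_path = hf_hub_download(repo_id="cartesia-ai/hnet_2stage_XL",
                            filename="hnet_2stage_XL.pt")
print(ckpt_path)  # pass this path to generate.py via --model-path
```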
We provide `generate.py` for text generation, which you can use with the pretrained checkpoints:
```bash
python generate.py --model-path [MODEL_CKPT] --config-path [CONFIG]
```

For example:

```bash
python generate.py --model-path hnet_2stage_XL.pt --config-path configs/hnet_2stage_XL.json --max-tokens 1024 --temperature 1.0 --top-p 1.0
```
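For reference, `--temperature` and `--top-p` implement standard temperature scaling and nucleus sampling. A minimal sketch of what those two knobs do (not the repository's own sampling code):

```python
import torch

def sample_top_p(logits: torch.Tensor, temperature: float = 1.0,
                 top_p: float = 1.0) -> torch.Tensor:
    """Temperature scaling followed by nucleus (top-p) sampling.
    logits: (B, V); returns sampled token ids of shape (B, 1)."""
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True, dim=-1)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Drop tokens whose preceding cumulative mass already exceeds top_p;
    # the most likely token is always kept.
    mask = cumulative - sorted_probs > top_p
    sorted_probs[mask] = 0.0
    sorted_probs /= sorted_probs.sum(dim=-1, keepdim=True)
    next_sorted = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx.gather(-1, next_sorted)
```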
If you use this codebase, or otherwise find our work valuable, please cite H-Net:
```bibtex
@article{hnet,
  title={Dynamic Chunking for End-to-End Hierarchical Sequence Modeling},
  author={Hwang, Sukjun and Wang, Brandon and Gu, Albert},
  journal={arXiv preprint arXiv:2507.07955},
  year={2025}
}
```