Dynamic Chunking for End-to-End Hierarchical Sequence Modeling
Sukjun Hwang, Brandon Wang, Albert Gu
Paper: https://arxiv.org/abs/2507.07955
This repository contains the code for the H-Net architecture. Most of the code lies in `hnet/`, which has the following structure:
```
configs/                 # Configs for the pretrained models
hnet/
├── models/              # Directory for H-Net
│   ├── config_hnet.py   # defines the config for the H-Net
│   ├── hnet.py          # H-Net as a (B, L, D) -> (B, L, D) sequence model
│   └── mixer_seq.py     # wrapper to turn H-Net into a language model
└── modules/             # Directory of model components
    ├── dc.py            # modeling code for the dynamic chunking mechanism
    └── isotropic.py     # code for isotropic, i.e. non-hierarchical, components
generate.py              # Script for inference/generation
```
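The key interface noted above for `hnet.py` is the shape contract: the H-Net backbone is a plain `(B, L, D) -> (B, L, D)` sequence model, which `mixer_seq.py` then wraps into a language model. Below is a minimal sketch of that contract using a stand-in module; the actual class names and constructor signatures in this repository may differ.

```python
import torch
import torch.nn as nn

class SequenceModel(nn.Module):
    """Stand-in for the (B, L, D) -> (B, L, D) contract of hnet/models/hnet.py.
    The real H-Net compresses the sequence internally via dynamic chunking, but
    its outer interface preserves the input shape."""
    def __init__(self, d_model: int):
        super().__init__()
        self.mixer = nn.Linear(d_model, d_model)  # placeholder for the hierarchy

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, L, D = x.shape
        y = self.mixer(x)
        assert y.shape == (B, L, D)  # shape-preserving end to end
        return y

x = torch.randn(2, 256, 512)   # (batch, length, model width)
y = SequenceModel(512)(x)      # same shape out: (2, 256, 512)
```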
Requirements:

- PyTorch >= 2.5.1

Clone the repository and install the package:
```bash
git clone https://github.com/goombalab/hnet
cd hnet
pip install -e .
```
We strongly recommend building the `mamba_ssm` package from the latest source, as follows:
```bash
git clone https://github.com/state-spaces/mamba
cd mamba
pip install .
```
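A quick smoke test to verify the build (the `Mamba` block is part of the public `mamba_ssm` API; a CUDA GPU is required for its fused kernels):

```python
import torch
from mamba_ssm import Mamba

# A single Mamba block maps (B, L, D) -> (B, L, D).
model = Mamba(d_model=64).cuda()
x = torch.randn(2, 128, 64, device="cuda")
y = model(x)
assert y.shape == x.shape
print("mamba_ssm build OK")
```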
Pretrained models are uploaded to Hugging Face: `hnet_1stage_L`, `hnet_2stage_L`, `hnet_1stage_XL`, `hnet_2stage_XL`.
We trained our models on the 100B-token subset of FineWeb-Edu. Large and XL are compute-matched to GPT-3 Large and XL, respectively.
We also provide model weights for Chinese and code, each trained on a 46B-token subset of FineWeb-Edu Chinese V2.1 and Pile GitHub, respectively: `hnet_2stage_XL_chinese`, `hnet_2stage_XL_code`.
You can find the specifics of these models in `configs/`, and more details in the paper.
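To fetch a checkpoint programmatically, `huggingface_hub` works; note that the repo id and filename below are assumptions based on the model names above, so verify them against the actual Hugging Face pages:

```python
from huggingface_hub import hf_hub_download

# Assumed repo id/filename -- check the Hugging Face pages for the exact paths.
ckpt_path = hf_hub_download(repo_id="cartesia-ai/hnet_2stage_XL",
                            filename="hnet_2stage_XL.pt")
print(ckpt_path)  # pass this path to generate.py via --model-path
```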
We provide `generate.py` for text generation, which you can use with the pretrained checkpoints:
```bash
python generate.py --model-path [MODEL_CKPT] --config-path [CONFIG]
```

For example:

```bash
python generate.py --model-path hnet_2stage_XL.pt --config-path configs/hnet_2stage_XL.json --max-tokens 1024 --temperature 1.0 --top-p 1.0
```
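For reference, `--temperature` and `--top-p` implement standard temperature scaling and nucleus sampling. A minimal sketch of what those two knobs do (not the repository's own sampling code):

```python
import torch

def sample_top_p(logits: torch.Tensor, temperature: float = 1.0,
                 top_p: float = 1.0) -> torch.Tensor:
    """Temperature scaling followed by nucleus (top-p) sampling.
    logits: (B, V); returns sampled token ids of shape (B, 1)."""
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True, dim=-1)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Drop tokens whose preceding cumulative mass already exceeds top_p;
    # the most likely token is always kept.
    mask = cumulative - sorted_probs > top_p
    sorted_probs[mask] = 0.0
    sorted_probs /= sorted_probs.sum(dim=-1, keepdim=True)
    next_sorted = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx.gather(-1, next_sorted)
```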
If you use this codebase, or otherwise find our work valuable, please cite H-Net:
```bibtex
@article{hnet,
  title={Dynamic Chunking for End-to-End Hierarchical Sequence Modeling},
  author={Hwang, Sukjun and Wang, Brandon and Gu, Albert},
  journal={arXiv preprint arXiv:2507.07955},
  year={2025}
}
```