bkangs/hnet

 
 


H-Net


Dynamic Chunking for End-to-End Hierarchical Sequence Modeling
Sukjun Hwang, Brandon Wang, Albert Gu
Paper: https://arxiv.org/abs/2507.07955

About

This repository contains code for the H-Net architecture. Most of the code lives in hnet/; the repository is laid out as follows:

configs/
hnet/
├── models/            # Directory for H-Net
│   ├── config_hnet.py     (defines the config for the H-Net)
│   ├── hnet.py            (H-Net as a (B, L, D) -> (B, L, D) sequence model)
│   └── mixer_seq.py       (wrapper to turn h-net into a language model)
└── modules/           # Directory of model components
    ├── dc.py              (modeling code for the dynamic chunking mechanism)
    └── isotropic.py       (code for isotropic, i.e. non-hierarchical components)
generate.py        # Script for inference/generation

Installation

Requirements:

  • PyTorch >= 2.5.1
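
As a quick sanity check before installing, the minimum version above can be verified from Python. This is a minimal sketch, not part of the repository: the helper names `parse_version` and `meets_minimum` are hypothetical, and the parser only looks at the leading numeric release, ignoring local build suffixes such as `+cu121`.

```python
import re

MIN_TORCH = (2, 5, 1)  # minimum from the requirements list above


def parse_version(version: str) -> tuple:
    """Extract the leading numeric release, e.g. '2.5.1+cu121' -> (2, 5, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component (e.g. '2.7') is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(version: str, minimum: tuple = MIN_TORCH) -> bool:
    """Return True if the parsed version is at least the minimum."""
    return parse_version(version) >= minimum


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
    else:
        status = "OK" if meets_minimum(torch.__version__) else "too old"
        print(f"PyTorch {torch.__version__}: {status}")
```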

Clone the repository and install the package:

git clone https://github.com/goombalab/hnet
cd hnet
pip install -e .

We strongly recommend building the mamba_ssm package from the latest source:

git clone https://github.com/state-spaces/mamba
cd mamba
pip install .

Pretrained Models

Pretrained models are available on Hugging Face: hnet_1stage_L, hnet_2stage_L, hnet_1stage_XL, hnet_2stage_XL. We trained these models on the 100B-token subset of FineWeb-Edu. The L (Large) and XL models are compute-matched to GPT-3 Large and GPT-3 XL, respectively.

We also provide model weights for Chinese and code, trained on 46B-token subsets of FineWeb-Edu Chinese V2.1 and Pile Github, respectively: hnet_2stage_XL_chinese, hnet_2stage_XL_code.

The specifics of each model are in its config file under configs/; see the paper for further details.
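
To see what a given config specifies, it can be loaded and printed from Python. The sketch below assumes only that the config is a JSON object (the `.json` extension appears in the generation example in this README); it does not assume any particular field names, and `load_config` and `summarize` are hypothetical helpers, not part of the repository.

```python
import json
from pathlib import Path


def load_config(path):
    """Read a JSON config file and return its contents as a dict."""
    with Path(path).open(encoding="utf-8") as handle:
        config = json.load(handle)
    if not isinstance(config, dict):
        raise TypeError(f"expected a JSON object in {path}, got {type(config).__name__}")
    return config


def summarize(config, indent=0):
    """Return one line per key, recursing into nested dicts.

    No field names are assumed; whatever keys the file holds are listed.
    """
    lines = []
    for key, value in config.items():
        if isinstance(value, dict):
            lines.append(" " * indent + f"{key}:")
            lines.extend(summarize(value, indent + 2))
        else:
            lines.append(" " * indent + f"{key}: {value!r}")
    return lines


# Example usage (path taken from the generation example in this README):
# print("\n".join(summarize(load_config("configs/hnet_2stage_XL.json"))))
```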

Text Generation

We provide generate.py for text generation that you can use with the pretrained checkpoints.

Examples

python generate.py --model-path [MODEL_CKPT] --config-path [CONFIG]
python generate.py --model-path hnet_2stage_XL.pt --config-path configs/hnet_2stage_XL.json --max-tokens 1024 --temperature 1.0 --top-p 1.0
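
If you want to launch generation from another Python script, the command line above can be assembled programmatically. This is a minimal sketch that uses only the flags shown in the examples (`--model-path`, `--config-path`, `--max-tokens`, `--temperature`, `--top-p`); the helper `build_generate_command` is hypothetical and is not provided by the repository.

```python
import sys


def build_generate_command(model_path, config_path, max_tokens=None,
                           temperature=None, top_p=None, script="generate.py"):
    """Build an argv list for generate.py using the flags shown in this README."""
    cmd = [
        sys.executable, script,
        "--model-path", str(model_path),
        "--config-path", str(config_path),
    ]
    # Optional flags are only added when a value is supplied,
    # so generate.py falls back to its own defaults otherwise.
    optional = {
        "--max-tokens": max_tokens,
        "--temperature": temperature,
        "--top-p": top_p,
    }
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


# Example usage, run from the repository root:
# import subprocess
# cmd = build_generate_command("hnet_2stage_XL.pt", "configs/hnet_2stage_XL.json",
#                              max_tokens=1024, temperature=1.0, top_p=1.0)
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a checkpoint path contains spaces.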

Citation

If you use this codebase, or otherwise find our work valuable, please cite H-Net:

@article{hnet,
  title={Dynamic Chunking for End-to-End Hierarchical Sequence Modeling},
  author={Hwang, Sukjun and Wang, Brandon and Gu, Albert},
  journal={arXiv preprint arXiv:2507.07955},
  year={2025}
}
