Recursal's experimental datapacker.
DaDarista was born from a single need: "We need to somehow convert jsonl files into multiple formats for different trainers!"
In most cases, it can be installed with the following command:
pip install -r requirements.txt
However, some features have additional requirements:
- BinIdx support requires torch.
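If you are unsure whether your environment can produce BinIdx output, a quick check like the one below may help. This is a minimal sketch and not part of DaDarista; the helper name `binidx_available` is hypothetical, and it only checks that `torch` is importable, not that a particular version is installed.

```python
import importlib.util


def binidx_available() -> bool:
    """Hypothetical helper: True if torch (needed for BinIdx) can be imported."""
    # find_spec returns None when the top-level package is not installed.
    return importlib.util.find_spec("torch") is not None


if __name__ == "__main__":
    if binidx_available():
        print("torch found: BinIdx output should be usable.")
    else:
        print("torch not found: install it to use BinIdx output.")
```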
DaDarista takes a .yaml file as its config. The config format is similar to RWKV-v5's datapack example found here.
Not all methods are implemented, though, so refer to the DataPack class in modelling.py for the list of supported keys and values.
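Because not every key is supported, it can be handy to spot unsupported keys before a run. The sketch below is hypothetical and not part of DaDarista: the function `unsupported_keys` is illustrative, and it assumes the target class (such as DataPack) takes each supported config key as a named constructor argument. If the class accepts `**kwargs`, the check cannot tell and returns an empty set.

```python
import inspect
from typing import Any, Mapping


def unsupported_keys(config: Mapping[str, Any], cls: type) -> set[str]:
    """Return config keys that are not named parameters of cls's constructor.

    Assumption: supported keys map one-to-one to constructor arguments.
    """
    params = inspect.signature(cls).parameters.values()
    # A **kwargs parameter accepts anything, so nothing can be flagged.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params):
        return set()
    accepted = {
        p.name
        for p in params
        if p.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                      inspect.Parameter.KEYWORD_ONLY)
    }
    return set(config) - accepted
```

For example, after loading a config with PyYAML's `yaml.safe_load`, calling `unsupported_keys(config, DataPack)` with DataPack imported from modelling.py would list keys the class does not take, provided the assumption above holds.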
RWKV-infctx-trainer Developers: Initial Work
Shinon: Refactor, code stripping, etc.
m8than: Distribution by Length
While this is open source, we are unlikely to accept PRs or Issues, as these tools are used internally and are tailored to our own needs.