COOM Training Framework

Overview

The COOM Training Framework is a training framework for large-scale language models built on Megatron-Core and inspired by DeepSeek's HAI-LLM optimizations. It is designed to handle extensive model training efficiently.

The framework is planned to support state-of-the-art techniques essential for high-performance language modeling, with a particular focus on efficiency and scalability for large models.

Key Objectives

  • Develop an optimized pretraining framework based on DeepSeek's HAI-LLM.
  • Enable efficient scaling of large multilingual language models.
  • Incorporate cutting-edge optimizations to maximize training efficiency.

Planned Features

  • FP8 Pretraining
  • Mixture of Experts (MoE) (see the routing sketch after this list)
  • Multi-Head Latent Attention (MLA)
  • Multi-Token Prediction
  • Kernel Fusion
  • KV-Cache Optimisation
  • Load Balancing Experts
  • Progressive Model Expansion
  • Energy-Efficient Training
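
None of these features ship in the repository yet. Purely as an illustration of what two of them (Mixture of Experts routing and expert load balancing) typically involve, here is a minimal, hypothetical PyTorch sketch. The class and parameter names (`TopKMoE`, `num_experts`, `top_k`) are illustrative and not part of the COOM codebase; a production implementation on Megatron-Core would use fused kernels and expert parallelism rather than a Python loop over experts.

```python
# Hypothetical illustration only: NOT code from the COOM repository.
# A minimal top-k Mixture-of-Experts layer with a Switch-style auxiliary
# load-balancing loss, written in plain PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.num_experts = num_experts
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor):
        # x: (num_tokens, d_model) after flattening batch and sequence dims.
        logits = self.router(x)                      # (tokens, experts)
        probs = F.softmax(logits, dim=-1)
        topk_probs, topk_idx = probs.topk(self.top_k, dim=-1)

        # Auxiliary load-balancing loss: push expert usage toward uniform by
        # penalising the product of mean routing probability and the fraction
        # of tokens whose top-1 choice is each expert.
        dispatch = F.one_hot(topk_idx[:, 0], self.num_experts).float()
        token_fraction = dispatch.mean(dim=0)        # fraction of tokens per expert
        prob_fraction = probs.mean(dim=0)            # mean router prob per expert
        aux_loss = self.num_experts * (token_fraction * prob_fraction).sum()

        # Dense dispatch for clarity; real frameworks use grouped GEMMs or
        # all-to-all communication instead of looping over experts.
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e in range(self.num_experts):
                mask = topk_idx[:, k] == e
                if mask.any():
                    out[mask] += topk_probs[mask, k].unsqueeze(-1) * self.experts[e](x[mask])
        return out, aux_loss


# Usage sketch: 16 tokens with hidden size 512.
tokens = torch.randn(16, 512)
moe = TopKMoE(d_model=512, d_ff=2048)
y, aux = moe(tokens)
print(y.shape, aux.item())  # torch.Size([16, 512]) and a scalar balancing loss
```

In training, the auxiliary loss would be added to the language-modeling loss with a small weight so the router learns to spread tokens across experts instead of collapsing onto a few.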

Future Directions

  • Continuous improvements and feedback loops for better alignment and model robustness.
  • Expansion into multilingual and multimodal capabilities.
  • Further optimization for deployment in edge computing scenarios.

Collaboration and Contribution

We welcome contributions from researchers and developers to the ongoing development of the COOM Training Framework. Regular updates, comprehensive documentation, and open-source contributions are encouraged to foster community-driven improvements.

To ensure consistency and maintain code quality, all code contributions must strictly follow PEP 8.

For more details or to get involved, please contact our team.


Note: This framework is currently under active development. Performance metrics and benchmarks will be shared in upcoming releases.
