MoKholy
  • MBZUAI
  • Abu Dhabi


This repository is an open-source implementation of the MuonClip strategy from Moonshot AI's Kimi K2 model.

12 Updated Nov 7, 2025

Yet another BERT model for Arabic.

Python 2 Updated Nov 15, 2020

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Python 2,813 237 Updated Jan 8, 2026

A concise but complete full-attention transformer with a set of promising experimental features from various papers

Python 5,763 499 Updated Jan 6, 2026

🚀 State-of-the-art parsers for natural language.

Python 875 154 Updated Sep 3, 2023

Helpful tools and examples for working with flex-attention

Python 1,107 70 Updated Jan 8, 2026
Smalltalk 4 Updated Jul 2, 2021

A guided tour on how to use HuggingFace large language models on Macs with Apple Silicon

Jupyter Notebook 193 19 Updated Dec 23, 2025

Code / solutions for Mathematics for Machine Learning (MML Book)

Jupyter Notebook 1,200 194 Updated Sep 15, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 65,341 7,941 Updated Jan 9, 2026

Toolkit to segment text into sentences or other semantic units in a robust, efficient and adaptable way.

Python 1,220 77 Updated Dec 5, 2025

Implementation of the proposed Adam-atan2 from Google DeepMind in PyTorch

Python 134 1 Updated Oct 15, 2025

Hierarchical Reasoning Model Official Release

Python 12,222 1,779 Updated Sep 9, 2025
Jupyter Notebook 4 1 Updated Mar 25, 2025

Muon is Scalable for LLM Training

1,398 77 Updated Aug 3, 2025

Implementation of OpenAI's 'Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets' paper.

Python 40 11 Updated Sep 23, 2023

Muon is an optimizer for hidden layers in neural networks

Python 2,180 105 Updated Nov 23, 2025

Using sparse coding to find distributed representations used by neural networks.

Jupyter Notebook 290 40 Updated Nov 10, 2023
Python 58 1 Updated Nov 19, 2024

maximal update parametrization (µP)

Jupyter Notebook 1,656 103 Updated Jul 17, 2024
Python 52 3 Updated Dec 17, 2025

[ICML 2025] The Diffusion Duality

Python 181 24 Updated Dec 27, 2025

100 numpy exercises (with solutions)

Python 13,669 6,494 Updated Nov 6, 2025

[ICLR 2025 Oral] Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models

Python 939 62 Updated Jul 10, 2025

Official PyTorch implementation for ICLR2025 paper "Scaling up Masked Diffusion Models on Text"

Python 355 26 Updated Dec 22, 2024
Python 304 26 Updated Dec 16, 2025

Official PyTorch implementation for "Large Language Diffusion Models"

Python 3,469 234 Updated Nov 12, 2025

We study toy models of skill learning.

Jupyter Notebook 31 4 Updated Jan 20, 2025

Superposition Yields Robust Neural Scaling

Jupyter Notebook 44 5 Updated Nov 28, 2025

Lists of company wise questions available on leetcode premium. Every csv file in the companies directory corresponds to a list of questions on leetcode for a specific company based on the leetcode …

11,228 2,335 Updated Jun 20, 2025