alishafique3/README.md

A passionate researcher and developer working on efficient LLMs for HPC and inference optimization.


🎓 PhD Candidate in Electrical and Computer Engineering @ Kansas State University.

📖 Graduate research assistant @ the ISCAAS Lab.

💻 Currently developing lightweight large language models using knowledge distillation.

🌱 Love to build research projects, write tutorials, and share insightful technical blogs.

⚡ Fun fact: I love to travel and attend various technical events and community festivals.

Technical Skills:

Connect with me:

Email LinkedIn Medium

Pinned

  1. vLLM-vs-Hugging-Face

    This project benchmarks vLLM against Hugging Face Transformers for offline LLM inference, leveraging vLLM's optimized execution (PagedAttention and continuous batching) to enable faster generation.

    Jupyter Notebook
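To illustrate one reason vLLM's scheduler wins on throughput, here is a toy model of continuous batching versus static batching. Each engine step generates one token for every active sequence; with static batching the whole batch waits for its longest request, while continuous batching refills a finished request's slot immediately. The numbers and functions are illustrative sketches, not vLLM's actual implementation or measurements.

```python
def static_batch_steps(lengths, batch_size):
    # Static batching: requests are grouped up front, and every batch
    # runs until its longest member finishes.
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])  # batch waits on the longest
    return steps

def continuous_batch_steps(lengths, batch_size):
    # Continuous batching: a finished request's slot is refilled with a
    # pending request on the very next step.
    pending = list(lengths)
    active = []
    steps = 0
    while pending or active:
        while pending and len(active) < batch_size:
            active.append(pending.pop(0))  # refill free slots right away
        steps += 1
        active = [r - 1 for r in active if r - 1 > 0]
    return steps

# One long request alongside three short ones, two GPU slots:
lengths = [100, 10, 10, 10]
static_steps = static_batch_steps(lengths, batch_size=2)       # 100 + 10 = 110
continuous_steps = continuous_batch_steps(lengths, batch_size=2)  # 100
```

With static batching the short requests in the first batch idle behind the 100-token request; continuous batching finishes all four in the time the long request takes alone.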

  2. Quantization-From-Scratch-Pytorch

    This project implements quantization techniques for LLMs from scratch (Absmax, Zeropoint, and LLM.int8()) and benchmarks them on perplexity and memory usage. It highlights mixed-precision int8+fp16 as a…

    Jupyter Notebook 1
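The two simpler schemes above can be sketched in a few lines of NumPy. Absmax is symmetric (scale by the largest absolute value onto [-127, 127]); zeropoint is asymmetric (shift by a zero-point so the tensor's full [min, max] range maps onto [-128, 127]). This is a minimal sketch of the idea, not the repo's actual code.

```python
import numpy as np

def absmax_quantize(x):
    # Symmetric: scale by the largest absolute value onto int8 [-127, 127].
    scale = 127 / np.max(np.abs(x))
    q = np.round(scale * x).astype(np.int8)
    return q, scale

def absmax_dequantize(q, scale):
    return q.astype(np.float32) / scale

def zeropoint_quantize(x):
    # Asymmetric: shift by a zero-point so [min, max] maps onto [-128, 127].
    x_range = np.max(x) - np.min(x)
    x_range = x_range if x_range != 0 else 1.0
    scale = 255 / x_range
    zeropoint = np.round(-scale * np.min(x)) - 128
    q = np.clip(np.round(scale * x + zeropoint), -128, 127).astype(np.int8)
    return q, scale, zeropoint

x = np.array([0.5, -1.2, 0.03, 2.4], dtype=np.float32)
q, s = absmax_quantize(x)
x_hat = absmax_dequantize(q, s)   # round-trip error bounded by 0.5 / s
xq, s2, zp = zeropoint_quantize(x)
```

LLM.int8() builds on this by keeping outlier feature dimensions in fp16 while quantizing the rest to int8, which is where the mixed-precision int8+fp16 result comes from.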

  3. DDP-in-Torch-From-Scratch

    A from-scratch distributed training implementation with manual gradient synchronization, data distribution, and multi-GPU training for GPT/Llama models

    Python
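The core of manual gradient synchronization is a single idea: after each backward pass, every rank holds gradients from its own data shard, and an all-reduce averages them so all replicas apply the identical update. Real DDP does this with `torch.distributed` (NCCL) and overlaps communication with the backward pass; this is only a plain-Python sketch of the averaging step, with simulated ranks.

```python
def all_reduce_mean(grads_per_rank):
    """Average gradients elementwise across ranks (simulated all-reduce)."""
    world_size = len(grads_per_rank)
    n_params = len(grads_per_rank[0])
    return [
        sum(rank_grads[i] for rank_grads in grads_per_rank) / world_size
        for i in range(n_params)
    ]

# Each rank computed gradients on its own shard of the batch:
rank_grads = [
    [0.2, -1.0, 0.5],  # rank 0
    [0.4, -0.6, 0.1],  # rank 1
]
synced = all_reduce_mean(rank_grads)  # every rank now sees the same gradients
```

Because every replica starts from the same weights and applies the same averaged gradients, the replicas stay bit-for-bit in sync without ever exchanging parameters.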

  4. PyTorch_Training_Optimization_with_Memory_Analysis

    In this project, the training stage is optimized using memory analysis in PyTorch TensorBoard. Automatic mixed precision, increased batch size, reduced host-to-device (H2D) copies, multiprocessing, and pinned-memory techn…

    Jupyter Notebook 3

  5. Distributed_Training_of_LLMs

    In this project, an LLM (DistilBERT) is fine-tuned on multiple GPUs for a text classification task. Distributed training is performed using DeepSpeed (ZeRO stages 1, 2, and 3) with profiling in wandb.

    Python 3 1
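For reference, selecting a ZeRO stage in DeepSpeed comes down to one field in the JSON config: stage 1 shards optimizer states, stage 2 additionally shards gradients, and stage 3 also shards the parameters themselves. A minimal sketch with illustrative values (not the exact config used in this project):

```json
{
  "train_batch_size": 32,
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```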

  6. KV-Caching-From-Scratch-Pytorch

    This project explores KV caching in LLMs by implementing it from scratch in GPT-2 and benchmarking its impact on inference speed. It highlights how caching Key/Value pairs during decoding significantly speeds up autoregressive generation.
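The trick behind KV caching can be shown with single-head causal attention: at each decode step, only the new token is projected through the K and V matrices, and its key/value are appended to a cache that the query attends over. The output matches a full recompute exactly, while the per-step cost drops from O(T) projections to O(1). Toy dimensions below, not GPT-2's actual projection code.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_full(xs):
    # Baseline: recompute K/V for the entire sequence each step,
    # return the last token's attention output.
    q = xs[-1] @ Wq
    K = xs @ Wk
    V = xs @ Wv
    return softmax(q @ K.T / np.sqrt(d)) @ V

def attend_cached(x_new, cache):
    # KV caching: project only the new token; reuse cached K/V for the prefix.
    cache["K"].append(x_new @ Wk)
    cache["V"].append(x_new @ Wv)
    q = x_new @ Wq
    K, V = np.stack(cache["K"]), np.stack(cache["V"])
    return softmax(q @ K.T / np.sqrt(d)) @ V

xs = rng.normal(size=(5, d))
cache = {"K": [], "V": []}
for t in range(len(xs)):
    out_cached = attend_cached(xs[t], cache)
out_full = attend_full(xs)  # identical to the cached decode's last output
```

Since decoding runs one token at a time, this saving compounds over the whole generated sequence, which is exactly what the repo's benchmarks measure.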