alishafique3/README.md

A passionate researcher and developer working on efficient LLMs for HPC and inference optimization.


🎓 PhD Candidate in Electrical and Computer Engineering @ Kansas State University.

📖 Graduate research assistant @ the ISCAAS Lab.

💻 Currently developing lightweight large language models using knowledge distillation.

🌱 Love to build research projects, write tutorials, and share insightful technical blogs.

⚡ Fun fact: I love to travel and attend various technical events and community festivals.

Technical Skills:

Connect with me:

Email LinkedIn Medium

Pinned

  1. vLLM-vs-Hugging-Face

    This project benchmarks vLLM against Hugging Face Transformers for offline LLM inference, leveraging vLLM's optimized execution (PagedAttention and continuous batching) to enable faster generation.

    Jupyter Notebook
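To illustrate one reason vLLM's scheduler wins on throughput, here is a toy model of continuous batching versus static batching. Each engine step generates one token for every active sequence; with static batching the whole batch waits for its longest request, while continuous batching refills a finished request's slot immediately. The numbers and functions are illustrative sketches, not vLLM's actual implementation or measurements.

```python
def static_batch_steps(lengths, batch_size):
    # Static batching: requests are grouped up front, and every batch
    # runs until its longest member finishes.
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])  # batch waits on the longest
    return steps

def continuous_batch_steps(lengths, batch_size):
    # Continuous batching: a finished request's slot is refilled with a
    # pending request on the very next step.
    pending = list(lengths)
    active = []
    steps = 0
    while pending or active:
        while pending and len(active) < batch_size:
            active.append(pending.pop(0))  # refill free slots right away
        steps += 1
        active = [r - 1 for r in active if r - 1 > 0]
    return steps

# One long request alongside three short ones, two GPU slots:
lengths = [100, 10, 10, 10]
static_steps = static_batch_steps(lengths, batch_size=2)       # 100 + 10 = 110
continuous_steps = continuous_batch_steps(lengths, batch_size=2)  # 100
```

With static batching the short requests in the first batch idle behind the 100-token request; continuous batching finishes all four in the time the long request takes alone.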

  2. Quantization-From-Scratch-Pytorch

    This project implements quantization techniques for LLMs from scratch (Absmax, Zeropoint, and LLM.int8()) and benchmarks them on perplexity and memory usage. It highlights mixed-precision int8+fp16 as a…

    Jupyter Notebook 1
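The two simpler schemes above can be sketched in a few lines of NumPy. Absmax is symmetric (scale by the largest absolute value onto [-127, 127]); zeropoint is asymmetric (shift by a zero-point so the tensor's full [min, max] range maps onto [-128, 127]). This is a minimal sketch of the idea, not the repo's actual code.

```python
import numpy as np

def absmax_quantize(x):
    # Symmetric: scale by the largest absolute value onto int8 [-127, 127].
    scale = 127 / np.max(np.abs(x))
    q = np.round(scale * x).astype(np.int8)
    return q, scale

def absmax_dequantize(q, scale):
    return q.astype(np.float32) / scale

def zeropoint_quantize(x):
    # Asymmetric: shift by a zero-point so [min, max] maps onto [-128, 127].
    x_range = np.max(x) - np.min(x)
    x_range = x_range if x_range != 0 else 1.0
    scale = 255 / x_range
    zeropoint = np.round(-scale * np.min(x)) - 128
    q = np.clip(np.round(scale * x + zeropoint), -128, 127).astype(np.int8)
    return q, scale, zeropoint

x = np.array([0.5, -1.2, 0.03, 2.4], dtype=np.float32)
q, s = absmax_quantize(x)
x_hat = absmax_dequantize(q, s)   # round-trip error bounded by 0.5 / s
xq, s2, zp = zeropoint_quantize(x)
```

LLM.int8() builds on this by keeping outlier feature dimensions in fp16 while quantizing the rest to int8, which is where the mixed-precision int8+fp16 result comes from.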

  3. DDP-in-Torch-From-Scratch

    A from-scratch distributed training implementation with manual gradient synchronization, data distribution, and multi-GPU training for GPT/Llama models

    Python
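The core of manual gradient synchronization is a single idea: after each backward pass, every rank holds gradients from its own data shard, and an all-reduce averages them so all replicas apply the identical update. Real DDP does this with `torch.distributed` (NCCL) and overlaps communication with the backward pass; this is only a plain-Python sketch of the averaging step, with simulated ranks.

```python
def all_reduce_mean(grads_per_rank):
    """Average gradients elementwise across ranks (simulated all-reduce)."""
    world_size = len(grads_per_rank)
    n_params = len(grads_per_rank[0])
    return [
        sum(rank_grads[i] for rank_grads in grads_per_rank) / world_size
        for i in range(n_params)
    ]

# Each rank computed gradients on its own shard of the batch:
rank_grads = [
    [0.2, -1.0, 0.5],  # rank 0
    [0.4, -0.6, 0.1],  # rank 1
]
synced = all_reduce_mean(rank_grads)  # every rank now sees the same gradients
```

Because every replica starts from the same weights and applies the same averaged gradients, the replicas stay bit-for-bit in sync without ever exchanging parameters.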

  4. PyTorch_Training_Optimization_with_Memory_Analysis

    In this project, the training stage is optimized using memory analysis in PyTorch TensorBoard. Automatic mixed precision, increased batch size, reduced host-to-device (H2D) copies, multiprocessing, and pinned-memory techn…

    Jupyter Notebook 3

  5. Distributed_Training_of_LLMs

    In this project, an LLM (DistilBERT) is fine-tuned on multiple GPUs for a text classification task. Distributed training is performed using DeepSpeed (ZeRO stages 1, 2, and 3) with profiling in wandb.

    Python 3 1
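For reference, selecting a ZeRO stage in DeepSpeed comes down to one field in the JSON config: stage 1 shards optimizer states, stage 2 additionally shards gradients, and stage 3 also shards the parameters themselves. A minimal sketch with illustrative values (not the exact config used in this project):

```json
{
  "train_batch_size": 32,
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
```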

  6. KV-Caching-From-Scratch-Pytorch

    This project explores KV caching in LLMs by implementing it from scratch in GPT-2 and benchmarking its impact on inference speed. It highlights how caching Key/Value pairs during decoding significantly speeds up autoregressive generation.
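The trick behind KV caching can be shown with single-head causal attention: at each decode step, only the new token is projected through the K and V matrices, and its key/value are appended to a cache that the query attends over. The output matches a full recompute exactly, while the per-step cost drops from O(T) projections to O(1). Toy dimensions below, not GPT-2's actual projection code.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend_full(xs):
    # Baseline: recompute K/V for the entire sequence each step,
    # return the last token's attention output.
    q = xs[-1] @ Wq
    K = xs @ Wk
    V = xs @ Wv
    return softmax(q @ K.T / np.sqrt(d)) @ V

def attend_cached(x_new, cache):
    # KV caching: project only the new token; reuse cached K/V for the prefix.
    cache["K"].append(x_new @ Wk)
    cache["V"].append(x_new @ Wv)
    q = x_new @ Wq
    K, V = np.stack(cache["K"]), np.stack(cache["V"])
    return softmax(q @ K.T / np.sqrt(d)) @ V

xs = rng.normal(size=(5, d))
cache = {"K": [], "V": []}
for t in range(len(xs)):
    out_cached = attend_cached(xs[t], cache)
out_full = attend_full(xs)  # identical to the cached decode's last output
```

Since decoding runs one token at a time, this saving compounds over the whole generated sequence, which is exactly what the repo's benchmarks measure.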