Tags: celdiniz/PyTorch-ao
Don't run mac builds per commit (pytorch#842)
* Don't run mac builds per commit
* Update and rename build-wheels-m1.yml to build-wheels_m1.yml
* Update build-wheels_m1.yml
* Update build-wheels_m1.yml
Add INT8 mixed-precision training (pytorch#748)
* initial commit
* expose some UX. update test
* add test. update bench
* update test. add doc
* fix ngpu
* fix FSDP
* fix
* fix fsdp test
* fix
* grammar
* simplify fsdp test
* update benchmark script
* update
* make claim more conservative
* register fused adam
* update benchmark script
* add more ops
* update default
* use TorchAOBaseTensor
* fix fsdp param_dtype
* fix param_dtype
* dtype check to prevent unnecessary errors
* move checks
* add note
* fix
* simplify script
* add module-based UX
* fix
* use FP8 impl of __torch_dispatch__
* rename _dynamice interface
* update test
* fix compile on 2.4
* log torch version
* make log interval customizable
* make naming for explicit
* update readme
* some change
* fix big bug
* add docstring. update _get_linear_inserter
* add TorchAOBaseTensor back
* fix FSDP
* update FSDP test. add autocast support
* reduce iter
* update int8_mm fallback
* put leading dims logic to _dynamic_int8_mm
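For context, a minimal sketch of how the module-based UX this PR describes might be used. The import path and factory name (torchao.prototype.quantized_training, int8_mixed_precision_training) are assumptions inferred from the torchao prototype namespace, not confirmed by this changelog:

    # Hypothetical usage sketch; function/module names are assumptions.
    import torch
    from torchao.quantization import quantize_
    from torchao.prototype.quantized_training import int8_mixed_precision_training  # assumed

    model = torch.nn.Sequential(torch.nn.Linear(1024, 1024)).cuda()
    # Swap nn.Linear weights for INT8 mixed-precision training tensor subclasses.
    quantize_(model, int8_mixed_precision_training())

    # The PR notes a fused adam registration for the new tensor subclass.
    optim = torch.optim.AdamW(model.parameters())
    x = torch.randn(16, 1024, device="cuda")
    model(x).sum().backward()
    optim.step()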
Fixing linear_activation_tensor dynamic quant (pytorch#622)
Summary: dynamic quant was broken for generate due to a missing repr function.
Test Plan: sh benchmarks.sh
20240806170037, tok/s= 9.54, mem/s= 63.14 GB/s, peak_mem= 8.61 GB, model_size= 6.62 GB
quant: int8dq, mod: Llama-2-7b-chat-hf, kv_quant: False, compile: True, compile_prefill: False, dtype: torch.bfloat16, device: cuda
Repro: python generate.py --quantization int8dq --checkpoint_path ../../../checkpoints/meta-llama/Llama-2-7b-chat-hf/model.pth --device cuda --precision torch.bfloat16 --compile --num_samples 5 --max_new_tokens 200 --top_k 200 --temperature 0.8
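A minimal sketch of the failure mode this PR fixes: printing a dynamically quantized weight (as generate.py does) hit the missing repr. The quantize_ / int8_dynamic_activation_int8_weight names are the torchao int8dq API as I understand it, shown here as an assumption rather than a confirmed repro:

    # Sketch of the int8dq path from the repro above; API names assumed.
    import torch
    from torchao.quantization import quantize_, int8_dynamic_activation_int8_weight

    model = torch.nn.Sequential(torch.nn.Linear(128, 128)).to(torch.bfloat16)
    quantize_(model, int8_dynamic_activation_int8_weight())
    print(model[0].weight)  # raised before this fix; prints the subclass repr after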
Fix FP6-LLM API and add .to(device) op (pytorch#599)
* fix
* add some ops for convenience
Co-authored-by: Thien Tran <[email protected]>
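A minimal sketch of the convenience this PR adds, assuming the prototype FP6-LLM conversion entry point; convert_fp6_llm is an assumed name from the PR context, not confirmed by this log:

    # Hypothetical FP6-LLM usage; convert_fp6_llm is an assumed prototype name.
    import torch
    from torchao.prototype.fp6_llm import convert_fp6_llm  # assumed import path

    model = torch.nn.Sequential(torch.nn.Linear(4096, 4096)).half()
    convert_fp6_llm(model)   # pack linear weights into the FP6 layout
    model.to("cuda")         # the new .to(device) op moves the packed FP6 weights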