Tags: celdiniz/PyTorch-ao

v0.5.0

Don't run mac builds per commit (pytorch#842)

* Don't run mac builds per commit

* Update and rename build-wheels-m1.yml to build-wheels_m1.yml

* Update build-wheels_m1.yml

* Update build-wheels_m1.yml

v0.5.0-rc3

Don't run mac builds per commit (pytorch#842); same commit as v0.5.0.

v0.5.0-rc2

Add INT8 mixed-precision training (pytorch#748)

* initial commit

* expose some UX. update test

* add test. update bench

* update test. add doc

* fix ngpu

* fix FSDP

* fix

* fix fsdp test

* fix

* grammar

* simplify fsdp test

* update benchmark script

* update

* make claim more conservative

* register fused adam

* update benchmark script

* add more ops

* update default

* use TorchAOBaseTensor

* fix fsdp param_dtype

* fix param_dtype

* dtype check to prevent unnecessary errors

* move checks

* add note

* fix

* simplify script

* add module-based UX

* fix

* use FP8 impl of __torch_dispatch__

* rename _dynamic interface

* update test

* fix compile on 2.4

* log torch version

* make log interval customizable

* make naming more explicit

* update readme

* some change

* fix big bug

* add docstring. update _get_linear_inserter

* add TorchAOBaseTensor back

* fix FSDP

* update FSDP test. add autocast support

* reduce iter

* update int8_mm fallback

* move leading dims logic into _dynamic_int8_mm
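
A minimal sketch of how the module-based UX from pytorch#748 might be exercised. This assumes torchao's prototype int8_mixed_precision_training config and the quantize_ entry point; both are prototype names around v0.5.0 and may have moved since.

    # Sketch only (assumed API): INT8 mixed-precision training, pytorch#748.
    import torch
    import torch.nn as nn
    from torchao.quantization import quantize_
    from torchao.prototype.quantized_training import int8_mixed_precision_training

    model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 8)).cuda()

    # Swap nn.Linear weights for tensors that run INT8 matmuls in forward
    # and backward while keeping a high-precision master copy.
    quantize_(model, int8_mixed_precision_training())

    optim = torch.optim.AdamW(model.parameters(), lr=1e-4)
    x = torch.randn(16, 64, device="cuda")
    model(x).square().mean().backward()  # dummy loss to drive a backward pass
    optim.step()
    optim.zero_grad()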

v0.5.0-rc1

[StaticQuant] Update how block_size is calculated with Observers (pytorch#815)

v0.4.0

Fixing linear_activation_tensor dynamic quant (pytorch#622)

Summary: dynamic quant was broken for generate due to a missing repr function

Test Plan: sh benchmarks.sh

20240806170037, tok/s=  9.54, mem/s=  63.14 GB/s, peak_mem= 8.61 GB, model_size= 6.62 GB, quant: int8dq, mod: Llama-2-7b-chat-hf, kv_quant: False, compile: True, compile_prefill: False, dtype: torch.bfloat16, device: cuda

repro: python generate.py --quantization int8dq --checkpoint_path ../../../checkpoints/meta-llama/Llama-2-7b-chat-hf/model.pth --device cuda --precision torch.bfloat16 --compile --num_samples 5 --max_new_tokens 200 --top_k 200 --temperature 0.8

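For context, the quant: int8dq setting in the test plan above maps to torchao's dynamic INT8-activation / INT8-weight config. A minimal sketch, assuming the current quantize_ API (the generate.py flag --quantization int8dq is assumed to wrap the same config):

    # Sketch only (assumed mapping): "int8dq" = dynamic INT8 activations
    # + INT8 weights via torchao's quantize_ API.
    import torch
    import torch.nn as nn
    from torchao.quantization import quantize_, int8_dynamic_activation_int8_weight

    model = nn.Sequential(nn.Linear(128, 128), nn.ReLU(), nn.Linear(128, 16))
    model = model.to(torch.bfloat16).cuda()

    # Weights are quantized once; activations are quantized dynamically per call.
    quantize_(model, int8_dynamic_activation_int8_weight())

    model = torch.compile(model)  # the benchmark above ran with compile: True
    with torch.no_grad():
        out = model(torch.randn(4, 128, dtype=torch.bfloat16, device="cuda"))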

v0.4.0-rc5

fix version check

v0.4.0-rc4

fix atol test again

v0.4.0-rc3

skip test cases that rely on PyTorch 2.5

v0.4.0-rc2

Fix FP6-LLM API and add .to(device) op (pytorch#599)

* fix

* add some ops for convenience

Co-authored-by: Thien Tran <[email protected]>
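
pytorch#599 above fixes the FP6-LLM API and adds a .to(device) op for the quantized tensors. A minimal sketch of the weight-only FP6 flow, assuming the later fpx_weight_only config name (3 exponent bits, 2 mantissa bits = FP6 e3m2); the exact import path at v0.4.0 may differ:

    # Sketch only (assumed import path): FP6 (e3m2) weight-only quantization.
    import torch
    import torch.nn as nn
    from torchao.quantization import quantize_, fpx_weight_only

    model = nn.Sequential(nn.Linear(64, 64)).half()
    quantize_(model, fpx_weight_only(3, 2))  # 3 exponent bits, 2 mantissa bits

    # The .to(device) op added in pytorch#599 lets the quantized module move:
    model = model.to("cuda")
    out = model(torch.randn(2, 64, dtype=torch.float16, device="cuda"))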

v0.4.0-rc1

Update version.txt