Hi, thank you for the detailed and exciting paper! While reading through the arXiv version, I came across a couple of small issues and a potential confusion in the notation that I thought might be worth reporting:
1. Typo in Quantization Notation (Section 4.2.2)
In the Memory Optimization subsection of Section 4.2.2, under the "Quantization" bullet point, the paper currently says:
"We adopt the same quantization strategy (WA8A SmoothQuant) ..."
I believe this is a small typo and was meant to be W8A8 (8-bit weights and 8-bit activations).
2. Possible Timestep Ordering Issue (Page 5, between Eq. 2 and Eq. 3)
The paragraph says:
"In the auto-regressive model, earlier chunks are cleaner than later ones. For convenience, we define the noise timestep assigned to each chunk as tᵢ, and impose the constraint tᵢ < tⱼ whenever i < j."
However, this seems to contradict the earlier claim that “earlier chunks are cleaner.”
Assuming a higher t means a cleaner (less noisy) chunk, earlier chunks being cleaner would imply tᵢ > tⱼ whenever i < j.
So possibly either the inequality sign in tᵢ < tⱼ is reversed, or the interpretation of clean/noisy with respect to t needs clarification.
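To make the two readings concrete, here is a minimal Python sketch. It is my own illustration, not code from the paper or the repository: the function name, the `higher_t_is_cleaner` flag, and the sample timestep values are all hypothetical. It only checks whether a given per-chunk timestep sequence agrees with "earlier chunks are cleaner" under each convention.

```python
def earlier_chunks_are_cleaner(timesteps, higher_t_is_cleaner):
    """Check whether each chunk is strictly cleaner than the one after it.

    `higher_t_is_cleaner` is an assumed switch between the two possible
    conventions; the paper's actual convention is the open question here.
    """
    pairs = list(zip(timesteps, timesteps[1:]))
    if higher_t_is_cleaner:
        # Cleaner means larger t, so t must strictly decrease with chunk index.
        return all(t_i > t_j for t_i, t_j in pairs)
    # Cleaner means smaller t, so t must strictly increase with chunk index.
    return all(t_i < t_j for t_i, t_j in pairs)


# Hypothetical timesteps satisfying the stated constraint t_i < t_j for i < j.
increasing = [0.2, 0.5, 0.9]

# Consistent with the text only if a smaller t means a cleaner chunk.
consistent_if_lower_t_is_cleaner = earlier_chunks_are_cleaner(
    increasing, higher_t_is_cleaner=False
)
# Inconsistent with the text if a larger t means a cleaner chunk.
consistent_if_higher_t_is_cleaner = earlier_chunks_are_cleaner(
    increasing, higher_t_is_cleaner=True
)
```

Under the stated constraint, the first check passes and the second fails, so the sentence in the paper is only self-consistent if a smaller t denotes a cleaner chunk.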
I would love your confirmation on this; I am just flagging it in case it's helpful for future readers or revisions. Again, thank you for the great work!