|
1 | 1 | # TOA File Format Specification |
2 | 2 |
|
3 | | -Version 0.9 |
| 3 | +Version 0.10 |
4 | 4 |
|
5 | 5 | ## 1. Introduction |
6 | 6 |
|
@@ -307,58 +307,75 @@ The compressed data MUST be an LZMA2s stream. LZMA2s is a simplified variant of |
307 | 307 |
|
308 | 308 | ##### 4.1.3.1 LZMA2s Stream Format |
309 | 309 |
|
310 | | -An LZMA2s stream consists of a sequence of chunks, each beginning with a control byte that determines the chunk type and |
311 | | -size. The stream MUST end with a control byte value of 0x00. |
| 310 | +An LZMA2s stream MUST consist of a sequence of chunks, each beginning with a control byte that determines the chunk type |
| 311 | +and size. The stream MUST terminate with a control byte value of 0x00. Implementations MUST NOT process streams that |
| 312 | +lack proper termination. Each chunk MUST be processed sequentially, and implementations MUST reject malformed chunk |
| 313 | +sequences. |
312 | 314 |
|
313 | 315 | ##### 4.1.3.2 Control Byte Encoding |
314 | 316 |
|
315 | | -The control byte uses the following encoding: |
| 317 | +The control byte MUST use the following encoding schemes. Implementations MUST reject control bytes that do not conform |
| 318 | +to these patterns: |
316 | 319 |
|
317 | 320 | **End of Stream**: `0x00` |
318 | 321 |
|
319 | | -- Indicates end of the LZMA2s stream |
320 | | -- No additional bytes follow. |
| 322 | +- MUST indicate the end of the LZMA2s stream |
| 323 | +- MUST NOT be followed by any additional bytes |
| 324 | +- Implementations MUST terminate processing upon encountering this byte |
321 | 325 |
|
322 | 326 | **Uncompressed Chunk**: `001sssss` + 2 bytes |
323 | 327 |
|
324 | | -- Bits 7-5: `001` identifies uncompressed chunk |
325 | | -- Bits 4-0: High 5 bits of 21-bit size |
326 | | -- Following 2 bytes: Middle and low bytes of size |
327 | | -- Actual size = encoded_size + 1 |
| 328 | +- Bits 7-5: MUST be `001` to identify an uncompressed chunk |
| 329 | +- Bits 4-0: MUST contain the high 5 bits of the 21-bit size |
| 330 | +- The following 2 bytes MUST contain the middle and low bytes of size in big-endian order |
| 331 | +- Implementations MUST calculate the actual size as encoded_size + 1 |
| 332 | +- The chunk data following these headers MUST be exactly the calculated size in bytes |
328 | 333 |
|
329 | 334 | **Compressed Chunk**: `010uuuuu` + 4 bytes |
330 | 335 |
|
331 | | -- Bits 7-5: `010` identifies compressed chunk |
332 | | -- Bits 4-0: High 5 bits of 21-bit uncompressed size |
333 | | -- Following 2 bytes: Middle and low bytes of uncompressed size |
334 | | -- Following 2 bytes: 16-bit compressed size |
335 | | -- Actual sizes = encoded_size + 1 |
| 336 | +- Bits 7-5: MUST be `010` to identify a compressed chunk |
| 337 | +- Bits 4-0: MUST contain the high 5 bits of the 21-bit uncompressed size |
| 338 | +- The following 2 bytes MUST contain the middle and low bytes of uncompressed size |
| 339 | +- The following 2 bytes MUST contain the 16-bit compressed size in big-endian order |
| 340 | +- Implementations MUST calculate actual sizes as encoded_size + 1 |
| 341 | +- The compressed data MUST decompress to exactly the specified uncompressed size |
336 | 342 |
|
337 | 343 | **Delta Compressed Chunk**: `011uuuuu` + 3 bytes |
338 | 344 |
|
339 | | -- Bits 7-5: `011` identifies delta compressed chunk |
340 | | -- Bits 4-0: High 5 bits of 21-bit uncompressed size |
341 | | -- Following 2 bytes: Middle and low bytes of uncompressed size |
342 | | -- Following 1 byte: Delta from 65536 |
343 | | -- Compressed size = 65536 - delta |
| 345 | +- Bits 7-5: MUST be `011` to identify a delta compressed chunk |
| 346 | +- Bits 4-0: MUST contain the high 5 bits of the 21-bit uncompressed size |
| 347 | +- The following 2 bytes MUST contain the middle and low bytes of uncompressed size |
| 348 | +- The following 1 byte MUST contain the delta from 65536 |
| 349 | +- Implementations MUST calculate compressed size as 65536 - delta |
| 350 | +- The delta value MUST NOT exceed 65536 |
344 | 351 |
|
345 | 352 | **Delta Uncompressed Chunk**: `1sdddddd` + 1 byte |
346 | 353 |
|
347 | | -- Bit 7: `1` identifies delta uncompressed |
348 | | -- Bit 6: Sign (0=add to 65536, 1=subtract from 65536) |
349 | | -- Bits 5-0: High 6 bits of 14-bit delta |
350 | | -- Following 1 byte: Low 8 bits of delta |
351 | | -- Size = 65536 ± delta (range: 49,152 to 81,920 bytes) |
| 354 | +- Bit 7: MUST be `1` to identify delta uncompressed chunk |
| 355 | +- Bit 6: MUST be 0 to add to 65536, or 1 to subtract from 65536 |
| 356 | +- Bits 5-0: MUST contain the high 6 bits of the 14-bit delta |
| 357 | +- The following 1 byte MUST contain the low 8 bits of delta |
| 358 | +- Implementations MUST calculate size as 65536 ± delta |
| 359 | +- The resulting size MUST be within the range 49,152 to 81,920 bytes inclusive |
| 360 | + |
| 361 | +Decoders MUST reject any control byte patterns not defined above. Encoders MUST NOT generate undefined control byte |
| 362 | +patterns. |
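The control byte patterns listed above can be sketched as a small header parser. This is an illustrative sketch, not the reference implementation: `ChunkHeader`, `parse_chunk_header`, and the kind strings are hypothetical names. It also assumes that the `+ 1` bias applies to the uncompressed size of a delta compressed chunk, which the text above does not state explicitly.

```python
from dataclasses import dataclass
from typing import Optional

BASE = 65536  # reference size used by both delta chunk kinds


@dataclass(frozen=True)
class ChunkHeader:
    kind: str                        # hypothetical label for the chunk type
    header_len: int                  # header bytes, including the control byte
    uncompressed_size: int
    compressed_size: Optional[int]   # None when the chunk carries raw data


def _need(buf: bytes, n: int) -> None:
    if len(buf) < n:
        raise ValueError(f"truncated chunk header: need {n} bytes, have {len(buf)}")


def parse_chunk_header(buf: bytes) -> ChunkHeader:
    """Decode one LZMA2s chunk header from the start of buf."""
    _need(buf, 1)
    control = buf[0]

    if control == 0x00:                       # end of stream
        return ChunkHeader("end", 1, 0, None)

    if control & 0x80:                        # 1sdddddd + 1 byte
        _need(buf, 2)
        delta = ((control & 0x3F) << 8) | buf[1]   # 14-bit delta
        size = BASE - delta if control & 0x40 else BASE + delta
        return ChunkHeader("delta_uncompressed", 2, size, None)

    top = control >> 5
    if top in (0b001, 0b010, 0b011):
        _need(buf, 3)
        size21 = ((control & 0x1F) << 16) | (buf[1] << 8) | buf[2]

    if top == 0b001:                          # 001sssss + 2 bytes
        return ChunkHeader("uncompressed", 3, size21 + 1, None)

    if top == 0b010:                          # 010uuuuu + 4 bytes
        _need(buf, 5)
        compressed = ((buf[3] << 8) | buf[4]) + 1   # big-endian, biased by 1
        return ChunkHeader("compressed", 5, size21 + 1, compressed)

    if top == 0b011:                          # 011uuuuu + 3 bytes
        _need(buf, 4)
        # ASSUMPTION: the +1 bias also applies to the uncompressed size here.
        # The delta is one byte, so the compressed size is 65281..65536.
        return ChunkHeader("delta_compressed", 4, size21 + 1, BASE - buf[3])

    # 0x01..0x1F match none of the defined patterns.
    raise ValueError(f"undefined control byte 0x{control:02X}")
```

For example, under these assumptions `parse_chunk_header(bytes([0x40, 0x00, 0xFF, 0x01, 0x00]))` describes a compressed chunk of 257 bytes that expands to 256 bytes, and any control byte in the range 0x01 to 0x1F raises `ValueError`.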
| 363 | + |
352 | 365 |
|
353 | 366 | ##### 4.1.3.3 State Management |
354 | 367 |
|
355 | | -LZMA2s maintains simpler state than LZMA2: |
| 368 | +LZMA2s implementations MUST maintain the following state constraints: |
356 | 369 |
|
357 | | -- Properties (lc, lp, pb) remain constant throughout the stream |
358 | | -- Dictionary size remains constant throughout the stream |
359 | | -- The LZMA decoder state MUST be reset only when transitioning from an uncompressed chunk to a compressed chunk |
| 370 | +- Properties (lc, lp, pb) MUST remain constant throughout the entire stream |
| 371 | +- Dictionary size MUST remain constant throughout the entire stream |
| 372 | +- The LZMA decoder state MUST be reset when and only when transitioning from an uncompressed chunk to a compressed chunk |
| 373 | +- The LZMA decoder state MUST NOT be reset at any other time |
| 374 | +- Implementations MUST maintain dictionary contents across all chunks within the stream, regardless of chunk type |
| 375 | +- The dictionary MUST only be reset at block boundaries, never within an LZMA2s stream |
360 | 376 |
|
361 | | -This simplified state management reduces implementation complexity while maintaining compression efficiency. |
| 377 | +Encoders MUST ensure that chunk transitions respect these state management rules. Decoders MUST verify state consistency |
| 378 | +and MUST reject streams that violate these constraints. |
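The reset rule above can be illustrated with a short sketch that reports where a decoder would reset its LZMA state for a given sequence of chunk kinds. The function name `lzma_state_reset_points` and the kind strings are hypothetical. The sketch assumes that "uncompressed chunk" covers both the plain and delta uncompressed kinds, and that "compressed chunk" covers both the plain and delta compressed kinds; the text above does not spell this out.

```python
from typing import Iterable, List

# ASSUMPTION: the delta variants count as uncompressed / compressed chunks
# for the purpose of the reset rule.
UNCOMPRESSED_KINDS = {"uncompressed", "delta_uncompressed"}
COMPRESSED_KINDS = {"compressed", "delta_compressed"}


def lzma_state_reset_points(kinds: Iterable[str]) -> List[int]:
    """Return the indices of chunks before which the LZMA state is reset.

    The state is reset only when a compressed chunk directly follows an
    uncompressed chunk. The dictionary is never reset within the stream,
    so it is not modelled here.
    """
    resets: List[int] = []
    previous = None
    for index, kind in enumerate(kinds):
        if kind in COMPRESSED_KINDS and previous in UNCOMPRESSED_KINDS:
            resets.append(index)
        previous = kind
    return resets
```

Under these assumptions, a stream whose chunks are compressed, uncompressed, compressed, compressed resets the LZMA state once, before the third chunk.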
362 | 379 |
|
363 | 380 | ### 4.2 BLAKE3 Tree Integration |
364 | 381 |
|
@@ -1044,6 +1061,7 @@ Files using this format SHOULD use the extension `.toa`. |
1044 | 1061 |
|
1045 | 1062 | ## Revision History |
1046 | 1063 |
|
| 1064 | +- Version 0.10 (2025-08-27): Change wording of the LZMA2s normative section |
1047 | 1065 | - Version 0.9 (2025-08-22): Changes in the ECC: |
1048 | 1066 | - Switched polynomial for Reed-Solomon ECC from 0x11D (x^8 + x^4 + x^3 + x^2 + 1) to |
1049 | 1067 | 0x11B (x^8 + x^4 + x^3 + x + 1) for better CPU instruction set support (like x86's GFNI) |
|