Commit 0d0ba75

Update SPECIFICATION.md to 0.10
Mainly changed the wording of the LZMA2s to be more normative.
1 parent 98e296b commit 0d0ba75

4 files changed, +51 -33 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ against bit rot. It uses LZMA2s, a special variant of LZMA as it's compression a
 
 **TOA is still experimental, do NOT use it in production yet**
 
-**Note**: TOA format is currently at specification version 0.8 and not yet frozen. While the core features are stable,
+**Note**: TOA format is currently at specification version 0.10 and not yet frozen. While the core features are stable,
 the format may evolve before reaching version 1.0. The specification is thoroughly documented
 in [SPECIFICATION.md](SPECIFICATION.md).

SPECIFICATION.md

Lines changed: 48 additions & 30 deletions
@@ -1,6 +1,6 @@
 # TOA File Format Specification
 
-Version 0.9
+Version 0.10
 
 ## 1. Introduction

@@ -307,58 +307,75 @@ The compressed data MUST be an LZMA2s stream. LZMA2s is a simplified variant of
 
 ##### 4.1.3.1 LZMA2s Stream Format
 
-An LZMA2s stream consists of a sequence of chunks, each beginning with a control byte that determines the chunk type and
-size. The stream MUST end with a control byte value of 0x00.
+An LZMA2s stream MUST consist of a sequence of chunks, each beginning with a control byte that determines the chunk type
+and size. The stream MUST terminate with a control byte value of 0x00. Implementations MUST NOT process streams that
+lack proper termination. Each chunk MUST be processed sequentially, and implementations MUST reject malformed chunk
+sequences.
 
 ##### 4.1.3.2 Control Byte Encoding
 
-The control byte uses the following encoding:
+The control byte MUST use the following encoding schemes. Implementations MUST reject control bytes that do not conform
+to these patterns:
 
 **End of Stream**: `0x00`
 
-- Indicates end of the LZMA2s stream
-- No additional bytes follow.
+- MUST indicate the end of the LZMA2s stream
+- MUST NOT be followed by any additional bytes
+- Implementations MUST terminate processing upon encountering this byte
 
 **Uncompressed Chunk**: `001sssss` + 2 bytes
 
-- Bits 7-5: `001` identifies uncompressed chunk
-- Bits 4-0: High 5 bits of 21-bit size
-- Following 2 bytes: Middle and low bytes of size
-- Actual size = encoded_size + 1
+- Bits 7-5: MUST be `001` to identify an uncompressed chunk
+- Bits 4-0: MUST contain the high 5 bits of the 21-bit size
+- The following 2 bytes MUST contain the middle and low bytes of size in big-endian order
+- Implementations MUST calculate the actual size as encoded_size + 1
+- The chunk data following these headers MUST be exactly the calculated size in bytes
 
 **Compressed Chunk**: `010uuuuu` + 4 bytes
 
-- Bits 7-5: `010` identifies compressed chunk
-- Bits 4-0: High 5 bits of 21-bit uncompressed size
-- Following 2 bytes: Middle and low bytes of uncompressed size
-- Following 2 bytes: 16-bit compressed size
-- Actual sizes = encoded_size + 1
+- Bits 7-5: MUST be `010` to identify a compressed chunk
+- Bits 4-0: MUST contain the high 5 bits of the 21-bit uncompressed size
+- The following 2 bytes MUST contain the middle and low bytes of uncompressed size
+- The following 2 bytes MUST contain the 16-bit compressed size in big-endian order
+- Implementations MUST calculate actual sizes as encoded_size + 1
+- The compressed data MUST decompress to exactly the specified uncompressed size
 
 **Delta Compressed Chunk**: `011uuuuu` + 3 bytes
 
-- Bits 7-5: `011` identifies delta compressed chunk
-- Bits 4-0: High 5 bits of 21-bit uncompressed size
-- Following 2 bytes: Middle and low bytes of uncompressed size
-- Following 1 byte: Delta from 65536
-- Compressed size = 65536 - delta
+- Bits 7-5: MUST be `011` to identify a delta compressed chunk
+- Bits 4-0: MUST contain the high 5 bits of the 21-bit uncompressed size
+- The following 2 bytes MUST contain the middle and low bytes of uncompressed size
+- The following 1 byte MUST contain the delta from 65536
+- Implementations MUST calculate compressed size as 65536 - delta
+- The delta value MUST NOT exceed 65536
 
 **Delta Uncompressed Chunk**: `1sdddddd` + 1 byte
 
-- Bit 7: `1` identifies delta uncompressed
-- Bit 6: Sign (0=add to 65536, 1=subtract from 65536)
-- Bits 5-0: High 6 bits of 14-bit delta
-- Following 1 byte: Low 8 bits of delta
-- Size = 65536 ± delta (range: 49,152 to 81,920 bytes)
+- Bit 7: MUST be `1` to identify delta uncompressed chunk
+- Bit 6: MUST be 0 to add to 65536, or 1 to subtract from 65536
+- Bits 5-0: MUST contain the high 6 bits of the 14-bit delta
+- The following 1 byte MUST contain the low 8 bits of delta
+- Implementations MUST calculate size as 65536 ± delta
+- The resulting size MUST be within the range 49,152 to 81,920 bytes inclusive
+
+Decoders MUST reject any control byte patterns not defined above. Encoders MUST NOT generate undefined control byte
+patterns.
+
 
 ##### 4.1.3.3 State Management
 
-LZMA2s maintains simpler state than LZMA2:
+LZMA2s implementations MUST maintain the following state constraints:
 
-- Properties (lc, lp, pb) remain constant throughout the stream
-- Dictionary size remains constant throughout the stream
-- The LZMA decoder state MUST be reset only when transitioning from an uncompressed chunk to a compressed chunk
+- Properties (lc, lp, pb) MUST remain constant throughout the entire stream
+- Dictionary size MUST remain constant throughout the entire stream
+- The LZMA decoder state MUST be reset when and only when transitioning from an uncompressed chunk to a compressed chunk
+- The LZMA decoder state MUST NOT be reset at any other time
+- Implementations MUST maintain dictionary contents across all chunks within the stream, regardless of chunk type
+- The dictionary MUST only be reset at block boundaries, never within an LZMA2s stream
 
-This simplified state management reduces implementation complexity while maintaining compression efficiency.
+Encoders MUST ensure that chunk transitions respect these state management rules. Decoders MUST verify state consistency
+and MUST reject streams that violate these constraints.
 
 ### 4.2 BLAKE3 Tree Integration

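The control byte encodings added in section 4.1.3.2 of the hunk above can be read as a small header parser. The following is a minimal Rust sketch and is not taken from libtoa: `ChunkHeader`, `field21`, and `parse_header` are hypothetical names. It assumes that the middle byte always precedes the low byte, and that the `+ 1` bias applies only where the text states it (so the uncompressed size of a delta compressed chunk is taken as encoded).

```rust
/// Chunk headers described in section 4.1.3.2 (type and field names are illustrative).
#[derive(Debug, PartialEq, Eq)]
enum ChunkHeader {
    EndOfStream,
    Uncompressed { size: u32 },
    Compressed { uncompressed_size: u32, compressed_size: u32 },
    DeltaCompressed { uncompressed_size: u32, compressed_size: u32 },
    DeltaUncompressed { size: u32 },
}

/// Combines the low 5 control bits with the middle and low bytes into a 21-bit value.
fn field21(control: u8, mid: u8, low: u8) -> u32 {
    (u32::from(control & 0x1F) << 16) | (u32::from(mid) << 8) | u32::from(low)
}

/// Parses one chunk header and returns it together with its length in bytes.
/// Returns `None` for an undefined control byte or truncated input.
fn parse_header(input: &[u8]) -> Option<(ChunkHeader, usize)> {
    let (&control, rest) = input.split_first()?;
    match control {
        0x00 => Some((ChunkHeader::EndOfStream, 1)),
        // `000xxxxx` with any low bit set matches no defined pattern: reject.
        0x01..=0x1F => None,
        // `001sssss` + 2 bytes
        0x20..=0x3F => {
            let b = rest.get(..2)?;
            let size = field21(control, b[0], b[1]) + 1;
            Some((ChunkHeader::Uncompressed { size }, 3))
        }
        // `010uuuuu` + 4 bytes
        0x40..=0x5F => {
            let b = rest.get(..4)?;
            let uncompressed_size = field21(control, b[0], b[1]) + 1;
            let compressed_size = u32::from(u16::from_be_bytes([b[2], b[3]])) + 1;
            Some((ChunkHeader::Compressed { uncompressed_size, compressed_size }, 5))
        }
        // `011uuuuu` + 3 bytes
        0x60..=0x7F => {
            let b = rest.get(..3)?;
            // Assumption: no +1 bias, since the text states none for this chunk type.
            let uncompressed_size = field21(control, b[0], b[1]);
            let compressed_size = 65536 - u32::from(b[2]);
            Some((ChunkHeader::DeltaCompressed { uncompressed_size, compressed_size }, 4))
        }
        // `1sdddddd` + 1 byte
        0x80..=0xFF => {
            let low = *rest.first()?;
            let delta = (u32::from(control & 0x3F) << 8) | u32::from(low);
            // Bit 6 selects the sign: 0 adds to 65536, 1 subtracts from it.
            let size = if control & 0x40 == 0 { 65536 + delta } else { 65536 - delta };
            Some((ChunkHeader::DeltaUncompressed { size }, 2))
        }
    }
}
```

Returning `None` for both undefined patterns and truncated headers keeps the sketch short; a real decoder would probably want distinct error values for the two cases.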
@@ -1044,6 +1061,7 @@ Files using this format SHOULD use the extension `.toa`.
 
 ## Revision History
 
+- Version 0.10 (2025-08-27): Change wording of the LZMA2s normative section
 - Version 0.9 (2025-08-22): Changes in the ECC:
 - Switched polynomial for Reed-Solomon ECC from 0x11D (x^8 + x^4 + x^3 + x^2 + 1) to
 0x11B (x^8 + x^4 + x^3 + x + 1) for better CPU instruction set support (like x86's GFNI)

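The reset rule added to section 4.1.3.3 in the SPECIFICATION.md changes above (reset the LZMA decoder state when and only when an uncompressed chunk is followed by a compressed chunk) can be illustrated with a small tracker. This is a hedged Rust sketch, not libtoa code: `ChunkKind` and `ResetTracker` are hypothetical names. It assumes that the delta chunk variants count as their compressed or uncompressed base kind, and that the first chunk of a stream starts from fresh state and so needs no reset.

```rust
/// Coarse chunk classification (illustrative; delta variants are assumed to map
/// onto their compressed or uncompressed base kind).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ChunkKind {
    Uncompressed,
    Compressed,
}

/// Remembers the kind of the previous chunk within one LZMA2s stream.
#[derive(Default)]
struct ResetTracker {
    previous: Option<ChunkKind>,
}

impl ResetTracker {
    /// Returns `true` if the LZMA decoder state must be reset before decoding `next`.
    /// Dictionary contents are never touched here: they persist across all chunks.
    fn must_reset_state(&mut self, next: ChunkKind) -> bool {
        let reset = self.previous == Some(ChunkKind::Uncompressed)
            && next == ChunkKind::Compressed;
        self.previous = Some(next);
        reset
    }
}
```

A decoder would create one tracker per stream and call `must_reset_state` once per chunk header, before decoding the chunk body.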
libtoa/README.md

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ Library to compress & decompress TOA files (.toa).
 
 **TOA is still experimental, do NOT use it in production yet**
 
-**Note: The TOA format is currently in draft mode (v0.8) and not yet frozen. The specification may change in future
+**Note: The TOA format is currently in draft mode (v0.10) and not yet frozen. The specification may change in future
 versions.**
 
 ## Acknowledgement

libtoa/src/lib.rs

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@
 //!
 //! **TOA is still experimental, do NOT use it in production yet**
 //!
-//! **Note: The TOA format is currently in draft mode (v0.8) and not yet frozen. The specification
+//! **Note: The TOA format is currently in draft mode (v0.10) and not yet frozen. The specification
 //! may change in future versions.**
 //!
 //! ## Acknowledgement
