High-performance, zero-dependency Java implementation of Token-Oriented Object Notation (TOON) – JSON for LLM prompts at half the tokens

arun-prabhakar/toon4j

TOON4J

High-performance, zero-dependency TOON encoder and decoder for Java.

TOON (Token-Oriented Object Notation) is a compact, human-readable format designed for passing structured data to Large Language Models with significantly reduced token usage.

TOON's sweet spot is uniform arrays of objects – multiple fields per row, with the same structure across all items. It borrows YAML's indentation-based structure for nested objects and CSV's tabular format for uniform data rows, then optimizes both for token efficiency in LLM contexts. Think of it as a translation layer: use standard Java objects (POJOs, Maps, etc.) programmatically, and convert to TOON for efficient LLM input.

This library, TOON4J, delivers production-grade performance (about 5ms to encode 256KB) with zero external dependencies, which the project presents as the fastest and lightest Java TOON implementation available. It is a Java implementation of the original TOON specification.

Quick Start

Encoding

import im.arun.toon4j.Toon;
import java.util.*;

public class Example {
    public static void main(String[] args) {
        Map<String, Object> data = Map.of(
            "user", Map.of(
                "id", 123,
                "name", "Ada",
                "tags", List.of("reading", "gaming"),
                "active", true
            )
        );

        String toon = Toon.encode(data);
        System.out.println(toon);
    }
}

Output:

user:
  id: 123
  name: Ada
  tags[2]: reading,gaming
  active: true

Decoding

String toon = """
    user:
      id: 123
      name: Ada
      tags[2]: reading,gaming
      active: true
    """;

Object data = Toon.decode(toon);
// Returns: {user={id=123, name=Ada, tags=[reading, gaming], active=true}}

Installation

Maven

<dependency>
    <groupId>im.arun</groupId>
    <artifactId>toon4j</artifactId>
    <version>1.0.0</version>
</dependency>

Gradle

implementation 'im.arun:toon4j:1.0.0'

Examples

Looking for comprehensive examples? Check out the toon4j-example project with runnable examples:

  • EncoderExample.java - 9 encoding examples (primitives, objects, arrays, custom options, delimiters)
  • DecoderExample.java - 9 decoding examples (primitives, objects, arrays, error handling, round-trip)
  • AdvancedExample.java - 5 advanced examples (LLM optimization, token savings, large datasets)
  • PojoExample.java - 7 POJO examples (serialization, deserialization, nested objects, records, enums)

Run any example:

git clone https://github.com/arun-prabhakar/toon4j-example.git
cd toon4j-example
mvn clean compile
mvn exec:java -Dexec.mainClass="im.arun.toon4j.example.EncoderExample"

See the Example README for complete documentation.

Why TOON4J?

Performance + Efficiency Combined

TOON4J uniquely combines encoding efficiency with runtime performance:

| Aspect | Value | Benefit |
| --- | --- | --- |
| Token Usage | 30-60% fewer than JSON | 🎯 Direct cost savings on LLM APIs |
| Encoding Speed | 4.9ms (256KB), 9.4ms (1MB) | ⚡ Fast enough for real-time APIs |
| Dependencies | Zero | 📦 No dependency conflicts or bloat |
| Bundle Size | ~80KB | 🪶 Minimal footprint |
| Scalability | Linear performance scaling | 📈 Predictable for any data size |
| Decode Support | Full round-trip | 🔄 Parse TOON back to objects |

Key Advantages

🚀 Production-Ready Performance

  • Sub-5ms encoding for typical API payloads (256KB)
  • Sub-10ms for medium datasets (1MB)
  • Linear scaling ensures predictable performance at any scale

💰 Direct Cost Savings

  • A 30-60% token reduction lowers input-token costs by the same proportion, assuming pricing is per token
  • Example: a $1000/month bill falls to roughly $400-700/month, a saving of $300-600
  • ROI is immediate and measurable
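
The arithmetic behind the example can be sketched as follows. This is a minimal illustration and not part of TOON4J: `CostEstimate` and `projectedCents` are hypothetical names, and the sketch assumes the bill scales linearly with token count, which only holds for the share of spend that goes to structured input data.

```java
/** Hypothetical helper: projects an LLM bill after a given token reduction. */
final class CostEstimate {

    /**
     * @param monthlyCents     current monthly spend in cents
     * @param reductionPercent token reduction in whole percent (0-100)
     * @return projected monthly spend in cents, assuming cost is proportional to tokens
     */
    static long projectedCents(long monthlyCents, int reductionPercent) {
        if (reductionPercent < 0 || reductionPercent > 100) {
            throw new IllegalArgumentException("reductionPercent must be in 0..100");
        }
        // Integer arithmetic in cents avoids floating-point rounding surprises.
        return monthlyCents * (100 - reductionPercent) / 100;
    }

    public static void main(String[] args) {
        long current = 100_000; // $1000.00 expressed in cents
        for (int percent : new int[] {30, 60}) {
            long projected = projectedCents(current, percent);
            System.out.printf("%d%% fewer tokens: $%d/month (saves $%d)%n",
                    percent, projected / 100, (current - projected) / 100);
        }
    }
}
```

With the figures above, a 30% reduction gives $700/month and a 60% reduction gives $400/month.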

⚡ Zero-Dependency Architecture

  • No external libraries means no version conflicts
  • Smaller bundle size (80KB vs 2MB+ for alternatives)
  • Easier integration, faster builds, cleaner deployments

☁️ Ideal for Serverless & Edge

  • Fast Cold Starts: Zero dependencies and a tiny footprint (~80KB) ensure minimal startup latency.
  • Low Memory Overhead: Perfect for high-density, memory-constrained environments like AWS Lambda or Google Cloud Functions.
  • No Dependency Conflicts: Simplifies packaging and deployment, avoiding common serverless packaging issues.

🔄 Complete Functionality

  • Full encode/decode support (round-trip serialization)
  • Strict and lenient parsing modes
  • All TOON features supported (tabular arrays, custom delimiters, nested objects)

📊 Enterprise-Ready

  • Thread-safe concurrent caching
  • Optimized for high-throughput scenarios
  • Minimal memory overhead
  • Production-tested architecture

Key Features

  • 💸 Token-efficient: typically 30–60% fewer tokens than JSON
  • High performance: 4.9ms for 256KB, 9.4ms for 1MB, linear scaling with data size
  • 🤿 LLM-friendly guardrails: explicit lengths and field lists help models validate output
  • 🍱 Minimal syntax: removes redundant punctuation (braces, brackets, most quotes)
  • 📐 Indentation-based structure: replaces braces with whitespace for better readability
  • 🧺 Tabular arrays: declare keys once, then stream rows without repetition
  • 🤖 Automatic POJO serialization: works with any Java object out of the box (optimized reflection with cached accessors)
  • 🔄 Full decode support: parse TOON back into Java objects with strict/lenient modes
  • 🪶 Zero dependencies: no external libraries required - pure Java implementation
  • 📦 Lightweight: only ~80KB total
  • Comprehensive type support: Optional, Stream, primitive arrays, and all Java temporal types

Usage Examples

This section covers more advanced use cases beyond the Quick Start.

POJOs (Automatic Serialization)

TOON4J automatically serializes any Java object (POJO) using optimized reflection:

public class User {
    private int id;
    private String name;
    private boolean active;

    // constructors, getters...
}

// Automatic POJO serialization - no manual conversion needed!
User user = new User(123, "Ada", true);
String toon = Toon.encode(user);

Output:

id: 123
name: Ada
active: true

Tabular Arrays (Lists of POJOs)

public class Product {
    private String sku;
    private int quantity;
    private double price;
    // constructors, getters...
}

List<Product> products = List.of(
    new Product("A1", 2, 9.99),
    new Product("B2", 1, 14.50),
    new Product("C3", 5, 7.25)
);

String toon = Toon.encode(Map.of("products", products));

Output:

products[3]{sku,quantity,price}:
  A1,2,9.99
  B2,1,14.5
  C3,5,7.25
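
To make the tabular layout concrete, here is a small standalone sketch of how a header plus rows can be produced from uniform maps. It is not TOON4J's encoder: `TabularSketch`, `encode`, and `product` are hypothetical names, and the sketch skips quoting, escaping, and nested values.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

/** Illustrative only: not TOON4J's encoder. Handles uniform rows of simple values. */
final class TabularSketch {

    /** Renders rows as "key[N]{f1,f2}:" followed by one comma-separated line per row. */
    static String encode(String key, List<Map<String, Object>> rows) {
        if (rows.isEmpty()) {
            return key + "[0]:";
        }
        // Field names come from the first row; every row must share them.
        List<String> fields = List.copyOf(rows.get(0).keySet());
        StringBuilder out = new StringBuilder();
        out.append(key).append('[').append(rows.size()).append("]{")
           .append(String.join(",", fields)).append("}:");
        for (Map<String, Object> row : rows) {
            if (!row.keySet().equals(rows.get(0).keySet())) {
                throw new IllegalArgumentException("rows are not uniform: " + row.keySet());
            }
            StringJoiner cells = new StringJoiner(",");
            for (String field : fields) {
                cells.add(String.valueOf(row.get(field))); // no quoting or escaping here
            }
            out.append('\n').append("  ").append(cells);
        }
        return out.toString();
    }

    /** Builds one row; LinkedHashMap keeps the field order stable. */
    static Map<String, Object> product(String sku, int quantity, double price) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("sku", sku);
        row.put("quantity", quantity);
        row.put("price", price);
        return row;
    }

    public static void main(String[] args) {
        System.out.println(encode("products", List.of(
                product("A1", 2, 9.99),
                product("B2", 1, 14.50),
                product("C3", 5, 7.25))));
    }
}
```

The keys appear once in the header, so each additional row costs only its values and delimiters.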

Optional and Stream Support

import java.util.Optional;
import java.util.stream.Stream;

Map<String, Object> data = Map.of(
    "presentValue", Optional.of("Hello"),
    "emptyValue", Optional.empty(),
    "numbers", Stream.of(1, 2, 3, 4, 5)
);

Toon.encode(data);

Output:

presentValue: Hello
emptyValue: null
numbers[5]: 1,2,3,4,5
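
One way to picture this behavior is a normalization pass that runs before encoding. The sketch below is an assumption about the general approach, not TOON4J's internals; `NormalizeSketch` and `normalize` are hypothetical names.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Illustrative only: one way to flatten wrapper types before encoding. */
final class NormalizeSketch {

    /** Unwraps Optional to its value or null, and drains a Stream into a List. */
    static Object normalize(Object value) {
        if (value instanceof Optional<?>) {
            Optional<?> optional = (Optional<?>) value;
            return optional.isPresent() ? normalize(optional.get()) : null;
        }
        if (value instanceof Stream<?>) {
            // A Stream can be consumed only once, so it is materialized here.
            return ((Stream<?>) value)
                    .map(NormalizeSketch::normalize)
                    .collect(Collectors.toList());
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(normalize(Optional.of("Hello")));
        System.out.println(normalize(Optional.empty()));
        System.out.println(normalize(Stream.of(1, 2, 3, 4, 5)));
    }
}
```

Because a `Stream` is single-use, a map holding one can be encoded only once; collect to a `List` first if the data is needed again.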

Lenient Decoding

While strict decoding (the default) ensures data integrity, lenient mode can be useful for parsing potentially imperfect TOON data.

// Strict mode (default) - throws error if count mismatch
String toon = "items[3]: a,b";  // Declared 3, but only 2 values
// Toon.decode(toon);  // Throws IllegalArgumentException

// Lenient mode - accepts count mismatch
DecodeOptions options = DecodeOptions.lenient();
Map<?, ?> result = (Map<?, ?>) Toon.decode(toon, options);
List<?> items = (List<?>) result.get("items");
// items.size() == 2 (lenient mode accepts it)
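
The difference between the two modes comes down to whether the declared length is enforced. The standalone sketch below shows that check for a single inline array line. It is not TOON4J's parser: `LengthCheckSketch` and `parseInlineArray` are hypothetical names, and quoted values are not handled.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: handles one inline array line such as "items[3]: a,b". */
final class LengthCheckSketch {

    private static final Pattern INLINE_ARRAY =
            Pattern.compile("^(\\w+)\\[(\\d+)\\]:\\s*(.*)$");

    /** Splits the values and, in strict mode, rejects a declared/actual count mismatch. */
    static List<String> parseInlineArray(String line, boolean strict) {
        Matcher m = INLINE_ARRAY.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("not an inline array: " + line);
        }
        int declared = Integer.parseInt(m.group(2));
        String body = m.group(3);
        List<String> values = body.isEmpty() ? List.of() : List.of(body.split(",", -1));
        if (strict && values.size() != declared) {
            throw new IllegalArgumentException(
                    "declared " + declared + " values but found " + values.size());
        }
        return values;
    }

    public static void main(String[] args) {
        String line = "items[3]: a,b";
        System.out.println(parseInlineArray(line, false)); // lenient: mismatch tolerated
        try {
            parseInlineArray(line, true);
        } catch (IllegalArgumentException e) {
            System.out.println("strict: " + e.getMessage());
        }
    }
}
```

Strict mode is the safer default when the TOON comes from a model, since a mismatch usually means rows were dropped or invented.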

POJO Deserialization (Automatic Type Conversion)

TOON4J supports automatic deserialization to POJOs, which removes the need for manual type casting:

public class User {
    private int id;
    private String name;
    private String email;
    private boolean active;

    // Getters and setters...
}

String toon = """
    id: 123
    name: Ada
    email: [email protected]
    active: true
    """;

// Deserialize directly to POJO
User user = Toon.decode(toon, User.class);

System.out.println(user.getName());  // "Ada"
System.out.println(user.getId());    // 123

Supports:

  • JavaBeans (with setters)
  • Public fields
  • Java Records (Java 17+)
  • Nested POJOs
  • Collections with generics (List<Employee>, Set<String>)
  • Arrays (primitive and object arrays)
  • Enums
  • Numeric type conversions (Integer → Long, etc.)

Nested POJOs:

public class Address {
    private String street;
    private String city;
    // Getters/setters...
}

public class Employee {
    private int id;
    private String name;
    private Address address;
    // Getters/setters...
}

String toon = """
    id: 456
    name: Bob
    address:
      street: 123 Main St
      city: Boston
    """;

Employee employee = Toon.decode(toon, Employee.class);
System.out.println(employee.getAddress().getCity());  // "Boston"

Collections with Generic Types:

public class Department {
    private String name;
    private List<Employee> employees;  // Automatically converts List<Map> to List<Employee>
    // Getters/setters...
}

String toon = """
    name: Engineering
    employees[2]{id,name}:
      1,Alice
      2,Bob
    """;

Department dept = Toon.decode(toon, Department.class);
System.out.println(dept.getEmployees().get(0).getName());  // "Alice"

Java Records:

public record Person(String name, int age, String city) {}

String toon = """
    name: Charlie
    age: 30
    city: Seattle
    """;

Person person = Toon.decode(toon, Person.class);
System.out.println(person.name());  // "Charlie"
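
For readers curious how a decoded map can become a record, the sketch below binds values through the record's canonical constructor using standard reflection (Java 16+). It illustrates the general technique only and makes no claim about how TOON4J does it; `RecordBindingSketch`, `bind`, and `coerce` are hypothetical names.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.RecordComponent;
import java.util.Map;

/** Illustrative only: binds a decoded Map to a record via its canonical constructor. */
final class RecordBindingSketch {

    record Person(String name, int age, String city) {}

    static <T extends Record> T bind(Map<String, ?> values, Class<T> type) {
        RecordComponent[] components = type.getRecordComponents();
        Class<?>[] parameterTypes = new Class<?>[components.length];
        Object[] arguments = new Object[components.length];
        for (int i = 0; i < components.length; i++) {
            parameterTypes[i] = components[i].getType();
            arguments[i] = coerce(values.get(components[i].getName()), parameterTypes[i]);
        }
        try {
            // The canonical constructor takes the components in declaration order.
            Constructor<T> canonical = type.getDeclaredConstructor(parameterTypes);
            canonical.setAccessible(true);
            return canonical.newInstance(arguments);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot bind " + type.getName(), e);
        }
    }

    /** Converts decoded numbers to the component's declared numeric type. */
    private static Object coerce(Object value, Class<?> target) {
        if (value instanceof Number) {
            Number n = (Number) value;
            if (target == int.class || target == Integer.class) return n.intValue();
            if (target == long.class || target == Long.class) return n.longValue();
            if (target == double.class || target == Double.class) return n.doubleValue();
        }
        return value;
    }

    public static void main(String[] args) {
        Person person = bind(
                Map.of("name", "Charlie", "age", 30L, "city", "Seattle"), Person.class);
        System.out.println(person.name() + ", " + person.age() + ", " + person.city());
    }
}
```

A missing key for a primitive component makes the constructor call fail here; a real binder would need a policy for absent fields.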

Using TOON in LLM Prompts

TOON works best when you show the format instead of describing it. The structure is self-documenting – models parse it naturally once they see the pattern.

Sending TOON to LLMs (Input)

Wrap your encoded data in a fenced code block (label it ```toon for clarity). The indentation and headers are usually enough – models treat it like familiar YAML or CSV. The explicit length markers ([N]) and field headers (`{field1,field2}`) help the model track structure, especially for large tables.

Generating TOON from LLMs (Output)

For output, be more explicit. When you want the model to generate TOON:

  • Show the expected header (users[N]{id,name,role}:). The model fills rows instead of repeating keys, reducing generation errors.
  • State the rules: 2-space indent, no trailing spaces, [N] matches row count.

Here’s a prompt that works for both reading and generating:

Data is in TOON format (2-space indent, arrays show length and fields).

```toon
users[3]{id,name,role,lastLogin}:
  1,Alice,admin,2025-01-15T10:30:00Z
  2,Bob,user,2025-01-14T15:22:00Z
  3,Charlie,user,2025-01-13T09:45:00Z
```

Task: Return only users with role "user" as TOON. Use the same header. Set [N] to match the row count. Output only the code block.
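
A prompt like the one above can be assembled in code, and the model's reply can be sanity-checked against the `[N]` rule before decoding. The sketch below is standalone and hedged: `PromptSketch`, `buildPrompt`, and `declaredCountMatches` are hypothetical names, and the check assumes a single tabular block with no blank lines.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative only: wraps an encoded TOON payload in a prompt and checks a reply. */
final class PromptSketch {

    private static final Pattern HEADER =
            Pattern.compile("^\\w+\\[(\\d+)\\]\\{[^}]*\\}:$");

    static String buildPrompt(String toon, String task) {
        String fence = "`".repeat(3); // built at runtime to keep this example fence-safe
        return "Data is in TOON format (2-space indent, arrays show length and fields).\n\n"
                + fence + "toon\n"
                + toon.stripTrailing() + "\n"
                + fence + "\n\n"
                + "Task: " + task + "\n";
    }

    /** True when the header's declared length equals the number of rows below it. */
    static boolean declaredCountMatches(String toonTable) {
        String[] lines = toonTable.strip().split("\n");
        Matcher m = HEADER.matcher(lines[0].strip());
        return m.matches() && Integer.parseInt(m.group(1)) == lines.length - 1;
    }

    public static void main(String[] args) {
        String toon = String.join("\n",
                "users[3]{id,name,role,lastLogin}:",
                "  1,Alice,admin,2025-01-15T10:30:00Z",
                "  2,Bob,user,2025-01-14T15:22:00Z",
                "  3,Charlie,user,2025-01-13T09:45:00Z");
        System.out.println(buildPrompt(toon,
                "Return only users with role \"user\" as TOON. Use the same header. "
                + "Set [N] to match the row count. Output only the code block."));
        System.out.println(declaredCountMatches(toon));
    }
}
```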

Syntax Cheatsheet

// Object
{ id: 1, name: 'Ada' }          → id: 1
                                  name: Ada

// Nested object
{ user: { id: 1 } }             → user:
                                    id: 1

// Primitive array (inline)
{ tags: ['foo', 'bar'] }        → tags[2]: foo,bar

// Tabular array (uniform objects)
{ items: [                      → items[2]{id,qty}:
  { id: 1, qty: 5 },                1,5
  { id: 2, qty: 3 }                 2,3
]}

// Mixed / non-uniform (list)
{ items: [1, { a: 1 }, 'x'] }   → items[3]:
                                    - 1
                                    - a: 1
                                    - x

// Array of arrays
{ pairs: [[1, 2], [3, 4]] }     → pairs[2]:
                                    - [2]: 1,2
                                    - [2]: 3,4

// Root array
['x', 'y']                      → [2]: x,y

// Empty containers
{}                              → (empty output)
{ items: [] }                   → items[0]:

// Special quoting
{ note: 'hello, world' }        → note: "hello, world"
{ items: ['true', true] }       → items[2]: "true",true
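
The quoting rows above suggest a simple decision: quote a string when it would otherwise be misread. The sketch below covers the two cases shown (a value containing the delimiter, and a string that looks like a literal) plus a few further conditions that are assumptions on my part; SPEC.md is the authority on the full rule set. `QuotingSketch`, `needsQuotes`, and `encodeString` are hypothetical names.

```java
import java.util.regex.Pattern;

/** Illustrative only: a partial quoting rule, not the complete TOON specification. */
final class QuotingSketch {

    private static final Pattern NUMERIC =
            Pattern.compile("-?\\d+(\\.\\d+)?([eE][+-]?\\d+)?");

    static boolean needsQuotes(String s, char delimiter) {
        return s.isEmpty()
                || !s.equals(s.strip())                                   // assumed: edge whitespace
                || s.equals("true") || s.equals("false") || s.equals("null") // looks like a literal
                || NUMERIC.matcher(s).matches()                           // assumed: looks numeric
                || s.indexOf(delimiter) >= 0                              // contains the delimiter
                || s.indexOf(':') >= 0                                    // assumed: key separator
                || s.indexOf('"') >= 0
                || s.indexOf('\\') >= 0;
    }

    static String encodeString(String s, char delimiter) {
        if (!needsQuotes(s, delimiter)) {
            return s;
        }
        // Escape backslashes first so the escapes added for quotes are not doubled.
        return '"' + s.replace("\\", "\\\\").replace("\"", "\\\"") + '"';
    }

    public static void main(String[] args) {
        System.out.println("note: " + encodeString("hello, world", ','));
        System.out.println("items[2]: " + encodeString("true", ',') + "," + true);
        System.out.println("name: " + encodeString("Ada", ','));
    }
}
```

Quoting only when necessary is what keeps most values bare, which is where much of the token saving over JSON comes from.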

⚡ Performance

TOON4J v1.0.0 delivers production-grade performance optimized for real-world workloads:

Benchmark Results

Real-world data encoding performance (50 iterations, Java 17):

| Data Size | Average Time | Throughput | Use Case |
| --- | --- | --- | --- |
| 256KB | 4.9ms | 203 encodes/s | API responses, small datasets |
| 1MB | 9.4ms | 106 encodes/s | Medium datasets, batch processing |
| 5MB | 40.1ms | 25 encodes/s | Large datasets, data exports |

Performance characteristics scale linearly with data size, providing predictable behavior.
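
The benchmark harness itself is not shown here, so the following is only a rough sketch of how an average over a fixed number of iterations can be measured with `System.nanoTime`. `EncodeBenchmarkSketch` and `averageMillis` are hypothetical names, and the encoder in `main` is a stand-in string operation rather than a TOON4J call. For publishable numbers a harness such as JMH is the usual choice.

```java
import java.util.function.Supplier;

/** Illustrative only: a naive timing loop, not the project's benchmark harness. */
final class EncodeBenchmarkSketch {

    /** Returns the mean wall-clock time per call in milliseconds. */
    static double averageMillis(Supplier<String> encoder, int warmups, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        long sink = 0; // consume results so the JIT cannot discard the work
        for (int i = 0; i < warmups; i++) {
            sink += encoder.get().length();
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += encoder.get().length();
        }
        long elapsedNanos = System.nanoTime() - start;
        if (sink < 0) {
            throw new IllegalStateException("length sum overflowed");
        }
        return elapsedNanos / 1_000_000.0 / iterations;
    }

    public static void main(String[] args) {
        String payload = "x".repeat(256 * 1024); // stand-in for a 256KB input
        Supplier<String> standIn = () -> new StringBuilder(payload).reverse().toString();
        double avg = averageMillis(standIn, 10, 50);
        System.out.printf("average: %.3f ms, throughput: %.0f calls/s%n", avg, 1000.0 / avg);
    }
}
```

Throughput in the table corresponds to roughly 1000 divided by the average time in milliseconds.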

Key Performance Features

  • Optimized Reflection: Cached field/getter accessors eliminate repeated reflection overhead
  • Thread-Safe Caching: ConcurrentHashMap for lock-free accessor reuse across threads
  • Direct Conversion: No intermediate JSON serialization (POJO → Map directly)
  • Pre-sized Collections: Minimizes array reallocation during encoding
  • Pooled StringBuilder: ThreadLocal object pooling reduces GC pressure
  • Linear Scaling: Performance scales predictably with data size
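
Two of the techniques listed above, cached accessors and a pooled `StringBuilder`, can be sketched with the standard library alone. This is a simplified illustration under my own assumptions, not TOON4J's source: `AccessorCacheSketch`, `fieldsOf`, and `encodeFlat` are hypothetical names, and only flat objects with simple field values are handled.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative only: caches reflective field lookups and reuses a per-thread buffer. */
final class AccessorCacheSketch {

    private static final Map<Class<?>, List<Field>> FIELDS = new ConcurrentHashMap<>();
    private static final ThreadLocal<StringBuilder> BUFFER =
            ThreadLocal.withInitial(() -> new StringBuilder(256));

    /** Resolves instance fields once per class; later calls hit the cache. */
    static List<Field> fieldsOf(Class<?> type) {
        return FIELDS.computeIfAbsent(type, t -> {
            List<Field> fields = new ArrayList<>();
            for (Field field : t.getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) {
                    continue;
                }
                field.setAccessible(true);
                fields.add(field);
            }
            return List.copyOf(fields);
        });
    }

    /** Writes "name: value" lines for a flat object using the pooled buffer. */
    static String encodeFlat(Object bean) {
        StringBuilder sb = BUFFER.get();
        sb.setLength(0); // reset the reused buffer instead of allocating a new one
        try {
            for (Field field : fieldsOf(bean.getClass())) {
                if (sb.length() > 0) {
                    sb.append('\n');
                }
                sb.append(field.getName()).append(": ").append(field.get(bean));
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        return sb.toString();
    }

    static final class User {
        private final int id;
        private final String name;
        private final boolean active;

        User(int id, String name, boolean active) {
            this.id = id;
            this.name = name;
            this.active = active;
        }
    }

    public static void main(String[] args) {
        // Field order follows declaration order on common JVMs, though the
        // reflection API does not guarantee it.
        System.out.println(encodeFlat(new User(123, "Ada", true)));
    }
}
```

The cache means the cost of `getDeclaredFields` and `setAccessible` is paid once per class rather than once per object.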

API Reference

Encoding

// Encode with default options
String toon = Toon.encode(Object value);

// Encode with custom options
String toon = Toon.encode(Object value, EncodeOptions options);

EncodeOptions:

EncodeOptions options = EncodeOptions.builder()
    .indent(4)                          // 4-space indentation (default: 2)
    .delimiter(Delimiter.PIPE)          // Use pipe delimiter (default: COMMA)
    .lengthMarker(true)                 // Add # to array lengths (default: false)
    .build();

// Presets
EncodeOptions.compact()   // Minimal output
EncodeOptions.verbose()   // Maximum readability

Decoding

// Decode to Map/List/primitive
Object data = Toon.decode(String input);
Object data = Toon.decode(String input, DecodeOptions options);

// Decode to POJO (automatic deserialization)
User user = Toon.decode(String input, Class<User> targetClass);
User user = Toon.decode(String input, Class<User> targetClass, DecodeOptions options);

DecodeOptions:

DecodeOptions lenient = DecodeOptions.lenient();       // Lenient mode with 2-space indent
DecodeOptions lenient4 = DecodeOptions.lenient(4);     // Lenient mode with 4-space indent
DecodeOptions strict = DecodeOptions.strict();         // Strict mode (default)
DecodeOptions strict4 = DecodeOptions.strict(4);       // Strict mode with 4-space indent

Building from Source

git clone https://github.com/arun-prabhakar/toon4j.git
cd toon4j
mvn clean install

Running Tests

mvn test

Specification

For the complete TOON specification, see SPEC.md in the original repository.

Related Projects

When implementing TOON in other languages, please follow the official SPEC.md to ensure compatibility.

License

MIT License - see the original TOON repository for details.
