1. TOON: Token-Oriented Object Notation (toonformat.dev)

    1.

      JSON Data Model

      Encodes the same objects, arrays, and primitives as JSON with deterministic, lossless round-trips.

      (emphasis mine)

      It does not. JSON is hopeless.

      The syntax here is nice. But people really have to learn to stop thinking that a sensible JSON "Data Model" exists.

      Edited to add: Oh my god. It's for communication with LLMs! And here I was taking it seriously as potentially being able to discuss the meaning of a document. I retract my objections, go nuts, here I am discussing motes and completely overlooking the beam.

      1.

        At this point, the LLM stuff is becoming just spam.

        1.

          I like the counterexamples of problems with number round-tripping and duplicate keys.

          But that does not mean JSON is hopeless!

          Pretty much any serialization format is going to have problems round-tripping numbers from, say, Rust to JavaScript.

          The problem is not the serialization format -- it's the fact that Rust and JavaScript don't agree on what numbers are. Python and JavaScript don't agree either. Rust and Python don't agree, etc.
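
          To make that concrete, here is a minimal Python sketch (the 2**53 boundary is just the IEEE-754 double limit that JavaScript's Number inherits):

          ```python
          import json

          # 2**53 + 1 fits exactly in a Python int (or a Rust u64), but not in
          # a JavaScript Number, an IEEE-754 double whose exact-integer range
          # ends at 2**53 (Number.MAX_SAFE_INTEGER is 2**53 - 1).
          n = 2**53 + 1
          text = json.dumps(n)           # '9007199254740993'
          assert json.loads(text) == n   # Python round-trips it losslessly

          # A JavaScript consumer parsing the same bytes silently loses the +1:
          #   JSON.parse('9007199254740993')  // => 9007199254740992
          ```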

          (the duplicate key issue is about library disagreement, not language disagreement with respect to numbers)

          1.

            The data language (which is what JSON is, a thing-in-itself, not a serialization format for some other language) can specify what numbers (and strings, etc.) are, abstractly, independent of implementation or host language.

            A decent data language definition should come with a self-contained equivalence relation over values.

            JSON doesn't. Many do. Then it's on the language binding to faithfully represent that -- or not! It's fine not to; but when it's crystal clear what the meaning of a piece of syntax is, in terms of the language itself rather than a particular implementation, it gives the library author an obvious opportunity to explain how and why their implementation may not cover the whole range of values in the language. With JSON, there's no such obvious opportunity: almost anything you can imagine is "correct".

            For example, imagine that JSON chose to specify comparison of strings as byte-wise comparison of their UTF-8 encodings. Then a library using UCS-2 internally could and would mention the areas where that difference could cause trouble. As it stands, JSON libraries often don't even bother to mention how strings are compared; it's just up to the underlying language/library. The same is true of JSON numbers.
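
            A sketch of how the two orders can disagree (Python here; UTF-16 code-unit order is effectively what a UCS-2/UTF-16 library compares if it doesn't decode to code points first):

            ```python
            # U+FFFD vs U+1F600: UTF-8 byte order and UTF-16 code-unit order
            # disagree, because U+1F600 is stored as the surrogate pair
            # D83D DE00, and surrogates sort below U+FFFD.
            a, b = "\ufffd", "\U0001f600"

            print(a.encode("utf-8") < b.encode("utf-8"))          # True:  a first
            print(a.encode("utf-16-be") < b.encode("utf-16-be"))  # False: b first
            ```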

            (The duplicate-key issue is also a language issue: JSON permits duplicate keys to be significant; implementations that don't support that therefore only implement a subset of JSON. This is also fine! Implementors just have to specify the restrictions of their implementation. And, coming back to OP, implementors also have to refrain from making silly claims about full-fidelity round-tripping.)
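
            The subsetting is easy to see in, for example, Python's stdlib (a small sketch):

            ```python
            import json

            text = '{"k": 1, "k": 2}'

            # The default binding implements a subset of JSON: the last key wins.
            print(json.loads(text))                          # {'k': 2}

            # The full information is recoverable if you ask for it.
            print(json.loads(text, object_pairs_hook=list))  # [('k', 1), ('k', 2)]
            ```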

            1.

              A decent data language definition should come with a self-contained equivalence relation over values.

              JSON doesn't. Many do.

              I don't agree -- it's the same problem that I pointed out. Just as languages already have in-memory/internal representations of numbers and strings, they already have == operators and functions like strcmp() and memcmp().

              Sometimes those are equivalence relations, and sometimes they're not. (And all serialization formats need to support floats, where equality is questionable.)
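
              Concretely, with floats (Python, but the same holds in JS or Rust):

              ```python
              nan = float("nan")

              # IEEE-754 equality is not an equivalence relation:
              print(nan == nan)    # False -- reflexivity fails on NaN

              # ...and it identifies values with different bit patterns:
              print(0.0 == -0.0)   # True
              ```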

              You can define those in a library, but not everyone wants to use your library.

              In theory, it would be nice if languages behaved more similarly ... but I see the trend going in the opposite direction: there's more and more heterogeneity across languages.

              Empirically, people want to exchange data, and coupling that to changes in language semantics means the areas of application are more limited ... i.e. JSON solves problems. Not perfectly, but it does solve them (and better than, say, XML, and in many cases protobuf).

              1.

                Or maybe to put it another way, I think you can probably define some new specs:

                • JSON + equivalence relations over values
                • XML + equivalence relations over values
                • Protobuf + equivalence relations over values

                But I don't see that anyone is clamoring for that ... if they are a JS programmer, they want to use the values and equality in JS.

                If they are a Rust programmer, they want to use the values and equality in Rust, etc.
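
                (For what it's worth, such a binding could be tiny. A hypothetical Python sketch that pins numeric equality to the abstract value via Decimal, rather than to host floats:)

                ```python
                import json
                from decimal import Decimal

                # Hypothetical "JSON + equivalence over numbers" binding: parse
                # every number as Decimal, so equality follows the abstract
                # numeric value instead of whatever the host float type does.
                def loads_eq(text):
                    return json.loads(text, parse_float=Decimal, parse_int=Decimal)

                print(loads_eq("1.0") == loads_eq("1.00"))  # True: same abstract number
                print(loads_eq("0.1") == Decimal("0.1"))    # True: no binary rounding
                ```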

        2.

          This made the rounds on LinkedIn, and pretty much everyone who defended it was mocked; it was mostly juniors spamming it. It has been bashed into the ground. It has all the problems you would want it to have if you wanted it to suck.

          1.

            welcome back, CSV

            1.

              For tabular data it's basically CSV with a declared length; for object data it's YAML. So it's a mix of those, rather than JSON.
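
              Side by side (the TOON form is approximate, going by the examples on toonformat.dev):

              ```python
              # The same two records in the three shapes named above.
              json_form = '[{"id":1,"name":"Alice"},{"id":2,"name":"Bob"}]'

              toon_form = (
                  "users[2]{id,name}:\n"  # declared length + field header
                  "  1,Alice\n"           # CSV-style rows, YAML-style indentation
                  "  2,Bob\n"
              )

              csv_form = "id,name\n1,Alice\n2,Bob\n"
              ```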

            2.

              It does not really do what it's intended to do. If you want token efficiency, all the gains come from the data being tabular, which you can do in JSON too -- and then you're even more token-efficient. I shared their canonical example with tokens colorized and counted in different formats a while back: https://nitter.net/mitsuhiko/status/1990549149629222938
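
              A quick sketch of that point: the same rows as plain JSON vs. header-once tabular JSON (character counts as a rough stand-in for tokens):

              ```python
              import json

              rows = [{"id": i, "name": f"user{i}"} for i in range(100)]

              # Plain JSON repeats every key in every row.
              verbose = json.dumps(rows, separators=(",", ":"))

              # Tabular JSON: header once, then bare value rows -- the same
              # trick TOON uses, without leaving JSON.
              tabular = json.dumps(
                  {"fields": ["id", "name"],
                   "rows": [[r["id"], r["name"]] for r in rows]},
                  separators=(",", ":"))

              print(len(verbose), len(tabular))  # tabular is far shorter
              ```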

              1.

                comma separated values? no! it's values separated with commas!

                1.

                  Maybe I'm yelling at the cloud already, but after the joy of the XML -> JSON and XML -> YAML transitions, IMHO only a nice, portable, tooling-rich binary format can beat the current state of things. Kinda like HTTP/3 or gRPC, but for bigger blobs of data that have yet to grow into SQLite.