Count and truncate text based on tokens
Large language models such as GPT-3.5 and GPT-4 work in terms of tokens.
This tool can count tokens, using OpenAI's tiktoken library.
It can also truncate text to a specified number of tokens.
See llm, ttok and strip-tags—CLI tools for working with ChatGPT and other LLMs for more on this project.
Install this tool using pip:
pip install ttokOr using Homebrew:
brew install simonw/llm/ttokProvide text as arguments to this tool to count tokens:
ttok Hello world2
You can also pipe text into the tool:
echo -n "Hello world" | ttok2
Here the echo -n option prevents echo from adding a newline - without that you would get a token count of 3.
To pipe in text and then append extra tokens from arguments, use the -i - option:
echo -n "Hello world" | ttok more text -i -6
By default, the tokenizer model for GPT-3.5 and GPT-4 is used.
To use the model for GPT-2 and GPT-3, add --model gpt2:
ttok boo Hello there this is -m gpt26
Compared to GPT-3.5:
ttok boo Hello there this is5
Further model options are documented here.
Use the -t 10 or --truncate 10 option to truncate text to a specified number of tokens:
ttok This is too many tokens -t 3This is too
The --encode option can be used to view the integer token IDs for the incoming text:
ttok Hello world --encode9906 1917
The --decode method reverses this process:
ttok 9906 1917 --decodeHello world
Add --tokens to either of these options to see a detailed breakdown of the tokens:
ttok Hello world --encode --tokens[b'Hello', b' world']
This is the full list of available models and their corresponding encodings. Model names and encoding names are valid for the -m/--model option.
- gpt-4(- cl100k_base)
- gpt-3.5-turbo(- cl100k_base)
- gpt-3.5(- cl100k_base)
- gpt-35-turbo(- cl100k_base)
- davinci-002(- cl100k_base)
- babbage-002(- cl100k_base)
- text-embedding-ada-002(- cl100k_base)
- text-embedding-3-small(- cl100k_base)
- text-embedding-3-large(- cl100k_base)
- text-davinci-003(- p50k_base)
- text-davinci-002(- p50k_base)
- text-davinci-001(- r50k_base)
- text-curie-001(- r50k_base)
- text-babbage-001(- r50k_base)
- text-ada-001(- r50k_base)
- davinci(- r50k_base)
- curie(- r50k_base)
- babbage(- r50k_base)
- ada(- r50k_base)
- code-davinci-002(- p50k_base)
- code-davinci-001(- p50k_base)
- code-cushman-002(- p50k_base)
- code-cushman-001(- p50k_base)
- davinci-codex(- p50k_base)
- cushman-codex(- p50k_base)
- text-davinci-edit-001(- p50k_edit)
- code-davinci-edit-001(- p50k_edit)
- text-similarity-davinci-001(- r50k_base)
- text-similarity-curie-001(- r50k_base)
- text-similarity-babbage-001(- r50k_base)
- text-similarity-ada-001(- r50k_base)
- text-search-davinci-doc-001(- r50k_base)
- text-search-curie-doc-001(- r50k_base)
- text-search-babbage-doc-001(- r50k_base)
- text-search-ada-doc-001(- r50k_base)
- code-search-babbage-code-001(- r50k_base)
- code-search-ada-code-001(- r50k_base)
- gpt2(- gpt2)
- gpt-2(- gpt2)
Usage: ttok [OPTIONS] [PROMPT]...
  Count and truncate text based on tokens
  To count tokens for text passed as arguments:
      ttok one two three
  To count tokens from stdin:
      cat input.txt | ttok
  To truncate to 100 tokens:
      cat input.txt | ttok -t 100
  To truncate to 100 tokens using the gpt2 model:
      cat input.txt | ttok -t 100 -m gpt2
  To view token integers:
      cat input.txt | ttok --encode
  To convert tokens back to text:
      ttok 9906 1917 --decode
  To see the details of the tokens:
      ttok "hello world" --tokens
  Outputs:
      [b'hello', b' world']
Options:
  --version               Show the version and exit.
  -i, --input FILENAME
  -t, --truncate INTEGER  Truncate to this many tokens
  -m, --model TEXT        Which model to use
  --encode, --tokens      Output token integers
  --decode                Convert token integers to text
  --tokens                Output full tokens
  --allow-special         Do not error on special tokens
  --help                  Show this message and exit.
You can also run this command using:
python -m ttok --helpTo contribute to this tool, first checkout the code. Then create a new virtual environment:
cd ttok
python -m venv venv
source venv/bin/activateNow install the dependencies and test dependencies:
pip install -e '.[test]'To run the tests:
pytest