How models see text: tokens & context
Tokens are the units a model reads and the units providers bill by. Here is how they work and why context windows matter.
Prerequisites
- What is an LLM?
You will learn
- Explain what a token is and how it differs from a word or character
- Estimate token counts for a piece of text
- Reason about context window limits and cost before you hit them
A model does not read letters or words the way you do. It reads tokens. Understanding tokens clears up three things that often confuse beginners: why you are charged the amounts you are, why long documents get cut off, and why the model sometimes mangles spelling or miscounts letters.
Overview
A token is a small chunk of text — often a word, sometimes part of a word, sometimes punctuation. Before any model sees your prompt, a tokenizer splits the text into these chunks and maps each to a number. The model only ever works with those numbers. Output is generated one token at a time and converted back to text for you.
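To make the split-and-map step concrete, here is a toy sketch. It assumes a tiny hand-written vocabulary (`VOCAB` is hypothetical) and uses simple greedy longest-match splitting, which is not the byte-pair encoding that production tokenizers typically use. It only illustrates the round trip from text to ids and back.

```python
# Toy vocabulary: hypothetical, for illustration only. Real tokenizers
# learn tens of thousands of entries from data.
VOCAB = ["token", "izer", "s", " ", "read", "ing", "!"]
TOKEN_TO_ID = {piece: i for i, piece in enumerate(VOCAB)}
ID_TO_TOKEN = {i: piece for piece, i in TOKEN_TO_ID.items()}


def encode(text: str) -> list[int]:
    """Greedy longest-match split of text into known pieces, returned as ids."""
    ids = []
    pos = 0
    while pos < len(text):
        # Try the longest vocabulary piece that matches at this position.
        for piece in sorted(VOCAB, key=len, reverse=True):
            if text.startswith(piece, pos):
                ids.append(TOKEN_TO_ID[piece])
                pos += len(piece)
                break
        else:
            raise ValueError(f"no vocabulary piece matches at position {pos}")
    return ids


def decode(ids: list[int]) -> str:
    """Map ids back to their text pieces and join them."""
    return "".join(ID_TO_TOKEN[i] for i in ids)


ids = encode("tokenizers reading tokens!")
print(ids)          # [0, 1, 2, 3, 4, 5, 3, 0, 2, 6]
print(decode(ids))  # tokenizers reading tokens!
```

Note that "tokenizers" becomes three ids and each space is its own id: the model would only ever see the list of numbers.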
Two practical consequences follow. First, both your input and the model's output are measured and billed in tokens. Second, every model has a maximum number of tokens it can hold at once — the context window — and that budget covers the prompt and the reply together.
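Because billing is per token, a cost estimate is simple arithmetic. The sketch below assumes input and output tokens are priced separately per million tokens, which is a common scheme but not something this lesson specifies. The function name and the prices are hypothetical, so check your provider's current rates.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  input_price_per_million: float,
                  output_price_per_million: float) -> float:
    """Return the cost of one request in the same currency as the prices."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


# Hypothetical prices, NOT any provider's real rates.
cost = estimate_cost(input_tokens=2_000, output_tokens=500,
                     input_price_per_million=3.0,
                     output_price_per_million=15.0)
print(f"{cost:.4f}")  # 0.0135
```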
Key ideas
What counts as a token
A rough rule for English: one token is about four characters, and 100 tokens is about 75 words. Common words are usually a single token; rare or compound words split into several. Whitespace and punctuation count too. Telugu and other non-Latin scripts typically use more tokens per character than English, so the same sentence costs more to process.
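The rule of thumb above can be turned into a quick estimator. This is a rough sketch for English text only. The function names are made up for illustration, and the result is an approximation, not a real token count.

```python
def estimate_tokens_from_chars(text: str) -> int:
    """Rough English estimate: about 4 characters per token."""
    return round(len(text) / 4)


def estimate_tokens_from_words(text: str) -> int:
    """Rough English estimate: about 100 tokens per 75 words."""
    return round(len(text.split()) * 100 / 75)


sample = "Tokens are the units a model reads and bills on."
print(estimate_tokens_from_chars(sample))  # 12 (48 characters / 4)
print(estimate_tokens_from_words(sample))  # 13 (10 words * 100 / 75)
```

The two estimates rarely agree exactly, which is a reminder that they are ballpark figures. For Telugu or other non-Latin text, expect the real count to be higher than either.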
Counting tokens in code
Do not guess when it matters. Count. For Anthropic models you can use the token counting endpoint; for OpenAI models the tiktoken library does it locally.
import tiktoken
# cl100k_base is one of OpenAI's encodings; use the one that matches your model
encoder = tiktoken.get_encoding("cl100k_base")
text = "AI with Sandy teaches AI in Telugu."
tokens = encoder.encode(text)  # list of integer token ids
print(len(tokens))  # number of tokens
print(tokens[:5])  # the first five integer ids the model actually sees
The context window is a shared budget
If a model has a 200k-token context window, that ceiling is split between everything you send (system prompt, conversation history, pasted documents) and everything it generates. Fill the window with input and you leave no room for the answer. When you exceed the window, the request fails or older messages get dropped — which is why a long chat can seem to "forget" the start.
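The shared budget can be checked before sending a request. The sketch below assumes you already have a token count per message and that dropping the oldest messages first is acceptable. Both helper names are hypothetical, and real applications may trim or summarize history differently.

```python
def fits_in_window(input_tokens: int, max_output_tokens: int,
                   context_window: int) -> bool:
    """True if the input plus the space reserved for the reply fits."""
    return input_tokens + max_output_tokens <= context_window


def trim_history(message_tokens: list[int], max_output_tokens: int,
                 context_window: int) -> list[int]:
    """Drop oldest messages until the rest plus the reply budget fits.

    message_tokens holds one token count per message, oldest first.
    """
    kept = list(message_tokens)
    while kept and not fits_in_window(sum(kept), max_output_tokens,
                                      context_window):
        kept.pop(0)  # discard the oldest message first
    return kept


history = [120_000, 60_000, 30_000]  # token counts, oldest first
print(fits_in_window(sum(history), 4_000, 200_000))  # False
print(trim_history(history, 4_000, 200_000))         # [60000, 30000]
```

Here the oldest message is dropped to make room, which is the "forgetting" effect described above.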
Why spelling and counting trip it up
Because the model sees tokens, not letters, questions like "how many r's are in strawberry" are genuinely hard for it: the word arrives as just a few multi-letter tokens, and the individual letters are not separately visible. This is a tokenization artefact, not a sign the model is broken. Offload that kind of task to code.
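Offloading to code can be as small as one string method. This minimal sketch (the helper name is made up for illustration) counts letters directly on the characters, so tokenization never enters the picture.

```python
def count_letter(word: str, letter: str) -> int:
    """Count case-insensitive occurrences of a single letter in a word."""
    return word.lower().count(letter.lower())


print(count_letter("strawberry", "r"))  # 3
```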
Quick recap
- Models read tokens — sub-word chunks mapped to numbers — not words or letters.
- Rough English estimate: ~4 characters per token, ~75 words per 100 tokens.
- The context window is one budget shared by input and output; exceed it and requests fail or truncate.
- Non-Latin scripts like Telugu cost more tokens per character.
- Letter-counting and spelling tasks are hard for token-based models; hand them to code.