What Are Tokens?
When you type a message to ChatGPT, it doesn't read it the same way you do. It breaks your text into small pieces called tokens.
Token Definition: The basic unit of text that AI models process. A token can be a word, part of a word, or even a punctuation mark. AI models don't read text character-by-character like humans—they break it into tokens first, then process those tokens to understand and generate language.
Tokens Are Like Puzzle Pieces
Think of tokens as puzzle pieces:
- A token can be a whole word: "cat" = 1 token
- Or part of a word: "running" = "run" + "ning" = 2 tokens
- Or a punctuation mark: "!" = 1 token
- Even spaces count!
Rule of Thumb:
- 1 token ≈ 4 characters in English
- 100 tokens ≈ 75 words
- 1,000 tokens ≈ 750 words (about 1 page)
So a typical ChatGPT response of 200 words is about 267 tokens!
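These rules of thumb are easy to turn into a tiny estimator. Note that these are just the rough ratios above, not real tokenizer output:

```python
def estimate_tokens_from_chars(text: str) -> int:
    """Rough estimate using the ~4 characters-per-token rule."""
    return max(1, round(len(text) / 4))

def estimate_tokens_from_words(word_count: int) -> int:
    """Rough estimate using the ~75 words-per-100-tokens rule."""
    return round(word_count * 100 / 75)

print(estimate_tokens_from_words(200))  # ≈267 tokens for a 200-word reply
print(estimate_tokens_from_words(750))  # ≈1,000 tokens, about one page
```

For an exact count you'd need the model's actual tokenizer, but estimates like these are usually close enough for budgeting.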
Why Do Tokens Matter?
1. Cost: AI companies charge per token
   - Input tokens (what you send)
   - Output tokens (what the AI generates)
2. Limits: There's a maximum number of tokens the AI can handle at once
3. Understanding: How text is tokenized affects how the AI understands it
Let's See Tokens in Action
Here's a simple example showing how text becomes tokens:
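A toy tokenizer in Python illustrates the idea. The suffix list here is invented purely for demonstration; real tokenizers learn their vocabulary from data rather than using hand-picked rules:

```python
import re

def simple_tokenize(text: str) -> list[str]:
    """Toy tokenizer: split into words and punctuation, then split off
    a few common suffixes to mimic subword tokenization."""
    tokens = []
    for piece in re.findall(r"\w+|[^\w\s]", text):
        # Hypothetical suffix list, for illustration only
        for suffix in ("ning", "ing", "ed"):
            if len(piece) > len(suffix) + 2 and piece.endswith(suffix):
                tokens.extend([piece[:-len(suffix)], suffix])
                break
        else:
            tokens.append(piece)
    return tokens

print(simple_tokenize("The cat is running!"))
# ['The', 'cat', 'is', 'run', 'ning', '!']
```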
Note: This is a simplified example! Real tokenization (like GPT's BPE tokenization) is more sophisticated and considers subword patterns, common phrases, and statistical frequency. But the concept is the same: breaking text into pieces.
Tokenization Definition: The process of breaking text into smaller pieces (tokens) that an AI model can understand. Different models use different tokenization methods, but all convert human-readable text into tokens before processing, similar to how you might break a sentence into individual words or syllables.
What is a Context Window?
The context window is the maximum amount of text (measured in tokens) that an AI model can "see" and "remember" at one time.
Context Window Definition: The maximum number of tokens an AI model can process at once, including your conversation history, prompts, and responses. Think of it as the AI's "working memory"—everything within this window is visible to the model, but anything outside it is forgotten.
Think of it like short-term memory:
- You can remember the last few things someone said in a conversation
- But you can't remember every word from a 3-hour conversation
- AI is similar - it has a "memory limit"
Context Window Sizes
Different models have different context windows:
Context Window Comparison (Major LLMs)
| Model | Context Window | Approx. Capacity |
|---|---|---|
| GPT-4 (original) | 8k tokens | ~6,000 words (a few pages) |
| GPT-4 Turbo | 128k tokens | ~96,000 words (a short book) |
| Claude 3 | 200k tokens | ~150,000 words (a full novel) |
| Gemini 1.5 Pro | 1M tokens | ~750,000 words (several novels) |
What Counts Toward the Context Window?
Everything! The context window includes:
- ✅ Your entire conversation history
- ✅ System instructions (hidden prompts)
- ✅ Your current prompt
- ✅ The AI's responses
- ✅ Any documents you upload (if supported)
Pro Tip: If you have a very long conversation with ChatGPT, it might start "forgetting" things you said at the beginning. That's because older messages get pushed out of the context window!
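Since everything shares one window, you can sketch the budget as simple addition. The token counts below are illustrative numbers, not measurements from a real API:

```python
def context_usage(system: int, history: list[int], prompt: int,
                  reserved_for_reply: int, limit: int) -> tuple[int, int]:
    """Everything shares one window: system instructions, conversation
    history, your current prompt, and room for the model's reply."""
    used = system + sum(history) + prompt
    remaining = limit - used - reserved_for_reply
    return used, remaining

used, remaining = context_usage(
    system=400, history=[1500, 2500], prompt=100,
    reserved_for_reply=1000, limit=8000)
print(used, remaining)  # 4500 tokens used, 2500 to spare
```

When `remaining` goes negative, something has to give, and it's usually the oldest messages.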
How Context Window Affects You
Scenario 1: Long Documents
Question: "Can I upload a 200-page PDF to ChatGPT?"
Answer: Depends on the model!
- GPT-4 (8k): ❌ Won't fit
- GPT-4 Turbo: ✅ Might fit
- Gemini 1.5 Pro: ✅ Easily fits
Scenario 2: Long Conversations
Imagine you're having a coding help session:
- Message 1-10: Discussing project requirements (1,500 tokens)
- Message 11-20: Writing code together (2,500 tokens)
- Message 21-30: Debugging (2,000 tokens)
- Total: 6,000 tokens
With a 4k context model, the AI would start "forgetting" your early requirements!
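A minimal sketch of that "forgetting" behavior, using the scenario's token counts (real systems trim context in more sophisticated ways, but oldest-first truncation is the basic idea):

```python
from collections import deque

def trim_to_window(history: list[tuple[str, int]],
                   max_tokens: int) -> list[tuple[str, int]]:
    """Drop the oldest chunks until the conversation fits the window."""
    window = deque(history)
    while sum(tokens for _, tokens in window) > max_tokens:
        window.popleft()  # oldest content is "forgotten" first
    return list(window)

history = [("requirements", 1500), ("code", 2500), ("debugging", 2000)]
print(trim_to_window(history, 4000))
# [('debugging', 2000)] — not just the requirements, the code discussion too!
```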
Scenario 3: Summarizing Content
Want to summarize a long article?
- Article: 5,000 tokens
- Your prompt: 100 tokens
- AI response: 300 tokens
- Total needed: 5,400 tokens
You need a model with at least an 8k context window!
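The check is just addition against the window size:

```python
def fits(window_limit: int, *token_counts: int) -> bool:
    """Check whether all the pieces fit in one context window together."""
    return sum(token_counts) <= window_limit

article, prompt, reply = 5000, 100, 300
print(fits(4096, article, prompt, reply))  # False: 5,400 > 4k
print(fits(8192, article, prompt, reply))  # True: 5,400 fits in 8k
```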
Why Context Windows Have Limits
You might wonder: "Why not make the context window infinite?"
Technical Reasons:
1. Computational Cost: Longer context means dramatically more computation
   - Attention scales quadratically (O(n²))
   - 2x context = 4x computation
   - 10x context = 100x computation!
2. Memory Requirements: Keeping all that context in memory is expensive
3. Quality Degradation: Models can "lose track" in very long contexts
The Math: If processing 1,000 tokens takes 1 second, processing 10,000 tokens might take 100 seconds (not 10), because attention mechanisms look at every token pair!
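That quadratic scaling is easy to see in code:

```python
def relative_attention_cost(tokens: int, base_tokens: int = 1_000) -> float:
    """Self-attention compares every token pair, so cost grows with n²
    relative to a baseline context length."""
    return (tokens / base_tokens) ** 2

for n in (1_000, 2_000, 10_000):
    print(f"{n:>6} tokens -> {relative_attention_cost(n):.0f}x base cost")
# 1,000 -> 1x, 2,000 -> 4x, 10,000 -> 100x
```

This is why doubling a model's context window is far more than twice as expensive to serve.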
Practical Tips
1. Start Fresh for New Topics
If switching topics, start a new conversation to avoid wasting context on irrelevant history.
2. Summarize Long Conversations
Ask the AI to summarize key points, then start fresh with the summary.
3. Choose the Right Model
- Short queries: Smaller context is fine (and cheaper!)
- Long documents: Use models with larger context windows
- Long conversations: Consider GPT-4 Turbo or Claude 3
4. Be Concise
More tokens = more cost and less room for other content.
Key Takeaways
🔑 Tokens = Small pieces of text (≈4 characters each)
- Models process text as tokens, not characters
- Everything you send/receive counts as tokens
- Tokens determine cost and limits
🔑 Context Window = Maximum tokens the model can handle
- Includes ALL conversation history
- Different models have different limits (4k to 1M tokens)
- Bigger context = more capability but higher cost
🔑 Practical Impact:
- Long conversations may cause the AI to "forget" early messages
- Choose models based on your context needs
- Start fresh conversations for new topics
What's Next?
Now you understand the building blocks (tokens) and memory limits (context windows) of AI. In the next lesson, we'll dive into Prompt Engineering - the art of communicating effectively with AI to get the best results!
Fun Exercise: Try pasting one of your essays or articles into ChatGPT and ask it how many tokens it is. You'll start to develop an intuition for token counts!