What Are Tokens?
When you type a message to ChatGPT, it doesn't read it the same way you do. It breaks your text into small pieces called tokens.
Token Definition: The basic unit of text that AI models process. A token can be a word, part of a word, or even a punctuation mark. AI models don't read text character-by-character like humans—they break it into tokens first, then process those tokens to understand and generate language.
Tokens Are Like Puzzle Pieces
Think of tokens as puzzle pieces:
- A token can be a whole word: "cat" = 1 token
- Or part of a word: "running" = "run" + "ning" = 2 tokens
- Or a punctuation mark: "!" = 1 token
- Even spaces count!
Rule of Thumb:
- 1 token ≈ 4 characters in English
- 100 tokens ≈ 75 words
- 1,000 tokens ≈ 750 words (about 1 page)
So a typical ChatGPT response of 200 words is about 267 tokens!
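These rules of thumb are easy to turn into a tiny estimator. Note that these are just the rough ratios above, not real tokenizer output:

```python
def estimate_tokens_from_chars(text: str) -> int:
    """Rough estimate using the ~4 characters-per-token rule."""
    return max(1, round(len(text) / 4))

def estimate_tokens_from_words(word_count: int) -> int:
    """Rough estimate using the ~75 words-per-100-tokens rule."""
    return round(word_count * 100 / 75)

print(estimate_tokens_from_words(200))  # ≈267 tokens for a 200-word reply
print(estimate_tokens_from_words(750))  # ≈1,000 tokens, about one page
```

For an exact count you'd need the model's actual tokenizer, but estimates like these are usually close enough for budgeting.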
Why Do Tokens Matter?
1. Cost: AI companies charge per token
   - Input tokens (what you send)
   - Output tokens (what the AI generates)
2. Limits: There's a maximum number of tokens the AI can handle at once
3. Understanding: How text is tokenized affects how the AI understands it
Let's See Tokens in Action
Here's a simple example showing how text becomes tokens:
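A toy tokenizer in Python illustrates the idea. The suffix list here is invented purely for demonstration; real tokenizers learn their vocabulary from data rather than using hand-picked rules:

```python
import re

def simple_tokenize(text: str) -> list[str]:
    """Toy tokenizer: split into words and punctuation, then split off
    a few common suffixes to mimic subword tokenization."""
    tokens = []
    for piece in re.findall(r"\w+|[^\w\s]", text):
        # Hypothetical suffix list, for illustration only
        for suffix in ("ning", "ing", "ed"):
            if len(piece) > len(suffix) + 2 and piece.endswith(suffix):
                tokens.extend([piece[:-len(suffix)], suffix])
                break
        else:
            tokens.append(piece)
    return tokens

print(simple_tokenize("The cat is running!"))
# ['The', 'cat', 'is', 'run', 'ning', '!']
```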
Note: This is a simplified example! Real tokenization (like GPT's BPE tokenization) is more sophisticated and considers subword patterns, common phrases, and statistical frequency. But the concept is the same: breaking text into pieces.
Tokenization Definition: The process of breaking text into smaller pieces (tokens) that an AI model can understand. Different models use different tokenization methods, but all convert human-readable text into tokens before processing, similar to how you might break a sentence into individual words or syllables.
What is a Context Window?
The context window is the maximum amount of text (measured in tokens) that an AI model can "see" and "remember" at one time.
Context Window Definition: The maximum number of tokens an AI model can process at once, including your conversation history, prompts, and responses. Think of it as the AI's "working memory"—everything within this window is visible to the model, but anything outside it is forgotten.
Think of it like short-term memory:
- You can remember the last few things someone said in a conversation
- But you can't remember every word from a 3-hour conversation
- AI is similar - it has a "memory limit"
Context Window Sizes
Different models have different context windows:
Context Window Comparison (Major LLMs)
| Model | Context Window | Approx. Capacity |
|---|---|---|
| GPT-4 (original) | 8k tokens | ~6,000 words (a few pages) |
| GPT-4 Turbo | 128k tokens | ~96,000 words (a short book) |
| Claude 3 | 200k tokens | ~150,000 words (a full novel) |
| Gemini 1.5 Pro | 1M tokens | ~750,000 words (several novels) |
What Counts Toward the Context Window?
Everything! The context window includes:
- ✅ Your entire conversation history
- ✅ System instructions (hidden prompts)
- ✅ Your current prompt
- ✅ The AI's responses
- ✅ Any documents you upload (if supported)
Pro Tip: If you have a very long conversation with ChatGPT, it might start "forgetting" things you said at the beginning. That's because older messages get pushed out of the context window!
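Since everything shares one window, you can sketch the budget as simple addition. The token counts below are illustrative numbers, not measurements from a real API:

```python
def context_usage(system: int, history: list[int], prompt: int,
                  reserved_for_reply: int, limit: int) -> tuple[int, int]:
    """Everything shares one window: system instructions, conversation
    history, your current prompt, and room for the model's reply."""
    used = system + sum(history) + prompt
    remaining = limit - used - reserved_for_reply
    return used, remaining

used, remaining = context_usage(
    system=400, history=[1500, 2500], prompt=100,
    reserved_for_reply=1000, limit=8000)
print(used, remaining)  # 4500 tokens used, 2500 to spare
```

When `remaining` goes negative, something has to give, and it's usually the oldest messages.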
How Context Window Affects You
Scenario 1: Long Documents
Question: "Can I upload a 200-page PDF to ChatGPT?"
Answer: Depends on the model!
- GPT-4 (8k): ❌ Won't fit
- GPT-4 Turbo: ✅ Might fit
- Gemini 1.5 Pro: ✅ Easily fits
Scenario 2: Long Conversations
Imagine you're having a coding help session:
- Message 1-10: Discussing project requirements (1,500 tokens)
- Message 11-20: Writing code together (2,500 tokens)
- Message 21-30: Debugging (2,000 tokens)
- Total: 6,000 tokens
With a 4k context model, the AI would start "forgetting" your early requirements!
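A minimal sketch of that "forgetting" behavior, using the scenario's token counts (real systems trim context in more sophisticated ways, but oldest-first truncation is the basic idea):

```python
from collections import deque

def trim_to_window(history: list[tuple[str, int]],
                   max_tokens: int) -> list[tuple[str, int]]:
    """Drop the oldest chunks until the conversation fits the window."""
    window = deque(history)
    while sum(tokens for _, tokens in window) > max_tokens:
        window.popleft()  # oldest content is "forgotten" first
    return list(window)

history = [("requirements", 1500), ("code", 2500), ("debugging", 2000)]
print(trim_to_window(history, 4000))
# [('debugging', 2000)] — not just the requirements, the code discussion too!
```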
Scenario 3: Summarizing Content
Want to summarize a long article?
- Article: 5,000 tokens
- Your prompt: 100 tokens
- AI response: 300 tokens
- Total needed: 5,400 tokens
You need a model with at least an 8k context window!
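The check is just addition against the window size:

```python
def fits(window_limit: int, *token_counts: int) -> bool:
    """Check whether all the pieces fit in one context window together."""
    return sum(token_counts) <= window_limit

article, prompt, reply = 5000, 100, 300
print(fits(4096, article, prompt, reply))  # False: 5,400 > 4k
print(fits(8192, article, prompt, reply))  # True: 5,400 fits in 8k
```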
Why Context Windows Have Limits
You might wonder: "Why not make the context window infinite?"
Technical Reasons:
1. Computational Cost: Longer context means dramatically more computation
   - Attention scales quadratically (O(n²))
   - 2x context = 4x computation
   - 10x context = 100x computation!
2. Memory Requirements: Keeping all that context in memory is expensive
3. Quality Degradation: Models can "lose track" in very long contexts
The Math: If processing 1,000 tokens takes 1 second, processing 10,000 tokens might take 100 seconds (not 10), because attention mechanisms look at every token pair!
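That quadratic scaling is easy to see in code:

```python
def relative_attention_cost(tokens: int, base_tokens: int = 1_000) -> float:
    """Self-attention compares every token pair, so cost grows with n²
    relative to a baseline context length."""
    return (tokens / base_tokens) ** 2

for n in (1_000, 2_000, 10_000):
    print(f"{n:>6} tokens -> {relative_attention_cost(n):.0f}x base cost")
# 1,000 -> 1x, 2,000 -> 4x, 10,000 -> 100x
```

This is why doubling a model's context window is far more than twice as expensive to serve.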
Practical Tips
1. Start Fresh for New Topics
If switching topics, start a new conversation to avoid wasting context on irrelevant history.
2. Summarize Long Conversations
Ask the AI to summarize key points, then start fresh with the summary.
3. Choose the Right Model
- Short queries: Smaller context is fine (and cheaper!)
- Long documents: Use models with larger context windows
- Long conversations: Consider GPT-4 Turbo or Claude 3
4. Be Concise
More tokens = more cost and less room for other content.
Key Takeaways
🔑 Tokens = Small pieces of text (≈4 characters each)
- Models process text as tokens, not characters
- Everything you send/receive counts as tokens
- Tokens determine cost and limits
🔑 Context Window = Maximum tokens the model can handle
- Includes ALL conversation history
- Different models have different limits (4k to 1M tokens)
- Bigger context = more capability but higher cost
🔑 Practical Impact:
- Long conversations may cause the AI to "forget" early messages
- Choose models based on your context needs
- Start fresh conversations for new topics
What's Next?
Now you understand the building blocks (tokens) and memory limits (context windows) of AI. In the next lesson, we'll dive into Prompt Engineering - the art of communicating effectively with AI to get the best results!
Fun Exercise: Try pasting one of your essays or articles into ChatGPT and ask it how many tokens it is. You'll start to develop an intuition for token counts!