Memory Systems in LangChain
LLMs are stateless by default: they retain nothing from one call to the next. Memory systems in LangChain solve this by maintaining conversation history and context, enabling natural multi-turn conversations and context-aware responses.
Memory System Definition: A component that stores and manages conversation history, allowing LLMs to maintain context across multiple interactions and provide coherent, contextually aware responses in multi-turn conversations.
Why Memory Matters
Without memory, every LLM interaction is isolated:
# Without memory - each call is independent
llm.invoke("My name is Alice")
# Response: "Hello Alice!"
llm.invoke("What's my name?")
# Response: "I don't know your name." ❌
With memory, the LLM remembers context:
# With memory - maintains conversation history
conversation.predict(input="My name is Alice")
# Response: "Hello Alice!"
conversation.predict(input="What's my name?")
# Response: "Your name is Alice!" ✅
Types of Memory in LangChain
LangChain provides several memory implementations, each suited for different use cases.
Memory Types:
- ConversationBufferMemory - Stores complete conversation history
- ConversationSummaryMemory - Stores condensed summaries
- ConversationBufferWindowMemory - Keeps only last N messages
- ConversationSummaryBufferMemory - Hybrid approach
- VectorStoreRetrieverMemory - Semantic search over past conversations
ConversationBufferMemory
The simplest memory type stores the entire conversation history.
ConversationBufferMemory Definition: A memory implementation that stores the complete, unmodified conversation history including all user inputs and AI responses, providing full context but potentially exceeding token limits in long conversations.
Python Example:
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain
# Initialize LLM
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7)
# Create memory
memory = ConversationBufferMemory()
# Create conversation chain
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True  # Show the conversation history in each prompt
)
# Have a conversation
print(conversation.predict(input="Hi! I'm working on a Python project."))
print(conversation.predict(input="Can you help me debug a function?"))
print(conversation.predict(input="What was I just working on?"))
# View memory contents
print("\n--- Memory Contents ---")
print(memory.load_memory_variables({}))
JavaScript Example:
import { ChatOpenAI } from "@langchain/openai";
import { BufferMemory } from "langchain/memory";
import { ConversationChain } from "langchain/chains";
// Initialize LLM
const llm = new ChatOpenAI({ modelName: "gpt-3.5-turbo", temperature: 0.7 });
// Create memory
const memory = new BufferMemory();
// Create conversation chain
const conversation = new ConversationChain({
  llm: llm,
  memory: memory,
  verbose: true
});
// Have a conversation
const response1 = await conversation.call({ input: "Hi! I'm working on a Python project." });
console.log(response1.response);
const response2 = await conversation.call({ input: "Can you help me debug a function?" });
console.log(response2.response);
const response3 = await conversation.call({ input: "What was I just working on?" });
console.log(response3.response);
// View memory contents
const memoryContents = await memory.loadMemoryVariables({});
console.log("\n--- Memory Contents ---");
console.log(memoryContents);
Limitation: ConversationBufferMemory stores everything, which can quickly exceed token limits in long conversations. For production applications with extended dialogues, consider other memory types.
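As a rough guard, you can check how large the buffer has grown before each call. Here is a minimal sketch continuing from the Python example above; `get_num_tokens` is a standard method on LangChain chat models, and the 3,000-token threshold is an arbitrary number chosen for illustration:
# Approximate token count of the buffered history
history_text = memory.load_memory_variables({})["history"]
token_count = llm.get_num_tokens(history_text)
if token_count > 3000:  # arbitrary safety margin below the model's context limit
    print(f"History is ~{token_count} tokens; consider window or summary memory")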
ConversationBufferWindowMemory
This memory type keeps only the last K conversation turns, preventing token overflow.
ConversationBufferWindowMemory Definition: A memory type that maintains a sliding window of the most recent K conversation exchanges, automatically discarding older messages to prevent exceeding token limits while preserving recent context.
Python Example:
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferWindowMemory
from langchain.chains import ConversationChain
llm = ChatOpenAI(model="gpt-3.5-turbo")
# Keep only last 2 conversation turns (4 messages: 2 human + 2 AI)
memory = ConversationBufferWindowMemory(k=2)
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)
# Conversation exceeding window size
conversation.predict(input="Hi, I'm Alice.")
conversation.predict(input="I like pizza.")
conversation.predict(input="I have a cat named Whiskers.")
conversation.predict(input="What's my name?") # Will remember
conversation.predict(input="What food do I like?") # Might not remember (outside window)
print("\n--- Memory Window Contents ---")
print(memory.load_memory_variables({}))
Choose k based on how much recent context your application needs: a larger window preserves more history but uses more tokens per call.
ConversationSummaryMemory
Instead of storing raw messages, this creates running summaries to save tokens.
ConversationSummaryMemory Definition: A memory implementation that uses an LLM to create condensed summaries of conversation history, reducing token usage while preserving key information from extended dialogues.
Python Example:
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationSummaryMemory
from langchain.chains import ConversationChain
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
# Create summary memory
memory = ConversationSummaryMemory(llm=llm)
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)
# Long conversation that will be summarized
conversation.predict(input="Hi! I'm planning a trip to Japan next month.")
conversation.predict(input="I want to visit Tokyo, Kyoto, and Osaka.")
conversation.predict(input="I'm interested in traditional temples and modern technology.")
conversation.predict(input="My budget is around $3000.")
conversation.predict(input="What are your recommendations?")
# View the summary
print("\n--- Conversation Summary ---")
print(memory.load_memory_variables({}))
JavaScript Example:
import { ChatOpenAI } from "@langchain/openai";
import { ConversationSummaryMemory } from "langchain/memory";
import { ConversationChain } from "langchain/chains";
const llm = new ChatOpenAI({ modelName: "gpt-3.5-turbo", temperature: 0 });
// Create summary memory
const memory = new ConversationSummaryMemory({ llm });
const conversation = new ConversationChain({
  llm,
  memory,
  verbose: true
});
// Long conversation
await conversation.call({ input: "Hi! I'm planning a trip to Japan next month." });
await conversation.call({ input: "I want to visit Tokyo, Kyoto, and Osaka." });
await conversation.call({ input: "I'm interested in traditional temples and modern technology." });
await conversation.call({ input: "My budget is around $3000." });
await conversation.call({ input: "What are your recommendations?" });
// View the summary
const summary = await memory.loadMemoryVariables({});
console.log("\n--- Conversation Summary ---");
console.log(summary);
Best for: Long conversations where you need to maintain context efficiently without hitting token limits. The trade-off is additional LLM calls to create summaries.
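The hybrid ConversationSummaryBufferMemory from the list above combines both approaches: recent messages are kept verbatim, and once their combined size exceeds a token budget, the oldest ones are folded into a running summary. A minimal sketch, where the 200-token limit is an arbitrary value chosen to show the behavior quickly:
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationSummaryBufferMemory
from langchain.chains import ConversationChain

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
# Keep recent messages verbatim; summarize anything beyond ~200 tokens
memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=200)
conversation = ConversationChain(llm=llm, memory=memory)
conversation.predict(input="Hi! I'm planning a trip to Japan next month.")
This keeps the freshest context exact while still bounding token usage, at the cost of the same extra summarization calls as ConversationSummaryMemory.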
Vector Store Memory
For semantic search over conversation history, use vector store memory. This finds relevant past interactions based on similarity.
Vector Store Memory Definition: A memory system that stores conversation history as embeddings in a vector database, enabling semantic search to retrieve relevant past interactions based on meaning rather than recency.
Python Example:
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.memory import VectorStoreRetrieverMemory
from langchain_community.vectorstores import FAISS  # requires the faiss-cpu package
# Initialize components
llm = ChatOpenAI(model="gpt-3.5-turbo")
embeddings = OpenAIEmbeddings()
# Create vector store
vectorstore = FAISS.from_texts(
    ["Initial placeholder"],
    embedding=embeddings
)
# Create retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
# Create vector store memory
memory = VectorStoreRetrieverMemory(retriever=retriever)
# Add some memories manually
memory.save_context(
    {"input": "My favorite color is blue"},
    {"output": "That's nice! Blue is a calming color."}
)
memory.save_context(
    {"input": "I work as a software engineer"},
    {"output": "Software engineering is a great field!"}
)
memory.save_context(
    {"input": "I enjoy hiking on weekends"},
    {"output": "Hiking is wonderful exercise!"}
)
# Retrieve relevant memories
print("--- Relevant Memories for 'What do I do for work?' ---")
relevant = memory.load_memory_variables({"prompt": "What do I do for work?"})
print(relevant)
print("\n--- Relevant Memories for 'What color do I like?' ---")
relevant = memory.load_memory_variables({"prompt": "What color do I like?"})
print(relevant)
Vector store memory is powerful for long-term memory systems where you want to retrieve relevant past conversations based on semantic similarity, not just recency.
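To use it in a live conversation rather than with manual lookups, the memory can be handed to a chain like any other. A minimal sketch continuing from the setup above: VectorStoreRetrieverMemory exposes its results under the default history key, which ConversationChain's default prompt expects, so no custom prompt is needed.
# On each turn, the k=3 most similar past exchanges are retrieved and injected
conversation = ConversationChain(llm=llm, memory=memory)
print(conversation.predict(input="Remind me what my job is."))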
Complete Chatbot with Memory
Let's build a complete chatbot that combines memory with custom prompts.
Python Complete Example:
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferWindowMemory
from langchain.chains import ConversationChain
from langchain.prompts import PromptTemplate
from datetime import datetime
# Custom prompt template
template = """You are a helpful AI assistant with a friendly personality.
You remember details from the conversation and provide personalized responses.
Current conversation:
{history}
Human: {input}
AI:"""
prompt = PromptTemplate(
    input_variables=["history", "input"],
    template=template
)
# Initialize LLM
llm = ChatOpenAI(
    model="gpt-3.5-turbo",
    temperature=0.7
)
# Create memory (keep last 5 exchanges)
memory = ConversationBufferWindowMemory(k=5)
# Create conversation chain
chatbot = ConversationChain(
    llm=llm,
    memory=memory,
    prompt=prompt,
    verbose=False
)
# Helper function for chatting
def chat(message):
    """Send a message and get a response."""
    response = chatbot.predict(input=message)
    return response

# Interactive chatbot session
if __name__ == "__main__":
    print("Chatbot started! Type 'quit' to exit.\n")
    while True:
        user_input = input("You: ")
        if user_input.lower() in ['quit', 'exit', 'bye']:
            print("Chatbot: Goodbye! Have a great day!")
            break
        response = chat(user_input)
        print(f"Chatbot: {response}\n")
JavaScript Complete Example:
import { ChatOpenAI } from "@langchain/openai";
import { BufferWindowMemory } from "langchain/memory";
import { ConversationChain } from "langchain/chains";
import { PromptTemplate } from "@langchain/core/prompts";
import * as readline from "readline";
// Custom prompt template
const template = `You are a helpful AI assistant with a friendly personality.
You remember details from the conversation and provide personalized responses.
Current conversation:
{history}
Human: {input}
AI:`;
const prompt = PromptTemplate.fromTemplate(template);
// Initialize LLM
const llm = new ChatOpenAI({
  modelName: "gpt-3.5-turbo",
  temperature: 0.7
});
// Create memory (keep last 5 exchanges)
const memory = new BufferWindowMemory({ k: 5 });
// Create conversation chain
const chatbot = new ConversationChain({
  llm: llm,
  memory: memory,
  prompt: prompt,
  verbose: false
});
// Helper function for chatting
async function chat(message) {
  const response = await chatbot.call({ input: message });
  return response.response;
}

// Interactive chatbot session
async function main() {
  console.log("Chatbot started! Type 'quit' to exit.\n");
  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout
  });
  const askQuestion = () => {
    rl.question("You: ", async (userInput) => {
      if (['quit', 'exit', 'bye'].includes(userInput.toLowerCase())) {
        console.log("Chatbot: Goodbye! Have a great day!");
        rl.close();
        return;
      }
      const response = await chat(userInput);
      console.log(`Chatbot: ${response}\n`);
      askQuestion();
    });
  };
  askQuestion();
}
main().catch(console.error);
Advanced Memory Pattern: Persistent Storage
For production applications, you'll want to persist memory across sessions.
Python with Redis Example:
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain_community.chat_message_histories import RedisChatMessageHistory  # requires the redis package
from langchain.chains import ConversationChain
# Create memory with Redis backend
def create_chatbot_with_persistence(session_id):
    # Redis stores conversation history persistently, keyed by session_id
    message_history = RedisChatMessageHistory(
        session_id=session_id,
        url="redis://localhost:6379/0"
    )
    memory = ConversationBufferMemory(
        chat_memory=message_history,
        return_messages=True
    )
    llm = ChatOpenAI(model="gpt-3.5-turbo")
    chatbot = ConversationChain(
        llm=llm,
        memory=memory,
        verbose=True
    )
    return chatbot
# Usage: Each user gets their own session
user1_chatbot = create_chatbot_with_persistence("user_123")
user2_chatbot = create_chatbot_with_persistence("user_456")
# Conversations are maintained separately
user1_chatbot.predict(input="My name is Alice")
user2_chatbot.predict(input="My name is Bob")
# Later, even in a new process, the same session_id restores the history
user1_restored = create_chatbot_with_persistence("user_123")
user1_restored.predict(input="What's my name?")  # "Alice"
user2_restored = create_chatbot_with_persistence("user_456")
user2_restored.predict(input="What's my name?")  # "Bob"
Memory Management Best Practices
Best Practices:
- Choose the right memory type based on conversation length and context needs
- Set reasonable limits - Use window memory or summaries for long conversations
- Persist memory for production applications using databases like Redis or PostgreSQL
- Clear memory when starting new topics or sessions (see the sketch after this list)
- Monitor token usage to avoid exceeding model limits
- Use session IDs to maintain separate conversations for different users
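Here is a minimal sketch of the clearing and session-separation points above, reusing classes from earlier in the lesson. The in-process dictionary of sessions is an illustrative pattern, not a LangChain API:
from langchain.memory import ConversationBufferWindowMemory

# One memory object per session ID (illustrative in-process pattern)
sessions = {}

def get_memory(session_id):
    """Return the memory for a session, creating it on first use."""
    if session_id not in sessions:
        sessions[session_id] = ConversationBufferWindowMemory(k=5)
    return sessions[session_id]

def reset_session(session_id):
    """Discard stored history when the user starts a new topic."""
    get_memory(session_id).clear()  # every LangChain memory class supports clear()
For multi-server deployments, replace the dictionary with a persistent backend such as the Redis setup shown above.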
Memory Comparison Table
| Memory Type | Pros | Cons | Best For |
|---|---|---|---|
| Buffer | Simple, complete history | Token limit issues | Short conversations |
| Window | Bounded token usage | Loses older context | Customer support chats |
| Summary | Efficient for long chats | Extra LLM calls | Extended dialogues |
| Vector Store | Semantic retrieval | Complex setup | Knowledge assistants |
Key Takeaways
What You've Learned:
- Memory systems enable stateful conversations with LLMs
- Different memory types suit different use cases
- ConversationBufferMemory stores everything (simple but limited)
- ConversationSummaryMemory creates summaries (efficient for long chats)
- Vector store memory enables semantic search over past conversations
- Production systems need persistent storage with session management
Next Steps
In the next lesson, we'll build a complete LangChain application:
- Design a question-answering system
- Integrate memory and tools
- Add error handling and logging
- Deploy the application