Memory Systems in LangChain
LLMs are stateless by default: they retain nothing from one call to the next. Memory systems in LangChain solve this by maintaining conversation history and context, enabling natural multi-turn conversations and context-aware responses.
Memory System Definition: A component that stores and manages conversation history, allowing LLMs to maintain context across multiple interactions and provide coherent, contextually aware responses in multi-turn conversations.
Why Memory Matters
Without memory, every LLM interaction is isolated:
# Without memory - each call is independent
llm.invoke("My name is Alice")
# Response: "Hello Alice!"
llm.invoke("What's my name?")
# Response: "I don't know your name." ❌
With memory, the LLM remembers context:
# With memory - maintains conversation history
conversation.predict(input="My name is Alice")
# Response: "Hello Alice!"
conversation.predict(input="What's my name?")
# Response: "Your name is Alice!" ✅
Types of Memory in LangChain
LangChain provides several memory implementations, each suited for different use cases.
Memory Types:
- ConversationBufferMemory - Stores complete conversation history
- ConversationSummaryMemory - Stores condensed summaries
- ConversationBufferWindowMemory - Keeps only last N messages
- ConversationSummaryBufferMemory - Hybrid approach
- VectorStoreRetrieverMemory - Semantic search over past conversations
ConversationBufferMemory
The simplest memory type stores the entire conversation history.
ConversationBufferMemory Definition: A memory implementation that stores the complete, unmodified conversation history including all user inputs and AI responses, providing full context but potentially exceeding token limits in long conversations.
Python Example:
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationChain
# Initialize LLM
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7)
# Create memory
memory = ConversationBufferMemory()
# Create conversation chain
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True  # Show the conversation history in each prompt
)
# Have a conversation
print(conversation.predict(input="Hi! I'm working on a Python project."))
print(conversation.predict(input="Can you help me debug a function?"))
print(conversation.predict(input="What was I just working on?"))
# View memory contents
print("\n--- Memory Contents ---")
print(memory.load_memory_variables({}))
JavaScript Example:
import { ChatOpenAI } from "@langchain/openai";
import { BufferMemory } from "langchain/memory";
import { ConversationChain } from "langchain/chains";
// Initialize LLM
const llm = new ChatOpenAI({ modelName: "gpt-3.5-turbo", temperature: 0.7 });
// Create memory
const memory = new BufferMemory();
// Create conversation chain
const conversation = new ConversationChain({
  llm: llm,
  memory: memory,
  verbose: true
});
// Have a conversation
const response1 = await conversation.call({ input: "Hi! I'm working on a Python project." });
console.log(response1.response);
const response2 = await conversation.call({ input: "Can you help me debug a function?" });
console.log(response2.response);
const response3 = await conversation.call({ input: "What was I just working on?" });
console.log(response3.response);
// View memory contents
const memoryContents = await memory.loadMemoryVariables({});
console.log("\n--- Memory Contents ---");
console.log(memoryContents);
Limitation: ConversationBufferMemory stores everything, which can quickly exceed token limits in long conversations. For production applications with extended dialogues, consider other memory types.
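As a rough guard, you can check how large the buffer has grown before each call. Here is a minimal sketch continuing from the Python example above; `get_num_tokens` is a standard method on LangChain chat models, and the 3,000-token threshold is an arbitrary number chosen for illustration:
# Approximate token count of the buffered history
history_text = memory.load_memory_variables({})["history"]
token_count = llm.get_num_tokens(history_text)
if token_count > 3000:  # arbitrary safety margin below the model's context limit
    print(f"History is ~{token_count} tokens; consider window or summary memory")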
ConversationBufferWindowMemory
This memory type keeps only the last K conversation turns, preventing token overflow.
ConversationBufferWindowMemory Definition: A memory type that maintains a sliding window of the most recent K conversation exchanges, automatically discarding older messages to prevent exceeding token limits while preserving recent context.
Python Example:
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferWindowMemory
from langchain.chains import ConversationChain
llm = ChatOpenAI(model="gpt-3.5-turbo")
# Keep only last 2 conversation turns (4 messages: 2 human + 2 AI)
memory = ConversationBufferWindowMemory(k=2)
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)
# Conversation exceeding window size
conversation.predict(input="Hi, I'm Alice.")
conversation.predict(input="I like pizza.")
conversation.predict(input="I have a cat named Whiskers.")
conversation.predict(input="What's my name?") # Will remember
conversation.predict(input="What food do I like?") # Might not remember (outside window)
print("\n--- Memory Window Contents ---")
print(memory.load_memory_variables({}))
Choose k based on how much recent context your application needs: a larger window preserves more history but uses more tokens per call.
ConversationSummaryMemory
Instead of storing raw messages, this creates running summaries to save tokens.
ConversationSummaryMemory Definition: A memory implementation that uses an LLM to create condensed summaries of conversation history, reducing token usage while preserving key information from extended dialogues.
Python Example:
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationSummaryMemory
from langchain.chains import ConversationChain
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
# Create summary memory
memory = ConversationSummaryMemory(llm=llm)
conversation = ConversationChain(
    llm=llm,
    memory=memory,
    verbose=True
)
# Long conversation that will be summarized
conversation.predict(input="Hi! I'm planning a trip to Japan next month.")
conversation.predict(input="I want to visit Tokyo, Kyoto, and Osaka.")
conversation.predict(input="I'm interested in traditional temples and modern technology.")
conversation.predict(input="My budget is around $3000.")
conversation.predict(input="What are your recommendations?")
# View the summary
print("\n--- Conversation Summary ---")
print(memory.load_memory_variables({}))
JavaScript Example:
import { ChatOpenAI } from "@langchain/openai";
import { ConversationSummaryMemory } from "langchain/memory";
import { ConversationChain } from "langchain/chains";
const llm = new ChatOpenAI({ modelName: "gpt-3.5-turbo", temperature: 0 });
// Create summary memory
const memory = new ConversationSummaryMemory({ llm });
const conversation = new ConversationChain({
  llm,
  memory,
  verbose: true
});
// Long conversation
await conversation.call({ input: "Hi! I'm planning a trip to Japan next month." });
await conversation.call({ input: "I want to visit Tokyo, Kyoto, and Osaka." });
await conversation.call({ input: "I'm interested in traditional temples and modern technology." });
await conversation.call({ input: "My budget is around $3000." });
await conversation.call({ input: "What are your recommendations?" });
// View the summary
const summary = await memory.loadMemoryVariables({});
console.log("\n--- Conversation Summary ---");
console.log(summary);
Best for: Long conversations where you need to maintain context efficiently without hitting token limits. The trade-off is additional LLM calls to create summaries.
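The hybrid ConversationSummaryBufferMemory from the list above combines both approaches: recent messages are kept verbatim, and once their combined size exceeds a token budget, the oldest ones are folded into a running summary. A minimal sketch, where the 200-token limit is an arbitrary value chosen to show the behavior quickly:
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationSummaryBufferMemory
from langchain.chains import ConversationChain

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
# Keep recent messages verbatim; summarize anything beyond ~200 tokens
memory = ConversationSummaryBufferMemory(llm=llm, max_token_limit=200)
conversation = ConversationChain(llm=llm, memory=memory)
conversation.predict(input="Hi! I'm planning a trip to Japan next month.")
This keeps the freshest context exact while still bounding token usage, at the cost of the same extra summarization calls as ConversationSummaryMemory.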
Vector Store Memory
For semantic search over conversation history, use vector store memory. This finds relevant past interactions based on similarity.
Vector Store Memory Definition: A memory system that stores conversation history as embeddings in a vector database, enabling semantic search to retrieve relevant past interactions based on meaning rather than recency.
Python Example:
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain.memory import VectorStoreRetrieverMemory
from langchain_community.vectorstores import FAISS  # requires the faiss-cpu package
# Initialize components
llm = ChatOpenAI(model="gpt-3.5-turbo")
embeddings = OpenAIEmbeddings()
# Create vector store
vectorstore = FAISS.from_texts(
    ["Initial placeholder"],
    embedding=embeddings
)
# Create retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
# Create vector store memory
memory = VectorStoreRetrieverMemory(retriever=retriever)
# Add some memories manually
memory.save_context(
    {"input": "My favorite color is blue"},
    {"output": "That's nice! Blue is a calming color."}
)
memory.save_context(
    {"input": "I work as a software engineer"},
    {"output": "Software engineering is a great field!"}
)
memory.save_context(
    {"input": "I enjoy hiking on weekends"},
    {"output": "Hiking is wonderful exercise!"}
)
# Retrieve relevant memories
print("--- Relevant Memories for 'What do I do for work?' ---")
relevant = memory.load_memory_variables({"prompt": "What do I do for work?"})
print(relevant)
print("\n--- Relevant Memories for 'What color do I like?' ---")
relevant = memory.load_memory_variables({"prompt": "What color do I like?"})
print(relevant)
Vector store memory is powerful for long-term memory systems where you want to retrieve relevant past conversations based on semantic similarity, not just recency.
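To use it in a live conversation rather than with manual lookups, the memory can be handed to a chain like any other. A minimal sketch continuing from the setup above: VectorStoreRetrieverMemory exposes its results under the default history key, which ConversationChain's default prompt expects, so no custom prompt is needed.
# On each turn, the k=3 most similar past exchanges are retrieved and injected
conversation = ConversationChain(llm=llm, memory=memory)
print(conversation.predict(input="Remind me what my job is."))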
Complete Chatbot with Memory
Let's build a complete chatbot that combines memory with custom prompts.
Python Complete Example:
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferWindowMemory
from langchain.chains import ConversationChain
from langchain.prompts import PromptTemplate
from datetime import datetime
# Custom prompt template
template = """You are a helpful AI assistant with a friendly personality.
You remember details from the conversation and provide personalized responses.
Current conversation:
{history}
Human: {input}
AI:"""
prompt = PromptTemplate(
    input_variables=["history", "input"],
    template=template
)
# Initialize LLM
llm = ChatOpenAI(
    model="gpt-3.5-turbo",
    temperature=0.7
)
# Create memory (keep last 5 exchanges)
memory = ConversationBufferWindowMemory(k=5)
# Create conversation chain
chatbot = ConversationChain(
    llm=llm,
    memory=memory,
    prompt=prompt,
    verbose=False
)
# Helper function for chatting
def chat(message):
    """Send a message and get a response."""
    response = chatbot.predict(input=message)
    return response

# Interactive chatbot session
if __name__ == "__main__":
    print("Chatbot started! Type 'quit' to exit.\n")
    while True:
        user_input = input("You: ")
        if user_input.lower() in ['quit', 'exit', 'bye']:
            print("Chatbot: Goodbye! Have a great day!")
            break
        response = chat(user_input)
        print(f"Chatbot: {response}\n")
JavaScript Complete Example:
import { ChatOpenAI } from "@langchain/openai";
import { BufferWindowMemory } from "langchain/memory";
import { ConversationChain } from "langchain/chains";
import { PromptTemplate } from "@langchain/core/prompts";
import * as readline from "readline";
// Custom prompt template
const template = `You are a helpful AI assistant with a friendly personality.
You remember details from the conversation and provide personalized responses.
Current conversation:
{history}
Human: {input}
AI:`;
const prompt = PromptTemplate.fromTemplate(template);
// Initialize LLM
const llm = new ChatOpenAI({
  modelName: "gpt-3.5-turbo",
  temperature: 0.7
});
// Create memory (keep last 5 exchanges)
const memory = new BufferWindowMemory({ k: 5 });
// Create conversation chain
const chatbot = new ConversationChain({
  llm: llm,
  memory: memory,
  prompt: prompt,
  verbose: false
});
// Helper function for chatting
async function chat(message) {
  const response = await chatbot.call({ input: message });
  return response.response;
}

// Interactive chatbot session
async function main() {
  console.log("Chatbot started! Type 'quit' to exit.\n");
  const rl = readline.createInterface({
    input: process.stdin,
    output: process.stdout
  });
  const askQuestion = () => {
    rl.question("You: ", async (userInput) => {
      if (['quit', 'exit', 'bye'].includes(userInput.toLowerCase())) {
        console.log("Chatbot: Goodbye! Have a great day!");
        rl.close();
        return;
      }
      const response = await chat(userInput);
      console.log(`Chatbot: ${response}\n`);
      askQuestion();
    });
  };
  askQuestion();
}
main().catch(console.error);
Advanced Memory Pattern: Persistent Storage
For production applications, you'll want to persist memory across sessions.
Python with Redis Example:
from langchain_openai import ChatOpenAI
from langchain.memory import ConversationBufferMemory
from langchain_community.chat_message_histories import RedisChatMessageHistory  # requires the redis package
from langchain.chains import ConversationChain
# Create memory with Redis backend
def create_chatbot_with_persistence(session_id):
    # Redis stores conversation history persistently, keyed by session_id
    message_history = RedisChatMessageHistory(
        session_id=session_id,
        url="redis://localhost:6379/0"
    )
    memory = ConversationBufferMemory(
        chat_memory=message_history,
        return_messages=True
    )
    llm = ChatOpenAI(model="gpt-3.5-turbo")
    chatbot = ConversationChain(
        llm=llm,
        memory=memory,
        verbose=True
    )
    return chatbot
# Usage: Each user gets their own session
user1_chatbot = create_chatbot_with_persistence("user_123")
user2_chatbot = create_chatbot_with_persistence("user_456")
# Conversations are maintained separately
user1_chatbot.predict(input="My name is Alice")
user2_chatbot.predict(input="My name is Bob")
# Later, even in a new process, the same session_id restores the history
user1_restored = create_chatbot_with_persistence("user_123")
user1_restored.predict(input="What's my name?")  # "Alice"
user2_restored = create_chatbot_with_persistence("user_456")
user2_restored.predict(input="What's my name?")  # "Bob"
Memory Management Best Practices
Best Practices:
- Choose the right memory type based on conversation length and context needs
- Set reasonable limits - Use window memory or summaries for long conversations
- Persist memory for production applications using databases like Redis or PostgreSQL
- Clear memory when starting new topics or sessions (see the sketch after this list)
- Monitor token usage to avoid exceeding model limits
- Use session IDs to maintain separate conversations for different users
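Here is a minimal sketch of the clearing and session-separation points above, reusing classes from earlier in the lesson. The in-process dictionary of sessions is an illustrative pattern, not a LangChain API:
from langchain.memory import ConversationBufferWindowMemory

# One memory object per session ID (illustrative in-process pattern)
sessions = {}

def get_memory(session_id):
    """Return the memory for a session, creating it on first use."""
    if session_id not in sessions:
        sessions[session_id] = ConversationBufferWindowMemory(k=5)
    return sessions[session_id]

def reset_session(session_id):
    """Discard stored history when the user starts a new topic."""
    get_memory(session_id).clear()  # every LangChain memory class supports clear()
For multi-server deployments, replace the dictionary with a persistent backend such as the Redis setup shown above.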
Memory Comparison Table
| Memory Type | Pros | Cons | Best For |
|---|---|---|---|
| Buffer | Simple, complete history | Token limit issues | Short conversations |
| Window | Bounded token usage | Loses older context | Customer support chats |
| Summary | Efficient for long chats | Extra LLM calls | Extended dialogues |
| Vector Store | Semantic retrieval | Complex setup | Knowledge assistants |
Key Takeaways
What You've Learned:
- Memory systems enable stateful conversations with LLMs
- Different memory types suit different use cases
- ConversationBufferMemory stores everything (simple but limited)
- ConversationSummaryMemory creates summaries (efficient for long chats)
- Vector store memory enables semantic search over past conversations
- Production systems need persistent storage with session management
Next Steps
In the next lesson, we'll build a complete LangChain application:
- Design a question-answering system
- Integrate memory and tools
- Add error handling and logging
- Deploy the application