If you're building chatbots using LangGraph, you've likely noticed that, by default, they don't remember previous messages across calls. That's where Short Term Memory in LangGraph becomes incredibly powerful.
In this blog post, we'll explore how to equip your chatbot with short-term memory using LangGraph's native tools, InMemorySaver and trim_messages.
Why Short Term Memory in LangGraph Matters
Most real-world chatbots need some memory, not just for a smarter experience, but to keep context. Short Term Memory in LangGraph allows your bot to remember what users said earlier, so it can give coherent, contextual responses.
Let's begin with a stateless chatbot example.
Example 1: LangGraph Chatbot Without Short Term Memory
Here's a simple implementation of a LangGraph chatbot that does not retain memory across invocations:
from typing import TypedDict, Annotated, Sequence
from langchain_core.messages import BaseMessage, HumanMessage
from langgraph.graph.message import add_messages
from langgraph.graph import StateGraph, END, START
from langchain_google_genai import ChatGoogleGenerativeAI
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash", temperature=0)
# Define the agent state: a message list managed by the add_messages reducer
class AgentState(TypedDict):
    messages: Annotated[Sequence[BaseMessage], add_messages]

def simple_agent(state: AgentState) -> AgentState:
    # Respond using only the messages present in the current state
    response = llm.invoke(state["messages"])
    return {"messages": [response]}

def create_stateless_agent():
    workflow = StateGraph(AgentState)
    workflow.add_node("agent", simple_agent)
    workflow.add_edge(START, "agent")
    workflow.add_edge("agent", END)
    # No checkpointer: nothing persists between invocations
    return workflow.compile()
Each time you invoke this agent, it only responds to the current input, with no awareness of the previous ones.
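To see this statelessness in action, here is a quick sketch that calls the agent twice; the demonstrate_stateless helper is my addition, not part of the original example. In the second call, the model has no idea who Bob is:

def demonstrate_stateless():
    app = create_stateless_agent()
    result1 = app.invoke({"messages": [HumanMessage(content="Hi, my name is Bob")]})
    # Each invoke starts from a fresh, empty state: the name "Bob" is gone
    result2 = app.invoke({"messages": [HumanMessage(content="What's my name?")]})
    print([msg.content for msg in result2["messages"]])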
Here is what our agent looks like as a Mermaid graph (START → agent → END).
Example 2: Adding Short Term Memory in LangGraph with trim_messages and InMemorySaver
To add short term memory in LangGraph, we'll make two enhancements (a standalone sketch of trim_messages follows this list):
- Use trim_messages to avoid token overload.
- Add a checkpointer (InMemorySaver) so the agent remembers the thread history.
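To build intuition first, here is a small standalone sketch of trim_messages outside the graph; the tiny max_tokens budget is a deliberate assumption to force trimming on a short history:

from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.messages.utils import count_tokens_approximately, trim_messages

history = [
    HumanMessage(content="Hi, my name is Bob"),
    AIMessage(content="Hi Bob! How can I help you today?"),
    HumanMessage(content="What's my name?"),
]
# Keep the most recent messages that fit the budget, starting on a
# human message so the model sees a valid turn order
trimmed = trim_messages(
    history,
    token_counter=count_tokens_approximately,
    strategy="last",
    max_tokens=20,
    start_on="human",
    allow_partial=True,
)
print([m.content for m in trimmed])  # oldest messages are dropped first

Now let's wire the same call into the agent node: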
from langchain_core.messages.utils import count_tokens_approximately, trim_messages
from langgraph.checkpoint.memory import InMemorySaver
def simple_agent(state: AgentState) -> AgentState:
trimmed_messages = trim_messages(
state["messages"],
token_counter=count_tokens_approximately,
strategy="last",
max_tokens=500,
start_on="human",
allow_partial=True
)
response = llm.invoke(trimmed_messages)
return {"messages": [response]}
def create_agent_with_memory():
workflow = StateGraph(AgentState)
workflow.add_node("agent", simple_agent)
workflow.add_edge(START, "agent")
workflow.add_edge("agent", END)
checkpointer = InMemorySaver()
return workflow.compile(checkpointer=checkpointer)
Now this agent can maintain short-term conversational memory, recalling prior messages for a given thread_id.
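Memory is scoped per thread: the checkpointer keeps a separate history for each thread_id you pass in the config. A quick sketch (the thread IDs here are arbitrary examples):

app = create_agent_with_memory()
config_a = {"configurable": {"thread_id": "alice"}}
config_b = {"configurable": {"thread_id": "bob"}}
app.invoke({"messages": [HumanMessage(content="Hi, I'm Alice")]}, config_a)
# A different thread_id starts from a fresh, empty history,
# so the model has never seen Alice's introduction
result = app.invoke({"messages": [HumanMessage(content="What's my name?")]}, config_b)
print([msg.content for msg in result["messages"]])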
Demonstration of Short Term Memory in LangGraph
def demonstrate_with_memory():
    app = create_agent_with_memory()
    config = {"configurable": {"thread_id": "demo_thread"}}

    print("\nFirst Invocation:")
    result1 = app.invoke({"messages": [HumanMessage(content="Hi, my name is Bob")]}, config)
    print([msg.content for msg in result1['messages']])

    print("\nSecond Invocation:")
    result2 = app.invoke({"messages": [HumanMessage(content="What's my name?")]}, config)
    print([msg.content for msg in result2['messages']])

    print("\nThird Invocation:")
    result3 = app.invoke({"messages": [HumanMessage(content="Do you remember me?")]}, config)
    print([msg.content for msg in result3['messages']])
Now the chatbot remembers that your name is Bob in the second and third calls, thanks to short term memory in LangGraph.
Here are the logs (produced by the full example at the end of this post, which also prints message counts):
First Invocation:
Output messages count: 2
Messages: ['Hi, my name is Bob', "Hi Bob, it's nice to meet you! How can I help you today?"]

Second Invocation:
Output messages count: 4
Previous messages still there: True
Messages: [
    'Hi, my name is Bob',
    "Hi Bob, it's nice to meet you! How can I help you today?",
    "What's my name?",
    'Your name is Bob. You just told me!']

Third Invocation:
Total conversation length: 6
All previous messages preserved: True
Messages: [
    'Hi, my name is Bob',
    "Hi Bob, it's nice to meet you! How can I help you today?",
    "What's my name?",
    'Your name is Bob. You just told me!',
    'Do you remember me?',
    "Yes, Bob. I remember you told me your name is Bob. I don't have personal memories like humans do, but I retain information from our current conversation. So, I remember you!"]
Benefits of Adding Short Term Memory in LangGraph
- Contextual Awareness
- Improved UX
- Minimal code changes
- Short-term only: keeps responses focused, not bloated
Here is the full code example:
from typing import TypedDict, Annotated, Sequence
from langchain_core.messages import BaseMessage, HumanMessage
from langchain_core.messages.utils import count_tokens_approximately, trim_messages
from langgraph.graph.message import add_messages
from langgraph.graph import StateGraph, END, START
from langgraph.checkpoint.memory import InMemorySaver
from langchain_google_genai import ChatGoogleGenerativeAI
# Initialize LLM
llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash", temperature=0)
# Define the agent state
class AgentState(TypedDict):
messages: Annotated[Sequence[BaseMessage], add_messages]
def simple_agent(state: AgentState) -> AgentState:
"""Simple agent that just responds to the latest message."""
trimmed_messages = trim_messages(
state["messages"],
token_counter=count_tokens_approximately,
strategy="last",
max_tokens=500,
start_on="human",
allow_partial=True
)
response = llm.invoke(trimmed_messages)
return {"messages": [response]}
def create_agent_with_memory():
"""Create agent WITH memory (InMemorySaver checkpointer)."""
workflow = StateGraph(AgentState)
workflow.add_node("agent", simple_agent)
workflow.add_edge(START, "agent")
workflow.add_edge("agent", END)
# Compile WITH checkpointer for memory
checkpointer = InMemorySaver()
return workflow.compile(checkpointer=checkpointer)
def demonstrate_with_memory():
"""Demonstrate agent behavior WITH memory."""
print("\nπ§ AGENT WITH MEMORY")
print("=" * 50)
app = create_agent_with_memory()
config = {"configurable": {"thread_id": "demo_thread"}}
# First invocation
print("\nπ First Invocation:")
result1 = app.invoke({"messages": [HumanMessage(content="Hi, my name is Bob")]}, config)
print(f"Output messages count: {len(result1['messages'])}")
print(f"Messages: {[msg.content for msg in result1['messages']]}")
# Second invocation - REMEMBERS PREVIOUS
print("\nπ Second Invocation:")
result2 = app.invoke({"messages": [HumanMessage(content="What's my name?")]}, config)
print(f"Output messages count: {len(result2['messages'])}")
print("Previous messages still there:", len(result2['messages']) > 2)
print(f"Messages: {[msg.content for msg in result2['messages']]}")
# Third invocation - STILL REMEMBERS ALL
print("\nπ Third Invocation:")
result3 = app.invoke({"messages": [HumanMessage(content="Do you remember me?")]}, config)
print(f"Total conversation length: {len(result3['messages'])}")
print("All previous messages preserved:", len(result3['messages']) > 4)
print(f"Messages: {[msg.content for msg in result3['messages']]}")
if __name__ == "__main__":
print("π¬ MEMORY COMPARISON DEMONSTRATION")
print("=" * 60)
demonstrate_with_memory()
print("\n" + "=" * 60)
FAQs: Short Term Memory in LangGraph
What is short term memory in LangGraph?
It refers to the chatbot's ability to retain conversation context across invocations, using in-memory checkpoints and message trimming.
What is trim_messages used for?
It ensures token limits are respected while still keeping the most recent message context.
What is InMemorySaver?
It's a built-in, in-memory checkpointer that lets LangGraph remember previous messages based on thread ID.
How is this different from long-term memory?
Short term memory is volatile and typically used only during active sessions. Long-term memory requires persistent storage, such as a vector database or file system.
Can I scale this beyond in-memory?
Yes, LangGraph supports custom checkpointers. You can persist messages in databases like Redis or SQLite, or use LangChain's VectorStore Memory.
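For example, swapping InMemorySaver for a persistent checkpointer is a one-line change at compile time. Here is a sketch, assuming the separate langgraph-checkpoint-sqlite package is installed (the database filename is an arbitrary example):

import sqlite3
from langgraph.checkpoint.sqlite import SqliteSaver

def create_agent_with_sqlite_memory():
    workflow = StateGraph(AgentState)
    workflow.add_node("agent", simple_agent)
    workflow.add_edge(START, "agent")
    workflow.add_edge("agent", END)
    # Checkpoints go to a local SQLite file instead of process memory,
    # so thread histories survive restarts
    conn = sqlite3.connect("checkpoints.db", check_same_thread=False)
    return workflow.compile(checkpointer=SqliteSaver(conn))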