
A Beginner's Guide to LangChain
Anthony Sandesh

LangChain is a powerful framework for building applications powered by large language models (LLMs). It streamlines the development of advanced LLM apps by providing abstractions for common patterns like chains of calls, agents that use tools, and mechanisms for memory. In this blog post, we'll explain LangChain’s core components – Agents, Chains, Memory, and Tools – and show how they work together. We’ll then build a sample Document Q&A application with LangChain and Streamlit to demonstrate these concepts in practice.
Agents
An Agent in LangChain is a component that uses an LLM to dynamically decide which actions or tools to use to fulfill a task. Unlike a fixed sequence of steps, an agent can reason and adapt its behavior based on the user’s request. In a traditional chain, the sequence of calls is predetermined in code, but in an agent, the language model itself chooses which actions to take and in what order. This means the agent can handle more open-ended tasks where the necessary steps aren’t known beforehand.
Agents typically operate on a thought→action→observation loop (often called the ReAct loop). The agent receives a user question or task, thinks (the LLM generates reasoning), decides on an action (like calling a tool), then observes the tool’s result, and repeats this until it arrives at a final answer. Because agents can use tools, they are capable of accessing external information or performing calculations as needed. In LangChain’s ecosystem, an agent is essentially an LLM + a suite of tools, possibly with some memory to remember prior interactions.
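The loop itself is easy to sketch in plain Python. The snippet below is a toy illustration, not LangChain’s actual implementation: the "LLM" is hard-coded and there is a single calculator tool, but the thought→action→observation structure is the same one real agents follow.

```python
# A toy ReAct-style loop: the "LLM" decides between using a tool and answering.
def calculator(expression: str) -> str:
    return str(eval(expression))  # toy tool; never use eval on untrusted input

def fake_llm(question: str, observations: list) -> dict:
    # A real agent would prompt an LLM here; we hard-code the "reasoning".
    if not observations:
        return {"action": "calculator", "input": "6 * 7"}
    return {"final_answer": f"The result is {observations[-1]}"}

def run_agent(question: str, tools: dict, max_steps: int = 5) -> str:
    observations = []
    for _ in range(max_steps):          # thought -> action -> observation loop
        decision = fake_llm(question, observations)
        if "final_answer" in decision:  # the LLM signals it is done
            return decision["final_answer"]
        result = tools[decision["action"]](decision["input"])
        observations.append(result)     # feed the tool output back in
    return "Gave up after too many steps."

print(run_agent("What is 6 times 7?", {"calculator": calculator}))
# prints: The result is 42
```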
LangChain provides ready-made agent implementations (for example, a zero-shot ReAct agent or a conversational agent) so you don’t have to write the reasoning loop from scratch. You can create an agent by specifying an LLM and a list of tools it can use. For instance, you might initialize an agent with a search tool and a calculator tool, allowing the LLM to decide when to search for information or do math. Under the hood, LangChain will prompt the LLM with a special format so that it can output actions (tool name and input) or a final answer. The agent will keep looping, calling tools and gathering observations, until the LLM indicates it has a final answer.
Why use agents? They are useful when your application might require different actions depending on the query. For example, one user query might need a database lookup while another needs a web search. An agent can intelligently choose the appropriate tool. This flexibility comes at the cost of more complex logic (and sometimes longer execution), but it enables very powerful, dynamic behavior.
Chains
A Chain in LangChain is a predetermined sequence of operations, analogous to a pipeline. Each step in a chain could be an LLM call, a function, or even another chain. In other words, a chain encodes a sequence of calls to components like models, retrievers, or other chains, and provides a simple interface for this sequence. You might use a chain to enforce a fixed workflow for the LLM – for example, first format the user input with a prompt template, then call the LLM, then post-process the result.
The simplest example is an LLMChain, which takes a prompt template and an LLM. When you run it with input, it fills in the prompt and calls the LLM, returning the result. More complex chains can branch or loop, but importantly, the sequence of steps is hard-coded in advance, unlike an agent which decides at runtime.
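Conceptually, an LLMChain is just "fill the template, call the model, return the output". The few lines below sketch that fixed pipeline in plain Python with a stand-in model – a toy, not LangChain’s actual API:

```python
def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call; echoes the product back in a "response".
    return f"Here is a company name for a maker of {prompt.split(': ')[-1]}."

def llm_chain(template: str, llm, **inputs) -> str:
    prompt = template.format(**inputs)   # step 1: fill the prompt template
    return llm(prompt)                   # step 2: call the LLM, return output

result = llm_chain("Suggest a name for a company that makes: {product}",
                   fake_llm, product="socks")
print(result)
```

The point of the abstraction is that step 1 and step 2 always run in the same order, which is exactly what makes chains predictable and easy to debug.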
Chains are great for situations where the task structure is known. They make the logic reusable and composable. In fact, you can nest chains within each other (a chain can call another chain as a subcomponent). LangChain comes with many built-in chains. For instance, a SequentialChain can pass outputs of one step as inputs to the next in sequence. A RouterChain can route an input to different sub-chains depending on some condition (often decided by an LLM). There are also specialized chains for tasks like question answering, summarization, etc.
One powerful feature is that chains can be stateful – you can attach memory to a chain to carry information across runs. We’ll discuss memory next, but keep in mind that any chain can be given a memory object to persist state between calls. For example, a chain that handles a conversation could store the conversation history so the LLM sees the context each time.
In summary, use chains to create deterministic sequences of LLM calls or data transformations. They ensure a predictable flow (e.g. always do step A, then B, then C) and are easier to debug compared to agents. However, they lack the flexibility of agents if the task requires decision-making about which tools or steps to take.
Memory
Memory in LangChain refers to persisting state between calls, most commonly to remember previous user inputs and model responses in a conversation. In a stateless setup, each LLM call knows nothing about what happened before. Memory gives your application context and continuity. It is essentially a system that “remembers” information about previous interactions so that the AI can refer back to earlier parts of a conversation or task.
The most common use of memory is in chatbots. For example, if a user asks, “Who won the 2022 World Cup?” and then follows up with “How about the runner-up?”, the second query is ambiguous on its own. With conversational memory, the system knows “the runner-up” refers to the 2022 World Cup context, and can answer accordingly. Short-term memory usually means we remember the conversation within a single session (often limited by the LLM’s context window). LangChain’s standard chat memory is typically a list of the recent message history.
LangChain provides several memory classes for different strategies. The simplest is ConversationBufferMemory, which just appends new messages to a buffer. There’s also ConversationBufferWindowMemory (which only keeps the last N interactions), summary memories that summarize older messages to avoid unlimited growth, and even entity-specific memory that tracks information about specific subjects mentioned. The choice depends on your use case and the LLM’s context limits.

Using memory in LangChain is straightforward. You create a memory instance and attach it to a chain or agent. For example, you can do:
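A minimal sketch, assuming the classic langchain (0.0.x) import paths and an OPENAI_API_KEY in the environment – newer LangChain versions have moved these modules:

```python
# Sketch: attach a buffer memory to a conversation chain.
# Assumes classic langchain (0.0.x) imports and an OPENAI_API_KEY env var.
from langchain.llms import OpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory()
chain = ConversationChain(llm=OpenAI(temperature=0), memory=memory)

chain.run("Who won the 2022 World Cup?")
# The buffer now holds the first exchange, so the follow-up is unambiguous:
chain.run("How about the runner-up?")
```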
Now the chain will automatically incorporate the conversation history each time it runs. Similarly, for an agent you can pass memory=ConversationBufferMemory() when initializing it (often using a conversational agent type). Under the hood, the memory will inject the chat history into the prompt (for chat models) or keep track of intermediate steps for agents.

Why is memory important? For multi-turn applications, it makes the AI’s responses more contextually relevant and avoids repetition. It also enables a more natural conversational experience, as the AI can refer to things the user said earlier. Just be mindful of the context window – too much memory can lead to very long prompts. In practice, developers often limit or summarize memory to keep it relevant and concise.
Tools
In LangChain, Tools are functions or utilities that an agent can use to interact with the outside world beyond the LLM’s natural language interface. Think of tools as the skills or APIs you equip your agent with. The agent can decide to invoke a tool if it needs to perform a specific operation. A tool is typically a wrapper around some functionality – it could be a web search API, a database lookup, a calculator, a file loader, or even another chain or model. In fact, tools can be very general: they might be generic utilities (Google search, database query, math), other chains, or even other agents.
Tools extend what the LLM can do by allowing it to take actions with structured inputs and outputs. For example, an LLM by itself cannot fetch real-time data from the internet. But if you provide a “Web Search” tool (a function that takes a query string and returns search results), the agent can choose to call that tool when faced with a question about current events. Similarly, a calculator tool enables the agent to accurately perform arithmetic (to avoid the LLM’s often unreliable math abilities).
Each tool in LangChain is defined by a name, a function (callable) that it executes, and usually a description for the agent. The description helps the LLM decide when a tool is relevant. For instance, you might define a tool like:
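A minimal sketch, assuming the classic langchain (0.0.x) imports; the vector_db it searches is the document index we build later in this tutorial, so treat the helper function here as an assumption:

```python
# Sketch: wrap a document-retrieval function as a LangChain Tool.
# Assumes classic langchain (0.0.x) imports and a vector_db built elsewhere.
from langchain.agents import Tool

def search_document(query: str) -> str:
    # Look up the chunks most similar to the query and return their text.
    docs = vector_db.as_retriever().get_relevant_documents(query)
    return "\n".join(d.page_content for d in docs[:2])

doc_search_tool = Tool(
    name="Document Search",
    func=search_document,
    description="Searches the uploaded document for relevant content. "
                "Use this to answer questions about the document.",
)
```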
Now if this tool is given to an agent, the agent’s prompt will include something like: Tool name: Document Search, Tool description: Searches the uploaded document for relevant content... With that, the LLM can output an action like:
Action: Document Search with some input when it decides it needs to use it.

LangChain has many built-in tools and also utilities to easily create tools. There’s even a decorator, @tool, to turn a normal Python function into a LangChain tool automatically. Tools can have schemas for their input (for example, using pydantic models) to enforce the format of arguments. When using modern chat models that support function calling (like OpenAI’s GPT-4 functions), LangChain can seamlessly convert tools into functions the model can call.

In practice, you will select a set of tools relevant to your agent’s task. For a coding assistant agent, you might provide tools like a Python REPL (to execute code) or a documentation search. For our purposes – a document Q&A bot – the primary tool might be something like a vector store retriever (to look up relevant document sections). Agents use tools to take actions, and this ability is what gives them the “eyes and hands” beyond just text generation. Without tools, an agent is basically just an LLM that can only reply with text. With tools, it can fetch data, run computations, or call external services as directed by its reasoning.
Building a Document Q&A App with LangChain and Streamlit
Now that we’ve covered the fundamentals, let’s put them into practice by building a small project: a Document Question-Answering web app. The idea is that a user can upload a text document and then ask questions about its content. We’ll use Streamlit for the UI (file upload and text input) and LangChain for the backend logic. This example will demonstrate using chains, memory, and even an agent + tool in a real scenario.
How it works: At a high level, the app will ingest the document, index it for retrieval, and then use an LLM to answer questions by referring to the document. The workflow is simple:
- User Input: The user uploads a document (e.g. a .txt file) and asks a question about it. They also provide an OpenAI API key (for the LLM) and hit "Submit."
- LangChain Processing: On submission, LangChain will first ingest the document: it splits the document into chunks, creates embeddings for those chunks, and stores them in a vector database (a vector store). Then, given the user’s question, it uses a Question-Answering chain to find relevant information from the document and generate an answer using the LLM.
Conceptual architecture of the Document Q&A app. The document is split into chunks, embedded into vectors, and stored in a vector database. At query time, a retriever finds relevant chunks and a QA chain (LLM) uses those to answer the user's question.
Let's break down the implementation steps with code.
Setup and Installation
First, we need to install the necessary packages. For this project, we'll use LangChain, OpenAI’s API, Chroma (an open-source vector store), and Streamlit. You can install these via pip:
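A single pip command covers everything (package names as of the classic LangChain releases – pin versions in a real project):

```shell
pip install langchain openai chromadb streamlit tiktoken
```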
This will install LangChain and its dependencies, the OpenAI SDK, Chroma for vector storage, Streamlit for the UI, and tiktoken (used by OpenAI models for tokenization). Make sure you have Python 3.7+ to satisfy LangChain’s requirements.

You’ll also need an OpenAI API key if you plan to use OpenAI’s language models. You can get one from OpenAI’s website and set it as an environment variable or pass it directly in code. In our Streamlit app, we’ll provide a text field for the API key for simplicity.
Document Ingestion (Loading, Splitting, and Indexing)
Ingestion is the process of preparing the document so that the LLM can easily access its content. Large documents can’t be fed entirely into the LLM’s prompt (due to context length limits), so instead we use a vector database to enable semantic search of the document.
We will do the following in code:
- Load the uploaded file’s content as text.
- Split the text into smaller chunks (so that each chunk is reasonably sized for the LLM to read, e.g. 500-1000 characters).
- Compute embeddings for each chunk using an embedding model (we’ll use OpenAI’s text embeddings via LangChain).
- Store these embeddings in a vector store (Chroma in this case), which will allow us to retrieve relevant chunks by similarity to a question.
LangChain makes these steps straightforward with its utilities. Let’s see a code snippet that handles ingestion inside a function:
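A sketch of such a function, assuming the classic langchain (0.0.x) import paths and an OpenAI key supplied by the caller:

```python
# Sketch: split a document, embed the chunks, and index them in Chroma.
# Assumes classic langchain (0.0.x) imports; embedding calls the OpenAI API.
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

def ingest_document(text: str, api_key: str):
    # 1. Split the raw text into chunks of up to 1000 characters, no overlap.
    splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    chunks = splitter.create_documents([text])
    # 2. Embed each chunk and store the vectors in a local Chroma database.
    embeddings = OpenAIEmbeddings(openai_api_key=api_key)
    vector_db = Chroma.from_documents(chunks, embeddings)
    return vector_db
```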
A few notes on this code: we use CharacterTextSplitter to break the text into pieces of up to 1000 characters with no overlap. You could also split by sentences or sections depending on the data; LangChain provides different splitters. We then create an OpenAIEmbeddings instance (which will call the OpenAI API to get vector representations for text). Finally, Chroma.from_documents takes the list of documents (chunks) and the embeddings and creates a persistent vector store database. Under the hood, each chunk’s text is converted to a vector and stored in the Chroma DB file (by default, in a local directory).

After this ingestion, we have a vector_db object which we can query. We won’t call the LLM on all the text chunks for every question – that would be expensive and slow. Instead, we use the vector store to retrieve only the most relevant chunks for a given query.

Building a QA Chain to Answer Questions
With the document indexed, the core of our app is a Question-Answering chain that takes a user question, finds relevant document chunks, and feeds them along with the question to the LLM to get an answer. LangChain actually provides a ready-made chain for this: RetrievalQA. This chain connects an LLM with a retriever (which fetches relevant text) and optionally formats the prompt appropriately.

We can create a RetrievalQA chain by supplying:
- an LLM (we’ll use OpenAI from LangChain, which is a wrapper for OpenAI’s chat or completion models),
- a retriever (which we get from the vector store we made),
- and choosing a chain type (the default 'stuff' method is to just stuff all retrieved chunks into the prompt – fine for our example).
Here’s how we put it together:
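A sketch, assuming classic langchain (0.0.x) imports and the vector_db and api_key from the ingestion step:

```python
# Sketch: wire the retriever and the LLM into a RetrievalQA chain.
# Assumes classic langchain (0.0.x) imports; vector_db/api_key built earlier.
from langchain.llms import OpenAI
from langchain.chains import RetrievalQA

retriever = vector_db.as_retriever(search_kwargs={"k": 4})  # top 4 chunks
llm = OpenAI(temperature=0, openai_api_key=api_key)         # deterministic

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",   # 'stuff' all retrieved chunks into one prompt
    retriever=retriever,
)

answer = qa_chain.run("What did the president say about the economy?")
```

The 'stuff' chain type is the simplest choice and works well as long as the retrieved chunks comfortably fit in the model’s context window.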
A couple of details: we used vector_db.as_retriever() to get a retriever interface. We also passed search_kwargs={"k": 4} to retrieve the top 4 most similar chunks for the question (you can adjust this). The OpenAI LLM is initialized with temperature=0 for deterministic output; you could also use ChatOpenAI for a chat model like gpt-3.5-turbo. Finally, RetrievalQA.from_chain_type sets up the chain. Under the hood, this will create an LLMChain that formats a prompt with the question and retrieved texts (the 'stuff' chain type simply appends them), then calls the LLM and returns the answer.

Essentially, this chain takes care of both retrieval and generation. According to the LangChain documentation, RetrievalQA is a specialized chain that accepts an LLM, a retriever, and optionally a prompt or chain type; on .run(query) it will query the retriever for relevant documents and feed them into the LLM to get an answer. This abstracts a lot of boilerplate: we don’t have to manually concatenate context or parse outputs.

Adding Memory for Chat History (Multi-turn Q&A)
Our current setup will answer one question at a time. What if we want the user to have a conversation about the document, asking follow-up questions that refer to earlier ones? For that, we can add memory to retain the chat history.
To enable conversational memory, we can use a ConversationalRetrievalChain (a chain designed for chat with memory + retrieval), or manage memory and retrieval separately. One simple approach: use a standard ConversationBufferMemory to keep the dialogue and incorporate it into the prompt.

LangChain’s RetrievalQA chain does not natively include memory (since it’s one-shot QA). However, we can create a custom chain or agent for chat. As an alternative, we might use an agent that can handle the conversation. Let’s explore how an agent could work for Q&A, which will also demonstrate tool usage.

Using an Agent with Tools for Dynamic Q&A
Instead of using the fixed RetrievalQA chain, we could treat our task as an agent-based workflow. For example, we can create a tool for document retrieval and give it to a conversational agent. The agent (powered by the LLM) will decide when to use the tool to lookup info in the document and when to answer directly. This approach is more flexible – the agent could potentially use multiple tools or handle off-topic questions by replying “I don’t know,” etc., without a rigid script.
We will set up:
- A Document Search tool that, given a question, uses the vector store to find relevant text.
- A Conversational Agent (with memory) that has access to this tool.
Using LangChain’s utilities, it looks like this:
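A sketch, assuming the classic langchain (0.0.x) imports and the vector_db and api_key from the ingestion step:

```python
# Sketch: a conversational agent with a document-search tool and memory.
# Assumes classic langchain (0.0.x) imports; vector_db/api_key built earlier.
from langchain.agents import Tool, AgentType, initialize_agent
from langchain.chat_models import ChatOpenAI
from langchain.memory import ConversationBufferMemory

def search_document(query: str) -> str:
    docs = vector_db.search(query, search_type="similarity")
    return "\n".join(d.page_content for d in docs[:2])  # join top 2 results

tools = [
    Tool(
        name="Document Search",
        func=search_document,
        description="Searches the uploaded document for relevant content.",
    )
]

memory = ConversationBufferMemory(memory_key="chat_history",
                                  return_messages=True)

agent = initialize_agent(
    tools=tools,
    llm=ChatOpenAI(temperature=0, openai_api_key=api_key),
    agent=AgentType.CONVERSATIONAL_REACT_DESCRIPTION,
    memory=memory,
    verbose=True,  # print the agent's reasoning steps to the console
)
```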
In this code, we create a Tool named "Document Search". For simplicity, the tool’s function directly calls vector_db.search() for the query (assuming our vector DB has a search method similar to the retriever; we might also do vector_db.as_retriever().get_relevant_documents(q)). The tool returns a snippet of text (here we join the top 2 results).

We then create a ConversationBufferMemory to store the conversation. We set memory_key="chat_history", which is the default key the agent’s prompt expects for previous messages, and return_messages=True to keep them as message objects (this is often needed for chat models).

Finally, we use initialize_agent to create the agent. We pass in the list of tools, the LLM, specify the agent type as CONVERSATIONAL_REACT_DESCRIPTION (a pre-built agent type good for conversational use cases with the ReAct framework), provide the memory, and set verbose=True to see the reasoning steps in the console (useful for debugging). This agent will now follow the ReAct loop: it will maintain the chat_history from the memory, and on each user query it can decide either to use the Document Search tool (Action) or to respond directly with an answer.

When the agent is run on a question, you’d call agent.run("my question"). The agent will produce thoughts like: "Do I need to use a tool? Yes, I should use Document Search." Then it will produce an observation (the tool’s output), and finally an answer. All previous Q&A turns are stored in chat_history, so if the user asks a follow-up question, the agent’s prompt will include that history. This allows it to handle context like “Tell me more about that” referring to something from prior answers.

Using an agent in this way might be overkill if you only ever need one tool (retrieval) and you know every question should use it. The RetrievalQA chain is simpler for those cases. However, if you want your app to be more general-purpose – say, sometimes answering from the document, but also able to do other tasks (like a calculator tool for math questions or a web search for out-of-scope questions) – then an agent with multiple tools is the way to go. Agents combine the strengths of tools and memory, making the system more robust for dynamic queries.

Integrating Everything in the Streamlit App
With the backend components ready, hooking them up to Streamlit is straightforward. We will:
- Create a file uploader for the document,
- A text input for the user’s question,
- A text input for the OpenAI API key (so the user can supply it securely at runtime),
- On form submission, call our backend functions (ingest_document once for the uploaded file, then either answer_question via the chain, or the agent, to get the answer),
- Display the answer on the page.
Here’s how the Streamlit script might look in a simplified form:
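A minimal sketch; the ingest_document(text, api_key) and answer_question(vector_db, question, api_key) helpers are assumed to wrap the ingestion and QA-chain logic described earlier, and their exact signatures are up to you:

```python
# Sketch of the Streamlit front end; backend helpers assumed from earlier.
import streamlit as st

st.title("Document Q&A with LangChain")

api_key = st.text_input("OpenAI API key", type="password")
uploaded = st.file_uploader("Upload a document", type="txt")
question = st.text_input("Ask a question about the document")

if uploaded and api_key and "vector_db" not in st.session_state:
    text = uploaded.read().decode("utf-8")
    # Ingest once and cache, so we don't redo embeddings every interaction.
    st.session_state.vector_db = ingest_document(text, api_key)

if st.button("Submit") and question and "vector_db" in st.session_state:
    answer = answer_question(st.session_state.vector_db, question, api_key)
    st.info(answer)
```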
This pseudo-code illustrates the flow. We use st.file_uploader to get the file and immediately ingest it (we store the vector_db in st.session_state so we don’t redo embeddings on every interaction). Then on clicking the Submit button, we either call the answer_question chain or agent.run to get an answer. The result is displayed with st.write or st.info. In a real app, you might also show the conversation history or sources, but we’ll keep it simple here.

And that’s it! With remarkably few lines of code, we have a working app that can answer questions about a custom document. For deployment, Streamlit makes it easy: you can share the app by deploying it to the Streamlit Cloud or any server, following their instructions. When the app runs, behind the scenes it will perform all these LangChain steps whenever the user asks a question.
Running the App and Example
Let’s consider an example to ensure everything is clear. Suppose we upload a text file containing a transcript of a speech, and we ask: “What did the president say about the economy?” Once we click Submit:
- The app calls ingest_document if not done already (splitting the speech, embedding chunks, storing in the vector DB).
- Then answer_question will use the vector store’s retriever to find parts of the speech that mention “economy”.
- Those relevant chunks are passed to the OpenAI model (say GPT-3.5) with a prompt like: "Given the following excerpts from the document, answer the question: 'What did the president say about the economy?' ...".
- The LLM returns an answer, for example: “The president said that the economy is improving and that unemployment has fallen to record lows.”
- This answer is displayed in the Streamlit app for the user.
If the user then asks a follow-up, like “Did he mention any specific figures?”, the chain/agent will consider the conversation history (via memory) and likely search the document again for numbers related to the economy in that speech, then answer accordingly. The memory ensures the context of “he” refers to the president and “the economy” is the topic in question, even if not explicitly restated.
Throughout this, LangChain handles a lot: embedding the doc, the similarity search, constructing the LLM prompt, and (if using the agent) the entire reasoning process with the tool. The developer can focus on hooking these components together without dealing with low-level API details for embedding or vector math.
Conclusion
In this blog post, we learned about the core building blocks of LangChain – agents, chains, memory, and tools – and how they can be used to build intelligent LLM applications. To recap:
- Agents use an LLM to make decisions about which actions to take, enabling dynamic behavior by leveraging tools.
- Chains are sequences of operations that orchestrate LLM calls and other steps in a fixed manner.
- Memory lets the system remember previous interactions, which is crucial for conversational apps to maintain context.
- Tools extend an LLM’s capabilities by allowing it to perform specific tasks like searches or calculations.
We demonstrated these concepts with a document Q&A app. We used a RetrievalQA chain to combine document retrieval with LLM question-answering, and showed how to add memory for context and even wrap the logic in an agent with a tool for more flexibility. The result is an application where a user can upload their own data and have a dialogue about it – a common pattern known as retrieval-augmented generation (RAG).
If you’re interested in digging deeper, the LangChain documentation on question answering over documents is a great resource. LangChain’s official docs also cover many advanced topics and integrations – from connecting to databases, to using chat models with function calling, to evaluating results. As you build with LangChain, remember that its strength lies in composing these pieces: a complex app might use multiple chains and agents together.
Next steps: Try extending the example! For instance, you could add a second tool to the agent (maybe a calculator for any numerical questions in the doc), or use a different vector store or embedding model. You could also experiment with memory types – perhaps use a window memory to limit how far back the agent remembers, or a summary memory to handle very long conversations. The LangChain framework is quite extensive, so use the official docs and examples as a guide.
We hope this tutorial gives you a solid starting point for using LangChain. With these basics – agents that can act, chains that structure flows, memory for context, and tools for actions – you can build far more advanced LLM applications than a simple one-prompt-one-response script. Happy coding, and may your AI apps be ever chain-tastic!


