By: Amir Tadrisi

Published on: 6/3/2025

Last updated on: 6/3/2025

Boost AI Precision with Context Construction

Context is the core of guiding the language model to the right answer. Context is whatever the model sees and knows before it answers our question or executes a requested task. It is different from the model's weights (pre-trained knowledge); it is what we provide to the model at runtime to guide its output.

We construct context for three main reasons:

Fill Knowledge Gaps: A model's pre-trained data is static and only current to its training cutoff. If you need it to know about events or proprietary details that came afterward (for example, your latest product manual), you must feed those facts in as context.

Steer Output via Prompt Engineering: Context is one of the components of prompt engineering that steer the model's output. It doesn't need to be a giant document, book, or slice of the internet; it can be as simple as a few well-chosen, precise bullet points that explain something to the model.

Prevent hallucination: When humans lack information, we guess—and sometimes err. Models do the same. Providing factual snippets and clear instructions keeps the model grounded, minimizing the risk of invented or misleading statements.

The two main ways to construct context are retrieval-augmented generation (RAG) and agents.

Retrieval-augmented generation

In this method we retrieve relevant information from external memory sources such as documents, knowledge bases, internal databases, books, or even the user's chat session.

Why RAG Matters

Token Efficiency: A language model's context window is limited to a specific number of tokens. Instead of stuffing the prompt with every possible fact and bloating both the prompt and the bill, RAG supplies only the knowledge related to the question.

Query-Specific Context: We no longer have one static context for all queries; the context is built at runtime for the specific query the user asked. This improves the model's accuracy and precision.

Retrieval Methods

Term-Based Retrieval

In this method we convert our original external knowledge into chunks of documents. When the user sends a query, we use its keywords to find the documents where those keywords carry the most weight; this is the same family of methods (TF-IDF/BM25) that Elasticsearch uses. There are important points to pay attention to:

  • If a word appears across many documents, it carries less importance and is less informative, for example "a", "the", "at"
  • To find the important words, we look for words that are frequent within a document but rare across the corpus; this trade-off is exactly what TF-IDF captures (see the sketch below)
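
As a minimal sketch of this weighting idea, here is a hand-rolled TF-IDF scorer over a made-up corpus (the documents, query, and function names are all illustrative):

```python
import math
from collections import Counter

# Toy corpus: each "document" is a short text chunk.
docs = [
    "reset your password from the account settings page",
    "the billing page shows the latest invoice",
    "contact support to reset a forgotten password",
]
tokenized = [d.split() for d in docs]

def tf_idf_score(query, doc_tokens, all_docs):
    """Score one document against the query: sum of tf * idf over query terms."""
    n_docs = len(all_docs)
    counts = Counter(doc_tokens)
    score = 0.0
    for term in query.split():
        tf = counts[term] / len(doc_tokens)         # frequent in THIS doc -> higher weight
        df = sum(1 for d in all_docs if term in d)  # frequent across ALL docs -> lower weight
        idf = math.log((n_docs + 1) / (df + 1)) + 1
        score += tf * idf
    return score

query = "reset password"
best = max(tokenized, key=lambda d: tf_idf_score(query, d, tokenized))
print(" ".join(best))  # -> "contact support to reset a forgotten password"
```

Notice that common words like "the" occur in most documents, so their idf stays low and they barely affect the ranking.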

Embedding Based Retrieval

In this method we use semantic similarity to find relevant documents.

Embedding Based Retrieval Components

RAG Core Components

Query

What it is: The user’s input or task description (“How do I reset my password?”).

Role: Drives retrieval—defines what information the system needs to fetch.

External Memory

What it is: Your knowledge sources—documents, wikis, product manuals, prior chat logs, databases, etc.

Role: The raw material from which relevant snippets are drawn.

Embedding Model

What it is: A neural encoder that turns text (queries or documents) into fixed-length vector representations.

Role: Maps semantically similar text into nearby points in vector space—so “password reset” and “change password” embeddings sit close together.
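
As a quick sketch of that closeness, here is cosine similarity over embeddings from the sentence-transformers library (the all-MiniLM-L6-v2 model is just one reasonable choice, not a requirement):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose encoder

def cosine(a, b):
    # 1.0 means same direction in vector space; near 0 means unrelated
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

vecs = model.encode(["password reset", "change password", "chocolate cake recipe"])
print(cosine(vecs[0], vecs[1]))  # high: semantically similar phrases
print(cosine(vecs[0], vecs[2]))  # low: unrelated topics
```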

Vector Database

What it is: A specialized store (e.g., Pinecone, Weaviate, FAISS) for massive collections of vectors.

Role: Efficiently stores, indexes and searches millions of embeddings to find the top-K closest matches to your query vector.
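
As a sketch, here is that store-and-search step with FAISS (one of the options listed above), using random vectors as stand-ins for real document embeddings:

```python
import faiss
import numpy as np

dim = 384                       # must match your embedding model's output size
index = faiss.IndexFlatL2(dim)  # exact search; large deployments often use approximate indexes

doc_vectors = np.random.random((1000, dim)).astype("float32")  # stand-in document embeddings
index.add(doc_vectors)

query_vector = np.random.random((1, dim)).astype("float32")
distances, ids = index.search(query_vector, 5)  # top-5 closest matches
print(ids[0])  # row IDs of the most similar document embeddings
```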

Retriever

What it is: The component that orchestrates vector look-up and document fetch.

Roles:

1) Sends the query embedding to the vector DB

2) Retrieves the IDs of the most similar document embeddings

3) Loads the corresponding text snippets from your external memory
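
Here is a sketch of those three steps wired together; embed_fn, vector_db, and snippet_store are hypothetical stand-ins for your embedding model, vector database client, and external memory:

```python
def retrieve(query, embed_fn, vector_db, snippet_store, k=3):
    """Hypothetical retriever: query text in, relevant text snippets out."""
    query_vec = embed_fn(query)                 # 1) embed the query, send it to the vector DB
    doc_ids = vector_db.search(query_vec, k)    # 2) IDs of the k most similar embeddings
    return [snippet_store[i] for i in doc_ids]  # 3) load the matching snippets from memory
```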

Language Model (LM)

What it is: The generative core (e.g., GPT-4, LLaMA) that produces the final answer.

Roles:

1) Receives the user’s original query plus the retrieved snippets as “context”

2) Generates a response that’s grounded in those snippets—minimizing hallucinations and staying on point
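
A sketch of that hand-off step; the prompt template and the example snippets are illustrative, and the resulting prompt goes to whichever LM client you use:

```python
def build_prompt(query, snippets):
    """Combine the user's query with retrieved snippets into one grounded prompt."""
    context = "\n\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    return (
        "Answer the question using only the context below. "
        "If the context is not enough, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

# Example with made-up snippets (in practice they come from the retriever):
prompt = build_prompt(
    "How do I reset my password?",
    ["Go to Settings > Account > Reset Password.", "Reset links expire after 24 hours."],
)
```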

Indexing

This is a prerequisite for the querying process. To make our external knowledge queryable, we break it into smaller units (paragraphs, sentences, or a specific number of words or tokens) and pass each unit to the embedding model to transform it into a vector. Next, we save this vector representation in our vector database to make it searchable for future queries. The retriever is in charge of indexing as well as querying.
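
A sketch of that indexing loop, assuming a hypothetical embed_fn and using FAISS as the vector store:

```python
import faiss
import numpy as np

def index_documents(texts, embed_fn, chunk_size=100, dim=384):
    """Chunk external knowledge, embed each chunk, and store the vectors for search."""
    chunks = []
    for text in texts:
        words = text.split()
        # Break each document into fixed-size word chunks
        chunks += [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

    vectors = np.array([embed_fn(c) for c in chunks], dtype="float32")
    index = faiss.IndexFlatL2(dim)
    index.add(vectors)    # vector i corresponds to chunks[i]
    return index, chunks  # the retriever needs both at query time
```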

Agents

To construct context for our models we can take advantage of agents. An agent is a system that can read our request, plan which steps to take, call the right tools, keep track of what it has learned, and finally hand off a concise prompt to the language model (LM) for a polished answer. In more technical terms, agents are orchestration systems that wire up language models with a set of tools so they can take actions and gain knowledge about topics that are not in the model's weights.

Let's say we want to know: "What's the latest price of Bitcoin in USD, and can you summarize any major recent news?" In this case we need a way to:

  1. Call an API to get the latest price of Bitcoin, for example the Binance API
  2. Save the result in memory
  3. Search the internet and find several news articles about Bitcoin (gathering title, description, URL)
  4. Scrape those URLs to gather their content
  5. Summarize the content
  6. Save the result in memory
  7. Construct the context using the price and the summarized news
  8. Pass it to the LM to polish an output for the user

AI Agents

Here is the workflow for our example:

  1. Planner reads the user request and decides on actions in order: CALL price_fetcher, CALL web_search, CALL web_scraper, CALL summarizer on each scraped article, FINALIZE answer via LM
  2. Executor & Memory: the executor runs price_fetcher → Memory["price_usd"], runs web_search → Memory["search_results"], runs web_scraper on each result, then runs summarizer on the raw text and stores the values in memory
  3. Assemble the final prompt. System prompt (fixed): "You are a succinct financial news assistant." Current Bitcoin price: $117,210. News summaries: 1. "Bitcoin surpassed $117k after institutional inflows…" 2. "El Salvador announces new Bitcoin bond issuance…" 3. "Major exchange suffered a brief outage amid heavy trading…" User prompt (final): "Using the above, write a 4-sentence market update."
  4. LM generates the answer: "As of today, Bitcoin trades at $117,210 USD. It rallied past $117k thanks to fresh institutional investments and positive on-chain indicators. El Salvador unveiled plans for a Bitcoin-backed bond, boosting market optimism. Meanwhile, a brief outage at one of the major exchanges caused only a minor hiccup in trading volumes."
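
As a compressed sketch of that loop, with every tool (fetch_price, search_news, scrape, summarize) standing in for a real API call:

```python
def run_agent(user_request, tools, lm):
    """Hypothetical agent: run the planned tool calls, store results, hand off to the LM."""
    memory = {}

    # Planned order for the Bitcoin example
    memory["price_usd"] = tools["fetch_price"]("BTCUSDT")  # e.g., an exchange API
    results = tools["search_news"]("Bitcoin")              # titles, descriptions, URLs
    articles = [tools["scrape"](r["url"]) for r in results]
    memory["summaries"] = [tools["summarize"](a) for a in articles]

    # Construct the final context from memory and pass it to the LM
    context = (
        f"Current Bitcoin price: ${memory['price_usd']}\n"
        "News summaries:\n" + "\n".join(f"- {s}" for s in memory["summaries"])
    )
    return lm(f"{context}\n\n{user_request}")
```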


Related Blogs

Looking to learn more about AI, LMs, context construction, RAG, agents, and automation? These related blog articles explore complementary topics, techniques, and strategies that can help you master Boost AI Precision with Context Construction: RAG & Agents.

Is AI Timeless?

Explore AI fundamentals and the Lindy effect to uncover why artificial intelligence feels new yet timeless—and why its influence keeps growing.


Prompt Engineering in AI: A Beginner Guide to LLM Prompts

In this guide, we’ll walk through the building blocks of powerful prompts, explore advanced techniques like Chain-of-Thought (CoT), delve into context-construction strategies, and show you how to monitor performance over time.