By: Amir Tadrisi

Published on: 6/3/2025

Last updated on: 6/3/2025

Boost AI Precision with Context Construction

Context is the core of guiding the language model to the right answer. Context is whatever the model sees and knows before it answers our question or executes a requested task. It is different from the model's weights (pre-trained knowledge); it is what we provide to the model at runtime to guide its output.

We construct context for three main reasons:

Fill Knowledge Gaps: A model's pre-trained data is static and only current to its training cutoff. If you need it to know about events or proprietary details that came afterward (for example, your latest product manual), you must feed those facts in as context.

Steer Output via Prompt Engineering: Context is one of the components of prompt engineering that steer the model's output. It doesn't need to be a giant document, book, or slice of the internet; it can be as simple as a few well-chosen, precise bullet points that explain something to the model.

Prevent hallucination: When humans lack information, we guess—and sometimes err. Models do the same. Providing factual snippets and clear instructions keeps the model grounded, minimizing the risk of invented or misleading statements.

The two main ways to construct context are retrieval-augmented generation (RAG) and agents.

Retrieval-augmented generation

In this method we retrieve relevant information from external memory sources such as documents, knowledge bases, internal databases, books, or even the user's chat session.

Why RAG Matters

Token Efficiency: A language model's context window is limited to a specific number of tokens. Instead of stuffing the prompt with every possible fact and bloating both the prompt and the bill, RAG supplies only the knowledge related to the question.

Query-Specific Context: We no longer have one static context for all queries; the context is built at runtime for the specific query the user asked. This improves the model's accuracy and precision.

Retrieval Methods

Term-Based Retrieval

In this method we convert our original external knowledge into chunks of documents. When the user sends a query, we use its keywords to find the documents where those keywords carry the most weight; this is the same family of methods (TF-IDF/BM25) that Elasticsearch uses. There are important points to pay attention to:

  • If a word appears across many documents, it carries less importance and is less informative, for example "a", "the", "at"
  • To find the important words, we look for words that are frequent within a document but rare across the corpus; this trade-off is exactly what TF-IDF captures (see the sketch below)
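
As a minimal sketch of this weighting idea, here is a hand-rolled TF-IDF scorer over a made-up corpus (the documents, query, and function names are all illustrative):

```python
import math
from collections import Counter

# Toy corpus: each "document" is a short text chunk.
docs = [
    "reset your password from the account settings page",
    "the billing page shows the latest invoice",
    "contact support to reset a forgotten password",
]
tokenized = [d.split() for d in docs]

def tf_idf_score(query, doc_tokens, all_docs):
    """Score one document against the query: sum of tf * idf over query terms."""
    n_docs = len(all_docs)
    counts = Counter(doc_tokens)
    score = 0.0
    for term in query.split():
        tf = counts[term] / len(doc_tokens)         # frequent in THIS doc -> higher weight
        df = sum(1 for d in all_docs if term in d)  # frequent across ALL docs -> lower weight
        idf = math.log((n_docs + 1) / (df + 1)) + 1
        score += tf * idf
    return score

query = "reset password"
best = max(tokenized, key=lambda d: tf_idf_score(query, d, tokenized))
print(" ".join(best))  # -> "contact support to reset a forgotten password"
```

Notice that common words like "the" occur in most documents, so their idf stays low and they barely affect the ranking.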

Embedding Based Retrieval

In this method we use semantic similarity to find relevant documents.

Embedding Based Retrieval Components

RAG Core Components

Query

What it is: The user’s input or task description (“How do I reset my password?”).

Role: Drives retrieval—defines what information the system needs to fetch.

External Memory

What it is: Your knowledge sources—documents, wikis, product manuals, prior chat logs, databases, etc.

Role: The raw material from which relevant snippets are drawn.

Embedding Model

What it is: A neural encoder that turns text (queries or documents) into fixed-length vector representations.

Role: Maps semantically similar text into nearby points in vector space—so “password reset” and “change password” embeddings sit close together.
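
As a quick sketch of that closeness, here is cosine similarity over embeddings from the sentence-transformers library (the all-MiniLM-L6-v2 model is just one reasonable choice, not a requirement):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose encoder

def cosine(a, b):
    # 1.0 means same direction in vector space; near 0 means unrelated
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

vecs = model.encode(["password reset", "change password", "chocolate cake recipe"])
print(cosine(vecs[0], vecs[1]))  # high: semantically similar phrases
print(cosine(vecs[0], vecs[2]))  # low: unrelated topics
```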

Vector Database

What it is: A specialized store (e.g., Pinecone, Weaviate, FAISS) for massive collections of vectors.

Role: Efficiently stores, indexes and searches millions of embeddings to find the top-K closest matches to your query vector.
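
As a sketch, here is that store-and-search step with FAISS (one of the options listed above), using random vectors as stand-ins for real document embeddings:

```python
import faiss
import numpy as np

dim = 384                       # must match your embedding model's output size
index = faiss.IndexFlatL2(dim)  # exact search; large deployments often use approximate indexes

doc_vectors = np.random.random((1000, dim)).astype("float32")  # stand-in document embeddings
index.add(doc_vectors)

query_vector = np.random.random((1, dim)).astype("float32")
distances, ids = index.search(query_vector, 5)  # top-5 closest matches
print(ids[0])  # row IDs of the most similar document embeddings
```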

Retriever

What it is: The component that orchestrates vector look-up and document fetch.

Roles:

1) Sends the query embedding to the vector DB

2) Retrieves the IDs of the most similar document embeddings

3) Loads the corresponding text snippets from your external memory
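
Here is a sketch of those three steps wired together; embed_fn, vector_db, and snippet_store are hypothetical stand-ins for your embedding model, vector database client, and external memory:

```python
def retrieve(query, embed_fn, vector_db, snippet_store, k=3):
    """Hypothetical retriever: query text in, relevant text snippets out."""
    query_vec = embed_fn(query)                 # 1) embed the query, send it to the vector DB
    doc_ids = vector_db.search(query_vec, k)    # 2) IDs of the k most similar embeddings
    return [snippet_store[i] for i in doc_ids]  # 3) load the matching snippets from memory
```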

Language Model (LM)

What it is: The generative core (e.g., GPT-4, LLaMA) that produces the final answer.

Roles:

1) Receives the user’s original query plus the retrieved snippets as “context”

2) Generates a response that’s grounded in those snippets—minimizing hallucinations and staying on point
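
A sketch of that hand-off step; the prompt template and the example snippets are illustrative, and the resulting prompt goes to whichever LM client you use:

```python
def build_prompt(query, snippets):
    """Combine the user's query with retrieved snippets into one grounded prompt."""
    context = "\n\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    return (
        "Answer the question using only the context below. "
        "If the context is not enough, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

# Example with made-up snippets (in practice they come from the retriever):
prompt = build_prompt(
    "How do I reset my password?",
    ["Go to Settings > Account > Reset Password.", "Reset links expire after 24 hours."],
)
```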

Indexing

This is a prerequisite for the querying process. To make our external knowledge queryable, we break it into smaller units (paragraphs, sentences, or a specific number of words or tokens) and pass each unit to the embedding model to transform it into a vector. Next, we save this vector representation in our vector database to make it searchable for future queries. The retriever is in charge of indexing as well as querying.
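
A sketch of that indexing loop, assuming a hypothetical embed_fn and using FAISS as the vector store:

```python
import faiss
import numpy as np

def index_documents(texts, embed_fn, chunk_size=100, dim=384):
    """Chunk external knowledge, embed each chunk, and store the vectors for search."""
    chunks = []
    for text in texts:
        words = text.split()
        # Break each document into fixed-size word chunks
        chunks += [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

    vectors = np.array([embed_fn(c) for c in chunks], dtype="float32")
    index = faiss.IndexFlatL2(dim)
    index.add(vectors)    # vector i corresponds to chunks[i]
    return index, chunks  # the retriever needs both at query time
```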

Agents

To construct context for our models we can take advantage of agents. An agent is a system that can read our request, plan which steps to take, call the right tools, keep track of what it has learned, and finally hand off a concise prompt to the language model (LM) for a polished answer. In more technical terms, agents are orchestration systems that wire up language models with a set of tools so they can take actions and gain knowledge about topics that are not in the model's weights.

Let's say we want to know: "What's the latest price of Bitcoin in USD, and can you summarize any major recent news?" In this case we need a way to:

  1. Call an API to get the latest price of Bitcoin, for example the Binance API
  2. Save the result in memory
  3. Search the internet and find several news articles about Bitcoin (gathering title, description, URL)
  4. Scrape those URLs to gather their content
  5. Summarize the content
  6. Save the result in memory
  7. Construct the context using the price and the summarized news
  8. Pass it to the LM to polish an output for the user

AI Agents

Here is the workflow for our example:

  1. Planner reads the user request and decides on actions in order: CALL price_fetcher, CALL web_search, CALL web_scraper, CALL summarizer on each scraped article, FINALIZE answer via LM
  2. Executor & Memory: the executor runs price_fetcher → Memory["price_usd"], runs web_search → Memory["search_results"], runs web_scraper on each result, then runs summarizer on the raw text and stores the values in memory
  3. Assemble the final prompt. System prompt (fixed): "You are a succinct financial news assistant." Current Bitcoin price: $117,210. News summaries: 1. "Bitcoin surpassed $117k after institutional inflows…" 2. "El Salvador announces new Bitcoin bond issuance…" 3. "Major exchange suffered a brief outage amid heavy trading…" User prompt (final): "Using the above, write a 4-sentence market update."
  4. LM generates the answer: "As of today, Bitcoin trades at $117,210 USD. It rallied past $117k thanks to fresh institutional investments and positive on-chain indicators. El Salvador unveiled plans for a Bitcoin-backed bond, boosting market optimism. Meanwhile, a brief outage at one of the major exchanges caused only a minor hiccup in trading volumes."
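
As a compressed sketch of that loop, with every tool (fetch_price, search_news, scrape, summarize) standing in for a real API call:

```python
def run_agent(user_request, tools, lm):
    """Hypothetical agent: run the planned tool calls, store results, hand off to the LM."""
    memory = {}

    # Planned order for the Bitcoin example
    memory["price_usd"] = tools["fetch_price"]("BTCUSDT")  # e.g., an exchange API
    results = tools["search_news"]("Bitcoin")              # titles, descriptions, URLs
    articles = [tools["scrape"](r["url"]) for r in results]
    memory["summaries"] = [tools["summarize"](a) for a in articles]

    # Construct the final context from memory and pass it to the LM
    context = (
        f"Current Bitcoin price: ${memory['price_usd']}\n"
        "News summaries:\n" + "\n".join(f"- {s}" for s in memory["summaries"])
    )
    return lm(f"{context}\n\n{user_request}")
```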


Related Blogs

Looking to learn more about AI, LMs, context construction, RAG, agents, and automation? These related blog articles explore complementary topics, techniques, and strategies that can help you master Boost AI Precision with Context Construction: RAG & Agents.

Is AI Timeless?

Explore AI fundamentals and the Lindy effect to uncover why artificial intelligence feels new yet timeless—and why its influence keeps growing.


Prompt Engineering in AI: A Beginner Guide to LLM Prompts

In this guide, we’ll walk through the building blocks of powerful prompts, explore advanced techniques like Chain-of-Thought (CoT), delve into context-construction strategies, and show you how to monitor performance over time.