Retrieval-Augmented Generation (RAG) Architecture

Definition

A technical architecture that enhances large language model outputs by retrieving relevant information from an external knowledge base before generating a response, grounding the model's output in verified, up-to-date, and domain-specific data. RAG reduces hallucination risk, gives LLMs access to proprietary or recent information absent from their training data, and supports citation of sources. RAG architectures are a key component of enterprise AI deployments, deriving significant value by combining proprietary knowledge bases with general-purpose language models.
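
The core flow is: embed the query, retrieve the most similar chunks from the knowledge base, and pass them to the model as grounding context. The sketch below illustrates this loop; the bag-of-words embedding and the call_llm function are illustrative stand-ins, not any specific vendor's API.

```python
from collections import Counter
import math

# Toy knowledge base: in production this would be a vector database
# of embedded document chunks.
KNOWLEDGE_BASE = [
    "RAG retrieves documents from an external knowledge base before generation.",
    "Grounding model outputs in retrieved sources reduces hallucination risk.",
    "Zero trust requires continuous verification of every access request.",
]

def embed(text: str) -> Counter:
    # Stand-in embedding: bag-of-words term counts. Real systems use
    # dense embeddings from a trained encoder model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, k: int = 2) -> list[str]:
    # Rank knowledge-base chunks by similarity to the query.
    q = embed(query)
    ranked = sorted(KNOWLEDGE_BASE, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def call_llm(prompt: str) -> str:
    # Placeholder for a real model invocation (hypothetical).
    return f"[model response grounded in retrieved context]\n{prompt}"

def answer(query: str) -> str:
    # Retrieve first, then generate with the retrieved context in the prompt.
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)

print(answer("How does RAG reduce hallucinations?"))
```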

Complementary Terms

Concepts that frequently appear alongside Retrieval-Augmented Generation (RAG) Architecture in practice.

Retrieval-Augmented Generation (RAG)

An AI architecture that combines a large language model with an external knowledge retrieval system, enabling the model to ground its responses in verified, up-to-date information rather than relying solely on its training data. RAG reduces hallucination risk, improves factual accuracy, and allows organisations to deploy AI systems that reference proprietary knowledge bases without retraining the underlying model.

Zero Trust Architecture

A cybersecurity framework based on the principle that no user, device, or system should be automatically trusted, whether inside or outside the network perimeter. Zero trust requires continuous verification of identity and access rights for every request, micro-segmentation of network resources, and least-privilege access controls.
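
As an illustration of "never trust, always verify", the sketch below re-checks identity, device posture, and least-privilege policy on every request rather than once at login; the Request shape and policy table are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical least-privilege policy: role -> actions it may perform.
POLICY = {
    "analyst": {"reports:read"},
    "admin": {"reports:read", "reports:write", "users:manage"},
}

@dataclass
class Request:
    user: str
    role: str
    token_valid: bool       # identity re-verified on every request
    device_compliant: bool  # device posture checked, never assumed
    action: str             # e.g. "reports:read"

def authorize(req: Request) -> bool:
    # No implicit trust: every request must pass every check,
    # regardless of network location.
    if not req.token_valid or not req.device_compliant:
        return False
    return req.action in POLICY.get(req.role, set())

print(authorize(Request("ava", "analyst", True, True, "reports:read")))   # True
print(authorize(Request("ava", "analyst", True, True, "reports:write")))  # False: least privilege
```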

AI Hallucination

An output generated by an artificial intelligence system — particularly large language models — that is factually incorrect, fabricated, or nonsensical, yet presented with apparent confidence. AI hallucinations pose significant risks in applications such as legal research, medical advice, and financial analysis, and their mitigation through grounding, retrieval-augmented generation, and human oversight is a key challenge in enterprise AI deployment.
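
A common mitigation pattern is a post-hoc grounding check that flags answer sentences with weak support in the retrieved sources; the lexical-overlap heuristic and threshold below are illustrative only, and production systems typically use entailment models instead.

```python
def support_score(sentence: str, sources: list[str]) -> float:
    # Fraction of the sentence's words that appear in the best-matching
    # source. Crude lexical proxy for a real verification model.
    words = set(sentence.lower().split())
    if not words:
        return 0.0
    supported = max(len(words & set(s.lower().split())) for s in sources)
    return supported / len(words)

def flag_unsupported(answer: str, sources: list[str], threshold: float = 0.5) -> list[str]:
    # Sentences scoring below the threshold are routed to human review.
    return [s for s in answer.split(". ") if support_score(s, sources) < threshold]

sources = ["The contract was signed on 3 March 2021 by both parties."]
answer = "The contract was signed on 3 March 2021. The penalty clause is 12%"
print(flag_unsupported(answer, sources))  # flags the unsupported penalty claim
```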

Fine-Tuning

The process of further training a pre-trained machine learning model on a smaller, domain-specific dataset to adapt it for a particular task or industry. Fine-tuning allows organisations to leverage foundation models while creating proprietary, specialised AI capabilities that constitute identifiable intangible assets.
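
A compressed sketch of the adaptation step, assuming a PyTorch workflow: freeze the pre-trained base and train only a small task-specific head on domain examples. The model and data are toy stand-ins for a real foundation model and dataset.

```python
import torch
import torch.nn as nn

# Stand-in for a pre-trained model: a frozen "base" plus a trainable head.
base = nn.Sequential(nn.Linear(16, 32), nn.ReLU())  # pretend these weights are pre-trained
head = nn.Linear(32, 2)                             # new task-specific classifier

for p in base.parameters():
    p.requires_grad = False  # freeze general-purpose knowledge

# Toy domain-specific dataset: 64 examples, 2 classes.
x = torch.randn(64, 16)
y = torch.randint(0, 2, (64,))

opt = torch.optim.AdamW(head.parameters(), lr=1e-3)  # only head params update
loss_fn = nn.CrossEntropyLoss()

for epoch in range(5):
    opt.zero_grad()
    loss = loss_fn(head(base(x)), y)
    loss.backward()
    opt.step()
    print(f"epoch {epoch}: loss {loss.item():.3f}")
```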

Large Language Model

A type of neural network trained on vast corpora of text data, capable of generating human-like text, answering questions, summarising documents, and performing reasoning tasks. Large language models such as GPT and Claude represent significant R&D investment and are reshaping knowledge work, customer service, and content production across industries.
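
Generation in these models is autoregressive: the network repeatedly scores every token in its vocabulary and appends one to the sequence. The toy loop below shows that control flow with a stand-in next_token_logits function in place of a real forward pass.

```python
import random

VOCAB = ["the", "model", "predicts", "next", "token", "<eos>"]

def next_token_logits(context: list[str]) -> list[float]:
    # Stand-in for a neural network forward pass over the context.
    random.seed(len(context))
    return [random.random() for _ in VOCAB]

def generate(prompt: list[str], max_tokens: int = 10) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_tokens):
        logits = next_token_logits(tokens)
        # Greedy decoding: pick the highest-scoring token each step.
        token = VOCAB[logits.index(max(logits))]
        if token == "<eos>":
            break
        tokens.append(token)
    return tokens

print(" ".join(generate(["the"])))
```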

Prompt Engineering

The practice of designing and optimising input instructions (prompts) to elicit desired outputs from large language models and other generative AI systems. Effective prompt engineering can significantly improve AI output quality and consistency, and documented prompt libraries are emerging as a form of organisational knowledge capital.
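
Documented prompt libraries often begin as named, versioned templates with explicit placeholders, along the lines of this sketch; the template names and variables are illustrative.

```python
# A minimal prompt library: named, versioned templates with explicit
# placeholders, so effective prompts become shared organisational assets.
PROMPT_LIBRARY = {
    "summarise_contract_v2": (
        "You are a legal analyst. Summarise the contract below in "
        "{max_words} words, listing all termination clauses.\n\n{contract_text}"
    ),
    "classify_ticket_v1": (
        "Classify this support ticket as one of {labels}.\n\nTicket: {ticket}"
    ),
}

def render(name: str, **variables: str) -> str:
    # Raises KeyError if a required placeholder is missing, so broken
    # prompts fail loudly rather than silently degrading output quality.
    return PROMPT_LIBRARY[name].format(**variables)

print(render("classify_ticket_v1", labels="[billing, bug, feature]", ticket="App crashes on login"))
```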

Algorithmic Bias

Systematic and repeatable errors in an AI system's outputs that create unfair outcomes for particular groups, typically arising from biased training data, flawed model design, or unrepresentative sampling. Algorithmic bias poses significant reputational, legal, and regulatory risks, and its identification and mitigation are core components of responsible AI governance.
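
One common identification step is comparing positive-outcome rates across groups. The sketch below computes demographic parity difference, one of several fairness metrics used in practice; the data is invented for illustration.

```python
from collections import defaultdict

def demographic_parity_difference(groups: list[str], outcomes: list[int]) -> float:
    # Positive-outcome rate per group; the gap between the highest and
    # lowest rates is one simple signal of disparate impact.
    totals, positives = defaultdict(int), defaultdict(int)
    for g, y in zip(groups, outcomes):
        totals[g] += 1
        positives[g] += y
    rates = {g: positives[g] / totals[g] for g in totals}
    return max(rates.values()) - min(rates.values())

groups   = ["a", "a", "a", "b", "b", "b"]
outcomes = [1,   1,   0,   1,   0,   0]  # e.g. loan approvals
print(demographic_parity_difference(groups, outcomes))  # 0.33: group a approved twice as often
```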

Tokenisation (AI)

The process of breaking text, code, or other sequential data into discrete units (tokens) that serve as the input and output elements for large language models. Tokenisation determines how a model processes language and directly affects inference costs, since API pricing for large language models is typically based on token count.
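
Because billing follows token counts rather than characters or words, estimating cost starts with a tokeniser. This sketch uses the open-source tiktoken library (one BPE implementation among several); the per-token price is purely hypothetical.

```python
import tiktoken  # pip install tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # a common BPE vocabulary

text = "Tokenisation splits text into subword units."
tokens = enc.encode(text)
print(tokens)       # list of integer token ids
print(len(tokens))  # token count, not word or character count

# Hypothetical pricing for illustration only; check your provider's rates.
PRICE_PER_1K_TOKENS = 0.002
print(f"estimated cost: ${len(tokens) / 1000 * PRICE_PER_1K_TOKENS:.6f}")
```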
