This post is part of a series of texts aiming to explore and understand patterns and practices that enable the construction of a production-ready AI data infrastructure. The series mainly focuses on the modeling and retrieval of evolving data, which would empower Large Language Model (LLM) apps and Agents to serve millions of users concurrently.
For a broad overview of the problem and our understanding of the current state of the LLM landscape, check out our initial post here.
In this post, we delve into creating an initial data platform that can represent the core component of the future MLOps stack. Building a data platform is a big challenge in itself, and many solutions are available to help automate data tracking, ingestion, data contracting, monitoring, and warehousing.
In the last decade, data analytics and engineering fields have undergone significant transformations, shifting from storing data in centralized, siloed Oracle and SQL Server warehouses to a more agile, modular approach involving real-time data and cloud solutions like BigQuery and Snowflake.
Data processing has evolved from a peripheral activity, one whose value was often inflated to please investors during startup valuations, into a fundamental component of product development.
As we enter a new paradigm of interacting with systems through natural language, it's important to recognize that, while this method promises efficiency, it also comes with the challenges inherent in the imperfections of human language.
If we want to use natural language as a new programming tool, we will need to either impose more constraints on it or make our systems flexible enough to adapt to the equivocal nature of language and information.
Our main goal should be to offer consistency and reproducibility, so that language can ideally serve as a basic building block for what comes next.
In order to come up with a set of solutions that could enable us to move forward, in this series of posts we call on theoretical models from cognitive science and try to incorporate them into data engineering practices.
In our initial post, we started out conceptualizing a simple retrieval-augmented generation (RAG) model whose aim was to process and understand PDF documents.
We faced many bottlenecks in scaling these tasks, so in our second post, we needed to introduce the concept of memory domains.
In the next step, the focus was mainly on understanding what makes a good RAG pipeline, considering all possible variables.
In this post, we address the fundamental question of the feasibility of extending LLMs beyond the data on which they were trained.
Baseline RAG struggles to connect the dots when answering a question requires providing synthesized insights by traversing disparate pieces of information through their shared attributes.
Baseline RAG performs poorly when asked to understand summarized semantic concepts holistically over large data collections or even singular large documents.
To fill these gaps in RAG performance, we built a new framework—cognee.
Cognee combines human-inspired cognitive processes with efficient data management practices, infusing data points with more meaningful relationships to represent the (often messy) natural world in code more accurately.
Our observations indicate that systems, agents, and interactions often falter due to overextension and haste.
However, given the extensive demands and expectations surrounding Large Language Models (LLMs), addressing every aspect—agents, actions, integrations, and schedulers—is beyond the scope of the framework’s mission.
We've chosen to prioritize data, recognizing that the crux of many issues has already been addressed within the realm of data engineering.
We aim to establish a framework that includes file storage, tracing, and the development of robust AI memory data pipelines to help us manage and structure data more efficiently through its transformation processes.
Subsequently, our goal will be to devise methods for navigating diverse information segments and determine the most effective application of graph databases to store this data.
Our initial hypothesis—enhancing data management in vector stores through manipulative techniques and attention modulators for input and retrieval—proved less effective than anticipated.
Deconstructing and reorganizing data via graph databases emerged as a superior strategy, allowing us to adapt and repurpose existing tools for our needs more effectively.
| AI Memory type | State in Level 2 | State in Level 4 | Description |
| --- | --- | --- | --- |
| Sensory Memory | API | API | Can be interpreted in this context as the interface used for the human input |
| STM | Weaviate class with hardcoded contract | Neo4j with a connection to a Weaviate class | The processing layer and a storage of the session/user context |
| LTM | Weaviate class with hardcoded contract | Neo4j with a connection to a Weaviate class | The information storage |
On Level 4, we describe the integration of keepi, a ChatGPT-powered WhatsApp bot that collects and summarizes information, via API endpoints.
Then, once we’ve ensured that we have a robust, scalable infrastructure, we deploy cognee to the cloud.
Users submit queries or documents for storage via the keepi.ai WhatsApp bot. This step integrates with the keepi.ai platform, utilizing Cognee endpoints for processing.
The Cognee manager handles the incoming request and collaborates with several components:
Relational database: Manages state and metadata related to operations.
Classifier: Identifies, organizes, and enhances the content.
Loader: Archives data in vector databases.
The Graph Manager and Vector Store Manager collaboratively process and organize the input into structured nodes. A key function of the system involves breaking down user input into propositions—basic statements retaining factual content. These propositions are interconnected through relationships and cataloged in the Neo4j database by the Graph Manager, associated with specific user nodes. Users are represented by memory nodes that capture various memory levels, some of which link back to the raw data in vector databases.
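As an illustration of this step, here is a minimal sketch, not cognee's actual implementation, of how propositions extracted from user input might be written to Neo4j with the official Python driver. The node labels, property names, and the extract_propositions helper are hypothetical.

```python
from neo4j import GraphDatabase

# Hypothetical placeholder: in practice an LLM call would split raw input into propositions.
def extract_propositions(text: str) -> list[str]:
    return [sentence.strip() for sentence in text.split(".") if sentence.strip()]

def store_propositions(driver, user_id: str, raw_text: str, vector_ref: str) -> None:
    propositions = extract_propositions(raw_text)
    with driver.session() as session:
        for proposition in propositions:
            session.run(
                """
                MERGE (u:User {id: $user_id})
                MERGE (m:MemoryNode {level: 'semantic', user_id: $user_id})
                MERGE (u)-[:HAS_MEMORY]->(m)
                CREATE (p:Proposition {text: $text, vector_ref: $vector_ref})
                MERGE (m)-[:CONTAINS]->(p)
                """,
                user_id=user_id, text=proposition, vector_ref=vector_ref,
            )

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
store_propositions(driver, "user-42", "Buck was kidnapped. He was sold as a sled dog.", "weaviate://doc-1")
```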
Browsing the largest AI platform directory available at the moment, we can observe around 7,000 new, mostly semi-finished AI projects — projects whose development is fueled by recent improvements in foundation models and open-source community contributions.
Decades of technological advancements have led to small teams being able to do in 2023 what in 2015 required a team of dozens.
Yet, the AI apps currently being pushed out still mostly feel and perform like demos.
It seems it has never been easier to create a startup, build an AI app, go to market… and fail.
The consensus is, nevertheless, that the AI space is the place to be in 2023.
“The AI Engineer [...] will likely be the highest-demand engineering job of the [coming] decade.”
The stellar rise of AI engineering as a profession is, perhaps, signaling the need for a unified solution that is not yet there — a platform that is, in its essence, a Large Language Model (LLM), which could be employed as a powerful general problem solver.
To address this issue, dlthub and prometh.ai will collaborate on productionizing a common use-case, PDF processing, progressing step by step. We will use LLMs, AI frameworks, and services, refining the code until we attain a clearer understanding of what a modern LLM architecture stack might entail.
Despite all the AI startup hype, there’s a glaring issue lurking right under the surface: foundation models do not have production-ready data infrastructure by default.
Everyone seems to be building simple tools, like “Your Sales Agent” or “Your HR helper,” on top of OpenAI — a so-called “Thin Wrapper” — and selling them as services.
Our intention, however, is not to merely capitalize on this nascent industry; it’s to use a new technology to catalyze a true digital paradigm shift. To paraphrase investor Marc Andreessen, the content of the new medium is the content of the previous medium.
What Andreessen meant by this is that each new medium for sharing information must encapsulate the content of the prior medium. For example, the internet encapsulates all books, movies, pictures, and stories from previous mediums.
Only after a unified AI solution is created will AI agents be able to proactively and competently operate the browsers, apps, and devices we operate ourselves today.
Intelligent agents in AI are programs capable of perceiving their environment, acting autonomously to achieve goals, and improving their performance through learning or acquiring knowledge.
The reality is that we now have a set of data platforms and AI agents that are becoming available to the general public, whose content and methods were previously inaccessible to anyone not privy to the tech-heavy languages of data scientists and engineers.
As engineering tools move toward the mainstream, they need to become more intuitive and user-friendly, while hiding their complexity behind a set of background solutions.
Fundamentally, the issue of “Thin wrappers” is not an issue of bad products, but of insufficiently robust data engineering methods, coupled with the general difficulty of writing production-ready code that relies on robust data platforms in a new space.
The current lack of production-ready data systems for LLMs and AI Agents opens up a gap that we want to fill by introducing robust data engineering practices.
In this series of texts, our aim will thus be to explore what would constitute:
Proper data engineering methods for LLMs
A production-ready generative AI data platform that unlocks AI assistants/Agent Networks
Each of the coming blog posts will be followed by Python code, to demonstrate the progress made toward building a modern AI data platform, raise important questions, and facilitate an open-source collaboration.
Let’s start by setting an attainable goal. As an example, let’s conceptualize a production-ready process that can analyze and process hundreds of PDFs for hundreds of users.
Imagine you're a developer, and you've got a stack of digital invoices in PDF format from various vendors. These PDFs are not just simple text files; they contain logos, varying fonts, perhaps some tables, and even handwritten notes or signatures.
Your goal? To extract relevant information, such as vendor names, invoice dates, total amounts, and line-item details, among others.
This task of analyzing PDFs may help us understand and define what a production-ready AI data platform entails. To perform the task, we’ll be drawing a parallel between Data Engineering concepts and those from Cognitive Sciences which tap into our understanding of how human memory works — this should provide the baseline for the evaluation of the POCs in this post.
We assume that Agent Networks of the future would resemble groups of humans with their own memory and unique contexts, all working and contributing toward a set of common objectives.
In our example of data extraction from PDFs, a modern enterprise may have hundreds of thousands, if not millions, of such documents stored in different places, with many people hired to make sense of them.
This data is considered unstructured: you cannot handle it easily with current data engineering practices and database technology. The task of structuring it is difficult and, to this day, has largely had to be performed manually.
With the advent of Agent Networks, which mimic human cognitive abilities, we could start realistically structuring this kind of information at scale. As this is still data processing — an engineering task — we need to combine those two approaches.
From an engineering standpoint, the next generation Data Platform needs to be built with the following in mind:
We need to give Agents access to the data at scale.
We need our Agents to operate like human minds, so we need to provide them with tools to execute tasks and various types of memory for reasoning.
We need to keep the systems under control, meaning that we apply good engineering practices to the whole system.
We need to be able to test, sandbox, and roll back what Agents do, and we need to observe them and log every action.
In order to conceptualize a new model of data structure and relationships that transcends the traditional Data Warehousing approach, we can start perceiving the procedural steps in Agent execution flows as thoughts, interpreting them through the prism of human cognitive processes such as the functioning of our memory system and its components.
Human memory can be divided into several distinct categories:
Sensory Memory (SM) → A very short-term (a few seconds at most) storage unit that receives information from our senses.
Short-Term Memory (STM) → Holds information for roughly 15-30 seconds, processes it, and coordinates work based on the information provided.
Long-Term Memory (LTM) → Stores information long term and retrieves what is needed for daily life.
The general structure of human memory. Note that Weng doesn’t expand on the STM here in the way we did above:
A broader, more relevant representation of memory for our context, and the corresponding data processing, based on the Atkinson-Shiffrin memory model, would be:
To understand the current LLM production systems, how they handle data input and processing, and their evolution, we start at Level 0 — the LLMs and their APIs as they are currently — and progress toward Level 7 — AI Agents and complex AI Data Platforms and Agent Networks of the future.
In order to extract relevant data from PDF documents, as an engineer you would turn to a powerful AI model provider like OpenAI, Anthropic, or Cohere (Level 0 in our stack). Not all of them support this functionality, so you’d use Bing or a ChatGPT plugin like AskPDF, which do.
In order to "extract nuances," you might provide the model with specific examples or more directive prompts. For instance, "Identify the vendor name positioned near the top of the invoice, usually above the billing details."
Next, you'd "prompt it" with various PDFs to see how it reacts. Based on the outputs, you might notice that it misses handwritten dates or gets confused with certain fonts.
This is where "prompt engineering" comes in. You might adjust your initial prompt to be more specific or provide additional context. Maybe you now say, "Identify the vendor name and, if you find any handwritten text, treat it as the invoice date."
Our POC at this stage consists of simply uploading a PDF and asking it questions until we have better and better answers based on prompt engineering. This exercise shows what is available with the current production systems, to help us set a baseline for the solutions to come.
If your goal is to understand the content of a PDF, Bing and OpenAI will enable you to upload documents and get explanations of their contents
Uses basic natural language processing (NLP) prompts without any schema on output data
Typically “forgets” the data after a query — no notion of storage (LTM)
In a production environment, data loss can have significant consequences. It can lead to operational disruptions, inaccurate analytics, and loss of valuable insights
There is no possibility to test the behavior of the system
Let’s break down the Data Platform component at this stage:
| Memory type | State | Description |
| --- | --- | --- |
| Sensory Memory | Chatbot interface | Can be interpreted in this context as the interface used for the human input |
| STM | The context window of the chatbot/search; in essence stateless | The processing layer and a storage of the session/user context |
| LTM | Not present at this stage | The information storage |
Lacks:
Decoupling: Separating components to reduce interdependency.
Portability: Ability to run in different environments.
Modularity: Breaking down into smaller, independent parts.
Extendability: Capability to add features or functionality.
Next Steps:
Implement an LTM component for information retention.
Develop an abstraction layer for Sensory Memory input and for processing multiple file types.
Addressing these points will enhance flexibility, reusability, and adaptability.
This step is basically an upgrade to the current state-of-the-art LLM UX/UI, where we add:
Permanent LTM memory (data store)
As a developer, I need to answer questions about large PDFs that I can’t simply pass to the LLM due to technical limitations. The primary issue being addressed is the constraint on prompt length: at the time of writing, the base GPT-4 model has a limit of 8k tokens for the prompt and the response combined, so if the prompt comprises 7.5k tokens, the response can only be 0.5k tokens long.
An LLM framework like Langchain to adapt any document type to a vector store
Using Langchain provides a neat abstraction for me to get started quickly, connect to a VectorDB, and get fast results.
LLMs can’t process all the data that a large PDF could contain. So, we need a place to store the PDF and a way to retrieve relevant information from it, so it can be passed on to the LLM.
When trying to build and process documents or user inputs, it’s important to store them in a Vector Database to be able to retrieve the information when needed, along with the past context.
A vector database is the optimal solution because it enables efficient storage, retrieval, and processing of high-dimensional data, making it ideal for applications like document search and user input analysis where context and similarity are important.
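A minimal sketch of this chunk-and-store flow using the classic LangChain and Weaviate client APIs is shown below; class names and arguments may differ between library versions, and the file name and index name are hypothetical.

```python
import weaviate
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import Weaviate

# Load the PDF and split it into chunks small enough for the LLM context window.
documents = PyPDFLoader("invoice.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(documents)

# Store the chunks as vectors in Weaviate so only the relevant pieces are retrieved later.
client = weaviate.Client("http://localhost:8080")
store = Weaviate.from_documents(chunks, OpenAIEmbeddings(), client=client, index_name="Invoice")

# Retrieve the chunks most similar to the question and pass only those to the LLM.
relevant_chunks = store.similarity_search("What is the total amount on the invoice?", k=4)
```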
For the past several months, there has been a surge of projects that personalize LLMs by storing user settings and information in a VectorDB so they can be easily retrieved and used as input for the LLM.
This can be done by storing data in the Weaviate Vector Database; then, we can process our PDF.
We start by converting the PDF and translating it.
In the next step, we store the PDF in Weaviate.
Finally, we load the data into a database using dlthub.
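As a rough sketch of that last step (the pipeline, dataset, and table names are hypothetical, and the rows stand in for whatever structure was extracted from the PDF), loading with dlt could look like this:

```python
import dlt

# Hypothetical structured rows extracted from the PDF in the previous steps.
invoice_rows = [
    {"vendor_name": "ACME Corp", "invoice_date": "2023-09-01", "total_amount": 1200.50},
]

# dlt infers the schema, creates the table, and handles the load for us.
pipeline = dlt.pipeline(
    pipeline_name="pdf_invoices",
    destination="duckdb",  # could equally be BigQuery, Postgres, etc.
    dataset_name="invoices",
)
load_info = pipeline.run(invoice_rows, table_name="invoice_data")
print(load_info)
```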
The parallel with our memory components becomes clearer at this stage. We have some way to define inputs which correspond to SM, while STM and LTM are starting to become two separate, clearly distinguishable entities. It becomes evident that we need to separate LTM data according to domains it belongs to but, at this point, a clear structure for how that would work has not yet emerged.
In addition, we can treat GPT as limited working memory and its context size as how much our model can remember during one operation.
It’s evident that, if we don’t manage the working memory well, we will overload it and fail to retrieve outputs. So, we will need to take a closer look into how humans do the same and how our working memory manages millions of facts, emotions, and senses swirling around our minds.
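One practical way to keep this working memory under control is to budget tokens explicitly before calling the model. The sketch below uses tiktoken; the budget number is illustrative.

```python
import tiktoken

encoding = tiktoken.encoding_for_model("gpt-4")

def fit_to_context(chunks: list[str], budget_tokens: int = 6000) -> list[str]:
    """Greedily keep retrieved chunks until the token budget for the prompt is spent."""
    selected, used = [], 0
    for chunk in chunks:
        cost = len(encoding.encode(chunk))
        if used + cost > budget_tokens:
            break
        selected.append(chunk)
        used += cost
    return selected
```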
Let’s break down the Data Platform components at this stage:
| Memory type | State | Description |
| --- | --- | --- |
| Sensory Memory | Command line interface + arguments | Can be interpreted in this context as the arguments provided to the script |
| STM | Partially Vector store, partially working memory | The processing layer and a storage of the session/user context |
| LTM | Vector store | The raw document storage |
Sensory Memory
Sensory memory can be seen as an input buffer where the information from the environment is stored temporarily. In our case, it’s the arguments we give to the command line script.
STM
STM is often associated with the concept of "working memory," which holds and manipulates information for short periods.
In our case, it is the time during which the process runs.
LTM
LTM can be conceptualized as a database in software systems. Databases store, organize, and retrieve data over extended periods. The information in LTM is organized and indexed, similar to how databases use tables, keys, and indexes to categorize and retrieve data efficiently.
VectorDB: The LTM Storage of Our AI Data Platform
Unlike traditional relational databases, which store data in tables, and newer NoSQL databases like MongoDB, which use JSON documents, vector databases specifically store and fetch vector embeddings.
Vector databases are crucial for Large Language Models and other modern, resource-hungry applications. They're designed for handling vector data, commonly used in fields like computer graphics, Machine Learning, and Geographic Information Systems.
Vector databases hinge on vector embeddings. These embeddings, packed with semantic details, help AI systems to understand data and retain long-term memory. They're condensed snapshots of training data and act as filters when processing new data in the inference stage of machine learning.
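To make this concrete, the sketch below embeds two pieces of text with the OpenAI embeddings endpoint and compares them with cosine similarity; the model name and example texts are assumptions, and any embedding provider could be swapped in.

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    response = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return np.array(response.data[0].embedding)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

invoice = embed("Invoice from ACME Corp, total due 1,200.50 EUR")
query = embed("How much do we owe ACME?")
print(cosine_similarity(invoice, query))  # semantically related texts score higher
```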
This post is a part of a series of texts aiming to discover and understand patterns and practices that would enable building a production-ready AI data infrastructure. The main focus is on how to evolve data modeling and retrieval in order to enable Large Language Model (LLM) apps and Agents to serve millions of users concurrently.
For a broad overview of the problem and our understanding of the current state of the LLM landscape, check out our previous post
In this text, we continue our inquiry into what would constitute:
Proper data engineering methods for LLMs
A production-ready generative AI data platform that unlocks AI assistants/Agent Networks
To explore these points, we here at prometh.ai have partnered with dlthub in order to productionize a common use case — complex PDF processing — progressing level by level.
In the previous text, we wrote a simple script that relies on the Weaviate Vector database to turn unstructured data into structured data and help us make sense of it.
In this post, some of the shortcomings from the previous level will be addressed, including:
This phase enhances the basic script by incorporating:
Memory Manager
The memory manager facilitates the execution and processing of VectorDB data by:
Uniformly applying CRUD (Create, Read, Update, Delete) operations across various classes
Representing different business domains or concepts, and
Ensuring they adhere to a common data model, which regulates all data points across the system (see the sketch below).
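A minimal sketch of such a manager follows; the shared data-model fields, the class handling, and the Weaviate calls are simplified assumptions rather than cognee's actual code.

```python
import weaviate

COMMON_FIELDS = ["user_id", "memory_type", "created_at", "text"]  # the shared data model

class MemoryManager:
    """Applies the same CRUD operations to every memory class (STM, LTM, ...)."""

    def __init__(self, client: weaviate.Client):
        self.client = client

    def create(self, memory_class: str, record: dict) -> str:
        # Reject records that violate the common data model.
        missing = [f for f in COMMON_FIELDS if f not in record]
        if missing:
            raise ValueError(f"record is missing common fields: {missing}")
        return self.client.data_object.create(record, class_name=memory_class)

    def read(self, memory_class: str, query: str, limit: int = 5) -> dict:
        # Semantic search over the given memory class, returning only common fields.
        return (
            self.client.query.get(memory_class, COMMON_FIELDS)
            .with_near_text({"concepts": [query]})
            .with_limit(limit)
            .do()
        )

    def delete(self, memory_class: str, uuid: str) -> None:
        self.client.data_object.delete(uuid, class_name=memory_class)
```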
Context Manager
This central component processes and analyzes data from Vector DB, evaluates its significance, and compares the results with user-defined benchmarks.
The primary objective is to establish a mechanism that encourages in-context learning and empowers the Agent’s adaptive understanding.
As an example, let’s assume we uploaded the book The Call of the Wild by Jack London to our Vector DB semantic layer, to give our LLM a better understanding of the life of sled dogs in the early 1900s.
Asking a question about the contents of the book will yield a straightforward answer, provided that the book contains an explicit answer to our question.
To enable better question answering and access to additional information such as historical context, summaries, and other documents, we need to introduce different memory stores and a set of attention modulators, which are meant to manage the prioritization of data retrieved for the answers.
Task Manager
Utilizing the tools at hand and guided by the user's prompt, the task manager determines a sequence of actions and their execution order.
For example, let’s assume that the user asks: “When was Buck (the dog from The Call of the Wild) kidnapped?” and wants the answer translated to German.
This query would be broken down by the task manager into a set of atomic tasks that can then be handed over to the Agent (a minimal sketch follows the list below).
The ordered task list could be:
Retrieve information about the PDF from the database.
Translate the information to German.
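A minimal sketch of how the Task Manager might represent such a decomposition is shown below; the Task structure and tool names are hypothetical, and in practice the decomposition itself would come from an LLM call constrained to the available tools.

```python
from dataclasses import dataclass

@dataclass
class Task:
    tool: str      # which tool the Agent should invoke
    argument: str  # what the tool should operate on

def decompose(user_prompt: str) -> list[Task]:
    # Hard-coded here for illustration; normally produced by the Task Manager's LLM call.
    return [
        Task(tool="vector_search", argument="When was Buck kidnapped?"),
        Task(tool="translate", argument="de"),
    ]

for step, task in enumerate(decompose("When was Buck kidnapped? Answer in German."), start=1):
    print(step, task.tool, task.argument)
```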
The Agent
AI agents can use computers independently. They can browse the web, use apps, read and write files, make credit card payments, and even autonomously execute processes on your personal computer.
In our case, the Agent has only a few tools at its disposal, such as tools to translate text or structure data. Using these tools, it processes and executes tasks in the sequence they are provided by the Task Manager and the Context Manager.
At this stage, our proof of concept (POC) allows uploading a PDF document and requesting specific actions on it such as "load to database", "translate to German", or "convert to JSON." Prior task resolutions and potential operations are assessed by the Context Manager and Task Manager services.
The following set of steps explains the workflow of the POC at level 2:
Initially, we specify the parameters for the document we wish to upload and define our objective in the prompt:
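A sketch of what those parameters might look like; the field names and values are hypothetical.

```python
params = {
    "user_id": "user-42",
    "document_path": "data/the_call_of_the_wild.pdf",
    "memory_stores": ["episodic", "semantic"],
    "prompt": "When was Buck kidnapped? Translate the answer to German.",
}
```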
The memory manager retrieves the parameters and the attention modulators and creates context based on Episodic and Semantic memory stores (previous runs of the job + raw data):
To do this, it starts by filtering user input, in the same way our brains filter important from redundant information. For example, if children are playing and talking loudly in the background during a Zoom meeting, we can still focus our attention on what the person on the other side is saying.
The same principle is applied here:
In the next step, we apply a set of attention modulators to process the data obtained from the Vector Store.
NOTE: In cognitive science, attention modulators can be thought of as factors or mechanisms that influence the direction and intensity of attention.
As we have many memory stores, we need to prioritize the data points that we retrieve via semantic search.
Since semantic search is not enough by itself, scoring data points happens via a set of functions that replicate how attention modulators work in cognitive science.
To start, we implemented a few attention modulators that we thought could improve the document retrieval process:
Frequency: This refers to how often a specific stimulus or event is encountered. Stimuli that are encountered more frequently are more likely to be attended to or remembered.
Recency: This refers to how recently a stimulus or event was encountered. Items or events that occurred more recently are typically easier to recall than those that occurred a long time ago.
We have implemented many more, and you can find them in our repository. More are still needed, and contributions are more than welcome.
Let’s see the modulators in action:
In the code above we fetch the memories from the Semantic Memory bank where our knowledge of the world is stored (the PDFs). We select the relevant documents by using the handle_modulator function.
The handle_modulator function is defined below and explains how scoring of memories happens.
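Since the actual implementation lives in the cognee repository, the version below is only a simplified sketch of the idea: each modulator assigns a score to a retrieved memory, and the handler keeps the memories whose score clears a threshold. The scoring formulas and field names are assumptions.

```python
import time
from typing import Optional

def recency_score(memory: dict, now: Optional[float] = None) -> float:
    """Newer memories score higher; the score decays with age in days."""
    now = now or time.time()
    age_days = (now - memory["created_at"]) / 86_400
    return 1.0 / (1.0 + age_days)

def frequency_score(memory: dict) -> float:
    """Memories retrieved more often score higher, capped at 1.0."""
    return min(memory.get("retrieval_count", 0) / 10.0, 1.0)

MODULATORS = {"recency": recency_score, "frequency": frequency_score}

def handle_modulator(modulator_name: str, memories: list[dict], threshold: float = 0.5) -> list[dict]:
    """Score each memory with the chosen modulator and keep those above the threshold."""
    score = MODULATORS[modulator_name]
    return [memory for memory in memories if score(memory) >= threshold]
```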
We process the data retrieved with OpenAI functions and store the results for the Task Manager to be able to determine what actions the Agent should take.
The Task Manager then sorts and converts user input into a set of actionable steps based on the tools available.
Finally, the Agent interprets the context and performs the steps using the tools it has available. We see this as the step where the Agents take over the task, executing it in their own way.
Now, let's look back at what constitutes the Data Platform:
| Memory type | State | Description |
| --- | --- | --- |
| Sensory Memory | API | Can be interpreted in this context as the interface used for the human input |
| STM | Weaviate class with hardcoded contract | The processing layer and a storage of the session/user context |
| LTM | Weaviate class with hardcoded contract | The information storage |
Lacks:
Extendability: Capability to add features or functionality.
Loading flexibility: Ability to apply different chunking strategies
Testability: How to test the code and make sure it runs
This post is part of a series of texts aiming to explore and understand patterns and practices that enable the construction of a production-ready AI data infrastructure. The main focus of the series is on the modeling and retrieval of evolving data, which would empower Large Language Model (LLM) apps and Agents to serve millions of users concurrently.
For a broad overview of the problem and our understanding of the current state of the LLM landscape, check out our initial post here.
In this post, we delve into context enrichment and testing in Retrieval Augmented Generation (RAG) applications.
RAG applications can retrieve relevant information from a knowledge base and generate detailed, context-aware answers to user queries.
As we are trying to improve on the base information LLMs are giving us, we need to be able to retrieve and understand more complex data, which can be stored in various data stores, in many formats, and using different techniques.
All of this leads to a lot of opportunities, but also creates a lot of confusion in generating and using RAG applications and extending the existing context of LLMs with new knowledge.
In navigating the complexities of RAG applications, the first challenge we face is the need for robust testing. Determining whether augmenting an LLM's context with additional information will yield better results is far from straightforward and often relies on subjective assessments.
Imagine, for instance, adding the digital version of the book The Adventures of Tom Sawyer to the LLM's database in order to enrich its context and obtain more detailed answers about the book's content for a paper we're writing. To evaluate this enhancement, we need a way to measure the accuracy of the responses before and after adding the book while considering the variations of every adjustable parameter.
The end-to-end process of enhancing RAG applications involves various adjustable parameters, which offer multiple paths toward achieving similar goals with varying outcomes. These parameters include (a configuration sketch follows the list):
Number of documents loaded into memory.
Size of each sub-document chunk uploaded.
Overlap between documents uploaded.
Relationship between documents (Parent-Child, etc.).
Type of embedding used for data-to-vector conversion (OpenAI, Cohere, or any other embedding method).
Metadata structure for data navigation.
Indexes and data structures.
Search methods (text, semantic, or fusion search).
Output retrieval and scoring methods.
Integration of outputs with other data for in-context learning.
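One way to make these combinations explicit and testable is to capture them in a single configuration object and iterate over the parameter grid; the sketch below is illustrative, with hypothetical field names and values.

```python
from dataclasses import dataclass
from itertools import product

@dataclass
class RagTestConfig:
    chunk_size: int
    chunk_overlap: int
    embedding_model: str
    search_type: str  # "text", "semantic", or "fusion"

# Build the full test matrix from the parameter grid.
grid = product([256, 512, 1024], [0, 50], ["text-embedding-ada-002"], ["semantic", "fusion"])
configs = [RagTestConfig(*combination) for combination in grid]

for config in configs:
    # Here the test harness would run the RAG pipeline with this config and score the answers.
    print(f"would evaluate: {config}")
```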
The goal we set for our system in our initial post — processing and creating structured data from PDFs — presented an interesting set of problems to solve. OpenAI functions and dlthub allowed us to accomplish this task relatively quickly.
The real issue arises when we try to scale this task — this is what our second post tried to address. In addition, retrieving meaningful data from the Vector Databases turned out to be much more challenging than initially imagined.
In this post, we’ll discuss how we can establish a testing method, improve our ability to retrieve the information we've processed, and make the codebase more robust and production-ready.
We’ll primarily focus on the following:
Memory Manager
The Memory Manager is a set of functions and tools for creating dynamic memory objects. In our previous blog posts, we explored the application of concepts from cognitive science — Short-Term Memory, Long-Term Memory, and Cognitive Buffer — on Agent Network development.
We might need to add more memory domains to the process, as sticking to just these three can pose limitations. Changes in the codebase now enable real-time creation of dynamic memory objects, which have hierarchical relationships and can relate to each other.
RAG test tool
The RAG test tool allows us to control critical parameters for optimizing and testing RAG applications, including chunk size, chunk overlap, search type, metadata structure, and more.
The Memory Manager is a crucial component of any cognitive architecture platform. In our previous posts, we’ve discussed how to turn unstructured data into structured data, how to relate concepts to each other in the vector store, and which problems can arise when productionizing these systems.
While we’ve addressed many open questions, many still remain. Based on our surveys and interviews with field experts, applications utilizing Memory components face the following challenges:
Inability to reliably link Memories
Relying solely on semantic search or its derivatives to recognize the similarities between terms like "pair" and "combine" is a step forward. However, actually defining, capturing, and quantifying the relationships between any two objects would aid future memory access.
Solution: Graphs/Traditional DB
Failure to structure and organize Memories
We used OpenAI functions to structure and organize different Memory elements and convert them into understandable JSONs. Nevertheless, our surveys indicate that many people struggle with metadata management and the structure of retrievals. Ideally, these aspects should all be managed and organized in one place.
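As an illustration of the technique, the sketch below uses OpenAI function calling to coerce free text into a structured JSON memory element; the schema and field names are hypothetical examples, not a fixed contract.

```python
import json
from openai import OpenAI  # assumes the openai>=1.0 Python SDK

client = OpenAI()

# A hypothetical schema for a structured memory element.
memory_schema = {
    "name": "store_memory",
    "description": "Store a structured memory element",
    "parameters": {
        "type": "object",
        "properties": {
            "summary": {"type": "string"},
            "entities": {"type": "array", "items": {"type": "string"}},
            "memory_type": {"type": "string", "enum": ["episodic", "semantic"]},
        },
        "required": ["summary", "memory_type"],
    },
}

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Buck was kidnapped from Judge Miller's estate and sold as a sled dog."}],
    functions=[memory_schema],
    function_call={"name": "store_memory"},
)
structured_memory = json.loads(response.choices[0].message.function_call.arguments)
print(structured_memory)
```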
Hierarchy, size, and relationships of individual Memory elements
Although semantic search helps us understand the same concepts, we need to add more abstract concepts and ideas and link them. The ultimate goal is to emulate human understanding of the world, which comprises basic concepts that, when combined, create higher complexity objects.
Solution: Graphs/Custom solutions
Evaluation possibilities of memory components (can they be distilled to True/False)
Based on the psycholinguistic theories proposed by Walter Kintsch, any cognitive system should be able to provide True/False evaluations. Kintsch defines a basic memory component, a ‘proposition,’ which can be evaluated as True or False and can interlink with other Memory components.
A proposition could be, for example, "The sky is blue," and its evaluation to True/False could lead to actions such as "Do not bring an umbrella" or "Wear a t-shirt."
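A small sketch of how a proposition could be represented so that it can be evaluated and linked to other memory components; the field names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Proposition:
    text: str                           # e.g. "The sky is blue"
    truth_value: Optional[bool] = None  # None until evaluated
    linked: list["Proposition"] = field(default_factory=list)

sky = Proposition("The sky is blue")
sky.truth_value = True  # the evaluation could come from an LLM call or a rule
action = Proposition("Do not bring an umbrella", truth_value=sky.truth_value)
sky.linked.append(action)  # interlink propositions, as in Kintsch's model
```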
We should have a reliable method to test Memory components, at scale, for any number of use-cases. We need benchmarks across every level of testing to capture and define predicted behavior.
Suppose we need to test if Memory data from six months ago can be retrieved by our system and measure how much it contributes to a response that spans memories that are years old.
Solution: RAG testing framework
Let’s look at the RAG testing framework:
It allows you to test and combine all variations of:
Number of documents loaded into memory. ✅
Size of each sub-document chunk uploaded. ✅
Overlap between documents uploaded. ✅
Relationship between documents (Parent-Child, etc.). 👷🏻♂️
Type of embedding used for data-to-vector conversion (OpenAI, Cohere, or any other embedding method). ✅
Metadata structure for data navigation. ✅
Indexes and data structures. ✅
Search methods (text, semantic, or fusion search). ✅
Output retrieval and scoring methods. 👷🏻♂️
Integration of outputs with other data for in-context learning. 👷🏻♂️
Structure of the final output. ✅
These parameters and the results of the tests are stored in a Postgres database and can be visualized using Superset.
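A minimal sketch of how a test run's parameters and scores might be written to Postgres (the connection string and table layout are hypothetical; Superset can then be pointed at this table):

```python
import psycopg2

conn = psycopg2.connect("dbname=rag_tests user=postgres password=postgres host=localhost")
with conn, conn.cursor() as cur:
    cur.execute(
        """
        CREATE TABLE IF NOT EXISTS rag_test_results (
            id SERIAL PRIMARY KEY,
            chunk_size INT,
            chunk_overlap INT,
            search_type TEXT,
            score NUMERIC,
            created_at TIMESTAMPTZ DEFAULT now()
        )
        """
    )
    cur.execute(
        "INSERT INTO rag_test_results (chunk_size, chunk_overlap, search_type, score) VALUES (%s, %s, %s, %s)",
        (512, 50, "semantic", 0.82),
    )
conn.close()
```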