

LLMOps stack + Graphs

The past: Berlin startup scene

Machine learning has had its place in companies' business models for several years, but due to high labor costs, poor generalizability, and long development cycles, it often fell short of early expectations. With the rise of ChatGPT, however, foundation models and LLMs are reemerging as the next step in the evolution of the machine learning stack, democratizing it for end users.

As a consultant and operator in the Berlin startup scene over the past ten years, I have seen the vocation of “Data Scientist” reflect this trend as well. Its initial popularity, decline, and resurgence could easily have mirrored the rise and fall of the Roman Empire: humble beginnings, grandiosity, and then the rigidity of thought of early Christianity.

In my early years as a data analyst in tier-II e-commerce businesses, the title of Data Scientist was considered prestigious and cutting-edge. However, most of these ventures lacked the experience or maturity to properly productionize data science within their business models.

Often, I would see a data scientist building tons of features for their company’s AI models only to slightly improve basic KPIs. These projects were often stuck in the limbo of demoware, and only the businesses in which data was a key operational element would successfully deploy and run data science systems at scale.


Pandemic and fall from grace

Over the years, this low-impact, high-drain dynamic led to data science falling out of favor. The COVID pandemic seemed to deliver a death blow to the Berlin data science community, with many data scientists being made redundant.

This played out differently in larger markets and companies, where I saw more mature setups heavily relying on machine learning. However, from the perspective of most software Mittelstand (a German term for medium-sized enterprises), the technology was seen as a nice-to-have, not a must-have.

Suddenly, with the release of ChatGPT, much of the knowledge previously required to operate machine learning became obsolete; the only thing now needed was an API key. This dropped the barrier to entry to the floor and created a need for new tools built around these APIs.

Tools like Langchain met this need perfectly, enabling everyone to interact with their data.

ML to LLMOps

A question arises about how we should approach LLM tooling. Ignoring previous knowledge and inventing new paradigms (agents come to mind) as if in a vacuum can be counterproductive. Re-inventing categories should be done cautiously; history shows that overdoing it can lead to failure.

A recently published article by Peter Zhegin, angel investor and Professor of Neuroscience at U.C. London, effectively maps out the elements of the MLOps stack that are ripe for disruption, suggesting which ones might be impacted:

  1. Vector Stores: The authors argue that data storage and vector stores will be acquired by large players, but that differentiation still may be possible in the data space. They state that "A realistic way to differentiate might be to push for real-time vectorization while finding the best way to use traditional databases and feature stores (relational/NoSQL) together with vector DBs."
  2. Feature Storage: The authors note that "Prompt engineering does not involve traditional training but allows one to change the model's behavior during inference by creating appropriate instructions or queries. This ‘training without training’ presents an interesting opportunity outside the MLOps pipeline."

Feature Stores and the Next Steps

The evolution of the MLOps stack signals a need for a new type of feature store that enables in-context learning.

Since adapting LLMs will increasingly happen at inference time rather than through training, we need a system to interact with and manage the data points fed to the LLM at scale. This implies a need for a feature store that brings more determinism to LLM outputs, enabling the integration of business processes and practices into data that captures the context of an enterprise or organization.

An example of such a use case would be using a set of documents from different departments, enabling the LLM to understand the relationships between these documents and their individual parts.

This effort often requires humans to provide the base rules for how the LLM should interact with the information, leading to the creation of what is commonly referred to as a RAG (Retrieval Augmented Generation) pipeline.
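For illustration, here is a minimal, generic RAG sketch in Python; it is not tied to any specific product, the documents and model names are placeholders, and it assumes the openai package (v1+) with an OPENAI_API_KEY set:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

documents = [
    "The marketing team launched the Q3 campaign in July.",
    "Legal updated the data processing agreement in August.",
]

def embed(texts):
    # OpenAI embeddings are unit-normalized, so a dot product behaves like cosine similarity.
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in response.data])

doc_vectors = embed(documents)

def answer(question: str) -> str:
    query_vector = embed([question])[0]
    best_doc = documents[int(np.argmax(doc_vectors @ query_vector))]  # retrieve the closest document
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer using only this context:\n{best_doc}"},
            {"role": "user", "content": question},
        ],
    )
    return completion.choices[0].message.content

print(answer("When did the Q3 campaign launch?"))
```

In practice, a vector or graph store replaces the in-memory array once the corpus grows, which is exactly where the semantic layer discussed next comes in.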


Recently, we have been able to combine graph and vector data stores to create a semantic layer on top of naive RAG. This layer has been a major step towards encoding rules into an in-context learning pipeline.

In his recent blog post, co-founder of WhyHow.AI, Chia Jeng Yang, explained what a typical RAG pipeline looks like. He also introduced Graph Ops and Vector Ops as new elements of the RAG stack which can lead to more stable retrieval patterns.

[Figure: a typical RAG pipeline, extended with Graph Ops and Vector Ops]

The argument Zhegin made a few months ago is now taking shape. We are seeing feature stores evolve into tools that manage vector and graph stores.

We are still in the early stages, though. As Jon Turrow of Madrona Ventures suggests, the next generation of AI agent infrastructure—what Chia refers to as Graph Ops—will be a personalization layer.


I believe that these terms are interchangeable and that a new in-context Feature Store, Vector and Graph Ops, and personalization layers are essentially the same thing. Moreover, it’s my belief that Vector and Graph Ops are not differentiating categories in and of themselves.

The challenge is, thus, not connecting Vector and Graph stores or giving a RAG system a 10% performance boost.

The main issues still remain:

[Figure: the main open problems with LLM pipelines]

The challenge and solution lie in creating a new type of probabilistic data engine—one with an interface as simple as SQL, but which can retrieve and structure information in real-time, optimizing what we feed the LLM based on solid evaluations.


The cognitive sciences, which strive to make sense of the best computing engine we know of, our mind, may offer us clues on how to move forward.

After all, we process, store, and retrieve data from our mental lexicon with ease, with inherent personalization and dynamic data enrichment.

I believe that understanding the way our mind carries out these processes may allow us to replicate them in machine learning.

With human language as the new SQL and cognitive theories as inspiration, the next generation of tooling is still on the horizon.

Cognee's new website

We are excited to announce the launch of Cognee.ai.

Highlights: you can now book a discussion and schedule a consultation directly through our website. Check it out at www.cognee.ai and book a discussion with our experts.

We'd be happy to hear your feedback and discuss GraphRAG and more.

Cognee v0.1.4

Cognee - release v0.1.4

New Features

Enhanced Text Processing Capabilities:

Customizable models for classification, summarization, and labeling have been introduced to extend the versatility of our text analytics.

Better Graph Integration and Visualization:

We have revamped our graph logic and introduced comprehensive support for multiple graph database types, including but not limited to Neo4j and NetworkX, enhancing integration and scalability. New functions for graph rendering, color palette generation, and dynamic graph visualization help you better visualize data relationships and structures.
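As a rough illustration of the kind of graph rendering this enables, here is a minimal NetworkX/Matplotlib sketch; the graph content is made up and this is not cognee's own rendering API:

```python
import matplotlib.pyplot as plt
import networkx as nx

# Toy graph standing in for a document's extracted structure.
G = nx.DiGraph()
G.add_edge("document", "summary", relation="summarized_as")
G.add_edge("document", "finance", relation="classified_as")
G.add_edge("summary", "Q3 revenue", relation="mentions")

pos = nx.spring_layout(G, seed=42)
nx.draw(G, pos, with_labels=True, node_color="#87CEFA", node_size=1800, font_size=8)
nx.draw_networkx_edge_labels(G, pos, edge_labels=nx.get_edge_attributes(G, "relation"))
plt.show()
```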

DSPy Module:

You can now train your graph-generation query on a dataset and visualize the results with the DSPy module.
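As a hedged sketch of what optimizing a graph-generation step with DSPy can look like (the signature, metric, trainset, and model name below are illustrative assumptions, not cognee's actual code, and the DSPy API may differ between versions):

```python
import dspy
from dspy.teleprompt import BootstrapFewShot

dspy.settings.configure(lm=dspy.OpenAI(model="gpt-4o-mini"))

class ExtractTriples(dspy.Signature):
    """Extract (subject, relation, object) triples from text."""
    text = dspy.InputField()
    triples = dspy.OutputField(desc="semicolon-separated triples")

program = dspy.Predict(ExtractTriples)

def triple_overlap(example, prediction, trace=None):
    # Crude metric: the expected triple must appear in the prediction.
    return example.triples.lower() in prediction.triples.lower()

trainset = [
    dspy.Example(text="Berlin is the capital of Germany.",
                 triples="Berlin; capital_of; Germany").with_inputs("text"),
]

optimized = BootstrapFewShot(metric=triple_overlap).compile(program, trainset=trainset)
print(optimized(text="Cognee was founded in Berlin.").triples)
```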

Async Support:

Added asyncio support in various modules to improve performance and system efficiency, reducing latency and resource consumption.

Enhancements

Infrastructure Upgrades:

Our infrastructure has been significantly upgraded to support a wider range of models and third-party providers, ensuring compatibility and performance. We have also improved configuration settings for a more robust and adaptive operational environment.

Graph and Logic Improvements:

  • Improved Neo4j Integration: Enhancements to our Neo4j graph database integration for better performance and stability.

  • Semantic Links and Node Logic: We have improved the semantic linkage between nodes and enhanced the node logic for a more intuitive and powerful user experience.

Refactor

Asynchronous Module Updates:

Various modules have been updated to use asynchronous operations to enhance the responsiveness and scalability of our systems.

Documentation

Enhanced Documentation:

We have extensively added to and reorganized our documentation.

Bug Fixes

Data Handling and Logic Improvements:

Fixed several inconsistencies in data handling and improved the logic flow in text input processing.

Cognee - release v0.1.0

Preface

In a series of posts, we explored issues with RAG and how we can build a new infrastructure stack for the world of agent networks.

To restate the problem, borrowing the phrase Microsoft used:

  • Baseline RAG performs poorly when asked to understand summarized semantic concepts holistically over large data collections or even singular large documents.

In the previous blog post we explained how developing a data platform and a memory layer for LLMs was one of our core aims.

To do that more effectively, we turned cognee into a Python library, making it easier to use and to draw inspiration from the OSS community.

Improved memory architecture

[Figure: improved memory architecture]

With the integration of Keepi.ai, we encountered several challenges that made us reassess our strategy. Among the issues we’ve identified were:

  • The decomposition of user prompts into interconnected elements proved overly granular, leading to data management difficulties on load and retrieval.

  • A recurring problem was the near-identical decomposition pattern for similar messages, which resulted in content duplication and an enlarged user graph. Our takeaway was that words and their interrelations represent only a fragment of the broader picture. We need to be able to guide the set of logical connections and make the system dynamic, so that the data models can be adapted and adjusted to each particular use case. What works for e-commerce transaction handling might not work for an AI vertical creating PowerPoint slides.

  • The data model, encompassing Long-Term, Short-Term, and Buffer memory, proved both limited in scope and rigid, lacking the versatility to accommodate diverse applications and use cases. Just collecting all elements from all memories seemed naive, while getting certain nodes with classifiers did not add enough value.

  • The retrieval of the entire buffer highlighted the need for improved buffer content management and a more adaptable buffer structure. We conceptualized the buffer as the analogue of human working memory and recognized the need to better manage the stored data.

Moving forward, we have adopted several new strategies, features, and design principles:

Propositions:

Defined as atomic expressions within a text, each proposition encapsulates a unique factoid, conveyed in a succinct, standalone natural language format. We employ Large Language Models (LLMs) to break down text into propositions and link them, forming graphs with propositions as nodes and their connections as edges. For example, "Grass is green" and "2 + 5 = 5" are propositions; the first has the truth value "true" and the second "false". The inspiration was found in the following paper: https://arxiv.org/pdf/2312.06648.pdf
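A minimal sketch of this decomposition step, assuming the openai client and a JSON response format; the prompt and schema here are illustrative, not cognee's internal implementation:

```python
import json
import networkx as nx
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

PROMPT = (
    "Decompose the text into atomic propositions and the relations between them. "
    "Return JSON with 'propositions' (list of strings) and "
    "'edges' (list of [source_index, target_index, relation])."
)

def text_to_proposition_graph(text: str) -> nx.DiGraph:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[{"role": "system", "content": PROMPT},
                  {"role": "user", "content": text}],
    )
    payload = json.loads(response.choices[0].message.content)
    graph = nx.DiGraph()
    for i, proposition in enumerate(payload["propositions"]):
        graph.add_node(i, text=proposition)  # each proposition becomes a node
    for source, target, relation in payload["edges"]:
        graph.add_edge(source, target, relation=relation)  # connections become edges
    return graph
```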

Multilayer Graph Network:

A cognitive multilayer network is both a quantitative and an interpretive framework for exploring the mental lexicon, the intricate cognitive system that stores information about known words and concepts.

The mental lexicon is the component of the human language faculty that contains information regarding the composition of words.

Utilizing LLMs, we construct layers within the multilayer network to house propositions and their interrelations, enabling the interconnection of different semantic layers and the cross-layer linking of propositions. This facilitates both the segmentation and accurate retrieval of information.

For example, if "John Doe" authored two New York Times cooking articles, we could extract an "ingredients" layer when needed, while also easily accessing all articles by "John Doe".
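A toy sketch of how such layer-tagged nodes can be sliced and cross-referenced with NetworkX; the node and layer names are invented to mirror the "John Doe" example above:

```python
import networkx as nx

G = nx.MultiDiGraph()
G.add_node("article_1", layer="articles", author="John Doe")
G.add_node("article_2", layer="articles", author="John Doe")
G.add_node("basil", layer="ingredients")
G.add_node("tomato", layer="ingredients")
G.add_edge("article_1", "basil", relation="mentions")
G.add_edge("article_2", "tomato", relation="mentions")

# Extract only the "ingredients" layer...
ingredients = [n for n, d in G.nodes(data=True) if d.get("layer") == "ingredients"]

# ...or cross layers: everything mentioned in John Doe's articles.
articles = [n for n, d in G.nodes(data=True) if d.get("author") == "John Doe"]
mentioned = {target for article in articles for _, target in G.out_edges(article)}
print(ingredients, mentioned)
```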

We used concepts from psycholinguistics described here: https://arxiv.org/abs/1507.08539

Data Loader:

It’s vital that we address the technical challenges associated with Retrieval-Augmented Generation (RAG), such as metadata management, context retrieval, knowledge sanitization, and data enrichment.

The solution lies in a dependable data pipeline capable of efficiently and scalably preparing and loading data in various formats from a range of different sources. For this purpose, we can use 'dlt' as our data loader, gaining access to over 28 supported data sources.
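A minimal sketch of what loading documents through dlt can look like; the pipeline, destination, and table names are placeholders rather than cognee's internal configuration:

```python
import dlt

pipeline = dlt.pipeline(
    pipeline_name="document_ingestion",
    destination="duckdb",          # any supported destination works here
    dataset_name="raw_documents",
)

documents = [
    {"id": 1, "department": "marketing", "text": "Q3 campaign summary ..."},
    {"id": 2, "department": "legal", "text": "Updated data processing terms ..."},
]

load_info = pipeline.run(documents, table_name="documents")
print(load_info)  # dlt reports what was loaded and where
```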

To enhance the Pythonic interface, we streamlined the use of cognee into three primary methods. Users can now execute the following steps:

  • cognee.add(data): This method is used to input and normalize the data. It ensures the data is in the correct format and ready for further processing.
  • cognee.cognify(): This function constructs a multilayer network of propositions, organizing the data into an interconnected, semantic structure that facilitates complex analysis and retrieval.
  • cognee.search(query, method='default'): The search method enables the user to locate specific nodes or vectors, or to generate summaries within the dataset, based on the provided query and the chosen search method. We employ a combination of search approaches, each one relying on the technology implemented by vector data stores and graph stores.
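Putting the three methods together, a minimal usage sketch looks roughly like this; the exact signatures, and whether the calls are synchronous or async, may differ between cognee versions:

```python
import cognee

text = "Cognee builds a semantic memory layer for LLM applications."

cognee.add(text)                                            # input and normalize the data
cognee.cognify()                                            # build the multilayer proposition network
results = cognee.search("memory layer", method="default")   # query the constructed network
print(results)
```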

Integration and Workflow

The integration of these three components allows for a cohesive and efficient workflow:

Data Input and Normalization:

Initially, cognee.add is employed to input the data. During this stage, a dlt loader operates behind the scenes to process and store the data, assigning a unique dataset ID for tracking and retrieval purposes. This ensures the data is properly formatted and normalized, laying a solid foundation for the subsequent steps.

Creation of Multilayer Network:

Following data normalization, cognee.cognify takes the stage, constructing a multilayer network from the propositions derived from the input data. The network is created using an LLM-as-a-judge approach, with specific prompts that ask for a set of relationships to be created. This approach results in a set of layers and relationships that represent the document.
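The prompt behind this step could be structured roughly as follows; this skeleton is illustrative and not cognee's actual prompt:

```python
# Hypothetical prompt skeleton for the LLM-as-a-judge graph construction step.
GRAPH_PROMPT = """
You are given a set of propositions extracted from a document.
1. Group the propositions into semantic layers (for example: entities, events, claims).
2. For every pair of related propositions, emit a relationship as
   (source_id, relation_type, target_id).
3. Return the layers and relationships as JSON.
"""
```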

Data Retrieval and Analysis

The final step involves cognee.search, where the user can query the constructed network to find specific information, analyze patterns, or extract summaries. The flexibility of the search function allows searching for content labels, summaries, and nodes, as well as retrieving data via similarity search. We also enable combining methods, which brings together the benefits of different search approaches.

What’s next

We're diligently developing our upcoming features, with key objectives including:

  1. Adding audio and image support
  2. Improving search
  3. Adding evals
  4. Adding local models
  5. Adding dspy

To keep up with the progress, explore our implementation on GitHub and, if you find it valuable, consider starring it to show your support.