Summarization with Retreival Augmented Generation (RAG) AI workflow

Architecture

Introduction for the architecture of this solution pattern.

1. Common Challenges

This solution pattern address the following challenges:

  • Integrating private knowledge into a generative AI workflow.

  • Capturing and syncronizing near real-time stock exchange data makes periodic fine tuning of LLMs prohibitive.

  • Exposing private data sources via externally hosted API services Hosting AI services raises security and privacy concerns for most enterprises.

3. An in-depth look at the solution’s architecture

A closer look at the services and components that make up this solution pattern.

3.1. Data Flow

dataflow

3.1.1. Ingest Engine

Data flow begins at the source with stock information made available via a RESTful API service hosted by Alpha Vantage. The ingest engine consisting of two Camel services takes care of keeping data fresh in the Weaviate vector database. The ingest processes are highly configurable and perform filtering and discarding of invalid or null data. As stock symbol information is read in, vector embeddings are generated and stored in the Weaviate vector database along with the financial entities for each stock symbol. An example json record is available in the AlphaVantage API documentation.

3.1.2. Vector Database

The vector database used is Weaviate. Weaviate is a highly performant and scalable open-source vector database that simplifies the development of AI applications. Built-in vector and hybrid search, easy-to-connect machine learning models, and a focus on data privacy enable developers of all levels to build, iterate, and scale AI capabilities faster.

3.1.3. Machine Learning (ML) models

This solution pattern makes use of two ML models, an enbeddings model (all-minilm) and an LLM (granite3-dense:8b) which is configurable at run time. The enbeddings model is used to generate vector embeddings for each stock symbol and the LLM summarizes the stock information. Granite a series of LLMs developed by IBM, specifically designed for enterprise applications, focusing on business use cases like code generation, summarization, and classification, with a strong emphasis on security and data privacy, all while being open-source under the Apache 2.0 license.

rag architecture

3.2. Architecture

3.2.1. Ingest Engine

The ingest engine consists of a caching proxy and syncronizer services. The caching proxy is a Camel service that caches the stock information from the Alpha Vantage API and the syncronizer service is a Camel service that synchronizes the data with the Weaviate vector database. The ingest engine is deployed on Openshift using a standard source-to-image workflow. Details can be found in the caching proxy and syncronizer repositories.

3.2.2. Weaviate vector database

The Weaviate vector database is installed using the helm installer which allows for enterprise features to be enabled including API key token authentication and data sharding. Weaviate’s cloud-native design supports horizontal scaling and efficient resource consumption, allowing it to handle large volumes of data and user requests easily. Interested readers are encouraged to find out more about the benefits of hosting Weaviate on Openshift.

3.2.3. Ollama Model Server

Ollama is a popular and easy to use platform to host and serve LLMs. It supports a number of operating systems and provides good integration and compatibility with the open source ecosystem including Weaviate. Ollama is deployed on Openshift using a standard source-to-image workflow. Details can be found in this repository.

3.2.4. User Interface

The UI is written in Python using the Gradio framework and is deployed using Openshift source-to-image workflow. Example queries are presented to the user as pushbuttons to perform semantic and generative queries. Users may also enter custom queries and prompts.

3.2.5. Developer IDE

Provided by Red Hat Openshift DevSpaces, this IDE allows developers to work with the codebase and test their changes in a live environment.

4. About the Technology Stack

Weaviate is an open-source vector database that makes it easy to build AI applications. Ollama is a popular and easy to use platform to host and serve LLMs.