Summarization with Retrieval Augmented Generation (RAG) AI workflow

Workshop

During this workshop you will:

  • Get hands-on experience with a Weaviate enterprise installation.

  • Learn how to use the Weaviate Python client.

  • Learn how Python applications are deployed on OpenShift.

  • Work with the Gradio library to build a user interface.

  • Work with an LLM server on OpenShift.

  • Understand the various components that make up a RAG solution.

1. Installing the workshop environment

1.1. Before getting started

1.1.1. Prerequisites

  • An OpenShift Cluster

    • Version 4.16 or greater

    • A worker node with a minimum of 4 CPUs, 16 GB of memory, and a single NVIDIA T4 GPU with 16 GB of memory (AWS g4dn.xlarge or equivalent).

    • The Web Terminal Operator

  • A client system with the oc CLI, git, and Maven installed (these are used by the commands in this guide).

  • An AlphaVantage API key, if you want to periodically refresh the stock symbol data.

1.2. Installing the environment

The following services on OpenShift are required:

1.2.2. Ollama LLM Server

Deploy the Ollama server:

  • oc apply -k ollama/deploy

Pull the necessary ML models. Log in to the OpenShift console, open a terminal window in the ollama project, and run the following commands.
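The workshop uses the all-minilm vectorizer model and the granite3-dense:8b generative model (configured later in the synchronizer's configmap), so the models can be pulled with:

  • ollama pull all-minilm

  • ollama pull granite3-dense:8b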

1.2.3. Stock Overview Ingestion Engine

Caching Proxy

Follow the instructions below to install the caching proxy in the camel namespace.

  • oc new-project camel

  • git clone https://github.com/joshdreagan/av-caching-proxy.git

  • cd av-caching-proxy

  • cp src/main/jkube/deployment.yml.template src/main/jkube/deployment.yml

  • cp src/main/jkube/configmap.yml.template src/main/jkube/configmap.yml

  • mvn -P openshift clean package oc:deploy

The default values in deployment.yml and configmap.yml should be fine.

Stock Overview Synchronizer

Install the AlphaVantage synchronizer in the camel namespace.

  • git clone https://github.com/joshdreagan/av-overview-sync.git

  • cd av-overview-sync

  • cp src/main/jkube/deployment.yml.template src/main/jkube/deployment.yml

  • cp src/main/jkube/configmap.yml.template src/main/jkube/configmap.yml

  • cp src/main/jkube/secret.yml.template src/main/jkube/secret.yml

  • mvn -P openshift clean package oc:deploy

Configure the src/main/jkube/configmap.yml to use the all-minilm vectorizer model and the granite3-dense:8b generative model.
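A sketch of what the relevant entries might look like; the key names below are hypothetical, so check the template in the repository for the actual ones:

  apiVersion: v1
  kind: ConfigMap
  metadata:
    name: av-overview-sync
  data:
    # Hypothetical key names; the repository template defines the real ones.
    vectorizer.model: all-minilm
    generative.model: granite3-dense:8b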

Add your base64-encoded AlphaVantage API key to the src/main/jkube/secret.yml file.
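The key can be encoded with the standard base64 tool; the -n flag matters, since a trailing newline would otherwise become part of the encoded value:

  • echo -n 'your-alphavantage-api-key' | base64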

1.2.4. Gradio UI and application

From a terminal, obtain the Weaviate API key and use it to create the RAG application.
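If your Weaviate installation stores the admin key in a Kubernetes secret, it can be read with something like the following (the secret name, namespace, and field are assumptions and vary by installation):

  • oc get secret weaviate-api-keys -n weaviate -o jsonpath='{.data.admin-api-key}' | base64 -d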

  • oc new-app python~https://github.com/redhat-na-ssa/demo-ai-weaviate --name rag --context-dir=/src --env WEAVIATE_API_KEY=your_weaviate_admin-api-key

Expose the app with an external route and have fun.

  • oc create route edge --service rag --insecure-policy=Redirect

1.3. Running the workshop

1.3.1. Understanding the use case

Let’s get started by looking at the example IBM stock symbol JSON to explain the use case. Each JSON-formatted data record includes the stock symbol, a company name, and a short description, followed by a number of financial metrics. Imagine that you have been given this data and asked to write up a financial summary. This sounds like a typical task a financial analyst may be asked to perform, and at first glance it seems straightforward, but the data is not structured in a way that is easy to understand. Some of the data is represented as currency, while other values are ratios, percentages, dates, and so on. The task becomes more challenging and time consuming when more than one company must be analyzed, not to mention that the data is semi-realtime and could change several times a day. This is where AI can help. In this workshop, we will make use of a vector database and an LLM to give the analyst a head start on the task at hand.
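An abridged sketch of what such a record looks like (the field names follow the AlphaVantage OVERVIEW response format; the values here are illustrative, not live data):

  {
    "Symbol": "IBM",
    "Name": "International Business Machines",
    "Description": "IBM is a global technology company...",
    "MarketCapitalization": "150000000000",
    "PERatio": "20.5",
    "DividendYield": "0.045",
    "DividendDate": "2024-03-09"
  }

Note the mix of representations: the market capitalization is a currency amount, the P/E ratio is a plain ratio, the dividend yield is a fraction, and the dividend date is a date, yet all of them arrive as JSON strings.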

1.3.2. OpenShift services

Weaviate

At the center of the architecture is the Weaviate vector database. It is deployed as an OpenShift StatefulSet to provide resiliency and performance. The Weaviate vector database is exposed as a RESTful API via the internal service network, allowing neighboring services such as the synchronizer and intelligent query clients to interact with it.
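A minimal sketch of how a client connects to this service, using the v3 Weaviate Python client (the in-cluster hostname is an assumption; adjust it to your namespace and service names):

  import weaviate

  # In-cluster service DNS name (assumption; adjust to your installation).
  client = weaviate.Client(
      url="http://weaviate.weaviate.svc.cluster.local",
      auth_client_secret=weaviate.auth.AuthApiKey(api_key="your_weaviate_admin-api-key"),
  )

  print(client.is_ready())  # True once the database is reachable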

Ingest Engine

The ingest engine consists of a caching proxy and a synchronizer. Once these services are started, the Weaviate vector database is initialized with a collection of 7118 stock symbol objects from a local file cache. As symbols are ingested, they are converted to vectors via the vectorizer model and upserted into Weaviate as embeddings. Once this cold-start sequence is complete, the synchronizer periodically refreshes a number of stock symbols. The list of symbols of interest may be configured at run time. Notice that the entire ingest engine is highly configurable via OpenShift configmaps.
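The real synchronizer is a Camel application, but the upsert step can be sketched with the Python client from the earlier example; the StockOverview class name is an assumption, and the embeddings are computed server-side by the configured vectorizer module:

  # Sketch only: the actual ingest is performed by the Camel synchronizer.
  records = [
      {"symbol": "IBM", "name": "International Business Machines", "description": "..."},
  ]

  with client.batch as batch:
      for record in records:
          # Weaviate vectorizes the object via its configured module
          # (the all-minilm model served by Ollama) and upserts the embedding.
          batch.add_data_object(record, class_name="StockOverview")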

Ollama Model Server

The Ollama model server hosts a vectorizer and a large language model. To provide optimal performance and latency, it is accelerated by GPUs.
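To verify that the models pulled earlier are being served, the Ollama REST API can be queried from inside the cluster (the hostname is an assumption based on the ollama project name):

  • curl http://ollama.ollama.svc.cluster.local:11434/api/tags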

Developer IDE

Code assistance, debugging, and an enhanced developer experience are made possible by OpenShift Dev Spaces.

1.3.3. Running the RAG application

To run the intelligent application, visit the rag route in a web browser. Start near the top of the web page UI by performing a semantic search using terms like "computers" or "commodities" and see the results. Also feel free to try a term of your own. The number of results can be changed using the horizontal slider. Weaviate returns the closest matches, but only the company names appear in the UI.
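Under the hood, this corresponds to a near-text query. A minimal sketch with the v3 Python client, reusing the client and the hypothetical StockOverview class from earlier:

  # Semantic search: Weaviate embeds the query text with the vectorizer
  # model and returns the closest stock records.
  results = (
      client.query.get("StockOverview", ["name"])
      .with_near_text({"concepts": ["computers"]})
      .with_limit(5)  # the horizontal slider in the UI controls this value
      .do()
  )
  print(results["data"]["Get"]["StockOverview"])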

Now choose a prompt template in the bottom half of the UI. This is where the magic happens. The Weaviate SDK fills in the prompt template with the results of the semantic search and performs a generative search using the power of the LLM. Again, experiment with your own prompts and have fun!
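Generative search extends the same query. A sketch with the v3 client; the prompt text and property names are assumptions for illustration:

  # Generative search: each semantic-search hit is fed into the LLM
  # (granite3-dense:8b served by Ollama) through the prompt template.
  results = (
      client.query.get("StockOverview", ["name", "description"])
      .with_near_text({"concepts": ["computers"]})
      .with_limit(3)
      .with_generate(single_prompt="Write a short financial summary of {name}: {description}")
      .do()
  )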