Title: Optimize RAG with AI Agents & Vector Databases
Resource URL: https://www.youtube.com/watch?v=Yq29bZ8Hlrc
Publication Date: 2025-04-30
Format Type: Video
Reading Time: 33 minutes
Contributors: David Levy
Source: IBM Technology (YouTube)
Keywords: Retrieval-Augmented Generation; AI Agents; Vector Database; CrewAI Framework; watsonx.ai
Job Profiles: Backend Developer; Full Stack Developer; Chief Information Officer (CIO); Machine Learning Engineer; Chief Technology Officer (CTO)

Synopsis: In this video, IBM advisory technology engineer David Levy explains a multi-agent approach to improving Retrieval-Augmented Generation (RAG) pipelines by combining CrewAI agents and ChromaDB for accurate query classification and response generation.

Takeaways:
- Classifying user queries into clear categories helps reduce irrelevant matches and improves the accuracy of context retrieved from databases.
- Assigning distinct tasks to different agents allows developers to isolate logic and debug or expand system functions more easily.
- Connecting agent tasks to custom tools enables real-time querying of vector databases based on classification results.
- Using a separate language model instance for each agent role allows you to fine-tune behavior by adjusting parameters such as token count or temperature.
- Borrowing prompt templates from tested implementations helps generate cleaner, more relevant responses without extensive trial and error.

Summary: This technical walkthrough demonstrates how to build a robust Retrieval-Augmented Generation (RAG) chatbot by integrating multiple specialized AI agents using the CrewAI framework and watsonx.ai LLMs. The solution addresses a common issue in RAG pipelines: retrieving irrelevant context from large vector databases. The presenter proposes dedicated agents for each step (query categorization, document retrieval, and response generation), linked through a sequential crew process.
First, a categorization agent uses a fine-tuned prompt to classify incoming queries into one of three domains: technical, billing, or account. This classification enables targeted retrieval from isolated ChromaDB collections, reducing noisy or semantically similar but irrelevant matches. The categorization output is passed to a retrieval agent equipped with a tool function, which queries the corresponding vector collection using the embedded input. The third agent, responsible for natural language generation, uses both the original query and the retrieved context. A pre-built prompt from Watson Studio's RAG accelerator is interpolated with these inputs to produce a user-facing response. Each agent is powered by a separate instance of the Granite 3 8B model, with a configuration tailored to its task's complexity. The project architecture includes a FastAPI backend, a React-based UI, and an Express server. While the UI is not the focus, the presenter encourages customization using IBM's Carbon Design System. The final application handles category identification, vector-based retrieval, and prompt-driven generation, showing how multi-agent collaboration can significantly boost RAG pipeline accuracy and maintainability.

Content:

## Overview

This tutorial demonstrates how to build a multi-agent retrieval-augmented generation (RAG) application by integrating AI agents with a vector database. Participants walk through cloning a repository, configuring both front-end and back-end environments, and implementing three sequential AI agents (query categorization, context retrieval, and response generation) using the CrewAI framework and IBM watsonx.ai.

## Repository Setup

### Cloning and Directory Structure

1. Clone the repository to your local machine.
2. Navigate to the root directory, which contains two primary folders:
   - **UI**: A React TypeScript application with an accompanying Express TypeScript server.
     It uses Carbon Design components for rapid and visually appealing interface development.
   - **API**: A Python FastAPI service responsible for orchestrating the AI agents and interacting with the vector database.

### UI Environment Configuration

1. Install the root dependencies, then run the provided setup script.
2. Copy `client/.env.example` to `client/.env`, and likewise `server/.env.example` to `server/.env`.
3. (Optional) Customize branding variables such as `APPLICATION_NAME` in `client/.env` (e.g., "Agents in Action!").
4. Although UI modifications are optional for this session, exploring Carbon Design React components is encouraged to refine the interface later.

## API Environment Configuration

### Virtual Environment and Dependencies

1. In the `api` directory, create a Python virtual environment (e.g., `python -m venv aiagentic`) and activate it.
2. Install the dependencies defined in `requirements.txt`, including CrewAI and the watsonx.ai SDK.
3. Copy `.env.example` to `.env` in the API directory.

### IBM watsonx.ai Credentials

1. In IBM Cloud, open your watsonx.ai resource's Prompt Lab.
2. Use the "View code" cURL snippet to extract the base URL and project ID.
3. In IBM Cloud's Access (IAM) section, create an API key and copy its value.
4. Populate the following environment variables in `api/.env`:
   - `WATSON_URL`
   - `WATSON_PROJECT_ID`
   - `WATSON_API_KEY`

## Running the Services

1. In separate terminal windows, start each service:
   - FastAPI: `uvicorn server:app --reload`
   - Express server (in `ui/server`)
   - React client (in `ui/client`)
2. Open your browser to view the chatbot interface.

## Building the Multi-Agent Pipeline

The core of this tutorial is the `agentic` route in `server.py`, which orchestrates three agents using the CrewAI framework.

### 1. Query Categorization Agent

1. **LLM Configuration**: Instantiate a CrewAI LLM using the IBM watsonx.ai model (e.g., "watsonx.granite-3.8b"), with `temperature=0.7` and `max_tokens=50`.
2.
**Agent Definition**: Create an `Agent` with the following attributes:
   - `role`: "Collection Selector"
   - `goal`: Analyze user queries and determine the appropriate ChromaDB collection.
   - `backstory`: Expert in query classification and domain routing.
   - `verbose=True`, `allow_delegation=False`, `max_iterations=1`
3. **Task Assignment**: Define a CrewAI `Task` that prompts the agent to choose one of the "technical," "billing," or "account" categories and to return a JSON object conforming to a Pydantic model (`CategoryResponse`).
4. **Crew Orchestration**: Assemble a `Crew` with the single categorization agent and task in a `sequential` process, then invoke `crew.kickoff()` to obtain the classification result.

### 2. Context Retrieval Agent

1. **LLM Configuration**: Clone the first LLM setup but increase `max_tokens` to accommodate longer context queries (e.g., `max_tokens=1000`).
2. **Tool Definition**: Implement a Python function `query_collection_tool(category: str, query: str) -> dict` that:
   - Embeds the user query using watsonx.ai embeddings.
   - Queries the ChromaDB collection named after the category.
   - Returns the most relevant document snippets.
3. **Agent & Task**: Create a new CrewAI agent ("Retriever Agent") that employs this tool. The corresponding task invokes the tool with the category result from the first agent and the original user query; the retrieved context becomes available to downstream agents.
4. **Crew Extension**: Add the retriever agent and task to the existing crew, preserving the `sequential` process. The crew now outputs both the category and the retrieved context.

### 3. Response Generation Agent

1. **LLM Configuration**: Instantiate a third LLM with a more generous `max_tokens` (e.g., `max_tokens=512`) to produce comprehensive responses.
2. **Prompt Template Tool**: Create a function `generate_response_tool(context: str, query: str) -> str` that interpolates the retrieved context and user query into a pre-built prompt template.
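   At its core, such a tool is plain string interpolation. A minimal sketch follows; the template text here is a generic placeholder (the actual prompt would come from the RAG accelerator), and in the pipeline the function would additionally be registered as a CrewAI tool:

   ```python
   # Sketch of the prompt-template tool. The template below is a generic
   # placeholder, not the watsonx.ai RAG accelerator's actual prompt.
   PROMPT_TEMPLATE = """Answer the question using only the provided context.
   If the context does not contain the answer, say so.

   Context:
   {context}

   Question: {query}

   Answer:"""

   def generate_response_tool(context: str, query: str) -> str:
       """Interpolate the retrieved context and the user query into the template."""
       return PROMPT_TEMPLATE.format(context=context, query=query)
   ```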
   The template can be sourced from IBM watsonx.ai's RAG accelerator library for best practices.
3. **Agent & Task**: Define the "Generation Agent" with its LLM and assign a task that calls the prompt template tool, expecting a Pydantic schema (`FinalResponse`) with fields `category` and `response`.
4. **Crew Finalization**: Incorporate the generation agent and its task into the crew. Upon `crew.kickoff()`, the pipeline returns structured JSON containing both the query category and the natural language answer.

## Demonstration and Logs

When the user submits a question via the UI, the API:

1. Classifies the question category and logs detailed agent reasoning.
2. Queries the corresponding ChromaDB collection and retrieves relevant text snippets.
3. Generates a polished response using the context and a custom prompt.

All three stages output structured JSON, which the UI consumes to render a dynamic chatbot experience.

## Extension Opportunities

- Implement an agent to perform web searches for out-of-scope queries.
- Add an HTML-formatting agent to enrich responses before sending them to the client.
- Experiment with alternative LLM models, temperature settings, and prompt structures.

## Conclusion

This tutorial has illustrated how to architect a scalable, multi-agent RAG chatbot by:

- Defining specialized agents with clear roles, goals, and tools.
- Using CrewAI to manage agent orchestration and inter-agent communication.
- Leveraging IBM watsonx.ai for model hosting, embeddings, and prompt accelerators.

Participants are encouraged to explore further use cases, refine UI elements, and contribute enhancements to the repository.
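As a closing illustration, the category-routed retrieval at the heart of the pipeline can be sketched in plain Python. This is a toy stand-in, not the tutorial's implementation: the collection names mirror the three categories, but the in-memory documents, 3-dimensional vectors, and cosine ranking substitute for ChromaDB collections and watsonx.ai embeddings.

```python
import math

# Toy per-category "collections"; in the real pipeline these are ChromaDB
# collections named "technical", "billing", and "account", and the vectors
# come from watsonx.ai embeddings rather than hand-written placeholders.
COLLECTIONS = {
    "billing": [
        ("Invoices are issued monthly.", [0.9, 0.1, 0.0]),
        ("Refunds take 5 business days.", [0.8, 0.2, 0.1]),
    ],
    "technical": [
        ("Restart the router to reset the modem.", [0.1, 0.9, 0.2]),
    ],
    "account": [
        ("Passwords can be reset from the profile page.", [0.0, 0.2, 0.9]),
    ],
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def query_collection_tool(category: str, query_vec, top_k: int = 1):
    """Return the top_k snippets from the collection matching `category`."""
    docs = COLLECTIONS[category]
    ranked = sorted(docs, key=lambda d: cosine(d[1], query_vec), reverse=True)
    return [text for text, _ in ranked[:top_k]]
```

Because the first agent restricts the search to one collection, semantically similar documents from the other domains can never appear in the ranking, which is exactly the noise-reduction effect the tutorial describes.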