Video
33 minutes
Apr 30, 2025


Optimize RAG with AI Agents & Vector Databases

In this video, IBM advisory technology engineer David Levy explains a multi-agent approach to improving Retrieval-Augmented Generation (RAG) pipelines by combining CrewAI agents and ChromaDB for accurate query classification and response generation.

Retrieval-Augmented Generation · AI Agents · Vector Database · CrewAI Framework · watsonx.ai

Takeaways

  • Classifying user queries into clear categories helps reduce irrelevant matches and improves the accuracy of context retrieved from databases.
  • Assigning distinct tasks to different agents allows developers to isolate logic and debug or expand system functions more easily.
  • Connecting agent tasks to custom tools enables real-time querying of vector databases based on classification results.
  • Using a separate language model instance for each agent role allows you to fine-tune behavior by adjusting parameters such as token count or temperature.
  • Borrowing prompt templates from tested implementations helps generate cleaner, more relevant responses without the need for extensive trial and error.
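The fourth takeaway, giving each agent role its own LLM instance and settings, can be sketched as plain per-role parameter maps. The role names and values below are hypothetical, not taken from the video:

```python
# Hypothetical per-role generation settings; the actual values and LLM
# client used in the video are not specified here.
ROLE_PARAMS = {
    "categorizer": {"temperature": 0.0, "max_new_tokens": 10},   # short, deterministic label
    "retriever":   {"temperature": 0.0, "max_new_tokens": 50},
    "generator":   {"temperature": 0.7, "max_new_tokens": 500},  # longer, freer answer
}

def params_for(role: str) -> dict:
    """Return a copy so tuning one agent never leaks into another."""
    return dict(ROLE_PARAMS[role])
```

Copying on access keeps each agent's configuration isolated, which is the point of running separate instances.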

Summary

This technical walkthrough demonstrates how to build a robust Retrieval-Augmented Generation (RAG) chatbot by integrating multiple specialized AI agents using the CrewAI framework and watsonx.ai LLMs. The solution addresses a common issue in RAG pipelines: retrieving irrelevant context from large vector databases. The presenter proposes dedicated agents for each step (query categorization, document retrieval, and response generation), linked through a sequential crew process.
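The sequential flow can be mirrored in a stdlib-only sketch. In the real project each step is a CrewAI agent with its own task and retrieval hits ChromaDB; all function names, keyword lists, and stub texts here are illustrative assumptions:

```python
# Stdlib-only sketch of the three-step sequential pipeline described above.

def categorize(query: str) -> str:
    """Step 1: map the query to one of three domains (stub keyword rules)."""
    q = query.lower()
    if any(w in q for w in ("invoice", "charge", "payment")):
        return "billing"
    if any(w in q for w in ("password", "login", "profile")):
        return "account"
    return "technical"

def retrieve(category: str, query: str) -> str:
    """Step 2: fetch context from the collection for that category (stubbed)."""
    fake_collections = {
        "billing": "Refunds are processed within 5 business days.",
        "account": "Passwords can be reset from the profile page.",
        "technical": "Restart the service after changing the config.",
    }
    return fake_collections[category]

def generate(query: str, context: str) -> str:
    """Step 3: combine query and retrieved context into a response (stubbed)."""
    return f"Q: {query}\nContext: {context}\nA: ..."

def run_pipeline(query: str) -> str:
    """Chain the steps in order, as the sequential crew process does."""
    return generate(query, retrieve(categorize(query), query))
```

Because each step is a separate unit, logic stays isolated and any stage can be debugged or swapped independently, which is the maintainability argument made in the video.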

First, a categorization agent uses a fine-tuned prompt to classify incoming queries into one of three domains: technical, billing, or account. This classification enables targeted retrieval from isolated ChromaDB collections, reducing noisy or semantically similar but irrelevant matches. The categorization output is passed to a retrieval agent equipped with a tool function, which queries the corresponding vector collection using embedded input.
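A toy demonstration of why isolated per-category collections reduce noise: the search only scores documents from the collection the classifier chose, so semantically similar documents in other domains can never surface. The document texts are made up, and word overlap stands in for real embedding similarity:

```python
# Pure-Python stand-in for querying isolated ChromaDB collections;
# all document texts below are invented for illustration.
COLLECTIONS = {
    "technical": ["Restart the agent service after editing config.yaml.",
                  "GPU memory errors usually mean the batch size is too large."],
    "billing":   ["Invoices are emailed on the first of each month.",
                  "Duplicate charges are refunded within 5 business days."],
    "account":   ["Reset your password from the profile settings page.",
                  "Accounts lock after five failed login attempts."],
}

def search(category: str, query: str, k: int = 1) -> list:
    """Rank docs in ONE collection by word overlap (embedding stand-in)."""
    q = set(query.lower().split())
    docs = COLLECTIONS[category]
    return sorted(docs, key=lambda d: -len(q & set(d.lower().split())))[:k]
```

With real embeddings the scoring changes, but the routing principle is the same: classification first, then retrieval scoped to a single collection.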

The third agent, responsible for natural language generation, uses both the original query and the retrieved context. A pre-built prompt from Watson Studio's RAG accelerator is interpolated with these inputs to produce a user-facing response. Each agent is powered by a separate instance of the Granite 3 8B model, configured to suit its task's complexity.
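The interpolation step amounts to filling a template with the two inputs. The template wording below is a hypothetical stand-in, not the Watson Studio RAG accelerator's actual prompt:

```python
# Hypothetical response-generation prompt; the real accelerator prompt
# is not reproduced here.
PROMPT_TEMPLATE = (
    "You are a support assistant. Answer the question using only the "
    "context below. If the context is insufficient, say so.\n\n"
    "Context:\n{context}\n\nQuestion: {query}\n\nAnswer:"
)

def build_prompt(query: str, context: str) -> str:
    """Interpolate the original query and retrieved context into the prompt."""
    return PROMPT_TEMPLATE.format(context=context, query=query)
```

Starting from a tested template like this is the shortcut the takeaways mention: it avoids much of the trial and error of prompt design from scratch.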

The project architecture includes a FastAPI backend, a React-based UI, and an Express server. While the UI is not the focus, the presenter encourages customization using IBM’s Carbon Design System. The final application handles category identification, vector-based retrieval, and prompt-driven generation, showing how multi-agent collaboration can significantly boost RAG pipeline accuracy and maintainability.

Job Profiles

Chief Technology Officer (CTO) · Machine Learning Engineer · Chief Information Officer (CIO) · Full Stack Developer · Backend Developer

Contributors

BBA
Content rating = B
  • Generally reliable
  • Serves a specific niche/audience
Author rating = B
  • Has professional experience in the subject matter area
Source rating = A
  • Features expert contributions
  • Maintains high editorial standards