Overview
An LLM-powered conversational AI system designed to be embedded into web products as a support and discovery tool. It understands context across a session, retrieves relevant knowledge from a vector database, and responds in the user's language — supporting English and Hindi.
The Problem
Generic chatbot widgets give generic answers. Businesses need an AI assistant that actually knows their product — their FAQs, their docs, their pricing, their edge cases — and can answer specific questions without hallucinating. The challenge is combining the fluency of LLMs with the accuracy of a retrieval system.
How It Works
The system uses a Retrieval-Augmented Generation (RAG) architecture. At setup, the client's knowledge base is chunked and embedded into a vector database. At query time, the user's message is embedded, and the top-k most semantically relevant chunks are retrieved and injected into the LLM's context window alongside a strict system prompt.
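A minimal sketch of that query-time path, with a toy hash-based embedder and an in-memory NumPy index standing in for the real embedding model and vector database; `embed`, `retrieve`, and `build_prompt` are illustrative names, not the production API:

```python
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    # Toy stand-in for a real embedding model: hash words into a
    # fixed-size bag-of-words vector, then L2-normalize it.
    vec = np.zeros(dim)
    for word in text.lower().split():
        vec[hash(word) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Setup phase: chunk embeddings are precomputed and stored with their text.
chunks = [
    "Pricing: the Pro plan costs $29/month and includes 5 seats.",
    "Refunds are available within 14 days of purchase.",
    "The API rate limit is 100 requests per minute per key.",
]
index = np.stack([embed(c) for c in chunks])

def retrieve(query: str, k: int = 2) -> list[str]:
    # Cosine similarity reduces to a dot product on unit vectors.
    scores = index @ embed(query)
    top = np.argsort(scores)[::-1][:k]
    return [chunks[i] for i in top]

def build_prompt(query: str) -> str:
    # Retrieved chunks go into the context window under a strict system prompt.
    context = "\n".join(f"- {c}" for c in retrieve(query))
    return (
        "You are a support assistant. Answer ONLY from the context below; "
        "if the answer is not present, say you don't know.\n\n"
        f"Context:\n{context}\n\nUser: {query}"
    )

print(build_prompt("How much does the Pro plan cost?"))
```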
Key Features
- Session-aware conversation memory (last 10 turns)
- Retrieval-Augmented Generation for factual accuracy
- Automatic language detection and response matching
- Admin dashboard to upload and manage the knowledge base
- Confidence scoring that routes low-confidence responses to human support (see the sketch after this list)
- Streaming responses for instant perceived performance
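A minimal sketch of how the memory window and the confidence gate might fit together; `Session`, `route`, and the 0.55 threshold are assumptions for illustration, though the 10-turn window matches the feature above:

```python
from dataclasses import dataclass, field

MAX_TURNS = 10           # matches the "last 10 turns" memory window
CONFIDENCE_FLOOR = 0.55  # illustrative threshold; tuned per deployment in practice

@dataclass
class Session:
    history: list[tuple[str, str]] = field(default_factory=list)  # (role, text)

    def remember(self, role: str, text: str) -> None:
        self.history.append((role, text))
        # Keep only the most recent turns so the context stays bounded.
        self.history = self.history[-MAX_TURNS:]

def route(answer: str, retrieval_score: float, session: Session) -> str:
    # A low retrieval score suggests the knowledge base lacks an answer,
    # so the conversation is handed to a human rather than risking a
    # hallucinated reply.
    if retrieval_score < CONFIDENCE_FLOOR:
        return "Connecting you with a support agent..."
    session.remember("assistant", answer)
    return answer
```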
Challenges
The hardest problem was the chunking strategy. Naively splitting documents by character count lost semantic context. The solution was a hybrid approach: recursive paragraph-aware splitting with overlap, plus metadata tagging so retrieved chunks carry their source document and section heading.
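One way such a splitter might look, sketched under the assumption of markdown-style `## ` section headings; `chunk_document` and `split_oversized` are hypothetical names, and the real pipeline's tokenizer-aware limits are simplified to character counts here:

```python
def split_oversized(para: str, max_chars: int) -> list[str]:
    # Recurse into sentence boundaries when one paragraph exceeds the limit.
    if len(para) <= max_chars:
        return [para]
    cut = para.rfind(". ", 0, max_chars)
    cut = cut + 1 if cut > 0 else max_chars
    return [para[:cut].strip()] + split_oversized(para[cut:].strip(), max_chars)

def chunk_document(text: str, source: str,
                   max_chars: int = 800, overlap: int = 100) -> list[dict]:
    chunks: list[dict] = []
    heading, buf = "untitled", ""

    def flush() -> str:
        # Emit the buffered chunk with its metadata; return an overlap tail
        # so the next chunk keeps some surrounding context.
        if buf:
            chunks.append({"text": buf, "source": source, "section": heading})
        return buf[-overlap:]

    for para in (p.strip() for p in text.split("\n\n") if p.strip()):
        if para.startswith("## "):   # assumed markdown-style section headings
            flush()
            heading, buf = para.lstrip("# "), ""
            continue
        for piece in split_oversized(para, max_chars):
            if buf and len(buf) + len(piece) > max_chars:
                buf = flush() + "\n\n" + piece
            else:
                buf = f"{buf}\n\n{piece}" if buf else piece
    flush()
    return chunks
```

Carrying the source document and section heading on every chunk also lets the system cite where an answer came from, which makes low-confidence routing decisions easier to audit.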
Outcome
Core RAG pipeline complete and live. Multilingual support active for English and Hindi. Admin dashboard for knowledge base management deployed.