A large private sector bank was triaging thousands of customer emails daily across 20+ product lines entirely by hand. BootLabs deployed an on-premise AI agent that classifies intent and drafts compliant responses — with zero data leaving the bank's network.
A large private sector bank was receiving thousands of customer emails every day across more than 20 product lines: savings accounts, fixed deposits, credit cards, personal loans, home loans, business banking, trade finance, forex, insurance, wealth management, NRI services, and more. Each product carried 30 to 40 distinct customer intents (queries, complaints, service requests, escalations), giving more than 600 labelled intent categories in total. The bank's operations team triaged every email and drafted every response by hand, which led to chronic SLA breaches, inconsistent response quality, and significant burnout among operations staff. There was an added constraint: the bank's data security policy prohibited all cloud LLM APIs. Everything had to run on-premise, within the bank's private data centre, with zero inference traffic leaving the network perimeter. BootLabs designed and deployed a fully on-premise AI agent, covering hierarchical intent classification, retrieval-augmented response drafting, and a human-in-the-loop Salesforce integration, that transformed how the bank's operations team handles email at scale.
Classifying emails across 600+ intents and 20+ products demanded sub-second latency, yet the bank's data security policies required every model inference to stay within its private data centre. No cloud LLM APIs were permitted, so every model had to be deployed, served, and maintained entirely on the bank's own GPU infrastructure.
Drafting compliant, product-accurate email responses at scale required deep product knowledge encoded into prompts and retrieval layers — not just generic LLM output. Each draft had to reflect the correct policy, regulatory language, and product-specific detail for the identified intent, at a volume the operations team could never match manually.
Drafted responses needed to flow into the bank's existing CRM and ticketing workflows, not bypass them. A human-in-the-loop approval step before sending was non-negotiable for compliance and quality control, so the AI agent had to work inside the existing Salesforce-based workflow rather than replace it.
Fine-tuned Mistral and LLaMA models were deployed on the bank's GPU infrastructure using vLLM serving. No inference traffic left the bank's network perimeter at any point. The serving stack was optimised for the latency and throughput requirements of a high-volume email operations environment.
A two-stage hierarchical classifier first identifies the product line from 20+ categories, then resolves the specific customer intent within that product (30 to 40 intents per product). Because each stage chooses among at most a few dozen labels instead of all 600+, this hierarchical approach kept classification accuracy high, where a flat classifier over the full label set would have degraded sharply.
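The two-stage routing can be sketched in Python as below. This is a minimal illustration under stated assumptions, not BootLabs' implementation: the fine-tuned models are replaced by a hypothetical keyword-overlap scorer (`keyword_scorer`), and the product and intent labels (`credit_cards`, `limit_increase_request`, and so on) are made up for the example. The structural point it shows is that stage 2 only ever scores the intents belonging to the product chosen in stage 1.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A scorer maps email text to a score per label. In the deployment described
# above these would be fine-tuned models; a keyword-overlap scorer stands in here.
Scorer = Callable[[str], Dict[str, float]]


def keyword_scorer(keywords: Dict[str, List[str]]) -> Scorer:
    """Hypothetical stand-in model: score each label by how many of its keywords occur."""
    def score(text: str) -> Dict[str, float]:
        words = set(re.findall(r"[a-z]+", text.lower()))
        return {label: float(len(words & set(kws))) for label, kws in keywords.items()}
    return score


@dataclass
class HierarchicalClassifier:
    product_scorer: Scorer              # stage 1: one label per product line (~20)
    intent_scorers: Dict[str, Scorer]   # stage 2: one scorer per product (~30-40 intents each)

    def classify(self, text: str) -> Tuple[str, str]:
        product_scores = self.product_scorer(text)
        product = max(product_scores, key=product_scores.__getitem__)
        # Stage 2 only scores intents belonging to the chosen product, so each
        # decision is over a few dozen labels rather than 600+.
        intent_scores = self.intent_scorers[product](text)
        intent = max(intent_scores, key=intent_scores.__getitem__)
        return product, intent


# Usage with hypothetical labels.
clf = HierarchicalClassifier(
    product_scorer=keyword_scorer({
        "credit_cards": ["card", "credit", "limit"],
        "home_loans": ["mortgage", "home", "emi"],
    }),
    intent_scorers={
        "credit_cards": keyword_scorer({
            "limit_increase_request": ["increase", "limit"],
            "disputed_transaction": ["dispute", "charged", "unauthorised"],
        }),
        "home_loans": keyword_scorer({
            "emi_schedule_query": ["emi", "schedule"],
            "prepayment_request": ["prepay", "prepayment"],
        }),
    },
)
print(clf.classify("Please increase my credit card limit."))  # ('credit_cards', 'limit_increase_request')
```

In a real system each scorer would more likely be a model call returning per-label probabilities, and a low top score could route the email straight to a human reviewer; both are assumptions beyond what this case study describes.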
A RAG pipeline retrieved the product policy documents, regulatory guidelines, and approved response templates relevant to the classified intent. The on-premise LLM used these as context to generate accurate, compliant draft responses grounded in the bank's own policies rather than in generic training data.
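A minimal sketch of the retrieve-then-ground step follows, assuming a hypothetical `Doc` schema in which each policy, regulation, or template is tagged with a product and an intent (`"*"` meaning it applies to every intent of that product). Simple term overlap stands in for the embedding search a production RAG pipeline would more likely use, and the call to the on-premise LLM is deliberately left out: the sketch stops at the grounded prompt that would be sent to it. All document IDs and texts are illustrative.

```python
import re
from dataclasses import dataclass
from typing import List, Set


@dataclass
class Doc:
    doc_id: str
    product: str   # product line the document belongs to
    intent: str    # intent it applies to, or "*" for every intent of the product
    kind: str      # "policy", "regulation" or "template" (hypothetical schema)
    text: str


def terms(text: str) -> Set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(email: str, product: str, intent: str, corpus: List[Doc], k: int = 3) -> List[Doc]:
    """Filter by the classified product/intent, then rank by term overlap.

    Term overlap is a stand-in for embedding similarity (assumption)."""
    candidates = [d for d in corpus if d.product == product and d.intent in (intent, "*")]
    query = terms(email)
    # sorted() is stable even with reverse=True, so ties keep corpus order.
    ranked = sorted(candidates, key=lambda d: len(query & terms(d.text)), reverse=True)
    return ranked[:k]


def build_prompt(email: str, intent: str, docs: List[Doc]) -> str:
    """Assemble a grounded prompt; the on-premise LLM call itself is out of scope."""
    context = "\n\n".join(f"[{d.doc_id} | {d.kind}]\n{d.text}" for d in docs)
    return (
        "Draft a reply to the customer email using ONLY the context below. "
        "If the context does not answer the question, say a specialist will follow up.\n\n"
        f"Classified intent: {intent}\n\nContext:\n{context}\n\n"
        f"Customer email:\n{email}\n\nDraft reply:"
    )


# Usage with hypothetical documents.
corpus = [
    Doc("CC-POL-01", "credit_cards", "limit_increase_request", "policy",
        "Limit increase requests are reviewed against income and repayment history."),
    Doc("CC-TPL-07", "credit_cards", "limit_increase_request", "template",
        "Thank you for your request to increase your credit limit."),
    Doc("CC-REG-02", "credit_cards", "*", "regulation",
        "Card terms must be disclosed in plain language."),
    Doc("CC-TPL-09", "credit_cards", "disputed_transaction", "template",
        "We have logged your dispute about the charge on your credit card."),
    Doc("HL-POL-03", "home_loans", "*", "policy",
        "Home loan prepayment rules are set out in the loan agreement."),
]
email = "Please increase my credit card limit."
docs = retrieve(email, "credit_cards", "limit_increase_request", corpus)
print([d.doc_id for d in docs])  # ['CC-TPL-07', 'CC-POL-01', 'CC-REG-02']
prompt = build_prompt(email, "limit_increase_request", docs)
```

Filtering on the classified intent before ranking is one way to keep a dispute template (`CC-TPL-09`) out of a limit-increase reply even though it shares words with the email.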
The classified intent and drafted response surfaced directly inside each operations agent's existing CRM view via a Salesforce integration. Agents reviewed each draft, edited it if needed, and approved it with a single click. The approval loop also captured structured feedback that fed a continuous fine-tuning pipeline, improving accuracy each quarter.
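The structured feedback could look something like the sketch below. The `ReviewFeedback` record, its `action` labels, and the row format are hypothetical, it assumes reviewers can also correct the classified intent, and the Salesforce side is not modelled. The idea is simply that each reviewed email yields one classification example (labelled with the reviewer-confirmed intent) and one drafting example (targeting the text the reviewer actually sent), exported as JSON Lines for a later fine-tuning run.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class ReviewFeedback:
    email_text: str
    predicted_intent: str
    final_intent: str   # same as predicted_intent unless the reviewer re-labelled it (assumption)
    draft: str          # AI-generated draft shown to the reviewer
    sent_text: str      # text the reviewer actually approved and sent

    @property
    def action(self) -> str:
        if self.final_intent != self.predicted_intent:
            return "relabelled"
        return "approved" if self.sent_text == self.draft else "edited"


def to_training_rows(feedback: List[ReviewFeedback]) -> List[dict]:
    rows = []
    for fb in feedback:
        # Classification example: the reviewer-confirmed intent is the label.
        rows.append({"task": "classify", "input": fb.email_text, "label": fb.final_intent})
        # Drafting example: the human-approved text is the target output.
        rows.append({
            "task": "draft",
            "input": fb.email_text,
            "intent": fb.final_intent,
            "target": fb.sent_text,
            "was_edited": fb.sent_text != fb.draft,
        })
    return rows


def to_jsonl(rows: List[dict]) -> str:
    """One JSON object per line, a common input format for fine-tuning jobs."""
    return "\n".join(json.dumps(row, sort_keys=True) for row in rows)


# Usage with hypothetical reviews.
feedback = [
    ReviewFeedback("Please increase my credit card limit.", "limit_increase_request",
                   "limit_increase_request", "Thank you for your request.",
                   "Thank you for your request."),
    ReviewFeedback("Why was I charged twice?", "limit_increase_request",
                   "disputed_transaction", "Thank you for your request.",
                   "We have logged your dispute."),
]
print([fb.action for fb in feedback])  # ['approved', 'relabelled']
print(to_jsonl(to_training_rows(feedback)))
```

Keeping unedited approvals as training rows (with `was_edited` set to false) is a design choice in this sketch: it records which drafts reviewers accepted as-is, not only which ones they corrected.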
The agent was deployed entirely within the bank's private data centre — no cloud dependencies, no data leaving the network. From day one of go-live, the majority of incoming emails were classified and had a compliant draft response ready before a human agent ever opened them. The operations team shifted from writing responses from scratch to reviewing and approving AI-drafted ones — dramatically increasing throughput without adding headcount.
Average email response time fell from hours to under 15 minutes. Customers receive faster, more consistent answers, and SLA targets that were routinely missed are now met as standard.
The same operations team now handles twice the email volume. Agents' time has shifted from drafting responses to reviewing and approving them, reducing burnout and freeing capacity for complex escalations that genuinely need human judgement.
Responses are now far more consistent in quality. RAG-grounded drafts reflect the correct product policy, regulatory language, and approved tone, reducing compliance risk and the manual editing burden on senior reviewers.
Every reviewer approval or edit feeds a structured fine-tuning pipeline. Classification accuracy and draft quality improve each quarter without manual retraining cycles: the system gets better with use.