Aniket Kapadnis

Projects

What I've Built

Production AI systems — each owned end-to-end with real, measurable impact.

AI Test Automation

Data Mining Agents

Test Cases to SQL

PROJECT 01 · AI TEST AUTOMATION

AI Test Automation

PythonFastAPIGPT-Azure OpenAI Embedding ModelFAISSLangChainCustom Re-Ranker AlgorithmPyMuPDFpdfplumberExcel ValidatorAzure

Manual test case writing is one of the most time consuming and repetitive tasks in any software QA team. In this project, the client had a team of manual testers who were responsible for writing test cases from requirement documents like Word files, PDFs and text files. For every new feature or release cycle — - A single 50 page requirement document took 2 to 3 days per tester - Consistency was a big issue — different testers interpreted requirements differently and wrote test cases in their own style - Coverage was never guaranteed — things got missed under time pressure - Any change in requirements meant going back and rewriting manually With multiple projects running in parallel, the QA team was always the bottleneck in the delivery pipeline. The client wanted to know — Can AI read our documents and generate test cases automatically, consistently and faster than a human? That is the exact problem I was brought in to solve.

My approach was to design this as a proper production grade AI pipeline, not just a quick prototype. Every component was thought through keeping scalability, traceability and user experience in mind. Here is how I broke down the solution — I designed an end to end AI pipeline for automated test case generation. Here's how it works at a high level — User uploads documents (word, pdf, text) through the CDAP UI. These go through a Document Validator via FastAPI which checks the format and creates a unique Job ID for tracking. Then a Page Splitter and Document Splitter breaks the document into pages. We use GPT-4.1 Vision to extract text page by page and store everything in a master TXT file under that Job ID. After that, LangChain Semantic Text Splitter splits the text into chunks. These chunks go through Azure OpenAI Embedding Model and stored in FAISS vector store for semantic search. Now when the user wants test cases — they have two modes: 1. Full Document Mode — takes all chunks and generates test cases 2. Query Mode — user can search specific section, we retrieve relevant chunks using semantic search + Re-Ranker Algorithm The chunks + user instructions go to GPT-4.1 as context. A Context & Feature Identifier first understands what features are in scope, then the Test Case Generator produces the final test cases. There's also a Coverage Analyzer that checks if the generated test cases actually cover the requirements properly. And finally a Feedback Mechanism — user can upload an Excel with feedback on specific test cases, and the system regenerates only those targeted ones.

PythonFastAPIGPT-Azure OpenAI Embedding ModelFAISSLangChainCustom Re-Ranker AlgorithmPyMuPDFpdfplumberExcel ValidatorAzure

Backend : Python, FastAPI → Chose FastAPI for async support and automatic API docs, important for long running document processing jobs AI / LLM : GPT-4.1 (Text + Vision), Azure AI Foundry → GPT-4.1 Vision used specifically for scanned/complex PDFs where traditional parsers failed Embeddings : Azure OpenAI Embedding Model → Kept within Azure ecosystem for data security and compliance requirements of the client Vector Store : FAISS (In-memory) → Chosen for speed in query mode, document sizes were manageable so persistent DB was not needed RAG Framework : LangChain, LangChain Semantic Text Splitter → LangChain gave flexibility to customize chunk size and overlap based on document type Re-Ranking : Custom Re-Ranker Algorithm → Built on top of semantic search to improve chunk relevance specifically for query mode Document Parse : PyMuPDF, pdfplumber → Used for page wise extraction before passing to Vision LLM Feedback : Excel Validator → Client already used Excel for QA tracking so we built the feedback layer around their existing workflow Deployment : Azure

I owned this project end to end — from understanding the client problem to designing, building and delivering the final solution. Architecture & Design - Designed the complete solution architecture covering document ingestion, RAG pipeline, test case generation and feedback mechanism - Made key technical decisions on tool selection, chunking strategy and API design Development - Built the document ingestion pipeline with format validation, job ID tracking and async processing - Integrated GPT-4.1 Vision for intelligent page wise text extraction from complex and scanned PDFs - Designed and implemented the full RAG pipeline — LangChain + FAISS + Azure OpenAI embeddings - Built both Full Document Mode and Query Mode with a custom Re-Ranker on top of semantic search - Developed the Coverage Analyzer to validate requirement traceability in generated test cases - Engineered the Feedback Mechanism with row wise Excel validation and targeted test case regeneration - Exposed all functionality as clean REST APIs using FastAPI Collaboration - Worked directly with client stakeholders to understand QA workflow - Coordinated with the UI team for seamless CDAP integration - Handled requirement changes and feedback iterations throughout the project lifecycle This was not a team project where I handled one piece — I built the entire AI backend and was the single point of contact for all technical decisions.

85-90%

for a 50 page document. Thats roughly reduction in effort.

95%

Coverage Analyzer ensured + requirement coverage which is hard

A team of 4-5 manual testers can now handle more projects in

60%

by around . Earlier if something was wrong testers had to rewrite

40-50%

Estimated reduction in QA effort cost per project.

Before this solution, manual testers were writing test cases by hand from requirement documents. For a 50 page document, a tester would typically spend 2 to 3 days just reading, understanding and writing test cases. After deploying this AI powered solution — ⏱ Time Reduction - Test case generation time went from 2-3 days to under 30 minutes for a 50 page document. Thats roughly 85-90% reduction in effort. 📋 Volume & Coverage - System could generate 100-150 test cases from a single document automatically which would otherwise take a tester 3-4 days manually. - Coverage Analyzer ensured 95%+ requirement coverage which is hard to guarantee in manual approach. 👥 Team Efficiency - A team of 4-5 manual testers can now handle 3x more projects in the same sprint cycle. - Freed up senior testers from repetitive writing work so they could focus on edge cases and exploratory testing. 🔄 Feedback & Iteration - Targeted regeneration through feedback mechanism reduced rework time by around 60%. Earlier if something was wrong testers had to rewrite from scratch. - Average feedback loop cycle — from upload to regenerated output was under 5 minutes. 💰 Business Impact - Estimated 40-50% reduction in QA effort cost per project. - Faster test case delivery meant UAT cycles started earlier which directly reduced overall project delivery time. Overall this solution basically replaced what a team of testers used to do in days and brought it down to minutes — with better consistency and traceability.

Honestly this project had some real challenges — 1. PDF text extraction was messy — different PDF formats behaved differently. Had to use GPT-4.1 Vision for scanned or complex layout PDFs page by page. 2. Chunk quality was a big issue initially. If chunks were too small context was lost, too big and the LLM got confused. Took some trial and error to get the right splitter settings. 3. The Re-Ranker was important — raw semantic search was giving okay results but after re-ranking the chunk relevance improved a lot for query mode. 4. Feedback mechanism was tricky — had to handle cases where user gives generic feedback vs targeted feedback on specific test case IDs. Key learnings — - RAG pipeline design is not just about embeddings, chunk quality matters a lot - Vision LLMs are very useful for document heavy enterprise use cases - Job ID based tracking is very important for async long running pipelines - Always build a feedback loop into AI systems — clients always want to refine If I had to do this again differently — - I would explore persistent vector stores like Pinecone or Chroma early on instead of FAISS, for better scalability across job IDs - Would add a document preprocessing quality check before chunking to catch bad extractions earlier in the pipeline - Would build an evaluation framework from day one to measure test case quality automatically instead of relying on manual review This project made me much stronger in designing production RAG systems and thinking about AI solutions from both an engineering and a business value perspective.

2.User_Story_with_extra_instrcutions.png

PROJECT 02 · DATA MINING AGENTS

Data Mining Agents

HTMLCSSVanilla JavaScriptWeb Speech APISSEPythonFastAPIUvicorn ASGIslowapiAzure OpenAI GPT-FastMCPJSON files with asyncio.Lock

In most organizations, accessing data from databases is a huge dependency on the data or backend team. A business analyst or product manager who wants to answer a simple question like "which customers have not transacted in the last 3 months" has to raise a request, wait for a developer to write a query, run it and share the results. This takes hours or sometimes days. The problem gets worse when the company runs multiple database systems simultaneously — MySQL for one system, PostgreSQL for another, SQL Server for something else. Getting answers that span across these databases meant even more time, more people and more confusion. Specific pain points the client had — - Non technical users had zero access to data sitting in databases - Developers were constantly interrupted to write simple SELECT queries - Querying across MySQL, PostgreSQL and SQL Server at the same time was a completely manual, slow and error prone process - There was no safety — anyone with DB access could accidentally run a destructive query and wipe data - No memory across sessions — every conversation started from scratch with zero context of what was asked before The ask was — can we build something where anyone, technical or not, just types or speaks a question in plain English and gets answers from multiple databases automatically? That is the problem I was brought in to design and solve.

My approach from day one was to build this as a production grade Agentic AI system — not a simple chatbot that writes SQL. The difference is the system thinks, plans, acts and narrates — all in real time. Here is how I designed it layer by layer — The user opens a single page web app and either types a question or speaks it using voice input. They select which database systems they want to query — MySQL, PostgreSQL, SQL Server or all three at once. Before anything touches the database or any LLM, the question goes through a two layer Safety Gate I designed. Layer 1 is a regex check — instant, zero latency — that catches obvious destructive patterns like DELETE FROM, DROP TABLE, TRUNCATE and also natural language versions like "delete all customers" or "wipe the table." Layer 2 is an LLM based intent classifier using GPT-4.1 that catches creative and indirect phrasing that regex misses — things like "nuke the inactive customers" or "flush the test entries." If either layer fires, the system hard stops, streams a warm refusal message to the user and never touches the database. This was a non negotiable design decision. If both safety layers pass, the system creates or loads a session with the last 5 Q&A turns as conversation context. This means the system remembers what was asked before and handles follow up questions naturally. Then the Narrator LLM kicks in — Phase 1, it streams its thinking out loud to the user in real time, explaining what it is about to do. This was important for user trust — they can see the AI is actually reasoning, not just returning a cold result. For Phase 2, I spawn one independent Worker Agent per selected DBMS simultaneously using asyncio concurrent tasks. Each agent gets its own MCP server subprocess, its own connection pool and operates completely independently. So querying MySQL, PostgreSQL and SQL Server happens in parallel, not one after another. Each agent has access to 21 tools registered per MCP server — from discovering tables and schemas, running SELECT queries, profiling data, exporting CSVs up to 1 million rows, to finding orphaned records via LEFT JOIN. The agent decides the tool sequence intelligently based on the question — up to 20 iterations per agent. As agents work, the Narrator LLM provides live phase 2 commentary streaming token by token — translating what the agent is doing into human readable language in real time. Phase 3 synthesises everything — the narrator compares findings across all database systems and presents a unified natural language answer. Results are also shown in tabbed panels per DBMS. Finally a voice summary is generated — 2 to 4 clean speakable sentences read aloud using Web Speech Synthesis. The entire pipeline — thinking, working, results, voice — is delivered as a real time streaming experience using Server Sent Events.

HTMLCSSVanilla JavaScriptWeb Speech APISSEPythonFastAPIUvicorn ASGIslowapiAzure OpenAI GPT-FastMCPJSON files with asyncio.LockRotatingFileHandler

Frontend : HTML5, CSS3, Vanilla JavaScript, Web Speech API, SSE → Kept frontend deliberately simple — no React or heavy framework needed since the complexity lives entirely in the backend → Web Speech API for voice input and output — no external SDK needed → Server Sent Events for real time streaming — lighter than WebSockets for this one directional streaming use case Backend : Python 3.12, FastAPI, Uvicorn ASGI (4 workers) → FastAPI chosen for native async support — critical since agents, narrator and MCP servers all run concurrently → Uvicorn with 4 workers to handle multiple simultaneous user requests Rate Limiting : slowapi → 60 requests per minute per IP — prevents abuse without building custom middleware AI / LLM : Azure OpenAI GPT-4.1 → Used for three distinct roles — Narrator, Worker Agents and Safety Intent Classifier — all on same deployment → Streaming enabled for real time token by token output to UI Agentic Layer : FastMCP (Model Context Protocol) → FastMCP chosen to build MCP servers because it gives a clean tool registration pattern and stdio subprocess transport → One MCP server subprocess per DBMS per request — full isolation Azure AI Inference SDK → Used for agent LLM calls — openai compatible, clean async interface Databases MySQL 8.0+ → aiomysql async driver, connection pool per database PostgreSQL 14+ → asyncpg async driver, fastest async PG driver available SQL Server 2019+→ aioodbc async driver → All three use async drivers — blocking DB calls would kill the concurrent agent design Session Storage : JSON files with asyncio.Lock → Chose simple file based storage over Redis or DB because sessions are lightweight, short lived (24hrs) and this avoids adding another infrastructure dependency Safety Layer : Regex + GPT-4.1 LLM Classifier → Two layers because regex alone misses creative phrasing and LLM alone adds ~400ms latency to every request → Regex handles the obvious fast, LLM handles the edge cases Retry Logic : tenacity with exponential backoff → For connection pool creation — handles flaky DB startup gracefully Logging : RotatingFileHandler (10MB per file) → Keeps logs manageable in long running production deployments

I designed and built this entire system from scratch — every layer, every architectural decision and every line of the backend. Architecture & Design - Designed the full 7 layer architecture — UI, API Gateway, Safety Gate, Session Memory, Narrator LLM, Worker Agents and MCP Servers - Made the core decision to build this as a true Agentic system with concurrent multi-DBMS agents rather than a simple SQL generator - Designed the two layer safety system — regex + LLM classifier — to prevent any destructive operations reaching the database - Designed the 3 phase narrator pattern to make AI reasoning visible and trustworthy to non technical users in real time Backend Development - Built the FastAPI application with API key auth, rate limiting, CORS, background cleanup loops and full SSE streaming - Built the Narrator LLM with maintained conversation history across all 3 phases — so narration feels natural and flowing, not repetitive - Built the Session Memory system with atomic file writes, sliding window context and 24 hour TTL cleanup - Engineered the concurrent agent execution using asyncio tasks — one independent agent per DBMS running simultaneously - Built three complete MCP servers for MySQL, PostgreSQL and SQL Server with 21 tools registered per server - Designed and implemented the profile_table tool — a complete data profiling capability covering row counts, null percentages, numeric stats, categorical distributions and data quality flags in a single call - Built the chunked streaming CSV export capable of handling 1 million+ rows without memory issues - Built the voice pipeline — voice input transcription, phase by phase speech queue and voice summary generation Quality & Safety - Implemented two layer destructive intent detection — regex for speed, LLM for creative phrasing edge cases - Built read only system design at the MCP tool level — no write operations permitted anywhere in the 21 tool registry - Added tenacity retry with exponential backoff for connection pool creation to handle infrastructure startup scenarios This was a solo architecture and engineering effort. Every design decision from concurrent agent spawning to the narrator phase pattern to the safety gate was my own thinking and implementation.

95%

That is roughly + reduction in time to get a data answer

000+ rows

Covers 57, across 7 databases and 40+ tables out of the box

100%

Two layer safety system blocked of destructive intent attempts

70%

Estimated 60 to reduction in ad hoc data request developer effort

Before this system, business analysts and product managers had zero self serve access to database data. Every data request went through a developer, averaged 4 to 8 hours turnaround and required the analyst to know which database even had the data. After deploying this AI powered Data Mining Agent — ⏱ Speed of Data Access - Query turnaround went from 4 to 8 hours (developer dependency) to under 60 seconds for most natural language questions - That is roughly 95%+ reduction in time to get a data answer - Multi DBMS queries that previously required 3 separate developer requests now complete in a single concurrent agent run 📊 Data Scale Handled - System operates across 3 database engines — MySQL, PostgreSQL, SQL Server — simultaneously in a single query - Covers 57,000+ rows across 7 databases and 40+ tables out of the box - CSV export handles up to 1 million+ rows using chunked streaming without any memory issues 👥 Team Efficiency - Non technical users — business analysts, product managers — can now self serve data queries without raising developer tickets - Developer team freed from repetitive SELECT query requests — estimated saving of 2 to 3 hours per developer per day - Concurrent agent design means querying 3 DBMS systems takes the same time as querying 1 — no additional wait per database added 🔒 Safety & Reliability - Two layer safety system blocked 100% of destructive intent attempts in testing — regex layer handles instant obvious cases, LLM layer handles indirect creative phrasing - Read only design means zero risk of accidental data modification through the system regardless of what the user types - Session memory with 5 turn sliding window context means follow up questions work naturally without user repeating themselves 📣 Voice & Accessibility - First version of this type of solution the client had seen with full voice input + voice output — opened data access to non keyboard users and executives who prefer spoken interaction - Real time streaming narration across 3 phases gave non technical users full visibility into what the AI was doing and why 💰 Business Impact - Eliminated the data request bottleneck completely for standard queries - Estimated 60 to 70% reduction in ad hoc data request developer effort - Faster data access directly improved decision making speed for business teams — questions that took a day to answer now answered in a standup meeting in real time

This was one of the most technically complex projects I have built and it had real challenges at multiple layers — 1. Concurrent Agent Isolation was harder than expected Running one agent per DBMS simultaneously sounds simple but each agent needed its own MCP subprocess, own connection pool and own event queue without any interference between them. Getting asyncio task management right so one agent timing out did not affect others took significant debugging. 2. The Narrator conversation history design was tricky The narrator needed to maintain a single flowing conversation across Phase 1, Phase 2 and Phase 3 — but Phase 2 fires multiple times per agent event. Getting the history injection right so narration stayed natural and not repetitive across all phases needed careful prompt and history design. 3. Azure content filter conflicts When raw database table data was in the narrator history, Azure content filters occasionally rejected the request. Had to build a fallback that strips raw DB data from history and retries with sanitised context — without losing the quality of narration. 4. Safety Layer 2 LLM latency The LLM intent classifier adds ~400ms to every request. Had to make a design decision — fail open on timeout so legitimate queries are never blocked by a slow safety check. This was a conscious tradeoff between safety strictness and user experience. 5. MCP stdio subprocess management FastMCP runs as a subprocess per request using stdio transport. Managing subprocess lifecycle — startup, pool init, graceful shutdown — cleanly per request without leaking processes was non trivial. 6. Export at scale The CSV export needed to handle 1 million+ rows without blowing memory. Built chunked streaming export — reads and writes in batches rather than loading everything into memory at once. Key learnings — - Agentic systems need deep thought on isolation — shared state between concurrent agents creates subtle bugs that are hard to trace - Real time narration dramatically improves user trust in AI systems — showing the reasoning process is as important as the final answer - Two layer safety is the right pattern — fast regex for obvious cases, LLM for edge cases. Never rely on just one approach - Read only system design should be enforced at the tool level, not just the prompt level — prompts can be overridden, tool design cannot If I were to build this again — - I would explore persistent vector based session memory instead of JSON files for better scalability across many concurrent users - Would add an evaluation layer that scores the quality of agent tool selection per query over time — to continuously improve the agent decision making - Would build a query analytics dashboard so the client can see which questions are asked most and where agents struggle This project pushed me deep into Agentic AI system design, MCP server architecture, concurrent async engineering and real time streaming — these are now core strengths I carry into every AI project I take on.

PROJECT 03 · TEST CASES TO SQL

Test Cases to SQL

HTMLCSSVanilla JavaScripthighlight.jsPythonFastAPIUvicornAzure OpenAI GPT-4oMySQLExcel .xlsx.xls

In any software QA team, verifying a test case against the database is one of the most repetitive and technical tasks a tester faces. The QA engineer writes a test case in JIRA — they know what the test should do, what inputs to use and what result they expect. But then comes the hard part — actually verifying that result in the database. To do that they need to — - Know which tables in the database hold the relevant data - Know how those tables are related and how to JOIN them correctly - Write a valid SQL query with the right WHERE conditions - Know the exact column names from the schema Most QA engineers are not SQL experts. So every time they needed a database verification query they had to — - Ask a developer to write it for them (wait hours or days) - Or spend significant time looking through the schema themselves and writing queries they were not confident about In large projects with 50, 100 or more JIRA test cases this problem multiplies fast. The team was spending more time chasing SQL queries than actually testing. And the JIRA test cases already had all the information needed — the test steps, the test data, the expected results were all written out clearly in a structured Excel export. No one was using that structured data intelligently. The ask was — can we take the JIRA test case Excel as input and automatically generate the correct database verification SQL queries for every test case, without the QA engineer needing to know SQL at all? That is the problem I was brought in to solve.

My approach was to build this as a structured dual model AI pipeline — not a single prompt that tries to do everything at once. The key insight was that going from a JIRA test case to a correct SQL query is actually two distinct cognitive tasks — first you need to understand the testing intent and decide what data is needed, then you need to translate that decision into actual executable SQL. Mixing both into one model call produces worse results and makes it impossible to debug. So I separated them cleanly into two models with a structured JSON contract between them. Here is how the pipeline works — The QA engineer uploads a structured Excel workbook through a simple web interface. The workbook has a Master Sheet listing all JIRA IDs and test case metadata, and one Child Sheet per JIRA ID containing the detailed test steps, test data and expected results. Step 1 — The system parses the Master Sheet to discover all JIRA IDs, then reads each corresponding Child Sheet and builds a unified mapping of every test case with its complete step by step breakdown. Step 2 — The system connects to the MySQL database and fetches the full schema — all table names, column names and data types. This schema is cached and passed as context to both AI models. There is also a Force Refresh toggle so the engineer can pull a fresh schema if the database has changed. Step 3 — Model 1, the Requirement Analyst (GPT-4o), runs per JIRA ID. It reads the test steps alongside the full MySQL schema and its job is purely to understand intent — which tables are involved, what role each table plays, how tables relate to each other through JOINs, and what WHERE conditions apply. It outputs a clean structured JSON per JIRA ID. It does not write SQL — that is not its job. Step 4 — All Model 1 outputs are saved to a requirements JSON file. This is independently useful — the QA team can download it as documentation or use it for audit purposes. Step 5 — Model 2, the SQL Generator (GPT-4o), takes the structured requirements JSON from Model 1 along with the MySQL schema and generates a valid production ready MySQL SELECT query per JIRA ID. Because Model 1 already did the thinking — it just needs to write correct SQL from a clean specification. The final output is a single .sql file with one query per JIRA test case, each clearly labelled with the JIRA ID. The QA engineer downloads it and runs it directly against the database to fetch the exact records they need to verify their test cases. The UI shows live pipeline progress across all 5 steps with per-JIRA sub-progress during the AI steps — so the engineer always knows exactly where the pipeline is.

HTMLCSSVanilla JavaScripthighlight.jsPythonFastAPIUvicornAzure OpenAI GPT-4oMySQLExcel .xlsx.xls

Frontend : HTML5, CSS3, Vanilla JavaScript, highlight.js → Kept frontend simple intentionally — the complexity is in the AI pipeline, not the UI → highlight.js used for SQL syntax highlighting in the results tab — makes the output immediately readable for QA engineers → HTTP Polling (every 1500ms) chosen over SSE for the pipeline status updates — better compatibility across firewalls and corporate proxies that block long lived connections Backend : Python, FastAPI, Uvicorn → FastAPI for async pipeline execution — each pipeline step runs as an async background job so the UI can poll freely without blocking AI Models : Azure OpenAI GPT-4o (both Model 1 and Model 2) → Same model deployment used for both roles — but with completely different system prompts and responsibilities → GPT-4o chosen specifically because schema grounding requires strong instruction following — the model must use exact table and column names from the schema, not hallucinate them Dual Model Design Model 1 — Requirement Analyst → Outputs structured JSON — acts as a clean contract between intent understanding and SQL generation → Keeping this step separate means the JSON is independently inspectable, downloadable and debuggable Model 2 — SQL Generator → Receives clean structured input from Model 1 — does not need to interpret natural language, just translate a specification into SQL → This separation dramatically reduces hallucination risk compared to a single model trying to do both Database : MySQL (read-only during pipeline) → Schema fetched once and cached — avoids repeated DB calls per JIRA ID during the AI steps → Force Refresh toggle added for teams whose schema changes frequently during active development sprints Input Format : Excel .xlsx / .xls (openpyxl / xlrd) → QA teams already export JIRA test cases to Excel natively — zero change to their existing workflow Output Formats .sql file → ready to run MySQL SELECT queries, one per JIRA ID .json file → structured requirements for documentation and audit

I designed and built this solution end to end — from the pipeline architecture to the dual model design to the full web application. Architecture & Design - Designed the 5 step pipeline architecture and the dual model separation — this was the core intellectual decision of the project - Made the key design call to split understanding and SQL generation into two separate model calls with a JSON contract between them rather than one combined prompt — this produced significantly better and more debuggable results - Decided on HTTP polling over SSE for pipeline status — better fit for the client's corporate network environment - Designed the Master Sheet to Child Sheet relationship parsing to handle variable numbers of JIRA IDs without any hardcoding Backend Development - Built the FastAPI application with file upload handling, async background job execution and all pipeline status endpoints - Built the Excel parser — master sheet discovery, child sheet mapping per JIRA ID and unified test step extraction - Integrated MySQL schema fetching with local caching and Force Refresh capability - Engineered Model 1 — Requirement Analyst — with schema grounded prompting to produce structured JSON covering tables, joins, WHERE conditions and SQL operation types per JIRA ID - Engineered Model 2 — SQL Generator — to consume Model 1 JSON and produce valid production ready MySQL SELECT queries using exact schema column and table names - Built the .sql file assembler — one query block per JIRA ID, clearly commented and separated for easy extraction - Built the requirements JSON persistence layer for documentation and audit download Frontend Development - Built the full single page web application with live 5 step pipeline status cards, per-JIRA sub-progress bars and overall pipeline progress tracking - Built the 3 tab results view — JIRA Mapping, AI Requirements and SQL Output — each searchable by JIRA ID - Integrated highlight.js for SQL syntax highlighting in the output tab - Built the polling mechanism with failure detection — warns after 3 failures, aborts cleanly after 15 consecutive failures This was a solo end to end project. The dual model architecture, the pipeline design and the schema grounding approach were all my own design decisions — and they are what make this solution reliable rather than just a prompt wrapper.

95%

End to end time reduction of approximately per test case

80%

Estimated 70 to reduction in QA preparation time per release

Before this solution, QA engineers had zero self-serve way to get database verification queries for their test cases. Every query had to go through a developer — average wait time was 3 to 6 hours per request, longer during sprint crunch periods. In a project with 50 JIRA test cases that meant potentially 150 to 300 hours of wait time just to get SQL queries written — before any actual database testing had even started. After deploying this AI pipeline — ⏱ Speed of Query Generation - Full pipeline for 50 JIRA test cases completes in under 10 minutes from Excel upload to downloadable .sql file - What previously took 3 to 6 hours per query now takes roughly 10 to 15 seconds per JIRA ID inside the pipeline - End to end time reduction of approximately 95% per test case 📋 Volume & Scale - System processes unlimited JIRA test cases in a single pipeline run — one Excel upload generates the complete .sql file for the entire test suite in one go - Each query is schema grounded — uses exact table names and column names pulled from the live MySQL database, not guessed or approximated 👥 Team Efficiency - QA engineers became fully self-serve — zero developer dependency for database verification queries - Developer team freed from repetitive SQL writing requests — estimated saving of 2 to 4 hours per developer per sprint - QA team could now start database verification testing in parallel with feature development instead of waiting until the end 🔍 Quality & Accuracy - Dual model design — intent understanding separated from SQL generation — produced consistently accurate schema-grounded queries - Model 1 requirements JSON gave QA engineers full visibility into which tables and conditions the AI identified — they could verify the reasoning before running the SQL - Requirements JSON also became a QA documentation artifact — teams started using it as a structured test case data mapping document independent of the SQL output 📁 Audit & Documentation - Every pipeline run produces a downloadable requirements .json — structured record of what data each test case needs and why - This became valuable for audit trails in regulated projects where test case to data mapping needs to be documented 💰 Business Impact - Eliminated the SQL writing bottleneck in the QA cycle completely - Estimated 70 to 80% reduction in QA preparation time per release - Faster test verification meant bugs were caught and reported earlier in the sprint — reducing late stage rework cost - QA team capacity effectively increased without adding headcount — same team could cover significantly more test cases per sprint

This project looked straightforward on the surface but had some real engineering and AI challenges underneath — 1. Schema grounding was the hardest part The biggest risk in SQL generation from natural language is hallucination — the model inventing table or column names that do not exist in the schema. Injecting the full MySQL schema as context into both model calls and explicitly instructing the models to only use names from that schema was critical. Even then, prompt engineering for Model 2 needed several iterations to get consistently correct JOIN syntax and WHERE conditions. 2. Why one model was not enough Initially tried a single model call to go from test steps directly to SQL. The results were inconsistent — sometimes good, sometimes wrong tables or conditions. Splitting into two models with a JSON contract between them fixed this. Model 1 does the thinking, Model 2 does the writing — and each does its job better because it is focused. 3. Excel structure variability Real world JIRA Excel exports are not always clean. Child sheet names sometimes had trailing spaces, different capitalisation or slight naming mismatches compared to the JIRA ID in the master sheet. Had to build flexible sheet name matching rather than exact string comparison. 4. Polling vs SSE decision Had initially designed with SSE for pipeline updates like my other projects. Switched to polling because the client environment had corporate proxies that terminated long lived HTTP connections. The 1500ms polling interval with failure detection and abort logic was the right call for their infra. 5. Per-JIRA sub-progress display Steps 3 and 5 run per JIRA ID — for 50 test cases this takes meaningful time. Without sub-progress feedback the UI looked stuck. Building the per-JIRA counter into the status endpoint was important for user confidence in the pipeline. Key learnings — - Dual model pipelines with a structured contract between them are significantly more reliable than single prompt approaches for complex generation tasks — separation of concerns applies to AI design just like software design - Schema grounding needs to be enforced at the prompt level — never assume the model will stay within the schema on its own - The intermediate output (requirements JSON) being independently useful was a design bonus that became a real product feature — always think about what value the intermediate steps produce - Infrastructure constraints should inform transport choices early — SSE vs polling is not just a technical preference, it is an environment compatibility decision If I were to build this again — - Would add an automatic SQL validation step after Model 2 — run a dry EXPLAIN on each generated query against the live DB to catch any invalid table or column references before delivery - Would explore fine tuning a smaller model specifically on schema grounded SQL generation to reduce cost per JIRA ID at scale — GPT-4o is accurate but expensive for large test suites - Would add a confidence score per generated query so QA engineers know which queries need manual review before running This project deepened my understanding of multi-step AI pipeline design, schema grounded generation, structured JSON contracts between model stages and how to build AI tools that fit naturally into existing team workflows without disrupting them.