Business Problem
Manual document processing was creating bottlenecks: assessors spent hours extracting data from unstructured documents, decisions were inconsistent due to human bias, and rejected cases carried no explanation for compliance review.
The platform needed to automate extraction, compute risk scores in real time, and give every decision a clear, auditable explanation for compliance teams.
Architecture — End-to-End Flow
① Source Systems
- 📡 Kafka · high-throughput events · Confluent · SASL_SSL
- 📁 Blob Storage · ADLS Gen2 · Parquet · Event Grid notify
- 🌐 REST API · OAuth2 · paginated · incremental pull
- 🗄️ Synapse · JDBC connector · updated_at watermark
- 📨 Storage Queue · Azure Function trigger · async · event-driven
② Schema Registry + Data Contracts Gate
- 📜 Confluent Schema Registry · Avro/Protobuf · BACKWARD compatibility · schema_id in every message
- 📋 Data Contracts (YAML) · owner · SLA · field expectations · Unity Catalog tags enforced
- ✅ Great Expectations Gate · schema validation pre-ingest · FAIL → DLQ + alert
↓ validated data
③ Bronze Layer
bronze_transactions · bronze_documents
✦ raw_value STRING (exact JSON)
✦ Append-only — never modify
✦ CDF enabled · Snappy · 30d retention
✦ Legal audit trail · Delta ACID
↓ Delta CDF — new rows only
④ Silver Layer
silver_transactions · silver_profiles
- 🔍 Parse · from_json() → typed columns
- ✅ Validate · null checks · foreachBatch
- ⏱️ Watermark · 2hr late events · no state leak
- 🔁 Deduplicate · MERGE INTO · exactly-once
- 🤖 LLM Enrich · ai_extract() → structured fields
❌ Invalid → DLQ Table
✅ Valid → Silver Delta
🔐 PII masked → Unity Catalog
↓ clean validated rows
⑤ Feature Store — Databricks
- 🔬 Feature Engineering · debt_ratio · tx_velocity_30d · country_risk · avg_balance_90d
- 🗃️ Feature Tables · versioned + time-aware · point-in-time correct joins
- 🔗 No Training-Serving Skew · same features train + serve · feature_store.log_model()
↓ pre-computed features
⑥ ML + AI Intelligence Layer
⑥a XGBoost Model
✦ Risk score 0–100
✦ ROC-AUC 0.95+
✦ SHAP per prediction
✦ MLflow tracked
predict_proba(features) → risk_score: 73.4
⑥b LLM — Llama 3.1 405B
✦ ai_extract() structured
✦ ai_classify() risk category
✦ ai_query() recommendation
✦ No external API key
ai_query('llama-3.1-405b') → "APPROVE — low risk"
⑥c Vector Search + RAG
✦ Silver docs → embeddings
✦ Cosine similarity retrieval
✦ Context → LLM prompt
✦ Grounded — no hallucination
similarity_search(q, k=5) → inject top-k context
⑥d MLflow Tracking
✦ Every run logged
✦ SHAP artifact per run
✦ Registry: Staging → Prod
✦ Drift → auto-retrain
mlflow.start_run() · mlflow.log_metrics(r)
↓ score + recommendation + RAG context
⑦ Gold Layer
gold_decisions · gold_risk_profiles · gold_pipeline_health
Decision Fields
- risk_score ← XGBoost
- recommendation ← LLM
- decision → APPROVE / REVIEW / REJECT
- shap_reason · rag_context
Decision Logic
- score < 40 → ✅ APPROVE
- score 40–70 → 👁️ REVIEW
- score > 70 → ❌ REJECT
OPTIMIZE daily · ZORDER · VACUUM weekly
↓ business-ready decisions
⑧ Model Serving Endpoint
- 🚀 REST Endpoint · POST /invocations · P99 latency <200ms · auto-scale
- 🔄 Champion-Challenger · 90% champion / 10% challenger · shadow mode · canary rollout
- 📊 Inference Logging · input + output logged · ground-truth feedback loop
↓ scored results
⑨ Consumption
- 🖥️ Databricks SQL · live dashboard · pipeline health
- 📊 Power BI · DirectQuery · KPI dashboards
- 💬 Chat Interface · natural language · Text-to-SQL + RAG
- 🔌 REST API · external CRM · real-time scoring
Cross-Cutting — All Layers
- 🔐 Unity Catalog · column masking · row security · audit logs · PII tags · lineage
- 🚀 CI/CD · DAB · Azure DevOps · pytest · Dev → Staging → Prod
- 📡 Monitoring · consumer lag · DLQ alerts · model drift · PagerDuty
- 🗓️ Databricks Workflows · orchestrates all tasks · OPTIMIZE · VACUUM · retrain
- ⚖️ GDPR Compliance · PII masking · retention · auditable lineage · right to erase
How It Works — Step by Step
1. Multi-Source Ingestion
Five source systems — Kafka (high-throughput streaming), ADLS Blob Storage, REST APIs, Synapse JDBC, Azure Storage Queue. Each has a dedicated ingestion pattern (Structured Streaming, Auto Loader, Python job, Synapse Connector, Azure Function trigger).
2. Schema Registry + Data Contracts Gate
Confluent Schema Registry enforces Avro schema compatibility. YAML Data Contracts define the owner, SLA, and field-level expectations. Great Expectations validates at the ingestion boundary — a contract breach blocks the data and triggers an alert.
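The contract gate can be sketched in plain Python. The contract shape, field names, and `validate_record` helper below are illustrative stand-ins for the platform's actual YAML contracts and Great Expectations suite:

```python
# Minimal sketch of a contract gate: field-level expectations checked at the
# ingestion boundary. Contract shape and helper names are illustrative.

CONTRACT = {
    "owner": "risk-team",
    "sla_hours": 4,
    "fields": {
        "tx_id":   {"type": str, "required": True},
        "amount":  {"type": float, "required": True, "min": 0.0},
        "country": {"type": str, "required": False},
    },
}

def validate_record(record: dict, contract: dict) -> list:
    """Return a list of contract violations; an empty list means the record passes."""
    errors = []
    for name, rules in contract["fields"].items():
        value = record.get(name)
        if value is None:
            if rules.get("required"):
                errors.append(f"{name}: missing required field")
            continue
        if not isinstance(value, rules["type"]):
            errors.append(f"{name}: expected {rules['type'].__name__}")
        elif "min" in rules and value < rules["min"]:
            errors.append(f"{name}: below minimum {rules['min']}")
    return errors

good = {"tx_id": "t-1", "amount": 42.5}
bad = {"amount": -3.0}

assert validate_record(good, CONTRACT) == []
# A breach would route the record to the DLQ and raise an alert.
assert len(validate_record(bad, CONTRACT)) == 2  # missing tx_id, negative amount
```

In the real pipeline the same checks run as an expectation suite before data reaches Bronze, so bad records never pollute downstream tables.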
3. Bronze → Silver Medallion Processing
Bronze stores raw immutable data with CDF enabled. Silver parses, validates, deduplicates via MERGE, applies 2-hour watermarks, and enriches with LLM extraction (ai_extract) — invalid records routed to DLQ.
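The dedup-plus-watermark rule can be illustrated without Spark. This pure-Python sketch mimics the keep-latest-per-key semantics of `MERGE INTO` under a 2-hour watermark; the event shape and names are hypothetical:

```python
from datetime import datetime, timedelta

# Pure-Python sketch of the Silver dedup + watermark logic. In the pipeline
# this is Structured Streaming with withWatermark() and MERGE INTO; here the
# same rule is applied to plain dicts.

WATERMARK = timedelta(hours=2)

def upsert_batch(silver: dict, batch: list, now: datetime) -> dict:
    """Apply one micro-batch: drop events older than the watermark, then keep
    the latest record per key (MERGE-style exactly-once upsert)."""
    for event in batch:
        if now - event["ts"] > WATERMARK:
            continue  # too late: beyond the watermark, would not update state
        key = event["tx_id"]
        current = silver.get(key)
        if current is None or event["ts"] > current["ts"]:
            silver[key] = event
    return silver

now = datetime(2024, 1, 1, 12, 0)
batch = [
    {"tx_id": "a", "ts": now - timedelta(minutes=5), "amount": 10.0},
    {"tx_id": "a", "ts": now - timedelta(minutes=1), "amount": 12.0},  # newer duplicate wins
    {"tx_id": "b", "ts": now - timedelta(hours=3), "amount": 99.0},    # beyond 2h watermark
]
silver = upsert_batch({}, batch, now)
assert silver["a"]["amount"] == 12.0
assert "b" not in silver
```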
4. Feature Store — Training/Serving Parity
Databricks Feature Store computes and stores features (debt_ratio, tx_velocity_30d, country_risk). The same feature logic is used in training and real-time serving, eliminating training-serving skew.
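Point-in-time correctness is the key property here: a training row at time t must only see feature values computed at or before t, never a future value. A minimal sketch, assuming toy `(timestamp, value)` feature rows rather than the real Feature Store API:

```python
from bisect import bisect_right

# Sketch of a point-in-time correct feature lookup. Databricks Feature Store
# does this via time-aware joins; the entity id and data here are illustrative.

# (timestamp, value) pairs per entity, sorted by timestamp
features = {"cust-1": [(100, 0.30), (200, 0.42), (300, 0.55)]}

def as_of(entity: str, ts: int):
    """Return the latest feature value with timestamp <= ts, else None."""
    rows = features.get(entity, [])
    idx = bisect_right([t for t, _ in rows], ts)
    return rows[idx - 1][1] if idx else None

assert as_of("cust-1", 250) == 0.42   # the value at t=300 is in the future, so it is excluded
assert as_of("cust-1", 100) == 0.30
assert as_of("cust-1", 50) is None    # no feature existed yet
```

Without this as-of rule, training labels would silently leak future information into the features, inflating offline metrics that serving can never reproduce.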
5. XGBoost + LLM + Vector Search
XGBoost computes risk score 0–100 with SHAP explanation per prediction. LLM (Llama 3.1 405B via ai_query) generates recommendation text. Vector Search retrieves relevant policy documents — injected as RAG context into LLM for grounded answers.
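The retrieval step reduces to cosine-similarity ranking over embeddings. A self-contained sketch with toy 3-dimensional vectors standing in for real embeddings (the document names are invented):

```python
import math

# Sketch of the Vector Search retrieval step: cosine similarity over document
# embeddings, top-k results injected as RAG context into the LLM prompt.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

docs = {
    "policy_limits": [0.9, 0.1, 0.0],
    "kyc_rules":     [0.1, 0.8, 0.1],
    "fee_schedule":  [0.0, 0.2, 0.9],
}

def similarity_search(query_vec, k=2):
    """Return the k most similar document ids, best first."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
    return ranked[:k]

top = similarity_search([1.0, 0.0, 0.0], k=2)
assert top[0] == "policy_limits"
# The retrieved passages would then be concatenated into the LLM prompt,
# grounding the recommendation in actual policy text.
```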
6. Model Serving + Champion-Challenger
Databricks Model Serving Endpoint exposes scoring as REST API (<200ms P99). Champion 90% / Challenger 10% traffic split. New model proven in shadow mode before full rollout. Every prediction logged for drift detection.
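One common way to implement a sticky 90/10 split is deterministic hashing on a request id. The sketch below shows that routing rule; the platform's actual traffic split is handled by Databricks Model Serving configuration, so this is an illustration of the idea, not its implementation:

```python
import hashlib

# Deterministic 90/10 champion-challenger routing: hash the request id into
# [0, 100) and send the bottom 10% to the challenger. Sticky per id, so the
# same request always sees the same model.

def route(request_id: str, challenger_pct: int = 10) -> str:
    bucket = int(hashlib.sha256(request_id.encode()).hexdigest(), 16) % 100
    return "challenger" if bucket < challenger_pct else "champion"

counts = {"champion": 0, "challenger": 0}
for i in range(10_000):
    counts[route(f"req-{i}")] += 1

assert route("req-42") == route("req-42")          # deterministic for a given id
assert 0.05 < counts["challenger"] / 10_000 < 0.15  # roughly 10% of traffic
```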
7. Gold Layer + Consumption
Gold tables store final decisions with risk score, LLM recommendation, SHAP reason. Consumed by Databricks SQL dashboards, Power BI (DirectQuery), conversational chat interface, and external REST API for CRM integration.
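The decision thresholds from the Gold layer (score < 40 approve, 40–70 review, > 70 reject) map directly to a small function. A sketch of that mapping:

```python
# Gold-layer decision mapping, following the thresholds stated above.

def decide(risk_score: float) -> str:
    if risk_score < 40:
        return "APPROVE"
    if risk_score <= 70:
        return "REVIEW"
    return "REJECT"

assert decide(12.0) == "APPROVE"
assert decide(55.0) == "REVIEW"
assert decide(73.4) == "REJECT"  # matches the sample score in the diagram
```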
MLOps Practices
- Experiment tracking: Every training run logged — params, metrics, SHAP plots, LLM prompt versions
- Model registry: Staging → Production promotion with evaluation gate (AUC must exceed champion)
- Champion-Challenger: 10% traffic to new model, 2-week comparison before full rollout
- Drift monitoring: Evidently weekly report — auto-retrain triggered if drift exceeds 10%
- Feature Store: Eliminates training-serving skew — same features at training and inference time
- SHAP explainability: Every decision auditable — feature contribution per prediction logged to MLflow
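The drift gate in the list above can be illustrated with a PSI-style statistic. The platform uses Evidently, so the toy metric below and its cutoff (mirroring the 10% threshold) are illustrative stand-ins:

```python
import math

# Sketch of a drift gate: a Population Stability Index comparing a feature's
# training distribution against live traffic, triggering retraining past a
# threshold. Bin proportions here are invented.

def psi(expected: list, actual: list) -> float:
    """Population Stability Index over pre-binned proportions."""
    eps = 1e-6  # guard against log(0) on empty bins
    return sum(
        (a - e) * math.log((a + eps) / (e + eps))
        for e, a in zip(expected, actual)
    )

train_bins = [0.25, 0.25, 0.25, 0.25]
stable     = [0.24, 0.26, 0.25, 0.25]
shifted    = [0.05, 0.15, 0.30, 0.50]

assert psi(train_bins, stable) < 0.10   # no action
assert psi(train_bins, shifted) > 0.10  # would trigger auto-retrain
```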
Governance & Compliance
- Unity Catalog: Column masking on PII fields, row-level security per business unit
- GDPR compliance: Retention policies on all tables, right-to-erasure via VACUUM, PII tagged
- Data lineage: Automatic end-to-end lineage — source to Gold — auditable in Unity Catalog
- Data Contracts: Schema evolution controlled, breaking changes blocked at ingestion boundary
- Audit logs: system.access.audit captures every read/write — full compliance trail