Real-Time Fraud Detection | Shrikant Agrawal

Architecture Overview

Figure: Near real-time fraud detection architecture using Kafka, Spark, Airflow, MLflow, and CI/CD.

Event Ingestion: Financial transactions are produced to Kafka topics in real time.
Streaming Inference: Spark Structured Streaming continuously consumes these events and scores each transaction for fraud using the current Production model.
Model Management: Models are versioned and promoted using MLflow Model Registry (Staging → Production).
Orchestration: Apache Airflow schedules and controls model training, validation, and promotion workflows.
CI/CD Automation: GitHub Actions builds Docker images, runs tests, and deploys updated inference services.
Monitoring & Feedback: Prometheus metrics and confirmed fraud labels feed back into the system for retraining and continuous improvement.
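
To make the validation and promotion step more concrete, here is a minimal sketch of how a promotion gate could use confirmed fraud labels to decide whether a Staging model should replace the Production one. It is plain Python and does not call MLflow or Airflow. The names `Metrics`, `evaluate`, and `should_promote`, the rule itself (higher recall with a precision floor), and the `min_precision` default of 0.8 are all assumptions made for illustration, not the project's actual criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    """Hypothetical container for the two metrics the gate compares."""
    precision: float
    recall: float


def evaluate(predictions: list[bool], labels: list[bool]) -> Metrics:
    """Compute precision and recall of fraud flags against confirmed labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    pairs = list(zip(predictions, labels))
    tp = sum(p and y for p, y in pairs)        # flagged and confirmed fraud
    fp = sum(p and not y for p, y in pairs)    # flagged but legitimate
    fn = sum(y and not p for p, y in pairs)    # confirmed fraud that was missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return Metrics(precision, recall)


def should_promote(candidate: Metrics, production: Metrics,
                   min_precision: float = 0.8) -> bool:
    """Assumed rule: promote only if recall improves and precision stays above a floor."""
    return candidate.recall > production.recall and candidate.precision >= min_precision


if __name__ == "__main__":
    # Toy confirmed labels and model outputs, invented for this example.
    labels = [True, True, False, False, True]
    production = evaluate([True, False, False, False, False], labels)
    candidate = evaluate([True, True, True, False, False], labels)
    print(should_promote(candidate, production))
    print(should_promote(candidate, production, min_precision=0.6))
```

In a pipeline like the one described, a check of this kind would plausibly run as an Airflow task between validation and the registry stage transition, so that a candidate that fails the gate stays in Staging.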

Detailed case studies, performance metrics, and implementation code will be linked here.