Lead Data Engineer · DataOps · MLOps · GenAI

Data Platforms.
MLOps. GenAI.

14+ years designing and operating large-scale data platforms, real-time streaming systems, and production-grade ML and GenAI pipelines across Azure and AWS.

🏗️
Platform Architecture
Design and build scalable Lakehouse platforms on Azure Databricks, from ingestion to an AI-ready Gold layer.
🤖
ML & GenAI in Production
Ship ML models and LLM pipelines that stay accurate — Feature Store, drift monitoring, champion-challenger rollout.
🔐
Governance & Compliance
GDPR-ready pipelines with Data Contracts, Unity Catalog, PII masking, lineage, and audit trails built in from day one.
🚀
CI/CD & Platform Reliability
Automated deployment with Databricks Asset Bundles, quality gates, multi-environment promotion — zero manual clicks in prod.
🏆
Databricks Certified
Data Engineer Professional
☁️
Microsoft Certified
Azure Solutions Architect Expert
⚙️
Microsoft Certified
Azure DevOps Engineer Expert

What I Do
Core Expertise

End-to-end platform engineering — from raw ingestion to intelligent AI-powered decision systems.

Data Engineering & Streaming

Real-time event streaming, Medallion Architecture (Bronze/Silver/Gold), Delta Lake, Auto Loader, and DLT pipelines at enterprise scale in regulated financial services environments.

Apache Kafka · Apache Spark · Delta Lake · Auto Loader · DLT
🤖
MLOps & Model Lifecycle

End-to-end model lifecycle — Feature Store, experiment tracking, model registry, champion-challenger deployment, drift monitoring, SHAP explainability.

MLflow · Feature Store · XGBoost · SHAP · Evidently
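Champion-challenger rollout sends a small share of live traffic to a candidate (challenger) model while the current model (champion) keeps serving the rest. Managed serving platforms usually handle the split through endpoint configuration. The minimal Python sketch below only illustrates the routing idea and assumes a hypothetical `route_request` helper and a 10% challenger share. Hashing the request id instead of drawing a random number keeps each caller pinned to the same model, which makes the two arms easier to compare.

```python
import hashlib
from collections import Counter


def route_request(request_id: str, challenger_share: float = 0.10) -> str:
    """Return "champion" or "challenger" for a request id (hypothetical helper)."""
    if not 0.0 <= challenger_share <= 1.0:
        raise ValueError("challenger_share must be between 0 and 1")
    # Take 53 bits of a SHA-256 digest and scale them to a float in [0, 1).
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = (int.from_bytes(digest[:8], "big") >> 11) / 2**53
    # Requests whose bucket falls below the share go to the challenger.
    return "challenger" if bucket < challenger_share else "champion"


if __name__ == "__main__":
    # Simulate 10,000 requests; the split should land close to 90/10.
    counts = Counter(route_request(f"req-{i}") for i in range(10_000))
    print(counts)
```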
🧠
GenAI & RAG Pipelines

Production LLM pipelines, Retrieval-Augmented Generation (RAG), Vector Search, semantic search, and conversational AI, grounded in enterprise data.

RAG · Vector Search · LLM Integration · Llama 3.1
🔐
Governance & Compliance

Unity Catalog governance, Data Contracts framework, PII masking, row-level security, GDPR-compliant pipelines, full data lineage.

Unity Catalog · Data Contracts · GDPR · Data Lineage
🚀
CI/CD & Platform Engineering

Azure DevOps pipelines, Databricks Asset Bundles, automated testing with Great Expectations, multi-environment promotion (Dev → Staging → Prod).

Azure DevOps · DAB · GitHub Actions · Terraform
☁️
Cloud Platforms

Deep expertise in Azure (Databricks, ADLS, Event Grid, Key Vault, ADF) and AWS (EMR, S3, Glue). Cloud-native architecture for regulated enterprise environments.

Microsoft Azure · AWS · Databricks · ADLS Gen2

Real Work
Case Studies

Production platforms built and operated at scale in regulated enterprise environments.

MLOps · GenAI · Streaming
AI-Powered Intelligent Decisioning Platform

End-to-end platform transforming unstructured documents into structured risk profiles. XGBoost risk scoring, LLM-based document extraction, RAG-powered conversational interface, SHAP explainability for every decision — all tracked in MLflow.

0.95+
ROC-AUC
<200 ms
Serving latency
RAG
Grounded answers
Data Contracts · Governance
Data Contract Validation Framework

Designed and implemented a YAML-based Data Contract framework that enforces schema, business rules, SLAs, and ownership on diverse enterprise data sources before processing begins. Fully integrated with Unity Catalog and Great Expectations.

↓↓
Incidents reduced
Multi
Sources governed
GDPR
Compliant
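Here is a minimal plain-Python sketch of a contract check. It assumes the YAML contract has already been parsed into a dict, for example with a YAML library. The dataset name, owner, column specs, and the `validate`/`enforce` helpers are all hypothetical. Only schema and simple business rules are covered, not SLAs. The key design choice is that the check runs before processing and fails the batch loudly, naming the owning team, rather than letting bad records flow downstream.

```python
from typing import Any

# Hypothetical contract, shown as the dict a YAML loader would return.
CONTRACT: dict[str, Any] = {
    "dataset": "payments.transactions",  # hypothetical dataset name
    "owner": "payments-data-team",       # hypothetical owning team
    "columns": {
        "txn_id": {"type": "str", "nullable": False},
        "amount": {"type": "float", "nullable": False, "min": 0.0},
        "currency": {"type": "str", "nullable": False, "allowed": ["EUR", "USD"]},
    },
}

# Contract type names mapped to accepted Python types (ints count as floats).
PY_TYPES = {"str": (str,), "float": (int, float), "int": (int,)}


def validate(records: list[dict[str, Any]], contract: dict[str, Any]) -> list[str]:
    """Check every record against the contract and return all violations found."""
    columns = contract["columns"]
    violations: list[str] = []
    for i, row in enumerate(records):
        # Schema: the record must have exactly the contracted columns.
        missing = sorted(columns.keys() - row.keys())
        unexpected = sorted(row.keys() - columns.keys())
        if missing:
            violations.append(f"row {i}: missing columns {missing}")
        if unexpected:
            violations.append(f"row {i}: unexpected columns {unexpected}")
        for name, spec in columns.items():
            if name not in row:
                continue
            value = row[name]
            if value is None:
                if not spec.get("nullable", True):
                    violations.append(f"row {i}: {name} must not be null")
                continue
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, PY_TYPES[spec["type"]]):
                violations.append(f"row {i}: {name} should be {spec['type']}")
                continue
            # Business rules declared in the contract.
            if "min" in spec and value < spec["min"]:
                violations.append(f"row {i}: {name}={value} is below {spec['min']}")
            if "allowed" in spec and value not in spec["allowed"]:
                violations.append(f"row {i}: {name}={value!r} is not an allowed value")
    return violations


def enforce(records: list[dict[str, Any]], contract: dict[str, Any]) -> list[dict[str, Any]]:
    """Fail the whole batch before processing if any violation is found."""
    problems = validate(records, contract)
    if problems:
        raise ValueError(
            f"{contract['dataset']} breaks its contract (owner: {contract['owner']}): "
            + "; ".join(problems)
        )
    return records


if __name__ == "__main__":
    batch = [{"txn_id": "t-1", "amount": -5.0, "currency": "GBP"}]
    for problem in validate(batch, CONTRACT):
        print(problem)
```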
GenAI · Vector Search · RAG
Enterprise Semantic Search & RAG System

Built intelligent semantic search on a Databricks data platform using Vector Search and RAG architecture — enabling business users to retrieve relevant enterprise data through natural language queries with grounded, accurate answers.

NL
Query interface
Grounded
Answers from enterprise data
Enterprise
Scale deployment
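The retrieve-then-generate flow behind a RAG system can be sketched in a few lines. This is an illustrative sketch only. It assumes toy three-dimensional embeddings and brute-force cosine similarity in place of a real embedding model and a Vector Search index. The document ids, texts, and helper names are hypothetical. The prompt restricts the model to the retrieved context. Grounding of this kind reduces unsupported answers, but it does not eliminate them on its own.

```python
import math

# Hypothetical toy index: in practice embeddings come from an embedding model
# and the nearest-neighbour search is done by a vector index.
INDEX = [
    {"id": "doc-1", "text": "Credit policy: limits are reviewed quarterly.", "embedding": [1.0, 0.0, 0.0]},
    {"id": "doc-2", "text": "Holiday calendar for the operations team.", "embedding": [0.0, 1.0, 0.0]},
    {"id": "doc-3", "text": "Credit limit exceptions need risk sign-off.", "embedding": [0.9, 0.1, 0.0]},
]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_embedding: list[float], index: list[dict], k: int = 2) -> list[dict]:
    """Return the k documents most similar to the query embedding."""
    ranked = sorted(index, key=lambda doc: cosine(query_embedding, doc["embedding"]), reverse=True)
    return ranked[:k]


def build_prompt(question: str, docs: list[dict]) -> str:
    """Assemble a prompt that limits the LLM to the retrieved context."""
    context = "\n".join(f"[{d['id']}] {d['text']}" for d in docs)
    return (
        "Answer using only the context below and cite document ids. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    # Stand-in for embedding the user's question with the same model as the index.
    query_embedding = [0.95, 0.05, 0.0]
    docs = retrieve(query_embedding, INDEX, k=2)
    print(build_prompt("How often are credit limits reviewed?", docs))
```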

Career
Experience

14+ years delivering enterprise data platforms in regulated financial services, banking, and capital markets, across Europe, Asia-Pacific, and India.

Lead Data Engineer
Nov 2021 – Mar 2026
Enterprise Data & AI Platform · Regulated Financial Services · Netherlands
Azure Databricks · MLOps · GenAI · GDPR · Unity Catalog
  • Owned the end-to-end Azure Databricks platform: large-scale production pipelines across multiple data domains and high-volume ingestion from diverse source systems, serving multiple downstream consumers.
  • Delivered an AI-powered intelligent decisioning platform: LLM document extraction, XGBoost risk scoring, a RAG conversational interface, and SHAP explainability.
  • Built a semantic search capability using Generative AI and a RAG architecture, enabling natural-language queries on enterprise data.
  • Designed a Data Contract validation framework, significantly reducing production incidents across high-volume transactional workflows.
  • Implemented Databricks Feature Store, Model Serving endpoints with champion-challenger traffic splitting, and a centralised MLflow model registry.
  • Engineered GDPR-compliant pipelines on Unity Catalog, with PII masking, row-level security, data lineage, and retention policies across the full platform.
  • Delivered CI/CD pipelines in Azure DevOps using Databricks Asset Bundles, with automated testing and multi-environment promotion.
DataOps Engineer
Nov 2018 – Oct 2021
Cloud Data Engineering · Banking & Capital Markets · Malaysia
AWS · PySpark · Cloud Migration · ETL Optimisation
  • Designed cloud-native data pipelines on AWS supporting large-scale ingestion from diverse transactional and third-party source systems.
  • Engineered and optimised high-volume PySpark ETL pipelines — delivering measurable improvements in throughput and processing latency.
  • Led a cloud migration programme modernising legacy on-premises data systems onto AWS, reducing infrastructure costs and operational complexity.
Senior ETL Engineer
Jun 2012 – Sep 2018
Enterprise ETL & DataOps · Investment Banking · India
ETL · Data Warehousing · Batch Automation · Performance Tuning
  • Developed and maintained enterprise ETL systems supporting critical financial data workloads across trading, risk, and reporting domains.
  • Led ETL performance optimisation initiatives — improving query execution times and pipeline throughput for high-volume batch processes.
  • Automated batch workflows and operational monitoring, establishing DataOps-style engineering practices across the delivery team.

Technologies
Tech Stack

Tools and platforms used in production throughout my career.

Data Engineering
Azure Databricks · Apache Spark · PySpark · Delta Lake · Delta Live Tables · Auto Loader · Apache Kafka · Azure Data Factory
MLOps & AI
MLflow · Feature Store · XGBoost · SHAP · Evidently · Vector Search · RAG · Llama 3.1
Cloud & DevOps
Microsoft Azure · AWS · Azure DevOps · GitHub Actions · Databricks Asset Bundles · Terraform · Docker
Governance & Quality
Unity Catalog · Data Contracts · Great Expectations · GDPR Compliance · Data Lineage · PII Masking
Analytics
Databricks SQL · Power BI · Databricks Dashboards
Programming
Python · SQL · PySpark · YAML · Shell

Get In Touch
Contact

Interested in collaborating on data platform challenges, architecture discussions, or knowledge sharing.

✉️
Email
shrikantagrawal61@gmail.com
💼
LinkedIn
linkedin.com/in/shrikant-agrawal
🌐
Website
www.dataxflows.com