I'm passionate about building scalable data pipelines, real-time streaming systems, and cloud-native data engineering solutions. I enjoy working with distributed systems, big data technologies, and machine learning workflows that transform raw data into actionable insights.
- 🔭 Currently focused on Data Engineering & Real-Time Analytics
- 🌱 Exploring advanced concepts in Distributed Data Processing & Cloud Platforms
- 💡 Interested in:
  - Apache Spark & PySpark
  - Kafka Streaming Architectures
  - Google Cloud Platform (GCP)
  - Data Pipelines & ETL Systems
  - Machine Learning Engineering
  - Real-Time Analytics
- ⚡ Enjoy solving engineering problems involving scalability, automation, and streaming data
Built a Kafka-based streaming architecture on Google Cloud Platform to process live train data.
- Kafka Producer running on Google Cloud Functions
- Kafka Consumer running on Dataproc Cluster
- Implemented 20-minute rolling window analytics
- Generated platform occupancy insights for station management
- Built preprocessing pipelines for timestamp handling, missing values, and event-time sorting
Tech Used: Kafka, PySpark, GCP, Dataproc, Cloud Functions
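
The preprocessing and 20-minute rolling window steps can be illustrated with a minimal sketch in plain Python. This is not the PySpark job from the project: the field names `timestamp` and `platform`, the function names, and the choice to count events per platform as a stand-in for occupancy are all assumptions made for illustration.

```python
from collections import deque
from datetime import datetime, timedelta

# Window length taken from the project description; everything else is illustrative.
WINDOW = timedelta(minutes=20)


def parse_events(raw_events):
    """Drop records with missing fields, parse timestamps, sort by event time."""
    cleaned = []
    for record in raw_events:
        ts = record.get("timestamp")        # hypothetical field name
        platform = record.get("platform")   # hypothetical field name
        if not ts or not platform:
            continue  # missing values: skip incomplete records
        cleaned.append((datetime.fromisoformat(ts), platform))
    cleaned.sort(key=lambda event: event[0])  # event-time ordering
    return cleaned


def rolling_occupancy(events, window=WINDOW):
    """Yield (event_time, {platform: count}) over the trailing window."""
    recent = deque()
    counts = {}
    for event_time, platform in events:
        recent.append((event_time, platform))
        counts[platform] = counts.get(platform, 0) + 1
        # Evict events that are at least one full window older than the current one.
        while recent and recent[0][0] <= event_time - window:
            _, old_platform = recent.popleft()
            counts[old_platform] -= 1
            if counts[old_platform] == 0:
                del counts[old_platform]
        yield event_time, dict(counts)
```

Calling `list(rolling_occupancy(parse_events(records)))` on a list of dictionaries gives one occupancy snapshot per valid event. In a Spark job the same idea would normally be expressed with event-time windows instead of a hand-rolled deque.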
Converted a batch image classification workflow into a real-time streaming architecture.
- Streaming-based image ingestion pipeline
- Real-time prediction workflows using Spark Streaming
- Optimized distributed processing for scalable inference
- Integrated machine learning pipelines with streaming systems
Tech Used: Spark Streaming, Python, Machine Learning
Designed and implemented a data engineering pipeline for cryptocurrency data processing and analytics.
- Automated ingestion and transformation workflows
- Data preprocessing and cleaning pipelines
- Structured analytical datasets for downstream ML and reporting
- Built scalable ETL processes for handling financial datasets
Tech Used: Python, SQL, PySpark, Data Pipelines
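
The cleaning and structuring steps can be illustrated with a small sketch in plain Python. The row layout (`symbol`, `timestamp`, `price`), the function names, and the daily-average aggregate are assumptions chosen for illustration; they are not the project's actual schema or code.

```python
from collections import defaultdict
from datetime import datetime


def clean_rows(rows):
    """Transform step: drop incomplete or malformed rows and normalise types."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append({
                "symbol": row["symbol"].strip().upper(),
                "day": datetime.fromisoformat(row["timestamp"]).date().isoformat(),
                "price": float(row["price"]),
            })
        except (KeyError, TypeError, ValueError, AttributeError):
            continue  # skip rows with missing or unparseable values
    return cleaned


def daily_average(cleaned):
    """Build an analytical dataset: mean price per (symbol, day)."""
    totals = defaultdict(lambda: [0.0, 0])
    for row in cleaned:
        bucket = totals[(row["symbol"], row["day"])]
        bucket[0] += row["price"]
        bucket[1] += 1
    return {key: total / count for key, (total, count) in sorted(totals.items())}
```

Keeping the cleaning step separate from the aggregation makes each stage easy to test on its own, which is one common way to structure an ETL job.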
- Building production-grade data engineering projects
- Deepening expertise in streaming architectures
- Learning scalable cloud-native data platforms
- Contributing to impactful open-source projects
- Exploring modern data lake and lakehouse architectures
- 💼 GitHub: https://github.com/pranampagi
- 📫 Open to collaboration on Data Engineering & Big Data projects
“Data is the new infrastructure; engineering makes it useful.”