Pranam Pagi pranampagi

Hi there, I'm Pranam Pagi 👋

Data Engineer | Data Scientist | Big Data Enthusiast

I'm passionate about building scalable data pipelines, real-time streaming systems, and cloud-native data engineering solutions. I enjoy working with distributed systems, big data technologies, and machine learning workflows that transform raw data into actionable insights.

🚀 About Me

🔭 Currently focused on Data Engineering & Real-Time Analytics
🌱 Exploring advanced concepts in Distributed Data Processing & Cloud Platforms
💡 Interested in:
- Apache Spark & PySpark
- Kafka Streaming Architectures
- Google Cloud Platform (GCP)
- Data Pipelines & ETL Systems
- Machine Learning Engineering
- Real-Time Analytics
⚡ Enjoy solving engineering problems involving scalability, automation, and streaming data

🛠️ Tech Stack

Data Engineering

Cloud & DevOps

Programming & ML

📌 Featured Projects

🚆 Real-Time Train Analytics Pipeline

Built a Kafka-based streaming architecture on Google Cloud Platform to process live train data.

Highlights

- Kafka Producer running on Google Cloud Functions
- Kafka Consumer running on Dataproc Cluster
- Implemented 20-minute rolling window analytics
- Generated platform occupancy insights for station management
- Built preprocessing pipelines for timestamp handling, missing values, and event-time sorting

Tech Used: Kafka, PySpark, GCP, Dataproc, Cloud Functions
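
The preprocessing and windowing steps can be illustrated with a minimal sketch in plain Python (standard library only, not the PySpark code used in the project). The field names `event_time` and `platform`, the helper names `parse_event` and `rolling_occupancy`, and the choice of a trailing window that excludes events exactly 20 minutes old are all assumptions made for illustration.

```python
from collections import deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=20)  # window length taken from the project description


def parse_event(raw):
    """Return (event_time, platform), or None if a field is missing or malformed.

    The keys "event_time" and "platform" are hypothetical field names.
    """
    ts, platform = raw.get("event_time"), raw.get("platform")
    if not ts or not platform:
        return None
    try:
        return datetime.fromisoformat(ts), platform
    except (TypeError, ValueError):
        return None


def rolling_occupancy(raw_events, window=WINDOW):
    """Yield (event_time, platform, count) in event-time order.

    count is the number of events seen on that platform in the trailing
    window (event_time - window, event_time].
    """
    # Drop unusable records, then sort by event time.
    events = sorted(e for e in map(parse_event, raw_events) if e is not None)
    recent = {}  # platform -> deque of event times still inside the window
    for event_time, platform in events:
        times = recent.setdefault(platform, deque())
        times.append(event_time)
        # Evict events that have fallen out of the trailing window.
        while times[0] <= event_time - window:
            times.popleft()
        yield event_time, platform, len(times)
```

Sorting the whole input only works for a bounded batch. A real streaming job would typically rely on the engine's event-time windowing and watermarks instead.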

📷 Real-Time Image Classification with Spark Streaming

Converted a batch image classification workflow into a real-time streaming architecture.

Highlights

- Streaming-based image ingestion pipeline
- Real-time prediction workflows using Spark Streaming
- Optimized distributed processing for scalable inference
- Integrated machine learning pipelines with streaming systems

Tech Used: Spark Streaming, Python, Machine Learning
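
The batch-to-streaming conversion can be pictured with a small plain-Python sketch, assuming incoming images are grouped into micro-batches before prediction. Everything here is hypothetical: `micro_batches`, `classify_batch`, and `predict_stream` are illustrative names, the "model" is a brightness threshold standing in for a real classifier, and batches are cut by item count, whereas Spark Streaming cuts them by time interval.

```python
from itertools import islice


def micro_batches(stream, batch_size):
    """Group an iterable into lists of at most batch_size items."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def classify_batch(images):
    """Hypothetical stand-in for a trained model.

    Each image is a non-empty list of pixel intensities (0-255); the label
    depends only on the mean intensity.
    """
    return ["bright" if sum(px) / len(px) >= 128 else "dark" for px in images]


def predict_stream(stream, batch_size=2):
    """Yield (image, label) pairs as each micro-batch is classified."""
    for batch in micro_batches(stream, batch_size):
        yield from zip(batch, classify_batch(batch))
```

Predicting per batch rather than per item lets model overhead be amortized across several images, which is the usual motivation for micro-batching inference.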

📈 Crypto Data Pipeline Project

Designed and implemented a data engineering pipeline for cryptocurrency data processing and analytics.

Highlights

- Automated ingestion and transformation workflows
- Data preprocessing and cleaning pipelines
- Structured analytical datasets for downstream ML and reporting
- Built scalable ETL processes for handling financial datasets

Tech Used: Python, SQL, PySpark, Data Pipelines
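
As a rough illustration of the cleaning and transformation stages, here is a minimal standard-library sketch. The record layout (`symbol`, `ts`, `price`), the function names `clean_prices` and `add_returns`, and the "last record wins" de-duplication rule are assumptions for the example, not the project's actual schema or logic. Timestamps are assumed to be ISO-formatted strings so that they sort chronologically.

```python
import math


def clean_prices(rows):
    """Drop unusable rows, de-duplicate on (symbol, ts), and sort.

    A row is dropped when a key is missing or the price is not a finite,
    positive number. For duplicate keys, the last record wins.
    """
    latest = {}
    for row in rows:
        try:
            key = (row["symbol"], row["ts"])
            price = float(row["price"])
        except (KeyError, TypeError, ValueError):
            continue
        if not math.isfinite(price) or price <= 0:
            continue
        latest[key] = price
    return [
        {"symbol": symbol, "ts": ts, "price": price}
        for (symbol, ts), price in sorted(latest.items())
    ]


def add_returns(clean_rows):
    """Attach the simple return versus the previous row of the same symbol."""
    previous = {}  # symbol -> last seen price
    out = []
    for row in clean_rows:
        prev = previous.get(row["symbol"])
        ret = None if prev is None else row["price"] / prev - 1
        out.append({**row, "return": ret})
        previous[row["symbol"]] = row["price"]
    return out
```

The output of `add_returns(clean_prices(rows))` is a tidy list of records that could feed a reporting table or a feature set for a model.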

🎯 Current Goals

- Building production-grade data engineering projects
- Deepening expertise in streaming architectures
- Learning scalable cloud-native data platforms
- Contributing to impactful open-source projects
- Exploring modern data lake and lakehouse architectures

🤝 Connect With Me

💼 GitHub: https://github.com/pranampagi
📫 Open to collaboration on Data Engineering & Big Data projects

💭 Quote I Believe In

“Data is the new infrastructure — engineering makes it useful.”
