pranampagi/README.md

Hi there, I'm Pranam Pagi 👋

Data Engineer | Data Scientist | Big Data Enthusiast

I'm passionate about building scalable data pipelines, real-time streaming systems, and cloud-native data engineering solutions. I enjoy working with distributed systems, big data technologies, and machine learning workflows that transform raw data into actionable insights.


🚀 About Me

  • 🔭 Currently focused on Data Engineering & Real-Time Analytics

  • 🌱 Exploring advanced concepts in Distributed Data Processing & Cloud Platforms

  • 💡 Interested in:

    • Apache Spark & PySpark
    • Kafka Streaming Architectures
    • Google Cloud Platform (GCP)
    • Data Pipelines & ETL Systems
    • Machine Learning Engineering
    • Real-Time Analytics
  • ⚑ Enjoy solving engineering problems involving scalability, automation, and streaming data


πŸ› οΈ Tech Stack

Data Engineering

Apache Spark, PySpark, Kafka, SQL, Hadoop

Cloud & DevOps

Google Cloud, Docker, Git

Programming & ML

Python, pandas, scikit-learn, NumPy


📌 Featured Projects

🚆 Real-Time Train Analytics Pipeline

Built a Kafka-based streaming architecture on Google Cloud Platform to process live train data.

Highlights

  • Kafka producer running on Google Cloud Functions
  • Kafka consumer running on a Dataproc cluster
  • Implemented 20-minute rolling window analytics
  • Generated platform occupancy insights for station management
  • Built preprocessing pipelines for timestamp handling, missing values, and event-time sorting

Tech Used: Kafka, PySpark, GCP, Dataproc, Cloud Functions
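
To illustrate the rolling-window idea from the highlights above, here is a minimal plain-Python sketch. It is not the project's PySpark code: the event fields `ts` and `platform` and the function name `rolling_counts` are assumptions made for this example. It drops records with missing values, sorts by event time, and counts events per platform in a trailing 20-minute window.

```python
from collections import deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=20)  # window length taken from the project description


def rolling_counts(events, window=WINDOW):
    """Yield (event_time, platform, count) for each valid event.

    count is the number of events seen on that platform in the
    half-open interval (event_time - window, event_time].
    """
    # Preprocessing: drop rows with a missing timestamp or platform,
    # then sort by event time so late-arriving records are ordered correctly.
    clean = [e for e in events if e.get("ts") and e.get("platform")]
    clean.sort(key=lambda e: e["ts"])

    recent = {}  # platform -> deque of timestamps still inside the window
    for e in clean:
        q = recent.setdefault(e["platform"], deque())
        q.append(e["ts"])
        # Evict timestamps that have fallen out of the trailing window.
        while q[0] <= e["ts"] - window:
            q.popleft()
        yield e["ts"], e["platform"], len(q)


# Hypothetical sample data: out of order, with one missing timestamp.
sample_events = [
    {"ts": datetime(2024, 1, 1, 9, 0), "platform": "P1"},
    {"ts": datetime(2024, 1, 1, 9, 25), "platform": "P1"},
    {"ts": datetime(2024, 1, 1, 9, 10), "platform": "P1"},
    {"ts": None, "platform": "P1"},
    {"ts": datetime(2024, 1, 1, 9, 5), "platform": "P2"},
]
results = list(rolling_counts(sample_events))
```

In a Spark job the same logic would more likely be expressed with a time-based window aggregation; the deque version only shows the eviction rule on a single machine.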


📷 Real-Time Image Classification with Spark Streaming

Converted a batch image classification workflow into a real-time streaming architecture.

Highlights

  • Streaming-based image ingestion pipeline
  • Real-time prediction workflows using Spark Streaming
  • Optimized distributed processing for scalable inference
  • Integrated machine learning pipelines with streaming systems

Tech Used: Spark Streaming, Python, Machine Learning
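
The batch-to-streaming conversion described above can be pictured as micro-batching: incoming items are grouped into small batches and the model is applied to each batch as it fills. The sketch below is plain Python under stated assumptions, not the project's Spark Streaming code. `classify_batch` is a hypothetical stand-in for a real model (it labels by mean pixel brightness), and images are assumed to arrive as `(image_id, pixels)` pairs.

```python
from itertools import islice


def micro_batches(stream, batch_size):
    """Group an iterable into lists of at most batch_size items."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def classify_batch(batch):
    """Hypothetical model stand-in: 'bright' if mean pixel value >= 128."""
    return [
        "bright" if sum(pixels) / len(pixels) >= 128 else "dark"
        for _, pixels in batch
    ]


def stream_predictions(stream, batch_size=2):
    """Yield (image_id, label) pairs, scoring one micro-batch at a time."""
    for batch in micro_batches(stream, batch_size):
        labels = classify_batch(batch)
        for (image_id, _), label in zip(batch, labels):
            yield image_id, label


# Hypothetical frames: (image_id, flat list of pixel intensities).
frames = [("a", [0, 10]), ("b", [200, 250]), ("c", [128, 128])]
predictions = list(stream_predictions(frames, batch_size=2))
```

Scoring per batch rather than per item is what lets a real model amortize its overhead across several images.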


📈 Crypto Data Pipeline Project

Designed and implemented a data engineering pipeline for cryptocurrency data processing and analytics.

Highlights

  • Automated ingestion and transformation workflows
  • Data preprocessing and cleaning pipelines
  • Structured analytical datasets for downstream ML and reporting
  • Built scalable ETL processes for handling financial datasets

Tech Used: Python, SQL, PySpark, Data Pipelines
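
As a rough illustration of the cleaning step listed above, the following sketch turns raw price rows into typed, de-duplicated records. The field names `symbol`, `price_usd`, and `ts` (Unix seconds) and the function `clean_prices` are assumptions for this example, not the project's actual schema or code.

```python
from datetime import datetime, timezone


def clean_prices(raw_rows):
    """Return typed, de-duplicated price records; malformed rows are skipped."""
    seen = set()  # (symbol, timestamp) pairs already emitted
    cleaned = []
    for row in raw_rows:
        try:
            symbol = row["symbol"].strip().upper()
            price = float(row["price_usd"])
            ts = datetime.fromtimestamp(int(row["ts"]), tz=timezone.utc)
        except (KeyError, TypeError, ValueError, AttributeError):
            continue  # missing or unparseable field
        if price <= 0 or (symbol, ts) in seen:
            continue  # implausible price or duplicate observation
        seen.add((symbol, ts))
        cleaned.append({"symbol": symbol, "price_usd": price, "ts": ts})
    return cleaned


# Hypothetical raw rows: one valid, one duplicate, two malformed.
raw = [
    {"symbol": " btc ", "price_usd": "42000.5", "ts": 1700000000},
    {"symbol": "BTC", "price_usd": "42000.5", "ts": 1700000000},
    {"symbol": "eth", "price_usd": None, "ts": 1700000000},
    {"symbol": "eth", "price_usd": "-1", "ts": 1700000000},
]
rows = clean_prices(raw)
```

Skipping bad rows silently keeps the sketch short; a production pipeline would more likely route them to a quarantine table for inspection.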


📊 GitHub Stats

GitHub Stats

Top Languages


🎯 Current Goals

  • Building production-grade data engineering projects
  • Deepening expertise in streaming architectures
  • Learning scalable cloud-native data platforms
  • Contributing to impactful open-source projects
  • Exploring modern data lake and lakehouse architectures

🤝 Connect With Me


💭 Quote I Believe In

“Data is the new infrastructure; engineering makes it useful.”

Pinned

  1. crypto_pipeline (Public)

    End-to-end data engineering pipeline ingesting live crypto market data from the CoinGecko API, processing it via PySpark in a Medallion architecture, and serving analytics through a Snowflake Star …

    Python

  2. etl_ml_pipeline (Public)

    Production-ready Databricks ML pipeline implementing Medallion Architecture with Unity Catalog and Delta Lake. Ingests, cleans, and features the UCI Bank Marketing dataset to train a Random Forest …

    Python

  3. Grocery-Store-V2 (Public)

    A premium, high-performance e-commerce grocery store application. Built with a Flask API (Python 3.14) and a Vue 3 frontend, featuring a glassmorphism UI, database query optimizations, Redis cachin…

    Vue

  4. Book-Library-API (Public)

    A full-stack Book Library application with a FastAPI backend (SQLAlchemy, SQLite, MongoDB logging) and a Vue 3 frontend (Vite, Vue Router, Bootstrap 5). Features secure JWT authentication, CRUD boo…

    Python

  5. Lekhan-AI (Public)

    Streamline bureaucratic workflows with Lekhan-AI. Automatically extract summaries and categorize circulars, memos, and notifications using local NLP models.

    Vue

  6. Scholarly (Public)

    📚 Scholarly: A premium research tracker. Effortlessly manage academic resources, track study progress, and visualize your library with dynamic filtering and real-time dashboards.

    Vue