subhayu99/README.md
Subhayu Kumar Bala

Data & Infrastructure Engineer · 4 years of experience · Currently at Loop AI

LinkedIn · Portfolio · PyPI · Email


I work on data platforms, AI/LLM pipelines, and the kind of performance problems where you stare at query plans for hours. Most of my work has been in consulting — shipping production systems for clients across healthcare, finance, logistics, and food-tech.

I write open-source tools when I find myself solving the same problem twice — 43,000+ PyPI downloads across 6 packages so far. I also have a published research paper on quantum computing simulation from my undergrad days.

When I'm not debugging ingestion pipelines, I'm probably over-engineering my portfolio website.


Tools & technologies I've worked with

Data Engineering
SQL, Databricks, BigQuery, MS Fabric, PySpark, DuckDB, dbt, Airflow, Kafka, ADF, Delta Lake, Pandas, Presto, Power BI

AI & LLM Ops
OpenAI, Gemini, LangChain, HuggingFace, RAG, MCP, A2A, Fine-tuning, ChromaDB, Qdrant, Neo4j

Backend, DevOps & Cloud
Python, Bash, FastAPI, PostgreSQL, MongoDB, AWS, Azure, GCP, Docker, Kubernetes, Terraform, Git, CI/CD



Pinned

  1. sqlstream

    A lightweight, pure-Python SQL query engine for CSV, Parquet, JSON, JSONL, HTML, and Markdown files with lazy evaluation and intelligent optimizations.

    Python 1

  2. datasetpipeline

    A data processing and analysis pipeline for data transformation, quality assessment, deduplication, and formatting jobs.

    Python 1

  3. DocumentAccessPOC

    A secure document sharing PoC where even admins can't access user files, built on FastAPI with strong cryptographic controls.

    Python 1

  4. smart-commit

    An AI-powered git commit message generator with repository context awareness, built with Python and Typer.

    Python 2

  5. creatree

    A Python package and CLI tool for creating directory structures from a tree-like string.

    Python 3

  6. BetterPassphrase

    A Python library to generate secure, meaningful passphrases.

    Python 1