Gopala Krishna Abba (igopalakrishna)

Data & AI Systems Engineer

Data Engineering · Applied ML · Search & Ranking · Backend Systems

M.S. Computer Engineering, NYU Tandon — May 2026 · 4.0 GPA

View Portfolio · LinkedIn · Repositories · Résumé · Email

Recruiter Quick View

Best aligned with early-career Data Engineering, Applied Data Science, Search & Ranking, ML/Data Platform, and backend data systems roles.
Current research: Graduate Research Assistant at NYU Chunara Lab since January 2026, building reproducible news-data ingestion and structured LLM-labeling workflows.
Most recent industry experience: Data Science Intern — Emerging Technology at FOX Corporation / FOX Tech, February–April 2026.
Core differentiator: connecting data pipelines, retrieval and ranking, model evaluation, typed APIs, testing, and deployment into measurable end-to-end systems.

Selected Impact

4.0 / 4.0	NYU Tandon M.S. GPA
12,168 profiles	hybrid retrieval at <350 ms p95
~13M records	historical MTA analysis
92.7% fewer bytes	scanned in SignalLake benchmark

Experience

NYU Chunara Lab — Graduate Research Assistant · Jan 2026–present
Building reproducible pipelines across 10 U.S. newspapers that process 7,000+ articles weekly, plus validated LLM-labeling and external research datasets. Experience details

FOX Corporation / FOX Tech — Data Science Intern — Emerging Technology · Feb–Apr 2026
Built a 9-stage Databricks pipeline and hybrid semantic clustering/ranking workflow for editorial content; validated duplicate-safe runs across 14 dates in 32–42 seconds. Experience details

Global Futures Group — Software Engineer Intern, AI/ML & Data Infrastructure · Sep–Dec 2025
Built hybrid FAISS and BM25 retrieval, ranking, FastAPI, and PostgreSQL infrastructure for 12,168 expert profiles with p95 search latency below 350 ms. Case study

NYU DICE Lab — Graduate Research Assistant · May–Dec 2025
Implemented LoRA/PEFT experiments across DistilGPT-2 and Pythia models to measure Dynamic Tanh versus LayerNorm quality and inference trade-offs. Research project

Featured Engineering Work

SignalLake — Operational log analytics

Local-first ingestion and analytics platform that preserves raw JSONL events, transforms them into Hive-partitioned Parquet, and queries operational metrics directly with DuckDB.

FastAPI · DuckDB · Parquet · Pydantic · Docker

Evidence: benchmarked 1M events across 192 files; partition pruning reduced data scanned from 86.7 MB to 6.3 MB. Validated with 21 tests across four pytest files and GitHub Actions. Case study
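The 92.7% figure quoted under Selected Impact follows directly from the two scan sizes above. A minimal sketch of that arithmetic, where the helper name `percent_reduction` is hypothetical and not taken from the SignalLake code:

```python
def percent_reduction(before_mb: float, after_mb: float) -> float:
    """Return the percentage drop from before_mb to after_mb."""
    if before_mb <= 0:
        raise ValueError("before_mb must be positive")
    return (before_mb - after_mb) / before_mb * 100.0


# Scan sizes reported in the benchmark: 86.7 MB without pruning, 6.3 MB with it.
reduction = percent_reduction(86.7, 6.3)
print(f"{reduction:.1f}% fewer bytes scanned")
```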

ExpertMatchAI — Hybrid expert search

Internship project combining FAISS vector retrieval, BM25 lexical search, structured geo filters, tunable ranking weights, and fallback logic behind FastAPI and PostgreSQL services.

FAISS · BM25 · FastAPI · PostgreSQL · Next.js

Evidence: indexed 12,168 profiles; average response time below 200 ms, p95 below 350 ms, and full index rebuilds below 60 seconds. Validated with pytest, Vitest, and Playwright. Proprietary source code is not public.
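Because the source is proprietary, the following is only an illustrative sketch of how vector and lexical scores could be combined with tunable weights. It assumes min-max normalization followed by a weighted sum; the function names, default weights, and sample scores are all hypothetical and are not the ExpertMatchAI implementation.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1] so the two retrievers are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All candidates tied: treat each as a full match.
        return {key: 1.0 for key in scores}
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def fuse(
    vector_scores: Dict[str, float],
    bm25_scores: Dict[str, float],
    w_vector: float = 0.6,  # hypothetical default weight
    w_bm25: float = 0.4,    # hypothetical default weight
) -> List[Tuple[str, float]]:
    """Weighted sum of normalized scores; a missing score counts as 0."""
    v = min_max(vector_scores)
    b = min_max(bm25_scores)
    fused = {
        doc_id: w_vector * v.get(doc_id, 0.0) + w_bm25 * b.get(doc_id, 0.0)
        for doc_id in set(v) | set(b)
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Made-up scores for three profile IDs, purely for illustration.
ranked = fuse({"a": 0.9, "b": 0.5}, {"b": 12.0, "c": 3.0})
print([doc_id for doc_id, _ in ranked])
```

Normalizing before weighting matters because BM25 scores are unbounded while similarity scores typically sit in a narrow range; without it, the weights would not mean what they appear to mean.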

NYC Subway Foot-Traffic Forecasting — Streaming ML

Combines offline analysis and model development over approximately 13M historical MTA records with a separate, self-hosted live pipeline driven by simulated turnstile events.

Kafka · Spark Structured Streaming · MongoDB · Random Forest · Docker

Evidence: regression models reached approximately 2,700 RMSE and below 4.8% MAE; a separate traffic-level classifier reached 93.36% accuracy. Case study

Colorectal Cancer Survival Prediction — Reproducible MLOps

MLOps prototype built on public healthcare data, covering feature selection, Gradient Boosting training, experiment tracking, Kubeflow orchestration, and Flask model serving. It is not a clinically validated system.

scikit-learn · MLflow · DAGsHub · Kubeflow · Flask

Evidence: processed 167,497 records, reduced 28 inputs to five through chi-square selection, and reached 92.9% accuracy with 0.89 ROC-AUC. Case study

Explore public repositories →

Where My Experience Fits

Role family	Strongest public evidence
Data Engineering	FOX Databricks pipeline, current NYU ingestion, SignalLake, Kafka/Spark streaming
Applied Data Science / ML	FOX clustering and ranking, NYU research, forecasting, reproducible MLOps
Search & Ranking	Global Futures / ExpertMatchAI hybrid retrieval, geo filtering, tunable ranking, fallbacks
Backend Data Systems	FastAPI, PostgreSQL, DuckDB, REST APIs, Docker, CI, structured logging, automated tests

Technical Toolkit

Data & Distributed Systems: Python, SQL, PySpark, Spark Structured Streaming, Kafka, Databricks, Delta Lake, Parquet, DuckDB
ML, Retrieval & Evaluation: scikit-learn, PyTorch, Hugging Face, FAISS, BM25, embeddings, MLflow, relevance and model evaluation
Backend & Databases: FastAPI, Flask, PostgreSQL, MongoDB, Prisma, Next.js, TypeScript, REST APIs
Cloud, Testing & Delivery: Docker, Kubernetes, Kubeflow, GitHub Actions, Vercel, Railway, pytest, Vitest, Playwright

How I Work

Start with the simplest correct system, define measurable behavior, and make failures visible before adding complexity.

Evaluate retrieval and ML systems with explicit quality, latency, and failure criteria.
Make data and model workflows reproducible through versioned artifacts, rerunnable pipelines, tests, and CI.
Document architectural trade-offs so another engineer can understand both the result and its limits.

Current Focus

I am currently building reproducible text-data and LLM-labeling systems at NYU Chunara Lab. I am open to early-career roles in Data Engineering, Applied Data Science, Search & Ranking, ML/Data Platforms, and backend systems for data and AI.

View the portfolio · Explore repositories · Connect on LinkedIn · Email me
