All work

Data Engineering · 2025

Sports Analytics Pipeline

Analytics Platform

PythonBigQueryGCPXGBoostMachine Learning

High-throughput predictive analytics pipeline generating 600+ daily performance predictions — 7 concurrent ML models, BigQuery infrastructure, automated grading, and real-time Grafana monitoring.

The challenge

Running multiple model architectures simultaneously — each with different data requirements, update frequencies, and evaluation criteria — at production scale required infrastructure that could operate reliably without constant manual intervention.

What we built

Built a 6-phase data pipeline on GCP: ingests game data, distributes across 7 concurrent ML systems (moving average, zone matchup, similarity matching, XGBoost, CatBoost, ensemble, and more), auto-grades results with 70–90% daily coverage, and surfaces performance metrics through Grafana dashboards. 1,000+ metrics tracked per run.

Have a similar problem?

Get in touch →