PseudoPitch: Calibrated Probabilistic Pitch Classification

Project Overview

Awarded 3rd place at OUSAC 2025, PseudoPitch combines Probabilistic Nearest Neighbors with pseudo-label smoothing to deliver interpretable, well-calibrated probabilities of baseball pitch outcomes, even in low-data regimes. It outperforms the classic baselines it was benchmarked against, logistic regression and standard KNN, by using self-labeling to refine decision boundaries and tighten confidence estimates.

Methodology

Data Preprocessing
Extract raw pitch features (speed, spin, release angle) and normalize each feature to unit scale.
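
As a rough illustration of this step, the sketch below standardizes each feature column to zero mean and unit variance. Reading "unit scale" as z-score standardization is an assumption (min-max scaling would be an equally plausible reading), and the `standardize` helper and the sample pitch values are hypothetical.

```python
import numpy as np

def standardize(X, mean=None, std=None):
    """Scale each column to zero mean and unit variance.

    Pass the training-set mean/std when transforming held-out data so that
    no statistics leak from the evaluation split.
    """
    X = np.asarray(X, dtype=float)
    if mean is None:
        mean = X.mean(axis=0)
    if std is None:
        std = X.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant columns
    return (X - mean) / std, mean, std

# Hypothetical pitches: speed (mph), spin rate (rpm), release angle (degrees).
pitches = np.array([
    [94.1, 2310.0, -1.8],
    [86.5, 2650.0, -0.9],
    [78.2, 2780.0, 1.2],
    [91.0, 2200.0, -2.1],
])
X_scaled, mu, sigma = standardize(pitches)
```

Scaling matters here because nearest-neighbor distances would otherwise be dominated by spin rate, whose raw values are orders of magnitude larger than the other features.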

Feature Embedding
Compute a k-nearest-neighbor graph and extract neighbor distances as probabilistic features.
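
One way to picture this step is a brute-force neighbor search that returns, for every point, the distances to its k closest reference points. The function name `knn_distance_features`, the Euclidean metric, and the toy coordinates are assumptions made for illustration; the project may use a different metric or an indexed search.

```python
import numpy as np

def knn_distance_features(X_query, X_ref, k=3, exclude_self=False):
    """Return (distances, indices) of the k nearest reference points per query row."""
    X_query = np.asarray(X_query, dtype=float)
    X_ref = np.asarray(X_ref, dtype=float)
    # Pairwise Euclidean distances, shape (n_query, n_ref).
    diffs = X_query[:, None, :] - X_ref[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    if exclude_self:
        # Only meaningful when X_query and X_ref are the same array:
        # mask each point's zero distance to itself.
        np.fill_diagonal(dists, np.inf)
    idx = np.argsort(dists, axis=1)[:, :k]
    return np.take_along_axis(dists, idx, axis=1), idx

# Toy 2-D points; the last one is an outlier far from the rest.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]])
dist, idx = knn_distance_features(X, X, k=2, exclude_self=True)
```

Each row of `dist` can then serve as a feature vector: small distances indicate a dense neighborhood, large ones an isolated point whose prediction deserves less confidence.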

Pseudo-Label Smoothing
Iteratively assign soft labels to unlabeled points based on high-confidence predictions, refining the decision surface.
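
The loop described above can be sketched as follows. This is a minimal self-training illustration under stated assumptions, not the project's implementation: the helper names, the confidence threshold of 0.8, and the rule of averaging neighbor label distributions are all hypothetical choices.

```python
import numpy as np

def knn_soft_predict(X_query, X_ref, P_ref, k=3):
    """Average the (soft) label distributions of the k nearest reference points."""
    d = np.linalg.norm(X_query[:, None, :] - X_ref[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]
    return P_ref[idx].mean(axis=1)

def pseudo_label(X_lab, y_lab, X_unlab, n_classes, k=3, threshold=0.8, max_iter=10):
    """Grow the reference set with confidently predicted unlabeled points."""
    X_ref = np.asarray(X_lab, dtype=float)
    P_ref = np.eye(n_classes)[np.asarray(y_lab)]  # one-hot rows for true labels
    X_pool = np.asarray(X_unlab, dtype=float)
    for _ in range(max_iter):
        if len(X_pool) == 0:
            break
        probs = knn_soft_predict(X_pool, X_ref, P_ref, k=min(k, len(X_ref)))
        confident = probs.max(axis=1) >= threshold
        if not confident.any():
            break  # nothing new to add, so the labels have stabilized
        # Keep the full probability vector as the soft label, not its argmax.
        X_ref = np.vstack([X_ref, X_pool[confident]])
        P_ref = np.vstack([P_ref, probs[confident]])
        X_pool = X_pool[~confident]
    return X_ref, P_ref, X_pool

# Two well-separated toy classes plus three unlabeled points.
X_lab = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_lab = np.array([0, 0, 0, 1, 1, 1])
X_unlab = np.array([[0.5, 0.5], [5.5, 5.5], [2.6, 2.9]])
X_ref, P_ref, remaining = pseudo_label(X_lab, y_lab, X_unlab, n_classes=2)
```

In this toy run the two points near a cluster are absorbed, while the point between the clusters stays unlabeled because its neighbors disagree.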

Probabilistic Calibration
Fit a reliability curve via isotonic regression to align raw scores with true outcome frequencies.
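
A minimal sketch of this calibration step, using scikit-learn's `IsotonicRegression` on synthetic scores. The data is simulated, and the binary setup is an assumption: with more than two pitch outcomes, one calibrator per class followed by renormalization would be one option.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
raw = rng.uniform(0.0, 1.0, 2000)
# Synthetic miscalibrated scores: the true event rate is raw**2, so the raw
# score overstates the probability everywhere except at 0 and 1.
y = (rng.uniform(0.0, 1.0, 2000) < raw ** 2).astype(int)

# Fit the monotone reliability curve on one half, evaluate on the other.
calibrator = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
calibrator.fit(raw[:1000], y[:1000])
calibrated = calibrator.predict(raw[1000:])

def brier(p, outcome):
    """Mean squared error between predicted probability and 0/1 outcome."""
    return float(np.mean((p - outcome) ** 2))

brier_raw = brier(raw[1000:], y[1000:])
brier_calibrated = brier(calibrated, y[1000:])
```

Isotonic regression only assumes the mapping from score to frequency is non-decreasing, so it preserves the ranking of pitches up to ties while correcting the probabilities themselves.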

Evaluation & Deployment
Benchmarked against logistic regression and standard KNN on held-out seasons; packaged as a Python module with scikit-learn API compatibility.
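
A benchmark of this shape could look like the sketch below, which trains both baselines on earlier seasons and scores them on the most recent one. The data, season labels, and hyperparameters are synthetic placeholders, and log loss is an assumed choice of metric; no project results are reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n = 600
X = rng.normal(size=(n, 3))  # stand-ins for speed, spin, release angle
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)
season = rng.choice([2022, 2023, 2024], size=n)

# Hold out the most recent season entirely, rather than a random row sample.
train, test = season < 2024, season == 2024

models = {
    "logistic_regression": LogisticRegression(),
    "knn": KNeighborsClassifier(n_neighbors=15),
}
scores = {}
for name, model in models.items():
    model.fit(X[train], y[train])
    proba = model.predict_proba(X[test])
    scores[name] = log_loss(y[test], proba, labels=[0, 1])
```

Because any estimator exposing `fit` and `predict_proba` fits into this loop, a scikit-learn-compatible module can be added to `models` and compared on the same split.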

Presentation

Conference Paper