Awarded 3rd place at OUSAC 2025, PseudoPitch combines Probabilistic Nearest Neighbors with pseudo-label smoothing to deliver interpretable, well-calibrated probabilities of baseball pitch outcomes, even in low-data regimes. Our method outperforms classic baselines (logistic regression and standard KNN) by using self-labeling to refine decision boundaries and tighten confidence estimates.
Data Preprocessing
Extract raw pitch features (speed, spin, release angle) and normalize to unit scale.
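As a rough illustration of the normalization step, the sketch below standardizes each feature to zero mean and unit variance using statistics from the training rows only. Reading "unit scale" as z-scoring is an assumption (min-max scaling to [0, 1] would be an equally plausible reading), and the helper names `fit_scaler` and `apply_scaler` and the sample values are hypothetical.

```python
import numpy as np

def fit_scaler(X_train):
    """Return per-feature mean and standard deviation from the training rows."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0] = 1.0  # constant columns: avoid division by zero
    return mean, std

def apply_scaler(X, mean, std):
    """Standardize X with previously fitted statistics."""
    return (X - mean) / std

# Hypothetical pitches: columns are speed (mph), spin (rpm), release angle (deg).
X_train = np.array([
    [95.1, 2300.0, -1.8],
    [88.4, 2650.0, -0.9],
    [82.0, 1500.0, -2.4],
    [91.7, 2100.0, -1.2],
])
mean, std = fit_scaler(X_train)
X_scaled = apply_scaler(X_train, mean, std)
```

Fitting the statistics on training data and reusing them for held-out seasons keeps information from the evaluation set out of the features.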
Feature Embedding
Build a k-nearest-neighbor graph and use the neighbor distances as probabilistic features.
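One way to obtain neighbor distances is scikit-learn's `NearestNeighbors`. The sketch below is illustrative only: the random data stands in for scaled pitch features, and both the summary features and the exponential distance weights are assumptions rather than the project's actual embedding.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))  # stand-in for scaled pitch features
k = 5

nn = NearestNeighbors(n_neighbors=k).fit(X)
# Calling kneighbors() without an argument queries the training points
# themselves and does not count a point as its own neighbor.
dist, idx = nn.kneighbors()

# Assumed distance summaries per point: mean and largest neighbor distance.
features = np.column_stack([dist.mean(axis=1), dist.max(axis=1)])

# Assumed weighting: closer neighbors get larger weights; each row sums to 1.
weights = np.exp(-dist)
weights /= weights.sum(axis=1, keepdims=True)
```

The row-normalized `weights` can then be read as a probability distribution over each point's neighbors.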
Pseudo-Label Smoothing
Iteratively assign soft labels to unlabeled points based on high-confidence predictions, refining the decision surface.
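The iterative soft-labeling loop could look roughly like the sketch below. It is a simplified self-training scheme, not the PseudoPitch implementation: the function names, the unweighted neighbor average, the 0.8 confidence threshold, and the synthetic two-cluster data are all assumptions made for illustration.

```python
import numpy as np

def knn_soft_predict(X_ref, Y_ref, X_query, k):
    """Average the soft labels of the k nearest reference points."""
    d = np.linalg.norm(X_query[:, None, :] - X_ref[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return Y_ref[nearest].mean(axis=1)

def pseudo_label(X_lab, y_lab, X_unlab, n_classes, k=5, threshold=0.8, rounds=5):
    """Assign soft labels to unlabeled points whose prediction is confident."""
    Y_lab = np.eye(n_classes)[y_lab]  # one-hot labels for the labeled set
    soft = np.full((len(X_unlab), n_classes), np.nan)
    accepted = np.zeros(len(X_unlab), dtype=bool)
    for _ in range(rounds):
        pending = np.flatnonzero(~accepted)
        if pending.size == 0:
            break
        # Reference set: labeled points plus pseudo-labeled points so far.
        X_ref = np.vstack([X_lab, X_unlab[accepted]])
        Y_ref = np.vstack([Y_lab, soft[accepted]])
        proba = knn_soft_predict(X_ref, Y_ref, X_unlab[pending], k)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break
        soft[pending[confident]] = proba[confident]
        accepted[pending[confident]] = True
    return soft, accepted

# Synthetic two-class data standing in for pitch features.
rng = np.random.default_rng(0)
X_lab = np.vstack([rng.normal(-2, 0.5, (10, 2)), rng.normal(2, 0.5, (10, 2))])
y_lab = np.array([0] * 10 + [1] * 10)
X_unlab = np.vstack([rng.normal(-2, 0.5, (30, 2)), rng.normal(2, 0.5, (30, 2))])
soft, accepted = pseudo_label(X_lab, y_lab, X_unlab, n_classes=2)
```

Points that never reach the threshold keep `NaN` rows in `soft`, so they can be excluded from later rounds or from training.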
Probabilistic Calibration
Fit a reliability curve via isotonic regression to align raw scores with true outcome frequencies.
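A minimal sketch of the calibration step using scikit-learn's `IsotonicRegression` follows. The raw scores and outcomes are synthetic and deliberately miscalibrated; in practice the curve would be fit on a held-out calibration set rather than on the data used to train the scorer.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(0)
raw = rng.uniform(0, 1, 500)  # synthetic raw scores
# Synthetic outcomes: the true frequency is raw**2, so raw scores are overconfident.
y = (rng.uniform(0, 1, 500) < raw**2).astype(int)

# Monotone non-decreasing map from raw score to outcome frequency,
# clipped to [0, 1] for scores outside the fitted range.
iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
iso.fit(raw, y)

grid = np.linspace(0, 1, 11)
calibrated = iso.predict(grid)
```

Because the fitted map is monotone, it preserves the ranking of raw scores while adjusting their values toward observed frequencies.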
Evaluation & Deployment
Benchmarked against logistic regression and standard KNN on held-out seasons; packaged as a Python module with scikit-learn API compatibility.
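For a sense of what scikit-learn API compatibility involves, the sketch below wraps a plain `KNeighborsClassifier` in an estimator that exposes `fit`, `predict_proba`, and `predict`. The class name `PitchOutcomeKNN` is hypothetical and the wrapped model is only a placeholder; this is not the actual PseudoPitch module.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.neighbors import KNeighborsClassifier

class PitchOutcomeKNN(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper illustrating the estimator interface."""

    def __init__(self, n_neighbors=5):
        # Constructor arguments are stored unchanged so get_params/clone work.
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        # Fitted attributes end with an underscore by scikit-learn convention.
        self.model_ = KNeighborsClassifier(n_neighbors=self.n_neighbors).fit(X, y)
        self.classes_ = self.model_.classes_
        return self

    def predict_proba(self, X):
        return self.model_.predict_proba(X)

    def predict(self, X):
        return self.classes_[np.argmax(self.predict_proba(X), axis=1)]

# Synthetic data standing in for pitch features and binary outcomes.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = (X[:, 0] > 0).astype(int)

clf = PitchOutcomeKNN(n_neighbors=3).fit(X, y)
proba = clf.predict_proba(X[:5])
unfitted_copy = clone(clf)  # same hyperparameters, no fitted state
```

Following these conventions is what lets an estimator be used with utilities such as `clone`, pipelines, and cross-validation helpers.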