# launch-monitor-regression — project notes

**Pulled**: 2026-05-26 via `git clone https://github.com/tim-blackmore/launch-monitor-regression.git`
**Upstream**: https://github.com/tim-blackmore/launch-monitor-regression
**License**: MIT (per repo)

## What's here

| File | What |
|---|---|
| `data.csv` | 10169 rows × 35 cols — TrackMan-style launch monitor exports |
| `main.ipynb` | Original author's regression notebook (sample analyses) |
| `requirements.txt` | Original deps (numpy, sklearn, matplotlib — not project-required) |
| `README.md` | Upstream README |

## Schema (key columns)

- **Categorical**: `Club`, `Ball`
- **Swing kinematics**: Club Speed, Attack Angle, Club Path, Swing Plane, Dyn. Loft, Face Angle, Face To Path
- **Ball outcome**: Ball Speed, Smash Factor, Launch Angle, Spin Rate, Spin Axis, Carry/Total distance, Side, Max Height
- **Impact**: Dynamic Lie, Impact Offset, Impact Height (sparse: roughly three-quarters NaN, see the caveats below)

The first row after the column header in `data.csv` holds units only (`[mph]`, `[deg]`, etc.), not measurements. Skip it with `pd.read_csv('data.csv', skiprows=[1])`; otherwise the numeric columns load as strings.
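A minimal sketch of why the units row matters, using a tiny made-up stand-in for `data.csv`. The four columns and their values are illustrative only, and leaving the `Club` / `Ball` cells of the units row empty is an assumption about the file layout.

```python
import io

import pandas as pd

# Hypothetical four-column excerpt; the real file has 35 columns.
SAMPLE = (
    "Club,Ball,Club Speed,Ball Speed\n"
    ",,[mph],[mph]\n"  # units-only row directly under the header
    "Driver,Premium,100.0,148.0\n"
    "Driver,Premium,95.5,140.2\n"
)

# Without skiprows the units row is parsed as data, so the column is not numeric.
raw = pd.read_csv(io.StringIO(SAMPLE))

# skiprows=[1] drops file line 1 (0-indexed), i.e. the units row; the header stays.
clean = pd.read_csv(io.StringIO(SAMPLE), skiprows=[1])

print(raw["Club Speed"].dtype, len(raw))
print(clean["Club Speed"].dtype, len(clean))
```

For the real file, replace the `io.StringIO(...)` argument with `'data.csv'`.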

## Important caveats (read before using)

- **Driver-only**: All 10169 rows have `Club=Driver`. No iron / wedge / putter coverage. Narrower than the upstream README implies.
- **Ball-type imbalance**: 92% of rows are `Premium`; the remaining 8% are split across Limited Distance Soft / Medium / Soft / Premium Hard.
- **Sparsity**: ~14% missing on `Club Path` / `Attack Angle` / `Face Angle`; ~77% missing on the impact columns (Dynamic Lie, Impact Offset, Impact Height).
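The caveats above can be re-checked after loading with a short report like the sketch below. `caveat_report` is a hypothetical helper, not part of the upstream repo, and the toy frame only mimics the shape of the real data.

```python
import numpy as np
import pandas as pd


def caveat_report(df):
    """Return club counts, ball-type shares, and per-column missing fractions."""
    return {
        # How many rows per club (expected: a single `Driver` entry).
        "clubs": df["Club"].value_counts(dropna=False),
        # Share of rows per ball type, as fractions summing to 1.
        "ball_share": df["Ball"].value_counts(normalize=True, dropna=False),
        # Fraction of NaN per column, most sparse first.
        "missing": df.isna().mean().sort_values(ascending=False),
    }


# Made-up rows for illustration; swap in the frame loaded from data.csv.
toy = pd.DataFrame({
    "Club": ["Driver"] * 4,
    "Ball": ["Premium", "Premium", "Premium", "Soft"],
    "Club Path": [1.2, np.nan, -0.5, 0.3],
    "Impact Height": [np.nan, np.nan, np.nan, 2.0],
})
report = caveat_report(toy)
print(report["missing"])
```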

## How this complements existing golf data

| Dataset | What | Pairs with |
|---|---|---|
| GolfDB | swing video (160×160) | ❌ no overlap — no shot outcome |
| CaddieSet | joint features + ball-flight 8 cols | ✅ both have ball outcome: **CaddieSet has video-anchored shots; this has TrackMan-level granularity but no swing media** |
| this | TrackMan launch-monitor (driver only) | — |

Use case: pair with CaddieSet's `Distance` / `Carry` / `BallSpeed` / `SpinBack` columns for outcome regression at scale, treating this dataset as a tabular-only extension of CaddieSet's ball-flight subset.
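As a rough starting point for outcome regression, the sketch below fits an ordinary least-squares baseline on complete rows only. `fit_baseline` is a hypothetical helper, the target column name `Carry` is an assumption (check the actual header in `data.csv`), and the toy data follows an invented linear relation rather than real ball-flight physics.

```python
import numpy as np
import pandas as pd


def fit_baseline(df, features, target):
    """OLS on rows with no NaN in the chosen columns.

    Returns (intercept, {feature: coefficient}, number of rows used).
    """
    rows = df[features + [target]].dropna()
    # Design matrix with a leading column of ones for the intercept.
    X = np.column_stack([np.ones(len(rows)), rows[features].to_numpy(dtype=float)])
    y = rows[target].to_numpy(dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(beta[0]), dict(zip(features, beta[1:])), len(rows)


# Synthetic frame: 200 rows, exact linear target, every 10th launch angle missing.
rng = np.random.default_rng(0)
n = 200
toy = pd.DataFrame({
    "Ball Speed": rng.uniform(130, 170, n),
    "Launch Angle": rng.uniform(8, 16, n),
})
toy["Carry"] = 1.5 * toy["Ball Speed"] + 2.0 * toy["Launch Angle"] + 10.0
toy.loc[::10, "Launch Angle"] = np.nan

intercept, coefs, n_used = fit_baseline(toy, ["Ball Speed", "Launch Angle"], "Carry")
print(intercept, coefs, n_used)
```

Dropping incomplete rows is the simplest choice; given the ~14% gaps in the swing columns, any feature set that includes them will shrink the usable sample accordingly.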
