# LLM Preference Classification — TF-IDF + Logistic Regression
Kaggle “LLM Classification (Finetuning)” entry. Two TF–IDF baselines (full-text vs. separated fields) with multinomial Logistic Regression, ensembled by probability averaging to predict A / B / tie.
## Overview
The dataset provides pairwise responses (`response_a`, `response_b`) to a prompt, with labels indicating which model's response wins (A or B) or whether they tie. I constructed a 3-class target from the competition's one-hot label columns, trained two Logistic Regression baselines on TF–IDF features, and averaged their predicted probabilities for the final submission.
## Data & Labels

- Loaded `train.csv` and `test.csv`.
- Constructed the target `winner ∈ {A, B, tie}` from `winner_model_a` / `winner_model_b`.
- Mapped labels via `{A: 0, B: 1, tie: 2}` for modeling (sketched below).
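A minimal sketch of the label construction, assuming the training file follows the competition schema, with ties encoded as neither one-hot column being set:

```python
import pandas as pd

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

# Collapse the one-hot winner columns into a single 3-class target:
# A wins, B wins, otherwise a tie.
def to_winner(row):
    if row["winner_model_a"] == 1:
        return "A"
    if row["winner_model_b"] == 1:
        return "B"
    return "tie"

train["winner"] = train.apply(to_winner, axis=1)
y = train["winner"].map({"A": 0, "B": 1, "tie": 2})
```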
## Features

- Model A — Full text: TF–IDF (1–2 grams, 20k max features) on `prompt + "[SEP]" + response_a + " [VS] " + response_b`.
- Model B — Separated + difference: TF–IDF (1–2 grams, 10k max features each) on `prompt`, `response_a`, `response_b`; built a difference block `(respA − respB)`; the final design matrix was `[prompt | respA | respB | diff]` (≈40k columns). Both designs are sketched below.
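A sketch of the two feature builds with scikit-learn. The vectorizer settings mirror the description above; sharing one vocabulary between `response_a` and `response_b` is an assumption, since the subtraction in the difference block is only well-defined when both responses map to the same columns:

```python
import pandas as pd
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer

# Model A — full text: one vectorizer over the concatenated fields.
full_text = (
    train["prompt"] + "[SEP]" + train["response_a"] + " [VS] " + train["response_b"]
)
vec_full = TfidfVectorizer(ngram_range=(1, 2), max_features=20_000,
                           strip_accents="unicode")
X_full = vec_full.fit_transform(full_text)

# Model B — separated fields plus a difference block. The two response
# fields share one vocabulary (an assumption) so (respA - respB) lines up.
vec_prompt = TfidfVectorizer(ngram_range=(1, 2), max_features=10_000,
                             strip_accents="unicode")
vec_resp = TfidfVectorizer(ngram_range=(1, 2), max_features=10_000,
                           strip_accents="unicode")

X_prompt = vec_prompt.fit_transform(train["prompt"])
vec_resp.fit(pd.concat([train["response_a"], train["response_b"]]))
X_a = vec_resp.transform(train["response_a"])
X_b = vec_resp.transform(train["response_b"])

# [prompt | respA | respB | diff] — roughly 40k columns in total.
X_sep = hstack([X_prompt, X_a, X_b, X_a - X_b]).tocsr()
```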
## Modeling

- Classifier: `LogisticRegression` (multinomial, `solver='saga'`, `C=1.0`, `max_iter=500`).
- CV: 5-fold on the training set, scored with log-loss and accuracy.
- Trained both models on the full training set; predicted test probabilities; ensembled by averaging (see the sketch below).
- Built the submission with columns `winner_model_a`, `winner_model_b`, `winner_model_tied`.
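A sketch of the training, ensembling, and submission steps. `X_full`, `X_sep`, and `y` come from the snippets above; the test-side matrices `X_full_test` and `X_sep_test` are assumed to be built by applying the fitted vectorizers' `transform` to the test fields, and the probability column order follows the `{A: 0, B: 1, tie: 2}` mapping:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

def make_clf():
    # Recent scikit-learn fits a multinomial model by default with saga.
    return LogisticRegression(solver="saga", C=1.0, max_iter=500)

# 5-fold CV with log-loss and accuracy, matching the metrics reported below.
for name, X in [("full-text", X_full), ("separated+diff", X_sep)]:
    cv = cross_validate(make_clf(), X, y, cv=5,
                        scoring=("neg_log_loss", "accuracy"))
    print(f"{name}: log-loss {-cv['test_neg_log_loss'].mean():.4f}, "
          f"accuracy {cv['test_accuracy'].mean():.4f}")

# Fit on the full training set and average the two probability estimates.
proba = (
    make_clf().fit(X_full, y).predict_proba(X_full_test)
    + make_clf().fit(X_sep, y).predict_proba(X_sep_test)
) / 2

submission = pd.DataFrame({
    "id": test["id"],
    "winner_model_a": proba[:, 0],
    "winner_model_b": proba[:, 1],
    "winner_model_tied": proba[:, 2],
})
submission.to_csv("submission.csv", index=False)
```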
## Cross-Validation (Train)
- Model A (Full-text): log-loss 1.1143 ± 0.0028, accuracy 0.3791 ± 0.0049.
- Model B (Separated + diff): log-loss 1.1812 ± 0.0045, accuracy 0.4212 ± 0.0042.
- Ensemble: average of Model A & B probabilities.
## Notes

| Component  | Choice                                  |
|------------|-----------------------------------------|
| Vectorizer | TF–IDF, 1–2 grams, strip accents        |
| Design     | Full-text vs. separated + diff features |
| Output     | 3-way class probabilities               |