LLM Preference Classification — TF-IDF + Logistic Regression

Kaggle “LLM Classification (Finetuning)” entry. Two TF–IDF baselines (full-text vs. separated fields) with multinomial Logistic Regression, ensembled by probability averaging to predict A / B / tie.

  • Role: Data Science & Modeling
  • Stack: Python, pandas, NumPy, scikit-learn, SciPy
  • Task: 3-class classification (A / B / tie)
  • CV: 5-fold (log-loss & accuracy)

Overview

The dataset provides pairwise responses (response_a, response_b) to a prompt, with labels indicating which model’s response wins (A or B) or whether the two tie. I constructed a 3-class target from the competition’s one-hot labels, trained two Logistic Regression baselines on TF–IDF features, and averaged their predicted probabilities for the final submission.

Data & Labels

  • Loaded train.csv and test.csv.
  • Constructed target winner ∈ {A, B, tie} from winner_model_a/winner_model_b.
  • Mapped labels via {A:0, B:1, tie:2} for modeling (see the sketch below).
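
A minimal sketch of the label construction, assuming the competition’s one-hot columns winner_model_a / winner_model_b (a row where neither flag is set is treated as a tie):

```python
import numpy as np
import pandas as pd

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

# Collapse the one-hot winner columns into a single 3-class target;
# neither flag set means the two responses tied.
train["winner"] = np.select(
    [train["winner_model_a"] == 1, train["winner_model_b"] == 1],
    ["A", "B"],
    default="tie",
)

label_map = {"A": 0, "B": 1, "tie": 2}
y = train["winner"].map(label_map).to_numpy()
```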

Features

  • Model A — Full text: TF–IDF (1–2 grams, 20k max features) on prompt + "[SEP]" + response_a + " [VS] " + response_b.
  • Model B — Separated + difference: TF–IDF (1–2 grams, 10k each) on prompt, response_a, response_b; built a difference block (respA − respB); final design matrix was [prompt | respA | respB | diff] (≈40k cols). Both feature builds are sketched below.
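
A sketch of the two feature builds. Column names (prompt, response_a, response_b) follow the competition files; for Model B I assume one vectorizer fit on both responses, so the difference block respA − respB is column-aligned (the original only states the subtraction, not the shared vocabulary):

```python
import pandas as pd
from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer

# Model A: one TF-IDF over the concatenated prompt/response text.
full_text = train["prompt"] + "[SEP]" + train["response_a"] + " [VS] " + train["response_b"]
vec_full = TfidfVectorizer(ngram_range=(1, 2), max_features=20_000, strip_accents="unicode")
X_a = vec_full.fit_transform(full_text)

# Model B: separate TF-IDF blocks per field plus a response-difference block.
vec_prompt = TfidfVectorizer(ngram_range=(1, 2), max_features=10_000, strip_accents="unicode")
vec_resp = TfidfVectorizer(ngram_range=(1, 2), max_features=10_000, strip_accents="unicode")

X_prompt = vec_prompt.fit_transform(train["prompt"])
# Fit on both responses together so response_a and response_b share a vocabulary.
vec_resp.fit(pd.concat([train["response_a"], train["response_b"]]))
X_ra = vec_resp.transform(train["response_a"])
X_rb = vec_resp.transform(train["response_b"])

X_b = hstack([X_prompt, X_ra, X_rb, X_ra - X_rb]).tocsr()  # ≈ 40k columns total
```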

Modeling

  • Classifier: LogisticRegression (multinomial, solver='saga', C=1.0, max_iter=500).
  • CV: 5-fold on training set with log-loss and accuracy.
  • Trained both models on full train; predicted test probabilities; ensembled by averaging.
  • Built submission with columns: winner_model_a, winner_model_b, winner_model_tied (see the sketches below).
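
A modeling sketch using the hyperparameters above and the X_a, X_b, y built in the earlier sketches (on recent scikit-learn releases multinomial is already the default for LogisticRegression and the multi_class argument is deprecated):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

def make_clf():
    # Multinomial softmax LR; 'saga' copes well with large sparse TF-IDF inputs.
    return LogisticRegression(multi_class="multinomial", solver="saga", C=1.0, max_iter=500)

# 5-fold CV with log-loss and accuracy for each feature design.
for name, X in [("Model A (full text)", X_a), ("Model B (separated + diff)", X_b)]:
    scores = cross_validate(make_clf(), X, y, cv=5, scoring=["neg_log_loss", "accuracy"])
    print(f"{name}: log-loss {-scores['test_neg_log_loss'].mean():.4f}, "
          f"accuracy {scores['test_accuracy'].mean():.4f}")

# Refit both models on the full training set for test-time prediction.
clf_a = make_clf().fit(X_a, y)
clf_b = make_clf().fit(X_b, y)
```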

Cross-Validation (Train)

  • Model A (Full-text): log-loss 1.1143 ± 0.0028, accuracy 0.3791 ± 0.0049.
  • Model B (Separated + diff): log-loss 1.1812 ± 0.0045, accuracy 0.4212 ± 0.0042.
  • Ensemble: average of Model A & B probabilities (sketched below).
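
Continuing the sketches above, the ensemble and submission step might look roughly like this (the id column and the output file name are assumptions, not stated in the write-up):

```python
import pandas as pd
from scipy.sparse import hstack

# Transform the test set with the already-fitted vectorizers.
full_text_test = test["prompt"] + "[SEP]" + test["response_a"] + " [VS] " + test["response_b"]
X_a_test = vec_full.transform(full_text_test)

X_ra_test = vec_resp.transform(test["response_a"])
X_rb_test = vec_resp.transform(test["response_b"])
X_b_test = hstack(
    [vec_prompt.transform(test["prompt"]), X_ra_test, X_rb_test, X_ra_test - X_rb_test]
).tocsr()

# Average the two models' class probabilities (class order: A=0, B=1, tie=2).
proba = (clf_a.predict_proba(X_a_test) + clf_b.predict_proba(X_b_test)) / 2

submission = pd.DataFrame({
    "id": test["id"],  # assumed test identifier column
    "winner_model_a": proba[:, 0],
    "winner_model_b": proba[:, 1],
    "winner_model_tied": proba[:, 2],
})
submission.to_csv("submission.csv", index=False)
```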

Notes

  • Vectorizer: TF–IDF, 1–2 grams, strip accents
  • Design: Full-text vs. separated + diff features
  • Output: 3-way class probabilities