PLAIcraft Newsletter Subscription — KNN Classification

UBC DSCI 100 group project predicting newsletter subscription from gameplay minutes and age. Emphasis on careful data cleaning, simple EDA, and evaluation on a held-out test split.

Role: Data Cleaning · EDA · Modeling
Stack: R (tidyverse, tidymodels)
Data: PLAIcraft players.csv
Task: Binary classification (K-NN)

Overview

We trained a K-nearest neighbors classifier to predict whether a PLAIcraft player subscribed to the newsletter using two predictors: minutes played and age. The project focused on transparent preprocessing and evaluation.

Data Cleaning

  • Loaded players.csv and selected subscribe, minutes played, and Age.
  • Coerced subscribe to a factor; removed rows with missing values in selected fields.
  • Created a 75/25 train–test split stratified on subscribe.
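
The cleaning steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the column name `minutes_played`, the extra `experience` column, the seed, and the toy values standing in for players.csv are all assumptions made so the snippet runs on its own.

```r
library(tidyverse)
library(tidymodels)

# Toy stand-in for players.csv; values are made up for illustration only.
# In the real project this would be something like read_csv("players.csv").
# `minutes_played` and `experience` are assumed column names.
players_raw <- tibble(
  subscribe      = rep(c(TRUE, FALSE, TRUE, TRUE), times = 10),
  minutes_played = c(NA, seq(5, 195, by = 5)),
  Age            = rep(c(17, 21, 25, 30, NA), times = 8),
  experience     = "Amateur"  # extra column that select() drops
)

# Keep the three fields, make the label a factor, drop incomplete rows.
players <- players_raw |>
  select(subscribe, minutes_played, Age) |>
  mutate(subscribe = factor(subscribe)) |>
  drop_na()

# 75/25 split, stratified so both parts keep a similar subscriber share.
set.seed(1)  # arbitrary seed, only for reproducibility of the sketch
players_split <- initial_split(players, prop = 0.75, strata = subscribe)
players_train <- training(players_split)
players_test  <- testing(players_split)
```

Stratifying on `subscribe` matters here because an unstratified split of a small, imbalanced dataset can leave the test set with very few examples of the minority class.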

Exploratory Analysis

  • Plotted distributions of minutes played and age.
  • Reviewed the proportion of subscribers vs. non-subscribers (class balance).
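
As a rough sketch of these two checks, assuming the cleaned training data has columns `subscribe`, `minutes_played` (an assumed name), and `Age`, the plots and the class-balance table might look like this. The toy values are illustrative only and do not come from the project.

```r
library(tidyverse)

# Toy stand-in for the cleaned training data; values are illustrative only.
players_train <- tibble(
  subscribe      = factor(c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE)),
  minutes_played = c(12, 340, 5, 60, 0, 95, 18, 720),
  Age            = c(17, 22, 19, 25, 31, 20, 18, 23)
)

# Distribution of each predictor, one histogram panel per variable.
dist_plot <- players_train |>
  pivot_longer(c(minutes_played, Age),
               names_to = "variable", values_to = "value") |>
  ggplot(aes(x = value)) +
  geom_histogram(bins = 10) +
  facet_wrap(vars(variable), scales = "free") +
  labs(x = "Value", y = "Number of players")

# Class balance: count and share of each subscribe level.
class_balance <- players_train |>
  count(subscribe) |>
  mutate(proportion = n / sum(n))
```

Printing `dist_plot` draws the histograms, and `class_balance` gives the baseline accuracy a majority-class guess would achieve, which is a useful yardstick for the classifier.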

Modeling

  • Framed the task as K-NN classification with the two predictors.
  • Tuned K over candidate values; selected K = 21.
  • Evaluated on the held-out test split; inspected confusion matrix and summary metrics.
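
One way the tuning and evaluation steps could look in tidymodels is sketched below. Several pieces are assumptions rather than details from the project: the synthetic data, the `minutes_played` column name, centering and scaling the predictors, 5-fold cross-validation, and the grid of odd K values from 1 to 31. The summary only states that K was tuned over candidate values and that K = 21 was chosen. The `kknn` package must be installed for the engine to work.

```r
library(tidymodels)

set.seed(2024)  # arbitrary seed

# Synthetic stand-in data so the sketch runs on its own; not the project's data.
n <- 200
players <- tibble(
  minutes_played = rexp(n, rate = 1 / 120),
  Age            = round(runif(n, 10, 50)),
  subscribe      = factor(rbinom(n, 1, 0.7) == 1)
)

players_split <- initial_split(players, prop = 0.75, strata = subscribe)
players_train <- training(players_split)
players_test  <- testing(players_split)

# Assumption: predictors are scaled and centered, since K-NN is distance-based.
knn_recipe <- recipe(subscribe ~ minutes_played + Age, data = players_train) |>
  step_scale(all_predictors()) |>
  step_center(all_predictors())

knn_spec <- nearest_neighbor(weight_func = "rectangular", neighbors = tune()) |>
  set_engine("kknn") |>
  set_mode("classification")

# Assumption: 5-fold cross-validation over odd K from 1 to 31.
folds  <- vfold_cv(players_train, v = 5, strata = subscribe)
k_grid <- tibble(neighbors = seq(1, 31, by = 2))

knn_workflow <- workflow() |>
  add_recipe(knn_recipe) |>
  add_model(knn_spec)

tune_results <- tune_grid(knn_workflow, resamples = folds, grid = k_grid,
                          metrics = metric_set(accuracy))
best_k <- select_best(tune_results, metric = "accuracy")

# Refit on the full training set with the chosen K, then predict the test set.
final_fit <- knn_workflow |>
  finalize_workflow(best_k) |>
  fit(data = players_train)

test_predictions <- predict(final_fit, players_test) |>
  bind_cols(players_test)

# Confusion matrix on the held-out split.
test_conf_mat <- conf_mat(test_predictions, truth = subscribe,
                          estimate = .pred_class)
```

Tuning inside the training set and touching the test split only once, at the end, keeps the reported test metrics from being optimistically biased by the choice of K.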

Results (Test)

  • Accuracy ≈ 75%
  • Precision ≈ 100% (every positive-class prediction was correct, but few were made)
  • Recall ≈ 8% (very low; most actual positive-class cases were missed)
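
Precision and recall depend on which factor level counts as the positive class. The yardstick package treats the first factor level as the event by default, and the `event_level` argument switches this. The summary does not say which level the figures above use, so the sketch below only shows the mechanics on made-up toy predictions (3 non-subscribers, 9 subscribers, and a model that predicts "TRUE" almost every time).

```r
library(yardstick)
library(tibble)

# Toy predictions, made up for illustration; not the project's results.
toy <- tibble(
  truth    = factor(c(rep("FALSE", 3), rep("TRUE", 9)),
                    levels = c("FALSE", "TRUE")),
  estimate = factor(c("FALSE", "TRUE", "TRUE", rep("TRUE", 9)),
                    levels = c("FALSE", "TRUE"))
)

acc <- accuracy(toy, truth, estimate)  # (1 + 9) / 12

# Default: the first level ("FALSE") is the positive class.
prec_first <- precision(toy, truth, estimate)  # 1 / (1 + 0)
rec_first  <- recall(toy, truth, estimate)     # 1 / 3

# Treat the second level ("TRUE") as the positive class instead.
prec_second <- precision(toy, truth, estimate, event_level = "second")  # 9 / 11
rec_second  <- recall(toy, truth, estimate, event_level = "second")     # 9 / 9
```

In this toy case the rarely predicted class gets perfect precision and low recall, while the frequently predicted class gets the reverse, so it helps to state the positive class alongside any reported precision or recall.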

Takeaways

  • Signal: Minutes played and age are only weakly predictive of subscription.
  • Bias: The model is biased toward predicting “subscribed”.
  • Next: Add behavior features; address class imbalance.