Predicting Warranty Claim Validity with Bias-Variance Tradeoff

During my tenure as a Data Scientist at Hyundai, one of my projects was to develop a machine learning (ML) model to predict the validity of warranty claims from historical data. The team needed a model that could quickly identify and process valid claims, reducing the time and resources spent investigating fraudulent or incorrect ones.

The objective was to determine the validity of Hyundai's WTC claims in order to streamline claim processing and identify fraudulent or incorrect claims. The dataset was extracted using WinSQL and includes vehicle warranty claims with part numbers, dealer IDs, claim amounts, customer complaints, and previous claims, along with the status of each claim (valid or invalid).

This is a confidential dataset that cannot be accessed due to privacy and security policies.

Some parts of this project may not be displayed. To view the entire Python work for this project, please visit my GitHub repository:

1.png

Loading and Preprocessing Data

Initial steps include loading the data, encoding categorical variables, scaling numerical features, and preparing the dataset for modeling. Encoding and scaling are both key for handling categorical data and normalizing the distribution of numerical features, ensuring the model processes each feature correctly.

2.png
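As an illustration of this step, here is a minimal sketch of the encoding and scaling flow. The column names and values are hypothetical stand-ins, since the real dataset is confidential:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, StandardScaler

# Hypothetical stand-in for the confidential claims data.
df = pd.DataFrame({
    "dealer_id": ["D01", "D02", "D01", "D03"],
    "part_number": ["P100", "P200", "P100", "P300"],
    "claim_amount": [450.0, 1200.0, 300.0, 980.0],
    "previous_claims": [0, 3, 1, 2],
})

# Encode categorical columns into integer codes.
for col in ["dealer_id", "part_number"]:
    df[col] = LabelEncoder().fit_transform(df[col])

# Scale numerical features to zero mean and unit variance.
num_cols = ["claim_amount", "previous_claims"]
df[num_cols] = StandardScaler().fit_transform(df[num_cols])

print(df.head())
```

After this step, every column is numeric and the numerical features share a common scale, which is what the downstream models expect.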

Textual Analysis

Since the dataset includes textual data in the 'Customer Complaints' field, I applied TF-IDF vectorization to convert the text into a structured format usable by the model. This transforms the free text into engineered features that improve the model's ability to capture nuances in the complaints.

3.png

Initial Model Assessment with Logistic Regression Analysis

To establish a baseline, I trained a logistic regression model to measure performance and assess potential high bias. I focused on accuracy and recall, since poor recall on the minority class indicates the model cannot predict that class accurately and thus exhibits higher bias.
The accuracy came out to 67.5% with a macro-averaged recall of 50%.

5.png
6.png

The classification report showed clear evidence of high bias. With 0% precision and recall for class '0' (invalid claims), the model failed entirely to predict that class. Logistic regression was too simplistic to capture the underlying complexities of the warranty dataset.

Although the 67.5% accuracy might seem moderately acceptable, it is misleading because of the imbalance in the dataset: the model simply predicts the majority class ('Valid' claims) for all inputs, so the accuracy mostly reflects the majority class proportion.

The macro-averaged recall of 50% further confirms the high bias, since it averages performance across classes and is not skewed by class imbalance; the complete misclassification of 'Invalid' warranty claims drags this metric down heavily.

4.png
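To reproduce the shape of this baseline experiment, a sketch on synthetic data (the real claims data cannot be shared) might look like:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in: roughly 2/3 'valid', 1/3 'invalid'.
X, y = make_classification(n_samples=1000, weights=[0.33, 0.67], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

baseline = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = baseline.predict(X_te)

# Accuracy can look fine while macro recall exposes minority-class failure.
acc = accuracy_score(y_te, pred)
macro_recall = recall_score(y_te, pred, average="macro")
print(f"accuracy={acc:.3f}  macro recall={macro_recall:.3f}")
```

Reporting both metrics side by side is what exposed the bias problem: an accuracy near the majority-class proportion paired with a macro recall near 50% is the signature of a model that ignores the minority class.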

Random Forest Model

To address the high bias identified above, I developed a Random Forest model capable of capturing more complex patterns that could potentially reduce bias. I first measured the Mean Squared Error (MSE), the average of the squared errors, to see whether switching from logistic regression to Random Forest would yield a lower MSE, which would suggest a reduction in bias.

7.png
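The MSE comparison can be sketched as follows; this is an illustrative reconstruction on synthetic data, not the original code:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the confidential claims data.
X, y = make_classification(n_samples=1000, weights=[0.33, 0.67], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

lr = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
rf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# On 0/1 labels, the MSE of hard predictions equals the misclassification rate,
# so a lower MSE here means fewer wrongly classified claims.
mse_lr = mean_squared_error(y_te, lr.predict(X_te))
mse_rf = mean_squared_error(y_te, rf.predict(X_te))
print(f"LR MSE={mse_lr:.3f}  RF MSE={mse_rf:.3f}")
```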

Hyperparameter Tuning and Model Evaluation

I created a GridSearchCV object for the Random Forest model and used the best estimator found by the grid search to predict on the test set, calculating accuracy and recall for the optimized model. The results showed an optimized accuracy of 68.5% and an optimized macro-averaged recall of 51.5%.
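A self-contained sketch of this tuning step on synthetic data; the parameter grid here is an assumption for illustration, not the grid actually used:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic, imbalanced stand-in for the confidential claims data.
X, y = make_classification(n_samples=1000, weights=[0.33, 0.67], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Hypothetical grid -- the real search space was larger.
param_grid = {"n_estimators": [100, 300], "max_depth": [None, 10]}
grid = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="recall_macro",  # optimize macro recall to counter class imbalance
    cv=3,
)
grid.fit(X_tr, y_tr)

# Evaluate the best estimator on the held-out test set.
pred = grid.best_estimator_.predict(X_te)
print("accuracy:", accuracy_score(y_te, pred))
print("macro recall:", recall_score(y_te, pred, average="macro"))
```

Scoring the search on macro recall rather than plain accuracy keeps the tuning from simply rewarding majority-class predictions.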

Although not a dramatic change, the optimized accuracy was a slight improvement over the baseline logistic regression model's 67.5%, demonstrating that the Random Forest model, with adjusted parameters, handles the data's complexity slightly better. The macro-averaged recall also improved by 1.5 percentage points over the logistic regression model; not ideal, but it again showed the RF model was slightly better at identifying true positives across both classes, reducing the bias toward the majority class.

The hyperparameter tuning for the RF model showed marginal improvement in both accuracy and recall, which helped balance the bias and variance in the model. Although I did not get to train the model further before my departure, it shows potential for more improvement with fine-tuning and additional feature engineering. With almost no ML models active in the WTC, this model still helped generalize to new data and provided reliable predictions for validating warranty claims. Working with millions of records in a large-scale data ecosystem, this project highlighted the iterative nature of model building and the importance of tuning and testing configurations to find the optimal setup.
