
Customer Churn Rate Prediction Project

During my tenure at Delgado Protocol, one of the projects I was tasked with was developing a predictive model in R that identifies customers at high risk of churn, enabling targeted interventions to improve customer retention. The company had been experiencing higher-than-average customer churn rates in 2023, which reduced revenue and increased the cost of acquiring new customers. The existing models did not adequately address the complexities of customer behavior and interactions with Delgado's services, partly due to incomplete data and a lack of robust predictive analytics. The team needed a model that could quickly identify at-risk customers and predict the churn rate, addressing the issue of customer retention.


In this project, I discuss how I developed a framework that uses proxy metrics derived from available data (such as session length and frequency of visits), combined with industry benchmarks, to model user engagement and predict churn. The approach included:

• Data Imputation Techniques: To handle missing data.

• Segmentation Analysis: To identify at-risk user groups.

• Survival Analysis: To model churn over time.


**The dataset is confidential and cannot be shared for privacy and security reasons. Names of actual stakeholders in the Stakeholder Management Map have been redacted for the sake of privacy.

Screenshot 2024-07-02 at 11.12.47 AM.png

Stakeholder Management Map

To better understand the stakeholders relevant to the project, I created a simple Stakeholder Map that helped ensure effective and organized collaboration. Every stakeholder listed had influence on decision-making and an interest in the project outcomes. The second-level Department segment breaks down into three departments that directly interact with or are impacted by this project. Management ensured that the project was strategically aligned with company goals and provided decision-making support. Operational Teams helped act on the project's findings and engaged directly with customers with high retention rates. Creative and Marketing Teams were responsible for creating video and static paid advertisements and for extracting historical and live-stream data on ad performance. Support Teams provided the necessary tools, data, and analysis to support the project. Although this was an individual project that I carried out as a Marketing Data Scientist, it was also reviewed and assisted by other Data Scientists, as well as by Legal & Compliance, to ensure that data handling and customer interaction strategies complied with legal standards.

Initial Model Assessment with Logistic Regression Analysis

To establish a baseline, I fit a logistic regression model to measure performance and assess any potential high-level bias. I focused on accuracy and recall. The accuracy came out to 67.5% with a macro-averaged recall of 50%. A macro-averaged recall that low indicates that the model could not reliably predict the minority class and was therefore biased toward the majority class.
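As a rough illustration of this kind of baseline, the sketch below fits a logistic regression with glm() and computes accuracy and macro-averaged recall from a confusion matrix. It is not the project's actual script: the data is simulated, the column names, the 0.5 threshold, and the churn-generating formula are all assumptions, and the metrics are computed on the training data for brevity.

```r
# Hypothetical simulated data; column names are illustrative only,
# not taken from the confidential dataset.
set.seed(42)
n <- 1000
customers <- data.frame(
  session_length  = rnorm(n, mean = 30, sd = 10),
  visit_frequency = rpois(n, lambda = 5),
  total_spending  = rgamma(n, shape = 2, scale = 100)
)

# Assumed relationship: lower spending raises the probability of churn
p_churn <- plogis(0.5 - 0.004 * customers$total_spending)
customers$churn <- rbinom(n, size = 1, prob = p_churn)

# Baseline logistic regression
fit <- glm(churn ~ session_length + visit_frequency + total_spending,
           data = customers, family = binomial)

# Classify at an assumed 0.5 probability threshold
predicted <- factor(as.integer(predict(fit, type = "response") > 0.5),
                    levels = c(0, 1))
actual <- factor(customers$churn, levels = c(0, 1))
confusion <- table(actual = actual, predicted = predicted)

# Accuracy: share of all customers classified correctly
accuracy <- sum(diag(confusion)) / sum(confusion)

# Recall per class: correct predictions / actual members of that class;
# the macro average weights both classes equally
recall_per_class <- diag(confusion) / rowSums(confusion)
macro_recall <- mean(recall_per_class)

print(confusion)
cat("Accuracy:", round(accuracy, 3), "\n")
cat("Macro-averaged recall:", round(macro_recall, 3), "\n")
```

A macro-averaged recall near 0.5 on a two-class problem is what a model produces when it assigns almost every customer to the majority class, which is why it is a useful check alongside accuracy.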

Screenshot 2024-07-02 at 11.32.16 AM 1.png

Using R in RStudio, I provide a comprehensive view of the code used for this project, showing how different customer segments behave and how their risk of churn changes over time.

After loading the dataset and the necessary libraries, I simulated a more comprehensive dataset with fields such as CustomerID, Session Length, Frequency of Visits, Total Spending, Number of Transactions, Customer Satisfaction Score, and Churn.
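A minimal sketch of how such a dataset could be simulated in base R is shown below. The distributions, their parameters, the sample size, and the exact column names are assumptions chosen for illustration; they are not the values used in the project.

```r
# Simulated customer dataset; all distributions and parameters are
# illustrative assumptions, not the project's real values.
set.seed(123)
n <- 500

customers <- data.frame(
  CustomerID        = seq_len(n),
  SessionLength     = round(rnorm(n, mean = 30, sd = 10), 1),       # minutes
  FrequencyOfVisits = rpois(n, lambda = 5),                         # visits per month
  TotalSpending     = round(rgamma(n, shape = 2, scale = 100), 2),  # currency units
  NumTransactions   = rpois(n, lambda = 8),
  SatisfactionScore = sample(1:10, n, replace = TRUE),              # 1 (low) to 10 (high)
  Churn             = rbinom(n, size = 1, prob = 0.3)               # 1 = churned
)

str(customers)
summary(customers)
```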

I then introduced missing values at random into samples of Session Length, Frequency of Visits, Total Spending, and Number of Transactions to add complexity. Next, I imputed the missing values and checked the dataset for completeness, so that it was robust enough for more complex analysis. This produced a richer dataset on which I could perform deeper analyses, such as the correlation between spending habits and churn or how customer satisfaction scores affect churn likelihood.
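The missing-value step can be sketched as follows. The 10% missingness rate and the use of median imputation are assumptions made for this example; the write-up does not state which rate or imputation method was actually used.

```r
# Simulated KPI columns (illustrative assumptions only)
set.seed(123)
n <- 500
customers <- data.frame(
  SessionLength     = rnorm(n, mean = 30, sd = 10),
  FrequencyOfVisits = rpois(n, lambda = 5),
  TotalSpending     = rgamma(n, shape = 2, scale = 100),
  NumTransactions   = rpois(n, lambda = 8)
)
kpi_cols <- c("SessionLength", "FrequencyOfVisits",
              "TotalSpending", "NumTransactions")

# Blank out an assumed 10% of each column at random
for (col in kpi_cols) {
  missing_rows <- sample(n, size = round(0.1 * n))
  customers[missing_rows, col] <- NA
}
missing_before <- colSums(is.na(customers))

# Median imputation (assumed method): replace each NA with the
# median of the observed values in the same column
for (col in kpi_cols) {
  col_median <- median(customers[[col]], na.rm = TRUE)
  customers[[col]][is.na(customers[[col]])] <- col_median
}
missing_after <- colSums(is.na(customers))

# Completeness check: every count should now be zero
print(missing_before)
print(missing_after)
```

Median imputation is only one option. It is simple and robust to outliers, but it shrinks the variance of the imputed columns, so a model-based method may be preferable when the share of missing data is large.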

In the R script above, I applied a segmentation technique to the KPIs, categorizing customers into quartiles to identify which segments were most susceptible to churn. I then followed it up with survival analysis to evaluate the risk and timing of churn within these segments, using survival curves (survfit() and plot()) to visually depict the risk over time. The x-axis represented time in days and months, and the y-axis represented the survival probability. The survival probability came out to 0.65, which showed that higher-spending customers still have a significant, though not absolute, probability of churning, but also suggested that high spend might correlate with reduced churn. I figured that this could be related to the satisfaction and loyalty of high-impact customers.
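The segmentation and survival steps might look like the sketch below, which uses the survival package. Everything about the data is hypothetical: the spending distribution, the TenureDays and Churned columns, the 365-day follow-up window, and the assumption that higher-spending quartiles churn more slowly are all invented for illustration.

```r
library(survival)

# Simulated spending data (illustrative assumption)
set.seed(7)
n <- 400
customers <- data.frame(TotalSpending = rgamma(n, shape = 2, scale = 100))

# Segment customers into spending quartiles
customers$SpendQuartile <- cut(
  customers$TotalSpending,
  breaks = quantile(customers$TotalSpending, probs = seq(0, 1, 0.25)),
  labels = c("Q1", "Q2", "Q3", "Q4"),
  include.lowest = TRUE
)

# Hypothetical time to churn in days: higher quartiles are assumed
# to churn more slowly (lower exponential rate)
churn_rate <- 1 / (200 + 100 * as.integer(customers$SpendQuartile))
time_to_churn <- rexp(n, rate = churn_rate)

# Customers still active at the end of an assumed 365-day window are censored
followup_days <- 365
customers$TenureDays <- pmin(time_to_churn, followup_days)
customers$Churned <- as.integer(time_to_churn <= followup_days)

# Kaplan-Meier survival curves, one per spending quartile
km_fit <- survfit(Surv(TenureDays, Churned) ~ SpendQuartile, data = customers)

plot(km_fit, col = 1:4,
     xlab = "Days since signup", ylab = "Survival probability")
legend("bottomleft", legend = levels(customers$SpendQuartile),
       col = 1:4, lty = 1)
```

Each curve gives the estimated share of a segment that has not yet churned at a given time, so a curve sitting at 0.65 means roughly 65% of that segment is still retained at that point.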

Using ggplot2, I then created a bar chart of churn rate by customer satisfaction level, with one bar for each satisfaction rating from 1 to 10. The output showed that 30% of churned customers had a negative satisfaction rating (anything below 5), which suggested a correlation between low satisfaction and churn and pointed to areas where improvements could potentially reduce customer attrition.
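A chart of this kind can be sketched with ggplot2 as follows. The data is simulated under the assumption that churn probability falls as satisfaction rises, so the numbers it produces are illustrative and will not match the 30% figure from the project.

```r
library(ggplot2)

# Simulated satisfaction scores and churn flags (illustrative assumption:
# churn probability decreases as satisfaction increases)
set.seed(99)
n <- 500
customers <- data.frame(SatisfactionScore = sample(1:10, n, replace = TRUE))
customers$Churn <- rbinom(n, size = 1,
                          prob = 0.6 - 0.05 * customers$SatisfactionScore)

# Churn rate per satisfaction score = mean of the 0/1 churn flag
churn_by_score <- aggregate(Churn ~ SatisfactionScore,
                            data = customers, FUN = mean)
names(churn_by_score)[2] <- "ChurnRate"

# Bar chart with one bar per satisfaction rating
churn_plot <- ggplot(churn_by_score,
                     aes(x = factor(SatisfactionScore), y = ChurnRate)) +
  geom_col(fill = "steelblue") +
  labs(x = "Customer satisfaction score",
       y = "Churn rate",
       title = "Churn Rate by Customer Satisfaction Level")

print(churn_plot)
```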

Choosing the key KPIs based on their relevance to customer retention allowed for a better understanding of customer behavior and its relationship to churn, making the predictive model more effective and actionable.
