Customer churn, or the loss of customers to competitors, is a critical concern for telecom companies. High churn rates can significantly impact revenue and profitability. Understanding the factors that contribute to customer churn and developing strategies to reduce it is essential for maintaining a competitive edge. This article provides an in-depth analysis of customer churn for a telecom company using data analytics and Python programming.
This analysis is structured as follows:
- Data Collection and Preparation: Gathering and preparing the dataset for analysis.
- Exploratory Data Analysis (EDA): Identifying key patterns and trends in the data.
- Predictive Modeling: Building and evaluating machine learning models to predict customer churn.
- Conclusion and Recommendations: Summarizing findings and offering actionable insights.
- References
- Appreciation
Using Python and Jupyter Notebook, we collected and processed data from the telecom company's customer database. The data included customer demographics, service usage patterns, account information, and churn status.
# Import necessary packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import os
import scipy.stats as stats
from dotenv import dotenv_values
import pyodbc

# Machine Learning, Preprocessing & Hyperparameter Tuning
from scipy.stats import randint, uniform
from sklearn.model_selection import train_test_split, RandomizedSearchCV, GridSearchCV
from sklearn.impute import SimpleImputer
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import OneHotEncoder, PowerTransformer, StandardScaler, LabelEncoder
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import classification_report, confusion_matrix, ConfusionMatrixDisplay, roc_curve, auc
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from imblearn.pipeline import Pipeline
from imblearn.over_sampling import RandomOverSampler, SMOTE
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from functools import partial

# Model Persistence
import joblib

# Other Utilities
from warnings import filterwarnings
filterwarnings('ignore')
EDA was performed to uncover key insights. We focused on identifying patterns and correlations that could help explain why customers churn.
# Check the total number of rows and columns in the dataset
df_train.shape
# Check for missing values and their percentage
missing_values = df_train.isna().sum()
missing_percentage = (missing_values / df_train.shape[0]) * 100
# Create a DataFrame with the number of missing values and their percentage
missing_df = pd.DataFrame({
    'Column': df_train.columns,
    'Missing Values': missing_values,
    'Percentage': missing_percentage
}).reset_index(drop=True)
# Display the DataFrame
missing_df
# Point plot of missing values
plt.figure(figsize=(16, 5))
ax = sns.pointplot(x='Column', y='Percentage', data=missing_df)
plt.xticks(rotation=90, fontsize=7)
plt.title('Percentage of Missing Values per Column')
plt.show()
# Distribution of the categorical columns
for i, predictor in enumerate(df_train.drop(columns=['TotalCharges', 'MonthlyCharges', 'customerID', 'tenure'])):
    plt.figure(i)
    sns.countplot(data=df_train, x=predictor)
    plt.xticks(rotation=90)
    plt.title(f'Distribution of {predictor}')
    plt.xlabel(f'{predictor}')
    plt.ylabel('Count of customers')
    plt.show()
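Count plots show how customers are distributed across categories, but not how churn varies within them. As a rough sketch of that next step, the snippet below computes the churn rate per category. The toy rows, the helper name `churn_rate_by`, the target column name `Churn`, and its `Yes`/`No` labels are all assumptions made for illustration; they are not taken from the dataset used in this article.

```python
import pandas as pd

# Toy stand-in for df_train; the rows are invented for illustration only.
toy = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "Month-to-month",
                 "One year", "One year", "Two year"],
    "Churn": ["Yes", "Yes", "No", "No", "Yes", "No"],  # assumed target column
})


def churn_rate_by(df, column, target="Churn", positive="Yes"):
    """Return the share of churned customers within each category of `column`."""
    churned = df[target].eq(positive)  # boolean Series: True where the customer churned
    return churned.groupby(df[column]).mean().sort_values(ascending=False)


rates = churn_rate_by(toy, "Contract")
print(rates)
```

On real data, the same helper could be applied to each categorical column in turn to see which categories are associated with the highest churn.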
We used various machine learning algorithms to predict customer churn, including logistic regression, decision trees, and random forests. The models were evaluated based on their accuracy, precision, recall, and F1 score.
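To make the four evaluation metrics concrete, here is a small sketch that derives them from a confusion matrix. The labels are made up for illustration (1 = churned, 0 = stayed) and do not come from the models in this article.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Made-up labels for illustration: 1 = churned, 0 = stayed.
y_true = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0, 1, 0]

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)   # share of all predictions that are correct
precision = tp / (tp + fp)                   # share of predicted churners who really churned
recall = tp / (tp + fn)                      # share of actual churners the model caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of precision and recall

# The hand-computed values agree with scikit-learn's helpers.
print(accuracy, accuracy_score(y_true, y_pred))
print(precision, precision_score(y_true, y_pred))
print(recall, recall_score(y_true, y_pred))
print(f1, f1_score(y_true, y_pred))
```

Because churners are usually the minority class, precision and recall on the churn class tend to be more informative than accuracy alone.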
# Create a class to handle dropping customerID from the dataset
class columnDropper(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Drop the specified column
        return X.drop('customerID', axis=1)

    def get_feature_names_out(self, input_features=None):
        # If input_features is None or not provided, return None
        if input_features is None:
            return None
        # Return feature names after dropping the specified column
        return [feature for feature in input_features if feature != 'customerID']
# Create a class to handle the inconsistencies in the TotalCharges column and convert it to float
class TotalCharges_cleaner(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Work on a copy so the caller's DataFrame is not modified
        X = X.copy()
        # Replace empty strings with NaN (assigned back rather than using inplace=True on a column selection)
        X['TotalCharges'] = X['TotalCharges'].replace(' ', np.nan)
        # Convert the values in the TotalCharges column to float
        X['TotalCharges'] = X['TotalCharges'].astype(float)
        return X

    # Since this transformer doesn't remove or alter features, return the input features
    def get_feature_names_out(self, input_features=None):
        return input_features
# Create numerical, categorical, and full pipelines, plus a label encoder, for the machine learning algorithms
# Select the categorical and numerical columns in the dataset
num_columns = ['tenure', 'MonthlyCharges', 'TotalCharges']
cat_columns = ['customerID', 'gender', 'Partner', 'Dependents', 'PhoneService',
               'MultipleLines', 'InternetService', 'OnlineSecurity', 'OnlineBackup',
               'DeviceProtection', 'TechSupport', 'StreamingTV', 'StreamingMovies',
               'Contract', 'PaperlessBilling', 'PaymentMethod', 'SeniorCitizen']

# Create a categorical pipeline
cat_pipeline = Pipeline([
    ('column_dropper', columnDropper()),
    ('imputer', SimpleImputer(missing_values=np.nan, strategy='constant', fill_value='No')),
    ('encoder', OneHotEncoder(sparse_output=False, handle_unknown='ignore')),
])

# Create a numerical pipeline
num_pipeline = Pipeline([
    ('total_charges_cleaner', TotalCharges_cleaner()),
    ('imputer', SimpleImputer(missing_values=np.nan, strategy="mean")),
    ('pt_transform', PowerTransformer()),
    ('scaling', StandardScaler()),
])

# Create a full pipeline which contains the categorical and numerical pipelines
full_pipeline = ColumnTransformer([
    ('num_pipeline', num_pipeline, num_columns),
    ('cat_pipeline', cat_pipeline, cat_columns),
])

# Encode the outcome column
label_encoder = LabelEncoder()
y_train_encoded = label_encoder.fit_transform(y_train)
y_eval_encoded = label_encoder.transform(y_eval)
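The preprocessing above still has to be attached to a classifier and scored. The following is a minimal, self-contained sketch of that step rather than the exact training code used in this analysis. It makes several assumptions: the data is synthetic, the churn rule is invented, it uses scikit-learn's own `Pipeline` instead of the `imblearn` one (so no resampling is applied), and a simplified `ColumnTransformer` stands in for `full_pipeline`.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic data, invented purely so the sketch runs end to end.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "tenure": rng.integers(1, 72, size=n),
    "MonthlyCharges": rng.uniform(20, 120, size=n),
    "Contract": rng.choice(["Month-to-month", "One year", "Two year"], size=n),
})
# Assumed rule for the fake target: short-tenure, month-to-month customers churn.
y = ((X["tenure"] < 24) & (X["Contract"] == "Month-to-month")).astype(int)

# Simplified stand-in for full_pipeline: scale numeric columns, one-hot encode categorical ones.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["tenure", "MonthlyCharges"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Contract"]),
])

# Chain preprocessing and the classifier so both are fitted on the training split only.
model = Pipeline([
    ("preprocess", preprocess),
    ("classifier", LogisticRegression(max_iter=1000)),
])

X_train, X_eval, y_train, y_eval = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
model.fit(X_train, y_train)
y_pred = model.predict(X_eval)

# Per-class precision, recall, and F1, plus overall accuracy.
print(classification_report(y_eval, y_pred, zero_division=0))
```

Swapping `LogisticRegression` for `DecisionTreeClassifier` or `RandomForestClassifier` in the final step would allow the other models to be compared on the same split.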
The analysis revealed several key insights:
- Key Drivers of Churn: Factors such as customer service issues, high usage charges, and lack of bundled services were significant contributors to churn.
- Predictive Accuracy: The logistic regression model achieved an accuracy of around 80%, indicating a good ability to predict customer churn.
Based on these findings, we recommend the following:
- Improve Customer Service: Focus on resolving customer issues promptly to enhance satisfaction.
- Offer Competitive Pricing: Re-evaluate pricing strategies to ensure competitiveness and affordability.
- Introduce Bundled Services: Provide attractive bundles of services to increase customer retention.