Customer Analytics

Customer Profiling and Segmentation in Banking


No two customers share the same qualities, so they should not all be targeted in the same way. Each customer is unique from a business perspective, and if we treat them all alike, we risk losing most of them.

Customer segmentation and profiling provide insight into the behavior of an audience and help categorize customers on the basis of shared characteristics. Understanding your customers' profiles also supports informed business decisions. Segmentation groups customers by purchasing behavior, geography, or demographics. Profiling helps you know your customers better and describe their personas.

For a bank that offers several products, segmentation and profiling make it easier to target and market to customers. They also make it easier to offer personalized products, for example to high-income customers or to customers who present cross-sell and up-sell opportunities.


A bank faces various problems during business expansion. It should consider both the current value and the future value that each customer can bring. The main idea behind this study for IndusInd Bank was to identify potential affluent customers from the predictions of our income estimation model. Customers with high income levels and a higher purchase frequency are more profitable.

If we know a customer's total income and spending, including at other banks, we can target them with suitable offers. The objective of the project was to build an income estimation model that can serve as an appropriate baseline for wealth imputation, specifically for the affluent segment. Below is an overview of the model development plan that was followed:

Model Development Plan

Customer 360°

To understand a customer's behavior and priorities, we need a detailed picture of their activities. For this, we developed a customer 360° view. This is an accurate, detailed picture of the customer's entire journey with the organization, including the experiences they have along the way, from a website inquiry to a product purchase to a customer support ticket.

To build the customer 360° view, we used customers' debit card and credit card transaction data, product holdings, age and geographic data, assets, and the different types of liabilities listed with credit bureaus. Four data sources were considered for determining a customer's income: Equifax, Credit Card, Corporate Salary (Corpsal), and CIBIL.
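Conceptually, assembling a 360° view means joining every source on a shared customer key. The sketch below assumes each source is a list of records keyed by a `cif` field, and it prefixes every field with its source name so that values never collide. The function name, schema, field names, and values are all hypothetical; they do not describe the bank's actual data model.

```python
from collections import defaultdict


def build_customer_360(*sources):
    """Merge per-source records into one profile per customer ID.

    Each source is a (name, records) pair; every record is a dict with a
    "cif" key (hypothetical schema used only for illustration).
    """
    profiles = defaultdict(dict)
    for name, records in sources:
        for record in records:
            cif = record["cif"]
            for field, value in record.items():
                if field != "cif":
                    # Prefix with the source name so fields from different sources never clash.
                    profiles[cif][f"{name}.{field}"] = value
    return dict(profiles)


debit = [{"cif": "C1", "monthly_spend": 42000}, {"cif": "C2", "monthly_spend": 9000}]
bureau = [{"cif": "C1", "salary": 120000}]
profiles = build_customer_360(("debit", debit), ("cibil", bureau))
print(profiles["C1"])  # {'debit.monthly_spend': 42000, 'cibil.salary': 120000}
```

Customers missing from a source simply lack that source's fields, which makes low fill rates easy to measure per source.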

A comparative analysis of all four data sources was carried out, and the differences between the salaries reported by CIBIL, Corpsal, and Equifax were examined.


Key Observations

  • Salaries reported for corporate-salary (Corpsal) customers were treated as 100% accurate.
  • The difference between Corpsal and CIBIL salaries was much smaller than the difference between Corpsal and Equifax salaries, so CIBIL was comparatively more accurate than Equifax.
  • An analysis of account type (Pioneer, Exclusive, Others) vs. income was also performed, with the following inferences:
    • Most customers fall in the 10k to 50k range, which is counter-intuitive.
    • For CIBIL, the distribution skews toward the 50k to 1L (1 lakh) range.
  • Models were built using all four income sources; the Equifax-based model failed to fit beyond the 75,000 income bucket.
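The source comparison can be expressed as a mean absolute percentage difference against Corpsal, which the analysis treats as ground truth. This is a minimal sketch with made-up salaries. The article does not say which metric was actually used, so `mean_abs_pct_diff` is an assumed choice.

```python
def mean_abs_pct_diff(reference, other):
    """Mean of |other - reference| / reference over customers present in both."""
    common = [cif for cif in reference if cif in other]
    if not common:
        raise ValueError("no overlapping customers")
    return sum(abs(other[c] - reference[c]) / reference[c] for c in common) / len(common)


# Invented salaries keyed by customer ID; Corpsal is the reference.
corpsal = {"C1": 60000, "C2": 100000, "C3": 40000}
cibil = {"C1": 55000, "C2": 110000, "C3": 42000}
equifax = {"C1": 30000, "C2": 150000}

for name, source in (("CIBIL", cibil), ("Equifax", equifax)):
    print(name, round(mean_abs_pct_diff(corpsal, source), 3))
# CIBIL 0.078
# Equifax 0.5
```

A smaller value means a source tracks the trusted Corpsal figures more closely. This is the sense in which CIBIL is judged more accurate than Equifax.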

Exploratory Data Analysis


After acquiring the data, we performed exploratory data analysis (EDA) to find the features with the highest correlation with salary. Age is a strong indicator of high salary, but its relationship with salary is not linear. NEFT and RTGS transaction features are highly correlated. Income values below 20,000* were discarded from model building and validation.
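The EDA step can be sketched as follows. It assumes rows are plain dicts holding a `salary` key and numeric features. The feature names `neft` (NEFT transaction amount) and `age` and all values are invented. The code applies the 20,000 income floor and then ranks features by their absolute Pearson correlation with salary.

```python
import math

MIN_INCOME = 20_000  # income floor from the text; rows below it are dropped


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def rank_features(rows, target="salary"):
    """Drop rows below MIN_INCOME, then rank features by |r| with the target."""
    kept = [r for r in rows if r[target] >= MIN_INCOME]
    ys = [r[target] for r in kept]
    features = [k for k in kept[0] if k != target]
    scores = {f: pearson([r[f] for r in kept], ys) for f in features}
    ranking = sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranking, len(kept)


# Invented rows: NEFT transaction amount and age alongside salary.
rows = [
    {"salary": 15000, "neft": 1000, "age": 22},  # below the floor, dropped
    {"salary": 30000, "neft": 4000, "age": 25},
    {"salary": 50000, "neft": 7000, "age": 48},
    {"salary": 80000, "neft": 12000, "age": 35},
    {"salary": 120000, "neft": 17000, "age": 41},
]
ranking, n_kept = rank_features(rows)
for feature, r in ranking:
    print(feature, round(r, 2))
```

Pearson's r measures only linear association. A feature like age, which the analysis found predictive but non-linear, can therefore rank lower than its real usefulness. A rank-based measure or age binning would capture it better. The same `pearson` function applied to two features (for example NEFT and RTGS amounts) flags collinear pairs.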

Model Building

For model training, two regressors were trained separately, one for higher-income and one for lower-income customers. They were then ensembled into a single regressor intended to minimize bias and to predict well even under strong heteroscedasticity.
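The article does not describe how the two regressors were combined. The sketch below assumes one simple option. It fits a straight line on each income segment, split at a hypothetical threshold `split`, and blends the two predictions with a fixed, hypothetical weight `weight_high`. The data is toy data in thousands.

```python
def fit_ols(xs, ys):
    """Closed-form simple linear regression y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def train_segmented_ensemble(xs, ys, split, weight_high=0.5):
    """Fit one regressor per income segment and blend their predictions.

    `split` (income threshold) and `weight_high` (blend weight) are
    hypothetical parameters, not values from the article.
    """
    low = [(x, y) for x, y in zip(xs, ys) if y < split]
    high = [(x, y) for x, y in zip(xs, ys) if y >= split]
    a_lo, b_lo = fit_ols(*zip(*low))
    a_hi, b_hi = fit_ols(*zip(*high))

    def predict(x):
        p_lo = a_lo + b_lo * x
        p_hi = a_hi + b_hi * x
        # Fixed-weight blend of the two segment models.
        return (1 - weight_high) * p_lo + weight_high * p_hi

    return predict


# Toy data in thousands: x = monthly spend, y = salary.
xs = [5, 10, 15, 20, 40, 60, 80]
ys = [20, 30, 40, 50, 110, 150, 190]
predict = train_segmented_ensemble(xs, ys, split=100)
print(predict(50))  # 120.0 (average of 110.0 from the low model and 130.0 from the high model)
```

Blending is used here because a new customer's income, and therefore their segment, is unknown at prediction time. A learned gating model would be a natural refinement.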

Y1 (Regressor 1): trained on customers with a CIBIL salary (~34,000 customers) using a WLS (weighted least squares) + OLS (ordinary least squares) statistical prediction model.



  1. The model predicted the correct salary values for 48% of cases.
  2. Nearly 31% of customers were classified as high-salary because of imputed values in LBC.
  3. Customers with a low fill rate, or with no bureau data, were classified in a lower bucket.

Y2 (Regressor 2): trained on customers with a corporate salary (~4,500 customers) using a WLS + OLS statistical prediction model.


  1. The model predicted the correct salary values for 57% of cases.
  2. Nearly 5% of customers were classified as high-salary because of imputed values in LBC.
  3. Customers with a low fill rate, or with no bureau data, were classified in a lower bucket.
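Weighted least squares down-weights observations with larger error variance, which is the usual remedy for heteroscedasticity. This sketch fits a one-feature line by WLS and by OLS, where OLS is just WLS with equal weights. It assumes error variance grows with x², so the weights are 1/x². The article does not say how its WLS weights or the WLS + OLS combination were defined, and the data is invented.

```python
def fit_wls(xs, ys, ws):
    """Weighted least squares for y = a + b*x; returns (a, b)."""
    sw = sum(ws)
    # Weighted means of x and y.
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    num = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    den = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    b = num / den
    return my - b * mx, b


def fit_ols(xs, ys):
    """Ordinary least squares is WLS with equal weights."""
    return fit_wls(xs, ys, [1.0] * len(xs))


# Toy data: the first four points lie on y = 5 + 2x; the high-x point is noisy.
xs = [10, 20, 30, 40, 100]
ys = [25, 45, 65, 85, 300]

# Assumption: error variance grows with x**2, so weight each point by 1/x**2.
weights = [1 / x ** 2 for x in xs]

_, b_ols = fit_ols(xs, ys)
_, b_wls = fit_wls(xs, ys, weights)
print(round(b_ols, 2), round(b_wls, 2))
```

The noisy high-x point pulls the OLS slope to 3.14, while WLS gives it little weight and stays much closer to the true slope of 2.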

After building the model, we measured the final recall of the combined predictions from Regressors 1 and 2. Counting a prediction as correct when its error lies between -25% and +25%, the recall was 53.8%.
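The ±25% recall can be computed as the share of customers whose predicted salary falls within 25% of the actual salary. This sketch assumes the error is measured relative to the actual value. The function name and the data are illustrative.

```python
def recall_within_band(actual, predicted, band=0.25):
    """Share of predictions whose error, relative to the actual value, is within ±band."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(p - a) <= band * a)
    return hits / len(actual)


actual = [40000, 60000, 100000, 150000]
predicted = [45000, 80000, 90000, 200000]
print(recall_within_band(actual, predicted))  # 0.5
```

Here the first and third predictions fall inside the band and the other two do not, giving a recall of 0.5.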

Model Validation

We then validated the model and developed the final model. On the validation set, the model predicted correctly for nearly 89% of customers, with an average R² of 0.82. Predictions were accurate for nearly 53% of customers, with an average deviation of 25,000 from the actual salary. We found that ~3,000 customers have a salary greater than 1.25 lakh; these CIFs can potentially be targeted for cross-sell and up-sell. A prospective base of ~4,200 customers from the MASS group have a salary greater than 1 lakh and could be targeted for an Exclusive upgrade. A prospective base of 1,900 Pioneer customers was identified from the mass and remaining affluent base.
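The validation metrics and the targeting thresholds can be sketched as follows. `r_squared` and `mean_abs_deviation` use the standard definitions. The customer record fields (`cif`, `segment`, `predicted_salary`) and the sample customers are hypothetical; only the 1.25 lakh and 1 lakh thresholds come from the text.

```python
LAKH = 100_000


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def mean_abs_deviation(actual, predicted):
    """Average absolute gap between predicted and actual salary."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def flag_prospects(customers):
    """Apply the two salary thresholds quoted in the text to predicted salaries."""
    cross_sell = [c["cif"] for c in customers if c["predicted_salary"] > 1.25 * LAKH]
    exclusive_upgrade = [
        c["cif"] for c in customers
        if c["segment"] == "MASS" and c["predicted_salary"] > LAKH
    ]
    return cross_sell, exclusive_upgrade


actual = [50000, 80000, 120000, 200000]
predicted = [55000, 70000, 130000, 190000]
print(round(r_squared(actual, predicted), 3), mean_abs_deviation(actual, predicted))  # 0.974 8750.0

customers = [
    {"cif": "C1", "segment": "MASS", "predicted_salary": 110000},
    {"cif": "C2", "segment": "MASS", "predicted_salary": 140000},
    {"cif": "C3", "segment": "Exclusive", "predicted_salary": 130000},
    {"cif": "C4", "segment": "MASS", "predicted_salary": 60000},
]
print(flag_prospects(customers))  # (['C2', 'C3'], ['C1', 'C2'])
```

A customer can land in both lists (C2 here). Downstream campaigns would need to decide which offer takes priority.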

Model Summary and Improvement Areas

  • Accurate predictions for nearly 53% of customers, with an average deviation of 25,000 from the actual salary
  • A prospective base of 1,900 Pioneer customers identified from the mass and remaining affluent base
  • A prospective Exclusive customer base of nearly 40,000 customers from the mass and remaining affluent base
  • Wealth imputation and a generalized model for the entire customer base
  • Lack of linearly correlated features, which required heavy transformations
  • Fill rate of the credit information data source
  • Additional KPIs, such as employment data and household size, with an adequate fill rate


The model can be enriched further, for example with an artificial loan offered to affluent customers who are shy of taking loans, and with survey or conversation data. The model has many applications, including:

  • Cross-sell and up-sell opportunities
  • Neighborhood onboarding via identification of affluent addresses
  • Personalized products for high-income customers
  • Loan disbursement and underwriting
  • Wealth imputation, to offer the right product to the right customer
  • Prospective Pioneer and Exclusive upgrade leads
