A customer segmentation service is generally used to identify which types of products clients buy most often (current purchasing behavior) and which businesses are potential clients. The goals are thus to:
(a) identify ways to customize the purchases and
(b) provide buyers with the needed products based on market segmentation, which will guide decisions related to product development, message delivery, and distribution to groups of customers through customization.
Customization will be achieved by finding a group or segment of customers who are very similar in their demographics, purchasing behaviors and even their attitudes or desired set of particular choices.
The analysis will thus be designed to answer the following questions:
(a) What types of marketing programs should we develop for each of these three segments?
(b) Are there distinct differences in these segments that allow marketing to better develop a program that is tailored to the segments under consideration?
(c) What would be a good set of programs targeted and aimed expressly for each of these segment groups?
To address these questions, businesses will be divided into subgroups according to their purchasing behavior: what kinds of items they purchase, how much money they spend, how often they purchase, and so on. These analyses will identify the group to which each business belongs (capture, develop) based on its purchasing behavior. Each business can then be targeted with promotional material (coupons, promotions) that is likely to be of interest, which benefits both the business (the store, because it has the merchandise it needs) and the seller of the product. This information is needed for effective Customer Relationship Management practices.
A profile will be created to answer questions related to the four Ws: who buys, what type of product they buy, where they buy it (the characteristics of the businesses purchasing particular products), and when they buy (during which months or seasons). Moreover, we will work with clients to identify the why behind the decisions made.
These objectives will help us understand our customer base so that the sales force can be deployed in targeted ways. Thus, we will examine the number of customers purchasing particular products within particular geographies, as well as their recent purchases, the industries they mainly come from, and so on. The resulting customer profile by geographic area will help businesses align the sales force with customers’ purchasing trends to achieve greater sales coverage and effectiveness in their customer base.
As first steps, we will profile the data already gathered by:
1. Counting the number of customers by region or zip code range for each industry group
2. Counting the number of customers who have made purchases within the last year and ones who have not
3. ADD HERE OTHER INFORMATION ALREADY AVAILABLE
The service involves nine steps:
1. Data cleaning & Data filtering
2. Frequency analysis and Cross Tabs
3. Directed Data Mining
4. Calculate the k-means algorithm
5. Selection of Variables to Use in Segmentation Profiles
6. Interpretation of Clusters and Identification of Customer Value Segments
7. Creation of Association Rules
8. Merge Association Rules and Customer Segmentation
9. Delivery of Results to Businesses
Keep only variables with complete data
Filter out extraneous variables (or variables that have ranges outside the desired parameters)
Replace N/A with imputed values, where possible
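The cleaning steps above can be sketched as follows. This is a minimal illustration, not the service's actual pipeline: the function name clean, the column names, the 30% missing-data threshold, and the use of the median for imputation are all assumptions. It also assumes that imputation is attempted first, that variables which cannot be imputed are dropped, and that out-of-range values cause the record (rather than the variable) to be removed.

```python
from statistics import median

def clean(rows, keep, valid_ranges, max_missing_share=0.3):
    """Clean a list of dict records in which None marks an N/A value."""
    if not rows:
        return []
    # Filter out extraneous variables: keep only the requested columns.
    cleaned = [{c: r.get(c) for c in keep} for r in rows]
    for c in keep:
        present = [r[c] for r in cleaned if r[c] is not None]
        missing_share = 1 - len(present) / len(cleaned)
        if missing_share == 0:
            continue  # the variable is already complete
        numeric = all(isinstance(v, (int, float)) for v in present)
        if present and numeric and missing_share <= max_missing_share:
            # Impute where possible: fill N/A with the column median (assumption).
            fill = median(present)
            for r in cleaned:
                if r[c] is None:
                    r[c] = fill
        else:
            # Imputation is not possible, so the incomplete variable is dropped.
            for r in cleaned:
                del r[c]

    # Drop records whose values fall outside the desired ranges (assumption).
    def in_range(r):
        return all(lo <= r[c] <= hi for c, (lo, hi) in valid_ranges.items() if c in r)

    return [r for r in cleaned if in_range(r)]

# Hypothetical sample data
rows = [
    {"customer_id": 1, "liters": 120.0, "region": "north", "fax": None, "notes": "x"},
    {"customer_id": 2, "liters": None, "region": "south", "fax": None, "notes": "y"},
    {"customer_id": 3, "liters": 90.0, "region": "north", "fax": "555", "notes": "z"},
    {"customer_id": 4, "liters": -5.0, "region": "south", "fax": None, "notes": "w"},
]
result = clean(rows, keep=["customer_id", "liters", "region", "fax"],
               valid_ranges={"liters": (0, 1000)})
```

In this sample, the missing liters value is imputed, the mostly empty fax variable is dropped, and the record with a negative volume is filtered out.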
Each of the types of clients (capture, develop) will be profiled using the following variables (xxx, yyy).
A frequency analysis will be conducted to show differences between each of these types of clients. A plot will be created to visually illustrate differences between each of these types of clients.
Cross Tabs will be calculated to see where products (x, y, and z) are sold more/less frequently. The crosstab analysis will allow us to identify which products are sold within which region.
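As a rough sketch of the frequency analysis and cross tabs described above, the counts can be produced with the standard library alone. The field names (client_type, region, product) and the sample records are hypothetical.

```python
from collections import Counter

def frequency(records, field):
    """Count how often each value of `field` occurs."""
    return Counter(r[field] for r in records)

def crosstab(records, row_field, col_field):
    """Return a nested dict of counts: table[row_value][col_value]."""
    counts = Counter((r[row_field], r[col_field]) for r in records)
    row_values = sorted({key[0] for key in counts})
    col_values = sorted({key[1] for key in counts})
    # Combinations that never occur are reported as 0.
    return {rv: {cv: counts[(rv, cv)] for cv in col_values} for rv in row_values}

# Hypothetical sample data
purchases = [
    {"client_type": "capture", "region": "north", "product": "water"},
    {"client_type": "capture", "region": "north", "product": "tea"},
    {"client_type": "develop", "region": "south", "product": "water"},
    {"client_type": "develop", "region": "north", "product": "water"},
    {"client_type": "capture", "region": "north", "product": "tea"},
    {"client_type": "develop", "region": "south", "product": "water"},
]
by_type = frequency(purchases, "client_type")
product_by_region = crosstab(purchases, "product", "region")
```

Here product_by_region["tea"] shows that tea is sold only in the north region of the sample, which is the kind of regional difference the cross tabs are meant to surface.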
The directed data mining techniques will be used to explain the value of some particular field or variable (income, response, age, etc.) in terms of all of the other variables available. A target variable is chosen to tell the computer algorithm how to estimate, classify, or predict its value.
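The text does not name a specific directed technique, so the following is only an illustration of the idea of a target variable: a one-split classifier (a decision stump, chosen here as an assumption) that explains a 0/1 target in terms of the other variables. The field names and sample values are hypothetical.

```python
def fit_stump(rows, target, predictors):
    """Find the single predictor and threshold that best classify a 0/1 target."""
    best = None
    for p in predictors:
        values = sorted({r[p] for r in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            for label_above in (0, 1):  # label predicted when the value exceeds t
                correct = sum(
                    (label_above if r[p] > t else 1 - label_above) == r[target]
                    for r in rows
                )
                if best is None or correct > best[0]:
                    best = (correct, p, t, label_above)
    if best is None:
        raise ValueError("no predictor has more than one distinct value")
    correct, p, t, label_above = best
    return {"predictor": p, "threshold": t, "label_above": label_above,
            "accuracy": correct / len(rows)}

def predict(stump, row):
    above = stump["label_above"]
    return above if row[stump["predictor"]] > stump["threshold"] else 1 - above

# Hypothetical sample data: "response" is the chosen target variable.
customers = [
    {"monthly_spend": 50, "age": 30, "response": 0},
    {"monthly_spend": 60, "age": 45, "response": 0},
    {"monthly_spend": 200, "age": 28, "response": 1},
    {"monthly_spend": 220, "age": 50, "response": 1},
    {"monthly_spend": 180, "age": 35, "response": 1},
    {"monthly_spend": 40, "age": 52, "response": 0},
]
stump = fit_stump(customers, "response", ["age", "monthly_spend"])
```

In practice a richer model would likely be used; the point of the sketch is only that the target variable tells the algorithm what to predict.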
We will compute Euclidean distances among records in order to determine how closely they relate to each other and to create clusters. Because the fields in our data set are measured on different scales, we will first transform the variables onto a common numeric scale so that a given distance has the same meaning for every field. We will then work in the transformed space and record the distances between each pair of records. Finally, to interpret the resulting distances, we will bring the transformed values back into their original dimensions and scales, where they have meaning.
We will use a k-means algorithm in which each cluster centroid is calculated on the basis of the cluster’s current membership rather than its membership at the end of the last cycle of computations. A cluster is a group of database records that have something measurable (attributes) in common; the basic structure of the groups is not known in advance. The clusters will be defined using mean-centered variables to allow comparisons to be made across variables.
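The transform, measure, and transform-back sequence can be sketched as below. The choice of z-score standardization (subtract the mean, divide by the standard deviation) is an assumption, since the text only calls for a common numeric scale; the function names and sample values are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

def fit_scaler(rows):
    """Return a (mean, standard deviation) pair for each column."""
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    return [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]

def transform(row, scaler):
    return [(x - m) / s for x, (m, s) in zip(row, scaler)]

def inverse_transform(row, scaler):
    """Bring standardized values back to their original scale."""
    return [z * s + m for z, (m, s) in zip(row, scaler)]

def pairwise_distances(rows):
    """Euclidean distance between every pair of records, keyed by record index."""
    return {(i, j): math.dist(a, b)
            for (i, a), (j, b) in combinations(enumerate(rows), 2)}

# Hypothetical records: (monthly spend, number of orders)
records = [[100.0, 2.0], [300.0, 4.0], [200.0, 6.0]]
scaler = fit_scaler(records)
scaled = [transform(r, scaler) for r in records]
distances = pairwise_distances(scaled)
restored = [inverse_transform(z, scaler) for z in scaled]
```

Without the scaling step, the spend column would dominate every distance simply because its values are numerically larger.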
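One way to read the centroid rule above is as a running-update variant of k-means: a centroid is recomputed as soon as a record joins or leaves its cluster, instead of once per full pass. The sketch below follows that reading. The function name, the caller-supplied seed centroids, and the stopping rule (a pass with no reassignment) are assumptions, and the input is assumed to be already standardized.

```python
import math

def kmeans_running_update(points, seeds, max_passes=20):
    """k-means that refreshes a centroid whenever its membership changes."""
    k, dims = len(seeds), len(seeds[0])
    centroids = [list(s) for s in seeds]
    sums = [[0.0] * dims for _ in range(k)]   # running coordinate sums per cluster
    counts = [0] * k                          # current cluster sizes
    labels = [None] * len(points)
    for _ in range(max_passes):
        moved = False
        for i, p in enumerate(points):
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            old = labels[i]
            if nearest == old:
                continue
            moved = True
            if old is not None:
                # Remove the record from its previous cluster and refresh that centroid.
                counts[old] -= 1
                sums[old] = [s - x for s, x in zip(sums[old], p)]
                if counts[old]:
                    centroids[old] = [s / counts[old] for s in sums[old]]
            # Add the record to its new cluster and refresh that centroid immediately.
            counts[nearest] += 1
            sums[nearest] = [s + x for s, x in zip(sums[nearest], p)]
            centroids[nearest] = [s / counts[nearest] for s in sums[nearest]]
            labels[i] = nearest
        if not moved:
            break  # no record changed cluster during a full pass
    return labels, centroids

# Hypothetical two-dimensional records forming two visible groups
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels, centroids = kmeans_running_update(points, seeds=[points[0], points[3]])
```

Because centroids move during a pass, the result can depend on the order of the records, which is a known property of this style of update.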
We will create segmentation profiles based on the following three commonly used attributes: recency, frequency, and monetary value (RFM) and we will create a three-dimensional table that represents customers based on these three attributes.
Recency (a term typically used in the direct marketing industry) is a measure of the time elapsed since a customer last communicated with or purchased from the business. Recency can be measured in weeks, months, quarters, fiscal years, etc.
Frequency is the quantity or volume of items or services purchased; it can be counted in single units or aggregated into deciles or another meaningful grouping.
Monetary value is just that, a numeric currency figure representing the value of each of the frequency units or aggregated units that were purchased.
Duration is the expected length of the customer relationship. By relationship, we mean the purchase relationship rather than the relationship to a salesperson. The purchase relationship is what will matter most when considering the value to the shareholder or, in a privately owned business, to the owner.
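The three-dimensional RFM table can be sketched as a mapping from a (recency band, frequency band, monetary band) cell to the customers that fall in it. The band cutoffs, field names, and sample transactions below are hypothetical; frequency is computed as total units purchased, following the definition given above, and recency is measured in days.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import date

def band(value, cutoffs):
    """Return 0 below the first cutoff, 1 from the first cutoff up to the second, and so on."""
    return bisect_right(cutoffs, value)

def rfm_table(transactions, as_of, cutoffs):
    last_date, units, value = {}, defaultdict(int), defaultdict(float)
    for t in transactions:
        cid = t["customer_id"]
        last_date[cid] = max(last_date.get(cid, t["date"]), t["date"])
        units[cid] += t["units"]
        value[cid] += t["amount"]
    table = defaultdict(list)
    for cid in last_date:
        cell = (
            band((as_of - last_date[cid]).days, cutoffs["recency_days"]),
            band(units[cid], cutoffs["frequency_units"]),
            band(value[cid], cutoffs["monetary"]),
        )
        table[cell].append(cid)
    return dict(table)

# Hypothetical sample data and cutoffs
transactions = [
    {"customer_id": "A", "date": date(2024, 6, 20), "units": 10, "amount": 100.0},
    {"customer_id": "A", "date": date(2024, 5, 1), "units": 5, "amount": 50.0},
    {"customer_id": "B", "date": date(2024, 1, 15), "units": 2, "amount": 20.0},
]
cutoffs = {"recency_days": [30, 90], "frequency_units": [5, 12], "monetary": [50, 120]}
table = rfm_table(transactions, as_of=date(2024, 6, 30), cutoffs=cutoffs)
```

Note that a low recency band means a recent purchase, so customer A (recent, high volume, high value) and customer B (lapsed, low volume, low value) land in opposite corners of the table.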
Other Variables
Revenue is the income from the sale of goods (product) or services.
Discount rate is the adjustment to convert a future dollar value into today’s value. This is incorporating the time value of money.
Costs are the marketing expense or direct cost of the product or service. The cost associated with customers does not typically include the cost of buildings, facilities, or other costs. The cost we are after here needs to be directly related to the customer.
Renewal rate is the probability of renewal, or retention rate. The retention rate could be determined with a predictive model or computed directly, depending on the business need and the complexity of the business.
Risk factor can be incorporated to account for potential losses, such as a customer returning items, not paying, or going into bankruptcy.
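The variables above are listed without a formula for combining them. As one possible arrangement (an assumption, not the method defined in this document), they can be combined into a discounted customer value: each period's margin is weighted by the probability that the customer is still active and converted to today's value with the discount rate. The function name and the treatment of risk as a simple proportional reduction are hypothetical.

```python
def customer_value(revenue, cost, renewal_rate, discount_rate, duration, risk=0.0):
    """Discounted value of a customer over `duration` periods.

    revenue, cost: per-period amounts directly related to the customer
    renewal_rate:  probability that the customer renews each period
    discount_rate: per-period rate used to convert future value to today's value
    risk:          assumed share of margin lost to returns, non-payment, etc.
    """
    total = 0.0
    for t in range(duration):
        still_active = renewal_rate ** t            # chance the customer is retained to period t
        present_value_factor = 1 / (1 + discount_rate) ** t
        total += (revenue - cost) * (1 - risk) * still_active * present_value_factor
    return total

# Hypothetical figures: 100 revenue and 40 cost per period, 80% renewal, 10% discount rate
two_period_value = customer_value(100, 40, 0.8, 0.1, duration=2)
```

With these figures the first period contributes 60 and the second contributes 60 × 0.8 / 1.1, so a lower renewal rate or a higher discount rate reduces the value of later periods.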
Our analysis and interpretation of clusters will be used to identify opportunities for customer growth to find product portfolios that are best suited for each cluster. These analyses will thus be used to determine which customers are valuable, which customers lack potential, and which customers should be grown to develop their value. This in turn will guide business profitability.
Customers who are clearly more profitable should be treated differently from customers who are not. That is, the analysis will inform decisions such as which customers should receive different types of attention and which type of attention is most valuable and appropriate for their purchasing decisions. This is where customer value segments come into practice: certain clients will be given promotions more often, as well as added bonus points, exclusive entry into particular programs or campaigns, or special offers.
To understand market baskets, we will analyze customer-based transactions using association rules. To generate association rules, we will use three data elements: a customer identifier; an order or transaction identifier that groups the set of items bought together; and the actual product that was purchased.
Next, we will merge the data from the association rules with the customer segmentation information. This match merge combines the product data, with zero fills for null values, and the scored customer segmentation data for each customer ID. Both data sets are sorted on each BY variable before the Merge node is run. The data set produced by the Merge node should contain the cluster segment variables _SEGMENT_ and DISTANCE as well as the product values, with zeros filled in for empty values.
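To make the three data elements concrete, the sketch below builds baskets from (customer, order, product) line items and scores two-item rules with the usual support, confidence, and lift measures. Restricting the rules to pairs of products is a simplification made here, and the sample line items are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations

def pair_rules(line_items, min_support=0.0):
    """Score rules of the form "if A is in the basket, then B is too"."""
    baskets = defaultdict(set)
    for customer_id, order_id, product in line_items:
        baskets[(customer_id, order_id)].add(product)
    n = len(baskets)
    item_count, pair_count = Counter(), Counter()
    for items in baskets.values():
        item_count.update(items)
        pair_count.update(permutations(sorted(items), 2))  # ordered pairs (A, B) and (B, A)
    rules = {}
    for (a, b), both in pair_count.items():
        support = both / n                    # share of baskets containing A and B
        if support < min_support:
            continue
        confidence = both / item_count[a]     # share of A baskets that also contain B
        lift = confidence / (item_count[b] / n)
        rules[(a, b)] = {"support": support, "confidence": confidence, "lift": lift}
    return rules

# Hypothetical line items: (customer identifier, order identifier, product)
line_items = [
    (1, "o1", "water"), (1, "o1", "chips"),
    (2, "o2", "water"), (2, "o2", "chips"), (2, "o2", "tea"),
    (3, "o3", "water"),
    (4, "o4", "tea"),
]
rules = pair_rules(line_items)
```

A lift above 1 for a rule such as ("chips", "water") suggests that the two products appear together more often than their individual popularity alone would predict.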
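Outside of the Merge node, the same match merge can be approximated in a few lines. This is only an analogue of the step described above: the dictionary layout, the function name, and the sample values are assumptions, while _SEGMENT_ and DISTANCE are the variable names mentioned in the text.

```python
def merge_by_customer(segments, product_quantities, products):
    """Join scored segments with product quantities on customer ID, filling gaps with 0."""
    merged = {}
    for customer_id, scored in segments.items():
        row = dict(scored)  # copies _SEGMENT_ and DISTANCE
        purchased = product_quantities.get(customer_id, {})
        for product in products:
            row[product] = purchased.get(product, 0)  # zero fill for missing values
        merged[customer_id] = row
    return merged

# Hypothetical inputs keyed by customer ID
segments = {
    101: {"_SEGMENT_": 1, "DISTANCE": 0.42},
    102: {"_SEGMENT_": 2, "DISTANCE": 1.10},
}
product_quantities = {101: {"water": 12, "tea": 3}}
merged = merge_by_customer(segments, product_quantities, ["water", "tea", "chips"])
```

Customer 102 has no product records in this sample, so every product column for that customer is filled with zero.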
The idea now is to compute estimates of the average product quantity for each product in each segment. We will also compute the binary product affinities and compare the results of these two methods. To compute the binary product scores from raw quantities, we map a quantity of 0 to 0 and any quantity greater than or equal to 1 to 1.
Then, we will compare each segment’s product quantity means with the other segments’ means to examine whether, on average, a segment has a higher or lower affinity for a particular product or set of products. The results will reveal which segments tend to purchase which products more or less than other segments, thereby enabling providers to target particular segments with promotional materials and the like.
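A minimal sketch of the two affinity calculations and the segment comparison follows. Comparing a segment with the pooled mean of all customers in the other segments is an assumption about how "the other segments' means" are combined; the function names and sample rows are hypothetical, and at least two segments are required.

```python
from collections import defaultdict

def segment_means(rows, products, binary=False):
    """Average quantity (or average 0/1 purchase flag) per product within each segment."""
    totals, sizes = defaultdict(lambda: defaultdict(float)), defaultdict(int)
    for r in rows:
        seg = r["_SEGMENT_"]
        sizes[seg] += 1
        for p in products:
            # Binary score: 0 for a quantity of 0, 1 for any quantity of 1 or more.
            totals[seg][p] += (1 if r[p] >= 1 else 0) if binary else r[p]
    return {seg: {p: totals[seg][p] / sizes[seg] for p in products} for seg in sizes}

def difference_from_others(rows, products, binary=False):
    """Segment mean minus the pooled mean of all customers in the other segments."""
    means = segment_means(rows, products, binary)
    diffs = {}
    for seg in means:
        others = [dict(r, _SEGMENT_="other") for r in rows if r["_SEGMENT_"] != seg]
        other_means = segment_means(others, products, binary)["other"]
        diffs[seg] = {p: means[seg][p] - other_means[p] for p in products}
    return diffs

# Hypothetical merged rows
rows = [
    {"_SEGMENT_": 1, "water": 10, "tea": 0},
    {"_SEGMENT_": 1, "water": 6, "tea": 2},
    {"_SEGMENT_": 2, "water": 0, "tea": 4},
    {"_SEGMENT_": 2, "water": 2, "tea": 8},
]
products = ["water", "tea"]
quantity_means = segment_means(rows, products)
binary_means = segment_means(rows, products, binary=True)
quantity_diffs = difference_from_others(rows, products)
```

A positive difference marks a product the segment buys more of than the rest of the customer base; the binary version answers the related question of how many customers in the segment buy the product at all.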
The results of the analyses will be delivered to businesses to inform their purchasing decisions.
Data & Variables
Customer Characteristics
Current clients are grouped by modules. The modules include the following variables:
Influence Zone – Is composed of two categorical variables:
Relevant Purchase, in this case a value of > 268 liters. This value is calculated from the amount of money spent on Food, beverage & tobacco. Each of these variables has multiple categories, such as beverage (water, tea, soft drink, energy drink, or …).
Degree of coverage. This categorical variable has the following levels: “Good, Low, No Coverage” based on monthly purchase of food and drinks as follows:
Group 1 (under 80 liters)
Group 2 (80-150 liters)
Group 3 (150-260 liters)
Group 4 (over 260 liters)
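Assigning a client to one of the four volume groups above could look like the sketch below. The listed ranges share their endpoints, so the handling of boundary values is an assumption: exactly 80 liters is placed in Group 2, and exactly 150 and 260 liters are placed in the lower of the two adjacent groups. The function name is hypothetical.

```python
def purchase_group(monthly_liters):
    """Map a monthly purchase volume in liters to Group 1-4."""
    if monthly_liters < 80:
        return 1   # under 80 liters
    if monthly_liters <= 150:
        return 2   # 80-150 liters (assumption: 150 belongs here)
    if monthly_liters <= 260:
        return 3   # 150-260 liters (assumption: 260 belongs here)
    return 4       # over 260 liters

groups = [purchase_group(v) for v in (45, 80, 150, 200, 260, 300)]
```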
Potential Customers
Potential clients are grouped as follows:
Low
Medium low
Medium high
High
Customer Characteristics
The goal is to segment clients based on their purchases of the various foods and drinks in order to restock them with Food, beverage & tobacco products. The segmentation is based on three variables:
Socio-economic level (NSE)
Traffic Generators
Residential Type
This will yield groups or “Winner” products to restock.