Abstract:
Understanding customer behavior is crucial for effective marketing. Customer segmentation offers a methodical way to categorize customers based on their interactions with a business. In this white paper, we explore how Recency, Frequency, Monetary (RFM) analysis can be combined with K-Means clustering to group customers automatically from their purchase history.
1. Introduction:
Customer segmentation is the process of dividing a customer base into groups that share similar characteristics. RFM analysis is a widely used method for customer segmentation that leverages three key metrics:
- Recency (R): How recently a customer made a purchase.
- Frequency (F): How often a customer makes a purchase.
- Monetary (M): How much money a customer spends.
K-Means clustering, a popular machine learning algorithm, complements RFM analysis by automating the grouping of customers based on these metrics.
2. RFM Analysis:
2.1 Recency:
Recency measures the time elapsed since a customer's last purchase, typically in days. Unlike the other two metrics, a lower value is better: it indicates a recently active, engaged customer.
2.2 Frequency:
Frequency represents how often a customer makes a purchase. Customers with high frequency are more likely to be loyal.
2.3 Monetary:
Monetary reflects the total amount of money a customer has spent. High monetary value indicates a valuable customer.
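The three metrics can be derived from a raw transaction log. The sketch below is one possible way to do it with pandas. The table layout (columns `customer_id`, `order_date`, `amount`), the sample rows, and the snapshot date are all assumptions made for illustration and do not come from the article's data.

```python
import pandas as pd

# Hypothetical transaction log; column names and values are illustrative only.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "order_date": pd.to_datetime([
        "2024-01-05", "2024-03-01", "2023-11-20",
        "2024-02-10", "2024-02-25", "2024-03-10",
    ]),
    "amount": [120.0, 80.0, 300.0, 40.0, 60.0, 55.0],
})

# Reference date against which recency is measured (assumed for this sketch).
snapshot = pd.Timestamp("2024-03-31")

rfm = transactions.groupby("customer_id").agg(
    # Days between the snapshot and the customer's most recent order.
    Recency=("order_date", lambda d: (snapshot - d.max()).days),
    # Number of orders placed.
    Frequency=("order_date", "count"),
    # Total amount spent.
    Monetary=("amount", "sum"),
)
print(rfm)
```

Each row of `rfm` then describes one customer, which is the shape the clustering step expects.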
3. K-Means Clustering:
K-Means clustering is an unsupervised machine learning algorithm that groups data points into clusters based on their similarities. It minimizes the within-cluster sum of squares to form distinct clusters.
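To make the within-cluster sum of squares concrete, the following sketch computes it by hand and compares it with the value scikit-learn reports in `inertia_`. The two synthetic blobs and the choice of two clusters are assumptions made purely for the illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

# Two well-separated synthetic blobs (illustrative data, not customer data).
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(loc=0.0, scale=0.5, size=(50, 2)),
    rng.normal(loc=5.0, scale=0.5, size=(50, 2)),
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# Within-cluster sum of squares: the squared distance from each point to the
# centroid of the cluster it was assigned to, summed over all points.
assigned_centroids = kmeans.cluster_centers_[kmeans.labels_]
wcss = float(((X - assigned_centroids) ** 2).sum())

# The hand-computed value should agree with scikit-learn's up to rounding.
print(wcss, kmeans.inertia_)
```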
4. Integration of RFM and K-Means Clustering:
4.1 Data Preprocessing:
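Before applying K-Means clustering, the RFM metrics are standardized. K-Means relies on Euclidean distance, so a metric with a large numeric range (such as monetary value) would otherwise dominate one with a small range (such as frequency). Standardization rescales each metric to zero mean and unit variance so that all three contribute comparably.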
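As a small illustration of this step, the sketch below standardizes a made-up RFM table with scikit-learn's `StandardScaler` and checks the result against a manual z-score. The four sample rows are invented for the example.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Small illustrative RFM table (values are made up).
rfm = pd.DataFrame({
    "Recency": [10, 200, 45, 330],
    "Frequency": [8, 1, 4, 2],
    "Monetary": [950.0, 120.0, 400.0, 150.0],
})

scaler = StandardScaler()
scaled = scaler.fit_transform(rfm)

# Equivalent manual z-score. StandardScaler divides by the population
# standard deviation (ddof=0), not the sample standard deviation.
manual = (rfm - rfm.mean()) / rfm.std(ddof=0)

print(np.allclose(scaled, manual.to_numpy()))
```

After scaling, every column has mean 0 and standard deviation 1, regardless of its original units.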
4.2 Applying K-Means Clustering:
K-Means clustering is applied to the standardized RFM metrics. The algorithm groups customers into clusters based on their recency, frequency, and monetary values.
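One way to keep scaling and clustering together is a scikit-learn pipeline, sketched below. The synthetic table uses the same value ranges as the sample program, and `n_clusters=4` is an arbitrary choice for the illustration rather than a recommendation.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic RFM table (illustrative values only).
rng = np.random.default_rng(42)
rfm = pd.DataFrame({
    "Recency": rng.integers(1, 365, 100),
    "Frequency": rng.integers(1, 11, 100),
    "Monetary": rng.integers(100, 1000, 100),
})

# Chain scaling and clustering so the raw columns stay in their original units.
model = make_pipeline(
    StandardScaler(),
    KMeans(n_clusters=4, n_init=10, random_state=42),
)
rfm["Cluster"] = model.fit_predict(rfm[["Recency", "Frequency", "Monetary"]])

# Number of customers assigned to each cluster.
print(rfm["Cluster"].value_counts().sort_index())
```

Because the scaler lives inside the pipeline, the same transformation is reapplied automatically if `model.predict` is later called on new customers.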
4.3 Mapping Clusters to Segments:
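Cluster labels are mapped to human-readable segments. K-Means numbers its clusters arbitrarily, so a name should be assigned only after inspecting each cluster's centroid (for example, low recency with high frequency and spend suggests a top-tier segment). This step adds interpretability to the results and facilitates targeted marketing.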
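The sketch below shows one possible way to tie names to cluster behavior instead of to raw cluster numbers. The scoring rule (negative recency plus frequency plus monetary, all standardized), the use of four clusters, and the four names chosen are assumptions for illustration, not a standard or the article's method.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

cols = ["Recency", "Frequency", "Monetary"]
rng = np.random.default_rng(42)
rfm = pd.DataFrame({
    "Recency": rng.integers(1, 365, 200),
    "Frequency": rng.integers(1, 11, 200),
    "Monetary": rng.integers(100, 1000, 200),
})

scaled = StandardScaler().fit_transform(rfm[cols])
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42).fit(scaled)
rfm["Cluster"] = kmeans.labels_

# Assumed scoring rule: low recency is good, high frequency and spend are good.
centers = pd.DataFrame(kmeans.cluster_centers_, columns=cols)
score = -centers["Recency"] + centers["Frequency"] + centers["Monetary"]

# Names listed from worst to best, assigned by centroid rank.
names = ["Hibernating", "At Risk", "Loyal Customers", "Champions"]
ordered_clusters = score.sort_values().index  # cluster ids, lowest score first
cluster_to_name = dict(zip(ordered_clusters, names))
rfm["Segment"] = rfm["Cluster"].map(cluster_to_name)

# Average raw RFM values per named segment, for a sanity check.
print(rfm.groupby("Segment")[cols].mean().round(1))
```

With this approach the name "Champions" always goes to the cluster with the best centroid, whichever number K-Means happened to give it.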
5. Visualizing Segments:
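A scatter plot is a useful tool for visualizing the segments created by RFM and K-Means clustering. Plotting recency against frequency and coloring each point by segment shows how the customer groups are distributed. Because only two of the three metrics are shown, segments that differ mainly in monetary value may appear to overlap.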
6. Benefits of Customer Segmentation with RFM and K-Means Clustering:
- Personalized Marketing: Tailor marketing strategies for each segment to maximize effectiveness.
- Customer Retention: Identify and address issues with high-value customers to enhance retention.
- Resource Optimization: Allocate resources efficiently by focusing on segments with the highest potential.
7. Conclusion:
Integrating RFM analysis with K-Means clustering yields a robust approach to customer segmentation. This combined methodology empowers businesses to understand customer behavior, target specific groups effectively, and optimize marketing efforts. The insights gained enable businesses to build stronger customer relationships, drive loyalty, and ultimately improve the bottom line.
Here is a sample Python program for this article:
from PyQt6.QtWidgets import QApplication, QMainWindow, QWidget, QTableWidget, QTableWidgetItem
# backend_qtagg works with PyQt6; backend_qt5agg targets the Qt5 bindings
from matplotlib.backends.backend_qtagg import FigureCanvasQTAgg as FigureCanvas
from matplotlib.figure import Figure
import pandas as pd
import sys
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Shared by the plot and the table; populated in PlotCanvas.plot()
df1 = pd.DataFrame()


class PlotCanvas(FigureCanvas):
    def __init__(self, parent=None, width=8, height=2.1, dpi=90):
        # Build the figure directly rather than through pyplot, which would
        # create its own figure manager alongside this embedded canvas
        fig = Figure(figsize=(width, height), dpi=dpi)
        fig.suptitle('Customer Segmentation with RFM', fontsize=12, color='black')
        self.ax = fig.add_subplot(111)
        super().__init__(fig)
        self.setParent(parent)
        self.updateGeometry()
        self.plot()

    def plot(self):
        global df1
        # Create a sample dataset
        np.random.seed(42)
        data = {'CustomerID': range(1, 101),
                'Recency': np.random.randint(1, 365, 100),
                'Frequency': np.random.randint(1, 11, 100),
                'Monetary': np.random.randint(100, 1000, 100)}
        df1 = pd.DataFrame(data)

        # Standardize the RFM values
        rfm_cols = ['Recency', 'Frequency', 'Monetary']
        scaler = StandardScaler()
        df1[rfm_cols] = scaler.fit_transform(df1[rfm_cols])

        # Apply K-Means clustering with 10 segments
        kmeans = KMeans(n_clusters=10, random_state=42, n_init=10)
        df1['Segment'] = kmeans.fit_predict(df1[rfm_cols])

        # Map segments to human-readable names.
        # Note: K-Means cluster numbers are arbitrary, so this fixed mapping
        # is only a placeholder; inspect the centroids before trusting a name.
        segment_names = {
            0: 'Hibernating',
            1: 'At Risk',
            2: "Can't Lose Them",
            3: 'About to Sleep',
            4: 'Need Attention',
            5: 'Loyal Customers',
            6: 'Promising',
            7: 'Potential Loyalists',
            8: 'New Customers',
            9: 'Champions'
        }
        df1['Segment'] = df1['Segment'].map(segment_names)

        # Visualize the segments, one color per segment
        colors = ['#1f77b4', '#aec7e8', '#ffbb78', '#98df8a', '#ff9896',
                  '#c5b0d5', '#c49c94', '#e377c2', '#f7b6d2', '#7f7f7f']
        for segment, color in zip(segment_names.values(), colors):
            subset = df1[df1['Segment'] == segment]
            self.ax.scatter(subset['Recency'], subset['Frequency'],
                            color=color, label=segment, alpha=0.7, s=100)

        self.ax.set_xlabel('Recency (Scaled)')
        self.ax.set_ylabel('Frequency (Scaled)')
        self.ax.legend()
        self.draw()


class Window(QWidget):
    def __init__(self):
        super().__init__()
        self.initUI()

    def initUI(self):
        # The canvas must be created first: it fills df1, which the table reads
        self.m8 = PlotCanvas(self, width=6.7, height=5.10)
        self.m8.move(264, 125)
        self.m8.show()

        # Create a table view
        self.table_widget = QTableWidget(self)
        self.table_widget.setColumnCount(2)
        self.table_widget.setHorizontalHeaderLabels(['Segment', 'Number of Customers'])
        self.populate_table()
        self.table_widget.setGeometry(930, 125, 230, 350)

    def populate_table(self):
        # Count the number of customers in each segment
        segment_counts = df1['Segment'].value_counts()

        # Populate the table, one row per segment
        for segment, count in segment_counts.items():
            row_position = self.table_widget.rowCount()
            self.table_widget.insertRow(row_position)
            self.table_widget.setItem(row_position, 0, QTableWidgetItem(segment))
            self.table_widget.setItem(row_position, 1, QTableWidgetItem(str(count)))


if __name__ == '__main__':
    app = QApplication(sys.argv)
    window = QMainWindow()
    central_widget = Window()
    window.setCentralWidget(central_widget)
    window.setGeometry(100, 100, 1200, 800)
    window.show()
    sys.exit(app.exec())
Customer Segmentation with RFM Using K-Means Clustering
The program does not assign explicit weights to the segments. The segments emerge from the K-Means algorithm, which partitions the standardized data into k clusters based on similarity. Here, the number of clusters (n_clusters) is set to 10.
The algorithm assigns each data point to one of the 10 clusters and positions the centroids so as to minimize the sum of squared distances between data points and their respective cluster centroids.
Each segment (cluster) is therefore characterized by its centroid. The centroid's position in the Recency, Frequency, Monetary feature space describes the typical customer in that segment, and every data point lies closer to the centroid of its own segment than to the centroid of any other segment.
The colors and labels for each segment are defined in the loop that follows the K-Means clustering step. The loop iterates over the segment names and their corresponding colors and, for each segment, plots the data points with the specified color, label, and other visual properties.
In summary, each segment represents a group of customers with similar Recency, Frequency, and Monetary values, determined by the data and the algorithm's centroid calculations rather than by weights set in the code.
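Since the centroids carry the meaning of each segment, it can help to read them in the original units. The sketch below is a minimal illustration, assuming synthetic data in the same ranges as the sample program and an arbitrary choice of three clusters. It converts the centroids back from standardized units and checks that each customer is nearest to the centroid of its own cluster.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

cols = ["Recency", "Frequency", "Monetary"]
rng = np.random.default_rng(42)
rfm = pd.DataFrame({
    "Recency": rng.integers(1, 365, 100),
    "Frequency": rng.integers(1, 11, 100),
    "Monetary": rng.integers(100, 1000, 100),
})

scaler = StandardScaler()
scaled = scaler.fit_transform(rfm[cols])
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(scaled)

# Centroids are stored in standardized units; convert them back to days,
# purchase counts, and currency so they can be read directly.
centroids = pd.DataFrame(
    scaler.inverse_transform(kmeans.cluster_centers_), columns=cols
)
print(centroids.round(1))

# Distance from every customer to every centroid, in standardized space.
distances = np.linalg.norm(
    scaled[:, None, :] - kmeans.cluster_centers_[None, :, :], axis=2
)
nearest = distances.argmin(axis=1)

# Share of customers whose nearest centroid is that of their own cluster.
print((nearest == kmeans.labels_).mean())
```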