Python Bits and Pieces with Cyber Security: Customer Segmentation with RFM Using K-Means Clustering

Abstract:

In the dynamic landscape of business, understanding customer behavior is crucial for effective marketing strategies. Customer segmentation offers a methodical approach to categorize customers based on their interactions with a business. In this white paper, we explore the integration of Recency, Frequency, Monetary (RFM) analysis with K-Means clustering to achieve a comprehensive and insightful customer segmentation.

1. Introduction:

Customer segmentation is the process of dividing a customer base into groups that share similar characteristics. RFM analysis is a widely used method for customer segmentation that leverages three key metrics:
Recency (R): How recently a customer made a purchase.
Frequency (F): How often a customer makes a purchase.
Monetary (M): How much money a customer spends.

K-Means clustering, a popular machine learning algorithm, complements RFM analysis by automating the grouping of customers based on these metrics.

2. RFM Analysis:

2.1 Recency:

Recency measures the time since a customer's last purchase. It provides insights into customer engagement and helps identify the most recent and active customers.

2.2 Frequency:

Frequency represents how often a customer makes a purchase. Customers with high frequency are more likely to be loyal.

2.3 Monetary:

Monetary reflects the total amount of money a customer has spent. High monetary value indicates a valuable customer.
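
As a rough illustration of how the three metrics can be derived, the sketch below builds an RFM table from a small transaction log with pandas. The table `tx`, its column names, and the snapshot date are hypothetical; they are assumptions made for the example and do not come from the sample program later in this article.

```python
import pandas as pd

# Hypothetical transaction log; the column names are assumptions for this sketch
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3],
    "OrderDate": pd.to_datetime([
        "2024-01-02", "2024-01-15", "2023-11-20",
        "2023-12-01", "2024-01-10", "2024-01-18",
    ]),
    "Amount": [120.0, 80.0, 300.0, 40.0, 60.0, 55.0],
})

# Reference date from which recency is measured (assumed)
snapshot = pd.Timestamp("2024-01-19")

rfm = tx.groupby("CustomerID").agg(
    Recency=("OrderDate", lambda d: (snapshot - d.max()).days),  # days since last order
    Frequency=("OrderDate", "count"),                            # number of orders
    Monetary=("Amount", "sum"),                                  # total spend
)
print(rfm)
```

Each row of `rfm` then describes one customer, which is the shape of input that the clustering step expects.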

3. K-Means Clustering:

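K-Means clustering is an unsupervised machine learning algorithm that partitions data points into k clusters based on their similarity. It alternates between assigning each point to its nearest centroid and recomputing each centroid as the mean of its assigned points, which reduces the within-cluster sum of squares until the assignments stop changing.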
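
To make the idea concrete, here is a minimal NumPy sketch of the standard iterative procedure (often called Lloyd's algorithm). It is not how scikit-learn implements K-Means internally; the function names `assign_labels` and `kmeans_lloyd` and the two-blob test data are made up for this illustration.

```python
import numpy as np

def assign_labels(X, centroids):
    """Return the index of the nearest centroid for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Alternate assignment and centroid update until the centroids settle."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign_labels(X, centroids)
        # Move each centroid to the mean of its points (keep it if the cluster is empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    labels = assign_labels(X, centroids)
    # Within-cluster sum of squares: the quantity K-Means tries to minimize
    wcss = float(((X - centroids[labels]) ** 2).sum())
    return labels, centroids, wcss

# Two well-separated synthetic blobs of 50 points each
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
labels, centroids, wcss = kmeans_lloyd(X, k=2)
print(centroids.round(1), round(wcss, 2))
```

The result depends on the random starting centroids, which is why library implementations usually run several initializations and keep the best one.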

4. Integration of RFM and K-Means Clustering:

4.1 Data Preprocessing:

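Before applying K-Means clustering, the RFM metrics are standardized so that each one has zero mean and unit variance. Without this step, the metric with the largest numeric range (typically Monetary) would dominate the distance calculation and the other two would have little influence on the clusters.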
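
The transformation itself is a z-score per column. The sketch below computes it by hand on a small made-up RFM table; it uses the population standard deviation (ddof=0), which is the convention scikit-learn's StandardScaler follows, whereas pandas defaults to ddof=1.

```python
import numpy as np
import pandas as pd

# Made-up RFM values for four customers
rfm = pd.DataFrame({
    "Recency": [4, 60, 1, 200],
    "Frequency": [2, 1, 3, 1],
    "Monetary": [200.0, 300.0, 155.0, 900.0],
})

# z-score: subtract the column mean, divide by the population standard deviation
z = (rfm - rfm.mean()) / rfm.std(ddof=0)
print(z.round(2))
```

After this step every column has mean 0 and standard deviation 1, so a one-unit difference means the same thing for all three metrics.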

4.2 Applying K-Means Clustering:

K-Means clustering is applied to the standardized RFM metrics. The algorithm groups customers into clusters based on their recency, frequency, and monetary values.
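
A minimal sketch of this step with scikit-learn follows. It uses synthetic data similar to the sample program, but with 4 clusters instead of 10; that number is an arbitrary choice for the example. Converting the centroids back to the original units with `inverse_transform` makes them easier to read.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic RFM data for 100 customers (random, for illustration only)
rng = np.random.default_rng(42)
rfm = pd.DataFrame({
    "Recency": rng.integers(1, 365, 100),
    "Frequency": rng.integers(1, 11, 100),
    "Monetary": rng.integers(100, 1000, 100),
})

scaler = StandardScaler()
X = scaler.fit_transform(rfm)

# n_clusters=4 is an assumption for this sketch
kmeans = KMeans(n_clusters=4, random_state=42, n_init=10)
labels = kmeans.fit_predict(X)

# Centroids expressed in days, purchases, and currency rather than z-scores
centers = pd.DataFrame(
    scaler.inverse_transform(kmeans.cluster_centers_), columns=rfm.columns
)
centers["Customers"] = np.bincount(labels, minlength=4)
print(centers.round(1))
```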

4.3 Mapping Clusters to Segments:

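Cluster labels are mapped to human-readable segments. Because K-Means numbers its clusters arbitrarily, the names should be assigned after inspecting each cluster's average recency, frequency, and monetary values. This step adds interpretability to the results and facilitates targeted marketing.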
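
One possible way to do this, sketched below, is to score each cluster from its centroid and hand out names from best to worst. The centroid values, the simple rank-sum score, and the three-name list are all assumptions made for illustration; a real project would choose its own rules.

```python
import pandas as pd

# Hypothetical cluster centroids in original units (index = K-Means label)
centers = pd.DataFrame(
    {
        "Recency": [300.0, 20.0, 150.0],
        "Frequency": [1.5, 8.0, 4.0],
        "Monetary": [150.0, 900.0, 400.0],
    },
    index=[0, 1, 2],
)

# Higher score is better: low recency, high frequency, high monetary value
score = (
    centers["Recency"].rank(ascending=False)
    + centers["Frequency"].rank()
    + centers["Monetary"].rank()
)

names_best_to_worst = ["Champions", "Need Attention", "Hibernating"]
order = score.sort_values(ascending=False).index
segment_names = dict(zip(order, names_best_to_worst))
print(segment_names)
```

With these made-up centroids, cluster 1 (recent, frequent, high spend) becomes 'Champions' and cluster 0 becomes 'Hibernating', regardless of the numbers K-Means happened to give them.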

5. Visualizing Segments:

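A scatter plot is a simple way to visualize the segments created by RFM and K-Means clustering. Plotting recency against frequency, with one color per segment, shows how the customer groups separate along those two dimensions. Monetary value is not visible in this two-dimensional view.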
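
A minimal Matplotlib sketch of such a plot, without the PyQt6 window used in the sample program, might look like this. The six customers and the output file name `rfm_segments.png` are made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up customers that already carry a segment name
df = pd.DataFrame({
    "Recency": [5, 12, 200, 250, 90, 110],
    "Frequency": [9, 8, 1, 2, 4, 5],
    "Segment": ["Champions", "Champions", "Hibernating",
                "Hibernating", "Need Attention", "Need Attention"],
})

fig, ax = plt.subplots(figsize=(6, 4))
# One scatter call per segment so that each gets its own color and legend entry
for name, group in df.groupby("Segment"):
    ax.scatter(group["Recency"], group["Frequency"], label=name, alpha=0.7, s=80)
ax.set_xlabel("Recency (days)")
ax.set_ylabel("Frequency (purchases)")
ax.legend(title="Segment")
fig.savefig("rfm_segments.png", dpi=100)
```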

6. Benefits of Customer Segmentation with RFM and K-Means Clustering:

Personalized Marketing: Tailor marketing strategies for each segment to maximize effectiveness.
Customer Retention: Identify and address issues with high-value customers to enhance retention.
Resource Optimization: Allocate resources efficiently by focusing on segments with the highest potential.

7. Conclusion:

Integrating RFM analysis with K-Means clustering yields a robust approach to customer segmentation. This combined methodology empowers businesses to understand customer behavior, target specific groups effectively, and optimize marketing efforts. The insights gained enable businesses to build stronger customer relationships, drive loyalty, and ultimately improve the bottom line.

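Here is a sample Python program for this article. It generates random RFM data for 100 customers, standardizes it, clusters it with K-Means, and shows a scatter plot and a table of segment counts in a PyQt6 window: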

from PyQt6.QtWidgets import QApplication, QMainWindow, QWidget, QTableWidget, QTableWidgetItem
# backend_qtagg works with the installed Qt binding (PyQt6 here); needs Matplotlib 3.5 or later
from matplotlib.backends.backend_qtagg import FigureCanvasQTAgg as FigureCanvas
import matplotlib.pyplot as plt
import pandas as pd
import sys
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Module-level holder for the RFM table: filled in PlotCanvas.plot(), read in Window.populate_table()
df1 = []
class PlotCanvas(FigureCanvas):
    def __init__(self, parent=None, width=8, height=2.1, dpi=90):
        self.figure, self.ax = plt.subplots(figsize=(width, height), dpi=dpi)
        self.figure.suptitle('Customer Segmentation with RFM', fontsize=12, color='black')        
        FigureCanvas.__init__(self, self.figure)        
        self.setParent(parent)        
        FigureCanvas.updateGeometry(self)        
        self.plot()

    def plot(self):
        # Create a sample dataset
        global df1
        np.random.seed(42)
        data = {'CustomerID': range(1, 101),
                'Recency': np.random.randint(1, 365, 100),
                'Frequency': np.random.randint(1, 11, 100),
                'Monetary': np.random.randint(100, 1000, 100)}
        df1 = pd.DataFrame(data)
        # Standardize the RFM values
        rfm_cols = ['Recency', 'Frequency', 'Monetary']
        scaler = StandardScaler()
        df1[rfm_cols] = scaler.fit_transform(df1[rfm_cols])

        # Applying K-Means clustering with 10 segments
        kmeans = KMeans(n_clusters=10, random_state=42, n_init=10)
        df1['Segment'] = kmeans.fit_predict(df1[rfm_cols])

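        # Mapping segments to human-readable names.
        # Note: K-Means label numbers are arbitrary, so this index-to-name pairing is
        # illustrative only; it does not reflect each cluster's actual RFM profile.
        segment_names = {
            0: 'Hibernating',
            1: 'At Risk',
            2: "Can't Lose Them",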
            3: 'About to Sleep',
            4: 'Need Attention',
            5: 'Loyal Customers',
            6: 'Promising',
            7: 'Potential Loyalists',
            8: 'New Customers',
            9: 'Champions'
        }

        df1['Segment'] = df1['Segment'].map(segment_names)

        # Visualize the segments
        for segment, color in zip(segment_names.values(), [
            '#1f77b4', '#aec7e8', '#ffbb78', '#98df8a', '#ff9896',
            '#c5b0d5', '#c49c94', '#e377c2', '#f7b6d2', '#7f7f7f'
        ]):
            self.ax.scatter(df1[df1['Segment'] == segment]['Recency'],
                            df1[df1['Segment'] == segment]['Frequency'],
                            color=color, label=segment, alpha=0.7, s=100)

        #self.ax.set_title('Customer Segmentation with RFM')
        self.ax.set_xlabel('Recency (Scaled)')
        self.ax.set_ylabel('Frequency (Scaled)')
        self.ax.legend()

        self.draw()

class Window(QWidget):
    def __init__(self):
        super(Window, self).__init__()
        self.initUI()

    def initUI(self):
        # Place the plot canvas inside this widget using absolute positioning
        self.m8 = PlotCanvas(self, width=6.7, height=5.10)
        self.m8.move(264, 125)
        self.m8.show()

        # Create a table view
        self.table_widget = QTableWidget(self)
        self.table_widget.setColumnCount(2)
        self.table_widget.setHorizontalHeaderLabels(['Segment', 'Number of Customers'])
        self.populate_table()
        self.table_widget.setGeometry(930, 125, 230, 350)

    def populate_table(self):
        global df1
        # Count the number of customers in each segment
        segment_counts = df1['Segment'].value_counts()

        # Populate the table
        for segment, count in segment_counts.items():
            row_position = self.table_widget.rowCount()
            self.table_widget.insertRow(row_position)
            self.table_widget.setItem(row_position, 0, QTableWidgetItem(segment))
            self.table_widget.setItem(row_position, 1, QTableWidgetItem(str(count)))

if __name__ == '__main__':
    app = QApplication(sys.argv)
    window = QMainWindow()
    central_widget = Window()
    window.setCentralWidget(central_widget)
    window.setGeometry(100, 100, 1200, 800)
    window.show()
    sys.exit(app.exec())

The program does not assign explicit weights to the segments. Segment membership is determined entirely by the K-Means clustering algorithm, an unsupervised machine learning algorithm that partitions the data into k clusters based on similarity. In this case, the number of clusters (n_clusters) is set to 10.

The algorithm assigns each data point to one of the 10 clusters, and the centroids of these clusters are calculated to minimize the sum of squared distances between data points and their respective cluster centroids.

What distinguishes one segment from another is therefore the location of its centroid in the standardized feature space. Each segment (cluster) is represented by a centroid whose position summarizes the segment's characteristics, and every data point in a segment is closer to that segment's centroid than to the centroid of any other segment.
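
The nearest-centroid rule can be shown in a few lines of NumPy. The centroid coordinates and the customer below are made-up values in standardized (Recency, Frequency, Monetary) space, chosen only to illustrate the distance comparison.

```python
import numpy as np

# Hypothetical centroids in standardized (Recency, Frequency, Monetary) space
centroids = np.array([
    [-1.0, 1.2, 1.0],    # recent, frequent, high spend
    [1.1, -0.9, -0.8],   # long inactive, infrequent, low spend
    [0.0, 0.1, -0.1],    # close to average on all three
])

# One standardized customer (also made up)
customer = np.array([-0.8, 0.9, 1.3])

# Euclidean distance to each centroid; the smallest one decides the segment
dists = np.linalg.norm(centroids - customer, axis=1)
nearest = int(dists.argmin())
print(dists.round(2), nearest)
```

Here the customer lands in cluster 0 because that centroid is the closest of the three; no per-segment weight is involved.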

In the code, the colors and labels for each segment are defined in the loop that follows the K-Means clustering. The loop iterates over the segment names and their corresponding colors, and for each segment, it plots the data points with the specified color, label, and other visual properties.

In summary, no segment weights are set in the code; the segments emerge from the clustering process, based on the characteristics of the data and the K-Means algorithm's centroid calculations. Each segment represents a group of customers with similar Recency, Frequency, and Monetary values. Since the sample data is randomly generated, the clusters here illustrate the mechanics rather than meaningful customer groups.

Friday, January 19, 2024
