Friday, January 19, 2024

Customer Segmentation with RFM Using K-Means Clustering

 Abstract:

Understanding customer behavior is crucial for effective marketing. Customer segmentation offers a systematic way to group customers according to how they interact with a business. In this white paper, we combine Recency, Frequency, Monetary (RFM) analysis with K-Means clustering to produce a data-driven customer segmentation.

 


1. Introduction:

Customer segmentation is the process of dividing a customer base into groups that share similar characteristics. RFM analysis is a widely used method for customer segmentation that leverages three key metrics:

Recency (R): How recently a customer made a purchase.

Frequency (F): How often a customer makes a purchase.

Monetary (M): How much money a customer spends.
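In practice the three metrics are usually derived from a transaction log. The sketch below is one common way to compute them with pandas. It assumes a log with hypothetical columns CustomerID, InvoiceNo, InvoiceDate and Amount, and measures Recency in days relative to an assumed snapshot date.

```python
import pandas as pd


def compute_rfm(transactions: pd.DataFrame, snapshot_date: pd.Timestamp) -> pd.DataFrame:
    """Return one row per customer with Recency (days), Frequency and Monetary."""
    rfm = transactions.groupby("CustomerID").agg(
        LastPurchase=("InvoiceDate", "max"),  # date of the most recent purchase
        Frequency=("InvoiceNo", "nunique"),   # number of distinct orders
        Monetary=("Amount", "sum"),           # total amount spent
    )
    # Days between the last purchase and the snapshot date
    rfm["Recency"] = (snapshot_date - rfm["LastPurchase"]).dt.days
    return rfm[["Recency", "Frequency", "Monetary"]]


# Hypothetical transaction log for three customers
tx = pd.DataFrame({
    "CustomerID": [1, 1, 2, 3, 3, 3],
    "InvoiceNo": ["A1", "A2", "B1", "C1", "C2", "C3"],
    "InvoiceDate": pd.to_datetime(["2024-01-02", "2024-01-10", "2023-11-20",
                                   "2023-12-01", "2024-01-05", "2024-01-15"]),
    "Amount": [120.0, 80.0, 300.0, 50.0, 40.0, 60.0],
})
rfm = compute_rfm(tx, pd.Timestamp("2024-01-19"))
print(rfm)
```

Note that a low Recency value is good (a recent purchase), while high Frequency and Monetary values are good; this asymmetry matters when interpreting clusters later.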

K-Means clustering, a popular machine learning algorithm, complements RFM analysis by automating the grouping of customers based on these metrics.

2. RFM Analysis:

2.1 Recency:

Recency measures the time since a customer's last purchase, typically in days. A lower value means a more recent purchase, so the most recent and active customers have the lowest Recency. It provides insight into customer engagement.

2.2 Frequency:

Frequency represents how often a customer makes a purchase. Customers with high frequency are more likely to be loyal.

2.3 Monetary:

Monetary reflects the total amount of money a customer has spent. High monetary value indicates a valuable customer.

3. K-Means Clustering:

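K-Means clustering is an unsupervised machine learning algorithm that partitions data points into a chosen number k of clusters. Starting from k initial centroids, it repeatedly assigns each point to its nearest centroid and then moves each centroid to the mean of the points assigned to it. Each round reduces the within-cluster sum of squares (the total squared distance between points and their centroids), and the process stops when the assignments no longer change.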
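To make the assign/update loop concrete, here is a minimal NumPy sketch of Lloyd's algorithm, the standard K-Means procedure. The function name kmeans_lloyd and its plain random initialization are illustrative choices; libraries such as scikit-learn add k-means++ initialization and several restarts.

```python
import numpy as np


def kmeans_lloyd(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: returns labels, centroids and the within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    # Initialize the centroids with k distinct data points chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]

    def assign(c):
        # Squared Euclidean distance from every point to every centroid, shape (n, k)
        d2 = ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        return d2.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        # Update step: each centroid moves to the mean of its points (kept in place if empty)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    # Final assignment against the final centroids, then the objective K-Means minimizes
    labels = assign(centroids)
    wcss = ((X - centroids[labels]) ** 2).sum()
    return labels, centroids, wcss


# Two well-separated groups of three points each
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids, wcss = kmeans_lloyd(X, k=2)
print(labels, wcss)
```

For this toy data each group ends up in its own cluster, and each group contributes 4/3 to the within-cluster sum of squares.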

4. Integration of RFM and K-Means Clustering:

4.1 Data Preprocessing:

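Before applying K-Means clustering, the RFM metrics are standardized so that each one contributes comparably. K-Means relies on Euclidean distance, so without scaling, Monetary (hundreds of currency units) would outweigh Frequency (a handful of purchases). Standardization rescales each metric to zero mean and unit variance: z = (x - mean) / standard deviation.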
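The sketch below shows the effect on four made-up customers and checks that the z-score formula gives the same result as scikit-learn's StandardScaler, which divides by the population standard deviation (NumPy's default ddof=0).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Four made-up customers: Recency (days), Frequency (purchases), Monetary (currency units)
rfm = np.array([[30, 2, 150.0],
                [5, 8, 900.0],
                [200, 1, 120.0],
                [60, 4, 400.0]])

# Per-metric gap between customers 0 and 1: the Monetary gap dwarfs the others
print(np.abs(rfm[0] - rfm[1]))

# z = (x - mean) / std, computed per column
z_manual = (rfm - rfm.mean(axis=0)) / rfm.std(axis=0)
z_sklearn = StandardScaler().fit_transform(rfm)
print(np.allclose(z_manual, z_sklearn))  # True
```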

4.2 Applying K-Means Clustering:

K-Means clustering is applied to the standardized RFM metrics. The algorithm groups customers into clusters based on their recency, frequency, and monetary values.

4.3 Mapping Clusters to Segments:

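Cluster labels are mapped to human-readable segments. K-Means numbers its clusters arbitrarily, so the names should be chosen by inspecting each cluster's centroid (its average Recency, Frequency, and Monetary values) rather than by cluster number. This step adds interpretability to the results and facilitates targeted marketing.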
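One way to choose names is to rank the centroids. The sketch below scores each standardized centroid as -Recency + Frequency + Monetary (an assumed, equally weighted rule: recent, frequent, high-spending is best) and hands out names from lowest to highest score. The helper name_clusters, the four-name list and the random data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def name_clusters(centers_scaled, names_low_to_high):
    """Map arbitrary K-Means ids to names by ranking standardized centroids on an RFM score.

    Assumed score: -Recency + Frequency + Monetary.
    """
    if len(names_low_to_high) != len(centers_scaled):
        raise ValueError("need exactly one name per cluster")
    score = -centers_scaled[:, 0] + centers_scaled[:, 1] + centers_scaled[:, 2]
    order = np.argsort(score)  # cluster ids from lowest to highest score
    return {int(cluster_id): name for cluster_id, name in zip(order, names_low_to_high)}


cols = ["Recency", "Frequency", "Monetary"]
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "Recency": rng.integers(1, 365, 200),
    "Frequency": rng.integers(1, 11, 200),
    "Monetary": rng.integers(100, 1000, 200),
})
X = StandardScaler().fit_transform(df[cols])
kmeans = KMeans(n_clusters=4, random_state=42, n_init=10).fit(X)

names = ["Hibernating", "Need Attention", "Loyal Customers", "Champions"]
mapping = name_clusters(kmeans.cluster_centers_, names)
df["Segment"] = pd.Series(kmeans.labels_, index=df.index).map(mapping)
# Average raw RFM values per named segment, to check that the names make sense
print(df.groupby("Segment")[cols].mean().round(1))
```

A weighted score (for example, counting Monetary twice) is an equally valid design choice; the point is that names follow from centroid values, not from cluster ids.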

5. Visualizing Segments:

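A scatter plot is a convenient way to visualize the segments created by RFM and K-Means clustering. A plot of Recency against Frequency shows how the customer groups separate along those two metrics; the Monetary dimension is not visible in such a plot, so it has to be inspected separately.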
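Because a single Recency-Frequency plot hides Monetary, a row of all three pairwise plots can help. This Matplotlib sketch assumes a DataFrame with Recency, Frequency, Monetary and Segment columns; plot_segment_pairs is a hypothetical helper, and the random data only makes the example runnable.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove this line to show an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

PAIRS = [("Recency", "Frequency"), ("Recency", "Monetary"), ("Frequency", "Monetary")]


def plot_segment_pairs(df, segment_col="Segment"):
    """Draw one scatter panel per pair of RFM metrics, colored by segment."""
    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    for ax, (x, y) in zip(axes, PAIRS):
        for segment, group in df.groupby(segment_col):
            ax.scatter(group[x], group[y], label=segment, alpha=0.7, s=30)
        ax.set_xlabel(x)
        ax.set_ylabel(y)
    axes[0].legend(fontsize=7)
    fig.tight_layout()
    return fig


rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Recency": rng.normal(size=60),
    "Frequency": rng.normal(size=60),
    "Monetary": rng.normal(size=60),
    "Segment": rng.choice(["Champions", "At Risk", "Hibernating"], size=60),
})
fig = plot_segment_pairs(df)
# fig.savefig("segments.png") writes the figure to disk
```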

6. Benefits of Customer Segmentation with RFM and K-Means Clustering:

  • Personalized Marketing: Tailor marketing strategies for each segment to maximize effectiveness.
  • Customer Retention: Identify and address issues with high-value customers to enhance retention.
  • Resource Optimization: Allocate resources efficiently by focusing on segments with the highest potential.

7. Conclusion:

Integrating RFM analysis with K-Means clustering yields a robust approach to customer segmentation. This combined methodology empowers businesses to understand customer behavior, target specific groups effectively, and optimize marketing efforts. The insights gained enable businesses to build stronger customer relationships, drive loyalty, and ultimately improve the bottom line.


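Here is a sample Python program for this article. It uses randomly generated customer data and requires PyQt6, Matplotlib, pandas, NumPy and scikit-learn: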

import sys

import numpy as np
import pandas as pd
# backend_qtagg works with PyQt6; backend_qt5agg targets Qt5 bindings only
from matplotlib.backends.backend_qtagg import FigureCanvasQTAgg as FigureCanvas
from matplotlib.figure import Figure
from PyQt6.QtWidgets import (QApplication, QHBoxLayout, QMainWindow, QTableWidget,
                             QTableWidgetItem, QWidget)
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

RFM_COLS = ['Recency', 'Frequency', 'Monetary']

# K-Means numbers its clusters arbitrarily, so these names are placeholders:
# rename each cluster after inspecting its centroid (average R, F and M values).
SEGMENT_NAMES = {
    0: 'Hibernating',
    1: 'At Risk',
    2: "Can't Lose Them",
    3: 'About to Sleep',
    4: 'Need Attention',
    5: 'Loyal Customers',
    6: 'Promising',
    7: 'Potential Loyalists',
    8: 'New Customers',
    9: 'Champions',
}
SEGMENT_COLORS = ['#1f77b4', '#aec7e8', '#ffbb78', '#98df8a', '#ff9896',
                  '#c5b0d5', '#c49c94', '#e377c2', '#f7b6d2', '#7f7f7f']


def build_segments():
    """Create a sample RFM dataset, standardize it and cluster it with K-Means."""
    np.random.seed(42)
    df = pd.DataFrame({'CustomerID': range(1, 101),
                       'Recency': np.random.randint(1, 365, 100),
                       'Frequency': np.random.randint(1, 11, 100),
                       'Monetary': np.random.randint(100, 1000, 100)})

    # Standardize the RFM values so no metric dominates the distance calculation
    df[RFM_COLS] = StandardScaler().fit_transform(df[RFM_COLS])

    # Apply K-Means clustering with 10 segments
    kmeans = KMeans(n_clusters=10, random_state=42, n_init=10)
    df['Segment'] = kmeans.fit_predict(df[RFM_COLS])

    # Map cluster ids to human-readable names
    df['Segment'] = df['Segment'].map(SEGMENT_NAMES)
    return df


class PlotCanvas(FigureCanvas):
    """Matplotlib figure embedded as a Qt widget, showing the segments."""

    def __init__(self, df, parent=None, width=6.7, height=5.1, dpi=90):
        fig = Figure(figsize=(width, height), dpi=dpi)
        fig.suptitle('Customer Segmentation with RFM', fontsize=12)
        super().__init__(fig)
        self.setParent(parent)
        self.ax = fig.add_subplot()
        self.plot(df)

    def plot(self, df):
        # One scatter call per segment so each gets its own color and legend entry
        for segment, color in zip(SEGMENT_NAMES.values(), SEGMENT_COLORS):
            subset = df[df['Segment'] == segment]
            self.ax.scatter(subset['Recency'], subset['Frequency'],
                            color=color, label=segment, alpha=0.7, s=100)
        self.ax.set_xlabel('Recency (scaled)')
        self.ax.set_ylabel('Frequency (scaled)')
        self.ax.legend(fontsize=7)
        self.draw()


class Window(QWidget):
    """Scatter plot on the left, number of customers per segment on the right."""

    def __init__(self, df):
        super().__init__()
        layout = QHBoxLayout(self)
        layout.addWidget(PlotCanvas(df, self), stretch=3)

        # Create a table view with one row per segment
        self.table_widget = QTableWidget(0, 2, self)
        self.table_widget.setHorizontalHeaderLabels(['Segment', 'Number of Customers'])
        self.populate_table(df)
        layout.addWidget(self.table_widget, stretch=1)

    def populate_table(self, df):
        # Count the number of customers in each segment
        segment_counts = df['Segment'].value_counts()
        self.table_widget.setRowCount(len(segment_counts))
        for row, (segment, count) in enumerate(segment_counts.items()):
            self.table_widget.setItem(row, 0, QTableWidgetItem(segment))
            self.table_widget.setItem(row, 1, QTableWidgetItem(str(count)))


if __name__ == '__main__':
    app = QApplication(sys.argv)
    window = QMainWindow()
    window.setCentralWidget(Window(build_segments()))
    window.setGeometry(100, 100, 1200, 800)
    window.show()
    sys.exit(app.exec())


In the provided program, no weights are assigned to the segments. K-Means is an unsupervised machine learning algorithm that partitions the data into k clusters based on similarity; in this case, the number of clusters (n_clusters) is set to 10. What defines each segment is the position of its centroid, not a weight.

The algorithm alternates between assigning each customer to the nearest of the 10 centroids and moving each centroid to the mean of the customers assigned to it. This reduces the sum of squared distances between the customers and their cluster centroids.

Each segment (cluster) is therefore characterized by its centroid in the standardized Recency-Frequency-Monetary space: every customer in a segment is closer to that segment's centroid than to any other centroid. Because the three metrics are standardized first, each contributes on a comparable scale to those distances.

The numeric cluster ids that K-Means produces are arbitrary, so the dictionary that maps ids to names such as 'Champions' or 'Hibernating' only assigns placeholders. To make the names meaningful, inspect each centroid (for example, low Recency combined with high Frequency and Monetary suggests 'Champions') and name the clusters accordingly.

The colors and labels for each segment are applied in the plotting loop that follows the clustering: it iterates over the segment names and their colors and plots each segment's customers with that color and label.

In summary, segment boundaries are not set by hand; they emerge from the K-Means centroid calculations on the data. Each segment groups customers with similar Recency, Frequency, and Monetary values.
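The centroids can be inspected directly. This sketch generates the same kind of synthetic data as the sample program, checks that scikit-learn's inertia_ equals the sum of squared distances from each customer to its own centroid, and then maps the centroids back to original units with StandardScaler.inverse_transform so that each segment can be described in days, purchases and currency.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic RFM data generated the same way as in the sample program
np.random.seed(42)
raw = np.column_stack([
    np.random.randint(1, 365, 100),     # Recency (days)
    np.random.randint(1, 11, 100),      # Frequency (purchases)
    np.random.randint(100, 1000, 100),  # Monetary (currency units)
]).astype(float)

scaler = StandardScaler()
X = scaler.fit_transform(raw)
kmeans = KMeans(n_clusters=10, random_state=42, n_init=10).fit(X)

# The objective K-Means minimizes: squared distance from each customer to its own centroid
wcss = ((X - kmeans.cluster_centers_[kmeans.labels_]) ** 2).sum()
print(np.isclose(wcss, kmeans.inertia_))

# Centroids in original units describe what each segment looks like
centers = scaler.inverse_transform(kmeans.cluster_centers_)
for label, (r, f, m) in enumerate(centers):
    print(f"cluster {label}: recency ~{r:.0f} days, frequency ~{f:.1f}, monetary ~{m:.0f}")
```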

Sunday, January 7, 2024

Moving Image Regions with Python, Tkinter and OpenCV

 Overview

Imagine a tool that lets you delete or move specific regions of an image with just a few clicks. This Python and Tkinter application lets users open an image, draw rectangles around unwanted regions, and then delete or relocate them. This technical documentation describes what each method does and how the program implements its image manipulation features.



Feature Highlights

Open Image: The application starts by prompting the user to open an image file. The selected image is loaded using OpenCV, ensuring compatibility with various image formats.


Draw Rectangles: Users can draw rectangles on the image canvas by clicking and dragging. These rectangles define the regions to be manipulated.


Move and Delete: The application supports both moving and deleting image regions. When the user holds the Shift key and clicks, the item under the cursor is selected and can be dragged to a new position. The program also lets the user delete unwanted regions.
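The application moves items on the Tkinter canvas. If the goal is instead to change the pixels themselves, deleting or relocating a rectangular region can be done directly on the NumPy array that OpenCV returns (shape height x width x channels, in BGR order). The helpers delete_region and move_region below are hypothetical and not taken from the program's source; the vacated area is simply filled with a flat color.

```python
import numpy as np


def delete_region(img, x0, y0, x1, y1, fill=255):
    """Return a copy of img with the rectangle x0..x1, y0..y1 painted in a flat color."""
    out = img.copy()
    out[y0:y1, x0:x1] = fill  # rows are y, columns are x
    return out


def move_region(img, x0, y0, x1, y1, dx, dy, fill=255):
    """Return a copy of img with the rectangle shifted by (dx, dy); the vacated area is filled."""
    h, w = img.shape[:2]
    if x0 + dx < 0 or y0 + dy < 0 or x1 + dx > w or y1 + dy > h:
        raise ValueError("destination rectangle falls outside the image")
    patch = img[y0:y1, x0:x1].copy()  # copy first in case source and destination overlap
    out = delete_region(img, x0, y0, x1, y1, fill)
    out[y0 + dy:y1 + dy, x0 + dx:x1 + dx] = patch
    return out


# Tiny 6x8 BGR image with a red block 3 pixels wide and 2 pixels tall (red is (0, 0, 255) in BGR)
img = np.zeros((6, 8, 3), dtype=np.uint8)
img[1:3, 1:4] = (0, 0, 255)

deleted = delete_region(img, 1, 1, 4, 3)
moved = move_region(img, 1, 1, 4, 3, dx=3, dy=2)
```

Both helpers return new arrays, so the original image stays available for undo.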


Method Breakdown

open_image

This method is responsible for opening a file dialog, allowing users to select an image file. Once an image is selected, it is loaded using OpenCV, and the display_image method is called to present it on the Tkinter canvas.


display_image

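The display_image method converts the image from OpenCV's BGR channel order to RGB and creates a Tkinter PhotoImage object from it. The image is then displayed on the canvas with the correct dimensions, and a reference to the PhotoImage is kept so that it is not garbage-collected; without that reference, the image would disappear from the canvas.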


select_rect

This method is triggered when the user holds the Shift key and clicks, activating the cursor for moving rectangles or images. It adds the 'selected' tag to the current canvas item.


activate_cursor

The activate_cursor method binds mouse movement to move_rect, so that the selected rectangle or image follows the cursor. When the user releases the mouse button, the deselect method is called.


deselect

The deselect method removes the 'selected' tag, deactivating the cursor. It rebinds the canvas to the initial event handlers, allowing the user to draw rectangles or select regions.


move_rect

The move_rect method handles the movement of rectangles or images. If a rectangle is selected, it is moved using the coords method. If an image is selected, the program calculates the displacement and moves the canvas item accordingly.
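Moving a canvas item by displacement means remembering the previous pointer position and passing the difference to Canvas.move. The DragTracker class below is a hypothetical helper that isolates that bookkeeping; the commented bind calls show how it could be wired to a canvas, assuming the selected item carries the 'selected' tag described above.

```python
class DragTracker:
    """Remember the last pointer position and report the displacement since the previous event."""

    def __init__(self):
        self.last_x = 0
        self.last_y = 0

    def start(self, x, y):
        # Call on button press: the drag starts from here
        self.last_x, self.last_y = x, y

    def delta(self, x, y):
        # Call on every motion event: return (dx, dy) and make (x, y) the new reference point
        dx, dy = x - self.last_x, y - self.last_y
        self.last_x, self.last_y = x, y
        return dx, dy


# Possible Tkinter wiring (assumes a Canvas named canvas and items tagged 'selected'):
# tracker = DragTracker()
# canvas.bind("<ButtonPress-1>", lambda e: tracker.start(e.x, e.y))
# canvas.bind("<B1-Motion>", lambda e: canvas.move("selected", *tracker.delta(e.x, e.y)))

tracker = DragTracker()
tracker.start(10, 10)
step1 = tracker.delta(15, 12)
step2 = tracker.delta(15, 20)
```

Using per-event deltas rather than absolute positions keeps the item's offset relative to the cursor unchanged, so the item does not jump to the pointer when the drag starts.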


on_button_press, on_move_press, on_button_release

These methods respond to mouse events. on_button_press initializes the drawing of rectangles, on_move_press captures the drawn region, and on_button_release finalizes the process.
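When a rectangle is drawn by dragging, the start and end points can arrive in any order (for example when dragging to the left) and the end point may lie outside the image. A small hypothetical helper can normalize them before the region is used, for instance to slice the image as image[y0:y1, x0:x1]. In Tkinter, the rectangle itself would typically be created with canvas.create_rectangle on button press and resized with canvas.coords during the drag.

```python
def normalize_rect(x_start, y_start, x_end, y_end, width, height):
    """Order the corners so x0 <= x1 and y0 <= y1, then clip them to a width x height image."""
    x0, x1 = sorted((x_start, x_end))
    y0, y1 = sorted((y_start, y_end))
    x0, x1 = max(0, x0), min(width, x1)
    y0, y1 = max(0, y0), min(height, y1)
    return x0, y0, x1, y1


# Dragging down and to the left from (120, 80) to (40, 150) over a 100x100 image
print(normalize_rect(120, 80, 40, 150, 100, 100))  # (40, 80, 100, 100)
```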


Conclusion

In essence, this Python and Tkinter application is an interactive image manipulation tool that lets users draw, move, and delete specific regions of an image. Built on OpenCV and Tkinter, it provides a solid starting point for customizing and extending image processing projects, whether you are a developer exploring image manipulation or an enthusiast looking for an accessible tool.

The source code can be downloaded from my Patreon shop.