Unsupervised Modeling (2/2)
In this lesson, you’re expected to:
– understand K-Means Clustering
– learn about Hierarchical Clustering
K-Means is a hard partitioning algorithm: it assigns each element of the feature space to exactly one cluster. However, there are other clustering algorithms in which an observation can belong to multiple clusters. In K-Means, the number of clusters is an input parameter, so we need to know how many groups we expect to find in our data.
The algorithm will not detect the optimum number by itself. We may choose this parameter based on our business experience/knowledge and/or statistical methods that we will discuss later.
Process of K-Means Clustering
The K-Means algorithm works as follows:
Step 1: Select the number of clusters we want to divide our data into. This means we need to determine K.
Step 2: Initialize the K cluster centers. A simple strategy could be to select them randomly.
Step 3: Assign each of the N data samples to a cluster. We do so by assigning each sample to its nearest cluster centroid.
Step 4: Re-estimate the K cluster centers. We compute the new cluster center as the center of all the observations assigned to that cluster.
Step 5: If none of the N observations changed membership in the last iteration, the algorithm ends. Otherwise, you need to go back to Step 3.
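To make these steps concrete, here is a minimal NumPy sketch of the algorithm. The function name, the random-observation initialization, and the iteration cap are illustrative assumptions rather than a production implementation, and empty clusters are not handled.

import numpy as np

def k_means(X, k, max_iter=100, seed=0):
    """Minimal K-Means sketch: X is an (N, d) array of observations, k the number of clusters."""
    rng = np.random.default_rng(seed)

    # Step 2: initialize the K cluster centers by picking K distinct observations at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]

    labels = np.full(len(X), -1)
    for _ in range(max_iter):
        # Step 3: assign each observation to its nearest cluster centroid.
        distances = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_labels = distances.argmin(axis=1)

        # Step 5: stop if no observation changed membership in the last iteration.
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels

        # Step 4: re-estimate each center as the mean of its assigned observations
        # (this sketch does not handle clusters that end up empty).
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])

    return labels, centers

# Example usage (hypothetical data): labels, centers = k_means(data, k=4)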
Advantages of K-Means Clustering
One of the advantages of this method is that we not only assign each customer to a segment, but can also compute the distance from each customer to every segment's center.
For example, by performing K-Means clustering on customer purchase data, we can detect four groups, and by extracting the patterns of each cluster, we can understand their behavior and make decisions.
We have reduced the problem by summarizing the purchase behavior into 4 differentiated patterns and assigning one of these patterns to each customer.
http://www.optimove.com/learning-center/customer-segmentation-via-cluster-analysis
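As a hedged illustration of this kind of customer segmentation, the sketch below uses scikit-learn's KMeans on a made-up feature matrix X; the data, its size, and the choice of 4 clusters are assumptions taken from the example above. The transform() call returns each customer's distance to every segment center, which is the advantage mentioned earlier.

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical feature matrix: one row per customer, columns describing purchase behavior.
X = np.random.rand(200, 5)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
segments = kmeans.fit_predict(X)              # hard assignment: one segment label per customer

# Distance from every customer to every segment center (the advantage noted above).
distances_to_centers = kmeans.transform(X)    # shape (200, 4)

print(segments[:10])
print(distances_to_centers[0])  # how far the first customer is from each of the 4 centers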
How do I select the number of clusters (K)?
To answer this question, our previous business experience and our domain knowledge will be vital.
Why? An important reason is that after clustering the observations, we will have to make a business interpretation of them, and give them some meaning.
However, there are also techniques that help us find a statistically optimal value of K. The elbow technique is one of them.
What is the Elbow Technique?
By the definition of clustering, we want clusters to be as compact as possible. Compactness can be measured by the distance of each cluster's members to its centroid. Obviously, with more clusters, the total distance of the observations to their centroids will always decrease, tending to 0 as K tends to N.
Why? If we have N clusters, we have the same number of clusters as observations. Hence, each cluster will contain a single observation and the centroids will be positioned on top of the observations. The elbow technique can help us to find the optimum value of K.
In this technique, we look at how the compactness changes as the number of clusters grows, and we distinguish two phases in that curve.
Initially, the average distance of the observations to their centers decreases dramatically. At some point, there is a threshold beyond which it slowly stabilizes. The elbow technique consists of selecting the value of K where this transition occurs. This point cannot always be found easily, though.
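One possible way to produce such an elbow plot with scikit-learn and matplotlib is sketched below; the random data X and the range of K values tried are illustrative assumptions. The inertia_ attribute is the within-cluster sum of squared distances that we want to see flatten out.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

X = np.random.rand(300, 2)   # illustrative data; replace with your own feature matrix

ks = range(1, 11)
inertias = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)   # total squared distance of observations to their centroid

plt.plot(list(ks), inertias, marker="o")
plt.xlabel("Number of clusters (K)")
plt.ylabel("Within-cluster sum of squared distances")
plt.title("Elbow plot: choose K where the curve flattens")
plt.show()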
Hierarchical Clustering
This method seeks to build a hierarchy of clusters. It is based on a very simple principle: if we can measure the distance between observations, then by grouping the closest ones, we can obtain the clusters.
The idea is that we repeatedly find the two most similar clusters and merge them together. Thus, we not only end up with clusters but also with a hierarchy of clusters.
To understand this, let’s look at an example.
Suppose we have 6 observations represented in a 2-D space. Let’s group them using hierarchical clustering. You can see the distances between them in the figure below.
The first step is to determine which elements to merge into a cluster. Usually, we want to take the two closest elements. In this case, we will merge the following elements:
– pair node0-node1
– pair node2-node3
So now we have the cluster [node0-node1], which we will label c1; the cluster [node2-node3], which we will label c2; node4; and node5.
Now, we will merge the next closest elements. In this case, we will merge clusters c1 and c2, and label the result c12.
Then we will merge c12 and node4 and label the result c12-4. Finally, we will merge node5 and cluster c12-4, which we will call cT.
We can visualize the resulting hierarchical clustering in the form of a dendrogram (a tree diagram), as depicted below.
http://blog.biolab.si/2015/12/02/hierarchical-clustering-a-simple-explanation/
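For reference, here is a small sketch of how such a dendrogram could be produced with SciPy. The six 2-D points are made-up stand-ins for node0 through node5, and single linkage (merging by the minimum pairwise distance) is just one of several possible merge criteria.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Six made-up 2-D observations standing in for node0..node5 in the example above.
points = np.array([
    [1.0, 1.0], [1.2, 1.1],   # node0, node1 (very close to each other)
    [3.0, 3.0], [3.1, 3.2],   # node2, node3 (also close)
    [5.0, 1.0],               # node4
    [8.0, 4.0],               # node5
])

# Agglomerative clustering: repeatedly merge the two closest clusters.
Z = linkage(points, method="single")   # single linkage: merge by minimum pairwise distance

dendrogram(Z, labels=[f"node{i}" for i in range(len(points))])
plt.ylabel("Merge distance")
plt.show()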