The art of revealing natural groups within the data!

In this post I will talk about cluster analysis inside the Quark Analytics Portal (QAP).

"Cluster analysis is the practice of gathering up a bunch of objects and separating them into groups of similar objects." (Foreman, 2013)

It is the process of grouping observations into groups of similar members within a larger population. It has extensive applications in analytics, from the biological and behavioral sciences to medical research and marketing.



When is cluster analysis used?

It helps answer questions like: Are all your customers the same? Or do they split into smaller groups with similar interests or buying patterns?

The answers to these questions help you leverage your data, and that type of insight can help you make better decisions. Cluster analysis is an exploratory data analytics technique, and there are two common approaches: hierarchical agglomerative clustering and partitioning clustering.

This first release supports the partitioning approach: you specify the number of clusters, K, and the observations are randomly divided into K groups and then reshuffled to form cohesive clusters.
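To make the partitioning idea concrete, here is a minimal K-means-style sketch in plain Python. It is illustrative only, not QAP's implementation. The names `kmeans`, `sq_dist`, `centroids_of` and `nearest` are hypothetical, and the sketch assumes numeric observations given as equal-length tuples. As in the description above, it starts from a random partition into K groups. It then alternates between computing each group's mean (its centroid) and moving every observation to its nearest centroid, stopping once no observation changes group.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroids_of(points, labels, k, rng):
    """Mean of each group; an empty group is re-seeded with a random observation."""
    dim = len(points[0])
    cents = []
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            members = [rng.choice(points)]
        cents.append(tuple(sum(p[d] for p in members) / len(members) for d in range(dim)))
    return cents


def nearest(p, cents):
    """Index of the centroid closest to observation p."""
    return min(range(len(cents)), key=lambda c: sq_dist(p, cents[c]))


def kmeans(points, k, max_iter=100, seed=0):
    """Return (labels, wcss) for a K-means partition of `points` into k groups."""
    rng = random.Random(seed)
    # Start from a random partition: each observation gets a random group.
    labels = [rng.randrange(k) for _ in points]
    for _ in range(max_iter):
        cents = centroids_of(points, labels, k, rng)      # update step
        new_labels = [nearest(p, cents) for p in points]  # assignment step
        if new_labels == labels:                          # nothing moved: converged
            break
        labels = new_labels
    cents = centroids_of(points, labels, k, rng)
    # Within-cluster sum of squares: the quantity an elbow curve plots against K.
    wcss = sum(sq_dist(p, cents[lab]) for p, lab in zip(points, labels))
    return labels, wcss
```

The function also returns the within-cluster sum of squares (WCSS). Computing it for K = 1, 2, 3, … and plotting the values is one common way to draw an elbow curve. Because the starting partition is random, different seeds can produce different solutions. In practice, K-means is therefore usually run several times and the solution with the lowest WCSS is kept.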


What are the main steps in cluster analysis?

  • Select the relevant attributes for your analysis. This first step is perhaps the most important one: identify the fields that matter for finding and understanding the groups of observations.

[Figure: iris flower]

  • Standardize the variables that enter the solution. Because the variables vary in range, they need to be standardized before clustering. Some clustering algorithms, such as K-means, use Euclidean distance. With that kind of distance, a variable with a much larger range would dominate the overall solution purely because of its scale or magnitude, and standardizing avoids this.

[Figure: elbow curve]

  • Select the ideal number of clusters. Once the value of K is known, the clustering can be performed. QAP provides an easy recipe, known as the elbow method, to help you choose the ideal K. It generates a “scree plot” that helps you identify the point where the gain in performance levels off (the “elbow”).

[Figure: pie chart]

  • Once clusters are identified, describe them in terms of the variables used for clustering and in terms of additional data such as demographics (for example gender, age and friends). These descriptions help you customize the marketing strategy for each segment. This process of describing the clusters is called “profiling”.
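The standardization and profiling steps from the list can be sketched in plain Python as well. This is an illustrative sketch, not how QAP computes them. `standardize` and `profile` are hypothetical helper names, the sketch assumes each observation is a row of numeric values, and the cluster labels are assumed to come from whatever clustering step runs in between. Here "standardize" means converting each column to z-scores: subtract the column mean, then divide by the column's population standard deviation. This is one common choice among several. Profiling averages the original, unscaled values per cluster so the profiles stay in readable units.

```python
from statistics import mean, pstdev


def standardize(rows):
    """Convert every column to z-scores: (value - column mean) / column std dev."""
    stats = [(mean(col), pstdev(col)) for col in zip(*rows)]
    # A constant column (std dev 0) cannot separate clusters, so it is mapped to 0.
    return [
        tuple((x - m) / s if s else 0.0 for x, (m, s) in zip(row, stats))
        for row in rows
    ]


def profile(rows, labels, names):
    """Average each original (unscaled) variable within each cluster."""
    summary = {}
    for c in sorted(set(labels)):
        members = [row for row, lab in zip(rows, labels) if lab == c]
        summary[c] = {name: mean(row[i] for row in members) for i, name in enumerate(names)}
    return summary


# Hypothetical example: (age in years, yearly spend) for four customers.
customers = [(20, 1000.0), (30, 3000.0), (40, 5000.0), (50, 7000.0)]
scaled = standardize(customers)  # both columns now on a comparable scale
labels = [0, 0, 1, 1]            # hypothetical output of a clustering step
print(profile(customers, labels, ("age", "spend")))
```

Clustering would run on `scaled`, so neither variable dominates because of its units. `profile` reads the original `customers` rows, so each cluster's averages are reported in the original units.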


Remember that cluster analysis can be a valuable tool, but it requires a delicate balance, and you should avoid over-interpreting the solutions it produces.

Check out the video to see the details of how to do it inside our portal! ;)



Hope you’re enjoying your ride through analytics!

Ricardo L.