Unsupervised learning - Clustering

A note for clustering

Michael Luo · 1 minute read

Clustering

Clustering (cluser analysis) is a technique that allows us to find groups of similar objects, objects that are more related to each other than to objects in other groups. Exaples of business-oriented applications of clustering include the grouping of documents, music, and moviews by different topics, or finding customers that share similar interests based on common purchase behaviors as a basis for recommendation egines.

k-means

It belongs to the cateory of protoype-based clustering.

  • Drawbacks
    • One or more clusters can be empty
    • we have to specify the number of clusters, k, a priori

Hard VS Soft clustering

Hard clustering: a family of algorithms where each sample in a dataset is assigned to exactly on cluster. Soft clustering: sometimes called fuzzy clustering, it assign a sample to one or more clusters.

  • fuzzy C-mean (FCM, also called soft k-means or fuzzy k-means).

hierachical

  • agglomerative
    • single linkage
    • complete linkage
  • divisive clustering

density-based clustering

Density-based spatial Clustering of Applications with Noice (DBSCAN)

Advantages:it does not assume that the clusters have a spherical shape as in k-means.

Disadvantage: With an increasing number of features in our dataset -assuming a fixed number of training examples - the negative effect of curse of dimensionality increases.

machine-learning