
How Clustering Works


What is Clustering?

Clustering is the automatic organization of search results into groups, or clusters. It differs from other techniques (classification, taxonomy building, tagging, etc.) in that it requires no pre-processing or human intervention. Cluster labels are generated automatically from the words and phrases that appear in the search results themselves. Industry- or site-specific knowledge can be added to make the labels more accurate and useful.
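To make the idea of labels drawn from the results themselves more concrete, here is a minimal Python sketch. It is not Vivísimo's method. It extracts one- to three-word phrases from result snippets, skips phrases that begin or end with a stopword, and ranks the phrases by how many results contain them. The stopword list, the `boost` weights (a stand-in for industry- or site-specific knowledge), and the function names are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real system would use a much larger one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is", "with"}


def candidate_phrases(text: str, max_len: int = 3) -> set[str]:
    """Return 1- to max_len-word phrases that neither start nor end with a stopword."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    phrases = set()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            phrases.add(" ".join(gram))
    return phrases


def label_candidates(results: list[str], boost=None, top_k: int = 5) -> list[str]:
    """Rank phrases shared by at least two results by document frequency.

    `boost` is a hypothetical {phrase: multiplier} map standing in for
    domain knowledge that promotes preferred terminology.
    """
    boost = boost or {}
    doc_freq = Counter()
    for r in results:
        doc_freq.update(candidate_phrases(r))  # count each phrase once per result
    scored = {p: c * boost.get(p, 1.0) for p, c in doc_freq.items() if c >= 2}
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scored, key=lambda p: (-scored[p], p))[:top_k]


if __name__ == "__main__":
    results = [
        "Jaguar car dealers and prices",
        "Used Jaguar car reviews",
        "Jaguar habitat in the rainforest",
        "Rainforest habitat of the big cats",
    ]
    print(label_candidates(results))
    # ['jaguar', 'car', 'habitat', 'jaguar car', 'rainforest']
    print(label_candidates(results, boost={"jaguar car": 2.0})[0])  # jaguar car
```

Counting each phrase once per result (document frequency) rather than once per occurrence keeps one verbose snippet from dominating the candidate labels.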

What to Look for in Cluster Labels

Clustering is the grouping of a collection of content (documents, images, etc.) based on similarity. For clustered content to make sense to users and add real value over a plain ranked list of documents, however, the cluster labels must be of high quality. Here is what to look for in cluster labels:

  1. Concise: they take up little screen space
  2. Understandable: they read as naturally as labels a person would write
  3. Accurate: they fairly reflect what is inside the cluster
  4. Distinctive: they make clear at a glance how a cluster differs from its neighboring clusters
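One way to turn these criteria into something a program can rank is a heuristic score. The Python sketch below is an assumption, not the engine's actual formula: accuracy is the share of the cluster's documents that contain the label, distinctiveness is one minus the share of documents outside the cluster that contain it, and conciseness penalizes labels longer than a hypothetical `max_words`. Understandability is hard to measure directly and is left out of the score.

```python
import re


def contains(doc: str, label: str) -> bool:
    """True if `label` occurs in `doc` as whole words (case-insensitive)."""
    return re.search(r"\b" + re.escape(label) + r"\b", doc, re.IGNORECASE) is not None


def score_label(label: str, cluster_docs: list[str], other_docs: list[str],
                max_words: int = 4) -> float:
    """Hypothetical heuristic score in [0, 1] for a candidate cluster label."""
    # Concise: full credit up to max_words, then shrink proportionally.
    n_words = len(label.split())
    concise = 1.0 if n_words <= max_words else max_words / n_words
    # Accurate: share of this cluster's documents that contain the label.
    accurate = sum(contains(d, label) for d in cluster_docs) / len(cluster_docs)
    # Distinctive: penalize labels that also appear outside the cluster.
    outside = (sum(contains(d, label) for d in other_docs) / len(other_docs)
               if other_docs else 0.0)
    distinctive = 1.0 - outside
    # Multiply so that a poor result on any one criterion sinks the label.
    return concise * accurate * distinctive


if __name__ == "__main__":
    cluster = ["Jaguar car dealers and prices", "Used Jaguar car reviews"]
    others = ["Jaguar habitat in the rainforest", "Rainforest habitat of the big cats"]
    candidates = ["jaguar", "jaguar car", "used"]
    best = max(candidates, key=lambda l: score_label(l, cluster, others))
    print(best)  # jaguar car
```

Here "jaguar" covers the whole cluster but also appears in an animal page, and "used" is distinctive but covers only half the cluster, so both score 0.5 while "jaguar car" scores 1.0. Multiplying the factors makes a label that fails badly on any one criterion score low overall; a weighted sum would be a softer alternative.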

How the Velocity Clustering Engine Works

When used to cluster search results, the Velocity Clustering Engine uses only the information returned in each result (title, abstract, metadata, etc.). An algorithm then groups the results based on their textual similarity, and this raw similarity is augmented with heuristics that address each of the four criteria above. Since no predefined taxonomy or controlled vocabulary is used, every cluster description is taken from the actual search results within the cluster. The engine also does not force each document into a single place in the cluster hierarchy: because a document can cover several themes, it may appear in every cluster where it fits.
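The Velocity algorithm itself is not described here. As a rough illustration of the two properties above (clusters built only from the text of each result, and documents allowed in more than one cluster), here is a Python sketch loosely modeled on the base-cluster merging step of Suffix Tree Clustering, simplified to single terms instead of phrases. The stopword list, the `min_docs` and `overlap` thresholds, and all function names are assumptions.

```python
import re
from itertools import combinations

# Hypothetical stopword list; a production system would use a larger one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "is", "with"}


def terms(text: str) -> set[str]:
    """Lowercased, non-stopword tokens from a result's title or snippet."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def cluster_results(results: list[str], min_docs: int = 2, overlap: float = 0.5):
    """Group results into possibly overlapping clusters.

    Returns (label, sorted result indices) pairs, largest cluster first.
    """
    # 1. Every term shared by at least min_docs results forms a base cluster.
    base: dict[str, set[int]] = {}
    for i, text in enumerate(results):
        for t in terms(text):
            base.setdefault(t, set()).add(i)
    base = {t: docs for t, docs in base.items() if len(docs) >= min_docs}

    # 2. Merge base clusters whose overlap exceeds `overlap` relative to BOTH
    #    document sets, tracking merged groups with union-find.
    labels = sorted(base)
    parent = {t: t for t in labels}

    def find(t: str) -> str:
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for a, b in combinations(labels, 2):
        shared = len(base[a] & base[b])
        if shared / len(base[a]) > overlap and shared / len(base[b]) > overlap:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for t in labels:
        groups.setdefault(find(t), []).append(t)

    # 3. A result joins every merged cluster whose terms it contains, so a
    #    result covering several themes can land in several clusters.
    clusters = []
    for members in groups.values():
        docs = set().union(*(base[t] for t in members))
        # Naive label: the member terms, most widespread first.
        label = " ".join(sorted(members, key=lambda t: (-len(base[t]), t)))
        clusters.append((label, sorted(docs)))
    return sorted(clusters, key=lambda c: (-len(c[1]), c[0]))


if __name__ == "__main__":
    results = [
        "Jaguar car dealers and prices",
        "Used Jaguar car reviews",
        "Jaguar habitat in the rainforest",
        "Rainforest habitat of the big cats",
    ]
    for label, docs in cluster_results(results):
        print(label, docs)
    # jaguar car [0, 1, 2]
    # habitat rainforest [2, 3]
```

With these four snippets, result 2 ("Jaguar habitat in the rainforest") lands in both clusters, matching the point that a document about several themes can appear in several places. The naive label also shows why label-quality heuristics matter: the "jaguar car" cluster picks up an animal page that a better-tuned engine would keep apart.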