Posted On: Jul 29, 2020
Amazon Elasticsearch Service now supports cosine similarity as a distance measure for k-Nearest Neighbor (k-NN) search to power your similarity search engine. Cosine similarity measures how similar two vectors are irrespective of their magnitudes, and it is most commonly used in information retrieval, image recognition, text similarity, bioinformatics, and recommendation systems.
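As a minimal sketch of the definition (plain Python, no Elasticsearch involved), cosine similarity is the dot product of two vectors divided by the product of their lengths, so scaling either vector does not change the result. The function name cosine_similarity and the sample vectors are illustrative only.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b: dot(a, b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]     # same direction as u, twice the magnitude
w = [-1.0, -2.0, -3.0]  # opposite direction to u

print(cosine_similarity(u, v))  # approximately 1.0: magnitude is ignored
print(cosine_similarity(u, w))  # approximately -1.0: opposite orientation
```

Values range from -1 (opposite directions) through 0 (orthogonal) to 1 (same direction).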
We previously released the k-NN similarity search feature in Amazon Elasticsearch Service, which runs nearest neighbor search on billions of documents, represented by vectors, across thousands of dimensions. The initial release of k-NN used Euclidean distance to measure similarity between vectors. Cosine similarity instead measures the cosine of the angle between two vectors: the smaller the angle (and therefore the larger the cosine), the more similar the vectors. With cosine similarity, you can now compare vectors by their orientation rather than their magnitude. For example, suppose you use bag-of-words to compare two documents that differ greatly in length, yet the most frequent word in both is “pet”, which appears 300 times in the larger document and 75 times in the smaller one. The Euclidean distance between these documents can be large because of their different scales, while cosine similarity can still rate them as similar because their content shares a common orientation. You can further improve the precision of k-NN search results with cosine similarity by using Elasticsearch's post-processing features, such as aggregations and filtering. With Elasticsearch's highly distributed architecture, you can implement an enterprise-grade, cosine similarity-based search engine with high recall and performance.
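The “pet” example can be made concrete with a small sketch. Only the “pet” counts (300 and 75) come from the example above; the other word counts, the three-word vocabulary, and the third, unrelated document are hypothetical, chosen to show how the two measures can disagree.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance; sensitive to the overall scale of each vector."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; insensitive to scale."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical bag-of-words counts over the vocabulary ["pet", "food", "car"].
large_pet_doc = [300, 120, 10]  # long document about pets
small_pet_doc = [75, 30, 5]     # short document about pets
short_car_doc = [5, 2, 70]      # short, unrelated document about cars

for name, other in [("large_pet_doc", large_pet_doc), ("short_car_doc", short_car_doc)]:
    print(
        f"small_pet_doc vs {name}: "
        f"euclidean={euclidean_distance(small_pet_doc, other):.1f}, "
        f"cosine={cosine_similarity(small_pet_doc, other):.4f}"
    )
```

Here Euclidean distance ranks the short car document as closer to the small pet document than the large pet document is, while cosine similarity rates the two pet documents as nearly identical (roughly 0.9995).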
Cosine similarity search in k-NN is built using the lightweight and efficient Non-Metric Space Library (NMSLIB) and is powered by Open Distro for Elasticsearch, an Apache 2.0-licensed distribution of Elasticsearch. To learn more about Open Distro for Elasticsearch and its k-NN plugin, visit the project website.
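To illustrate what a k-NN query with cosine similarity returns, here is an exact, brute-force sketch that scores every stored vector and keeps the top k. This is not how the plugin works internally: NMSLIB builds approximate nearest neighbor indexes (the k-NN plugin uses its HNSW graphs) to avoid scanning every vector, so treat this only as a reference for the expected ranking. The document IDs and vectors are hypothetical.

```python
import heapq
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def knn_cosine(query, vectors, k):
    """Return [(doc_id, similarity), ...] for the k vectors most similar to query.

    Exact brute-force scan: every vector is scored, so cost grows as O(n * d).
    """
    scored = ((doc_id, cosine_similarity(query, vec)) for doc_id, vec in vectors.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])

# Hypothetical stored vectors keyed by document ID.
docs = {
    "doc-1": [1.0, 0.0, 0.0],
    "doc-2": [0.9, 0.1, 0.0],
    "doc-3": [0.0, 1.0, 0.0],
    "doc-4": [10.0, 1.0, 0.0],
}

result = knn_cosine([1.0, 0.0, 0.0], docs, k=2)
print(result)
```

doc-4 ranks second even though its magnitude is about ten times the query's, because only its direction relative to the query matters.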
Cosine similarity search is available on domains running Elasticsearch 7.7. To learn more, see the documentation.
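As a hedged sketch of how this might look on a domain, the snippet below only builds and prints JSON request bodies; it sends nothing. It assumes the Open Distro for Elasticsearch k-NN plugin conventions for Elasticsearch 7.7: an index created with the knn setting enabled and knn.space_type set to cosinesimil, a knn_vector field with a fixed dimension, and a knn query clause with vector and k. The index name, field name, dimension, and vectors are hypothetical; confirm the exact settings against the documentation for your domain's version before using them.

```python
import json

# Hypothetical names and sizes used only for this illustration.
INDEX_NAME = "my-cosine-index"
FIELD_NAME = "my_vector"
DIMENSION = 4

def create_index_body(field, dimension):
    """Index settings and mapping for a cosine-similarity k-NN index (assumed ODFE k-NN syntax)."""
    return {
        "settings": {
            "index": {
                "knn": True,                      # enable k-NN for this index
                "knn.space_type": "cosinesimil",  # assumed setting selecting cosine similarity
            }
        },
        "mappings": {
            "properties": {
                field: {"type": "knn_vector", "dimension": dimension},
            }
        },
    }

def knn_query_body(field, vector, k, dimension):
    """Search body asking for the k nearest neighbors of vector."""
    if len(vector) != dimension:
        raise ValueError(f"expected {dimension} values, got {len(vector)}")
    return {
        "size": k,
        "query": {"knn": {field: {"vector": list(vector), "k": k}}},
    }

index_body = create_index_body(FIELD_NAME, DIMENSION)
query_body = knn_query_body(FIELD_NAME, [2.0, 3.0, 5.0, 6.0], k=2, dimension=DIMENSION)

print(f"PUT {INDEX_NAME}")
print(json.dumps(index_body, indent=2))
print(f"GET {INDEX_NAME}/_search")
print(json.dumps(query_body, indent=2))
```

Under this assumed setup the space type is an index-level setting, so every knn_vector field in the index would use cosine similarity; Euclidean search would need a separate index.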