Thursday, November 3, 2022

NLP based query of Airbnb properties with Asymmetric Semantic Search

Motivation – Why search Airbnb reviews?

Guest ratings and reviews are the most valuable assets Airbnb has. For prospective guests, learning from the experiences shared by others makes booking more predictable and dependable. However, there are often too many reviews to read one by one. More importantly, Airbnb currently only supports property search based on predefined “filters”, so the rich information contained in reviews goes unused in the search process.

Why enable search specifically for reviews?

  • Reviews written by other guests are more subjective and trustworthy
  • Reviews contain rich information; their free-text form can capture an unlimited variety of experiences, stories, and emotions
  • Untapped potential to expand the power of search, e.g., by matching properties to guest interests that no predefined filter covers


How would the search feature work?

For example, a prospective guest may be searching for a place to stay in LA. In addition to applying the map-based selection and structured filters currently supported, the guest could enter a search query describing the desired property, such as: “quiet and spacious place close to beach in a safe neighborhood”.


Using the query, the search feature finds relevant reviews among the current candidate listings, scores them by contextual similarity, and returns the top-ranked listings with the closest matching reviews.

 

Asymmetric Semantic Search

To support such queries, we apply the concept of semantic search. Specifically, using state-of-the-art NLP transformer models, we can embed each review as a vector in a shared vector space. The search then simply compares the query embedding with the review embeddings and finds the closest matches using a scoring function such as cosine similarity, thus surfacing reviews with a high semantic overlap with the query.
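The scoring step can be illustrated with a minimal sketch. The three-dimensional vectors below are made-up stand-ins for real transformer embeddings (which typically have hundreds of dimensions), and the function names are hypothetical:

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def top_matches(query_vec, review_vecs, k=2):
    """Return (review_index, score) pairs for the k reviews closest to the query."""
    scored = [(cosine_similarity(query_vec, vec), idx)
              for idx, vec in enumerate(review_vecs)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(idx, score) for score, idx in scored[:k]]

# Toy embeddings, invented for illustration only.
review_vecs = [
    [0.9, 0.1, 0.0],  # review 0
    [0.0, 1.0, 0.0],  # review 1
    [0.7, 0.6, 0.1],  # review 2
]
query_vec = [1.0, 0.2, 0.0]

results = top_matches(query_vec, review_vecs, k=2)
for idx, score in results:
    print(f"review {idx}: similarity {score:.3f}")
```

Because cosine similarity depends only on the angle between vectors, a short query and a long review can still score highly if they point in a similar direction.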

In choosing the transformer model to test with, it is important to recognize the type of search, which is “asymmetric” in this case: users usually enter a short query that is to be matched against reviews, which are often longer paragraphs. We therefore use MS MARCO models, which are trained on a dataset built from real user search queries on the Bing search engine.
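As a rough sketch of how such a model might be plugged in, the code below assumes the sentence-transformers library and the model name "msmarco-distilbert-base-v4"; neither is named in this post, so treat both as assumptions rather than the prototype's actual setup. The helper names load_encoder and rank_reviews are hypothetical. The ranking function accepts any object with an encode method, so it can be tried without downloading a model:

```python
import math

def load_encoder(model_name="msmarco-distilbert-base-v4"):
    """Load an MS MARCO model (assumed name; requires sentence-transformers)."""
    # Imported lazily so the rest of this sketch runs without the library installed.
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(model_name)

def rank_reviews(encoder, query, reviews, top_k=3):
    """Rank (long) reviews against a (short) query by cosine similarity.

    `encoder` is any object whose encode(list_of_texts) returns one vector per text.
    """
    review_vecs = encoder.encode(reviews)
    query_vec = encoder.encode([query])[0]

    def cosine(u, v):
        dot = sum(float(a) * float(b) for a, b in zip(u, v))
        norm_u = math.sqrt(sum(float(a) ** 2 for a in u))
        norm_v = math.sqrt(sum(float(b) ** 2 for b in v))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    scored = [(cosine(query_vec, vec), text) for vec, text in zip(review_vecs, reviews)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

With the library installed, a call might look like rank_reviews(load_encoder(), "quiet place close to beach", reviews).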

 

System Design

As illustrated, a simple prototype design consists of:

  • Selected listings – user preselection (filter) of prospective listings
  • Candidate reviews – corresponding reviews for the selected listings
  • Review embeddings – use a transformer to generate a sentence embedding for each candidate review
  • User query – performs asymmetric semantic search against the review embeddings
  • Result – select the top K matches and present the corresponding listings to the user
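The steps above can be sketched end to end as follows. This is an illustration under stated assumptions, not the prototype's code: toy_embed is a keyword-count stand-in for a real transformer encoder, and the listing IDs, reviews, and function names are invented for the example.

```python
import math
import re

# Hypothetical vocabulary for the stand-in embedding; a real system would use a transformer.
VOCAB = ["quiet", "spacious", "beach", "safe", "noisy", "downtown"]

def toy_embed(text):
    """Stand-in encoder: counts of a few keywords (NOT a real sentence embedding)."""
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(term)) for term in VOCAB]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def build_index(reviews_by_listing, selected_ids, embed=toy_embed):
    """Embed the reviews of the preselected listings once, ahead of any query."""
    return [(listing_id, text, embed(text))
            for listing_id in selected_ids
            for text in reviews_by_listing.get(listing_id, [])]

def search(index, query, k=2, embed=toy_embed):
    """Return up to k listings, each with its best-matching review and score."""
    query_vec = embed(query)
    scored = sorted(((cosine(query_vec, vec), listing_id, text)
                     for listing_id, text, vec in index),
                    key=lambda row: row[0], reverse=True)
    best = {}
    for score, listing_id, text in scored:
        if listing_id not in best:  # keep only the top review per listing
            best[listing_id] = (score, text)
        if len(best) == k:
            break
    return [(listing_id, score, text) for listing_id, (score, text) in best.items()]

# Invented sample data.
reviews_by_listing = {
    "listing_a": ["Quiet and spacious, short walk to the beach.", "Felt safe at night."],
    "listing_b": ["Noisy street downtown, but great nightlife."],
    "listing_c": ["Right on the beach, very quiet."],
}
selected = ["listing_a", "listing_b"]  # listing_c was excluded by the user's filters

index = build_index(reviews_by_listing, selected)
results = search(index, "quiet and spacious place close to beach in a safe neighborhood")
for listing_id, score, text in results:
    print(f"{listing_id}: {score:.3f} | {text}")
```

Passing the encoder in as the embed argument keeps the pipeline independent of the model, so the toy encoder can be swapped for a transformer without touching the search logic.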

 


Sample Result

Here are some of the top reviews found to semantically match the sample query above. Since reviews don’t change, they are converted to vector form ahead of the search. As a result, the query is fast even with tens of thousands of candidate reviews, and the results are highly relevant. This represents a powerful new search capability that is currently unavailable.



Summary

While Airbnb is used here as an example for illustration, the approach of combining unstructured search, such as NLP semantic search, with structured feature search is generally applicable. Imagine a few scenarios:

  • Amazon – enabling users to search products via user reviews: “the best portable fishing pole others have taken on a flight to Alaska”
  • Netflix – voice-enabled search based on user experience: “looking for a Halloween movie for a 3-year-old that is scary but fun”

Prototype code can be found on GitHub.

I am looking for opportunities to continue developing ideas and turning concepts into solid business applications.