Few reviews lead to unstable ratings

This page is part of a global project to create a better online reviews system. If you want to know more or share your feedback, write to [email protected] and we’ll grab a beer ;)

Many of the reviewer-level issues described in the “Unclear Scale”, “Nuances”, and “Categorization & Subjectivity” sections tend to be resolved at the aggregation level. The more reviews there are:

  • The more stable and accurate the average rating.
  • The more detailed the description of the experience.

This is the essence of the wisdom of the crowd.

But what about businesses with few reviews? For them, each new rating has a significant impact on the average.

For example, with 9 reviews and an average rating of 4.8, a single 1-star rating drops the average to 4.42, putting the business below the threshold of consideration.
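
Here is that arithmetic as a small Python sketch (the `add_rating` helper is hypothetical, for illustration only). The new average is the old total plus the new rating, divided by the new count; the loop also shows how the same 1-star review hurts less as the number of prior reviews grows:

```python
def add_rating(average: float, count: int, new_rating: float) -> float:
    """Return the average after adding one rating to `count` existing ratings."""
    total = average * count  # rebuild the sum of the existing ratings
    return (total + new_rating) / (count + 1)

# One 1-star review on top of a 4.8 average, for different numbers of prior reviews
for prior in (9, 30, 100):
    print(prior, round(add_rating(4.8, prior, 1), 2))
# 9 4.42
# 30 4.68
# 100 4.76
```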

For the average score to be relevant, it needs to be stable. If we consider an average stable when a new rating doesn’t change it by more than 0.05, a brief calculation shows that this generally takes at least 30 reviews (under a few assumptions). The problem is that many businesses and products online have fewer than 30 reviews.
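
The calculation can be sketched as follows. Adding a rating r to n ratings that average m moves the average by (r - m)/(n + 1), so the shift stays within 0.05 once n + 1 ≥ |r - m| / 0.05. The exact assumptions behind the 30-review figure aren’t spelled out, so the typical deviation passed to the hypothetical `reviews_needed` function below is an assumption: a deviation of about 1.5 stars gives 29 prior reviews, in line with the 30-review figure, while the worst case from the 4.8-average example (a 1-star review, i.e. a 3.8-star deviation) would require 75.

```python
import math
from fractions import Fraction

def reviews_needed(deviation: float, tolerance: float = 0.05) -> int:
    """Minimum number of existing reviews n such that one new rating deviating
    from the current average by `deviation` stars moves it by at most `tolerance`.

    The shift is deviation / (n + 1), so we need n >= deviation / tolerance - 1.
    """
    # Parse via str() so that 1.5 / 0.05 is exactly 30, not a float approximation
    d = Fraction(str(deviation))
    eps = Fraction(str(tolerance))
    return max(0, math.ceil(d / eps - 1))

print(reviews_needed(1.5))  # 29: a typical review 1.5 stars away from the average
print(reviews_needed(3.8))  # 75: a 1-star review against a 4.8 average
```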

(Image: BrightLocal’s study of the average number of Google reviews by industry)

According to BrightLocal’s study [1], the global average across their database of 93,845 local businesses is 39 reviews. In small or mid-sized cities, however, the average is likely much lower, which makes ratings in these markets significantly more volatile.

Small businesses suffer the most from the review system, and readers can’t always trust the average rating.

💡 Exploration
  • Airbnb doesn’t disclose the average rating until the listing receives at least 3 reviews. While this is a good practice, it introduces another issue: no information can be perceived as bad information. Listings with 2 reviews and no average rating struggle to compete against listings with 20 reviews and a 4.8 rating.
    (Image: Airbnb doesn’t show the average rating before a listing reaches 3 reviews.)
  • To counter this volatility, Trustpilot “automatically includes the value of 7 reviews worth 3.5 stars each in all TrustScore calculations” [2]. As a result, companies with few but excellent reviews may have a lower TrustScore than they deserve. Small businesses that are unlikely to collect many reviews are condemned to a low average and miss out on customers because of the threshold effect. In my opinion, this is a very bad idea.
  • Remove the average rating in industries where businesses typically have few reviews. In a given city and for a specific business segment, if the median number of ratings is too low (e.g., below 100), it’s fair to hide the average rating while still letting potential customers read the reviews. Google could apply this rule and adjust the map accordingly for each search.
  • A system relying only on positive recommendations, without an average rating, could fix the issue. The more positive recommendations a business has, the more likely it is to be considered. Companies with fewer recommendations wouldn’t suffer from the threshold effect, nor from the outsized influence an average rating has on whether users consider a business.
    (Image: another proposed redesign for Airbnb, letting guests mark their stay as exceptional.)
  • Labels, presented as a potential solution to the threshold effect, could also help address volatility. If a business receives a label for what it excels at (because reviewers rated it very good in a specific area), it would suffer less from dissatisfied customers in areas outside its strengths while still attracting potential customers whose criteria match.
    (Image: a proposed design for Airbnb in which categories are upvoted and the overall rating is removed.)
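
Trustpilot’s quoted rule amounts to a Bayesian-style average: 7 phantom ratings of 3.5 stars are mixed in with the real ones. The Python sketch below models only that quoted rule (the function name is hypothetical, and any other factors Trustpilot may use in the TrustScore are ignored). It shows how a business whose reviews are all 5 stars is held back while it has few of them:

```python
def score_with_prior(ratings: list[float], prior_count: int = 7,
                     prior_value: float = 3.5) -> float:
    """Average of the real ratings plus `prior_count` phantom ratings of `prior_value`."""
    total = sum(ratings) + prior_count * prior_value
    return total / (len(ratings) + prior_count)

# A business whose real reviews are all 5 stars
for n in (3, 10, 100):
    print(n, round(score_with_prior([5] * n), 2))
# 3 3.95
# 10 4.38
# 100 4.9
```

Even with 10 perfect reviews, the score stays below 4.4: the prior dampens volatility, but at the cost of the penalty on small businesses described in the list above.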
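
The median-based rule from the list above could be sketched like this. Everything here is hypothetical: the `Listing` record, the `displayed_averages` function and the sample data are made up for illustration, and the threshold of 100 is the example value from the text. Within each (city, segment) group, the average rating is hidden whenever the group’s median review count falls below the threshold; the reviews themselves would stay readable.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median
from typing import Optional

@dataclass
class Listing:  # hypothetical record shape
    name: str
    city: str
    segment: str
    review_count: int
    average: float

def displayed_averages(listings: list[Listing],
                       min_median: int = 100) -> dict[str, Optional[float]]:
    """Map each listing name to the average to display, or None to hide it."""
    counts: dict[tuple[str, str], list[int]] = defaultdict(list)
    for item in listings:
        counts[(item.city, item.segment)].append(item.review_count)
    # A group shows averages only if its median review count reaches the threshold
    show = {group: median(values) >= min_median for group, values in counts.items()}
    return {item.name: item.average if show[(item.city, item.segment)] else None
            for item in listings}

# Made-up sample data for illustration
listings = [
    Listing("Bakery A", "Smalltown", "bakery", 12, 4.8),
    Listing("Bakery B", "Smalltown", "bakery", 40, 4.5),
    Listing("Cafe C", "Bigcity", "cafe", 350, 4.6),
    Listing("Cafe D", "Bigcity", "cafe", 120, 4.2),
]
print(displayed_averages(listings))
# {'Bakery A': None, 'Bakery B': None, 'Cafe C': 4.6, 'Cafe D': 4.2}
```

Using the median rather than the mean keeps a single very popular business from making a whole segment look review-rich.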

[1] “Local Consumer Review Survey 2024: Trends, Behaviors, and Platforms Explored”, BrightLocal, 2024.

[2] “TrustScore and star rating explained”, Trustpilot.

➡️ Next up: All reviews don’t count the same