Sentiment Analysis of Reviews – Data Cleanup Guide (2021)


IBM has estimated that bad data costs the US economy $3.1 trillion per year. Imagine that! When it comes to eCommerce ratings and reviews, analyzing them may sound simple (what could possibly go wrong?), but there are actually a lot of moving parts behind the scenes that determine whether the insights and analytics you get are worthwhile.

Keep in mind that 2.4B internet users post reviews globally (52% of global internet users). That’s a lot of reviews to work with.

Below is a short data cleanup guide so you can get started as soon as you finish reading this article.


A quick reminder on why ratings and reviews matter

eCommerce ratings and reviews are the richest sources of information about products, services, and the purchase process. Real buyers of a product or service provide feedback on:

  • What they purchased
  • How easy it was to buy
  • Stock availability
  • Post-purchase customer service

By analyzing topics and sentiment across these areas, you can gain a wide range of feedback on areas such as:

  • Product/service optimization
  • Marketing
  • eCommerce
  • Customer service
  • Competition

As reviews are publicly available on brand websites as well as marketplaces, analyzing a wide range of these sources can provide an industry-wide perspective of brands (new and existing), products, and consumer wants and trends. This is typically done in the context of one industry – for example, jeans or shampoos.

The promise of an industry-wide catalog

Collecting reviews from multiple eCommerce sources creates the opportunity to build an industry-wide perspective. The key, though, is organization. For an effective analysis, you need to associate each review with a product and each product with a brand. This way you gain an industry catalog with:

  • All brands (industry-level summary of topics and sentiment)
  • All products per brand (brand-level summary of topics and sentiment)
  • Product-level analysis (what customers like or dislike about a product, the purchase experience, or the customer service)

Viewing an industry this way yields very different insights than simply summarizing reviews per industry (with no brands or products involved). Imagine the difference between knowing that 87% of consumers love Feature Y of Brand X, and knowing that of the 20 different products Brand X makes, product number 18 has only 50% positive sentiment toward Feature Y.
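The review-to-product-to-brand hierarchy can be illustrated with a minimal Python sketch. It assumes each review has already been tagged upstream with a brand, product, feature, and a positive/negative sentiment label; the `Review` type, the `positive_share` helper, and the toy counts are hypothetical, not Revuze data or Revuze's method. Rolling the same reviews up at two levels shows how a healthy brand-level number can hide a weak product.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Review:
    brand: str
    product: str
    feature: str
    positive: bool  # assumption: sentiment was already classified upstream


def positive_share(reviews, key):
    """Group reviews by key(review) and return the share of positive mentions per group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for review in reviews:
        group = key(review)
        totals[group] += 1
        positives[group] += int(review.positive)
    return {group: positives[group] / totals[group] for group in totals}


# Hypothetical toy data: two Brand X products, both mentioning Feature Y.
reviews = (
    [Review("Brand X", "Product 1", "Feature Y", True)] * 9
    + [Review("Brand X", "Product 1", "Feature Y", False)] * 1
    + [Review("Brand X", "Product 18", "Feature Y", True)] * 1
    + [Review("Brand X", "Product 18", "Feature Y", False)] * 1
)

# Brand level: one number for the whole brand (10 of 12 positive).
brand_level = positive_share(reviews, key=lambda r: (r.brand, r.feature))
# Product level: the same reviews split by product (0.9 vs 0.5).
product_level = positive_share(reviews, key=lambda r: (r.brand, r.product, r.feature))

print(brand_level)
print(product_level)
```

In this toy data, Feature Y looks healthy at brand level (about 83% positive), while the product-level view shows Product 18 at only 50%, which is exactly the kind of signal a brand-only summary hides.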

What you need to know about cleaning up reviews

Because the ideal approach is to organize an industry-level catalog, data cleanup needs to take place on several levels:

  • Review level
  • Product level
  • Brand level

Examples of review-level cleanliness challenges:

  • A review may include only a star rating, with no text
  • Reviews may include grammar and spelling mistakes
  • Due to their concise form, each sentence may cover multiple topics (“Loved the product except for the design and price”)
  • Some marketplaces mix languages to increase the number of reviews per product (English + Spanish + Chinese)
  • Some marketplaces mix reviews across similar products (the same jeans model in different colors, or a 500ml bottle of shampoo and a 250ml bottle of the same shampoo)
  • A review may be fake
  • A review may be part of a promotion (i.e., an incentivized review)
  • Reviews may be syndicated between websites, meaning the same review may be collected and analyzed multiple times from several websites
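Three of the review-level challenges above (rating-only reviews, syndicated duplicates, and incentivized reviews) can often be handled with simple rules before any sentiment analysis runs. The sketch below is a minimal Python illustration under stated assumptions: syndicated copies are detected only when their text is identical after normalization, and incentivized reviews are flagged by a small, hypothetical list of disclosure phrases (`INCENTIVE_PHRASES`). The `RawReview` type, `clean_reviews` helper, and site names are also hypothetical. Language separation, fake-review detection, and splitting multi-topic sentences usually need dedicated models and are not shown.

```python
import hashlib
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical disclosure phrases; a real list would be tuned per marketplace and language.
INCENTIVE_PHRASES = ("received this product free", "in exchange for my honest review")


@dataclass
class RawReview:
    source: str          # website the review was collected from
    text: Optional[str]  # None or empty for rating-only reviews
    stars: int


def normalize(text: str) -> str:
    """Case-fold, replace punctuation with spaces, and collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", text.casefold())
    return " ".join(text.split())


def clean_reviews(raw_reviews):
    """Split reviews into usable, rating-only, and incentivized lists, dropping exact duplicates."""
    seen = set()
    kept, rating_only, incentivized = [], [], []
    for review in raw_reviews:
        if not review.text or not review.text.strip():
            rating_only.append(review)  # still useful for star averages, not for topics
            continue
        norm = normalize(review.text)
        digest = hashlib.sha1(norm.encode("utf-8")).hexdigest()  # compact key for the seen-set
        if digest in seen:
            continue  # same text already collected (assumed to be a syndicated copy)
        seen.add(digest)
        if any(phrase in norm for phrase in INCENTIVE_PHRASES):
            incentivized.append(review)
        else:
            kept.append(review)
    return kept, rating_only, incentivized


# Hypothetical sample collected from three sites.
raw = [
    RawReview("site-a", "Loved the fit, but the zipper broke.", 4),
    RawReview("site-b", "Loved the fit but the zipper broke!", 4),
    RawReview("site-a", "", 5),
    RawReview("site-c", "I received this product free in exchange for my honest review. Great color.", 5),
]
kept, rating_only, incentivized = clean_reviews(raw)
print(len(kept), len(rating_only), len(incentivized))  # 1 1 1
```

Rating-only reviews are set aside rather than discarded, since they still count toward star averages. Exact matching after normalization only catches verbatim copies; lightly edited syndicated reviews would need fuzzy or near-duplicate matching.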

Examples of product-level cleanliness challenges (mostly related to consolidating reviews from multiple sources/websites):

  • The same product may be listed under a different name in different marketplaces, which means you will not be able to associate the right reviews with the right product across websites
  • Different websites may place the same product under different product hierarchies, which again can make it harder to associate the two listings with each other
  • Some marketplaces promote related products (you’re looking to buy a laptop, so here’s an offer for a carrying case), and when collecting reviews about one industry (laptops) you may collect reviews of “recommended products” in the process (carrying cases in our example)
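For the product-level challenges, one common approach (shown here as a minimal sketch, not a prescribed method) is to map each site's category path to a canonical category so off-topic "recommended products" can be dropped, and then fuzzy-match listing titles against a canonical product catalog. The site names, category paths, catalog entries, helper names, and the 0.85 similarity threshold are all hypothetical; the matching uses Python's standard-library `difflib.SequenceMatcher` on token-sorted titles.

```python
import re
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical mapping from (site, site-specific category path) to one canonical category.
CATEGORY_MAP = {
    ("site-a", "Computers > Laptops"): "laptops",
    ("site-b", "Electronics > Notebooks"): "laptops",
    ("site-b", "Electronics > Laptop Accessories"): "accessories",
}

# Hypothetical canonical catalog: product ID -> canonical title.
CATALOG = {
    "P1": "Acme UltraBook 14 Laptop, 16GB RAM",
    "P2": "Acme UltraBook 15 Laptop, 32GB RAM",
}


def in_scope(site: str, category_path: str, target: str = "laptops") -> bool:
    """Keep only listings whose mapped category is the industry being analyzed."""
    return CATEGORY_MAP.get((site, category_path)) == target


def normalize_title(title: str) -> str:
    """Case-fold, drop punctuation, and sort tokens, since word order differs between sites."""
    tokens = re.sub(r"[^\w\s]", " ", title.casefold()).split()
    return " ".join(sorted(tokens))


def match_listing(title: str, catalog=CATALOG, threshold: float = 0.85) -> Optional[str]:
    """Return the best-matching catalog product ID, or None if nothing is similar enough."""
    norm = normalize_title(title)
    best_id, best_score = None, 0.0
    for product_id, canonical in catalog.items():
        score = SequenceMatcher(None, norm, normalize_title(canonical)).ratio()
        if score > best_score:
            best_id, best_score = product_id, score
    return best_id if best_score >= threshold else None


print(in_scope("site-b", "Electronics > Notebooks"))           # True
print(in_scope("site-b", "Electronics > Laptop Accessories"))  # False
print(match_listing("Laptop - Acme Ultrabook 14, 16GB RAM"))   # P1
```

Similarity scores alone struggle with near-identical models (the 14 and 15 entries here share most of their title), so in practice model numbers or other attributes usually need an exact-match check on top of the fuzzy score.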

Examples of brand-level cleanliness challenges (similar to the product-level challenges):

  • The same brand may be spelled differently in different marketplaces or even under different online stores within a marketplace, which means you will not be able to associate the right reviews with the right brand
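Brand spellings can be consolidated in a similar way: reduce each raw brand string to a normalized key (case-folded, with accents and punctuation removed) and look it up in a curated alias table. The sketch below assumes such a table exists; the entries in the hypothetical `BRAND_ALIASES` dictionary are illustrative only, and unknown keys return `None` so they can be routed to manual review instead of being guessed.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical alias table: normalized key -> canonical brand name.
BRAND_ALIASES = {
    "levis": "Levi's",
    "levi strauss co": "Levi's",
    "loreal": "L'Oréal",
    "loreal paris": "L'Oréal",
}


def brand_key(raw: str) -> str:
    """Strip accents, case-fold, remove punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", raw)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    no_punct = re.sub(r"[^\w\s]", "", no_accents.casefold())
    return " ".join(no_punct.split())


def canonical_brand(raw: str) -> Optional[str]:
    """Return the canonical brand name, or None if the spelling is not in the alias table."""
    return BRAND_ALIASES.get(brand_key(raw))


for spelling in ["LEVI’S", "Levi Strauss & Co.", "L'OREAL Paris", "Loréal", "Acme"]:
    print(spelling, "->", canonical_brand(spelling))
```

Keeping the alias table explicit, rather than fuzzy-matching brand names, avoids silently merging distinct brands with similar names; the trade-off is that new spellings must be reviewed and added as they appear.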


Ratings and reviews offer huge untapped potential to connect directly with what is top of mind for customers in your industry, without them even knowing you are taking a look. It’s basically like listening to the world’s largest panel, covering you and your competitors on any number of business topics: product, purchase, and service.

To get the most value from this, you need to organize these opinions in a way that reflects product and brand relationships, and clean them up so the insights are reliable. Once you do, you will be able to benefit from an almost unlimited range of insights for a flat fee:

  • Positioning
  • New product launch
  • Trends and wish lists/innovation
  • CSAT
  • Competitive intelligence
  • Brand analysis
  • etc…

This is a worthwhile exercise, and the ROI is enormous. Just make sure to stay the course and not cut corners on data cleanup or on organizing the data into products and brands. Otherwise, all you get is a list of topics and sentiments, without the eCommerce roots of product-specific feedback.


About Revuze

Revuze is the first company to provide on-demand access to 1 billion consumer insights into over 300,000 products. Since 2013, Revuze has been servicing some of the biggest brands in the world with access to valuable consumer insights in a matter of hours, instead of months.

Revuze’s AI-powered solution helps product companies in any industry (from electronics to personal care, from home appliances to consumer-packaged goods) monitor the eCommerce market, identify emerging trends, and assess products’ strengths and weaknesses. Backed by Nielsen and SAP, Revuze is headquartered in Netanya, Israel, and has offices in New York, NY, and Montréal, Canada.
