What is text analytics?
Text analytics is an automated process for analyzing a piece of writing and extracting useful information from it. It is usually carried out with software designed to go through lengthy texts and gather insights that may be useful for marketing, branding, and other research purposes.
Many companies use text analytics to analyze articles, tweets, social media posts, reviews, comments, and other types of writing, to find meaning and gather intelligence with the help of algorithms and machine learning tools.
In this article, we’ll dive into the world of text analytics, explore its different applications, and learn about the different steps of the process.
The difference between text analytics, text mining, and NLP
If you are vaguely familiar with the concept of text analytics, then it’s likely you’ve also heard the terms “text mining” and “natural language processing.” Are these the same, or do they refer to different tools and processes?
Text mining is very often used as a synonym for text analytics, and the two terms mostly refer to the same concept. Strictly speaking, though, text mining is the broader term: it refers to the act of gathering useful, high-quality information from a text.
Text analytics, by contrast, is the more specific computational process of analyzing a text to extract such information. Text analytics software uses linguistic, statistical, and machine learning techniques to structure the content of a text and analyze it.
In order to perform such analysis, programs use natural language processing (commonly abbreviated as NLP), which is the field of computer science, information engineering, and artificial intelligence used to process and analyze the data contained in a piece of writing. NLP is used to understand the meaning behind a text. It usually tries to answer the following questions about a piece of writing: Who’s talking? What are they saying? What are they referring to?
In general, text analytics software applies both text mining and natural language processing.
Functions and applications of text analytics
So what is text analytics actually used for?
Text analytics is used in many different fields, from science to academia, from security to biomedicine. In particular, it’s used for business and marketing applications to perform research on markets and customers, make business decisions, and predict future behaviors.
Here are some examples of useful applications:
Voice of customer and customer experience
Some companies gather thousands of words in customer feedback every single week. Just think of Yelp reviews, customer surveys, Facebook comments, and other forms of feedback. If they don’t have the automated tools required to process and analyze the feedback to extract information that can be used to improve the product, customer service, or the branding strategy, what’s the point of even getting the feedback in the first place? Text analytics can help you transform those texts into precious insights.
Social media monitoring
Anyone who works in communications, marketing, or branding knows that social media (e.g. Facebook, Twitter, Reddit) can be the greatest resource for discovering customers’ opinions and feelings about businesses, products, and services. Text analytics is an indispensable tool for analyzing whatever people write and share online that mentions your brand or your competitors’ brands. As always, one main rule applies: there is no point in having all this information available if you don’t know how to use it. It’s important to note that the writing style on social media can be quite specific and different from other types of writing: sentences can be very short, words can be abbreviated, and emojis often replace words to express feelings and opinions. Good NLP software needs to be able to understand all of that.
Ever heard of sentiment analysis? If not, check out Revuze’s detailed blog post about the topic.
Sentiment analysis is the automated process of analyzing a text and interpreting the sentiments behind it. Through machine learning, algorithms can classify statements as positive, negative, or neutral. This process is also known as “opinion mining.” How is text analytics important for sentiment analysis? Text analytics can be applied to perform sentiment analysis in order to gain consumer insights and learn more about both customers and competitors.
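To make the positive/negative/neutral classification concrete, here is a minimal sketch in Python. Note that it swaps in a simple word-list (lexicon) score rather than the machine learning approach described above, and the word lists and the function name classify_sentiment are made up for illustration.

```python
import re

# Tiny illustrative word lists (assumptions). A production system would
# typically learn these signals from labeled examples instead.
POSITIVE = {"good", "great", "love", "excellent", "delicious", "friendly"}
NEGATIVE = {"bad", "terrible", "hate", "slow", "rude", "broken"}

def classify_sentiment(text: str) -> str:
    """Label a statement by counting positive and negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(classify_sentiment("The food was delicious and the staff friendly"))
print(classify_sentiment("Terrible service, slow and rude"))
print(classify_sentiment("We visited on Tuesday"))
```

A word-count approach like this cannot handle negation ("not good") or sarcasm, which is one reason trained models are generally preferred in practice.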
How does text analytics work?
Now that we’ve learned the definition of the terms and we’ve explored some of the most useful applications, let’s dive a little deeper and try to understand the process behind text analytics.
There are several steps required to transition from the raw text to the insights we’re looking for. In this article, we’ll go through nine computational phases that help dissect the piece of writing we’re analyzing.
Step 1: Text retrieval. First of all, we need to identify and retrieve the text we’re looking to analyze. While this can be a piece of writing that is already available to us, we might have to use a scraping tool or an API to retrieve it, especially if we’re trying to perform social listening.
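Once a page has been downloaded, the readable text still has to be separated from the markup. The sketch below shows one possible way to do that with Python's standard html.parser module; the class name VisibleTextExtractor, the helper extract_text, and the sample page are all hypothetical, and the download itself is left out.

```python
from html.parser import HTMLParser

class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping the contents of <script> and <style>."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())

def extract_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)

# Hypothetical page content standing in for a scraped review.
sample_html = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Great phone</h1><p>Battery lasts two days.</p></body></html>"
)
print(extract_text(sample_html))
```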
Step 2: Language identification. This is another important preparatory phase. Text analysis changes greatly depending on the language the text is written in, so it’s important to identify it first: is it English, Korean, Spanish, or something else? A database or text that contains multiple languages can be an issue, so let’s be sure before we start the process.
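One simple way to guess the language is to count how many very common words ("stopwords") of each candidate language appear in the text. The following is a minimal sketch of that idea; the stopword lists are deliberately tiny and the function name guess_language is an assumption for illustration. Languages written in other scripts, such as Korean, would more likely be detected by looking at the characters themselves.

```python
import re

# Very small, illustrative stopword lists (assumptions, not exhaustive).
STOPWORDS = {
    "english": {"the", "and", "is", "of", "to", "in", "it", "was"},
    "spanish": {"el", "la", "y", "es", "de", "que", "en", "los"},
    "french": {"le", "la", "et", "est", "de", "que", "les", "un"},
}

def guess_language(text: str) -> str:
    """Return the language whose stopwords occur most often, or 'unknown'."""
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters only
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

print(guess_language("The battery is great and it was cheap"))
print(guess_language("La comida es buena y el servicio es rápido"))
```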
Step 3: Tokenization. Our software needs to be able to split the text into its individual components (tokens) and categorize them as words, numbers, punctuation marks, and so on.
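As a rough illustration, a tokenizer can be sketched with a single regular expression that recognizes numbers, words, and punctuation marks. The pattern and the function name tokenize below are assumptions made for this example; real tokenizers handle many more cases (URLs, emojis, hashtags, and so on).

```python
import re

TOKEN_PATTERN = re.compile(
    r"(?P<number>\d+(?:[.,]\d+)*)"          # 42, 3.50, 1,000
    r"|(?P<word>[^\W\d_]+(?:'[^\W\d_]+)*)"  # letters, with inner apostrophes
    r"|(?P<punct>[^\w\s]|_)"                # any other single symbol
)

def tokenize(text: str):
    """Return (category, token) pairs; whitespace is dropped."""
    return [(m.lastgroup, m.group()) for m in TOKEN_PATTERN.finditer(text)]

for category, token in tokenize("It costs $3.50, doesn't it?"):
    print(category, token)
```

Trying the number pattern first keeps "3.50" together as one token instead of splitting it at the period.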
Step 4: Breaking text into sentences. Now that we have “tokenized” our text, we can divide it into sentences. Periods usually indicate the end of a sentence, but they can have other meanings too (in abbreviations or decimal numbers, for example), and the software needs to be able to tell the difference.
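The sketch below illustrates the problem with periods. For simplicity it works on whitespace-separated words rather than on the output of a tokenizer, and it relies on a short, hand-written abbreviation list; both the list and the function name split_sentences are assumptions.

```python
# Illustrative abbreviation list (an assumption, far from complete).
ABBREVIATIONS = {"mr.", "mrs.", "ms.", "dr.", "inc.", "e.g.", "i.e.", "u.s."}

def split_sentences(text: str):
    """Split on ., ! or ? unless the word is a known abbreviation."""
    sentences, current = [], []
    for word in text.split():
        current.append(word)
        if word.endswith((".", "!", "?")) and word.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text without closing punctuation
        sentences.append(" ".join(current))
    return sentences

for sentence in split_sentences("Dr. Lee paid $3.50 for coffee. Was it worth it? Absolutely!"):
    print(sentence)
```

Here the period in "Dr." and the one inside "$3.50" do not end a sentence, so the text is split into three sentences rather than five.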
Step 5: Tagging. Next in line is the tagging of each part of speech (commonly referred to as “PoS”). A tagging tool goes through every word and categorizes it as a part of speech (such as verb, adverb, proper noun, common noun, adjective, and so on).
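To show what a tagger produces, here is a toy sketch that combines a small dictionary with a few suffix rules. Real taggers are usually statistical models trained on annotated text; the lexicon, the rules, the tag names, and the function name tag below are all simplifying assumptions.

```python
# Hand-written mini lexicon (an assumption for illustration only).
LEXICON = {
    "the": "DET", "a": "DET", "this": "DET",
    "company": "NOUN", "lawsuit": "NOUN", "cake": "NOUN",
    "filed": "VERB", "is": "VERB",
    "delicious": "ADJ",
}

def tag(tokens):
    """Assign a part-of-speech tag to each token."""
    tagged = []
    for i, token in enumerate(tokens):
        lower = token.lower()
        if lower in LEXICON:
            pos = LEXICON[lower]
        elif lower.endswith("ly"):
            pos = "ADV"                     # crude suffix rule
        elif lower.endswith(("ed", "ing")):
            pos = "VERB"                    # crude suffix rule
        elif token[:1].isupper() and i > 0:
            pos = "PROPN"                   # capitalized mid-sentence
        else:
            pos = "NOUN"                    # fallback guess
        tagged.append((token, pos))
    return tagged

print(tag(["The", "company", "quickly", "filed", "a", "lawsuit"]))
```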
Step 6: Chunking. We’ve already divided the text into sentences (see Step 4). A chunking tool splits each sentence into phrases by grouping words that go together, such as nouns with their corresponding adjectives (example: “this delicious cake”), verb phrases (example: “we started implementing”), prepositional phrases (example: “under these conditions”), and so on.
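A chunker can be sketched as a pattern over part-of-speech tags. The example below groups an optional determiner, any adjectives, and one or more nouns into a noun phrase. It assumes the words have already been tagged, so the tags are written by hand here, and the function name chunk_noun_phrases is hypothetical.

```python
def chunk_noun_phrases(tagged):
    """Find spans matching: optional DET, any ADJ, then one or more nouns."""
    chunks, i, n = [], 0, len(tagged)
    while i < n:
        start = i
        if tagged[i][1] == "DET":
            i += 1
        while i < n and tagged[i][1] == "ADJ":
            i += 1
        noun_start = i
        while i < n and tagged[i][1] in ("NOUN", "PROPN"):
            i += 1
        if i > noun_start:            # at least one noun was found
            chunks.append(" ".join(word for word, _ in tagged[start:i]))
        else:                         # no noun here; move on by one token
            i = start + 1
    return chunks

# Hand-tagged input (an assumption; a tagger would normally supply this).
tagged_sentence = [
    ("We", "PRON"), ("ate", "VERB"), ("this", "DET"), ("delicious", "ADJ"),
    ("cake", "NOUN"), ("under", "ADP"), ("these", "DET"), ("conditions", "NOUN"),
]
print(chunk_noun_phrases(tagged_sentence))
```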
Step 7: Syntactic parsing. Sentences with similar words can have completely different meanings or nuances depending on the way the words are placed and structured. This step is fundamental in text analytics, as we cannot afford to misinterpret the deeper meaning of a sentence if we want to gather truthful insights. A parser is able to determine, for example, the subject, the action, and the object in a sentence; for example, in the sentence “The company filed a lawsuit,” it should recognize that “the company” is the subject, “filed” is the verb, and “a lawsuit” is the object.
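The effect of word order can be shown with a deliberately naive sketch: it treats everything before the first verb as the subject and everything after it as the object. A real parser builds a full tree and copes with far more complex sentences. The hand-written tags and the function name extract_svo are assumptions for this example.

```python
def extract_svo(tagged):
    """Split a simple tagged sentence into subject, verb, and object."""
    verb_index = next(
        (i for i, (_, pos) in enumerate(tagged) if pos == "VERB"), None
    )
    if verb_index is None:
        return None
    return {
        "subject": " ".join(word for word, _ in tagged[:verb_index]),
        "verb": tagged[verb_index][0],
        "object": " ".join(word for word, _ in tagged[verb_index + 1:]),
    }

# Hand-tagged sentences (assumptions). Same words, different structure.
first = [("The", "DET"), ("company", "NOUN"), ("sued", "VERB"),
         ("the", "DET"), ("supplier", "NOUN")]
second = [("The", "DET"), ("supplier", "NOUN"), ("sued", "VERB"),
          ("the", "DET"), ("company", "NOUN")]

print(extract_svo(first))
print(extract_svo(second))
```

Both sentences contain exactly the same words, yet the roles of "company" and "supplier" are swapped, which is the kind of difference this step has to capture.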
Step 8: Chaining. In a longer text, it’s likely that certain elements are not repeated over and over, but are implied instead. For example, in a text about the chef of a restaurant, it’s very likely the writer will start referring to the chef as “she” rather than continue repeating her name. When two expressions refer to the same person or thing like this, it is known as “coreference.” A chaining tool is able to understand whatever is “implied” by taking into account the entire text when analyzing a specific sentence.
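As a very rough illustration, the sketch below links each pronoun to the most recently mentioned name of a matching gender. It assumes the names and their genders are already known (the entities argument is a hypothetical input), and it ignores everything a real chaining tool would consider, such as grammar and meaning.

```python
import re

PRONOUN_GENDER = {"she": "f", "her": "f", "he": "m", "him": "m", "his": "m"}

def resolve_pronouns(text, entities):
    """Link pronouns to the latest preceding name with the same gender.

    entities maps a name to a gender code, e.g. {"Maria": "f"}.
    Returns (word position, pronoun, name) triples.
    """
    last_seen = {}
    links = []
    for position, token in enumerate(re.findall(r"[A-Za-z]+", text)):
        if token in entities:
            last_seen[entities[token]] = token
        elif token.lower() in PRONOUN_GENDER:
            antecedent = last_seen.get(PRONOUN_GENDER[token.lower()])
            if antecedent is not None:
                links.append((position, token, antecedent))
    return links

text = "Maria runs the kitchen. She trained in Lyon. Paolo helps her, and he bakes."
print(resolve_pronouns(text, {"Maria": "f", "Paolo": "m"}))
```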
Step 9: Disambiguation. Some words may refer to different things depending on the context. For instance, what does “Ford” refer to? The car company, the actor Harrison Ford, or the U.S. president Gerald Ford? The surrounding context usually settles the question.
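One simple way to use context is to compare the words around the ambiguous term with a set of cue words for each possible meaning and pick the meaning with the largest overlap (a simplified version of the idea behind the Lesk algorithm). The cue words and the function name disambiguate below are assumptions chosen for this example.

```python
import re

# Hand-picked cue words per meaning (assumptions for illustration).
SENSES = {
    "Ford (car company)": {"car", "cars", "truck", "motor", "vehicle", "dealership", "engine"},
    "Harrison Ford (actor)": {"actor", "film", "movie", "starred", "role", "hollywood"},
    "Gerald Ford (U.S. president)": {"president", "administration", "congress", "elected", "vice"},
}

def disambiguate(sentence: str):
    """Pick the sense sharing the most cue words with the sentence."""
    context = set(re.findall(r"[a-z]+", sentence.lower()))
    scores = {sense: len(context & cues) for sense, cues in SENSES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None  # None: context gives no hint

print(disambiguate("Ford starred in the movie as a smuggler"))
print(disambiguate("I bought a Ford truck at the dealership"))
print(disambiguate("Ford is here"))
```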
Revuze is an AI-powered product experience analyst solution which delivers the most nuanced information about consumers and their needs.
Our combination of artificial intelligence, neural networks, and machine learning provides immediate, in-depth, ongoing feedback to companies about how their customers purchase, experience, and view their products and services. Revuze is now introducing its unique technology, available for Product Experience Management, and is currently used by Fortune 500 companies across multiple sectors, including CPG (consumer packaged goods), food services, leisure, and manufacturing.
Keep checking our blog to stay up to date on this and other data analysis and marketing tools.