With 90% of the world’s data created in the last 2 years, the world’s data is growing at a scary pace. Consumers now have so many ways to share data and information that organizations everywhere need to analyze and deal with textual data. Obvious examples are customer service (returns, complaints), QA (failures, missing parts, packaging), product (popular features, negative reviews, competitive analysis) and market research (analyzing brands, products and sentiment).

With so much text to look into, it just makes sense to leverage technology to help you slice it into buckets and areas of interest. This is where Text Analytics and Natural Language Processing (NLP) come in.

 

So what are Text Analytics and NLP?

Text analytics (sometimes referred to as text data mining) is the process of deriving high-quality information from text. This is typically achieved by finding patterns and trends through means such as statistical pattern learning. Text analytics usually involves structuring the input text (usually parsing, along with the addition of some derived linguistic features and the removal of others, and inserting into a database), deriving patterns within the structured data, and finally evaluating and interpreting the output to make meaningful observations. Text analytics typically doesn’t involve the semantics of the text and is more about discovering text patterns.

NLP is a component of text analytics that performs a special kind of linguistic analysis that helps a machine “read” text. NLP is about understanding natural language, the language humans use to communicate. The data could be speech or text, and the main goal is to understand its semantic meaning.

NLP and text analytics are complementary. Text mining typically uses NLP, because it makes sense to mine the data once you understand it semantically.

 

How does NLP work?

First, the computer must understand what each word is. It tries to understand if it’s a noun or a verb, if it’s past or present tense, and so on. This is called Part-of-Speech tagging (POS).
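To make the idea concrete, here is a deliberately simplified sketch of POS tagging. It is not how production systems work: real taggers are statistical models trained on large annotated corpora. The tiny lexicon, the tag names and the suffix rules below are all made up for illustration.

```python
# Toy part-of-speech tagger: a hypothetical lexicon plus crude suffix rules.
# Real NLP systems learn these decisions from annotated text instead.
LEXICON = {
    "the": "DET",
    "a": "DET",
    "phone": "NOUN",
    "battery": "NOUN",
    "died": "VERB-PAST",
    "lasts": "VERB-PRESENT",
}


def tag_word(word):
    """Return a guessed tag for a single word."""
    lowered = word.lower()
    if lowered in LEXICON:
        return LEXICON[lowered]
    # Fallback heuristics for words missing from the lexicon.
    if lowered.endswith("ed"):
        return "VERB-PAST"
    if lowered.endswith("ing"):
        return "VERB-PARTICIPLE"
    return "NOUN"  # default guess when nothing else applies


def tag_sentence(sentence):
    """Split on whitespace and tag each word."""
    return [(word, tag_word(word)) for word in sentence.split()]


print(tag_sentence("The phone died"))
```

Even this toy version shows the two questions the tagger answers for each word: what kind of word it is, and (for verbs) which tense it is in.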

NLP systems also have a vocabulary and a set of grammar rules coded into the system. Modern NLP algorithms use statistical machine learning to apply these rules to the natural language and determine the most likely meaning behind what was said.

The end goal is to have the computer understand the meaning of what was said or written. This is challenging because some words have several meanings (polysemy) and different words can have similar meanings (synonymy), but developers encode rules into their systems and train them to apply the rules correctly.

 

So where is the problem?

The short answer is that humans train NLP systems to “read” natural language. They put in a vocabulary and a set of rules so the software can look for these words as a way to figure out meaning. The problem is that language is constantly evolving, and younger people create new ways of expressing themselves around a topic that didn’t exist before. How can you train a machine to look for something that doesn’t exist yet? Once you realize that there is a new way to talk about a topic, you need to bring back the experts to train the system again to recognize the new keywords, which is time consuming and likely costly. In the end, it means you missed the bus: by the time you realize there is a new way to talk about something that is important to you, you have likely already missed its meaning.

Let’s pick an example. Let’s say we’re a smartphone brand and want to analyze what consumers are saying about our latest phone’s battery life. We can try to scan online reviews and search for variations of the word “battery”, but what happens if consumers are using phrases such as “doesn’t last long enough” or “phone died on me in the middle of the work day”?
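The gap is easy to reproduce. The short sketch below runs a naive keyword search over four review snippets; the review list and the pattern are hypothetical, and the last two snippets are the phrasings from the example above.

```python
import re

# Hypothetical review snippets, invented for illustration.
reviews = [
    "The battery drains way too fast.",
    "Great camera, but battery life is disappointing.",
    "Doesn't last long enough.",
    "Phone died on me in the middle of the work day.",
]

# Naive approach: flag a review only if it contains a variation of "battery".
BATTERY_PATTERN = re.compile(r"\bbatter(?:y|ies)\b", re.IGNORECASE)


def mentions_battery(text):
    """True if the text contains the keyword 'battery' or 'batteries'."""
    return BATTERY_PATTERN.search(text) is not None


matched = [r for r in reviews if mentions_battery(r)]
missed = [r for r in reviews if not mentions_battery(r)]

print(f"Matched {len(matched)} of {len(reviews)} reviews")
for review in missed:
    print("Missed:", review)
```

All four snippets are about battery life, but the keyword search only finds the two that happen to use the word itself.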

 

What’s the right way to do things?

With Artificial Intelligence and self-training algorithms, you can skip the person-training-machine steps, which limit the scope of what the machine understands and are slow to respond. Instead, you move directly to a machine-training-machine scenario that can grow to unlimited scale and respond immediately to any variation of a meaning.

 

Conclusion

Current NLP technologies rely on humans and thus are slow to set up, miss a lot of the meaning in texts and are slow to adapt. In a world where 90% of the world’s data was created in the last 2 years, you can’t rely on humans or manual labor to figure things out.

The good news is that there is now enough data to get answers to your questions; all you need to do is analyze it. Revuze is an innovative technology vendor that addresses just this with the first self-training, fast-setup and low-touch solution that typically delivers 5-8X the data coverage compared to anything else, and it does so without humans.
