
Machine Learning for Text

Imagine a system that could analyse every piece of verbatim text from surveys, reviews and complaints, then automatically transform, contextualise and classify it with a high degree of accuracy, no matter how complex. Even better, what if, when its predictive performance dropped, the system ‘asked’ the user for clarification in a highly efficient and minimal way? It sounds like science fiction. The good news is that it is science fact: machine learning is now sophisticated enough to achieve this.

The secret behind the latest advancements is a new machine-learning technology known as optimised learning. Optimised learning is a fast way to automate the classification of complex textual data: it ‘asks’ users for specific input wherever it needs that input to improve performance. As a result, it can automatically classify voice of the customer (VoC) data to a high degree of accuracy with minimal user input. It works better when used with another innovative technology called automated information retrieval (AIR), which automatically generates and selects features from unstructured data that can then be used to train machine-learning models.

Conventional machine learning (with or without text analytics) needs to ‘train’ itself on a dataset parsed and curated by a skilled human. The human will also need to generate the ‘features’ within the dataset for the predictive models to be built on. For example, these could be keywords, phrases or other more complex signals within the data. Indeed, the same data can generate several models — consider, for example, multiple ‘topics’ within a single piece of text or conversation.
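To make the idea of hand-built ‘features’ concrete, here is a minimal sketch in Python. It is an illustration only, not the method of any particular product: the function names, the topic names and the keyword lists are all hypothetical. It counts words and two-word phrases in a comment, then scores the same comment against more than one topic.

```python
import re
from collections import Counter


def ngram_features(text, max_n=2):
    """Count word n-grams (n = 1..max_n) in a piece of verbatim text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    features = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            features[" ".join(tokens[i:i + n])] += 1
    return features


# Hypothetical keyword lists; in the conventional workflow a skilled
# human would have to curate these by hand for every topic.
TOPIC_KEYWORDS = {
    "billing": {"refund", "charge", "invoice"},
    "delivery": {"late", "courier", "delivery"},
}


def topic_signals(text):
    """Return, for each topic, how many of its keywords occur in the text."""
    features = ngram_features(text, max_n=1)
    return {
        topic: sum(features[word] for word in keywords)
        for topic, keywords in TOPIC_KEYWORDS.items()
    }


comment = "The delivery was late and I still have not had my refund."
signals = topic_signals(comment)
print(signals)
```

The single comment produces a signal for both topics, which is why one dataset can feed several models. Every keyword list here is a guess that someone has to make, test and maintain.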

Clearly, this becomes complex very quickly and requires skill, time and a degree of ‘art’ and experimentation (perhaps the data scientist is more of a data artist). There is a lot of guesswork, even with experience, and the more features and data that are explored during training, the more likely it is that the predictive model will perform well. However, the amount of cleansing, parsing and general tinkering required is unknown in advance, and if the dataset turns out to be random, it is a lot of hard work for no result.

Optimised learning eliminates this huge amount of human intervention and increases accuracy, because it quickly starts to ‘ask’ a user, who does not need to be a data scientist at all, for minimal input. In practice, ‘asking’ means that the technology points the human user to the records and topics that it calculates will reduce overall uncertainty the most for the least input. By way of analogy, when a child learns, it learns to ask questions that underlie several facts and concepts rather than just clarifying each one. In this way, it can be said that the child learns to understand. (Note that this analogy is used only to show the utility; analogies between human and artificial intelligence are contentious enough, and this paper is not intended to stoke the flames of that issue.) The predictive models are thus generated and improved in a highly efficient manner by a non-data scientist armed only with domain knowledge (for example, an agent in a call centre).
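The article does not describe how the technology decides what to ‘ask’. As a rough illustration of the general idea, the sketch below uses plain uncertainty sampling, a standard technique from active learning, as a stand-in: it ranks records by the entropy of the model's predicted class probabilities and refers the least certain ones to the user. The record ids, the probabilities and the function names are hypothetical.

```python
import math


def entropy(probabilities):
    """Shannon entropy (in bits) of a predicted class distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def records_to_ask_about(predictions, budget=1):
    """Return the ids of the `budget` records with the least certain predictions."""
    ranked = sorted(
        predictions,
        key=lambda record_id: entropy(predictions[record_id]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical model output: record id -> probabilities over
# three classes (complaint, praise, query).
predictions = {
    "r1": [0.96, 0.02, 0.02],  # confident, little to gain from asking
    "r2": [0.40, 0.35, 0.25],  # close to a three-way tie
    "r3": [0.70, 0.20, 0.10],
}

ask = records_to_ask_about(predictions, budget=1)
print(ask)
```

Scoring each record on its own is the simplest possible choice. A system that aims to reduce overall uncertainty would also have to consider how much one answer clarifies other, similar records.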

This amalgamation of automation and prompted human intervention is creating huge opportunities for organisations looking to gain more insight and make use of more of the data they accumulate. It is also easy to see how it enables new use cases that were not previously feasible because of the time lag and the cost of resources. One such case is an early warning classification system that flags topics and issues, and quantifies them, as they arise.


© 2017 Warwick Analytics. All rights reserved. Registered in England & Wales. Number 07724630. Registered address 35 Kingsland Road, London, E2 8AA. VAT 120435168.
