2020 was a great year for the development of natural language processing (NLP). We saw many organizations, from small startups to large enterprises, implement NLP in their business solutions and data pipelines. With such a high adoption rate comes a new set of innovations and challenges.
As we close out 2020, there is a lot to look forward to in the NLP space in the coming year. Two key areas will stand out in terms of value, usage and ROI: natural language generation (NLG) and improved customization capabilities.
Natural language generation:
NLG (which we have explored before) will play an increasingly crucial role in surfacing new insights. We are already seeing existing customer data pipelines begin to integrate summarization. An extractive summary pulls the most important sentences out of a document and assembles them into a short summary that lets the reader quickly grasp the overall idea. This could be applied to customer reviews, press releases or news articles, among other sources. Watson Natural Language Understanding recently released an experimental summary extraction feature, documented here. Our new summary extraction technology leverages the power of IBM Research’s Project Debater.
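To make the idea concrete, here is a minimal sketch of the extractive approach: score each sentence by how frequent its words are in the document and keep the top-scoring ones. This is a generic illustration only, not the Watson Natural Language Understanding summarization feature; the function name, scoring heuristic and sample text are ours.

```python
# Minimal frequency-based extractive summarizer (illustrative sketch only;
# this is not the Watson NLU summarization API).
import re
from collections import Counter

def extractive_summary(text: str, num_sentences: int = 2) -> str:
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = re.findall(r"[a-z']+", text.lower())
    freq = Counter(words)

    # Score each sentence by the average document frequency of its words.
    def score(sentence: str) -> float:
        tokens = re.findall(r"[a-z']+", sentence.lower())
        return sum(freq[t] for t in tokens) / max(len(tokens), 1)

    # Keep the top-scoring sentences, preserved in their original order.
    top = sorted(sentences, key=score, reverse=True)[:num_sentences]
    return " ".join(s for s in sentences if s in top)

if __name__ == "__main__":
    review = (
        "The battery life on this laptop is excellent. "
        "Setup took longer than expected. "
        "Overall, the battery and screen make it worth the price."
    )
    print(extractive_summary(review, num_sentences=1))
```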
While there will be obvious benefits to using abstractive summaries in the future, we are still some way from creating and implementing them. An abstractive summary generates new sentences to describe the document rather than lifting existing ones verbatim. For example, a summary of a set of product reviews might say that customers praised the battery life but found setup confusing, even if no single review contains that exact sentence.
We will certainly see extractive summaries used more at first, but ultimately customers will gravitate towards abstractive summaries as they become available, depending on the use case.
Customization:
In the coming year, there will be an increasing focus on customized experiences in the NLP space. Companies around the world are constantly incorporating NLP into their business pipelines. However, each company operates in a slightly different niche, where the language and context of one industry can mean something completely different in another.
To ensure they achieve the highest accuracy in their domain, companies will need to implement a customization layer for sentiment analysis and text classification. This requires training the machine learning model on their own data to boost accuracy.
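As a rough sketch of what such a customization layer can look like under the hood, the snippet below trains a small domain-specific text classifier with scikit-learn. The labels and example texts are invented for illustration and are not drawn from any Watson product.

```python
# Sketch: train a custom, domain-specific text classifier.
# The labels and example texts are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = [
    "The claim was denied because the policy had lapsed",
    "Please reset my password, I cannot log in",
    "Premiums increased after the last renewal",
    "The mobile app crashes on the login screen",
]
labels = ["insurance", "it_support", "insurance", "it_support"]

# TF-IDF features feeding a simple linear classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["My policy renewal quote seems too high"]))
```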
However, both of these features require work on the client side to collect, tag, and transmit training data. Many NLP solutions today require a huge number of examples per label to train the machine learning model. Additionally, the value of a custom model may not be fully realized even after the training file has been collected and built. And if the training file wasn’t built correctly, or there were errors in data labeling, the results will be less accurate.
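A little tooling can catch such labeling problems before any training time is spent. The sketch below assumes a CSV training file with text and label columns (the file name, column names and threshold are all hypothetical) and flags empty texts, missing labels, and labels with too few examples.

```python
# Sanity-check a labeled training file before training.
# Assumes a CSV with "text" and "label" columns; names and threshold are illustrative.
import csv
from collections import Counter

MIN_EXAMPLES_PER_LABEL = 10  # illustrative threshold

def check_training_file(path: str) -> list[str]:
    problems = []
    label_counts = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for i, row in enumerate(csv.DictReader(f), start=2):  # header is line 1
            text = row.get("text", "").strip()
            label = row.get("label", "").strip()
            if not text:
                problems.append(f"line {i}: empty text")
            if not label:
                problems.append(f"line {i}: missing label")
            else:
                label_counts[label] += 1
    for label, count in label_counts.items():
        if count < MIN_EXAMPLES_PER_LABEL:
            problems.append(f"label '{label}' has only {count} examples")
    return problems

if __name__ == "__main__":
    for issue in check_training_file("training_data.csv"):
        print(issue)
```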
That’s why it’s critical to provide customers with customization capabilities that let them focus on improving their machine learning model rather than spending countless hours collecting and labeling data. To help you prioritize your time, Watson NLP learns more from less data. In 2020, IBM released a new, more accurate natural language understanding (NLU) model in IBM watsonx Assistant for intent classification, as well as new advances in NLP in IBM watsonx Assistant and Watson Discovery.
While the COVID-19 pandemic has set many businesses back, the NLP space remains a powerful backbone of AI. 2020 brought imaginative and inventive use cases for NLP, including our collaborations with The Weather Channel, the US Open and ESPN Fantasy Football. We expect similar, if not higher, growth in the coming year.