Remember natural language processing? NLP has been around for years, but it was only in 2018 that artificial intelligence researchers proved that it is possible to train a neural network once on a large amount of data and use it repeatedly for various tasks. In 2019, GPT-2 from OpenAI and T5 from Google arrived and proved surprisingly good (they have now been incorporated into Google Duplex, pictured). There have even been concerns about their possible misuse.
But since then, things have progressed, well, quite exponentially.
Last year saw a veritable “Cambrian explosion” of NLP startups and large language models.
This year, Google released LaMDA, a large language model for chatbot applications. DeepMind then released AlphaCode and later Flamingo, a language model for visual understanding. Just this July, the BigScience project released BLOOM, a massive open-source language model, and Meta announced that it had trained a single language model that can translate across 200 languages.
We are now reaching a tipping point where many more commercial applications of NLP will come to market, some of them built on these publicly available open-source models. You could almost say there’s a gold rush among startups trying to leverage this technology, and an arms race is developing between the large language model providers.
One such startup is Humanloop, an AI spin-out from University College London that claims to be making it “significantly” easier for companies to adopt this new wave of NLP technology through a set of tools that help people “train” AI algorithms. This means that a lawyer, doctor or banker can put a piece of knowledge on the platform, which the software then applies at scale across a large data set, enabling wider use of artificial intelligence across industries.
It has now launched a $2.6 million seed funding round led by Index Ventures, with participation from Y Combinator, LocalGlobe and Albion.
Founded in 2020 by a team of distinguished computer scientists from UCL and Cambridge and alumni of Google and Amazon, Humanloop says its applications could include building a picture of the national real estate market from unstructured data on the Internet; reviewing electronic health records to identify people who might be candidates to try new therapies; and even moderating comments on Facebook groups.
“People would be shocked to know what language-based AI is capable of today,” CEO Raza Habib says in a statement. “The biggest challenge, however, is to give the data a form that the algorithm can use. With Humanloop, we want to democratize access to artificial intelligence and enable the next generation of smart, self-service applications, enabling any company to take its domain expertise and leverage it effectively in a machine learning model.”
Humanloop attributes its success to developing “probabilistic deep learning,” in which algorithms can discover what they don’t know by tuning out the noise in datasets, finding the good stuff, and asking humans for help on the parts they don’t know. I don’t understand.
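For readers who want a concrete picture, the general idea of a model flagging what it is unsure about and asking a human for labels is commonly called active learning. The snippet below is a minimal sketch of one standard variant, entropy-based uncertainty sampling. It is an illustration only and not Humanloop's actual method; the function names, document ids and probabilities are all made up for the example.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(predictions, budget):
    """Return the ids of the `budget` items the model is least sure about.

    `predictions` maps an item id to the model's predicted class
    probabilities for that item (hypothetical data in this sketch).
    """
    ranked = sorted(
        predictions,
        key=lambda item_id: entropy(predictions[item_id]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:budget]


# Hypothetical model outputs for four unlabeled documents.
predictions = {
    "doc_a": [0.98, 0.02],  # model is confident
    "doc_b": [0.55, 0.45],  # model is unsure
    "doc_c": [0.80, 0.20],
    "doc_d": [0.50, 0.50],  # model is maximally unsure
}

# Ask a human expert to label only the two most uncertain documents.
to_label = select_for_labeling(predictions, budget=2)
print(to_label)  # ['doc_d', 'doc_b']
```

In a loop like this, the expert's answers would be added to the training set and the model retrained, so that human effort goes to the examples where it is likely to matter most.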
Other startups building their own large language models and putting them behind APIs include Cohere AI ($164.9 million in funding) and OpenAI, with GPT-3. Snorkel AI ($135.3 million in funding) is also a new startup in this arena.
Humanloop, however, says its focus is less on developing models and more on the tools needed to adapt them to specific use cases.
“What many people don’t know is that it’s not the lack of appropriate algorithms that is keeping AI from becoming ubiquitous in every workplace – it’s the lack of properly labeled data,” adds Erin Price-Wright, partner at Index Ventures, which led the investment. “In fact, machine learning itself is becoming more and more commoditized and out-of-the-box, but it is still very difficult for non-technical people to transfer their knowledge to the machine and help the algorithm refine its model.” That’s why Humanloop allows people to modify data.
If an NLP gold rush is indeed upon us, expect a slew of other startups to follow soon.