Corporates Increase NLP Budget by 10% to 30% in 2021

Since 2020, enterprises have been increasing their investments in natural language processing (NLP), the subfield of linguistics, computer science, and AI concerned with how algorithms analyze large amounts of language data.

What's the survey result?

John Snow Labs, developer of the Spark NLP library, announced the results of the first Natural Language Processing (NLP) Industry Survey, exploring how companies use NLP technologies. The survey was conducted by Gradient Flow, an independent data science analysis and insights provider.

According to the survey, 60% of tech leaders indicated that their NLP budgets grew by at least 10% compared to 2020, while a third (33%) said that their spending climbed by more than 30%.

Other key findings include:
  • A third of all respondents stated they use Spark NLP, making it the most popular NLP library in this survey. The top library varied by industry: Healthcare (Spark NLP), Technology (spaCy), Financial Services (NLTK).
  • More than 40% of all respondents named accuracy as the most important criterion they use to evaluate NLP libraries.
  • 77% of all survey respondents indicated that they use at least one of the four NLP cloud services listed (Google, AWS, Azure, IBM), with Google’s service garnering the most users.
  • Despite the popularity of cloud-based services, cost and accuracy are key challenges companies face when using them.
  • Data from files (e.g., PDF, TXT, DOCX) and databases top the list of data sources used in NLP projects (61%).
  • The four most popular applications of NLP are Document Classification, Named Entity Recognition (NER), Sentiment Analysis, and Knowledge Graphs. Respondents from healthcare cited de-identification as another common NLP use case.

What is NLP?

For more details, please check out our earlier blog "What is NLP?"

Why NLP?

“Natural Language Processing is a critical part of enterprise AI systems, and understanding where businesses are on the adoption curve, their main use cases, and the challenges they face, are paramount in optimizing all that NLP has to offer,” said Dr. Ben Lorica, Survey Co-Author and External Program Chair, NLP Summit. “It’s especially encouraging to see that technical leaders understand the value of, and are increasing their investment in NLP technologies, especially when IT budgets are uncertain, due to the COVID-19 pandemic.”

What are the directions ahead? 

Five takeaways from the survey are likely to shape, at least in part, the direction of NLP in the near future.

1. NLP Spending is Increasing, Despite Shrinking IT Budgets
2. Accuracy is a Key Criterion, but also a Key Challenge
3. NLP Cloud Services are Slow to Service Market Needs
4. Spark NLP and spaCy Voted Most Popular
5. Classic Applications of NLP are Still the Most Used

What are the challenges ahead?

Among the tech leaders John Snow Labs and Gradient Flow surveyed, accuracy (40%) was the most important requirement when evaluating an NLP solution, followed by production readiness (24%) and scalability (16%). But the respondents cited costs, maintenance, and data sharing as outstanding challenges.

As the report’s authors point out, experienced users of NLP tools and libraries understand that they often need to tune and customize models for their specific domains and applications. “General-purpose models tend to be trained on open datasets like Wikipedia or news sources or datasets used for benchmarking specific NLP tasks. For example, an NER model trained on news and media sources is likely to perform poorly when used in specific areas of healthcare or financial services,” the report reads.
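The domain-mismatch point in the report can be illustrated with a deliberately tiny sketch. This is not how Spark NLP, spaCy, or any real NER model works: the "model" here is just a lookup table standing in for vocabulary learned from news text, and every name, label, and sentence below is hypothetical. It only shows why entity recall can collapse on clinical text and recover once domain terms are added.

```python
# Toy illustration only: the "model" is a lookup table, not a real NER system.
# All names, labels, and example sentences are hypothetical.

# Stand-in for what a model might learn from news and media sources.
NEWS_GAZETTEER = {
    "london": "LOC",
    "reuters": "ORG",
    "google": "ORG",
}


def tag(tokens, gazetteer):
    """Return (token, label) pairs; unknown tokens get the label 'O'."""
    return [(tok, gazetteer.get(tok.lower(), "O")) for tok in tokens]


def recall(predicted, gold):
    """Fraction of gold entities whose label the tagger reproduced."""
    gold_entities = [(i, label) for i, (_, label) in enumerate(gold) if label != "O"]
    if not gold_entities:
        return 0.0
    hits = sum(1 for i, label in gold_entities if predicted[i][1] == label)
    return hits / len(gold_entities)


news_gold = [("Google", "ORG"), ("opened", "O"), ("a", "O"),
             ("London", "LOC"), ("office", "O")]
clinical_gold = [("Patient", "O"), ("given", "O"), ("metformin", "DRUG"),
                 ("for", "O"), ("diabetes", "DISEASE")]

news_recall = recall(tag([t for t, _ in news_gold], NEWS_GAZETTEER), news_gold)
clinical_recall = recall(tag([t for t, _ in clinical_gold], NEWS_GAZETTEER), clinical_gold)

# Extending the vocabulary with domain terms stands in for domain tuning.
CLINICAL_GAZETTEER = {**NEWS_GAZETTEER, "metformin": "DRUG", "diabetes": "DISEASE"}
tuned_recall = recall(tag([t for t, _ in clinical_gold], CLINICAL_GAZETTEER), clinical_gold)

print(news_recall, clinical_recall, tuned_recall)
```

In this toy setup the news-only tagger finds every entity in the news sentence and none in the clinical one, and the extended vocabulary recovers them. With real models the equivalent step is tuning or retraining on domain data.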

But tuning and customizing models can become expensive. In an Anodot survey, 77% of companies with more than $2 million in cloud costs (which include API-based AI services like NLP) said they were surprised by how much they spent. As corporate investments in AI grow to $97.9 billion in 2023, according to IDC, Gartner anticipates that spending on cloud services will increase 18% this year to a total of $304.9 billion.

Looking ahead, John Snow Labs and Gradient Flow expect growth in question-answering and natural language generation NLP workloads powered by large language models like OpenAI’s GPT-3 and AI21’s Jurassic-1. It’s already happening to some degree. OpenAI says that its API, through which developers can access GPT-3, is currently used in more than 300 apps by tens of thousands of developers and produces 4.5 billion words per day.

The full results of the survey are scheduled to be presented at the upcoming NLP Summit, sponsored by John Snow Labs. “As we move into the next phase of NLP growth, it’s encouraging to see investments and use cases expanding, with mature organizations leading the way,” Dr. Ben Lorica, survey co-author and external program chair at the NLP Summit, said in a statement. “Coming off of the political and pandemic-driven uncertainty of last year, it’s exciting to see such progress and potential in the field that is still very much in its infancy.”
