Large Language Models in the Enterprise: It’s Time to Find a Middle Ground - DATAVERSITY

ChatGPT, the conversational chatbot released by OpenAI in November 2022, garnered 100 million users in just two months, making it the fastest-growing consumer app in Internet history. But the technology that underpins ChatGPT is relevant and appealing to businesses as well. GPT stands for generative pre-trained transformer, the architecture underlying large language models (LLMs). Because large language models are trained on vast quantities of data, they can perform a wide variety of natural language processing (NLP) tasks.

The hype around large language models echoes the early hype around artificial intelligence (AI) writ large: many people are talking about what’s possible with the technology, but far fewer are publicly discussing the nuts and bolts of putting it into practice, particularly in an enterprise context. Much of the research and practical work on making this technology viable for the enterprise is happening behind the scenes, and many of those doing that work would agree that it is far harder than ChatGPT’s extraordinary popularity among non-technical users might suggest.

Two Schools of AI Thought

An important thing to understand about AI at large is that there are two broad schools of thought or approaches with regard to building and implementing AI systems.

On one side we have traditional AI, where researchers build systems brick by brick using sophisticated rules-based algorithms, formal methods, logic, and reasoning. These researchers are rigorous about understanding and reproducing the underlying principles of how people think and process information. For example, they draw a clear line between the semantics (meaning) and syntax (surface form) of language, and argue that purely probabilistic modeling of language cannot capture the underlying semantics, so it cannot produce truly “intelligent” solutions. The big problem with this approach is that it yields AI applications that are complex, hard to maintain, and hard to scale, so over time research shifted to the data-driven machine learning paradigm, where the model learns from data rather than from manually implemented rules.

On the other side, we have the deep learning community, which took the AI field by storm. Instead of building an intelligent system brick by brick from the ground up, we throw a huge amount of data at a model, such as a GPT-style transformer, and ask it to learn from that data. But we do not know exactly what these models end up learning beyond the probabilities of words following one another, or how well they “understand” the underlying concepts. We are still probing these models for their knowledge to understand them better, and fine-tuning them on more controlled datasets that shift their distributions toward the desired result. Because we do not fully understand the depth of these models’ knowledge, and cannot control or correct them reliably, it is hard to guarantee the quality of the results they produce, and hence hard to build reliable applications on top of them. These models are very good at imitating meaningful responses on a syntactic level but remain a gamble on the semantic level. As much as we’d like an end-to-end solution where you train one model and everything just magically works, what we end up building is a fairly complex engineering solution that weaves hand-crafted rules into machine learning-based applications, or combines LLMs with smaller, more deterministic models that help mitigate the unbridled nature of LLMs. This involves a lot of human-in-the-loop processes, where a person manually corrects the outputs or selects the best response from a list of options the LLM has produced.
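The hybrid pattern described above can be sketched in a few lines. In this illustrative Python sketch (all names, rules, and the `call_llm` placeholder are assumptions, not any particular product’s API), an LLM proposes several candidate responses, a deterministic rule layer filters them, and a human-in-the-loop `pick` function selects the final answer, with a safe fallback when nothing survives the rules:

```python
# A minimal sketch of combining an LLM with deterministic rules and a
# human-in-the-loop step. `call_llm` and `pick` are hypothetical callables.
import re
from typing import Callable, List

# Example of a lightweight, hand-crafted compliance rule.
BANNED_PATTERNS = [re.compile(r"\bguaranteed\b", re.I)]

def passes_rules(text: str) -> bool:
    """Deterministic checks applied to every LLM candidate."""
    if not text.strip():
        return False
    return not any(p.search(text) for p in BANNED_PATTERNS)

def respond(prompt: str,
            call_llm: Callable[[str], List[str]],
            pick: Callable[[List[str]], str]) -> str:
    # The LLM generates candidates; the rule layer filters them.
    candidates = [c for c in call_llm(prompt) if passes_rules(c)]
    if not candidates:
        # Deterministic fallback when no candidate passes the rules.
        return "Let me route you to a human agent."
    # A human (or a smaller ranking model) picks the best survivor.
    return pick(candidates)

# Stub demo: a fake "LLM" returns two candidates; the first is filtered out.
fake_llm = lambda p: ["Results are guaranteed!", "Here is a likely answer."]
print(respond("question", fake_llm, lambda cs: cs[0]))
# → Here is a likely answer.
```

The design point is that the rules are cheap, auditable, and sit outside the model, so they can be corrected reliably even when the model itself cannot be.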

For a long time, “end-to-end” was a line of research with little output, especially in the conversational AI field I have worked in for more than 15 years. It was hard to evaluate generative dialog models and demonstrate progress, so we resorted to more traditional building-block methods, in which each machine learning model is responsible for a very specific task and can do it reasonably well. With significant advances in the hardware required to train AI models and the advent of GPT-style models, more people have moved away from the building-block approach and toward the “end-to-end” school of thought. We are now seeing impressive, unprecedented progress on end-to-end solutions; however, there is still a long way to go before this technology delivers reliable results on its own.

Finding a Middle Ground

While the end-to-end paradigm is appealing for many reasons, in many cases enterprise-wide adoption is simply premature. Because big models can be black boxes, adjusting a model’s behavior can be extremely difficult. To gain control over large language models, people often fall back on traditional methods, such as plugging in lightweight rule-based algorithms. While the pendulum has swung from many smaller models to one grand model, the most effective approach is probably somewhere in between.

This trend is already evident in generative AI. Sam Altman, the CEO of OpenAI, has said that next-generation models won’t be larger; instead, they are going to be smaller and more targeted. Large language models are best at generating natural, fluent text, while anything factual is better off coming from separate subsystems. Down the line, the responsibilities of those subsystems will likely shift back to the large language model, but in the meantime we are seeing a slight reversion to more traditional methods.
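The split described above, fluent text from the LLM and factual content from a trusted subsystem, can be sketched as a simple router. In this hypothetical Python sketch (the `FACTS` dictionary stands in for a real database or knowledge base, and `generate_fluent` is a placeholder for any LLM call), factual queries are answered deterministically and only open-ended requests fall through to the generative model:

```python
# A minimal sketch of routing factual queries to a deterministic subsystem.
# `FACTS` and `generate_fluent` are illustrative placeholders, not a real API.
from typing import Callable

FACTS = {
    "refund window": "30 days",
    "support hours": "9am-5pm ET",
}

def answer(query: str, generate_fluent: Callable[[str], str]) -> str:
    q = query.lower()
    for key, value in FACTS.items():
        if key in q:
            # Factual content comes verbatim from the trusted subsystem.
            return f"The {key} is {value}."
    # Anything non-factual is delegated to the generative model.
    return generate_fluent(query)

stub_llm = lambda q: f"(LLM-generated reply to: {q})"
print(answer("What is the refund window?", stub_llm))
# → The refund window is 30 days.
print(answer("Tell me a joke", stub_llm))
# → (LLM-generated reply to: Tell me a joke)
```

The substring match is deliberately naive; in practice an intent classifier or retrieval step would decide the route, but the architectural point is the same: facts never pass through the probabilistic model.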

The Future of Large Language Models in the Enterprise

Before jumping straight to an end-to-end paradigm, businesses should assess their own readiness for the technology, as any new application comes with a learning curve and unforeseen issues. While ChatGPT is considered the pinnacle of this technology, there is still a lot of work to be done before it is effective in an enterprise context.

As enterprises look to implement LLMs, many questions remain. Most enterprises are still at the stage of figuring out what they want from the technology. Common questions include:

  • How can I leverage LLMs?
  • Do I need to hire new people?
  • Do I need to work with a third-party vendor? 
  • What can LLMs actually do?

These questions should be considered carefully before you dive in. As things currently stand, large language models cannot immediately solve all the problems people expect them to, though they may well get there within the next five or so years. In the meantime, deploying production-ready applications requires finding a middle ground between the traditional building-block approach and the end-to-end approach.
