Exploring the Potential and Limitations of Large Language Models

As we navigate the ever-evolving landscape of artificial intelligence, it's exciting to see the strides being made in developing and applying Large Language Models (LLMs). These models could revolutionize how we interact with technology and redefine the software development process.

However, it's also important to acknowledge the boundaries of their capabilities and to engage in the necessary discussion of their limitations. Exploring the potential and limitations of LLMs opens up a fascinating dialogue between the promise of innovation and the reality of technological constraints.

Introduction to Large Language Models (LLMs)

In a recent Visionary Voices of Tech webinar, "The Contest for AI Dominance: Decoding AI & The Race for Tomorrow's Tech", Lauri Järvilehto, Professor of Practice at Aalto University, discussed Large Language Models (LLMs) and generative AI (Gen AI) with our Senior Product Lead, Peter Schneider.

LLMs are artificial intelligence models that generate natural language text. During training, they use complex algorithms to learn how to configure the parameters within their models. They are trained on massive amounts of data, mainly from the internet, and can generate coherent and meaningful sentences. The best-known LLM is OpenAI's Generative Pre-trained Transformer 4 (GPT-4), which can generate persuasive text. Other popular LLMs include Claude 3, Llama 3, Mistral Large, and StarCoder2. LLMs are unique because they learn differently from humans: rather than learning from experience, they learn from data. The webinar provided a thorough overview of LLMs, shedding light on the concept and its evolution.

How LLMs Work and their Similarities to Human Learning

During the conversation with Järvilehto, we learned that LLMs have a vocabulary of roughly 50,000 tokens (words and word fragments), each represented by a vector of 12,288 parameters, or dimensions. An LLM breaks a prompt down into tokens, converts them into these vectors, and runs them through a series of matrix multiplications defined by the Transformer architecture. It then uses a stochastic algorithm to choose among likely next words, which is why its outputs vary somewhat from run to run. LLMs can also be fine-tuned through reinforcement learning from human feedback (RLHF).
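To make the last steps more concrete, here is a minimal Python sketch of how a model might turn scores for candidate next tokens into a probability distribution (softmax) and then sample from it. The candidate words, the scores, and the temperature value are made up for illustration; a real LLM computes a score for every token in its vocabulary from the Transformer's output, and production samplers often add further steps such as top-k or top-p filtering.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores (logits) into a probability distribution.

    Lower temperatures sharpen the distribution; higher ones flatten it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(vocab, logits, temperature=1.0, rng=random):
    """Draw one token at random, weighted by its softmax probability."""
    probs = softmax(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]

# Hypothetical scores a model might assign to candidate next tokens
# after a prompt such as "Qt is a cross-platform ...".
vocab = ["framework", "toolkit", "library", "banana"]
logits = [3.2, 2.9, 2.1, -4.0]

rng = random.Random(42)  # fixed seed so this run is reproducible
samples = [sample_next_token(vocab, logits, temperature=0.8, rng=rng) for _ in range(5)]
print(samples)
```

Because the token is drawn at random rather than always taking the highest-scoring candidate, the same prompt can yield different continuations. Lowering the temperature concentrates probability on the top candidates and makes the output more predictable.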

Although the way LLMs configure their parameters is not the same as how humans learn, there are some similarities. Our brains consist of interconnected neurons that process information, and we fine-tune our language skills through exposure, feedback, and practice. Similarly, LLMs rely on neural networks that contain smaller linear models within their massive structure. These smaller models learn from the information in the larger ones, and in-context learning now streamlines what once required complex fine-tuning. Järvilehto also explained that although the Transformer architecture was only published in 2017, its roots lie in the basic artificial neural networks introduced in the 1940s. Hence, the idea that we could mathematically model what's happening in the brain is still at the core of these models.
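For a sense of what those 1940s building blocks looked like, the sketch below implements a simplified threshold unit in the spirit of the McCulloch-Pitts neuron from 1943: it sums weighted binary inputs and "fires" if the sum reaches a threshold. The weighted formulation and the function names are our own simplification for illustration, not a faithful reproduction of the original paper.

```python
def mcculloch_pitts_neuron(inputs, weights, threshold):
    """Fire (return 1) if the weighted sum of binary inputs reaches the threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0

# With unit weights, the threshold alone decides which logic gate the unit computes.
def and_gate(a, b):
    return mcculloch_pitts_neuron([a, b], [1, 1], threshold=2)

def or_gate(a, b):
    return mcculloch_pitts_neuron([a, b], [1, 1], threshold=1)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", and_gate(a, b), "OR:", or_gate(a, b))
```

Modern Transformers still compute enormous numbers of weighted sums like this one, organized as matrix multiplications, but they use smooth activation functions instead of a hard threshold so that the weights can be learned with gradient descent.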

Dive deeper into the topic and watch the full webinar 'The Contest for AI Dominance: Decoding AI & The Race for Tomorrow's Tech'


Limitations of LLMs

LLMs may still be, at heart, word-probability engines: they rely on vast training data and an intricate set of matrix multiplications to generate a probability distribution over what should come next. Järvilehto remarked that it is astonishing how these word-guessing engines can produce something that sounds like a human being. However, he also noted that LLMs lack logical and mathematical inference. Because LLMs generate probability distributions over words or symbols learned from semantic training data, their output may or may not be logically valid, and there may be no clear way to tell the two apart. Hence, we should be careful about drawing overly definite conclusions from it.
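To illustrate what a "word probability engine" means, and why such an engine does not check its own logic, here is a toy bigram model in Python. It is a drastic simplification that we substitute for a real LLM: it only counts which word follows which in a tiny, made-up corpus and turns those counts into next-word probabilities.

```python
from collections import Counter, defaultdict

def train_bigram_model(corpus):
    """Count which word follows which: a toy 'word probability engine'."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts

def next_word_distribution(model, word):
    """Return P(next word | word) as a dict, estimated from the counts."""
    followers = model.get(word.lower(), Counter())
    total = sum(followers.values())
    return {w: c / total for w, c in followers.items()} if total else {}

# Tiny made-up corpus standing in for "vast training data".
corpus = [
    "the model writes code",
    "the model writes tests",
    "the developer reviews code",
    "the model suggests fixes",
]
model = train_bigram_model(corpus)
print(next_word_distribution(model, "the"))
```

The distribution reflects nothing but how often words appeared together in the training text. Nothing in the computation verifies whether a continuation is true or logically valid, a much simpler analogue of the limitation Järvilehto described.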

Although trusting LLMs unquestioningly is not advisable, Gen AI can be helpful to us in several ways. As Järvilehto pointed out, even humans are not always wholly accurate: while he doesn't believe any AI tool is 100% reliable, he also doesn't think any system is foolproof. Schneider shares this sentiment, stating that although we should not trust LLMs blindly, they can still help us in many ways.

The Rise of the Small Language Models (SLMs)

With the rising popularity of LLMs, there is a need to keep track of all the available models, which ones to use, and the trend toward downsizing them. In a guest article in The New Stack, "Why Large Language Models Won't Replace Human Coders," Schneider explored the trend toward smaller language models built specifically for code generation. SLMs are gaining traction because even the more powerful mainstream models, such as GPT-4 and Claude 2, can solve fewer than 5% of real-world GitHub issues, and ChatGPT still generates unreliable information and concepts. As a result, smaller models are emerging, which shows how quickly the LLM landscape is changing.

The Potential of Smaller Language Models in AI-Assisted Code Generation

In the coding world, StarCoder2 3B and Llama 3 8B are recent examples of Small Language Models that can run on high-end developer laptops with reasonable inference times after quantization. According to the StarCoder2 team, "we find that our small model, StarCoder2-3B, outperforms other Code LLMs of similar size on most benchmarks, and also outperforms StarCoderBase-15B." Their larger model, StarCoder2-15B, significantly outperforms other models of comparable size and matches or outperforms CodeLlama-34B, a model more than twice its size. This suggests that model size does not always correlate directly with code generation performance.
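Quantization is what helps models like these fit on a laptop: weights normally stored as 16-bit floats are mapped to lower-precision integers. The following Python sketch shows one simple scheme, symmetric per-tensor int8 quantization, plus a rough weight-memory estimate for a 3-billion-parameter model. The sample weights are invented, the memory figures cover the weights only (not activations or caches), and real toolchains typically use more elaborate schemes such as per-channel or group-wise 4-bit quantization; this is not how any particular model was quantized.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: w is approximated by scale * q, q in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map the stored integers back to approximate float weights."""
    return [v * scale for v in q]

def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone (ignores activations and caches)."""
    return num_params * bits_per_param / 8 / 1e9

weights = [0.42, -1.37, 0.05, 0.91, -0.66]  # invented example weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, f"scale={scale:.5f}", f"max error={max_error:.5f}")

for bits in (16, 8, 4):
    print(f"3B parameters at {bits}-bit: ~{weight_memory_gb(3e9, bits):.1f} GB")
```

Halving or quartering the bits per weight shrinks the weight memory accordingly (roughly 6 GB at 16-bit versus 1.5 GB at 4-bit for 3 billion parameters), at the cost of a rounding error of at most half the scale factor per weight.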

According to Järvilehto, the challenge with coding is that programming languages are constantly renewed: every six months, with new releases arriving as often as every six weeks. Applying new syntax, or the nearly entirely new keywords introduced in a new language version, is challenging. The question, then, is how to fine-tune an LLM rather than recreate it from scratch every six weeks. He also underlined that fine-tuning might be similar to pre-training, with the most significant difference being the amount of data used.
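Järvilehto's point that fine-tuning resembles pre-training can be illustrated with a deliberately tiny analogy in Python. Here a two-parameter linear model stands in for an LLM, an assumption made purely for illustration: the same gradient-descent `train` function is used for "pre-training" on 1,000 examples and for "fine-tuning" on just 10 examples of a slightly shifted task. Only the starting weights and the amount of data differ.

```python
import random

def train(w, b, data, lr=0.05, epochs=200):
    """Plain gradient descent on mean squared error for y = w*x + b.

    The same function serves for both 'pre-training' and 'fine-tuning';
    only the starting weights and the amount of data differ.
    """
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

rng = random.Random(0)  # fixed seed for reproducibility
pretrain_xs = [rng.uniform(-1, 1) for _ in range(1000)]
new_xs = [rng.uniform(-1, 1) for _ in range(10)]
pretrain_data = [(x, 2 * x + 1) for x in pretrain_xs]  # the "old" task
new_data = [(x, 2 * x + 1.5) for x in new_xs]          # a slightly changed task

# "Pre-training": lots of data, starting from zero.
w0, b0 = train(0.0, 0.0, pretrain_data)

# "Fine-tuning": same loop, little data, few steps, starting from pre-trained weights.
w_ft, b_ft = train(w0, b0, new_data, epochs=20)

# Baseline: the same small budget, but starting from scratch.
w_sc, b_sc = train(0.0, 0.0, new_data, epochs=20)

def error(w, b):
    """Distance from the new task's true parameters (w=2, b=1.5)."""
    return abs(w - 2) + abs(b - 1.5)

print(f"fine-tuned error: {error(w_ft, b_ft):.3f}, from-scratch error: {error(w_sc, b_sc):.3f}")
```

Starting from the pre-trained weights, a handful of examples and steps lands much closer to the new target than the same budget spent from scratch, which is the intuition behind fine-tuning an existing model instead of recreating it for every release.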

LLMs’ Role in Boosting Productivity in Software Development

As they are now, LLMs can be used for chatbots, virtual assistants, content creation, translation, and automating repetitive tasks that are burdensome or time-consuming. In software development, by taking over repetitive tasks, LLMs can free up developers' time for more impactful work and improve their work experience. They can also enhance brainstorming during concept and planning work, provide quick prototypes and visual presentations in design, analyze and optimize code, and power AI tools that automate the creation of unit, functional, and security tests to ensure better software quality. Several AI tools for software development already exist, including GitHub Copilot, which can understand and write code, generate test cases, and identify and fix errors, enhancing developers' productivity throughout the software development process.

Companies that simply want to use existing coding assistants already have many Gen AI-powered options for common Integrated Development Environments such as Visual Studio Code, CLion, and Qt Creator. For companies that want to boost code generation capabilities beyond these out-of-the-box solutions, investing in research into Prompt Engineering, Retrieval Augmented Generation (RAG), and Parameter-Efficient Fine-Tuning (PEFT) might be the way forward.
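Of these approaches, RAG is perhaps the easiest to sketch. The idea is to retrieve relevant snippets, such as internal documentation, and add them to the prompt so the model can ground its answer in them. The minimal Python sketch below uses bag-of-words cosine similarity for retrieval as a stand-in; production systems typically use learned embeddings and a vector database instead. The documents, the prompt template, and the function names are hypothetical.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Represent text as word counts (a crude stand-in for learned embeddings)."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(documents, key=lambda d: cosine_similarity(q, bag_of_words(d)), reverse=True)
    return ranked[:k]

def build_augmented_prompt(query, documents, k=2):
    """Prepend the retrieved snippets so the model can ground its answer in them."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Use the following project notes:\n{context}\n\nQuestion: {query}"

# Hypothetical internal documentation snippets.
docs = [
    "our qml components use the acme prefix for custom items",
    "build the project with cmake and ninja",
    "custom qml items live in the components folder",
]
print(build_augmented_prompt("how do I create a custom qml item", docs))
```

The augmented prompt would then be sent to whichever model or coding assistant is in use; that call is left out here because it depends on the provider's API.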

Read the article: 'How to Boost Productivity While Managing Technical Debt'


The Future of LLMs

All in all, we are still in the exploratory phase of using LLMs and Gen AI. That could be why we hear more about use cases that apply these tools to tasks that don't require deep critical thinking, the area where the human element is still most needed. Can generative AI replace all human effort in any field in the future? Schneider believes this is unlikely to happen anytime soon, especially when it comes to end-to-end development and building human-machine interfaces. While LLMs can adequately interact with text and elements of an image, and some tools can convert web designs into front-end code, it is still challenging for AI to single-handedly take on graphical design and UI/UX workflows.

The Qt Group announced a GitHub Copilot integration last year. The Qt Development Tools team also continues to investigate integrating other Large Language Models, such as Claude 3 Opus and Llama 3 70B, which show promising QML code generation capabilities, into the development workflow.

Dive Deeper!

Explore our cutting-edge resources on 'Productivity' for helpful tips and best practices on improving the quality and efficiency of development work.
