My takeaway series follows a Q&A format to explain AI concepts at three levels:
- For anyone with general knowledge who wants to understand the concept;
- For anyone who wants to dive into the code implementation details of the concept;
- For anyone who wants to understand the mathematics behind the technique.
Large language models (LLMs) are neural language models with a huge number of parameters, typically on the order of billions or more. Language models (LMs) are models that can understand and generate human language to solve natural language processing (NLP) tasks. Neural language models are those based on neural networks.
Not all language models with a large number of parameters are LLMs. LLMs specifically refer to Transformer-based pre-trained language models (PLMs) that marked a significant breakthrough, especially after GPT-3 in 2020.
Language models have been around for decades. Their development can be divided into four stages:
- Statistical language models (SLM), 1990s - 2010s, e.g., n-gram models;
- Neural language models (NLM), 2010s - 2016, e.g., RNN, LSTM, GRU;
- Pre-trained language models (PLM), 2016 - 2020, e.g., ELMo, BERT, GPT;
- Large language models (LLM), 2020 - present, e.g., GPT-3, PaLM, LLaMA.
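To make the earliest stage concrete, here is a minimal sketch of a statistical n-gram language model (a bigram model). The toy corpus, tokenisation, and the `prob` helper are illustrative assumptions, not taken from any particular system; a real SLM would add smoothing and a much larger corpus.

```python
from collections import defaultdict, Counter

# Toy corpus, pre-tokenised by whitespace (an illustrative assumption).
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count bigram occurrences: counts[w1][w2] = number of times w2 follows w1.
counts = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    counts[w1][w2] += 1

def prob(w2, w1):
    """Maximum-likelihood estimate P(w2 | w1) = count(w1, w2) / count(w1, *)."""
    total = sum(counts[w1].values())
    return counts[w1][w2] / total if total else 0.0

print(prob("sat", "cat"))  # "cat" is always followed by "sat" here -> 1.0
print(prob("mat", "the"))  # "the" is followed by cat/mat/dog/rug once each -> 0.25
```

Neural language models (the second stage) replace these raw counts with learned continuous representations, which is what lets them generalise to word sequences never seen in training.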
Pre-trained language models emerged after the introduction of the Transformer architecture (Vaswani et al. 2017); large language models were then built by massively scaling these pre-trained models. The first large language model was GPT-3 (Brown et al. 2020), which has 175 billion parameters and was released by OpenAI in 2020. Since then, many other large language models have been developed, such as PaLM by Google, LLaMA by Meta, and GPT-4 by OpenAI.
Large language models are a subset of pre-trained language models, but operate at a significantly larger scale in terms of parameters, training data, and computational resources. Early pre-trained language models typically had millions to billions of parameters (hence sometimes referred to as small pre-trained models), while large language models typically have tens to hundreds of billions of parameters.
LLMs display surprising emergent abilities that are not observed in smaller PLMs. These abilities are attributed to the scaling of model size, training data, and training duration. However, it remains unclear why emergent abilities occur in LLMs but not in smaller PLMs.
LLMs are having a significant impact on the AI community:
- In natural language processing (NLP), they have set new state-of-the-art performance on various benchmarks and tasks, such as text generation, summarisation, translation, question answering, and more;
- In computer vision (CV), researchers are trying to develop ChatGPT-like vision-language models that can better serve multimodal dialogues;
- In information retrieval (IR), traditional search engines are being challenged by a new way of seeking information through AI chatbots;
- More broadly, researchers are rethinking the possibilities of artificial general intelligence (AGI).
LLMs have arguably had the greatest impact AI has ever made on daily life. In recent years, they have been widely deployed in applications such as:
- AI chatbots (e.g., ChatGPT), which have a great impact on life and work; they are used in domains such as customer support, content generation, language translation, and more;
- AI-based ecosystems for almost every type of traditional software, such as:
  - Office suites: Microsoft Copilot in Microsoft 365 (Word, Excel, PowerPoint, Outlook); Google Bard in Google Workspace (Docs, Sheets, Slides, Gmail), etc.;
  - Programming: GitHub Copilot, Cursor, etc.