My takeaway series follows a Q&A format to explain AI concepts at three levels:
- For anyone with general knowledge who wants to understand the concept;
- For anyone who wants to dive into the code implementation details of the concept;
- For anyone who wants to understand the mathematics behind the technique.
Large language models (LLMs) are neural language models with a huge number of parameters, typically on the order of billions or more. Language models (LMs) are models that can understand and generate human language to solve natural language processing (NLP) tasks. Neural language models are those based on neural networks.
Not all language models with a large number of parameters are LLMs. LLMs specifically refer to Transformer-based pre-trained language models (PLMs) that marked a significant breakthrough, especially after GPT-3 in 2020.
Language models have been around for decades. Their development can be divided into four stages:
- Statistical language models (SLM), 1990s - 2010s, e.g., n-gram models;
- Neural language models (NLM), 2010s - 2016, e.g., RNN, LSTM, GRU;
- Pre-trained language models (PLM), 2016 - 2020, e.g., ELMo, BERT, GPT;
- Large language models (LLM), 2020 - present, e.g., GPT-3, PaLM, LLaMA.
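To make the earliest stage concrete, here is a minimal sketch of a statistical n-gram language model (a bigram model). The toy corpus, tokenisation, and the `prob` helper are illustrative assumptions, not taken from any particular system; a real SLM would add smoothing and a much larger corpus.

```python
from collections import defaultdict, Counter

# Toy corpus, pre-tokenised by whitespace (an illustrative assumption).
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count bigram occurrences: counts[w1][w2] = number of times w2 follows w1.
counts = defaultdict(Counter)
for w1, w2 in zip(corpus, corpus[1:]):
    counts[w1][w2] += 1

def prob(w2, w1):
    """Maximum-likelihood estimate P(w2 | w1) = count(w1, w2) / count(w1, *)."""
    total = sum(counts[w1].values())
    return counts[w1][w2] / total if total else 0.0

print(prob("sat", "cat"))  # "cat" is always followed by "sat" here -> 1.0
print(prob("mat", "the"))  # "the" is followed by cat/mat/dog/rug once each -> 0.25
```

Neural language models (the second stage) replace these raw counts with learned continuous representations, which is what lets them generalise to word sequences never seen in training.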
Pre-trained language models emerged after the introduction of the Transformer architecture (Vaswani et al. 2017); large language models were then built by massively scaling these pre-trained models. The first large language model was GPT-3 (Brown et al. 2020), which has 175 billion parameters and was released by OpenAI in 2020. Since then, many other large language models have been developed, such as PaLM by Google, LLaMA by Meta, and GPT-4 by OpenAI.
Large language models are a subset of pre-trained language models, but operate at a significantly larger scale in terms of parameters, training data, and computational resources. Early pre-trained language models typically had millions to billions of parameters (hence sometimes referred to as small pre-trained models), while large language models typically have tens to hundreds of billions of parameters.
LLMs display surprising emergent abilities that are not observed in smaller PLMs. These abilities are attributed to the scaling of model size, training data, and training duration. However, it remains unclear why emergent abilities occur in LLMs but not in smaller PLMs.
LLMs are having a significant impact on the AI community:
- In natural language processing (NLP), they have set new state-of-the-art performance on various benchmarks and tasks, such as text generation, summarisation, translation, question answering, and more;
- In computer vision (CV), researchers are trying to develop ChatGPT-like vision-language models that can better serve multimodal dialogues;
- In information retrieval (IR), traditional search engines are being challenged by a new way of seeking information through AI chatbots;
- More broadly, researchers are rethinking the possibilities of artificial general intelligence (AGI).
LLMs have arguably had the greatest impact AI has ever made on daily life. In recent years, they have been widely deployed in applications such as:
- AI chatbots (e.g., ChatGPT), which have a great impact on life and work; they are used in domains such as customer support, content generation, language translation, and more;
- AI-based ecosystems for almost every type of traditional software, such as:
  - Office suites: Microsoft Copilot in Microsoft 365 (Word, Excel, PowerPoint, Outlook); Google Bard in Google Workspace (Docs, Sheets, Slides, Gmail), etc.;
  - Programming: GitHub Copilot, Cursor, etc.