1 Large language models: The foundation of generative AI
Large language models have rapidly moved from research labs into everyday life, catalyzed by the public debut of ChatGPT, which revealed how capable and accessible modern AI had become. This chapter situates LLMs as the foundational technology behind today’s generative AI, explaining why they matter for work, creativity, and communication and why a basic intuition for how they function is essential. It balances the excitement around their transformative potential with a pragmatic view of their shortcomings and societal risks, aiming to help readers cut through the hype and use these systems responsibly.
The chapter traces NLP’s evolution from brittle rule-based systems to statistical learning and then to deep neural networks, culminating in transformers and the attention mechanism that unlocked scale, speed, and context handling. It introduces how LLMs are trained through self-supervised next-token prediction, then adapted via fine-tuning and reinforcement learning, and how model size and data shape capability. With this foundation, it surveys what LLMs can do: fluid conversation, text generation and summarization, translation, question answering, coding assistance, and emerging strengths in logical and scientific reasoning—plus early steps into multimodality across text, images, audio, and video.
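To make the training objective and the attention mechanism concrete, the sketch below implements scaled dot-product attention, the core operation of the transformer, in a few lines. This is a toy illustration, assuming NumPy: the token list, the 8-dimensional embeddings, and the projection matrices are random stand-ins for parameters a real model would learn during training.

```python
# A minimal sketch of scaled dot-product attention using NumPy.
# All numbers here are random placeholders, not learned parameters.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    # Each token's query is compared against every token's key; the
    # resulting weights say how strongly each token "attends" to the others.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = softmax(scores)      # each row sums to 1: a distribution over the context
    return weights @ V, weights    # output is a weighted mix of value vectors

rng = np.random.default_rng(0)
tokens = ["the", "cat", "sat", "it"]
X = rng.normal(size=(len(tokens), 8))   # toy 8-dim token embeddings

# In a real transformer, W_q, W_k, and W_v are learned projections.
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out, weights = attention(X @ W_q, X @ W_k, X @ W_v)
print(dict(zip(tokens, np.round(weights[-1], 2))))  # what "it" attends to
```

The printed row is the attention distribution for the final token: a set of weights, summing to 1, describing how much each token in the context contributes to its representation, which is the behavior the chapter's attention figure visualizes for the word "it".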
Alongside their promise, the chapter examines core limitations and risks: the reproduction of bias from web-scale training data, the tendency to hallucinate plausible but false answers, challenges in controlling outputs, and sustainability concerns from the computing and energy required—factors that may concentrate power among a few well-resourced actors. It then maps the competitive landscape—OpenAI, Google, Meta, Microsoft, Anthropic, and a wave of newer entrants—highlighting differing philosophies around capability, safety, openness, and enterprise focus. The result is a clear framework for understanding what LLMs are, what they’re good at, where they fall short, and how the ecosystem is evolving.
Figures
- The reinforcement learning cycle
- The distribution of attention for the word “it” in different contexts
- A timeline of breakthrough events in NLP
- Representation of word embeddings in the vector space
Summary
- The history of NLP is as old as computers themselves. The first application to spark interest in NLP was machine translation in the 1950s; half a century later, it also became one of the most visible commercial NLP applications when Google launched Google Translate in 2006.
- The attention mechanism, and the transformer models built entirely on it, were the biggest NLP breakthroughs of the 2010s. Attention loosely mimics attention in the human brain by placing more “importance” on the most relevant parts of the input.
- The boom in NLP from the late 2010s to the early 2020s was driven by the increasing availability of text data from across the internet and the development of powerful computational resources. This marked the beginning of the LLM era.
- Today’s LLMs are trained primarily with self-supervised learning on large volumes of text from the web, where every prefix of the text supplies its own next-token label (see the sketch after this list), and are then fine-tuned, often with reinforcement learning from human feedback.
- GPT, released by OpenAI, was one of the first general-purpose LLMs designed for use with any natural language task. These models can be fine-tuned for specific tasks and are especially well-suited for text-generation applications, such as chatbots.
- LLMs are versatile and can be applied to many applications and use cases, including text generation and summarization, question answering, coding, logical reasoning, and more. Of course, they also carry inherent risks, such as encoded bias, hallucinations, and a sizable carbon footprint.
- In January 2023, only two months after its release, OpenAI’s ChatGPT set a record for the fastest-growing consumer user base in history and set off an AI arms race in the tech industry to develop and release LLM-based conversational dialogue agents. As of 2025, the most significant LLMs have come from OpenAI, Google, Meta, Microsoft, and Anthropic.
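As a toy illustration of the self-supervised objective mentioned above, the sketch below builds the simplest possible “language model” by counting bigrams. The corpus and the counting scheme are invented for illustration; a real LLM fits the same conditional distribution, P(next token | context), with a neural network over billions of tokens, but the supervision signal is identical: every prefix of the text labels its own next token, so no human annotation is needed.

```python
# A toy next-token predictor: count which token follows which.
# Real LLMs learn this conditional distribution with a transformer,
# but the (input, label) pairs come for free from raw text either way.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat . the dog sat on the rug .".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):  # each token labels its successor
    counts[prev][nxt] += 1

def predict_next(token):
    # Normalize the counts into P(next | current).
    dist = counts[token]
    total = sum(dist.values())
    return {t: n / total for t, n in dist.items()}

print(predict_next("the"))  # e.g. {'cat': 0.25, 'mat': 0.25, 'dog': 0.25, 'rug': 0.25}
```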