Overview

1 Large language models: The foundation of generative AI

Large language models have rapidly moved from research labs into everyday life, catalyzed by the public debut of ChatGPT, which revealed how capable and accessible modern AI had become. This chapter situates LLMs as the foundational technology behind today’s generative AI, explaining why they matter for work, creativity, and communication and why a basic intuition for how they function is essential. It balances the excitement around their transformative potential with a pragmatic view of their shortcomings and societal risks, aiming to help readers cut through the hype and use these systems responsibly.

The chapter traces NLP’s evolution from brittle rule-based systems to statistical learning and then to deep neural networks, culminating in transformers and the attention mechanism that unlocked scale, speed, and context handling. It introduces how LLMs are trained through self-supervised next-token prediction, then adapted via fine-tuning and reinforcement learning, and how model size and data shape capability. With this foundation, it surveys what LLMs can do: fluid conversation, text generation and summarization, translation, question answering, coding assistance, and emerging strengths in logical and scientific reasoning—plus early steps into multimodality across text, images, audio, and video.

Alongside their promise, the chapter examines core limitations and risks: the reproduction of bias from web-scale training data, the tendency to hallucinate plausible but false answers, challenges in controlling outputs, and sustainability concerns from the computing and energy required—factors that may concentrate power among a few well-resourced actors. It then maps the competitive landscape—OpenAI, Google, Meta, Microsoft, Anthropic, and a wave of newer entrants—highlighting differing philosophies around capability, safety, openness, and enterprise focus. The result is a clear framework for understanding what LLMs are, what they’re good at, where they fall short, and how the ecosystem is evolving.

Figures in this chapter: the reinforcement learning cycle; the distribution of attention for the word “it” in different contexts; a timeline of breakthrough events in NLP; word embeddings in the vector space.

Summary

  • The history of NLP is as old as computers themselves. The first application that sparked interest in NLP was machine translation in the 1950s; decades later, machine translation also became one of the first mainstream commercial NLP applications when Google released Google Translate in 2006.
  • Transformer models and the debut of the attention mechanism were the biggest NLP breakthroughs of the 2010s. The attention mechanism attempts to mimic attention in the human brain by placing “importance” on the most relevant information.
  • The boom in NLP from the late 2010s to the early 2020s was driven by the increasing availability of text data from across the internet and the development of powerful computational resources. This marked the beginning of the LLM era.
  • Today’s LLMs are trained primarily with self-supervised learning on large volumes of text from the web and are then refined with supervised fine-tuning and reinforcement learning.
  • GPT, released by OpenAI, was one of the first general-purpose LLMs designed for use with any natural language task. These models can be fine-tuned for specific tasks and are especially well-suited for text-generation applications, such as chatbots.
  • LLMs are versatile and can be applied to various applications and use cases, including text generation, answering questions, coding, logical reasoning, content generation, and more. Of course, there are also inherent risks, such as encoded bias, hallucinations, and a sizable carbon footprint.
  • In January 2023, OpenAI’s ChatGPT set the record for the fastest-growing user base in history, setting off an AI arms race in the tech industry to develop and release LLM-based conversational dialogue agents. As of 2025, the most significant LLMs have come from OpenAI, Google, Meta, Microsoft, and Anthropic.

FAQ

What is a large language model (LLM), and why was ChatGPT’s release a turning point?
LLMs are neural networks trained on massive text corpora to predict the next token and generate fluent, human-like language. ChatGPT packaged an LLM for easy dialogue, showcasing how one model could write, summarize, answer questions, and explain concepts—sparking record-setting adoption and making generative AI mainstream.
How do transformers and the attention mechanism enable modern LLMs?
Attention lets a model weigh which parts of an input sequence matter most for each token, providing rich context. Transformers apply self-attention across the entire sequence and compute these representations in parallel, yielding both speed and state-of-the-art performance compared with older sequential architectures.
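To make the idea concrete, here is a minimal NumPy sketch of scaled dot-product self-attention, the core operation inside a transformer. The toy shapes and random vectors are purely illustrative; real models add learned query/key/value projections, multiple attention heads, and masking.

```python
# Minimal sketch of scaled dot-product self-attention (illustrative only).
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Mix the value vectors, weighting each by how relevant its key is to each query."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                           # similarity of every query to every key
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)   # softmax: attention weights sum to 1 per token
    return weights @ V                                        # context-aware vector for each position

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                    # a toy "sequence" of 4 tokens, 8 dimensions each
out = scaled_dot_product_attention(x, x, x)    # self-attention: queries, keys, values from the same sequence
print(out.shape)                               # (4, 8)
```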
How are LLMs trained: self-supervised, supervised, and reinforcement learning?
LLMs primarily use self-supervised learning (predicting hidden or next tokens) so they can learn from unlabeled text at scale. They’re often further refined with supervised fine-tuning on labeled examples and can incorporate reinforcement learning, which uses rewards and penalties to prefer desired behaviors.
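As a rough sketch of the self-supervised objective, the PyTorch snippet below uses a deliberately tiny stand-in model: the training target at each position is simply the token that comes next, so the model learns from raw text without any human labels.

```python
# Minimal sketch of self-supervised next-token prediction (tiny stand-in model, not a real LLM).
import torch
import torch.nn as nn

vocab_size, embed_dim = 100, 32
model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),       # token ids -> vectors
    nn.Linear(embed_dim, vocab_size),          # vectors -> a score for every token in the vocabulary
)

tokens = torch.randint(0, vocab_size, (1, 16))     # one random "sentence" of 16 token ids
inputs, targets = tokens[:, :-1], tokens[:, 1:]    # the label at each position is just the next token

logits = model(inputs)                             # shape (1, 15, vocab_size)
loss = nn.functional.cross_entropy(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()                                    # gradients push the weights toward better next-token guesses
```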
What is fine-tuning and why is it useful?
Fine-tuning adapts a pre-trained model to a related task by training briefly on a smaller, task-specific dataset. It reuses the model’s general language understanding to improve performance on targeted applications (for example, classification, domain-specific QA, or a stylistic writing assistant) with far less data and cost than training from scratch.
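The sketch below shows the shape of a single fine-tuning step, assuming the Hugging Face transformers library and a small pre-trained model adapted for two-label sentiment classification; the model name, toy texts, and labels are placeholders, and a real setup would iterate over many batches and evaluate on held-out data.

```python
# Minimal sketch of one fine-tuning step (placeholder data; assumes the transformers library).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2        # reuse pre-trained weights, add a fresh task head
)

texts = ["great product", "terrible service"]      # tiny task-specific dataset (placeholder)
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)   # small learning rate: adapt, don't overwrite
model.train()
loss = model(**batch, labels=labels).loss          # the model computes the classification loss itself
loss.backward()
optimizer.step()
```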
What can LLMs do today?
Common uses include language modeling and text generation, open- and closed-book question answering, reading comprehension, translation, summarization, and content creation. They’re also used for coding assistance, and they show emerging abilities in arithmetic, logic, and scientific problem-solving—though with uneven reliability depending on the task.
What are tokens, embeddings, and parameters in an LLM?
Tokens are units of text (words or subwords) the model reads and writes. Embeddings map tokens into numerical vectors that capture meaning and context. Parameters (the model’s learned weights) determine how inputs transform into outputs; larger models with more parameters can represent more complex patterns, given sufficient data.
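A small PyTorch sketch ties these three terms together; the four-word vocabulary and whitespace "tokenizer" are purely illustrative, whereas real LLMs use learned subword tokenizers, high-dimensional embeddings, and billions of parameters.

```python
# Minimal sketch of tokens, embeddings, and parameters (toy vocabulary, illustrative only).
import torch
import torch.nn as nn

vocab = {"the": 0, "cat": 1, "sat": 2, "mat": 3}          # real vocabularies hold ~50k+ subword tokens
token_ids = torch.tensor([[vocab[w] for w in "the cat sat".split()]])   # text -> token ids

embedding = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)
vectors = embedding(token_ids)                             # each token id becomes an 8-dimensional vector
print(vectors.shape)                                       # torch.Size([1, 3, 8])

n_params = sum(p.numel() for p in embedding.parameters())  # "parameters" = all learned weights (here 4 x 8)
print(n_params)                                            # 32
```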
Where do LLMs fall short?
Key issues include bias reflecting patterns in training data, hallucinations (confident but incorrect outputs), and difficulty strictly controlling responses under adversarial prompts. Training and inference are energy-intensive, raising cost, accessibility, and environmental concerns.
What are hallucinations and why do they happen?
Hallucinations are fluent but false statements produced by a model. They can stem from noisy or incomplete training data, gaps in the model’s internal representations, or the combinatorial explosion of possible continuations, which makes consistently factual long outputs hard to guarantee.
How does training data shape model behavior and bias?
LLMs learn patterns from large web-scale corpora (e.g., Wikipedia, books, social media). These sources contain both high-quality knowledge and problematic content; stereotypes, toxic language, and skewed representation can imprint on the model, leading to disparate outputs across identities and contexts.
Who are the major players in generative AI, and how do their strategies differ?
OpenAI (ChatGPT, GPT-4/4o, Sora, o1) emphasizes powerful multimodal models; Google (Gemini, DeepMind) pairs foundational research with product integration; Meta (Llama) pushes efficient, open-access models; Microsoft integrates “Copilot” across its suite via its OpenAI partnership; Anthropic (Claude) emphasizes safety and alignment. Others include DeepSeek (efficient MoE models), Cohere (enterprise focus), Perplexity (AI search), Mistral (efficient open models), xAI (Grok), Stability AI, Midjourney, and Runway (image/video).
