1 Large language models: The foundation of generative AI
Large language models have rapidly moved from research labs into everyday life, catalyzed by the public debut of ChatGPT and a wave of generative AI tools. This chapter builds intuition for what LLMs are, how they work, and why they matter—covering their breakthroughs, core mechanics, and the spectrum of applications they enable—while also laying out the limitations and societal risks that demand responsible use. It frames LLMs as general-purpose systems poised to transform how we learn, create, and work, and argues that understanding their strengths and pitfalls is essential for anyone planning to use or build with them.
Tracing NLP’s evolution from brittle rule-based systems to statistical methods and then to deep neural networks, the chapter spotlights the attention mechanism and the transformer architecture as the turning point that unlocked scale, speed, and capability. Pretraining on vast unlabeled corpora and fine-tuning for specific tasks produced versatile models like GPT and BERT that excel at language modeling, question answering, summarization, translation, coding assistance, content generation, and even some forms of step-by-step reasoning. Their self-supervised training objective—predicting tokens in context—endows them with flexible, emergent behaviors that power chatbots, enterprise tools, and multimodal assistants now embedded across consumer and professional workflows.
Alongside this promise, the chapter examines core challenges: biases inherited from web-scale training data, hallucinations and limited controllability, and the financial, environmental, and competitive pressures of training and deploying trillion-parameter systems. It surveys the ecosystem shaping the field—OpenAI’s rapid, multimodal releases; Google’s foundational research and platform integrations; Meta’s open-weight strategy; Microsoft’s product-wide Copilot push; Anthropic’s safety-first approach; and rising players like DeepSeek, Cohere, Perplexity, Mistral, and xAI, plus leaders in image and video generation. The takeaway is balanced: LLM capabilities are advancing at an unprecedented pace, but realizing their benefits responsibly requires attention to safety, data privacy, and accountability frameworks that keep people at the center of the technology.
Figures in this chapter (captions only): the reinforcement learning cycle; the distribution of attention for the word “it” in different contexts; a timeline of breakthrough events in NLP; representation of word embeddings in the vector space.
Summary
- The history of NLP is as old as computers themselves. The first application to spark interest in NLP was machine translation in the 1950s; it also became Google's first commercial NLP application when Google Translate launched in 2006.
- Transformer models and the debut of the attention mechanism were the biggest NLP breakthroughs of the 2010s. The attention mechanism attempts to mimic attention in the human brain by placing more weight, or “importance,” on the most relevant information.
- The boom in NLP from the late 2010s to the early 2020s was driven by the increasing availability of text data from across the internet and the development of powerful computational resources. This marked the beginning of the LLM era.
- Today’s LLMs are trained primarily with self-supervised learning on large volumes of text from the web and are then fine-tuned with reinforcement learning.
- GPT, released by OpenAI, was one of the first general-purpose LLMs designed for use with any natural language task. These models can be fine-tuned for specific tasks and are especially well-suited for text-generation applications, such as chatbots.
- LLMs are versatile and can be applied to many applications and use cases, including text generation, question answering, coding, logical reasoning, content generation, and more. Of course, there are also inherent risks, such as encoded bias, hallucinations, and a sizable carbon footprint.
- In January 2023, OpenAI’s ChatGPT set the record for the fastest-growing user base in history, setting off an AI arms race in the tech industry to develop and release LLM-based conversational agents. As of 2025, the most significant LLMs have come from OpenAI, Google, Meta, Microsoft, and Anthropic.
FAQ
What is a large language model (LLM), in simple terms?
An LLM is a neural-network model—today, typically a transformer—that learns from massive amounts of text to predict the next token (word or subword) given prior context. Because they learn general patterns of language, LLMs can be adapted to many tasks (conversation, summarization, coding, translation) and then fine-tuned for specific use cases.
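To make the next-token objective concrete, here is a minimal sketch, assuming the Hugging Face transformers library and the small public "gpt2" checkpoint (both are illustrative choices, not requirements):

```python
# Minimal next-token prediction sketch (assumes: pip install torch transformers).
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Large language models learn to predict the next"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits      # one score per vocabulary token, per position

next_id = logits[0, -1].argmax().item()  # most likely continuation of the prompt
print(tokenizer.decode(next_id))         # e.g. " word" (exact output depends on the model)
```

Sampling from the model's probability distribution, rather than always taking the argmax, is what makes generated text varied instead of deterministic.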
How did NLP evolve from rules to today’s LLMs?
NLP progressed from brittle, rule-based systems (like ELIZA) to statistical methods, then to neural networks and deep learning. The pivotal shift came with transformers and self-attention, which enabled parallel processing, long-range context handling, and training on far larger datasets—ushering in modern LLMs.
What’s the intuition behind “attention” and transformers?
Attention lets a model focus on the most relevant parts of an input sequence when generating or interpreting a token, much like a reader emphasizing key words. Transformers use self-attention across the whole sequence to capture long-term dependencies while being highly parallelizable, which made them faster and more effective than prior architectures.
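For readers who want to see the mechanism itself, the following is a minimal sketch of scaled dot-product self-attention in NumPy; the tiny dimensions and random matrices stand in for learned weights and real embeddings:

```python
# Scaled dot-product self-attention over a toy sequence (assumes: pip install numpy).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

seq_len, d_model = 4, 8                   # 4 tokens, 8-dimensional representations
rng = np.random.default_rng(0)
X = rng.normal(size=(seq_len, d_model))   # stand-in for token embeddings

# Learned projections (random here) map each token to queries, keys, and values.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
Q, K, V = X @ W_q, X @ W_k, X @ W_v

scores = Q @ K.T / np.sqrt(d_model)       # how strongly each token relates to every other
weights = softmax(scores, axis=-1)        # each row is an attention distribution (sums to 1)
output = weights @ V                      # each token becomes a weighted mix of the values

print(weights.round(2))
```

Each row of weights is an attention distribution like the one visualized for the word "it" in the chapter's figure: the token spreads its focus over whichever other tokens are most relevant.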
How are LLMs trained?
They are primarily pretrained with self-supervised objectives (for example, predicting masked or next tokens) over vast text corpora, requiring no human labeling. After pretraining, models are commonly fine-tuned for downstream tasks and may incorporate reinforcement learning-based alignment to improve usefulness and safety.
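As a rough sketch of the self-supervised next-token objective (assuming PyTorch; the token ids and logits below are toy stand-ins for real text and a real model):

```python
# Self-supervised next-token loss: the text supplies its own labels.
import torch
import torch.nn.functional as F

vocab_size = 100
tokens = torch.tensor([[5, 17, 42, 8, 99]])           # a toy "sentence" of token ids
logits = torch.randn(1, tokens.size(1), vocab_size)   # stand-in for a model's outputs

# Predict token t+1 from positions up to t: drop the last logit and the first label.
loss = F.cross_entropy(
    logits[:, :-1, :].reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
print(loss)   # the quantity minimized during pretraining
```

Because every label is simply the next token of the text itself, no human annotation is needed at this stage; labeled data and human feedback enter later, during fine-tuning and alignment.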
What can LLMs do today?
- Language modeling and text generation (chat, drafting, style transfer)
- Question answering (extractive, open-book generative, closed-book)
- Coding assistance (code completion, explanation, tests)
- Content generation (articles, marketing copy, emails)
- Reasoning tasks (math, science, logic—still imperfect)
- Translation, summarization, grammar correction, and more (a short prompting sketch follows this list)
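As one illustration of that versatility, a single text-generation model can be steered toward different tasks purely by prompting. The sketch below assumes the Hugging Face transformers library and uses the small public "gpt2" checkpoint for convenience; larger, instruction-tuned models follow such prompts far more reliably:

```python
# One model, many tasks via prompting (assumes: pip install torch transformers).
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompts = [
    "Summarize in one sentence: The transformer replaced recurrent networks "
    "because self-attention processes whole sequences in parallel.\nSummary:",
    "Translate to French: Good morning, how are you?\nFrench:",
    "Q: What is the capital of France?\nA:",
]

for prompt in prompts:
    out = generator(prompt, max_new_tokens=25, do_sample=False)[0]["generated_text"]
    print(out, "\n---")
```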
What are the main limitations and risks of LLMs?
- Hallucinations: fluent but incorrect or fabricated outputs
- Bias: models can reproduce societal stereotypes present in training data
- Safety and misuse: harmful content, adversarial prompting, privacy/copyright issues
- Sustainability: high compute, energy use, and industry concentration in a few large players
What is fine-tuning and why use it?
Fine-tuning adapts a pretrained model to a narrower task or domain by further training on targeted data. It leverages the general language knowledge learned during pretraining to reach strong performance with less labeled data, faster development, and lower cost than training from scratch.
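A minimal fine-tuning sketch, assuming the Hugging Face transformers and datasets libraries; the IMDB sentiment dataset, the distilbert-base-uncased checkpoint, and the small training subset are illustrative choices to keep the example quick:

```python
# Fine-tune a pretrained model for sentiment classification
# (assumes: pip install torch transformers datasets).
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

dataset = load_dataset("imdb")                          # labeled movie reviews
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True,
                     padding="max_length", max_length=256)

tokenized = dataset.map(tokenize, batched=True)

# Reuse the pretrained weights; only the small classification head starts from scratch.
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

args = TrainingArguments(output_dir="finetune-out", num_train_epochs=1,
                         per_device_train_batch_size=8)

trainer = Trainer(model=model, args=args,
                  train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
                  eval_dataset=tokenized["test"].select(range(500)))
trainer.train()
print(trainer.evaluate())
```

Because the pretrained weights already encode general language knowledge, even this small labeled subset and a single epoch typically produce a usable classifier, which is the practical payoff of fine-tuning over training from scratch.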
How do LLMs represent and process text?
Text is tokenized into units (words or subwords), which are mapped to embeddings—numeric vectors capturing semantic relationships. Through layers of self-attention and transformations, the model updates these representations to predict the next token, with learned weights (“parameters”) determining its capabilities.
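To see the first two steps, tokenization and embedding lookup, concretely (again assuming the Hugging Face transformers library and the "gpt2" checkpoint):

```python
# Inspect subword tokens and their embedding vectors
# (assumes: pip install torch transformers).
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

text = "Large language models predict tokens"
ids = tokenizer(text, return_tensors="pt")["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids[0]))    # subword tokens, e.g. ['Large', 'Ġlanguage', ...]

embeddings = model.get_input_embeddings()(ids)    # one vector per token (768 dims for gpt2)
print(embeddings.shape)                           # torch.Size([1, num_tokens, 768])
```

Everything after this point, the stacked self-attention layers and the final next-token prediction, operates on these vectors rather than on raw text.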
What’s the difference between extractive, open-book generative, and closed-book QA?
- Extractive QA: select the answer span directly from provided context
- Open-book generative QA: generate an answer using supplied context, in the model’s own words
- Closed-book generative QA: generate answers without provided context, relying solely on internal knowledge learned during training (the sketch below contrasts all three styles)
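A hedged sketch contrasting the three styles, assuming the Hugging Face transformers library; the checkpoints the pipelines download are illustrative defaults, not prescriptions:

```python
# Extractive vs. open-book vs. closed-book question answering
# (assumes: pip install torch transformers).
from transformers import pipeline

context = "The transformer architecture was introduced by Google researchers in 2017."
question = "Who introduced the transformer architecture?"

# Extractive QA: copy the answer span straight out of the supplied context.
extractive = pipeline("question-answering")
print(extractive(question=question, context=context))

# Open-book generative QA: give the model the context and let it phrase an answer.
generator = pipeline("text-generation", model="gpt2")
open_book = f"Context: {context}\nQuestion: {question}\nAnswer:"
print(generator(open_book, max_new_tokens=20)[0]["generated_text"])

# Closed-book generative QA: no context at all; the model relies only on what it
# absorbed during pretraining (small models often get this wrong, i.e. hallucinate).
closed_book = f"Question: {question}\nAnswer:"
print(generator(closed_book, max_new_tokens=20)[0]["generated_text"])
```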
Who are the major players in generative AI, and how do they differ?
- OpenAI: rapid, multimodal releases (GPT-4/4o, Sora, o1), deep Microsoft partnership
- Google: foundational transformer research; Gemini and Project Astra; emphasis on AI Principles
- Meta: open-access Llama family; push for efficient, widely usable models
- Microsoft: integrates AI as “Copilot” across products; early Bing chatbot lessons; invests via OpenAI partnership
- Anthropic: safety-first “Constitutional AI”; Claude models; significant backing from Amazon and Google
- Others: DeepSeek (efficient MoE), Cohere (enterprise-first), Perplexity (AI search with citations), Mistral (efficient open models), xAI’s Grok; plus image/video leaders like Midjourney, Stability AI, and Runway