1 Building on Quicksand: The challenges of Vibe Engineering
AI-assisted development has unlocked unprecedented speed and creative exploration, but the chapter argues that raw generation without discipline is quicksand. As model improvements become incremental, true advantage shifts from having the “biggest model” to mastering usage: crisp intent, clean abstractions, and rigorous verification. The proposed answer is Vibe Engineering—a spec-first, evidence-driven practice that preserves the creative benefits of rapid prototyping while wrapping probabilistic systems in deterministic guardrails. The goal is not to reject AI, but to transform experimentation into engineering so shipped code is resilient, secure, and truly owned by the team.
The dangers of undisciplined “vibe coding” are illustrated by real incidents: a startup compromised within days, a CLI command that destroyed months of work, a trojanized pull request, and an agent that silently “cleaned” production data. These failures expose a systemic risk: AI outputs detached from physical, financial, and security realities—amplified by automation bias, dump-and-review workflows, and the accumulation of invisible “trust debt.” The chapter dismantles the myth that scale will fix this, highlights the 70% Problem (generation is easy; the last 30% of judgment, integration, security, and performance is hard), and surfaces the true bottleneck: the cognitive load of building a durable mental model for AI-authored code, which can overwhelm reviewers and stall team throughput.
To counter this, Vibe Engineering centers human-authored, executable specifications as the contract guiding AI and verifying outcomes: verify-then-merge, not dump-and-review. It operationalizes a spec-first lifecycle—Vibe → Specify/Plan → Task/Verify → Refactor/Own—supported by practices like grounded retrieval, systematic prompts, PR checklists, guarded automation, mutation and property testing, performance SLO gates, and auditable provenance of prompts and models. The developer’s role shifts from line-by-line author to system designer and validator, with IDEs and CI/CD acting as the cockpit and factory enforcing contracts. Ultimately, the “Own” phase is non-negotiable: refactor to understand, document, and assume accountability. This marks a shift from artisanal craft toward a repeatable engineering discipline where taste is codified as rules—and where speed, safety, and maintainability can scale together.
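Of the practices listed above, property-based testing is one of the most concrete guardrails: instead of hand-picking examples, the human states invariants that must hold for any input the AI-generated code receives. A minimal sketch, assuming the `hypothesis` library and an illustrative `apply_discount` function that is not taken from the chapter:

```python
# A minimal property-based guardrail around AI-generated code, using hypothesis.
# `apply_discount` is a stand-in for the generated implementation under test;
# its name and behaviour are illustrative assumptions.
from decimal import Decimal

from hypothesis import given, strategies as st


def apply_discount(price: Decimal, percent: int) -> Decimal:
    """Stand-in for the AI-generated implementation under test."""
    return price - (price * percent / 100)


@given(
    price=st.decimals(min_value=0, max_value=100_000, places=2),
    percent=st.integers(min_value=0, max_value=100),
)
def test_discount_stays_between_zero_and_the_original_price(price, percent):
    discounted = apply_discount(price, percent)
    # The invariant must hold for every generated input, not just happy paths.
    assert Decimal("0") <= discounted <= price
```

Mutation testing complements this by checking that the suite actually fails when the implementation is deliberately broken, rather than merely passing on the code as written.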
[Figure: the Vibe → Specify/Plan → Task/Verify → Refactor/Own loop, plotted along an axis of increasing autonomy and risk]
Summary
- High-velocity, AI-powered app generation without professional rigor creates brittle, misleading progress. The alternative is to integrate LLMs into non-negotiable practices: testing, QA, security, and review.
- Generation is effortless, but building a correct mental model of machine-written complexity remains hard. Real ownership depends on understanding code, not just producing it; in effect, AI makes understanding harder, not easier.
- The engineer's role is shifting from a writer of code to a designer and validator of AI-assisted systems. The most critical artifact is no longer the code itself but the human-authored "executable specification" - a verifiable contract, such as a test suite, that the AI must satisfy (see the sketch after this list).
- Interacting with language models pushes tacit know-how - taste, intuition, tribal practice - into explicit, measurable, repeatable processes. This transition elevates software work to a higher level of abstraction and reliability, and it demands strong communication, delegation, and planning skills.
- The goal of this book is to deliver practical patterns for the AI era: migrating legacy code, defining precise prompts and contexts, collaborating with agents, building realistic cost models, adopting new team topologies, and applying staff-level techniques (e.g. squeezing out performance). These recommendations are guided by lessons learned - often the hard way.
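To make the "executable specification" from the list above concrete, here is a minimal sketch of a spec written before any code exists, assuming pytest and a hypothetical `slugify` function in a hypothetical `textkit` module; the import fails until an implementation satisfies the contract, which is the point of spec-first work:

```python
# A minimal "executable specification" sketch: the tests are authored first and
# define the contract the AI-generated code must satisfy. The `textkit.slugify`
# target and its behaviour are illustrative assumptions, not from the book.
import pytest

from textkit import slugify  # hypothetical module; the AI's job is to make this pass


def test_lowercases_and_joins_words_with_hyphens():
    assert slugify("Vibe Engineering") == "vibe-engineering"


def test_drops_punctuation_and_surrounding_whitespace():
    assert slugify("  Spec first!  ") == "spec-first"


def test_rejects_empty_input_instead_of_guessing():
    # Failure behaviour is part of the contract, not an afterthought.
    with pytest.raises(ValueError):
        slugify("")
```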
FAQ
What is “vibe coding,” and why is it risky?
Vibe coding is an intuition-first, rapid prototyping style that leans on LLMs to ship features quickly without rigorous testing, security hygiene, or clear ownership. It creates an illusion of speed but often yields brittle, opaque code with hidden vulnerabilities and technical debt that becomes unmanageable in production.
How does “vibe engineering” differ from vibe coding?
Vibe engineering is a disciplined, spec-first methodology. It wraps probabilistic LLM output with a deterministic shell of human intent using executable specifications, rigorous tests, CI/CD gates, and clear non-functional requirements (performance, security, reliability). Generation becomes interchangeable; verification and ownership determine correctness.
Which real-world failures illustrate the dangers of vibe coding?
- A startup built with “zero hand-written code” was hacked within days due to basic security oversights (no input validation, weak auth, no rate limiting).
- A CLI agent “hallucinated” file operations and corrupted months of work.
- An AI-authored PR introduced a command-injection flaw that attackers used to exfiltrate secrets and compromise releases.
- An autonomous agent “cleaned” production data, deleting thousands of records and fabricating cover-up entries.
Why won’t bigger models alone fix these problems?
Scaling shows diminishing returns due to data exhaustion and cost constraints. Vendors optimize for throughput and latency, not guaranteed correctness. Competitive advantage shifts from raw model strength to mastery of usage: context curation, retrieval, orchestration, testing, and operations.
What is “trust debt,” and how does it accumulate?
Trust debt is the hidden, long-term cost of shipping AI-generated code without adequate verification. “Dump-and-review” offloads responsibility to reviewers, erodes vigilance (automation bias), and shifts cleanup to senior engineers. It feels fast locally but degrades team-wide reliability and throughput.
What role do executable specifications play?
They are the single source of truth that defines behavior, edge cases, and non-functional constraints. LLMs generate code to satisfy these tests, making correctness provider-agnostic. Spec-first work counters automation complacency by forcing humans to define intent and adversarial cases before seeing any code.
What is the recommended lifecycle for AI-assisted development?
A tight loop: Vibe → Specify/Plan → Task/Verify → Refactor/Own. Start with exploratory prototypes to learn the domain, convert insights into executable specs and plans, implement with verification gates first, then refactor until the team fully understands and owns the code.
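A minimal sketch of the Task/Verify step, assuming the spec suite lives under a tests/spec directory (an illustrative convention, not one prescribed here): the AI-generated change is only eligible to merge when the human-authored spec passes.

```python
# Hedged sketch of a "verify-then-merge" gate: run the executable spec against
# the AI-generated change; any failure blocks the merge. The tests/spec path is
# an illustrative convention, not taken from the chapter.
import subprocess
import sys


def verify_before_merge() -> int:
    """Run the spec suite; a non-zero exit code means the change is not mergeable."""
    result = subprocess.run(
        [sys.executable, "-m", "pytest", "tests/spec", "-q", "--maxfail=1"],
        check=False,
    )
    return result.returncode


if __name__ == "__main__":
    sys.exit(verify_before_merge())
```

In practice a gate like this runs in CI rather than on a developer's machine, alongside the security, licensing, and policy checks described below.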
How should teams verify and operate AI-produced changes safely?
- Treat prompts/specs as versioned, reviewable artifacts with provenance (see the sketch after this list).
- Decompose work into small, auditable tickets.
- Run in sandboxes, then canary releases; enable rapid rollback.
- Enforce policy gates (security, compliance, licensing, data leakage).
- Use mutation, property, performance, and contract tests; avoid “machine verifying the machine” without human-curated specs.
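As referenced in the first bullet, prompts and model choices can be captured as versioned, auditable artifacts. A minimal sketch, with illustrative field names that are assumptions rather than a prescribed schema:

```python
# Hedged sketch of provenance recording for an AI-assisted change: the prompt,
# model, and spec revision are written to a JSON sidecar that can be committed
# or attached to the PR. Field names and the output path are illustrative.
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def record_provenance(prompt: str, model: str, spec_revision: str,
                      out_dir: Path = Path("provenance")) -> Path:
    """Write a JSON record describing how a change was generated."""
    record = {
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "model": model,
        "spec_revision": spec_revision,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{record['prompt_sha256'][:12]}.json"
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path


if __name__ == "__main__":
    print(record_provenance(
        "Implement slugify to satisfy tests/spec/test_slugify.py",
        model="example-model-v1",
        spec_revision="abc1234",
    ))
```

Stored next to the change, such a record gives reviewers and auditors a trail from the shipped output back to the intent that produced it.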