Data science is presented as a practical, cross‑disciplinary craft focused on making and managing data‑driven decisions. Success depends less on exotic tools and more on clear, measurable goals, sound methodology, collaboration across roles, and a repeatable workflow. To ground the ideas, the chapter uses a real banking example—reducing losses from bad loans—to show how a project unfolds from framing the problem to delivering actionable results.
The work is collaborative, with distinct but complementary roles: the project sponsor (who owns success and business value), the client or domain expert (who represents end users), the data scientist (who sets analytic strategy and executes the science), the data architect (who stewards data assets), and operations (who deploys and runs solutions).
Projects iterate through fluid stages: defining a specific, quantitative goal and acceptance criteria; collecting, exploring, and cleaning data while watching for quality issues and bias (for example, training only on already‑approved loans); and modeling to extract insight or predictions. Common tasks include classification, scoring, ranking, clustering, finding relations, and characterization, with method choice guided by business constraints such as interpretability and how results will be used.
Models are evaluated against business goals and sensible baselines. Accuracy alone is not enough: precision, recall, and false‑positive rate also matter, and performance must be meaningfully better than a null model or the existing approach. Communication is tailored to the audience: executives care about business impact (such as potential reduction in charge‑offs), end users need guidance on interpretation and appropriate overrides, and operations needs clarity on runtime, data, and maintenance. Deployment is a beginning, not an end: pilot carefully, monitor behavior, accommodate stakeholder feedback, and plan for updates as conditions change. Throughout, setting expectations and determining lower bounds on acceptable performance keep projects realistic, aligned, and poised for impact.
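The evaluation idea above can be made concrete with a minimal sketch. It is written in Python purely for illustration (the book itself works in R), and the ten labeled loans, the model's predictions, and the helper names `safe_div` and `evaluate` are all made up for this example. The null model here is assumed to be "approve everyone", that is, never flag a loan as bad.

```python
def safe_div(numerator, denominator):
    """Return numerator / denominator, or None when the ratio is undefined."""
    return numerator / denominator if denominator else None


def evaluate(actual, predicted):
    """Compare predictions with outcomes; True means 'bad loan' in both lists."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)          # bad loans flagged as bad
    fp = sum(1 for a, p in pairs if not a and p)      # good loans flagged as bad
    fn = sum(1 for a, p in pairs if a and not p)      # bad loans that were missed
    tn = sum(1 for a, p in pairs if not a and not p)  # good loans passed as good
    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "precision": safe_div(tp, tp + fp),
        "recall": safe_div(tp, tp + fn),
        "false_positive_rate": safe_div(fp, fp + tn),
    }


# Hypothetical toy data: 3 bad loans out of 10.
actual = [True, True, True, False, False, False, False, False, False, False]

# Hypothetical model output: catches 2 of the 3 bad loans, with 1 false alarm.
model_predictions = [True, True, False, True, False, False, False, False, False, False]

# Assumed null model: never flags a loan as bad.
null_predictions = [False] * len(actual)

model_metrics = evaluate(actual, model_predictions)
null_metrics = evaluate(actual, null_predictions)

for name in model_metrics:
    print(f"{name:>20}: model={model_metrics[name]}  null={null_metrics[name]}")
```

On this toy data the null model is right on 7 of 10 loans even though it catches no bad loans at all, which is why accuracy by itself says little. The comparison that matters is whether the model's recall and precision justify its false positives relative to that baseline.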
The lifecycle of a data science project: loops within loops
The fraction of defaulting loans by credit history category. The dark region of each bar represents the fraction of loans in that category that defaulted.
A decision tree model for finding bad loan applications. The outcome nodes show confidence scores.
Example slide from an executive presentation
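The quantity plotted in the credit-history figure, the fraction of loans in each category that defaulted, is simple to compute. The following is a rough sketch in Python (the book itself uses R). The records, the category labels, and the function name `default_rate_by_category` are invented for illustration and are not taken from the book's dataset.

```python
from collections import Counter


def default_rate_by_category(loans):
    """Return {category: fraction of loans in that category that defaulted}.

    `loans` is an iterable of (category, defaulted) pairs, where
    `defaulted` is True for a loan that went bad.
    """
    totals = Counter()
    defaults = Counter()
    for category, defaulted in loans:
        totals[category] += 1
        defaults[category] += int(defaulted)
    return {category: defaults[category] / totals[category] for category in totals}


# Hypothetical records, made up for this sketch.
loans = [
    ("all paid back", False),
    ("all paid back", False),
    ("all paid back", False),
    ("all paid back", True),
    ("delays in the past", True),
    ("delays in the past", False),
    ("critical account", True),
    ("critical account", False),
    ("critical account", False),
    ("critical account", False),
]

rates = default_rate_by_category(loans)
for category, rate in rates.items():
    print(f"{category}: {rate:.2f}")
```

A table like this is only as trustworthy as the sample behind it. If the records cover only loans that were already approved, the rates describe that filtered population, not all applicants.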
Summary
The data science process involves a lot of back-and-forth—between the data scientist and other project stakeholders, and between the different stages of the process. Along the way, you’ll encounter surprises and stumbling blocks; this book will teach you procedures for overcoming some of these hurdles. It’s important to keep all the stakeholders informed and involved; when the project ends, no one connected with it should be surprised by the final results.
In the next chapters, we’ll look at the stages that follow project design: loading, exploring, and managing the data. Chapter 2 covers a few basic ways to load the data into R, in a format that’s convenient for analysis.
In this chapter you have learned:
- A successful data science project involves more than just statistics. It also requires a variety of roles to represent business and client interests, as well as operational concerns.
- You should make sure you have a clear, verifiable, quantifiable goal.
- You should set realistic expectations for all stakeholders.