How Data Happened
A History from the Age of Reason to the Age of Algorithms
Online Description
“Fascinating.” —Jill Lepore, The New Yorker

A sweeping history of data and its technical, political, and ethical impact on our world. From facial recognition—capable of checking people into flights or identifying undocumented residents—to automated decision systems that inform who gets loans and who receives bail, each of us moves through a world determined by data-empowered algorithms. But these technologies didn’t just appear: they are part of a history that goes back centuries, from the census enshrined in the US Constitution to the birth of eugenics in Victorian Britain to the development of Google search.

Expanding on the popular course they created at Columbia University, Chris Wiggins and Matthew L. Jones illuminate the ways in which data has long been used as a tool and a weapon in arguing for what is true, as well as a means of rearranging or defending power. They explore how data was created and curated, as well as how new mathematical and computational techniques developed to contend with that data serve to shape people, ideas, society, military operations, and economies. Although technology and mathematics are at its heart, the story of data ultimately concerns an unstable game among states, corporations, and people. How were new technical and scientific capabilities developed; who supported, advanced, or funded these capabilities or transitions; and how did they change who could do what, from what, and to whom? Wiggins and Jones focus on these questions as they trace data’s historical arc, and look to the future. By understanding the trajectory of data—where it has been and where it might yet go—Wiggins and Jones argue that we can understand how to bend it to ends that we collectively choose, with intentionality and purpose.
1) Citation (Chicago-ish)
- Wiggins, Chris, and Matthew L. Jones. How Data Happened: A History from the Age of Reason to the Age of Algorithms. New York: W. W. Norton & Company, 2023. (pp. 2, 384)
2) Executive Summary (10 bullets, all cited)
- The authors define “data” broadly as shorthand for the surrounding ecosystem of data-driven algorithmic decision-making systems, and aim to explain how these systems were made (collection/curation) and how their analytic techniques evolved. (p. 6)
- Core analytic frame: data is inseparable from politics understood as power dynamics; the book’s through-line is how data repeatedly rearranges corporate power, state power, and people power. (p. 7)
- Method/structure claim: the history “tacks between” (1) infrastructure and labor for collecting/making data public and (2) new mathematical/computational techniques for making claims and decisions from that data—each transition contested and power-shifting. (p. 8)
- The book argues its history is “actionable”: by showing “small coincidences” and “subjective design choices” that ossified into seeming inevitabilities, readers can better contest and redirect systems toward justice. (pp. 9–10)
- Quantification can create accountability by checking opaque expert judgment (e.g., standardized reporting), but the same metric logics often flip—rendering workers/citizens transparent to institutions via secretive algorithmic ranking and classification. (pp. 23–24)
- Early “social physics” and the statistical “average man” made population regularities governable; the normal curve and averages become not just descriptions but reified “laws” about society—an enabling condition for later sorting and “deviance” politics. (pp. 25–26, 36–38)
- The authors treat statistical racism/eugenics as a recurring pattern: “data” and methods can be mobilized to essentialize inequality; correct critique (e.g., Du Bois on Hoffman’s race statistics) can still fail without power/publicity to force change. (pp. 59–61)
- Key techniques were not “inevitable scientific progress” but were driven by problem contexts and incentives: industrial needs (Guinness) shaped significance testing; wartime needs industrialized computation and large-scale data analysis. (pp. 81–83, 100–102)
- The intelligence/military world plays an outsized role in building real-time data systems and computation at scale (Bletchley → NSA → Bell Labs/business), with mission logics prioritizing efficiency and scale over academic niceties. (pp. 99–103, 112–114)
- Contemporary governance fights (privacy, fairness, ethics, antitrust, Section 230) are framed as contests among powers; the authors close by arguing the future is contingent—not “technology says so”—and depends on how corporate/state/people power resolves current struggles. (pp. 226–231, 256, 268–274)
3) Central Thesis + Purpose (cited)
- Thesis (1–3 sentences, cited)
  - Data-driven algorithmic systems are historically produced socio-technical arrangements that shape “truth and power,” and their central political consequence is persistent rearrangement of power among corporations, states, and people. (pp. 6–7)
  - Understanding the historical transitions—collection/curation choices, analytic techniques, and the contests around them—enables citizens and institutions to contest and redirect these systems rather than treating them as inevitable. (pp. 8–10)
- The authors’ purpose / research question (cited)
  - Provide an “actionable understanding of history” that explains how we “collectively got here” and how we might “choose a different future.” (pp. 9–10)
  - Future-facing guiding question (explicit in the conclusion): what present contests among powers will shape the near future of data? (p. 256)
- Claimed contribution (cited)
  - A chapter-by-chapter account of major “intellectual transitions” showing who built/funded them, how they were contested, and how each transition changed who could do what “from what, and to whom,” with special attention to ethical/political valence (rights, harms, justice). (p. 8)
4) Argument Spine (5–9 steps, cited)
- Modern life is saturated with algorithmic decision systems (“data”), so analysis must cover how they’re built and how their claims become authoritative. (p. 6)
- Data must be analyzed politically: its recurring effect is rearranging corporate/state/people power within an “unstable game.” (pp. 6–7)
- To understand present systems, trace the historical coupling of (a) data infrastructure/curation and (b) analytic techniques for making decisions—each transition contested, each changing power. (p. 8)
- Early state statistics and “social physics” normalized populations via averages/normal distributions, enabling governance-by-number and new categories of deviance. (pp. 25–26, 36–38)
- Statistical methods become tools for policy and inequality: selective data, proxies, and inference rules can naturalize hierarchies; critique alone is insufficient without power to enforce accountability. (pp. 59–61, 71)
- Industrial and wartime imperatives drive method and infrastructure: hypothesis testing develops under cost constraints; codebreaking industrializes computation and real-time data processing. (pp. 81–83, 100–102)
- Postwar intelligence and corporate labs propagate scalable data storage/analysis; later, AI shifts from rules/logic toward data-driven prediction optimized on performance metrics. (pp. 112–114, 116, 163)
- Data science institutionalizes this metric-optimization logic within corporate strategy and products, while ethics and governance lag—creating contemporary struggles over privacy, fairness, and accountability. (pp. 183–185, 210, 226–231)
- The near future depends on current contests—legal, organizational, and collective—among corporate/state/people power; alternatives require more than “tech fixes.” (pp. 256, 268–274)
5) Key Concepts & Definitions (12–20 items, each cited)
- Data: shorthand for the “expanse of data-driven algorithmic decision-making systems” that surround us. (p. 6)
  - Role in argument: sets scope beyond “datasets” to socio-technical systems that create truth claims and power effects. (pp. 6–8)
- Politics: “dynamics of power.” (p. 7)
  - Role in argument: forces analysis of data systems as power arrangements, not neutral technical artifacts. (p. 7)
- Corporate power / state power / people power: the key triad for analyzing how data rearranges power. (p. 7)
  - Role in argument: the book uses this triad to organize both historical transitions and present/future contests (“unstable three-player game”). (pp. 7, 256)
- Quantitative reification: “magical” process by which numerical correspondence to observation becomes treated as a thing. (p. 5)
  - Role in argument: explains how metrics and categories gain authority and can mislead when conventions are mistaken for reality. (pp. 5, 71)
- Proxies: when a phenomenon (e.g., poverty) can’t be measured directly, one chooses a measurable stand-in. (p. 71)
  - Role in argument: proxies are design choices—necessary but non-neutral—driving misinterpretation and policy risk. (p. 71)
- Administrative categories: bureaucratic classifications that can be administered at scale and produce analyzable datasets. (p. 71)
  - Role in argument: these categories enable governance and analytics but can be mistaken for natural truths. (p. 71)
- Reification: “thinking that ideas are things”—making a “thing out of an abstraction”; a “dangerous mistake” in statistical reasoning. (p. 71)
  - Role in argument: central warning against treating administrative proxies (like pauperism) or statistical categories (race/intelligence) as fixed realities. (pp. 71, 79)
- Social physics: Quetelet’s term for identifying “numerical laws characterizing society.” (p. 36)
  - Role in argument: provides early template for using statistics to govern populations and define normality. (pp. 36–38)
- L’homme moyen (average man) + normal curve: Quetelet’s focus on the “average man” and the normal curve as a way to understand aggregate human variability. (pp. 25, 36)
  - Role in argument: underwrites normal/pathological thinking that later supports deviance classification and hierarchical social projects. (pp. 36–38)
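The “normal curve” in this item is the Gaussian density; writing it out makes the reification visible (standard notation, not quoted from the book):

```latex
f(x) \;=\; \frac{1}{\sigma\sqrt{2\pi}}\,
\exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)
```

Quetelet read the mean \(\mu\) as the “average man” and the spread \(\sigma\) as mere error, which is exactly the slide from description to underlying “law” the authors flag.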
- Eugenics: a term coined by Galton for a project of “improving” human “races,” blending statistics with heredity claims. (p. 41)
  - Role in argument: exemplar of data/method used to naturalize hierarchy and drive policy. (pp. 41–42, 56)
- Mathematical statistics + hypothesis/significance testing: Gosset/Fisher/Neyman develop methods for deciding among hypotheses using data (“mathematical statistics”). (pp. 81–82)
  - Role in argument: shifts scientific authority toward statistical decision procedures; later becomes institutional “gold standard” in domains like drug approval. (pp. 81–82, 95)
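The small-sample “deciding” procedure this item credits to Gosset can be sketched as a one-sample t statistic; a minimal illustration with invented measurements, not code from the book:

```python
import math

def one_sample_t(sample, mu0):
    """Student's t statistic for a small sample against a hypothesized mean mu0.

    Gosset's insight: with small n, the sample standard deviation is itself
    uncertain, so this ratio follows his t distribution, not the normal.
    """
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)  # unbiased sample variance
    se = math.sqrt(var / n)                               # standard error of the mean
    return (mean - mu0) / se

# Eight hypothetical small-batch measurements against a target of 10.0
t_stat = one_sample_t([10.2, 9.8, 10.5, 10.1, 9.9, 10.3, 10.0, 10.4], 10.0)
```

Whether the resulting t counts as “significant” depends on the threshold chosen, which is where Gosset’s “pecuniary advantage” reasoning enters: the cutoff is a cost decision, not a fact of nature.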
- Data as engineering: NSA communities treated data analysis “as an engineering problem more than a scientific one.” (p. 112)
  - Role in argument: highlights mission-driven logics (efficiency, scale, secrecy) shaping methods and infrastructures that later diffuse outward. (pp. 112–114)
- Knowledge acquisition bottleneck: difficulty translating human expertise into explicit rule bases for expert systems. (p. 131)
  - Role in argument: helps explain the move away from rules toward data-driven prediction/learning. (pp. 131–134)
- Big data (volume, variety, velocity): “big data” described as viable when cost-effective tools tame massive data volume/velocity/variability. (p. 135)
  - Role in argument: marks a phase where infrastructure scale and institutional accumulation of personal data transform privacy and power relations. (pp. 135–137)
- Common task framework: community organized around maximizing a shared score; enables “efficient and unemotional judging” and rapid deployment. (p. 181)
  - Role in argument: explains how machine learning progress accelerates via competitions and metric-optimization norms. (pp. 181–182)
- Key performance indicator (KPI): industry term for quantified goal correlated with business/product objectives (e.g., engagement). (p. 181)
  - Role in argument: ties machine learning optimization directly to organizational power and incentives (growth, attention capture). (pp. 181, 254)
- Data science: industrially, ML/statistics plus engineering and “concrete data work” to build products; academically, broader skills from “data janitorial work” to communication. (p. 184)
  - Role in argument: identifies the professional/institutional form that operationalizes ML at scale—and becomes a locus of authority/funding contests. (pp. 184–185, 206–207)
- Privacy-by-design techniques (k-anonymity, differential privacy): k-anonymity ensures no record is unique; differential privacy adds noise so the original database is never revealed, with explicit privacy–utility trade-offs. (p. 226)
  - Role in argument: shows “technical fixes” require subjective design choices and do not substitute for governance/power. (pp. 226–228)
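The privacy-utility trade-off named here is concrete in the Laplace mechanism behind differential privacy; a minimal sketch (the counting-query setting and numbers are my illustration, not the book’s):

```python
import random

def dp_count(true_count, epsilon):
    """Release a count with Laplace noise, a sketch of epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one person changes
    the count by at most 1), so Laplace noise with scale 1/epsilon masks any
    individual's presence. Smaller epsilon means stronger privacy and more noise.
    """
    scale = 1.0 / epsilon
    # The difference of two exponentials is a Laplace(0, scale) draw
    noise = random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)
    return true_count + noise

# Stronger privacy (epsilon = 0.1) yields noisier answers than epsilon = 1.0
noisy_answer = dp_count(1000, epsilon=0.1)
```

Choosing epsilon is exactly the “subjective design choice” the authors highlight: the math guarantees a trade-off but cannot say where to set it.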
- Fairness metrics (independence, separation, sufficiency): multiple plausible definitions; some mutually incompatible; politics shifts with definition. (pp. 227–228)
  - Role in argument: demonstrates limits of purely technical fairness optimization and the need to think in terms of justice and socio-technical systems. (pp. 228–229)
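The incompatibility claim can be made concrete: on the same toy predictions (numbers invented for illustration, not from the book), independence can hold while separation fails:

```python
def positive_rate(preds):
    """Fraction of positive predictions, regardless of true labels."""
    return sum(preds) / len(preds)

def true_positive_rate(preds, labels):
    """Fraction of actual positives that the model predicts positive."""
    pos = [p for p, y in zip(preds, labels) if y == 1]
    return sum(pos) / len(pos)

# Hypothetical labels/predictions for two demographic groups
group_a = {"labels": [1, 1, 0, 0, 1, 0], "preds": [1, 1, 1, 0, 0, 0]}
group_b = {"labels": [1, 0, 0, 0, 1, 1], "preds": [1, 0, 0, 0, 1, 1]}

# Independence: equal positive-prediction rates across groups
independence_gap = abs(positive_rate(group_a["preds"]) - positive_rate(group_b["preds"]))

# Separation: equal true-positive rates across groups, conditioning on the label
separation_gap = abs(
    true_positive_rate(group_a["preds"], group_a["labels"])
    - true_positive_rate(group_b["preds"], group_b["labels"])
)
```

Here both groups receive positive predictions at the same rate (independence gap of zero), yet actual positives in group a are caught far less often than in group b, so which definition you optimize is a political choice, not a technical one.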
- Persuasion architecture: composite of algorithm + product interface optimized for influence; now accessible to “anybody” via platform tools. (p. 248)
  - Role in argument: connects data/ML optimization to information power in markets and politics (microtargeting, consent engineering). (pp. 248–250)
6) Mechanisms / Causal Logic (cited)
- Mechanism 1: Quantification → reification → authority → governance
  - Claim: numerical summaries (averages/normal distributions) become treated as underlying social “laws,” enabling governance interventions based on statistical regularities. (pp. 36–38)
  - How it works: once an average is treated as “more real” than individuals, deviations can be framed as errors/pathologies demanding management. (pp. 36–38)
  - Conditions: requires institutionalized data collection and accepted statistical formalisms (state statistics; normal curve). (pp. 25–26, 36)
- Mechanism 2: Proxy choice + administrative categories → misinterpretation → policy harm
  - Claim: when direct measures are unavailable, proxies (like “pauperism”) and administrative categories make analysis possible but can be misread as natural facts. (p. 71)
  - How it works: categories “exist” via codification; reification (“ideas are things”) turns conventions into seemingly objective truths. (p. 71)
  - Conditions: highest risk when categories align with contested social hierarchies (race, intelligence, poverty). (pp. 71, 79)
- Mechanism 3: Problem-driven missions → methods + infrastructure → diffusion of capability
  - Claim: pursuit of specific solutions (beer quality; codebreaking) drives creation of methods/infrastructure that later generalize. (pp. 81–83, 100–101)
  - How it works: cost constraints and time pressure favor decision procedures and scalable computation (significance tests; bombes/Colossus; Bayesian enumeration). (pp. 83, 101–102)
  - Conditions: concentrated resources (industrial capital; wartime budgets), and organizational secrecy or coordination enabling scale. (pp. 81, 102)
- Mechanism 4: Metric optimization → community organization → rapid deployment (but value capture)
  - Claim: success in ML accelerates when problems are reframed as optimizing a clear metric within a common task framework. (pp. 163, 181)
  - How it works: competition on a single score enables large-scale coordination and produces systems “ready for immediate deployment,” while sidelining intelligibility and embedding value choices in metrics. (pp. 181–182)
  - Conditions: access to large datasets + incentives (prize money; corporate KPIs). (pp. 180–181, 254)
- Mechanism 5: Attention economy + microtargeting tools → scalable persuasion power
  - Claim: as information becomes cheap, attention becomes scarce/valuable; advertising + data/ML enables optimized persuasion architectures. (pp. 234, 248)
  - How it works: platforms collect data, segment audiences (e.g., lookalike audiences), and optimize delivery for engagement, enabling marketing-to-politics transfer. (pp. 248–250)
  - Conditions: large-scale surveillance data + monetization models (ad-supported platforms) + rapid scaling capital. (pp. 250–254)
- Mechanism 6: Ethics frameworks without power → “inert” ethics; tech-fix limits
  - Claim: applied ethics (IRBs/Belmont) provides principles and procedures, but corporate incentives and capture can neutralize ethics; technical fixes require governance. (pp. 214, 230)
  - How it works: firms may adopt ethics processes to preempt regulation; “ethical AI” funding structures can redirect critique; fairness/privacy tech cannot fix structural injustice alone. (pp. 213–214, 229–230)
  - Conditions: depends on who has authority to enforce constraints; otherwise accountability is “without effects.” (pp. 61, 230)
7) Evidence, Cases, and Illustrations (cited)
- Fairness/Accountability/Transparency and data availability bias: Hanna Wallach warns algorithmic systems reflect values/assumptions and that “social data” is plentiful but systematically missing key human factors (e.g., “happiness,” “drive”), risking biased models. (pp. 12–13)
  - What the authors use it to show: early framing of the modern stakes—why “fairness, accountability, and transparency” emerge as central in data politics. (pp. 12–14)
  - How strong is the inference? Strong for the claim that value judgments + measurement constraints shape outcomes; it’s explicit and directly tied to modern system design. (pp. 12–13)
- Quetelet + Nightingale on evidence-based governance: the authors attribute to Nightingale a call for governments to legislate only with “true knowledge of the laws of social life,” enabled by Quetelet’s statistical projects. (p. 25)
  - What the authors use it to show: early fusion of data with statecraft and social improvement ambitions. (pp. 25–26)
  - How strong is the inference? Moderate-to-strong: presented as illustrative intellectual lineage; the book uses it as a framing exemplar of “social physics” ambitions. (pp. 25–26)
- Mahalanobis and the limits of statistical inference about people: Mahalanobis critiques Pearson/Seal’s caste correlations and insists on combining quantitative methods with “ethnological evidence” and contextual explanation. (pp. 56–58)
  - What the authors use it to show: “hubris” risks in quantifying human difference; importance of data choice and domain knowledge in interpretation. (pp. 58, 79)
  - How strong is the inference? Strong for the claim that method alone is insufficient; Mahalanobis is quoted and positioned as a counter-model. (pp. 56–58)
- Hoffman/Prudential race statistics vs Du Bois critique: Hoffman argues data prove racial hierarchy and supports discriminatory insurance practices; Du Bois attacks Hoffman’s data choice and reasoning, stressing socioeconomic causes and limits of inference. (pp. 59–61)
  - What the authors use it to show: data can be mobilized to naturalize inequality; even correct audits may fail “without some configurations of power or of publicity.” (p. 61)
  - How strong is the inference? Strong historically within the chapter: the authors connect analysis to real policy outcomes (insurance denial; Jim Crow veneer) and emphasize power dynamics. (pp. 60–61)
- Proxies, administrative categories, and reification (pauperism): poverty debates used “pauperism” as a proxy—an administrative category—creating analytical tractability but reification risk (“slippage from the process to the thing”). (p. 71)
  - What the authors use it to show: “design choices” in measurement are political and can distort truth claims—especially in domains like race/intelligence. (p. 71)
  - How strong is the inference? Strong: the text explicitly defines the mechanism and generalizes its danger. (p. 71)
- Guinness, Gosset, and significance as cost-based decision: Gosset develops small-sample inference and insists “certainty” targets depend on the “pecuniary advantage” relative to experimental costs. (pp. 81–83)
  - What the authors use it to show: methods are shaped by incentives and constraints; “deciding” differs across statistical philosophies (profit, truth, rational choice). (pp. 81–82)
  - How strong is the inference? Strong: supported by explicit narrative and quoted correspondence logic tying method to economics. (p. 83)
- Bletchley Park/Colossus and industrial-scale data analysis: Bletchley combines computation, labor, and data for codebreaking; later effort scaled so non-elite staff could process “thousands of items of data, at speed,” with secrecy obscuring contributions (especially women operators). (pp. 100–102)
  - What the authors use it to show: war drove the industrialization of data analysis and early “computers,” setting patterns for later large-scale data work. (pp. 100–103)
  - How strong is the inference? Strong: the chapter directly argues Bletchley is a “watershed” where data became pragmatic engineering at scale. (p. 100)
- NSA and “data as engineering” (Bayes at scale): declassified NSA materials show large-scale Bayesian hypothesis testing, cost constraints, and rejection of philosophical objections given mission needs; “a million or more consecutive experiments.” (p. 112)
  - What the authors use it to show: intelligence missions shaped scalable real-time analytics; different values dominate than in academia (efficiency, COMSEC). (pp. 112–113)
  - How strong is the inference? Moderate-to-strong: bounded by classification limits the authors acknowledge, but supported by explicit declassified examples and direct quotations. (p. 112)
- SABRE/SAGE: military-to-commercial real-time data infrastructure transfer: IBM pitched airline reservation systems as data-processing; SABRE drew lessons from SAGE air-defense networking and real-time decision infrastructure; the same “frontier technologies” moved from enemy-aircraft data to customer data. (pp. 135–136)
  - What the authors use it to show: big data needs big infrastructure, often built first in military/intelligence contexts then commercialized—rearranging corporate and state power. (pp. 135–137)
  - How strong is the inference? Strong as a concrete transfer narrative; the authors directly frame it as technology transfer from defense to business. (pp. 135–136)
- NIST OCR competition and metric-driven ML culture: handwriting recognition evaluations prize accuracy/efficiency at scale; this shift in values (optimize a metric) helped organize communities and foreshadow ML’s later trajectory. (pp. 160–163)
  - What the authors use it to show: how “optimization of metrics for real world applications” becomes a dominant value shaping ML and data science. (p. 163)
  - How strong is the inference? Strong within the chapter’s logic: the authors explicitly connect competition design to value shifts and later ML expansion. (p. 163)
- Netflix Prize and common task framework: Netflix releases a huge ratings dataset and offers $1M for a 10% improvement; competition incentivizes performance-first ensembles and produces a “common task” culture. (pp. 180–182)
  - What the authors use it to show: large datasets + single-score evaluation accelerate ML and deployment norms (KPI logic), sidestepping interpretability. (pp. 181–182)
  - How strong is the inference? Strong: the chapter explicitly interprets the competition as emblematic of metric optimization shaping modern AI. (pp. 181–182)
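The Netflix Prize’s “single score” was root-mean-square error over held-out ratings; a minimal sketch of that metric (the 0.9514 baseline is the widely reported Cinematch figure, not taken from the book):

```python
import math

def rmse(predicted, actual):
    """Root-mean-square error, the one number the Netflix Prize leaderboard ranked on."""
    return math.sqrt(
        sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
    )

# The $1M went to the first team beating Netflix's Cinematch baseline by 10%
baseline = 0.9514                    # widely reported Cinematch RMSE on the quiz set
winning_threshold = 0.9 * baseline   # roughly 0.8563
```

Collapsing recommendation quality to one scalar is what made “efficient and unemotional judging” possible, and also what sidelined questions the metric cannot see.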
- Corporate ethics conflicts (Google Ethical AI team; Facebook emotional contagion): Google’s ethics team collapses amid firings/contested claims over harms of large language models; Facebook responds to backlash and regulatory threat by “evolving” IRB-like review for industry research. (pp. 212–214)
  - What the authors use it to show: ethics programs are constrained by profit models and governance; self-regulation can be preemptive and fragile. (pp. 213–214, 230)
  - How strong is the inference? Strong for internal tension; the authors narrate events and explicitly tie them to incentive structures and regulatory risk. (pp. 212–214)
- Fairness/justice tensions in algorithmic systems (COMPAS; multiple fairness definitions): ProPublica’s “Machine Bias” dispute shows competing definitions of fairness; technical fixes can’t resolve structural issues like feedback loops in policing; the authors argue for “data justice” beyond ethics. (pp. 227–229)
  - What the authors use it to show: fairness is not a single technical target; definitions carry politics; the algorithm is part of a socio-technical system. (pp. 227–229)
  - How strong is the inference? Strong: the text explicitly states incompatibilities and limits, grounding them in a well-known case and formal measures. (pp. 227–229)
- Persuasion architectures and microtargeting (Bernays → platform tools): “Torches of freedom” was engineered PR; modern platforms democratize persuasion via tools like lookalike audiences, transferring adtech into politics with capabilities changing faster than norms/laws. (pp. 232, 248–250)
  - What the authors use it to show: data+ML amplify persuasion power and blur marketing/politics—strategic implications for democracy and governance. (pp. 248–250)
  - How strong is the inference? Moderate: the authors explicitly connect tools to political use, but also stress no consensus on ultimate effects of attempted manipulation. (p. 250)
- Venture capital + blitzscaling + KPI-driven growth: VC enables rapid scale ahead of norms/regulation (Uber/WeWork examples); platform “growth” teams optimize KPIs and treat “connecting more” as de facto good. (pp. 253–254)
  - What the authors use it to show: financial arrangements and metrics embed value choices and accelerate power concentration—often before governance catches up. (pp. 253–254)
  - How strong is the inference? Strong about acceleration dynamics; the chapter explicitly contrasts decades-long norm formation (cars) with software/VC pace. (pp. 251, 253–254)
- Section 230 + people power as “friction”: Section 230’s immunity shapes platform power; people power can act internally (leaks, salary transparency, unions) and externally (data/talent boycotts) to introduce “friction.” (pp. 268–273)
  - What the authors use it to show: future outcomes are contested through law and collective action—not solely technology. (pp. 268–274)
  - How strong is the inference? Moderate-to-strong: the authors present these as plausible contest arenas and mechanisms, not deterministic predictions. (pp. 256, 268–274)
8) Chapter/Section Map (high-yield, cited)
-
Prologue (sets the problem + framework):
-
Defines “data” as algorithmic decision systems and positions the project as a “history of truth and power.” (p. 6)
-
States the central analytic aim: explain how data rearranges corporate/state/people power. (p. 7)
-
Lays out an “actionable” historical method focused on transitions, design choices, and contingency (break/reset possibility). (pp. 8–10)
-
-
Part I (from statecraft/social improvement to mathematical statistics):
-
Framed as moving from data in statecraft to using data to “improve society” and the “mathematical baptism” into “mathematical statistics.” (p. 8)
-
Stakes: transitions reshape rights/harms/justice, not just military/financial power. (p. 8)
-
Ends with statistical decision procedures institutionalized and funded through wartime/postwar structures. (pp. 95–97)
-
-
Ch 1 — The Stakes:
-
Establishes modern stakes around fairness/accountability/transparency and the inevitability of values entering systems via data limits. (pp. 12–13)
-
Explains numerical accountability as a response to distrust of expert “black boxes.” (p. 23)
-
Argues a reversal: accountability metrics often make citizens/workers transparent to institutions via secret algorithms. (p. 24)
-
-
Ch 2 — Social Physics and l’homme moyen:
-
Introduces Quetelet’s “social physics” and “average man” as a way to govern via statistical regularities. (pp. 25–26, 36)
-
Shows reification: averages and normal curve become “underlying truths and causes,” not mere descriptions. (p. 36)
-
Links these statistical imaginaries to politics of gradual reform and control. (pp. 26, 38)
-
-
Ch 3 — The Statistics of the Deviant:
-
Traces how statistics of human difference (Galton/Pearson) power projects like eugenics and deviance classification. (pp. 40–42)
-
Uses Mahalanobis as a cautionary counterpoint: insistence on contextual “ethnological evidence” and interpretive restraint. (pp. 56–58)
-
Draws lesson: statistical tools can reinforce hierarchy unless checked by rigorous inference and political vigilance. (pp. 56–58)
-
-
Ch 4 — Data, Intelligence, and Policy:
-
Race science case: Hoffman uses insurance data to argue innate racial inferiority; data becomes policy weapon. (pp. 59–60)
-
Du Bois and other critics show how data choice and inference errors drive harmful conclusions; auditing needs power to matter. (pp. 60–61)
-
Develops conceptual warnings about proxies, administrative categories, and reification in policy data. (p. 71)
-
-
Ch 5 — Data’s Mathematical Baptism:
-
Shows significance/hypothesis testing emerging from industrial constraints (Guinness), not abstract purity. (pp. 81–83)
-
Contrasts Gosset (profit/engineering), Fisher (scientific truth), Neyman (rational choice) as different “deciding” philosophies. (pp. 81–82)
-
Links postwar military funding to abstraction and the institutional location of statistics as “mathematics.” (pp. 95–97)
-
-
Part II (war, computation, big data, ML, data science):
-
Opens with WWII codebreaking as martial data practice and the birth of digital computation. (pp. 8, 100–101)
-
Follows Bletchley → Bell Labs → business engineering applications and personal record keeping/privacy concerns. (pp. 8, 114–115, 135–137)
-
Tracks AI’s shift from rule-based dreams to data-driven prediction and the rise of data science at corporate scale. (pp. 116, 181–185)
-
-
Ch 6 — Data at War:
-
Bletchley industrializes codebreaking via specialized machines and scaled labor; secrecy shapes historical memory. (pp. 100–102)
-
NSA institutionalizes real-time data + scalable Bayesian hypothesis testing under mission constraints (“COMSEC point of view”). (p. 112)
-
Bridges to Bell Labs/Tukey and business adoption of large-scale data storage/analysis. (pp. 114–115)
-
-
Ch 7 — Intelligence without Data:
-
Key claim: early AI pursued “intelligence” via logic/rules, not data; modern meaning differs. (p. 116)
-
Expert systems hit “knowledge acquisition bottleneck,” exposing limits of rule-encoding expertise. (p. 131)
-
“Back to data”: pattern recognition and learning from large datasets thrive “behind the fence” with military-industrial ties. (pp. 133–134)
-
-
Ch 8 — Volume, Variety, and Velocity:
-
Connects military/intelligence computing to commercial big-data infrastructure (SAGE → SABRE; IBM). (pp. 135–136)
-
Frames privacy/justice issues as personal record keeping expands across institutions. (p. 136)
-
Shows metric-driven evaluation culture (OCR competition) as an organizing force for ML development. (pp. 160–163)
-
-
Ch 9 — Machines, Learning:
-
Argues ML narrows to prediction/classification; this narrowing enables the contemporary “AI” boom. (p. 165)
-
Traces neural nets’ fall/rise and the shift away from symbolic reasoning toward statistical methods. (pp. 166–167)
-
Uses Netflix Prize/common task to illustrate performance-first norms, KPI logics, and deployment readiness. (pp. 180–182)
-
-
Ch 10 — The Science of Data:
-
Defines “data science” as a hybrid of ML/statistics plus engineering and “grubby” data work; job role emerges in industry. (pp. 183–185)
-
Emphasizes institutional contest over authority/funding/disciplinary identity (“own data science”). (pp. 185, 206)
-
Bridges to ethics: early recognition of ethical stakes, but “ethics without expertise” and scaling problems. (p. 210)
-
-
Part III (ethics, persuasion economics, future contests):
-
Tracks applied ethics origins (Belmont/IRBs) and corporate self-regulation tensions. (pp. 214, 230)
-
Explains how ad-supported persuasion and venture capital accelerate platform power and political influence. (pp. 248–254)
-
Concludes with future framed as contests among corporate/state/people power with actionable levers. (pp. 256, 268–274)
-
-
Ch 11 — The Battle for Data Ethics:
-
Case studies (Google, Facebook) show organizational power conflicts in embedding ethics. (pp. 212–214)
-
Surveys technical privacy/fairness solutions and their trade-offs; fairness has multiple incompatible definitions. (pp. 226–228)
-
Critiques “tech fix” approaches and highlights capture dynamics; stresses ethics needs power and justice framing. (pp. 229–230)
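The incompatible-definitions point above can be made concrete with a toy sketch (hypothetical numbers, not from the book): when two groups have different base rates, a classifier can equalize error-style criteria across groups while selection rates still diverge, so equalizing one criterion breaks another.

```python
from collections import Counter

def rates(records):
    """Compute three group-fairness quantities from (actual, predicted) pairs."""
    c = Counter(records)
    tp, fp, fn = c[(1, 1)], c[(0, 1)], c[(1, 0)]
    n = sum(c.values())
    selected = tp + fp
    return {
        "selection_rate": selected / n,   # "independence"-style criterion
        "tpr": tp / (tp + fn),            # "separation"-style criterion
        "precision": tp / selected,       # "sufficiency"-style criterion
    }

# Toy groups with different base rates (0.5 vs 0.2), as (actual, predicted) pairs
group_a = [(1, 1)] * 40 + [(1, 0)] * 10 + [(0, 1)] * 10 + [(0, 0)] * 40
group_b = [(1, 1)] * 16 + [(1, 0)] * 4 + [(0, 1)] * 4 + [(0, 0)] * 76

for g in (group_a, group_b):
    print(rates(g))
```

Here both groups get identical TPR and precision, but selection rates differ with the base rates; forcing equal selection rates would necessarily unbalance the other two.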
-
-
Ch 12 — Persuasion, Ads, and Venture Capital:
-
Roots persuasion in PR (Bernays) and frames attention as scarce resource in an information-rich world. (pp. 232, 234)
-
Shows microtargeting and persuasion architectures as democratized, blurring commerce and politics; warns norms/laws lag. (pp. 248–250)
-
Explains VC/blitzscaling and KPI-driven “growth” as accelerants of power concentration and ecosystem harms. (pp. 251, 253–254)
-
-
Ch 13 — Solutions beyond Solutionism:
-
Reframes “future of data” as near-term contests among powers, not prophecy; aims for democracy/justice compatibility. (pp. 256, 274)
-
Identifies key arenas (e.g., antitrust, Section 230) that may rebalance state vs corporate power. (pp. 268–269)
-
Argues alternatives exist (privacy not “dead”; surveillance ads not necessary); change is slow, granular, coalition-driven—not a simple technical tweak. (p. 274)
-
9) Answers to Seminar Questions
How important is it to understand the source of data and how it was curated?
-
Direct answer (3–6 bullets, cited)
-
Central: the authors explicitly include “how data was created and curated” as part of what must be explained—not just algorithms. (p. 6)
-
Data availability and measurement constraints shape what models can represent (e.g., abundant “actions” data but missing key human factors), producing systematic bias risk. (pp. 12–13)
-
Proxy choice and administrative categories are non-neutral design decisions; reification mistakes can convert conventions into “truth,” especially in policy contexts. (p. 71)
-
Historical harms (e.g., race statistics and insurance) hinge on what data is selected and how it’s interpreted; critique focuses on “choice of data” and limitations. (p. 60)
-
At scale, institutional power determines whether audits of data/algorithms have effect; “It isn’t enough to be right.” (p. 61)
-
-
Best supporting passage (paraphrase + cite; optional short quote ≤25 words)
-
Paraphrase: The book frames its scope as explaining both curation and technique, implying the data’s origin is constitutive of the system’s power. (p. 6)
-
Short quote: “We explore how data was created and curated…” (p. 6)
-
-
Limitation/counterpoint grounded in the text (cited)
- The authors note their material focuses primarily on the United States, so “source/curation” lessons are demonstrated mainly through US-centric institutions and histories. (p. 10)
-
One discussion question I can ask the room
- If “curation” is a power lever, what institutional arrangements (military or civilian) make curation auditable without compromising operational security?
In what way does data rearrange power?
-
Direct answer (3–6 bullets, cited)
-
The authors’ explicit claim: data rearranges power among corporations, states, and people; this is the book’s organizing goal. (p. 7)
-
Data systems change who can do what “from what, and to whom” by enabling new decisions and interventions at scale. (p. 8)
-
Numerical accountability can check institutional “black boxes,” but the same mechanisms can be turned to surveil and discipline lower-status groups (workers/citizens). (pp. 23–24)
-
Intelligence and corporate sectors build large-scale data infrastructures; capabilities migrate across sectors (e.g., defense → airlines/customers). (pp. 135–136)
-
The present and future are framed as contests among powers, shaped by law and collective action—not tech determinism. (pp. 256, 268–274)
-
-
Best supporting passage (paraphrase + cite; optional short quote ≤25 words)
-
Paraphrase: The prologue states the objective is a framework for data’s “persistent role” in rearranging the triad of powers. (p. 7)
-
Short quote: “rearranging power: corporate power, state power, and people power.” (p. 7)
-
-
Limitation/counterpoint grounded in the text (cited)
- The authors acknowledge that knowing/analysis (e.g., audits) doesn’t automatically produce change; effects require configurations of power/publicity. (p. 61)
-
One discussion question I can ask the room
- Which form of power (state, corporate, people) is currently the key “swing player” in your domain—and what evidence would change your assessment?
How does the pursuit of a solution drive data versus data enabling the development of various solutions?
-
Direct answer (3–6 bullets, cited)
-
The authors repeatedly show solution-seeking drives method and data practice: Gosset builds significance testing to make profitable brewing decisions under small-sample constraints. (pp. 81–83)
-
Wartime solution pressure drives industrial-scale data analysis and specialized computation (Bletchley’s codebreaking; time and scale constraints). (pp. 100–102)
-
Metric-optimization contests (OCR, Netflix) demonstrate an engineering style: define a score, assemble data, and compete to improve performance—organizing whole communities. (pp. 163, 181)
-
Once infrastructures exist (databases, networked systems), accumulated data enables new solutions (e.g., personalized targeting; recommendation systems) that were not the original purpose. (pp. 135–137, 248–250)
-
The authors treat “solutions” as value-laden: optimizing a KPI is a choice that privileges some outcomes (engagement/growth) over others (justice, intelligibility). (pp. 181, 254)
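The "define a score, assemble data, compete" pattern above can be sketched minimally (hypothetical submissions; the Netflix Prize's actual leaderboard metric was RMSE on held-out ratings):

```python
import math

def rmse(truth, preds):
    """Root-mean-square error: a single agreed score for ranking entries."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, preds)) / len(truth))

def leaderboard(truth, submissions):
    """Rank competing submissions by the one shared metric -- the common
    task pattern: shared held-out data, shared score, open competition."""
    return sorted(((name, rmse(truth, preds)) for name, preds in submissions.items()),
                  key=lambda pair: pair[1])

held_out = [4.0, 3.0, 5.0, 2.0]                  # hypothetical held-out ratings
submissions = {
    "baseline_mean": [3.5, 3.5, 3.5, 3.5],
    "team_a": [4.2, 2.9, 4.8, 2.3],
}
for name, score in leaderboard(held_out, submissions):
    print(name, round(score, 3))
```

The point of the sketch is the organizing power of the single number: whatever the score ignores, the competition ignores too.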
-
-
Best supporting passage (paraphrase + cite; optional short quote ≤25 words)
-
Paraphrase: Bletchley is presented as a shift where data becomes “engineering and problem-solving,” not academic truth-seeking. (p. 100)
-
Short quote: “marks a watershed… when data leapt to a pragmatic new existence defined by engineering and problem-solving.” (p. 100)
-
-
Limitation/counterpoint grounded in the text (cited)
- The book also emphasizes that infrastructures for collecting/making data public are a separate “hard work” track from methods; not all “solution pursuit” is algorithmic—sometimes it is institution-building. (p. 8)
-
One discussion question I can ask the room
- In military planning, where do we see “KPI-ification” of objectives—and what important outcomes become invisible when the metric becomes the mission?
Does it matter why a leader, nation, or strategist seeks data and how does that affect its use?
-
Direct answer (3–6 bullets, cited)
-
Yes: the book contrasts motivations—profit (Guinness), war (Bletchley/NSA), governance (state statistics), persuasion (platform advertising)—and shows they shape methods, values, and acceptable trade-offs. (pp. 81–83, 100–102, 248–250)
-
Mission logics determine epistemic standards: NSA analytics prioritize efficiency and COMSEC rather than academic philosophical rigor. (p. 112)
-
Data can be sought to justify preexisting hierarchy (Hoffman’s racial claims), in which case “data” becomes a legitimating weapon; critics focus on data choice/inference. (pp. 59–60)
-
When data is sought for persuasion and growth, systems are optimized for engagement and targeting, enabling manipulative/discriminatory uses by states and corporations alike. (pp. 250, 254)
-
Ethical commitments often collide with profit motives; corporate “ethics teams” can become “fig leaf” structures when incentives and power align against critique. (p. 258)
-
-
Best supporting passage (paraphrase + cite; optional short quote ≤25 words)
- Paraphrase: The NSA example explicitly frames mission-driven analytics where a “COMSEC point of view” renders some statistical approaches impractical; values drive method choice. (p. 112)
-
Limitation/counterpoint grounded in the text (cited)
- The authors caution against assuming persuasive technologies work as claimed; on political manipulation (e.g., Cambridge Analytica), they note there is “no consensus on the ultimate effects.” (p. 250)
-
One discussion question I can ask the room
- How do we institutionalize “motivation audits” (why we seek data) in strategic organizations without turning them into box-checking exercises?
How does this reading affect the way you may consider the military’s pursuit of data for decision making, both today and in the future?
-
Direct answer (3–6 bullets, cited)
-
The book situates military/intelligence as foundational in scaling data collection and computation (Bletchley → NSA), suggesting the military’s data pursuit is historically a major driver of capabilities that later diffuse widely. (pp. 100–103, 112–114)
-
Military/intelligence data practice is framed as engineering under constraints: speed, volume, secrecy, and cost trade-offs can override academic standards and visibility. (p. 112)
-
Decision advantage is not just “more data”: the authors emphasize infrastructure, curation, and problem framing (metrics) as decisive; competitions and KPIs can narrow objectives to what is measurable. (pp. 8, 163, 181)
-
Ethical governance cannot be bolted on: applied ethics traditions (Belmont/IRB) and modern algorithmic accountability struggles show the need for institutions with enforcement power. (pp. 214, 230)
-
The future is framed as contestable: organizations can choose alternatives to opaque or unethical systems; “tech fixes” alone won’t solve legitimacy and justice problems. (pp. 229, 274)
-
-
Best supporting passage (paraphrase + cite; optional short quote ≤25 words)
- Paraphrase: NSA work illustrates how mission demands prioritize scalable Bayesian testing, with huge experiment volume and attention to computational cost; this is a distinct model of “data for decision.” (p. 112)
-
Limitation/counterpoint grounded in the text (cited)
- Classified domains restrict visibility and critique; the authors explicitly rely on “a small number of declassified works,” implying our understanding of military data practice is inherently partial. (p. 112)
-
One discussion question I can ask the room
- If “ethics without power may be inert,” what does “powerful ethics” look like inside military data pipelines—procurement, model validation, deployment, and post-deployment oversight? (p. 230)
10) Strengths / Weaknesses / Gaps (cited where applicable)
Strengths
-
Clear analytic frame linking data to truth and power and explicitly organizing around corporate/state/people power rather than “tech progress” narratives. (pp. 6–7)
-
Strong mechanism-first historical method: each chapter treated as a contested transition shaped by funders, institutions, and social stakes (rights/harms/justice). (p. 8)
-
Explicitly “actionable” posture—argues contingency and design choices matter, so strategy and governance can change futures (anti-determinism). (pp. 9–10)
Weaknesses / contestable assumptions
-
Scope is acknowledged as primarily US-focused, which may underrepresent alternative state–society–market configurations shaping data politics elsewhere. (p. 10)
-
Some contemporary influence claims are necessarily underdetermined: on political manipulation effects, authors note “no consensus,” limiting causal certainty for strategy arguments. (p. 250)
-
“Technical fix” critiques are persuasive, but the book sometimes must generalize across heterogeneous institutions (academia, big tech, intelligence) whose incentive structures differ; readers may want tighter comparative treatment. (pp. 112, 230)
Gaps / “what’s missing”
-
Non-US/non-Western data histories beyond selected cases are not foregrounded, consistent with the authors’ own scope statement. (p. 10)
-
Readers interested in detailed operational doctrine for algorithmic accountability (metrics, audits, enforcement) may find the book emphasizes political conditions more than step-by-step institutional design. (pp. 19, 61)
-
Cyber operations as such (offense/defense doctrine) appear indirectly via intelligence/data infrastructure rather than as a standalone domain; the book’s focus is data power writ large. (pp. 99–103, 112–114)
11) “So What?” for Strategy + This Course (cited)
-
Treat “data” as a strategic capability stack (collection/curation + methods + institutions), not a commodity: power accrues to those who control infrastructure and the rules of inference/decision. (pp. 6, 8)
-
Data politics is triadic: strategists should map effects across state, corporate, and people power—and anticipate shifts among them (including coalition dynamics). (pp. 7, 256)
-
Measurement is a battlefield: proxies and administrative categories can be exploited, contested, or weaponized; “reification” is a predictable failure mode in policy analytics. (p. 71)
-
“Accountability” is not a report—it is institutional force: audits and correct analyses can be inert without power/publicity mechanisms that compel change. (pp. 19, 61)
-
Military/intelligence pursuit of data historically drives scalable computation and real-time analytics; these capabilities routinely diffuse to commerce and politics, shaping the wider information environment the military operates within. (pp. 112–115, 135–136)
-
AI/ML’s modern form is optimization of metrics: strategic risk emerges when organizations confuse KPI performance with mission success, narrowing objectives to what is measurable. (pp. 181, 254)
-
Influence operations should be analyzed through “persuasion architectures”: data + ML + interface design + ad markets blur marketing/politics and enable granular targeting. (pp. 248–250)
-
Governance levers (antitrust, platform liability, privacy regimes) are strategic terrain that can rebalance corporate–state power and change platform operating models. (pp. 268–269)
-
People power (workers, users, publics) can impose friction via collective action, boycotts, and internal regulation-like behavior—potentially shaping platform choices and data practices. (pp. 270–273)
-
Anti-determinist implication: alternatives exist; strategy should include building/choosing different systems and coalitions, not just “adding a term to a cost function.” (p. 274)
12) Paper Seed: How I Can Use This Book (cited)
Tailor to {{PAPER_PROMPT}} and {{MY_CASES}}:
-
Working paper prompt / research question: Not provided by user in chat.
-
My likely case(s)/actors/region(s): Not provided by user in chat.
3–5 candidate thesis statements I could write
-
Thesis: Data-driven decision systems are a form of cyber power because they institutionalize a “data as engineering” logic that privileges speed/scale/secrecy, reshaping what counts as actionable truth in security organizations. (p. 112)
-
Supporting claims (book-grounded)
-
Intelligence communities institutionalized real-time data collection and scalable Bayesian testing, explicitly prioritizing mission constraints over academic standards. (p. 112)
-
Wartime codebreaking shows the industrialization of data analysis/computation as a decisive strategic capability. (pp. 100–101)
-
The book’s frame treats data as rearranging power among state/corporate/people power—security organizations sit inside that triad. (p. 7)
-
-
Likely counterarguments (label inference if not in book)
- Inference: In some security contexts, secrecy and speed are necessary; ethical transparency requirements may be strategically infeasible.
-
Strongest evidence in the book
- The NSA “engineering” discussion and scale/cost/Bayes passages. (p. 112)
-
-
Thesis: “Data curation” is the underappreciated locus of strategic advantage and vulnerability: proxy and category choices drive reification, enabling both governance and systematic harm; contesting curation is contesting power. (pp. 6, 71)
-
Supporting claims (book-grounded)
-
The authors explicitly treat how data is “created and curated” as part of what must be analyzed. (p. 6)
-
Proxy selection and administrative categories are non-neutral design choices that can be reified into false “things.” (p. 71)
-
Historical cases show data selection/inference can justify policy harm (Hoffman) while critique can fail absent power. (pp. 60–61)
-
-
Likely counterarguments (inference)
- Inference: Some domains can mitigate proxy bias through robust experimentation or cross-validation; harms are not inevitable.
-
Strongest evidence in the book
- The proxy/reification explanation and Hoffman/Du Bois case. (pp. 59–61, 71)
-
-
Thesis: The fusion of machine learning metric optimization (common task + KPIs) with persuasion architectures constitutes an information-power regime that blurs commerce and politics, creating strategic risks for democratic governance. (pp. 181, 248–250)
-
Supporting claims (book-grounded)
-
Machine learning’s modern success is tied to common tasks and single-metric optimization that enables rapid deployment. (pp. 181–182)
-
Platforms provide microtargeting tools (lookalike audiences) that democratize persuasion without requiring deep psychological expertise. (p. 248)
-
Capabilities are “changing faster than our norms, rules, and laws,” implying governance lag as a strategic vulnerability. (p. 250)
-
-
Likely counterarguments (book-grounded + inference)
-
Book-grounded: authors say there is “no consensus on the ultimate effects” of some political manipulation efforts. (p. 250)
-
Inference: Effects may be context-dependent and mitigable by institutional resilience or media literacy.
-
-
Strongest evidence in the book
- Persuasion architecture + norms/laws lag; ML metric culture. (pp. 248–250, 181–182)
-
-
Thesis: Ethical AI cannot be treated as a technical subsystem; without enforcement power and justice-oriented governance, ethics initiatives risk capture and irrelevance—making “ethics” itself an arena of power competition. (pp. 229–230)
-
Supporting claims (book-grounded)
-
Corporate ethics teams can function as internal critique models but face revenue-aligned power dynamics (“fig leaf” framing). (p. 258)
-
Fairness has multiple incompatible technical definitions; technical optimization cannot on its own address structural injustice. (pp. 227–229)
-
The authors explicitly argue: “ethics without power may be inert.” (p. 230)
-
-
Likely counterarguments (inference)
- Inference: Some technical privacy/fairness solutions meaningfully reduce harm; incrementalism might outperform structural reforms in the near term.
-
Strongest evidence in the book
- Regulatory capture discussion + ethics/power line + justice vs fairness critique. (pp. 229–230)
-
-
Thesis: In the near future, the decisive battles over data power will be institutional and legal (antitrust, platform liability like Section 230) combined with people-power “friction,” not merely algorithmic innovation. (pp. 268–273)
-
Supporting claims (book-grounded)
-
The authors frame the future as resolution of present contests among corporate/state/people power. (p. 256)
-
Section 230’s immunity shapes platform behavior and may face reinterpretation/legislation, changing algorithmic deployment incentives. (pp. 268–269)
-
People power mechanisms (leaks, unionization, boycotts) can introduce friction and shape company choices. (pp. 270–273)
-
-
Likely counterarguments (inference)
- Inference: Legal changes may be slow or captured; firms can adapt around regulation; global jurisdiction fragmentation can blunt effects.
-
Strongest evidence in the book
- Section 230 discussion + people-power mechanisms and “friction” concept. (pp. 268–273)
-
“Evidence blocks” (bullet form, not full prose)
-
Claim: Data curation choices are political levers. → Evidence: authors foreground “created and curated” data as part of what matters. → Warrant: what gets measured and how shapes what can be decided. → Implication: treat data pipelines as strategic terrain, not neutral inputs. (p. 6)
-
Claim: Proxies/admin categories produce predictable reification risk. → Evidence: pauperism as proxy; reification defined as making conventions into “things.” → Warrant: administrative codification is not natural reality. → Implication: audit categories and incentives before trusting “data-driven” policy. (p. 71)
-
Claim: Without power, accountability/audits may not change outcomes. → Evidence: “It isn’t enough to be right,” even when audits are illuminating. → Warrant: institutional/publicity power determines effects. → Implication: build enforcement and publicity mechanisms into governance of algorithmic systems. (p. 61)
-
Claim: Intelligence missions shape analytics toward engineering and scale. → Evidence: NSA “engineering” framing; million experiments; cost concerns. → Warrant: mission constraints set epistemic standards and tool choices. → Implication: military “data advantage” depends on infrastructure + governance, not just sensors. (p. 112)
-
Claim: ML progress and deployment are driven by metric optimization cultures. → Evidence: common task framework and deployment readiness; KPI logic. → Warrant: single-score evaluation organizes communities and products. → Implication: strategic oversight must scrutinize metric selection as value selection. (pp. 181–182)
-
Claim: Persuasion architectures + microtargeting change information power faster than governance adapts. → Evidence: democratized persuasion tools; “changing faster than our norms, rules, and laws.” → Warrant: institutional lag yields vulnerability. → Implication: prioritize governance, transparency, and resilience alongside technical defenses. (p. 250)
13) Quote Bank (each quote ≤25 words, cited)
-
“We use ‘data’ here as shorthand for the expanse of data-driven algorithmic decision-making systems surrounding us nearly everywhere.” (p. 6)
-
“Our goal is to provide a framework for understanding the persistent role of data in rearranging power: corporate power, state power, and people power.” (p. 7)
-
“Statistics doesn’t simply represent the world. It transforms how we categorize and view the world.” (p. 61)
-
“Administrative categories are powerful conventions… but are not truths of nature, just existing to be found.” (p. 71)
-
“The only useful function of a statistician… is to make predictions, and thus to provide a basis for action.” (p. 95)
-
“the task was not to provide a fertile habitat for individual genius, but rather to scale up and industrialise the techniques” (p. 101)
-
“You might expect the analysis of data to be central to this project. It wasn’t.” (p. 116)
-
“Data Science represents an inevitable (re)-merging of computational and statistical thinking in the big data era.” (p. 206)
-
“Ethics, no matter how well considered and well intentioned, tends not to scale well.” (p. 210)
-
“Ethics without power may be inert, and power without ethics lacks any positive social and political direction.” (p. 230)
-
“In commercial broadcasting the viewer pays for the privilege of having himself sold.” (p. 234)
-
“We don’t have to use unethical or opaque algorithmic decision systems, even in contexts where their use may be technically feasible.” (p. 274)
14) Quick-Reference Index (for future retrieval)
-
Topics → best pages (cited)
-
Definitions/scope of “data” as algorithmic decision systems → (p. 6)
-
Data and power triad (corporate/state/people) → (pp. 7, 256)
-
Accountability and its reversal into surveillance/discipline → (pp. 23–24)
-
Proxies, administrative categories, reification → (p. 71)
-
Social physics / average man / normal curve → (pp. 25–26, 36–38)
-
Scientific racism and data-driven policy harm (Hoffman/Du Bois) → (pp. 59–61)
-
Significance testing (Gosset) and incentive-driven method → (pp. 81–83)
-
WWII codebreaking and industrial-scale data analysis → (pp. 100–102)
-
NSA “data as engineering” + Bayes at scale → (p. 112)
-
Big data infrastructure transfers (SAGE→SABRE; IBM) → (pp. 135–136)
-
Metric optimization culture (OCR, Netflix; common tasks) → (pp. 163, 181–182)
-
Data science definitions and institutional contests → (pp. 184–185, 206–207)
-
Ethics, privacy, fairness trade-offs → (pp. 226–230)
-
Persuasion architectures, microtargeting, political implications → (pp. 248–250)
-
Venture capital/blitzscaling + KPI “growth” logic → (pp. 253–254)
-
Governance levers: antitrust/Section 230; people power “friction” → (pp. 268–273)
-
Anti-determinism and alternatives beyond “tech fixes” → (p. 274)
-
-
People/organizations mentioned → best pages (cited)
-
Chris Wiggins / Matthew L. Jones (author positioning) → (p. 9)
-
Adolphe Quetelet → (pp. 25–26, 36)
-
Florence Nightingale (quoted on “laws of social life”) → (p. 25)
-
Francis Galton (eugenics) → (p. 41)
-
Karl Pearson / William Sealy “Student” Gosset / Ronald Fisher / Jerzy Neyman → (pp. 81–83)
-
Frederick Hoffman / Prudential Insurance / W. E. B. Du Bois → (pp. 59–60)
-
Bletchley Park / Alan Turing / Colossus operators → (pp. 100–102, 117)
-
National Security Agency (NSA) / Solomon Kullback (named via diffusion) → (pp. 99, 112–114)
-
IBM / American Airlines / SABRE / SAGE → (pp. 135–136)
-
NIST OCR conference; Yann LeCun mentioned in context of submissions → (p. 162)
-
Netflix Prize / BellKor’s Pragmatic Chaos → (pp. 180–181)
-
Jeff Hammerbacher (data scientist title) → (p. 183)
-
Google Ethical AI team; Timnit Gebru; Margaret Mitchell → (pp. 212–213)
-
ProPublica / COMPAS / Northpointe (fairness dispute) → (p. 227)
-
Edward Bernays (PR/propaganda) → (p. 232)
-
Cambridge Analytica (popularized manipulation concerns) → (p. 250)
-
-
Methods/data sources → best pages (cited)
-
Significance testing / Student’s t-test; cost-based certainty → (pp. 81–83)
-
Randomized controlled trials as benchmark (drug efficacy context) → (p. 95)
-
Bayesian hypothesis testing at scale (NSA) → (p. 112)
-
Databases and real-time networked systems (SAGE→SABRE) → (pp. 135–136)
-
Benchmark competitions and metric optimization (OCR) → (pp. 160–163)
-
Common task framework and deployment logic (Netflix Prize) → (pp. 181–182)
-
“Data janitorial work” and hybrid skill stack in data science → (p. 184)
-
k-anonymity and differential privacy (privacy–utility trade-off) → (p. 226)
-
Fairness definitions (independence/separation/sufficiency) and incompatibilities → (pp. 227–228)
-
Persuasion architecture and microtargeting tools (lookalike audiences) → (p. 248)
-