Central Problem

What is the real epistemological problem posed by big data, and how can we evaluate information quality (IQ) in an era characterized by data overabundance? The chapter addresses two interconnected issues: (1) the epistemological problem of big data is not the excessive quantity of data, but the identification of “small patterns” — the subtle and meaningful patterns hidden in oceans of data; (2) the standard definition of information quality as “fit-for-purpose” is correct but insufficient, since it does not account for the multi-purpose and re-purposable nature of information.

The fundamental difficulty lies in the tension between “purpose-depth” (how well information serves its original purpose) and “purpose-scope” (how easily it can be re-used for new purposes). Information can be highly suited to a specific purpose but terrible for other unforeseen uses — as demonstrated by the dramatic cases of the Dutch census during the Holocaust or British postal codes used for purposes other than mail delivery.

Main Thesis

The central thesis is twofold. First: the problem with big data is not that there is too much of it, but that small and meaningful patterns (“small patterns”) are invisible to the computationally naked eye. The solution is not purely technological but epistemological: competencies are needed to know which questions to ask, which data to collect, curate, and query. As Plato states in the Cratylus (390c), the game will be won by those who “know how to ask and answer questions.”

Second: information quality (IQ) must be analyzed through a bi-categorical approach that distinguishes between:

  • P–purpose (Production purpose): the purpose for which information is originally produced
  • C–purpose (Consumption purpose): the purposes (potentially unlimited) for which the same information can be consumed/reused

This approach allows evaluating the different dimensions of IQ (accuracy, completeness, timeliness, accessibility, etc.) relative to purpose, avoiding both relativism and absolutism. IQ is not an absolute but a relational property: the same information can have high quality for one use and low for another.
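
The relational reading of IQ can be made concrete with a small sketch. It is an illustration only, not Floridi's own formalization: the class names, the 0.0 to 1.0 scoring scale, the choice of the weakest dimension as the overall score, the reuse purpose, and all the numbers are assumptions introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Purpose:
    """A purpose against which information quality is assessed."""
    name: str
    kind: str  # "P" = production purpose, "C" = consumption purpose


@dataclass
class QualityProfile:
    """Scores of one piece of information, kept separately per purpose."""
    scores: dict = field(default_factory=dict)  # {Purpose: {dimension: score}}

    def rate(self, purpose, **dimensions):
        # Record hypothetical 0.0-1.0 scores for each IQ dimension.
        self.scores[purpose] = dict(dimensions)

    def quality_for(self, purpose):
        # Assumption: the weakest dimension bounds fitness for that purpose.
        return min(self.scores[purpose].values())


# Hypothetical case: postal codes produced for mail delivery (P-purpose)
# and later reused for an unrelated analysis (C-purpose).
delivery = Purpose("mail delivery", "P")
reuse = Purpose("unrelated statistical reuse", "C")

postcodes = QualityProfile()
# Illustrative, made-up scores; they are not measurements from the source.
postcodes.rate(delivery, accuracy=0.95, completeness=0.9, timeliness=0.9)
postcodes.rate(reuse, accuracy=0.4, completeness=0.6, timeliness=0.9)

for purpose in (delivery, reuse):
    print(f"{purpose.kind}-purpose '{purpose.name}': {postcodes.quality_for(purpose)}")
```

The same information receives a different score for each purpose, which is the point of treating IQ as a relational property rather than a single absolute value.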

Historical Context

The chapter is situated in the era of the “zettabyte flood”, the term coined by Floridi to describe the data tsunami characterizing post-industrial societies. In 2006, humanity had accumulated 180 exabytes of data; by 2011 the total had passed 1,600 exabytes, breaking the zettabyte barrier (1 zettabyte = 1,000 exabytes), with projections of 44 zettabytes by 2020.

The institutional context includes failed attempts to define information quality: the US Information Quality Act (2000) left key concepts undefined; the British Kennedy Report (2001) recognized that “all healthcare is information-driven” but as late as 2004 the NHS admitted there was no agreement on what IQ is. The first International Conference on Information Quality dates to 1996, the ACM’s Journal of Data and Information Quality to 2006.

From a technological standpoint, since 2007 the world has produced more data than it has storage available for: the problem has shifted from what to save to what to delete.

Philosophical Lineage

flowchart TD
    Plato --> Floridi
    Bacon --> Floridi
    Borges --> Floridi
    Wang --> Floridi
    Batini --> Floridi


Key Thinkers

| Thinker | Dates | Movement | Main Work | Core Concept |
| --- | --- | --- | --- | --- |
| Plato | 428-348 BCE | Ancient Philosophy | Cratylus | Knowing = knowing how to ask and answer questions |
| Wang | contemporary | Information Quality | Data Quality (1998) | IQ dimensions and categories |
| Batini | contemporary | Data Management | Data Quality (2006) | IQ taxonomies, fit-for-purpose |
| Borges | 1899-1986 | Literature | The Analytical Language of John Wilkins | Arbitrary taxonomies |
| Bacon | 1561-1626 | Empiricism | Novum Organum | Open marketplace of ideas |

Key Concepts

| Concept | Definition | Related to |
| --- | --- | --- |
| Small patterns | Subtle and meaningful patterns hidden in big data, invisible without epistemological analysis | Big Data, Epistemology |
| Zettabyte flood | The era characterized by data accumulation in the zettabyte order | Floridi, Information Society |
| Fit-for-purpose | Standard conception of IQ as adequacy to purpose | Information Quality, Pragmatism |
| P–purpose | Purpose for which information is originally produced (Production) | Floridi, IQ |
| C–purpose | Purpose(s) for which information is consumed/reused (Consumption) | Floridi, IQ |
| Purpose-depth | How well information serves the specific original purpose | Floridi, IQ |
| Purpose-scope | How easily information can be re-used for new purposes | Floridi, IQ |
| Bi-categorical approach | IQ analysis method distinguishing P–purpose and C–purpose | Floridi, Methodology |
| IQ dimensions | Information properties: accuracy, completeness, timeliness, accessibility, etc. | Information Quality, Data Science |
| Relationalism | Position that IQ is relative to purpose, neither absolute nor relativist | Floridi, Epistemology |

Authors Comparison

| Theme | Floridi | Batini | Standard position |
| --- | --- | --- | --- |
| IQ definition | Bi-categorical: P–purpose vs C–purpose | Fit-for-purpose with objective dimensions | Generic fit-for-purpose |
| Big data problem | Small patterns (epistemological) | Volume and complexity (technological) | Too much data (quantitative) |
| Solution | Epistemology + technology | Quantitative metrics | More computational power |
| Relativism | Relationalism (no relativism) | Objective measures possible | Often implicit |
| Re-purposing | Central, source of tension | Recognized but not systematized | Ignored or undervalued |

Influences & Connections

  • Predecessors: Floridi ← influenced by ← Plato, Bacon, Wang, Batini
  • Contemporaries: Floridi ↔ dialogue with ↔ MIT Information Quality Program, ACM JDIQ
  • Followers: Floridi → influenced → Data ethics, Information governance, Analytics philosophy
  • Opposing views: Floridi ← criticized by ← Technologists (purely technological solution), IQ relativists

Summary Formulas

  • Floridi: The problem of big data is small patterns; the solution is epistemological, not just technological. IQ requires a bi-categorical approach distinguishing P–purpose and C–purpose.
  • Plato: “The one who knows is the one who knows how to ask and answer questions” — fundamental criterion for navigating the ocean of data.
  • Bi-categorical approach: For any information I, IQ must be evaluated with respect to both P–purpose (production) and C–purpose (consumption), recognizing that values may differ.

Timeline

| Year | Event |
| --- | --- |
| 1996 | First International Conference on Information Quality |
| 2000 | USA: Information Quality Act (Data Quality Act) |
| 2001 | UK: Kennedy Report on information in NHS |
| 2006 | ACM launches Journal of Data and Information Quality |
| 2006 | Humanity reaches 180 exabytes of data |
| 2007 | Data production surpasses available storage |
| 2011 | Total data passes 1,600 exabytes, beyond the zettabyte barrier (1,000 exabytes) |

Notable Quotes

“The real, epistemological problem with big data is small patterns.” — Floridi

“The game will be won by those who ‘know how to ask and answer questions’.” — Floridi, quoting Plato (Cratylus 390c)

“Half of our data is junk, we just do not know which half.” — Floridi