Information Quality (4 of 9)

Central Problem

What is the real epistemological problem posed by big data, and how can we evaluate information quality (IQ) in an era characterized by data overabundance? The chapter addresses two interconnected issues: (1) the epistemological problem of big data is not the excessive quantity of data, but the identification of “small patterns” — the subtle and meaningful patterns hidden in oceans of data; (2) the standard definition of information quality as “fit-for-purpose” is correct but insufficient, since it does not account for the multi-purpose and re-purposable nature of information.

The fundamental difficulty lies in the tension between “purpose-depth” (how well information serves its original purpose) and “purpose-scope” (how easily it can be re-used for new purposes). Information can be highly suited to a specific purpose but terrible for other unforeseen uses — as demonstrated by the dramatic cases of the Dutch census during the Holocaust or British postal codes used for purposes other than mail delivery.

Main Thesis

The central thesis is twofold. First: the problem with big data is not that there is too much of it, but that the meaningful patterns hidden in it (“small patterns”) are invisible to the computationally naked eye. The solution is not purely technological but epistemological: it takes competence to know which questions to ask and which data to collect, curate, and query. As Plato states in the Cratylus (390c), the game will be won by those who “know how to ask and answer questions.”

Second: information quality (IQ) must be analyzed through a bi-categorical approach that distinguishes between:

P–purpose (Production purpose): the purpose for which information is originally produced
C–purpose (Consumption purpose): the purposes (potentially unlimited) for which the same information can be consumed/reused

This approach allows evaluating the different dimensions of IQ (accuracy, completeness, timeliness, accessibility, etc.) relative to purpose, avoiding both relativism and absolutism. IQ is not an absolute but a relational property: the same information can have high quality for one use and low for another.
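The relational reading above can be illustrated with a minimal sketch. It is not Floridi's own formalism: the `Purpose` class, the `mean_quality` helper, the unweighted average, and every numeric score are hypothetical, chosen only to show that the same information can score differently depending on the purpose it is judged against.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Purpose:
    """A purpose against which information is judged (hypothetical model)."""
    name: str
    kind: str  # "P" for production purpose, "C" for consumption purpose


def assess(scores: dict, purpose: Purpose) -> dict:
    """Return each IQ dimension's score relative to a single purpose."""
    return {dim: by_purpose[purpose] for dim, by_purpose in scores.items()}


def mean_quality(scores: dict, purpose: Purpose) -> float:
    """Unweighted mean over dimensions; an assumed, deliberately crude summary."""
    per_dimension = assess(scores, purpose)
    return sum(per_dimension.values()) / len(per_dimension)


# One P-purpose and one C-purpose for the same information (postal codes).
mail = Purpose("mail delivery", "P")
reuse = Purpose("reuse beyond mail delivery", "C")

# Invented scores in [0, 1]: one value per dimension and per purpose.
postcode_scores = {
    "accuracy": {mail: 0.9, reuse: 0.4},
    "completeness": {mail: 0.9, reuse: 0.5},
    "timeliness": {mail: 0.8, reuse: 0.6},
}

for purpose in (mail, reuse):
    print(purpose.kind, purpose.name, round(mean_quality(postcode_scores, purpose), 2))
```

Keeping one score per dimension and per purpose, rather than a single score per dimension, is what makes the assessment relational: nothing in the table claims that the postal codes are accurate or timely in the absolute.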

Historical Context

The chapter is situated in the era of the “zettabyte flood”, the term coined by Floridi to describe the data tsunami characterizing post-industrial societies. In 2006, humanity had accumulated 180 exabytes of data; by 2011 the total had grown to about 1,600 exabytes, passing the zettabyte barrier (1,000 exabytes), with projections of 44 zettabytes by 2020.

The institutional context includes failed attempts to define information quality: the US Information Quality Act (2000) left key concepts undefined; the British Kennedy Report (2001) recognized that “all healthcare is information-driven” but as late as 2004 the NHS admitted there was no agreement on what IQ is. The first International Conference on Information Quality dates to 1996, the ACM’s Journal of Data and Information Quality to 2006.

From a technological standpoint, since 2007 the world has produced more data than it has storage available: the problem has shifted from what to save to what to delete.

Philosophical Lineage

flowchart TD
    Plato --> Floridi
    Bacon --> Floridi
    Borges --> Floridi
    Wang --> Floridi
    Batini --> Floridi

    class Plato,Bacon,Borges,Wang,Batini,Floridi internal-link;

Key Thinkers

Thinker	Dates	Movement	Main Work	Core Concept
Plato	428-348 BCE	Ancient Philosophy	Cratylus	Knowing = knowing how to ask and answer questions
Wang	contemporary	Information Quality	Data Quality (1998)	IQ dimensions and categories
Batini	contemporary	Data Management	Data Quality (2006)	IQ taxonomies, fit-for-purpose
Borges	1899-1986	Literature	The Analytical Language of John Wilkins	Arbitrary taxonomies
Bacon	1561-1626	Empiricism	Novum Organum	Open marketplace of ideas

Key Concepts

Concept	Definition	Related to
Small patterns	Subtle and meaningful patterns hidden in big data, invisible without epistemological analysis	Big Data, Epistemology
Zettabyte flood	The era characterized by data accumulation in the zettabyte order	Floridi, Information Society
Fit-for-purpose	Standard conception of IQ as adequacy to purpose	Information Quality, Pragmatism
P–purpose	Purpose for which information is originally produced (Production)	Floridi, IQ
C–purpose	Purpose(s) for which information is consumed/reused (Consumption)	Floridi, IQ
Purpose-depth	How well information serves the specific original purpose	Floridi, IQ
Purpose-scope	How easily information can be re-used for new purposes	Floridi, IQ
Bi-categorical approach	IQ analysis method distinguishing P–purpose and C–purpose	Floridi, Methodology
IQ dimensions	Information properties: accuracy, completeness, timeliness, accessibility, etc.	Information Quality, Data Science
Relationalism	Position that IQ is relative to purpose, neither absolute nor relativist	Floridi, Epistemology

Authors Comparison

Theme	Floridi	Batini	Standard position
IQ definition	Bi-categorical: P–purpose vs C–purpose	Fit-for-purpose with objective dimensions	Generic fit-for-purpose
Big data problem	Small patterns (epistemological)	Volume and complexity (technological)	Too much data (quantitative)
Solution	Epistemology + technology	Quantitative metrics	More computational power
Relativism	Relationalism (no relativism)	Objective measures possible	Often implicit
Re-purposing	Central, source of tension	Recognized but not systematized	Ignored or undervalued

Influences & Connections

Predecessors: Floridi ← influenced by ← Plato, Bacon, Wang, Batini
Contemporaries: Floridi ↔ dialogue with ↔ MIT Information Quality Program, ACM JDIQ
Followers: Floridi → influenced → Data ethics, Information governance, Analytics philosophy
Opposing views: Floridi ← criticized by ← Technologists (purely technological solution), IQ relativists

Summary Formulas

Floridi: The problem of big data is small patterns; the solution is epistemological, not just technological. IQ requires a bi-categorical approach distinguishing P–purpose and C–purpose.
Plato: “The one who knows is the one who knows how to ask and answer questions” — fundamental criterion for navigating the ocean of data.
Bi-categorical approach: For any information I, IQ must be evaluated with respect to both P–purpose (production) and C–purpose (consumption), recognizing that values may differ.

Timeline

Year	Event
1996	First International Conference on Information Quality
2000	USA: Information Quality Act (Data Quality Act)
2001	UK: Kennedy Report on information in NHS
2006	ACM launches Journal of Data and Information Quality
2006	Humanity reaches 180 exabytes of data
2007	Data production surpasses available storage
2011	Zettabyte barrier (1,000 exabytes) surpassed; total reaches about 1,600 exabytes

Notable Quotes

“The real, epistemological problem with big data is small patterns.” — Floridi

“The game will be won by those who ‘know how to ask and answer questions’.” — Floridi, citing Plato (Cratylus 390c)

“Half of our data is junk, we just do not know which half.” — Floridi

Warning

This annotation was normalised using a large language model and may contain inaccuracies. These texts serve as preliminary study resources rather than exhaustive references.
