Background:

For the last few weeks, in the Long-Term Information and Compliant Storage Initiative within the Data Management Forum, we’ve had an interesting discussion going on about the differences between information and data. Why? Or, as a prominent RIM practitioner phrased it, “I remain a bit mystified at the tremendous effort that some appear to go to in order to ‘define’ a field of endeavor that already exists.”

The reasons why are important. My response:

“I don’t think the problem is the definition of either data or information at a human-recognition level or a records-management level. The problem we’re trying to get at lies deep in the bowels of the storage and IT industries, which have not yet confronted these requirements. A simple example: SNIA currently defines data as ‘anything digital’ and architecturally has no provision to accommodate information services such as preservation, authenticity, etc. This has to be resolved, and through this dialog we’re trying to begin the process of pulling the whole community along and down that path.

Just adopting high-level, human-oriented definitions will, unfortunately, not point our efforts in the right direction, because we have to get the services and computer semantics to work with the processes so that the requirements are met automatically. Where we are trying to steer the boat is not just toward managing ‘specific records’, but toward managing all the information within the IT domain.”

And to help further, I defined this set of objectives for an improved definition of information in a “long-term retention and preservation” context.

DEFINITION OBJECTIVES

a. to define how to operate ‘information services’
b. to assure all of our architectures and services tie together from information-to-data-to-storage
c. to assure that information services are properly architected to achieve their goals (the example we keep using: preservation without integrity or authenticity is broken).

– To reach these goals for information services, we require that the system be able to test and recognize that the information it is operating on has “metadata” (I’ll use metadata in its broadest sense for now), and that it operate on the entire object, not just on the data (content). Operating on the content alone puts the metadata at risk of being disassociated and lost, or of becoming non-authentic.
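To make that requirement concrete, here is a minimal Python sketch, assuming a hypothetical `InformationObject` that bundles content with a metadata dictionary. The names `has_metadata` and `preserve` are illustrative only and are not part of any SNIA or OAIS interface. The fixity digest covers content and metadata together, so a copy made from the content alone yields a different digest than the original.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InformationObject:
    """Hypothetical information object: content plus its metadata."""
    content: bytes
    metadata: dict = field(default_factory=dict)

    def fixity(self) -> str:
        """SHA-256 over content AND metadata, so a change to either is detectable."""
        h = hashlib.sha256()
        h.update(self.content)
        # Serialize metadata deterministically so the digest is reproducible.
        h.update(json.dumps(self.metadata, sort_keys=True).encode("utf-8"))
        return h.hexdigest()


def has_metadata(obj) -> bool:
    """The test the text asks for: does this object carry any metadata?"""
    return isinstance(obj, InformationObject) and bool(obj.metadata)


def preserve(obj: InformationObject) -> InformationObject:
    """Copy the entire object, not just its content, then verify the copy."""
    if not has_metadata(obj):
        raise ValueError("no metadata: this is plain data, not information")
    expected = obj.fixity()
    copy = InformationObject(content=bytes(obj.content), metadata=dict(obj.metadata))
    if copy.fixity() != expected:
        raise RuntimeError("integrity check failed; the copy is not authentic")
    return copy
```

A copy built from `record.content` alone is classified as plain data and rejected by `preserve`, which is exactly the failure mode described above. SHA-256 is used here only as a familiar fixity function; any agreed-upon digest would serve.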

In the library science and archival communities (including references such as ISO 14721, ‘Open Archival Information Systems’), very simple and elegant definitions for digital data and information have been derived. Here are some reference definitions:

  • Information: Any type of knowledge that can be exchanged. In an exchange, it is represented by data. An example is a string of bits (the data) accompanied by a description of how to interpret a string of bits as numbers representing temperature observations measured in degrees Celsius (the representation information). (Source: OAIS)
  • Information: Data that has been given value through analysis, interpretation, or compilation in a meaningful form. (Source: ARMA)
  • In the digital library community, the definition commonly used for a digital object is a combination of identifier, metadata, and data. A digital object, in turn, is defined as a discrete unit of information in digital form. (Source: PREMIS)
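The OAIS temperature example translates directly into code. In this sketch, which assumes a hypothetical `RepresentationInfo` record, the bytes alone are just a bit string; only when they are read with their description (three little-endian 32-bit floats, each a temperature in degrees Celsius) do they yield information. The byte layout and field names are assumptions chosen for illustration.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class RepresentationInfo:
    """Hypothetical description of how to interpret a bit string."""
    struct_format: str  # layout of the bits, in Python struct notation
    quantity: str       # what each value measures
    unit: str           # unit of measure


def interpret(data: bytes, rep: RepresentationInfo) -> list:
    """Data interpreted using its representation information yields information."""
    values = struct.unpack(rep.struct_format, data)
    return [{"quantity": rep.quantity, "value": v, "unit": rep.unit} for v in values]


# The data: twelve bytes that mean nothing on their own.
raw = struct.pack("<3f", 21.5, -3.0, 0.25)

# The representation information: three little-endian 32-bit floats,
# each a temperature observation in degrees Celsius.
celsius = RepresentationInfo("<3f", "temperature", "degrees Celsius")

readings = interpret(raw, celsius)
```

Reading the same twelve bytes with a different description (say, as integers) would produce entirely different numbers, which is why the representation information has to stay attached to the data.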

Here’s a simple way to look at these definitions from a storage perspective, one that I propose we adopt for the IT and storage world to meet these objectives without requiring human interpretation. It goes like this:

Point 1: “Blocks aggregate into data, and data aggregates into information.” If this principle also holds in reverse, which I believe it does, then:
a. information decomposes into data (the content) plus metadata plus representation information [the ISO model]
b. data decomposes into blocks and blocks into bits
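Point 1 can be sketched as a round trip. The Python below is a minimal illustration under stated assumptions: a hypothetical `Information` record, a tiny 4-byte block size chosen only for readability (real devices typically use sizes such as 512 or 4096 bytes), and zero padding on the final block. The decomposition only reverses cleanly because the data's length travels with the blocks, a small instance of the point above that context has to stay with the content.

```python
from dataclasses import dataclass

BLOCK_SIZE = 4  # hypothetical; tiny so the example stays readable


@dataclass
class Information:
    content: bytes          # the data (content)
    metadata: dict          # descriptive/administrative metadata
    representation: dict    # how to interpret the content


def data_to_blocks(data: bytes, size: int = BLOCK_SIZE) -> list:
    """Data decomposes into fixed-size blocks; the final block is zero-padded."""
    return [data[i:i + size].ljust(size, b"\x00") for i in range(0, len(data), size)]


def blocks_to_data(blocks: list, length: int) -> bytes:
    """Blocks aggregate back into data; the padding is dropped using the length."""
    return b"".join(blocks)[:length]


def block_to_bits(block: bytes) -> str:
    """A block decomposes into bits."""
    return "".join(format(byte, "08b") for byte in block)


def decompose(info: Information) -> dict:
    """Information -> data + metadata + representation; data -> blocks."""
    return {
        "blocks": data_to_blocks(info.content),
        "length": len(info.content),
        "metadata": info.metadata,
        "representation": info.representation,
    }


def compose(parts: dict) -> Information:
    """Blocks -> data; data + metadata + representation -> information."""
    data = blocks_to_data(parts["blocks"], parts["length"])
    return Information(data, parts["metadata"], parts["representation"])
```

For example, `compose(decompose(info))` returns an object equal to `info`, while discarding the `metadata` or `representation` entries would leave only data behind.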

Point 2: Try these definitions on for size, and remember that we’re not talking about human-readable or human-interpreted definitions; we’re talking about computer semantics and automation.

DEFINITIONS (context: digital, information services)

[Figure: Information definition in a storage context]

BLOCKS - raw I/O elements: the unit in which data is stored and retrieved on disk and tape devices. Blocks are the atomic unit of data recognition. (Source: SNIA Dictionary)

DATA - a block object: a logical collection of associated blocks forming readable content (bit-mapped images, text, tables and rows in a database, sound, etc.)

INFORMATION - a data object: a logical container including the data (content) plus its metadata and/or representation information giving it context and relevance. (files, programs, documents, video, etc.) Digital information is data interpreted using its metadata and representation information.
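Because the goal is for systems to apply these definitions without human interpretation, they can be written as a simple test. This sketch assumes a hypothetical `StoredObject` record in which a `content_type` value stands in for “readable content”; that marker and the other field names are illustrative assumptions, not SNIA-defined attributes.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StoredObject:
    """Hypothetical record of what a storage system can see about an object."""
    payload: bytes
    content_type: Optional[str] = None  # assumed marker of readable content, e.g. "text/plain"
    metadata: dict = field(default_factory=dict)
    representation: dict = field(default_factory=dict)


def classify(obj: StoredObject) -> str:
    """Map an object onto the BLOCKS / DATA / INFORMATION definitions."""
    if obj.metadata or obj.representation:
        # Content plus its metadata and/or representation information.
        return "INFORMATION"
    if obj.content_type is not None:
        # Readable content, but nothing giving it context or relevance.
        return "DATA"
    # Raw I/O elements with nothing to say what they are.
    return "BLOCKS"
```

A system that can run a test like this can refuse to hand an INFORMATION object to a service that would operate on its payload alone.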

As the OAIS reference model says, “In general, it can be said that Data interpreted using its Representation Information yields Information.” Next up: definitions for information services, data services, and metadata that support these objectives.

One Response to “Is it digital “information” or is it “data””

Rolland Renault

February 15th, 2010 - 2:51 pm

You know what.. all I am going to say is… yes.. yes.. you are right.
