
Reference Model Updates

February 15th, 2011

Just to let people know, we continue to make excellent additions to the two reference model sites, Long-Term Digital Preservation and ILM2.0:

  • LTDPRM: News and blog feeds for the preservation community – in one location, you can stay up to date on key developments and publications in the industry
  • LTDPRM: Video feeds and access to training and educational materials
  • LTDPRM: With the launch of SNIA’s Cloud Archive SIG, I’ve put up an important set of resources on requirements for digital archive (preservation) services in the cloud.
  • ILM2.0: Postings on buyer’s guides, classification, IT and Information Governance in the Cloud, moving information into the cloud, etc.

In both domains we are building a community of expert participants from around the world. Please register to have full access to the content and to contribute. These are public domain sites with full collaborative access. Both sites have weekly telecons to progress the work and build relationships – check the calendars and join in.

“Building a Terminology Bridge” introduces some important ideas for practice shifts in the IT domain to better align IT practices with business requirements. I want to know if readers agree, disagree, or have suggestions on how to better frame the discussion. Here are some of those ideas; you’ll find more in the report:

1. Archive – stop referring to migrating information and data down a tier of storage systems as ‘archiving’. An archive is a specialized repository for the long-term retention and preservation of information and data. If that repository does not ingest information and apply specialized preservation services, it is not an archive; it is just a disk array or tape system. Use ‘migration’ and/or ‘tiering’ rather than ‘archiving’. In the late 1980s we called shelf storage an archive, and that was the root of the problem; the usage is no longer appropriate. Think in terms of retention and preservation rather than ‘archive’ and your thinking will stay on track.

2. “Preservation begins day 1” – the old-school thinking, based on records management of paper processes, was to apply disposition after information or data became inactive or expired. The logic was “Put important information into the archive when we are done with it.” In litigious and compliance-driven domains, that thinking (meaning move information around and apply disposition rules after it expires) only adds cost and complexity and creates errors. Better to classify information upon creation and store it in the right place to begin with. If it needs to be retained long term and preserved, then place it or a copy in a preservation store from the start.

Some practices are already aligned this way and have proven successful. Examples:
  • Email archives – capture email and attachments upon creation and ingest them into a preservation process based on business rules.
  • Oracle 11g – allows this, as ASM will manage a ‘preservation partition’ transparently integrated into the database, and ILM Assistant will allow classes of information to reside there automatically.

3. Disposition – applying it at the end of life or use is too late, too costly, and too prone to error, and it won’t be done. Why do you think the typical datacenter has 25% of its capacity occupied by expired information and data? The alternative: classify and set requirements and policies for information and data up front. Disposition rules are one set of those policies. When information expires, the path is cleared to delete it, provided no litigation holds are in effect.

4. Authenticity and metadata – pay attention to the value of metadata and its role in verifying authenticity of litigation evidence. That is the key message for IT. We are not far away from the day when litigation evidence requirements will creep out of the litigation review process and put requirements on IT for control of metadata through the entire IT process. Guess what: many IT processes damage, destroy, merge, ignore, or confuse metadata. Start correcting those errors now. If data has metadata associated with it, take as much care of the metadata as you do of the data. To help IT and vendors get their heads around this, start using the term ‘information object’ or ‘digital object’ to describe the case where data and metadata are associated. A file is an information object. Data is the content. Not all metadata resides within the object, and that is just the point. Objects can contain links, and to maintain referential integrity the entire object has to be protected and preserved as it is processed. If you dedupe a file by ignoring the metadata, look at the damage you just did. If you encrypt the data but not the metadata, look at what remains exposed.
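To make points 2 and 3 concrete, here is a minimal sketch of what "classify up front, dispose by policy" might look like. Everything in it is an assumption made for illustration: the class names, the retention periods, and the `InformationObject` type are hypothetical and come neither from the report nor from any product.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical retention classes and periods, for illustration only.
RETENTION_DAYS = {
    "transient": 90,
    "business_record": 7 * 365,
    "permanent": None,  # never expires
}


@dataclass
class InformationObject:
    name: str
    created: date
    retention_class: str            # assigned at creation, not at end of life
    litigation_hold: bool = False

    def expires_on(self) -> Optional[date]:
        """Expiry date implied by the class, or None if the object never expires."""
        days = RETENTION_DAYS[self.retention_class]
        return None if days is None else self.created + timedelta(days=days)

    def store(self) -> str:
        """Long-term classes go to the preservation store from day 1."""
        return "primary" if self.retention_class == "transient" else "preservation"

    def may_delete(self, today: date) -> bool:
        """Deletion is allowed only after expiry and only if no hold is in effect."""
        expiry = self.expires_on()
        if expiry is None or self.litigation_hold:
            return False
        return today >= expiry


memo = InformationObject("memo.txt", date(2011, 1, 1), "transient")
held = InformationObject("notes.txt", date(2011, 1, 1), "transient", litigation_hold=True)
print(memo.may_delete(date(2011, 6, 1)))  # True: the 90 days have passed
print(held.may_delete(date(2011, 6, 1)))  # False: the litigation hold blocks deletion
```

The point of the sketch is that the disposition decision becomes a simple lookup once the class is set at creation; nobody has to go back and work out what expired data is.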

That’s enough to start. What do you think of these points?

From FAQs for the SNIA report: “Building a Terminology Bridge: Guidelines for Retention and Preservation in the Datacenter”

Authenticity: defined in a digital retention and preservation context as the practice of verifying that a digital object has not changed. Authenticity attempts to establish that an object is currently the same genuine object that it was “originally” and to verify that it has not changed over time unless that change is known and authorized. (The term integrity is not to be confused with authenticity. The objective of “integrity” is to prevent corruption or damage, and it is defined as the consistency, accuracy, and correctness of stored or transmitted data or information. Integrity and authenticity are both required to preserve information and data assets.) Authenticity verification requires the use of metadata. The critical change for IT practices is that metadata is now very important and must be safeguarded with the same priority as the data. IT practices that damage, merge, ignore, or scramble metadata are no longer appropriate.
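As a rough illustration of why the metadata has to be covered, here is a small sketch of a "has this object changed?" check that fingerprints the content and the metadata together. It is only one narrow piece of authenticity verification (it does not track authorized changes or provenance), and the `fingerprint` and `is_unchanged` helpers and the metadata fields are hypothetical, invented for this example.

```python
import hashlib
import json


def fingerprint(content: bytes, metadata: dict) -> str:
    """SHA-256 over the content plus a canonical JSON form of the metadata."""
    canonical_meta = json.dumps(
        metadata, sort_keys=True, separators=(",", ":")
    ).encode("utf-8")
    digest = hashlib.sha256()
    # Length prefix keeps the boundary between content and metadata unambiguous.
    digest.update(len(content).to_bytes(8, "big"))
    digest.update(content)
    digest.update(canonical_meta)
    return digest.hexdigest()


def is_unchanged(content: bytes, metadata: dict, recorded: str) -> bool:
    """Compare the current object against the fingerprint recorded at ingest."""
    return fingerprint(content, metadata) == recorded


content = b"quarterly report"
metadata = {"author": "alice", "created": "2011-02-15"}
recorded = fingerprint(content, metadata)  # stored when the object is ingested

altered = dict(metadata, author="bob")     # same bytes, different metadata
print(is_unchanged(content, metadata, recorded))  # True
print(is_unchanged(content, altered, recorded))   # False
```

A hash of the content alone would report both versions as identical, which is exactly the blind spot that content-only dedupe and content-only checks leave open.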

From FAQs for the SNIA report: “Building a Terminology Bridge: Guidelines for Retention and Preservation in the Datacenter”

Archive: the report advocates that IT practices adopt a more consistent usage of the term ‘archive’ to facilitate interaction with other departments within the organization. To the archival, preservation, and records communities, an archive is a specialized repository with preservation services and attributes. Typical IT use of the verb “archiving” actually refers to a practice based on ILM called “tiering,” the migration of inactive, reference, or expired information to a lower tier of storage to reduce cost and improve storage efficiencies. A lower tier of storage is not an archive with preservation-class services. Another IT (and vendor) misuse happens when ‘archive’ is confused with backup. Backup media saved offline or offsite do not constitute an archive (a preservation store with preservation services), nor should backup be confused with tiering.