Archive for June, 2009

“Building a Terminology Bridge” introduces some important ideas for shifting IT practices so that they align better with business requirements. I want to know whether readers agree, disagree, or have suggestions on how to better frame the discussion. Here are some of those ideas; you’ll find more in the report:

1. Archive – stop referring to migrating information and data down a tier of storage systems as ‘archiving’. An archive is a specialized repository for the long-term retention and preservation of information and data. If that repository does not ingest information and apply specialized preservation services, it is not an archive; it is just a disk array or tape system. Use ‘migration’ and/or ‘tiering’ rather than ‘archiving’. In the late 1980s we called shelf storage an archive. That was the root of the problem, and it is no longer appropriate. Use ‘retention’ and ‘preservation’ instead of ‘archive’ and your thinking will stay on track.

2. “Preservation begins day 1” – the old-school thinking, based on records management of paper processes, was to apply disposition after information or data became inactive or expired. The logic was “Put important information into the archive when we are done with it.” In litigious and compliance-driven domains, that thinking (moving information around and applying disposition rules after it expires) only adds cost and complexity and creates errors. It is better to classify information upon creation and store it in the right place from the start. If it needs to be retained long term and preserved, then place it, or a copy, in a preservation store to begin with.

Some practices are already aligned this way and have proven successful. Examples:
a) Email archives – capture email and attachments upon creation and ingest them into a preservation process based on business rules.
b) Oracle 11g – allows this: ASM will manage a ‘preservation partition’ transparently integrated into the database, and ILM Assistant will allow classes of information to reside there automatically.

3. Disposition – deciding it at the end of life or use is too late, too costly, too prone to error, and won’t be done. Why do you think the typical datacenter has 25% of its capacity occupied by expired information and data? The alternative: classify information and data, and set requirements and policies for them, up front. Disposition rules are one set of those policies. When information expires, the path is cleared to delete it, provided no litigation holds are in effect.

4. Authenticity and metadata – pay attention to the value of metadata and its role in verifying the authenticity of litigation evidence. That is the key message for IT. We are not far from the day when litigation evidence requirements will creep out of the litigation review process and put requirements on IT for control of metadata through the entire IT process. Guess what: many IT processes damage, destroy, merge, ignore, or confuse metadata. Start correcting those errors now. If data has metadata associated with it, take as much care of the metadata as you do of the data. To help IT and vendors get their heads around this, start using the term ‘information object’ or ‘digital object’ to describe the case where data and metadata are associated. A file is an information object. Data is the content. Not all metadata resides within the object – and that is just the point. Objects can contain links, and to maintain referential integrity the entire object has to be protected and preserved as it is processed. If you dedupe a file by ignoring the metadata, look at the damage you just did. If you encrypt the data but not the metadata, look at what remains exposed.
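The dedupe point in item 4 can be made concrete with a small sketch. This is only an illustration under stated assumptions: the `InformationObject` class, its two digest methods, and the `dedupe` helper are hypothetical names invented here, not anything defined in the SNIA report or in a real dedupe product. It treats an information object as data plus metadata and compares what survives when duplicates are detected on content alone versus on the whole object.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class InformationObject:
    """Hypothetical 'information object': content plus its associated metadata."""
    data: bytes
    metadata: dict = field(default_factory=dict)

    def content_digest(self) -> str:
        # Digest over the data only: what a metadata-blind dedupe compares.
        return hashlib.sha256(self.data).hexdigest()

    def object_digest(self) -> str:
        # Digest over data and metadata together. Metadata is serialized with
        # sorted keys so that equal metadata always produces the same digest.
        h = hashlib.sha256()
        h.update(self.content_digest().encode("ascii"))
        h.update(json.dumps(self.metadata, sort_keys=True).encode("utf-8"))
        return h.hexdigest()


def dedupe(objects, key):
    """Keep the first object seen for each key value; drop the rest."""
    kept = {}
    for obj in objects:
        kept.setdefault(key(obj), obj)
    return list(kept.values())


# Two memos with identical content but different authors and dates.
memo_a = InformationObject(b"Q2 forecast", {"author": "alice", "created": "2009-06-01"})
memo_b = InformationObject(b"Q2 forecast", {"author": "bob", "created": "2009-06-15"})

by_content = dedupe([memo_a, memo_b], InformationObject.content_digest)
by_object = dedupe([memo_a, memo_b], InformationObject.object_digest)

print(len(by_content))  # one object survives; bob's metadata is gone
print(len(by_object))   # both objects survive
```

Deduplicating on content alone silently discards the second memo's author and creation date, which is exactly the kind of metadata a court might later ask for. A real system could still store the content once, as long as each object's metadata is kept and linked to it.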

That’s enough to start. What do you think of these points?

Terminology is the starting point for Information Governance

I strongly urge you to read and distribute this new report from the SNIA – “Building a Terminology Bridge: Guidelines to Digital Information Retention and Preservation Practices in the Datacenter.” It took two years to research, develop, vet, and socialize the report, to educate people about it, and to build consensus within SNIA alone. The effort tried my patience and fortitude at times. But I’m here for the long run, and this report is a masterful contribution to the industry.

This report is essential for a long list of practices such as these:

  • Digital Preservation (Archive)
  • Cloud Archive
  • Retention Management
  • Risk Management
  • Security
  • Information management
  • ILM, ILM2.0
  • Information Governance
  • Long-term Retention and Deletion
  • Data management


    We encourage review and feedback.

    Blog-roll
    =====================
    Digital Curation Blog: SNIA “Terminology Bridge” report
    By Chris Rusbridge

    =====================

    SNIA Builds a Bridge—to Somewhere Important
    Rick Bauer, Dir.Technology and Education, SNIA
    =====================

    The Billion Year Ultra-Dense Memory Chip

    I love storage technology – the demand for more, cheaper, and faster will never end. Berkeley Labs brings us one of the most interesting technologies yet.

    http://newscenter.lbl.gov/feature-stories/2009/06/03/billion-year-ultra-dense-memory-chip/

    One of the drivers now is long-term preservation. If we had long-term media, it would reduce the rate and number of required migrations – or so we postulate. In any case, logical and physical migration are the domains where we need to put a lot of effort and R&D; otherwise the costs of preserving information for the long term will overwhelm everything else. This is where NARA is putting its money: developing a long-term storage architecture. It will be fun to watch all this unfold over the next 10 years.

    I’ve been accused of throwing historical IT practices under the bus in my last posts. Well, in my opinion, we should.

    IT practices that confuse, that just don’t meet the business requirements, or that only add cost and complexity need to go away. The times are changing. We saw that clearly with regulatory compliance and email. We see it with eDiscovery and litigation review. Many IT practices damage metadata, which in turn damages authenticity. The courts keep getting closer and closer to exposing bad IT practices, and I submit we need to start making improvements somewhere.

    Metadata is a good example. Many IT practices damage, mix, confuse, or just plain ignore the value of metadata (and consequently undermine its use to demonstrate authenticity). This has to change.
    a) Yes, it wasn’t until 2008 that Sedona recognized metadata in litigation evidence, but now it is important.
    b) Aguilar v. Immigration & Customs Enforcement Div., 2008 U.S. Dist. LEXIS 97018 (Nov. 21, 2008) changed it all again, making certain metadata a key part of litigation evidence.

    Another example is confusing archive and preservation – regulatory compliance hammered that. I believe the IT premise we have to move toward could be framed as “Preservation begins at creation.” The IT practice of archiving at the time information becomes inactive or expired is too late, too costly, too complex, and too risky in the face of litigation and compliance risk.

    Oh, let’s add ‘deletion’ to the list. Even the records community is at fault here. The whole idea of ‘disposition after information expires’ is ludicrous for the digital datacenter. I maintain that disposition policies must be set up front, consistent with ‘preservation policies begin at creation.’
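    As a rough illustration of “disposition policies up front,” here is a minimal sketch in which every record is assigned a retention policy at the moment it is created, so the deletion decision later becomes a simple check. The class names, the two example policies, and the 90-day period are assumptions made up for this sketch; they do not come from the SNIA report or from any product.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical policy, decided when the information is classified."""
    name: str
    retain_days: Optional[int]  # None means "preserve, never expires"


@dataclass
class Record:
    record_id: str
    created: date
    policy: RetentionPolicy        # assigned at creation, not at end of life
    litigation_hold: bool = False

    def expires_on(self) -> Optional[date]:
        if self.policy.retain_days is None:
            return None
        return self.created + timedelta(days=self.policy.retain_days)

    def may_delete(self, today: date) -> bool:
        # Deletion is allowed only once the record has expired
        # and no litigation hold is in effect.
        expiry = self.expires_on()
        return expiry is not None and today >= expiry and not self.litigation_hold


# Illustrative policies (assumed values).
ROUTINE_EMAIL = RetentionPolicy("routine-email", retain_days=90)
CONTRACT = RetentionPolicy("contract", retain_days=None)

email = Record("msg-1", date(2009, 1, 1), ROUTINE_EMAIL)
contract = Record("doc-7", date(2009, 1, 1), CONTRACT)

today = date(2009, 6, 1)
print(email.may_delete(today))     # expired and no hold
print(contract.may_delete(today))  # preserved, never expires

email.litigation_hold = True
print(email.may_delete(today))     # expired, but the hold blocks deletion
```

    Because the policy travels with the record from day one, nobody has to go back and classify expired information years later, which is the step that in practice never gets done.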

    This could be a stimulating conversation. Chip in.

    Oh, and I’m far from alone in this opinion. Change is hard. The top barrier is human and cultural on one side, and on the other it is resistance from a vendor community that protects its installed base of revenue by propagating the myth. I can’t blame them. I can only blame the IT community. I really like this anecdote from the “Backup Blog”: “…Having said that, the biggest obstacle to fixing backup is not technology. It is inertia. It is cultural. It is fear of change. It is ingrained process. It is the fact that we have done things one way for so long that the reason we are doing things has been forgotten…”

    From FAQs for the SNIA report: “Building a Terminology Bridge: Guidelines for Retention and Preservation in the Datacenter”

    Authenticity is defined in a digital retention and preservation context as the practice of verifying that a digital object has not changed. Authenticity attempts to establish that an object is currently the same genuine object that it was “originally” and to verify that it has not changed over time unless that change is known and authorized. (The term integrity is not to be confused with authenticity. The objective of “integrity” is to prevent corruption or damage, and it is defined as the consistency, accuracy, and correctness of stored or transmitted data or information. Integrity and authenticity are both required to preserve information and data assets.) Authenticity verification requires the use of metadata. The critical change for IT practices is that metadata is now very important and must be safeguarded with the same priority as the data. IT practices that damage, merge, ignore, or scramble metadata are no longer appropriate.
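    One way to picture “has not changed unless that change is known and authorized” is a sketch like the following. It is an assumption-laden illustration, not the method described in the report: the `PreservedObject` class and its `authorized_digests` list are hypothetical, and SHA-256 is simply one convenient digest. The list of digests is the preservation metadata, and the object counts as authentic only when its current content matches the most recent authorized entry.

```python
import hashlib
from dataclasses import dataclass, field


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass
class PreservedObject:
    data: bytes
    # Preservation metadata: digests of every known, authorized version,
    # oldest first. Losing this list makes verification impossible.
    authorized_digests: list = field(default_factory=list)

    @classmethod
    def ingest(cls, data: bytes) -> "PreservedObject":
        # Record the original digest at ingest time.
        return cls(data=data, authorized_digests=[digest(data)])

    def authorized_change(self, new_data: bytes) -> None:
        # A known, authorized change updates the data and the metadata together.
        self.data = new_data
        self.authorized_digests.append(digest(new_data))

    def is_authentic(self) -> bool:
        # Authentic if the current content matches the latest authorized digest.
        return bool(self.authorized_digests) and (
            digest(self.data) == self.authorized_digests[-1]
        )


obj = PreservedObject.ingest(b"original contract text")
print(obj.is_authentic())   # unchanged since ingest

obj.authorized_change(b"contract text, amended")
print(obj.is_authentic())   # changed, but the change was recorded

obj.data = b"contract text, silently edited"
print(obj.is_authentic())   # changed without authorization
```

    The check is only as good as the metadata behind it: a process that strips or overwrites the digest list leaves the data intact but makes its authenticity unprovable.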