“Building a Terminology Bridge” introduces some important ideas for shifting practice in the IT domain so that IT practices align better with business requirements. I want to know if readers agree, disagree, or have suggestions on how to better frame the discussion. Here are some of those ideas; you’ll find more in the report:
1. Archive – stop referring to migrating information and data down a tier of storage systems as ‘archiving’. An archive is a specialized repository for the long-term retention and preservation of information and data. If that repository does not ingest information and apply specialized preservation services, it is not an archive; it is just a disk array or tape system. Use ‘migration’ and/or ‘tiering’ for that activity instead. In the late 1980s we called shelf storage an archive. That was the root of the problem, and the usage is no longer appropriate. Use ‘retention’ and ‘preservation’ instead of ‘archive’ and your thinking will stay clear.
2. “Preservation begins day 1” – the old-school thinking, based on records management of paper processes, was to apply disposition after information or data became inactive or expired. The logic was “Put important information into the archive when we are done with it.” In litigious and compliance-driven domains, that thinking (moving information around and applying disposition rules after it expires) only adds cost and complexity and creates errors. Better to classify information upon creation and store it in the right place to begin with. If it needs to be retained long term and preserved, then place it, or a copy, in a preservation store from the start.
Some practices are already aligned this way and have proven successful. Examples:
a. Email Archives – capture email and attachments upon creation and ingest them into a preservation process based on business rules.
b. Oracle 11g – ASM can manage a ‘preservation partition’ transparently integrated into the database, and ILM Assistant allows classes of information to reside there automatically.
3. Disposition – applying it at the end of life or use is too late, too costly, too prone to error, and in practice won’t be done. Why do you think the typical datacenter has 25% of its capacity occupied by expired information and data? The alternative: classify information and data, and set requirements and policies for them, up front. Disposition rules are one set of those policies. When information expires, the path is clear to delete it, provided no litigation holds are in effect.
4. Authenticity and metadata – pay attention to the value of metadata and its role in verifying the authenticity of litigation evidence. That is the key message for IT. We are not far away from the day when litigation evidence requirements will creep out of the litigation review process and put requirements on IT for control of metadata through the entire IT process. Guess what: many IT processes damage, destroy, merge, ignore, or confuse metadata. Start correcting those errors now. If data has metadata associated with it, take as much care of the metadata as you do of the data. To help IT and vendors get their heads around this, start using the term ‘information object’ or ‘digital object’ to describe the case where data and metadata are associated. A file is an information object. Data is the content. Not all metadata resides within the object, and that is just the point. Objects can contain links, and to maintain referential integrity the entire object has to be protected and preserved as it is processed. If you dedupe a file by ignoring the metadata, look at the damage you just did. If you encrypt the data but not the metadata, look at what remains exposed.
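To make points 3 and 4 concrete, here is a minimal Python sketch. It is only an illustration under assumptions of my own: the `InformationObject` class, its field names, and the two fingerprint methods are hypothetical and do not come from the report or from any product. It shows a disposition policy attached at creation, a deletion check that respects litigation holds, and why a dedupe that compares content alone treats two objects with different metadata as the same thing.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class InformationObject:
    """Hypothetical model: data and its metadata handled as one unit."""
    content: bytes
    metadata: dict
    created: date
    retention_days: int        # disposition policy assigned at creation
    legal_hold: bool = False   # True while a litigation hold is in effect

    def expires_on(self) -> date:
        return self.created + timedelta(days=self.retention_days)

    def may_delete(self, today: date) -> bool:
        # Deletion is allowed only once the object has expired
        # and no litigation hold applies.
        return today >= self.expires_on() and not self.legal_hold

    def content_fingerprint(self) -> str:
        # What a metadata-blind dedupe would compare.
        return hashlib.sha256(self.content).hexdigest()

    def object_fingerprint(self) -> str:
        # Covers content and metadata together, so objects that differ
        # only in metadata remain distinct.
        meta = json.dumps(self.metadata, sort_keys=True).encode("utf-8")
        digest = hashlib.sha256()
        # Length prefix keeps the content/metadata boundary unambiguous.
        digest.update(len(self.content).to_bytes(8, "big"))
        digest.update(self.content)
        digest.update(meta)
        return digest.hexdigest()


# Two objects with identical content but different metadata (made-up values).
a = InformationObject(b"Q3 forecast", {"author": "alice"}, date(2020, 1, 1), 365)
b = InformationObject(b"Q3 forecast", {"author": "bob"}, date(2020, 1, 1), 365,
                      legal_hold=True)

print(a.content_fingerprint() == b.content_fingerprint())
print(a.object_fingerprint() == b.object_fingerprint())
print(a.may_delete(date(2021, 1, 1)), b.may_delete(date(2021, 1, 1)))
```

A content-only comparison would collapse `a` and `b` into one stored copy and lose the fact that they have different authors, while the object-level fingerprint keeps them apart. The same date check that clears `a` for deletion leaves `b` in place because of the hold.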
That’s enough to start. What do you think of these points?