Terminology is the starting point for Information Governance
I strongly urge you to read and distribute this new report from the SNIA – “Building a Terminology Bridge: Guidelines to Digital Information Retention and Preservation Practices in the Datacenter.” It took two years to develop, research, vet, socialize, educate, and build consensus within SNIA alone, an effort that tried my patience and fortitude at times. But I’m here for the long run, and this report is a masterful contribution to the industry.
This report is essential for a long list of practices such as these:
I love storage technology – the demand for more, cheaper, and faster will never end. Berkeley Labs brings us one of the most interesting technologies yet.
One of the drivers now is long-term preservation. If we had long-term media, it would, we postulate, slow down the rate and number of required migrations. In any case, the domains of logical and physical migration are where we need to put a lot of effort and R&D; otherwise the costs of preserving information for the long term will overwhelm everything else. This is where NARA is putting its money: to develop a long-term storage architecture. It will be fun to watch all this unfold over the next 10 years.
I’ve been accused of throwing historical IT practices under the bus in my last posts. Well, in my opinion, we should.
IT practices that confuse, that just don’t meet the business requirements, or that only add cost and complexity need to go away. The times are changing. We saw that clearly with regulatory compliance and eMail. We see it with eDiscovery and litigation review. Many IT practices damage metadata, which in turn damages authenticity. The courts keep getting closer and closer to exposing bad IT practices, and I submit we need to start making improvements somewhere.
Metadata is a good example. Many IT practices damage, mix, confuse, or just plain ignore the value of metadata (and consequently undermine its use to demonstrate authenticity). This has to change.
a) Yes, it wasn’t until 2008 that Sedona recognized metadata in litigation evidence, but now it is important.
b) Aguilar v. Immigration & Customs Enforcement Div., 2008 U.S. Dist. LEXIS 97018 (Nov. 21, 2008) changed it all again, making certain metadata a key part of litigation evidence.
Another example is confusing archive and preservation – regulatory compliance hammered that. I believe that the IT premise we have to move toward could be framed as “Preservation begins at creation.” The IT practice of archiving at the time information becomes inactive or expired is too late, too costly, too complex, and too risky in the face of litigation and compliance risk.
Oh, let’s add ‘deletion’ to the list: Even the records community is at fault here. The whole idea of ‘disposition after information expires’ is ludicrous for the digital datacenter. I maintain disposition policies must be made up front – consistent with ‘preservation policies begin at creation.’
This could be a stimulating conversation. Chip in.
Oh, and I’m far from alone in this opinion. Change is hard. The top barrier is human and cultural on one side and, on the other, resistance from a vendor community protecting its installed base of revenue by propagating the myth. I can’t blame them. I can only blame the IT community. I really like this anecdote from the “Backup Blog”: “…Having said that, the biggest obstacle to fixing backup is not technology. It is inertia. It is cultural. It is fear of change. It is ingrained process. It is the fact that we have done things one way for so long that the reason we are doing things has been forgotten…”
Preservation: managing information in today’s datacenter, with requirements to safeguard information assets for eDiscovery, litigation evidence, security, and regulatory compliance, requires that many classes of information be preserved from the time of creation. Preservation is a set of services that protect information; provide availability, integrity, and authenticity controls; include security and confidentiality safeguards; and include an audit log, control of metadata, and other practices for each preservation object. The old IT practice of placing information into an archive when it becomes inactive or expired is tiering, not archive, and no longer works for compliance or litigation support because it only adds cost and risk. Thus, we see products and practices like eMail Archive, Compliance Storage, Preservation Stores, and Database Archives being used to capture and preserve key classes of information and data upon creation.
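To make “preservation begins at creation” a little more concrete, here is a minimal Python sketch of what a preservation object might carry from its first moment: a fixity checksum for integrity, an audit log, and a retention period decided up front. The class name `PreservationObject` and all of its fields are my own illustrative assumptions. They do not come from the SNIA report or from any product, and a real preservation store would add much more (access controls, protected metadata, tamper-evident logging).

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class PreservationObject:
    """Hypothetical wrapper: content plus preservation metadata set at creation."""
    content: bytes
    created: datetime
    retention: timedelta                      # disposition policy fixed up front
    fixity: str = field(init=False)           # SHA-256 digest recorded at creation
    audit_log: list = field(init=False, default_factory=list)

    def __post_init__(self):
        # Record the integrity baseline the moment the object exists.
        self.fixity = hashlib.sha256(self.content).hexdigest()
        self._log("created")

    def _log(self, event):
        # Append a (timestamp, event) pair; a real system would make this tamper-evident.
        self.audit_log.append((datetime.now(timezone.utc).isoformat(), event))

    def verify(self):
        # Recompute the digest and compare it with the one taken at creation.
        ok = hashlib.sha256(self.content).hexdigest() == self.fixity
        self._log("verify:ok" if ok else "verify:FAILED")
        return ok

    def disposition_due(self, now):
        # True once the retention period chosen at creation has elapsed.
        return now >= self.created + self.retention
```

A record would be created with something like `PreservationObject(b"...", datetime(2009, 1, 1, tzinfo=timezone.utc), timedelta(days=7 * 365))`. The point of the design is that the checksum, the log, and the disposition date all exist before the information ever goes inactive, which is the opposite of moving it to a lower tier years later and calling that an archive.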
Archive: the report advocates that IT practices adopt a more consistent usage of the term ‘archive’ to facilitate interaction with other departments within the organization. To the archival, preservation, and records communities, an archive is a specialized repository with preservation services and attributes. Typical IT use of the verb “archiving” actually refers to a practice based on ILM called “tiering,” the migration of inactive, reference, or expired information to a lower tier of storage to reduce cost and improve storage efficiencies. A lower tier of storage is not an archive with preservation-class services. Another IT (and vendor) misuse happens when ‘archive’ is confused with backup. Backup media saved offline or offsite does not constitute an archive (a preservation store with preservation services) nor should backup media be confused with an archive or with tiering.