A recent ABA newsletter has an important article that discusses recent case law surrounding rules for ESI preservation and recommends changes that would effectively extend the retention periods for all ESI.
From: “A New Set of Rules for e-Discovery Duties and Sanctions” by Nick Brestoff, published in EDDE Journal, Winter 2011, Volume 2, Issue 1, a publication of the E-Discovery and Digital Evidence Committee, ABA Section of Science & Technology Law.
“All ESI preserved in accordance with the Preservation Duty shall not be destroyed or materially altered until four years after the Proceeding is final. If such ESI is not then subject to any other Preservation Duty, it may be destroyed. However, if such ESI is subject to a Preservation Duty arising from any other Proceeding, that ESI shall not be destroyed or altered until one year after all such Proceedings are final.”
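To see how far the proposed rule pushes out deletion, here is a minimal Python sketch of the date arithmetic. It is one reading of the quoted text, not legal advice: the function names are hypothetical, `other_finals` is assumed to hold the final dates of any other Proceedings whose Preservation Duty still covers the ESI when the four-year period ends, and taking the later of the two dates is my assumption.

```python
from datetime import date
from typing import Iterable


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years; Feb 29 falls back to Feb 28."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def earliest_destruction_date(primary_final: date,
                              other_finals: Iterable[date] = ()) -> date:
    """Earliest date the ESI could be destroyed under the quoted rule.

    primary_final: date the Proceeding became final.
    other_finals:  final dates of other Proceedings that still impose a
                   Preservation Duty on the same ESI (assumed all known).
    """
    # Baseline: four years after the Proceeding is final.
    base = add_years(primary_final, 4)
    others = list(other_finals)
    if not others:
        return base
    # Otherwise: one year after all such Proceedings are final.
    extended = add_years(max(others), 1)
    # Assumption: never earlier than the four-year baseline.
    return max(base, extended)


if __name__ == "__main__":
    print(earliest_destruction_date(date(2011, 3, 1)))
    print(earliest_destruction_date(date(2011, 3, 1), [date(2016, 6, 30)]))
```

Even in the simplest case, ESI preserved for a single matter stays on disk for four years after that matter closes, and each overlapping matter can push the date further out.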
Juxtaposed Forces:
The impact is interesting in that it is getting harder and harder to delete information and data out of the datacenter. On one hand we have IT’s cost-driven movement to incorporate capacity optimization in the datacenter; on the other, legal and regulatory governance that extends retention periods and makes it harder to delete expired information and data, driving up costs and energy consumption. In the preservation world, we are concerned with both. The paradox is that we are moving toward storing preservation objects that are larger and cannot be deduplicated, which causes capacity growth angst because you must keep 3-4 copies of each object distributed for recovery, access, and business continuity.
Thoughts:
- What screams at me is “classification, classification, classification!”
- And “deletion” as soon as possible – but without order and organization of information throughout the enterprise, that is hopeless.
- How to deal with the load, the cost, the complexity – I only see one path. It is the practice approach provided by ILM2.0. (For more, go to www.ilm20.org.)
Are there others?
Posted in Cloud Storage Services, Information Governance & ILM2.0
I’m writing and publishing mostly right now on my two reference model sites,
a] Long-Term Digital Preservation Reference Model : www.ltdprm.org
b] Information Lifecycle Management 2.0 (ILM2.0) Reference Model: www.ilm20.org
So, instead of lots of blog posts – jump over to either of these sites and participate in the reference model communities we’ve started there and contribute.


Posted in Data Management Issues, Information Governance & ILM2.0, Long-Term Retention and Preservation
I’ve been accused of throwing historical IT practices under the bus in my last posts. Well, in my opinion, we should.
IT practices that confuse or just don’t meet the business requirements or only add cost and complexity need to go away. The times are changing. We saw that clearly with regulatory compliance and eMail. We see it with eDiscovery and litigation review. Many IT practices damage metadata resulting in damage to authenticity. The courts keep getting closer and closer to exposing bad IT practices and I submit we need to start somewhere making improvements.
Metadata is a good example. Many IT practices damage, mix, confuse, or just plain ignore the value of metadata. (And, consequently denigrate its use to demonstrate authenticity.) This has to change.
a) Yes, it wasn’t until 2008 that Sedona recognized metadata in litigation evidence, but now it is important.
b) Aguilar v. Immigration & Customs Enforcement Div., 2008 U.S. Dist. LEXIS 97018 (Nov. 21, 2008) changed it all again, making certain metadata a key part of litigation evidence.
Another example is confusing archive and preservation – regulatory compliance hammered that. I believe that the IT premise we have to move toward could be framed “Preservation begins at creation.” The IT practice of archiving at the time information becomes inactive or expired is too late, too costly, too complex, and too risky in the face of litigation and compliance risk.
Oh, let’s add ‘deletion’ to the list: Even the records community is at fault here. The whole idea of ‘disposition after information expires’ is ludicrous for the digital datacenter. I maintain disposition policies must be made up front – consistent with ‘preservation policies begin at creation.’
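As a minimal sketch of what “disposition policies made up front” could look like in practice, the Python below stamps each object with its disposition date at ingest instead of deciding after it expires. The retention classes, their periods, and the names `PreservedObject`, `ingest`, and `is_expired` are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical retention classes and periods (in days), for illustration only.
RETENTION_DAYS = {"email": 3 * 365, "accounting": 7 * 365}


@dataclass(frozen=True)
class PreservedObject:
    name: str
    created: date
    retention_class: str
    disposition_date: date  # fixed at creation, not after expiry


def ingest(name: str, created: date, retention_class: str) -> PreservedObject:
    """Classify the object and assign its disposition date at creation."""
    period = timedelta(days=RETENTION_DAYS[retention_class])
    return PreservedObject(name, created, retention_class, created + period)


def is_expired(obj: PreservedObject, today: date) -> bool:
    """True once the object has reached its pre-assigned disposition date."""
    return today >= obj.disposition_date


if __name__ == "__main__":
    memo = ingest("memo.eml", date(2011, 1, 1), "email")
    print(memo.disposition_date, is_expired(memo, date(2012, 1, 1)))
```

The point of the design is that classification happens once, at creation, so deletion later is a lookup and not a research project.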
This could be a stimulating conversation. Chip in.
Oh, and I’m far from alone in this opinion. Change is hard. The top barrier is human and cultural on one side, and on the other, resistance from the vendor community protecting its installed base of revenue by propagating the myth. I can’t blame them. I can only blame the IT community. I really like this anecdote from the “Backup Blog”: “…Having said that, the biggest obstacle to fixing backup is not technology. It is inertia. It is cultural. It is fear of change. It is ingrained process. It is the fact that we have done things one way for so long that the reason we are doing things has been forgotten…”
Posted in Archive, Data Management Issues, Data Protection, Information Governance & ILM2.0, Long-Term Retention and Preservation, Storage Practices
From FAQs for the SNIA report: “Building a Terminology Bridge: Guidelines for Retention and Preservation in the Datacenter”
Authenticity: is defined in a digital retention and preservation context as a practice of verifying a digital object has not changed. Authenticity attempts to identify that an object is currently the same genuine object that it was “originally” and verify that it has not changed over time unless that change is known and authorized. (The term integrity is not to be confused with authenticity. The objective of “integrity” is to prevent corruption or damage and is defined as the consistency, accuracy, and correctness of stored or transmitted data or information. Integrity and authenticity are both required to preserve information and data assets.) Authenticity verification requires the use of metadata. The critical change for IT practices is that metadata is now very important and must be safeguarded with the same priorities the data is. IT practices that damage, merge, ignore, or scramble metadata are no longer appropriate.
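One common way to check that neither an object nor its metadata has changed is to record a cryptographic fingerprint of both at ingest and compare it later. The sketch below assumes that technique; the SNIA report does not prescribe it, and the function names and the JSON encoding of metadata are my own choices. A mismatch only shows that something changed. Whether the change was known and authorized still has to come from your change records.

```python
import hashlib
import json


def fingerprint(content: bytes, metadata: dict) -> dict:
    """Hash the content and the metadata separately with SHA-256."""
    # sort_keys gives a stable encoding, so key order does not matter.
    meta_bytes = json.dumps(metadata, sort_keys=True).encode("utf-8")
    return {
        "content_sha256": hashlib.sha256(content).hexdigest(),
        "metadata_sha256": hashlib.sha256(meta_bytes).hexdigest(),
    }


def verify(content: bytes, metadata: dict, recorded: dict) -> list:
    """Return the names of fingerprints that no longer match the record."""
    current = fingerprint(content, metadata)
    return [key for key in recorded if current.get(key) != recorded[key]]


if __name__ == "__main__":
    recorded = fingerprint(b"contract text", {"author": "jdoe"})
    # Same content, altered metadata: only the metadata check fails.
    print(verify(b"contract text", {"author": "someone else"}, recorded))
```

Hashing the metadata separately from the content is deliberate: it makes visible the case where a storage practice leaves the data intact but rewrites the metadata.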
Posted in Data Management Issues, Information Governance & ILM2.0, Long-Term Retention and Preservation, Uncategorized
The top mantra today in the sales process is “reduce cost, improve efficiency.” It seems that if you want to sell anything, it has to meet both criteria. Note the many advertisements we see on the web now that basically claim “storage is free and may actually save you money…” It is a hard time in vendor-land, but at the same time a healthy time to purge the industry of wrong thinking.
To that end, I keep getting asked where the cost savings opportunities lie and would like to pose a hierarchy as a way to look at the business opportunities. Naturally, several disclaimers and important notes first.
- “Your mileage will vary…”
- Each organization has to approach the issue of cost reduction holistically
- Fixing storage efficiency has a secondary effect of simply resetting the baseline and gaining temporary relief.
- You cannot solve the cost problem with point solutions (temporary relief again, and probably larger angst when you realize the mistake and waste) – at the root is an organizational set of practice problems
- The vendors are not going to tell you all these things because they don’t want you to know them!
- Metrics that are not credited are based on my primary research. The others are from industry sources we are all sharing and propagating so if they are wrong, we are all making the same mistakes at least.
Peterson’s Cost Savings Hierarchy
- Deletion: Delete expired data and information as soon as you can. Expired information and data represent ~20-25% of the entire set of storage capacity under management, not counting its level of redundancy. You can get a ‘capex-free’ cost reduction rapidly by deleting expired information. You can keep capacity growth down by continuing to delete information and data as they expire.
Note 1: The term “expired information and data” is one of 4 information states as defined in the report I produced and just published for SNIA titled: “Building a Terminology Bridge: Guidelines for Retention and Preservation in the Datacenter.”
Note 2: You must work with your legal department to define appropriate deletion practices and adhere carefully to those practices. A sudden change in practice is more likely to expose you to liability for spoliation than deleting material that should not have been deleted. Deletion practices are another art I have to write about, but enough for now.
Note 3: Do not let litigation holds stop you from continuing with deletion practices. That sounds like bad advice, but I submit there is a simple and effective workaround that legal will agree to in an effort to keep costs under control.
- Virtualize your storage infrastructure: including secondary storage. This is another form of consolidation, and we had great success in driving cost out with server and storage consolidation in the late 1990s. Why consolidation? Think ‘thin provisioning’. Stop allocating storage to applications and living with 35% to 40% utilization (industry sources). Even better, storage virtualization controllers provide automated migration capabilities, so you can now automate tiering. Here are the value proposition metrics:
– Fix capacity utilization with thin provisioning – 30% to 50% capacity utilization efficiency improvement that provides a short term gain (<1 yr ROI)
– The claim is that automated tiering will reduce storage costs 90%. (Source: IDC 2009)
– I add that reduced storage cost translates to huge reductions in opex. (See my post on the Cost of Managing Storage.)
- Change Backup: With tiering, we can segregate active, inactive, reference, and expired information and data. Stop backing up everything except active data. (Why? Because you have already backed up the rest or have it in your preservation store. Get intelligent about your data protection schemes.) Active information and data occupy only 20-25% of your capacity. You just saved 75% of your backup costs, and backup operations represent 35% to 55% of the IT budget. Go figure! Reducing backup operations costs is a huge win. And there are even more ways to save $$ in your data protection methods. Among them is capacity optimization, but before we go there, chew on these metrics about backup efficiency:
– Oh, by the way. The Capex to do this is $0.
– The first time backup-to-tape success rate is 60-70%,
– 30% of restores from backup fail – why do we accept this?
– 90% of tape or disk capacity used by traditional tape-oriented backup utilities is redundant so the disk utilization efficiency factor in D2D is less than 5% with RAID and required slack space. This means the actual cost of backup is way out of line with its value.
– 48% of test recoveries based on backup from DR fail (Source: Symantec Disaster Recovery Report 2007)
Note 1: Do not, let me say that again, do NOT use backup for your archive. Wrong wrong wrong… Those vendors promoting this are only self-serving a lazy IT practice carrying over from mainframe days.
- Add Capacity Optimization: to appropriate points in your practices. The first is backup. The trend is to use data deduplication plus compression to reduce backup redundancy by more than 90%. It turns out that with this approach, disk-based backup repositories can now operate on par with tape from a cost perspective if you include all those tape operations, upgrades, and offsite media handling expenses in the computation. Clearly, tape still has a role and benefits, especially in energy consumption. (Unless we are talking MAID technology.)
Note 1: Do tier your data protection repository to reduce cost further. Even better, federate it with disk and tape and don’t forget to delete expired information and data.
- Reduce litigation/compliance/eDiscovery/security risk and cost: Risk of fines, litigation expense, and the overhead of eDiscovery add millions of dollars of overhead cost per week to the typical large enterprise. (They average over 550 ongoing suits at any one point in time with a new one every week.) Reduce this cost and you not only save overhead, but you profoundly improve the IT infrastructure. Here are some rules to apply:
– Place copies of business critical and compliant information (corporate email, business docs, legal, accounting, etc.) into your preservation store upon creation – not later, not when they are inactive, do not migrate through a hierarchy such as HSM thinking because it only adds cost and increases risk. That is old-thinking in today’s litigious and compliant environment. Do add capacity optimization methods to this repository along with the long list of important preservation services that are needed to preserve data and information long-term.
– Never back up the preservation store. There, I said it. That’s blasphemy in some camps. But guess what: data protection protocols based on the business requirements and operating policies allow you to define the level of redundancy required to overcome risk. You may decide that high integrity storage (RAID) with integrated remote replication and some protocol for versioning will mean you never have to back up again. Kill backup if at all possible to reduce cost. (Now if you caught the drift of placing copies in a preservation store on creation, then why are we also spending so much backing up active data? Step back and rethink your information architecture…)
– Federate disk, tape (and optical if you desire it) in the preservation store and virtualize and tier them so that migration is automated based on business requirements, SLAs, and policies
– Index content as it is ingested into the preservation store – that will short circuit discovery costs and if your policies are set right, you won’t have to go hunting very far to assure you have the content you are looking for. Carefully control this metadata as at some point in the near future you will have to produce metadata to verify authenticity in a legal case.
– Add encryption where appropriate to reduce risk
- Create and run an “Electronically Stored Information Risk Assessment” to look holistically at what is next on the list for your organization. Find out where your risks are and reduce them. Use this same approach to flush out the cost centers and reduce them as well. Remember, IT does not have the entire picture of the organization’s needs so doing these exercises at an information governance committee level is appropriate.
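To make the arithmetic behind the first and third steps of the hierarchy concrete, here is a small Python sketch. It assumes you can already report capacity by the four information states (active, inactive, reference, expired). The 1,000 TB example and the split between inactive and reference are invented for illustration; only the roughly 25% active and 25% expired shares come from the figures above. It estimates capacity, not dollars, so treat the backup number as a reduction in what you have to back up.

```python
def hierarchy_estimate(capacity_tb: dict) -> dict:
    """Estimate the effect of deleting expired data and backing up only active data.

    capacity_tb maps each information state to capacity in TB and must
    include the keys 'active' and 'expired'.
    """
    total = sum(capacity_tb.values())
    return {
        "total_tb": total,
        # Step 1: delete expired information and data.
        "after_deletion_tb": total - capacity_tb["expired"],
        # Step 3: back up active data only.
        "backup_scope_tb": capacity_tb["active"],
        "backup_scope_reduction": 1 - capacity_tb["active"] / total,
    }


if __name__ == "__main__":
    # Hypothetical 1,000 TB estate, measured before any deletion.
    example = {"active": 250, "inactive": 300, "reference": 200, "expired": 250}
    for key, value in hierarchy_estimate(example).items():
        print(key, value)
```

With these inputs, deleting expired data takes the estate from 1,000 TB to 750 TB, and backing up only active data cuts the backup scope by 75%, which is where the “you just saved 75%” figure comes from.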
Enough for now – you get the picture, I hope. So, what else would you add, or where do you think I’m all wet? Let’s talk.
Posted in Data Management Issues, Data Protection, Information Governance & ILM2.0, Service Mgmt, Storage Practices