Archive for the ‘Storage Practices’ Category

The top mantra today in the sales process is “reduce cost, improve efficiency.”  It seems that if you want to sell anything it has to meet both criteria. Note the many advertisements we see on the web now that basically claim “storage is free and may actually save you money…” It is a hard time in vendor-land, but at the same time a healthy time to purge the industry of wrong thinking.

To that end, I keep getting asked where the cost savings opportunities lie and would like to pose a hierarchy as a way to look at the business opportunities. Naturally, several disclaimers and important notes first.

  • “Your mileage will vary…”
  • Each organization has to approach the issue of cost reduction holistically
  • Fixing storage efficiency has a secondary effect of simply resetting the baseline and gaining temporary relief.
  • You cannot solve the cost problem with point solutions (temporary relief again, and probably larger angst when you realize the mistake and waste) – at the root is an organizational set of practice problems
  • The vendors are not going to tell you all these things because they don’t want you to know them!
  • Metrics that are not credited are based on my primary research. The others are from industry sources we are all sharing and propagating so if they are wrong, we are all making the same mistakes at least.

Peterson’s Cost Savings Hierarchy

  1. Deletion:  Delete expired data and information as soon as you can. Expired information and data represent ~20-25% of the entire storage capacity under management, not counting its level of redundancy. You can get a ‘capex-free’ cost reduction rapidly by deleting expired information. You can keep capacity growth down by continuing to delete information and data as they expire.
    Note 1: The term “expired information and data” is one of 4 information states defined in the report I produced and just published for SNIA, titled “Building a Terminology Bridge: Guidelines for Retention and Preservation in the Datacenter”.
    Note 2: You must work with your legal department to define appropriate deletion practices and adhere carefully to those practices. A sudden change in a practice is more likely to make you liable for spoliation than deleting material that shouldn’t have been deleted.  Deletion practices are another art I have to write about, but enough for now.
    Note 3: Do not let litigation holds stop you from continuing with deletion practices. That sounds like bad advice, but I submit there is a simple and effective workaround that legal will agree to in an effort to keep costs under control.
  2. Virtualize your storage infrastructure: including secondary storage. This is another form of consolidation, and we had great success in driving cost out with server and storage consolidation in the late 1990s. Why consolidation? Think ‘thin provisioning’. Stop allocating storage to applications and living with 35% to 40% utilization (industry sources). Even better, storage virtualization controllers provide automated migration capabilities so you can now automate tiering. Here are the value proposition metrics:
    – Fix capacity utilization with thin provisioning – 30% to 50% capacity utilization efficiency improvement that provides a short term gain  (<1 yr ROI)
    – The claim is that automated tiering will reduce storage costs 90%.  (Source: IDC 2009)
    – I add that reduced storage cost translates to huge reductions in opex. (See my post on the Cost of Managing Storage.)
  3. Change Backup:  With tiering, we can segregate active, inactive, reference, and expired information and data. Stop backing up everything but active data. (Why? Because you already backed it up or have it in your preservation store. Get intelligent about your data protection schemes.)  Active information and data occupy only 20-25% of your capacity. You just saved 75% of your backup costs, and backup operations represent 35% to 55% of the IT budget. Go figure!  Reducing backup operations costs is a huge win. And there are even more ways to save $$ in your data protection methods. Among them is capacity optimization, but before we go there chew on these metrics about backup efficiency:
    – Oh, by the way. The Capex to do this is $0.
    – The first-time backup-to-tape success rate is 60-70%
    – 30% of restores from backup fail – why do we accept this?
    – 90% of tape or disk capacity used by traditional tape-oriented backup utilities is redundant, so the disk utilization efficiency factor in D2D is less than 5% with RAID and required slack space.  This means the actual cost of backup is way out of line with its value.
    – 48% of test recoveries based on backup from DR fail (Source: Symantec Disaster Recovery Report 2007)
    Note 1: Do not, let me say that again, do NOT use backup for your archive. Wrong wrong wrong… Those vendors promoting this are self-serving and only perpetuate a lazy IT practice carried over from mainframe days.
  4. Add Capacity Optimization: to appropriate points in your practices. The first is backup. The trend is to use data deduplication plus compression to reduce backup redundancy by more than 90%.  It turns out that with this approach, disk-based backup repositories can now operate on par with tape from a cost perspective if you include all those tape operations, upgrades, and offsite media handling expenses in the computation. Clearly, tape still has a role and benefits, especially in energy consumption. (Unless we are talking MAID technology.)
    Note 1: Do tier your data protection repository to reduce cost further. Even better, federate it with disk and tape and don’t forget to delete expired information and data.
  5. Reduce litigation/compliance/eDiscovery/security risk and cost:  Risk of fines and litigation expense and the overhead costs of eDiscovery add millions of dollars of overhead cost per week to the typical large enterprise. (They average over 550 ongoing suits at any one point in time with a new one every week.) Cost reduce this and you not only save overhead cost, but you profoundly improve the IT infrastructure. Here are some rules to apply:
    – Place copies of business critical and compliant information (corporate email, business docs, legal, accounting, etc.) into your preservation store upon creation – not later, not when they are inactive. Do not migrate them through a hierarchy, as in HSM thinking, because it only adds cost and increases risk. That is old thinking in today’s litigious and compliant environment. Do add capacity optimization methods to this repository along with the long list of important preservation services that are needed to preserve data and information long-term.
    – Never back up the preservation store. There, I said it. That’s blasphemy in some camps.  But guess what: data protection protocols based on the business requirements and operating policies allow you to define the level of redundancy required to overcome risk. You may decide that high integrity storage (RAID) with integrated remote replication and some protocol for versioning will allow you to never have to back up again. Kill backup if at all possible to reduce cost.  (Now if you caught the drift of placing copies in a preservation store on creation, then why are we also spending so much backing up active data? Step back and rethink your information architecture…)
    – Federate disk, tape (and optical if you desire it) in the preservation store and virtualize and tier them so that migration is automated based on business requirements, SLAs, and policies.
    – Index content as it is ingested into the preservation store – that will short-circuit discovery costs and, if your policies are set right, you won’t have to go hunting very far to assure you have the content you are looking for.  Carefully control this metadata, as at some point in the near future you will have to produce metadata to verify authenticity in a legal case.
    – Add encryption where appropriate to reduce risk
  6. Create and run an “Electronically Stored Information Risk Assessment” to look holistically at what is next on the list for your organization. Find out where your risks are and reduce them. Use this same approach to flush out the cost centers and reduce them as well. Remember, IT does not have the entire picture of the organization’s needs so doing these exercises at an information governance committee level is appropriate.
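
As a back-of-the-envelope check on items 1 and 3 above, here is a minimal Python sketch. The 1,000 TB estate and the function names are illustrative assumptions of mine, not measurements; the fractions are the midpoint and upper end of the 20-25% figures quoted in the list.

```python
def reclaimable_by_deletion(total_tb, expired_fraction):
    """Capacity (TB) freed by deleting expired data, before counting redundant copies."""
    return total_tb * expired_fraction


def backup_volume_avoided(active_fraction):
    """Share of capacity that drops out of backup when only active data is backed up."""
    return 1.0 - active_fraction


total_tb = 1000.0          # hypothetical estate size, for illustration only
expired_fraction = 0.225   # midpoint of the ~20-25% expired figure
active_fraction = 0.25     # upper end of the 20-25% active figure

freed = reclaimable_by_deletion(total_tb, expired_fraction)
avoided = backup_volume_avoided(active_fraction)
print(f"Deletion frees about {freed:.0f} TB before redundancy is counted")
print(f"Backing up only active data avoids about {avoided:.0%} of backup volume")
```

Your own fractions will differ, so treat the output as a way to frame the conversation rather than as a forecast.
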
Enough for now – you get the picture I hope.  So, what else would you add, or where do you think I’m all wet? Let’s talk.

Storage isn’t free. Never will be. Management costs (opex) overwhelm capex expenditures.  I continue to scale the cost of managing storage, CMS, in my research. The scary thing is that I find it is growing again, ranging from $10k to $35k/TB/yr depending on organization size and complexity.

Now a new metric. Let me suggest that while I first published this metric in 1992, and continued publishing primary research on it through 1997, we have to look at it differently today. Here’s the math. The acquisition cost of most classes of disk arrays is between $1k and $4k per TB.  That means the annual ratio of CMS to storage cost is still 7x to 10x (same as in 1992 – another scary thought), but guess what: that is the wrong way to look at it.

The top problem in storage in the datacenter, from when I first published on it in 1994 through today, is storage expansion. No, it is not hard to add disk drives. The problem is that expansion causes all storage practices, management, and services to have to expand as well to accommodate the new storage. It has a ripple effect. Now add the cost of managing storage. Adding 1TB adds ~$25k of incremental cost to large organizations per year.  It is the per year thing that gets you now. Storage doesn’t just go away. It has a life. A better way to look at the CMS is over at least a three year life. Even retirement doesn’t mean capacity reduction; rather, it means replacement, so the cost is ongoing…  But we have to pick a threshold, otherwise this gets ridiculous.  At three years, the real CMS is $30k-$100k/TB and the factor of Opex to Capex is really 20-30x.  Wow!
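
The three-year figures are simply the annual figures multiplied out. A small Python sketch of that arithmetic follows; the function names are mine, and the inputs are the annual ranges quoted above.

```python
def lifetime_cms(annual_cms_per_tb, years=3):
    """Cost of managing one TB over its service life, in dollars."""
    return annual_cms_per_tb * years


def lifetime_opex_to_capex(annual_ratio, years=3):
    """Opex-to-capex ratio once opex is summed over the service life."""
    return annual_ratio * years


annual_cms = (10_000, 35_000)   # $/TB/yr range quoted in the post
annual_ratio = (7, 10)          # annual CMS-to-acquisition-cost ratio quoted in the post

for label, cms, ratio in zip(("low", "high"), annual_cms, annual_ratio):
    print(f"{label}: ${lifetime_cms(cms):,}/TB over 3 years, "
          f"opex/capex about {lifetime_opex_to_capex(ratio)}x")
```

The exact products are $30,000 to $105,000 per TB and 21x to 30x, which round to the $30k-$100k and 20-30x quoted above.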

The point is that if we recognize the real cost of adding storage to the datacenter, we will be more judicious in its use. If you just stopped and recognized that every TB you add of primary storage will add ~$50k of cost, what would you do?  Buy less? Not necessarily. What you definitely would do is cost-reduce your practices by doing things like deletion, deduplication, and tiering. You can be more efficient in your use of storage, but that is a one time deal. A change in efficiency does not change the shape of the consumption curve. It just resets the baseline.  You still need to cost reduce your practices.
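
To see why an efficiency gain only resets the baseline, consider a toy compound-growth model. Everything numeric here is an assumption for illustration: a 100 TB starting point, 50% annual growth, and a one-time 30% reduction in consumed capacity. None of these are figures from my research.

```python
import math


def capacity(year, start_tb, growth, one_time_saving=0.0):
    """Capacity consumed after `year` years of compound growth,
    optionally after a one-time efficiency saving applied at year 0."""
    return start_tb * (1.0 - one_time_saving) * (1.0 + growth) ** year


def years_of_relief(one_time_saving, growth):
    """How long it takes growth to erase a one-time saving."""
    return -math.log(1.0 - one_time_saving) / math.log(1.0 + growth)


start_tb = 100.0   # hypothetical starting capacity
growth = 0.50      # hypothetical annual growth rate
saving = 0.30      # hypothetical one-time efficiency gain

for year in range(4):
    without_fix = capacity(year, start_tb, growth)
    with_fix = capacity(year, start_tb, growth, saving)
    print(f"year {year}: {without_fix:7.1f} TB without the fix, {with_fix:7.1f} TB with it")

print(f"relief bought: about {years_of_relief(saving, growth):.2f} years")
```

Under these assumptions both curves grow at the same rate; the fix just shifts the curve to the right by less than a year. That is the sense in which the baseline is reset while the shape of the curve stays the same.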

To summarize, I think that this is the best thing to happen to the datacenter in a long time. Due to budget constraints we are having to pay attention to practices and fix an IT system that is broken and does not scale due to ever growing cost.

I’d like to explain the many ways that information and data will be lost in a typical datacenter. Note that I say, will be lost. Data loss is inevitable. Information is lost the more it is handled, copied, moved, replicated, migrated, and as it ages.

The point is that loss happens. Let me say it plainly: you cannot stop data loss. The key questions are “How much will be lost?”, “Do you care?”, and “What can we do to reduce it?” Here is what I mean by lost.

There are 4 principal classes of loss.

The first category I call poor storage practices. By this I mean several things. In a relatively large file system with millions to trillions of files distributed across multiple sites, servers, desktops, test databases, DR sites, and remote web servers or service providers, trust me, lots of files will be misplaced and effectively lost by users and the system.  Loss occurs if you can’t find it, read it, or interpret it. It doesn’t matter how it was caused. All these are valid forms of loss.

Additional storage problems come from poor doc control practices such as losing track of versions or ‘official records’ and are compounded if you are using external services. What happens when files are sent offsite to a web host or storage service and those services are down, corrupted, or go out of business, and you can’t get your files back? Loss happens. As we move into focusing on Cloud Storage we’ll hear more of this problem surfacing. Remember, you risk fines or other penalties during litigation if information cannot be discovered and produced. This is a cost of loss.

The second class of loss is through poor security practices. The most obvious is when a hacker or employee gets through your firewalls and takes information, views confidential or private information, or changes or damages information. We have all heard countless stories now of lost notebooks or tapes containing millions of records with personally identifiable information. Those all count as forms of loss. One of the worst nightmares in litigation evidence control is when an ex-employee shows up with historical files and emails that you don’t have since you followed your retention and deletion protocols and permanently deleted them on schedule. They took them while they were employees and now potentially have an advantage. Perhaps the only perspective to have on losing information is one of damage control and recovery. If you think otherwise, consider the next class of challenges.

The third class of loss is through human or operational errors. Human error is the number one cause of damage or loss and we are not likely to change that fact. It manifests in many ways, but the pertinent issue is whether or not your recovery systems work. Here’s the test. Your systems are faithfully backed up. But how often and how thoroughly have you tested recovery? Backup works great when it is write once, read never. But you might be surprised how often recovery is compromised. The alternative is to rebuild information from scratch. Cost estimates to do this vary, ranging from $5k to $50k per megabyte. Factor that thinking into your recovery strategies.

The fourth class of loss is caused by process or practice errors. First – inappropriate deletion processes. Deletion is good. You must delete expired and disposable information when you can; otherwise all you are doing is driving up operating costs, storage costs, and risk.

But, do it wrong and you may cause ‘spoliation’. Make sure your processes are correct and cleared with legal and then audit them.

Next, mistakes occur and here are two examples:

1st – during litigation evidence processing, if you lose authenticity or damage chain of custody, the evidence is as good as lost. You may not be able to present it.

2nd – during migration events, many similar things can go wrong. It is safe to say that migration causes damage. After two migrations most IT people will openly admit they have lost some portion of the information. Migration data loss is significant. That is why all digital information is at risk long-term. We just don’t have good physical and logical migration practices in place as an industry.
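
One modest safeguard for the physical side of migration is to record a checksum manifest before the move and compare it afterwards. The Python sketch below is a minimal illustration using only the standard library; the function names and the choice of SHA-256 are my assumptions, and it says nothing about logical migration, since a file can survive bit-for-bit and still be unreadable by tomorrow's applications.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file path (relative to root) to the SHA-256 digest of its contents.

    Reads each file fully into memory, which is fine for a sketch;
    very large files would need chunked reads.
    """
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def compare_manifests(before, after):
    """Return (missing, changed): paths absent after migration,
    and paths whose contents no longer match."""
    missing = sorted(set(before) - set(after))
    changed = sorted(p for p in before if p in after and before[p] != after[p])
    return missing, changed
```

In use, you would call `build_manifest` on the source tree before migrating and on the target tree afterwards, then pass both results to `compare_manifests`. Anything in either returned list is a candidate for loss and worth investigating before the source is retired.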

For long-term retention and preservation I strongly urge you to get expert help. Talk to me!

We’ve invited members of the IEEE’s Mass Storage Systems and Technologies workgroup on digital preservation to join with SNIA members in reviewing the requirements for long-term retention and preservation. If you would like to participate in this discussion, please go to the DMF Community’s site and register to access it.

http://community.snia-dmf.org

This is an important conversation as we need to update these requirements and then extend them further as we consider the implications of bringing technologies and architectures to market to solve the two ‘holy grail’ problems of preservation – logical and physical migration. Please participate.

Three years ago we started the work on long-term digital information preservation in the Data Management Forum’s Long-Term Archive and Compliant Storage initiative, LTACSI. One of the first activities we held was a panel discussion at the SNIA’s June 2005 Symposium in Boston. Among the panelists were an archivist, MacKenzie Smith, Assoc-Dir for Technology, MIT Libraries, and a datacenter practitioner, Jim Riggs, PERMS Program Manager, US Army, who has a huge long-term retention challenge. Now the room was full of about 70 storage ‘geeks’ – the types that frequent symposia such as this. But it also was attended by a few RIM/IT types and CTOs from the handful of emerging archive systems companies like Permabit and Archivas, some email archiving companies, as well as a contingent of the CAS group from EMC. MacKenzie surprised us all when she told us in clear terms how difficult her work was with today’s storage systems and that the way we looked at ‘archive’ was wrong.

Point 1:

  • Based on feedback we got there, from our engagements with RIM and IT practitioners from ARMA and other groups including the SNIA End-User Council, and then from the important “Long-Term Digital Information Retention Requirements Study” I conducted for SNIA and published in January of 2007, we were continually admonished to stop using the “archive” word as it was too confused.
  • Here is a poignant quote from the survey: “Records retention is different than depositing something in an archive. Archiving is a very problematic word and I would suggest not using it. It suggests dumping records into some bottomless pit where they can be forgotten. (Instead) Ingest (them) into a record keeping environment where they can be permanently preserved for long-term records retention seems better.”

Point 2:

  • Engagements with ARMA’s RIM community and work on regulatory compliance brought out the importance of retention-periods, the setting of retention requirements, and proper disposition (meaning permanent deletion) of expired information to reduce the volume of information being stored long-term.

  • Paradoxically, our requirements survey as well as many informal audience surveys at conferences tell us that approximately 80% of the IT community still don’t know the requirements for the information they manage. A gauge of this disconnect can be seen in the many retention-requirements documents produced by RIMs that contain 2000 to 4000 specific record types and retention schedules. IT and IT systems can’t handle that type of granularity. (Thankfully, this thinking is dying out as people start talking and working together – we see classification catching on using just a few buckets.)
  • This gap is very important as it led us to begin the work with ARMA in stating that “Collaboration” is the starting point to “information-centric management” just as setting requirements for that information based on its value to the organization is the starting point for Information Lifecycle Management, ILM, based practices. (See the white paper we co-authored: “Collaboration: the New Standard of Excellence” linked on my publications page.)
  • Think about it now. Retention requirements are the focal issue for legal and RIM. Storing it off into a silo is the focus of IT because they don’t have the authority to delete anything. No wonder we have a disconnect around what archive means. Here are some definitions from their 2007 glossaries that illustrate the difference in thinking:

o ARMA – RIM: (context retention) 1. Used for electronic records, it is the procedure for transferring information from an active file to an inactive file, storage medium, or facility. 2. Act of creating a backup copy of computer files. See also BACKUP

o Society of American Archivists – Archivists: (context computing) – To store data offline.

o SNIA – IT: (context ILM) – (verb) To copy or move data for purposes of retention; to create an archive.

  • OK, I have to say something here about using backup for an ‘archive’. Don’t. Completely wrong thinking.  We’re trying to kill that message everywhere we can.

Point 3:

  • More information is being held long-term by more companies than any of us expected. In the requirements survey, 83% of the 110 companies responding to this question reported that they have to keep some information over 50 years.
  • What is long-term? Isn’t it relative? Yes, but we still need a number. Read my discussion on the definition of “long-term” in the requirements study for the details on how this was derived, but for now let me just make the statement. In the LTACSI, we’ve adopted the definition that long-term is the period of time beyond which you start losing data. Today, that number is 10-15 years.

Now you have the background for what I want to say. The point is that we have to shift our thinking to using retention and preservation as the key terms, not archive. Let’s redefine archive similarly to what the digital archivist and library communities did in OAIS as an “electronic archive” defining a type of repository for long-term preservation, not as a verb which the storage community uses to connote “moving data into an electronic archive.” Throw the verb out! It is wrong thinking anyway as the notion of moving information around as it ages just adds cost and complexity. (aha, another discussion thread…)

The beauty of this switch is that it also changes our frame of reference and helps move the organization down the path towards information-centric management. Now, you don’t just say the words and it’s over. There is important work to do:

  • First, IT, RIM, legal, security, and the business groups have to get together and collaborate to identify their information assets, classify them into a manageable number of buckets, and then set the retention requirements. (And, while at it set the other requirements too, please.) The mantra I teach for this process is “collaborate, identify, classify, requirements, implement, measure, improve”.
  • Second, we need the storage industry to recognize that information services such as ILM, retention, preservation, deletion, etc. require the capabilities of managing information – not just the data. (See the discussion on the difference between digital information and data to fully appreciate this thought.)
  • Finally, we need a new storage architecture for long-term retention in the datacenter – not just a ‘preservation data store’ or another proprietary silo. And that is the point of this note. With it, “archive” and backup go away and are replaced with retention and preservation.
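
To make the “classify into a manageable number of buckets, then set the retention requirements” step concrete, here is a minimal Python sketch. The four bucket names and their retention periods are invented purely for illustration; real buckets and periods have to come out of the collaboration with legal and RIM described above.

```python
from datetime import date

# Hypothetical buckets and retention periods in years, for illustration only.
# None means the information is kept permanently.
RETENTION_YEARS = {
    "transient": 1,
    "business": 7,
    "regulated": 10,
    "permanent": None,
}


def disposition_date(bucket, created):
    """Date on which information in this bucket expires, or None if it never does."""
    years = RETENTION_YEARS[bucket]
    if years is None:
        return None
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # Created on Feb 29 and the target year is not a leap year.
        return created.replace(year=created.year + years, day=28)


def is_expired(bucket, created, today):
    """True once the retention period for this bucket has run out."""
    due = disposition_date(bucket, created)
    return due is not None and today >= due
```

A handful of buckets like this is something IT systems can actually enforce, in contrast to schedules with thousands of record types.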

I’ll discuss this architecture in another post titled “Virtualizing the secondary storage tier.”