About the Author

Since joining CommVault in 2008, Dipesh has been focused on highlighting the value of CommVault's global, embedded data deduplication to audiences far and wide. Before joining CommVault, Dipesh's career spanned a number of companies, including Intel, IBM and NetApp where he worked in a variety of Channel Marketing, Product Marketing, and Product Management roles. A constant thread throughout his career has been a relentless search for ways to harness and convert the power of new technologies into tangible customer value.



Sunday, June 13, 2010

The 'Terrible Twins' Driving Deduplication

It's no surprise that deduplication continues its relentless march into data centers (and remote offices) everywhere. Given the need for ongoing cost containment, it's often the first line of "defense" for freeing up much-needed funds, which can then be reinvested in ongoing maintenance or new IT projects in support of the core business.

But I want to take a quick look at two other underlying drivers that I believe will continue to underpin the growth in deduplication beyond a single budget cycle or fiscal year.

Backing up VMs requires deduplication

The first driver is, of course, virtualization. We have seen a huge amount of interest among our customers in virtualization (mostly VMware, but increasingly Hyper-V as well). The way I think of VM environments is sort of like the underlying premise of "The Matrix": basically, we're digitizing what were previously standalone physical infrastructures. In addition, VM environments often bring a much more rigorous and enforceable need for standardization. For instance, if you're going to rapidly create 500 VMs, then having a set of five standard templates to choose from makes things a lot easier to get up and running, and much MUCH easier to manage!

The net effect, though, is that backing up virtual machines results in even better deduplication than backing up files from physical servers. There's just so much more overlap beyond "just" the data: VMs built from the same template share largely identical operating system and application files.
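To make the template effect concrete, here is a minimal sketch of block-level deduplication using fixed-size chunks and SHA-256 fingerprints. It is an illustration only, not how Simpana or any other product works internally: the 4 KB chunk size, the helper names (`make_template`, `make_vm`, `dedupe_ratio`) and the 90/10 split between template blocks and per-VM data are all assumptions chosen for the example.

```python
import hashlib

CHUNK_SIZE = 4096  # assumed fixed block size, for illustration only


def _block(label: str) -> bytes:
    """Build one deterministic CHUNK_SIZE-byte block from a text label."""
    return hashlib.sha256(label.encode()).digest() * (CHUNK_SIZE // 32)


def make_template(seed: int, blocks: int) -> bytes:
    """Hypothetical stand-in for an OS/application template image."""
    return b"".join(_block(f"{seed}-{i}") for i in range(blocks))


def make_vm(template: bytes, vm_id: int, data_blocks: int) -> bytes:
    """A VM image: the shared template followed by data unique to this VM."""
    unique = b"".join(_block(f"vm{vm_id}-{i}") for i in range(data_blocks))
    return template + unique


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Fingerprint every fixed-size chunk of an image."""
    return [
        hashlib.sha256(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def dedupe_ratio(images: list[bytes]) -> float:
    """Total chunks backed up divided by unique chunks actually stored."""
    total = 0
    unique: set[str] = set()
    for image in images:
        hashes = chunk_hashes(image)
        total += len(hashes)
        unique.update(hashes)
    return total / len(unique)


# Ten VMs cloned from one template (90 shared blocks + 10 unique blocks each).
template = make_template(seed=0, blocks=90)
vms = [make_vm(template, vm_id, data_blocks=10) for vm_id in range(10)]

# Ten "physical" servers, each modeled with its own non-overlapping image.
physical = [make_vm(make_template(seed=s, blocks=90), s, 10) for s in range(1, 11)]

print(f"VMs from one template: {dedupe_ratio(vms):.2f}:1")
print(f"Unrelated servers:     {dedupe_ratio(physical):.2f}:1")
```

In this toy setup the templated VMs store 190 unique blocks out of 1,000 backed up, roughly 5.3:1, while the unrelated images get no reduction at all. Real physical servers would share some data too, so treat the contrast as exaggerated.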

eDiscovery and Dedupe

The second driver for deduplication is one that doesn't always receive its fair share of attention: Compliance and eDiscovery. It seems like every time there is a change to data retention requirements, it results in more data, kept for longer periods of time. And if you're in certain industries (like healthcare), you may even have to keep patient records for decades. Here's a nice little site that gives you an informal overview of how long certain types of data have to be kept.

So the "hidden" driver for deduplication is actually the need to keep lots of data for long periods of time.

What does that mean for data management? It means that the scope of deduplication continues to expand beyond nearline backup/recovery. Long-term data retention increasingly becomes a critical area of research when you're looking to implement or expand deduplication. There are several vendors that offer dedupe to tape today. I'll leave it to you (or a future blog post?) to evaluate how well each approach would work in your environment. It's definitely an area worth watching, though: no matter how cheap disk gets, keeping 30-year-old data on disk could still get expensive once you factor in power, cooling and logistical costs.

Of course, deduplication in this case is a key underlying factor that enables you to build out a better, more cost-effective Compliance & eDiscovery solution. If you want to know more about that, there's an upcoming webinar on that very topic from our own Simon Taylor.

What about you? Are there other key drivers/trends that you feel are going to continue driving deduplication further into your enterprise?

---------------------------------------

Webinar: Stepping Up to Deduplication with Simpana 8

Learn about the shortcut to global embedded deduplication success.
Join us for this webinar for an overview of CommVault's global embedded approach to deduplication and how you can easily implement dedupe using the people, processes and platforms you already have in place today.

Saturday, May 15, 2010

Deduplication Boost or Bust?

Another week, another development in the deduplication space. Just when things looked like they were getting ho-hum with dedupe moving into the mainstream, EMC announced its Data Domain "Boost". Some folks have been asking us what this means for CommVault. Actually, it's way more interesting to think about what this means for the future of Data Domain.

So, the quick version is that Data Domain Boost is built upon Symantec's proprietary OST interface to move the hashing and comparison work largely to the Media Server. This brings them closer to the way CommVault does deduplication (along with the Media Server-based configuration of Symantec's PureDisk technology). I guess imitation is really the sincerest form of flattery after all!

But what I'm really wondering about is how EMC justifies asking for a huge hardware premium for their boxes if most (or all?) of the deduplication is actually being done outside of the Data Domain box? And why would customers be happy being locked into a proprietary interface that only works today if they use Symantec and Data Domain?

Tuesday, May 11, 2010

Dedupe Everywhere

The title of this post is a bit tongue-in-cheek, since "Dedupe Everywhere" has been the latest mantra from one of our competitors, at least with regard to their deduplication story. But I happen to agree with them -- with a twist. We've watched the latest trends in data management and it's become pretty evident that data deduplication is well on its way to becoming ubiquitous. First, there were the survey results from ESG that show adoption broadening, especially for larger enterprises. This has been followed by the continued spread of deduplication beyond backup data, to long-term archive data and now even to production data.

NetApp has been the most visible proponent and platform for deduplication of production data. And they're about to get some company: Compellent. CommVault was a Gigabyte sponsor at Compellent's recent C-Drive show, where we got some insight as to where they're headed during their CEO's keynote speech on Monday, May 3. There, Philip Soran called out how the world is beginning the transition beyond a service-based economy to one that's data-driven. For instance, if you look at Facebook, it only took four years to get to 400 million users. And of course deduplication, even at the production tier, starts to provide meaningful benefits, especially when your data is growing exponentially.

Wednesday, April 07, 2010

One more thought on the dedupe ratio debate

I have enjoyed the dialogue around the topic of dedupe ratios. It's a good thing for customers to hear both sides of the debate, so that's why I've appreciated the view of the "dedupe ratios do matter" folks like Jay Livens of SEPATON, Howard Marks and Curtis Preston. Who knew there were so many?

With all due respect to each of them, I think we all actually agree that better dedupe ratios are a good thing. Of course 20:1 is going to get 5 percentage points more savings than 10:1 (95% versus 90%). But my main point with the dedupe ratios is that too many vendors start rattling off high dedupe ratios as "bait". It's a bunch of marketing hype, and yes, as a product marketer, I do see the irony.

You should try out different approaches and vendors, and see what best fits in your environment. The dedupe ratio is then going to be a by-product of your dataset change rates and retention policies. Focusing on dedupe ratios puts the cart before the horse.

Tuesday, April 06, 2010

The Great Dedupe Ratio Debate

My recent post on how to analyze dedupe ratios and their impact on cost savings has had a healthy amount of traffic. One reader was our industry colleague Curtis Preston, who wrote an interesting post in response, titled "How to REALLY analyze dedupe ratios and their impact on cost savings". I appreciate Curtis' attention on this, as it has catalyzed the debate among other bloggers, in particular Jay Livens from SEPATON and Howard Marks from Network Computing. First of all, it's always great to get different perspectives on things. If nothing else, it gives our community a range of opinions, rather than all of us blindly marching to one drumbeat. And of course the healthy debate keeps everyone sharp and on their toes.

So from my (evolving) perspective, the dedupe ratio on its own, as opposed to overall acquisition and operational costs, is an imperfect barometer of potential savings. And I still contend that once you've captured 90% to 95% of the potential savings from dedupe (10:1 and 20:1 dedupe ratios, respectively), there are likely to be other IT initiatives that might be better candidates for your time and investment.

Below, I summarize the essence of his arguments as I interpret them (you really should read them for yourself), along with my response and reaction:

1) Dedupe ratios do matter because that changes how much disk you buy.

I agree – better dedupe ratios translate into lower outlays in terms of acquisition. BUT, there are a few points I was trying to highlight here to put the focus on dedupe ratios into better perspective (not eliminate it, but put it in context):

  • First, some vendors have in the past been touting dedupe ratios of 50:1, 100:1 or even 500:1. In my mind, that's more fluff than reality, so after some point it just starts to seem a bit absurd, especially when most vendors get about the same dedupe ratios (more on that in the next bullet).

  • Second, as I noted, most dedupe vendors will get about the same reduction most of the time. There is the possibility of meaningful differences some of the time. So you should research which approach works best in your environment (for example source- versus target-based deduplication, or file-level versus block-level), and which vendor is best for that approach. But again I wouldn't get too wowed by impressive-sounding dedupe ratios. You should really focus on picking the right vendor/approach first. Your actual dedupe ratios will be an outcome of your dataset change rates and retention timeframe.

  • Third, incremental increases in savings are going to shrink. That's a fact of mathematics, not deduplication: 5:1 delivers 80% savings, 10:1 delivers an extra 10% (90% in total), and 20:1 delivers an extra 5% (95% in total). Trying to tweak your deduplication set-up to get an additional 5% savings may not be worth the time, especially if you're managing a smaller dataset. If you have petabytes of data, that 5% could be huge, but then your selection criteria are likely to be a lot broader than dedupe ratios alone when it comes to adopting deduplication and choosing the right vendor/approach.

  • Using a 100TB example, the difference between 10:1 and 20:1 dedupe ratios is 5TB, which is great since you're buying only half the storage (5TB versus 10TB). But at $2K/TB (CDW is great for spot checks on the latest prices), that's $10K. It's not just about 5TB versus 10TB, it's about an additional outlay of $10K. Is that really the best thing to focus on to incrementally lower your costs? And again, you're not comparing vendors but figuring out at what point your overworked IT team can go focus on other initiatives.
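The arithmetic in the bullets above can be checked with a short sketch. It assumes the simple model the post implies: savings are 1 minus 1/ratio, and the disk you buy is the logical data divided by the ratio. The function names are made up for this example, and the $2K/TB price is just the spot figure quoted above.

```python
def savings_fraction(ratio: float) -> float:
    """Fraction of storage avoided at a given dedupe ratio (e.g. 10 for 10:1)."""
    return 1.0 - 1.0 / ratio


def disk_needed_tb(logical_tb: float, ratio: float) -> float:
    """Physical capacity needed to hold logical_tb of data after dedupe."""
    return logical_tb / ratio


def disk_cost(logical_tb: float, ratio: float, price_per_tb: float) -> float:
    """Acquisition cost of the deduplicated capacity."""
    return disk_needed_tb(logical_tb, ratio) * price_per_tb


LOGICAL_TB = 100       # the 100TB example from the post
PRICE_PER_TB = 2_000   # $2K/TB, the post's spot price; an assumption, not a quote

for ratio in (5, 10, 20):
    print(
        f"{ratio:>2}:1  savings {savings_fraction(ratio):.0%}  "
        f"disk {disk_needed_tb(LOGICAL_TB, ratio):.0f}TB  "
        f"cost ${disk_cost(LOGICAL_TB, ratio, PRICE_PER_TB):,.0f}"
    )

# Extra outlay for settling at 10:1 instead of pushing to 20:1.
extra = disk_cost(LOGICAL_TB, 10, PRICE_PER_TB) - disk_cost(LOGICAL_TB, 20, PRICE_PER_TB)
print(f"10:1 versus 20:1 costs an extra ${extra:,.0f}")
```

Doubling the ratio from 10:1 to 20:1 halves the disk but only moves savings from 90% to 95%, which at these numbers is the $10K discussed above. Scale `LOGICAL_TB` up to petabytes and the same 5% becomes a much bigger number.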


The content of this blog reflects the thoughts and opinions of the author, and does not represent the thoughts, opinions, plans or strategies of CommVault Systems, Inc. ("CommVault") and CommVault undertakes no obligation to update, correct or modify any statements made by the author of this blog. Any and all third party links provided by this blog are not affiliated with, nor endorsed by, CommVault.

 

Legal | Copyright © 2010 CommVault, all rights reserved.