<?xml version="1.0" encoding="utf-8" ?>

<rss xmlns:dc="http://purl.org/dc/elements/1.1/" 	xmlns:content="http://purl.org/rss/1.0/modules/content/" version="2.0">

	<channel>

		<title>Dipesh Patel's Blog - Dipesh on Dedupe</title>

		<link>http://news.commvault.com/DipeshPatel</link>

		<description><![CDATA[<p>Since joining CommVault in 2008, Dipesh has been focused on highlighting the value of CommVault's global, embedded data deduplication to audiences far and wide.</p><p>Before joining CommVault, Dipesh's career spanned a number of companies, including Intel, IBM and NetApp where he worked in a variety of Channel Marketing, Product Marketing, and Product Management roles.</p><p>A constant thread throughout his career has been a relentless search for ways to harness and convert the power of new technologies into tangible customer value.</p>]]></description>

		<language>en</language>

		<copyright>&#169;1999-2009 CommVault Systems, Inc. All rights reserved. CommVault, CommVault and logo, the &quot;CV&quot; logo, CommVault Systems, Solving Forward, SIM, Singular Information Management, Simpana, CommVault Galaxy, Unified Data Management, QiNetix, Quick Recovery, QR, CommNet, GridStor, Vault Tracker, InnerVault, Quick Snap, QSnap, Recovery Director, CommServe, and CommCell, are trademarks or registered trademarks of CommVault Systems, Inc. Index Engines and Litigation Readiness are trademarks or registered trademarks of Index Engines, Inc. All rights reserved. All product names mentioned are trademarks or registered trademarks of their respective organizations. All other third party brands, products, service names, trademarks, or registered service marks are the property of and used to identify the products or services of their respective owners. All specifications are subject to change without notice.</copyright>

		<lastBuildDate>Sun, 22 Aug 2010 17:33:00 EST</lastBuildDate>

	<item>

		<title>Why Cloud Computing Will Drive Deduplication Demand</title>

		<description><![CDATA[<p>Earlier this year, I <a href="http://news.commvault.com/DipeshPatel/000046_The_Terrible_Twins_Driving_Deduplication.asp">talked about the "terrible twins"</a> driving the need (and subsequent ROI) for <a href="http://www.commvault.com/solutions-deduplication.html" target="_blank">deduplication</a>: increased use of virtual machine environments and, perhaps surprisingly, the increased need for data retention driven by corporate and government-mandated requirements.</p><p>Well, I'd like to suggest that we may be starting to see a third "twin": cloud computing.</p><p>To set the stage, let's start off with a very brief overview of the evolution of storage. First, we had direct-attached storage - big machines, with "big" dedicated storage attached. Often this was tape. Then we saw a movement to a shared, tape-based approach. That has spurred the evolution in tape capabilities and formats, notably from LTO1 to LTO2 to LTO3, then LTO4, with LTO5 coming in the near future. Each step in that evolution roughly doubled tape capacity. But a tape-only approach still has inherent limitations: the serial nature of tape access, and the logistical delays when tapes are stored off-site in a secure location.</p>]]></description>

		<link>http://news.commvault.com/DipeshPatel/000054_Why_Cloud_Computing_Will_Drive_Deduplication_Demand.asp</link>

		<guid>http://news.commvault.com/DipeshPatel/000054_Why_Cloud_Computing_Will_Drive_Deduplication_Demand.asp</guid>

		<dc:creator>CommVault&#174;</dc:creator>

		<pubDate>Mon, 16 Aug 2010 08:30:00 -0400</pubDate>

	</item>

	<item>

		<title>Deduplication Across the Information Lifecycle: Production Data</title>

		<description><![CDATA[<p>We're back, sport racers! In my last post explaining why CommVault software doesn't deduplicate third-party datasets, I mentioned that it also doesn't deduplicate production data, only the data that is archived or backed up using CommVault's Simpana <a href="http://www.commvault.com/products-resource-management.html" target="_blank">data storage management software</a>.</p><p>Am I bummed out by that fact? Not really, and here's why.</p><p>First, when you look at the sources of data <a href="http://www.commvault.com/solutions-deduplication.html" target="_blank">deduplication</a>, backup datasets top the list for the amount of duplicate data, because you're often backing up data that is 95% to 99% the same as the last time you backed up the dataset. So if you have 30 days' worth of backups, then even with a daily change rate of 1% to 5%, you're going to get an awful lot of overlap.</p>]]></description>

		<link>http://news.commvault.com/DipeshPatel/000052_Deduplication_Across_the_Information_Lifecycle_Production_Data.asp</link>

		<guid>http://news.commvault.com/DipeshPatel/000052_Deduplication_Across_the_Information_Lifecycle_Production_Data.asp</guid>

		<dc:creator>CommVault&#174;</dc:creator>

		<pubDate>Tue, 10 Aug 2010 08:30:00 -0400</pubDate>

	</item>

	<item>

		<title>Deduplication Across the Information Lifecycle: Third-Party Datasets</title>

		<description><![CDATA[<p>We still get asked if CommVault software deduplicates production data, or if our software deduplicates third-party datasets. The answer to both is clear and simple: NO and NO. I'd like to spend the next two blog posts elaborating on this. Let's start with the latter &ndash; deduplicating third-party datasets.</p><p>If all you're looking to do is <a href="http://www.commvault.com/solutions-deduplication.html" target="_blank">deduplication</a> alone, there are a number of vendors out there that focus on just that piece of the puzzle. They'll happily ingest whatever data you throw at them. In most instances, these can be found in the form of a dedicated deduplication appliance. There might be some indigestion along the way, but generally they work OK, at least until they get to the 90-95% capacity mark. At that point many customers see a rapid degradation in performance, often because the scheduled general processing no longer has enough resources to run. That processing includes calculating checksums on the data to ensure that it hasn't been corrupted, and moving data between nodes for load-balancing (which is itself meant to improve performance).</p><p><strong>What are the downsides to that approach?</strong></p><p>The first is cost, especially compared to the prices of increasingly cheap commodity disk. When you actually look under the covers, there are only a handful of disk manufacturers worldwide. So you may be paying a premium for essentially the same physical drives and memory. And just to remind myself how cheap is cheap, I often pop over to places like eBay to get a real-world gauge on prices. For general pricing for storage systems/software, you can also check out <a href="http://storagemojo.com/storagemojos-pricing-guide/" target="_blank" onClick="LinkAlert()">a very useful blog/resource here</a>, courtesy of Storage Mojo.</p>]]></description>

		<link>http://news.commvault.com/DipeshPatel/000050_Deduplication_Across_the_Information_Lifecycle_Third-Party_Datasets.asp</link>

		<guid>http://news.commvault.com/DipeshPatel/000050_Deduplication_Across_the_Information_Lifecycle_Third-Party_Datasets.asp</guid>

		<dc:creator>CommVault&#174;</dc:creator>

		<pubDate>Mon, 2 Aug 2010 08:30:00 -0400</pubDate>

	</item>

	<item>

		<title>The 'Terrible Twins' Driving Deduplication</title>

		<description><![CDATA[<p>It's no surprise that deduplication continues its relentless march into data centers (and remote offices) everywhere. Given the need for ongoing cost containment, it's often the first level of "defense" to free up much-needed funds. Funds that can be re-invested back into ongoing maintenance or new IT projects in support of the core business.</p><p>But I want to take a quick look at two other underlying drivers that I believe will continue to underpin the growth in deduplication beyond a single budget cycle or fiscal year.</p><p><strong>Backing up VMs requires deduplication</strong></p><p>The first driver is, of course, virtualization. We have seen a huge amount of interest among our customers in virtualization (mostly VMware, but increasingly Hyper-V as well). The way I think of VM environments is sort of like the underlying premise of "The Matrix". Basically, we're digitizing what were previously standalone physical infrastructures. In addition, with VM environments, there is often a much more rigorous <u>and</u> enforceable need for standardization. For instance, if you're going to rapidly create 500 VMs, then having a set of five standard templates to choose from makes it a lot easier to get up and running, and much MUCH easier to manage!</p>]]></description>

		<link>http://news.commvault.com/DipeshPatel/000046_The_Terrible_Twins_Driving_Deduplication.asp</link>

		<guid>http://news.commvault.com/DipeshPatel/000046_The_Terrible_Twins_Driving_Deduplication.asp</guid>

		<dc:creator>CommVault&#174;</dc:creator>

		<pubDate>Sun, 13 Jun 2010 08:30:00 -0400</pubDate>

	</item>

	<item>

		<title>Deduplication Boost or Bust?</title>

		<description><![CDATA[<p>Another week, another <a href="http://www.emc.com/about/news/press/2010/20100511-01.htm" target="_blank" onClick="LinkAlert()">development</a> in the deduplication space. Just when things looked like they were getting ho-hum with dedupe moving into the mainstream, EMC announced its Data Domain "Boost". Some folks have been asking us what this means for CommVault.  Actually, it's way more interesting to think about what this means for the future of Data Domain.</p><p>So, the quick version is that Data Domain Boost is built upon Symantec's proprietary OST interface to move the hashing and comparison work largely to the Media Server. This brings them closer to the way CommVault does <a href="http://www.commvault.com/solutions-deduplication.html" target="_blank">deduplication</a> (along with the Media Server-based configuration of Symantec's PureDisk technology). I guess imitation is really the sincerest form of flattery after all!</p><p>But what I'm really wondering about is how EMC justifies asking for a huge hardware premium for their boxes if most (or all?) of the deduplication is actually being done outside of the Data Domain box? And why would customers be happy being locked into a proprietary interface that only works today if they use Symantec and Data Domain?</p>]]></description>

		<link>http://news.commvault.com/DipeshPatel/000045_Deduplication_Boost_or_Bust.asp</link>

		<guid>http://news.commvault.com/DipeshPatel/000045_Deduplication_Boost_or_Bust.asp</guid>

		<dc:creator>CommVault&#174;</dc:creator>

		<pubDate>Sat, 15 May 2010 08:30:00 -0400</pubDate>

	</item>

	<item>

		<title>Dedupe Everywhere</title>

		<description><![CDATA[<p>The title of this post is a bit tongue-in-cheek, since "Dedupe Everywhere" has been the latest mantra from one of our competitors, at least with regard to their deduplication story. But I happen to agree with them -- with a twist. We've watched the latest trends in data management and it's become pretty evident that data deduplication is well on its way to becoming ubiquitous. First, there were the <a href="http://www.enterprisestrategygroup.com/2010/04/2010-data-protection-trends/" target="_blank" onClick="LinkAlert()">survey results from ESG</a> that show adoption broadening, especially for larger enterprises. This has been followed by the continued spread of deduplication beyond backup data, to long-term archive data and now even to production data.</p><p>NetApp has been the most visible proponent and platform for deduplication of production data. And they're about to get some company: Compellent. CommVault was a Gigabyte sponsor at Compellent's recent <a href="http://cdrive.compellent.com/" target="_blank" onClick="LinkAlert()">C-Drive show</a>, where we got some insight into where they're headed during their CEO's keynote speech on Monday, May 3. There, Philip Soran called out how the world is beginning the transition beyond a service-based economy to one that's data-driven. For instance, if you look at Facebook, it only took four years to get to 400 million users. And of course deduplication, even at the production tier, starts to provide meaningful benefits, especially when your data is growing exponentially.</p>]]></description>

		<link>http://news.commvault.com/DipeshPatel/000044_Dedupe_Everywhere.asp</link>

		<guid>http://news.commvault.com/DipeshPatel/000044_Dedupe_Everywhere.asp</guid>

		<dc:creator>CommVault&#174;</dc:creator>

		<pubDate>Tue, 11 May 2010 08:30:00 -0400</pubDate>

	</item>

	<item>

		<title>One more thought on the dedupe ratio debate</title>

		<description><![CDATA[<p>I have enjoyed the dialogue around the topic of dedupe ratios. It's a good thing for customers to hear both sides of the debate, so that's why I've appreciated the view of the "dedupe ratios do matter" folks like <a href="http://www.aboutrestore.com/2010/03/31/deduplication-ratios-and-their-impact-on-dr-cost-savings/" target="_blank" onClick="LinkAlert()">Jay Livens of SEPATON</a>, Howard Marks and <a href="http://www.backupcentral.com/content/view/305/47/" target="_blank" onClick="LinkAlert()">Curtis Preston</a>. Who knew there were so many?</p><p>With all due respect to each of them, I think we all actually agree that better dedupe ratios are a good thing. Of course, 20:1 is going to deliver 5 percentage points more savings than 10:1 (95% versus 90%). But my main point with the dedupe ratios is that too many vendors start rattling off high dedupe ratios as "bait". It's a bunch of marketing hype, and yes, as a product marketer, I do see the irony.</p><p>You should try out different approaches and vendors, and see what best fits your environment. The dedupe ratio is then going to be a <u>by-product</u> of your dataset change rates and retention policies. Focusing on dedupe ratios puts the cart before the horse.</p>]]></description>

		<link>http://news.commvault.com/DipeshPatel/000042_One_more_thought_on_the_dedupe_ratio_debate.asp</link>

		<guid>http://news.commvault.com/DipeshPatel/000042_One_more_thought_on_the_dedupe_ratio_debate.asp</guid>

		<dc:creator>CommVault&#174;</dc:creator>

		<pubDate>Wed, 7 Apr 2010 08:30:00 -0400</pubDate>

	</item>

	<item>

		<title>The Great Dedupe Ratio Debate</title>

		<description><![CDATA[<p>My recent post on how to analyze dedupe ratios and its impact on cost savings has had a healthy amount of traffic. One reader was our industry colleague Curtis Preston, who wrote an <a href="http://www.backupcentral.com/index.php?option=com_content&task=view&id=305&Itemid=47" target="_blank" onClick="LinkAlert()">interesting post in response</a>, titled "How to REALLY analyze dedupe ratios and their impact on cost savings". I appreciate Curtis' attention on this, as it has catalyzed the debate among other bloggers, in particular <a href="http://www.aboutrestore.com/2010/03/31/deduplication-ratios-and-their-impact-on-dr-cost-savings/" target="_blank" onClick="LinkAlert()">Jay Livens from SEPATON</a> and <a href="http://www.networkcomputing.com/deduplication/dedupe-ratios-do-matter.php" target="_blank" onClick="LinkAlert()">Howard Marks from Network Computing</a>. First of all, it's always great to get different perspectives on things; if nothing else, it gives our community a range of opinions instead of all of us blindly marching to one drumbeat. And of course the healthy debate keeps everyone sharp and on their toes.</p><p>So from my (evolving) perspective, dedupe ratios on their own, as opposed to overall acquisition and operational costs, are an imperfect barometer of potential savings. But I still contend that once you've captured 90% to 95% of the potential savings from dedupe (10:1 and 20:1 dedupe ratios), there are likely to be other IT initiatives that then might be better candidates for your time and investment.</p><p>Below, I'm summarizing the essence of Curtis' arguments (as I interpret them; you really should read his post for yourself) along with my response and reaction:</p><p>1) Dedupe ratios do matter because that changes how much disk you buy.</p><p>I agree &ndash; better dedupe ratios translate into lower outlays in terms of acquisition. 
BUT, there are a few points I was trying to highlight here to put the focus on dedupe ratios into better perspective (not eliminate it, but put it in context):<ul>	<li><p>First, some vendors have in the past been touting dedupe ratios of 50:1, 100:1 or even 500:1. In my mind, that's more fluff than reality, so after some point it just starts to seem a bit absurd, especially when most vendors get about the same dedupe ratios (more on that in the next bullet).</p></li>	<li><p>Second, as I noted, most dedupe vendors will get about the same reduction most of the time. There is the possibility of meaningful differences some of the time. So you should research which approach works best in your environment (for example source- versus target-based deduplication, or file-level versus block-level), and which vendor is best for that approach. But again, I wouldn't get too wowed by impressive-sounding dedupe ratios. You should really focus on picking the right vendor/approach first. Your actual dedupe ratios will be an outcome of your dataset change rates and retention timeframe. </p></li>	<li><p>Third, incremental increases in savings are going to shrink. That's a fact of mathematics, not deduplication: 5:1 delivers 80% savings, 10:1 delivers an extra 10%, 20:1 delivers an extra 5%. Trying to tweak your deduplication set-up to get an additional 5% savings may not be worth the time, especially if you're managing a smaller dataset. If you have PBs, that could be huge, but then your selection criteria are likely to be a lot broader than dedupe ratios alone when it comes to adopting deduplication and choosing the right vendor/approach.</p></li>	<li><p>Using a 100TB example, the difference between 10:1 and 20:1 dedupe ratios is 5TBs, which is great since you're buying only half the storage (5TBs versus 10TBs). But at $2K/TB (CDW is great for <u>spot checks on the latest prices</u>), that's $10K. It's not <u>just</u> about 5TBs versus 10TBs, it's about an additional outlay of $10K. 
Is that really the best thing to focus on to incrementally lower your costs? And again you're not comparing vendors but figuring out at what point your overworked IT team can go focus on other initiatives.</p></li></ul></p>]]></description>

		<link>http://news.commvault.com/DipeshPatel/000041_The_Great_Dedupe_Ratio_Debate.asp</link>

		<guid>http://news.commvault.com/DipeshPatel/000041_The_Great_Dedupe_Ratio_Debate.asp</guid>

		<dc:creator>CommVault&#174;</dc:creator>

		<pubDate>Tue, 6 Apr 2010 08:30:00 -0400</pubDate>

	</item>

	<item>

		<title>Sustainable data reduction</title>

		<description><![CDATA[<p>Some of you may recall that a while back, I talked about how a <a href="http://news.commvault.com/DipeshPatel/000030_Maximum_Data_Reduction_PART_THREE_Beyond_Deduplication.asp">more holistic approach to data reduction</a> that includes storage reporting and management, archiving and deduplication delivers better results than <a href="http://www.commvault.com/solutions-deduplication.html" target="_blank">deduplication</a> alone. Recently, IDC released a (CommVault-sponsored) report on that very subject. There we had a chance to work with IDC to go into greater detail about challenges, benefits and approaches than the space typically available in a blog post allows.</p><p>For those of you interested in learning more, you can <a href="http://info.commvault.com/forms/IDC-AnIntegratedApproachtoDataReductionPart1?cmpgn=70140000000G6Jv" target="_blank">access the report here</a>.</p><p>The key message for me is still simplicity. Storage Reporting Management (SRM), Archiving, and Deduplication have each been available in the market for almost a decade, more or less. So why haven't more organizations combined all three to maximize data reduction? Easy: complexity of implementation and management. That complexity negated the benefits of buying less storage. You replaced hardware investment with headcount investment.</p>]]></description>

		<link>http://news.commvault.com/DipeshPatel/000040_Sustainable_data_reduction.asp</link>

		<guid>http://news.commvault.com/DipeshPatel/000040_Sustainable_data_reduction.asp</guid>

		<dc:creator>CommVault&#174;</dc:creator>

		<pubDate>Mon, 22 Mar 2010 08:30:00 -0400</pubDate>

	</item>

	<item>

		<title>How to analyze dedupe ratios and its impact on cost savings</title>

		<description><![CDATA[<p>Well folks, it's time to jump back into the thick of things after a little while away from the blog. One thing I'd like to revisit is a "back-to-basics" topic: dedupe ratios. They are still touted as a differentiator by some vendors, but sometimes that can be a bit misleading. Today's post digs a little deeper to help you understand some of the major drivers behind dedupe ratios and how to use them as a tool for optimizing your savings.</p><p>First, our own anecdotal experiences (and internal testing) indicate that when it comes time to see what's stored on disk, most vendors (ourselves included) end up with about the same capacity usage.</p><p><u>Your</u> dedupe ratios are going to largely depend on <u>your</u> dataset characteristics: specifically the change rate between cycles and the retention period. The higher the change rate (e.g., 10% versus 2%), the lower the dedupe ratio, as the former rate implies that only 90% of the data was the same as in the previous cycle, versus 98% in the latter case. Another factor that affects dedupe ratios is the retention period of your backup or archive jobs: the longer the retention period, the higher the dedupe ratio. That means that you're much more likely to find greater redundancy across 30 backup or archive cycles versus "only" 7 cycles. One easy way to think about this is to walk through a daily backup scenario, while holding the change rate between daily backups at 0%. In that scenario, retention of 7 daily backup jobs will result in a dedupe ratio of 7:1. Extending the retention period to 30 days increases the dedupe ratio to 30:1.</p><p>The graph below illustrates what your dedupe ratios would look like based on 5 different change rates (1%, 2%, 5%, 10%, and 20%) across a 90-day window, starting with your baseline of cycle 1. 
So you can see that if you have a fairly high change rate of 20%, your dedupe ratios are going to plateau fairly quickly at about 5:1. Archive data is an example of data that changes a fair bit between cycles, particularly if you're only archiving every 30 days. However, if you have a low change rate, deduplication can continue to provide fairly good ongoing results. In general, we see a range of change rates that are often related to the specific data types and workloads, rather than being company-specific. With deduplication, however, we are seeing much longer retention periods for backups, usually extending to about a month's worth of backups kept on disk for fast recovery.</p>]]></description>

		<link>http://news.commvault.com/DipeshPatel/000039_How_to_analyze_dedupe_ratios_and_its_impact_on_cost_savings.asp</link>

		<guid>http://news.commvault.com/DipeshPatel/000039_How_to_analyze_dedupe_ratios_and_its_impact_on_cost_savings.asp</guid>

		<dc:creator>CommVault&#174;</dc:creator>

		<pubDate>Mon, 8 Mar 2010 08:30:00 -0500</pubDate>

	</item>

	<item>

		<title>I Don't Do Predictions</title>

		<description><![CDATA[<p>Well folks, I have to say 2009 went by pretty darn quick! It was definitely an exciting start for me as I jumped in with both feet on data deduplication. And it's always nice to have some strong technology to back you up (thank you developers!). But as we began 2010, someone suggested that I do a 2010 Predictions List.</p><p>I'll take a pass on that for this year at least. First of all, I think there are plenty of folks out there who'll cast their gaze into the crystal ball faster than you can say F-Y-I. So instead, I looked around and thought it would be more interesting to review the "deep thoughts" from some of the respected pundits in the industry. They are (in no particular order):<ol>	<li><p><a href="http://www.thebiggertruth.com/2010/01/where-is-the-it-spending-going-to-happen-in-2010/" onClick="LinkAlert()" target="_blank">Steve Duplessie of Enterprise Strategy Group</a>, who reviewed his thoughts on "Where Is the IT Spending Going to Happen in 2010?";</p></li>	<li><p><a href="http://searchdatabackup.techtarget.com/news/column/0,294698,sid187_gci1374545,00.html" onClick="LinkAlert()" target="_blank">Curtis Preston</a>, a frequent contributor to many industry articles and seen at many a storage event; and</p></li>	<li><p>Chuck Hollis from a little company outside of Beantown, <a href="http://chucksblog.emc.com/chucks_blog/2009/11/peering-into-the-storage-crystal-ball.html" onClick="LinkAlert()" target="_blank">gazing into his very own "Storage Crystal Ball"</a>.</p></li></ol></p><p>Steve provided a nice overview of what he sees as good, bad and just plain interesting. Curtis gave us a nice moment to stop and reflect on the events and changes of 2009. Chuck gave us a somewhat extensive peek at where he sees the world going and EMC's place within that context.</p>]]></description>

		<link>http://news.commvault.com/DipeshPatel/000034_I_Dont_Do_Predictions.asp</link>

		<guid>http://news.commvault.com/DipeshPatel/000034_I_Dont_Do_Predictions.asp</guid>

		<dc:creator>CommVault&#174;</dc:creator>

		<pubDate>Tue, 12 Jan 2010 08:30:00 -0500</pubDate>

	</item>

	<item>

		<title>Bigger is Not Always Better</title>

		<description><![CDATA[<p>Whenever EMC/Data Domain rolls out new models of their storage appliances, folks tend to ask me what that means for CommVault.</p><p>Well, regardless of their offering, the answer so far has been the same. New EMC/Data Domain models don't change the fundamental differences between traditional deduplication and CommVault's approach to deduplication.</p><p><u>All</u> of the dedupe appliance vendors continue to update their models to respond to rapid data growth. That's pretty much expected for two reasons. First, a data growth rate of even 25% per year means that the total amount of data that needs to be backed up doubles in just about three years. Second, when you factor in even tighter SLAs that demand more and more backup (versus retention) data be kept on disk, it's easy to see why every appliance vendor would need to scale to keep up with bigger workloads.</p><p><i>(For example: If you have a 10TB dataset and back that up 5 days every week, that's 50TBs, which is stored on 5TBs of usable capacity given a 10:1 dedupe ratio. Now if that production dataset doubles over time to 20TBs, and you decide to keep <u>20</u> backups on disk instead of the original 5 backups, that's 400TBs of backup data. Deduplicated down 10:1, that would now require 40TBs. That 8X increase is driven by both the growth of the original dataset and the extension of your restore-from-disk window from 5 days to 20 days.)</i></p>]]></description>

		<link>http://news.commvault.com/DipeshPatel/000031_Bigger_is_Not_Always_Better.asp</link>

		<guid>http://news.commvault.com/DipeshPatel/000031_Bigger_is_Not_Always_Better.asp</guid>

		<dc:creator>CommVault&#174;</dc:creator>

		<pubDate>Wed, 9 Dec 2009 08:30:00 -0500</pubDate>

	</item>

	<item>

		<title>Maximum Data Reduction (PART THREE): Beyond Deduplication</title>

		<description><![CDATA[<p>In the previous two posts we talked about how maximum data reduction is more than "just" dedupe. </p><p>Keep in mind I said "more than." Deduplication definitely has a role in the data reduction story. So once you actually know where your data is and how much of it is actually being accessed/modified, and have archived what's "stale", then it's time to consider how deduplication can lower the costs even further. Why dedupe now, versus at the beginning? With the stale data moved off to Tier2, you not only have the potential to double the capacity on your Tier1 storage (assuming 50% of data is found to be stale), but also reduce the amount of dedupe processing you need to do with the data that remains. And that's true no matter who you ultimately choose for deduplication.</p><p>So, what would happen with deduplication in the picture? For the sake of argument, let's use a conservative dedupe ratio of 5:1. Now keep in mind that deduplication ratios are much more a function of data change rates and retention periods than of vendor. So for ease of comparison, we're assuming that you get 5:1 no matter which dedupe vendor you choose to use.</p><p>Our baseline case started with backing up the same 10TBs of data 30 times = 300TBs of cumulative raw backup data. With a 5:1 ratio, that reduces down to 60TBs. Sounds great, and it is if you're only paying $2K/TB for the backup disk capacity. Given that number, the costs come down even further than "archive alone", down to $220K. That is $100K for the 10TBs on primary storage, and $120K for the 60TBs on Tier2 storage.</p>]]></description>

		<link>http://news.commvault.com/DipeshPatel/000030_Maximum_Data_Reduction_PART_THREE_Beyond_Deduplication.asp</link>

		<guid>http://news.commvault.com/DipeshPatel/000030_Maximum_Data_Reduction_PART_THREE_Beyond_Deduplication.asp</guid>

		<dc:creator>CommVault&#174;</dc:creator>

		<pubDate>Wed, 2 Dec 2009 08:30:00 -0500</pubDate>

	</item>

	<item>

		<title>Maximum Data Reduction (PART TWO): Beyond Deduplication</title>

		<description><![CDATA[<p>Following on from the last post, maximum data reduction requires a three-part modular strategy: identify/categorize your data based on demonstrated usage; archive "stale" data; and deduplicate across both archive and backup data. This approach actually delivers better benefits than either archive or deduplication on its own.</p><p>What does that mean for you? Well, we'll start with a greatly simplified example and walk through the various permutations.</p><p>First, let's establish the baseline for comparison (always need a baseline for comparison). Say you've got 10TBs of data and need full backups for 30 days to facilitate rapid recovery. In a normal scenario that would mean that you would have 10TBs on primary (Tier1) storage, and 300TBs on Tier2 (30 backups * 10TBs per backup). At a cost of $10K/usable TB for production, and $2K/usable TB for Tier2, you'd be spending about $700K for this kind of protection/retention. That's based on $100K for Tier1 (10TBs * $10K/TB), and $600K for Tier2 (300TBs * $2K/TB).</p><p align="center"><img align="center" src="http://news.commvault.com/DipeshPatel/images/beyond-deduplication-figure-1.jpg" hspace="0" vspace="0" border="0" /></p>]]></description>

		<link>http://news.commvault.com/DipeshPatel/000029_Maximum_Data_Reduction_PART_TWO_Beyond_Deduplication.asp</link>

		<guid>http://news.commvault.com/DipeshPatel/000029_Maximum_Data_Reduction_PART_TWO_Beyond_Deduplication.asp</guid>

		<dc:creator>CommVault&#174;</dc:creator>

		<pubDate>Fri, 20 Nov 2009 08:30:00 -0500</pubDate>

	</item>

	<item>

		<title>Maximum Data Reduction (PART ONE): Beyond Deduplication</title>

		<description><![CDATA[<p>Deduplication is not Data Reduction. Or rather, with all of the buzz around dedupe, it's important to note that deduplication is not the only means of reducing the amount of data being backed up. Data Reduction is a broader category that includes deduplication among other technologies and approaches. This is something that I've been expounding upon during recent Innovate 8 events.</p><p>In the meantime, it seemed to me that in pushing the next "big" thing, many folks out there are happy to pin their hopes and dreams solely on deduplication. But they would be doing themselves a disservice. For one thing, deduplication is a great approach, but it doesn't solve the underlying problems in processes, people or platforms on its own. If you don't fix the root causes, no amount of deduplication will prevent you from facing the <u>same</u> issues, just in one, two, or (if you're lucky) three years.</p><p>For another, in many cases it doesn't solve the issues on the front end, on primary storage. Most folks are implementing deduplication for backup and archive, where they get the most data reduction across multiple backup/archive cycles. However, if the breakdown in your backup process actually originates up front with the size of your primary datasets and/or length of your backup windows, then deduplication may not be the total answer. So even though the backed-up data is going to occupy less space, each uncompressed full backup job (generally required if you're using a device-based target dedupe approach) will still have to churn through larger and larger amounts of production data.</p>]]></description>

		<link>http://news.commvault.com/DipeshPatel/000028_Maximum_Data_Reduction_PART_ONE_Beyond_Deduplication.asp</link>

		<guid>http://news.commvault.com/DipeshPatel/000028_Maximum_Data_Reduction_PART_ONE_Beyond_Deduplication.asp</guid>

		<dc:creator>CommVault&#174;</dc:creator>

		<pubDate>Mon, 16 Nov 2009 08:30:00 -0500</pubDate>

	</item>

	<item>

		<title>The Trouble with Tribbles</title>

		<description><![CDATA[<p>Some of you may not be as familiar with Star Trek as others, but growing up in Canada, I was hooked on it. While I didn't get into all the original show's reincarnations, I still remember the odd episode here and there from back in the day.</p><p><a href="http://en.wikipedia.org/wiki/The_Trouble_With_Tribbles" onClick="LinkAlert()" target="_blank">One episode</a> in particular - "The Trouble with Tribbles" - still resonates with me, especially when I think about virtual environments. I'm sure I'm not the only one who's made the connection. That particular episode is centered on cute cuddly little creatures that start to multiply out of control. This is somewhat akin to virtual machines (VMs). No, VMs are not cute and cuddly, but they do seem to multiply like mad simply because it is so easy to create and deploy a new virtual machine compared to setting up a new physical server.</p><p><em>To paraphrase the <a href="http://en.wikipedia.org/wiki/The_Trouble_With_Tribbles" onClick="LinkAlert()" target="_blank">Wikipedia entry</a> on the Star Trek episode: "The "trouble" with VMs is that they reproduce far too quickly and are capable of eating your spare storage capacity barren if their breeding is not controlled; in the words of Dr. McCoy, 'they are born pregnant' and threaten to consume all the available storage."</em></p><div align="center"><a href="http://www.flickr.com/photos/commvault/3926703166/" target="_blank"><img align="center" src="http://farm3.static.flickr.com/2615/3926703166_759052f526.jpg" border="0" width="450" height="300"></a></div><p>Deduplication is, in my mind, one of the key foundation technologies required to help get things (virtually) under control. This was evident during the week of VMworld 2009, recently held in San Francisco. We saw a tremendous amount of interest on the show floor, as the image at the top of this post shows, as customers (and prospects) kept us on our toes all day long at our booth. 
It was great talking to them about Simpana 8, and the support for virtualization and block-level deduplication that we introduced earlier this year.</p>]]></description>

		<link>http://news.commvault.com/DipeshPatel/000024_The_Trouble_with_Tribbles.asp</link>

		<guid>http://news.commvault.com/DipeshPatel/000024_The_Trouble_with_Tribbles.asp</guid>

		<dc:creator>CommVault&#174;</dc:creator>

		<pubDate>Wed, 16 Sep 2009 08:30:00 -0400</pubDate>

	</item>

	<item>

		<title>Comparing CommVault to EMC</title>

		<description><![CDATA[<p>When talking about deduplication with prospects, customers and others, I often get asked what the acquisition of Data Domain by EMC means for CommVault. From a technology perspective, it's crystal clear. The ownership of the technology might have changed, but the underlying technology fundamentals still remain the same. So with that, let's do a bit of a deeper dive into how CommVault deduplicates data, and then compare it to both Avamar and Data Domain's traditional approaches.</p><p>Our approach to deduplication doesn't restrict the processing to the data source, or the storage target. It's a distributed approach that delivers more flexibility. So at the risk of perhaps re-hashing what you may already know, let's do a step-by-step walk-through of what happens with CommVault deduplication and then we'll contrast that with EMC's multiple approaches.</p><p><strong>Deduplication: Not Exactly a Big Stretch (if you're CommVault that is)</strong></p><p>Regardless of whether you're deduplicating data or not, in a CommVault environment, the client (the server that hosts the application, database, file-system, etc.) picks the data that needs to be protected based on your backup policies, then optionally compresses the data, optionally encrypts the data and sends it on to the Media Agent (analogous to a Media Server in a Symantec NetBackup environment). The Media Agent then sends the data to the target storage system(s) and updates our index so that we know where the data processed as part of the backup job is held.</p><p>Now, when you throw deduplication into the picture, the client does one additional step: after picking/compressing the data, the client generates a hash signature that is sent along with the compressed/encrypted data to the Media Agent. 
The Media Agent then compares the data segment's hash signature to what we've seen before and either updates the index with a link to the existing data already on disk, or sends the unique data down to disk to be stored and updates our index and our dedupe database with the latest info. That's "it". The way we deduplicate the data is so deeply embedded into how we handle data in general that it wasn't much of a stretch to then add deduplication into the data management mix.</p>]]></description>

		<link>http://news.commvault.com/DipeshPatel/000022_Comparing_CommVault_to_EMC.asp</link>

		<guid>http://news.commvault.com/DipeshPatel/000022_Comparing_CommVault_to_EMC.asp</guid>

		<dc:creator>CommVault&#174;</dc:creator>

		<pubDate>Wed, 26 Aug 2009 08:30:00 -0400</pubDate>

	</item>

	<item>

		<title>Re-examining cost savings with TRADITIONAL dedupe</title>

		<description><![CDATA[<p>I've received quite a response <a href="http://news.commvault.com/DipeshPatel/000017_How_much_are_you_REALLY_saving_with_TRADITIONAL_deduplication_appliances.asp">to my last post</a>, through which I shared the results of a cost analysis exercise to explore how much companies are REALLY saving with TRADITIONAL dedupe vendors!  Specifically, the basis for the comparison was to put myself in the shoes of an IT shop evaluating deduplication as a whole versus the "next best alternative" which is simply to use off-the-shelf commodity disk-based storage.</p><p>Most of the commenters wanted to know how I could price generic storage so cheaply in the comparison, in which I used $1,500 per TB as a figure.  Those are fair comments, so let's revisit this comparison using the prices below, which I pulled from the CDW website. Putting myself in your shoes, I wanted to stick to looking at "above board" prices versus getting into company/deal-specific pricing and all the discounting that could entail. So, the table below is just a selection of storage options <a href="http://www.cdw.com/shop/search/hub.aspx?wclss=T2" target="_blank" onClick="LinkAlert()">available through CDW</a> as of late...</p>]]></description>

		<link>http://news.commvault.com/DipeshPatel/000020_Re-examining_cost_savings_with_TRADITIONAL_dedupe.asp</link>

		<guid>http://news.commvault.com/DipeshPatel/000020_Re-examining_cost_savings_with_TRADITIONAL_dedupe.asp</guid>

		<dc:creator>CommVault&#174;</dc:creator>

		<pubDate>Tue, 4 Aug 2009 08:30:00 -0400</pubDate>

	</item>

	<item>

		<title>How much are you REALLY saving with TRADITIONAL deduplication appliances?</title>

		<description><![CDATA[<p>People often get obsessed with dedupe ratios and dedupe technologies, and have a hard time seeing the "big picture" around how much they're really going to save. Let's compare the savings from the traditional target-based dedupe vendors (like Data Domain, among <u>many</u> others) against standard commodity disk-based storage. I purposely left CommVault out of this and put myself in the shoes of an IT shop evaluating deduplication as a whole versus the "next best alternative" which is simply to use generic off-the-shelf storage. It also helps minimize any accusations that I'm unfairly manipulating the CommVault pricing just to make us look good.</p><p>To start, many traditional target device-based deduplication vendors often tout data reductions in the range of 10:1 to 20:1. So, let's be conservative and use 10:1 as a dedupe ratio to see what that really amounts to in dollars saved.</p><p>So, if you are backing up 100TBs of raw data, then with a 10:1 reduction, you're storing 10TBs on the target deduplication device. This sounds great because you're saving 90TBs of disk capacity. On the other hand, if you choose <u>not</u> to de-duplicate the data, then you'll end up with something like 50TB on regular commodity disk (on average, assuming you still compress the data).</p><p>That sounds like a lot of savings: 10TBs on the dedupe appliance versus 50TBs on commodity disk. But wait, decent commodity storage costs about $1,500 per TB (or even less), and based on our own research and experience, we estimate that one of the most successful dedupe appliance vendors now charges a street price of between $7,000 and $9,500 per usable TB (averaging the two yields $8,250/usable TB). This is true even though the appliance uses essentially the same disk underneath, which is manufactured by just a handful of the remaining disk OEMs.</p>]]></description>

		<link>http://news.commvault.com/DipeshPatel/000017_How_much_are_you_REALLY_saving_with_TRADITIONAL_deduplication_appliances.asp</link>

		<guid>http://news.commvault.com/DipeshPatel/000017_How_much_are_you_REALLY_saving_with_TRADITIONAL_deduplication_appliances.asp</guid>

		<dc:creator>CommVault&#174;</dc:creator>

		<pubDate>Mon, 29 Jun 2009 08:30:00 -0400</pubDate>

	</item>

	<item>

		<title>What EMC should be saying</title>

		<description><![CDATA[<p>EMC is certainly a very respected player in the data management and storage management industry segments. They've got a lot of financial clout, sales presence and an array of many different hardware and software technologies assembled over the course of many years. So it's always interesting to me whenever they do any sort of marketing around major industry trends. Deduplication is no exception. Recently they <a href="http://www.emc.com/about/news/press/2009/20090519-01.htm"  target="_blank" onClick="LinkAlert()">issued a press release</a> that talked about increased integration between NetWorker and Avamar.</p>]]></description>

		<link>http://news.commvault.com/DipeshPatel/000015_What_EMC_should_be_saying.asp</link>

		<guid>http://news.commvault.com/DipeshPatel/000015_What_EMC_should_be_saying.asp</guid>

		<dc:creator>CommVault&#174;</dc:creator>

		<pubDate>Thu, 21 May 2009 08:30:00 -0400</pubDate>

	</item>

	<item>

		<title>Responding to Comments</title>

		<description><![CDATA[<p>We got some spirited e-mails in response to my <a href="http://news.commvault.com/DipeshPatel/000009_Whats_Old_Is_New_Again.asp">inaugural blog post</a> so I want to take a moment and re-address a couple of points I made there.</p><p>First, it became obvious that some folks are getting mixed up on some of the nuance. Stating that "tape is not going away" does not mean that tape is going to be used in the same way as it has been in the past. Disk costs are clearly headed in a direction where tape is rapidly becoming much less attractive for general backup and recovery. What I am saying is that there are cases (medical information for example) where the data needs to be kept for <u>decades</u>. For instance, OSHA regulations state that certain records need to be kept for 30 years, and a majority of hospitals surveyed in a <a href="http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2430773" target="_blank" onClick="LinkAlert()">2008 study by the National Institutes of Health</a> kept permanent records of their patients. So when you're looking at literally decades of data retention, tape will continue to be a viable option for those customers.</p>]]></description>

		<link>http://news.commvault.com/DipeshPatel/000014_Responding_to_Comments.asp</link>

		<guid>http://news.commvault.com/DipeshPatel/000014_Responding_to_Comments.asp</guid>

		<dc:creator>CommVault&#174;</dc:creator>

		<pubDate>Thu, 14 May 2009 08:30:00 -0400</pubDate>

	</item>

	<item>

		<title>What's Old Is New Again</title>

		<description><![CDATA[<p>Welcome to the inaugural post of my blog here at CommVault. It's been an exciting 2009 as we released the latest version of our Simpana software suite in late January with a huge focus on... you guessed it... <u>deduplication</u>! You already may follow <a href="http://news.commvault.com/DavidWest">Dave West's blog</a>, so you may wonder what this one's all about (or about to be, more accurately). I'm going to use this space to help highlight new developments, debunk dedupe "myths", and generally provide a bit more perspective specifically around deduplication. Dave gets to be the nice guy (actually he really is), but I get to be a bit more direct in my observations and opinions.</p><p>One thing I've noticed over the years is that what's old is often new again - like the <a href="http://www.ft.com/cms/s/2/8c9d2e2c-1a5e-11de-9f91-0000779fd2ac.html" target="_blank" onClick="LinkAlert()">Pet Shop Boys</a>, for instance. It seems that many developments in the industry happen in cycles. Take "virtual machines" as an example. While at IBM, I worked closely with the eServer Group on the software side of things. At the time, IBM was making a big push around Linux, and VMware was one of the big partners as we sought to evangelize that OS technology. At the same time I came by accident to the realization that IBM actually had a hand in virtual machines <a href="http://en.wikipedia.org/wiki/VM/CMS" target="_blank" onClick="LinkAlert()">since 1972, thirty years earlier!</a> So it made an impression on me that here we had two separate "waves" of adoption, decades apart, which in many respects addressed broadly similar goals but in different contexts.</p>]]></description>

		<link>http://news.commvault.com/DipeshPatel/000009_Whats_Old_Is_New_Again.asp</link>

		<guid>http://news.commvault.com/DipeshPatel/000009_Whats_Old_Is_New_Again.asp</guid>

		<dc:creator>CommVault&#174;</dc:creator>

		<pubDate>Fri, 1 May 2009 08:30:00 -0400</pubDate>

	</item>

	</channel>

</rss>

