Posts Tagged ‘netapp’
Over the last decade we have systematically added a layer of indirection at every interface in the stack. These days we call this virtualization!
On a NetApp Filer we had
> raid disk group > volume
The problem was that you could not expand or shrink RAID groups on the fly, and you couldn’t easily move data between different RAID groups. So we got a layer of indirection, or virtualization:
> raid disk group > aggregate > flex volume
Since an aggregate was logical instead of physical, it could be expanded or shrunk without changing the volume, and you could move data around.
On a USB Disk
If we look inside the disk itself, especially USB flash devices, we went from
cylinder, heads, sector > logical table > device abstraction
Again this allowed the rotation of different logical sectors to different physical cells, to ensure a single cell was not rewritten more times than its lifetime.
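To make the remapping idea concrete, here is a minimal Python sketch of a logical table sitting in front of physical cells. The class name and the "pick the least-worn cell" policy are my own illustration, not how any particular flash controller works.

```python
class WearLevelledFlash:
    """Toy flash device: a logical table redirects each write to the least-worn cell."""

    def __init__(self, physical_cells: int):
        self.erase_counts = [0] * physical_cells  # writes absorbed by each physical cell
        self.data = [None] * physical_cells
        self.logical_to_physical = {}             # the layer of indirection

    def write(self, logical_sector: int, value: bytes) -> None:
        in_use = set(self.logical_to_physical.values())
        old = self.logical_to_physical.get(logical_sector)
        # Free cells, plus the cell currently backing this sector, are candidates.
        candidates = [c for c in range(len(self.data)) if c not in in_use or c == old]
        target = min(candidates, key=lambda c: self.erase_counts[c])
        if old is not None and old != target:
            self.data[old] = None                 # old cell is released
        self.erase_counts[target] += 1
        self.data[target] = value
        self.logical_to_physical[logical_sector] = target

    def read(self, logical_sector: int) -> bytes:
        return self.data[self.logical_to_physical[logical_sector]]
```

Writing the same logical sector over and over spreads the wear across all cells instead of hammering one, while the caller keeps using the same sector number.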
In a SAN
We put a switch between the RAID groups and the computer. The switch puts a layer of indirection between the blocks and the computer.
You knew all that :-). So what does it have to do with capitalism? My simplistic definition of capitalism is that the system will remove all inefficiencies in a chain, and whoever removes them stands to benefit economically. Or said another way: money finds its way into the right pockets!
So look at the stack today:
chips > motherboard, network, storage, bios > hypervisor > OS > Security, Backup etc > Business, Productivity Apps
Every layer presents an interface to the layer above. Each layer is also owned by different companies in the ecosystem. Each of those companies is under pressure to maximize its revenue. Tasked with this difficult challenge, you look at the layer above, see what is selling, and ask whether you can add it to your layer. This happens naturally over time: Intel added virtualization support, Phoenix BIOS is adding the hypervisor, operating systems are trying to add backup and security… The cycle goes on…
Virtualization will be “innovated” always in a higher layer of the stack and commoditized by the lower layers.
The higher layer in the stack finds a lot of new functionality and benefit by making the interface to a lower layer “logical”. It takes this to market until, at some point, the lower layer realizes that this is its API and that it should move virtualization into its own layer. The pressure to do this is extreme, and the time frame to monetize it is really small:
- Imagine the tussle between VMW and the storage vendors. VMW introduces logical disks with cloning, but storage vendors want to offer logical luns and volumes and disk files, as this moves the cloning functionality from the hypervisor to the storage.
- Imagine: Western Digital or Seagate could create multiple disks (vhd/vmdk files) on a single physical disk and then offer the capability to grow them, shrink them, and move data between them. They could even add networking to the disk controller, so that different disks can connect to each other. They can do that once processing power and memory reach a price point at which they can be embedded directly into the component or lower layer, which is effectively what happened to computing.
- VMW introduces a logical network switch, and Cisco jumps in with the Nexus 1000V.
For a consumer this is a good thing, but money and value are shifting down the stack across different companies, which have to co-exist in the ecosystem (Cisco, Intel, EMC, VMW) yet guard their innovation from becoming commoditized.
I work for a relatively small, but growing, research non-profit. When last I measured it, our data use was growing at a compound rate of about 8% each month; in other words, we double our storage use every nine months or so. (As we’re in the midst of a P2V project where direct-attached storage is moving to our NetApps, we’re actually growing faster than that now, but that’s a temporary bump.) We already have multi-terabyte volumes – so, you do the math… the 16TB aggregate limit (of the 2020) is a real problem for sites like us.
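The "8% a month means doubling every nine months or so" claim is easy to check. A small sketch, where the function names are mine and the only input taken from the text is the 8% rate:

```python
import math


def doubling_time_months(monthly_growth: float) -> float:
    """Months needed to double at a compound monthly growth rate (0.08 means 8%)."""
    return math.log(2) / math.log(1 + monthly_growth)


def projected_size(current_tb: float, monthly_growth: float, months: int) -> float:
    """Size after compounding the monthly growth rate for a number of months."""
    return current_tb * (1 + monthly_growth) ** months
```

`doubling_time_months(0.08)` comes out at just over nine months, and `projected_size(1.0, 0.08, 9)` lands a hair under 2.0, which matches the rule of thumb.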
It’s also worth noting that a 16TB aggregate is not a 16TB file system available to a server. 750GB SATA drives become Rightsize 621 GB drives. Then, for RAID-DP, subtract two disks out of each RAID group. Next, there’s the 10% WAFL overhead. And don’t forget to translate from marketing GB to real GB (or GB to GiB, if you will). So that maximum-size 26-disk aggregate made up of 750GB drives winds up as 11.4TB. And – of course – don’t forget your snap reserves after that.
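The chain of deductions can be written down as a short calculation. This is only a sketch: the number of RAID groups is not stated above, so the two groups used in the usage note are an assumption, and the function name and parameters are made up for illustration.

```python
def usable_capacity_gib(disks: int, rightsized_gb: float, raid_groups: int,
                        wafl_overhead: float = 0.10) -> float:
    """Usable aggregate capacity in GiB, before snap reserves.

    Assumes RAID-DP (two parity disks per RAID group) and treats the
    right-sized drive capacity as decimal ("marketing") GB.
    """
    data_disks = disks - 2 * raid_groups               # RAID-DP parity disks removed
    raw_gb = data_disks * rightsized_gb
    after_wafl_gb = raw_gb * (1 - wafl_overhead)        # 10% WAFL overhead
    return after_wafl_gb * 10**9 / 2**30                # decimal GB to GiB
```

With 26 disks of 621 GB and an assumed two RAID groups, `usable_capacity_gib(26, 621, 2)` gives roughly 11,450 GiB. Read as thousands of GiB, that is close to the 11.4TB quoted above, though that reading of the units is a guess on my part.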
As you mention, backups could be a challenge for large volumes; here’s how we solve it: The 2020 in question was purchased as a SnapVault secondary. Backups go from our primary 3040s to it, and then go via NDMP to tape for off-site/DR purposes. The secondary tier gives us the extended backup window we need to get the data to tape and meet our DR requirements. (I actually think this is a pretty common setup in this day and age.)
Of course, I’m not naive enough to think we can grow by adding drive shelves indefinitely (just added another one last Friday…). My personal opinion is that we’ll ultimately move to an HSM system, especially since much of the storage is used for instrument data (mass spec, microscopy, etc.) that is often difficult for researchers to categorize immediately as to its value. The thought is to let the HSM algorithms find the appropriate tier for the data automatically.
The EMC Celerra Deduplication is substantially different in concept, implementation and its benefits from the block-level deduplication offered by NetApp, Data Domain and others in their products. To understand the differences, let us first look at the comparison of data reduction technologies:
Data reduction technologies
| Technology | Typical Space Savings | Resource footprint |
|---|---|---|
| Fixed block deduplication | 20% | High |
- File-level deduplication provides relatively modest space savings.
- Fixed-block deduplication provides better space savings, but consumes more CPU to calculate hashes for each block of data, and more memory to hold the indices used to determine if a given hash has been seen before.
- Variable-block deduplication provides slightly better space savings, but the difference is not significant when applied to file system data. It is most effective when applied to data sets that contain repeated but block-misaligned data, such as backup data in backup-to-disk or virtual tape library (VTL) environments.
- Compression is different from file-level or block-level deduplication in the granularity at which it applies. It is described as infinitely variable, bit-level, intra-object deduplication. It offers the greatest space savings of all the techniques listed for typical NAS data, and is relatively modest in terms of its resource footprint. It is relatively CPU-intensive but requires very little memory.
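To illustrate the fixed-block case and why block-misaligned data defeats it, here is a minimal sketch. The 4 KB block size and the use of SHA-256 are assumptions chosen for the example; the vendors' actual block sizes and hash functions are not stated here.

```python
import hashlib


def fixed_block_dedupe(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and keep one copy of each unique block."""
    store = {}   # hash -> block: the index that has to be held in memory
    recipe = []  # ordered hashes needed to rebuild the original data
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).hexdigest()  # CPU cost: one hash per block
        store.setdefault(digest, block)
        recipe.append(digest)
    return store, recipe


def rebuild(store, recipe) -> bytes:
    """Reassemble the original byte stream from the stored unique blocks."""
    return b"".join(store[digest] for digest in recipe)


def sample_block(size: int = 4096) -> bytes:
    """Deterministic, non-repeating test data of the requested size (multiple of 32)."""
    return b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(size // 32))
```

Two back-to-back copies of `sample_block()` reduce to a single stored block. Put one extra byte in front of the same data and every block boundary shifts, so nothing matches any more. That shifted case is where variable-block schemes earn their keep.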
The storage space savings realized by compression are far greater than those offered by the other techniques, and its resource requirements are quite modest by comparison. However, compression has a disadvantage: there is a potential performance “penalty” associated with decompressing the data when it is read or modified. This decompression “penalty” can work both ways. Reading a compressed file can often be quicker than reading a non-compressed file, because the reduction in the amount of data that must be retrieved from disk more than offsets the additional processing required to decompress it.
Celerra Data Deduplication
Celerra Data Deduplication combines file-level deduplication and compression to provide maximum space savings for file system data. It selects files based on:
- Frequency of file access: it processes files that are not “new” (creation time older than a configuration parameter) and not “hot”, i.e., not in active use (access time or modification time older than a configuration parameter).
- File size: it avoids compressing files that are small, where the anticipated space savings are minimal, and files that are large, where decompression could degrade performance and impact file access service levels.
The space reduction process
Celerra Data Deduplication has a flexible policy engine that specifies data for exclusion from processing and decides whether to deduplicate specific files based on their age. When enabled on a file system, Celerra Data Deduplication periodically scans the file system for files that match the policy criteria and then compresses them. The compressed file data is hashed to determine whether the same data has been identified before. If it has not, the compressed data is copied into a hidden portion of the file system, the space that the file data occupied in the user portion of the file system is freed, and the file’s internal metadata is updated to reference the copy in the hidden portion. If the data has been identified before, the space it occupies is freed and the file’s internal metadata is updated to reference the existing copy. Note that Celerra detects non-compressible files and stores them in their original form. However, these files can still benefit from file-level deduplication.
Celerra Data Deduplication employs SHA-1 (Secure Hash Algorithm 1) for its file-level deduplication. SHA-1 can take a stream of data less than 2^64 bits in length and produce a 160-bit hash, which is designed to be unique to the original data stream. The likelihood of different files hashing to the same value is extremely low: a collision has been reported only after on the order of 2^69 hash operations. Unlike compression, file-level deduplication can be disabled in Celerra Data Deduplication.
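The compress-then-hash flow described above can be sketched in a few lines of Python. This is a toy model and not EMC's implementation: the dictionary standing in for the hidden portion of the file system, the `min_size` cut-off, and the function names are all assumptions for illustration, and the age policy is assumed to have been applied already.

```python
import hashlib
import zlib


def deduplicate(files: dict, hidden_store: dict, min_size: int = 64) -> dict:
    """Compress and deduplicate files that already passed the age policy.

    files maps name -> bytes. Returns name -> ("ref", digest) for files moved
    into hidden_store, or ("raw", data) for files left untouched.
    """
    metadata = {}
    for name, data in files.items():
        if len(data) < min_size:              # too small to be worth processing
            metadata[name] = ("raw", data)
            continue
        compressed = zlib.compress(data)
        is_compressed = len(compressed) < len(data)
        # Non-compressible data is kept in its original form but still deduplicated.
        payload = compressed if is_compressed else data
        digest = hashlib.sha1(payload).hexdigest()
        hidden_store.setdefault(digest, (is_compressed, payload))
        metadata[name] = ("ref", digest)      # user-visible space can now be freed
    return metadata


def read_file(entry, hidden_store) -> bytes:
    """Return file contents, decompressing on the fly when needed."""
    kind, value = entry
    if kind == "raw":
        return value
    is_compressed, payload = hidden_store[value]
    return zlib.decompress(payload) if is_compressed else payload
```

Two files with identical contents end up pointing at one compressed copy in the store, and reads pay the decompression cost only for files that were actually processed.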
Designed to minimize client impact
Celerra Data Deduplication processes the bulk of the data in a file system without affecting the production workload. All deduplication processing is performed as a background asynchronous operation that acts on file data after it is written into the file system. This avoids latency in the client data path, because access to production data is sensitive to latency. By policy, deduplication is performed only for those files that are not in active use. This avoids introducing any performance penalty on the data that clients and users are using to run their business.
Interesting post on “The changing role of the IT storage pro” by John Webster, who interviewed the CIO of an unnamed storage vendor.
The CIO observed that the consolidation of IT infrastructure driven by server virtualization projects and a future rollout of virtual desktops is forcing a convergence of narrowly focused IT administrative groups. This convergence will cause IT administrators to develop competency in systems and services delivery in the future, rather than remain silo’ed experts in servers, networks, and storage.
Virtualization has brought about the convergence of systems and networks; the convergence of Fibre Channel and Ethernet within the data center changes the nature of the relationships between enterprise IT operational groups as well as the traditional roles of server, networking, and storage groups.
As the virtual operating systems (VMware, MS Hyper-V, etc.) progress, we will see an increased tendency to offer administrators the option of doing both storage and data management at the server rather than the storage level. Backups and data migrations can be done by a VMware administrator for example. Storage capacity can be managed from the virtualized OS management console.
John’s observations tie in with the lessons from the two preceding posts, where we explored NetApp’s storage features for virtualization and thin-provisioned virtual disks. There we learnt that administrators have to understand not just the file system nuances but also the storage features to use storage for virtualization effectively.
There are excellent posts by two prominent authors that provide a lot of insight into the nuances of using thick or thin provisioning for VM’s: Thin Provisioning Part 1 – The Basics and Thin Provisioning Part 2 – Going Beyond by Vaughn Stewart of NetApp and Thin on Thin – where should you do Thin Provisioning by Chad Sakac of EMC.
Escalating storage costs are stalling the deployment of virtualized data centers, and it is becoming increasingly important for customers to leverage storage technology developed by VMware and its storage partners, NetApp and EMC, to reduce storage costs.
[Table: virtual disk formats (such as eager zeroed thick) and disk array blocks]
Use Thin on Thin (Thin vmdk’s and Thin Provisioning on the storage array) for the best storage utilization because they allocate storage capacity from the datastore and storage array only on demand.
The Goal of Thin Provisioning is Datastore Oversubscription
The challenge is that a datastore and all of its components (VMFS, LUNs, etc.) are static in terms of storage capacity. While the capacity of a datastore can be increased on the fly, this process is not automated or policy driven. Should an oversubscribed datastore encounter an out-of-space condition, all of the running VMs will become unavailable to the end user. In these scenarios the VMs don’t ‘crash’, they ‘pause’; however, applications running inside the VMs may fail if the out-of-space condition isn’t addressed in a relatively short period of time. For example, Oracle databases will remain active for 180 seconds; after that time has elapsed, the database will fail.
If you DO use Thin on Thin, use VMware or 3rd party usage reports in conjunction with array-level reports, and set thresholds with notification and automated action on both the VMware layer and the array level (if your array supports that). Why? With thin provisioning you need to manage carefully for “out of space” conditions, since you are oversubscribing an asset which has no backdoor (unlike how VMware oversubscribes guest memory, which can use VM swap if needed). Thin on Thin can be very efficient, but it can “accelerate” the transition to oversubscription.
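As a rough illustration of checking thresholds at both layers, here is a small sketch. The `Pool` class, the 75% and 90% thresholds, and the example figures are invented for the example; real numbers would come from the VMware and array-level usage reports.

```python
from dataclasses import dataclass


@dataclass
class Pool:
    """One thin-provisioned layer: a datastore or an array-level pool."""
    name: str
    capacity_gb: float     # physical capacity actually available
    provisioned_gb: float  # total size promised to consumers
    used_gb: float         # space actually written so far

    @property
    def oversubscription(self) -> float:
        return self.provisioned_gb / self.capacity_gb

    @property
    def used_fraction(self) -> float:
        return self.used_gb / self.capacity_gb


def check(pools, warn_at: float = 0.75, act_at: float = 0.90):
    """Return (pool name, level) for every pool at or past a threshold."""
    alerts = []
    for pool in pools:
        if pool.used_fraction >= act_at:
            alerts.append((pool.name, "act"))    # e.g. trigger automated expansion
        elif pool.used_fraction >= warn_at:
            alerts.append((pool.name, "warn"))   # e.g. send a notification
    return alerts
```

Running `check` over both the datastore and the array pool matters because either layer can run out first; a datastore that looks healthy can still sit on an array pool that is nearly full.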
The eagerzeroedthick virtual disk format is required for VMware Fault Tolerant VMs on VMFS (if they are thin, conversion occurs automatically when the VMware Fault Tolerant feature is enabled). It also continues to be mandatory for Microsoft clusters (refer to KB article) and is recommended for the highest I/O workload virtual machines, where the slight latency and additional I/O created by the “zeroing” that occurs as part and parcel of virtual machine I/O to new blocks is unacceptable.
A VMDK can grow beyond the capacity of the data it is storing. The reason for this phenomenon is that deleted data remains in the guest OS (GOS) file system. When data is deleted, the process merely removes the content from the active file system table and marks the blocks as available to be overwritten. The data still resides in the file system and thus in the virtual disk. This is why you can purchase undelete tools like WinUndelete.
Don’t run defrag within a thin provisioned VM
The defragmentation process rewrites all of the data within a VMDK. This operation can cause a considerable expansion in the size of the virtual disk, costing you your storage savings.
How to recover storage
The first phase is to zero out the ‘free’ blocks within the GOS file system. This can be accomplished by using the ‘shrink disk’ feature within VMTools or with tools like sdelete from Microsoft. The second phase is to use Storage VMotion to migrate the VMDK to a new datastore.
You should note that this process is manual; however, Mike Laverick has posted a guide which includes how to automate some of the components in this process. Duncan Epping has also covered automating parts of this process.
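A toy model helps show why deleting files does not shrink a thin disk and why the two phases are both needed. Everything here is a simplification for illustration: the classes are made up, and the migration function simply assumes that zeroed blocks are not carried over to the new thin disk.

```python
ZERO = b"\x00"


class ThinDisk:
    """Thin virtual disk: only blocks that have been written consume space."""

    def __init__(self):
        self.blocks = {}  # block number -> contents

    def write(self, block_no: int, data: bytes) -> None:
        self.blocks[block_no] = data

    @property
    def allocated(self) -> int:
        return len(self.blocks)


class GuestFS:
    """Guest file system: deleting a file only removes its table entry."""

    def __init__(self, disk: ThinDisk):
        self.disk = disk
        self.files = {}  # file name -> list of block numbers
        self.next_block = 0

    def create(self, name: str, n_blocks: int) -> None:
        blocks = list(range(self.next_block, self.next_block + n_blocks))
        self.next_block += n_blocks
        for block_no in blocks:
            self.disk.write(block_no, b"data")
        self.files[name] = blocks

    def delete(self, name: str) -> None:
        del self.files[name]  # the blocks keep their old contents on the disk

    def zero_free_space(self) -> None:
        """Phase one: overwrite blocks no live file uses with zeros."""
        live = {b for blocks in self.files.values() for b in blocks}
        for block_no in list(self.disk.blocks):
            if block_no not in live:
                self.disk.write(block_no, ZERO)


def migrate_thin(source: ThinDisk) -> ThinDisk:
    """Phase two (modelled): copy to a new thin disk, skipping zeroed blocks."""
    target = ThinDisk()
    for block_no, data in source.blocks.items():
        if data != ZERO:
            target.write(block_no, data)
    return target
```

In this model, deleting a 10-block file leaves the disk at its old size, zeroing alone does not shrink it either, and only the migrated copy drops to the blocks still in use.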
The feature set that gives customers storage savings is described in an informative 42-minute video, Hyper-V and NetApp storage – Overview. I have summarized it in the 5-minute post below.
Enterprise System Storage Portfolio
The Enterprise product portfolio consists of the FAS series and V-Series storage systems. These systems have a unified storage architecture based on the Data ONTAP OS, which runs across all storage arrays. Data ONTAP provides a single application interface and supports protocols such as FC-SAN, FCoE-SAN, IP-SAN (iSCSI), and NAS (NFS, CIFS). The V-Series controllers also offer multiple vendor array support, i.e., they can offer the same features on disk arrays manufactured by NetApp’s competitors.
- Block-level de-duplication, or de-dupe, retains exactly one instance of each unique disk block. When applied to live production systems, it can reduce data by 95% for full backups, especially when there are identical VM images created from the same template, and by as much as 25%-55% for most data sets.
- Snapshot copies of a VM are lightweight because they share the same disk blocks with the parent and do not require as much space for the copy as the parent. If a disk block is updated with a snapshot, e.g., if a configuration parameter is customized for an application, or when a patch is applied, the Write Anywhere File Layout (WAFL) file system associates the updated block with the snapshot copy and writes to the disk, leaving the original block and its referrers intact. Snapshot copies therefore impose negligible storage performance impact on running VM’s.
- Thin provisioning allows users to define storage pools (Flexvol) for which storage allocation is done dynamically from the storage array on demand. Flexvol can be enabled at any point in time while the storage system is in operation.
- Thin replication between disks provides data protection. Differential backups and mirroring over the IP network work at the block level, copying only the changed blocks; compressed blocks are sent over the wire. It enables virtual restores of full, point-in-time data at granular levels.
- Double parity RAID, called RAID-DP, provides superior fault tolerance and a 46% saving versus mirrored data or RAID 10. You can think of it as a form of RAID 6 (RAID 5 plus one additional parity disk). RAID-DP can lose any two disks in the RAID stripe without losing any data. It offers availability equivalent to RAID 1 and allows lower-cost, higher-capacity SATA disks to be used for applications. The industry standard best practice is to use RAID 1 for important data and RAID 5 for other data.
- Virtual clones (FlexClone). You can clone a volume, a LUN, or individual files. Savings = size of the original data set minus blocks subsequently changed in the clone. Clones ease dev and test cycles. Typical use cases: building a tree of clones (clone of clones), cloning a sysprep’ed vhd, DR testing, VDI.
There are several other videos on the same site that show the setup for the storage arrays. They are worth seeing to get an idea of what is involved to get all the machinery working in order to leverage the above features. It involves many steps and seems quite complex. (The hallmark of an “Enterprise-class” product? 😉 ) The SE’s have done a great job of making it seem simple. Hats off to them!