shareVM- Share insights about using VM's

Simplify the use of virtualization in everyday life

Posts Tagged ‘compress’

EMC FAST (Fully Automated Storage Tiering) for storage savings


Chuck Hollis (VP Global Marketing CTO, EMC) describes FAST across three blog posts. The technology has been in beta use by several customers in 2009.

The premise

When you analyze the vast majority of application I/O profiles, you’ll find that a small amount of data is responsible for the majority of I/Os, while almost all of the rest is infrequently accessed.

The principle

Watch how the data is being accessed, and place it dynamically: the small amount of popular, frequently accessed data goes on flash drives, while the vast majority of infrequently accessed data goes on big, slow SATA drives.
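
To make the idea concrete (this is an illustration of the principle, not EMC's implementation), here is a minimal sketch of frequency-based placement; the tier names and the access-count threshold are made up:

    from collections import Counter

    # Hypothetical two-tier setup: a small flash tier and a large SATA tier.
    FLASH, SATA = "flash", "sata"
    HOT_THRESHOLD = 100  # accesses per observation window (an assumed number)

    access_counts = Counter()  # block id -> accesses seen in the current window

    def record_access(block_id):
        access_counts[block_id] += 1

    def choose_tier(block_id):
        # The small, popular working set earns a spot on flash;
        # the long tail of cold data stays on big, slow SATA.
        return FLASH if access_counts[block_id] >= HOT_THRESHOLD else SATA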

The storage savings solution

  • FAST: Place the right information on the right media based on frequency of access.
  • Thin: Thin (virtual) provisioning allocates physical storage when it is actually used, rather than when it is provisioned (see the sketch after this list).
  • Small: Compression, single-instancing, and data deduplication technologies eliminate information redundancies.
  • Green: A significant amount of enterprise information is used *very* infrequently. So infrequently, in fact, that the disk drives can be spun down, or at least made semi-idle.
  • Gone: Policy-based lifecycle management: archiving and deletion, plus federation to the cloud through private and public cloud integration. As an option, the information can be shopped out to a specialized service provider.
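
As promised above, here is a toy sketch of the "Thin" idea: a volume that advertises its full capacity up front but allocates physical blocks only on first write. The names are illustrative, not EMC's API:

    # A toy model of thin provisioning: the volume advertises its full
    # capacity, but physical space is allocated only on first write.
    class ThinVolume:
        def __init__(self, advertised_blocks, block_size=4096):
            self.advertised_blocks = advertised_blocks
            self.block_size = block_size
            self.blocks = {}  # logical block number -> data, created on demand

        def write(self, block_no, data):
            assert 0 <= block_no < self.advertised_blocks
            self.blocks[block_no] = data  # physical allocation happens here

        def read(self, block_no):
            # Never-written blocks consume no space and read back as zeros.
            return self.blocks.get(block_no, b"\x00" * self.block_size)

        def physical_bytes(self):
            return len(self.blocks) * self.block_size

    vol = ThinVolume(advertised_blocks=262144)  # a "1 GB" volume (4 KB blocks)
    vol.write(0, b"x" * 4096)
    print(vol.physical_bytes())                 # 4096: one block actually used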

 

… and life goes on!

One thing hasn’t changed, though: the information beast continues to grow.

Written by paule1s

December 11, 2009 at 9:29 am

EMC Celerra NAS/SAN Deduplication


EMC Celerra Deduplication differs substantially in concept, implementation, and benefits from the block-level deduplication offered by NetApp, Data Domain, and others. To understand the differences, let us first compare the data reduction technologies:

Data reduction technologies

Technology                     Typical Space Savings   Resource Footprint
File-level deduplication       10%                     Low
Fixed-block deduplication      20%                     High
Variable-block deduplication   28%                     High
Compression                    50%                     Medium

 

  • File-level deduplication provides relatively modest space savings.
  • Fixed-block deduplication provides better space savings, but consumes more CPU to calculate hashes for each block of data, and more memory to hold the indices used to determine whether a given hash has been seen before.
  • Variable-block deduplication provides slightly better space savings, but the difference is not significant when applied to file system data. It is most effective on data sets that contain repeated but block-misaligned data, such as backup data in backup-to-disk or virtual tape library (VTL) environments.
  • Compression differs from file-level and block-level deduplication in the granularity at which it applies; it can be described as infinitely variable, bit-level, intra-object deduplication. It offers the greatest space savings of all the techniques listed for typical NAS data, and its resource footprint is comparatively modest: it is relatively CPU-intensive but requires very little memory.

The storage space savings realized by compression are far greater than those offered by the other techniques, and its resource requirements are quite modest by comparison. Compression does have a disadvantage: there is a potential performance penalty associated with decompressing the data when it is read or modified. This decompression penalty can work both ways, though. Reading a compressed file can often be quicker than reading a non-compressed file, because the reduction in the amount of data retrieved from disk more than offsets the additional processing required to decompress it.
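
If you want to see this trade-off for yourself, here is a rough micro-benchmark (the file names are placeholders, and the OS page cache can skew the numbers, so treat the results as indicative only):

    import gzip
    import time

    def time_read(open_fn, path):
        start = time.perf_counter()
        with open_fn(path, "rb") as f:
            while f.read(1 << 20):  # stream the file in 1 MB chunks
                pass
        return time.perf_counter() - start

    # Compare reading a plain copy against a gzipped copy of the same file.
    # On data that compresses well, the smaller on-disk footprint can more
    # than pay for the CPU spent decompressing.
    print("plain:", time_read(open, "sample.dat"))
    print("gzip: ", time_read(gzip.open, "sample.dat.gz"))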

Celerra Data Deduplication

Celerra Data Deduplication combines file-level deduplication and compression to provide maximum space savings for file system data, based on:

  • Frequency of file access: it targets files that are not “new” (creation time older than a configuration parameter) and not “hot”, i.e., not in active use (access time and modification time older than a configuration parameter).
  • File size: it avoids compressing files that are either so small that the anticipated space savings would be minimal, or so large that decompression could degrade performance and impact file-access service levels (see the sketch below).
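
A minimal sketch of what such a policy check could look like; the thresholds below are hypothetical stand-ins for Celerra's configuration parameters:

    import os
    import time

    # Hypothetical thresholds; Celerra exposes these as configuration parameters.
    MIN_AGE_DAYS = 30       # skip "new" or "hot" files
    MIN_SIZE = 24 * 1024    # skip small files: anticipated savings are minimal
    MAX_SIZE = 200 * 2**20  # skip huge files: decompression could hurt service levels

    def eligible_for_dedup(path):
        now = time.time()
        st = os.stat(path)
        cold = all(now - t > MIN_AGE_DAYS * 86400
                   for t in (st.st_ctime, st.st_atime, st.st_mtime))
        return cold and MIN_SIZE <= st.st_size <= MAX_SIZE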

Deduplication is enabled at the file system level and is transparent to access protocols. Mark Twomey‘s post provides an excellent overview of Celerra Data Deduplication.

The space reduction process

Celerra Data Deduplication has a flexible policy engine that specifies data for exclusion from processing and decides whether to deduplicate specific files based on their age. When enabled on a file system, Celerra Data Deduplication periodically scans the file system for files that match the policy criteria and then compresses them. The compressed file data is hashed to determine if the file has been identified before. If the compressed file data has not been identified before, it is copied into a hidden portion of the file system. The space that the file data occupied in the user portion of the file system is freed and the file’s internal metadata is updated to reference an existing copy of the data. If the data associated with the file has been identified before, the space it occupies is freed and the internal file metadata is updated. Note that Celerra detects non-compressible files and stores them in their original form. However, these files can still benefit from file-level deduplication.
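
A simplified sketch of that flow, with in-memory dictionaries standing in for the hidden portion of the file system and the file metadata (a conceptual model, not Celerra's code):

    import gzip
    import hashlib

    hidden_store = {}   # digest -> stored data (the hidden portion of the file system)
    file_metadata = {}  # path -> digest referencing the stored copy

    def space_reduce(path):
        with open(path, "rb") as f:
            data = f.read()
        compressed = gzip.compress(data)
        # Non-compressible files are kept in their original form; they can
        # still benefit from file-level deduplication.
        blob = compressed if len(compressed) < len(data) else data
        digest = hashlib.sha1(blob).hexdigest()
        if digest not in hidden_store:
            hidden_store[digest] = blob   # first copy of this content
        file_metadata[path] = digest      # free the user-space copy, keep a reference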

Celerra Data Deduplication employs SHA-1 (Secure Hash Algorithm) for its file-level deduplication. SHA-1 can take a stream of data up to 2^64 bits in length and produce a 160-bit hash, which is designed to be unique to the original data stream. The likelihood of two different files hashing to the same value is vanishingly small: the best reported attack requires about 2^69 hash operations to produce a collision. Unlike compression, file-level deduplication can be disabled in Celerra Data Deduplication.

Designed to minimize client impact

Celerra Data Deduplication processes the bulk of the data in a file system without affecting the production workload. All deduplication processing is performed as a background, asynchronous operation that acts on file data after it is written into the file system. This keeps latency out of the client data path, where access to production data is sensitive to it. By policy, deduplication is performed only on files that are not in active use, which avoids introducing any performance penalty on the data that clients and users rely on to run their business.

Written by paule1s

December 7, 2009 at 12:51 pm

A year in review: What are our readers looking for?


Our readers are primarily asking questions like:

  • How can I free up disk space on Windows, and on ext4 or ext3 on Ubuntu and Linux, within virtual disks like vmdk, vhd, and vdi?
  • Where can I find the best virtual appliances / Top 10 virtual appliances?
  • How can I convert from one virtual disk format to another (vmdk to vhd, or vdi to vhd)?
  • Who are the competitors to EC2?

An analysis of the search terms shows interesting clusters:

Serial   Topic                         % of queries   Search terms
1        ext4 defragmentation          23%            ext4 defrag, defrag ext4, ext4 defragment, defragment ext4
2        ubuntu ext4 defragmentation   14%            ext4 defrag ubuntu, ext4 ubuntu defrag, ubuntu ext4 defrag, ubuntu defrag ext4, defrag ext4 ubuntu, defrag ubuntu ext4
3        vmware virtual appliance      14%            vmware virtual appliance, vmware virtual appliances, top vmware appliances, top 10 vmware appliances, best vmware appliances
4        virtual appliance             5%             virtual appliance, virtual appliances, top appliances, top 10 appliances, best appliances
5        vmware firewall appliance     5%             vmware firewall appliance, vmware appliance firewall
6        ubuntu defragmentation        4%             defrag ubuntu, ubuntu defrag, defragment ubuntu, ubuntu defragment
7        ec2 competitors               4%             amazon ec2 competitors, ec2 competitors
8        windows 7 virtual appliance   4%             windows 7 virtual appliance, virtual applaince windows 7
9        ext3 defragmentation          4%             ext3 defrag, defrag ext3, ext3 defragment, defragment ext3
10       convert vdi to vhd            3%             convert vdi to vhd, vdi to vhd

If I abstract it out, our readers are primarily interested in learning how to free up disk storage and where to find the best / Top 10 VMware, Xen, and Windows virtual appliances.

Thank you. I appreciate your interest in this blog.

Compressed VM file transfer using DropBox


I am using DropBox to transfer compressed files, including VMs, between my environment at home (a Mac running Windows XP SP3 in VMware Fusion 2.0.5) and the test machine (a Windows XP SP3 system located in the office lab). Each machine has a DropBox folder linked to the same account.

Neat product!

I love the simplicity and ease of use. A lot of thought has gone into making the product easy to install; the integration with the host OS (Windows and Mac) is seamless and sets a benchmark for how UIs for downloadable products should be designed.

Usage model

I compress each file using the Mac’s native file compression and drop it into my DropBox folder. DropBox seems to follow a two-step file transfer process:

  1. It first uploads the file completely from the source DropBox folder to the DropBox folder in the cloud.
  2. After the upload is complete, the file is downloaded from the DropBox folder in the cloud to the destination DropBox folders.

Setup

Speed ratings are from here. I have been able to correlate these speeds with the end-to-end transfer times.

Transfer Type   Speed Rating for my ISP   Observed DropBox Transfer Rate
Upload          120 KB/sec                70 KB/sec
Download        360 KB/sec                210 KB/sec

Near real-time transfer for uncompressed files

DropBox transfers uncompressed files almost instantaneously between the two machines. The files are transferred sequentially and seem to arrive in order. For example, I transferred a 1.72 GB folder containing 400 photographs, and the photos started appearing sequentially, 10 to 15 seconds apart.

Compressed files

Compressed files are transferred as a unit, although deduplication still applies to the blocks contained within them. The transfer times are recorded below:

Original Size   Compressed Size   Upload Time   Download Time   Total Time
4.30 GB         1.6800 GB         6h 40m        2h 12m          8h 52m
2.15 GB         0.6714 GB         2h 27m        0h 48m          3h 15m
1.10 GB         0.2371 GB         0h 56m        0h 18m          1h 14m
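
As a sanity check, the observed rates from the Setup table predict these times fairly well. A quick back-of-the-envelope calculation (assuming 1 GB = 1024 x 1024 KB):

    # Estimate end-to-end times from the observed rates (70 KB/s up, 210 KB/s down).
    def hours(size_gb, rate_kb_per_s):
        return size_gb * 1024 * 1024 / rate_kb_per_s / 3600

    for size in (1.68, 0.6714, 0.2371):
        up, down = hours(size, 70), hours(size, 210)
        print(f"{size} GB: {up:.1f}h up + {down:.1f}h down = {up + down:.1f}h total")
    # The 1.68 GB file works out to roughly 7.0h + 2.3h,
    # close to the observed 6h 40m + 2h 12m.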

Dedup works well with compressed files

DropBox examines the file to be transferred and builds an index of the blocks it contains. Its deduplication technology is smart enough not to transfer blocks that are duplicates, i.e., blocks that have already been transferred before. For example, when I tried to transfer two clones, the first one took a long time to transfer (a few hours), but the second transfer was very rapid (under five minutes).

Since I am using the free account, I deleted a 2 GB VM from my DropBox folder in order to begin my next transfer. I was pleasantly surprised to see that the next VM transfer was very rapid. I suspect the earlier VM still resided in DropBox’s cache even though I had deleted it, so DropBox discovered the common/duplicate blocks and did not upload them from my Mac.
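
This behavior is consistent with classic hash-based block deduplication. Here is a minimal sketch of the client-side idea; the 4 MB chunk size, the SHA-256 choice, and the bookkeeping are my assumptions, not DropBox's published internals:

    import hashlib

    CHUNK = 4 * 2**20     # assumed chunk size, not DropBox's published figure
    known_hashes = set()  # stands in for the set of hashes the server already holds

    def blocks_to_upload(path):
        """Yield only those chunks whose content has not been seen before."""
        with open(path, "rb") as f:
            while chunk := f.read(CHUNK):
                digest = hashlib.sha256(chunk).hexdigest()
                if digest not in known_hashes:
                    known_hashes.add(digest)
                    yield digest, chunk

    # Uploading a second clone of a VM yields almost nothing to send:
    # nearly every chunk hash is already in known_hashes.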

Summary

Nifty tool. Love it. Will use it a lot.

A few feature requests

  • Subfolders: I would like to organize the files by date and category.
  • Timers: I would like to time the uploads and downloads easily.
  • Profiling: profile my usage and suggest how long an end-to-end transfer will take.
  • Faster compressed transfers: improve my effective transfer rate from ~60% to ~80% of the rated speed; I would like to saturate the available bandwidth for uploads and downloads.

Thanks 🙂

Written by paule1s

September 13, 2009 at 5:42 pm

gzip vs dedup: I shrink, therefore I am


[reposted from rosensharma.wordpress.com]

I stole “I shrink, therefore I am” from my wife’s good friend Arun Verma, who is incredibly creative and makes some of the best lamps ever. He also does websites and ads, if you are interested.

I have a MacBook and use VMware Fusion to run a Windows XP VM. I keep all my data in a folder hosted on the Mac’s operating system, so the VM is basically programs and user settings. In addition, I have several images that I work with: Red Hat Enterprise, Ubuntu, Win 2K3, etc. Not atypical of someone who either develops or tinkers with technology.

My problem is that out of a 120 GB hard disk, I am up to 100 GB used, and a whopping 60 GB of that is virtual images; I have about eight. So I wanted to see if I could compress the virtual images in some fashion, and decided to run a small test of how much dedup would buy me over gzip.

w2k3.vhd: original size 1.6 GB
w2k3.vhd.gz: 712 MB

Further analysis of the image showed that there were:

14K zero-filled blocks, and
about 40K blocks that occurred more than once.

So an in-image dedup optimization saves (14K + 40K) blocks * 4K ~ 200 MB.

Next I added a Windows XP image:

wxp.vhd: 2 GB
gzip wxp.vhd -> 921 MB
23K zero blocks
43K additional blocks repeated between this image and the previous one
Dedup optimization: 66K * 4K ~ 250 MB

Clearly gzip would win over a simple dedup. Even with two images, XP and W2K3, I guess there are just not enough duplicate blocks to make dedup shine: less than 10% of the blocks are being matched. Cloning, in some sense, avoids large matches in a small set of images like those on a desktop.

So the obvious next question was: how about dedup + gzip? Here things got a little more interesting:

gzip + dedup on w2k3.vhd: 720 MB (yes, larger than gzip alone)
gzip + dedup on wxp.vhd: 963 MB (also larger than gzip alone)

I was not expecting it to be larger. The raw file is not, but once you add the metadata you have to keep for the blocks, it adds up to roughly gzip + metadata. Which means that gzip does a pretty good job with both the zero-filled blocks and the repeated blocks.

PS: Blocks in this context are 4K.
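
For anyone who wants to reproduce the analysis, here is a rough reconstruction of the kind of scan that produces these numbers (not the original script):

    import hashlib
    import sys
    from collections import Counter

    BLOCK = 4096
    ZERO = b"\x00" * BLOCK

    counts = Counter()
    zero_blocks = 0
    for path in sys.argv[1:]:  # e.g. python scan.py w2k3.vhd wxp.vhd
        with open(path, "rb") as f:
            while block := f.read(BLOCK):
                if block == ZERO:
                    zero_blocks += 1
                else:
                    counts[hashlib.sha1(block).hexdigest()] += 1

    # Every repeat of an already-seen block is a block dedup could reclaim.
    repeated = sum(n - 1 for n in counts.values() if n > 1)
    saved_mb = (zero_blocks + repeated) * BLOCK / 2**20
    print(f"zero blocks: {zero_blocks}, duplicate blocks: {repeated}, "
          f"potential dedup savings: ~{saved_mb:.0f} MB")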

Written by RS

September 10, 2009 at 10:39 pm

Top 10 Posts for Q1 2009


Here are the Top 10 posts for Q1 2009; the number of views for each is in parentheses.

  1. Defragment Ubuntu, Fedora, ext3, ext4 (2247)
  2. Most popular VMWare Virtual Appliances for IT Administrators (2186)
  3. VirtualBox – setup, share, shrink, convert (842)
  4. How to convert a VMWare VMDK to a Microsoft, Xen VHD? (810)
  5. How does shrink with vmware disk manager work? (614)
  6. Most popular VMWare Virtual Appliances for Security (607)
  7. Pre-configured VHD (Virtual Appliance) available from Microsoft (593)
  8. Most popular VMWare Virtual Appliances for Web Apps (558)
  9. Virtual Machine Disk Image Compression (320)
  10. rsync vm, vhd for backup, disaster recovery, ec2 (317)

Defragmentation of virtual disk files remains the dominant theme. There is an equal amount of interest in virtual appliances, particularly those for system administrators.

Search terms:

  • ext4 defrag ubuntu
  • ext4 defrag
  • convert vdi to vhd
  • e4defrag ubuntu
  • virtualbox shrink
  • rsync vmdk
  • wubi
  • defrag ubuntu
  • defrag ext3
  • windows 7 virtual appliance
  • defragment ext3
  • vmware appliances
  • defrag ext4
  • xen vhd
  • ubuntu ext4 defrag
  • defrag ext4 ubuntu
  • vmware firewall appliance
  • vmware appliance
  • “vdi to vhd”
  • convert vhd to xen
  • ext3 defrag
  • windows 7 beta vmware virtual appliances
  • defrag fedora
  • ext3 defragmentation
  • virtual appliance windows 7
  • ubuntu defrag
  • hercules load balancer virtual appliance
  • fedora defrag
  • convert vmdk to xen
  • shrink vmware disk

Top 10 referrers for Q1 2009
