shareVM- Share insights about using VM's

Simplify the use of virtualization in everyday life

Posts Tagged ‘file transfer’

DropBox dedup only in the cloud

with 3 comments

I had observed in my earlier article that DropBox performs de-duplication in the cloud. This would mean that de-duplication is not performed at the client. In order to test my hypothesis, I performed the following experiment:

I first looked at the size of the DropBox folder on Windows and found it to be 1,723,871,232 bytes.

Next, in the DropBox client, I opened the DropBox folder and duplicated the contents of the Public folder by copying the 1.68GB file and pasting it as a copy. I looked at the size of the folder once again and it had roughly doubled to 3,446,513,664 bytes.

If DropBox had been performing dedup at the client, it should have detected the duplicate blocks between the original and its copy at the source, and the folder should not have grown in size at all. My conclusion is that DropBox dedups only in the cloud, not at the client.
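The before-and-after measurement can be repeated with a few lines of Python. This is only a sketch of the experiment: the helper names `folder_size_bytes` and `growth_after_copy` and the "Copy of" naming are made up for illustration, and the sizes are logical file sizes, which may differ slightly from what Windows Explorer or `du` report.

```python
import os
import shutil
import tempfile

def folder_size_bytes(root):
    """Sum the logical sizes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total

def growth_after_copy(folder, filename):
    """Duplicate one file inside folder and return (size_before, size_after)."""
    before = folder_size_bytes(folder)
    src = os.path.join(folder, filename)
    dst = os.path.join(folder, "Copy of " + filename)  # hypothetical copy name
    shutil.copy2(src, dst)
    return before, folder_size_bytes(folder)
```

If the client deduplicated local storage, the second number would stay close to the first; a plain file copy makes it grow by the size of the file.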

Wait, there’s more:

I repeated the same experiment on the Mac after deleting the duplicate file. Here’s what I started out with:

Last login: Thu Sep 17 15:30:58 on ttys000
mace1s:~ paule1s$ du -k DropBox
1152 DropBox/Photos/Sample Album
1516 DropBox/Photos
1682636 DropBox/Public
368 DropBox/sharevm
1684880 DropBox

Notice that the total size of the folder (the last line of the listing above) is 1.68GB.

Next, in the DropBox client, I opened the DropBox folder and duplicated the contents of the Public folder by copying the 1.68GB file and pasting it as a copy. I looked at the size of the folder once again and saw:

mace1s:~ paule1s$ du -k DropBox
1152 DropBox/Photos/Sample Album
1516 DropBox/Photos
2600140 DropBox/Public
368 DropBox/sharevm
2602384 DropBox

This is very interesting. I had expected the storage requirement to double to 3,369,760 KB; however, it grew by only 917,504 KB (roughly 0.9GB). What happened to the remaining 767,376 KB (roughly 0.77GB)? Did the DropBox client truncate the file? If so, why?
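The arithmetic can be checked directly from the two `du -k` listings. The snippet below only restates the numbers printed above; the variable names are mine.

```python
# Totals in KB, copied from the last line of each `du -k DropBox` listing.
total_before_kb = 1684880
total_after_kb = 2602384

# How much the folder actually grew after the copy.
actual_growth_kb = total_after_kb - total_before_kb

# The expectation in the text: the whole folder doubles in size.
expected_total_kb = 2 * total_before_kb
shortfall_kb = expected_total_kb - total_after_kb

print("actual growth: ", actual_growth_kb, "KB")
print("expected total:", expected_total_kb, "KB")
print("shortfall:     ", shortfall_kb, "KB")
```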

Readers, can you shed some light?

Written by paule1s

September 17, 2009 at 5:34 pm

Compressed VM file transfer using DropBox

with 2 comments

I am using DropBox for transferring compressed files, including VM’s, between my environment at home (a Mac running Windows XP SP3 in VMware Fusion 2.0.5) and the test machine, a Windows XP SP3 system located in the office lab. Each machine has a DropBox folder linked to the same account.

Neat product!

I love the simplicity and ease of use. A lot of thought has gone into making the product easy to install. The integration with the host OS (Windows and Mac) is seamless and sets a benchmark for how UI’s for downloadable products should be designed.

Usage model

I compress each file using the Mac’s native file compression and drop it into my DropBox folder. DropBox seems to follow a two-step file transfer process:

  1. It first uploads the file completely from the source DropBox folder to the DropBox folder in the cloud
  2. After the upload is complete, the file is then downloaded from the DropBox folder in the cloud to the destination DropBox folders.


Speed ratings are from here. I have been able to correlate these speeds with the end-to-end transfer times.

Transfer Type    Speed Rating for my ISP    Observed DropBox Transfer Rate
Upload           120 KB/sec                 70 KB/sec
Download         360 KB/sec                 210 KB/sec
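Given the two-step process and the observed rates, an end-to-end transfer time can be estimated by adding the upload and download times. The sketch below assumes the slower observed rate (70 KB/sec) is the upload and the faster one (210 KB/sec) is the download, that the download only starts once the upload has finished, and that 1 GB is 1,000,000 KB. The function names are made up for illustration.

```python
def estimate_transfer_seconds(size_kb, upload_kb_per_s=70.0, download_kb_per_s=210.0):
    """Estimate (upload, download, total) seconds for a strictly sequential transfer."""
    upload_s = size_kb / upload_kb_per_s
    download_s = size_kb / download_kb_per_s
    return upload_s, download_s, upload_s + download_s

def format_hm(seconds):
    """Render a duration as 'Xh YYm', rounded to the nearest minute."""
    minutes = round(seconds / 60)
    return f"{minutes // 60}h {minutes % 60:02d}m"

# Example: a 1.68 GB compressed file, taken as 1,680,000 KB.
upload_s, download_s, total_s = estimate_transfer_seconds(1_680_000)
print(format_hm(upload_s), format_hm(download_s), format_hm(total_s))
```

For the 1.68 GB file the estimate lands within a minute or so of the times I measured for compressed files (6h 40m up, 2h 12m down).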

Near real-time transfer for uncompressed files

DropBox transfers uncompressed files almost instantaneously between the two machines. The files are transferred sequentially and seem to arrive in order. For example, I transferred a 1.72 GB folder containing 400 photographs and the photos started appearing sequentially, 10 to 15 seconds apart.

Compressed files

Compressed files are transferred as a unit, although dedup applies to the blocks contained within them. The transfer times are as recorded below:

Original Size    Compressed Size    Upload Time    Download Time    Total Time
4.30 GB          1.6800 GB          6h 40m         2h 12m           8h 52m
2.15 GB          0.6714 GB          2h 27m         0h 48m           3h 15m
1.10 GB          0.2371 GB          0h 56m         0h 18m           1h 14m
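The effective transfer rates implied by these timings can be worked out from the compressed sizes. This is a rough check only: it assumes decimal units (1 GB = 1,000,000 KB), and the helper name `rate_kb_per_s` is mine.

```python
# (compressed size in GB, upload (h, m), download (h, m)) from the timings above.
TRANSFERS = [
    (1.6800, (6, 40), (2, 12)),
    (0.6714, (2, 27), (0, 48)),
    (0.2371, (0, 56), (0, 18)),
]

KB_PER_GB = 1_000_000  # assumption: decimal units

def rate_kb_per_s(size_gb, hours_minutes):
    """Average rate in KB/sec for a transfer of size_gb taking (hours, minutes)."""
    hours, minutes = hours_minutes
    seconds = hours * 3600 + minutes * 60
    return size_gb * KB_PER_GB / seconds

for size_gb, up, down in TRANSFERS:
    print(f"{size_gb:.4f} GB: upload {rate_kb_per_s(size_gb, up):.0f} KB/sec, "
          f"download {rate_kb_per_s(size_gb, down):.0f} KB/sec")
```

The uploads come out in the 70 to 76 KB/sec range, which is how the timings correlate with the observed DropBox transfer rates.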

Dedup works well with compressed files

DropBox examines the file to be transferred and builds an index of blocks to be transferred. Its de-duplication technology is smart enough to skip blocks that are duplicates, i.e., blocks that have already been transferred before. For example, when I tried to transfer two clones, the first one took a long time to transfer (a few hours), but the second transfer was very rapid (under five minutes).

Since I am using the free account, I deleted a 2GB VM from my DropBox folder in order to begin my next transfer. I was pleasantly surprised to see that the next VM transfer was very rapid. I suspect this was because the VM that was transferred earlier was still residing in DropBox’s cache even though I had deleted it, so that DropBox discovered common/duplicate blocks and did not upload them from my Mac.
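The behaviour described above can be modelled with a toy uploader that keeps an index of block hashes and only sends blocks the server has not seen. This is a sketch of the general technique, not DropBox's actual implementation: the 4 MB block size, the use of SHA-256, and the class name `DedupUploader` are all assumptions made for illustration.

```python
import hashlib

BLOCK_SIZE = 4 * 1024 * 1024  # assumption: block size picked for illustration

class DedupUploader:
    """Toy model of a sender that skips blocks the server already stores."""

    def __init__(self):
        # digest -> block bytes; stands in for the storage in the cloud
        self.server_blocks = {}

    def upload(self, data, block_size=BLOCK_SIZE):
        """Return (blocks_sent, blocks_skipped) for one file's bytes."""
        sent = skipped = 0
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            digest = hashlib.sha256(block).hexdigest()
            if digest in self.server_blocks:
                skipped += 1  # server already has this block, send nothing
            else:
                self.server_blocks[digest] = block
                sent += 1
        return sent, skipped
```

In this model, uploading a clone of an earlier file sends no blocks at all. If the server keeps blocks around after a file is deleted, as I suspect happened with my 2GB VM, a later upload that shares blocks with the deleted file is fast for the same reason.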


Nifty tool. Love it. Will use it a lot.

A few feature requests

  • Subfolders: I would like to organize the files by date and category.
  • Timers: I would like to time the uploads and downloads easily.
  • Profile my usage and suggest how long an end-to-end transfer will take
  • Speed up compressed file transfers: improve my effective transfer rate from ~60% to ~80% of the ISP’s rated speed. I would like to saturate the available bandwidth for uploads and downloads.

Thanks 🙂

Written by paule1s

September 13, 2009 at 5:42 pm

gzip vs dedup: I shrink, therefore I am


[reposted from]

I stole “I shrink, therefore I am” from my wife’s good friend Arun Verma, who is incredibly creative, and makes some of the best lamps ever. He also does websites and ads if you are interested.

I have a MacBook and use VMware Fusion to run a Windows XP VM. I keep all my data on a hosted folder on the Mac’s operating system, so the VM is basically programs and user settings. In addition, I have several images which I work with: Red Hat Enterprise, Ubuntu, Win 2K3, etc. Not atypical of someone who either develops or tinkers with technology.

My problem is that out of a 120GB hard disk, I am up to 100GB, and a whopping 60GB of that is virtual images. I have about eight of them. So I wanted to see if I could compress the virtual images in some fashion. I decided to run a small test of how much dedup would buy me over gzip.

w2k3.vhd: original size 1.6GB
w2k3.vhd.gz: 712MB

Further analysis of the image showed that there were:
14K zero-filled blocks, and
about 40K blocks that occurred more than once.

So the in-image dedup optimization is 14K + 40K blocks ~ 200MB.
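For readers who want to run the same kind of analysis, here is a minimal sketch that counts zero-filled and repeated 4K blocks in a byte string. It is not the tool used for the numbers above: the function name `analyze_blocks`, the use of SHA-256, and the choice to count every occurrence after the first as a redundant block are assumptions.

```python
import hashlib
from collections import Counter

BLOCK_SIZE = 4096  # blocks in this post are 4K

def analyze_blocks(data, block_size=BLOCK_SIZE):
    """Count zero-filled blocks and redundant copies of non-zero blocks."""
    zero_block = bytes(block_size)
    zero_blocks = 0
    counts = Counter()
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        if block == zero_block:
            zero_blocks += 1
        else:
            counts[hashlib.sha256(block).digest()] += 1
    # Every occurrence after the first is a block dedup would not need to store.
    redundant_blocks = sum(n - 1 for n in counts.values())
    return {
        "zero_blocks": zero_blocks,
        "redundant_blocks": redundant_blocks,
        "savings_bytes": (zero_blocks + redundant_blocks) * block_size,
    }
```

For a real image you would read the file in chunks rather than load it into memory in one go, but the counting logic stays the same.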
Next I added a Windows XP image:
wxp.vhd: 2GB
gzip wxp.vhd -> 921MB
23K zero blocks
43K additional blocks repeated between this and the previous image
Dedup optimization: 66K * 4K ~ 250MB

Clearly gzip would win over a simple dedup. Even with two images, XP and W2K3, I guess there are just not enough blocks to make dedup shine. Less than 10% of the blocks are being found. Cloning in some sense avoids large matches in a small set of images like on the desktop.

So the obvious next question was: how about dedup + gzip? Here things got a little more interesting:
gzip + dedup on w2k3.vhd: 720MB (yes, larger than just gzip)
gzip + dedup on wxp.vhd: 963MB (also larger than gzip)

I was not expecting it to be larger. The raw file is not, but if you add the metadata you have to keep for the blocks, it begins to add up. It’s close to gzip + metadata, which means that gzip does a pretty good job with zero-filled blocks and also with the repeated blocks.

PS: Blocks in this context are 4K.
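The effect of the metadata can be illustrated with a toy comparison. The sketch below is not the method used for the numbers above: it assumes the dedup step stores each unique block once and keeps one uncompressed 32-byte SHA-256 digest per block reference, which is only one of many possible metadata layouts. The function names are made up.

```python
import gzip
import hashlib

BLOCK_SIZE = 4096
DIGEST_SIZE = 32  # assumption: one SHA-256 digest of metadata per block reference

def gzip_size(data):
    """Size in bytes of the data after plain gzip compression."""
    return len(gzip.compress(data))

def dedup_then_gzip_size(data, block_size=BLOCK_SIZE):
    """Size of the gzipped unique blocks plus the per-block digest list."""
    seen = set()
    unique = bytearray()
    references = 0
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        digest = hashlib.sha256(block).digest()
        references += 1
        if digest not in seen:
            seen.add(digest)
            unique += block  # store each distinct block only once
    return len(gzip.compress(bytes(unique))) + references * DIGEST_SIZE
```

On data that gzip already compresses very well, such as long runs of zero-filled blocks, the digest list alone can outweigh whatever dedup saves, which matches the result above.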

Written by RS

September 10, 2009 at 10:39 pm

Virtual disk (VM) transfers in the cloud


VM Transfer workflow

There are two sets of use cases:

  1. Within a development team
  2. Within IT

Development teams:

Developers carry between one and three VM’s on their laptops. They often transfer them to other developers or QA engineers in their own team, or to other teams for integration testing.

IT (regular file transfer, no streaming):

IT receives a VM that is packaged and ready for deployment, either developed by an in-house or contracting application development team or bought from an external vendor.

The VM is transferred to a staging (pre-production) fileshare from which it can be loaded on to one or more test servers.

When the app within the VM passes acceptance tests, it is transferred to a production fileshare, from which it can be loaded on to one or more production servers.

The VM can also be transferred to archival storage.

Written by paule1s

September 9, 2009 at 9:57 pm

DropBox: Cloud service for storing, syncing, sharing files


I found Dropbox, a nifty service for storing files online, keeping their copies on several of your own computers in sync, or sharing some of them with your friends.

  • You download the Dropbox client (supported on Windows XP and Vista (32 and 64-bit), Mac OS X Tiger and Leopard, as well as Ubuntu 7.10+ and Fedora Core 9+)
  • 2GB of free storage is provided with it.
  • You can then drag and drop files that you want to store online or share into the Dropbox.
  • Dropbox maintains a snapshot of files.
  • If any of the files get updated, it sends only blocks that have changed
  • It also offers the ability to undelete and restore files from the copies that are stored online.
  • You can create Public folders for sharing; files in Public folders have URL’s that you can share with your friends.
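Sending only the blocks that have changed can be sketched by hashing both versions of a file block by block and comparing the digests. This is an illustration of the general idea under stated assumptions, not Dropbox's actual protocol: the fixed 4K block size, SHA-256, and the function names are mine.

```python
import hashlib

BLOCK_SIZE = 4096  # assumption: block size picked for illustration

def block_digests(data, block_size=BLOCK_SIZE):
    """Return one SHA-256 digest per fixed-size block of data."""
    return [hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)]

def changed_block_indexes(old, new, block_size=BLOCK_SIZE):
    """Indexes of blocks in new that differ from the same position in old."""
    old_digests = block_digests(old, block_size)
    new_digests = block_digests(new, block_size)
    return [i for i, digest in enumerate(new_digests)
            if i >= len(old_digests) or old_digests[i] != digest]
```

Only the blocks at the returned indexes would need to be sent. Comparing at fixed offsets is the simplest approach; an insertion near the start of a file shifts every later block, which is the case rolling-checksum schemes such as rsync's are designed to handle.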

While the company seems to be consumer-focused, the service is usable for dull and boring corporate stuff, like instantaneous automatic backups of files that change, and it also enables disaster recovery.

Someone has used Dropbox for syncing and sharing VM’s. This is an interesting use case; however, readers should pay heed to the transfer times as image sizes grow.

Written by paule1s

April 8, 2009 at 12:38 pm

Top 10 referrers for Q1 2009


Top 12 referrers over the past 3 months
