shareVM- Share insights about using VM's

Simplify the use of virtualization in everyday life

Posts Tagged ‘free up disk space’

Thin Provisioning – when to use, benefits and challenges

There are excellent posts by two prominent authors that provide a lot of insight into the nuances of using thick or thin provisioning for VM’s: Thin Provisioning Part 1 – The Basics and Thin Provisioning Part 2 – Going Beyond by Vaughn Stewart of NetApp and Thin on Thin – where should you do Thin Provisioning by Chad Sakac of EMC.

Synopsis:
Escalating storage costs are stalling the deployment of virtualized data centers, and it is becoming increasingly important for customers to leverage storage technology developed by VMware and its storage partners, NetApp and EMC, to reduce storage costs.

vmdk formats:

| vmdk formats | VMFS blocks pre-allocated | Disk array block pre-allocated | Disk array blocks pre-allocated |
|---|---|---|---|
| Thin | No | No | No |
| Thick (Non-zeroed) | Yes | No | No |
| Eager zeroed thick | Yes | Yes | Yes |

Recommendations:
Use Thin on Thin (Thin vmdk’s and Thin Provisioning on the storage array) for the best storage utilization because they allocate storage capacity from the datastore and storage array only on demand.

Stewart:

The Goal of Thin Provisioning is Datastore Oversubscription. The challenge is that the datastore, and all of its components (VMFS, LUNs, etc…), are static in terms of storage capacity. While the capacity of a datastore can be increased on the fly, this process is not automated or policy driven. Should an oversubscribed datastore encounter an out of space condition, all of the running VMs will become unavailable to the end user. In these scenarios the VMs don’t ‘crash’, they ‘pause’; however, applications running inside of VMs may fail if the out of space condition isn’t addressed in a relatively short period of time. For example, Oracle databases will remain active for 180 seconds; after that time has elapsed the database will fail.

Sakac:

If you DO use Thin on Thin, use VMware or 3rd party usage reports in conjunction with array-level reports, and set thresholds with notification and automated action on both the VMware layer and the array level (if your array supports that). Why? Thin provisioning needs to be carefully managed for “out of space” conditions, since you are oversubscribing an asset which has no backdoor (unlike how VMware oversubscribes guest memory, which can use VM swap if needed). When you use Thin on Thin, this can be very efficient, but can “accelerate” the transition to oversubscription.
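
The threshold idea Sakac describes can be sketched in a few lines of Python. This is only an illustration, not VMware or array tooling: the `Datastore` record, the `check_datastore` function and the 75% / 90% thresholds are all names and numbers I made up for the example. In practice the figures would come from your VMware and array-level usage reports.

```python
from collections import namedtuple

# Hypothetical record; real figures would come from VMware / array usage reports.
Datastore = namedtuple("Datastore", "name capacity_gb used_gb provisioned_gb")


def check_datastore(ds, warn_pct=75.0, critical_pct=90.0):
    """Return (oversubscription_ratio, used_pct, status) for one datastore.

    warn_pct and critical_pct are illustrative thresholds, not vendor defaults.
    """
    # How much capacity has been promised to VMs relative to what exists.
    ratio = ds.provisioned_gb / float(ds.capacity_gb)
    # How much of the real capacity has actually been consumed.
    used_pct = 100.0 * ds.used_gb / ds.capacity_gb
    if used_pct >= critical_pct:
        status = "CRITICAL"
    elif used_pct >= warn_pct:
        status = "WARNING"
    else:
        status = "OK"
    return ratio, used_pct, status


if __name__ == "__main__":
    ds = Datastore("datastore1", capacity_gb=500, used_gb=410, provisioned_gb=1200)
    ratio, used_pct, status = check_datastore(ds)
    print("%s: %.1fx oversubscribed, %.0f%% used -> %s"
          % (ds.name, ratio, used_pct, status))
```

The same check would have to run against both layers (datastore and array), since with Thin on Thin either one can run out of space first.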

Sakac:

The eagerzeroedthick virtual disk format is required for VMware Fault Tolerant VMs on VMFS (if they are thin, conversion occurs automatically as the VMware Fault Tolerant feature is enabled). It continues to also be mandatory for Microsoft clusters (refer to KB article) and recommended in the highest I/O workload Virtual Machines, where the slight latency and additional I/O created by the “zeroing” that occurs as part and parcel of virtual machine I/O to new blocks is unacceptable.

vmdk growth:

Stewart:

The VMDK grew beyond the capacity of the data which it is storing. The reason for this phenomenon is that deleted data is still stored in the guest OS (GOS) file system. When data is deleted, the actual process merely removes the content from the active file system table and marks the blocks as available to be overwritten. The data still resides in the file system and thus in the virtual disk. This is why you can purchase undelete tools like WinUndelete.

Don’t run defrag within a thin provisioned VM

Stewart:

The defragmentation process results in the rewriting of all of the data within a VMDK. This operation can cause a considerable expansion in the size of the virtual disk, costing you your storage savings.

How to recover storage

Stewart:

First is to zero out the ‘free’ blocks within the GOS file system. This can be accomplished by using the ‘shrink disk’ feature within VMTools or with tools like sdelete from Microsoft. The second half, or phase in this process, is to use Storage VMotion to migrate the VMDK to a new datastore.

You should note that this process is manual; however, Mike Laverick has posted the following guide which includes how to automate some of the components in this process. Duncan Epping has also covered automating parts of this process.
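
To make the first phase concrete, here is a minimal Python sketch of what “zeroing out free blocks” means from inside the guest: write a file full of zeros into the free space, flush it to disk, and delete it again. It only illustrates the idea and is not a replacement for the shrink disk feature or sdelete. The function name `zero_fill` and the `max_bytes` cap are my own assumptions; the cap is there so the sketch can never fill a disk by accident.

```python
import os


def zero_fill(directory, max_bytes, chunk_size=1024 * 1024):
    """Write up to max_bytes of zeros into a scratch file in directory,
    then delete the file. Returns the number of bytes that were zeroed.

    Hypothetical helper: max_bytes is a safety cap so that this sketch
    never fills the file system completely.
    """
    scratch = os.path.join(directory, "zero_fill.tmp")
    chunk = b"\0" * chunk_size
    written = 0
    try:
        with open(scratch, "wb") as f:
            while written < max_bytes:
                n = min(chunk_size, max_bytes - written)
                f.write(chunk[:n])
                written += n
            f.flush()
            os.fsync(f.fileno())  # push the zeros down to the virtual disk
    finally:
        # Always remove the scratch file, even if a write failed.
        if os.path.exists(scratch):
            os.remove(scratch)
    return written
```

For example, `zero_fill("/tmp", 64 * 1024 * 1024)` zeroes 64 MB of free space. Keep in mind that on a thin disk the zeros themselves are writes, so the VMDK may grow before the second phase reclaims the space.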

NetApp features for virtualization storage savings

The feature set that gives customers storage savings is described in an informative 42 minute video, Hyper-V and NetApp storage – Overview. I have summarized it in the 5 minute post below.

Enterprise System Storage Portfolio

The Enterprise product portfolio consists of the FAS series and V-Series storage systems. These systems have a unified storage architecture based on the Data ONTAP OS, which runs across all storage arrays. Data ONTAP provides a single application interface and supports protocols such as FC-SAN, FCoE-SAN, IP-SAN (iSCSI) and NAS (NFS, CIFS). The V-Series controllers also offer multiple vendor array support, i.e., they can offer the same features on disk arrays manufactured by NetApp’s competitors.

Features

  • Block-level de-duplication, or de-dupe, retains exactly one instance of each unique disk block. When applied to live production systems, it can reduce data by 95% for full backups, especially when there are identical VM images created from the same template, and by as much as 25%-55% for most data sets.
  • Snapshot copies of a VM are lightweight because they share the same disk blocks with the parent and do not require as much space for the copy as the parent. If a disk block is updated in a snapshot copy, e.g., if a configuration parameter is customized for an application, or when a patch is applied, the Write Anywhere File Layout (WAFL) file system associates the updated block with the snapshot copy and writes it to the disk, leaving the original block and its referrers intact. Snapshot copies therefore impose negligible storage performance impact on running VM’s.
  • Thin provisioning allows users to define storage pools (FlexVol) for which storage allocation is done dynamically from the storage array on demand. FlexVol can be enabled at any point in time while the storage system is in operation.
  • Thin replication between disks provides data protection. Differential backups and mirroring over the IP network work at the block level, copying only the changed blocks; compressed blocks are sent over the wire. It enables virtual restores of full, point-in-time data at granular levels.
  • Double parity RAID, called RAID-DP, provides superior fault tolerance and a 46% saving vs mirrored data or RAID 10. You can think of it as RAID 6 (RAID 5 plus one additional parity disk). RAID-DP can lose any two disks in the RAID stripe without losing any data. It offers availability equivalent to RAID 1 and allows lower cost / higher capacity SATA disks to be used for applications. The industry standard best practice is to use RAID 1 for important data and RAID 5 for other data.
  • Virtual clones (FlexClone). You can clone a volume, a LUN or individual files. Savings = size of the original data set minus blocks subsequently changed in the clone. This eases dev and test cycles. Typical use cases: building a tree of clones (clone of clones), cloning a sysprep’ed vhd, DR testing, VDI.
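
To get a feel for why de-dupe does so well on VM images built from the same template, here is a small Python sketch of block-level de-duplication in general. It is not NetApp’s implementation: the function name `dedupe_savings`, the 4096-byte block size and the use of SHA-256 are all assumptions I picked for the example.

```python
import hashlib


def dedupe_savings(images, block_size=4096):
    """Return (total_blocks, unique_blocks, percent_saved) for a list of
    byte strings, keeping one instance of each unique fixed-size block.

    block_size=4096 is an assumption made for this sketch.
    """
    seen = set()
    total = 0
    for data in images:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            # Identical blocks hash to the same digest and are stored once.
            seen.add(hashlib.sha256(block).digest())
            total += 1
    unique = len(seen)
    saved = 100.0 * (total - unique) / total if total else 0.0
    return total, unique, saved


if __name__ == "__main__":
    # Two "VM images" cloned from the same template; the second has one
    # block changed, e.g. by a patch or a configuration tweak.
    template = b"".join(bytes(bytearray([i])) * 4096 for i in range(10))
    clone = template[:4096 * 9] + b"\xff" * 4096
    print(dedupe_savings([template, clone]))
```

The more images share a template, the closer the saving gets to the size of the template itself, which is the same arithmetic as the FlexClone saving described in the last bullet.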

There are several other videos on the same site that show the setup for the storage arrays. They are worth seeing to get an idea of what is involved to get all the machinery working in order to leverage the above features. It involves many steps and seems quite complex. (The hallmark of an “Enterprise-class” product? 😉 ) The SE’s have done a great job of making it seem simple. Hats off to them!

NetApp promises to reduce your virtualization storage needs by 50%

50% Storage Savings Guarantee

NetApp’s Virtualization Guarantee Program promises that you will install 50% less storage for virtualization than if you buy from their competition, when you

  • Engage them for planning your virtualization storage needs
  • Implement best practices recommended by them
  • Leverage features like de-duplication, thin provisioning, RAID-DP (double parity RAID) and NetApp Snapshot copies

If you don’t use 50% less storage, you can get the required additional capacity at no additional cost.

I learned about this in an informative 42 minute video, Hyper-V and NetApp storage – Overview.

Written by paule1s

November 29, 2009 at 11:20 am

Find VM’s older than N days to free up disk space

I wrote a Python 2.6 script to find and list VM’s older than 90 days on my Windows workstation, so that I could compress them, move them to a 1TB drive attached to my machine, or to a file server, or delete them.

find_old_vms is a tool to find and list old VM’s (vmdk’s, vhd’s) on your hard drives that are older than a given number of days.

Usage: find_old_vms in_this_directory_tree older_than_days

Example: find_old_vms “c:\\” 90

Download for Windows XP, 2003, and Linux. The script uses atime (latest file access time), which Windows Vista and Windows 7 do not update by default.

Code:

#
# NOTE: This script uses atime (the last access time) to decide whether a VM
# ****  file is a candidate. On Windows XP, atime is updated about once an hour,
#       whereas Windows Vista and Windows 7 do not update atime by default.

from __future__ import print_function  # lets the script run on Python 2.6+ and 3

import fnmatch
import os
import sys
import time

USAGE = """
find_old_vms in_this_directory_tree older_than_days
Find all files with the suffix .v[mh]d* that have not been accessed in the
last older_than_days days, through a recursive descent starting from the
root in_this_directory_tree"""


# dtroot is the pathname for a node in a directory tree
# age is the number of days for which a file has not been accessed
# size in bytes is the maximum size of a file (not used yet)
def scan(dtroot, age, size=0):
    """Scan the tree under dtroot for virtual disk images that have not been
    accessed for more than age days. Print each one and return the count.
    """
    filecount = 0
    wctime = time.time()  # get current time

    for root, subdirs, files in os.walk(dtroot):
        # Build a list of filenames that have a suffix vmd* or vhd*
        for name in fnmatch.filter(files, "*.v[mh]d*"):
            f = os.path.join(root, name)
            atime = os.path.getatime(f)
            elapsed_time = (wctime - atime) / (60 * 60 * 24)
            if elapsed_time > age:
                filecount = filecount + 1
                print(f, "last accessed", int(elapsed_time), "days ago")
    return filecount


if __name__ == "__main__":

    # User asked for help, or did not supply both arguments
    if len(sys.argv) != 3 or sys.argv[1] == '?':
        print(USAGE)
        sys.exit()

    # Validate that the first argument in_this_directory_tree is a valid directory
    if not os.path.isdir(sys.argv[1]):
        print(sys.argv[1], "is not a valid directory. Please provide another")
        sys.exit(1)

    # Arbitrarily limit search to 10 years
    try:
        age = int(sys.argv[2])
    except ValueError:
        age = 0  # not a number; rejected by the range check below
    if age < 1 or age > 3650:
        print(sys.argv[2], "is invalid. Please provide between 1 and 3650 days")
        sys.exit(1)

    scan(sys.argv[1], age)

If you remove the restriction of searching for v[mh]d* files, it will help you find other older files as well. I will appreciate your feedback.
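
One way to lift that restriction is to pass the file name pattern in as a parameter. The sketch below is a variation on the idea, not the downloadable script: the function name `find_old_files` and its `pattern` and `now` parameters are my own additions for illustration, and it has the same dependence on atime as the original.

```python
import fnmatch
import os
import time


def find_old_files(dtroot, age_days, pattern="*", now=None):
    """Yield (path, days_since_access) for files under dtroot whose name
    matches pattern and whose atime is more than age_days days old.
    """
    if now is None:
        now = time.time()
    for root, subdirs, files in os.walk(dtroot):
        for name in fnmatch.filter(files, pattern):
            path = os.path.join(root, name)
            try:
                atime = os.path.getatime(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            days = (now - atime) / (60 * 60 * 24)
            if days > age_days:
                yield path, int(days)
```

For example, `find_old_files("c:\\", 90, "*.iso")` would list ISO images that have not been touched for 90 days, and leaving the pattern at its default `"*"` lists every old file.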

Written by paule1s

November 23, 2009 at 12:35 am

Vizioncore vOptimizer Pro to shrink or enlarge VM’s

You can download vOptimizer WasteFinder – 2.2 free from Vizioncore to analyze your VMware vCenter Server / ESX hosts and report how much free disk space can be recovered from over-allocated virtual storage. It also locates all VM’s that are not properly aligned on Windows 64K partition boundaries and as a result experience decreased I/O throughput and higher latency.

In case you wish to reclaim the free disk space and align VM’s on Windows 64K partition boundaries, you will have to purchase vOptimizer Pro 2.2.248.0 (free 14 day trial).

vOptimizer Pro can not only shrink VM’s to reclaim the over-allocated virtual storage; perhaps more importantly, it can also enlarge VM’s that are running out of storage, effectively preventing painful and costly VM outages.

I just wonder, wouldn’t it be better to simply offer a 14 day free trial that lets users test drive vOptimizer and so induces them to buy it?

Written by paule1s

November 17, 2009 at 11:16 pm

A year in review: What are our readers looking for?

Our readers are primarily asking questions like:

  • How can I free up disk space, on Windows, and on ext4, ext3 on Ubuntu and Linux, within virtual disks like vmdk, vhd and vdi?
  • Where can I find the best virtual appliances/ Top 10 virtual appliances?
  • How can I convert from one virtual disk (vmdk to vhd, or vdi to vhd) to another?
  • Who are the competitors for ec2?

An analysis of the search terms shows interesting clusters:

| Serial | Topic | % of queries | Search terms |
|---|---|---|---|
| 1 | ext4 defragmentation | 23% | ext4 defrag, defrag ext4, ext4 defragment, defragment ext4 |
| 2 | ubuntu ext4 defragmentation | 14% | ext4 defrag ubuntu, ext4 ubuntu defrag, ubuntu ext4 defrag, ubuntu defrag ext4, defrag ext4 ubuntu, defrag ubuntu ext4 |
| 3 | vmware virtual appliance | 14% | vmware virtual appliance, vmware virtual appliances, top vmware appliances, top 10 vmware appliances, best vmware appliances |
| 4 | virtual appliance | 5% | virtual appliance, virtual appliances, top appliances, top 10 appliances, best appliances |
| 5 | vmware firewall appliance | 5% | vmware firewall appliance, vmware appliance firewall |
| 6 | ubuntu defragmentation | 4% | defrag ubuntu, ubuntu defrag, defragment ubuntu, ubuntu defragment |
| 7 | ec2 competitors | 4% | amazon ec2 competitors, ec2 competitors |
| 8 | windows 7 virtual appliance | 4% | windows 7 virtual appliance, virtual applaince windows 7 |
| 9 | ext3 defragmentation | 4% | ext3 defrag, defrag ext3, ext3 defragment, defragment ext3 |
| 10 | convert vdi to vhd | 3% | convert vdi to vhd, vdi to vhd |

If I abstract it out, our readers are primarily interested in learning how to free disk storage and where to find the best / Top 10 VMware, Xen and Windows virtual appliances.

Thank you. I appreciate your interest in this blog.

Compressed VM file transfer using DropBox

I am using DropBox for transferring compressed files, including VM’s, between my environment at home, a Mac running Windows XP SP3 in VMware Fusion 2.0.5, and the test machine, a Windows XP SP3 system located in the office lab. Each machine has a DropBox folder linked to the same account.

Neat product!

I love the simplicity and ease of use. A lot of thought has gone into making the product easy to install, the integration with the host OS (Windows and Mac) is seamless and sets a benchmark for how UI’s for downloadable products should be designed.

Usage model

I compress each file using the Mac’s native file compression and drop it into my DropBox folder. DropBox seems to follow a two-step file transfer process:

  1. It first uploads the file completely from the source DropBox folder to the DropBox folder in the cloud
  2. After the upload is complete, the file is then downloaded from the DropBox folder in the cloud to the destination DropBox folders.

Setup

Speed ratings are from here. I have been able to correlate these speeds with the end-to-end transfer times.

| Transfer Type | Speed Rating for my ISP | Observed DropBox Transfer Rate |
|---|---|---|
| Upload | 120 KB/sec | 70 KB/sec |
| Download | 360 KB/sec | 210 KB/sec |

Near real-time transfer for uncompressed files

DropBox transfers uncompressed files almost instantaneously between the two machines. The files are transferred sequentially and seem to arrive in order. For example,  I transferred a 1.72 GB folder containing 400 photographs and the photos started appearing sequentially 10 – 15 seconds apart.

Compressed files

Compressed files are transferred as a unit, although dedup applies to blocks contained within it. The transfer times are as recorded below:

| Original Size | Compressed Size | Upload Time | Download Time | Total Time |
|---|---|---|---|---|
| 4.30 GB | 1.6800 GB | 6h 40m | 2h 12m | 8h 52m |
| 2.15 GB | 0.6714 GB | 2h 27m | 0h 48m | 3h 15m |
| 1.10 GB | 0.2371 GB | 0h 56m | 0h 18m | 1h 14m |
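
The recorded times can be roughly reproduced from the compressed sizes and the observed transfer rates. The sketch below shows the arithmetic; the function names `transfer_seconds` and `hm` are made up for the example, and it assumes decimal units (1 GB = 1,000,000 KB), which is my assumption and not something the tables state.

```python
def transfer_seconds(size_gb, rate_kb_per_sec):
    """Estimated time in seconds to move size_gb at rate_kb_per_sec.

    Assumes decimal units (1 GB = 1,000,000 KB); that is an assumption
    of this sketch, not something stated in the measurements.
    """
    return size_gb * 1000 * 1000 / float(rate_kb_per_sec)


def hm(seconds):
    """Format a duration in seconds as 'Xh YYm', rounded to the minute."""
    minutes = int(round(seconds / 60.0))
    return "%dh %02dm" % (minutes // 60, minutes % 60)


if __name__ == "__main__":
    upload_rate, download_rate = 70, 210  # observed KB/sec from the Setup section
    for size_gb in (1.68, 0.6714, 0.2371):
        up = transfer_seconds(size_gb, upload_rate)
        down = transfer_seconds(size_gb, download_rate)
        print("%.4f GB: upload %s, download %s, total %s"
              % (size_gb, hm(up), hm(down), hm(up + down)))
```

Under that assumption the 1.68 GB upload comes out at 6h 40m, the same as the recorded time; the other estimates are close to the recorded times without matching them exactly.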

Dedup works well with compressed files

DropBox examines the file to be transferred and builds an index of blocks to be transferred. Its de-duplication technology is smart enough to figure out when not to transfer blocks that are duplicates, i.e., have already been transferred before. For example, when I tried to transfer two clones, the first one took a long time to transfer (a few hours), but the second transfer was very rapid (under five minutes).

Since I am using the free account, I deleted a 2GB VM from my DropBox folder in order to begin my next transfer. I was pleasantly surprised to see that the next VM transfer was very rapid. I suspect this was because the VM that was transferred earlier was still residing in DropBox’s cache even though I had deleted it, so that DropBox discovered common/duplicate blocks and did not upload them from my Mac.
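
Here is a toy Python model of the behaviour I think I am seeing. It is a guess, not DropBox’s actual protocol: the `BlockIndex` class, its 4 MB default block size and the use of SHA-256 are all assumptions. The point is that the index of known blocks lives on the server side and is not purged when a file is deleted from the folder.

```python
import hashlib


class BlockIndex(object):
    """Toy model of a server-side index of block hashes.

    Hypothetical: a guess at the behaviour described above, not DropBox's
    real protocol. Deleting a file does not remove its blocks from the index.
    """

    def __init__(self, block_size=4 * 1024 * 1024):
        self.block_size = block_size
        self.known = set()

    def upload(self, data):
        """Upload data; return (blocks_sent, blocks_skipped)."""
        sent = skipped = 0
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            digest = hashlib.sha256(block).digest()
            if digest in self.known:
                skipped += 1  # the server already has this block
            else:
                self.known.add(digest)
                sent += 1
        return sent, skipped


if __name__ == "__main__":
    index = BlockIndex(block_size=4)
    print(index.upload(b"aaaabbbbcccc"))  # first VM: every block is new
    print(index.upload(b"aaaabbbbdddd"))  # its clone: only the changed block is sent
```

In this model the second upload only sends the one block that differs, which would explain why a clone, or a VM similar to one I had already deleted, transfers in minutes rather than hours.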

Summary

Nifty tool. Love it. Will use it a lot.

A few feature requests

  • Subfolders: I would like to organize the files by date and category.
  • Timers: I would like to time the uploads and downloads easily.
  • Profile my usage and suggest how long an end-to-end transfer will take
  • Speed up compressed file transfers: improve my effective transfer rate from ~60% to ~80%. I would like to saturate the available bandwidth for uploads and downloads.

Thanks 🙂

Written by paule1s

September 13, 2009 at 5:42 pm