Defragmentation is a process that reduces the amount of fragmentation in file systems. It does this by physically organizing the contents of the disk so that the pieces of each file are stored close together and contiguously. It also attempts to create larger regions of free space through compaction, to impede the return of fragmentation.
Generically, the defragmentation of a Windows guest within a virtual disk running on a Windows host (Windows on Windows) requires a three-step process:
- Defragment the guest
- Defragment the virtual disk
- Defragment the host
On a Linux host or guest, the ext3 and ext4 file systems are more resilient to fragmentation.
Windows on Windows
You should perform the following steps whether you are using a Microsoft VHD, VirtualBox VDI, or VMware VMDK virtual disk:
- On a Windows guest OS, run the Windows Disk Defragmenter to defragment the files within the volumes stored inside the virtual disk.
- Next, power down the virtual machine and defragment the virtual disk using a tool such as Sysinternals contig. Defragmenting the virtual disk simply reorganizes the blocks so that used blocks move towards lower-numbered sectors and unused blocks move towards higher-numbered sectors.
- Run the Windows Disk Defragmenter to achieve an overall defragmentation of all files on the host including the virtual disk.
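The three steps above can be sketched as a small dry-run plan. This is only an illustration: `defrag` is the built-in Windows tool, `contig` is the tool mentioned above, and the drive letters and virtual-disk path are made up. The commands are collected and printed, never executed.

```python
# Sketch of the three-step Windows-on-Windows defragmentation, expressed as
# the command each step would run. Tool names and paths are examples only.

def wow_defrag_plan(guest_volume, virtual_disk, host_volume):
    """Return the command line for each of the three steps (dry run)."""
    return [
        # Step 1: inside the powered-on guest, defragment the guest volume.
        ["defrag", guest_volume],      # run in the guest OS
        # Step 2: with the VM powered off, defragment the virtual disk file.
        ["contig", virtual_disk],      # run on the host
        # Step 3: defragment the host volume that holds the virtual disk.
        ["defrag", host_volume],       # run on the host
    ]

plan = wow_defrag_plan("C:", r"D:\vms\winxp.vhd", "D:")
for step, cmd in enumerate(plan, 1):
    print("step %d: %s" % (step, " ".join(cmd)))
```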
VMware VMDK specific
The following steps can be used generically for VMware VMDKs, for Windows on Windows or any other supported platform. vmware-vdiskmanager is a standalone tool for defragmenting a growable VMware Workstation, VMware Fusion, or VMware Server VMDK while it is offline. Note that you cannot defragment:
- Preallocated virtual disks
- Physical hard drives
- Virtual disks that are associated with snapshots.
The recommended steps for defragmenting a vmdk are:
- On a Windows guest OS, run the Windows Disk Defragmenter to defragment the files within the volumes stored inside the VMDK.
- Next, power down the virtual machine and defragment the VMDK using the command vmware-vdiskmanager -d myVirtualDisk.vmdk. Defragmenting the VMDK simply reorganizes the blocks so that used blocks move towards lower-numbered sectors and unused blocks move towards higher-numbered sectors.
- If the host OS is also Windows, run the Windows Disk Defragmenter to achieve an overall defragmentation of all files on the host including the VMDK.
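Because vmware-vdiskmanager refuses preallocated disks and disks with snapshots, a pre-flight check can save a failed run. Below is a minimal sketch that inspects the plain-text descriptor of a hosted VMDK for its createType; the descriptor snippets and the helper function are illustrative, not part of any VMware API, and snapshot detection (delta links) is omitted for brevity.

```python
# Pre-flight check before running `vmware-vdiskmanager -d`: only growable
# (sparse) hosted disks can be defragmented. We look at the createType line
# in the plain-text VMDK descriptor. The descriptors below are trimmed,
# invented examples.

GROWABLE_TYPES = {"monolithicSparse", "twoGbMaxExtentSparse"}

def can_defragment(descriptor_text):
    """Return True if the descriptor describes a growable (sparse) disk."""
    for line in descriptor_text.splitlines():
        line = line.strip()
        if line.startswith("createType="):
            create_type = line.split("=", 1)[1].strip().strip('"')
            return create_type in GROWABLE_TYPES
    return False  # no createType found: assume it cannot be defragmented

sparse = 'version=1\ncreateType="twoGbMaxExtentSparse"\n'
flat = 'version=1\ncreateType="monolithicFlat"\n'
print(can_defragment(sparse))  # growable: OK to defragment
print(can_defragment(flat))    # preallocated: the tool will refuse it
```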
The Windows defragmentation tool, or a commercial alternative, needs 5-15% of free disk space to be effective. It may need more if you have some very large files (such as video or database files). Below is the layout of the C: drive of my virtual machine. The red segments you see are the fragmented files.
If a file has one large fragment, then for the defragmentation to be effective the tool has to move this fragment to a free area and copy the rest of the fragments next to it to make the file contiguous. If there is no place to copy the large extent of a file, it won't get defragmented.
The best way to defragment is to take an empty disk and copy all the files onto it. So the more free disk space you have, the better these tools will perform.
How you think about defragmentation in a virtual disk is also very different from how you think about defragmentation in the physical world. Take the disk above: it is a virtual disk, 2GB Max Extent Sparse.
The disk was full, so I extended it (with fatVM) and then defragmented one file (you can do that with Mark Russinovich’s Contig tool, http://technet.microsoft.com/en-us/sysinternals/bb897428.aspx). You can see that the files are contiguous (blue) in the extended portion. The original disk clearly requires defragmentation, but without extending it, we would not have been able to make the key database file contiguous.
This raises the question of whether you really need to defragment the virtual disk the traditional way. It is much faster to extend the disk and/or attach a separate disk, simply copy over all the files, and replace the original disk with the new, extended one.
Besides being much faster than defragmenting, this can also improve the performance of the virtual machine considerably. You can additionally take the files that are static (don’t change) and make the new base disk for the C: drive a flat file instead of a sparse disk, since a sparse disk is not really saving you anything once it gets full. If you have a flat parent and a sparse child, you get the best of both worlds.
In my limited experience, instead of defragmenting, do the following:
- create a new flat disk, copy all the files from C: to the new disk
- make the new disk your c: drive
- create a clone of the base disk (which by definition is sparse)
- extend the sparse disk
Your virtual machine’s performance will be significantly improved.
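The recipe above can be sketched as vmware-vdiskmanager invocations. The disk names, sizes, and adapter type are hypothetical, and the commands are only printed (a dry run): -c/-s/-a/-t create a disk (-t 2 is a preallocated, single-file disk) and -x extends one. The file-copy and clone steps happen outside this tool.

```python
# Dry-run sketch of the flat-disk recipe, as vmware-vdiskmanager command
# lines. Names and sizes are invented; nothing is executed here.

steps = [
    # 1. Create a new flat (preallocated, single-file) disk for the
    #    static files; copy C: onto it in the guest and make it C:.
    "vmware-vdiskmanager -c -s 40GB -a lsilogic -t 2 newflat.vmdk",
    # 2. Clone the base disk as a sparse child (done with your VM tooling),
    #    then extend the sparse child disk.
    "vmware-vdiskmanager -x 60GB child-sparse.vmdk",
]
for cmd in steps:
    print(cmd)
```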
A real-life experience posted by a member in the VMware vCenter Server Communities yesterday (Feb 8, 2010):
The solution recommended by an expert is:
While this recommendation is consistent with the perceived state of the art, it does have the following impact:
It is not going to affect the running VMs or ESX, but you/VSC may see a disconnect for a while.
Another member recommends a different approach
A different approach would be to extend the C: drive.
We have recently released a tool (fatVM) to make this easy (or easier).
It creates the extended VM in a new directory (with the original as parent), does not touch the original files, and is able to extend most VMs in a couple of minutes.
Here is the link: http://www.gudgud.com/fatvm
A third member is contemplating a similar move:
I have a 4-host ESX 3.5U4 system. My vCenter is pointing to an external SQL server. I am about to upgrade to vSphere and want to have SQL running on the vCenter server itself – most likely using SQL Express. I have the same concern about space.
You must have noticed the pattern that is emerging. Your C: drive can get full when you are running a database system or a log aggregation server within a VM that has a pre-allocated disk and the size of the data is growing. As a best practice, review your apps for potential data growth before pre-allocating the size of the VM’s disk.
When you analyze the vast majority of application I/O profiles, you’ll realize that a small amount of data is responsible for the majority of I/Os, while almost all the rest is infrequently accessed.
Watch how the data is being accessed, and dynamically cache the most popular/frequently accessed data (usually a small amount) on flash drives, leaving the vast majority of infrequently accessed data on big, slow SATA drives.
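That placement policy can be sketched in a few lines of Python: count accesses per block and pin the most frequently accessed blocks to a (simulated) flash tier. The access trace and the flash capacity below are invented for illustration.

```python
from collections import Counter

# Toy model of frequency-based tiering: a small, hot subset of blocks gets
# most of the I/Os, so caching just those on flash serves most requests.

trace = [1, 1, 2, 1, 3, 1, 2, 1, 4, 1, 2, 5, 1, 2, 1, 6, 1, 2, 1, 7]
flash_slots = 2  # how many blocks fit on the fast tier

counts = Counter(trace)
hot = {blk for blk, _ in counts.most_common(flash_slots)}

served_from_flash = sum(1 for blk in trace if blk in hot)
print("flash holds blocks:", sorted(hot))
print("fraction of I/Os served from flash: %.0f%%"
      % (100.0 * served_from_flash / len(trace)))   # 75% for this trace
```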
The storage savings solution
| Strategy | What it does |
|----------|--------------|
| FAST | Place the right information on the right media based on frequency of access. |
| Thin | Thin (virtual) provisioning allocates physical storage when it is actually used, rather than when it is provisioned. |
| Small | Compression, single-instancing, and data-deduplication technologies eliminate information redundancies. |
| Green | A significant amount of enterprise information is used *very* infrequently; so infrequently, in fact, that the disk drives can be spun down, or at least made semi-idle. |
| Gone | Policy-based lifecycle management: archiving and deletion, and federation to the cloud through private and public cloud integration. The information can be shopped out to a specialized service provider as an option. |
… and life goes on!
One thing hasn’t changed, though: the information beast continues to grow.
The feature set that gives customers storage savings is described in a 42-minute informative video, Hyper-V and NetApp Storage – Overview. I have summarized it in the 5-minute post below.
Enterprise System Storage Portfolio
The Enterprise product portfolio consists of the FAS series and V-Series storage systems. These systems have a unified storage architecture based on the Data ONTAP OS running across all storage arrays. Data ONTAP provides a single administrative interface and supports protocols such as FC SAN, FCoE SAN, IP SAN (iSCSI), and NAS (NFS, CIFS). The V-Series controllers also offer multi-vendor array support, i.e., they can offer the same features on disk arrays manufactured by NetApp’s competitors.
- Block-level de-duplication (de-dupe) retains exactly one instance of each unique disk block. When applied to live production systems, it can reduce data by up to 95% for full backups (especially when there are identical VM images created from the same template), and by 25%-55% for most data sets.
- Snapshot copies of a VM are lightweight because they share disk blocks with the parent and do not require as much space as the parent. If a disk block is updated while a snapshot exists, e.g., when a configuration parameter is customized for an application or a patch is applied, the Write Anywhere File Layout (WAFL) file system writes the updated block to a new location on disk, leaving the original block and its referrers intact. Snapshot copies therefore impose negligible storage performance impact on running VMs.
- Thin provisioning allows users to define storage pools (FlexVol volumes) for which storage is allocated dynamically from the storage array on demand. FlexVol can be enabled at any time while the storage system is in operation.
- Thin replication between disks provides data protection. Differential backups and mirroring over the IP network work at the block level, copying only the changed blocks; compressed blocks are sent over the wire. This enables virtual restores of full, point-in-time data at granular levels.
- Double-parity RAID, called RAID-DP, provides superior fault tolerance and a 46% saving vs. mirrored data (RAID 10). You can think of it as RAID 6 (RAID 5 plus a second parity disk). RAID-DP can lose any two disks in the RAID stripe without losing any data. It offers availability equivalent to RAID 1 and allows lower-cost/higher-capacity SATA disks to be used for applications. The industry-standard best practice is RAID 1 for important data and RAID 5 for other data.
- Virtual clones (FlexClone). You can clone a volume, a LUN, or individual files. Savings = size of the original data set minus the blocks subsequently changed in the clone. This eases dev and test cycles. Typical use cases: building a tree of clones (clones of clones), cloning a sysprep‘ed VHD, DR testing, and VDI.
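The effect of block-level de-duplication can be sketched with a toy model: hash each fixed-size block and keep one physical copy per unique hash. The 4-byte block size and sample data are invented; real arrays use 4KB blocks and stronger collision handling.

```python
import hashlib

# Toy block-level de-duplication: store exactly one instance of each
# unique "block", keyed by its hash; files become lists of references.

BLOCK = 4  # bytes per block (illustrative; real arrays use e.g. 4KB)

def dedupe(data):
    """Return (unique_block_store, reference_list) for a byte string."""
    store = {}
    refs = []
    for i in range(0, len(data), BLOCK):
        chunk = data[i:i + BLOCK]
        key = hashlib.sha256(chunk).hexdigest()
        store.setdefault(key, chunk)  # keep exactly one instance
        refs.append(key)
    return store, refs

# Two "VM images" cloned from the same template share most blocks.
data = b"AAAABBBBAAAABBBBCCCC"
store, refs = dedupe(data)
print("logical blocks:", len(refs))    # 5
print("physical blocks:", len(store))  # 3
print("savings: %.0f%%" % (100.0 * (1 - len(store) / float(len(refs)))))
```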
There are several other videos on the same site that show the setup of the storage arrays. They are worth watching to get an idea of what is involved in getting all the machinery working in order to leverage the above features. It involves many steps and seems quite complex. (The hallmark of an “Enterprise-class” product? 😉 ) The SEs have done a great job of making it seem simple. Hats off to them!
50% Storage Savings Guarantee
- Engage them for planning your virtualization storage need
- Implement best practices recommended by them
- Leverage features like de-duplication, thin provisioning, RAID-DP (double-parity RAID), and NetApp Snapshot copies
If you don’t use 50% less storage, you can get the required additional capacity at no additional cost.
I learned about this in a 42-minute informative video, Hyper-V and NetApp Storage – Overview.
I wrote a Python 2.6 script to find and list VMs older than 90 days on my Windows workstation, so that I could compress them, move them to a 1TB drive attached to my machine or to a file server, or delete them.
find_old_vms is a tool to find and list old VMs (VMDKs, VHDs) on your hard drives that are older than a given number of days.
Usage: find_old_vms in_this_directory_tree older_than_days
Example: find_old_vms “c:\\” 90
Download for Windows XP, 2003, and Linux. The script uses atime (the last file access time), which Windows Vista and Windows 7 do not update by default.
```python
#
# NOTE: This script uses atime - the last access time - for deciding whether
# a VM file is a candidate. On Windows XP, atime is updated every hour,
# whereas Windows Vista and Windows 7 do not provide an atime by default.
#
import os, sys, glob, time

# dtroot is the pathname for a node in a directory tree
# age is the number of days for which a file has not been accessed
# size in bytes is the maximum size of a file (unused for now)
def scan(dtroot, age, size):
    """scan <dir> scans the <dir> on host for virtual images"""
    filecount = 0
    wctime = time.time()  # get current time
    for root, subdirs, files in os.walk(dtroot):
        # Build a list of filenames that have a suffix vmd* or vhd*
        vfiles = glob.glob(os.path.join(root, "*.v[mh]d*"))
        for f in vfiles:
            atime = os.path.getatime(f)
            elapsed_days = (wctime - atime) / (60 * 60 * 24)
            if elapsed_days > age:
                filecount = filecount + 1
                print f, " last accessed ", int(elapsed_days), " days ago"

if __name__ == "__main__":
    # User asked for help, or supplied the wrong number of arguments
    if len(sys.argv) != 3 or sys.argv[1] == '?':
        print "\nfind_old_vms in_this_directory_tree older_than_days\n"
        print "Find all files with the suffix .v[mh]d* that have not been accessed in older_than_days,"
        print "through a recursive descent starting from the root in_this_directory_tree"
        sys.exit()

    # Validate that the first argument in_this_directory_tree is a valid path
    if not os.path.exists(sys.argv[1]):
        print sys.argv[1], " is not a valid directory. Please provide another"
        sys.exit()

    # Arbitrarily limit the search to 10 years
    age = int(sys.argv[2])
    if age < 1 or age > 3650:
        print sys.argv[2], " is invalid. Please provide between 1 and 3650 days"
        sys.exit()

    scan(sys.argv[1], age, 0)
```
If you remove the restriction of searching for *.v[mh]d* files, it will help you find other old files as well. I would appreciate your feedback.