Poor ESXi 4 NFS Datastore Performance with various NAS systems

Hello!

In testing, I have found that I get between a quarter and a half of the I/O performance inside a guest when ESXi 4 connects to the datastore using NFS, compared to a client connecting directly to the exact same NFS share.  However, I don't see this effect if the datastore uses iSCSI or local storage.  I have reproduced this with different systems running ESXi 4 and different NAS systems.

My test is very simple.  I created a bare CentOS 5.4 minimal installation (fully updated as of 07/04/2010) with VMware Tools installed, and I time the creation of a 256 MB file using dd.  I create the file either on the root partition (a VMDK stored on the various datastores) or in a directory on the NAS mounted via NFS directly inside the guest.
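
The full command sequence for each run, cleaned up from the transcripts below, is:

sync; sync; sync; time { dd if=/dev/zero of=test.txt bs=1M count=256; sync; sync; sync; }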

My main test configuration consists of a single test PC (Intel 3.0 GHz Core 2 Duo E8400 CPU with a single Intel 82567LM-3 Gigabit NIC and 4 GB RAM) running ESXi 4, connected to an HP ProCurve 1810-24G switch, which is connected to a VIA EPIA M700 NAS system running OpenFiler 2.3 with two 1.5 TB 7200 RPM SATA disks configured as software RAID 1 and dual Gigabit Ethernet NICs.  However, I have reproduced the problem with different ESXi PCs and NAS systems.

Here is the output of one of the tests.  In this case, the VMDK lives on a datastore stored on the NAS via NFS:

-


root@iridium /# sync; sync; sync; time { dd if=/dev/zero of=test.txt bs=1M count=256; sync; sync; sync; }
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 0.524939 seconds, 511 MB/s

real    0m38.660s
user    0m0.000s
sys     0m0.566s
root@iridium /# mount 172.28.19.16:/mnt/InternalRAID1/shares/VirtualMachines /mnt
root@iridium /# cd /mnt
root@iridium mnt# sync; sync; sync; time { dd if=/dev/zero of=test.txt bs=1M count=256; sync; sync; sync; }
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 8.69747 seconds, 30.9 MB/s

real    0m9.060s
user    0m0.001s
sys     0m0.659s
root@iridium mnt#

-


The first dd is to a VMDK stored on a datastore mounted via NFS.  The dd itself finishes almost immediately, but the sync takes nearly 40 seconds!  That works out to less than 7 MB per second: very slow.  Then I mount the exact same NFS share that ESXi uses for the datastore directly inside the guest and repeat the dd.  As you can see, the dd itself takes longer, but the sync takes almost no time (as befits an NFS share mounted with sync), and the whole process takes less than 10 seconds: four times faster!

I don't see these results on datastores that are not mounted via NFS.  For example, here is a test on the same guest running from a datastore mounted via iSCSI (using the exact same NAS):

-


root@iridium /# sync; sync; sync; time { dd if=/dev/zero of=test.txt bs=1M count=256; sync; sync; sync; }
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 1.6913 seconds, 159 MB/s

real    0m7.745s
user    0m0.000s
sys     0m1.043s

root@iridium /# mount 172.28.19.16:/mnt/InternalRAID1/shares/VirtualMachines /mnt
root@iridium /# cd /mnt
root@iridium mnt# sync; sync; sync; time { dd if=/dev/zero of=test.txt bs=1M count=256; sync; sync; sync; }
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 8.66534 seconds, 31.0 MB/s

real    0m9.081s
user    0m0.001s
sys     0m0.794s
root@iridium mnt#

-


And the same guest running from the internal SATA drive of the ESXi PC:

-


root@iridium /# sync; sync; sync; time { dd if=/dev/zero of=test.txt bs=1M count=256; sync; sync; sync; }
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 6.77451 seconds, 39.6 MB/s

real    0m7.631s
user    0m0.002s
sys     0m0.751s
root@iridium /# mount 172.28.19.16:/mnt/InternalRAID1/shares/VirtualMachines /mnt
root@iridium /# cd /mnt
root@iridium mnt# sync; sync; sync; time { dd if=/dev/zero of=test.txt bs=1M count=256; sync; sync; sync; }
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 8.90374 seconds, 30.1 MB/s

real    0m9.208s
user    0m0.001s
sys     0m0.329s
root@iridium mnt#

-


As you can see, the direct guest NFS performance in each of the three runs is very consistent.  The iSCSI and local-disk datastore performance are both a bit better than that, as I would expect.  But the NFS-mounted datastore gets only a fraction of the performance of any of them.  Obviously, something is wrong.

I was able to reproduce this effect with an Iomega Ix4-200d as well.  The difference is not as dramatic, but it is still significant and consistent.  Here is a test from a CentOS guest using a VMDK stored on a datastore provided by the Ix4-200d via NFS:

-

root@palladium /# sync; sync; sync; time { dd if=/dev/zero of=test.txt bs=1M count=256; sync; sync; sync; }
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 11.1253 seconds, 24.1 MB/s

real    0m18.350s
user    0m0.006s
sys     0m2.687s
root@palladium /# mount 172.20.19.1:/nfs/VirtualMachines /mnt
root@palladium /# cd /mnt
root@palladium mnt# sync; sync; sync; time { dd if=/dev/zero of=test.txt bs=1M count=256; sync; sync; sync; }
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 9.91849 seconds, 27.1 MB/s

real    0m10.088s
user    0m0.002s
sys     0m2.147s
root@palladium mnt#

-


Once more, the direct NFS mount gives very consistent results.  But using the disk provided by ESXi over an NFS-mounted datastore still gives worse results.  They are not as terrible as the OpenFiler test results, but they are consistently between 60% and 100% slower.

Why is this?  From what I've read, NFS performance is supposed to be within a few percent of iSCSI performance, and yet I am seeing between 60% and 400% worse performance.  And this isn't a case of the NAS not being able to deliver decent NFS performance: when I connect to the NAS via NFS directly inside the guest, I see much better performance than when ESXi connects to the same NAS (the very same share!) via NFS.

The ESXi configuration (networking and NICs) is 100% stock.  There are no VLANs in place, etc., and the ESXi system has only a single Gigabit adapter.  That is certainly not optimal, but it doesn't seem like it should explain why a virtualized guest gets much better NFS performance than ESXi itself does to the same NAS.  After all, they both use the exact same suboptimal network configuration...
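
In case it helps, the NFS datastore mount on the host can be listed from the ESXi console with esxcfg-nas (the output line here is only an illustration of the format):

esxcfg-nas -l
# e.g.: VirtualMachines is /mnt/InternalRAID1/shares/VirtualMachines from 172.28.19.16 mounted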

Thank you very much for your help.  I would be grateful for any ideas or advice you might be able to give me.

Hi all

It is very definitely an O_SYNC performance problem. It is well known that VMware NFS datastores always use O_SYNC for writes, no matter what the share is set to by default. VMware uses a custom file locking system, so you really can't compare it to a normal NFS connection from a different NFS client.

I have validated that the performance will be good if you have an SSD cache or a storage target with a sufficiently reliable battery-backed cache.
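
You can get a feel for the effect from any Linux NFS client by forcing synchronous writes yourself; this is only a rough illustration of the idea, not the exact I/O pattern ESXi generates:

# async writes (what a default guest NFS mount gives you)
dd if=/dev/zero of=/mnt/test-async.txt bs=1M count=256

# forced synchronous writes (roughly what the ESXi NFS client does with O_SYNC)
dd if=/dev/zero of=/mnt/test-sync.txt bs=1M count=256 oflag=sync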

http://blog.laspina.ca/ubiquitous/running-ZFS-over-NFS-as-a-VMware-store

Kind regards

Mike

vExpert 2009

Tags: VMware

Similar Questions

  • Anyone using Lightroom with a NAS system

    Can you use Lightroom with a NAS?

    Yes, it's the only way you can use a NAS. It works fine if your network is fast enough.

  • Is it possible to create Oracle RAC on an NFS datastore?

    Hello

    Is it possible to create Oracle RAC on an NFS datastore?   With a VMFS datastore, we use VMDK files as the Oracle RAC shared virtual disks with the Paravirtual SCSI controller and the multi-writer flag. What about an NFS datastore? Are the Paravirtual SCSI controller and the multi-writer flag supported on an NFS datastore?

    Unless I'm missing something, this is not supported on NFS.

  • Facing performance problems with ESXi

    Dear Sir

    I am facing a problem and I couldn't find a solution for it. I have two virtual environments, one on the main site and the other on the recovery site, and I use SRM between them.

    The main site contains 12 Sun 6270 blades running VMware ESXi 5.1 and vCenter 5.1 with a CLARiiON CX4 (SAN storage).

    The DR site contains 5 Dell blades running VMware ESXi 5.1 and vCenter 5.1 with a CLARiiON CX4 (SAN storage).

    I am using CLARiiON and VMAX storage. My question is about the main site: when I add and mount a LUN to the ESXi hosts and rescan, it takes an hour to complete the rescan, the performance is very bad, and I am facing strange host disconnect issues, even though I checked the compatibility between the Sun blades and ESXi 5.1 and it is compatible.

    On the DR site I am not facing this problem; it is fast and the performance is excellent.

    I need a solution, please.

    Fixed after I removed the CLARiiON and fixed some issues with the fibre cards on the Sun blades.

    Performance is now better than before.

  • slow writes - nfs datastore

    Greetings.

    I am noticing some write throughput problems with an NFS-based datastore. It seems I'm not the only one seeing this, but so far the posts I have found offer little help in making it better.

    I am trying ESXi V4 update 1 on a PowerEdge T110 with 4 GB of memory, a Xeon X3440 CPU and one 250 GB SATA drive.

    The NFS-based datastore is served by an OpenSUSE 11.2 machine on a 1000 Mb network, and speed and duplex have been verified to be set correctly on both machines.

    Initially I converted an OpenSUSE 11.2 VMware Server image (12 GB) to the ESXi NFS-based datastore. It worked, but was incredibly slow, averaging 2.7 MB/sec.

    Later I found that 3 MB/s writes were all I could get to the NFS datastore using dd. I tried this both from within the virtual machine and from the ESXi console, to the same datastore location.

    Network performance measured with iperf shows ~940 Mb/s between the virtual machine and the NFS server, so with the drives out of the picture the network is doing fine.
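
    The iperf check was along these lines (the server address is just a placeholder):

    # on the NFS server
    iperf -s
    # on the VM, pointing at the NFS server
    iperf -c 192.168.1.10 -t 30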

    I ended up changing the following advanced settings to see if it was some kind of buffer memory problem:

    NFS.MaxVolumes to 32

    Net.TcpipHeapSize to 32

    Net.TcpipHeapMax to 128

    That seemed to help: write throughput from the virtual machine to the NFS datastore went from 3 MB/s to 11-13 MB/s. So there are certainly some self-imposed slowdowns coming from the default settings.
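
    For reference, those settings can be changed from the ESXi console with esxcfg-advcfg (a reboot is needed for the changes to take effect):

    esxcfg-advcfg -s 32 /NFS/MaxVolumes
    esxcfg-advcfg -s 32 /Net/TcpipHeapSize
    esxcfg-advcfg -s 128 /Net/TcpipHeapMax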

    I then tried mounting the same NFS datastore directory directly as /mnt in the hosted virtual machine, and lo and behold, writes to /mnt show ~25 MB/s throughput. Running the same exact command on another Linux-only box on the same network, I see about the same rate (the stand-alone server gets about 2 MB/s more), so no problem there.

    I suspect there may be other areas in which the ESXi NFS-based datastore is 50% less efficient than straight NFS. Does anyone have any golden nuggets to try, to get the ESXi NFS write speed up to something similar to what can be done with native NFS mounted inside the virtual machine?

    TIA

    Check the mount options on the underlying partition, for example by file system:

    - ext3: rw,async,noatime

    - xfs: rw,noatime,nodiratime,logbufs=8

    - reiserfs: rw,noatime,data=writeback

    Then for the export options use (rw,no_root_squash,async,no_subtree_check).

    Check that the I/O scheduler is correctly selected for the underlying hardware (use noop if a hardware RAID controller is doing the reordering).

    Increase the NFS threads (to 128) and the TCP windows to 256K.

    Finally, ensure the guest partitions are 4K aligned (though this should not affect sequential performance much).
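
    As a rough sketch of what that looks like on the NFS server (the device names, export path and thread-count variable are only examples and vary by distro):

    # /etc/exports - async export, no subtree checking
    /srv/nfs/vmstore  *(rw,no_root_squash,async,no_subtree_check)

    # /etc/fstab - underlying ext3 partition mounted async with no atime updates
    /dev/sdb1  /srv/nfs/vmstore  ext3  rw,async,noatime  0 0

    # select the noop I/O scheduler when a RAID controller handles reordering
    echo noop > /sys/block/sdb/queue/scheduler

    # raise the NFS server thread count to 128 (e.g. RPCNFSDCOUNT in /etc/sysconfig/nfs on Red Hat-style systems)
    RPCNFSDCOUNT=128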

    I have been working on some notes on NFS which cover all of this (not complete yet): http://blog.peacon.co.uk/wiki/Creating_an_NFS_Server_on_Debian

    HTH

    http://blog.peacon.co.uk

    Please give points for any helpful answer.

  • Windows will not install from an ISO on the NFS datastore

    Hi all

    I have searched this forum for a few days and tried suggestions from different posts without success. I recently installed ESXi 5.1 update 1. I have set up an NFS datastore on the same computer using an external USB hard drive. I was able to install RHEL6 using an iso from the NFS datastore. The problem is that I can't install Windows using a Windows 7 iso. Whenever the virtual machine boots, it attempts a TFTP network boot; the iso is never detected. I have tried the following:

    1. Ensured the 'Connected' and 'Connect at Power On' options for the CD/DVD drive are checked. However, I have noticed that when the virtual machine starts, the 'Connected' option for the Windows VM becomes unchecked. This is not the case for the Linux VM.

    2. Changed the boot order in the BIOS to boot first from CD/DVD.

    3. Unchecked 'Connect at power on' for the network adapters.

    Even after these changes, the VM tries to do a network boot via TFTP.

    The next thing I did:

    4. Removed the network cards (by changing the configuration).

    Now the VM no longer attempts a network boot, but complains that no operating system was detected.

    A few details on the NFS datastore:

    1. A 1 TB external USB drive with 2 partitions configured as ext4, exported as an NFS share from the RHEL6 server on the same machine.

    2. NFS is configured correctly, because I can install from a RHEL6 iso just fine.

    Am I missing something? There is nothing wrong with the Windows iso; I have used it elsewhere. I also tried a different Windows iso without success. Please help. Thanks in advance for your time.

    Kind regards.

    As ISO files for operating systems are big and sometimes occupy a considerable number of clusters on the hard drive, running a disk check (or a scan of the drive) can fix a corrupt ISO file. To make sure your ISO is not corrupted, try to open it with WinRAR and extract a file from it.

    Yours,
    Mar Vista

  • MSCS on NFS and iSCSI with a cluster?

    The clustering guide for vSphere states that only Fibre Channel storage is supported for FC/MSCS clusters, so I'm pretty sure I already know the answer to this, but...

    I have a client who wants to run an active failover cluster on a NetApp storage solution, with the operating system disks for the Windows 2008 R2 servers on an NFS volume and the application data on an iSCSI RDM.  IOPS and networking aside, has anyone set up a cluster like this before, and if so, what were your experiences?  I'm just trying to get an idea of what kind of potential problems to expect down the road.  I also assume that VMware is not going to support it, because they expressly advise against it.  Thoughts?



    -Justin

    In my opinion, it will simply not work. The software iSCSI initiator in ESX/ESXi does not support SCSI-3 persistent reservations, which are required by MSCS on 2008 and above. Using RDMs will not change that. I don't know whether an iSCSI HBA would work.

    The workaround for this is to use the software iSCSI initiator inside the 2008 guest. The operating system can sit on the NFS datastore. The quorum and data disks must be on iSCSI LUNs connected via the Windows iSCSI initiator.
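
    To illustrate the workaround, attaching a LUN from inside the 2008 guest with the built-in iscsicli tool looks roughly like this (the portal address and target IQN are placeholders):

    iscsicli QAddTargetPortal 192.168.1.50
    iscsicli ListTargets
    iscsicli QLoginTarget iqn.1992-08.com.netapp:sn.12345678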

  • NFS datastore => no host connected, impossible to remove

    Hello

    I have an NFS datastore (it was an ISO repository) that I need to delete.   So I deleted all the files from this share.

    My problem is that I have unmounted it from all hosts, yet the datastore is still visible in my inventory and I am unable to remove it.

    When I try "Mount Datastore on Additional Hosts...", the wizard runs in an endless loop and never loads the list of hosts.

    On my hosts, the NFS share is not visible, so nothing should be stuck because of a file in use.

    Have you encountered this problem before?

    Sorry, found the culprit... snapshots on the virtual machines (with mapped CD-ROMs).

  • ISCSI performance with HP DL160 G6 and ESX 4.0

    I am experiencing iSCSI performance problems with HP DL160 G6 servers and ESX 4.0 and would like some help with this.

    I have set up iSCSI on 3 ESX 4.0 hosts.

    Each has a corresponding single target and a single LUN on the SAN. The LUNs are formatted as VMFS3.

    Each host is connected to the QNAP SAN in the same way via a Gigabit switch.

    All NICs operate at 1 Gb.

    To test the performance I use vmkfstools to clone a disk from the local store to the LUN.

    On the first host (a DL140 G3) performance is good at 48 MB/s.

    On the 2 DL160s performance is very poor and barely reaches 2 MB/s.
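
    The clone test is essentially the following (the datastore and disk names are placeholders):

    vmkfstools -i /vmfs/volumes/local-store/testvm/testvm.vmdk /vmfs/volumes/qnap-lun/testvm/testvm-clone.vmdk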

    During cloning, the DL160 servers also repeatedly log the following in /var/log/vmkwarning:
    servername vmkernel: 0:05:23:53.214 cpu2:4308) WARNING: NMP: nmp_DeviceRetryCommand: Device "naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx": awaiting fast path state update for failover with I/O blocked. No prior reservation exists on the device.
    servername vmkernel: 0:05:23:54.195 cpu3:4210) WARNING: NMP: nmp_DeviceAttemptFailover: Retry world failover device "naa.xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" - issuing command 0x4100051a20c0

    I made sure the servers are up to date:
    DL160 G6 configuration:
    ESX 4.0.0 build-702116
    2 x HP NC362i Gigabit NICs (firmware 1.7.2, driver 1.3.19.12.1)
    1 x Smart Array P212 controller (firmware 5.14)

    The SAN server configuration:
    QNAP TS-459U-RP (firmware 3.6.1 Build 0302T)

    I have looked everywhere for an answer.
    I also tried "esxcfg-advcfg -s 1 /VMFS3/FailVolumeOpenIfAPD" as suggested in some posts.
    None of it helps.

    Any help / advice would be appreciated!

    Is this software iSCSI or a hardware iSCSI HBA?

    Because all the messages are from vmhba0... Usually software iSCSI is set up as vmhba33 or vmhba36...

    Can you check whether vmhba0 holds the volume?
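
    A quick way to check from the ESX console which adapter is backing the volume (just a sketch, not the only way):

    # list the storage adapters and their drivers
    esxcfg-scsidevs -a
    # map VMFS volumes to their backing devices
    esxcfg-scsidevs -m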

  • ESXi v4 HP ML370 G6 performance question - Nehalem with HT: run 16 cores or 8 cores?

    I'm about to go live with my latest system for a client. The G6 is simply amazing performance-wise, even compared to the G5 that I use for my own business. Nothing is clear about the advantages and disadvantages of enabling Hyperthreading on the Xeon E5530. With 2 quad-core processors, HT enabled gives me 16 cores. The free version of ESXi gives me a maximum of 4-way SMP for any machine I create.

    I'm following what has worked well for other clients on the G5 and simply doubling the virtual processors. It remains to be seen whether I gain anything by NOT using HT. 24 GB of RAM, P410i/512 MB BBWC, 5 x 146 GB 10K DP ENT disks:

    • 1 core: Win2003x32 for AD

    • 2 cores: Win2003x32 for the backup AD and print server (16 printers)

    • 4 cores: Win2003x32 for a badly behaved 16-bit accounting package, which now won't require shutting everything else down to restart the ancient thing!

    • 4 cores: Win2003x64 for the file server - it gets its own Ethernet via the 4-port NIC that comes with the G6 - virtually no traffic to the other servers

    • 4 cores: Win2003x32 & SQL Workgroup

    • 2 cores: Win2003x32 & Exchange 2003 for half the company - due to the 16 GB store limit

    • 2 cores: Win2003x32 & Exchange 2003 for the other half of the company - due to the 16 GB store limit

    • 2 cores: ClarkConnect (ClearOS) (RedHat 5) mail server, antispam, firewall, AV, ftp server.

    It seems to work well in my tests, and the two Exchange servers work much better than the dedicated ML310 G3 units that are now retired. There are a dozen heavy users and a dozen more, plus a bunch of remote users who primarily use the groupware on the ClarkConnect box.

    Any input welcome before going live this weekend.

    GregBradley wrote:

    Sounds like I should treat the box as 8 CPUs, like the G5 (2 x Xeon 5440) units. Does this mean I have to turn off HT in the BIOS? Does that give me 8 "faster" processors rather than 16 "slower" ones?

    Generally, there is no benefit to disabling HT on Nehalem (in stark contrast with previous HT implementations).

  • HP BL460c slow performance with Qlogic FC mezzanine cards

    I am running 16 ESXi blades with Qlogic FC mezzanine cards.  Performance was great up to build number 153840 on March 20, 2009.  After upgrading to build number 153875, the qlogic module took more than 30 minutes to load, and after that all the modules load very, very slowly.  I have tried every build since March 20, and they all result in very slow startup times and slow FC performance with lots of timeouts.

    Today I updated to the latest HP BIOS for the server, the NICs, the Qlogic cards and the iLO.  Still no luck.  Does anyone have any suggestions?

    Thank you

    Are you using Virtual Connect, and if so, what version?

  • Moving a VM - NFS Datastore - no vmotion - invalid

    Hey people,

    I am having a problem here moving a virtual machine from one ESX host to another (with VC).  First of all, let me say that I don't have vmotion (working on fixing that), but I do have shared storage (an NFS datastore).

    So the virtual machine is hosted by esx1 on this NFS datastore.  I shut down the virtual machine and remove it from the inventory.  Then I go to esx2 and browse the datastore, find the vmx file, and add it to the inventory.  The virtual machine then appears in the inventory, but is grayed out with (invalid) beside it.

    I'm sure I could add a new virtual machine and use the existing vmdk files as its disks, but I would rather simply add it to the inventory with its existing configuration.

    Is this possible?

    Thank you very much

    Grant

    -


    Without vmotion you should still be able to cold migrate the VM: power down the VM, right-click on the virtual machine name, select Migrate, and select another ESX host - you can change the storage or leave it where it is.

    This will let you cold migrate the virtual machine without having to remove it and re-add it to the VC inventory.

    If you find this or any other answer useful, please consider awarding points by marking the answer correct or helpful.

  • How to open zip files that are included with various tutorials?

    How do I open the zip files that are included with various tutorials?  Every time I try, I get the message, "It seems that the file has been moved or renamed."  I am a total novice with this stuff!  Thanks for your help.

    On Windows? Right-click on the .zip file that you want to unpack (unzip) and click on "Extract all" from the context menu.

  • vSphere alarms for an NFS Datastore

    I would like to create an alarm that corresponds to an event (not a state) that fires when an NFS datastore is disconnected.  I found the trigger "Lost connection to NFS server", but it doesn't seem to work at all.  Also, I would like the action to trigger only when the host is not in Maintenance Mode, because it would be very annoying to get paged just because a host was rebooted for patching and generated an "NFS datastore disconnected" type alarm.

    Use the esx.problem.storage.apd.* triggers.

    When NFS disconnects, you will get official messages in the vmkernel.log file on the host.
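
    A quick way to confirm those messages on the host (a simple sketch):

    # on the ESXi host shell, look for APD (all paths down) storage events
    grep -i apd /var/log/vmkernel.log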

  • Sort by additional columns in vSphere Datastore Performance dashboard

    Hello

    I was wondering why I can't sort on some of the columns in the default "Datastore Performance" dashboard for vSphere. If I can't sort on the values I want to see, that kind of defeats the purpose of showing them.

    I can always scroll vertically to find the bottleneck manually, but it would be nice to have the ability to sort.

    Does anyone know if it is possible in some way I have not discovered?

    SortingInAdditionalColumns.jpg

    Kind regards

    Erik Alm

    You are right that additional resource attribute columns added to the widget cannot be sorted (ascending or descending). I share your frustration with that. For the moment, there is no way to do this in vC Ops 5.8. I look forward to changes in future versions that allow this feature.
