VMware HA problem with isolated host.

Hello, we have two IBM x3850 M2 servers running ESX 3.5 U4 (build 153875). Both are attached via NAS (NFS) to an IBM N3600 (NetApp FAS2050C). Each server has two network adapters teamed on its Service Console vSwitch, and there is an additional private network for storage and VMotion (also with two NICs on each host).

We have DRS and HA enabled on our two-node cluster, with the following HA settings:

  • Number of host failures allowed: 1

  • Allow VMs to be powered on even if they violate availability constraints

  • VM restart priority: Medium

  • Host isolation response: Shut down VM

  • VM monitoring enabled (sensitivity: High)

If I pull the power on one of the hosts, its virtual machines are automatically restarted on the surviving host, as expected. However, if I simulate a double NIC failure on one of the hosts by unplugging both Service Console NICs, we see the following behavior:

  1. On the host that has been isolated (prodsys-vm1), the logs indicate that the server has detected that it is isolated and begins to shut down its virtual machines.

  2. The surviving host (prodsys-vm2) notices that prodsys-vm1 has disappeared.

  3. prodsys-vm2 registers the "isolated" VMs and tries to power them on. The following error message is logged for each VM that fails:

[2009-07-24 13:00:17.352 'vm:/vmfs/volumes/2e5dc29c-712e74ba/Test System/Test System.vmx' 3076461472 info] Question info: Cannot open the disk '/vmfs/volumes/2e5dc29c-712e74ba/Test System/Test System.vmdk' or one of the snapshot disks it depends on.
Reason: Device or resource busy., Id: 0 : Type : 2, Default: 0, Number of options: 1
[2009-07-24 13:00:17.352 'BaseLibs' 21044144 info] Disconnect check in progress: /vmfs/volumes/2e5dc29c-712e74ba/Test System/Test System.vmx
[2009-07-24 13:00:17.367 'ha-eventmgr' 3076461472 info] Event 82 : Message on Test System on prodsys-vm2.esri.com in ha-datacenter: Cannot open the disk '/vmfs/volumes/2e5dc29c-712e74ba/Test System/Test System.vmdk' or one of the snapshot disks it depends on.
Reason: Device or resource busy.

  4. prodsys-vm2 then unregisters each virtual machine.

  5. I waited several minutes, but no further attempts were made to register and/or power on the failed virtual machines.

  6. Now, if I manually register one of the failed VMs from the prodsys-vm2 console, it is powered on immediately, without any further interaction from me. In addition, this seems to trigger re-registration of the other failed VMs, which are then automatically powered on without error.
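When comparing timestamps across the two hosts, it can help to pull the "Device or resource busy" failures out of the log mechanically. The following is a minimal sketch that assumes the log lines have exactly the shape shown in the excerpt above; the function name and the regular expressions are my own and are not part of any VMware tool.

```python
import re

# Pattern for the bracketed header seen in the excerpt above:
# [YYYY-MM-DD HH:MM:SS.mmm 'source' thread-id level] message
HEADER = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"'(?P<source>[^']*)' (?P<thread>\d+) (?P<level>\w+)\] (?P<msg>.*)$"
)
DISK = re.compile(r"Cannot open the disk '(?P<disk>[^']+)'")


def busy_disk_events(lines):
    """Return (timestamp, disk path) for each 'Cannot open the disk' entry
    whose following 'Reason:' line reports 'Device or resource busy'."""
    events = []
    pending = None  # last disk-open failure still waiting for its Reason line
    for line in lines:
        header = HEADER.match(line)
        if header:
            disk = DISK.search(header.group("msg"))
            pending = (header.group("ts"), disk.group("disk")) if disk else None
        elif pending and line.startswith("Reason: Device or resource busy"):
            events.append(pending)
            pending = None
    return events
```

Feeding it the lines of the log file from prodsys-vm2 (for example `busy_disk_events(open(path).read().splitlines())`, with `path` pointing at wherever that log lives on your system) gives one entry per failed power-on, which can then be lined up against the shutdown times logged on prodsys-vm1.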

The obvious conclusion here is that prodsys-vm2 does not give prodsys-vm1 enough time to shut down the virtual machines before trying to restart them. I imagine this could be adjusted by raising das.failuredetectiontime (I have seen a recommendation of 60s).
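To make that hypothesis concrete, here is a tiny sketch of the race as I understand it. The function and all of the numbers in the usage note are illustrative assumptions, not measured values or documented HA timings.

```python
def restart_hits_locked_disk(isolation_response_at_s, restart_attempt_at_s, shutdown_duration_s):
    """True if the surviving host tries to power on a VM before the isolated
    host has finished shutting it down (and so still holds its disk).

    All three arguments are seconds counted from the moment the network fails.
    """
    disk_released_at_s = isolation_response_at_s + shutdown_duration_s
    return restart_attempt_at_s < disk_released_at_s
```

With hypothetical values, `restart_hits_locked_disk(12, 15, 45)` is True (the restart lands while the guest is still shutting down), whereas `restart_hits_locked_disk(12, 60, 45)` is False. A guest that takes longer to shut down would still lose the race at 60 seconds, which is why a fixed timeout feels fragile.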

A few questions though:

  • Why doesn't prodsys-vm2 try again to register and start the failed virtual machines after the first attempt?

  • Why, when I registered one manually, did it suddenly decide to register and start the rest on its own?

  • Is it possible to keep my failure detection time low (for faster recovery) and still avoid this situation? I could see a situation where maybe even 60s would not be high enough. It seems this should be handled more elegantly than by just raising a timeout value...
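For illustration, the kind of handling I would have expected looks like a retry with backoff rather than a single attempt. This is only a sketch of the idea, not how HA actually behaves (the logs above show it gives up after one try); `power_on` is a hypothetical callable that raises `OSError` while the disk is still busy.

```python
import time


def power_on_with_retry(power_on, attempts=5, initial_delay_s=5.0, backoff=2.0, sleep=time.sleep):
    """Call power_on() until it succeeds or the attempts run out.

    power_on is a hypothetical callable that raises OSError while the
    VM's disk is still locked by the other host.
    """
    delay = initial_delay_s
    for attempt in range(1, attempts + 1):
        try:
            return power_on()
        except OSError:
            if attempt == attempts:
                raise  # out of attempts: surface the last failure
            sleep(delay)      # wait before the next try
            delay *= backoff  # wait longer each time
```

With the defaults, the waits between attempts are 5, 10, 20 and 40 seconds, so a short detection time could be kept while slow guest shutdowns are still tolerated.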

Of course, there are some patches that might apply to our setup, and we can give those a try. I will also raise this with support, but I am hoping someone out there might have some ideas.

Thank you!

Sorry,

I forgot the second half of this message:

VMware High Availability (HA)

Virtual machines using an NFS datastore could fail after an HA failover event

When you have memory overcommitment with virtual machines on an NFS datastore, a vswp file is created, which is a non-zero-size swap file. In this scenario, if an HA failover event occurs and the isolation response is set to Leave VM powered on, you may see a virtual machine failure on the host where the virtual machine was originally running before the HA event.

If you do not have memory overcommitment with virtual machines on an NFS datastore and HA failover events occur with the setting Leave VM powered on, a further migration of the virtual machine running on the original host may fail.

Solution: Apply patch ESX350-200905401-BG to ESX Server 3.5 hosts and patch ESXe350-200905401-I-BG to ESX Server 3i version 3.5 hosts.

When a virtual machine running on a NAS datastore is configured to be shut down or left powered on in response to host isolation, the virtual machine may attempt to run simultaneously on two hosts during a network isolation event

After multiple network failures that cause host isolation and loss of network access to the datastore, if a virtual machine is configured with the setting Shut down VM or Leave VM powered on for host isolation, the virtual machine may stop responding indefinitely. As HA tries to power off the virtual machine and restart it on another host, two instances of the virtual machine may appear in the VI Client. There is no data corruption, because HA and VMFS properly control access to the virtual machine's data, but the original virtual machine becomes unusable. After access to the datastore is restored on the isolated host, the original virtual machine can be manually powered off.

Solution: In NFS or iSCSI environments, select Power off VM as the cluster's default virtual machine response to host isolation.

Tags: VMware

Similar Questions

  • Problem with Windows host process Rundll32

    I am also having a problem with the Windows host process (Rundll32). It has stopped working. It is very difficult to find help for Windows software problems. I paid a small fortune for my computer and software. There should be a number to call for help.

    Hi Scott,

    (a) When do you get this error message?

    If this problem occurs when you view and then try to configure a disabled audio playback device on a Windows Vista-based computer, follow the steps in the article below.

    Error message when you try to configure a disabled audio playback device in Windows Vista: "Windows host process (Rundll32) has stopped working"

    http://support.Microsoft.com/kb/953916

    You can also follow the steps by SpiritX, see link below
    http://social.answers.Microsoft.com/forums/en-us/vistaprograms/thread/cbaa877c-9f51-45B4-9860-889e27a24226

    Please see the link below for contacting Microsoft Customer Service and Support

    How and when to contact Microsoft Customer Service and Support
    http://support.Microsoft.com/kb/295539

    Please let us know if you need assistance, we will be more than happy to help you.

    Thanks and regards,

    Ajay K

    Microsoft Answers Support Engineer

    Visit our Microsoft Answers Feedback Forum and let us know what you think.

  • Problem with the host view

    Hi all.

    I have a problem with the display of my physical host information. I installed FglAM on the server, deployed the OS cartridge, and created an agent. But when I open the Hosts dashboard, I see that my host is not currently being monitored.

    The agent log looks fine (log file attached). I see some alarms for this host, but no performance or other information.

    Thank you

    Nikolai, I recommend that you open a support case for further analysis.

    David Mendoza

  • Problems with self-hosted web fonts

    So I applied a self-hosted web font on my site and it worked. But recently, in some browsers, the font does not show up and is instead replaced by Helvetica or some other standard web font. Why would it do that? I tried clearing my browser cache and that does not help. It also happens to some of the people visiting my site, which is a huge problem.

    Follow up in the other post, Hosted Web Fonts.

  • Problem with the hosts file


    Hello

    Although I edited the hosts file on my desktop as advised in other discussions and deleted the lines containing Adobe, I still cannot use my Photoshop program. At the 'Connection required' step, when I click "Connect" I always get the message "connect to the Internet", although I am connected.

    What I find strange is that I was supposed to replace the old 'hosts' file, yet I now have both the old 'hosts' file of type 'File' and the new 'hosts' file of type 'Text Document'.

    Read this again.

    http://helpx.Adobe.com/x-productkb/policy-pricing/activation-network-issues.html

    The rest we cannot know, since you did not provide system info or other details, nor tell us exactly what changes you actually made to the hosts file.

    Mylenium

  • Problems with HomeGroup; I suspect Kaspersky Firewall is causing them.

    I suspect the Kaspersky firewall is blocking HomeGroup sharing, please advise

    original title: homegroup

    You have posted in a forum for users of the Microsoft Security Essentials (MSE) anti-malware program.

    We cannot help you in this forum.

    Suggest you contact Kaspersky Support: http://support.kasperskyamericas.com/

  • Problem with Windows Update - Automatic Updates service won't start (error 126)

    original title: the problem with Windows update

    Please help me get Windows Update working again.

    XP Media Center Edition, Version 2002, SP3

    I had a problem with "Generic host process for Win32 has encountered a problem and needs to close." I followed the instructions at this link:
    http://support.Microsoft.com/kb/931852

    Seems to have cleared the error message.  So far so good.

    However, Windows Update now does not work. Windows Security Center told me to go to Control Panel > System and use the Automatic Updates tab. But if I go to Control Panel > System, the Automatic Updates tab does not appear.

    I also went to Control Panel > Services and tried to start the Automatic Updates service. I received this error message: Could not start the Automatic Updates service on Local Computer. Error 126: The specified module could not be found.

    I have already followed several of the instructions in this thread:

    http://answers.Microsoft.com/en-us/Windows/Forum/windows_xp-windows_update/cannot-access-Windows-automatic-update-after/aeaec175-01f4-47ed-8F97-55b854af4220

    In the thread above, I was following the post from 30 August. Under Option A, I completed step 5, uninstalling McAfee Security. At that point, my network card stopped working. On the agent's recommendation, I posted to the Malwarebytes forum. The Malwarebytes agent determined that I do not have any malware. I ended up posting on the McAfee forum to get my network card working again. Then I reinstalled McAfee and went back to getting Windows Update to run.

    I resumed with Option A, step 8, to reset Internet Explorer using http://support.microsoft.com/kb/923737. The Fix it worked very well.

    Then Option A, step 9, says to reset Windows Update using http://support.microsoft.com/kb/971058. When I ran the Fix it in either default or aggressive mode, I got an error message saying that it cannot start wuauserv. The message asks me to verify that the user has privileges to perform this step. Well, I checked under User Accounts: I am the only user set up on this PC and I have administrator privileges. Please help me run this Fix it or recommend another way to get Windows Update working again.

    [1]... It looks like you are saying that my system is so messed up that a clean install is the (only?) way to go.

    [2] If I do a clean install, do I need to reinstall not only Windows and all my applications, but also the drivers, etc.?

    [3] I think I have the system disks supplied with the PC. Is it better to use those than the recovery partition?

    [4a]... Is there a possibility that some of [my data are] infected?

    [4b] ...What prevents me from simply infecting the whole PC again?

    A1. Yes, and it was my recommendation in August, too.

    A2. Yes.

    A3. Not really, although using the hidden recovery partition to do the "destructive recovery" will include all drivers, etc., that were installed at the factory.

    A4a. Yes, a real possibility.

    A4b. Run at least three (3) scans on the backed-up data before restoring it: one (1) with your installed and fully updated anti-virus application (for example, Microsoft Security Essentials) and two (2) with reputable free online scanners (e.g., ESET; Bitdefender; Kaspersky; HouseCall; F-Secure).

    If you have backed up your data to CD/DVD, scan the disc. If you have backed up your data to a flash drive, SD card, or another external drive AND KB971029 is installed, scan the drive.

    If you have any questions or need additional help, please start a new thread in this forum: http://answers.microsoft.com/en-us/windows/forum/windows_xp-system

    Once again, good luck!

  • Problem with Simple Contact form: fields do not accept input.

    Hello

    The Simple Contact form on our site is not letting users enter their names and e-mail addresses. The other fields, for cell phone and message, are working fine.

    I thought this might be a problem with the hosting, but the site is hosted on Business Catalyst (badfishy.businesscatalyst.com).

    I created other test sites and inserted the Simple Contact form, which worked perfectly, so I'm not sure what I did to the main site to cause this problem.

    Any help would be appreciated muchly.

    Cheers

    Ben

    Hi Ben

    I checked the site, and it seems that the accordion's frame extends down the page and covers two fields in the form, so those fields are not active because they sit behind the accordion frame.

    Please shrink the accordion frame so that the form fields are no longer behind it, and then it should work.

    Thank you

    Sanjit

  • Are problems with the Cirrus server causing this?

    Hello

    We have been developing an application that uses the Cirrus server for two months now, and today we are having trouble with the NetConnection.call method we use to invoke the onRelay method on the other peer. When a peer calls the call method on the NetConnection, onRelay on the other peer is only invoked some of the time (so it is really not reliable).

    We are curious to know if this could be related to the recent problems with the Cirrus server? If this isn't the case, what could be a reason the calls work only part of the time?

    Kind regards

    Kevin

    While the problems with our hosting provider are not yet resolved, the cluster should be operating nominally, which should include correct and relatively reliable operation of the relay/onRelay function.

    Note that relay/onRelay is rate limited to approximately 1 relay per 2 seconds (approximated with a sliding window of about 10 relays every 20 seconds). This rate limit is in place because the relay/onRelay function is expensive to provide and one developer abused it. The limit is implemented by dropping relay requests for a while if the rate exceeds the limit. The limit is per sending NetConnection.

  • Ubuntu 12.04 host, OSX 10.7 guest. Problems with VMware folder sharing :(

    Hello

    First of all, I have problems copying/extracting files into this folder in OSX, for example when using unzip. If I use other tools like Keka, it works. Perhaps Keka does not try to set things like timestamps, etc.?

    Unzip errors:

    creating: prog/bin/Developer/
    chmod (Directory attributes) error: operation not permitted

    inflating: prog/bin/Developer/test.dll chmod (file attributes) error: operation not permitted
    (warning) cannot set time

    It also seems that some applications have problems with the directory name. I guess it is the whitespace in "VMware Shared Folders". Why would anyone use whitespace in such a directory name? Unfortunately, VMware chooses this name automatically.

    Error when trying to load the project "/Volumes/VMware Shared Folders/myproject/project.sln": the directory name is invalid.

    How can I fix these problems?

    Thank you!

    First, VMware does not support running any version of Mac OS X in VMware Workstation. Second, the Apple SLA for Mac OS X Lion, specifically section 2B(iii), requires that it be virtualized only while running under Mac OS X Lion itself, in addition to the SLA's other requirements, such as being installed on Apple-branded hardware. Since VMware Workstation runs under Windows/Linux and you are not virtualizing Mac OS X Lion on Apple-branded hardware while running under Mac OS X Lion, no help can be provided to you, as it could/should be considered a violation of the VMware Community Terms of Use for the scenario in which you are using OS X Lion.

  • Problem adding hosts to a DVS

    Hello

    I have a problem adding hosts to a DVS. I can create a distributed vSwitch, but I can't add any hosts. The only thing I see is an empty space where the hosts and their network cards should be.

    Could this be caused by a version mismatch between vCenter Server and the ESX hosts (vCenter Server is at U1, and ESX is 4.0 without this update)? Maybe I need to update the hosts before I can add these interfaces to the DVS.

    Thanks for any help.

    Enterprise Plus is required for distributed switches. If you have a lower license edition, you will be able to create them, but not use them.

    Do you have Enterprise Plus?

    http://www.VMware.com/products/vSphere/buy/editions_comparison.html

  • VMware network bridge: host freezes with WS 6.5 on a Windows Vista x64 or Windows 7 x64 host operating system

    Hello

    I have a problem with VMware Workstation 6.5 on 64-bit Windows host.

    First, I installed VMware Workstation 6.5.0 118166 on my Windows Vista SP1 x64 host, and I had a weird problem when working with the network on my host system, whether or not a guest was running. When I try to connect to or disconnect from VPN connections, plug or unplug my network cable from the LAN socket, etc., the host OS suddenly freezes and responds to nothing whatsoever: no mouse movement, no keystrokes, nothing.

    After some time, I installed Windows 7 Beta 7000 x64 on my computer and had no problem with it until I installed VMware Workstation 6.5.0 118166 on it: the freezing problem once again. I thought it might be a problem with this specific build of VMware, so I downloaded and installed the latest version (6.5.1 126130) on my system, but the problem persists.

    After playing with a large number of settings, I found that if I remove the "VMware Bridge Protocol" service from my network card, the problem does not happen.

    I searched the forum but could not find anything like this, so I wonder if anyone can help me with it?

    My network card is an onboard 'Realtek RTL8168B/8111 Family PCI-E Gigabit Ethernet NIC (NDIS 6.0)' on my Gigabyte GA-MA770-DS3 motherboard.

    Other features of the system:

    AMD Athlon 64 X2 5200+ CPU

    6.0 GB of RAM

    Seagate ST3500320AS SATA hard drive

    nVidia GeForce 8600GT 512 MB video card

    Thank you.

    Shadmehr wrote:

    My network card is an onboard 'Realtek RTL8168B/8111 Family PCI-E Gigabit Ethernet NIC (NDIS 6.0)' on my Gigabyte GA-MA770-DS3 motherboard.

    Which driver version do you have installed for your network card (NIC)? This link shows a Vista/Win2008 driver version 6.216 (dated 2009/02/26) available. The same driver should also work for Win7.

    If the above does not resolve the problem, open the properties for the network adapter and temporarily disable all "Task Offload" or other "offload" features to see if that helps.

  • Problems with USB on Vista x64 (host)

    I'm running Vista Home Premium x64 on my HP DV4-1225dx laptop. I use Workstation 6.5.1. Every time I try to start a guest, be it XP SP3 or the new Windows 7 Beta build 7000, I get this error:

    A supported host USB driver was not found. If you have installed USB traffic monitoring software on the host system, please remove it.

    Virtual USB controller has been disconnected.

    I have checked everything USB-related that is installed. All of my USB-related drivers are signed by Microsoft, so I don't see why they are not supported, unless it is because I am running a 64-bit OS. I have searched the threads and found nothing that fixes this problem. I don't want to change my host operating system to 32-bit, which is why I am trying to get this to work as is. If there is a solution, please point me in the right direction.

    This is a problem with an HP USB driver module.

    http://thecompudoc.blogspot.com/2008/12/host-USB-driver-problem-of-VMware-on-HP.html

  • Problem booting VMware ESXi 5.0

    I just installed VMware ESXi 5.0 on a new Cisco UCS B200 series blade with two 300 GB hard drives configured in a RAID 1 mirror. I went through and completed the installation of VMware ESXi 5.0 on this server. When the installation completed and the server restarted, it did not boot into ESXi, where I could change the IP address and VLAN. Instead, after the initial boot sequence, I get the text shown in the attachment. I get a prompt that says Shell> and do not know why I cannot boot properly into ESXi 5.0. Thank you! Paul

    Hi Paul,

    Looks like you are booting to the EFI shell. What boot policy have you configured in this server's service profile? It should look like the one below. If there is a problem with the boot order, you should be able to type "exit" and press Enter at the EFI shell to leave the prompt. If your boot policy is similar to the one below and you still experience this issue, try to downgrade and re-ack the blade.

    Let me know if it helps.

  • BlackBerry Smartphones: help, BBM & Facebook not working on 9320 - problem with "Host Routing Table"? - Virgin

    So, today I received my new 9320 from Virgin Mobile, my first BlackBerry, and I love the phone! But I can't use BBM or the Facebook app...

    At first I could not even access the browser while connected to my Wi-Fi... then I phoned Virgin and they helped me reset the settings on my BB so that I could use the browser, etc...

    I thought the problem was solved, until I discovered that BBM, Facebook, and App World use a separate service? Anyway, I phoned Virgin again because I wanted to get this working; these are included in my package and I did not understand why I couldn't access any of them...

    After a long phone call, the problem turned out to be that the "Host Routing Table" was empty and (according to Virgin) there is a problem with new BlackBerrys receiving these details... They said this isn't a problem on their end and told me there is nothing more they can do, so I should click 'Register Now' and wait for the details...

    24 hours later and still nothing, so I hope someone here can help me. For me, the whole point of getting a BB is the things I'm actually missing, lol, and I now feel I'm paying for a phone contract I can't really use, without any help from my provider.

    Any help?

    Or

    Has anyone with a new BB experienced something like this recently? Thank you

    Wow... Virgin really let you down there. You see, you PAY them for 100% of your services and 100% of your formal support... at the moment, they seem to be delivering neither. Only they have the ability (in fact the RESPONSIBILITY!) to escalate to RIM the cases that require higher-level support (from your description, yours must be one... with an empty HRT, there is nothing anyone here can do). End users have no free path to receive assistance from RIM at all, only via escalation. So, what I would do if I were you is ring them back... but this time do not let them fob you off... insist that, because you PAY them, you have a contract with them and they are about to be in breach of it. They must solve your problem, escalating to RIM if need be.

    Good luck!
