VSAN stretched cluster failure scenarios

I read CHogan's valuable document on VSAN stretched clusters, but I think I need more help with the following questions.

With read locality, are reads served 50% from the site where the virtual machine is deployed and 100% from the replica site?

Do I have to install the witness at a separate geographical site, or will it work on one of the data sites in a different cluster?

Thank you

Read locality is 100% on the site where the virtual machine resides. In other words, the virtual machine always reads from its local site and does not cross the link between the sites.

You need a third site for the witness. You cannot place it in one of the two data sites, because if that site fails, you have lost the majority of the components (1 replica + 1 witness = 66%), so the virtual machine is no longer accessible.
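The majority argument above can be sketched in a few lines of Python. This is a simplified model, not VSAN's actual implementation: it assumes one vote per component (two replicas plus one witness), and the site names and the `object_accessible` helper are hypothetical.

```python
def object_accessible(votes_per_site, failed_sites):
    """Return True if strictly more than half of the votes survive.

    votes_per_site: dict mapping a site name to the number of component
    votes placed there (assumption: one vote per replica or witness).
    failed_sites: set of site names that are down.
    """
    total = sum(votes_per_site.values())
    surviving = sum(
        votes for site, votes in votes_per_site.items()
        if site not in failed_sites
    )
    # A strict majority is required, so exactly 50% is not enough.
    return surviving * 2 > total


# Witness on its own third site: each site holds one vote.
three_sites = {"site_a": 1, "site_b": 1, "witness_site": 1}

# Witness co-located with the replica at site_a: that site holds two votes.
witness_at_a = {"site_a": 2, "site_b": 1}

print(object_accessible(three_sites, {"site_a"}))   # one data site lost
print(object_accessible(witness_at_a, {"site_a"}))  # replica and witness lost
```

With the witness on a third site, any single site failure leaves two of three votes. With the witness co-located, losing that site leaves only one of three votes, which matches the 66% loss described in the answer.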

Tags: VMware

Similar Questions

What is a failover cluster?

I want to know the definition of a failover cluster and why we use it.

It's normally a network configuration in which, if a single machine in the cluster (or set of computers) fails, another machine takes over the work of the failed computer on a hot-standby basis (i.e., so that the work being processed by the failed machine is not lost).

It ensures that work is not lost.

Failure of the master node and the backup node in VSAN 6.0

Just curious as to what will happen to a VSAN cluster if the master node and the backup node both go down in a VSAN 6.0 cluster. Let's assume there are 10 ESXi hosts in this cluster.

This assumes that you have implemented FTT = 2, which means all your VMs survive this double failure.

Another host in the cluster will take over the master role, and another will take over the backup role.

This may take a little longer than normal, as the new master and backup need to learn the configuration and rebuild the directory contents (that is what the backup node already holds, to allow a quick conversion to master when the master goes down).

Other than that, there should be no side effects.
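As a rough illustration of why FTT = 2 covers the loss of two hosts, here is a minimal sketch. It assumes mirrored (RAID-1) objects, for which the commonly cited minimum host count is 2 x FTT + 1; the helper names are hypothetical, and the model ignores disk-level failures and rebuild timing.

```python
def min_hosts_for_ftt(ftt):
    """Minimum hosts for mirrored objects (assumption: RAID-1, 2 * FTT + 1)."""
    return 2 * ftt + 1


def tolerates(host_count, ftt, failed_hosts):
    """Simplified check: the cluster is large enough for the policy and the
    number of simultaneous host failures does not exceed FTT."""
    return host_count >= min_hosts_for_ftt(ftt) and failed_hosts <= ftt


# The 10-host cluster from the question, losing master and backup at once.
print(min_hosts_for_ftt(2))
print(tolerates(host_count=10, ftt=2, failed_hosts=2))
print(tolerates(host_count=10, ftt=1, failed_hosts=2))
```

The last call shows the caveat in the answer: with FTT = 1, the same double failure is not guaranteed to be survivable for every object.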

6-host VSAN cluster - I want to change the VSAN vmkernel IP

Hello
As the title says, I have a 6-host VSAN 6.2 cluster (with some VMs on the VSAN datastore, powered off right now). What is the best method to change the IP addresses of the VSAN vmkernel ports without data loss?
Has anyone done such a thing? Only the last octet will change, moving down slightly in number. No VLAN/subnet changes, etc. I just have to change the VSAN vmkernel and change the last octet.
Cheers
Paul.

I re-IPed hosts and their corresponding VSAN vmk IPs in maintenance mode, as you describe. With everything in maintenance mode, you can just go ahead and change it. I don't think there is any danger of data loss. If you make a mistake and start everything up, the health check would detect network partitions or other network problems, and you would have a non-working datastore until you fix the network problems.
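Before touching the hosts, it can help to sanity-check the renumbering plan offline. The sketch below uses only Python's standard `ipaddress` module; the host names, addresses, subnet, and the `validate_renumbering` helper are all hypothetical and stand in for the real plan.

```python
import ipaddress


def validate_renumbering(plan, subnet):
    """Return a list of problems found in a re-IP plan.

    plan: dict mapping host name -> (old_ip, new_ip) as strings.
    subnet: the VSAN vmkernel subnet in CIDR form (stays unchanged here,
    since only the last octet is meant to move).
    """
    net = ipaddress.ip_network(subnet)
    problems = []
    owners = {}  # new address -> first host that claimed it
    for host, (old_ip, new_ip) in plan.items():
        old_addr = ipaddress.ip_address(old_ip)
        new_addr = ipaddress.ip_address(new_ip)
        if old_addr not in net or new_addr not in net:
            problems.append(f"{host}: address outside {net}")
        if new_addr in owners:
            problems.append(
                f"{host}: {new_ip} already planned for {owners[new_addr]}"
            )
        owners.setdefault(new_addr, host)
    return problems


# Hypothetical plan: only the last octet changes.
plan = {
    "esx1": ("192.168.50.21", "192.168.50.11"),
    "esx2": ("192.168.50.22", "192.168.50.12"),
}
print(validate_renumbering(plan, "192.168.50.0/24"))
```

An empty list means every new address stays in the subnet and no two hosts were given the same address. It does not replace the VSAN health check after the change.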

Common causes of Mirage server failure in a clustered environment and how clients are switched to another server in the cluster

Hello
Can someone share information about common causes of Mirage server failure in a clustered environment?
And how will clients be switched to another server in the cluster to continue their operations after a server fails?
Kind regards
C Bathesha

In general, Mirage servers do not fail. It is very rare (unlike, for example, storage or endpoint problems, which are more common).

Possible causes include overload, too little memory, or a hardware malfunction.

After you have done standard server troubleshooting (system event log, etc.), you should file a Service Request with VMware.

Error "Host cannot communicate with all other nodes in the VSAN enabled cluster"

Hello community,
We have a problem (?).
We have a VSAN enabled cluster with four hosts. Everything seems perfect:
-the configuration is good,
-the VSAN status page displays "Network status: (green arrow) Normal",
-the disk management page displays "Status: healthy" for all of our disk groups,
-even 'esxcli vsan cluster get' on each host returns 'HEALTHY'.
But we have a little yellow exclamation mark on each host: "Host cannot communicate with all other nodes in the VSAN enabled cluster".
Anyone with the same problem? Anyone with an idea or a hint?
Thank you!

Updating vCenter to the latest version made the error disappear. Problem solved! The 'old' vCenter version had been running since September 2014, strange.

Thank you very much for your help!

HA - Admission Control - Number of host failures the cluster can tolerate

I currently have a 9-host cluster with "Number of host failures the cluster can tolerate" set to 1 and "Allow VMs to be powered on even if they violate..." checked.
When I look at the HA cluster box in my VI client, it says "Current failover capacity" 7 and "Configured failover capacity" 1.
Does the first setting really limit my cluster to just 1 host failure, or, if 3 of my hosts died, would the "Allow VMs to be powered on even if they violate..." setting override it? I know that you can set a maximum of 4 host failures, but doesn't that eat up resources on the other hosts?
I think I know the answer, but for some reason I keep second-guessing myself.
Thank you!

> Does that mean that since I have "Allow virtual machines to be powered on even if they violate availability constraints" set, it does not matter how many hosts fail? It will try to power on as many VMs as possible? Thank you

Exactly

ODI scenario failure when passing variables as parameters to child scenarios

I have defined a scenario in such a way that its variables pull data from configuration tables and pass those values as parameters to another scenario called from the parent scenario. I tested it in the DEV and TEST environments, which share the same topology with different work repositories. It worked well. But when I moved the code to the QA environment, I got the following error message while attempting to run the scenario with the following command:
Execution code:
OdiStartScen "-SCEN_NAME=XXXXX" "-SCEN_VERSION=-1" "-LOG_LEVEL=5" "-PROJECT_NAME.PASS=#PROJECT_NAME.V_PASS" "-PROJECT_NAME.DB_URL=#PROJECT_NAME.V_DB_URL" "-PROJECT_NAME.DB_SCHEMA=#PROJECT_NAME.V_DB_SCHEMA" "-PROJECT_NAME.DB_LINK=#PROJECT_NAME.V_DB_LINK" "-PROJECT_NAME.DB_USER=#PROJECT_NAME.V_DB_USER"
Error message:
oracle.odi.oditools.OdiToolInvalidParameterException: error when setting the parameters on the tool
at com.sunopsis.dwg.function.SnpsFunctionBase.getCoreOdiTool(SnpsFunctionBase.java:618)
at com.sunopsis.dwg.function.SnpsFunctionBase.getSunopsisApi(SnpsFunctionBase.java:494)
at com.sunopsis.dwg.dbobj.SnpSessTaskSql.executeOdiCommand(SnpSessTaskSql.java:1431)
at oracle.odi.runtime.agent.execution.cmd.OdiCommandExecutor.execute(OdiCommandExecutor.java:44)
at oracle.odi.runtime.agent.execution.cmd.OdiCommandExecutor.execute(OdiCommandExecutor.java:1)
at oracle.odi.runtime.agent.execution.TaskExecutionHandler.handleTask(TaskExecutionHandler.java:50)
at com.sunopsis.dwg.dbobj.SnpSessTaskSql.processTask(SnpSessTaskSql.java:2913)
at com.sunopsis.dwg.dbobj.SnpSessTaskSql.treatTask(SnpSessTaskSql.java:2625)
at com.sunopsis.dwg.dbobj.SnpSessStep.treatAttachedTasks(SnpSessStep.java:558)
at com.sunopsis.dwg.dbobj.SnpSessStep.treatSessStep(SnpSessStep.java:464)
at com.sunopsis.dwg.dbobj.SnpSession.treatSession(SnpSession.java:2093)
at com.sunopsis.dwg.dbobj.SnpSession.treatSession(SnpSession.java:1889)
at oracle.odi.runtime.agent.processor.impl.StartScenRequestProcessor$2.doAction(StartScenRequestProcessor.java:580)
at oracle.odi.core.persistence.dwgobject.DwgObjectTemplate.execute(DwgObjectTemplate.java:216)
at oracle.odi.runtime.agent.processor.impl.StartScenRequestProcessor.doProcessStartScenTask(StartScenRequestProcessor.java:513)
at oracle.odi.runtime.agent.processor.impl.StartScenRequestProcessor$StartScenTask.doExecute(StartScenRequestProcessor.java:1066)
at oracle.odi.runtime.agent.processor.task.AgentTask.execute(AgentTask.java:126)
at oracle.odi.runtime.agent.support.DefaultAgentTaskExecutor$2.run(DefaultAgentTaskExecutor.java:82)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.Exception: disagreement quote...!
at com.sunopsis.core.SnpsObject.extractParametersLine(SnpsObject.java:174)
at com.sunopsis.dwg.function.SnpsFunctionBase.getCoreOdiTool(SnpsFunctionBase.java:580)
... 18 more

The variable values for this new environment probably contain double quotes (") or a special character that is 'breaking' the command line. Use this ODI trick to see all the variable values used in this step, then test those values in a separate procedure with just literal values, not variables:

http://devepm.com/2014/02/28/execution-variables-trick-for-old-versions-of-ODI/

It should help.
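Once the resolved variable values are known, the check itself is simple. The sketch below is a standalone Python illustration, not ODI code: it assumes each parameter of the OdiStartScen call is wrapped in double quotes, so any double quote inside a value would end the argument early. The sample values and the `find_breaking_values` helper are hypothetical.

```python
def build_argument(name, value):
    """Wrap one parameter the way the command line in the question does."""
    return f'"-{name}={value}"'


def find_breaking_values(values):
    """Return the names of variables whose resolved value contains a
    double quote, which would unbalance the quoting of its argument."""
    return [name for name, value in values.items() if '"' in value]


# Hypothetical resolved values for the QA environment.
qa_values = {
    "DB_URL": "jdbc:oracle:thin:@qahost:1521:orcl",
    "DB_SCHEMA": "QA_STAGE",
    "PASS": 'se"cret',
}

for name in find_breaking_values(qa_values):
    print(name, "->", build_argument(name, qa_values[name]))
```

Any name printed here is a candidate for the quote error in the stack trace. Other special characters may need the same treatment, depending on how the agent parses the line.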

System failure, looking for answers

Need some help, guys. Had a major system failure last week, and I'm a bit confused as to what happened and why I was unable to recover. I lost a bit of my confidence in vSphere, and I'd like to restore it.
A bit about the environment: ESXi 5.0 U1, running a 3-node cluster on HP Gen8 servers. HP P4500 iSCSI SAN for the VMFS datastores. ESXi is installed on HP's CF cards in the servers. This cluster has been up and running for about 6 months, without a single hiccup before this.
On Wednesday we migrated 2 of our database VMs (SQL and Progress) to this 5.0 cluster; the virtual machines went onto different hosts in the cluster. They had been running on an older 2-node cluster with ESXi 4.1 U2. The machines migrated fine, the tools updates went fine, reboot, everything seemed fine.
About 2 hours after the migration, we started to receive calls that our Progress database was down and users could not connect. Then we started getting more calls: other machines were inaccessible. Glancing at vCenter, I could see that all the virtual machines in question were on the same host, and I was unable to open the console in vSphere for VMs on that host. The host showed as connected and showed all its VMs as connected, but I could not open the console or a remote desktop session to any of the VMs on that host. I started investigating, and sure enough this host had dropped all of its iSCSI SAN network connections. The paths showed as dead. The NIC ports showed as active and the switch ports showed activity, but the connections were down. I could still ping the host's management address, however, and the vMotion ports were up.
At this point, I tried to vMotion the Progress database VM off that host; it would not migrate and just sat at 8%, preparing to migrate. I tried other virtual machines with the same result. I started to wonder why HA had not kicked in, and why I couldn't move anything. At this stage the host started disconnecting from the cluster. I could still ping the host, but vSphere showed it as disconnected. I couldn't move my VMs, and I couldn't reach the host via vCenter, via the vSphere client pointed directly at the host, or by using the DCUI.
So I called VMware support and got an engineer on the line with me, and it became clear we were going to have to power cycle that host and crash every virtual machine running on it. That wasn't a very pleasant answer for me, because this Progress database is our main production system and I was afraid of corruption. We had no choice, so we did it. When the host came back up, fortunately the virtual machines came up fine. The VMware engineer dug through the logs and said that the cause was a particular NIC driver with known issues. He showed me the KB on this issue, and it did seem to be a known issue. We updated these drivers on all hosts, and that was that.
My problem with this is: how did the cluster run fine for 6 months without a problem, and how come the redundant path to the SAN did not keep the connection active when the path with the bad NIC driver failed? I have 2 different paths, on 2 different NICs, to the iSCSI SAN. The other card had no known driver issue. Why did both paths fail, rather than just one, because of a driver problem on one of the cards? Even more worrying: how is it that I could not migrate anything, and why did HA not kick in?
Sorry for the novel here, but without all the details it isn't much of a story. My biggest concern is why I couldn't move anything. In the event of a host failure, what are you supposed to do to migrate the machines if they won't migrate via the vSphere client? We were down for about 2.5 hours, and a lot of questions were thrown at me by senior management as to why my "highly available, redundant system" took hours to recover...
Any ideas, guys, thoughts on how I should have handled it differently, or reasons why I should be confident everything is fine now?
Thanks for your time
Kevin

trink408 wrote:

Thanks Matt.

I guess part of the problem is that I don't completely understand how HA or vMotion works. I was under the impression that once the virtual machine was unreachable, HA would kick in and move the virtual machine? The virtual machines were not responding to pings or accessible via remote desktop. I couldn't power them off or do anything through vCenter either.

If they had no usable storage link, this seems possible. HA simply does not address storage failures. Not at all. It IS possible to enable HA at the VM level, but it isn't on by default - it must be enabled on a per-VM basis. You should read Duncan Epping's book on this - it's the bible of HA.

So in case something happens to the storage connection, you really have no way to vMotion anything off, and your only option is to kill the host / the VMs running on it, and then migrate them or let HA move them?

Correct.

I didn't know that I wouldn't be able to vMotion a virtual machine if the host had lost its connection to storage. The other hosts in the cluster could see the storage.

The source host is holding these VMDKs as far as the other hosts are concerned (they see a lock on the files), and when they ask the host whether it is still alive, it answers (because it has not lost its network or powered down, which are the failure modes that HA is designed to handle). So they do not take over.

The reason it took so long was that I didn't kill the host, out of concern for the database. I was hoping that the VMware engineer might have a way to shut things down gracefully. So he spent some time looking around and trying to determine what was going on. Ultimately we just power cycled the host, so I could have done that much earlier, and if we face this again in the future, I will.

As long as you follow the vendor's advice for any decent database (keeping transaction logs, etc.) and have good backups, there's no real risk of data corruption. VMware does not significantly change the I/O path, so you have the same exposure as on a physical host.

I still do not understand why both paths were marked as dead and the host completely lost its connection to the SAN, and eventually showed as disconnected in vCenter.

Well, the host went offline because it got stuck in an all-paths-down scenario, which is common for 5.0. 5.1 addresses this problem somewhat. I don't know why all paths failed, but I suspect that you have a misconfiguration somewhere... normally you would expect at least 4 paths in a correctly set up LeftHand system. Check with HP to ensure that you follow best practices.

Maybe the whole IP stack got corrupted or something?

Possible, of course, but unlikely. I've never heard of it before.

I appreciate the help and the insight. I figured that the problem was a lack of understanding on my part, and I fully accept that.

HA and vMotion are complex. At least now you can go back to management and tell them why it went down (because it was a failure scenario outside the scope of the software) and maybe ask for money to build a proper SQL cluster. Definitely recommend Duncan Epping's book: http://www.amazon.com/VMware-vSphere-Clustering-Technical-Deepdive/dp/1463658133/ref=la_B002YJMRCY_1_2?ie=UTF8&qid=1363622697&sr=1-2

How to set the error cluster in a post-expression?

Hello

I created a C# driver that returns 0 for pass or -1 for failure from functions such as 'int MyFunction()'.

Now I use this function for test steps.

Question is: how can I use this return value to set the error cluster?

So that a -1 causes an error.

I think it can be done somehow in the post-expression.

How can I put in something like this:

if (returnvalue == -1)
{
Result.Error.Code = 10100
Result.Error.Msg = "an error has occurred."
Result.Error.Occurred = True
}

Thanks for help

Hi OnlyOne,

Check out this example (saved in TS 4.0)

The trick is done using a conditional expression and literal braces.

Locals.nReturnValue == -1 ? {Step.Result.Error.Code = 10100, Step.Result.Error.Msg = "Error occurred", Step.Result.Error.Occurred = True} : {}

Regards

Jürgen

How to use VI cluster elements in TestStand

I have a LabVIEW VI, which has an output cluster containing 10 strings and 10 Boolean values. In TestStand 4.1.1 I inserted a pass/fail test step in my test sequence and linked the LabVIEW VI to the pass/fail step. I saw all the strings and Boolean values listed separately in the parameter table under the Module tab. I wanted to have each of these cluster items listed in the report, so in the value fields I inserted Step.Result.ReportText, but the cluster elements were not included in the report.

I then tried turning the cluster into a custom data type, which was successful. In TestStand, under the Module tab of the LabVIEW adapter, I created a custom data type matching the VI's output cluster. If I go to the Variables pane in TestStand, I can see the custom data type, and each of the 10 strings and 10 Boolean values is present as an individual variable. I tried right-clicking on the cluster variables, selecting Properties, then going to Advanced and checking the PropFlags_IncludeInReport box, but the variable values did not appear in the report.

What I want to do is have each of the string values and Boolean values appear in the test report. So what am I doing wrong?

Hello

You can use Additional Results.

You assign the VI outputs to Locals, and then you can insert an Additional Result for this step. See the step settings.

Don't forget to insert your Local in the relevant field of the Additional Result.

http://forums.NI.com/NI/board/message?board.ID=330&message.ID=22838#M22838

The link above may also help

Regards

Ray Farmer

Expressway-E failure - there are network connectivity problems, or this peer has different encryption support (AES / non-AES software)

I have two sites and a cluster as below:

site - 1

CUCM pub-1

Pub 1 CUC

imppub

exp-c-1

exp-e-1

site-2

cucm2

CuC2

IMP2

exp-c-2

exp-e-2

!

I built clusters of Expressway-E and Expressway-C, but I see this error on the Expressway-E cluster:

Failure - there are network connectivity problems, or this peer has different encryption support (AES / non-AES software)

This Expressway is part of a cluster, but is not the configuration master. Configuration changes made on this Expressway may be lost. More information is on the Clustering help page.

!

What is the solution to fix it?

Should I cluster the Expressways locally at each site, or do I have to build a 4-node Expressway cluster across the locations?

How far apart are the Expressway peers, and what is the round-trip time? It should be within 30 ms.

Can you confirm that all Expressways in the cluster have the same option keys installed on all peers, as that is a requirement? Call license quantities can differ, but the option keys that enable/disable features must be the same. Additionally, make sure that the installed software version is the same; with a version mismatch, encryption could be active on one peer and not on the other.

Regarding the "This Expressway is part of a cluster, but is not the configuration master..." message: this is normal for a node that is not the master. It just tells you to make changes only on the master, since all changes on a slave will be overwritten by the master.

I suggest you look at the Expressway Cluster Creation and Maintenance Deployment Guide (X8.8) in case you have not yet.

ISE PSN node will not join the cluster

Hi all

Has anyone seen a problem where a PSN cannot join the cluster?

We join the PSN node:

-Node is registered successfully (sync in progress)

-1 hour later - node replication failure.

-Replication synchronization failed because the secondary database is down

I have a customer where the admin node and the PSN are separated by a firewall.

We allow the following in both directions:

Admin <-->PSN

ICMP

HTTPS

1521

The firewall is not showing drops.

DNS and NTP are ok.

Current topology is 1 PSN, 1 Admin node.

Works very well in our test lab, but not in the customer's environment.

Cheers

Peter.

Thank you for the update and good work on finding the solution! You should probably mark it as resolved now.

In addition, it is quite rare (at least for me) for ISE nodes to be separated by firewalls. There are a lot of ports/protocols that must be opened between them, so it is usually more of a pain to manage. In addition, ports sometimes change too. For example, the agent provisioning port was changed not too long ago...

Thanks for the note!

Data distribution problem in an Oracle NoSQL Database cluster

Hello
I write data (about 10 GB) into an Oracle NoSQL Database cluster (which consists of three nodes: node1, node2, node3). There is only one table in the DB, and I create an index on one field.
The amount of data on each node is about 16 GB.
Then I write the same data (about 10 GB) into an Oracle NoSQL Database with only one node, and I create an index on the same field.
But the amount of data is about 46 GB.
So I assume that each node does not hold the complete data. Then if a node fails, the cluster can still work, but some data cannot be queried. Am I wrong?
Is there any overlap in the data stored on each node?

You should take a look at the documentation. For example:

http://docs.Oracle.com/CD/NoSQL/HTML/AdminGuide/introduction.html

-mark

Host cannot communicate with all other nodes in the Virtual SAN enabled cluster

I get this error after applying the latest patch: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2135115
So all the hosts are now at build: VMware ESXi, 6.0.0 3380124
After the reboot, I get the error message that the host cannot communicate with all other nodes in the Virtual SAN enabled cluster...
However, the VSAN health check is green, and 'esxcli vsan cluster get' shows all 6 members of the cluster OK... and if I reboot a host, it comes up OK without the error... then if I reboot another host, it comes up without the error... but then the host I rebooted before displays the error...? So I can never get more than 1 host without the error after a reboot.
I checked multicast, which tests OK; in fact, all of the checks contradict the fact that I get this error...
Does anyone have any ideas? Could this be the latest patch...
Paul...

Hello, this has been reported here: 6.0 U1b - hosts cannot communicate. Thanks, Zach.
