Sunday, August 4, 2013

Understanding vmkernel routing

The vmkernel routing mechanism works largely the same as a standard Unix routing table, with a few twists.

Let's start with the basics of how routing works.  Routing is done based on the destination address.  That address is compared against each of the networks specified in the routing table, and if the network address and subnet mask match, the vmkernel sends the packet out the interface associated with that entry.  The last entry is the default gateway, which matches all remaining addresses.  You can view the vmkernel routing table using esxcfg-route -l (esxcli network ip route ipv4 list in 5.1).  To get a clear picture, I'm going to show the configuration of the vmkernel interfaces as well.
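
As a rough illustration (the addresses are made up for this example, and the exact column layout varies a little between builds), the output looks something like this:

    ~ # esxcfg-route -l
    VMkernel Routes:
    Network          Netmask          Gateway          Interface
    192.168.1.0      255.255.255.0    Local Subnet     vmk0
    192.168.2.0      255.255.255.0    Local Subnet     vmk2
    10.10.10.0       255.255.255.0    Local Subnet     vmk1
    default          0.0.0.0          192.168.1.1      vmk0
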
In this configuration, I have four vmkernel interfaces; they are labeled for management, iSCSI, vMotion, and heartbeat.  Keep in mind that how they are labeled has nothing to do with how traffic is actually sent.  Basic vmkernel routing is very simple: let's say the host wants to communicate with the IP 192.168.2.20.  Based on the routing table, this traffic will go out interface vmk2.  If, however, we want to communicate with 172.20.2.10, there is no local entry that matches, so we send it to the default gateway, and from there it will get to its destination (hopefully).  Notice that even though I have four interfaces, there are only three local entries in the routing table.  This is because I have two vmkernel interfaces on the same subnet.  When there are two interfaces on the same subnet, the first one created will always be the one used for outgoing traffic.  More on this later.
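
If you want to confirm which interface a given destination will actually use, vmkping in 5.1 can force a source interface with -I, so you can compare the default routed path with a specific vmk (the addresses here are just examples):

    ~ # vmkping 192.168.2.20            # goes out whichever vmk the routing table picks
    ~ # vmkping -I vmk2 192.168.2.20    # forces the ping out vmk2 specifically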

There can be only one (default gateway).  Despite the fact that there is a screen in the vSphere client that says "Default Gateways," and that it shows up on every vmkernel interface you create or edit, there is only one default gateway for the entire host.  The vmkernel does not do any type of dynamic routing, so any time you see this screen, you are editing the same default gateway.
So don't worry about the fact that the default gateway listed isn't on the same subnet as the interface you are editing.  Unless you are editing the management interface, it shouldn't be.
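
If you would rather view or change that one default gateway from the command line, esxcfg-route can do it (the gateway address below is just an example):

    ~ # esxcfg-route -l              # the "default" entry is the host's single default gateway
    ~ # esxcfg-route 192.168.1.254   # replaces the default gateway for the whole host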

Okay, that is the easy stuff; now let's look at what makes the vmkernel routing table a little more exciting.  We'll start with the vmkernel port group properties screen in the vSphere client.

See those checkboxes that say "Use this port group for..."?  You might think that if you check the one for vMotion, the vmkernel would send all vMotion traffic through this vmk, and if you select the one for management traffic, that vmk would be used for management traffic.  You would be mostly right, but it's not quite that simple.  Let's start at the top.
Use this port group for vMotion.  More accurately, what you are saying is "have vCenter tell the other host that this IP is the one to use when initiating a vMotion to this host."  That is a bit wordy, so I can see why they don't use it, and the difference is subtle.  Where it matters is if you have two interfaces on the same subnet, one used for vMotion and, for instance, one used for management traffic.  Part of your vMotion traffic might go out the wrong interface.  More on the two-interfaces-on-the-same-subnet issue at the bottom.
Use this port group for Fault Tolerance logging.  Mostly the same as the vMotion checkbox; however, be aware that what VMware means by "logging" is the vLockstep data that is sent from the primary VM to the secondary VM.  This is a lot of data, so plan accordingly.
Use this port group for management traffic.  This one is a bit misleading.  What exactly is "management traffic"?  It has nothing to do with the vSphere client or vCenter.  I can (and have) connected both the client and the vCenter server to the vMotion or iSCSI interface, as long as I can get to that subnet.  This can be very helpful when trying to resolve an issue where the connection to vmk0 was accidentally dropped.  What VMware means by "management traffic" is actually "HA traffic."  Any interface with this box checked will be used for HA heartbeats.  Aside from vmk0, I usually create a secondary heartbeat interface on the same subnet as my VMs, because really, that is the subnet I want to make sure is up.  Beyond that, I know of no other purpose for the "management traffic" checkbox.
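
If you want to see or set these roles without the GUI, 5.1 exposes them as interface tags in esxcli.  I'm going from memory on the exact tag names, so verify against your build before relying on this:

    ~ # esxcli network ip interface tag get -i vmk0            # show the roles assigned to vmk0
    ~ # esxcli network ip interface tag add -i vmk1 -t VMotion # mark vmk1 for vMotion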

None of these checkboxes affect the routing table.  They are only there to tell other hosts which IP to communicate with; they have nothing to do with how the vmkernel routes outgoing traffic.  The one possible exception is if you are using multi-NIC vMotion.  I have not done any experimenting with multi-NIC vMotion yet; I will save that for a future entry.

iSCSI Port Binding
One thing that can affect routing is iSCSI Port Binding.  If you are using software iSCSI and you bind a vmkernel interface to the iSCSI initiator, then how your iSCSI traffic is sent is handled by the iSCSI initiator, not the vmkernel routing table.  Just enabling iSCSI is not enough; you have to add the vmk to the iSCSI initiator using the Network Configuration tab under the storage configuration.
If you have not followed these steps to bind your interfaces to your iSCSI initiator, and you have more than one vmk configured for iSCSI, you are currently only using one of them.
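
The binding can also be done (and verified) from the command line with the networkportal commands; the adapter name vmhba33 below is just an example, use whatever your software iSCSI adapter is actually called:

    ~ # esxcli iscsi networkportal add -A vmhba33 -n vmk1   # bind vmk1 to the software iSCSI adapter
    ~ # esxcli iscsi networkportal add -A vmhba33 -n vmk3   # bind a second vmk for multipathing
    ~ # esxcli iscsi networkportal list -A vmhba33          # confirm which vmks are bound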

Now, the biggest cause of problems with vmkernel routing... (drumroll please...)

Two interfaces on the same subnet
If you have two vmkernel interfaces on the same subnet, the first one created will always be the one used to send traffic to that subnet, regardless of what the traffic type actually is.  This can lead to strange network issues, and it means you should follow one simple rule: every traffic type should have its own subnet.  For instance, one subnet each for management, vMotion, Fault Tolerance, vSphere Replication, and IP storage.  Only two of these traffic types support multiple vmk interfaces: vMotion and iSCSI.  I don't know much about multi-NIC vMotion yet, but iSCSI can have multiple vmks on the same subnet if they are bound to the iSCSI initiator.  If you break this rule, bad things can happen.  For instance, if you have management and iSCSI on the same subnet, your iSCSI traffic might end up going out your management vmk.  This can be very bad news if your iSCSI vmk is bound to a 10Gb NIC but your management vmk is bound to a 1Gb NIC.  This is a somewhat common problem on hosts that have been upgraded from ESX to ESXi.  Or let's say you have your iSCSI and vMotion vmks on the same subnet.  Only half of the vMotion connection will get created correctly (the other half accidentally going out the iSCSI interface) and you will get vMotion network errors.  There are plenty of other wacky ways that having multiple traffic types on the same subnet can cause networking problems.  I've seen a lot of them, but I am sure there are others that I haven't even imagined yet, so don't do it.
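
A quick way to catch this situation is to list the vmkernel interfaces and compare the IP and netmask columns, then check which single vmk the routing table actually points at for that subnet (these are the same commands used earlier):

    ~ # esxcfg-vmknic -l   # every vmk with its IP address and netmask
    ~ # esxcfg-route -l    # note which one vmk the shared subnet's route uses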
