Sunday, December 1, 2013

Playing around with NetFlow on the vSphere Distributed Switch

I'll start off this post by saying that I am not a NetFlow expert, so I have a pretty steep learning curve here.  As with most things, I learn by finding all the available material I can, and then playing around with it to see if how it works matches my expectations.  I found very little primary documentation on NetFlow in the vDS, so I had to do a lot of experimenting.

As of vSphere 5.1, the vDS supports NetFlow v10, also known as IPFIX.  A flow consists of packets with the same source and destination IP addresses, ports, and protocol.  There are two flows for every connection, one in each direction.  Basic information about each flow is sent to the collector, a third-party solution that gathers the flow data and turns it into useful information for the network administrator.  In the vDS implementation, the information provided to the collector is the number of octets and packets in each flow.
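NetFlow on the vDS is configured from the vSphere client, but you can peek at what the host actually received with the unsupported net-dvs command, which dumps the host-side view of the vDS configuration.  A minimal sketch (unsupported command, so the output format varies by build; treat this as exploration only):
# net-dvs -l
Somewhere in the output there should be an ipfix-related section showing the collector address, port, and sampling settings once NetFlow is enabled.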

Monday, October 14, 2013

Static vs Dynamic Port Binding on the vSphere Distributed Switch (vDS)

I've been playing around with port binding this week.  I will save ephemeral port binding for its own post, as there is a lot more to it.  I'll also save port binding on the ESXi host for a later post, and focus here on vCenter port binding.

Static port binding is the default.  With static port binding, a vnic is assigned a port when that vnic is placed into the portgroup.  As long as the vnic is in the portgroup, it is consuming a port, regardless of whether the VM is powered on.  By default, static portgroups are configured to be elastic, so more ports are added when you run out.  If they are fixed rather than elastic, you will get an error message when you try to add a vnic that exceeds the number of available ports.

Dynamic port binding functions more like a standard switch.  It has been deprecated in favor of static port binding with elastic growth.  With dynamic port binding, a vnic is not assigned a port until the virtual machine is powered on.  That binding is maintained until the VM shuts down; at that point, the port becomes available again.  The vnic will have a preference for getting the same port when it's powered on again, and other vnics will not use that port if they have another option, so it's sort of a "soft" static port binding.  The difference between the two does not become apparent until you use up all of your available ports.  First, dynamic portgroups are not elastic, so the number of ports does not grow.  Second, dynamic binding will allow you to assign more vnics to a portgroup than there are ports.  This means that if you consume all your available ports and power on a VM, it will not get a port, and you will not get a warning.  The only way you will know this occurred is that the VM does not communicate, and the Connected box under Edit Settings is unchecked.
If all of the available ports have been used at least once, but some are free due to powered-off VMs, then the vnic will get the next available port and "bump off" the vnic that was there before.  When the old vnic tries to connect in the future, it will look for a new port, and it will not have a preference for the old one, even if it is available.

The biggest reason that staying on the same port matters is gathering port-level statistics.  With static binding, vDS port statistics persist across vMotions, reboots, and any other action that does not change portgroups.

Friday, September 20, 2013

Neat trick for reaching an ESXi host when vmk0 is misconfigured

Last night, while reconfiguring my network after installing my new layer 3 switch, I had to change the IP address of vmk0 on my ESXi host.  I had to change the IP, default gateway, and VLAN all in one go.  Apparently that doesn't actually work: after making the changes and hitting Save, I could not ping the host at its new IP address.  I have no local monitor on the host, and for some reason the IPMI console isn't working (that is a later troubleshooting step).

I was pretty sure I knew what was wrong, but how could I get into the host to find out?  I do have ssh enabled, but of course I couldn't reach vmk0.  So my next thought was: how can I reach vmk1 (the iSCSI interface)?  When you enable ssh, it's enabled on all interfaces, not just the management interface.  I didn't have any VMs on the storage subnet, but it turns out I can ssh into my Synology Diskstation.  From there, I was able to ssh into the iSCSI interface on the ESXi host.  Sure enough, only the IP address had been updated; the VLAN change and the default gateway had not.  After a quick change from the command line I was up and running:
# esxcfg-vswitch -v 30 -p "Management Network" vSwitch0    (set the Management Network portgroup's VLAN to 30)
# esxcfg-route 10.10.30.1    (replace the default gateway)
I've used that trick before when I was working as an Escalation Engineer for VMware.  It's a very handy trick and saves a lot of time by not having to hook up a console (or in one customer's case, drive 30 miles to the datacenter).  Of course, remote consoles are the better answer, as this trick only works if ssh is already enabled.

Saturday, August 31, 2013

What else you need to study for the VCP-DCV after taking the vSphere Install Configure Manage Class

Toward the last day of an ICM (Install, Configure, Manage) class, I am often asked what is going to be on the exam, and why everything on the exam is not in the class.  There isn't enough time to teach everything needed for the exam unless that was all we focused on: passing the exam.  If you want classwork to cover all of the exam topics, you have two choices: take the Fast Track class (five 10-hour days), or take the ICM plus the What's New class (a 2-day course).  The What's New class covers upgrading from a previous version, advanced storage and networking, and Auto Deploy.

Once you have taken the ICM (or perhaps even before), I recommend that you download the Exam Blueprint and take the Practice Exam on the certification page.  If you are interested in more advanced practice exams, check out VMware's new official practice exam partner, MeasureUp.

Here are the subjects that are not covered at all in the ICM class:
Studying those four additional topics should cover the majority of the exam questions.  However, I still recommend that you use the Practice Exam, MeasureUp, the certification book, or study websites to make sure that your knowledge is complete before taking the exam.  You can create a lab in a box using LabGuides' AutoLab.

The best web resource that I know of for study is Damian Karlson's VCP5 Resources page.

If you come across a subject that you want to study in more detail (detail beyond what you might need for the exam), I would start with Sean Crookston's VCAP-DCA page.

Monday, August 26, 2013

Updates from VCI Day 2013 at VMworld

Yesterday was VCI day at VMworld.  I was lucky enough to be able to fit it into my schedule, and I got a day to explore San Francisco to boot.  We got an overview of some of the new releases coming out, most of which you can read about directly from VMware.  Here are a few things that stood out to me from the technical discussion.
  • Faster performance in the web client, and a new warning in the Windows client that it is going away (we all knew that was coming).
  • Drag and Drop  in the web client.
  • vSphere Flash Read Cache and vSAN
  • SSO has been completely rewritten under the hood.
  • The vCenter Virtual Appliance embedded database now supports up to 500 hosts and 5000 VMs
  • Packet capture from the command line now supports vnics and vmnics in addition to vmks (see the sketch below).
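Assuming the tool in question is pktcap-uw, which showed up in the 5.5 builds (I haven't had a chance to verify every flag myself, so consider this a sketch), a capture at each of those layers looks roughly like this.  The switchport number below is made up; the real one comes from esxtop's network view:
# pktcap-uw --vmk vmk0 -o /tmp/vmk0.pcap
# pktcap-uw --uplink vmnic0 -o /tmp/vmnic0.pcap
# pktcap-uw --switchport 50331660 -o /tmp/port.pcap
The resulting .pcap files can be copied off the host and opened in Wireshark.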
In addition to the updates to tech, there are some updates to the certification program.  VMware has announced their new entry-level cert, the VCA.  There is a 3-hour online class and an online test to get the cert.  The VCA is intended to show "I'm familiar with Cloud/Virtual Desktop/Virtualization technology," not "I am qualified to work on it."  If you are at #vmworld, you can get a discount and check it out while you are there.

There is something else new in the certification world: VMware is partnering with measureup.com as their official VMware Practice Exam Partner.  I will take the opportunity in the next few weeks, as I am studying for my Cloud cert, to test out the program and report back.

Finally, VMware has introduced a new tool to help navigate online and instructor-led training: VMware Learning Paths.  This site, along with VMware Learning Videos and VMware Certification Videos, can help you get the most out of your training dollars and your time with an instructor.

Sadly, I am not attending the rest of VMworld this year, but there is lots to follow on Twitter and Google Plus using #vmworld.

On a personal note: for years, when someone would give me directions that included "up," like "go up to Wadsworth," I would inform them that up is not a direction; it does not tell me where to go unless I'm in a helicopter.  Well, on Saturday I learned that in San Francisco, "up" is in fact a direction.


Monday, August 12, 2013

Changes to ping command in ESXi 5.1

Along with the changes mentioned in my last post related to how ESXi responds to a ping, there are also some changes to the ping command.

There are two new options:
-I <interface> outgoing interface - for IPv6 scope or IPv4
      (IPv4 advanced option; bypasses routing lookup)
-N <next_hop>  set IP_NEXTHOP - requires -I option
      (IPv4 advanced option; bypasses routing lookup)

The -I option has been around for a while, but it never did what people thought it did: it had no effect for IPv4.  Now it does affect IPv4, so you can select the outgoing interface for a ping rather than relying on the routing table.  This gives you the ability to test whether the second interface of a multipathing group has connectivity.
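For example, using the lab addresses from my last post (vmk1 and vmk2 both on the 10.10.30.0/24 iSCSI subnet, array at 10.10.30.50), you can now force the test out the second interface:
# ping -I vmk2 10.10.30.50
If this fails while the same ping out vmk1 succeeds, the path behind vmk2 is the problem.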

The second option, -N, I have not played around with yet, but it appears to allow you to specify the next hop, effectively temporarily adding a router for your destination.  I will update this article when I've had a chance to play around with it; if anyone else has had a chance to experiment with that option, let me know what the results were in the comments.
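Based purely on the help text above (untested, and the router and destination addresses here are invented), I would expect the syntax to look something like this:
# ping -I vmk0 -N 10.10.20.254 192.168.50.10
That should send the echo request out vmk0 with 10.10.20.254 as the next hop, regardless of what the routing table says.  If that reading turns out to be wrong, I'll fix it when I update this post.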

Sunday, August 11, 2013

Changes to ICMP Ping response in 5.1

In ESXi 5.1, the default ping behavior has changed from previous releases.  According to KB 2042189, "ICMP Echo replies are now only sent back out the same interface that the Echo Request was received on."  What exactly does that mean, and why does it matter?

Let's say we have three interfaces:
mgmt   vmk0  10.10.20.12
iSCSI1 vmk1  10.10.30.12
iSCSI2 vmk2  10.10.30.13

The iSCSI array is at 10.10.30.50, and both vmk's are bound to the iSCSI initiator.  Since pings are not iSCSI traffic, they are not handled by the initiator; instead they are handled by vmkernel routing.  In previous versions, this meant all ping replies went out vmk1, since it was the first interface in the routing table.  Even if you pinged vmk2, the reply would go out vmk1.  This isn't usually a problem, but what happens if the vmnic that vmk1 is bound to goes down?  vmk2 is still up, but cannot reply to pings.  It isn't very often that this will cause an issue, but there are times when it does.
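To make the change concrete, here is what I would expect when pinging the second iSCSI interface from another box on the 10.10.30.0/24 subnet:
# ping 10.10.30.13
Before 5.1, the reply leaves vmk1 (10.10.30.12), the first matching interface in the routing table.  In 5.1, the reply leaves vmk2, the interface the request arrived on.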

This change might also cause problems if you have your routing tables set up in such a way that the packets don't follow the same route in both directions.  For instance:
[diagram: a ping whose reply path per the routing table (green) differs from the path the request arrived on (red)]
In previous versions of ESXi, the response would have followed the routing table (the green path in the diagram) even though this was not the same path the request arrived on.  With 5.1 (the red path), the reply has to go out vmk1, and because there is no way to reach the source address from there, the ping will fail.  Generally you won't see this type of configuration, and you really shouldn't be testing interfaces other than management from outside their subnets, but it does happen sometimes.

This is a relatively minor change, but it could cause some unexpected results, so it is important to be aware of it.

See also Changes to ping command in ESXi 5.1.

Sunday, August 4, 2013

Understanding vmkernel routing

The vmkernel routing mechanism works largely the same way as any standard Unix routing table, with a few twists.

Let's start with the basics of how routing works.  Routing is done based on the destination address.  That address is compared with all of the networks specified in the routing table.  If there is a match, based on the network address and the subnet mask, the vmkernel will use that interface to send the packet out.  The last entry is the default gateway, which matches all remaining addresses.  You can view the vmkernel routing table using esxcfg-route -l (esxcli network ip route ipv4 list in 5.1).  In order to give a clear picture, I'm going to show the configuration of the vmkernel interfaces as well. (Click on the image for a larger version.)
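In case the screenshot doesn't come through, here is a routing table of the same shape, with invented addresses chosen to match the discussion below (the output format is approximated from memory):
VMkernel Routes:
Network          Netmask          Gateway       Interface
10.10.20.0       255.255.255.0    Local Subnet  vmk0
192.168.1.0      255.255.255.0    Local Subnet  vmk1
192.168.2.0      255.255.255.0    Local Subnet  vmk2
default          0.0.0.0          10.10.20.1    vmk0
In this invented example, vmk3 shares a subnet with one of the other interfaces, which is why four interfaces produce only three local entries.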
In this configuration, I have four vmkernel interfaces, labeled for management, iSCSI, vMotion, and heartbeat.  Keep in mind that how they are labeled has nothing to do with how traffic is actually sent.  Basic vmkernel routing is very simple: let's say the host wants to communicate with the IP 192.168.2.20.  Based on the routing table, this traffic will go out interface vmk2.  If, however, we want to communicate with 172.20.2.10, because we don't have a local entry that matches, we send it to the default gateway, and from there it will get to its destination (hopefully).  Notice that even though I have four interfaces, there are only three local entries in the routing table.  This is because I have two vmkernel interfaces on the same subnet.  When there are two interfaces on the same subnet, the first one created will always be the one used for outgoing traffic.  More on this later.

There can be only one (default gateway).  Despite the fact that there is a screen in the vSphere client that says "Default Gateways," and that it shows up for every vmkernel interface you create or edit, there is only one default gateway.  The vmkernel does not do any type of dynamic routing.  So any time you see this screen, you are always editing the same default gateway.
So don't worry about the fact that the default gateway listed isn't on the same subnet as the interface you are editing.  Unless you are editing the management interface, it shouldn't be.

Okay, that is the easy stuff.  Now let's look at what makes the vmkernel routing table a little more exciting.  We'll start with this screen.

See those checkboxes that say "Use this port group for"?  You might think that if you check the one for vMotion, then the vmkernel would send all vMotion traffic through this vmk, and if you select the one for management traffic, then that vmk would be used for management traffic.  You would be mostly right, but it's not quite that simple.  Let's start at the top.
Use this port group for vMotion.  More accurately, what you are saying is "have vCenter tell the other host that this IP is the one to use to initiate a vMotion."  That is a bit wordy, so I can see why they don't use it, and the difference is subtle.  Where it matters is if you have two interfaces on the same subnet: one used for vMotion and, for instance, one used for management traffic.  Part of your vMotion traffic might go out the wrong interface.  More information on the two-interfaces-same-subnet issue at the bottom.
Use this port group for Fault Tolerance logging.  Mostly the same as the vMotion checkbox; however, be aware that what VMware means by "logging" is the vLockstep data that is sent from the primary VM to the secondary VM.  This is a lot of data; plan accordingly.
Use this port group for management traffic.  This one is a bit misleading.  What exactly is "management traffic"?  It has nothing to do with the vSphere client or vCenter.  I can (and have) connected both the client and the vCenter server to the vMotion or iSCSI interface, as long as I can get to that subnet.  This can be very helpful when trying to resolve an issue where the connection to vmk0 was accidentally dropped.  What VMware means by "management traffic" is actually "HA traffic."  Any interface with this box checked will be used for HA heartbeats.  Aside from vmk0, I usually create a secondary heartbeat interface on the same subnet as my VMs, because really, that is the subnet I want to make sure is up.  Aside from this, I know of no other purpose for the "management traffic" checkbox.

None of these checkboxes affect the routing table.  They are only there to tell other hosts which IP to communicate with; they have nothing to do with how the vmkernel routes outgoing traffic.  The one possible exception is if you are using multi-NIC vMotion.  I have not done any experimenting with multi-NIC vMotion yet; I will save that for a future entry.

iSCSI Port Binding
One thing that can affect routing is iSCSI port binding.  If you are using software iSCSI and you bind a vmkernel interface to the iSCSI initiator, then how your iSCSI traffic communicates is handled by the iSCSI initiator, not the vmkernel routing table.  Just enabling iSCSI is not enough; you have to add the vmk to the iSCSI initiator using the Network Configuration tab under the storage configuration.
If you have not followed these steps to bind your interfaces to your iSCSI initiator, and you have more than one vmk configured for iSCSI, you are currently only using one of them.
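You can also do the binding from the command line in 5.x.  A minimal sketch, assuming the software iSCSI adapter is vmhba33 (check the adapter list for the real name on your host):
# esxcli iscsi adapter list
# esxcli iscsi networkportal add --adapter vmhba33 --nic vmk1
# esxcli iscsi networkportal add --adapter vmhba33 --nic vmk2
# esxcli iscsi networkportal list --adapter vmhba33
The last command should show both vmks bound to the initiator.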

Now, the biggest cause of problems with vmkernel routing... (drumroll please...)

Two interfaces on the same subnet
If you have two vmkernel interfaces on the same subnet, the first one created will always be the one used to send traffic to that subnet, regardless of what the traffic type actually is.  This can lead to strange network issues, and means you should follow one simple rule: every traffic type should have its own subnet.  For instance, one subnet each for management, vMotion, Fault Tolerance, vSphere Replication, and IP storage.  Only two of these traffic types support multiple vmk interfaces: vMotion and iSCSI.  I don't know much about multi-NIC vMotion yet, but iSCSI can have multiple vmks on the same subnet if they are bound to the iSCSI initiator.

If you break these rules, bad things can happen.  For instance, if you have management and iSCSI on the same subnet, your iSCSI traffic might end up going out your management vmk.  This can be very bad news if your iSCSI vmk is bound to a 10Gb NIC but your management vmk is bound to a 1Gb NIC.  This is a somewhat common problem on hosts that have been upgraded from ESX to ESXi.  Or let's say you have your iSCSI and vMotion vmks on the same subnet.  Only half of the vMotion connection will get created correctly (the other half accidentally going out the iSCSI interface) and you will get vMotion network errors.  There are more wacky ways that having multiple traffic types on the same subnet can cause problems with networking.  I've seen plenty, but I am sure there are others that I haven't even imagined yet, so don't do it.
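A quick way to audit a host for accidental same-subnet vmks from the 5.x shell:
# esxcli network ip interface ipv4 get
This prints one line per vmk with its address and netmask; any two lines that land on the same network (other than vmks deliberately bound to the iSCSI initiator) are a problem waiting to happen.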

Tuesday, July 30, 2013

Troubleshooting a virtual network

In this entry I'm going to discuss my methodology for troubleshooting a virtual network.  As a Network Escalation Engineer for VMware, I had many occasions both to troubleshoot network issues and to teach others how to do so.  If you have ever been through one of my whiteboard lessons, you may recognize some of the diagrams below.  This entry will probably evolve over time, and for clarity, I will be posting updates in the text, rather than at the end.

For this discussion, I will be using the following diagram, representing the basic idea of a virtual network connected to a physical network.  I am mostly going to focus on standard switches and standard VLANs; I won't cover the Nexus 1000v, vCNI, VXLAN, or any advanced features, although the basic ideas won't change much.  Click on any of the images below for a larger version.

The diagram has two ESXi hosts, each with two VMs.  All the VMs are connected to the same portgroup.  Each host is connected to two switches for redundancy.  The two physical switches are then connected to a root switch, which is connected to a router (or may in fact be a router).

Troubleshooting networking: sometimes you have to think outside the network

Josh Townsend's recent post on vmtoday, "PCoIP Packet Loss? Don't Blame the Network," is a fantastic example of troubleshooting.  In addition, it illustrates something I ran into frequently as an Escalation Engineer: just because a problem is exhibiting the signs of a network issue does not mean the network is at fault.  Once you have eliminated the possible network problems, you have to be willing to look at other areas that might cause similar symptoms.
A very good example of that is packet loss.  In addition to network problems (and the very occasional insidious vmkernel problem), anything that causes a vm to pause, even momentarily, can cause it to drop packets.
Storage is one possibility that can cause this, as Josh pointed out.  Another possibility is too many vCPUs.  While not exactly a pause, if the VM can't get all of its processors scheduled, it's not going to be able to pull all of the packets off of the ring buffer, and they get dropped.  Relaxed coscheduling helps with this, but does not eliminate it.  I can't count the number of cases that were resolved by reducing the number of vCPUs from 4 to 2. (Be aware of HAL/kernel compatibility when changing between 1 and 2 vCPUs.)  Another issue that can cause packet loss and VM pausing is the CDROM drive.  If the mounted ISO is not available (there can be multiple reasons for this) but the operating system keeps trying to access it, this will lead to small, frequent pauses, often resulting in every other ping being dropped.  A common cause of this on ESX(i) 5.0 and below is mounting an ISO on a VMFS datastore and then vMotioning a VM to a 9th host that accesses that image; VMFS only supports 8 hosts accessing the same read-only file at one time.
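When I suspect one of these non-network causes, a quick pass through esxtop usually narrows things down (counter names from memory, so double check against your build):
# esxtop
Press n for the network view and watch %DRPRX on the VM's port to confirm the drops are real, then press c for the CPU view; a high %CSTP on a multi-vCPU VM points at coscheduling trouble rather than the network.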

Related information:
KB 1015797
KB 1005362
KB 1010184

Monday, July 29, 2013

vsish for networking

vsish, or the vmkernel system information shell, provides behind-the-curtain information on the running vmkernel, similar to the way /proc provides information on a running Linux kernel.  For more information, see What is VMware vsish? from William Lam over at VirtuallyGhetto.

First, a word of caution: vsish is not supported unless you are directed to use it by VMware Support.  Do not make uneducated changes to the vmkernel, or you can significantly reduce performance or cause a purple screen.

I am going to focus specifically on network nodes that can be helpful for retrieving information.


  • /vmkModules/cdp: CDP information for vmnics
  • /net/pNics/vmnicX/stats: vmnic statistics
  • /net/pNics/vmnicX/properties: driver/firmware information and other properties
  • /net/tcpip/v4/neighbors/: ARP cache information
  • /net/portsets/vSwitchX/ports/#####/: X is the vSwitch number, ##### is the port number from esxtop
    • status: information on the port, including what device it is connected to
    • stats: standard switch counters
    • clientStats: counters from the vnic's perspective
    • teamUplink: which uplink this port is bound to
    • vmxnet3/rxSummary: additional receive counters for the vmxnet3
    • vmxnet3/txSummary: additional transmit counters for the vmxnet3
  • /net/portsets/DvsPortset-#/ports/######/: same as above, for the vDS
  • /system/heaps/NetPktHeap/######: current NetPktHeap status; below 30% free of max size, problems may occur.  Check both high and low heaps, but low is more important. (This is usually only a problem in 4.0 with more than two 10Gb NICs.)
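To actually read these nodes, you can browse interactively or grab a single node straight from the shell.  A quick sketch (the port number here is made up; pull the real one from esxtop's network view):
# vsish
/> cat /net/pNics/vmnic0/stats
/> exit
# vsish -e get /net/portsets/vSwitch0/ports/50331652/status
Interactive mode also supports ls and cd, so you can explore the tree the same way you would a filesystem.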

Sunday, July 28, 2013

First Post

Like many of my colleagues, I have decided to start writing a blog, mostly about VMware, although I may cover some other subjects on occasion.  I am going to try to stay away from how-to information, as that is posted on other sites that do a better job than I could, and instead focus on what is going on under the hood, and on troubleshooting.  I often find myself thinking "I wonder how this works."  This blog will record those explorations.  Of course, initial intentions and how things evolve are often different, so don't hold me to this.