As of vSphere 5.1, the vDS supports Netflow v10, also known as IPFIX. A flow consists of packets with the same source and destination IP addresses, ports, and protocol. There are two flows for every connection, one in each direction. Basic information about each flow is sent to the collector, a third-party solution that gathers the flow data and presents it in a useful form to the network administrator. In the vDS implementation, the information provided to the collector is the number of octets and packets in each flow.
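To make the flow definition concrete, here is a minimal sketch of how packets could be grouped into flows and counted. The `FlowKey` type, the `aggregate_flows` function, and the sample addresses are all made up for illustration; this is not how the vDS implements it internally.

```python
from collections import defaultdict
from typing import NamedTuple


class FlowKey(NamedTuple):
    """The five fields that identify a flow (hypothetical type)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # IP protocol number, e.g. 6 for TCP


def aggregate_flows(packets):
    """Sum packets and octets per flow key.

    `packets` is an iterable of (FlowKey, size_in_octets) pairs.
    """
    flows = defaultdict(lambda: {"packets": 0, "octets": 0})
    for key, size in packets:
        flows[key]["packets"] += 1
        flows[key]["octets"] += size
    return dict(flows)


# One SSH connection shows up as two flows, one per direction.
client_to_server = FlowKey("10.0.0.5", "10.0.1.9", 50514, 22, 6)
server_to_client = FlowKey("10.0.1.9", "10.0.0.5", 22, 50514, 6)
sample_packets = [
    (client_to_server, 100),
    (server_to_client, 1500),
    (client_to_server, 60),
]
flows = aggregate_flows(sample_packets)
```

The per-flow packet and octet totals are the two counters that the vDS reports to the collector.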
Configuring Netflow on a vDS is a two-step process. First it has to be configured in the general vDS settings, and then it has to be enabled on each portgroup that you want to track flows on. In the web client, the configuration is done from vCenter > Networking > dvSwitch > Manage > Settings > Netflow. Click Edit to open the settings dialog, which has the following fields.
IP address: The first field is the IP address of the collector. This IP needs to be routable from the vmkernel.
Port: The second field is the port of the collector. The standard value is 2055; however, I have seen other values, such as 9996. You can generally get this information from your netflow collector.
Switch IP address: This is the address that will be used as the src IP when packets are being sent to the collector. The collector uses this to determine which device is sending the flows. The src IP configured for the vDS does not matter at all in terms of how the packet is routed. It will be routed based on the vmkernel routing table and sent out the appropriate vmkernel interface (but with the src IP set to the one configured). If there is no src IP configured, then it is going to be sent from the appropriate vmkernel IP for that destination. This means that the collector would see each host on the vDS as a separate device.
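If you want to check what actually arrives at the collector address and port, a small listener can help. The sketch below relies on the 16-byte IPFIX message header layout from RFC 7011 (version, length, export time, sequence number, observation domain ID). The function names are my own, and `listen` assumes the default port of 2055. It only decodes the header, not the flow records.

```python
import socket
import struct
from typing import NamedTuple

# IPFIX message header per RFC 7011: five fields, 16 bytes, network byte order.
IPFIX_HEADER = struct.Struct("!HHIII")


class IpfixHeader(NamedTuple):
    version: int
    length: int
    export_time: int
    sequence: int
    observation_domain_id: int


def parse_ipfix_header(datagram: bytes) -> IpfixHeader:
    """Decode the message header and reject anything that is not version 10."""
    if len(datagram) < IPFIX_HEADER.size:
        raise ValueError("datagram shorter than an IPFIX header")
    header = IpfixHeader(*IPFIX_HEADER.unpack_from(datagram))
    if header.version != 10:
        raise ValueError(f"not IPFIX: version {header.version}")
    return header


def listen(port: int = 2055) -> None:
    """Print the sender address and header of each datagram (runs forever)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind(("0.0.0.0", port))
        while True:
            datagram, (exporter_ip, _) = sock.recvfrom(65535)
            print(exporter_ip, parse_ipfix_header(datagram))
```

Calling `listen()` on the collector host should print one line per export packet. The `exporter_ip` value is the source address of the datagram, so it should match the switch IP address if one is configured, or the individual host vmkernel IPs if not.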
Active flow export timeout: An active flow is one that is still sending data. Even while the flow is still active, the vDS will send the flow data to the collector every 60 seconds by default. This value is configurable from 60 to 3600 seconds.
Idle flow export timeout: An idle flow is one that is no longer sending data. After 15 seconds (the default) with no additional traffic, the flow data will be sent to the collector. This value is configurable from 10 to 600 seconds.
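The two timeouts can be thought of as a simple decision made for each tracked flow. This is a rough model of the behavior described above, using the default values. The `FlowState` class and `export_reason` function are hypothetical, and the order in which the two checks run is my assumption, not something taken from the vDS.

```python
from dataclasses import dataclass
from typing import Optional

ACTIVE_TIMEOUT = 60  # seconds, default active flow export timeout
IDLE_TIMEOUT = 15    # seconds, default idle flow export timeout


@dataclass
class FlowState:
    last_export: float  # when the flow was created or last exported
    last_packet: float  # when the most recent packet was seen


def export_reason(flow: FlowState, now: float,
                  active_timeout: float = ACTIVE_TIMEOUT,
                  idle_timeout: float = IDLE_TIMEOUT) -> Optional[str]:
    """Return why a flow should be exported at time `now`, or None."""
    # No traffic for the idle timeout: the flow is considered finished.
    if now - flow.last_packet >= idle_timeout:
        return "idle"
    # Still sending, but the active timeout has passed since the last export.
    if now - flow.last_export >= active_timeout:
        return "active"
    return None
```

For example, a flow that last exported at t=0 and saw a packet at t=58 would be exported at t=60 for the "active" reason, while one whose last packet was at t=10 would be exported at t=25 as "idle".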
Sampling rate: The manual says "The sampling rate determines what portion of data NetFlow collects, with the sampling rate number determining how often NetFlow collects the packets. A collector with a sampling rate of 2 collects data from every other packet. A collector with a sampling rate of 5 collects data from every fifth packet." More on this in a moment.
Process internal flows only: An internal flow is one that never gets to the physical network. This generally means it's between VMs on the same VLAN, same vDS, same host. These are flows that would not be picked up by a physical switch sending flow data to the collector. So if you have a physical switch configured for netflow, the vDS only needs to collect internal flows.
Once Netflow is configured for the vDS, it needs to be enabled on each portgroup. Flows appear to be collected based on the output interface. If you only enable it for the VM portgroup, you will only see flows going to the VM. So don't forget to enable it for the Uplink portgroup as well.
At this point in time, I am a little unclear on how the sampling rate works. Most online documentation indicates that the sampling rate means we collect 1 out of x packets. So if it is set to 2, then we should collect 1 out of 2 packets, or 50%. My analysis doesn't bear this out, however. I set up two Linux VMs on two different portgroups, enabled netflow, and examined the netflow packets that were sent to the collector. For each test I would ssh from one VM to the other, cat a couple of files, then disconnect. I did this twice at each of the sampling rates 0, 1, 2, and 10, and here is the data that I collected.
| Trial | Sampling rate | Octets | Octets (% of rate 0) | Packets | Packets (% of rate 0) |
|---|---|---|---|---|---|
| 0-1 | 0 | 742457 | -- | 243 | -- |
| 0-2 | 0 | 743041 | -- | 247 | -- |
| 1-1 | 1 | 390612 | 52% | 122 | 50% |
| 1-2 | 1 | 327876 | 44% | 122 | 50% |
| 2-1 | 2 | 355716 | 48% | 82 | 34% |
| 2-2 | 2 | 191288 | 26% | 80 | 33% |
| 10-1 | 10 | 57164 | 7.7% | 23 | 9.4% |
| 10-2 | 10 | 72480 | 9.8% | 23 | 9.4% |
Each "Percentage" column shows the value as a percentage of the result at a sampling rate of 0. To me, this seems to indicate a couple of things. First, "Sampling rate" appears to be 1:x rather than 1 out of x. For instance, at a sampling rate of 0, we are sampling 1, dropping 0. At a sampling rate of 1, we are sampling 1, dropping 1. At a sampling rate of 10, we are sampling 1, dropping 10 (vs sampling 1, dropping 9, as would be indicated by 1 out of 10). Second, the octet counts can vary widely at a given sampling rate. We are collecting the same number of packets, but the amount of data in the sampled packets can change from one run to the next.
One final note: in looking at the packet captures, I do not see anything in the IPFIX data to indicate the sampling rate, so I am unclear on how the collector is going to know to adjust the data to account for it. If you know the answers to any of these questions, please email me or post in the comments. Otherwise I will see what additional information I can gather and update this article.
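To compare the two readings of the sampling rate against the packet counts in the table, here is a small sketch. It uses the average of the two rate-0 runs as the baseline, which is my choice, and treats a rate of 0 or 1 under the "1 out of x" reading as "keep everything", which is also an assumption. The function names are made up for this example.

```python
def one_in_n(rate: int) -> float:
    """Documented reading: keep 1 packet out of every `rate` packets."""
    # Assumption: a rate of 0 or 1 means every packet is kept.
    return 1.0 if rate <= 1 else 1.0 / rate


def one_to_n(rate: int) -> float:
    """Reading suggested by the table: keep 1 packet, then skip `rate`."""
    return 1.0 / (rate + 1)


# Packet counts from the table above.
BASELINE_PACKETS = (243 + 247) / 2
OBSERVED_PACKETS = {1: (122, 122), 2: (82, 80), 10: (23, 23)}


def measured_fraction(rate: int) -> float:
    """Average sampled packet count at `rate`, as a fraction of the baseline."""
    counts = OBSERVED_PACKETS[rate]
    return sum(counts) / len(counts) / BASELINE_PACKETS


def closer_model(rate: int) -> str:
    """Name the reading whose prediction is nearer to the measurement."""
    measured = measured_fraction(rate)
    err_in = abs(measured - one_in_n(rate))
    err_to = abs(measured - one_to_n(rate))
    return "1:x" if err_to < err_in else "1 out of x"


for rate in OBSERVED_PACKETS:
    print(f"rate={rate:>2} measured={measured_fraction(rate):.3f} "
          f"1-out-of-x={one_in_n(rate):.3f} 1:x={one_to_n(rate):.3f}")
```

If the 1:x reading is right, a collector that knew the rate could scale sampled packet counts by rate + 1. At a rate of 10 that gives 23 × 11 = 253 packets, against a baseline of about 245. With only two short runs per rate, this is suggestive rather than conclusive.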
