A weird security policy!

A couple of months ago I had an interesting conversation with someone responsible for information security within a computer network related to national security of a large European state (which shall remain unnamed). When we hit the topic of link-layer security, that someone said something like: “In our networks, we enforce a VLAN (Virtual LAN) ID distance of 10 for security reasons! That is, VLAN ID 1 is reserved, the next valid VLAN ID is 11 and then 21 and so forth.”

I was about to call bullshit 💩 on that, but then I wasn’t so sure anymore. Unfortunately, I had no chance for a follow-up discussion, and it has kept me thinking ever since. 🤔 Today I’d like to share my thoughts on this weird policy.

What is a VLAN ID?

To understand the implications of the distance-of-10 security policy, we have to take a look at what a VLAN ID is and where it is stored. An IEEE 802.1Q VLAN ID is a 12-bit number used to partition a physical network into logical Ethernet networks. Multiple link-layer networks can share the same physical link and bridges (switches) without (if everything works well) interfering with each other. To achieve this, there is an extra field inserted between the source MAC address and the EtherType.

Part of this so-called 802.1Q header is set to a fixed value (the tag protocol identifier 0x8100) or used for prioritization and congestion control (a 3-bit priority and a 1-bit drop eligible indicator). The last 12 bits, however, are what we call the VLAN ID.
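To make the header layout a bit more tangible, here is a minimal sketch of how the VLAN ID could be pulled out of a raw frame. The names Tag and parseTag are made up for this illustration, and the frame is assumed to start at the destination MAC address (no preamble) and to carry a single tag only.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// tpid8021Q is the fixed value that marks a frame as 802.1Q tagged.
const tpid8021Q = 0x8100

// Tag holds the three fields packed into the 16 bits after the TPID.
// (Hypothetical type, only for this example.)
type Tag struct {
	PCP uint8  // priority, 3 bits
	DEI bool   // drop eligible indicator, 1 bit
	VID uint16 // VLAN ID, 12 bits
}

// parseTag expects bytes 0-5 to be the destination MAC, 6-11 the source MAC,
// 12-13 the TPID and 14-15 the priority/DEI/VLAN ID bits.
func parseTag(frame []byte) (Tag, error) {
	if len(frame) < 16 {
		return Tag{}, errors.New("frame too short for an 802.1Q tag")
	}
	if binary.BigEndian.Uint16(frame[12:14]) != tpid8021Q {
		return Tag{}, errors.New("frame is not 802.1Q tagged")
	}
	tci := binary.BigEndian.Uint16(frame[14:16])
	return Tag{
		PCP: uint8(tci >> 13),  // top 3 bits
		DEI: tci&0x1000 != 0,   // next bit
		VID: tci & 0x0FFF,      // last 12 bits: the VLAN ID
	}, nil
}

func main() {
	frame := make([]byte, 18)
	frame[12], frame[13] = 0x81, 0x00 // TPID
	frame[14], frame[15] = 0xA0, 0x0B // priority 5, VLAN ID 11
	tag, err := parseTag(frame)
	fmt.Println(tag, err)
}
```

Note that nothing in these 16 bits is authenticated or encrypted, which is what the next section is about.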

What is the security problem?

As you may have noticed, unlike in other protocols (e.g. most VPN protocols), segmentation using VLAN IDs is not achieved by encapsulation but by annotation. Also, there is no (additional) encryption taking place. It is, therefore, necessary that all active link-layer equipment understands the 802.1Q standard and acts according to it. In practice, we want to use managed switches to properly control where frames of a particular VLAN ID are allowed to pass the device in ingress or egress direction. Unmanaged devices (“dumb switches”) usually ignore the 802.1Q header and just forward everything as if there was no 802.1Q header at all. Obviously, a dumb switch poses a threat to a network that is segmented using VLAN IDs, no matter how large the distance between the individual VLAN IDs is. So we can ignore this threat for the following discussion.

The actual threats to a VLAN ID segmented network are:

  • VLAN hopping due to implementation errors
  • VLAN hopping due to bit errors

VLAN hopping means that a frame destined for a particular VLAN ends up in another VLAN because an attacker was able to confuse the switch regarding the true VLAN ID. Here is some more information on the topic. It is even funnier in virtual environments. I attended this talk at DEFCON 24 and found it somewhat related. As far as I know, there is no VLAN hopping attack that can be mitigated or made less likely by choosing a VLAN ID distance of exactly 10. So I am going to ignore the whole implementation error discussion, too. This leaves us with bit errors as a possible technical cause for VLAN hopping.

On the organizational level, there may be additional reasons for having a particular distance between two VLAN IDs. We will discuss a few of them.

Fat fingers

A simple reason for a fixed VLAN ID distance may be the mitigation of consequences of typos in such a high-security environment. Let’s say we are the admin in said network and we are in a hurry to configure VLAN ID 11 on a couple of switch ports. If there were no distance between the VLAN IDs, the next valid (and probably used) VLAN ID would be 12. It is easy to slip from the 1 key to the 2 key on a standard keyboard. Thanks to the distance, frames for VLAN ID 11 ending up in the empty VLAN ID 12 are likely to cause some disruption, but that is still better than leaking information.

Makes sense? I don’t think so! Let’s say we are still in a hurry (admins are always super stressed, right?) and we have to configure VLAN ID 21. Again, a slippy keyboard and our fat fingers don’t get along and we accidentally hit the 1 instead of the 2. Congratulations, we may now be serving traffic for VLAN ID 21 in VLAN ID 11, which is valid and in use.

The security policy did not help us much, so this can’t be the reason for its existence.
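The fat-finger argument can be checked mechanically. The following sketch lists every valid VLAN ID that is exactly one mistyped decimal digit away from a given ID. It assumes the policy allows 11, 21, 31 and so on up to the highest usable 802.1Q ID (4094); the function names are invented for this example.

```go
package main

import (
	"fmt"
	"strconv"
)

// valid reports whether id is allowed by the distance-of-10 policy
// (assumption: 11, 21, 31, ... up to 4094).
func valid(id int) bool {
	return id >= 11 && id <= 4094 && id%10 == 1
}

// typoNeighbours returns every other valid VLAN ID that can be reached from
// id by replacing exactly one decimal digit.
func typoNeighbours(id int) []int {
	var out []int
	digits := []byte(strconv.Itoa(id))
	for pos := range digits {
		orig := digits[pos]
		for d := byte('0'); d <= '9'; d++ {
			if d == orig {
				continue
			}
			digits[pos] = d
			// Skip results with a leading zero: that would be a shorter number.
			if digits[0] == '0' {
				continue
			}
			if n, err := strconv.Atoi(string(digits)); err == nil && valid(n) {
				out = append(out, n)
			}
		}
		digits[pos] = orig
	}
	return out
}

func main() {
	fmt.Println(typoNeighbours(21))
}
```

For VLAN ID 21 this yields eight neighbours (11, 31, 41 and so on up to 91): a typo in the last digit is harmless, but every typo in the tens digit lands on another valid ID.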

Grouping

Let’s assume this wasn’t a high-security environment but a regular company headed for stellar growth over the next couple of years. As the company grows, so does its IT infrastructure. Maybe said company wants to add a second production line using the same IP addressing scheme as the first one. Link-layer segmentation comes in handy here: it avoids duplicate address confusion by separating the production line robots while still allowing them to use the same physical network. I saw a similar setup in the automotive industry once and it made me wonder WTF they are doing there. 🚗

Back to our odd security policy: Having reserved some space between the VLAN IDs of organizational entities allows these entities to grow into the reserved space. If the first production line uses VLAN ID 11, the second would use VLAN ID 12, the third VLAN ID 13 and so on. This might provide a small benefit because now network admins just have to remember that everything from VLAN ID 11 to VLAN ID 20 🙈 is production, and everything from VLAN ID 21 to 30 belongs to e.g. finance. This may lessen confusion and help with plausibility checks when configuring a VLAN ID on network hardware. However, the moment the reserved space between two VLAN IDs is used, it violates the security policy. 🤷‍♂️

I still can’t figure out how this can be of any use in a high-security environment.

Bit errors

The remaining theory is that the distance protects against bit errors turning one valid VLAN ID into another. The issue I have with this theory is that Ethernet frames have a Frame Check Sequence (FCS) at the end of every frame. The FCS uses a Cyclic Redundancy Check (CRC) performed over the frame to detect (most) bit errors. There must be a special case of multiple bits being changed on the wire in such a way that the result still matches the FCS, or that the FCS itself is also corrupted in a way that it validates the changed frame contents. In other words, this is very unlikely. We may be struck by lightning ⚡️ with a higher probability.
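To illustrate how the FCS catches such an error, here is a small sketch using the CRC-32 from Go’s standard library, which uses the same polynomial as Ethernet. It is a simplification: the frame content is made up, and the helper names fcs and flipBit exist only for this example.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// fcs computes a CRC-32 (IEEE polynomial) over the frame bytes.
func fcs(frame []byte) uint32 {
	return crc32.ChecksumIEEE(frame)
}

// flipBit returns a copy of frame with one bit inverted, simulating a
// single bit error on the wire.
func flipBit(frame []byte, byteIdx int, bit uint) []byte {
	out := append([]byte(nil), frame...)
	out[byteIdx] ^= 1 << bit
	return out
}

func main() {
	// Made-up 64-byte frame; only the tag bytes are filled in.
	frame := make([]byte, 64)
	frame[12], frame[13] = 0x81, 0x00 // TPID
	frame[14], frame[15] = 0x00, 0x0B // VLAN ID 11
	sent := fcs(frame)

	// Flip the lowest bit of the VLAN ID: 11 becomes 10.
	damaged := flipBit(frame, 15, 0)
	fmt.Println("checksum still matches:", fcs(damaged) == sent)
}
```

A single flipped bit always changes the CRC, so the receiver would discard this frame. Only specific multi-bit error patterns can slip through.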

But for the sake of the argument and for educational purposes, let’s assume we have a network that is creating bit errors in the VLAN ID part of the 802.1Q header and somehow also causing bit errors in the FCS accordingly. It’s stupid, but let’s go with it for a while.

Once a VLAN ID is modified, we want the corresponding frame to get discarded or maybe end up in Nirvana. We do not want the frame to hop into another logical segment. The question now is, how do we have to choose the set of valid VLAN IDs so that they have the least chance of hopping? The answer is actually quite easy: We want the bit-level distance between every two VLAN IDs from the set to be as high as possible. The higher the number of bits that would need to be changed in transit, the less likely a frame can hop into another valid segment. We call this distance between two bit sequences the Hamming distance, and we would optimize our set to contain only VLAN IDs with a high Hamming distance between each other. The limiting factor is the total number of VLAN IDs that are used. The more VLAN IDs are in the set, the smaller the distance between them.
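To show the trade-off between set size and distance, here is a rough sketch of one possible way to build such a set: walk through the usable IDs and greedily keep every ID that is far enough away from all IDs kept so far. This is just an illustration and not necessarily the best possible set; pickIDs and hamming are names invented for this example, and the usable range is assumed to be 1 to 4094.

```go
package main

import (
	"fmt"
	"math/bits"
)

// hamming returns the number of bits in which a and b differ.
func hamming(a, b uint16) int {
	return bits.OnesCount16(a ^ b)
}

// pickIDs greedily walks the range 1..4094 and keeps every ID whose Hamming
// distance to all previously kept IDs is at least minDist.
func pickIDs(minDist int) []uint16 {
	var picked []uint16
	for id := uint16(1); id <= 4094; id++ {
		ok := true
		for _, p := range picked {
			if hamming(id, p) < minDist {
				ok = false
				break
			}
		}
		if ok {
			picked = append(picked, id)
		}
	}
	return picked
}

func main() {
	// The larger the required distance, the fewer IDs remain usable.
	for _, d := range []int{2, 3, 4} {
		fmt.Printf("minDist=%d -> %d usable VLAN IDs\n", d, len(pickIDs(d)))
	}
}
```

Running it for growing values of minDist shows how quickly the number of usable IDs shrinks.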

Just for fun, I wanted to know the distance between every two consecutive VLAN IDs from our security policy. Because I am too lazy to do it by hand, I made a computer help me.

Here is the Golang code for reference:

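package main

import (
	"fmt"
	"math/bits"
)

// maxVLANID is the highest usable 802.1Q VLAN ID (0 and 4095 are reserved).
const maxVLANID = 4094

func main() {
	// Walk the policy's IDs (11, 21, 31, ...) and compare each with its successor.
	for a := uint64(11); a+10 <= maxVLANID; a += 10 {
		b := a + 10
		// XOR leaves a 1 at every differing bit position;
		// counting those bits gives the Hamming distance.
		dist := bits.OnesCount64(a ^ b)
		fmt.Printf("a=%v b=%v distance=%v\n", a, b, dist)
	}
}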

Running the code gives us this:

$ go run distance.go
a=11 b=21 distance=4
a=21 b=31 distance=2
a=31 b=41 distance=4
✂️
a=4071 b=4081 distance=3
a=4081 b=4091 distance=2

Let’s see the histogram over the output using Hamming distances as bins:

Actually, not that bad. There are no two consecutive VLAN IDs that have a Hamming distance of 1. There are plenty of bit distances of 2, 3, 4 and there is even one 10.😉 Maybe this isn’t such a bad policy after all? Well, remember that we are still high up in the ivory tower of theoretical bit errors and we have lost contact with real life networking probably three or more paragraphs ago. Please also note that we only calculated the Hamming distance between each valid ID and its successor, which has a runtime complexity of O(n). The picture may get even worse if we compare every valid ID to every other valid ID (which raises the runtime complexity to O(n^2)).
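The all-pairs comparison is easy enough to sketch as well. The following code counts, for every unordered pair of policy IDs, how often each Hamming distance occurs. It assumes the valid set is 11, 21, ... up to 4091; policyIDs and pairHistogram are hypothetical helper names.

```go
package main

import (
	"fmt"
	"math/bits"
)

// policyIDs returns the IDs allowed by the distance-of-10 policy
// (assumption: 11, 21, ..., 4091).
func policyIDs() []uint16 {
	var ids []uint16
	for id := uint16(11); id <= 4094; id += 10 {
		ids = append(ids, id)
	}
	return ids
}

// pairHistogram counts how many unordered pairs of IDs have each Hamming
// distance. The array index is the distance (0..12 for 12-bit IDs).
func pairHistogram(ids []uint16) [13]int {
	var hist [13]int
	for i := 0; i < len(ids); i++ {
		for j := i + 1; j < len(ids); j++ {
			hist[bits.OnesCount16(ids[i]^ids[j])]++
		}
	}
	return hist
}

func main() {
	for dist, n := range pairHistogram(policyIDs()) {
		if n > 0 {
			fmt.Printf("distance=%d pairs=%d\n", dist, n)
		}
	}
}
```

One thing is certain even without running it: no pair can have a distance of 1. A single flipped bit changes the value by a power of two, and no power of two is a multiple of 10. Pairs with a distance of 2 do exist, as the output for 21 and 31 shows.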

Conclusion

From what I have come up with so far, this policy makes little to no sense. Besides, in a high-security environment, one might want to use physical segmentation anyway. That is, running multiple cables with only one logical segment on each cable.

If anyone would like to enlighten me on why this VLAN ID distance policy might make sense, please do so! It has been teasing my brain long enough… 😜