Whitebox Switching at the Access Layer

Whitebox switches pair generic, generally inexpensive hardware with a network operating system that can be purchased and installed separately. Often the hardware and software come from different vendors, and there are several reasons this practice is becoming more common, especially in the data center.

First, the underlying hardware utilizes more generic components which, in theory, reduce the cost of the switch. This contrasts with vendors that use proprietary chipsets and sometimes claim that their technology lets them offer unique features.

Second, decoupling hardware from the network operating system makes it easier to centralize management and use programmatic tools such as Chef and Ansible. It also opens the door to a vendor-neutral approach to networking using OpenStack, which can provide vendor interoperability and manage network hardware as pools of resources.

Third, because these normally Linux-based network operating systems don’t rely on (or are minimally reliant on) the underlying hardware, they can be completely customized.

In recent years, whitebox switching has begun to trickle down from webscale-sized companies to large enterprise data centers that need some of the same features normally needed by only the biggest infrastructures in the world. Requirements such as network functions virtualization and a composable infrastructure were common only in those huge organizations but are now being implemented in much smaller environments.

What I’m interested in lately is how this is relevant to the non-webscale enterprise. It’s still debatable whether or not there is much of a cost savings with whitebox switches at all, and even pretty good sized enterprise organizations do just fine with their favorite vendor switches. The campus access layer, for example, requires very few advanced Layer 3 features and tends to be somewhat static with regard to configuration changes. Rarely do changes to the access layer require any sort of orchestration or the deployment of end-to-end services.

If the compelling reasons to move to a whitebox model are cost savings, increased programmability, and customizable operating systems, I wonder what the actual benefit is of changing an access layer from one or a few models from a single vendor to a whitebox solution.

In my experience, access switches are among the cheapest network devices, even from the biggest network vendors such as Cisco. Also, an access layer is typically composed of many devices all doing the same thing, which means it’s easy to get a deep discount when buying in bulk. Without comparing a Cisco and whitebox BoM, it’s hard for me to definitively say one option is cheaper than the other, but I suspect buying a cheap Dell switch and a separate NOS license will not give me huge cost savings.
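To make that BoM comparison concrete, here is a minimal sketch of the arithmetic. Every number in it is a made-up placeholder, not a real quote from any vendor, and the function name and parameters are my own invention for illustration. The point is only that the bulk discount and the per-switch NOS license both have to be in the math before the two options can be compared.

```python
def bom_total(unit_price, quantity, discount=0.0, per_unit_license=0.0):
    """Total cost of a batch of identical access switches.

    discount is the fraction taken off the hardware list price (0.5 = 50%).
    per_unit_license is a separately purchased NOS license per switch.
    """
    hardware = unit_price * quantity * (1 - discount)
    licenses = per_unit_license * quantity
    return hardware + licenses


# Placeholder figures only: substitute real quotes before drawing conclusions.
vendor_total = bom_total(unit_price=3000, quantity=20, discount=0.5)
whitebox_total = bom_total(unit_price=1200, quantity=20, per_unit_license=300)

print(f"Vendor switches:  {vendor_total:.2f}")
print(f"Whitebox + NOS:   {whitebox_total:.2f}")
```

With real numbers plugged in, support contracts and optics would probably need their own line items as well.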

Network programmability is certainly an advantage for busy network administrators, but the access layer is normally static enough to make programmability interesting but not necessarily a reason to overhaul an entire switching environment.

Typically, access switches are replaced when they get very old, not when new features are released. I understand that there are exceptions like supporting 802.1X at the port level, for example, but normally these kinds of feature additions are made through iterative code upgrades. The access layer just doesn’t do enough to require significant customization and the frequent addition of new features. I know we can always think of exceptions, but keep in mind that I’m speaking of a typical mid-sized enterprise access layer, not a webscale company’s data center.

To me, the access layer is already pretty much a composable part of the infrastructure. How many models do you have in your access layer? I can think of maybe three, and they’re all simple, cheap, and easy to swap out.

Even though I don’t see whitebox switching as extremely relevant to an enterprise access layer, I still really like the idea of centralized management and increased programmability so long as the cost is a wash. What I’m doing is taking steps to deploy just a few switches in a corner of the network, such as a network closet servicing only a few endpoints. This way I can see firsthand the real benefits or drawbacks of bothering with this technology in this often forgotten part of the network.

IP Infusion presented at Networking Field Day 15, introducing themselves as the whitebox NOS we’ve all used but never heard of. ZebOS, in particular, has been used by OEMs for almost 20 years on hardware from vendors such as NEC, Brocade, F5, Riverbed, and quite a few others. They started in data centers but also provide solutions for service providers, with the goal of providing the very best network OS for whitebox and NFV.

Their expertise and experience are in carrier-grade networks, but in the last few years they’ve seen the shift to software at every level of the network. They developed a complete finished product specifically for end users in the enterprise, OcNOS, and its NFV companion, VirNOS, both based on ZebOS.

Their NOS supports most Layer 2 and Layer 3 features and is based on a modular design, meaning customers can purchase and use only the features they need, such as simple switching or advanced routing functions. ZebOS also provides data plane integration by supporting a variety of chipsets in order to create a hardware abstraction layer.

Hardware abstraction means that protocols have no dependency on the silicon itself. The hardware abstraction layer has dependencies on SDKs such as the Broadcom SDKs, but the protocol modules that sit on top of the hardware abstraction layer don’t have dependencies on the underlying hardware.
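As a rough illustration of that layering, here is a minimal sketch. It is not how ZebOS is actually built; every class and method name below is hypothetical, and the "SDK" backend only returns a string where a real one would call into a chipset vendor's SDK. What it shows is the dependency direction: the protocol module talks only to an abstract interface, and only the backend knows about the silicon.

```python
from abc import ABC, abstractmethod


class ForwardingBackend(ABC):
    """Hypothetical hardware abstraction layer interface."""

    @abstractmethod
    def program_route(self, prefix: str, next_hop: str) -> str:
        """Install a route in the forwarding plane and describe what was done."""


class ChipsetSdkBackend(ForwardingBackend):
    """Stand-in for a backend that would wrap one chipset vendor's SDK."""

    def program_route(self, prefix: str, next_hop: str) -> str:
        return f"chipset-sdk: add {prefix} via {next_hop}"


class SoftwareBackend(ForwardingBackend):
    """Stand-in for a software-only data plane, e.g. a virtual appliance."""

    def program_route(self, prefix: str, next_hop: str) -> str:
        return f"software: add {prefix} via {next_hop}"


class StaticRouteModule:
    """Hypothetical protocol module that only knows the abstract interface."""

    def __init__(self, backend: ForwardingBackend) -> None:
        self.backend = backend

    def add_route(self, prefix: str, next_hop: str) -> str:
        return self.backend.program_route(prefix, next_hop)


# The same protocol code runs unchanged on either backend.
for backend in (ChipsetSdkBackend(), SoftwareBackend()):
    print(StaticRouteModule(backend).add_route("10.0.0.0/24", "192.0.2.1"))
```

Swapping the hardware means writing a new backend, while the protocol module stays untouched.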

This opens up a new world of centralized network programmability for the access layer using REST APIs and NETCONF while also supporting traditional management methods such as the CLI and SNMP.
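To give a feel for what centralized programmability could look like for a couple of closet switches, here is a small sketch that turns one central inventory into a list of REST calls. The hostnames, URL path, and JSON fields are all assumptions made up for illustration; they are not the actual OcNOS API, and the code only builds the requests without sending anything.

```python
import json

# Hypothetical inventory: switch hostname -> access port -> VLAN ID.
INVENTORY = {
    "closet-sw1.example.net": {"eth1": 10, "eth2": 20},
    "closet-sw2.example.net": {"eth1": 10},
}


def build_requests(inventory):
    """Return (method, url, body) tuples describing the desired port config.

    The URL layout and payload schema are invented for this sketch.
    """
    requests = []
    for host in sorted(inventory):
        for port, vlan in sorted(inventory[host].items()):
            url = f"https://{host}/api/interfaces/{port}"
            body = json.dumps({"mode": "access", "vlan": vlan}, sort_keys=True)
            requests.append(("PUT", url, body))
    return requests


for method, url, body in build_requests(INVENTORY):
    print(method, url, body)
```

The appeal is that the inventory becomes the single source of truth, and the same loop works whether there are two switches or two hundred.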

If the cost of licensing OcNOS and a small pile of Dell or Edge-Core switches is similar to what you would spend on a closet switch refresh, this is a great way to PoC whitebox switching at the access layer and get the hands-on experience to tell us whether or not it’s relevant to this part of the network just yet.

I’m not convinced that it’s time to schedule a hardware refresh of an enterprise’s access layer and move to whitebox switching, but I really like the idea of better centralized management and network programmability. I’m interested in testing this out in one small part of the access layer while keeping in mind that there’s nothing wrong with changing my opinion later on.



Disclaimer: Gestalt IT, the organizer of Networking Field Day, provides travel and expenses for me to attend Networking Field Day. I do not receive cash compensation as a delegate. Also, I do not receive compensation for writing about or promoting Networking Field Day.

Top 10 Ways to Break Your Network

Check out the first Network Collective video podcast, Top 10 Ways to Break Your Network, in which experienced network engineers share their most memorable blunders and the lessons learned from them.

Here’s the website:

The header image was used with permission from Michael Nelson who was one of the Twitter participants during the first show. Check out his site here.

By Engineers, For Engineers

If you haven’t heard, the networking community is awesome. I’ve made some great friends, developed strong new relationships, and I’ve had the incredible luxury to bounce ideas off some seriously talented people. However, whether it’s through various Slack groups, Google hangouts, or private email chains, it’s all been relatively private. Not much makes its way onto Twitter, and not as much as I’d like makes it into blog posts.


TELoIP at Networking Field Day 15

No Networking Field Day would be complete without a presentation from an SD-WAN vendor. The technology is now established and maturing into a ubiquitous WAN solution across small and large enterprises alike, so at the upcoming Networking Field Day 15, I’ll be focused on how TELoIP, one of the presenters at the event, differentiates itself from its competitors.


BGP Default Route Failover Using Reachability

Sometimes political, financial, or logistical hurdles determine how we solve networking problems. In these tricky situations we may not be able to solve the problem the way we’d prefer, but we still need to solve the problem.

In this post I’m going to look at how we can solve a WAN failover scenario when we have a default route learned from both of our service providers and a reachability problem via our primary ISP.


Amazon S3 Outage: We’ve All Been There

I’ve been thinking a little bit about the Amazon S3 incident. Not really the incident, actually, but the responses to it. More than once I read something along the lines of “I’m sure that guy got fired” with regard to the engineer who entered the fatal command.

Sure, that’s kind of funny for a quick tweet or in the greater context of a blog post on change control, but for me, I’m not sitting at my desk shaking my head right now. Instead, I’m reminded of the times I did the exact same thing (on a much smaller scale) and will probably do it again.


How Do You Know That’s True?

About a thousand years ago, rather than configure routers, I taught high school English.

One day, instead of unpacking our favorite Shakespearean sonnet, I was sidetracked by a student who asked me how we know anything about electrons and how they orbit the nucleus of an atom. Apparently he asked his physics teacher the period before and got a pithy “electrons are the essence of a negativity.”