Layer 2 Oriented Designs Fail at Internet Scale

This post’s title summarizes a tenet of designing large IP networks. Layer 3 networks have numerous advantages: efficient use of available paths, easier troubleshooting (think visibility), and fault domain containment, to name a few.

The tech industry’s market leaders offer evidence of Layer 3’s superior scaling properties. The world’s largest ISPs use Layer 3 oriented designs as do content providers Google and Facebook. Amazon’s AWS, the undisputed leader of public cloud providers, is built on Layer 3. Wonder why you can’t get broadcast or multicast in your EC2 instance? Now you know.

Layer 2 still has too much focus in the network designs of several segments of the industry. The mobile broadband providers–with their history of walled-garden environments–will need to re-architect their networks on a Layer 3 foundation. Keeping up with the massive influx of bandwidth (at ~$50 USD/month per subscriber!) forces this change. These Layer 2 relics contribute negatively to the cost per bit and customer satisfaction.

Layer 2 oriented designs are also dominant in corporate data centers and corporate networks in general. I’ve probably heard every reason why the Layer 2 focus is necessary. I’d argue that over 90 percent of Layer 2 requirements stem from old assumptions about building corporate IT infrastructure. Of course, overhauling these networks is non-trivial, and I don’t underestimate the massive effort required. But it can be done. An excellent 2008 Network World article called The Google-ization of Bechtel describes Bechtel’s IT revamp. If Bechtel can fundamentally change its IT architecture, why can’t your organization?

Before investing in hacks to make Layer 2 scale, consider how a Layer 3 oriented design can reduce outages, simplify new service introduction, and scale existing services to meet business needs.


IPv6 – Just 96 More Bits?

My favorite professor in college joked that the answer to most questions in computer science is, “It depends.” How true. I’ve found few absolutes in my years working on IP networks. If you compare IPv6 (128-bit address space) with IPv4 (32-bit address space), is IPv6 just 96 more bits?
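For readers who like to see the arithmetic, here is a minimal sketch using Python's standard ipaddress module. It only illustrates the size difference between the two address spaces; the variable names are mine.

```python
import ipaddress

# max_prefixlen is the number of bits in an address of each IP version.
ipv4_bits = ipaddress.IPv4Address("0.0.0.0").max_prefixlen
ipv6_bits = ipaddress.IPv6Address("::").max_prefixlen

# The "96 more bits" in the title.
extra_bits = ipv6_bits - ipv4_bits

# How many complete IPv4 address spaces fit inside the IPv6 address space.
ipv4_spaces_in_ipv6 = 2 ** ipv6_bits // 2 ** ipv4_bits

print(f"IPv4: {ipv4_bits} bits, IPv6: {ipv6_bits} bits, difference: {extra_bits} bits")
print(f"IPv6 holds 2**{extra_bits} copies of the entire IPv4 address space")
```

Every added bit doubles the address space, so 96 more bits is a far bigger jump than the phrase suggests.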

IPv6–with its long hexadecimal addresses–can be intimidating for engineers who have built a career on IPv4 networking. I recall my hesitation to get involved in turning up customer tunnels to Sprint’s IPv6 overlay network in the early 2000s. I felt that I could fix any problem thrown my way on IPv4. Why invest time to learn IPv6 when its adoption was so limited? Clearly, my sentiment was short-sighted, and I corrected my thinking.

I take the reassuring approach in talking to engineers who are new to IPv6. Gaurab Upadhaya of Limelight Networks put it well: only 96 more bits, no magic. After engineers understand IPv6 basics such as addressing, SLAAC, and neighbor discovery, they’ll begin to understand that the simple, connectionless packet service provided by IP is the same for both versions. Routing is routing (albeit with OSPFv3 for IPv6). The best practices for building scalable IPv4 networks carry over to IPv6 largely intact.
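To take some of the mystery out of SLAAC, here is a minimal sketch of how a host can derive an address from a /64 prefix and its MAC address using the classic modified EUI-64 method. This is an illustration only: many current operating systems generate privacy or stable-opaque interface identifiers instead, and the function names, the documentation prefix 2001:db8::/64, and the MAC address are assumptions I chose for the example.

```python
import ipaddress


def eui64_interface_id(mac: str) -> bytes:
    """Build a modified EUI-64 interface identifier from a 48-bit MAC address."""
    octets = bytes(int(part, 16) for part in mac.split(":"))
    if len(octets) != 6:
        raise ValueError("expected a MAC address with six octets")
    # Flip the universal/local bit of the first octet, then insert ff:fe
    # between the two halves of the MAC address.
    first = octets[0] ^ 0x02
    return bytes([first]) + octets[1:3] + b"\xff\xfe" + octets[3:]


def slaac_address(prefix: str, mac: str) -> ipaddress.IPv6Address:
    """Combine an advertised /64 prefix with the EUI-64 interface identifier."""
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("this sketch only handles /64 prefixes")
    interface_id = int.from_bytes(eui64_interface_id(mac), "big")
    return ipaddress.IPv6Address(int(network.network_address) | interface_id)


# Hypothetical prefix (from the documentation range) and MAC address.
example = slaac_address("2001:db8::/64", "00:11:22:33:44:55")
print(example)
```

The router advertises the prefix, the host supplies the lower 64 bits, and the result is an ordinary address that routes like any other. No magic.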

I tend to emphasize the differences between IPv4 and IPv6 in talking about network strategy, design (including migration and transition mechanisms), security, and business continuity. In these areas, the protocol deviations drive the discussion. Let’s take security. IPv6 implementation introduces new attack vectors. Neighbor Discovery and Router Advertisements, not present in IPv4, can be subject to denial of service and spoofing. Also, there is a new way for malcontents to communicate with your infrastructure. Does your security policy align for IPv4/IPv6 and does it account for IPv6-specific security issues?

IPv4 exhaustion is imminent; engineers and IT leaders will be forced to make critical decisions about IPv6 in 2012. Their discussions will likely include aspects of both IPv4/IPv6 similarities and differences. As long as organizations are having these talks with the intent to act in the near term, they are better off than they would be by ignoring IPv6.

A Milestone for my Business

January 2012 marks the three-year anniversary of striking out on my own under the Brooks Consulting moniker. I’ve had a blast providing services to companies such as Clearwire, T-Mobile, Alcatel-Lucent, and Cisco. I’m excited to be working on IPv6, LTE, cloud to corporate network integration, and other fascinating aspects of our industry.

That’s enough on me.

This post is about thanking my professional and social network for advice, feedback, referrals, and business. I am thankful to have met so many people during my ten years at Sprint and my time working independently who have been gracious with their time. Although I have a small budget for online advertising, my engagements will almost always stem from personal/business ties. I hope to continue finding ways to give back to my network. I will be in the 51% of businesses that survive the first five years, and I am confident that I’ll have many more people to thank in 2014.

The AWS VPC and the Network Engineer

Amazon AWS is doing amazing things with its IaaS platform. As a networking guy, I find the networking features very impressive. AWS made a wise choice in using Layer 3 as the networking foundation. I suppose AWS engineers recognize what should be a widely held belief in networking–Layer 2 does not scale. The connection of the VPC to corporate data centers presents a compelling value proposition for customers interested in offloading work to the cloud. What I want to focus on in this post is how the integration of cloud and corporate network affects the network engineer.

I design IP networks for my clients. I know my way around basic Linux system administration and can probably figure most things out with patience and Google. I respect talented sys admins who understand the service that the IP network provides to their systems and can communicate simple network conditions (e.g., “I can’t ping the default gateway”). Who will be integrating the VPC and the corporate network? Clearly, both network engineers and sys admins will be involved. You wouldn’t want a sys admin making critical IP design decisions any more than you’d want me standing up a Hadoop cluster.

Network engineers will have to adapt their thinking to the virtualized environment. This is a new way of thinking about moving packets. Networking components in the physical world are about as inelastic as resources get, more so than servers, I would argue. Getting to a point at which network engineers can grasp the flexibility of the VPC is going to require an investment in learning on their part–the same way learning IS-IS would for an engineer who knows OSPF.

Educating network engineers in VPC networks is in Amazon’s best interests. It’s going to be guys like me who will get calls from potential clients wanting to tie their VPC into their network. The existing documentation does little to further that goal. I had to read the VPC guide several times before obtaining a degree of comfort. Elastic Network Interfaces? Implied routers? Subnet routing tables? These concepts are not intuitive for network engineers.

Here’s how I recommend that Amazon educate my networking brethren.

  1. Write a guide on the VPC intended for network engineers. Think about how Juniper writes JUNOS documentation for engineers with a Cisco background. This is a very effective way to quickly get smart folks up-to-speed.
  2. Document use cases & recommended architectures for VPC that involve VPC to VPC and VPC to data center connectivity. Cisco excels in this area with its Cisco Validated Designs. Mimic their approach. Today, the documentation is limited to connecting a VPC gateway to a router with IPsec. This barely scratches the surface of how customers will use the networking capabilities of the VPC.
  3. Create online training that steps through the configuration of a VPC. Adding a hands-on component with “actual” VPCs shouldn’t be that difficult for a company that does virtualization at a massive scale.
  4. Talk to internal and external networking-savvy engineers. I’ve met some sharp engineers who work on Amazon’s backbone. By engaging them and engineers outside of Amazon, the company could gain valuable insight on networking.

Migrating to the VPC should be as frictionless as possible for businesses. The accelerated set-up of a stable and scalable VPC will translate into more revenue for Amazon.

Adventures in AWS, DNS, and IPv6

This post describes how I used AWS Elastic Load Balancers and Route 53 to enable IPv6 connectivity to the zone apex of my company’s domain.

Recently I moved my company’s page to Amazon’s AWS. I needed IPv6 support, and the hosting company I was using kept promising IPv6 in 2 to 3 months but never delivered. I used the process I outlined in a previous post to make my site reachable via IPv6 using Elastic Load Balancers. I recommend reading that post before continuing if you don’t know how to do this.

In implementing IPv6 connectivity for my site, I stumbled on a problem that I had not considered. The URL for my company is The URL is already long; I don’t want to put on company material, email signature, and business card. The “naked” domain, meaning the top of the zone, is called the zone apex. Per RFC 1034, CNAMEs cannot co-exist with required NS and SOA records. The IPv6 hack using AWS Elastic Load Balancers needs a CNAME. Fortunately, AWS does some proprietary magic and accommodates CNAMEs at the zone apex (see announcement here).
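The RFC 1034 restriction is easy to express in code. Below is a minimal sketch of a check that flags any name holding a CNAME alongside other record types. The record tuples and the example.com names are hypothetical, and the check ignores the DNSSEC record types that later RFCs allow next to a CNAME.

```python
from collections import defaultdict


def cname_conflicts(records):
    """Return the names that hold a CNAME together with any other record type.

    records is an iterable of (owner_name, record_type) tuples.
    """
    types_by_name = defaultdict(set)
    for name, rtype in records:
        # Normalize case and the trailing dot so equivalent names compare equal.
        types_by_name[name.lower().rstrip(".")].add(rtype.upper())
    return sorted(
        name
        for name, types in types_by_name.items()
        if "CNAME" in types and len(types) > 1
    )


# Hypothetical zone: the apex must carry SOA and NS, so a CNAME there conflicts.
zone = [
    ("example.com.", "SOA"),
    ("example.com.", "NS"),
    ("example.com.", "CNAME"),
    ("www.example.com.", "CNAME"),
]
print(cname_conflicts(zone))
```

The www name passes because the CNAME is the only record there. The apex fails because it always carries SOA and NS records, which is exactly why a plain CNAME cannot point the naked domain at a load balancer.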

You must use AWS’s Route 53 tool for your zone. This wasn’t a problem for me. I prefer Route 53’s zone management GUI over GoDaddy’s. I realized that the Route 53 GUI appears not to support AWS’s on-the-fly conversion from CNAME to A/AAAA record. I had to use the CLI tools to add the records. I used the elb-associate-route53-hosted-zone command twice–once with --rr-type A (the default) and once with the --rr-type AAAA flag–to add the entries. For more information, check out this section of the Elastic Load Balancing Developer Guide.

I posted a question to ServerFault to see if there was a way to perform the association in the Route 53 GUI. Jesper Mortensen provided a very helpful response. He believes the association can’t be made in the GUI.

Does all of this sound daunting? Well, I probably took a more difficult path than necessary. I’ve read that DNS30 has a GUI to manage Route 53 that includes a method to instruct Route 53 to do the CNAME to A/AAAA record conversion. You may want to take this approach, especially if you don’t already have the EC2 and Load Balancing API tools installed on your system.

In responding to my question at ServerFault, Jesper pointed out that there is an effort underway to standardize the use of CNAMEs at the zone apex. The Internet Draft is here.


UPDATE (1/5/2012) – A friendly engineer from the AWS Route 53 team contacted me and provided instructions for creating alias resource record sets in the Route 53 console. I confirmed that these work.

Here are the steps.

1. Click Create Record Set.
2. For a zone apex record, just leave the name field blank.
3. Select the type of alias you want to make, A or AAAA (all steps after this are the same for both types).
4. Select the Yes radio button.
5. Open the EC2 console in another tab and navigate to the list of your load balancers.
6. Click on the load balancer and look at the Description tab in the pane below the list. Sample output is below.

DNS Name: (A Record) (AAAA Record) (A or AAAA Record)

Note: Because the set of IP addresses associated with a LoadBalancer can change over time,
you should never create an “A” record with any specific IP address. If you want to use a friendly
DNS name for your LoadBalancer instead of the name generated by the Elastic Load Balancing
service, you should create a CNAME record for the LoadBalancer DNS name, or use Amazon Route 53
to create a hosted zone. For more information, see the Using Domain Names With Elastic Load Balancing

Status: 0 of 0 instances in service

Port Configuration: 80 (HTTP) forwarding to 80 (HTTP)

Stickiness: Disabled(edit)

Availability Zones:

Source Security Group:

Owner Alias: amazon-elb

Hosted Zone ID:

7. Now copy the Hosted Zone ID (in the above case ‘Z3DZXD0Q79N41H’) and paste it into the field labeled ‘Alias Hosted Zone ID:’
8. Now copy the DNS Name (in the above case ‘‘) and paste it into the field ‘Alias DNS Name:’
   Just an FYI: this DNS name is the same for both A and AAAA alias records (do not use ‘‘).
9. Click Create Record Set. Alternatively, at this point you can select Yes to weight the record and provide a weight between 0 and 255 and a setID such as ‘my load balancer’.
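If you would rather script these steps than click through the console, the sketch below builds the equivalent request data for one A and one AAAA alias record at the zone apex. Treat it as a sketch under assumptions, not the method the Route 53 team described. The dictionary follows the shape of the Route 53 ChangeResourceRecordSets request as accepted by the boto3 SDK's change_resource_record_sets call. The function name, the example.com zone name, and the load balancer DNS name are hypothetical placeholders. The Hosted Zone ID is the one shown in the steps above.

```python
def alias_change_batch(record_name, record_type, elb_dns_name, elb_hosted_zone_id):
    """Build a Route 53 change batch that creates one alias record for an ELB."""
    if record_type not in ("A", "AAAA"):
        raise ValueError("alias records to a load balancer are type A or AAAA")
    return {
        "Comment": f"{record_type} alias for {record_name}",
        "Changes": [
            {
                "Action": "CREATE",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": record_type,
                    "AliasTarget": {
                        # Hosted Zone ID of the load balancer, not of your own zone.
                        "HostedZoneId": elb_hosted_zone_id,
                        "DNSName": elb_dns_name,
                        "EvaluateTargetHealth": False,
                    },
                },
            }
        ],
    }


# Hypothetical placeholders: substitute your own zone apex and ELB DNS name.
ZONE_APEX = "example.com."
ELB_DNS_NAME = "my-load-balancer-1234567890.us-east-1.elb.amazonaws.com."
ELB_HOSTED_ZONE_ID = "Z3DZXD0Q79N41H"

# One batch per record type; the same DNS name is used for both.
batches = [
    alias_change_batch(ZONE_APEX, rtype, ELB_DNS_NAME, ELB_HOSTED_ZONE_ID)
    for rtype in ("A", "AAAA")
]
for batch in batches:
    print(batch["Comment"])
```

Each batch would then be submitted with the ID of your own hosted zone, for example through boto3's change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch). Keeping the submit step out of the sketch means it runs without AWS credentials.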