Monday, July 4, 2011

Data Center ISP Load Sharing Part 4 – Tuning

Does the full Internet routing table work best for load sharing across multi-homed ISP connections? As we have shown, it often does not.

Part 1 of this series showed the challenges of dual-ISP design: with the traditional approach, outbound load sharing based on the entire Internet routing table depends largely on the particular ISPs.

Part 2 showed the advantages of a simple, default-route-based Internet load sharing design.

Part 3 introduced a design that combines the simplicity of default-route-based load sharing across dual ISPs with the flexibility of selectively filtering in subsets of Internet routes for optimal path selection.

In this final part, we look at why and how the results should be fine-tuned.

At the initial design stage, you can estimate the number of routes each ISP's filter will allow in, using the BGP regular expressions you designed. This route-count estimate provides the basis for your filtering design. For example, you may count approximately 50000 routes allowed in by a filter that specifies only the networks adjacent to a tier-one ISP, and approximately 40000 routes allowed in by another filter that specifies the networks adjacent to a tier-two ISP as well as those one hop away.
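The count itself can be scripted against a saved copy of each ISP's table. Below is a minimal sketch, assuming the received routes are available as (prefix, AS path) pairs. The sample routes, the placeholder AS 64500 for ISP2, and the `count_permitted` helper are hypothetical, and the patterns are Python approximations of the AS-path filters, not Cisco syntax.

```python
import re

# Python approximations of the AS-path filters (AS paths as space-separated strings).
# ISP1 (tier one, AS 3549): routes originated by ISP1 or by networks directly attached to it.
ISP1_FILTER = re.compile(r"^3549( [0-9]+)?$")
# ISP2 (tier two, hypothetical AS 64500): also allow networks one more AS hop away.
ISP2_FILTER = re.compile(r"^64500( [0-9]+){0,2}$")


def count_permitted(routes, pattern):
    """Count routes whose AS path matches the filter pattern."""
    return sum(1 for _prefix, as_path in routes if pattern.match(as_path))


# Hypothetical sample tables using documentation prefixes and private AS numbers.
isp1_table = [
    ("192.0.2.0/26", "3549"),
    ("192.0.2.64/26", "3549 65010"),
    ("192.0.2.128/26", "3549 65020 65030"),   # two hops beyond ISP1: filtered out
]
isp2_table = [
    ("192.0.2.64/26", "64500 65010"),
    ("192.0.2.128/26", "64500 65020 65030"),
    ("192.0.2.192/26", "64500 65040 65050 65060"),  # three hops beyond ISP2: filtered out
]

print(count_permitted(isp1_table, ISP1_FILTER))  # 2
print(count_permitted(isp2_table, ISP2_FILTER))  # 2
```

Run against real table dumps, the same two counts give the 50000 and 40000 style estimates used above.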

After implementation, you will notice that the actual number of specific routes allowed in is less than the combined total of 50000 plus 40000. The actual total (for example, 80000) is lower because of duplicates: you learn the same 10000 routes from both ISPs, because those networks are adjacent to both ISPs. This is common and to be expected.
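The overlap is plain set arithmetic: the combined number of unique routes equals the sum of both filters minus the routes learned from both ISPs. The sketch below reproduces the example numbers with synthetic prefix labels; the labels and the `overlap` helper are illustrative only.

```python
def overlap(isp1_prefixes, isp2_prefixes):
    """Return (combined unique routes, routes learned from both ISPs)."""
    isp1, isp2 = set(isp1_prefixes), set(isp2_prefixes)
    return len(isp1 | isp2), len(isp1 & isp2)


# Synthetic labels: ISP1 permits routes 0-49999, ISP2 permits routes 40000-79999.
isp1_permitted = [f"route-{i}" for i in range(50_000)]
isp2_permitted = [f"route-{i}" for i in range(40_000, 80_000)]

combined, duplicates = overlap(isp1_permitted, isp2_permitted)
print(combined, duplicates)  # 80000 10000
```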

You might expect the duplicate routes to be split more or less evenly across the two ISPs, but that is often not the case, so the effect of duplicate routes on load sharing requires careful observation. ISPs may operate in different tiers of the Internet hierarchy, so the routes they advertise may have shorter or longer AS path lengths. Because AS path length is a primary criterion in BGP path selection, you will likely see almost all of the duplicate routes favoring one ISP. This may skew load sharing and require further adjustment of the filters.
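To see where the duplicates land, compare the AS path length each ISP advertises for the same prefix. The sketch below assumes all higher-priority attributes (weight, local preference) are equal and looks at AS path length alone; the prefixes, AS numbers, and the `tally_duplicate_winners` helper are hypothetical.

```python
from collections import Counter


def tally_duplicate_winners(duplicates):
    """Count which ISP wins each duplicate prefix on AS path length alone.

    `duplicates` maps prefix -> (ISP1 AS path, ISP2 AS path), each a list of AS numbers.
    Equal lengths are counted as "tie": later best-path steps would decide those.
    """
    winners = Counter()
    for isp1_path, isp2_path in duplicates.values():
        if len(isp1_path) < len(isp2_path):
            winners["ISP1"] += 1
        elif len(isp2_path) < len(isp1_path):
            winners["ISP2"] += 1
        else:
            winners["tie"] += 1
    return winners


# Hypothetical duplicates: tier-two ISP2 (AS 64500) often reaches these via an upstream.
duplicates = {
    "192.0.2.0/26": ([3549, 65010], [64500, 3549, 65010]),
    "192.0.2.64/26": ([3549, 65020], [64500, 65020]),
    "192.0.2.128/26": ([3549], [64500, 3549]),
}
winners = tally_duplicate_winners(duplicates)
print(winners["ISP1"], winners["ISP2"], winners["tie"])  # 2 0 1
```

Even in this tiny sample, ISP2 wins none of the duplicates outright, which is the skew described above.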

A second example is the influence of ISP metrics. Some ISPs advertise routes with a metric, while others advertise all routes with a zero metric. When the metrics are compared and all higher-priority criteria are equal, the zero metric is preferred.
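When AS path lengths tie, the metric (MED) can decide. One detail worth checking on your platform: by default, Cisco IOS compares MED only between paths from the same neighboring AS, so with two different ISPs the metric is considered only if `bgp always-compare-med` is configured. The sketch below models just that step, assuming everything before it is already equal; the `med_winner` function and the MED values are hypothetical.

```python
def med_winner(isp1_med, isp2_med, isp1_neighbor_as, isp2_neighbor_as,
               always_compare_med=False):
    """Return which ISP wins the MED step, or None if MED does not decide.

    Assumes all earlier best-path criteria (weight, local preference,
    AS path length, origin) are already equal.
    """
    same_neighbor_as = isp1_neighbor_as == isp2_neighbor_as
    if not (same_neighbor_as or always_compare_med):
        return None  # MED not compared; later tie-breakers decide
    if isp1_med < isp2_med:
        return "ISP1"
    if isp2_med < isp1_med:
        return "ISP2"
    return None  # equal MEDs: later tie-breakers decide


# ISP1 (AS 3549) sends MED 0; ISP2 (hypothetical AS 64500) sends MED 100.
print(med_winner(0, 100, 3549, 64500))                           # None
print(med_winner(0, 100, 3549, 64500, always_compare_med=True))  # ISP1
```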

The diagram shows that the original design may have expected a load-sharing ratio of 5/4. However, the actual load sharing between ISP1 and ISP2 turns out to be 5/3, because all duplicate routes favor ISP1: ISP1 keeps all 50000 routes, while ISP2 is left with 40000 minus the 10000 duplicates, or 30000. Depending on your specific requirements, the filters may need fine-tuning.
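The 5/3 outcome follows from simple arithmetic on the example counts. The hypothetical `effective_split` helper below assigns each duplicate route to whichever ISP wins it, and `ratio` reduces the result to lowest terms.

```python
from math import gcd


def effective_split(isp1_permitted, isp2_permitted, duplicates, won_by_isp1):
    """Routes actually forwarded via each ISP once duplicates are resolved.

    Each duplicate is counted once, for the ISP that wins best-path selection.
    """
    isp1 = isp1_permitted - duplicates + won_by_isp1
    isp2 = isp2_permitted - won_by_isp1
    return isp1, isp2


def ratio(a, b):
    """Reduce a:b to lowest terms."""
    d = gcd(a, b)
    return a // d, b // d


planned = ratio(50_000, 40_000)  # ignores duplicates
actual = ratio(*effective_split(50_000, 40_000, 10_000, 10_000))
print(planned, actual)  # (5, 4) (5, 3)
```

Had the duplicates split evenly (5000 each), the same helper gives 45000 versus 35000 routes.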

The design of a redundant Internet architecture is unique to each organization: its requirements, its national or global data center architecture, the ISPs selected, and the nature of its Internet traffic. The scenarios described here will hopefully provide simple templates to adapt to your particular data center.

Sunday, July 3, 2011

Data Center ISP Load Sharing Part 3 – Route Selection

Part 2 of this series showed the advantages of a simple, default-route-based Internet load sharing design. This part optimizes the design further.

Using the entire Internet routing table for outbound load sharing proves to be resource-intensive and ineffective for load balancing. Using only the default route provides simplicity and better load balancing. To optimize further, a subset of Internet routes, selected according to the unique environment, can complement the default-route design very well.

Route Selection
Route selection refers to filtering the Internet routing table so that only a subset of it is allowed into the data center. The desired effect is that traffic to content directly attached to a specific ISP takes the shorter path through that ISP, while the rest of the traffic is shared equally between both ISPs.
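The filtered-in routes win over the default route because routers forward on the longest matching prefix. The sketch below illustrates that lookup with Python's `ipaddress` module; the table entries are hypothetical, and "ISP1+ISP2" stands for the two equal-cost default routes.

```python
import ipaddress

# Hypothetical forwarding table: default route via both ISPs plus filtered-in specifics.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "ISP1+ISP2",    # equal-cost defaults
    ipaddress.ip_network("192.0.2.0/25"): "ISP1",      # adjacent to ISP1
    ipaddress.ip_network("198.51.100.0/24"): "ISP2",   # adjacent to ISP2
}


def exit_for(destination):
    """Return the exit chosen by longest-prefix match."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in ROUTES if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]


print(exit_for("192.0.2.10"))    # ISP1
print(exit_for("198.51.100.7"))  # ISP2
print(exit_for("203.0.113.5"))   # ISP1+ISP2
```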

The effectiveness of the design depends largely on route selection techniques suited to the specific data center environment. In the example shown below, BGP regular expressions are used to select a subset of Internet destinations adjacent to each ISP.

ISP1 is a tier-one ISP and therefore has more directly attached networks. A BGP regular expression selects those directly attached networks, so that traffic destined for them exits through this ISP on the optimal path.

ISP2 is a tier-two ISP with fewer directly attached networks. A BGP regular expression selects those directly attached networks as well as those one additional hop away, so that a roughly equal number of specific destination networks prefer ISP2 as the exit point, achieving load sharing across both ISPs.
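Before deploying a filter, it helps to check which AS paths it actually permits. In Cisco AS-path regular expressions, `_` matches a delimiter such as a space or the start or end of the path. The sketch below approximates that in Python. The ISP1 expression is the one used in the configuration below; the ISP2 expression and its AS number 64500 are hypothetical, written in the same style, and the translation handles only the simple delimiter case (not AS-set braces or confederation parentheses).

```python
import re


def cisco_as_path_regex(expr):
    """Approximate a Cisco AS-path regex in Python.

    Only "_" is translated (to space, start, or end of the path);
    all other syntax is passed through unchanged.
    """
    return re.compile(expr.replace("_", r"(?:^|$| )"))


ISP1 = cisco_as_path_regex(r"^3549_[0-9]*$")          # ISP1 and its direct neighbors
ISP2 = cisco_as_path_regex(r"^64500_[0-9]*_?[0-9]*$")  # hypothetical: one more hop

for path in ["3549", "3549 65010", "3549 65010 65020", "35490"]:
    print(path, bool(ISP1.search(path)))
for path in ["64500 65010", "64500 65010 65020", "64500 65010 65020 65030"]:
    print(path, bool(ISP2.search(path)))
```

The `_` delimiter is what keeps `^3549_` from also matching AS 35490.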

Implementation
On each Internet router connected to an ISP, an AS-path filter is applied in a route map, which is then applied as an inbound BGP route filter. As a result, the default route and a subset of Internet routes are received from each ISP, so that outbound traffic takes the more direct path to those destinations.

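! Permit paths originated by AS 3549 (ISP1) or by networks directly attached to it
ip as-path access-list 1 permit ^3549_[0-9]*$

! Match the default route only
ip prefix-list default seq 5 permit 0.0.0.0/0

route-map ISP1in permit 10
 match ip address prefix-list default
route-map ISP1in permit 20
 match as-path 1

router bgp
 neighbor … route-map ISP1in in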
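The route map is evaluated entry by entry in sequence order: the first entry whose match clauses succeed decides, and a route that matches no entry is implicitly denied. The sketch below models that behavior for ISP1in. The route representation and function names are hypothetical, and the AS-path check is a simplified Python equivalent of `^3549_[0-9]*$`.

```python
import re

AS_PATH_LIST_1 = re.compile(r"^3549( [0-9]+)?$")  # approximates ip as-path access-list 1


def is_default(route):
    """Stand-in for prefix-list "default" (exactly 0.0.0.0/0)."""
    return route["prefix"] == "0.0.0.0/0"


def matches_as_path_1(route):
    return bool(AS_PATH_LIST_1.match(route["as_path"]))


# (sequence, action, match condition) entries of route-map ISP1in
ISP1IN = [
    (10, "permit", is_default),
    (20, "permit", matches_as_path_1),
]


def evaluate(route_map, route):
    """Return the action of the first matching entry, or the implicit deny."""
    for _seq, action, condition in sorted(route_map, key=lambda entry: entry[0]):
        if condition(route):
            return action
    return "deny"  # implicit deny at the end of every route map


print(evaluate(ISP1IN, {"prefix": "0.0.0.0/0", "as_path": "3549"}))                # permit
print(evaluate(ISP1IN, {"prefix": "192.0.2.0/26", "as_path": "3549 65010"}))       # permit
print(evaluate(ISP1IN, {"prefix": "192.0.2.64/26", "as_path": "3549 65010 65020"}))  # deny
```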

Verification and Tuning
At the planning stage, counting the number of routes each BGP regular expression filter permits can help arrive at the initial route selection design. By filtering in a similar number of specific routes from each ISP, the desired load sharing can usually be achieved.

However, an equivalent number of routes does not always result in an equivalent amount of traffic. Over time, the actual load on each ISP connection will provide more accurate information about the traffic in the particular data center. ISP-specific characteristics may also factor in. Part 4 will show why fine-tuning may be necessary.
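Route share and traffic share can diverge because a few destinations may carry most of the bytes. The sketch below computes both per ISP from hypothetical per-prefix byte counts (for example, aggregated from flow export); the numbers and the `shares` helper are illustrative only.

```python
from collections import defaultdict


def shares(exit_by_prefix, bytes_by_prefix):
    """Return {isp: (route share, traffic share)} as fractions."""
    routes = defaultdict(int)
    traffic = defaultdict(int)
    for prefix, isp in exit_by_prefix.items():
        routes[isp] += 1
        traffic[isp] += bytes_by_prefix.get(prefix, 0)
    total_routes = sum(routes.values()) or 1  # avoid division by zero
    total_bytes = sum(traffic.values()) or 1
    return {isp: (routes[isp] / total_routes, traffic[isp] / total_bytes)
            for isp in routes}


# Hypothetical: equal route counts, but one ISP1 destination is a busy content network.
exit_by_prefix = {
    "192.0.2.0/26": "ISP1", "192.0.2.64/26": "ISP1",
    "198.51.100.0/26": "ISP2", "198.51.100.64/26": "ISP2",
}
bytes_by_prefix = {
    "192.0.2.0/26": 900, "192.0.2.64/26": 100,
    "198.51.100.0/26": 250, "198.51.100.64/26": 250,
}
result = shares(exit_by_prefix, bytes_by_prefix)
print(result)
```

Here each ISP carries half of the routes, yet ISP1 carries two thirds of the bytes.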

Saturday, July 2, 2011

Data Center ISP Load Sharing Part 2 – Default Method

Part 1 of this series showed the challenges of dual-ISP design: with the traditional approach, outbound load sharing based on the entire Internet routing table depends largely on the particular ISPs. When the routes received from the ISPs have different characteristics, such as AS path and metric, the result is often undesirable. To achieve a better outcome by design, in Part 2 we start with a simple alternative.

Replacing the entire Internet routing table with just the default route is an extremely simple method that offers a number of advantages.

Load balancing
Instead of the entire Internet routing table, only the default route is received and installed in the routing table. As a result, the IGP can load balance across two equal-cost default routes. For outbound traffic, this simple design achieves near 50/50 load balancing as well as resiliency.
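Near 50/50 balancing comes from spreading many flows across the two equal-cost next hops. Routers use their own platform-specific hash of packet header fields; the sketch below only mimics the idea with SHA-256 over a hypothetical source/destination pair, to show why many flows converge toward an even split. The addresses and the `pick_next_hop` function are illustrative.

```python
import hashlib

NEXT_HOPS = ["ISP1", "ISP2"]  # two equal-cost default routes


def pick_next_hop(src, dst):
    """Choose a next hop by hashing the source/destination pair (illustrative hash)."""
    digest = hashlib.sha256(f"{src}->{dst}".encode()).digest()
    return NEXT_HOPS[digest[0] % len(NEXT_HOPS)]


# Hypothetical flows: 250 inside hosts talking to 40 outside destinations.
counts = {hop: 0 for hop in NEXT_HOPS}
for host in range(1, 251):
    for dest in range(1, 41):
        counts[pick_next_hop(f"10.0.0.{host}", f"203.0.113.{dest}")] += 1
print(counts)
```

Because the hash is deterministic, all packets of one source/destination pair stay on the same ISP, which avoids packet reordering within a flow.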

Simplicity, Stability and Resource Efficiency
The design is extremely simple to implement and support. Resource usage on devices is greatly reduced, from holding hundreds of thousands of Internet routes to holding just the default route. Route flapping and disruptive reconvergence caused by instability in any part of the Internet are virtually eliminated.

This simplicity makes the design well suited to a large number of enterprise data centers.

Disadvantage
The design essentially “splits” the Internet in half with two equal-cost default routes to the two ISPs. As a result, the exit point may not be optimal, especially for networks directly attached to one ISP: traffic to them may leave through the other ISP and take a longer path.

For the vast majority of applications, the choice of exit ISP is not noticeable. However, lower-latency access to large amounts of media content may be highly desirable when a direct path is available. An optimized solution is presented in Part 3.