I mostly agree with the content of the section.
But I would like to add a few things: maybe we can make some recommendations, or at least a comparison, regarding the use of static/IGP/BGP.
Here is how I see it:
- Static : some years ago, static routing could not provide dynamic failure detection for multiconnected sites where the PE-CE link layer is not able to detect failures (Ethernet links that are not directly connected ...). Today, using BFD with static routing ensures detection in all cases, so static routing can be used for multiconnected sites.
- IGP : running an IGP between PE and CE, IMHO, only makes sense to extend the customer IGP between sites and make the SP network transparent.
- BGP : BGP is a well-designed protocol that can be used for two purposes :
  - providing dynamic advertisement of a lot of routes (it seems impossible to provision hundreds of static routes for an access !)
  - providing failure detection
I have left the IGP case out of the rest of my analysis ...
For a monoconnected site, static routing is fine when the number of routes is low (threshold to be defined : 10/15 ?) and the routes are not changing every day (otherwise the provisioning activity on the PE would be significant for this access); otherwise BGP would be preferred. In this case there is no need for fast failure detection, as it's a monoconnected site. So setting the minimum hold time to a high value (180 s) to protect the PE may be fine (no need for BFD).
For a multiconnected site, whatever the protocol used, as you mentioned, it's better to rely on BFD or another connectivity detection mechanism for fast detection rather than tuning protocol timers. The main issue is that CPEs are sometimes low cost and do not support BFD :(
Choosing the protocol is then just a matter of the number/dynamicity of routes on the access (same as for monoconnected sites) -> choose static when the number of routes is low, BGP otherwise. In the BGP case, if BFD is used, setting the minimum hold time to a high value (180 s) to protect the PE may be fine (BFD ensures detection). If BFD is not available on the CPE, then for that specific session set the minimum hold time to a tested value that still protects the PE (15 s) and, as you mention, the number of sessions with fast timers must be tracked.
Compared to what you mention, the main point to add to the BCP is the distinction that should be made between monoconnected sites (no BFD, no fast detection needed) and multiconnected sites (fast detection needed). Some SPs apply the same rule to all BGP accesses, which is not good for scaling ...
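To make the rule above concrete, here is a rough Python sketch of the decision as I read it. The function name, the field names and the exact threshold are hypothetical; the 15-route limit and the 180 s / 15 s hold times are only the illustrative values quoted above and would have to be tuned per platform.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative values from the discussion above (assumptions, to be tuned).
STATIC_ROUTE_THRESHOLD = 15   # "10/15 ?" routes
HOLD_TIME_RELAXED = 180       # seconds, protects the PE
HOLD_TIME_FAST = 15           # seconds, tested value when BFD is unavailable


@dataclass
class AccessPolicy:
    protocol: str                  # "static" or "bgp"
    use_bfd: bool
    min_hold_time: Optional[int]   # seconds; None for static routing
    track_fast_timers: bool        # session counts against the fast-timer budget


def recommend_access_policy(multiconnected, route_count,
                            routes_change_often, cpe_supports_bfd):
    """Hypothetical helper: pick protocol and timers for one PE-CE access."""
    # Protocol choice depends only on the number/dynamicity of routes.
    few_stable_routes = (route_count <= STATIC_ROUTE_THRESHOLD
                         and not routes_change_often)
    protocol = "static" if few_stable_routes else "bgp"

    # Fast detection only matters when there is a second path to fail over to.
    use_bfd = multiconnected and cpe_supports_bfd

    if protocol == "static":
        return AccessPolicy("static", use_bfd, None, False)
    if not multiconnected or use_bfd:
        # Monoconnected, or BFD does the detection: relaxed hold time.
        return AccessPolicy("bgp", use_bfd, HOLD_TIME_RELAXED, False)
    # Multiconnected without BFD: fast BGP timers, to be tracked on the PE.
    return AccessPolicy("bgp", False, HOLD_TIME_FAST, True)
```

For example, a multiconnected site with 200 routes and a CPE without BFD comes out as BGP with a 15 s minimum hold time and the session flagged for tracking.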
Another issue to deal with could be persistent flapping of the PE-CE BGP session (link issue ... negotiation issue ...) => BGP DampPeerOscillations is needed there, but it is not well implemented.
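As an illustration of the kind of damping meant here, below is a minimal Python sketch of an exponentially growing idle hold time. As far as I know the BGP specification leaves the exact damping method to the implementation, so the class name, the doubling policy and all default values are assumptions, not a description of any vendor's behaviour.

```python
class PeerOscillationDamper:
    """Hypothetical damper: the idle hold time doubles on every flap
    and resets once the peer has been quiet for long enough."""

    def __init__(self, base=5.0, max_hold=600.0, stable_after=1800.0):
        self.base = base                  # first idle hold time, seconds
        self.max_hold = max_hold          # upper bound, seconds
        self.stable_after = stable_after  # quiet period that clears history
        self.flaps = 0
        self.last_flap = None

    def on_session_down(self, now):
        """Record a flap at time `now` (seconds) and return the idle hold
        time to wait before the session may be restarted."""
        # Simplification: stability is measured between two session drops,
        # not from the moment the session was re-established.
        if self.last_flap is not None and now - self.last_flap >= self.stable_after:
            self.flaps = 0
        self.flaps += 1
        self.last_flap = now
        return min(self.base * 2 ** (self.flaps - 1), self.max_hold)
```

With this policy a CE that flaps continuously costs the PE a bounded number of session restarts per hour, while a CE that has been stable for a while is not penalised for a single drop.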
2) Network Event
As with the PE-CE issues, we can run into process prioritization problems here. When a PE loses a direct link, or when there is a link failure near a set of PEs, the PEs have to update their FIB, and possibly hundreds of thousands of VPN routes, because the failure causes an outgoing interface change or an MPLS transport label change. We have already seen routers with bad (old) FIB implementations and bad process prioritization go to 100% CPU while updating ISIS routes in the routing table and FIB, and then VPN routes in the routing table and FIB. While at 100% CPU, the router loses PE-CE BGP sessions or ISIS adjacencies because of the bad process prioritization.
Now that hierarchical FIB (H-FIB) is implemented in code, the issue is less visible, as it requires less processing :)
3) Route Scale
To the number of routes a PE must support, you can add ISIS routes, LDP FECs, and possibly TE tunnels, which affect overall scaling too.
Here I propose to split this section into PE scaling, ASBR scaling and RR scaling, and to address each part separately. You are mainly talking about PEs here, but ASBRs and RRs are bottlenecks too.
"Most PE routers use the absence of a
given VRF instance (or RD/RT filtering) to limit the number of routes
that they must actually carry, but this is sometimes of limited
utility for a couple of reasons. First, it leads to an inconsistent
routing table footprint from one PE router to the next, and it can
change with every new customer turned up on the router. This leads
to non-deterministic performance and scale."
=> I don't agree with everything there. Limiting the routes imported by a PE clearly helps the control plane. If you import all routes, the scaling impact is clearly implementation dependent : at the least, the router will consume more memory (so it has to support millions of BGP routes !), and possibly more CPU too (more next-hop reachability computation, depending on how it's done...).
As you mentioned, if a PE doesn't import all routes, it has to send a route-refresh to the RR each time a new VRF is provisioned. With millions of routes on the RR, this has an impact : it can take several minutes (5-10-20 min) to receive the routes, and only a few of them will be accepted; all the others will be denied. Formatting the RIB-OUT upon route-refresh is costly for the RR (it could impact transient update propagation time). In our case, we aggregate route-refresh requests at the RR level every x seconds, so that the RR formats the update once and serves multiple PEs with the same formatting action.
RTC clearly helps there, since only the requested routes are formatted and sent, and I think there is no issue with using RTC (this is another debate !)
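The aggregation idea can be sketched roughly as follows in Python. This is only a toy model of the principle, not our actual RR implementation: the class name, the `format_rib_out` callback and the single shared update are assumptions (a real RR would at least work per address family and per update group).

```python
class RouteRefreshAggregator:
    """Hypothetical batcher: collect route-refresh requests for `window`
    seconds, then format the RIB-OUT once for all requesting PEs."""

    def __init__(self, window, format_rib_out):
        self.window = window                  # the "x seconds" above
        self.format_rib_out = format_rib_out  # costly formatting step (assumed callback)
        self.pending = set()
        self.window_start = None

    def request(self, pe, now):
        """Register a route-refresh from `pe` received at time `now`."""
        if not self.pending:
            self.window_start = now   # first request opens the window
        self.pending.add(pe)          # duplicate requests collapse

    def tick(self, now):
        """Return {pe: update} once the window has expired, else {}."""
        if not self.pending or now - self.window_start < self.window:
            return {}
        update = self.format_rib_out()   # formatted a single time
        served = {pe: update for pe in self.pending}
        self.pending.clear()
        self.window_start = None
        return served
```

The trade-off is that a PE may wait up to one window before its refresh is served, in exchange for the RR doing the expensive formatting once per window instead of once per request.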
"First, it leads to an inconsistent
routing table footprint from one PE router to the next, and it can
change with every new customer turned up on the router"
=> Yes, but is this an issue ? I agree that it would be better for all PEs to have the same set of routes, but even with current hardware, this doesn't scale ...
"This leads
to non-deterministic performance and scale."
=> Customers' VPN footprints can be really different. I agree that some customers span most PEs, but that is not the case for all of them. Based on our experience, PEs have different profiles in terms of VRFs.
"In addition, customers may request the use
of BGP multipath for faster failover or better load balancing, which
has the net effect of installing more active routes into the table,
rather than simply selecting the single best path."
=> I think multipath has a greater impact on the FIB than on the pure control plane, even if there are generally FIB structures in the control plane ...
Regarding RD policy, I agree with your points, but the choice can change now, as there are solutions such as Add-Path, ORR and best-external that could permit fast restoration with a "same RD" policy (as in a non-VPN environment) => but I agree that, although you do not increase the number of nets, you will increase the number of paths ...
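To illustrate the nets-versus-paths point, here is a small Python sketch that counts both for a dual-homed prefix. It assumes "same RD" means the same RD configured on every PE of the VPN, and it models the RR as advertising one best path per net unless Add-Path is enabled; the function name, RD values and PE names are made up for the example.

```python
from collections import defaultdict


def count_nets_and_paths(advertisements, add_path):
    """advertisements: list of (rd, prefix, next_hop) received by the RR.
    Returns (nets, paths) that the RR would advertise to a PE."""
    next_hops_per_net = defaultdict(set)
    for rd, prefix, next_hop in advertisements:
        # A VPN net is identified by RD + prefix.
        next_hops_per_net[(rd, prefix)].add(next_hop)
    nets = len(next_hops_per_net)
    if add_path:
        # Add-Path: every distinct path of every net is advertised.
        paths = sum(len(hops) for hops in next_hops_per_net.values())
    else:
        # Plain BGP: a single best path per net.
        paths = nets
    return nets, paths


# One customer prefix reachable through two PEs.
same_rd = [("65000:1", "10.0.0.0/24", "PE1"),
           ("65000:1", "10.0.0.0/24", "PE2")]
unique_rd = [("65000:101", "10.0.0.0/24", "PE1"),
             ("65000:102", "10.0.0.0/24", "PE2")]

print(count_nets_and_paths(same_rd, add_path=False))    # (1, 1)
print(count_nets_and_paths(same_rd, add_path=True))     # (1, 2)
print(count_nets_and_paths(unique_rd, add_path=False))  # (2, 2)
```

Same RD with Add-Path gives the backup path that a unique RD per PE would give, with one net instead of two, but the number of paths carried is the same.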
Rob and I have completed a revision of our draft discussing L3VPN scaling considerations. We've made some changes to the document structure to make it flow better, and we think that we've added enough of the body that it is ready for discussion during a WG meeting. However, since L3VPN is not meeting during IETF in Paris, we're wondering if we should perhaps ask for time in the Routing area open meeting/RTGAREA WG instead?
Either way, comments are still very welcome, especially if you can help us bolster the currently weak section on multicast VPN scale.
This document discusses scaling considerations unique to
implementation of Layer 3 (IP) Virtual Private Networks, discusses a
few best practices, and identifies gaps in the current tools and
techniques which are making it more difficult for operators to cost-
effectively scale and manage their L3VPN deployments.