Routing Working Group
27 October 2016
At 10 a.m.:
JOAO DAMAS: Good morning. This is the Routing Working Group session at RIPE 73. If you want to be in the Cooperation Working Group, that is taking place in the main room; this is the side room.
I am Joao Damas, co-chair of the Routing Working Group. My co-chair is sitting in the first row, in case you want to contact us, so there is a face to put to the name.
We will start by going through the usual administrative bits. First, I'd like to thank the RIPE NCC for providing a minute taker, Anand, and the Jabber scribe monitor, and of course to thank the stenographers for making it so easy for everyone else to follow these things.
The minutes from the previous meeting: unfortunately there was a bit of a miscommunication and then I dropped the ball, and I only posted them online yesterday. I hope you have time during the next couple of days or so to go through them. Instead of approving them right now, given the short time, we will leave that for one week, in case anyone detects anything, and then we will declare them final.
A couple of reminders from the PC. One, the period to submit lightning talks is open until this afternoon at 3 p.m., so if you are thinking about it, that is the time you have. Two, a mention on the RIPE elections: there was a software glitch for around 30 minutes when the period for voting opened, and about 10 votes cast during that period were not properly registered. So if you remember that you voted in that very early part of the voting, make sure that your vote was counted. If you go to the system and try to vote again and your vote was registered, it will show what you voted for; if it is clear, that means it wasn't, so please make sure it's done.
Agenda bashing. This is the proposed agenda for today, it's quite packed, actually. There is some room, perhaps, at the AOB, does anyone have any need to change this? In that case, let's just go ahead with the first speaker today, Alexander.
ALEXANDER AZIMOV: Good morning, I am a network architect and I am going to discuss ISP border definitions with you. Before starting, let's discuss goals. We are living in a world of borders: towns have borders, countries have borders, and ISPs are not unique; ISPs must have well-defined borders. And the reason for these borders is money.
Most ISPs are private companies and their goal is making money for their shareholders. Speaking about BGP, BGP was invented to provide native support for traffic engineering, where traffic is money. So BGP is not a link-state or distance-vector protocol, it's a unique money protocol. And of course, when BGP was developed, it was developed to support this goal, and there are at least two main mechanisms to achieve it. The first one is well known: local preference values. Normally this is just a pointer to the money, to where the ISP is gaining money. The second one is the eBGP over iBGP comparison, which supports hot potato routing: get rid of traffic as soon as possible. Both of these rules were invented for use by a single ASN. But let us imagine that for some reason one ASN buys another one. You know this situation is quite common, because in Europe and the US the number of acquisitions is steadily growing. So do you think these acquisitions will affect the main goal of ISPs? Of course not. Moreover, the expectation of shareholders is that the benefit from such a united network will exceed the benefit from both of these networks previously. So, is it so simple to achieve this union of networks from the technical side? Okay, we can just change the autonomous system number, but here we have a problem, because you are not able just to change it; you will need to ask all your external connections to change it, too. It's okay if your autonomous system has a small number of external connections and no clients, but if you have a lot of external connections and clients that for some reason do not want to change their configuration files, you will be forced to look for other options. So, let's see what we can do.
The first problem is transmission of the local preference value. In plain BGP you are not permitted to transmit local attributes through eBGP sessions, so you will be forced to use communities to represent preference values and, on each side of both autonomous systems, create a conversion process from communities to preference values. And of course, it's really hard to support such a system, because changing one part of the network will require synchronization and change in another part of the network. The second problem is hot potato routing. As we just saw, both of these connections are external, and there could be a number of situations in which a route learned from autonomous system two and from an external connection will have the same AS path length, and in this case we are equally likely to get hot potato routing or some other routing. So what options do you have to fix this situation? You have at least two. The first one is to use the origin attribute to make routes that are local more preferable. This is a working solution unless you are already using the origin attribute for some other goals. The second is to separate the connections between different parts of your network from all other external connections. It will also work, but it could turn out to be not very cost-effective. And the third problem is the increase in AS path length. Normally, when a route goes through a network with a single ASN, the path length is increased by one. Of course we can prepend it, but if you are announcing to customers you normally will not. In this case the route goes through two ASNs and the AS path will be increased by at least two. Some networks are trying to overcome this by restricting prepending of ASN 1, but such a solution makes the network vulnerable to route loops, because the AS path is not only a distance but also provides loop protection.
So, what do we have as a result? We have an opportunity to unite two different ASNs in one network, and we are able to synchronize local preference values with communities and to support hot potato routing. We are not able to fix the increased AS path length, and still such a network will be really hard to develop and you will spend enormous time supporting it. So, one can ask a question: where is the IETF? Where is the update to BGP? Someone could be surprised, but the IETF is working, and during the last year there was an update to the BGP protocol with a migration mechanism for autonomous systems. So it's very similar to these pictures: the idea is that one ASN slowly eats another one. So let's take a look at how it is done on the technical side.
The way of synchronization in the open message is changed: now, instead of one ASN, your BGP router will try to send the open message with both ASNs; if it fails with one, it will try the other. It also makes a new definition of an internal BGP session: a BGP session is internal if your neighbour is in the global ASN or the local ASN. So what does the migration process look like?
At first you need to upgrade all your route reflectors and unite them in a cluster. After that, you will be able to slowly migrate all other routers to the new ASN, so the idea is you don't need to migrate all of them at once. If there are some external connections that are forced to use the legacy ASN, it's okay, you will be able to use them, but on these connections the AS path will still be increased.
What do we have as a result? This RFC really simplifies the problem but still has limitations. Just imagine that for some reason you don't want to use only one autonomous system number but keep both of them, or you have not two ASNs to migrate but ten; it's not uncommon. What do we have here? Before answering these questions, let's take a look at another solution. This is the micro level, but at the macro level we have a very different approach. It is quite well known: autonomous system confederations. It was originally invented to solve the problem of full mesh in iBGP: just split the network into smaller, more controllable parts and use full mesh in each of these parts. This approach has its own definition of what an internal BGP session is: a BGP session is internal if your neighbour is in your confederation list. But the most valuable part of it is how it solves the problem of increased paths. Normally, AS paths consist of two types of segments: the AS sequence, which is an ordered list of autonomous system numbers, and the AS set, which is an unordered set of autonomous system numbers. In BGP confederations two additional types were added: confederation sequence and confederation set. Inside a confederation of autonomous systems, when the route goes through an ASN border these segments are added, but when the route leaves the confederation all these segments are removed. Moreover, these segments are not counted when the BGP decision process of any part of the confederation is taking place. So, as we see in these examples, the path for autonomous system 3 is not increased. So, BGP confederations are a very interesting solution and very different from migration. But still, according to the definition of a BGP confederation, it must be represented by a single autonomous system number. So the question is how to bring it to the multi-autonomous-system level. But first, let's make an observation:
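A minimal Python sketch of the confederation behaviour described above: confederation segments do not count towards AS path length, and they are removed when the route leaves the confederation. The segment names follow common BGP usage; the ASNs, the confederation identifier 64500, and the step that prepends that identifier on export are illustrative assumptions rather than details from the talk.

```python
from typing import List, Tuple

# Segment type labels (assumed names, following common BGP usage).
AS_SET = "AS_SET"
AS_SEQUENCE = "AS_SEQUENCE"
AS_CONFED_SEQUENCE = "AS_CONFED_SEQUENCE"
AS_CONFED_SET = "AS_CONFED_SET"

Segment = Tuple[str, List[int]]
CONFED_TYPES = (AS_CONFED_SEQUENCE, AS_CONFED_SET)


def path_length(path: List[Segment]) -> int:
    """Length used for best-path comparison; confederation segments are ignored."""
    length = 0
    for seg_type, asns in path:
        if seg_type == AS_SEQUENCE:
            length += len(asns)   # every ASN in a sequence counts
        elif seg_type == AS_SET:
            length += 1           # a set counts once, whatever its size
    return length


def export_outside_confederation(path: List[Segment], confed_id: int) -> List[Segment]:
    """Drop confederation segments and prepend the confederation identifier."""
    stripped = [seg for seg in path if seg[0] not in CONFED_TYPES]
    if stripped and stripped[0][0] == AS_SEQUENCE:
        return [(AS_SEQUENCE, [confed_id] + stripped[0][1])] + stripped[1:]
    return [(AS_SEQUENCE, [confed_id])] + stripped


# A customer route (origin 64496) that crossed two member ASes, 65001 then 65002.
inside = [(AS_CONFED_SEQUENCE, [65002, 65001]), (AS_SEQUENCE, [64496])]
outside = export_outside_confederation(inside, confed_id=64500)
```

In this sketch the route crosses two member ASes, yet an external neighbour sees the path grow by a single hop.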
We already have at least three different definitions of what an internal BGP session is. I say at least because I am not sure that there is no fourth or fifth one. This is the result of two problems. The first one is clear: the old border definition, where different ASNs mean different networks, is not flexible enough any more. The second problem is that for each technical problem, instead of creating some native mechanism, we are creating ad hoc solutions. And that's maybe the main problem. At the same time, maybe there is a solution for this problem, too.
A year ago at the RIPE meeting, at this Working Group, I presented BGP roles for the first time, a new configuration option. It reflects the real-world agreement between two BGP speakers about their business relationship. Currently it has five values: peer, customer, provider, complex and internal, and this information must be sent via open messages using capabilities. This gives the opportunity to synchronise with and control your neighbour and avoid mistakes. Originally, we proposed the solution to solve the problem of route leaks, and with roles the problem of preventing route leaks is really very simple. A route leak is a network anomaly in which a route learned from a provider or peer is announced to another provider or peer. We propose to use a new local attribute, the internal only-to-customer attribute, which has zero length. It is set when the route is learned from a provider or peer, and when we export to another provider or peer and this attribute is set, we just remove this route: do not announce routes to providers or peers if this attribute is set. So it's very simple. It's just like communities as they should be, but communities are optional, and this solution is automated.
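The route leak rule above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the draft's encoding: the `Route` class, the `only_to_customer` flag standing in for the zero-length attribute, and the convention that `neighbour_is` names what the neighbour is to us are all hypothetical.

```python
from dataclasses import dataclass, replace

UPSTREAM_OR_LATERAL = ("provider", "peer")


@dataclass(frozen=True)
class Route:
    prefix: str
    only_to_customer: bool = False  # stands in for the zero-length local attribute


def on_import(route: Route, neighbour_is: str) -> Route:
    """Mark routes learned from a provider or a peer."""
    if neighbour_is in UPSTREAM_OR_LATERAL:
        return replace(route, only_to_customer=True)
    return route


def may_export(route: Route, neighbour_is: str) -> bool:
    """Never announce a marked route to another provider or peer."""
    return not (route.only_to_customer and neighbour_is in UPSTREAM_OR_LATERAL)


from_provider = on_import(Route("192.0.2.0/24"), "provider")
from_customer = on_import(Route("198.51.100.0/24"), "customer")
```

With these two routes, `from_provider` may only be exported to customers, while `from_customer` may be exported to anyone.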
But we think that roles could have other applications, and one of them could be the ASN union. As soon as we have internal roles, we can create a new definition, which is native and based on business relationships. We define that a BGP session is internal if both roles of the BGP neighbours are internal. So it's very simple. And an internal BGP session permits transmission of local attributes. Such a definition solves the problem of hot potato routing out of the box. Now we have internal BGP sessions between the different ASNs of our union network, and the other BGP sessions are external, so it's just working.
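For comparison, the definitions of an internal BGP session mentioned in the talk can be written side by side. This is only a sketch: the function names and the string used for the role are hypothetical, and the ASNs in the usage lines are made up.

```python
from typing import Iterable


def internal_plain(local_asn: int, peer_asn: int) -> bool:
    """Plain BGP: the neighbour uses our own ASN."""
    return peer_asn == local_asn


def internal_migration(global_asn: int, legacy_asn: int, peer_asn: int) -> bool:
    """AS migration: the neighbour uses either the global or the legacy (local) ASN."""
    return peer_asn in (global_asn, legacy_asn)


def internal_confederation(member_asns: Iterable[int], peer_asn: int) -> bool:
    """Confederations: the neighbour's ASN is in the confederation list."""
    return peer_asn in set(member_asns)


def internal_roles(local_role: str, peer_role: str) -> bool:
    """Roles: both sides of the session declare the role 'internal'."""
    return local_role == "internal" and peer_role == "internal"


# Two networks with different ASNs, united by business relationship only.
same_network = internal_roles("internal", "internal")
by_asn = internal_plain(local_asn=64500, peer_asn=64501)
```

Only the role-based definition is independent of the ASNs on either side, which is what lets two different ASNs form one union network.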
As a result, we have a really simple approach to unite networks, but we still have a problem with increased AS path length, and to solve this problem we also proposed an extension to BGP confederations. It's funny, but all we need is to change only two aspects of autonomous system confederations. First, remove the limitation that a confederation must be represented by a single ASN; let the confederation be represented by the last global ASN in the path. Second, as soon as one autonomous system will not be seen from the other, we need to update route loop protection.
For internal routing we just keep the idea of confederations: internal routing is secured against the problem of route loops. For external routing we propose something very simple. If we learn a route from an external connection, just check that it doesn't have segments with autonomous system numbers from the confederation list. And that is all. There will be no loops inside such a union network.
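A minimal sketch of the external check described above, in Python. The AS path is flattened to a plain list of ASNs for brevity, and the member ASNs of the union network are made-up values.

```python
from typing import Iterable, List


def accept_external(as_path: List[int], confed_members: Iterable[int]) -> bool:
    """Reject an externally learned route whose path contains any member ASN."""
    members = set(confed_members)
    return not any(asn in members for asn in as_path)


UNION = {64500, 64501}  # hypothetical ASNs of the united network

clean = accept_external([64510, 64496], UNION)           # no member ASN in the path
looped = accept_external([64510, 64501, 64496], UNION)   # came back through a member
```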
As a result, the path is not increased, and when a route is learned from autonomous system one, like we had in the previous example, it will be filtered, because we know that autonomous systems one and two are inside the same confederation, and that is all.
So, the configuration of such a union network could be quite simple; this is just an example of a BIRD configuration file that defines this confederation from one side. And as we can see, we solved all our problems: we secured hot potato routing, we have no problems with transmission of local attributes, we have route leak prevention working out of the box, and the AS path is not increased.
Let's speak about the migration process. Unfortunately, it is not quite simple. First of all, if we want to implement such a solution, we need to deploy roles at the connection between these networks. That will already give the benefit of a united network, and it will help with hot potato routing. But to get route leak prevention that works globally and in an automatic way in the whole network, we will need to deploy roles on each router of the union network. And to solve the problem of the increased path, you will also need to deploy AS confederations in your network. There is an option for partial deployment of AS confederations, but in that case you will again get the risk of route loops. So, which one is better, migration or union? We prefer flexibility, and I really think that the flexibility to be able to choose whether we want to migrate or to unite is a good option. This solution could provide native support for such union networks, which will have the benefits of shared connectivity without the need to develop ad hoc solutions or have problems with supporting such a network. So, that is all. You can see here some links: we already have a draft that defines roles and route leak prevention. The draft about union is in progress, and we have a working implementation on BIRD with transmission of local attributes. I thank you for your time and I will be glad to hear your feedback. Thank you.
(Applause)
JOAO DAMAS: So, any questions for Alexander at this time? No. So you mentioned draft there. Are you going to send this to the IETF as a draft?
ALEXANDER AZIMOV: Sure, the first draft is already sent to IETF, the second draft is in work.
JOAO DAMAS: Great. And you are doing the BIRD implementation yourselves?
ALEXANDER AZIMOV: Yes.
JOAO DAMAS: Excellent. So, anyone else?
RUEDIGER VOLK: Deutsche Telekom. Two notes. One is I note that you have been only describing ways of integrating existing separate ASes, once in a while I hear people asking for splitting up existing single ones for reasons that usually are not really technical. Do you see a use case for that?
ALEXANDER AZIMOV: Splitting autonomous systems into smaller, more controllable parts? You see, in the migration standard, you are forced to migrate one ASN to another. Here, you are free to unite or not unite; it's just an option.
RUEDIGER VOLK: Okay. The other thing is, I am not completely sure that 27 years ago, when three guys sitting at lunch did the first BGP sketches, they were really thinking about commercial relations. They might have put something into the protocol, and they didn't.
When we look at the borders that we cross when travelling, we know, yes, we usually have to go through two gates. One is customs, which is essentially for commercial implications; and we also have security, which usually is represented by someone looking at your passport, to try to avoid having bad people enter and bad people exit. I'm quite sure that was the initial thing that was most on the mind of the designers, and we shouldn't forget that.
ALEXANDER AZIMOV: Okay, thank you.
JOAO DAMAS: Thank you, Alexander.
(Applause)
Next up, Job.
JOB SNIJDERS: Good morning, Working Group. It is very nice to be back here, because half a year ago my friend Ignas was presenting here about 64-bit communities, and in the last half year a lot has happened and a lot of that was driven by comments made in this very Working Group, so I hope you will like what you see here.
A little bit of background: we all know and love BGP communities. They are designed to make our routing policies easier. The convention is that the first 16 bits are your ASN and the latter 16 bits are whatever you want them to be. An example is 2914:420, which in NTT's network means this is a customer route; 2914:3200 is Europe. There are different styles of communities. You will see a lot of design patterns where some BGP communities are meant to inform you about the nature of the route and other BGP communities are meant to signal or trigger or request an action of sorts.
This was designed 20 years ago. Later on, extended communities came along, and allowed us to set a type and provided a few more bits but we will get to it later, it was not the ideal solution.
If you look at what is being deployed today, most transit carriers use standard BGP communities, the ones from 20 years ago to interact with their customers. I will even go as far as saying there are customers that put it specifically in their requests for proposals that there should be support for BGP communities, and not something else.
Now we touch upon something that I think was a little bit of an oversight ten years ago: 32-bit ASNs. For some strange reason, when 32-bit ASNs came along and were defined in the IETF, the BGP community stuff was not updated. If you look at BGP communities, they are 32-bit values, and if we have that convention of the first 16 bits being the ASN, then you can understand that you cannot possibly fit 32 bits in a 16-bit field.
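The arithmetic behind this can be shown with a short Python sketch. The function name is hypothetical; the packing follows the convention described above, with the ASN in the upper 16 bits and the operator-defined value in the lower 16 bits of a 32-bit community. ASN 65536 is used only as the smallest value that no longer fits.

```python
def encode_standard_community(asn: int, value: int) -> int:
    """Pack ASN:value into one 32-bit integer, 16 bits for each half."""
    if not 0 <= asn <= 0xFFFF:
        raise ValueError(f"ASN {asn} does not fit in the 16-bit ASN half")
    if not 0 <= value <= 0xFFFF:
        raise ValueError(f"value {value} does not fit in the 16-bit value half")
    return (asn << 16) | value


customer_route = encode_standard_community(2914, 420)  # 2914:420 from the talk

try:
    encode_standard_community(65536, 420)  # the first 32-bit-only ASN
    fits = True
except ValueError:
    fits = False
```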
And this problem is becoming worse over time. As we speak already, 20% of the default free zone is a 32‑bit ASN. IANA has run out of 16‑bit ASNs. This means the well has dried up and the only inventory left is at the RIR level and even that inventory is not terribly impressive, we are talking thousands.
So, what now? Large BGP communities, because bigger is better. This is a draft that is going through the IETF, and in the subsequent slides I will give some examples and give you an overview of what is done and what is within reach.
However, this is not the first attempt to solve this problem. We have had 4-octet AS specific BGP extended communities, but these communities only provide us with 16 bits of local data, and there is a design pattern where you put literal ASN values in BGP communities, which means that you cannot target 32-bit ASNs with this extended community. Another aspect is that that RFC actually is for route origin and site origin and is not meant for general-purpose Internet routing.
In December 2002, some people came up with a concept they called flexible BGP communities. I think it was a very innovative approach to view the concept of communities but its complexity might have been its downfall. And one of the tricks with flexible BGP communities is that you could embed in the BGP community a group of, say, BGP sessions or places or, you put classifiers in the BGP community itself and then you automated execution of that on your edge devices.
Later on, July 2010, keep in mind we are now almost in 2017, wide BGP communities came along and that is an interesting experiment but there are some down sides. For one, its complexity. Every feature you can think of, it has suffered from feature creep, in my opinion because every use case you can conceive they adopt that use case. And the trouble with such an approach is that you can never finish a specification because if you ‑‑ your scope is too broad then you can keep on adding stuff to the draft.
So this kind of means that for the last ten years, since 32-bit ASNs became a thing, we have had nothing to show for it, which is appalling.
So we took our pitchforks and decided to take the matter into our own hands. Large BGP communities basically is the lowest common denominator we could come up with that addresses the very specific use case of 4-byte ASNs or people that interact with 4-byte ASNs. It really doesn't get any simpler than this. I think that has drawn a lot of support from operators and vendors alike in the IETF, which is very nice. On the other hand, it can be excruciating to read through thousands and thousands of e-mails on this topic.
This image symbolises, this is basically the only slide you need to learn about large BGP communities. It is the same as RFC 1997, just bigger. That's it. We are done. You can deploy this now.
Even though it is very simple, I consider that simplicity a virtue, and it has been consciously designed that way. Simply larger means we've pumped up the amount of space you have for your communities, and that's it. It is a transitive BGP attribute; this is again to align with RFC 1997 BGP communities, because the transitivity of the path attribute is something that operators rely on. There are many of you that have probably signalled through an intermediate ASN to target a far-side ASN, and while there are no guarantees that BGP communities will actually propagate, the times that they do and you need it are there.
It is purposefully 100 percent opaque. There are no semantics embedded in the community itself; it is just a number, and the number only has meaning the moment you assign meaning to it. Just like with 1997 communities.
These communities are 96 bits. You might ask, why not just go for 64-bit communities, because when we move from 16-bit to 32-bit ASNs maybe it's logical if we do a similar thing and say 32-bit communities should be 64-bit communities. And this is where the Working Group's feedback came in. Ruediger, I would like to thank you for very insightful comments at the last meeting. Ruediger argued that this is our one chance to finally fix the name space issue. The name space issue is that if you put a literal ASN as the local data in a community, then there is no room left for an action or some other additional encoding that triggers, for instance, prepend or do not export. So, by saying that the first 32 bits are the global administrator or the ASN, that gives every operator 64 bits of local data to play with, and with 64 bits of local data you can target 32-bit ASNs and still have room left to signal an action of sorts.
So this is why it's 96 bits not 64. We need clean name spaces and to avoid colliding with each other's communities.
Another design goal was to make it easy to implement, so that people can copy, paste, code from their RFC 1997 implementations and, you know, just make it a little bit bigger and off you go. I have done a few implementations myself and I can attest to the fact that it's very simple to implement.
The patches that have been submitted to various open source projects are between 1,000 and 2,000 lines, which is very small. Another aspect of large BGP communities is that we have chosen to include a canonical representation. If operators speak to each other on the phone there can already be language barriers and things; communicating is not always trivial. By firmly suggesting that there is only one right way to represent large communities, it will be much easier for operators to work with each other. And that canonical representation is three different fields separated by colons; each field is 32 bits, so when I tell somebody 2914:0:666, they should have a mental representation of what they type into the router.
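A small Python sketch of the canonical text form described above: three unsigned 32-bit decimal fields separated by colons. The function names and the example value are hypothetical, chosen only for illustration.

```python
from typing import Tuple

MAX_FIELD = 0xFFFFFFFF  # each of the three fields is 32 bits wide


def parse_large_community(text: str) -> Tuple[int, int, int]:
    """Parse 'a:b:c' into three integers, rejecting anything malformed."""
    parts = text.split(":")
    if len(parts) != 3:
        raise ValueError("expected exactly three colon-separated fields")
    fields = []
    for part in parts:
        if not (part.isascii() and part.isdigit()):
            raise ValueError(f"field {part!r} is not a decimal number")
        number = int(part)
        if number > MAX_FIELD:
            raise ValueError(f"field {part!r} does not fit in 32 bits")
        fields.append(number)
    return fields[0], fields[1], fields[2]


def format_large_community(fields: Tuple[int, int, int]) -> str:
    """Render three integers back into the colon-separated form."""
    return ":".join(str(number) for number in fields)


parsed = parse_large_community("64496:100:200")
round_trip = format_large_community(parsed)
```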
There are a few things purposefully left out of scope. A very important one is that we will not define a mapping system to RFC 1997 communities. You are free to submit a draft yourself in which you do define such a mapping system, but in the base specification it would be stirring the hornet's nest. At first sight you might think that mapping a 32-bit value into a 96-bit value is trivial. But if you look at what needs to be done, people need to fit everything in their own name space, and because we now have a better tool to design our networks, it would be a shame if you do a literal copy of your current routing policy and do not take that opportunity to do some clean-up. Another aspect is that my mapping might differ from your mapping, because I could argue my mapping will be just appending colon zero to all my communities, but others might say if the community is in the range 100 to 2000, subtract two but divide by three. There can be crazy mapping systems to meet the requirements of operators around the world, and I think mapping systems are best left up to the vendors at some later point.
Another thing that is out of scope is that there are no well-known large communities. At this point in time there are some reserved values, so if the community at a later stage decides that well-known large communities are a useful thing, there is room to do that, but as it stands today, the well-knowns that are defined in 1997 work perfectly, and we are not going to deprecate 1997 any time soon. This will be a multi-year project if it ever happens. So the well-knowns will remain in the classic standard, and at this time there are no well-knowns in large.
This is what the community looks like on the wire: 96 bits. The first 32 bits define the autonomous system number that is the owner of the name space. So in the case of NTT, NTT tags its route announcements with informative communities that signal the type of relation, the city, the country and the region in which it was picked up. We define those values; that is why, in our case, 2914 will be in the first field. And if a customer of NTT would like to trigger something in NTT's network, the customer will set the BGP community with 2914, and what follows after that is what the customer and NTT agreed upon.
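As a rough illustration of the 96-bit layout, the sketch below packs one community into 12 bytes as three unsigned 32-bit integers in network byte order: the ASN that owns the name space, followed by 64 bits of local data split into two 32-bit fields to match the three-field text form. Only the community value is shown, not the surrounding path attribute header; the function names and the local data values are assumptions.

```python
import struct
from typing import Tuple

WIRE_FORMAT = "!III"  # three unsigned 32-bit integers, network byte order


def pack_large_community(global_admin: int, local1: int, local2: int) -> bytes:
    """Encode one large community value as 12 bytes."""
    return struct.pack(WIRE_FORMAT, global_admin, local1, local2)


def unpack_large_community(data: bytes) -> Tuple[int, int, int]:
    """Decode 12 bytes back into (global administrator, local 1, local 2)."""
    if len(data) != struct.calcsize(WIRE_FORMAT):
        raise ValueError("a large community is exactly 12 bytes")
    return struct.unpack(WIRE_FORMAT, data)


# 2914 owns the name space; the two local values here are made up.
wire = pack_large_community(2914, 1, 65536)
decoded = unpack_large_community(wire)
```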
So, a way of thinking of this, and it's a mere suggestion, because with large BGP communities you are allowed to do whatever the hell you want to do with them, is me:action:you, where "you" is a target ASN and the action is a number that identifies what needs to be done.
We are working with a group of people on an RFC 1998-style usage document to give some guidance and inspiration on how one might deploy their routing policy, but that document is not published yet; it's work in progress.
Now, let's look at some cool things that we can do with large communities that previously were either impossible or hard to do.
NTT has an RFC 1997 community of the form 65400:peer-AS, that is, 65400 followed by the peer ASN, and what that does is trigger a do not advertise in North America towards that peer ASN. You can see that with the current approach of things, NTT would not be able to use this towards a 32-bit network, because you cannot put a 32-bit value in that peer ASN field. In the new world with large BGP communities, we could have something like 2914:65400:peer-AS, where peer-AS can be either a 16-bit or a 32-bit ASN value, and another awesome feature here is that suddenly we are not colliding with private ASNs any more in the global administrator field. As you can see, the community now starts with 2914, which clearly signals that this is something that falls within NTT's domain. So that is two problems solved in one go.
The AMS-IX route server has a feature where you can tag your announcements to the route server with BGP communities and that triggers certain things on the route servers. In this case they currently have 0:peer-ASN, and what they could do in the future is use 6777:0:peer-ASN. Again, they are no longer squatting on the reserved value, because they can define this within their own name space, which is 6777, the ASN of their route server, and they can support 4-byte ASNs. And there are many more examples like this where it just makes so much obvious sense to use a larger community.
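The me:action:you pattern with the 2914:65400:peer-AS example can be sketched as a simple export check. This is a hypothetical illustration, not NTT's actual policy: the function name, the constant names and the peer ASNs are assumptions, and the regional scope of the action is left out.

```python
from typing import Iterable, Tuple

LargeCommunity = Tuple[int, int, int]

OWN_ASN = 2914              # the operator evaluating the policy
DO_NOT_ADVERTISE = 65400    # action number taken from the example in the talk


def may_advertise_to(peer_asn: int, communities: Iterable[LargeCommunity]) -> bool:
    """Suppress the route towards peer_asn if OWN_ASN:DO_NOT_ADVERTISE:peer_asn is set."""
    for global_admin, action, target in communities:
        if global_admin != OWN_ASN:
            continue  # another operator's name space, not ours to interpret
        if action == DO_NOT_ADVERTISE and target == peer_asn:
            return False
    return True


# A customer asks not to be advertised to a 32-bit peer ASN (65536 is made up).
tags = [(2914, 65400, 65536), (64496, 65400, 64511)]
to_32bit_peer = may_advertise_to(65536, tags)
to_other_peer = may_advertise_to(64511, tags)
```

The second tag in `tags` uses the same numbers in a different operator's name space and is ignored, which is the collision-avoidance point made above.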
At the last Routing Working Group, Geoff Huston was like, stop waving your hands, just submit a draft. We have done so; in fact, we have submitted eleven drafts. We have been busy. This is a gigantic effort; a lot of people have been close to working full-time on this for weeks on end. There is one thing you can see in this slide, the struck-through early code allocation. We will get to that part; I am going to stick to my plan. We are currently at large community draft version 5, I expect this weekend we will get version 6 out, and we are releasing versions of the draft basically on a weekly basis to reflect what was discussed in the Working Group in the previous week. The current phase in IETF terms is that we are in Working Group last call. Working Group last call ends a few days from now, and depending on that outcome, we can either proceed to IETF last call or we have to go back and work more on the documents.
If you look at the time-line of this effort, the short version is that I hope that we as operators, by the end of 2017, have code for the majority of platforms. This aligns quite nicely with the projected run-out of the 16-bit ASN reserves that the RIRs have, but it might be a close call, because if somebody starts consuming a lot of 16-bit ASNs it might be that the RIRs run out of that resource before this is available. There are basically three parallel tracks to this. There is the IETF side of things, the standardisation, which is crucial because if we don't standardise, our computers cannot talk to each other. Then there is the side of the implementers, the vendors: Cisco, Nokia, BIRD, OpenBGPD, you name it; all those parties have to implement support for large communities. Also part of that group are people that provide tools, such as pmacct. There are many people that use BGP communities in the feed towards pmacct, so infrastructure like pmacct and Wireshark all needs to support large communities. It's a very big effort; it's like trying to turn a supertanker.
And then there are the network operators, of course. If you have all these tools, you still need to do some work and implement it. And we need to popularise this idea; we need to reach out to our customers and tell them, hey, we got this new thing, maybe if you want to do traffic engineering this is what you need to look into.
If we look at a small selection of all possible implementations, I am very proud to say that there are already six implementations available today, and I expect that, today or tomorrow, a seventh implementation will become available as well.
The IETF requires that you have two implementations at a minimum for a document to become an RFC, so we figured we'd better make sure we are on the safe side and get as many implementations as possible.
On the website largebgpcommunities.net you can track the latest developments and look at the current status of implementations.
As I mentioned, it's not just the routers that need upgrading, it's an ecosystem that needs upgrading, so that is why tools like tcpdump and pmacct are very important to make this technology a success.
I have a funny story about deploying a new technology like this. We figured once we got the early IANA assignment that it would be good to put a prefix in the default free zone that has that path attribute attached and then see what happens. Part of me was fearing that we might crash part of the Internet like with the famous incident in which RIPE NCC was involved a while back, but part of me hoped that it would just work because if it just works that means there is one less show stopper in getting this technology into the hands of operators.
These are the two prefixes. I encourage you to look in your network to verify whether you are receiving these, because if you are not seeing these prefixes we need to talk.
Now, we have discovered some problems through this beacon. In my house, I have a /24 and /48, I slapped the attributes on these prefixes and announced them and told my girlfriend, can you do the debugging on behalf of the network operator community and let me know if something doesn't work. And lo and behold, a day later she reports, I cannot access this website. So I dive into this and, with the Netherlands operator community, we quickly tracked down that it was one specific ISP that could not be reached, and they recently were in the news because they proudly proclaimed, we bought a gigantic Huawei router. Some additional testing followed with Kristian Larsson, who has access to these devices, so we made an artificial environment and he confirmed Huawei routers are dropping prefixes that have path attribute code point 30 attached to them. It turns out that Huawei has been shipping code that includes an implementation of a completely different technology and, therefore, when they receive an announcement with a large BGP community attached, they compare the length and structure of the thing and they decide this is a corrupt attribute, I am treating this as a withdraw. And this means that prefixes which have a large BGP community attached cannot be reached by Huawei routers. And of course, that is disastrous to the deployment of a technology like this because we cannot force people to upgrade their routers and there will be very little incentive for people to deploy large BGP communities if they know they won't be able to access parts of the Internet afterwards.
So the solution is relatively straightforward: You go back to IANA, you say this one is used, I want a new one. And in the discussions surrounding this incident, another vendor, Cisco, with Nexus OS, also stepped forward and they were brave enough to admit that in publicly deployed installations there was an unfortunate series of events and code point 31 was also in use, so we ended up with code point 32, which I think is a very nice number, given what we are trying to solve here.
This morning, a few hours ago, I started announcing new beacons with code point 32, so far everything looks good. I see that none of us are running to our rooms to start fixing our networks. So I hope that this is it, but if we are unlucky we will have to keep moving until we reach 42.
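As an illustration of why a receiver's view of a code point matters, the following is a minimal Python sketch of reading one BGP path attribute and decoding it as large communities. It assumes the encoding of three 32‑bit fields (12 octets) per community, and the function names are hypothetical, not taken from any implementation mentioned in the talk. A router that expects some other structure behind the same code point would fail the equivalent of the length check here and reject the attribute.

```python
import struct

LARGE_COMMUNITY_TYPE = 32  # code point mentioned in the talk; earlier beacons used 30


def parse_path_attribute(data: bytes):
    """Split one BGP path attribute into (flags, type_code, value)."""
    flags, type_code = data[0], data[1]
    if flags & 0x10:  # extended-length bit set: two-octet length field
        (length,) = struct.unpack("!H", data[2:4])
        value = data[4:4 + length]
    else:  # one-octet length field
        length = data[2]
        value = data[3:3 + length]
    if len(value) != length:
        raise ValueError("truncated attribute")
    return flags, type_code, value


def decode_large_communities(value: bytes):
    """Decode the attribute value into (global admin, local 1, local 2) triples."""
    if len(value) == 0 or len(value) % 12 != 0:
        raise ValueError("length must be a non-zero multiple of 12")
    return [struct.unpack("!III", value[i:i + 12])
            for i in range(0, len(value), 12)]


# Example: an optional transitive attribute (flags 0xC0) carrying one community.
example = bytes([0xC0, LARGE_COMMUNITY_TYPE, 12]) + struct.pack("!III", 64500, 1, 2)
flags, code, value = parse_path_attribute(example)
if code == LARGE_COMMUNITY_TYPE:
    communities = decode_large_communities(value)
```

The same bytes tagged with a different type code would be handed to whatever decoder the receiver has registered for that code, which is the failure mode described above.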
As I mentioned in the parallel time‑lines, there are a couple of to do items. Vendors, you should implement large BGP community support. If you do so, ensure that there is real feature parity with the standard RFC 1997 communities. That means we should be able to set communities, we should be able to match on them, we should be able to delete the communities, we need regular expression support, if you already offer that for RFC 1997, range support if you offer that for the standard communities, just make it look and feel the same way.
Make sure that your show commands also support large communities, so that it is easy to view which routes match a particular large community, or, if you look at a route and look at the details, that you can see these large communities are attached.
Given that there is a plethora of OpenSource implementations available today, if you are a commercial closed‑source implementer, feel free to take a look at those and copy that style and structure.
As a vendor, and this is where we operators need to show some appreciation for the vendor, it is not just coding and releasing; for a vendor, these are very big projects because they need to train their internal staff, the TAC needs to be re‑trained to ensure that if there are issues they do the appropriate debugging. It ripples down all the way to the engineering side, all of those people need to be trained before the product can be pushed.
We as network operators also have some to do items. One of the most important ones is, ask your suppliers for support for large BGP communities. Even though you saw on the earlier slide that, for instance, for Cisco IOS XR there is an engineering release that supports large BGP communities, companies like Cisco, like Nokia, like Juniper, like to hear from many, many, many different customers that this is the feature they want. The more customers ask, the higher it's pushed on the priority list. So, do not be complacent. E‑mail your sales representatives, preferably today, and say, this is what I want, this is what I need, thank you.
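The feature parity asked for above (set, match, delete, regular expressions) can be sketched in a few lines of Python. This is only an illustration under assumptions: communities are modelled as tuples of three integers written as `a:b:c`, and the helper names are hypothetical and do not correspond to any vendor's policy language.

```python
import re


def fmt(community):
    """Render a (global admin, local 1, local 2) tuple as 'a:b:c'."""
    return "%d:%d:%d" % community


def add(communities, new):
    """Set: attach a community if it is not already present."""
    return communities if new in communities else communities + [new]


def match(communities, pattern):
    """Match: return the communities whose text form fully matches the regex."""
    rx = re.compile(pattern)
    return [c for c in communities if rx.fullmatch(fmt(c))]


def delete(communities, pattern):
    """Delete: drop the communities whose text form fully matches the regex."""
    rx = re.compile(pattern)
    return [c for c in communities if not rx.fullmatch(fmt(c))]


route = [(64500, 1, 100), (64500, 2, 200), (64501, 1, 1)]
ours = match(route, r"64500:\d+:\d+")       # everything tagged by AS 64500
cleaned = delete(route, r"64500:\d+:\d+")   # strip those tags again
tagged = add(cleaned, (64500, 3, 300))      # and set a new one
```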
It is really a group effort, the more people ask them, the better.
Then, if we get our hands on running code, which I hope is end of 2017, then the real work starts. We need to deploy this. We need to update our tools, provisioning software, publish information, communicate with our customers and again provide training to our own staff. So, for the next few months we have a lot to do.
This concludes my presentation. It was created together with Greg Hankins, thank you, Greg. If you want to reuse this presentation and present at your local network operator group meetings or wherever you want, in your company, please e‑mail us and we will give you the master slide deck, we don't claim any copyright, this information needs to be spread as fast and as wide as possible.
Unless anybody has questions about large BGP communities?
RUEDIGER VOLK: Well, okay, I don't really have questions. I would have quite a number of comments. I think telling that larger communities are like your elephant picture is kind of not transferring the important lessons that have been learned. And some of the attempts over the past 10 years that failed, the failure is due to not grabbing those lessons. What we are ‑‑ well, okay, the 64 proposal that was discussed in Copenhagen would not have been sufficient.
JOB SNIJDERS: Correct
RUEDIGER VOLK: And that would have been the larger elephant. The important thing is, we need the name space for everybody so that everybody can code along his own preferences, a set of functions and parameters, and the parameters have to include at least one 32‑bit entity, which is the 32‑bit AS, usually. And the old communities did not do the same for 16‑bit ASNs, and so in the past, a lot of us have been playing tricks by squatting on other parties' supposed name space, meaning that our definitions are kind of in conflict. And getting the notion that, yes, we need the name space for everybody so that functions and parameters can be coded, I think is essential and, for example, the 4‑Octets extended community thing exactly did not grab what the 16‑bit actually should be coding.
JOB SNIJDERS: For a long time there has been a huge disconnect between protocol designers and operators.
RUEDIGER VOLK: Okay. So I think actually making explicit this requirement in the documentation and explanation makes a lot of sense. You have it in some of the slides when you say the larger, it's just the larger elephant, you are skipping, I think, this very important notion.
JOB SNIJDERS: We will search together for a better image.
RUEDIGER VOLK: Yes. So next thing, which is ‑‑ okay, I don't know. Your experiment with the beacons is exercising the transitive character of the attribute. We should, however, very clearly understand that the use of the communities should be considered as a bilateral agreement for each peering, so having the attribute being transitive is nice so that you can leak, but you should actually not expect that what you are putting there spreads everywhere uniformly, and for the misuse of communities and attributes, in the last few years we learned that, yes, to be able to protect our systems we actually have to have filters that allow us to drop attributes that are bad and kind of, kind of ‑‑
JOB SNIJDERS: No ‑‑
RUEDIGER VOLK: When you reported that, yes, the propagation seemed to work fairly well, I was thinking for myself, oh, so everybody ‑‑ nobody is actually filtering that stuff.
JOB SNIJDERS: Which is a good thing.
RUEDIGER VOLK: Which is kind of ‑‑ well okay, what do I know what people are abusing my BGP system for, to exfiltrate information to some weird place?
JOB SNIJDERS: On the flip side, if you do BGP path attribute filtering you stifle BGP innovation because the filters become stale, that is the case with all filters.
RUEDIGER VOLK: Well, the thing is, the functionality that is exercised on my edge is something that I need to have control over, and yes, that may include that I block out some innovations that I do not think I am prepared for.
JOB SNIJDERS: And that is your right, it's your network.
RUEDIGER VOLK: Yes, and you should expect that to happen at least sometimes.
JOB SNIJDERS: So the message to take away from this comment is, if you do BGP path attribute filtering make sure you allow 32 through?
RUEDIGER VOLK: No.
JOB SNIJDERS: Oh.
JOAO DAMAS: Okay ‑‑
RUEDIGER VOLK: I am not interested to have my customers pass through unchecked requests for black‑holing on another ‑‑ well, okay, for another customer, and I'm held responsible for what my system is giving as command to my neighbour.
JOAO DAMAS: Well, there is a usability document coming at some point so...
JOB SNIJDERS: Yes.
SPEAKER: There goes the reason for requesting 16‑bit numbers for route servers. Yes, Wolfgang Tremmel, working for a company that operates a number of route servers. I would like to thank you for your work and I just wonder why nobody else got the idea earlier.
JOB SNIJDERS: Tragedy of the commons.
SPEAKER: Warren, Google. This has been an infinite amount of work, thank you for doing it, a lot of people spoke about it but you stood up and got it done.
JOAO DAMAS: In your slide about the vendors state of implementation, there is missing one. I wonder if that was a space problem in that slide but where is Juniper?
JOB SNIJDERS: Yes, you can see it's a bit crowded. If you go to the website the list is much longer and I have heard rumour that Juniper was thinking of releasing something in the second half of 2017.
JOAO DAMAS: Next year?
JOB SNIJDERS: Yes.
CHAIR: A question from myself. Did you have contact with them already about the subject or not yet?
JOB SNIJDERS: Friends of mine have contact with them. So this is secondhand information.
SPEAKER: Alexander. Just question about the implementations. These implementations, what attributes are they using?
JOB SNIJDERS: Some of them 30, some of them 32, and I expect ‑‑ so the 32 code point, we got that assigned 13 hours ago. So, already a part of this has been patched, so you can see that the pace of movement is very high. And another reason why we changed at this last minute: I felt that now is the time to jump to a different code point because the large communities are not used beyond the beacon announcement. So if we had waited, say, half a year, then it might already have been too late. So, it was now or never.
SPEAKER: Are you going to synchronize this data with updates to the patches and so on?
JOB SNIJDERS: Yes, I will add a column that indicates that detail.
JOAO DAMAS: Ruediger, only one minute.
RUEDIGER VOLK: We expect, we expect that any serious implementer will be able to fix this for production code within six weeks. However, however, on production code, we also should be a little bit careful about your expectations when things will be actually available all over the field, there are systems that are on this list where hardware platforms do not get software updates any more. And so, the time, the time we can expect to have the functionality actually on all edge routers, certainly is seriously longer than 12 months.
JOB SNIJDERS: True and on top of that for NTT to refresh the software in the entire network, that takes ‑‑ that is a 12‑month project. So if we get code in 2017, yeah, you will see that deployment will go well into 2018.
SPEAKER: Peter Hessler with OpenBGPD. So, just before this talk I updated our attribute code points and we are now using 32. As far as which ones are using the old code point and the new code point: with the exception of Cisco IOS XR, which is a beta you have to request, all of the other implementations that have it right now are OpenSource and they can be updated extremely quickly. And as far as deployments go, the most critical location I see is the IXP route servers themselves, and the vast majority of them are already using an OpenSource variant, so adding this feature locally will be very quick. 32‑bit entities will have a very strong encouragement to upgrade their equipment, and for everyone else, it would be great, but they don't have a strict requirement to upgrade as soon as possible.
JOB SNIJDERS: OpenSource and availability of route servers software prove to be a very good asset for this effort.
JOAO DAMAS: Okay. Thank you, Job.
JOB SNIJDERS: Thank you.
(Applause)
We have Ben Maddison now.
BEN MADDISON: Morning, everyone. I am from Workonline Communications, we are actually at the back of this presentation. So, I have been asked to be a bit quick. Job had a nice self‑contained subject which he needed 45 minutes or so for, so I get ten for a massive and ‑‑ subject. So, I am hoping that the answer to this question in most cases is yes, because that is where you save time. Maybe just a quick show of hands, has everyone come across MANRS in some shape or form? Okay. That is a reasonable proportion.
The idea of MANRS is that we live in a distributed, loosely coupled ecosystem of independent operators, and getting people to behave themselves with respect to things that are complicated, like routing security, is not always straightforward. One of the tools that we can leverage to get that done is to exert peer pressure, you know, in the sociological rather than routing sense. And MANRS seeks to achieve that by essentially asking people to publicly subscribe to a set of principles, a set of four principles that are considered fairly obvious baseline best practice to keep their local part of the routing system as clean as possible. The four actions that we have got at the moment are filtering, which is to essentially make sure that you accept the minimum amount of bad information into your network and propagate that as little as possible to your adjacent networks. In the context of this, filtering effectively means for a transit AS to filter ingress from your customers as strictly as possible and on your egress to your transits and peers as strictly as possible, and optionally and preferably to also do ingress filtering from peers, but that is self‑protection and customer protection rather than system wide protection. Anti‑spoofing is exactly what it says on the tin: It's, within reasonable bounds, to take steps to prevent invalid source addresses from leaking into your adjacent networks by making sure that the prefixes that your customers announce to you are also the source addresses in packets that you receive from them. You know, that is obviously a fairly hot topic at the moment given the events of the last few weeks.
Number 3 is coordination, and the easiest one, which is making sure you are reachable using the tools we have available as a community, things like the routing registries and Whois and PeeringDB, to make sure that if something goes wrong and someone else on the other side of the network needs to contact you in order to get assistance in putting it right, that a way of doing that is readily available, and you have got network e‑mail addresses and phone numbers published somewhere sensible. And number 4, which is quasi‑optional, it is an add‑on, it's the facilitation of global validation, which is I think probably the most conceptually complex of the four, not the most operationally difficult. That involves making public sufficient information that some third party, without any specific knowledge of your network and the business that you operate, has enough information to compare that with what they are seeing in their routing tables, what they are learning from their transits and peers and customers, to ascertain what looks right and what looks wrong. And I suppose one could look at that as a precursor to some future version of path validation.
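The anti‑spoofing action described above comes down to one check: does the source address of a packet received from a customer fall inside a prefix that customer announces? A minimal Python sketch of that check follows. The prefixes are documentation ranges chosen purely for illustration, and the function name is hypothetical; real deployments would do this in the forwarding plane (for example with uRPF or ACLs), not in a script.

```python
import ipaddress

# Hypothetical prefixes announced by one customer (documentation ranges).
CUSTOMER_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def source_is_valid(src, customer_prefixes=CUSTOMER_PREFIXES):
    """Return True if the packet's source address lies in an announced prefix."""
    addr = ipaddress.ip_address(src)
    return any(addr in net for net in customer_prefixes)


ok = source_is_valid("192.0.2.10")         # inside the customer's /24
spoofed = source_is_valid("198.51.100.1")  # not announced by this customer
```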
So, BGP security and routing security in general is a complex topic. It turns out that writing a document about it is a complex task and becomes very large and very sprawling very quickly and the bags under Sander's eyes will attest to that.
The document itself, I believe a link was sent out to both this Working Group and the BCOP Working Group during the course of the week before last, and hopefully one or two people have had a chance to look through it. It's a long document. It's gone through a significant amount of tidying up over the last few months. Just to kind of run you through it very, very briefly. The reference topology that we use throughout the document is this one: AS 64500 is the operator in question that is trying to become MANRS compliant. They have two single‑homed transit customers, which are 64501 and 64502, they have what is presumably supposed to be a settlement free peer, which is the green network, 64511, and a single transit, which is 64510, which is the blue network up at the top.
For reasons of kind of readability in the document itself, the order of the actions in the base MANRS document does not map to the order of chapters in the BCOP and that can be a source of confusion when people come into it fresh, do not expect that. That is done for readability because a lot of the first introduction to the tool sets that people need to be familiar with comes in this action 3 part. As I said, action 3 involves using the tools that I expect every single person in this room has come across in one form or another: the Whois run by the Regional Internet Registries, PeeringDB, a network tool section on a company website where your looking glass lives, and so on. That is, in most cases, even for very new entrants to the routing ecosystem, something that can probably be ticked off the list straight away. But in the context of the document, it's a very important chunk because there are lots and lots of different tool sets that we use as a community and they all have different interfaces and all have different ways of signing up for use of them. That part of the document becomes fairly long and unwieldy. It's also the case that it's regional. This is, you know, this is RIPE, obviously, and so people here will be most familiar with the various RIPE interfaces, but if there are people in particular who have exposure to tool sets from other regions, in LACNIC, I operate in the AfriNIC region, in ARIN and APNIC and so forth, that is an area where I'd encourage people to get involved and contribute their knowledge of their local tool sets. Action 2 is implementation‑wise the most complex and I think this is probably where this Working Group can be most valuable in terms of providing input.
Source validation looks different depending on your immediate neighbour topology, it looks different on what kind of access topology you use for your customer, whether you are predominantly access or content or transit network and so on. And what works in one person's network very ‑‑ can very, very seriously break another person's network. One of the reasons that we wanted to come to the routing Working Group and not just the BCOP Working Group to present this, is to try and ask for some sanity checking in the advice that we are providing to network operators. And if you choose one area to look through and validate against, would this work in my network, do I do something completely other than this and do I feel comfortable contributing that to the community? I would encourage everybody to get stuck into this as a first port of call.
Action 4, not a lot of people that I am aware of do this as a validating party today. That being said, keeping information complete and clean is a hell of a lot easier than going and sticking it into an information system after the fact, so I'd encourage people to do it nonetheless. Again, this is an area where there are lots of different tool sets available and some of them are regionalised, and so anyone in particular who has specific knowledge of tool sets in other regions, I'd encourage them to go and have a look at that.
And then action 1, which is again an area that directly impacts the forwarding path and so is an appropriate area for this Working Group to spend some time on, is about filtering ingress from customers and egress to non‑customer networks, as I say. And there are lots of different examples using lots of tool sets contained in the document and again, that is a section where it would be very, very valuable for this part of the community to provide some sanity checking.
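Ingress filtering from customers, as described for action 1, can be pictured as a lookup against what each customer has registered. The Python sketch below uses the customer ASNs from the reference topology (64501 and 64502); the prefixes, the maximum prefix length and the function name are assumptions made for illustration and are not taken from the BCOP document, which gives its own router configuration examples.

```python
import ipaddress

# Hypothetical registered prefixes per customer AS (documentation ranges).
REGISTERED = {
    64501: [ipaddress.ip_network("192.0.2.0/24")],
    64502: [ipaddress.ip_network("198.51.100.0/24")],
}


def accept_from_customer(customer_as, prefix, max_len=24):
    """Accept an IPv4 announcement only if it is covered by a registered
    prefix of that customer and is not more specific than max_len."""
    net = ipaddress.ip_network(prefix)
    if net.version != 4 or net.prefixlen > max_len:
        return False
    return any(r.version == 4 and net.subnet_of(r)
               for r in REGISTERED.get(customer_as, []))


accepted = accept_from_customer(64501, "192.0.2.0/24")     # registered
hijack = accept_from_customer(64501, "198.51.100.0/24")    # belongs to 64502
too_long = accept_from_customer(64502, "198.51.100.0/25")  # more specific than /24
```

In practice the `REGISTERED` table would be generated from routing registry data rather than written by hand, which is why the coordination and global validation actions feed into this one.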
As I say there has been some rearrangement. This is mainly an administrative question which can be left to the BCOP Working Group.
Just very briefly on some of the challenges we have had. As I say, an implementation of this looks very different in almost every single network on the planet. It turns out that one needs to include a lot of baseline information and a lot of background information in the document in order to be able to make it sane and readable, and that has resulted in a very long document and one that is quite difficult to jump straight into and start reading fresh. We have been bouncing ideas around about dropping the idea of a single self‑contained document and trying to turn it into a little bit more of a kind of breathing organism, where we can have more static and theoretical parts and user‑contributed examples and code snippets, and I'd encourage people, if they have that sort of thing to contribute, to come and speak to me afterwards or ask questions at the mic. And I suppose if there is one take away from this for this Working Group, as I said, this slide deck was originally created for the BCOP Working Group and there were various administrative matters to talk about; what we need from this part of the community is sanity checking. I know the bits I have written down don't break my network, but I can't say for anyone else that they don't break yours. I don't know that my implementations would work for anyone else. Anyone who is willing to take some time and sanity check one or more parts before we provide it to the wider community and break something, that would be massively valuable. Thank you.
JOAO DAMAS: Thank you, Ben.
(Applause)
Anyone got any questions or comments for Ben at this point? No. I do have one. It's nice to have a document that you can refer people to but do you have any plans to make sure that people get to know that there is such a document like work together with the RIPE NCC to make sure it's in the tutorial and so on?
BEN MADDISON: Yes, how we go about doing that is a subject of discussion at the moment. The original intention when we first engaged with the BCOP Working Group in Copenhagen was eventually to have this turn into a RIPE document which could just be referred to, following the normal RIPE procedures. I tend to feel that it's turned into too long and complex a document, and one that is likely to change too often, for that to necessarily be the appropriate course of action. And this was discussed the other day at the other Working Group quite extensively. We will end up with a middle ground where the conceptual principle stuff lives in some sort of a standards document, whether that is a RIPE document or something else, and there is more of a fluid system of referencing around it, in the shape of a wiki or something of that nature. But certainly what I would hope is that whatever shape that final document takes, it's easily accessible and well publicised in all of the Internet registry communities and possibly in other places as well. I mean, it's originally an ISOC initiative so there is quite a lot of reference to it on the Deploy360 site already, and some talk going on about what does a global BCOP repository look like in general and how would that fit into that kind of a system.
JOAO DAMAS: Great. Anyone else? So please if you have a chance, do some review of the document, I think they would appreciate the feedback. Thank you very much.
BEN MADDISON: Thank you.
(Applause)
JOAO DAMAS: Now, last talk, Robert Kisteleki, summary of the hack‑a‑thon on Saturday.
ROBERT KISTELEKI: As you may have heard, the weekend before the RIPE meeting there was an IXP tools hack‑a‑thon and I was asked to give a quick update on what happened there. We had about, I think it was 40 people or so ended up in seven teams, six, seven people each, and they worked on various tools that they thought are actually useful for the IXP community. So it has something to do with routing most of the time so I was doing it.
We had seven projects and you will see them on these slides. One of them is a universal looking glass, basically the idea was to take MRT dumps from various sources and build a small tool that actually loads it up and in a browser you can query it just like if it was a live looking glass.
Another one is bird's eye, basically building an API and the UI and CLI that can connect to this API that can give an insight into what BIRD is doing, what the parameters are. Apparently there is no official API to do this so there is some screen scraping and magic happening in the background. I understand that there will be a talk about this in the OpenSource Working Group, so if you want to know more about this, please come to that.
Remote peering Jedi: the team used RIPE Atlas to figure out which peerings at a particular IXP are remote, and where the remote peer actually is, so they tried to triangulate that using RIPE Atlas probes. Peer match making: the idea was that there may be peering sessions that I could have, but do not yet have, at a particular IXP, and if I did have those sessions then my costs could go down because I could exchange traffic there instead of paying for transit, for example.
IXP valuator: what is the value of having an IXP in relation to local content versus local eyeballs? So the team came up with a tool that tried to measure this, somehow, how local is the content, what paths actually go through the IXP in that sense.
Making peering great again, or peer me: the team built a tool that, if you give it the right input, builds configurations based on templates, so you can load them into your routers and then you are good to go. And Pinder was an idea, it's basically a Tinder for peering: it kind of automates the communication between the two parties who would like to peer with each other, instead of just sending e‑mails and not knowing where they are really coming from; when the mail just says, Hi, I want to peer with you, that is kind of not enough. So this tool actually helps that kind of communication.
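The template‑driven configuration idea mentioned for "peer me" can be sketched with nothing more than Python's standard `string.Template`. This is not the hack‑a‑thon team's tool: the template text is an invented pseudo‑config that does not follow the syntax of any particular router, and the peer data is made up for illustration.

```python
from string import Template

# Invented pseudo-config; a real tool would ship one template per platform.
PEER_TEMPLATE = Template(
    "peer $name {\n"
    "    remote-as $asn;\n"
    "    address $address;\n"
    "}\n"
)


def render_peers(peers):
    """Render one configuration stanza per peer dictionary."""
    return "".join(PEER_TEMPLATE.substitute(p) for p in peers)


config = render_peers([
    {"name": "example-a", "asn": 64501, "address": "192.0.2.1"},
    {"name": "example-b", "asn": 64502, "address": "198.51.100.1"},
])
```

Keeping the peer data separate from the template is what lets the same input produce configuration for different platforms.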
So, bottom line is I think that if enough of these tools actually make it out there and instead of a hack‑a‑thon outcome they will be actually used tools then the value of peering coordinators may go down drastically. But in any case, we are maintaining a GitHub repository where you can find all the tools and links to presentations and all the supporting material so please check that out. I don't think you are supposed to be able to read that but you can click on it. That is it.
JOAO DAMAS: Thank you.
ROBERT KISTELEKI: I would not like to take any questions.
JOAO DAMAS: Probably cannot read the link but it's on the slides that are available. So thank you.
At this point we are done with the agenda. Does anyone have any other business? That looks like it's a no. So thank you all for coming. We are done. We will see you all again at RIPE 74, taking place in Budapest, May next year, thank you.
(Applause)
LIVE CAPTIONING BY AOIFE DOWNES RPR
DOYLE COURT REPORTERS LTD, DUBLIN IRELAND.
WWW.DCR.IE