Systems Approach With the annual SIGCOMM conference taking place this month, we noticed that congestion control still gets an hour in the program, 35 years after the first paper on TCP congestion control was published. So it seems a good time to reflect on how much the success of the Internet has depended on its approach to dealing with traffic congestion.
After my last talk and article on 60 Years of Networking, which focused almost entirely on the Internet and the ARPANET, I received a lot of comments about the other networking technologies that were competing for supremacy at the same time.
These included the OSI stack (anyone remember CLNP and TP4?), the Colored Book protocols (including Cambridge Ring) and of course ATM (Asynchronous Transfer Mode) which was actually the first network protocol I worked on in depth. It’s hard to understand now, but in the 1980s I was one of many who believed that ATM could be the packet switching technology to take over the world.
I rate congestion control as one of the key factors that allowed the Internet to evolve from modest to global scale
ATM proponents used to refer to existing technologies such as Ethernet and TCP/IP as “legacy” protocols that could be carried over the global ATM network once established, if necessary. One of my fond memories from that time is of Steve Deering (a pioneer in IP networking) boldly (and correctly) stating that ATM would never be successful enough to even be a legacy protocol.
One reason I skipped over these other protocols when I covered the history of the network earlier was simply to save space—it’s a little-known fact that my Systems Approach colleague Larry Peterson and I aim for brevity, especially since we got a one-star review on Amazon that called our book “a wall of text.” But I was also focused on how we got to today’s Internet, where TCP/IP has effectively out-competed other protocol suites to achieve global (or near-global) penetration.
There are many theories as to why TCP/IP was more successful than its contemporaries, and they are not easy to test. It is likely that many factors played into the success of Internet Protocols. But I rate congestion control as one of the key factors that allowed the Internet to evolve from modest to global scale.
It is also an interesting study in how the particular architectural choices made in the 1970s played out in subsequent decades.
Distributed resource management
In David Clark’s paper (PDF) “The Design Philosophy of the DARPA Internet Protocols,” a stated design goal is: “The Internet architecture must allow distributed management of its resources.” There are many different implications of that goal, but the way Jacobson and Karels (PDF) first implemented congestion control in TCP is a good example of taking that principle to heart.
Their approach also embraces another design goal of the Internet: to accommodate many different types of networks. Taken together, these principles largely preclude any form of network-based admission control, in sharp contrast to networks such as ATM, which assumed that an end system had to request resources from the network before data could flow.
Part of the philosophy of “accommodating many types of networks” is that you cannot assume that all networks have admission control. Combine that with distributed management of resources and you end up with congestion control being something that end systems have to take care of, which is exactly what Jacobson and Karels did with their first changes to TCP.
We are trying to get millions of end systems to cooperatively share the bandwidth of bottleneck links in some fair way
The history of TCP congestion control is long enough to fill a book (and we did), but the work done in Berkeley, California, from 1986 to 1998 casts a long shadow, with Jacobson’s 1988 SIGCOMM paper ranking among the most cited networking papers of all time.
Slow-start, AIMD (additive increase, multiplicative decrease), RTT estimation, and the use of packet loss as a congestion signal were all in that paper, which laid the foundation for the following decades of congestion control research. One reason for that paper’s influence, I believe, is that the foundation it laid was solid, while leaving plenty of room for future improvement—as we see in the continuing efforts to improve congestion control today.
And the problem is fundamentally hard: we’re trying to get millions of end systems that have no direct contact with each other to cooperatively share the bandwidth of bottleneck links in some moderately fair way by using only the information that can be obtained by sending packets into the network and observing when and if they reach their destination.
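To make the mechanisms above concrete, here is a minimal, idealized sketch of the sender-side window rules from that 1988 paper: slow-start, additive increase, and multiplicative decrease on loss. This is an illustration, not any real TCP implementation; variable names like `cwnd` and `ssthresh` follow convention, real stacks work in bytes and update per-ACK rather than per-RTT, and the specific reset-to-one behavior shown is the early Tahoe style.

```python
def on_rtt_elapsed(cwnd: float, ssthresh: float, loss: bool) -> tuple[float, float]:
    """Return updated (cwnd, ssthresh) after one round-trip time (sketch only)."""
    if loss:
        # Multiplicative decrease: remember half the window as the threshold...
        ssthresh = max(cwnd / 2, 2)
        cwnd = 1  # ...and (Tahoe-style) restart from one segment in slow-start.
    elif cwnd < ssthresh:
        cwnd *= 2  # Slow-start: exponential growth, doubling each RTT.
    else:
        cwnd += 1  # Congestion avoidance: additive increase, one segment per RTT.
    return cwnd, ssthresh

# Trace a few RTTs with a single loss in round 5:
cwnd, ssthresh = 1, 16
for rtt in range(8):
    cwnd, ssthresh = on_rtt_elapsed(cwnd, ssthresh, loss=(rtt == 5))
```

The point of the sketch is the shape of the search: probe for bandwidth gently, back off sharply when the network signals trouble, which is what makes the scheme stable when millions of senders run it independently.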
Arguably one of the biggest steps forward after 1988 was the realization by Brakmo and Peterson (yeah, that guy) that packet loss wasn’t the only signal of congestion: increasing delays were too. This was the basis of the 1994 TCP Vegas paper, and the idea of using delay rather than loss alone was quite controversial at the time.
However, Vegas started a new trend in congestion control research, inspiring many other attempts to consider delay as an early indicator of congestion before loss occurs. Datacenter TCP (DCTCP) and Google’s BBR are two examples.
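The Vegas-style delay signal can be sketched as follows: compare the throughput you would expect at the minimum observed RTT against what the current RTT actually delivers; the gap estimates how many segments are queued in the network. This is a simplified illustration of the idea rather than the published algorithm; the `ALPHA`/`BETA` thresholds and one-segment adjustments are illustrative assumptions.

```python
ALPHA, BETA = 2, 4  # Illustrative Vegas-style thresholds, in queued segments.

def vegas_adjust(cwnd: float, base_rtt: float, current_rtt: float) -> float:
    """Nudge cwnd using the estimated number of segments queued in the network."""
    expected = cwnd / base_rtt   # Throughput if nothing were queuing (segments/sec).
    actual = cwnd / current_rtt  # Throughput implied by the measured RTT.
    queued = (expected - actual) * base_rtt  # Estimated segments sitting in queues.
    if queued < ALPHA:
        return cwnd + 1  # Path looks underused: increase.
    if queued > BETA:
        return cwnd - 1  # Delay is building: back off before any loss occurs.
    return cwnd          # In the sweet spot: hold steady.
```

The controversial part was exactly this: reacting to rising delay means slowing down while loss-based senders are still speeding up, yet it is also what lets delay-based schemes keep queues, and hence latency, short.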
One reason I give credit to congestion control algorithms for the success of the Internet is that the failure mode for the Internet was clearly on display in 1986. Jacobson describes some of the early episodes of congestion collapse, in which throughput dropped by three orders of magnitude.
When I started at Cisco in 1995, we were still hearing customer stories about catastrophic traffic congestion. That same year, Bob Metcalfe, inventor of Ethernet and recent Turing Award winner, predicted that the Internet would collapse as consumer Internet access and the rise of the Web drove rapid growth in traffic. It didn’t.
Congestion control has continued to evolve, with the QUIC protocol, for example, offering both better congestion-detection mechanisms and the ability to experiment with multiple congestion control algorithms. And some congestion control has moved up to the application layer, as in Dynamic Adaptive Streaming over HTTP (DASH).
An interesting side effect of the congestion episodes of the 1980s and 1990s was the observation that too-small buffers were sometimes a cause of congestion collapse. An influential paper by Villamizar and Song showed that TCP performance dropped when the amount of buffering was less than the average bandwidth-delay product of the flows.
Unfortunately, the result really only applied to very small numbers of flows (as acknowledged in the paper), but it was widely interpreted as an inviolable rule that influenced router design for years to come.
This was eventually debunked by the buffer-sizing work of Appenzeller et al. in 2004, but not before the unfortunate phenomenon of Bufferbloat—truly excessive buffer sizes that led to massive queuing delays—had made it into millions of low-end routers. The Bufferbloat self-test on your home network is worth a look.
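A back-of-the-envelope comparison shows why this mattered for router design. The numbers below are hypothetical (a 10 Gb/s link, 100 ms average RTT, 10,000 concurrent flows): the old rule of thumb sizes the buffer at the full bandwidth-delay product, while Appenzeller et al. argued that with many desynchronized flows, the bandwidth-delay product divided by the square root of the flow count suffices.

```python
import math

# Hypothetical link parameters for illustration only.
link_bps = 10e9   # 10 Gb/s link
rtt_s = 0.1       # 100 ms average RTT
n_flows = 10_000  # concurrent long-lived flows

# Villamizar/Song rule of thumb: buffer the full bandwidth-delay product.
bdp_bits = link_bps * rtt_s

# Appenzeller et al. (2004): with n desynchronized flows, BDP / sqrt(n) suffices.
appenzeller_bits = bdp_bits / math.sqrt(n_flows)

print(f"Rule-of-thumb buffer: {bdp_bits / 8 / 1e6:.0f} MB")         # 125 MB
print(f"Appenzeller buffer:   {appenzeller_bits / 8 / 1e6:.2f} MB")  # 1.25 MB
```

Two orders of magnitude is the difference between a buffer that needs expensive off-chip memory and one that fits comfortably on-chip, which is why the full-BDP rule shaped router hardware for as long as it did.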
So while we don’t get to go back and run controlled experiments to see exactly how the Internet came to succeed while other protocol suites fell by the wayside, we can at least see that the Internet avoided a potential congestion-induced failure because congestion control was added at the right time.
It was relatively easy in 1986 to experiment with new ideas by tweaking the code in a few end systems and then pushing an effective solution out to a broad set of systems. Nothing inside the network needed to change. It almost certainly helped that the set of operating systems that needed to change, and the pool of people who could make those changes, were small enough that the initial BSD-based algorithms of Jacobson and Karels could achieve widespread distribution.
It seems clear that there is no such thing as a perfect congestion control mechanism, which is why we continue to see new papers on the subject 35 years after Jacobson’s. But the architecture of the Internet has fostered an environment in which effective solutions can be tested and deployed to achieve distributed management of shared resources.
In my opinion, it is a good testament to the quality of that architecture. ®