A Critique of Aeolus: A Building Block for Proactive Transports in Datacenters
The paper "Aeolus: A Building Block for Proactive Transports in Datacenters" (SIGCOMM 2020) raises concerns about buffer overflows in network transport protocols such as Homa, then proposes modifications to the Homa protocol to avoid the performance penalty associated with overflows. Unfortunately, this paper has three serious flaws, which are discussed in detail below:
It is based on a false premise (that buffer overflows are likely in Homa)
Its proposed solution is inferior to Homa and has poor performance
The results are presented in a way that camouflages the problems with the revised protocol
As a result, the paper’s conclusions are invalid.
False assumptions about switch buffer architecture
The paper begins by arguing that the proactive nature of protocols like Homa (where a source can transmit packets to a target without knowing whether there will be adequate buffer space in the top-of-rack switch) can result in buffer overflow and dropped packets, leading to poor performance. The paper artificially limits switch buffers to 200 KB per egress port, then runs experiments with Homa showing that packet drops occur and Homa performance degrades drastically.
The results of the experiment are unsurprising: packet drops cause long delays in Homa, so performance degrades severely when drops are frequent. However, the experimental setup does not reflect real-world conditions: no modern switch partitions its buffers statically between egress ports. Instead, switches provide a shared pool of buffers for all ports. The pools available in modern switches are sufficient to completely eliminate buffer overflows for Homa.
For example, the ATC Homa paper measured the Linux kernel implementation of Homa running with Mellanox 2410 switches, which provide 13 MB of buffers that can be dynamically shared among 40 ports. With this configuration, none of the Homa benchmarks resulted in any dropped packets (the highest buffer utilization for any benchmark was 8.5 MB).
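The difference between a static per-port limit and a shared pool can be illustrated with a minimal sketch. It is an idealized model, not a description of any real switch: the traffic pattern (one port needing 1 MB, the others 50 KB each) is hypothetical, the function names are made up for this example, and real shared-buffer switches typically cap how much of the pool a single port may take. Only the 200 KB limit, the 13 MB pool, and the 40 ports come from the text above.

```python
KB = 1000
MB = 1000 * KB

def dropped_bytes_static(queued, per_port_limit):
    """Bytes dropped when each egress port has its own fixed buffer quota."""
    return sum(max(0, q - per_port_limit) for q in queued)

def dropped_bytes_shared(queued, pool_size):
    """Bytes dropped when all ports draw from one shared pool (idealized)."""
    return max(0, sum(queued) - pool_size)

# Hypothetical incast: one of 40 ports needs 1 MB of buffering, the rest 50 KB each.
queued = [1 * MB] + [50 * KB] * 39

static_drops = dropped_bytes_static(queued, per_port_limit=200 * KB)
shared_drops = dropped_bytes_shared(queued, pool_size=13 * MB)
print(static_drops, shared_drops)  # 800000 0
```

In this toy scenario the static limit drops 800 KB at the congested port even though total demand (2.95 MB) is far below the pool size, whereas the shared pool absorbs the burst without loss.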
Thus, the Aeolus paper attempts to solve a problem that does not yet exist.
For a more comprehensive discussion of buffer usage in Homa, see this article.
Homa+Aeolus is a poor transport protocol
The second problem with the Aeolus paper is its proposed solution. By unnecessarily and severely limiting queue lengths, Homa+Aeolus induces large numbers of unnecessary packet drops. Furthermore, Aeolus reverses Homa’s SRPT priority mechanism: it "de-prioritizes the first-RTT packets", so that when packet drops occur they are likely to happen in short messages. SRPT is the single most important contributor to Homa's performance, so this results in poor performance.
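The effect of reversing the priority order can be seen in a small sketch of which packet is sacrificed when a queue overflows. This is a simplified model of the behavior as characterized above, not the Aeolus or Homa implementation; the `Packet` fields and both victim-selection functions are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    msg_id: str
    remaining_bytes: int  # bytes left in the packet's message (the SRPT key)
    first_rtt: bool       # True if sent in the first RTT, before any grant

def srpt_victim(queue):
    """SRPT-style priorities: the message with the most remaining bytes loses."""
    return max(queue, key=lambda p: p.remaining_bytes)

def first_rtt_deprioritized_victim(queue):
    """First-RTT packets are dropped first; ties fall back to SRPT order."""
    return max(queue, key=lambda p: (p.first_rtt, p.remaining_bytes))

# A short message fits entirely in the first RTT; the long one is mid-transfer.
queue = [
    Packet("short", 300, first_rtt=True),
    Packet("long", 500_000, first_rtt=False),
]

print(srpt_victim(queue).msg_id)                     # long
print(first_rtt_deprioritized_victim(queue).msg_id)  # short
```

Because a short message consists entirely of first-RTT packets, de-prioritizing those packets shifts the loss onto exactly the messages that SRPT is meant to favor.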
Measurements hide the problems
Unfortunately, the measurements in the paper are flawed in ways that camouflage the problems introduced by the Homa modifications. First, they lump together all messages smaller than 100 KB, so the impact on very short messages is masked by the inherently higher latencies of longer messages (100 KB messages behave very differently from 100 B messages). Second, the graphs make it hard to see 99th-percentile tail latency. Third, the paper measures Homa only under the unrealistic conditions where buffer overflows are frequent. By not comparing against Homa in the absence of buffer overflows, the paper hides the damage caused by the protocol changes.
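The masking effect of lumping message sizes together can be shown with a toy calculation. All latency values below are invented purely for illustration and are not measurements from either paper; the `percentile` helper is a simple nearest-rank implementation written for this sketch.

```python
def percentile(samples, p):
    """Nearest-rank percentile; p is an integer percentage from 1 to 100."""
    ordered = sorted(samples)
    rank = (p * len(ordered) + 99) // 100  # ceil(p * n / 100) using integers
    return ordered[rank - 1]

# Invented latencies in microseconds; NOT data from the Aeolus or Homa papers.
large = [90] * 900 + [200] * 100          # messages near the 100 KB end of the bucket
short_healthy = [5] * 1000                # short messages that never see a drop
short_degraded = [5] * 980 + [150] * 20   # short messages, 2% delayed by a drop

for name, short in [("healthy", short_healthy), ("degraded", short_degraded)]:
    print(name,
          "short-only p99:", percentile(short, 99),
          "lumped p99:", percentile(short + large, 99))
# healthy short-only p99: 5 lumped p99: 200
# degraded short-only p99: 150 lumped p99: 200
```

In this made-up example the 99th-percentile latency of short messages grows from 5 to 150 usec, yet the lumped figure stays at 200 usec in both cases because it is determined entirely by the larger messages.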
Even so, it’s possible to get a sense of how inferior Homa+Aeolus is. In Figure 12(c), the 99th percentile latency for messages shorter than 100 KB for the Web search workload appears to be at least 200 usec at 54% network load (it’s hard to read the figure precisely). In contrast, Homa’s 99th percentile latency for the same workload at 50% network load ranges from less than 5 usec for very short messages up to about 90 usec for 100 KB messages. Even at 80% network load, Homa’s 99th percentile latencies range from 5-120 usec.
Although it's impossible to know for sure without doing additional measurements, it seems likely that a more thorough evaluation would show that Aeolus has sacrificed most or all of the performance advantage of Homa, making the resulting protocol uninteresting.
In summary, Aeolus is based on a problem that doesn't actually exist, its evaluation is inadequate, and it seems likely that the resulting protocol is not competitive.
Author’s Response
One of the Aeolus authors has posted a response to this critique, titled “Severe Loss is not Rare in High speed Datacenter Networks”. This document presents various arguments why packet losses might occur in datacenter networks. I don’t disagree with the issues raised in the document. However, the document makes strong claims such as “Severe packet loss under incast-like traffic is not rare” without providing any data to justify the claims. I followed up with the author to ask if he had any supporting data, and he replied that he did not.
The response argues that it will eventually become necessary for all transport protocols to deal with packet loss under incast. I agree that this could happen if link speeds continue to increase faster than switch buffer capacity. But even if this does happen, I doubt that Aeolus is the right solution, because it penalizes short messages; it would be better to drop packets from the longest messages, where occasional retransmissions will have the smallest effect on performance. In any case, more work is needed in this area, both to understand the degree of the buffer problem and to explore possible solutions.