...
No. It’s important to distinguish between oversubscription and overload. Oversubscription means that the aggregate bandwidth of all of the host downlinks exceeds the aggregate bandwidth of the network core. In an oversubscribed system, if every host were to transmit at full bandwidth to destinations across the datacenter, the core network could become overloaded. However, overload virtually never happens in practice, because hosts typically use only a small fraction of their uplink bandwidth and some of the traffic targets neighbors attached to the same top-of-rack switch. As a result, even with oversubscription, core networks tend to run at relatively low utilization. It would not be cost-effective to underprovision core networks so that they can’t keep up with the actual loads, because this would result in under-utilization of the more expensive host machines.
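To make the distinction concrete, here is a back-of-the-envelope sketch; all of the numbers are hypothetical, chosen only for illustration, not taken from any particular datacenter:

```c
/* Back-of-envelope sketch of oversubscription vs. actual core load.
 * All of the numbers below are hypothetical, chosen only for illustration. */
#include <stdio.h>

int main(void) {
    double hosts_per_rack = 40;         /* hosts attached to one ToR switch */
    double host_link_gbps = 25;         /* bandwidth of each host link */
    double uplink_gbps = 400;           /* aggregate ToR-to-core bandwidth */
    double avg_host_util = 0.10;        /* hosts use a small fraction of their links */
    double intra_rack_fraction = 0.30;  /* traffic that never leaves the rack */

    double aggregate_host_gbps = hosts_per_rack * host_link_gbps;
    double oversub_ratio = aggregate_host_gbps / uplink_gbps;

    /* Only traffic that actually leaves the rack reaches the core. */
    double core_load_gbps = aggregate_host_gbps * avg_host_util
            * (1.0 - intra_rack_fraction);

    printf("Oversubscription ratio: %.1f:1\n", oversub_ratio);
    printf("Core utilization:       %.0f%%\n",
           100.0 * core_load_gbps / uplink_gbps);
    return 0;
}
```

With these example values the fabric is oversubscribed 2.5:1, yet the core runs at well under 20% utilization.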
Does Homa have problems with excessive buffer usage?
No; Homa uses less buffer space than TCP. The claim that Homa overloads buffers, resulting in dropped packets and poor performance, has been made by a few recent papers, most notably Aeolus. However, these papers based their claims on inaccurate assumptions about how buffers are managed in modern switches, so their results are invalid. For example, the Aeolus paper assumes that switch buffer space is statically divided among egress ports, whereas in fact switches provide shared pools of buffers, so they can handle brief spikes at a particular port. See the Aeolus rebuttal for a more detailed discussion of the Aeolus claims.
Homa is arguably optimal in its use of buffers. In order to achieve the best performance for a new message in an unloaded network, a sender must be able to unilaterally transmit enough data to cover the time it takes to get a packet to the receiver, process that packet in software, and return some sort of signal back to the sender. This is what Homa does; it calls this unscheduled data and uses the term rttBytes to refer to the amount of unscheduled data that may be sent for each message. If a sender sends less than rttBytes before hearing back from the receiver, then network bandwidth will be wasted. Buffering will occur at the receiver’s downlink if several new messages begin transmitting around the same time. With Homa, the receiver detects the buffering as soon as it receives one packet from each message, and it immediately takes steps to reduce the buffering by withholding grants; again, this is optimal.
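To make rttBytes concrete: it is essentially a bandwidth-delay product. The sketch below uses illustrative values (not the configuration of any particular Homa deployment) to show the calculation:

```c
/* Illustrative calculation of rttBytes (a bandwidth-delay product).
 * The link speed and round-trip estimate are example values, not the
 * configuration of any particular Homa deployment. */
#include <stdio.h>

int main(void) {
    double link_gbps = 25.0;   /* host link speed */
    double rtt_usecs = 15.0;   /* deliver a packet, process it in software,
                                * and return a signal to the sender */

    /* Bytes a sender can put on the wire during one such round trip. */
    double rtt_bytes = (link_gbps * 1e9 / 8.0) * (rtt_usecs * 1e-6);

    printf("rttBytes ~= %.0f bytes\n", rtt_bytes);
    /* Sending less than this before hearing from the receiver wastes
     * link bandwidth; sending more only adds buffering. */
    return 0;
}
```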
Buffering as described above cannot be avoided without sacrificing performance in the unloaded case; if this is a tradeoff you’re willing to make, Homa can be configured to reduce the amount of unscheduled data sent for new messages.

Our experience with practical implementations of Homa has found no problems with buffer overflows. For example, the worst-case buffer consumption in benchmarks of the Linux kernel implementation was about 8.5 MB, for a switch with 13.5 MB capacity (these benchmarks used a 25 Gbps network). Extrapolations suggest that Homa will also work with newer 100 Gbps switching chips such as Broadcom’s Tomahawk chips, though buffer space will be tighter in these chips than it is in the current 25 Gbps network.
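For a rough sense of that extrapolation, worst-case buffer usage should grow roughly in proportion to link bandwidth (since rttBytes does, for a comparable RTT). The sketch below applies that assumption to the measured 25 Gbps numbers; it is only a projection, and how much headroom remains depends on the shared-buffer capacity of the particular 100 Gbps chip:

```c
/* Rough extrapolation of worst-case buffer usage from the 25 Gbps
 * benchmarks to 100 Gbps, assuming usage scales linearly with link
 * bandwidth (because rttBytes does, for a comparable RTT).  This is a
 * sketch, not a measurement of any 100 Gbps chip. */
#include <stdio.h>

int main(void) {
    double usage_25g_mb = 8.5;       /* worst case observed at 25 Gbps */
    double capacity_25g_mb = 13.5;   /* buffer capacity of that switch */
    double speedup = 100.0 / 25.0;   /* 25 Gbps -> 100 Gbps */

    double projected_mb = usage_25g_mb * speedup;

    printf("25 Gbps:  %.1f MB used of %.1f MB (%.0f%%)\n",
           usage_25g_mb, capacity_25g_mb,
           100.0 * usage_25g_mb / capacity_25g_mb);
    printf("100 Gbps: ~%.0f MB projected; headroom depends on the chip's\n"
           "          shared-buffer capacity\n", projected_mb);
    return 0;
}
```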
Our measurements indicate that TCP needs at least as much buffer space as Homa, so any switch that works for TCP is likely to work for Homa. It seems unlikely that people will create switches that don’t work for TCP.
It appears that newer switching chips are increasing their bandwidth faster than their buffer space. Suppose there comes a time when switches no longer have enough buffer space for Homa: will that make Homa useless?
No. To date we have not made any attempt to reduce Homa’s buffer usage, but it seems likely that buffer usage could be reduced significantly. For example, most buffer usage comes from either unscheduled packets or overcommitment. In the worst case, these could be scaled back to reduce buffer consumption (though this would come at some cost in performance). We also have ideas for optimizations that might reduce buffer usage without any performance impact. See the projects page for details.
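As a rough illustration of how those knobs affect buffering, the sketch below uses a simplified model (an assumption made here for illustration, not a formula from the Homa papers) in which worst-case buffering at one receiver downlink is the number of incast senders times the unscheduled bytes per message, plus the degree of overcommitment times rttBytes:

```c
/* Simplified model (an assumption made here for illustration, not a
 * formula from the Homa papers) of worst-case buffering at one receiver
 * downlink: each of N new senders may send up to unsched_bytes
 * unilaterally, and the receiver grants to at most 'overcommit' messages
 * at once, each with up to rtt_bytes of granted data in flight. */
#include <stdio.h>

static double worst_case_bytes(int incast_senders, double unsched_bytes,
                               int overcommit, double rtt_bytes)
{
    return incast_senders * unsched_bytes + overcommit * rtt_bytes;
}

int main(void) {
    double rtt_bytes = 60e3;   /* example bandwidth-delay product */

    /* Example configurations: scaling back unscheduled data and the
     * degree of overcommitment shrinks the bound, at some performance cost. */
    printf("full unscheduled data, overcommit 8: %.0f KB\n",
           worst_case_bytes(50, rtt_bytes, 8, rtt_bytes) / 1e3);
    printf("half unscheduled data, overcommit 4: %.0f KB\n",
           worst_case_bytes(50, rtt_bytes / 2.0, 4, rtt_bytes) / 1e3);
    return 0;
}
```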
Bottom line: it is premature to declare Homa impractical because of its buffer usage when (a) we have actual implementation experience showing that this is not a problem, and (b) we have ideas for reducing buffer usage in the future if that should become necessary.
Is Homa resilient against dropped packets?
...