...
Does Homa have problems with excessive buffer usage?
This is an open question. The claim that Homa uses excessive buffer space has been made by a few recent papers, most notably Aeolus. However, these papers based their claims on inaccurate assumptions about switch buffer management. For example, the Aeolus paper assumes that switch buffer space is statically divided among egress ports, whereas in fact switches provide shared pools of buffers, so they can handle brief spikes at a particular port. See the Aeolus rebuttal for more discussion of the Aeolus claims.
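The difference between static partitioning and a shared pool can be made concrete with a toy model. This is a simplified sketch, not a description of any particular switch: it assumes a hypothetical 32-port switch with 13.5 MB of buffer, and it models the shared pool with a dynamic-threshold policy (a queue may grow while its length is at most `alpha` times the free buffer space), which is one common way shared buffers are managed. The value `alpha = 1.0` and all function names are assumptions chosen for illustration.

```python
def static_limit_mb(total_mb: float, ports: int) -> float:
    """Static partitioning: each egress port owns a fixed, equal slice."""
    return total_mb / ports


def dynamic_threshold_limit_mb(total_mb: float, alpha: float) -> float:
    """Shared pool with a dynamic threshold (assumed policy).

    A queue may grow while q <= alpha * (total - occupied). If only one
    port is congested, occupied == q, so the cap is alpha*total/(1+alpha).
    """
    return alpha * total_mb / (1 + alpha)


def absorbs(burst_mb: float, limit_mb: float) -> bool:
    """True if a burst queued at one port fits under that port's limit."""
    return burst_mb <= limit_mb


if __name__ == "__main__":
    total, ports, alpha = 13.5, 32, 1.0  # hypothetical switch parameters
    burst = 2.0                          # hypothetical 2 MB spike at one port
    static = static_limit_mb(total, ports)
    shared = dynamic_threshold_limit_mb(total, alpha)
    print(f"static per-port limit: {static:.3f} MB, absorbs burst: {absorbs(burst, static)}")
    print(f"shared-pool limit:     {shared:.3f} MB, absorbs burst: {absorbs(burst, shared)}")
```

Under these assumptions a 2 MB spike at one port overflows a statically partitioned buffer (about 0.42 MB per port) but fits easily in the shared pool, which is the effect the Aeolus assumption overlooks.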
There has been no problem with buffer overflows in the existing implementations of Homa. For example, the worst-case buffer consumption in benchmarks of the Linux kernel implementation was about 8.5 MB, for a switch with 13.5 MB capacity (these benchmarks used a 25 Gbps network). Our implementation of Homa in RAMCloud, which used Infiniband networking, also had no problems with buffer overflows, though we did not measure Homa’s actual buffer usage.
Extrapolations to newer 100 Gbps switching chips, such as Broadcom’s Tomahawk-3, suggest there may be challenges for Homa. To see this, take the ratio of total required buffer space to total host downlink bandwidth; this ratio has units of time. It seems plausible that this ratio will remain constant as network speeds scale. In the Linux kernel implementation, Homa used 8.5 MB of buffer space to drive 40 nodes at 25 Gbps (1 Tbps in total), a ratio of 68 microseconds of buffering. Tomahawk-3 switches offer 128 ports at 100 Gbps, for 12.8 Tbps of total bandwidth, and they have 64 MB of buffer space, which is only 40 microseconds' worth. This would appear to be insufficient for Homa. However, with 2:1 oversubscription, only ⅔ of the switch bandwidth will be used for downlinks. Assuming there is little or no buffering on the uplinks (since Homa can use packet spraying), this would leave 60 microseconds of buffering for the downlinks, which is very close to what Homa needs.
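The buffer-to-bandwidth arithmetic above can be checked with a few lines of Python. This is only a sketch of the back-of-the-envelope calculation: `buffer_time_us` is a hypothetical helper, and it assumes MB means 10^6 bytes and that all bandwidths are aggregate host-facing (downlink) bandwidth.

```python
def buffer_time_us(buffer_mb: float, downlink_gbps: float) -> float:
    """Microseconds of downlink traffic that a buffer of buffer_mb can hold.

    buffer_mb: buffer size in megabytes (10**6 bytes, an assumption).
    downlink_gbps: aggregate host-facing bandwidth in Gbit/s.
    """
    buffer_bits = buffer_mb * 1e6 * 8
    bits_per_us = downlink_gbps * 1e9 / 1e6
    return buffer_bits / bits_per_us


# Linux kernel benchmarks: 8.5 MB of buffering for 40 hosts at 25 Gbps.
homa_need = buffer_time_us(8.5, 40 * 25)
# Tomahawk-3: 64 MB of buffer, 128 ports x 100 Gbps = 12.8 Tbps.
all_ports = buffer_time_us(64, 128 * 100)
# 2:1 oversubscription: only 2/3 of the bandwidth faces hosts.
downlinks_only = buffer_time_us(64, 128 * 100 * 2 / 3)

print(f"Homa needs ~{homa_need:.0f} us; Tomahawk-3 offers ~{all_ports:.0f} us "
      f"(all ports) or ~{downlinks_only:.0f} us (downlinks only)")
```

The printed values come out to roughly 68, 40, and 60 microseconds, matching the figures in the text.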
Another consideration is that our measurements indicate that TCP needs at least as much buffer space as Homa, so any switch that works for TCP is likely to work for Homa. It seems unlikely that people will create switches that don’t work for TCP.
It appears that newer switching chips are increasing their bandwidth faster than their buffer space. Suppose there comes a time when switches no longer have enough buffer space for Homa: will that make Homa useless?
...
Homa is resilient in that it will detect dropped packets and retransmit them, but Homa assumes that drops are extremely rare. If packets are dropped frequently, Homa will perform poorly. Packet drops from corruption are extremely rare, so the only thing to worry about is buffer overflow. Fortunately, Homa’s buffer usage is low enough to make buffer overflow extremely unlikely in modern switches.
How does Homa handle incast?
...