The paper “PowerTCP: Pushing the Performance Limits of Datacenter Networks” (NSDI 2022) describes a new congestion control algorithm for datacenters and compares it against several existing protocols, including Homa. Among other things, the paper claims better performance than Homa, including a 99% reduction in tail latency, and it claims that Homa’s overcommitment mechanism is harmful to performance. Unfortunately, the third-party simulator used to evaluate Homa suffers from numerous problems, including bugs, missing features, and misconfiguration. After discussions with the authors, we have agreed that because of the simulator issues most of the Homa measurements in the paper are incorrect. In addition, there remain anomalies that have not been resolved. Until all of the issues are resolved and measurements are retaken, the paper’s conclusions relating to Homa should be considered unreliable.

The remainder of this article discusses in more detail the problems with the Homa evaluation in the paper. It then argues that there is good reason to believe Homa will outperform PowerTCP, due to its inherent advantages of knowing message lengths and using network priority queues.

I have not scrutinized the paper’s analysis of PowerTCP or its comparison with other systems such as TIMELY and HPCC, and I have no reason to question those measurements. Based on my reading of the paper, its conclusion that PowerTCP provides significant benefits in today’s TCP-based datacenters seems plausible.

The PowerTCP authors have issued their own statement, acknowledging the problems described above.

About the simulator

The Homa simulator used for the PowerTCP paper was not developed by either the PowerTCP authors or the Homa developers. It is in a relatively early stage of development, but unfortunately it advertised itself in a way that suggested it was endorsed by the Homa creators as a reference implementation. This is not the case, and the wording has subsequently been changed, but the simulator’s original description could easily mislead people into placing undue trust in it.

Known problems with the simulator

As a result of discussions with the PowerTCP authors about anomalies in the Homa measurements, we have uncovered the following bugs and missing features in the simulator:

All of these issues have now been registered in the simulator’s GitHub repository.

Consequences of the simulator problems

Before any of the paper’s Homa measurements or conclusions can be considered valid, the known bugs need to be fixed, missing simulator features need to be implemented, and the remaining anomalies need to be analyzed to see if there are additional simulator problems. Once all of these issues have been resolved, all of the Homa measurements need to be retaken.

Homa vs. PowerTCP: which is better?

Contrary to the paper’s conclusions, there are good reasons to believe that Homa will outperform PowerTCP once all of the simulator problems are fixed. Further measurements are needed to confirm these arguments, but the arguments provide additional reasons to be skeptical of the paper’s claims about Homa until proper measurements have been made.

It is worth noting that PowerTCP addresses congestion in the network core fabric, while Homa does not. PowerTCP appears to offer significant benefits in today’s datacenters, which are based on TCP and suffer core congestion. At the same time, it’s also important to note that core congestion is not an issue for Homa. Core congestion only exists in current datacenters because they use flow-consistent routing, which is required by protocols such as TCP. With flow-consistent routing, core congestion can happen even at very low overall network utilization, when two long flows happen to randomly hash to the same link. In contrast, Homa uses packet spraying; this should eliminate core congestion as a significant factor. Thus there is no need for Homa to address core congestion.
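The hashing argument above can be illustrated with a small back-of-the-envelope sketch. This is not taken from the paper or from either protocol's implementation; it assumes an idealized model in which each long flow is hashed uniformly at random onto one of a fixed number of equal-capacity core links, and in which packet spraying splits every flow's packets perfectly evenly across all links. The function names and the example figures (16 links, a handful of line-rate flows) are hypothetical and chosen only for illustration.

```python
import random


def collision_probability(num_flows: int, num_links: int) -> float:
    """Probability that at least two flows land on the same link.

    Assumption: each flow is assigned to a link independently and
    uniformly at random (an idealized flow-consistent hash).
    """
    if num_flows > num_links:
        return 1.0  # pigeonhole: some link must carry two flows
    p_no_collision = 1.0
    for i in range(num_flows):
        p_no_collision *= (num_links - i) / num_links
    return 1.0 - p_no_collision


def max_load_flow_hashing(num_flows: int, num_links: int,
                          rng: random.Random) -> float:
    """Heaviest link load (in units of one line-rate flow) for one
    random assignment of whole flows to links."""
    loads = [0.0] * num_links
    for _ in range(num_flows):
        loads[rng.randrange(num_links)] += 1.0
    return max(loads)


def max_load_spraying(num_flows: int, num_links: int) -> float:
    """Heaviest link load under idealized packet spraying, where every
    flow's packets are spread evenly over all links."""
    return num_flows / num_links


if __name__ == "__main__":
    links = 16  # hypothetical number of core links
    for flows in (2, 4, 8):
        p = collision_probability(flows, links)
        print(f"{flows} flows on {links} links: "
              f"P(two share a link) = {p:.3f}, "
              f"overall utilization = {flows / links:.0%}")
    rng = random.Random(0)
    print("one hashed trial, max link load:",
          max_load_flow_hashing(4, links, rng))
    print("sprayed, max link load:", max_load_spraying(4, links))
```

Under these assumptions, two line-rate flows on 16 links collide with probability 1/16, and four flows collide with probability of about 0.33, even though overall utilization in the four-flow case is only 25%. With idealized spraying, the same four flows load every link to 25% and no link is oversubscribed. Real packet spraying is not perfectly even, so this is only an upper bound on the benefit.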

It is also worth noting that Homa’s overcommitment mechanism consumes additional buffer space. Overall, PowerTCP is likely to use less buffer space in the steady state than Homa. Homa assumes that modern switches have adequate buffer space for its overcommitment mechanism. If this assumption turns out to be false, then the Homa protocol will need to be modified, and this will likely affect its performance. See this article for a more thorough discussion of buffer utilization in Homa.

Overall, it seems likely that PowerTCP provides significant value in TCP environments, but that Homa will outperform PowerTCP as long as it runs in an environment that supports packet spraying and has adequate buffer space.