...

  • Have receivers tune the amount of unscheduled data that can be sent in incoming messages. Right now this limit is fixed at “rttBytes”, the amount of data that can be sent in the time it takes for the first data packet to reach the receiver and the receiver to return a grant packet. However, consider a situation where 50% of a receiver’s incoming link bandwidth is taken by unscheduled packets. In this case, the unscheduled limit could be dropped to rttBytes/2: those packets will only be able to use about half of the receiver’s bandwidth, so by the time the last of them has been delivered, the receiver should have had enough time to return a grant packet. It seems likely that an adaptive mechanism of this sort could reduce buffer utilization, especially under high loads. Receivers could notify senders of the right amount of unscheduled data to send using the same mechanism used to announce priority cutoffs for unscheduled packets (see the first sketch after this list).

  • Senders could also reduce the amount of unscheduled data in some situations without loss of performance. For example, suppose a sender is transmitting a high-priority message A when a new lower-priority message B is initiated. In the current protocol, no packets will be sent for B as long as packets can be transmitted for A; then, once A has completed (or run out of grants), a full load of unscheduled data will be sent for B. However, suppose the sender preempts A to send one unscheduled packet for B, then returns to A. If A continues transmitting for at least one RTT, there’s no need to send additional unscheduled packets for B, since the receiver will already have had plenty of time to receive the initial packet and return a grant. This approach has the disadvantage of delaying A slightly, but that effect might prove insignificant (see the second sketch after this list).

  • The overcommitment mechanism could be modified to lower the total amount of granted-but-not-yet-received data that can exist at any given time. Right now this is set to the degree of overcommitment (typically 8) times rttBytes, but it could potentially be reduced, either by reducing the degree of overcommitment or by reducing the amount of granted data for each of the messages (perhaps the highest-priority messages would be given larger grants than lower-priority ones?). This optimization could reduce the effectiveness of overcommitment at maintaining full bandwidth utilization, so it would need further study (see the third sketch after this list).

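The first sketch below shows one way a receiver might compute an adaptive unscheduled limit. It is only a sketch under assumed names: RTT_BYTES, usable_fraction, and adaptive_unsched_limit are illustrations, not part of the current implementation. The receiver is assumed to estimate, from recent traffic measurements, what fraction of its incoming link bandwidth a new message’s unscheduled packets could actually use.

    /*
     * Hypothetical receiver-side calculation of an adaptive unscheduled
     * limit. usable_fraction is the receiver's estimate (0.0-1.0), based
     * on recent traffic measurements, of the fraction of its incoming
     * link bandwidth that a new message's unscheduled packets could use.
     */
    #include <stdint.h>

    #define RTT_BYTES        60000   /* Example value; actually configured. */
    #define MAX_PACKET_BYTES 1500    /* Example MTU-sized floor. */

    static uint32_t adaptive_unsched_limit(double usable_fraction)
    {
        /* E.g., if unscheduled packets can use only about half of the
         * receiver's link, rttBytes/2 of unscheduled data takes a full
         * RTT to deliver, which covers the time needed to return a
         * grant; anything beyond that just occupies buffers. */
        uint32_t limit = (uint32_t)(RTT_BYTES * usable_fraction);

        /* Never advertise less than one full packet, so every new
         * message still triggers a grant promptly. */
        if (limit < MAX_PACKET_BYTES)
            limit = MAX_PACKET_BYTES;
        if (limit > RTT_BYTES)
            limit = RTT_BYTES;
        return limit;
    }

As the bullet above suggests, the resulting value could be distributed to senders through the same mechanism that announces the unscheduled priority cutoffs, so no additional packets would be needed.
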
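The second sketch outlines the sender-side preemption idea. The structure and helpers here (out_msg, current_time, rtt_estimate, xmit_one_packet, choose_next_packet) are hypothetical stand-ins for whatever the sender’s pacing code actually uses; the point is only the ordering decision.

    /*
     * Hypothetical sender pacing decision: when a lower-priority message B
     * appears while A is transmitting, send a single unscheduled packet
     * for B, then return to A. If A then keeps the link busy for a full
     * RTT, the rest of B's unscheduled allocation is skipped, since B's
     * grant should arrive before B gets another chance to transmit.
     */
    #include <stddef.h>
    #include <stdint.h>

    struct out_msg {
        uint32_t unsched_sent;    /* Unscheduled bytes already transmitted. */
        uint32_t unsched_limit;   /* Unscheduled bytes this message may send. */
        uint64_t first_xmit_time; /* Time of this message's first packet. */
    };

    /* Assumed to exist elsewhere in the sender; xmit_one_packet transmits
     * the next packet of a message and updates its counters. */
    extern uint64_t current_time(void);
    extern uint64_t rtt_estimate(void);
    extern void xmit_one_packet(struct out_msg *msg);

    /* Pick the next packet to transmit, given active messages A and B
     * (A higher priority, B lower; either may be NULL). */
    void choose_next_packet(struct out_msg *a, struct out_msg *b)
    {
        if (b != NULL && b->unsched_sent == 0) {
            /* Preempt A just long enough for B's first packet, so the
             * receiver can start preparing a grant for B. */
            b->first_xmit_time = current_time();
            xmit_one_packet(b);
            return;
        }
        if (a != NULL) {
            xmit_one_packet(a);
            if (b != NULL && (current_time() - b->first_xmit_time)
                    >= rtt_estimate()) {
                /* A has covered a full RTT since B's first packet; B's
                 * grant should already be on its way back, so B needs
                 * no more unscheduled data. */
                b->unsched_limit = b->unsched_sent;
            }
            return;
        }
        if (b != NULL && b->unsched_sent < b->unsched_limit)
            xmit_one_packet(b);
    }
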
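The third sketch shows one way to tighten the bound on granted-but-not-yet-received data without abandoning overcommitment entirely: keep the degree of overcommitment but taper the grant window by priority rank. The names and the linear taper are assumptions chosen purely for illustration; the performance impact would have to be measured.

    /*
     * Hypothetical per-message grant window that shrinks with priority
     * rank (0 = highest-priority incoming message being granted).
     */
    #include <stdint.h>

    #define RTT_BYTES   60000   /* Example value. */
    #define OVERCOMMIT  8       /* Degree of overcommitment. */

    static uint32_t grant_window(int rank)
    {
        /* Linear taper: rank 0 gets a full rttBytes, rank OVERCOMMIT-1
         * gets rttBytes/OVERCOMMIT. Total outstanding grants drop from
         * OVERCOMMIT*rttBytes to about (OVERCOMMIT+1)/2*rttBytes, at
         * some (unmeasured) risk to bandwidth utilization. */
        return (uint32_t)(RTT_BYTES * (OVERCOMMIT - rank) / OVERCOMMIT);
    }
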
New API for Receive System Calls

The homa_recv system call in the Linux implementation currently takes a message buffer as an argument. This makes it difficult for Homa to pipeline the flow of messages. Ideally, when a large message is being received, data from the message should be copied from kernel buffers to user space as packets are received, so that when the final packet arrives for the message, almost all of the data has already been copied out to user space. However, this is awkward with the current API. The problem is that a shorter message could arrive in the middle of receiving a longer message; in this case, homa_recv must return the shorter message. Any data from the longer message that had already been copied into the caller’s buffer would have to be discarded and later re-copied from the kernel’s packet buffers into a new message buffer supplied by a subsequent call to homa_recv. As a result, Homa does not copy any data to user space until the entire message has been received; this means that the copy can’t be pipelined with packet arrivals, and there can be a significant additional delay to copy the entire message. This also increases Homa’s use of kernel packet buffers: all of the packet buffers for a long message must be retained until the entire message has been received.

In order to allow pipelining, applications must make buffer space available to the kernel independently of specific homa_recv kernel calls. For example, an application could make a large area of buffer space available to the kernel. The kernel would then allocate regions of this buffer to incoming messages; it could pipeline data transfers into this buffer and free kernel packet buffers as soon as data was transferred to the application’s buffer. The homa_recv kernel call would no longer specify a particular message buffer; the kernel would simply return information about the region it allocated from the larger buffer. There are numerous details that would have to be resolved to make this approach workable, such as how to reuse space in the application’s buffer once incoming messages have been handled (the buffer could become fragmented, with much of its space no longer in use but unavailable because a few large messages have still not been fully received).
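
The sketch below illustrates one possible shape for such an interface. Every name in it (homa_set_recv_buffer, homa_recv_pipelined, homa_release_msg, struct homa_recv_result) is hypothetical; this is not the current Homa system-call API, just an illustration of the idea described above.

    /*
     * Hypothetical receive interface: the application registers one large
     * buffer region up front, the kernel copies message data into the
     * region as packets arrive, and each receive call merely reports
     * where a complete message landed.
     */
    #include <stddef.h>
    #include <stdint.h>
    #include <sys/socket.h>

    /* Hand a large region of application memory to the kernel for
     * incoming messages on this socket. */
    int homa_set_recv_buffer(int sockfd, void *region, size_t length);

    /* Where the kernel placed a complete incoming message. */
    struct homa_recv_result {
        uint64_t id;       /* Identifier for the message (RPC id). */
        size_t   offset;   /* Offset of the message within the region. */
        size_t   length;   /* Total length of the message. */
        struct sockaddr_storage peer;
    };

    /* Wait for the next complete message; no message buffer is passed in. */
    int homa_recv_pipelined(int sockfd, struct homa_recv_result *result,
            int flags);

    /* Tell the kernel that the application is finished with a message, so
     * its space in the region can be reclaimed; this is where the
     * fragmentation question mentioned above would have to be solved. */
    int homa_release_msg(int sockfd, uint64_t id);

    /* Example usage. */
    extern void handle_message(char *data, size_t length);

    void receive_loop(int sockfd, char *region, size_t region_len)
    {
        struct homa_recv_result r;

        homa_set_recv_buffer(sockfd, region, region_len);
        while (homa_recv_pipelined(sockfd, &r, 0) == 0) {
            handle_message(region + r.offset, r.length);
            homa_release_msg(sockfd, r.id);
        }
    }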

RPC Framework Integration

...