linux – Minimizing copies when writing large data to a socket



(1)

// Allocate buffer buf.
// Store image data in this buffer.

(2)

buf = mmap(file, len);  // Imagine proper options.
// Store image data in this buffer.

(3)

buf = mmap(in_fd, len);  // Imagine proper options.
// Store image data in this buffer.
int rc;
rc = sendfile(out_fd, in_fd, &offset, count);
// Deal with rc.

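It seems that (1) and (2) will probably do the same thing, since jemalloc probably allocates its memory through mmap in the first place. I am not sure about (3), though. Will it really bring any benefit? Figure 4 of this article on Linux zero-copy methods suggests that using sendfile can prevent a further copy: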

no data is copied into the socket buffer. Instead, only descriptors
with information about the whereabouts and length of the data are
appended to the socket buffer. The DMA engine passes data directly
from the kernel buffer to the protocol engine, thus eliminating the
remaining final copy.





It seems my suspicions were correct. I got my information from this article. Quoting it:

Also these network write system calls, including sendfile, might and
in many cases do return before the data sent over TCP by the method
call has been acknowledged. These methods return as soon as all data
is written into the socket buffers (sk buff) and is pushed to the TCP
write queue, the TCP engine can manage alone from that point on. In
other words at the time sendfile returns the last TCP send window is
not actually sent to the remote host but queued. In cases where
scatter-gather DMA is supported there is no separate buffer which
holds these bytes, rather the buffers (sk buffs) just hold pointers to
the pages of OS buffer cache, where the contents of file is located.
This might lead to a race condition if we modify the content of the
file corresponding to the data in the last TCP send window as soon as
sendfile is returned. As a result TCP engine may send newly written
data to the remote host instead of what we originally intended to



Be aware, when splicing data from a mmap’ed buffer to a network socket, it is not possible to say when all data has been sent. Even if splice() returns, the network stack may not have sent all data yet. So reusing the buffer may overwrite unsent data.


