Sockets - send()

Fairlight fairlite at fairlite.com
Thu Oct 30 17:40:58 PDT 2008


Simon--er, no...it was Brian K. White--said:
> Probably only most of the time, not by any guarantee.
> If Ken is right that fragmented tcp/ip packets are why you are
> receiving less than the total 4000, well packets may be broken
> any time to any size for any number of reasons by any device
> along the way between sender/receiver.
> So simply requesting a packet that is smaller than your own box's
> mtu in no way ensures that you will receive 1000 bytes every time.
> 
> However _most_ ethernet devices have an mtu of 1500, and your
> traffic is only likely to pass through a very few connections that have
> lower mtus (Because of say, PPPoE eating some bytes as overhead
> for the PPPoE protocol itself, this is generally just residential
> DSL, but possibly some types of vpn  might incur some overhead too) 
> TCP/IP itself uses up 44 bytes of each packet too, so the payload
> that holds your data is actually smaller than that.

You can't rely on MTU -at all- for anything other than the absolute maximum
size you'll get in.  Even then, you dare not rely on it in anything that
uses hardwired values: if the MTU is ever increased, you may hit a segv
from a buffer overrun, depending on how your buffers are handled.  And
while the MTU guarantees fragmentation at that boundary, data can arrive
fragmented in ways that have -nothing- at all to do with MTU whatsoever.
I've seen this lots of times.

> So, _usually_ 1000 bytes of payload data are going to be small
> enough that nothing ever ends up needing to break the packet
> into smaller packets. 1000 bytes from you at the application

Negative.  I've seen breaks at far smaller (e.g., 304 bytes, or even 84
bytes) with MTUs of no less than 1456 along the whole chain.

> I'm surprised you have to worry about this at all.
> Usually this is all transparently handled by the OS
> or by _something_ such that an application never has
> to be aware of it. The OS or the nic driver or something has

I'm just going to go way out on a limb here and guess you haven't actually
written and tested many client/server pairs?  Trust me, the application
-better- be aware of its own protocol.  For that matter, it better -have- a
protocol.  Which sounds like the problem--poor design in using an expected
pre-defined length, rather than implementing an application-level protocol
and adhering to -that-.

> Actually, maybe that's the problem. Maybe you are passing

[snip]

> Or perhaps it's nothing on your end but the person at the other

Maybe, perhaps, possibly, and somewhat half-plausibly.  Great.

> Sorry for all the maybes.

*long sigh*

You don't rely on a specific length.  That's a recipe for disaster, period.

You define an application level protocol with an indicator for
end-of-application-level "packet".  For instance, a packet might end in
\001\000\001 as a three byte sequence.  You keep doing non-blocking recv()
for whatever size you want (I use 8192; you're most efficient with bigger
reads, as witnessed by vastly increased NFS performance even with an MTU
of 1500), and you keep reading into an accumulator space until the
accumulated data contains that pre-defined sequence you decided on.

Then when you hit your application-level end-of-"packet", you know you have
received a complete application-level "packet" (i.e., chunk of data).  You
handle what preceded the marker as one discrete chunk of information, flush
everything up to and including that marker, and start reading for the next
application-level "packet", searching for the designated EOP string.

Rinse and repeat until done.
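The accumulator approach above can be sketched roughly as follows (Python
used purely for illustration--the \001\000\001 marker and the idea of
splitting complete "packets" out of a running buffer are from this post;
the function name and shape are mine):

```python
EOP = b"\x01\x00\x01"  # application-level end-of-packet marker

def feed(buffer, chunk):
    """Append newly received bytes, then split out any complete packets.

    TCP may deliver an app-level packet in any number of pieces, so we
    only act once the EOP marker actually appears in the accumulator.
    Returns (remaining_buffer, list_of_complete_packets).
    """
    buffer += chunk
    packets = []
    while True:
        idx = buffer.find(EOP)
        if idx < 0:
            break  # no complete packet yet; keep accumulating
        packets.append(buffer[:idx])       # everything before the marker
        buffer = buffer[idx + len(EOP):]   # flush marker, keep the rest
    return buffer, packets
```

Each recv() result gets fed in; whether it arrives as one 1000-byte chunk
or a dozen fragments, the caller only ever sees whole app-level packets.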

Unless you're dealing with a handshaking application-level protocol, you
also read in as -fast- as absolutely possible (hence larger chunks).  If
you do a lot of processing, it's really important to handle it at a lower
priority, or you may suffer data dropouts--data is getting transmitted to
the socket, but you're not reading it fast enough, too much accumulates
before you get back to it, and data falls off the end.  It arrived as
guaranteed by TCP, but the FIFO order is corrupted due to insufficient
buffer space combined with too-infrequent reading.
What I do in that case is make reading the -top- level priority in any
server loop, writing the second highest priority, and processing the lowest
priority.  Any time at all there's an atomic breakpoint that won't screw
things up, hit the socket to try and get more for the accumulator before
you do anything at all.  If you don't do that, and you spend long enough
doing something else other than reading, you likely break the contiguous
data stream at somewhere around 64KB sitting at the socket.  It varies
by OS, but that's about what I observed on Solaris last time I wrote a
client/server pair.
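The read-first/write-second/process-last priority ordering can be sketched
as a single pass of a server loop (Python for illustration only--the
function name, the list-based outbox/inbox, and the handler callback are
my own scaffolding, not anything from the original servers):

```python
import select
import socket

def serve_one_pass(sock, outbox, inbox, handle):
    """One pass of a server loop with the priorities described above:
    read first, write second, process last."""
    readable, writable, _ = select.select(
        [sock], [sock] if outbox else [], [], 0)
    # 1) Top priority: drain the socket into the accumulator so the
    #    kernel's receive buffer never fills up behind us.
    if readable:
        data = sock.recv(8192)
        if data:
            inbox.append(data)
    # 2) Second priority: flush any pending output.
    if writable and outbox:
        sent = sock.send(outbox[0])
        outbox[0] = outbox[0][sent:]
        if not outbox[0]:
            outbox.pop(0)
    # 3) Lowest priority: process accumulated input only after I/O.
    if inbox:
        handle(inbox.pop(0))
```

A real loop would run passes continuously and hit the socket again at every
safe breakpoint; the point is simply that the recv() always comes first.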

I don't have to "maybe", "possibly", etc.  I've been tweaking clients and
servers like ForumNet/ICB since 1991, wrote my own ICB client from scratch
in C in 1994-95, and have been writing servers in Perl since 2003.

Key things:

1) Application level protocol.  Design one, implement one.  Adhere to it.
Make it robust enough to do the job.  Break your job down into any tasks
that may require special networking handling (e.g., authentication).

2) Read data as fast as possible as the highest possible priority job.
You can mitigate the need to do that if you design your application-level
protocol to be handshaking (i.e., the peer generates no new input until
you ACK the last app-level packet received, thus requesting the next
one).  If you don't, then you MUST read as fast as possible or
you're screwed.  And if you notice the mitigation technique of handshaking,
you'll note I said application-level protocol, thus referring to that thing
you REALLY MUST HAVE from note #1.  Your tradeoff for handshaking is a bit
slower throughput, and a bit more bandwidth.  These days, nobody will
notice the bandwidth.
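A minimal stop-and-wait version of that handshake might look like this
(Python for illustration; the single ACK byte 0x06, the function names,
and the reuse of the \001\000\001 terminator are my assumptions--the post
doesn't fix a wire format):

```python
import socket

ACK = b"\x06"          # hypothetical one-byte acknowledgment
EOP = b"\x01\x00\x01"  # end-of-packet marker from earlier in the post

def send_with_ack(sock, payload):
    """Stop-and-wait: send one app-level packet, then block until the
    peer ACKs it before the caller may send the next one."""
    sock.sendall(payload + EOP)
    if sock.recv(1) != ACK:
        raise IOError("peer did not acknowledge")

def recv_and_ack(sock):
    """Receive one marker-terminated packet, then ACK so the sender
    is allowed to produce the next one."""
    buf = b""
    while EOP not in buf:
        chunk = sock.recv(8192)
        if not chunk:
            raise IOError("connection closed mid-packet")
        buf += chunk
    packet, _, _ = buf.partition(EOP)  # sketch: ignores any trailing bytes
    sock.sendall(ACK)
    return packet
```

This is the throughput/bandwidth tradeoff in miniature: the sender idles
for one round trip per packet, but can never flood an unread socket.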

Anyway, that's how things -are-, not how they "might" be.

mark->
-- 
"I'm not subtle. I'm not pretty, and I'll piss off a lot of people along
the way. But I'll get the job done" --Captain Matthew Gideon, "Crusade"

