TCP
TCP
- Point-to-point: One sender and one receiver
- Reliable, in-order byte stream: No need to message boundaries
- Full duplex data: Bi-directional data flow in same connection. i.e. client and server can send the data at the same time.
- MSS (Maximum Segment Size)
- Cumulative ACKs: only ACK consecutive packets
- Connection-oriented: Handshaking needed
- Flow control with receiver window size (to not overwhelm receiver)
- Congestion control to avoid overflowing network
TCP segment header
- source, destination port number
- sequence number: first byte's sequence number (we assign sequence number per each byte)
- acknowledgement number: sequence number of next expected byte (ACK n, other host's sequence number)
- length: only 4 bits!!
- receiver window size: max 65536, can be increased in options
- options: e.g. increase length of receiver window, congestion control, tcp options, ...
- checksum
TCP timeout
Timeout should be longer than RTT (round trip time), but RTT varies!
Too short timeout will have unnecessary retransmissions.
Too long timeout will have slow reaction to segment loss.
Since RTT varies, timeout should vary, so TCP estimates RTT!
SampleRTT is the measured time from segment transmission until ACK receipt (ignoring retransmissions).
We should average several recent SampleRTT, so we use .
This is called Exponential Weighted Moving Average (EWMA) because influence of past sample decreases exponentially fast.
Typical value of α is 0.125.
Since timeout should be longer than RTT, we use safety margin.
DevRTT is EWMA of SampleRTT deviation from EstimatedRTT.
Typical value of β is 0.25.
This is not an optimal timeout interval, these values are used to make the computer easier to calculate.
TCP Sender
- Create segments with sequence number, then start timer with expiration interval.
- If timeout, retransmit segment and restart timer. (Unlike Go-back-N, send only unACKed segment)
- If ACK received and acknowledges previously unACKed segments, update ACKed segments, then send unACKed segments and restart timer if unACKed segments exists.
- If ACK received and acknowledges previously ACKed segments thrice (i.e. three duplicated ACKs / four same ACKs), fast retransmit - immediately retransmit segment.
TCP Receiver
- If in-order segment arrives and all segments are already ACKed, send delayed ACK, wait up to 500ms for next segment.
- If in-order segment arrives and one other segment has ACK pending (i.e. receiver was delaying ACK), immediately send cumulative ACK (thus ACKing both in-order segments).
- If out-of-order segment with higher-than-expect sequence number arrives (i.e. gap detected), immediately send duplicate ACK.
- If segment that partially or completely fills the gap, immediately send ACK.
TCP flow control
If data is transferred too fast, the socket buffer will overflow.
Any packet will be dropped when buffer overflows!
Flow control: Receiver controls sender, so sender won't overflow receiver's buffer by transmitting too much, too fast.
TCP segmgent header have "receive window" value, the number of bytes receiver willing to accept.
TCP receiver advertises free buffer space in rwnd field in TCP header.
RcvBuffer size is set via socket options. Default value is 4KB, but typically 64KB is used. (64KB is maximum because header value is 16bits.)
Sender limits amount of unACKed (in-flight) data to rwnd value.
This guarantees that receive buffer will not overflow!
In practice, the buffer size can be changed dynamically!
Also, 64KB is too small for Gbps internet...
We added window scale factor (0 ~ 14) for TCP option.
This shifts rwnd value, so theoretically, the buffer size can be up to 1GB (64KB * 2^14).
TCP connection management
During handshake process, we agree to establish connection, and we agree on connection parameters.
e.g. sequence numbers, RcvBuffer size, MSS (Maximum Segment Size), ...
2-way handshake doesn't work
2-way handshake is not enough!
Delay is variable, messages can be lost, reordered, retransmitted, etc.
e.g. client retrasmitted handshake message (req_conn), but this message is super delayed and server accepts it after client termination! (half open connection)
3-way handshake
Seq is Sequence number, SYNbit is control packet.
- Client send TCP SYN msg (with SYNbit=1, Seq=x)
- Server responds TCP SYNACK msg (with SYNbit=1, Seq=y, ACKbit=1, ACKnum=x+1)
- Client establish connection, and responds with ACK msg (with ACKbit=1, ACKnum=y+1)
- Server receive ACK msg, then establish connection.
SYNbit is ON only in the first two handshake packets.
ACKbit is OFF only in the first handshake packet. (recall: we removed NAK!)
Normally handshake doesn't have a payload, but there are some RFCs (fast open) that allows sending data with handshake.
Closing a TCP connection
- Client and server each close their side of connection by sending FIN msg (with FINbit = 1)
- Receiver should respond FIN with ACK.
- Receiver can respond FIN with FIN and ACK if they don't have data to send.
This is not handshake, simultaneous FIN exchanges can be handled!
- Passive close: connection is closed from other side. Send FIN when ready to close, and close connection after receiving ACK.
- Active close: connection is closed from your side. After both FIN is ACKed, we wait for 2MSL (Maximum Segment Lifetime, normally 2MSL = 120s!) before actually closing connection.
This can handle retransmitted FIN, dropped ACK, and doesn't have to worry that FIN might close next connection.
TCP State Transition Diagram
RST is sent when you want to close connection abruptly.
Theorically, two client can try to connect each other, and simultaneous SYN can be sent.
They'll transition to SYN_SENT state, then send SYNACK (transition to SYNC_TCVD), then connection will established when receiving ACK (transition to ESTABLISHED)
Transition from LISTEN to SYN_SENT is in RFC, but it's not used in practice.