94 lines
		
	
	
		
			3.2 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
			94 lines
		
	
	
		
			3.2 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
| 	Some less-widely known details of TCP connections.
 | |
| 
 | |
| 	Properly closing the connection.
 | |
| 
 | |
| After this code sequence:
 | |
| 
 | |
|     sock = socket(AF_INET, SOCK_STREAM, 0);
 | |
|     connect(sock, &remote, sizeof(remote));
 | |
|     write(sock, buffer, 1000000);
 | |
| 
 | |
| a large block of data is only buffered by kernel, it can't be sent all at once.
 | |
| What will happen if we close the socket?
 | |
| 
 | |
| "A host MAY implement a 'half-duplex' TCP close sequence, so that
 | |
|  an application that has called close() cannot continue to read
 | |
|  data from the connection. If such a host issues a close() call
 | |
|  while received data is still pending in TCP, or if new data is
 | |
|  received after close() is called, its TCP SHOULD send a RST
 | |
|  to show that data was lost."
 | |
| 
 | |
| IOW: if we just close(sock) now, kernel can reset the TCP connection
 | |
| (send RST packet).
 | |
| 
 | |
| This is problematic for two reasons: it discards some not-yet sent
 | |
| data, and it may be reported as error, not EOF, on peer's side.
 | |
| 
 | |
| What can be done about it?
 | |
| 
 | |
| Solution #1: block until sending is done:
 | |
| 
 | |
|     /* When enabled, a close(2) or shutdown(2) will not return until
 | |
|      * all queued messages for the socket have been successfully sent
 | |
|      * or the linger timeout has been reached.
 | |
|      */
 | |
|     struct linger {
 | |
| 	int l_onoff;    /* linger active */
 | |
| 	int l_linger;   /* how many seconds to linger for */
 | |
|     } linger;
 | |
|     linger.l_onoff = 1;
 | |
|     linger.l_linger = SOME_NUM;
 | |
|     setsockopt(sock, SOL_SOCKET, SO_LINGER, &linger, sizeof(linger));
 | |
|     close(sock);
 | |
| 
 | |
| Solution #2: tell kernel that you are done sending.
 | |
| This makes kernel send FIN after all data is written:
 | |
| 
 | |
|     shutdown(sock, SHUT_WR);
 | |
|     close(sock);
 | |
| 
 | |
| However, experiments on Linux 3.9.4 show that kernel can return from
 | |
| shutdown() and from close() before all data is sent,
 | |
| and if peer sends any data to us after this, kernel still responds with
 | |
| RST before all our data is sent.
 | |
| 
 | |
| In practice the protocol in use often does not allow peer to send
 | |
| such data to us, in which case this solution is acceptable.
 | |
| 
 | |
| Solution #3: if you know that peer is going to close its end after it sees
 | |
| our FIN (as EOF), it might be a good idea to perform a read after shutdown().
 | |
| When read finishes with 0-sized result, we conclude that peer received all
 | |
| the data, saw EOF, and closed its end.
 | |
| 
 | |
| However, this incurs small performance penalty (we run for a longer time)
 | |
| and requires safeguards (nonblocking reads, timeouts etc) against
 | |
| malicious peers which don't close the connection.
 | |
| 
 | |
| Solutions #1 and #2 can be combined:
 | |
| 
 | |
|     /* ...set up struct linger... then: */
 | |
|     setsockopt(sock, SOL_SOCKET, SO_LINGER, &linger, sizeof(linger));
 | |
|     shutdown(sock, SHUT_WR);
 | |
|     /* At this point, kernel sent FIN packet, not RST, to the peer, */
 | |
|     /* even if there is buffered read data from the peer. */
 | |
|     close(sock);
 | |
| 
 | |
| 	Defeating Nagle.
 | |
| 
 | |
| Method #1: manually control whether partial sends are allowed:
 | |
| 
 | |
| This prevents partially filled packets being sent:
 | |
| 
 | |
|     int state = 1;
 | |
|     setsockopt(fd, IPPROTO_TCP, TCP_CORK, &state, sizeof(state));
 | |
| 
 | |
| and this forces last, partially filled packet (if any) to be sent:
 | |
| 
 | |
|     int state = 0;
 | |
|     setsockopt(fd, IPPROTO_TCP, TCP_CORK, &state, sizeof(state));
 | |
| 
 | |
| Method #2: make any write to immediately send data, even if it's partial:
 | |
| 
 | |
|     int state = 1;
 | |
|     setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &state, sizeof(state));
 | 
