TCP Reliable Transmission Summary

🧑‍💻 User

What is reliable transmission? Recall that the network layer is unreliable, which may lead to packets being lost or arriving out of order. This can be an issue for many applications. For example, a file downloaded over the Internet might be corrupted if some packets are lost during the transfer.

One option here is to allow the application developers to take care of the network losses, as is done in UDP. However, given that reliability is an important primitive that is desirable for many applications, TCP developers decided to implement this primitive in the transport layer. Thus, TCP guarantees in-order delivery of the application-layer data without any loss or corruption.

Now, let us look at how TCP implements reliability.

For communication to be reliable, the sender must be able to tell which segments were received by the remote host and which were lost. How can we achieve this? One way is to have the receiver send acknowledgments indicating that it has successfully received a specific segment. If the sender does not receive an acknowledgment within a given period of time, it can assume the packet is lost and resend it. This method of using acknowledgments and timeouts is known as Automatic Repeat Request, or ARQ.

ARQ can be implemented in several ways:

The simplest way would be for the sender to send a packet and wait for its acknowledgment from the receiver. This is known as Stop and Wait ARQ. Note that the algorithm typically needs to figure out the waiting time after which it resends the packet, and this estimation can be tricky. A small timeout value can lead to unnecessary retransmissions, but a large timeout value can lead to unnecessary delays. Typically the timeout value is a function of the estimated round trip time (RTT) of the connection.

This pattern of alternately sending and waiting for an acknowledgment performs poorly, because the sender sits idle for most of each round trip. To solve this problem, the sender can send multiple packets without waiting for acknowledgments. More specifically, the sender may have at most N unacknowledged packets outstanding, where N is typically referred to as the window size. As the sender receives acknowledgments from the receiver, it can send more packets, up to the window size. In implementing this, we need to take care of the following concerns:

The receiver needs to identify and notify the sender of a missing packet. Thus, each packet is tagged with a unique byte sequence number which is increased for subsequent packets in the flow based on the size of the packet.
Also, both sender and receiver would need to buffer more than one packet. For instance, the sender would need to buffer packets that have been transmitted but not acknowledged. Similarly, the receiver may need to buffer packets because the rate of consuming these packets (e.g., writing to a disk) is slower than the rate at which the packets arrive.

Now let’s look at how the receiver notifies the sender of a missing segment.

One way is for the receiver to send an ACK for the most recently received in-order packet. The sender then resends every packet after the last acknowledged in-order packet, even if some of them had been sent before. The receiver can simply discard any packets received out of order. This is called Go-back-N. In the figure below, packet 7 is lost in the network, so the receiver discards all subsequent packets, and the sender resends every packet starting from 7.

Clearly, in the above case, a single packet error can cause a lot of unnecessary retransmissions. To solve this, TCP uses selective ACKing. The receiver acknowledges a correctly received packet even if it is not in order, and the sender retransmits only those packets that it suspects were lost or received in error. The out-of-order packets are buffered until any missing packets have been received, at which point the batch of packets can be delivered to the application layer.

Note that, even in this case, TCP would need to use a timeout as there is a possibility of ACKs getting lost in the network.

In addition to using a timeout to detect loss of packets, TCP also uses duplicate acknowledgments as a means to detect packet loss. A duplicate ACK is an additional acknowledgment of a segment for which the sender has already received acknowledgment earlier. When the sender receives 3 duplicate ACKs for a packet, it considers the packet to be lost and will retransmit it instead of waiting for the timeout. This is known as fast retransmit. For example, in the figure below, once the sender receives 3 duplicate ACKs, it will retransmit packet 7 without waiting for a timeout.

References

Kurose-Ross, Edition 6, Section 3.4

🤖 Assistant

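Feynman Technique Self-Check List: TCP Reliable Transmission

📋 Self-Check Checklist

Part 1: Basic Concepts

[ ] 1. Can I explain why reliable transmission is needed? What problem does the network layer have?
[ ] 2. Can I state how TCP and UDP differ in handling reliability?
[ ] 3. Can I explain the core idea of ARQ (Automatic Repeat Request) in my own words?

Part 2: ARQ Implementations

[ ] 4. Can I describe how Stop and Wait ARQ works and name its drawback?
[ ] 5. Can I explain what goes wrong when the timeout is set too large or too small?
[ ] 6. Can I explain the concept of the window size and how it improves transmission efficiency?

Part 3: Sequence Numbers and Buffering

[ ] 7. Can I explain why every packet needs a sequence number?
[ ] 8. Can I explain why the sender and the receiver each need a buffer?

Part 4: Loss Handling Mechanisms

[ ] 9. Can I describe how Go-back-N works and name its drawback?
[ ] 10. Can I explain how selective ACK improves on Go-back-N?
[ ] 11. Can I explain what a duplicate ACK is?
[ ] 12. Can I explain the trigger condition for fast retransmit and how it works?

Part 5: Putting It Together

[ ] 13. Can I draw a simple diagram showing how TCP recovers after a packet is lost?
[ ] 14. Can I explain to someone with no networking background how TCP keeps data from being lost?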

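✅ Reference Answers

1. Why is reliable transmission needed?

The network layer is unreliable, which can lead to:

Packets being lost
Packets arriving out of order

Example: when downloading a file, if some packets are lost, the file ends up corrupted.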

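2. TCP vs. UDP on reliability

TCP	UDP
Implements reliability in the transport layer	Leaves loss handling to the application layer
Guarantees in-order delivery with no loss or corruption	Makes no reliability guarantee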

3. The core idea of ARQ

ARQ = Automatic Repeat Request

Core mechanisms:

Acknowledgment (ACK): the receiver tells the sender "I got it"
Timeout: if the sender waits for a set period without receiving an ACK, it resends

4. Stop and Wait ARQ

Workflow:

Sender sends Packet 1 → waits → receives ACK → sends Packet 2 → waits ...

Drawback: very inefficient, because the sender spends most of its time waiting.
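The workflow above can be sketched as a tiny simulation. This is an illustrative sketch only: the function name `stop_and_wait` and the `lost` parameter (a set of `(packet, attempt)` pairs whose data packet or ACK is assumed lost) are hypothetical, and time is not modeled at all.

```python
def stop_and_wait(num_packets, lost):
    """Simulate Stop and Wait ARQ.

    lost is a set of (packet, attempt) pairs for which either the packet
    or its ACK is assumed to be lost, so the sender times out and resends.
    Returns a list of (packet, attempt, outcome) events.
    """
    events = []
    for pkt in range(1, num_packets + 1):
        attempt = 1
        # Keep resending the same packet until one attempt is acknowledged.
        while (pkt, attempt) in lost:
            events.append((pkt, attempt, "timeout"))
            attempt += 1
        events.append((pkt, attempt, "acked"))
    return events


if __name__ == "__main__":
    for event in stop_and_wait(3, lost={(2, 1)}):
        print(event)
```

The sender never moves on to the next packet until the current one is acknowledged, which is exactly why the link sits idle for most of each round trip.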

5. Effect of the timeout setting

Timeout too short	Timeout too long
Unnecessary retransmissions	Unnecessary delays

In practice: the timeout value is typically a function of the estimated round-trip time (RTT).
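One widely used way to turn RTT samples into a timeout is the smoothed estimator described in RFC 6298. The sketch below follows that scheme under simplifying assumptions: the class name `RtoEstimator` is hypothetical, and the clock-granularity term and the minimum-timeout clamp that real stacks apply are left out.

```python
class RtoEstimator:
    """Simplified RFC 6298-style retransmission timeout estimator."""

    ALPHA = 1 / 8  # weight of a new sample in the smoothed RTT
    BETA = 1 / 4   # weight of a new sample in the RTT variation

    def __init__(self):
        self.srtt = None    # smoothed round-trip time, in seconds
        self.rttvar = None  # round-trip time variation, in seconds

    def update(self, sample):
        """Feed one RTT measurement (seconds) and return the new timeout."""
        if self.srtt is None:
            # First measurement initializes both estimates.
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # The variation is updated first, using the old smoothed RTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - sample)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample
        return self.srtt + 4 * self.rttvar


if __name__ == "__main__":
    estimator = RtoEstimator()
    for rtt in (0.100, 0.200, 0.120):
        print(f"sample={rtt:.3f}s timeout={estimator.update(rtt):.4f}s")
```

Adding four times the variation keeps the timeout comfortably above the typical RTT when samples fluctuate, which reduces spurious retransmissions without waiting much longer than necessary on a stable path.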

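6. Window size

Definition: the sender may have at most N packets sent but not yet acknowledged; N is the window size.

Benefit: the sender does not have to wait for each packet one by one; it can send several packets back to back, which improves efficiency.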
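To put a number on the efficiency gain, the sketch below computes sender utilization on an idealized loss-free link: with packet size L bits, link rate R bits/s, and window N, the fraction of time spent transmitting is roughly N·(L/R) / (RTT + L/R), capped at 1. The function name and the example figures (1000-byte packets, a 1 Gbps link, a 30 ms RTT) are assumptions chosen for illustration.

```python
def sender_utilization(packet_bits, link_bps, rtt_seconds, window=1):
    """Fraction of time the sender is transmitting on a loss-free link.

    window=1 corresponds to Stop and Wait; larger windows pipeline packets.
    """
    transmit_time = packet_bits / link_bps  # time to push one packet onto the link
    utilization = window * transmit_time / (rtt_seconds + transmit_time)
    return min(1.0, utilization)            # the link cannot be more than 100% busy


if __name__ == "__main__":
    for n in (1, 3, 1000):
        u = sender_utilization(8000, 1e9, 0.030, window=n)
        print(f"window={n:5d} utilization={u:.6f}")
```

With these example numbers, Stop and Wait keeps the link busy for well under 0.1% of the time, and utilization grows linearly with the window until the link is saturated.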

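7. Why are sequence numbers needed?

The receiver needs to identify which packet is missing
The receiver needs to tell the sender exactly which packet was lost
Each packet carries a unique byte sequence number, which increases for subsequent packets by the size of the packet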
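As a small illustration of byte-based numbering, the sketch below assigns a sequence number to each segment by advancing it by the segment's size. The function name and the starting value are hypothetical; the wraparound at 2**32 reflects the 32-bit sequence number field in the TCP header.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def byte_sequence_numbers(initial_seq, segment_sizes):
    """Return the sequence number of the first byte of each segment."""
    seqs = []
    seq = initial_seq
    for size in segment_sizes:
        seqs.append(seq)
        seq = (seq + size) % SEQ_SPACE  # next segment starts right after this one
    return seqs


if __name__ == "__main__":
    print(byte_sequence_numbers(1000, [500, 500, 200]))  # [1000, 1500, 2000]
```

Because the numbers count bytes rather than packets, a gap in the received sequence tells the receiver exactly which byte range is missing.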

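8. Why are buffers needed?

Sender buffer	Receiver buffer
Holds packets that have been sent but not yet acknowledged	Holds packets that have been received but not yet consumed
Makes retransmission possible	Needed because consumption (e.g., writing to disk) may be slower than arrival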
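The sender side of the table can be sketched as a small class that keeps every unacknowledged packet until an ACK covers it. This is a minimal illustration, assuming packet-level sequence numbers and cumulative ACKs; the class and method names are hypothetical.

```python
class SenderWindow:
    """Buffer of sent-but-unacknowledged packets, limited by a window size."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.unacked = {}  # seq -> payload, kept in send order

    def can_send(self):
        return len(self.unacked) < self.window_size

    def send(self, seq, payload):
        if not self.can_send():
            raise RuntimeError("window full: wait for an ACK first")
        # Keep a copy so the packet can be retransmitted if it is lost.
        self.unacked[seq] = payload

    def on_cumulative_ack(self, ack_seq):
        """Release every buffered packet with seq <= ack_seq."""
        for seq in [s for s in self.unacked if s <= ack_seq]:
            del self.unacked[seq]


if __name__ == "__main__":
    window = SenderWindow(window_size=3)
    for seq in (1, 2, 3):
        window.send(seq, f"data-{seq}")
    print(window.can_send())        # False: three packets are outstanding
    window.on_cumulative_ack(2)
    print(list(window.unacked))     # [3]
```

Each ACK frees buffer space and opens room in the window, which is what lets the sender keep transmitting.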

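9. Go-back-N: how it works and its drawback

How it works:

The receiver acknowledges only the most recently received in-order packet
Out-of-order packets are simply discarded
The sender resends everything starting from the lost packet

Drawback: one lost packet → many unnecessary retransmissions

Packet 7 lost → 8, 9, 10 discarded → 7, 8, 9, 10 resent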
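The receiver behavior described above can be sketched as follows. This is an illustrative model only: it uses packet numbers rather than byte sequence numbers, and the function name `go_back_n_receiver` is hypothetical.

```python
def go_back_n_receiver(arrivals, expected):
    """Process arriving packet numbers under Go-back-N.

    Only the next expected packet is accepted; anything else is discarded.
    Every arrival triggers an ACK for the last in-order packet received.
    Returns (delivered, acks).
    """
    delivered, acks = [], []
    for seq in arrivals:
        if seq == expected:
            delivered.append(seq)
            expected += 1
        # Out-of-order packets fall through here and are dropped.
        acks.append(expected - 1)
    return delivered, acks


if __name__ == "__main__":
    # Packet 7 is lost, so 8, 9, and 10 arrive out of order and are discarded.
    print(go_back_n_receiver([6, 8, 9, 10], expected=6))
    # The sender then goes back and resends 7 through 10.
    print(go_back_n_receiver([7, 8, 9, 10], expected=7))
```

In the first call only packet 6 is delivered, even though 8, 9, and 10 arrived intact, which is the wasted work that selective acknowledgment avoids.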

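10. How does selective ACK improve on this?

Improvements:

The receiver acknowledges every correctly received packet, even if it arrives out of order
Out-of-order packets are buffered and delivered together once the missing packet arrives
The sender retransmits only the lost packets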
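A receiver that buffers out-of-order packets might look like the sketch below. It is a simplified model of the idea, not of TCP's actual SACK option (which reports received ranges alongside a cumulative ACK); packet numbers stand in for byte sequence numbers, and the function name is hypothetical.

```python
def selective_receiver(arrivals, expected):
    """Process arriving packet numbers with per-packet ACKs and buffering.

    Each packet is acknowledged individually. Out-of-order packets are held
    in a buffer and delivered in order once the gap before them is filled.
    Returns (delivered, acks).
    """
    buffered = set()
    delivered, acks = [], []
    for seq in arrivals:
        acks.append(seq)             # acknowledge this specific packet
        if seq >= expected:
            buffered.add(seq)        # hold it until it can be delivered in order
        while expected in buffered:  # drain every packet that is now in order
            buffered.remove(expected)
            delivered.append(expected)
            expected += 1
    return delivered, acks


if __name__ == "__main__":
    # Packet 7 is lost at first and arrives last as a retransmission.
    print(selective_receiver([6, 8, 9, 10, 7], expected=6))
```

Only packet 7 has to be sent twice here; packets 8, 9, and 10 wait in the buffer and are delivered as soon as 7 arrives.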

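11. What is a duplicate ACK?

Duplicate ACK:

The receiver sends another ACK for a packet the sender has already seen acknowledged
It means "I am still waiting for the missing packet"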

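12. Fast retransmit

Trigger condition: 3 duplicate ACKs are received

How it works:

Receives ACK 6, ACK 6, ACK 6, ACK 6 (the original plus 3 duplicates)
→ concludes that Packet 7 is lost
→ retransmits Packet 7 immediately (without waiting for the timeout)

Benefit: recovery is faster than waiting for the timeout.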
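The sender-side check can be sketched as a duplicate counter. This is a simplified illustration using the same convention as the example above, where ACK n means packets up to n have arrived in order (real TCP acknowledges the next expected byte instead); the function name is hypothetical.

```python
def fast_retransmit(acks, threshold=3):
    """Return the packets to retransmit early, given a stream of ACK numbers.

    A retransmission is triggered when the same ACK number is seen
    `threshold` times in a row after its first occurrence.
    """
    retransmits = []
    last_ack = None
    duplicates = 0
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == threshold:
                # The packet right after the acknowledged one is presumed lost.
                retransmits.append(ack + 1)
        else:
            last_ack = ack
            duplicates = 0
    return retransmits


if __name__ == "__main__":
    print(fast_retransmit([5, 6, 6, 6, 6]))  # [7]
    print(fast_retransmit([5, 6, 6, 6]))     # []
```

The second call shows why the first ACK 6 does not count: only two duplicates have arrived, so the sender keeps waiting.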

13. Diagram: TCP loss recovery flow

Sender                     Receiver
  |--- Pkt 6 ------------->|  ✓ ACK 6
  |--- Pkt 7 -----X        |  (lost)
  |--- Pkt 8 ------------->|  received, buffered, sends ACK 6
  |--- Pkt 9 ------------->|  received, buffered, sends ACK 6
  |--- Pkt 10 ------------>|  received, buffered, sends ACK 6
  |<-- ACK 6 (3rd dup) ----|
  |--- Pkt 7 (retransmit)->|  ✓ received, delivers 7, 8, 9, 10

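14. TCP reliable transmission in plain language

Imagine mailing parcels to a friend through a courier company that is unreliable and may lose them.

What TCP does:

Number every parcel (sequence numbers)

Your friend texts you to confirm each arrival (ACK)

If no confirmation arrives after a while, you mail the parcel again (timeout retransmission)

If your friend tells you three times in a row "I still haven't received number 7", you resend number 7 right away (fast retransmit)

Your friend sets aside parcels that arrive out of order and opens them once the set is complete (buffering + in-order delivery)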

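🎯 Key Terms Quick Reference

Term	Meaning
Reliable Transmission	In-order delivery of data with no loss or corruption
ARQ (Automatic Repeat Request)	Using acknowledgments and timeouts to resend lost packets
Stop and Wait	Send one packet, then wait for its ACK before sending the next
Window Size	Maximum number of unacknowledged packets the sender may have outstanding
Sequence Number	Byte-based number that identifies each packet's position in the flow
Buffer	Storage for unacknowledged packets (sender) or unconsumed packets (receiver)
Go-back-N	Receiver discards out-of-order packets; sender resends everything from the lost packet
Selective ACK	Receiver acknowledges out-of-order packets; sender resends only the lost ones
Duplicate ACK	A repeated acknowledgment for a packet already acknowledged
Fast Retransmit	Retransmitting after 3 duplicate ACKs instead of waiting for the timeout
RTT (Round Trip Time)	Time for a packet to reach the receiver and its ACK to return
Timeout	Waiting period after which an unacknowledged packet is resent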