# Reliability

Achieving reliability:

- Re-transmitting lost data
  - Loss is detected via explicit acknowledgements
  - These can be positive (ACKs) or negative (NACKs)

### Stop ‘n’ Wait

The simplest possible paradigm:

- Transmit `seq(x)`
- Wait for `ack(x)`
- Transmit `seq(x+1)`

![img](/lectures/acn/img/a.jpeg)

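This performs very poorly on high-latency paths, because only one segment can be in flight per round trip. It also carries a lot of overhead: every data segment needs its own acknowledgement, so half of all packets exchanged are ACKs.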
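A rough worked example of the latency penalty, as a minimal sketch: it assumes one segment per cycle, where a cycle is the segment's transmission time plus one RTT spent waiting for `ack(x)`, and it ignores the ACK's own transmission time. The segment size, RTT and link rate are illustrative values, not from the lecture.

```python
def stop_and_wait(segment_bytes: int, rtt_s: float, link_bps: float) -> tuple[float, float]:
    """Return (throughput in bit/s, link utilisation) for a stop-and-wait sender."""
    tx_time = segment_bytes * 8 / link_bps   # time to put one segment on the wire
    cycle = tx_time + rtt_s                  # transmit seq(x), then wait for ack(x)
    throughput = segment_bytes * 8 / cycle   # one segment delivered per cycle
    utilisation = tx_time / cycle            # fraction of time the link is busy
    return throughput, utilisation


# Illustrative values: 1500-byte segments, 100 ms RTT, 100 Mbit/s link.
tp, util = stop_and_wait(1500, 0.100, 100e6)
print(f"throughput = {tp / 1e3:.1f} kbit/s, utilisation = {util:.2%}")
```

With these numbers the sender gets roughly 120 kbit/s out of a 100 Mbit/s link (about 0.12% utilisation). A faster link barely helps, because the RTT dominates the cycle.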

**Rate control**: Never sending too fast for the network

**Sliding window**: allow several unacknowledged segments in flight (sent, but not yet acknowledged)

**Retransmission TimeOut (RTO)**: how long to wait before deciding a segment is lost

- This requires estimates of dynamic quantities, such as the round-trip time and its variation

Go-Back-N combines these ideas:

- Permit N segments in flight
- A timeout implies loss
- Retransmit from the lost packet onward
  - This is wasteful: if only packet 3 out of 5 is lost, the sender still resends packets 3 to 5.
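A minimal round-based sketch of this retransmission behaviour (Go-Back-N with cumulative ACKs). Simplifying assumptions: the sender transmits a full window per round, packets in `lost_once` are dropped only on their first transmission, the receiver discards anything that arrives after a gap, and a timeout sends the sender back to the oldest unacknowledged packet. The function name is hypothetical.

```python
def go_back_n(num_packets: int, window: int, lost_once: set[int]) -> list[int]:
    """Return every transmission (by sequence number) a Go-Back-N sender makes."""
    sent_log: list[int] = []
    base = 1                      # oldest unacknowledged sequence number
    dropped: set[int] = set()     # packets whose one-time loss has already happened
    while base <= num_packets:
        expected = base                            # receiver's next in-order seq
        end = min(base + window, num_packets + 1)
        for seq in range(base, end):               # send the whole window
            sent_log.append(seq)
            if seq in lost_once and seq not in dropped:
                dropped.add(seq)                   # lost in the network
            elif seq == expected:
                expected += 1                      # delivered in order: cumulatively acked
            # otherwise it arrived after a gap and the receiver discards it
        base = expected                            # timeout: go back to first unacked packet
    return sent_log


print(go_back_n(num_packets=5, window=5, lost_once={3}))  # [1, 2, 3, 4, 5, 3, 4, 5]
```

Packets 4 and 5 were delivered fine the first time but are sent again anyway, which is exactly the waste described above.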

##### Congestion Collapse

When network load is too high, it causes *congestion collapse*

Why?

- Router buffers fill up, traffic is discarded, and hosts retransmit
- Retransmission rates increase because more data was lost, adding even more load
- This was addressed in Jacobson's “Congestion Avoidance and Control” (1988)

#### Stability of the Internet

Flows and protocols should **include some form of congestion control** and adaptation, so that they moderate their bandwidth use, limit packet loss, and get an approximately fair share of the available network bandwidth. Such schemes are judged on:

1. **Responsiveness**, defined as the number of round-trip times of sustained congestion required to reduce the sending rate by half
2. **Stability and smoothness**, defined as the largest reduction of the sending rate within one round-trip time in a steady-state scenario
3. **Fairness** towards other flows when competing for bandwidth

Mimicking TCP behaviour for multimedia congestion control results in fairness towards TCP, but also in significant oscillations in bandwidth.

- Multimedia streaming applications need **much lower variation in throughput** over time than TCP, so that sending rates stay relatively smooth; this matters for the quality perceived by the end user.
- The penalty for having smoother throughput than TCP while competing for bandwidth is that multimedia congestion control responds more slowly than TCP to changes in available bandwidth.
- Thus, if multimedia traffic wants smooth throughput, it needs to avoid TCP’s halving of the sending rate in response to a single packet drop.

###### Packet Loss

- When choosing the method for packet loss detection, it is important to choose a method that **detects packet losses as early and accurately as possible**
  - Incorrect detection & late packet delivery can lead to incorrect packet loss estimation
  - This causes unresponsive & unfair behaviour
- Calculating packet loss rates can be done over various lengths of time intervals.
  - Shorter intervals result in more responsive behaviour but are more susceptible to noise
  - Longer intervals = smoother but less responsive
  - It is important to find a balance
- To guarantee sufficient responsiveness to congestion while preserving smoothness, the methods for detecting and calculating packet loss must be chosen carefully. This raises three questions:
  1. What mechanism can be used for packet loss detection?
  2. What algorithm can be used for packet loss rate calculation?
  3. Where can packet loss detection and calculation happen?
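The interval-length trade-off can be seen with a simple moving-window loss rate. This is a hypothetical helper for illustration, not one of the specific algorithms from the lecture: a single burst of three losses in an otherwise clean stream of 100 packets, measured over a 10-packet and a 100-packet window.

```python
from collections import deque


class WindowedLossRate:
    """Loss rate over the most recent `window` packets (hypothetical helper)."""

    def __init__(self, window: int) -> None:
        self.outcomes: deque = deque(maxlen=window)  # True = lost, False = received

    def record(self, lost: bool) -> float:
        self.outcomes.append(lost)
        return sum(self.outcomes) / len(self.outcomes)


# Packets 50-52 are lost; everything else arrives.
pattern = [i in (50, 51, 52) for i in range(100)]
short, long_ = WindowedLossRate(10), WindowedLossRate(100)
short_rates = [short.record(lost) for lost in pattern]
long_rates = [long_.record(lost) for lost in pattern]
print(max(short_rates), short_rates[-1])  # reacts strongly, then forgets
print(max(long_rates), long_rates[-1])    # reacts weakly, remembers longer
```

The 10-packet window jumps to a 30% loss rate and is back to 0 by the end of the stream; the 100-packet window never exceeds about 6% but still reports 3% at the end.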

###### Approach

- All sent packets are marked with consecutive sequence numbers
- When a packet is sent, a timeout value for it is computed, and an entry containing its sequence number and timeout is inserted into a list. The entry stays there until the packet is acknowledged or considered lost
  - If the timeout expires before the packet is acknowledged, the corresponding packet is considered to be lost
- In order to adapt to varying and unpredictable network conditions, the timeout is not fixed, but computed based on one of the algorithms for TCP timeout computation
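A minimal sketch of this bookkeeping, with hypothetical names (`PendingPackets`, `sent`, `ack`, `expired`) and a caller-supplied clock. It keeps entries in a heap ordered by deadline instead of a plain list, so the earliest deadline is always checked first; that is a design choice for the sketch, not something the lecture specifies. The `timeout` argument would come from the adaptive TCP-style computation.

```python
import heapq


class PendingPackets:
    """Sent-but-unacknowledged packets with per-packet deadlines (sketch)."""

    def __init__(self) -> None:
        self.heap: list[tuple[float, int]] = []  # (deadline, seq), earliest first
        self.outstanding: set[int] = set()       # not yet acked or declared lost

    def sent(self, seq: int, now: float, timeout: float) -> None:
        self.outstanding.add(seq)
        heapq.heappush(self.heap, (now + timeout, seq))

    def ack(self, seq: int) -> None:
        # The heap entry stays behind and is skipped later in expired().
        self.outstanding.discard(seq)

    def expired(self, now: float) -> list[int]:
        """Pop every entry whose deadline has passed; unacked ones count as lost."""
        lost = []
        while self.heap and self.heap[0][0] <= now:
            _, seq = heapq.heappop(self.heap)
            if seq in self.outstanding:
                self.outstanding.remove(seq)
                lost.append(seq)
        return lost


p = PendingPackets()
for seq in (1, 2, 3):
    p.sent(seq, now=0.0, timeout=0.5)
p.ack(1)
p.ack(3)
lost = p.expired(now=0.6)
print(lost)  # [2]
```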

##### Timeout Based Approaches

This approach is mostly used for multimedia.

RTT: round-trip time

- Before the first packet is acknowledged and an RTT measurement is made, the sender sets `TIMEOUT` to an initial value
  - This value is traditionally **2.5-3 seconds for TCP** (RFC 6298 has since lowered the recommended initial value to 1 second)
  - For real-time interactive multimedia traffic, the timeout should be set to **0.5 seconds**, as this is roughly the delay beyond which interactive audio is noticeably affected
- When the first `RTT` measurement is taken, the sender sets the smoothed `RTT` (`SRTT`), the `RTT` variance (`RTTVAR`) and `TIMEOUT` as follows:
  - $\text{SRTT} = \text{RTT}$
  - $\text{RTTVAR} = \text{RTT}/2$
  - $\text{TIMEOUT} = \mu \cdot \text{SRTT} + 4 \cdot \text{RTTVAR}$
  - where $\mu$ is a constant, which in this implementation is 1.08 (obtained experimentally)
- When subsequent `RTT` measurements are made, the sender updates `RTTVAR` (using the previous `SRTT`), then `SRTT`, then `TIMEOUT`:
  - $\text{RTTVAR} = (1 - \frac{1}{4}) \cdot \text{RTTVAR} + \frac{1}{4} \cdot |\text{SRTT} - \text{RTT}|$
  - $\text{SRTT} = (1 - \frac{1}{8}) \cdot \text{SRTT} + \frac{1}{8} \cdot \text{RTT}$
  - $\text{TIMEOUT} = \mu \cdot \text{SRTT} + 4 \cdot \text{RTTVAR}$
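The initialisation and update rules can be sketched as a small estimator. The class name is hypothetical; `mu = 1.08` and the 0.5 s initial value are the figures quoted in these notes for interactive multimedia, and the RTT samples are made up.

```python
from typing import Optional


class RtoEstimator:
    """Adaptive TIMEOUT following the SRTT/RTTVAR rules (sketch)."""

    ALPHA = 1 / 8  # weight of a new sample in SRTT
    BETA = 1 / 4   # weight of a new deviation in RTTVAR

    def __init__(self, mu: float = 1.08, initial_timeout: float = 0.5) -> None:
        self.mu = mu
        self.srtt: Optional[float] = None
        self.rttvar: Optional[float] = None
        self.timeout = initial_timeout  # used until the first RTT sample arrives

    def on_rtt_sample(self, rtt: float) -> float:
        if self.srtt is None or self.rttvar is None:
            # First measurement.
            self.srtt = rtt
            self.rttvar = rtt / 2
        else:
            # RTTVAR is updated first, using the previous SRTT.
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - rtt)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * rtt
        self.timeout = self.mu * self.srtt + 4 * self.rttvar
        return self.timeout


est = RtoEstimator()
first = est.on_rtt_sample(0.100)   # SRTT = 0.1, RTTVAR = 0.05
second = est.on_rtt_sample(0.100)  # RTTVAR shrinks to 0.0375, SRTT stays 0.1
print(f"{first:.3f} s, {second:.3f} s")
```

After the first 100 ms sample, TIMEOUT is 1.08 × 0.1 + 4 × 0.05 = 0.308 s; a second identical sample shrinks RTTVAR and brings TIMEOUT down to 0.258 s. Samples that start to vary push RTTVAR, and so TIMEOUT, back up.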

###### Packet loss rate calculation

- Real time interactive multimedia approaches typically use the **weighted Loss Interval Average (WLIA)**
- It relies on **loss events** and **loss intervals** to compute the packet loss rate correctly, consistent with how TCP calculates packet loss
- A **loss event** is defined as one or more packets lost within a single RTT; a **loss interval** is the number of packets between consecutive loss events
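The notes don't give WLIA's weights, so this sketch borrows the ones used by TFRC (RFC 5348): the 8 most recent loss intervals (a loss interval being the number of packets between consecutive loss events), the newest four at full weight and older ones at 0.8, 0.6, 0.4 and 0.2. The loss event rate is the inverse of the weighted average interval. It is simplified: TFRC also considers the still-open interval since the last loss event, which is omitted here.

```python
# Assumption: TFRC's weights for the n = 8 most recent loss intervals.
WEIGHTS = [1.0, 1.0, 1.0, 1.0, 0.8, 0.6, 0.4, 0.2]


def wlia_loss_event_rate(intervals: list[int]) -> float:
    """Loss event rate from closed loss intervals, most recent first.

    Recent intervals get full weight, older ones progressively less.
    """
    if not intervals:
        raise ValueError("need at least one loss interval")
    recent = intervals[: len(WEIGHTS)]
    weights = WEIGHTS[: len(recent)]
    mean_interval = sum(w * i for w, i in zip(weights, recent)) / sum(weights)
    return 1.0 / mean_interval


steady = wlia_loss_event_rate([100] * 8)            # one loss event per 100 packets
worsening = wlia_loss_event_rate([20] + [100] * 7)  # most recent interval much shorter
print(f"{steady:.4f} {worsening:.4f}")
```

A single short recent interval raises the estimate only moderately (here from 0.0100 to about 0.0115), which is the smoothing WLIA is meant to provide.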

Loss detection and calculation can happen either at the sender or at the receiver.

**Sender-side**: if packet loss detection is done in the sender, the sender can use a timeout for each packet, or gaps in the sequence numbers of acknowledged packets

- The receiver has to acknowledge either every packet or every packet not received
- Acknowledging every packet can generate a lot of traffic between sender and receiver
- This is solved by having receivers send summary reports of losses every *n*th packet or every *n*th RTT

**Receiver-side**: Packet loss is detected in the receiver and explicitly reported back to the sender

- On noticing a gap in the sequence numbers, the receiver assumes a loss event
- The loss event is forwarded directly to the sender
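A receiver-side sketch, under assumptions not in the notes: the loss time is approximated by the arrival time of the first packet after the gap, reordering is not handled (a late packet is still counted as lost), and gaps noticed within one RTT of an event's start are merged into that event, so several packets lost within one RTT count as a single loss event. The function name is hypothetical.

```python
def detect_loss_events(arrivals: list[tuple[int, float]], rtt: float) -> list[int]:
    """Return the first missing sequence number of each loss event.

    `arrivals` holds (seq, arrival_time) pairs in arrival order.
    """
    events: list[int] = []
    event_start = None                      # arrival time that opened the current event
    expected = arrivals[0][0] if arrivals else 0
    for seq, t in arrivals:
        if seq > expected:                  # packets expected .. seq-1 never arrived
            if event_start is None or t - event_start > rtt:
                events.append(expected)     # new loss event: report it to the sender
                event_start = t
        expected = max(expected, seq + 1)   # a late packet does not undo a counted loss
    return events


arrivals = [(1, 0.00), (2, 0.01), (4, 0.03), (6, 0.05), (7, 0.06), (10, 0.50)]
print(detect_loss_events(arrivals, rtt=0.1))  # [3, 8]
```

Packets 3 and 5 go missing within one RTT, so they form a single loss event; the later gap (8 and 9) starts a new one.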

##### Sender vs Receiver Detection

Receiver-driven packet loss discovery is preferred.

- Loss events are reported as early as possible
- This gives high responsiveness

In the case of very high congestion, where no feedback gets back from the receiver:

- Purely receiver-based loss detection is useless, because the sender has no way of calculating packet loss
- In these cases the sender enters **self-limitation**: packet loss is assumed and the sending rate is decreased, or sending is even stopped

#### Adaptation

Once the parameters of a given link are measured (packet loss and round trip times), there is a range of approaches that could be followed when choosing rate adaptation scheme(s).

**Equation-based control** uses a control equation that explicitly gives the maximum acceptable sending rate as a function of the recent loss event rate.
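As one example of such a control equation, here is the TCP throughput equation used by TFRC (RFC 5348). The notes don't name a specific equation, so treat this as one possible choice; the sketch also assumes the usual simplifications `t_RTO = 4 * RTT` and `b = 1` (packets acknowledged per ACK). The inputs are illustrative.

```python
from math import sqrt


def tcp_friendly_rate(s: float, rtt: float, p: float, b: float = 1.0) -> float:
    """Maximum TCP-friendly sending rate in bytes/s (TFRC throughput equation).

    s: packet size in bytes, rtt: round-trip time in seconds,
    p: loss event rate (0 < p <= 1), b: packets acknowledged per ACK.
    """
    if not 0 < p <= 1:
        raise ValueError("loss event rate must be in (0, 1]")
    t_rto = 4 * rtt  # simplification: retransmission timeout taken as 4 * RTT
    denom = (rtt * sqrt(2 * b * p / 3)
             + t_rto * (3 * sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2))
    return s / denom


for p in (0.001, 0.01, 0.1):
    rate = tcp_friendly_rate(s=1460, rtt=0.1, p=p)
    print(f"p = {p}: {rate * 8 / 1e6:.2f} Mbit/s")
```

Because the rate is computed from the smoothed loss event rate rather than reacting to each individual loss, it changes more gradually than TCP's halving.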

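**Additive Increase Multiplicative Decrease (AIMD) control** adjusts the rate in response to every single congestion indication: it increases the rate additively while there is no congestion and decreases it multiplicatively (e.g. halves it) on each indication.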
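A minimal AIMD sketch, assuming the rate is measured in packets per RTT and updated once per feedback round. The increase step of 1 and the decrease factor of 0.5 mirror the Internet defaults described in these notes (linear increase, halving); the function name and the `min_rate` floor are hypothetical.

```python
def aimd_update(rate: float, congested: bool,
                increase: float = 1.0, decrease: float = 0.5,
                min_rate: float = 1.0) -> float:
    """One AIMD step: additive increase without congestion, multiplicative
    decrease (halving by default) on a single congestion indication."""
    if congested:
        return max(min_rate, rate * decrease)
    return rate + increase


rate = 10.0
trace = []
for congested in [False, False, True, False, False]:
    rate = aimd_update(rate, congested)
    trace.append(rate)
print(trace)  # [11.0, 12.0, 6.0, 7.0, 8.0]
```

The single congestion indication wipes out far more than the two preceding increases gained, which is the sawtooth that makes AIMD a poor fit for smooth multimedia rates.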

###### Decision Function

Options for Decision function:

- **On congestion** (overload/packet loss/packet loss increase): **decrease the rate immediately, or periodically**
- **On absence of congestion** (underload/no packet loss/packet loss decrease): **increase the rate immediately**

###### Increase/decrease function

Options for the **increase phase** (during underload):

- constant additive increase rate
- straight jump to the expected value, or a value calculated by the formula
- multiplicative increase rate

The default for the Internet is **constant linear increase**.

One could argue that a loss estimate of zero indicates that there is no congestion and thus the sending rate should be increased with the maximum possible increase factor until a loss event occurs.

- However, this approach **causes instabilities** in the sending rate and is very susceptible to noisy packet drop rates.

Options for the **decrease phase** (during congestion):

- constant multiplicative decrease factor (TCP-like or TCP-similar)
- linear decrease
- straight jump to the expected value (calculated by the formula)

The default for the Internet is multiplicative decrease (halving)

- This is because congestion recovery should be exponential, not linear

Options for the **decision frequency**:

Decision frequency specifies **how often to change the rate.** **Based on control theory, the optimal adjustment frequency depends on the feedback delay.**

- The feedback delay **is the time between changing the rate and detecting the network’s reaction to that change.**
  - It is suggested that equation-based schemes adjust their rates **not more than once per RTT**.
  - Changing the rate too often results in oscillation.
  - Changing the rate too infrequently leads to unresponsive behaviour.
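A sketch of limiting rate changes to at most one per RTT. It assumes a fixed RTT and simple halve/+1 steps purely for illustration; the class and method names are hypothetical. Congestion signals arriving within one RTT of the last change are ignored, since the network cannot have reacted to that change yet.

```python
class RateController:
    """Applies at most one rate change per RTT (hypothetical sketch)."""

    def __init__(self, rate: float, rtt: float) -> None:
        self.rate = rate
        self.rtt = rtt
        self.last_change = float("-inf")  # no change made yet

    def on_feedback(self, now: float, congested: bool) -> float:
        if now - self.last_change < self.rtt:
            return self.rate              # the last change hasn't had time to take effect
        self.rate = self.rate / 2 if congested else self.rate + 1
        self.last_change = now
        return self.rate


ctrl = RateController(rate=20.0, rtt=0.1)
# Three signals from the same congestion episode, then one a full RTT later.
rates = [ctrl.on_feedback(t, congested=True) for t in (0.00, 0.02, 0.05, 0.12)]
print(rates)  # [10.0, 10.0, 10.0, 5.0]
```

Reacting to every signal instead would have cut the rate from 20 to 2.5 within 50 ms, before the first reduction could show any effect.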

###### Self Clocking

The aim is for the spacing of transmissions to match the bottleneck rate

- This avoids persistent queuing at the bottleneck
- The queue is then only needed to smooth out short-term variation
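A small simulation of how ACK clocking produces this spacing: ACKs return spaced at the bottleneck's per-packet transmission time, and if each ACK releases one new packet, the sender inherits that spacing. Assumptions (not from the notes): a 10 Mbit/s FIFO bottleneck, 1500-byte packets, a fixed 50 ms return path for ACKs, and a sender that bursts its first window of 4 packets. The helper name is hypothetical.

```python
def fifo_bottleneck(arrivals: list[float], service: float) -> tuple[list[float], list[float]]:
    """Return (departure times, queuing delays) for packets through a FIFO link."""
    departures, waits, free_at = [], [], 0.0
    for t in arrivals:
        start = max(t, free_at)          # wait if the link is still busy
        waits.append(start - t)
        free_at = start + service
        departures.append(free_at)
    return departures, waits


service = 1500 * 8 / 10e6                # 1.2 ms to transmit one packet at 10 Mbit/s
burst = [0.0, 0.0, 0.0, 0.0]             # round 1: window sent back to back
dep1, waits1 = fifo_bottleneck(burst, service)
acks = [d + 0.05 for d in dep1]          # ACKs keep the bottleneck spacing
dep2, waits2 = fifo_bottleneck(acks, service)  # round 2: one packet per ACK
print([round(w * 1e3, 2) for w in waits1])  # burst: queuing grows packet by packet
print([round(w * 1e3, 2) for w in waits2])  # ACK-clocked: no queuing (up to rounding)
```

The initial burst queues for up to 3.6 ms, while the ACK-clocked round arrives already spaced at the bottleneck rate and passes straight through.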

##### Congestion Control

Aim to obey the **conservation of packets** principle

- In equilibrium, the flow is conservative
- A new packet doesn't enter the network until an old one leaves

This fails in three ways:

1. The connection doesn't reach equilibrium
2. The sender transmits a new packet too soon (before an old one has left)
3. Resource limits along the path prevent equilibrium from being reached

Solutions:

**Slow-start**

- Each ACK opens the congestion window by 1 packet
  - Every ACK: `cwnd += 1`
  - So every RTT: `cwnd *= 2`
- Stop when a loss occurs or when `cwnd == ssthresh`
- Until then, the window keeps growing multiplicatively (doubling every RTT)

![img](img/ac.png)
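A sketch of the per-ACK rule, assuming no losses and one ACK per packet (no delayed ACKs): adding 1 per ACK over a full window's worth of ACKs doubles `cwnd` every RTT, until it reaches `ssthresh`. The function name is hypothetical.

```python
def slow_start_rtts(ssthresh: int, cwnd: int = 1) -> list[int]:
    """cwnd (packets) at the start of each RTT during loss-free slow start."""
    history = [cwnd]
    while cwnd < ssthresh:
        acks_this_rtt = cwnd          # one ACK comes back per packet in the window
        for _ in range(acks_this_rtt):
            cwnd += 1                 # per-ACK increase
            if cwnd >= ssthresh:
                break                 # hand over to congestion avoidance
        history.append(cwnd)
    return history


print(slow_start_rtts(ssthresh=16))  # [1, 2, 4, 8, 16]
```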

**Congestion Avoidance**

1. The network signals that congestion is occurring
   - The host detects a loss
2. The host responds by reducing its sending rate
   - `ssthresh := cwnd/2` multiplicative decrease
   - `cwnd    := 1` restarts slow start

Above `ssthresh`, avoid congestion by increasing slowly:

`cwnd += 1/cwnd` on every ACK, so the window grows by about 1 packet per window (i.e. per RTT)
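Putting slow start, the loss response and congestion avoidance together in a per-RTT sketch. Assumptions: `cwnd` is in packets and updated once per RTT (so the per-ACK `cwnd += 1/cwnd` is approximated as +1 per RTT), the initial `ssthresh` of 16 is arbitrary, and a loss in a round is detected at the end of that round. The response `ssthresh := cwnd/2`, `cwnd := 1` is TCP Tahoe's behaviour; later variants such as Reno react differently to some losses. The function name is hypothetical.

```python
def tahoe_trace(rounds: int, loss_rounds: set[int], ssthresh: float = 16.0) -> list[float]:
    """cwnd at the start of each RTT: slow start, congestion avoidance, loss reaction."""
    cwnd = 1.0
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:
            ssthresh = cwnd / 2                 # multiplicative decrease
            cwnd = 1.0                          # restart slow start
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)      # slow start: doubles per RTT
        else:
            cwnd += 1                           # congestion avoidance: about +1 per RTT
    return trace


print(tahoe_trace(rounds=12, loss_rounds={6}))
# [1.0, 2.0, 4.0, 8.0, 16.0, 17.0, 18.0, 1.0, 2.0, 4.0, 8.0, 9.0]
```

After the loss at a window of 18, `ssthresh` becomes 9, so the second slow-start phase hands over to linear growth much earlier.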

TCP is not always suitable

- Reliability (retransmitting lost data) can cause untimely delivery

Audio/video codecs usually produce frames (not a continuous byte stream)

- Losing a frame is better than delaying all subsequent data

Media is carried over UDP, encapsulated using the **R**eal-time **T**ransport **P**rotocol (RTP)

- Provides sequencing, timestamping and delivery monitoring, but no quality-of-service guarantees
- Adds a control channel (RTCP)
  - Back channel to report statistics & participants
- Transport only
  - Leaves encodings & floor control to application
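To make the sequencing and timestamping concrete, here is a sketch that packs and parses RTP's 12-byte fixed header as laid out in RFC 3550 (version 2, no padding, no extension, no CSRC list). The function names and the example SSRC are made up; payload type 0 (PCMU audio, 8 kHz clock) comes from the RTP audio/video profile, where a 20 ms packet advances the timestamp by 160.

```python
import struct

RTP_HEADER = struct.Struct("!BBHII")  # 1 + 1 + 2 + 4 + 4 = 12 bytes, network byte order


def pack_rtp_header(seq: int, timestamp: int, ssrc: int,
                    payload_type: int, marker: bool = False) -> bytes:
    byte0 = 2 << 6                                   # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return RTP_HEADER.pack(byte0, byte1, seq & 0xFFFF,   # seq wraps at 2**16
                           timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def parse_rtp_header(data: bytes) -> dict:
    byte0, byte1, seq, timestamp, ssrc = RTP_HEADER.unpack(data[:RTP_HEADER.size])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,              # lets the receiver spot gaps (losses) and reordering
        "timestamp": timestamp,  # media clock, used to schedule playout
        "ssrc": ssrc,            # identifies the media source
    }


# Two consecutive 20 ms PCMU packets from a made-up source.
packets = [pack_rtp_header(seq=100 + i, timestamp=160 * i, ssrc=0x1234ABCD, payload_type=0)
           for i in range(2)]
print([parse_rtp_header(p) for p in packets])
```

Everything beyond these fields (codec choice, what to do about a missing sequence number) is left to the application, matching RTP's transport-only role.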