notes/docs/lectures/osc/15_file_systems1.md
John Gatward 4280451f12 test
2026-03-25 12:29:00 +00:00

19/11/20
## Disk Scheduling
### Hard Drives
#### Construction of Hard Drives
> Disks are constructed as multiple aluminium/glass platters covered with **magnetisable material**
>
> * Read/Write heads fly just above the surface and are connected to a single disk arm controlled by a single actuator
> * **Data** is stored on **both sides**
> * Hard disks **rotate** at a **constant speed**
>
> A hard disk controller sits between the CPU and the drive
>
> Hard disks are currently about 4 orders of magnitude slower than main memory.
![hard drive diagram](/lectures/osc/assets/a5.png)
#### Low Level Format
> Disks are organised in:
>
> * **Cylinders**: a collection of tracks in the same relative position to the spindle
> * **Tracks**: a concentric circle on a single platter side
> * **Sectors**: segments of a track - usually have an **equal number of bytes** in them, consisting of a **preamble, data** and an **error correcting code** (ECC).
>
> The number of sectors on each track increases from the innermost track to the outermost track.
##### Organisation of hard drives
Disks usually have a **cylinder skew**, i.e. an **offset** is added to sector 0 in adjacent tracks to account for the seek time.
In the past, consecutive **disk sectors were interleaved** to account for the transfer time (of the read/write head).
NOTE: usable disk capacity is reduced by the space taken up by the preamble and ECC.
#### Access times
**Access time** = seek time + rotational delay + transfer time
* **Seek time**: time needed to move the arm to the cylinder
* **Rotational latency**: time before the sector appears underneath the read/write head (on average it is half a rotation)
* **Transfer time**: time to transfer the data
![hard drive access times](/lectures/osc/assets/a6.png)
Multiple requests may be happening at the same time (concurrently), so access time may be increased by **queuing time**.
In this scenario, the dominance of seek time leaves room for **optimisation** by carefully choosing the order in which requests are serviced.
![hard disk delay](/lectures/osc/assets/a7.png)
The **estimated seek time** (i.e. the time to move the arm from one track to another) is approximated by:
$$
T_{s} = n \times m + s
$$
In which $T_{s}$ denotes the estimated seek time, $n$ the **number of tracks** to be crossed, $m$ the **crossing time per track** and $s$ any **additional startup delay**.
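
As a quick worked example of the formula above, here is a minimal sketch. The drive parameters (0.25 ms per track crossed, 3 ms startup delay) are hypothetical values chosen for illustration, not figures from the lecture.

```python
def estimated_seek_time(n_tracks: int, ms_per_track: float, startup_ms: float) -> float:
    """T_s = n * m + s, with all times in milliseconds."""
    return n_tracks * ms_per_track + startup_ms


# Hypothetical drive: m = 0.25 ms per track, s = 3 ms startup delay.
print(estimated_seek_time(100, 0.25, 3.0))  # 28.0
```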
> Let us assume a disk that rotates at 3600 rpm
>
> * One rotation = 16.7 ms
> * The average **rotational latency** $T_{r}$ is then 8.3 ms
>
> Let $b$ denote the **number of bytes transferred**, $N$ the **number of bytes per track**, and **rpm** the **rotation speed in rotations per minute**. The transfer time, $T_{t}$, is then given by:
> $$
> T_{t} = \frac{b}{N} \times \frac{\text{ms per minute}}{\text{rpm}}
> $$
> $N$ bytes take 1 revolution => $\frac{60000}{3600}$ ms = $\frac{\text{ms per minute}}{\text{rpm}}$
>
> $b$ contiguous bytes take $\frac{b}{N}$ revolutions.
>
> Read a file of **size 256 sectors** with:
>
> * $T_{s}$ = 20 ms (average seek time)
> * 32 sectors per track
>
> Suppose the file is stored as compactly as possible (it is stored contiguously), so it occupies $256 / 32 = 8$ tracks.
>
> * The first track takes: seek time + rotational delay + transfer time
> $20 + 8.3 + 16.7 = 45ms$
> * Assuming no cylinder skew and neglecting small seeks between tracks, each of the remaining 7 tracks only needs rotational delay + transfer time
> $8.3+16.7=25ms$
>
> The total time is $45+7\times 25 = 220ms = 0.22s$
>
> If the access is not sequential but **random for each sector**, we get:
>
> * Time per sector = $T_{s}+T_{r}+T_{t} = 20+8.3+0.5=28.8ms$, where $T_{t} = 16.7\times \frac {1}{32} \approx 0.5ms$
> * Total time = $256 \times 28.8 \approx 7373ms \approx 7.4s$
>
> It is important to **position the sectors carefully** and **avoid disk fragmentation**
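
The sequential and random cases above can be checked with a short sketch. The function names are made up for illustration; the model is the simplified one from the notes (no cylinder skew, track-to-track seeks ignored). Without rounding, the random per-sector time is about 28.85 ms, so the random total comes out slightly higher than $256 \times 28.8$.

```python
MS_PER_MINUTE = 60_000


def rotation_ms(rpm: float) -> float:
    """Time for one full rotation, in milliseconds."""
    return MS_PER_MINUTE / rpm


def sequential_read_ms(sectors, sectors_per_track, seek_ms, rpm):
    """One initial seek, then every track costs rotational delay + one rotation."""
    rot = rotation_ms(rpm)
    tracks = sectors / sectors_per_track  # assumes the file fills whole tracks
    return seek_ms + tracks * (rot / 2 + rot)


def random_read_ms(sectors, sectors_per_track, seek_ms, rpm):
    """Every sector pays seek + average rotational delay + its own transfer."""
    rot = rotation_ms(rpm)
    per_sector = seek_ms + rot / 2 + rot / sectors_per_track
    return sectors * per_sector


print(round(sequential_read_ms(256, 32, 20, 3600)))  # 220
print(round(random_read_ms(256, 32, 20, 3600)))      # 7387
```

The gap of roughly 0.22 s versus 7.4 s is the reason contiguous placement matters.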
### Disk Scheduling
The OS must use the hardware efficiently:
* The file system can **position/organise files strategically**
* Having **multiple disk requests** in a queue allows us to **minimise** the **arm movement**
Note that every I/O operation goes through a system call, allowing the **OS to intercept the request and re-sequence it**.
If the drive **is free**, the request can be serviced immediately; if not, the request is queued.
In a dynamic situation, several I/O requests will be **made over time** that are kept in a **table of requested sectors per cylinder.**
> Disk scheduling algorithms determine the order in which disk events are processed
#### First-Come First-Served
> Process the requests in the order that they arrive
>
> Consider the following sequence of disk requests (cylinder numbers; the arm starts at cylinder 11, the first request)
>
> `11 1 36 16 34 9 12`
>
> The total length is: `|11-1|+|1-36|+|36-16|+|16-34|+|34-9|+|9-12|=111`
>
> ![FCFS](/lectures/osc/assets/a8.png)
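
The FCFS total above can be reproduced with a minimal sketch. The function name is an illustrative choice, and the head is assumed to start at cylinder 11, as the arithmetic in the notes implies.

```python
def fcfs_distance(head: int, requests: list[int]) -> int:
    """Total arm movement when requests are serviced in arrival order."""
    total = 0
    for cylinder in requests:
        total += abs(head - cylinder)
        head = cylinder
    return total


print(fcfs_distance(11, [1, 36, 16, 34, 9, 12]))  # 111
```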
#### Shortest Seek Time First
> Selects the request that is closest to the current head position to reduce head movement
>
> * This allows us to gain **~50%** over FCFS
>
> Total length is: `|11-12|+|12-9|+|9-16|+|16-1|+|1-34|+|34-36|=61`
>
> ![SSTF](/lectures/osc/assets/a9.png)
>
> Disadvantages:
>
> * Could result in starvation:
>   * Under heavy load the **arm stays in the middle of the disk**, so edge cylinders are poorly served (the strategy is biased)
>   * Continuously arriving requests for the same location could **starve other regions**
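
A minimal sketch of SSTF on the same request sequence, assuming the head starts at cylinder 11. The helper names are hypothetical; ties are broken by arrival order, which is an arbitrary choice for this sketch.

```python
def sstf_order(head: int, requests: list[int]) -> list[int]:
    """Repeatedly pick the pending request closest to the current head position."""
    pending = list(requests)
    order = []
    while pending:
        nearest = min(pending, key=lambda c: abs(c - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest
    return order


def path_length(head: int, order: list[int]) -> int:
    """Total arm movement for a given service order."""
    path = [head] + order
    return sum(abs(a - b) for a, b in zip(path, path[1:]))


order = sstf_order(11, [1, 36, 16, 34, 9, 12])
print(order)                   # [12, 9, 16, 1, 34, 36]
print(path_length(11, order))  # 61
```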
#### SCAN
> **Keep moving in the same direction** until end is reached
>
> * It continues in the current direction, **servicing all pending requests** as it passes over them
> * When it gets to the **last cylinder**, it **reverses direction** and **services pending requests**
>
> Total length: `|11-12|+|12-16|+|16-34|+|34-36|+|36-9|+|9-1|=60`
>
> ![scan algorithm](/lectures/osc/assets/b1.png)
>
> **Properties**:
>
> * The **upper limit** on the waiting time is bounded by a traversal of $2\space\times$ number of cylinders (no starvation)
> * The **middle cylinders are favoured** if the disk is heavily used.
##### C-SCAN
> Once the outer/inner side of the disk has been reached, the **requests at the other end of the disk** have been **waiting the longest**
>
> * SCAN can be improved by using a circular scan => C-SCAN
> * When the disk arm gets to the last cylinder of the disk, it **reverses direction** but **does not service requests** on the return.
> * It is **fairer** and equalises **response times on the disk**
>
> Total length: `|11-12|+|12-16|+|16-34|+|34-36|+|36-1|+|1-9|=68`
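
A minimal sketch of C-SCAN that reproduces the total above, assuming the head starts at cylinder 11 and sweeps towards higher cylinders. As in the notes' arithmetic, the return trip is counted as arm movement and goes straight from the highest request to the lowest one rather than to the physical ends of the disk.

```python
def cscan_path(head: int, requests: list[int]) -> list[int]:
    """Sweep upwards, jump back to the lowest pending request, sweep upwards again."""
    up = sorted(c for c in requests if c >= head)
    wrapped = sorted(c for c in requests if c < head)
    return [head] + up + wrapped


def path_length(path: list[int]) -> int:
    """Total arm movement along a list of positions."""
    return sum(abs(a - b) for a, b in zip(path, path[1:]))


path = cscan_path(11, [1, 36, 16, 34, 9, 12])
print(path)               # [11, 12, 16, 34, 36, 1, 9]
print(path_length(path))  # 68
```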
##### LOOK-SCAN
> LOOK-SCAN only moves as far as the cylinder containing **the first or last request** (as opposed to the first/last cylinder on the disk, as in SCAN)
>
> * However, seeks are **cylinder by cylinder** and one cylinder contains multiple tracks
> * It may happen that the arm "sticks" to a cylinder
##### N-Step SCAN
> Only services $N$ requests every sweep.
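
One way to read this is sketched below: the queue is cut into arrival-order batches of $N$ requests, and each batch is serviced in SCAN order before the next batch is considered. This is an illustrative interpretation, not a definitive implementation; in particular, every batch here starts by sweeping upwards, which is a simplification.

```python
def n_step_scan_order(head: int, requests: list[int], n: int) -> list[int]:
    """Service requests in arrival-order batches of n, each batch in SCAN order."""
    order = []
    for start in range(0, len(requests), n):
        batch = requests[start:start + n]
        up = sorted(c for c in batch if c >= head)
        down = sorted((c for c in batch if c < head), reverse=True)
        sweep = up + down
        order.extend(sweep)
        if sweep:
            head = sweep[-1]  # the next batch starts where this one ended
    return order


print(n_step_scan_order(11, [1, 36, 16, 34, 9, 12], n=3))  # [16, 36, 1, 9, 12, 34]
```

Requests that arrive during a sweep wait for a later batch, so a burst of requests near the head cannot hold the arm in place.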
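```bash
[jay@diablo lecture_notes]$ cat /sys/block/sda/queue/scheduler
[mq-deadline] kyber bfq none
# The entry in brackets is the active scheduler. The names above are the
# multi-queue schedulers; older kernels listed: noop deadline cfq
# noop: FCFS
# deadline: N-step-SCAN
# cfq: Completely Fair Queuing (Linux-specific)
```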
### Driver caching
For current drives, the time **required to seek to a new cylinder** is more than the **rotational time**.
* It makes sense to **read more sectors than actually required**
* **Read** sectors during the rotational delay (the sectors that just so happen to pass under the read/write head)
* **Modern controllers read multiple sectors** when asked for the data from one sector: **track-at-a-time caching**.
### Scheduling on SSDs
SSDs have no seek time ($T_{seek}$) or rotational delay, so FCFS is sufficient. SSTF, SCAN etc. may even reduce performance, since there is no head to move.