26/11/20

## File System Implementation

1. Contiguous
2. Linked Lists
3. File Allocation Table (FAT)
4. I-nodes (lookups)

### File access

Files are composed of a number of blocks and are accessed either **sequentially** (block after block from the start) or by **random access** (jumping straight to an arbitrary position). Random access is essential, for example, in database systems.
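
Whichever allocation scheme is used, a random access first has to turn a byte offset into a logical block number. The schemes below differ in how quickly that logical number can then be turned into a physical disk block. A minimal sketch, assuming a fixed 1 KB block size; `BLOCK_SIZE`, `locate` and `sequential_blocks` are illustrative names, not part of any real API:

```python
# Assumed block size for illustration; real file systems use various sizes.
BLOCK_SIZE = 1024


def locate(offset: int, block_size: int = BLOCK_SIZE) -> tuple[int, int]:
    """Map a byte offset in a file to (logical block index, offset within that block)."""
    return divmod(offset, block_size)


def sequential_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Logical block indices visited, in order, when a file is read from start to end."""
    n_blocks = (file_size + block_size - 1) // block_size  # ceiling division
    return list(range(n_blocks))


print(locate(5000))             # (4, 904): byte 5000 lies 904 bytes into block 4
print(sequential_blocks(3000))  # [0, 1, 2]
```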

#### Contiguous Allocation

**Contiguous file systems** are similar to **dynamic partitioning** in memory allocation.

> Each file is stored in a single group of **adjacent blocks** on the hard disk
>
> Allocation of free space can be done using **first fit, best fit, next fit**.
>
> * However, when files are removed, the holes they leave behind lead to external fragmentation.
>
> **Advantages**
>
> * **Simple** to implement - only the location of the first block and the length of the file must be stored
> * **Optimal read/write performance** - blocks are clustered in nearby sectors, hence the seek time (of the hard drive) is minimised
>
> **Disadvantages**
>
> * The **exact size** of a file is often not known beforehand (what if the file grows beyond the initially allocated disk space?)
> * **Allocation algorithms** needed to decide which free blocks to allocate to a given file
> * Deleting a file results in **external fragmentation**
>
> Contiguous allocation is still in use on **CD-ROMs & DVDs**
>
> * External fragmentation isn't an issue here as files are written once.
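
A minimal sketch of contiguous allocation, assuming free space is tracked as a hypothetical list of booleans (one per block, `True` meaning free) and using first fit; all names are illustrative. It shows that a directory entry only needs `(start, length)`, that block `i` of a file is found with a single addition, and how deleting a file can leave enough free blocks in total but no run long enough for a new file:

```python
from typing import Optional


def first_fit(free: list[bool], n: int) -> Optional[int]:
    """Return the start of the first run of n adjacent free blocks, or None."""
    run_start, run_len = 0, 0
    for i, is_free in enumerate(free):
        if not is_free:
            run_len = 0          # a used block breaks the current run
            continue
        if run_len == 0:
            run_start = i
        run_len += 1
        if run_len == n:
            return run_start
    return None


def allocate(free: list[bool], n: int) -> Optional[tuple[int, int]]:
    """Allocate n contiguous blocks; the directory entry is just (start, length)."""
    start = first_fit(free, n)
    if start is None:
        return None
    for b in range(start, start + n):
        free[b] = False
    return (start, n)


def release(free: list[bool], entry: tuple[int, int]) -> None:
    """Delete a file by marking its blocks free again."""
    start, length = entry
    for b in range(start, start + length):
        free[b] = True


def physical_block(entry: tuple[int, int], i: int) -> int:
    """Logical block i of a contiguous file is simply at start + i."""
    start, length = entry
    if not 0 <= i < length:
        raise IndexError("block index outside the file")
    return start + i


disk = [True] * 10            # 10 blocks, all free
a = allocate(disk, 3)         # (0, 3)
b = allocate(disk, 3)         # (3, 3)
c = allocate(disk, 3)         # (6, 3)
print(physical_block(c, 2))   # 8
release(disk, b)              # blocks 3-5 are free again, block 9 was never used
print(sum(disk))              # 4 free blocks in total...
print(allocate(disk, 4))      # None: ...but no 4 of them are adjacent
```

Swapping `first_fit` for a best-fit or next-fit search changes only which run is chosen; the `(start, length)` directory entry and the external fragmentation problem stay the same.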

#### Linked List Allocation

To avoid external fragmentation, files are stored in **separate blocks** that are **linked**.

> Only the address of the first block has to be stored to locate a file
>
> * Each block contains a **data pointer** to the next block
>
> **Advantages**
>
> * Easy to maintain (only the address of the first block needs to be stored in the directory entry)
> * File sizes can **grow and shrink dynamically**
> * There is **no external fragmentation** - every free block can be used
> * Sequential access is straightforward, although **more seek operations** are required because the blocks may be scattered across the disk
>
> **Disadvantages**
>
> * **Random access is very slow**, to retrieve a block in the middle, one has to walk through the list from the start
> * There is some **internal fragmentation** - on average, half of the last block is left unused
>   * Internal fragmentation is reduced with **smaller block sizes**
>   * However, **larger blocks** are **faster**, since fewer blocks have to be read and followed
> * Some space in every block is taken up by the pointer, so the amount of data in a **block is no longer a power of 2**
> * **Diminished reliability**: if one block is corrupted/lost, access to the rest of the file is lost.

![Linked list file storage](/lectures/osc/assets/b8.png)
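
The cost of random access can be seen in a small sketch, assuming a hypothetical in-memory stand-in for the disk in which each block stores its data together with the number of the next block. Because every pointer lives inside the block before it, reaching logical block `i` takes `i + 1` block reads:

```python
from typing import Optional

# Hypothetical stand-in for the disk: block number -> (data, number of next block or None).
Disk = dict[int, tuple[bytes, Optional[int]]]


def read_block(disk: Disk, first: int, i: int) -> tuple[bytes, int]:
    """Return the data of logical block i and how many blocks had to be read to reach it."""
    block, reads = first, 0
    while True:
        data, nxt = disk[block]   # each step is a disk read: the pointer lives in the block
        reads += 1
        if i == 0:
            return data, reads
        if nxt is None:
            raise IndexError("block index beyond the end of the file")
        block, i = nxt, i - 1


# A file scattered over blocks 7 -> 2 -> 9 -> 4; the directory only stores 7.
disk: Disk = {7: (b"A", 2), 2: (b"B", 9), 9: (b"C", 4), 4: (b"D", None)}
print(read_block(disk, 7, 0))  # (b'A', 1)
print(read_block(disk, 7, 3))  # (b'D', 4)
```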

##### File Allocation Tables

* Store the linked-list pointers in a **separate index table** called a **file allocation table** in memory.

![FAT](/lectures/osc/assets/b9.png)
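
A minimal sketch of a lookup through the table, assuming the FAT is a Python list indexed by block number in which each entry holds the next block of the same file, and `EOF` (-1 here, an illustrative choice rather than any real FAT encoding) ends a chain. Finding logical block `i` still follows `i` pointers, but each step is a memory access; only the final block has to be read from disk:

```python
EOF = -1  # illustrative end-of-chain marker; real FAT variants reserve special values


def fat_chain(fat: list[int], first: int) -> list[int]:
    """List a file's blocks in order by following the in-memory table."""
    blocks = []
    block = first
    while block != EOF:
        blocks.append(block)
        block = fat[block]
    return blocks


def fat_lookup(fat: list[int], first: int, i: int) -> int:
    """Physical block of logical block i: walks the table in memory, not the disk."""
    block = first
    for _ in range(i):
        block = fat[block]
        if block == EOF:
            raise IndexError("block index beyond the end of the file")
    return block


fat = [0] * 16  # one entry per disk block; entries not on a chain are unused here
for blk, nxt in [(4, 7), (7, 2), (2, 10), (10, 12), (12, EOF)]:
    fat[blk] = nxt

print(fat_chain(fat, 4))      # [4, 7, 2, 10, 12]
print(fat_lookup(fat, 4, 3))  # 10, found without reading blocks 4, 7 or 2
```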

> **Advantages**
>
> * **Block size remains power of 2** - no more space is lost to the pointer
> * **Index table** can be kept in memory allowing fast non-sequential access
>
> **Disadvantages**
>
> * The size of the file allocation table grows with the number of blocks, and hence with the size of the disk.
> * For example, a 200 GB disk with a 1 KB block size has 200 million blocks, so the table needs 200 million entries; assuming that each entry occupies 4 bytes, this requires 800 MB of main memory.
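
The 800 MB figure is the number of blocks times the entry size, taking GB and KB as powers of 10:

$$
\frac{200 \times 10^{9}\ \text{B}}{10^{3}\ \text{B/block}} = 2 \times 10^{8}\ \text{entries},
\qquad
2 \times 10^{8} \times 4\ \text{B} = 8 \times 10^{8}\ \text{B} = 800\ \text{MB}
$$

With binary units (GiB, KiB) the count is $200 \times 2^{20} \approx 2.1 \times 10^{8}$ entries, i.e. exactly 800 MiB, so the conclusion is the same.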

#### I-Nodes

Each file has a small data structure (on disk) called an **I-node** (index-node) that contains its attributes and block pointers.

> In contrast to FAT, an I-node is **only loaded when a file is open**
>
> If every I-node consists of $n$ bytes, and at most $k$ files can be open at any point in time, at most $n\times k$ bytes of main memory are required.
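
For a sense of scale, assume, purely for illustration, $n = 128$ bytes per I-node and $k = 1000$ open files:

$$
n \times k = 128\ \text{B} \times 1000 = 128\,000\ \text{B} \approx 128\ \text{KB}
$$

This is tiny next to the 800 MB table in the FAT example, and, unlike the FAT, it depends on the number of open files rather than on the size of the disk.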

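An I-node holds **direct block pointers** (usually 10), which point straight at data blocks, and **indirect block pointers**, which point at a block that in turn holds further block pointers; larger files use a combination of both.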

![I-nodes](/lectures/osc/assets/c1.png)
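
A sketch of how a logical block number is resolved through an I-node, assuming 10 direct pointers, 1 KB blocks and 4-byte block addresses (so one indirect block holds 256 addresses). `INode`, `lookup` and the `read_ptr_block` callback are hypothetical names. Only the single indirect level is shown; real I-nodes often add double and triple indirect pointers, which repeat the same step one or two more times:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

N_DIRECT = 10                            # "usually 10" direct pointers
BLOCK_SIZE = 1024                        # assumed 1 KB blocks
PTR_SIZE = 4                             # assumed 4-byte block addresses
PTRS_PER_BLOCK = BLOCK_SIZE // PTR_SIZE  # 256 addresses fit in one indirect block


@dataclass
class INode:
    """Simplified I-node: attributes omitted, only the block pointers are kept."""
    direct: list[int] = field(default_factory=list)  # up to N_DIRECT data block numbers
    single_indirect: Optional[int] = None            # block filled with more block numbers


def lookup(inode: INode, i: int, read_ptr_block: Callable[[int], list[int]]) -> int:
    """Physical block of logical block i.

    read_ptr_block(b) is a hypothetical callback that reads disk block b and
    returns the block numbers stored in it (one extra disk read).
    """
    if i < N_DIRECT:
        return inode.direct[i]
    i -= N_DIRECT
    if inode.single_indirect is not None and i < PTRS_PER_BLOCK:
        return read_ptr_block(inode.single_indirect)[i]
    raise IndexError("beyond what direct + single indirect pointers can address")


def max_file_size() -> int:
    """Largest file (in bytes) reachable through direct + single indirect pointers."""
    return (N_DIRECT + PTRS_PER_BLOCK) * BLOCK_SIZE


pointer_blocks = {50: list(range(100, 100 + PTRS_PER_BLOCK))}  # block 50 holds 256 addresses
inode = INode(direct=list(range(10, 20)), single_indirect=50)

print(lookup(inode, 3, pointer_blocks.__getitem__))   # 13 (direct pointer)
print(lookup(inode, 12, pointer_blocks.__getitem__))  # 102 (via the indirect block)
print(max_file_size())                                # 272384 bytes = 266 KB
```

Small files are served entirely by the direct pointers with no extra reads; only files larger than 10 blocks pay one extra read for the indirect block.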