26/11/20 ## File System Implementation 1. Contiguous 2. Linked Lists 3. File Allocation Table (FAT) 4. I-nodes (lookups) ### File access Files will be composed of a number of blocks. Files are **sequential** or **random access**. Random access is essential for example in database systems. #### Contiguous Allocation **Contiguous file systems** are similar to **dynamic partitioning** in memory allocation. > Each file is stored in a single group of **adjacent blocks** on the hard disk > > Allocation of free space can be done using **first fit, best fit, next fit**. > > * However when files are removed, this can lead to external fragmentation. > > **Advantages** > > * **Simple** to implement - only location of the first block and the length of the file must be stored > * **Optimal read/write performance** - blocks are clustered in nearby sectors, hence the seek time (of the hard drive) is minimised > > **Disadvantages** > > * The **exact size** is not known before hand (what if the file size exceeds the initially allocated disk space) > * **Allocation algorithms** needed to decide which free blocks to allocate to a given file > * Deleting a file results in **external fragmentation** > > Contiguous allocation is still in use in **CD-ROMS & DVDs** > > * External fragmentation isn't an issue here as files are written once. #### Linked List Allocation To avoid external fragmentation, files are stored in **separate blocks** that are **linked**. > Only the address of the first block has to be stored to locate a file > > * Each block contains a **data pointer** to the next block > > **Advantages** > > * Easy to maintain (only the first block needs to be maintained in directory entry) > * File sizes can **grow and shrink dynamically** > * There is **no external fragmentation** - every possible block/sector is used (can be used) > * Sequential access is straight forward - although **more seek operations** required > > **Disadvantages** > > * **Random access is very slow**, to retrieve a block in the middle, one has to walk through the list from the start > * There is some **internal fragmentation** - on average the last half of the block is left unused > * Internal fragmentation will reduce for **smaller block sizes** > * However **larger blocks** will be **faster** > * Space for data is lost within the blocks due to the pointer, the data in a **block is no longer a power of 2** > * **Diminished reliability**: if one block is corrupted/lost, access to the rest of the file is lost. ![Linked list file storage](/lectures/osc/assets/b8.png) ##### File Allocation Tables * Store the linked-list pointers in a **separate index table** called a **file allocation table** in memory. ![FAT](/lectures/osc/assets/b9.png) > **Advantages** > > * **Block size remains power of 2** - no more space is lost to the pointer > * **Index table** can be kept in memory allowing fast non-sequential access > > **Disadvantages** > > * The size of the file allocation table grows with the number of blocks, and hence the size of the disk > * For a 200GB disk, with 1KB block size, 200 million entries are required, assuming that each entry at the table occupies 4 bytes, this required 800MB of main memory. #### I-Nodes Each file has a small data structure (on disk) called an **I-node** (index-node) that contains it's attributes and block pointers > In contrast to FAT, an I-node is **only loaded when a file is open** > > If every I-node consists of $n$ bytes, and at most $k$ files can be open at any point in time, at most $n\times k$ bytes of main memory are required. I-nodes are composed of **direct block pointers** (usually 10) **indirect block pointers** or a combination thereof. ![I-nodes](/lectures/osc/assets/c1.png)