docs/lectures/osc/01_scheduling_algorithms.md

05/10/20

---
The OS is responsible for *managing* and *scheduling* processes:

>Decide when to admit processes to the system (new -> ready)
>
>Decide which process to run next (ready -> run)
>
>Decide when and which processes to interrupt (running -> ready)

It relies on the *scheduler* (dispatcher) to decide which process to run next; the scheduler uses a scheduling algorithm to do so.

The type of algorithm used by the scheduler is influenced by the type of operating system, e.g. real time vs batch.
**Long Term**

- Applies to new processes and controls the degree of multi-programming by deciding which processes to admit to the system, and when:
  - A good mix of CPU and I/O bound processes is favourable to keep all resources as busy as possible
- Usually absent in popular modern OS

**Medium Term**

>Controls swapping and the degree of multi-programming

**Short Term**

- Decides which process to run next
- Manages the *ready queue*
- Invoked very frequently, hence must be fast
- Usually called in response to *clock interrupts*, *I/O interrupts*, or *blocking system calls*


**Non-preemptive**: processes are only interrupted voluntarily (e.g. an I/O operation or a "nice" system call such as `yield()`)

>Windows 3.1 and DOS were non-preemptive
>
>The issue with this is that if the process in control goes wrong or gets stuck in an infinite loop, the OS will never regain control of the CPU.

**Preemptive**: processes can be interrupted forcefully or voluntarily

>This requires context switches, which generate *overhead*; too many of them should be avoided.
>
>Prevents processes from monopolising the CPU
>
>Most popular modern OS use this kind.

Overhead - wasted CPU cycles

How can we objectively critique the OS?
**User oriented criteria**

*Response time*: minimise the time between creating the job and its first execution (the time between clicking the button and it starting)

*Turnaround time*: minimise the time between creating the job and finishing it

*Predictability*: minimise the variance in processing times

**System oriented criteria**

*Throughput*: maximise the number of jobs processed per hour

*Fairness*:

> Are processing power/waiting time equally distributed?

> Are some processes kept waiting excessively long - **starvation**
### Different types of Scheduling Algorithms

[NOTE: FCFS = FIFO]

**First come first serve**

Concept: a non-preemptive algorithm that operates as a strict queuing mechanism and schedules the processes in the same order that they were added to the queue.

| Pros | Cons |
| ----------- | ----------- |
| Positional fairness | Favours long processes over short ones (think supermarket checkout) |
| Easy to implement | Could compromise resource utilisation |


**Shortest job first**

A non-preemptive algorithm that starts processes in order of ascending processing time, using a provided estimate of the processing time.

| Pros | Cons |
| ----------- | ----------- |
| Always results in an optimal turnaround time | Starvation might occur |
| - | Fairness and predictability are compromised |
| - | Processing times need to be known in advance |


**Round Robin**

A preemptive version of FCFS that forces context switches at periodic intervals or time slices

>Processes run in the order that they were added to the queue.
>Processes are forcefully interrupted by the timer.

| Pros | Cons |
| ----------- | ----------- |
| Improved response time | Increased context switching and overhead |
| Effective for general purpose interactive/time sharing systems | Favours CPU bound processes over I/O bound ones |
| - | Can reduce to FCFS |

Exam 2013: Round Robin is said to favour CPU bound processes over I/O bound processes. Explain why this may be the case.

>I/O bound processes will spend a lot of their allocated time waiting for data to come back from memory, therefore less processing can occur before the time slice runs out.

If the time slice is only partially used, the next process starts immediately.

The length of the time slice must be carefully considered.

>A small time slice (~ 1ms) gives a good response time.
>A large time slice (~ 1000ms) gives a high throughput.


**Priority Queue**

A preemptive algorithm that schedules processes by priority

>A round robin is used for processes with the same priority level
>The process priority is saved in the process control block

| Pros | Cons |
| ----------- | ----------- |
| Can prioritise I/O bound jobs | Low priority processes might suffer from starvation |

Low priority starvation only happens when a static priority level is used.

You could give higher priority processes a larger time slice to improve efficiency.



Exam Q 2013: Which algorithms above lead to starvation?

>Shortest job first and highest priority first.



docs/lectures/osc/02_threads.md

08/10/20

---
A process consists of two **fundamental** units:

1. Resources
   - A logical address space containing the process image (program, data, heap, stack)
   - Files, I/O devices, I/O channels
2. Execution trace, i.e. an entity that gets executed

A process can share its resources between multiple execution traces, e.g. multiple threads running in the same resource environment.


Every thread has its own *execution context* (e.g. program counter, stack, registers).

All threads have **access** to the process' **shared resources**

>e.g. files; if one thread opens a file then all threads have access to it
>
>Same with global variables, memory etc

Similar to processes, threads have:
**states**, **transitions** and a **thread control block**



The *registers*, *stack* and *state* are all specific to each thread. When a context switch occurs they must be stored in the **thread control block**.

Threads incur less overhead to create/terminate/switch than processes do. This is because the address space remains the same for threads of the same process.

>When switching from thread A to thread B, the computer doesn't need to worry about updating the memory management unit as they're using the same memory layout.
>
>This makes switching threads very quick
Some CPUs have direct **hardware support** for **multi-threading**.

>With hyper-threading and hardware multi-threading, the thread's execution context isn't saved to the thread control block. Instead the CPU stops using one thread and starts using another.
>
>This decreases overhead as the execution context doesn't need to be saved and reloaded.

1. **Inter-thread communication** is easier and faster than **inter-process** communication (threads share memory by default)
2. **No protection boundaries** are required in the address space (threads are cooperating, they belong to the same user and have the same goal)
3. Synchronisation has to be considered carefully.

If you opened Word and Excel, you wouldn't want them running as threads of one process, as you don't want Word to have access to the memory Excel is accessing. However, within Word alone, the spell checker and graphics libraries would all run on threads, as they work towards a common goal.
### Why use threads

1. Multiple **related activities** apply to the **same resources**; these resources should be accessible to all of them.
2. Processes will often contain multiple **blocking tasks**
   1. I/O operations (the thread blocks, an interrupt marks completion)
   2. Memory access: page faults result in blocking

Such activities should be carried out in parallel on threads, e.g. web servers, word processors, processing large data volumes etc.

**User** threads - exist entirely in user space; the OS doesn't need to do anything.

>**Thread management** (creating, destroying, scheduling, thread control block manipulation) is carried out in user space with the help of a user library.
>
>The process maintains a thread table managed by the run-time system without the kernel's knowledge (similar to a process table and used for thread switching)

**Kernel** threads - ask the OS to create a thread and hand it to the user.

**Hybrid** implementations - what is used in Windows 10

|
||||||
|
|
||||||
|
**Pros and cons of user threads**
|
||||||
|
|
||||||
|
| Pros | Cons |
|
||||||
|
| ----------- | ----------- |
|
||||||
|
| Threads in user space don't require mode switches | Blocking system calls suspend all running threads |
|
||||||
|
| Full control over the thread scheduler | No true parallelism (the processes still scheduled on a single CPU) |
|
||||||
|
| OS independent | Clock interrupts (user threads are non-preemptive) |
|
||||||
|
| - | Page faults result in blocking the process|
|
||||||
|
|
||||||
|
The user threads don't share the memory management unit therefore if a thread tries to access memory that isn't loaded in the MMU then a page fault will occur, these occur often.
|
||||||
|
|
||||||
|
**Kernel Threads**

The kernel manages the threads; the user application accesses threading facilities through an **API** and **system calls**

>The **thread table** is in the kernel, containing the thread control blocks.
>
>If a thread blocks, the kernel chooses a thread from the same or a different process.

Advantages:

>**True parallelism** can be achieved
>No run time system needed

However frequent **mode switches** take place, resulting in lower performance.



Kernel threads are slower to create and synchronise than user level threads; however, user level threads cannot exploit parallelism.
**Hybrid Implementation**

>User threads are **multiplexed** onto kernel threads
>
>The kernel sees and schedules the kernel threads
>
>The user application sees user threads and creates/schedules these (an unrestricted number)


Thread libraries provide an API for managing threads.

Thread libraries can be implemented

>Entirely in user space (user threads)
>
>Based on system calls (rely on the kernel)

Examples of thread APIs include **POSIX Pthreads**, Windows threads and Java threads

`pthread_create` - Create new thread
`pthread_exit` - Exit existing thread
`pthread_join` - Wait for thread with ID
`pthread_yield` - Release CPU
`pthread_attr_init` - Initialise thread attributes (e.g. priority)
`pthread_attr_destroy` - Release attributes

`man pthread_create` returns the help page

|
||||||
|
|
||||||
|
```
|
||||||
|
$ ~ HELLO from thread 10
|
||||||
|
$ ~ HELLO from thread 10
|
||||||
|
$ ~ HELLO from thread 10
|
||||||
|
etc
|
||||||
|
```
|
||||||
|
|
||||||
|
This is because by the time the thread is created `i` has already iterated to 10.
|
||||||
|
You cannot guarantee the first thread you create will be the first to run.
|
||||||
docs/lectures/osc/03_processes4.md

09/10/20
**Multi-level scheduling algorithms**

>Nothing is stopping us from using a different scheduling algorithm for the individual queue at each priority level.
>
> - **Feedback queues** allow priorities to change dynamically, i.e. jobs can move between queues
1. Move to a **lower priority queue** if too much CPU time is used
2. Move to a **higher priority queue** to prevent starvation and to avoid priority inversion.

Exam 2013: Explain how you would prevent starvation in a priority queue algorithm?



The solution to this is to momentarily boost thread A's priority level; this will let A do what it wants to do and release resource X so that B and C can run.

Priority boosting helps avoid priority inversion.
**Defining characteristics of feedback queues**

1. The **number of queues**
2. The scheduling algorithms used for individual queues
3. **Migration policy** between queues
4. Initial **access** to the queues

Feedback queues are highly configurable and offer significant flexibility.
<ins>**Windows 7**</ins>

> An interactive system using a pre-emptive scheduler with dynamic priority levels.
>
> Two priority classes, each with 16 different priority levels, exist.
>
> 1. **Real time** processes/threads have a fixed priority level. (These are the most important)
> 2. **Variable** processes/threads can have their priorities **boosted temporarily**.
>
> A **round robin** is used within the queues.





If you give a couple of threads the highest priority level, you can freeze your computer. (causes starvation for low priority threads)
<ins>**Scheduling in Linux**</ins>

> Process scheduling has evolved over different versions of Linux to account for multiple processors/cores, processor affinity, and **load balancing** between cores.
>
> Linux distinguishes between two types of tasks for scheduling:
>
> 1. **Real time tasks** (to be POSIX compliant)
>    1. Real time FIFO tasks
>    2. Real time Round Robin tasks
> 2. **Time sharing tasks** using a pre-emptive approach (similar to variable in Windows)
>
> The most recent scheduling algorithm in Linux for time sharing tasks is the **completely fair scheduler**

**Real time FIFO tasks** have the highest priority and are scheduled with a **FCFS approach**, with pre-emption if a higher priority job shows up.

**Real time round robin tasks** are preemptable by clock interrupts and have a time slice associated with them.

Neither way can guarantee hard deadlines.
**Time sharing tasks**

> The CFS (completely fair scheduler) **divides the CPU time** between all processes and threads.
>
> <ins>If all N processes/threads have the same priority,</ins> they will each be allocated a time slice equal to 1/N times the available CPU time.
>
> The length of the **time slice** and the available CPU time are based on the **targeted latency** (every process/thread should run at least once in this time)
>
> If N is very large, the **context switch time will be dominant**, hence a lower bound on the time slice is imposed by the minimum granularity.
>
> A process/thread's time slice can be no less than the **minimum granularity.**

A **weighting scheme** is used to take different priorities into account.

<img src="/lectures/osc/assets/k.png" alt="alt text" style="zoom:60%;" />

The tasks with the **lowest proportional amount** of "used CPU time" are selected first. (Shorter tasks are picked first if W<sub>i</sub> is the same).
**Shared Queues**

A single or multi-level queue **shared** between all CPUs

| Pros | Cons |
| ---------------------------- | -------------------------------------------------------- |
| Automatic **load balancing** | Contention for the queues (locking is needed) |
| | **Cache** becomes invalid when moving to a different CPU |

Windows will allocate the **highest priority threads** to the individual CPUs/cores.
**Private Queues**

> Each CPU has its own private set of queues

| Pros | Cons |
| -------------------------------------------- | ------------------------------------------------------------ |
| CPU affinity is automatically satisfied | **Less load balancing** (one queue could have 1000 tasks while another has 4) |
| **Contention** for shared queues is minimised | |
**Related vs. Unrelated threads**

> **Related**: multiple threads that communicate with one another and should **ideally run** together
>
> **Unrelated**: processes/threads that are **independent**, possibly started by **different users** running different programs.



Threads belonging to the same process are cooperating, e.g. they **exchange messages** or **share information**

The aim is to get threads running as much as possible at the **same time across multiple CPUs**.
**Space Sharing**

> N threads are allocated to N **dedicated CPUs**
>
> N threads are kept waiting until N CPUs are available
>
> **Non pre-emptive**, i.e. blocking calls result in idle CPUs
>
> N can be dynamically adjusted to match processor capacity.

**Gang Scheduling**

> Time slices are synchronised and the scheduler groups threads together to run simultaneously
>
> A pre-emptive algorithm
>
> **Blocking threads** result in idle CPUs (if a thread blocks, the rest of the time slice will be unused due to the time slice synchronisation across all CPUs)
docs/lectures/osc/04_concurrency1.md

12/10/20
1. Threads and processes execute concurrently or in parallel and can **share resources** (like devices, memory, variables, data structures etc)
2. A process/thread can be interrupted at any point in time. The process "state" (including registers) is saved in the **process control block**

The outcome of programs may be unpredictable:

> Sharing data can lead to **inconsistencies**.
>
> The **outcome of execution** may **depend on the order** in which instructions are carried out.
```c
#include <stdio.h>
#include <pthread.h>

int counter = 0;

void * calc(void * number_of_increments) {
    int i;
    for (i = 0; i < *((int *) number_of_increments); i++)
        counter++;
    return NULL;
}

int main(void) {
    int iterations = 50000000;
    pthread_t tid1, tid2;
    pthread_create(&tid1, NULL, calc, (void *) &iterations);
    pthread_create(&tid2, NULL, calc, (void *) &iterations);
    pthread_join(tid1, NULL);
    pthread_join(tid2, NULL);
    printf("The value of counter is: %d\n", counter);
}
```

This piece of code creates two threads and points them at the `calc` function. The `pthread_join(tid1, NULL);` line waits until thread 1 is finished before the code moves on.
`counter++` consists of three separate actions:

1. *read* the value of counter from memory and **store it in a register**
2. *add* one to the value in the register
3. *store* the value of the register **in counter** in memory

The above actions are **not** "atomic". This means they can be interrupted by the timer.



TCB - *Thread Control Block*

This is what could happen if the threads are not interrupted.

However the thread control block could be out of date by the time the thread starts running again. For example *counter* could be 2 but the thread control block still has the old value of *counter*.

The problem is that simple instructions in C are actually multiple instructions in assembly code. Another example is `print()`:
```c
int chin, chout; // shared global variables (assumed; not shown in the slide)

void print() {
    chin = getchar();
    chout = chin;
    putchar(chout);
}
```

If the two threads are **interleaved** one after the other, there is no issue.



However if **interleaved** like this, they do interact: the global variable used to store the character in thread 1 is overwritten when thread 2 runs. This means 1 + 1 + 1 = 2.
## Bounded Buffer

> Consider a **bounded buffer** in which N items can be stored
>
> A **counter** is maintained to count the number of items currently in the buffer: **incremented** when an item is added and **decremented** when an item is removed.
>
> Similar **concurrency problems** as with the calculation of sums happen in the bounded buffer, which is a producer-consumer problem.
```c
// producer
while (true) {
    // while the buffer is full, do nothing
    while (counter == BUFFER_SIZE);
    // produce item
    buffer[in] = new_item;
    in = (in + 1) % BUFFER_SIZE;
    counter++;
}

// consumer
while (true) {
    // wait until there are items in the buffer
    while (counter == 0);
    // consume item
    consumed = buffer[out];
    out = (out + 1) % BUFFER_SIZE;
    counter--;
}
```

This is a circular queue; there's a start and an end pointer (*in* and *out*). The shared counter is being manipulated from 2 different places, which can go wrong.
## Race Conditions

A **race condition occurs** when multiple threads/processes **access shared data** and the result is dependent on **the order in which the instructions are interleaved**.
### Concurrency within the OS

> **Kernels are pre-emptive**
> **Multiple processes/threads are running** in the kernel.
> Kernel processes can be **interrupted** at any point.
>
> The kernel maintains **data structures**
>
> 1. These data structures are accessed **concurrently.**
> 2. They can be subject to **concurrency issues.**
>
> The OS must make sure that interactions within the OS do not result in race conditions.
>
> Processes **share resources** including memory, files, processor time, printers etc.
>
> The OS must:
>
> 1. Provide **locking mechanisms** to implement **mutual exclusion** and **prevent starvation and deadlocks.**
> 2. Allocate and deallocate these resources safely.
A **critical section** is a set of instructions in which **shared resources** between processes/threads **are changed**.

**Mutual exclusion** must be enforced for **critical sections**.

> Only **one process at a time** should be in the critical section (mutual exclusion)
>
> Processes have to **get "permission"** before entering their critical section
>
> 1. Request a lock
> 2. Hold the lock
> 3. Release the lock

Any solution to the **critical section problem** must satisfy the following requirements:

1. **Mutual exclusion** - only one process can be in its critical section at any one point in time.
2. **Progress** - any process must be able to enter its critical section at some point in time. If there is no thread/process in the critical section, there is no reason for the current thread not to be allowed into the **critical section**.
3. **Fairness/bounded waiting** - waiting times are fairly distributed; processes cannot be made to wait indefinitely.

These requirements have to be satisfied independent of the order in which sequences are executed.
### Enforcing Mutual Exclusion

**Approaches** for mutual exclusion can be

1. Software based - Peterson's solution
2. Hardware based - `test_and_set()`, `compare_and_swap()`

Deadlocks have to be prevented as well.
#### Deadlock Example

A set of processes/threads is *deadlocked* if each process/thread in the set is waiting for an event that only another process/thread in the set can cause.

Each **deadlocked process/thread** is waiting for a resource held by another deadlocked process/thread (which cannot run and hence cannot release the resource).

* Assume that X and Y are **mutually exclusive resources**.
* Threads A and B need to **acquire both resources** and request them in opposite orders.



**Four conditions** must hold for a deadlock to occur

1. **Mutual exclusion** - a resource can be assigned to at most one process at a time.
2. **Hold and wait** - a resource can be held while requesting new resources.
3. **No pre-emption** - resources cannot be forcefully taken away from a process.
4. **Circular wait** - there is a circular chain of two or more processes, each waiting for a resource held by the next process in the chain.

**No deadlocks** can occur if one of the conditions isn't met.
docs/lectures/osc/05_concurrency2.md

15/10/20
## Peterson's Solution

**Peterson's solution** is a **software based** solution which worked well on **older machines**

Two **shared variables** are used

1. *turn* - indicates which process is next to enter its critical section
2. *Boolean flag[2]* - indicates that a process is ready to enter its critical section

* Peterson's solution can be used over multiple processes or threads
* Peterson's solution for two processes satisfies all **critical section requirements** (mutual exclusion, progress, fairness)
```c
do {
    flag[i] = true;  // i wants to enter its critical section
    turn = j;        // allow j to access first
    while (flag[j] && turn == j);
    // while j wants to access its critical section
    // and it is j's turn, apply busy waiting

    // CRITICAL SECTION
    counter++;
    flag[i] = false;
    // remainder section
} while (...);
```

**Figure**: *Peterson's solution for process i*

```c
do {
    flag[j] = true;  // j wants to enter its critical section
    turn = i;        // allow i to access first
    while (flag[i] && turn == i);
    // while i wants to access its critical section
    // and it is i's turn, apply busy waiting

    // CRITICAL SECTION
    counter++;
    flag[j] = false;
    // remainder section
} while (...);
```

**Figure**: *Peterson's solution for process j*

Even when these two processes are interleaved, it's unbreakable, as there is always a check to see if the other process is in the critical section.
### Mutual exclusion requirement

The variable `turn` can have at most one value at a time.

* Both `flag[i]` and `flag[j]` are *true* when they want to enter their critical section
* `turn` is a **singular variable** that can store only one value
* Hence either `while (flag[i] && turn == i);` or `while (flag[j] && turn == j);` holds, and at most one process can enter its critical section (mutual exclusion)
**Progress**: any process must be able to enter its critical section at some point in time

> Processes/threads in the **remaining code** do not influence access to critical sections
>
> If a process *j* does not want to enter its critical section
>
> * `flag[j] == false`
> * `while (flag[j] && turn == j)` will terminate for process *i*
> * *i* enters its critical section
### Fairness/bounded waiting

Waiting times are fairly distributed; processes cannot be made to wait indefinitely.

> If P<sub>i</sub> and P<sub>j</sub> both want to enter their critical section
>
> * `flag[i] == flag[j] == true`
> * `turn` is either *i* or *j*; assuming that `turn == i`, *i* enters its critical section
> * *i* finishes its critical section, sets `flag[i] = false`, and then *j* enters its critical section.

Peterson's solution can be generalised to more than two processes, but questions on more than two processes are not in the spec.
**Disable interrupts** whilst **executing a critical section** to prevent interruptions from I/O devices etc.

For example, we see `counter++` as one instruction, but it compiles to three instructions in assembly. If an interrupt occurs somewhere in the middle of these three instructions, bad things happen.

```c
register = counter;
register = register + 1;
counter = register;
```
Disabling interrupts may be appropriate on a **single CPU machine**, but not on a multi-core processor: multiple cores can each take a value from memory and manipulate it (in this example, incrementing it) without knowing the value has already been changed on a different core, then write the stale result back to memory. This can lead to `1+1+1=2`.
### Atomic Instructions

> Implement `test_and_set()` and `compare_and_swap()` instructions as a **set of atomic (uninterruptible) instructions**
>
> * Reading and setting the variables is done as **one complete set of instructions**
> * If `test_and_set()` / `compare_and_swap()` are called **simultaneously**, they will be executed sequentially.
>
> They are used in combination with **global lock variables**, assumed to be `true (1)` if the lock is in use.

#### Test_and_set()
```c
// Test and set method
boolean test_and_set(boolean * bIsLocked) {
    boolean rv = *bIsLocked;
    *bIsLocked = true;
    return rv;
}

// Example of using the test and set method
do {
    // WHILE the lock is in use, apply busy waiting
    while (test_and_set(&bIsLocked));
    // Lock was false, now true
    // CRITICAL SECTION
    ...
    bIsLocked = false;
    ...
    // remainder section
} while (...);
```
* `test_and_set()` must be **atomic**.
* If two processes using `test_and_set()` are interleaved, both can end up inside the critical section.
```c
// Compare and swap method
int compare_and_swap(int * iIsLocked, int iExpected, int iNewValue) {
    int iTemp = *iIsLocked;
    if(*iIsLocked == iExpected)
        *iIsLocked = iNewValue;
    return iTemp;
}

// Example using the compare and swap method
do {
    // While the lock is in use (i.e. == 1), apply busy waiting
    while (compare_and_swap(&iIsLocked, 0, 1));
    // Lock was false, now true
    // CRITICAL SECTION
    ...
    iIsLocked = 0;
    ...
    // remainder section
} while (...);
```
`test_and_set()` and `compare_and_swap()` are **hardware instructions** and **not directly accessible** to the user.

**Disadvantages**:

* **Busy waiting** is used. While the process is doing **nothing** but sitting in a loop, it is still eating up processor time. If the process won't be **waiting for long, busy waiting is beneficial**; if the wait is long, the process should be blocked instead.
* **Deadlock** is possible, e.g. when two locks are requested in opposite orders in different threads.
The OS uses the hardware instructions to implement higher level mechanisms/instructions for mutual exclusion, i.e. **mutexes** and **semaphores**.
16/10/20

## Mutexes
**Mutexes** are an approach for mutual exclusion **provided by the operating system**, containing a Boolean lock variable to indicate availability.

> The lock variable is set to **true** if the lock is available
>
> Two atomic functions are used to **manipulate the mutex**
>
> 1. `acquire()` - called **before** entering a critical section, Boolean set to **false**
> 2. `release()` - called **after** exiting the critical section, Boolean set to **true** again.
```c
acquire() {
    while(!available); // busy wait
    available = false;
}
```

```c
release() {
    available = true;
}
```
`acquire()` and `release()` must be **atomic instructions**.

* No **interrupts** should occur between reading and setting the lock.
* If interrupts can occur, the following sequence is possible:

```c
T_i => lock available
...                       T_j => lock available
...                       T_j sets lock
T_i sets lock             ...
```

The process/thread that acquires the lock must **release the lock** - in contrast to semaphores.
| Pros | Cons |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| Context switches can be **avoided**. | Calls to `acquire()` result in **busy waiting**. Shocking performance on single CPU systems. |
| Efficient on multi-core systems when locks are **held for a short time**. | A thread can waste its entire time slice busy waiting. |

|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
|
## Semaphores

> **Semaphores** are an approach for **mutual exclusion** and **process synchronisation** provided by the operating system.
>
> * They contain an **integer variable**
> * We distinguish between **binary** (0-1) and **counting semaphores** (0-N)
>
> Two **atomic functions** are used to **manipulate semaphores**
>
> 1. `wait()` - called when a resource is **acquired**; the counter is decremented.
> 2. `signal()` / `post()` - called when the resource is **released**.
>
> **Strictly positive values** indicate that the semaphore is available; negative values indicate the number of processes/threads waiting.
```c
//Definition of a Semaphore
typedef struct {
    int value;
    struct process * list;
} semaphore;
```

```c
wait(semaphore * S) {
    S->value--;
    if(S->value < 0) {
        add process to S->list
        block(); // system call
    }
}
```

```c
post(semaphore * S) {
    S->value++;
    if (S->value <= 0) {
        remove a process P from S->list;
        wakeup(P); // system call
    }
}
```
```
Thread N
...
...
wait(&s)
...        (wakeup)
...
post(&s)
```
Calling `wait()` will **block** the process when the internal **counter is negative** (no busy waiting)

1. The process **joins the blocked queue**
2. The **process/thread state** is changed from running to blocked
3. Control is transferred to the process scheduler.

Calling `post()` **removes a process/thread** from the blocked queue if the counter is less than or equal to 0.

1. The process/thread state is changed from **blocked** to **ready**
2. Different queuing strategies can be employed to **remove** processes/threads, e.g. FIFO etc.
The negative value of the semaphore is the **number of processes waiting** for the resource.

`block()` and `wakeup()` are system calls provided by the OS.

`post()` and `wait()` **must** be **atomic**

> Can be achieved through the use of mutexes (or disabling interrupts in a single CPU system)
>
> Busy waiting is moved from the **critical section** to `wait()` and `post()` (which are short anyway - the original critical sections themselves are usually much longer)

We must lock the variable `value`.
```c
post(semaphore * S) {
    // lock the mutex
    S->value++;
    if (S->value <= 0) {
        remove a process P from S->list;
        wakeup(P); // system call
    }
    //unlock mutex
}
```
Semaphores put processes to sleep; mutexes apply busy waiting in user code.

Semaphores within the **same process** can be declared as **global variables** of the type `sem_t`

> * `sem_init()` - initialises the value of the semaphore.
> * `sem_wait()` - decrements the value of the semaphore.
> * `sem_post()` - increments the value of the semaphore.

![sem](assets/sem.png)
Synchronising code does result in a **performance penalty**

> * Synchronise only **when necessary**
>
> * Synchronise as **few instructions** as possible (synchronising unnecessary instructions will delay others from entering their critical section)
```c
void * calc(void * increments) {
    int i, temp = 0;
    for(i = 0; i < *((int*) increments); i++) {
        temp++;
    }
    sem_wait(&s);
    sum += temp;
    sem_post(&s);
    return NULL;
}
```

**Figure**: optimised `calc` function
#### Starvation

> Poorly designed **queueing approaches** (e.g. LIFO) may result in fairness violations

#### Deadlock

> Two or more processes are **waiting indefinitely** for an event that can be caused only by one of the waiting processes or threads.

#### Priority Inversion

> Priority inversion happens when a high priority process (`H`) has to wait for a **resource** currently held by a low priority process (`L`)
>
> Priority inversion can happen in chains, e.g. `H` waits for `L` to release a resource and `L` is interrupted by a medium priority process `M`.
>
> This can be avoided by implementing priority inheritance to boost `L` to `H`'s priority.
## The Producer and Consumer Problem

> * Producer(s) and consumer(s) share N **buffers** (an array) that are capable of holding **one item each**, like a printer queue.
> * The buffer can be of **bounded** (size N) or **unbounded size**.
> * There can be one or multiple consumers and/or producers.
> * The **producer(s)** add items and **go to sleep** if the buffer is **full** (only for a bounded buffer)
> * The **consumer(s)** remove items and **go to sleep** if the buffer is **empty**
The simplest version of this problem has **one producer**, **one consumer** and a buffer of **unbounded size**.

* A counter (index) variable keeps track of the **number of items in the buffer**.
* It uses **two binary semaphores:**
  * `sync` **synchronises** access to the **buffer** (counter), initialised to 1.
  * `delay_consumer` ensures that the **consumer** goes to **sleep** when there are no items available, initialised to 0.

![prod_con](assets/prod_con.png)
It is obvious that any manipulations of the counter (`items`) will have to be **synchronised**.

**Race conditions** will still exist:

> When the consumer has **exhausted the buffer** (`items == 0`), it should go to sleep - but the producer can increment `items` between the consumer's check and its decision to sleep:
>
> * Consumer has removed the **last element**
> * The producer adds a **new element**
> * The consumer should have gone to sleep but no longer will
> * The consumer consumes **non-existing elements**
>
> **Solutions**:
>
> * Move the consumer's if statement inside the critical section
### Producers and Consumers Problem

The previous code (one consumer, one producer) is made to work by **storing the value of** `items`.

A different variant of the problem has `n` consumers and `m` producers and a fixed buffer size `N`. The solution is based on **3 semaphores**:

1. `sync` - used to **enforce mutual exclusion** for the buffer
2. `empty` - keeps track of the number of **empty buffers**, initialised to `N`
3. `full` - keeps track of the number of **full buffers**, initialised to 0.

`empty` and `full` are **counting semaphores** and represent **resources**.

![prod_con_scaled](assets/prod_con_scaled.png)
19/10/20

## The Dining Philosophers Problem

<img src="/lectures/osc/assets/w.png" alt="img" style="zoom:67%;" />
The problem is defined as:

* **Five philosophers** are sitting at a round table
* Each one has a plate of spaghetti
* The spaghetti is too slippery, and each philosopher **needs 2 forks** to be able to eat
* When hungry, a philosopher tries to acquire the forks on their left and right.

Note that this reflects the general problem of **sharing a limited set** of resources (forks) between a **number of processes** (philosophers).
### Solution 1

**Forks** are represented by **semaphores** (initialised to 1)

* 1 if the fork is available: the philosopher can continue.
* 0 if the fork is unavailable: the philosopher goes to **sleep** if trying to acquire it.

Solution: every philosopher picks up one fork and waits for the second fork to become available (without putting the first one down).

This solution will **deadlock** every time.

> * The deadlock can be avoided by exponential backoff: a philosopher puts down their fork and waits for a random amount of time before retrying (this is how Ethernet systems avoid repeated data collisions).
> * Just **add another fork**
### Solution 2

**One global mutex** set by a philosopher when they want to eat (only **one can eat at a time**)

*Question*: Can I initialise the value of the `eating` semaphore to 2 to create more parallelism?

Setting the semaphore to 2 allows two philosophers to eat at one time. If these two philosophers are sitting next to each other, they will try to grab the same fork. The code will not deadlock, however only one (sometimes two) philosopher(s) is able to eat.
### Solution 3

A more sophisticated solution is necessary to allow **maximum parallelism**

The solution uses:

> * `state[N]` : one **state variable** for every philosopher (`THINKING`, `HUNGRY` and `EATING`)
> * `phil[N]` : one **semaphore per philosopher** (i.e. **not forks**, initialised to 0)
>   * The philosopher goes to sleep if one of their neighbours is eating
>   * The neighbours wake up the philosopher when they have finished eating
> * `sync` : one **semaphore/mutex** to enforce **mutual exclusion** of the critical section (while updating the **states** `HUNGRY`, `THINKING` and `EATING`)
> * A philosopher can only **start eating** if their neighbours are **not eating**.

![phil](assets/phil.png)

#### Code for Solution 3
```c
#define N 5
#define THINKING 1
#define HUNGRY 2
#define EATING 3

int state[N] = {THINKING, THINKING, THINKING, THINKING, THINKING};
sem_t phil[N]; // sends philosopher to sleep
sem_t sync;

void * philosopher(void * id) {
    int i = *((int *) id);

    while(1) {
        printf("%d is thinking\n", i);
        take_forks(i);
        printf("%d is eating\n", i);
        put_forks(i);
    }
}

void take_forks(int i) {
    sem_wait(&sync);
    state[i] = HUNGRY;
    test(i); // checks surrounding philosophers to see if it's ok to eat
    sem_post(&sync);
    sem_wait(&phil[i]); // blocks unless test() has posted the semaphore
}

void test(int i) {
    int left = (i + N - 1) % N;
    int right = (i + 1) % N;
    if(state[i] == HUNGRY && state[left] != EATING && state[right] != EATING) {
        state[i] = EATING;
        sem_post(&phil[i]); // 0 -> 1
    }
}

void put_forks(int i) {
    int left = (i + N - 1) % N;
    int right = (i + 1) % N;
    sem_wait(&sync);
    state[i] = THINKING;
    test(left);
    test(right);
    sem_post(&sync);
}
```
23/10/20

## The Readers-Writers Problem
* Reading a record (or a variable) can happen in parallel without problems; **writing needs synchronisation** (or exclusive access).

* Different solutions exist:

> * Solution 1: naive implementation with limited parallelism
> * Solution 2: **readers** receive **priority**. No reader is kept waiting unless a writer already has access (writers may starve).
> * Solution 3: **writing** is performed as soon as possible (readers may starve).
### Solution 1: No parallelism

```c
void * reader(void * arg)
{
    while(1)
    {
        pthread_mutex_lock(&sync);
        printf("reading record\n");
        pthread_mutex_unlock(&sync);
    }
}

void * writer(void * arg)
{
    while(1)
    {
        pthread_mutex_lock(&sync);
        printf("writing\n");
        pthread_mutex_unlock(&sync);
    }
}
```
This prevents **parallel reading**.

### Solution 2: Allows parallel reading

A correct implementation requires:

> `iReadCount`: an integer tracking the number of readers
>
> * if `iReadCount > 0`: writers are blocked (`sem_wait(rwSync)`)
> * if `iReadCount == 0`: writers are released (`sem_post(rwSync)`)
> * if a writer is already writing, readers must wait
>
> `sync`: a mutex for mutual exclusion of `iReadCount`.
>
> `rwSync` : a semaphore that synchronises the readers and writers, set by the first/last reader.

|
||||||
|
|
||||||
|
`sync` is used to `mutex_lock` and `mutex_unlock` when the `iReadCount` is being modified.
|
||||||
|
|
||||||
|
When `iReadCount == 1`, the `sem_wait(&rwSync)` is used to block the writer from writing. Further down in the code when `iReadCount == 0`, the `sem_post(&rwSync)` is called to 'wake up' the writer, so that it can write.
|
||||||
|
|
||||||
|
When we say 'send process to sleep' or 'wake up a process' we actually mean: move that process from the blocked queue to the ready queue (or visa versa).
|
||||||
|
|
||||||
|
If the `iReadCount == 1` is run when the writer is writing. The `sem_wait(&rwSync)` will go from 0 -> -1, forcing the reader to go to sleep. As soon as the writer is done, the `sem_post(&rwSync)` is run meaning it goes from -1 -> 0, which wakes the reader up.
|
||||||
|
|
||||||
|
Unless `iReadCount` reaches 0, writing will not happen. **This means writers can easily starve if there are multiple readers**.
|
||||||
|
### Solution 3: Gives priority to the writer

**Solution 3 uses:**

> * `iReadCount` and `iWriteCount`: to keep track of the number of readers and writers.
> * `sRead`/`sWrite`: to synchronise the **reader/writer's critical section**.
> * `sReadTry`: to **stop readers** when there is a **writer waiting**.
> * `sResource`: to **synchronise** the resource for **reading/writing**.

![writers](assets/writers.png)
[explanation time stamp 43:35]

`sRead` and `sWrite` are used whenever `iReadCount` and `iWriteCount` are modified, respectively. Unlike the mutex in the last example, it is important that the same semaphore variable isn't used for both `iReadCount` and `iWriteCount`.

There is no reason the read and write counts cannot be changed at the same time; using the same semaphore for both would limit the parallelism of the code (slowing run time).

In the case `iWriteCount == 1`, `sReadTry` is set from 1 -> 0, meaning that no new readers can attempt to read. For the writer to begin writing, it must wait for the readers to finish reading (due to the `sResource` semaphore).

So when a reader runs `iReadCount--`, it checks whether it is the last reader (`iReadCount == 0`), and if so it unlocks `sResource` (-1 -> 0) so that the writers can write. If more readers show up, they cannot enter as `sReadTry == -1`.

The last writer does the same thing, but instead of unlocking the resource it unlocks the `sReadTry` semaphore.
29/10/20

## Memory Overview
Computers typically have memory hierarchies:

> * Registers
> * L1/L2/L3 cache
> * Main memory (RAM)
> * Disks

**Higher memory** is faster, more expensive and volatile. **Lower memory** is slower, cheaper and non-volatile.

The operating system provides **memory abstraction** for the user. Otherwise, memory can be seen as one **linear array** of bytes/words.
### OS Responsibilities

* Allocate/de-allocate memory when requested by processes; keep track of all used/unused memory.
* Distribute memory between processes and simulate an **indefinitely large** memory space. The OS must create the illusion of having infinite main memory; processes assume they have access to all main memory.
* **Control access** when multi-programming is applied.
* **Transparently** move data from **memory** to **disk** and vice versa.
#### Partitioning

![mem](assets/mem.png)

##### Contiguous memory management

Allocates memory in **one single block** without any holes or gaps.

##### Non-contiguous memory management models

Memory is allocated in multiple blocks, or segments, which may not be placed next to each other in physical memory.

**Mono-programming:** one single partition for user processes.

**Multi-programming** with **fixed partitions**

* Fixed **equal** sized partitions
* Fixed **non-equal** sized partitions

**Multi-programming** with **dynamic partitions**
#### Mono-programming

> * Only one single user process is in memory/executed at any point in time.
> * A fixed region of memory is allocated to the OS & kernel; the remaining memory is reserved for a single process.
> * This process has direct access to physical memory (no address translation takes place)
> * Every process is allocated a **contiguous block of memory** (no holes or gaps)
> * One process is allocated the **entire memory space** and the process is always located in the same address space.
> * **No protection** between different user processes is required. There is also no protection between the running process and the OS, so the process can sometimes access pieces of the OS it's not meant to.
>
> * Overlays enable the **programmer** to use **more memory than available**.

![mono](assets/mono.png)
##### Shortcomings of mono-programming

> * Since a process has direct access to the physical memory, it may have access to the OS memory.
> * The OS can be seen as a process - so we have **two processes anyway**.
> * **Low utilisation** of hardware resources (CPU, I/O devices etc)
> * Mono-programming is unacceptable as **multi-programming is expected** on modern machines

**Direct memory access** and **mono-programming** are common in basic embedded systems and modern consumer electronics, e.g. washing machines, microwaves, cars etc.
##### Simulating Multi-Programming

We can simulate multi-programming through **swapping**

* **Swap a process** out to the disk and load a new one (context switches would become **time consuming**)

Why multi-programming is better, theoretically:

> * There are *n* **processes in memory**
> * A process spends *p* percent of its time **waiting for I/O**
> * **CPU utilisation** is calculated as 1 minus the probability that all processes are waiting for I/O
> * The probability that **all** *n* **processes** are waiting for I/O is $p^{n}$
> * Therefore CPU utilisation is given by $1 - p^{n}$

|
||||||
|
|
||||||
|
With an **I/O wait time of 20%** almost **100% CPU utilisation** can be achieved with four processes ($1-0.2^{4}$)
|
||||||
|
|
||||||
|
With an **I/O wait time of 90%**, 10 processes can achieve about **65% CPU utilisation**. ($1-0.9^{10}$)
|
||||||
|
|
||||||
|
CPU utilisation **goes up** with the **number of processes** and **down** for **increasing levels of I/O**.
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
**Assume that**:

> * A computer has one megabyte of memory
> * The OS takes up 200k, leaving room for four 200k processes

**Then:**

> * If we have an I/O wait time of 80%, then we will achieve just under 60% CPU utilisation ($1-0.8^{4}$)
> * If we add another megabyte of memory, allowing us to run another five processes, we can achieve about **87%** CPU utilisation ($1-0.8^{9}$)
> * If we add another megabyte of memory (14 processes), CPU utilisation will increase to around **96%** ($1-0.8^{14}$)
##### Caveats

* This model assumes that all processes are independent, which is not true in practice.
* More complex models could be built using **queuing theory**, but we still use this simplistic model to make **approximate predictions**.
#### Fixed Size Partitions

* Divide memory into **static**, **contiguous** and **equal sized** partitions that have a fixed **size and location**.
* Any process can take **any** partition (as long as it's large enough).
* Allocation of **fixed equal sized partitions to processes is trivial**
* Very **little overhead** and a **simple implementation**
* The OS keeps track of which partitions are being **used** and which are **free**.

##### Disadvantages

* A partition may be unnecessarily large
* Low memory utilisation
* Internal fragmentation
* **Overlays** must be used if a program does not fit into a partition (burden on the programmer)
#### Fixed Partitions of non-equal size

* Divide memory into **static** and **non-equal sized partitions** that have **fixed size and location**
* Reduces **internal fragmentation**
* The **allocation** of processes to partitions must be **carefully considered**.

![queues](assets/queues.png)

**One private queue per partition**:

* Assigns each process to the smallest partition that it fits in.
* Reduces **internal fragmentation**.
* Can reduce memory utilisation (e.g. lots of small jobs result in unused large partitions)

**A single shared queue:**

* Increased internal fragmentation as small processes are allocated into big partitions.
30/10/20

## Relocation and Protection
```c
#include <stdio.h>
#include <unistd.h>

int iVar = 0;

int main() {
    int i = 0;

    while(i < 10) {
        iVar++;
        sleep(2);
        printf("Address:%p; Value:%d\n", (void *)&iVar, iVar);
        i++;
    }
    return 0;
}
//If running the code twice (simultaneously):
//Will the same or different addresses be displayed for iVar?
//Will the value for iVar in the first run influence the value for iVar in the second run?
```
The addresses will be the same, as memory management within a process is the same. The process doesn't know where it is in physical memory; it is allocated the same amount of memory, and the address is relative to the process.

If the process is run twice, the two instances are allocated two different physical memory spaces, so the (logical) addresses displayed will be the same and the values will not influence each other.

[explanation 8:05]
## Relocation

When a program is run, it does not know in advance which partition it will occupy.

* The program **cannot** simply **generate static addresses** (like jump instructions) that are absolute
* **Addresses should be relative to where the program has been loaded**.
* Relocation must be **solved in an operating system** that allows **processes to run at changing memory locations**.

**Protection**: once you can have two programs in memory at the same time, protection must be enforced.

![address](assets/address.png)
**Logical address**: a memory address as seen by the process

* It is independent of the current physical memory assignment
* It is relative to the start of the program

**Physical address**: refers to an actual location in main memory

The **logical address space** must be **mapped** onto the **machine's physical address space**.
### Static Relocation

Relocation happens at compile time: the process has to be loaded at the same memory location every single time, which is impractical.

### Dynamic Relocation

Relocation happens at load time:

* An **offset is added to every logical address** to account for its physical location in memory.
* This **slows down the loading** of a process and does not account for **swapping**.
### Dynamic Relocation at Run-time

Two special-purpose registers are maintained in the CPU's **MMU**: one containing a **base address**, the other a **limit**.

> * The **base register** stores the **start address** of the partition.
> * The **limit register** holds the **size** of the partition.
>
> At **run-time**:
>
> * The base register is added to the **logical (relative) address** to generate the physical address.
> * The resulting address is **compared** against the **limit register**, which keeps the process within the bounds of its partition.
>
> NOTE: This requires **hardware support** (which didn't exist in the early days).

<img src="/lectures/osc/assets/G.png" alt="registers" style="zoom:80%;" />
#### Dynamic Partitioning

**Fixed partitioning** results in **internal fragmentation**:

> An exact match between the requirements of the process and the available partitions **may not exist**.
>
> * This means the partition may **not be used in its entirety**.

**Dynamic partitioning**:

> * A **variable number of partitions**, of which the **size** and **starting address** can **change over time**.
> * A process is allocated the **exact amount** of **contiguous memory it requires**, thereby preventing internal fragmentation.

![animation](/lectures/osc/assets/H.png)
**Swapping** holds some of the **processes** on the drive and **shuttles processes** between the drive and main memory as necessary.

Reasons for **swapping**:

> * Some processes only **run occasionally**.
> * We have more **processes** than **partitions**.
> * A process's **memory requirements** may have **changed**.
> * The **total amount of memory required** by the processes **exceeds the available memory**.

For any given process, we might not know the exact **memory requirements** in advance, because processes may contain dynamically growing parts.

> Solution:
>
> Allocate the current requirements **AND** a 'bit extra'.
>
> The 'bit extra' tries to account for the dynamic nature of the process.
>
> If the process outgrows its partition, it is swapped out to the drive and swapped back in to a new, larger partition.

![memory](/lectures/osc/assets/I.png)
##### External Fragmentation

> * Swapping a process out of memory will **create a 'hole'**.
> * A new process may not **use the entire 'hole'**, leaving a small **unused block**.
> * A new process may be **too large for a given 'hole'**.
>
> The **overhead** of memory **compaction** to **recover holes** can be **prohibitive**, and it requires **dynamic relocation**.
##### Allocation Structures

**Bitmaps**:

> * The simplest data structure that can be used is a **bitmap**.
> * **Memory is split into blocks** of, e.g., 4 KB.
> * A bitmap is set up so that each **bit is 0** if the memory block is free and 1 if the **block is in use**.
>   * 32 MB memory / 4 KB blocks = 8192 bitmap entries
>   * 8192 bits occupy 1 KB of storage (8192 / 8)
> * The size of the bitmap depends on the **size of the memory** and the **size of the allocation unit**.
> * To find a hole of, say, 128 KB, a group of **32 adjacent bits set to 0** must be found.
>   * This is typically a long operation; the longer it takes, the lower the CPU utilisation.
> * A **trade-off exists** between the **size of the bitmap** and the **size of the blocks**:
>   * For small blocks, the bitmap becomes prohibitively large and searching it becomes slower.
>   * Larger blocks increase internal fragmentation.
> * **Bitmaps are rarely used** because of this trade-off.
**Linked List**:

A more **sophisticated data structure** is required to deal with a **variable number** of **free and used partitions**.

> * A linked list consists of a **number of entries** (links).
> * Each link **contains data items**, e.g. the **start of the memory block**, its **size**, and a flag marking it free or allocated.
> * It also contains a pointer to the next link.

![linked list](/lectures/osc/assets/J.png)

![linked list 2](/lectures/osc/assets/K.png)
110
docs/lectures/osc/11_mem_management3.md
Normal file
@@ -0,0 +1,110 @@
05/11/20

## Dynamic Partitioning Management

### Allocating Available Memory
#### First Fit

> * First fit scans **from the start** of the linked list until a link is found that can fit the process.
> * If the requested space is **exactly the same size** as the 'hole', all of the space is allocated.
> * Otherwise the free link is split into two:
>   * The first entry is set to the **size requested** and marked **used**.
>   * The second entry is set to the **remaining size** and marked **free**.
#### Next Fit

> * The next fit algorithm maintains a record of where it got to last time and restarts its search from there.
> * This gives all of memory an even chance of being allocated (first fit concentrates on the start of the list).
> * However, simulations show that next fit actually performs worse than first fit.
>   * This is because a side effect of **first fit** is that it leaves larger partitions towards the end of memory, which is useful for larger processes.
#### Best Fit

> * The best fit algorithm always **searches the entire linked list** to find the smallest hole that is big enough to fit the memory requirements of the process.
> * It is **slower** than first fit.
> * It also results in more wasted memory: an exact-size hole is unlikely to be found, so splitting leaves behind tiny (and useless) holes.
>
> Complexity: $O(n)$
#### Worst Fit

> Tiny holes are created when best fit splits an empty partition.
>
> * The **worst fit algorithm** instead finds the **largest available empty partition** and splits it.
> * The **left-over partition** is hopefully **still useful**.
> * However, simulations show that this method **isn't very good** either.
>
> Complexity: $O(n)$
#### Quick Fit

> * Quick fit maintains a **list of commonly used sizes**.
>   * For example, a separate list for each of the 4 KB, 8 KB, 12 KB, etc. holes.
>   * Odd-sized holes can either go into the nearest size list or into a special separate list.
> * This is much **faster than the other solutions**; however, similar to **best fit**, it creates **many tiny holes**.
> * Finding neighbours for **coalescing** (combining empty partitions) becomes more difficult and time-consuming.
### Coalescing

Coalescing (joining together) takes place when **two adjacent entries** in the linked list become free:

* Both neighbours are examined when a **block is freed**.
* If either (or both) are also **free**, then the two (or three) **entries are combined** into one larger block by adding up their sizes.
* The earlier block in the linked list provides the **start point**.
* The **separate links are deleted** and a **single link inserted**.
### Compacting

Even with coalescing happening automatically, **free blocks** may still be **distributed across memory**:

> * Compacting can be used to join the free and used parts of memory together.
> * However, compacting is more **difficult and time consuming** to implement than coalescing.
> * Each **process is swapped out** and the **free space coalesced**.
> * Processes are swapped back in at the lowest available location.
## Paging

Paging uses the principles of **fixed partitioning** and **code relocation** to devise a new **non-contiguous management scheme**:

> * Memory is split into much **smaller blocks**, and **one or multiple blocks** are allocated to a process (e.g. an 11 KB process would take 3 blocks of 4 KB).
> * These blocks **do not have to be contiguous in main memory**, but **the process still perceives them to be contiguous**.
> * Benefits:
>   * **Internal fragmentation** is reduced to the **last block only** (in the previous example, only 3 KB of the third block is used).
>   * There is **no external fragmentation**, since physical blocks are **stacked directly onto each other** in main memory.

![pages](/lectures/osc/assets/L.png)

![pages2](/lectures/osc/assets/M.png)

![pages3](/lectures/osc/assets/N.png)
A **page** is a **small block** of **contiguous memory** in the **logical address space** (as seen by the process).

* A **frame** is a **small contiguous block** in **physical memory**.
* Pages and frames (usually) have the **same size**:
  * The size is usually a power of 2.
  * Sizes range between 512 bytes and 1 GB (4 KB pages and frames are the most common).

A **logical address** (page number, offset within page) needs to be **translated** into a **physical address** (frame number, offset within frame):

* Multiple **base registers** will be required.
* Each logical page needs a **separate base register** that specifies the start of the associated frame.
* i.e. a **set of base registers** has to be maintained for each process.
* These base registers are stored in the **page table**.

![page table](/lectures/osc/assets/O.png)
The page table can be seen as a **function** that **maps the page number** of the logical address **onto the frame number** of the physical address:

$$
frameNumber = f(pageNumber)
$$

* The **page number** is used as an **index into the page table**, which lists the **location of the associated frame**.
* It is the OS's duty to maintain a list of **free frames**.

![page table 2](/lectures/osc/assets/P.png)

We can see that the **only difference** between the logical address and the physical address is the **4 left-most bits** (the **page number and frame number**). As **pages and frames are the same size**, the **offset value is the same for both**.

This leaves room for **more optimisation**, which is important as this translation has to **run for every memory read/write**.
136
docs/lectures/osc/12_mem_management4.md
Normal file
@@ -0,0 +1,136 @@
06/11/20

## Paging Implementation

Benefits of paging:

* **Reduced internal fragmentation**.
* No **external fragmentation**.
* Code execution and data manipulation are usually **restricted to a small subset** of the program (i.e. a limited number of pages) at any point in time.
* **Not all pages** have to be **loaded in memory** at the **same time** => **virtual memory**:
  * Loading the entire set of pages for a program/data set into memory is **wasteful**.
  * The desired blocks can be **loaded on demand**.
  * This works because of the **principle of locality**.
#### Memory as a linear array

> * Memory can be seen as one **linear array** of **bytes** (words).
> * Addresses range from $0$ to $N-1$, where $N$ is the number of addressable units.
> * $n$ address lines can be used to specify $2^n$ distinct addresses.
### Address Translation

* A **logical address** is relative to the start of the **program** and consists of two parts:
  * The **right-most** $m$ **bits** represent the **offset within the page** (and frame).
    * $m$ is often 12 bits.
  * The **left-most** $n$ **bits** represent the **page number** (the frame number field of the physical address has the same width).
    * $n$ is often 4 bits.

![page table 3](/lectures/osc/assets/Q.png)
#### Steps in Address Translation

> 1. **Extract the page number** from the logical address
> 2. Use the page number as an **index** into the **page table** to **retrieve the frame number**
> 3. **Add the offset within the page** to the start of the physical frame
>
> **Hardware Implementation**
>
> 1. The CPU's **memory management unit** (MMU) intercepts logical addresses
> 2. The MMU uses a page table as above
> 3. The resulting **physical address** is put on the **memory bus**
>
> Without this specialised hardware, paging wouldn't be quick enough to be viable.
### Principle of Locality

![resident set](/lectures/osc/assets/R.png)

We have more pages here than we can physically store as frames.

**Resident set**: the set of pages that are loaded in main memory. (In the above image, the resident set consists of the pages not marked with an 'X'.)
#### Page Faults

> A **page fault** is generated if the processor accesses a page that is **not in memory**.
>
> * A page fault results in an interrupt (the process enters the **blocked state**).
> * An **I/O operation** is started to bring the missing page into main memory.
> * A **context switch** (may) take place.
> * An **interrupt signal** shows that the I/O operation is complete, and the process **enters the ready state**.

```
1. Trap to operating system
   - Save registers / process state
   - Analyse interrupt (i.e. identify that the interrupt is a page fault)
   - Validate page reference, determine page location
   - Issue disk I/O: queueing, seek, latency, transfer
2. Context switch (optional)
3. Interrupt for I/O completion
   - Save process state / registers
   - Analyse interrupt from disk
   - Update page table (page now in memory)
   - Wait for the original process to be scheduled
4. Context switch to original process
```
### Virtual Memory

#### Benefits

> * Being able to maintain **more processes** in main memory through the use of virtual memory **improves CPU utilisation**.
> * Individual processes take up less memory, since they are only partially loaded.
> * Virtual memory allows the **logical address space** (processes) to be larger than the **physical address space** (main memory).
>   * A 64-bit machine has 2^64^ logical addresses (theoretically).
#### Contents of a page entry

> * A **present/absent bit** that indicates whether the frame is in main memory or not.
> * A **modified bit** that is set if the page/frame has been modified (only modified pages have to be written back to the disk when evicted; this keeps pages and frames in sync).
> * A **referenced bit** that is set if the page is in use (when freeing up main memory, it is important to move out a page that is not in use).
> * **Protection and sharing bits**: read, write, execute, or various combinations of those.

![page entry](/lectures/osc/assets/S.png)
##### Page Table Size

> * On a **16-bit machine**, the total address space is 2^16^ addresses.
>   * Assuming that 10 bits are used for the offset (2^10^), 6 bits can be used to number the pages.
>   * This means 2^6^ or 64 pages can be maintained.
> * On a **32-bit machine** (with a 12-bit offset), 2^20^ or ~10^6^ pages can be maintained.
> * On a **64-bit machine**, this number increases enormously, meaning the page table becomes impractically large.

Where do we **store page tables of increasing size**?

* In a perfect world they would live in registers - however, this isn't possible due to their size.
* They have to be stored in (virtual) **main memory**, using:
  * **Multi-level** page tables
  * **Inverted page tables** (for large virtual address spaces)

If the page table is stored in main memory, we must still maintain acceptable speeds. The solution is to page the page table itself.
### Multi-level Page Tables

We use a tree-like structure to hold the page tables:

* Divide the page number into:
  * An index into the first-level page table, which points to a second-level page table
  * A page number within that second-level page table

This means there's no need to keep all the page tables in memory all the time!

![multilevel](/lectures/osc/assets/T.png)

The above image has 2 levels of page tables.

> * The **root page table** is always maintained in memory.
> * The page tables themselves are **maintained in virtual memory** due to their size.
>
> Assume that a **fetch** from main memory takes $T$ nano-seconds:
>
> * With a **single page table level**, access takes $2 \cdot T$
> * With **two page table levels**, access takes $3 \cdot T$
> * and so on...
>
> We can have many levels, as the address space on 64-bit computers is so massive.
168
docs/lectures/osc/13_mem_management5.md
Normal file
@@ -0,0 +1,168 @@
12/11/20

## Page Table Optimisations

##### Memory Organisation

* The **root page table** is always maintained in memory.
* The page tables themselves are maintained in **virtual memory** due to their size.
* Assume a **fetch** from main memory takes time $T$ - access via a single-level page table now takes $2\cdot T$, and via **two** page table levels $3 \cdot T$.
* Some optimisation is needed, otherwise memory access becomes a bottleneck for the speed of the whole computer.
### Translation Look-aside Buffers

* Translation look-aside buffers (TLBs) are (usually) located inside the memory management unit.
* They **cache** the most frequently used page table entries.
  * As they are stored in cache, lookups are very quick.
  * They can be searched in **parallel**.
* The principle behind TLBs is similar to other types of **caching in operating systems**. They normally store anywhere from 16 to 512 entries.
* Remember: **locality** states that processes make a large number of references to a small number of pages.

![TLB](/lectures/osc/assets/U.png)

The split arrows going into the TLB represent searching in parallel.

* If the TLB gets a hit, it just returns the frame number.
* However, if the TLB misses:
  * We still have to account for the time it took to search the TLB.
  * We then have to look in the page table to find the frame number.
  * The worst-case scenario is a page fault (which takes the longest): the page table itself has to be retrieved from secondary memory before it can be searched.
> Quick maths:
>
> * Assume a single-level page table
> * Assume a 20ns associative **TLB lookup time**
> * Assume a 100ns **memory access time**
>
> * **TLB hit** => 20 + 100 = 120ns
> * **TLB miss** => 20 + 100 + 100 = 220ns
>
> Performance evaluation of TLBs:
>
> * For an 80% hit rate, the estimated access time is:
>
> $$
> 120\cdot 0.8 + 220\cdot (1-0.8)=140ns
> $$
>
> (a **40% slowdown** relative to absolute addressing)
>
> * For a 98% hit rate, the estimated access time is:
>
> $$
> 120\cdot 0.98 + 220\cdot (1-0.98)=122ns
> $$
>
> (a **22% slowdown**)
>
> NOTE: **page tables** can be **held in virtual memory** => **further slowdown** due to **page faults**.
### Inverted Page Tables

A **normal page table's size** is proportional to the number of pages in the virtual address space => this can be prohibitive for modern machines.

> An **inverted page table's size** is **proportional** to the size of **main memory**.
>
> * The inverted table contains one **entry for every frame** (not for every page), and it **indexes entries by frame number**, not by page number.
> * When a process references a page, the OS must search the inverted page table for the corresponding entry (which could be too slow).
>   * It does save memory, as there are fewer frames than pages.
> * To find whether your page is in main memory, you would need to iterate through the entire table.
> * *Solution*: use a **hash function** that transforms page numbers ($n$ bits) into frame numbers ($m$ bits) - remember $n > m$.
>   * The hash function turns a page number into a candidate frame number.

![inverted](/lectures/osc/assets/V.png)

Without hashing, when looking for a page's frame location we have to search sequentially through the table until we hit a match; the frame number is then given by the index of the matching entry - in this case 4.
#### Inverted Page Table Entry

> * The **frame number** is the index of the entry in the inverted page table.
> * Process identifier (**PID**) - the process that owns this page.
> * Virtual page number (**VPN**).
> * **Protection** bits (read/write/execute).
> * **Chaining pointer** - this field points to the next frame entry whose page hashes to the same value. We need this to resolve collisions.

![inverted 2](/lectures/osc/assets/W.png)

![inverted 3](/lectures/osc/assets/X.png)

Due to the hash function, we now only have to look through the chain of entries for **VPN** 1 instead of through all the entries.
#### Advantages

* The OS maintains a **single inverted page table** for all processes.
* It **saves a lot of space** (especially when the virtual address space is much larger than physical memory).

#### Disadvantages

* Virtual-to-physical **translation becomes much slower**.
* Hashing eliminates the need to search the whole inverted table, but we have to handle collisions (which also **slows down translation**).
* TLBs are necessary to make the performance acceptable.
### Page Loading

Two key decisions have to be made when using virtual memory:

* Which pages are **loaded**, and when:
  * Predictions can be made here to reduce the number of page faults.
* Which pages are **removed** from memory, and when:
  * This is the job of **page replacement algorithms**.
#### Demand Paging

> Demand paging starts the process with **no pages in memory**:
>
> * The first instruction will immediately cause a **page fault**.
> * **More page faults** will follow, but they **stabilise over time** - until the process moves to its next **locality**.
> * The set of pages that is currently being used is called the process's **working set**.
> * Pages are only **loaded when needed** (i.e. after **page faults**).
#### Pre-Paging

> When the process is started, all pages expected to be used (the working set) are **brought into memory at once**:
>
> * This **reduces the page fault rate**.
> * Retrieving multiple (**contiguously stored**) pages at once **reduces transfer times** (seek time, rotational latency, etc.).
>
> **Pre-paging** loads as many pages as possible **before page faults are generated** (a similar method is used when processes are **swapped in and out**).

Let *ma* be the memory access time, *p* the page fault rate, and *pft* the page fault time.

The **effective access time** is given by: $T_{a} = (1-p)\cdot ma + p \cdot pft$

NOTE: This doesn't take TLBs into account.

When page faults are taken into account, the expected access time is **proportional to the page fault rate**:

$$
T_{a} \propto p
$$

* Ideally, all pages would be pre-loaded so that no demand paging is required.
### Page Replacement

> * The OS must choose a **page to remove** when a new one is loaded.
> * This choice is made by **page replacement algorithms**, which **take into account**:
>   * When the page was **last used** or is **expected to be used again**.
>   * Whether the page has been **modified** (this would cause a write back to disk).
> * Replacement choices have to be made **intelligently** to **save time**.
#### Optimal Page Replacement

> * In an **ideal** world:
>   * Each page would be labelled with the **number of instructions** that will be executed/the length of time before it is used again.
>   * The page that is **not going to be referenced** for the **longest time** is the optimal one to remove.
> * The **optimal approach** is **not possible to implement**.
>   * It can be used for post-execution analysis.
>   * It provides a **lower bound** on the number of page faults (used for comparison with other algorithms).
#### FIFO

> * FIFO maintains a **linked list** of pages; **new pages** are added at the tail of the list.
> * The **oldest page, at the head** of the list, is **evicted when a page fault occurs**.
>
> This is a pretty bad algorithm <s>unsurprisingly</s>

![FIFO](/lectures/osc/assets/Y.png)

Explanation at 53:40

Shaded squares on the top row are page faults. Shaded squares in the grid are when a new page is brought into memory.
176
docs/lectures/osc/14_mem_management6.md
Normal file
@@ -0,0 +1,176 @@
13/11/20

## Virtual Memory & Potential Problems

#### Page Replacement
##### Second Chance

> * If the page at the front of the list has **not been referenced**, it is **evicted**.
> * If the reference bit is set, the page is **placed at the end** of the list and its reference bit is cleared.
> * This works better than FIFO and is relatively simple.
> * It is **costly to implement**, as the list is constantly changing.
> * It degrades to FIFO if all pages were initially referenced.
##### Clock Replacement Algorithm

> The second chance implementation can be improved by **maintaining the page list as a circle**:
>
> * A **pointer** points to the last visited page.
> * In this form the algorithm is called the one-handed clock.
> * It is faster, but can still be **slow if the list is long**.
> * The **time spent maintaining** the list is **reduced**.

![clock](/lectures/osc/assets/Z.png)
##### Not Recently Used (NRU)

> For NRU, the **referenced** and **modified** bits are kept in the page table:
>
> * Referenced bits are set to 0 at the start and are **reset periodically**.
>
> There are four different **page classes** in NRU:
>
> 1. Not referenced recently, not modified
> 2. Not referenced recently, modified
> 3. Referenced recently, not modified
> 4. Referenced recently, modified
>
> **Page table entries** are inspected upon every **page fault**. This could be implemented in the following way:
>
> 1. Find a page from **class 1** to remove.
> 2. If step 1 fails, scan again looking for **class 2**. During this scan, set the reference bit to 0 on each page that is bypassed.
> 3. If step 2 fails, start again from step 1 (now pages from classes 3 and 4 will have moved to class 1 or 2).
>
> The NRU algorithm provides **reasonable performance** and is easy to understand and implement.
##### Least Recently Used (LRU)

> Least recently used **evicts the page** that has **not been used for the longest time**:
>
> * The OS must keep track of when each page was last used.
> * Every page table entry contains a field for a counter.
> * This is **not cheap to implement**, as we need to maintain a **list of pages** that is **sorted** by the order in which they have been used.
>
> The algorithm can be **implemented in hardware** using a **counter** that is incremented after each instruction; on every memory reference, the counter's value is stored in the referenced page's entry.

![LRU](/lectures/osc/assets/AA.png)

This looks similar to the FIFO algorithm; however, when a page is used, it is treated as if it had just come in.
### Resident Set

How many pages should be allocated to an individual process?

* **Small resident sets** make it possible to store **more processes in memory** => improved CPU utilisation.
* **Small resident sets** may, however, result in **more page faults**.
* **Large resident sets** may **no longer reduce** the **page fault rate** (**diminishing returns**).

A trade-off exists between the **sizes of the resident sets** and **system utilisation**.

Resident set sizes may be **fixed** or **variable** (adjusted at run-time):

* For **variable-sized** resident sets, **replacement policies** can be:
  * **Local**: a page of the same process is replaced.
  * **Global**: a page can be taken away from a **different process**.
* Variable-sized sets require **careful evaluation of their size** when a **local scope** is used (often based on the **working set** or the **page fault rate**).
### Working Set

The **resident set** comprises the set of pages of the process that are in memory (they have a corresponding frame)

The **working set** is the subset of the resident set that is actually needed for execution.

* The **working set** $W(t, k)$ comprises the set of pages referenced in the last $k$ (working set window) **virtual time units for the process**
* $k$ can be defined in terms of **memory references** or as **actual process time**
  * The set of most recently used pages
  * The set of pages used within a pre-specified time interval
* The **working set size** can be used as a guide for the number of frames that should be allocated to a process.



The working set is a **function of time** $t$:

* Processes **move between localities**, hence the pages included in the working set **change over time**
* **Stable** intervals alternate with intervals of **rapid change**

$|W(t,k)|$ then varies in time. Specifically:

$$
1 \le |W(t,k)| \le \min(k, N)
$$

where $N$ is the total number of pages of the process. All the maths is saying is that the size of the working set can be as small as **one** page or as large as **all the pages in the process**.

Choosing the right value for $k$ is important:

* Too **small**: inaccurate, pages are missing
* Too **large**: too many unused pages present
* **Infinity**: all pages of the process are in the working set

Working sets can be used to guide the **size of the resident sets**:

* Monitor the working set
* Remove pages from the resident set that are not in the working set

The working set is costly to maintain => the **page fault frequency (PFF)** can be used as an approximation: $PFF \propto k$

* If the PFF is high -> we need to increase $k$
* If the PFF is very low -> we can decrease $k$ to allow more processes to have more pages.

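A small sketch of how $W(t, k)$ could be computed from a page reference string, with $k$ counted in memory references (the reference string is an arbitrary example):

```python
def working_set(refs, t, k):
    """Return W(t, k): the set of pages referenced in the last k
    references up to (and including) virtual time t."""
    window = refs[max(0, t - k + 1): t + 1]
    return set(window)

refs = [1, 2, 1, 3, 2, 4, 4, 4, 5, 1]
print(working_set(refs, 7, 4))  # {2, 4}    -- refs[4:8] = [2, 4, 4, 4]
print(working_set(refs, 9, 4))  # {1, 4, 5} -- refs[6:10] = [4, 4, 5, 1]
```

Note that both results respect the bound above: $1 \le |W(t,k)| \le \min(k, N)$.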
#### Global Replacement

> Global replacement policies can select frames from the entire set (they can be taken from other processes)
>
> * Frames are **allocated dynamically** to processes
> * Processes cannot control their own page fault frequency. The PFF of one process is **influenced by other processes**.

#### Local Replacement

> Local replacement policies can only select frames that are allocated to the current process
>
> * Every process has a **fixed fraction of memory**
> * The **locally oldest page** is not necessarily the **globally oldest page**

Windows uses a variable approach with local replacement. Page replacement algorithms can use both policies.

### Paging Daemon

It is more efficient to **proactively** keep a number of **free pages** for **future page faults**

* If not, when a page fault occurs we may first have to **find a page** to evict and **write it to the drive** (if it has been modified).

Many systems have a background process called a **paging daemon**.

* This process **runs at periodic intervals**
* It inspects the state of the frames and, if too few frames are free, it **selects pages to evict** (using page replacement algorithms)

Paging daemons can be combined with **buffering** (free and modified lists) => write the modified pages **but keep them in main memory** when possible.

**Buffering**: a process that preemptively writes modified pages to the disk. That way, when there is a page fault, we don't lose the time taken to write to disk.

### Thrashing

Assume **all available pages are in active use** and a new page needs to be loaded:

* The page that is evicted will have to be **reloaded soon afterwards**

**Thrashing** occurs when pages are **swapped out** and then **loaded back in immediately**

#### Causes of thrashing

* The degree of multi-programming is too high, i.e. the total **demand** (the sum of all working set sizes) **exceeds supply** (the available frames)
* An individual process is allocated **too few pages**

Thrashing can be prevented by **using good page replacement algorithms**, reducing the **degree of multi-programming**, or adding more memory.

The **page fault frequency** can be used to detect that a system is thrashing.

> * CPU utilisation is too low => the scheduler **increases the degree of multi-programming**
> * Frames are allocated to new processes and taken away from existing processes
> * I/O requests are queued up as a consequence of page faults
>
> This is a positive reinforcement cycle.

When all this comes together, this is how memory management works in modern computers.

19/11/20
---
## Disk Scheduling

### Hard Drives

#### Construction of Hard Drives

> Disks are constructed as multiple aluminium/glass platters covered with **magnetisable material**
>
> * Read/write heads fly just above the surface and are connected to a single disk arm controlled by a single actuator
> * **Data** is stored on **both sides**
> * Hard disks **rotate** at a **constant speed**
>
> A hard disk controller sits between the CPU and the drive
>
> Hard disks are currently about 4 orders of magnitude slower than main memory.



#### Low Level Format

> Disks are organised in:
>
> * **Cylinders**: a collection of tracks in the same relative position to the spindle
> * **Tracks**: concentric circles on a single platter side
> * **Sectors**: segments of a track - they usually contain an **equal number of bytes**, consisting of a **preamble, data** and an **error-correcting code** (ECC).
>
> The number of sectors per track increases from the innermost track to the outer tracks.

##### Organisation of hard drives

Disks usually have a **cylinder skew**, i.e. an **offset** is added to sector 0 in adjacent tracks to account for the seek time.

In the past, consecutive **disk sectors were interleaved** to account for the transfer time (of the read/write head)

NOTE: disk capacity is reduced due to the preamble & ECC

#### Access times

**Access time** = seek time + rotational delay + transfer time

* **Seek time**: time needed to move the arm to the cylinder
* **Rotational latency**: time before the sector appears underneath the read/write head (on average, half a rotation)
* **Transfer time**: time to transfer the data



Multiple requests may be pending at the same time (concurrently), so access time may be increased by **queuing time**.

In this scenario, the dominance of seek time leaves room for **optimisation** by carefully considering the order of read operations.



The **estimated seek time** (i.e. the time to move the arm from one track to another) is approximated by:

$$
T_{s} = n \times m + s
$$

in which $T_{s}$ denotes the estimated seek time, $n$ the **number of tracks** to be crossed, $m$ the **crossing time per track** and $s$ any **additional startup delay**.

> Let us assume a disk that rotates at 3600 rpm
>
> * One rotation = 16.7 ms
> * The average **rotational latency** $T_{r}$ is then 8.3 ms
>
> Let $b$ denote the **number of bytes transferred**, $N$ the **number of bytes per track**, and $rpm$ the **rotation speed in rotations per minute**; the transfer time $T_{t}$ is then given by:
> $$
> T_{t} = \frac{b}{N} \times \frac{ms\space per\space minute}{rpm}
> $$
> $N$ bytes take 1 revolution => $\frac{60000}{3600}$ ms = $\frac{ms\space per\space minute}{rpm}$
>
> $b$ contiguous bytes take $\frac{b}{N}$ revolutions.

> Read a file of **size 256 sectors** with:
>
> * $T_{s}$ = 20 ms (average seek time)
> * 32 sectors per track
>
> Suppose the file is stored as compactly as possible (i.e. stored contiguously)
>
> * The first track takes: seek time + rotational delay + transfer time
> $20 + 8.3 + 16.7 = 45ms$
> * Assuming no cylinder skew and neglecting the small seeks between tracks, the remaining tracks only need rotational delay + transfer time
> $8.3+16.7=25ms$
>
> The total time is $45+7\times 25 = 220ms = 0.22s$

> In case the access is not sequential but **random for the sectors** we get:
>
> * Time per sector = $T_{s}+T_{r}+T_{t} = 20+8.3+0.5=28.8ms$
> $T_{t} = 16.7\times \frac {1}{32} = 0.5$
>
> It is important to **position the sectors carefully** and **avoid disk fragmentation**

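The two worked examples above can be checked with a short calculation, using the figures assumed in the example (3600 rpm, 20 ms average seek, 32 sectors per track):

```python
def sequential_read_ms(tracks, seek_ms, rot_ms):
    """Total time to read a contiguous file spanning `tracks` full tracks:
    one seek, then one rotational delay + one full-track transfer per track."""
    transfer_ms = rot_ms          # a full track passes in one rotation
    latency_ms = rot_ms / 2       # average rotational delay
    return seek_ms + tracks * (latency_ms + transfer_ms)

def random_sector_ms(seek_ms, rot_ms, sectors_per_track):
    """Average time to read one randomly placed sector."""
    return seek_ms + rot_ms / 2 + rot_ms / sectors_per_track

rot = 60_000 / 3600                          # 16.7 ms per rotation at 3600 rpm
print(sequential_read_ms(8, 20, rot))        # ~220 ms (256 sectors / 32 per track = 8 tracks)
print(256 * random_sector_ms(20, rot, 32))   # ~7400 ms if the sectors are scattered
```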
### Disk Scheduling

The OS must use the hardware efficiently:

* The file system can **position/organise files strategically**
* Having **multiple disk requests** in a queue allows us to **minimise** the **arm movement**

Note that every I/O operation goes through a system call, allowing the **OS to intercept the request and re-sequence it**.

If the drive **is free**, the request can be serviced immediately; if not, the request is queued.

In a dynamic situation, several I/O requests will be **made over time**, which are kept in a **table of requested sectors per cylinder**.

> Disk scheduling algorithms determine the order in which disk events are processed

#### First-Come First-Served

> Process the requests in the order that they arrive
>
> Consider the following sequence of disk requests:
>
> `11 1 36 16 34 9 12`
>
> The total length is: `|11-1|+|1-36|+|36-16|+|16-34|+|34-9|+|9-12|=111`
>
> 
#### Shortest Seek Time First

> Selects the request that is closest to the current head position to reduce head movement
>
> * This allows us to gain **~50%** over FCFS
>
> The total length is: `|11-12|+|12-9|+|9-16|+|16-1|+|1-34|+|34-36|=61`
>
> 
>
> Disadvantages:
>
> * Could result in starvation:
>   * The **arm stays in the middle of the disk** under heavy load; edge cylinders are poorly served - the strategy is biased
>   * Continuously arriving requests for the same location could **starve other regions**

#### SCAN

> **Keep moving in the same direction** until the end is reached
>
> * It continues in the current direction, **servicing all pending requests** as it passes over them
> * When it gets to the **last cylinder**, it **reverses direction** and **services pending requests**
>
> Total length: `|11-12|+|12-16|+|16-34|+|34-36|+|36-9|+|9-1|=60`
>
> 
>
> **Disadvantages**:
>
> * The **upper limit** on the waiting time is $2\times$ the number of cylinders (no starvation)
> * The **middle cylinders are favoured** if the disk is heavily used.

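The totals above can be reproduced with a small simulation; `scan_up` here implements the LOOK-style sweep used in the notes' calculation (head starting at 11, moving upward first):

```python
def fcfs(head, requests):
    """Total head movement when servicing requests in arrival order."""
    total = 0
    for r in requests:
        total += abs(head - r)
        head = r
    return total

def sstf(head, requests):
    """Always service the pending request closest to the head."""
    pending = list(requests)
    total = 0
    while pending:
        r = min(pending, key=lambda x: abs(x - head))
        total += abs(head - r)
        head = r
        pending.remove(r)
    return total

def scan_up(head, requests):
    """Sweep upward to the highest request, then reverse (LOOK variant)."""
    up = sorted(r for r in requests if r >= head)
    down = sorted((r for r in requests if r < head), reverse=True)
    total = 0
    for r in up + down:
        total += abs(head - r)
        head = r
    return total

reqs = [1, 36, 16, 34, 9, 12]
print(fcfs(11, reqs), sstf(11, reqs), scan_up(11, reqs))  # 111 61 60
```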
##### C-SCAN

> Once the outer/inner side of the disk has been reached, the **requests at the other end of the disk** have been **waiting the longest**
>
> * SCAN can be improved by making it circular => C-SCAN
> * When the disk arm gets to the last cylinder of the disk, it **reverses direction** but **does not service requests** on the return
> * It is **fairer** and equalises **response times across the disk**
>
> Total length: `|11-12|+|12-16|+|16-34|+|34-36|+|36-1|+|1-9|=68`

##### LOOK-SCAN

> LOOK-SCAN moves to the last cylinder containing **the first or last request** (as opposed to the first/last cylinder on the disk, like SCAN)
>
> * However, seeks are **cylinder by cylinder**, and one cylinder contains multiple tracks
> * It may happen that the arm "sticks" to a cylinder

##### N-Step SCAN

> Only services $N$ requests every sweep.

```bash
[jay@diablo lecture_notes]$ cat /sys/block/sda/queue/scheduler
[mq-deadline] kyber bfq none

# noop: FCFS
# deadline: N-step-SCAN
# cfq: Complete Fairness Queueing (from Linux)
```

### Drive caching

For current drives, the time **required to seek a new cylinder** is longer than the **rotational time**

* It makes sense to **read more sectors than actually required**
* **Read** sectors during the rotational delay (the sectors that happen to pass under the read/write head)
* **Modern controllers read multiple sectors** when asked for the data from one sector (**track-at-a-time caching**)

### Scheduling on SSDs

SSDs have no seek time $T_{seek}$ or rotational delay, so we can use FCFS (SSTF, SCAN etc. may reduce performance, as there is no head to move).

20/11/20
---
## File System Views

### User View

A **user view** defines the file system in terms of the **abstractions** that the operating system provides

An **implementation view** defines the file system in terms of its **low-level implementation**

**Important aspects of the user view**

> * The **file abstraction**, which **hides** implementation details from the user
> * File **naming policies**, user file **attributes** (size, protection, owner etc.)
>   * There are also **system attributes** for files (e.g. non-human readable, archive flag, temp flag)
> * **Directory structures** and organisation
> * **System calls** to interact with the file system
>
> The user view defines how the file system looks to regular users and relates to **abstractions**.

#### File Types

Many OSs support several types of file. Both Windows and Unix have regular files and directories:

* **Regular files** contain user data in **ASCII** or **binary** format
* **Directories** group files together (but are files at the implementation level)

Unix also has character and block special files:

* **Character special files** are used to model **serial I/O devices** (keyboards, printers etc.)
* **Block special files** are used to model drives

### System Calls

File Control Blocks (FCBs) are kernel data structures (they are protected and only accessible in kernel mode)

* Allowing user applications to access them directly could compromise their integrity
* System calls enable a **user application** to **ask the OS** to carry out an action on its behalf (in kernel mode)
* There are **two different categories** of **system calls**:
  * **File manipulation**: `open()`, `close()`, `read()`, `write()` ...
  * **Directory manipulation**: `create()`, `delete()`, `rename()`, `link()` ...

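In Python, the `os` module exposes thin wrappers around these file-manipulation system calls; a small round-trip looks like this (the file name is arbitrary):

```python
import os

# open() with O_CREAT creates the file; the kernel sets up the FCB
fd = os.open("demo.txt", os.O_CREAT | os.O_WRONLY | os.O_TRUNC, 0o644)
os.write(fd, b"hello, file system\n")   # write() syscall on the descriptor
os.close(fd)                            # close() releases the descriptor

fd = os.open("demo.txt", os.O_RDONLY)
data = os.read(fd, 100)                 # read() up to 100 bytes
os.close(fd)
print(data)  # b'hello, file system\n'
```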
### File Structures

**Single level**: all files are in the same directory (good enough for basic consumer electronics)

**Two or multiple level directories**: tree structures

* **Absolute path name**: from the root of the file system
* **Relative path name**: the current working directory is used as the starting point

**Directed acyclic graph (DAG)**: allows files to be shared (links to files or sub-directories), but **cycles are forbidden**

**Generic graph structure**: links and cycles can exist.

The use of **DAG** and **generic graph structures** results in **significant complications** in the implementation

* Trees are DAGs with the restriction that a child can only have one parent, and they contain no cycles.

When searching the file system:

* Cycles can result in **infinite loops**
* Sub-trees can be **traversed multiple times**
* Files have **multiple absolute file names**
* Deleting files becomes a lot more complicated
  * Links may no longer point to a file
  * Inaccessible cycles may exist
  * A garbage collection scheme may be required to remove files that are no longer accessible from the file system tree.

#### Directory Implementations

Directories contain a list of **human readable file names** that are mapped onto **unique identifiers** and **disk locations**

* They provide a mapping of the logical file onto the physical location

Retrieving a file comes down to **searching the directory file** as fast as possible:

* A **simple random order of directory** entries might be insufficient (search time is linear as a function of the number of entries)
* Indexes or **hash tables** can be used
* Entries can store all **file-related attributes** (file name, disk address - Windows), or they can **contain a pointer** to the data structure that holds the details of the file (Unix)



##### System Calls

Similar to files, **directories** are manipulated using **system calls**

* `create/delete`: a new directory is created/deleted
* `opendir, closedir`: add/free a directory to/from the internal tables
* `readdir`: return the next entry in the directory file

**Directories** are **special files** that **group files** together and whose **structure is defined** by the **file system**

* A bit is set to indicate that they are directories
* In Linux, when you create a directory, it contains two entries that the user has no control over, represented as `.` and `..`
  * `.` - refers to the directory itself
  * `..` - refers to the parent directory (`cd ..`)

##### Implementation

> Regardless of the type of file system, a number of **additional considerations** need to be made
>
> * **Disk partitions**, **partition tables**, **boot sectors** etc.
> * Free **space management**
> * System-wide and per-process **file tables**
>
> **Low-level formatting** writes sectors to the disk
>
> **High-level formatting** imposes a file system on top of this (using **blocks** that can cover multiple **sectors**)

### Partitions

Disks are usually divided into **multiple partitions**

* An independent file system may exist on each partition

**Master Boot Record**

* Located at the start of the entire drive
* Used to boot the computer (the BIOS reads and executes the MBR)
* Contains the **partition table** at its end
* One partition is listed as **active**, containing a boot block to load the operating system.



#### Unix Partition

> The partition contains:
>
> * The partition **boot block**:
>   * Contains code to boot the OS
>   * Every partition has a boot block - even if it does not contain an OS
> * The **super block** contains the partition's details, e.g. partition size, number of blocks, I-node table etc.
> * **Free space management** contains a bitmap or linked list that indicates the free blocks
>   * A linked list of disk blocks (also known as grouping)
>     * We use free blocks to hold the **numbers of the free blocks**. Since the free list shrinks as the disk becomes full, this is not wasted space
>     * **Blocks are linked together**. The size of the list **grows with the size of the disk** and **shrinks with the size of the blocks**
>   * Linked lists can be modified by **keeping track of the number of consecutive free blocks** for each entry (known as counting)
> * **I-Nodes**: an array of data structures, one per file, telling all about the files
> * **Root directory**: the top of the file-system tree
> * **Data**: files and directories




Free space management with linked lists (on the left) and bitmaps (on the right)

**Bitmaps**

* Require extra space
* Keeping the bitmap in main memory is possible, but only for small disks

**Linked lists**

* No wasted disk space
* We only need to keep one block of pointers in memory (load a new block when needed)

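To get a feel for the bitmap's space cost, a quick calculation (the disk and block sizes are illustrative assumptions):

```python
def bitmap_bytes(disk_bytes, block_bytes):
    """A free-space bitmap needs one bit per block on the disk."""
    blocks = disk_bytes // block_bytes
    return blocks // 8

# Assumed example: a 200 GB disk with 1 KB blocks
disk = 200 * 10**9
print(bitmap_bytes(disk, 1024) / 10**6)  # ~24.4 MB of bitmap
```

This is why keeping the whole bitmap in memory is only practical for small disks.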
Apart from the free-space management tables, a number of key data structures are stored in memory:

* An in-memory mount table (a table of the different partitions that have been mounted)
* An in-memory directory cache of recently accessed directory information
* A **system-wide open file table**, containing a copy of the FCB for every currently open file in the system, including its location on disk, file size and **open count** (the number of processes that use the file)
* A **per-process open file table**, containing pointers to the system-wide open file table.



26/11/20
---
## File System Implementation

1. Contiguous
2. Linked Lists
3. File Allocation Table (FAT)
4. I-nodes (lookups)

### File access

Files are composed of a number of blocks. File access is **sequential** or **random**; random access is essential, for example, in database systems.

#### Contiguous Allocation

**Contiguous file systems** are similar to **dynamic partitioning** in memory allocation.

> Each file is stored in a single group of **adjacent blocks** on the hard disk
>
> Allocation of free space can be done using **first fit, best fit, next fit**.
>
> * However, when files are removed, this can lead to external fragmentation.
>
> **Advantages**
>
> * **Simple** to implement - only the location of the first block and the length of the file must be stored
> * **Optimal read/write performance** - blocks are clustered in nearby sectors, hence the seek time (of the hard drive) is minimised
>
> **Disadvantages**
>
> * The **exact size** is not known beforehand (what if the file size exceeds the initially allocated disk space?)
> * **Allocation algorithms** are needed to decide which free blocks to allocate to a given file
> * Deleting a file results in **external fragmentation**
>
> Contiguous allocation is still in use on **CD-ROMs & DVDs**
>
> * External fragmentation isn't an issue here, as files are written only once.

#### Linked List Allocation

To avoid external fragmentation, files are stored in **separate blocks** that are **linked**.

> Only the address of the first block has to be stored to locate a file
>
> * Each block contains a **pointer** to the next block
>
> **Advantages**
>
> * Easy to maintain (only the first block needs to be stored in the directory entry)
> * File sizes can **grow and shrink dynamically**
> * There is **no external fragmentation** - every possible block/sector can be used
> * Sequential access is straightforward - although **more seek operations** may be required
>
> **Disadvantages**
>
> * **Random access is very slow**: to retrieve a block in the middle, one has to walk through the list from the start
> * There is some **internal fragmentation** - on average, the last half of the last block is left unused
>   * Internal fragmentation is reduced for **smaller block sizes**
>   * However, **larger blocks** are **faster**
> * Space for data is lost within the blocks due to the pointer; the data in a **block is no longer a power of 2**
> * **Diminished reliability**: if one block is corrupted/lost, access to the rest of the file is lost.



##### File Allocation Tables

* Store the linked-list pointers in a **separate index table**, called a **file allocation table** (FAT), in memory.



> **Advantages**
>
> * **Block size remains a power of 2** - no space is lost to the pointer
> * The **index table** can be kept in memory, allowing fast non-sequential access
>
> **Disadvantages**
>
> * The size of the file allocation table grows with the number of blocks, and hence with the size of the disk
> * For a 200GB disk with 1KB block size, 200 million entries are required; assuming each table entry occupies 4 bytes, this requires 800MB of main memory.

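A sketch of how a file's blocks are found by chasing pointers through an in-memory FAT (the block numbers are made up):

```python
# A tiny in-memory FAT: fat[i] holds the number of the block that
# follows block i in some file, or None at end-of-file.
fat = {4: 7, 7: 2, 2: 10, 10: None,   # file A: blocks 4 -> 7 -> 2 -> 10
       6: 3, 3: None}                  # file B: blocks 6 -> 3

def file_blocks(fat, first_block):
    """Follow the FAT chain from a file's first block (stored in its
    directory entry) and return the ordered list of its blocks."""
    chain = []
    block = first_block
    while block is not None:
        chain.append(block)
        block = fat[block]
    return chain

print(file_blocks(fat, 4))  # [4, 7, 2, 10]
```

Because the whole table sits in memory, following the chain needs no disk reads, which is what makes random access fast.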
#### I-Nodes

Each file has a small data structure (on disk) called an **I-node** (index-node) that contains its attributes and block pointers

> In contrast to FAT, an I-node is **only loaded when the file is open**
>
> If every I-node consists of $n$ bytes, and at most $k$ files can be open at any point in time, at most $n\times k$ bytes of main memory are required.

I-nodes are composed of **direct block pointers** (usually 10), **indirect block pointers**, or a combination thereof.


18  mkdocs.yml
@@ -77,3 +77,21 @@ nav:
   - Developing Maintainable Software:
       - Java Collections: lectures/dms/01_java_collections.md
       - UML Diagrams: lectures/dms/02_uml.md
+  - Operating Systems and Concurrency:
+      - Scheduling Algorithms: lectures/osc/01_scheduling_algorithms.md
+      - Threads: lectures/osc/02_threads.md
+      - Processes 4: lectures/osc/03_processes4.md
+      - Concurrency 1: lectures/osc/04_concurrency1.md
+      - Concurrency 2: lectures/osc/05_concurrency2.md
+      - Concurrency 3: lectures/osc/06_concurrency3.md
+      - Concurrency 4: lectures/osc/07_concurrency4.md
+      - Concurrency 6: lectures/osc/08_concurrency6.md
+      - Memory Management 1: lectures/osc/09_mem_management1.md
+      - Memory Management 2: lectures/osc/10_mem_management2.md
+      - Memory Management 3: lectures/osc/11_mem_management3.md
+      - Memory Management 4: lectures/osc/12_mem_management4.md
+      - Memory Management 5: lectures/osc/13_mem_management5.md
+      - Memory Management 6: lectures/osc/14_mem_management6.md
+      - File Systems 1: lectures/osc/15_file_systems1.md
+      - File Systems 2: lectures/osc/16_file_systems2.md
+      - File Systems 3: lectures/osc/17_file_systems3.md