Processes access physical memory directly, so they need to be relocated in order to not overlap in physical memory.
Can be done at program load time, but it is a bad idea:
- very slow
- Require extra info from program
To allow several programs to co-exist in memory we need:
- Logical organization
- Physical organization
For that, we have a new abstraction for memory: Address Space
Address Space: set of addresses that a process can use to address memory
- Defines where sections of data and code are located in 32 or 64 address space.
- Defines protection of such sections: ReadOnly, ReadWrite, Execute.
- Confined "private" addressing concept: requires form of address virtualization.
Address Space Example
Map each process address space onto a different part of physical memory.
Two registers: Base and Limit.
- Base: start address of a program in physical memory.
- Limit: length of the program
Only OS can modify Base and Limit.
For every memory access:
- Base is added to the address.
- Result is compared to Limit.
This can be done in hardware. So it doesn't significantly add to latency.
- Programs move in and out of memory
- Holes are created
- Holes can be combined → memory compaction
- What if a process needs more memory?
- If a hole is adjacent to the process, it is allocated to it
- Process has to be moved to a bigger hole
- Process suspended till enough memory is there
Bitmap and Linked List are universal methods used in OS and applications. Other methods employ Heaps.
Bitmap is slow to find k-consecutive 0s for a new process.
Linked List method consists of allocated and free memory segments. It is more convenient to use double-linked lists.
Considers blocks of memory only as 2^N.
Potential for fragmentation (drawback).
If no block of a size is available, it splits higher blocks into smaller blocks.
Easy to implement and fast:
O(log2(MaxBlockSz/MinBlockSz))e.g. 4K .. 128B = 2^(12-7) = 2^(5 steps)
Allocation at level 0
Free "X" at level 2 leading to coalescing
- Memory requirement unknown for most apps.
- Not enough memory: Having enough memory is not possible with current technology. How do you determine “enough”?
- Exploit and enforce one condition: Processor does not execute or access anything that is not in the memory (how would that even be possible ?) Enforce transparently (user is not involved)
Memory management brings processes into main memory for execution by the processor
- involves virtual memory
- based on paging and segmentation
But we can see that...
- All memory references are logical addresses in a process’s address space that are dynamically translated into physical addresses at run time
- An address space may be broken up into a number of pieces that don’t need to be contiguously located in main memory during execution.
So it is not necessary that all of the pieces of an address space be in main memory during execution. Only the ones that I am “currently” accessing
- Each program has its own address space
- This address space is divided into pages (e.g 4kB)
- Pages are mapped into physical memory chunks (called frames)
- By definition then
sizeof(page) == sizeof(frame)(the size is determined by the hardware manufacturer)
Main memory can act as a cache for the secondary storage (disk)
- illusion of having more physical memory
- program relocation
Structure of a Page Table Entry
Present bit: '1' if the values in this entry is valid, otherwise translation is invalid and an pagefault exception will be raised
Frame Number: this is the physical frame that is accessed based on the translation.
Protection bits: 'kernel' + 'w' specifies who and what can be done on this page if kernel bit is set then only the kernel can translate this page. If user accesses the page a 'privilege exception' will be raised. If writeprotect bit is set the page can only be read (load instruction). If attempted write (store instruction), a write protection exception is raised.
Reference bit: every time the page is accessed (load or store), the reference bit is set.
Modified bit: every time the page is written to (store), the modified bit is set.
Caching Disabled: required to access I/O devices, otherwise their content is in cpu cache
To reduce storage overhead in case of large memories.
For sparse address spaces (most processes) only few 2nd tables required
OS has to make sure no segment is allocated into high range of address space (63-48 bits). If bits are set, the MMU will raise an exception (this is really an OS bug then).
4 Level PageTable for 64-bit arch
The OS must create its page table and VMM management following the hardware definition.
- Hardware defines the frame size as 2^N
- (virtual) page size is typically equal frame size (otherwise it's a power-of-two multiple, but that is rare and we shall not base on this exception)
- Everything the OS manages is based on pagesize (hence framesize)
- You can compute all the semantics with some basic variables defined by the hardware:
- Virtual address range (e.g. 48-bit virtual address, means that the hardware only considers the 48-LSB bits from an virtual address for ld/st, all others raise a SEGV error)
- Frame size
- PTE size (e.g. 4byte or 8byte)
- From there you can determine the number of page table hierarchies, offsets
Framesize = 4KB→ 12bit offset to index into frame (12bits)
PTE size = 8Bytes→ 512 entries in each page table hierarchy (9bits/hierarchy)
Virtual address range = 48bits→
12 + N*9 = 48(N=4)
- The hardware does all the indexing as described before
- Mapping virtual to physical address must be fast: we can not always traverse the page table to get the VA -> PA mapping Solution: Translation Lookaside Buffer(TLB)
- If address space is large, page table will be large (but remember the sparsity, not fully populated) Solution: Multi-level page table
Observation: most programs tend to make a large number of references to a small number of pages over a period of time -> only fraction of the page table is heavily used ( data and instruction locality !!!)
TLB is the hardware cache inside the MMU. It caches PageTable translations (VA -> PA) and maps virtual to physical address without going to the page table (unless there’s a miss).
In case of TLB miss -> MMU accesses page table and load entry from pagetable to TLB. TLB misses occur more frequently than page faults.
- TLB entries are not written back on access, just record the R and M bits in the PTE (it is a cache after all) upon access
- On TLB capacity miss, if the entry is dirty, write it back to its associated PageTableEntry (PTE)
- On changes to the PageTable, potential entries in the TLB need to be flushed or invalidated
- "TLB invalidate" e.g. when mmap area disappears
- "TLB flush" write back when changes in TLB to PageTable (either global or per address) must be recorded, think when a “TLB invalidate” might be insufficient
Paging hardware with TLB
- One copy of read-only (reentrant) code shared among processes (i.e., text editors, compilers, window systems)
- Similar to multiple threads sharing the same process space
- Also useful for inter-process communication if sharing of read-write pages is allowed
Private code and data:
- Each process keeps a separate copy of the code and data
- The pages for the private code and data can appear anywhere in the logical address space
Copy-on-Write (COW) allows both parent and child processes to initially share the same pages in memory. If either process modifies a shared page, only then is the page copied
COW allows more efficient process creation as only modified pages are copied
- In general, free pages are allocated from a pool of zero-fill-on-demand pages
- Pool should always have free frames for fast demand page execution: Don’t want to have to free a frame when one is needed for processing on page fault
- Why zero-out a page before allocating it? -> so we get a known state of a page.
fork()system call has parent suspend and child using copy-on-write address space of parent
- Designed to have child call
- Very efficient
Each page labeled with the number of instructions that will be executed before this page is referenced.
Page with the highest label should be removed.
Impossible to implement (just used for theoretical evaluations)
- OS maintains a list of the pages currently in memory [that would be for instance frame table]
- The most recent arrival at the tail
- On a page fault, the page at the head is removed
- In lab3: for simplicity you implement FIFO with a simple round robin using a
HAND == pointerto the frametable
- Modification to FIFO
- Inspect the R bit of the oldest page
- If R=0 page is old and unused -> replace
- If R=1 then
- bit is cleared
- page is put at the end of the list
- the search continues
- If all pages have R=1, the algorithm degenerates to FIFO
Keep page frames on a circular list in the form of a clock
The hand points to the oldest uninspected page
When page fault occurs
- The page pointed to by the hand is inspected
- If R=0
- page evicted
- new page inserted into its place
- hand is advanced
- If R=1
- R is set to 0
- hand is advanced
This is essentially an implementation of 2nd-Chance
When a page fault occurs, the page the hand is pointing to is inspected. The action taken depends on the R bit:
R = 0: Evict the page
R = 1: Clear R and advance hand
Tail moves in front and “processes” candidates, writing optimistically modified pages back to swap and clear “M” bit and R bit
Two status bits with each page
- R: Set whenever the page is referenced (used)
- M: Set when the page is written / dirty
R and M bits are available in most computers implementing virtual memory
Those bits are updated with each memory reference
- Must be updated by hardware
- Reset only by the OS
Periodically (e.g. on each clock interrupt) the R bit is cleared
- To distinguish pages that have been referenced recently
Good approximation to optimal. When page fault occurs, replace the page that has been unused for the longest time. Realizable but not cheap.
Slow + Few machines (if any) have required hardware.
- 64-bit counter increment after each instruction
- Each page table entry has a field large enough to include the value of the counter
- After each memory reference, the value of the counter is stored in the corresponding page entry
- At page fault, the page with lowest value is discarded
Too expensive! Too slow!
- Machine with n page frames
- Hardware maintains a matrix of n x n bits
- Matrix initialized to all 0s
- Whenever page frame k is referenced:
- Set all bits of row k to 1
- Set all bits of column k to 0
- The row with lowest value is the LRU
LRU hardware implementation 2
Maintain the LRU order of accesses in frame list by hardware
Head: 4 <-> 2 <-> ... <-> 7 <-> 3 <-> 5
After accessing page 3:
Head: 3 <-> 4 <-> 2 <-> ... <-> 7 <-> 5
- Not Frequently Used (NFU) algorithm
- Software counter associated with each page, initially zero
- At some periodicity (e.g. a second), the OS scans all page table entries and adds the R bit to the counter of that PTE
- At page fault: the page with lowest counter is replaced
Still a lot of overhead ( not really practical either)
NRU never forgets anything -> high inertia
- shift counter right by 1
- add R bit as the leftmost bit
- reset R bit
This modified algorithm is called aging. Each bit in vector represents a period. The page whose counter is lowest is replaced at page replacement
Working set: the set of pages that a process is currently using
Thrashing: a program causing page faults at high rates (e.g. pagefaults/instructions metric)
- OS must keep track of which pages are in the working set
- Replacement algorithm: evict pages not in the working set
- Possible implementation (but expensive): working set = set of pages accessed in the last k memory references
- Approximations: working set = pages used in the last 100 msec or 1 sec (etc.)
Working set algorithm
Based on the clock algorithm and uses working set
- data structure: circular list of page frames (clock)
- Each entry contains: time of last use, R bit
- At page fault: page pointed by hand is examined
- If R = 1:
- Record current time, reset R
- Advance hand to next page
- If R = 0:
- If age > threshold and page is clean -> it is reclaimed
- If page is dirty -> write to disk is scheduled and hand advances (note in lab3, we skip this step for simplicity reasons, but think why this is advantageous?)
- Advance hand to next page
- Determine how large the program and data will be (initially)
- Create page table
- Allocate space in memory for page table
- Record info about page table and swap area in process table
- TLB flushed for current process
- MMU reset for the process
- Process table of next process made current
- OS releases the process page table
- Frees its pages and disk space
- 1.The hardware:
- Saves program counter
- Exception leads to kernel entry
- 2.An assembly routine saves general registers and calls OS.
- 3.OS tried to discover which virtual page is needed
- 4.OS checks address validation and protection and assign a page frame (page replacement may be needed)
- 5.If page frame selected is dirty
- Page scheduled to transfer to disk
- Frame marked as busy
- OS suspends the current process
- Context switch takes place
- 6.Once the page frame is clean
- OS looks up disk address where needed page is
- OS schedules a disk operation
- Faulting process still suspended
- 7.When disk interrupts indicates page has arrived
- OS updates page table
- 8.Faulting instruction is backed up to its original state before page fault and PC is reset to point to it.
- 9.Process is scheduled for execution and OS returns to the assembly routine.
- 10.The routine reloads registers and other state information and returns to user space.
Process issues a
syscall()to read a file into a buffer, the process suspended while waiting for I/O. New process starts executing, this other process gets a page fault. If paging algorithm is global there is a chance the page containing the buffer could be removed from memory, the I/O operation of the first process will write some data into the buffer and some other on the just-loaded page !
One solution: Locking (pinning) pages engaged in I/O so that they will not be removed by replacement algo
Swap area: not a normal file system on it.
Associates with each process the disk address of its swap area; store in the process table
Before process starts swap area must be initialized
- One way: copy all process image into swap area [static swap area]
- Another way: don’t copy anything and let the process swap out [dynamic]
Instead of disk partition, one or more preallocated files within the normal file system can be used [Windows uses this approach.]
Each process consists of many memory areas, aka:
- VMAs virtual memory areas.
- Examples: Heap, stack, code, data, ronly-data, etc.
Each has different characteristics
- Protection (executable, rw, rdonly)
- Fixed, can grow (up or down) [heap, stack]
Each process can have 10s-100s of these.
Maps a VMA -> file segment (see
- It’s contiguous
- On PageFault, it fetches the content from file at the appropriate offset into a virtual page of the address space
- On "swapout", writes back to file (different versions though)
Benifits: Can perform load/store operations vs input/output
- Cells are by default non-overlapping. Called VMA (Virtual Memory Areas)
- Organized as AVL trees
- Identify in O(logN) time during pgfault:
- Rebalanced when VMA is added or deleted
- Organized as balanced tree
- Node [start - end]
- On add/delete: rebalance
- Lookup in