Sophie

Sophie

distrib > Mandriva > 2010.1 > x86_64 > by-pkgid > 965e33040dd61030a94f0eb89877aee8 > files > 2768

howto-html-en-20080722-2mdv2010.1.noarch.rpm

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
 <META NAME="GENERATOR" CONTENT="SGML-Tools 1.0.9">
 <TITLE>KernelAnalysis-HOWTO: Fundamentals</TITLE>
 <LINK HREF="KernelAnalysis-HOWTO-4.html" REL=next>
 <LINK HREF="KernelAnalysis-HOWTO-2.html" REL=previous>
 <LINK HREF="KernelAnalysis-HOWTO.html#toc3" REL=contents>
</HEAD>
<BODY>
<A HREF="KernelAnalysis-HOWTO-4.html">Next</A>
<A HREF="KernelAnalysis-HOWTO-2.html">Previous</A>
<A HREF="KernelAnalysis-HOWTO.html#toc3">Contents</A>
<HR>
<H2><A NAME="s3">3. Fundamentals</A></H2>

<H2><A NAME="ss3.1">3.1 What is the kernel?</A>
</H2>

<P>The kernel is the "core" of any computer system: it is the  "software"
which allows users to share computer resources.
<P>
<P>The kernel can be thought as the main software of the OS (Operating
System), which may also include graphics management. 
<P>
<P>For example, under Linux (like other Unix-like OSs), the XWindow
environment doesn't belong to the Linux Kernel, because it manages
only graphical operations (it uses user mode I/O to access video
card devices). 
<P>
<P>By contrast, Windows environments (Win9x, WinME, WinNT, Win2K,
WinXP, and so on) are a mix between a graphical environment and kernel.
<P>
<H2><A NAME="ss3.2">3.2 What is the difference between User Mode and Kernel Mode?</A>
</H2>

<H3>Overview</H3>

<P>Many years ago, when computers were as big as a room, users ran
their applications with much difficulty and, sometimes, their applications
crashed the computer. 
<P>
<H3>Operative modes</H3>

<P>To avoid having applications that constantly crashed, newer OSs
were designed with 2 different operative modes:
<P>
<P>
<OL>
<LI>Kernel Mode: the machine operates with critical data structure,
direct hardware (IN/OUT or memory mapped), direct memory, IRQ, DMA,
and so on.</LI>
<LI>User Mode: users can run applications.
</LI>
</OL>
<P>
<PRE>
                      
               |          Applications           /|\
               |         ______________           |
               |         | User Mode  |           |  
               |         ______________           | 
               |               |                  |  
Implementation |        _______ _______           |   Abstraction
    Detail     |        | Kernel Mode |           |
               |        _______________           |
               |               |                  |
               |               |                  | 
               |               |                  |
              \|/          Hardware               |
</PRE>
<P>Kernel Mode "prevents" User Mode applications from damaging the
system or its features.
<P>
<P>Modern microprocessors implement in hardware at least 2 different
states. For example under Intel, 4 states determine the PL (Privilege
Level). It is possible to use 0,1,2,3 states, with 0 used in Kernel
Mode. 
<P>
<P>Unix OS requires only 2 privilege levels, and we will use such
a paradigm as point of reference.
<P>
<H2><A NAME="ss3.3">3.3 Switching from User Mode to Kernel Mode</A>
</H2>

<H3>When do we switch?</H3>

<P>Once we understand that there are 2 different modes, we have
to know when we switch from one to the other.
<P>
<P>Typically, there are 2 points of switching:
<P>
<P>
<OL>
<LI>When calling a System Call: after calling a System Call, the
task voluntary calls pieces of code living in Kernel Mode</LI>
<LI>When an IRQ (or exception) comes: after the IRQ an IRQ handler
(or exception handler) is called, then control returns back to the
task that was interrupted like nothing was happened.
</LI>
</OL>
<H3>System Calls</H3>

<P>System calls are like special functions that manage OS routines
which live in Kernel Mode.
<P>
<P>A system call can be called when we:
<P>
<P>
<UL>
<LI>access an I/O device or a file (like read or write)</LI>
<LI>need to access privileged information (like pid, changing scheduling
policy or other information)</LI>
<LI>need to change execution context (like forking or executing some
other application) </LI>
<LI>need to execute a particular command (like ''chdir'', ''kill",
''brk'', or ''signal'')
</LI>
</UL>
<P>
<PRE>
                                 |                |
                         -------&gt;| System Call i  | (Accessing Devices)
|                |       |       |  [sys_read()]  |
| ...            |       |       |                |
| system_call(i) |--------       |                |
|   [read()]     |               |                |
| ...            |               |                |
| system_call(j) |--------       |                |  
|   [get_pid()]  |       |       |                |
| ...            |       -------&gt;| System Call j  | (Accessing kernel data structures)
|                |               |  [sys_getpid()]|
                                 |                | 
 
    USER MODE                        KERNEL MODE
 
  
                        Unix System Calls Working 
</PRE>
<P>System calls are almost the only interface used by User Mode
to talk with low level resources (hardware). The only exception to
this statement is when a process uses ''ioperm'' system call. In
this case a device can be accessed directly by User Mode process
(IRQs cannot be used).
<P>
<P>NOTE: Not every ''C'' function is a system call, only some of
them.
<P>
<P>Below is a list of System Calls under Linux Kernel 2.4.17, from
[ arch/i386/kernel/entry.S ]
<P>
<P>
<PRE>
        .long SYMBOL_NAME(sys_ni_syscall)       /* 0  -  old &quot;setup()&quot; system call*/
        .long SYMBOL_NAME(sys_exit)
        .long SYMBOL_NAME(sys_fork)
        .long SYMBOL_NAME(sys_read)
        .long SYMBOL_NAME(sys_write)
        .long SYMBOL_NAME(sys_open)             /* 5 */
        .long SYMBOL_NAME(sys_close)
        .long SYMBOL_NAME(sys_waitpid)
        .long SYMBOL_NAME(sys_creat)
        .long SYMBOL_NAME(sys_link)
        .long SYMBOL_NAME(sys_unlink)           /* 10 */
        .long SYMBOL_NAME(sys_execve)
        .long SYMBOL_NAME(sys_chdir)
        .long SYMBOL_NAME(sys_time)
        .long SYMBOL_NAME(sys_mknod)
        .long SYMBOL_NAME(sys_chmod)            /* 15 */
        .long SYMBOL_NAME(sys_lchown16)
        .long SYMBOL_NAME(sys_ni_syscall)                               /* old break syscall holder */
        .long SYMBOL_NAME(sys_stat)
        .long SYMBOL_NAME(sys_lseek)
        .long SYMBOL_NAME(sys_getpid)           /* 20 */
        .long SYMBOL_NAME(sys_mount)
        .long SYMBOL_NAME(sys_oldumount)
        .long SYMBOL_NAME(sys_setuid16)
        .long SYMBOL_NAME(sys_getuid16)
        .long SYMBOL_NAME(sys_stime)            /* 25 */
        .long SYMBOL_NAME(sys_ptrace)
        .long SYMBOL_NAME(sys_alarm)
        .long SYMBOL_NAME(sys_fstat)
        .long SYMBOL_NAME(sys_pause)
        .long SYMBOL_NAME(sys_utime)            /* 30 */
        .long SYMBOL_NAME(sys_ni_syscall)                               /* old stty syscall holder */
        .long SYMBOL_NAME(sys_ni_syscall)                               /* old gtty syscall holder */
        .long SYMBOL_NAME(sys_access)
        .long SYMBOL_NAME(sys_nice)
        .long SYMBOL_NAME(sys_ni_syscall)       /* 35 */                /* old ftime syscall holder */
        .long SYMBOL_NAME(sys_sync)
        .long SYMBOL_NAME(sys_kill)
        .long SYMBOL_NAME(sys_rename)
        .long SYMBOL_NAME(sys_mkdir)
        .long SYMBOL_NAME(sys_rmdir)            /* 40 */
        .long SYMBOL_NAME(sys_dup)
        .long SYMBOL_NAME(sys_pipe)
        .long SYMBOL_NAME(sys_times)
        .long SYMBOL_NAME(sys_ni_syscall)                               /* old prof syscall holder */
        .long SYMBOL_NAME(sys_brk)              /* 45 */
        .long SYMBOL_NAME(sys_setgid16)
        .long SYMBOL_NAME(sys_getgid16)
        .long SYMBOL_NAME(sys_signal)
        .long SYMBOL_NAME(sys_geteuid16)
        .long SYMBOL_NAME(sys_getegid16)        /* 50 */
        .long SYMBOL_NAME(sys_acct)
        .long SYMBOL_NAME(sys_umount)                                   /* recycled never used phys() */
        .long SYMBOL_NAME(sys_ni_syscall)                               /* old lock syscall holder */
        .long SYMBOL_NAME(sys_ioctl)
        .long SYMBOL_NAME(sys_fcntl)            /* 55 */
        .long SYMBOL_NAME(sys_ni_syscall)                               /* old mpx syscall holder */
        .long SYMBOL_NAME(sys_setpgid)
        .long SYMBOL_NAME(sys_ni_syscall)                               /* old ulimit syscall holder */
        .long SYMBOL_NAME(sys_olduname)
        .long SYMBOL_NAME(sys_umask)            /* 60 */
        .long SYMBOL_NAME(sys_chroot)
        .long SYMBOL_NAME(sys_ustat)
        .long SYMBOL_NAME(sys_dup2)
        .long SYMBOL_NAME(sys_getppid)
        .long SYMBOL_NAME(sys_getpgrp)          /* 65 */
        .long SYMBOL_NAME(sys_setsid)
        .long SYMBOL_NAME(sys_sigaction)
        .long SYMBOL_NAME(sys_sgetmask)
        .long SYMBOL_NAME(sys_ssetmask)
        .long SYMBOL_NAME(sys_setreuid16)       /* 70 */
        .long SYMBOL_NAME(sys_setregid16)
        .long SYMBOL_NAME(sys_sigsuspend)
        .long SYMBOL_NAME(sys_sigpending)
        .long SYMBOL_NAME(sys_sethostname)
        .long SYMBOL_NAME(sys_setrlimit)        /* 75 */
        .long SYMBOL_NAME(sys_old_getrlimit)
        .long SYMBOL_NAME(sys_getrusage)
        .long SYMBOL_NAME(sys_gettimeofday)
        .long SYMBOL_NAME(sys_settimeofday)
        .long SYMBOL_NAME(sys_getgroups16)      /* 80 */
        .long SYMBOL_NAME(sys_setgroups16)
        .long SYMBOL_NAME(old_select)
        .long SYMBOL_NAME(sys_symlink)
        .long SYMBOL_NAME(sys_lstat)
        .long SYMBOL_NAME(sys_readlink)         /* 85 */
        .long SYMBOL_NAME(sys_uselib)
        .long SYMBOL_NAME(sys_swapon)
        .long SYMBOL_NAME(sys_reboot)
        .long SYMBOL_NAME(old_readdir)
        .long SYMBOL_NAME(old_mmap)             /* 90 */
        .long SYMBOL_NAME(sys_munmap)
        .long SYMBOL_NAME(sys_truncate)
        .long SYMBOL_NAME(sys_ftruncate)
        .long SYMBOL_NAME(sys_fchmod)
        .long SYMBOL_NAME(sys_fchown16)         /* 95 */
        .long SYMBOL_NAME(sys_getpriority)
        .long SYMBOL_NAME(sys_setpriority)
        .long SYMBOL_NAME(sys_ni_syscall)                               /* old profil syscall holder */
        .long SYMBOL_NAME(sys_statfs)
        .long SYMBOL_NAME(sys_fstatfs)          /* 100 */
        .long SYMBOL_NAME(sys_ioperm)
        .long SYMBOL_NAME(sys_socketcall)
        .long SYMBOL_NAME(sys_syslog)
        .long SYMBOL_NAME(sys_setitimer)
        .long SYMBOL_NAME(sys_getitimer)        /* 105 */
        .long SYMBOL_NAME(sys_newstat)
        .long SYMBOL_NAME(sys_newlstat)
        .long SYMBOL_NAME(sys_newfstat)
        .long SYMBOL_NAME(sys_uname)
        .long SYMBOL_NAME(sys_iopl)             /* 110 */
        .long SYMBOL_NAME(sys_vhangup)
        .long SYMBOL_NAME(sys_ni_syscall)       /* old &quot;idle&quot; system call */
        .long SYMBOL_NAME(sys_vm86old)
        .long SYMBOL_NAME(sys_wait4)
        .long SYMBOL_NAME(sys_swapoff)          /* 115 */
        .long SYMBOL_NAME(sys_sysinfo)
        .long SYMBOL_NAME(sys_ipc)
        .long SYMBOL_NAME(sys_fsync)
        .long SYMBOL_NAME(sys_sigreturn)
        .long SYMBOL_NAME(sys_clone)            /* 120 */
        .long SYMBOL_NAME(sys_setdomainname)
        .long SYMBOL_NAME(sys_newuname)
        .long SYMBOL_NAME(sys_modify_ldt)
        .long SYMBOL_NAME(sys_adjtimex)
        .long SYMBOL_NAME(sys_mprotect)         /* 125 */
        .long SYMBOL_NAME(sys_sigprocmask)
        .long SYMBOL_NAME(sys_create_module)
        .long SYMBOL_NAME(sys_init_module)
        .long SYMBOL_NAME(sys_delete_module)
        .long SYMBOL_NAME(sys_get_kernel_syms)  /* 130 */
        .long SYMBOL_NAME(sys_quotactl)
        .long SYMBOL_NAME(sys_getpgid)
        .long SYMBOL_NAME(sys_fchdir)
        .long SYMBOL_NAME(sys_bdflush)
        .long SYMBOL_NAME(sys_sysfs)            /* 135 */
        .long SYMBOL_NAME(sys_personality)
        .long SYMBOL_NAME(sys_ni_syscall)       /* for afs_syscall */
        .long SYMBOL_NAME(sys_setfsuid16)
        .long SYMBOL_NAME(sys_setfsgid16)
        .long SYMBOL_NAME(sys_llseek)           /* 140 */
        .long SYMBOL_NAME(sys_getdents)
        .long SYMBOL_NAME(sys_select)
        .long SYMBOL_NAME(sys_flock)
        .long SYMBOL_NAME(sys_msync)
        .long SYMBOL_NAME(sys_readv)            /* 145 */
        .long SYMBOL_NAME(sys_writev)
        .long SYMBOL_NAME(sys_getsid)
        .long SYMBOL_NAME(sys_fdatasync)
        .long SYMBOL_NAME(sys_sysctl)
        .long SYMBOL_NAME(sys_mlock)            /* 150 */
        .long SYMBOL_NAME(sys_munlock)
        .long SYMBOL_NAME(sys_mlockall)
        .long SYMBOL_NAME(sys_munlockall)
        .long SYMBOL_NAME(sys_sched_setparam)
        .long SYMBOL_NAME(sys_sched_getparam)   /* 155 */
        .long SYMBOL_NAME(sys_sched_setscheduler)
        .long SYMBOL_NAME(sys_sched_getscheduler)
        .long SYMBOL_NAME(sys_sched_yield)
        .long SYMBOL_NAME(sys_sched_get_priority_max)
        .long SYMBOL_NAME(sys_sched_get_priority_min)  /* 160 */
        .long SYMBOL_NAME(sys_sched_rr_get_interval)
        .long SYMBOL_NAME(sys_nanosleep)
        .long SYMBOL_NAME(sys_mremap)
        .long SYMBOL_NAME(sys_setresuid16)
        .long SYMBOL_NAME(sys_getresuid16)      /* 165 */
        .long SYMBOL_NAME(sys_vm86)
        .long SYMBOL_NAME(sys_query_module)
        .long SYMBOL_NAME(sys_poll)
        .long SYMBOL_NAME(sys_nfsservctl)
        .long SYMBOL_NAME(sys_setresgid16)      /* 170 */
        .long SYMBOL_NAME(sys_getresgid16)
        .long SYMBOL_NAME(sys_prctl)
        .long SYMBOL_NAME(sys_rt_sigreturn)
        .long SYMBOL_NAME(sys_rt_sigaction)
        .long SYMBOL_NAME(sys_rt_sigprocmask)   /* 175 */
        .long SYMBOL_NAME(sys_rt_sigpending)
        .long SYMBOL_NAME(sys_rt_sigtimedwait)
        .long SYMBOL_NAME(sys_rt_sigqueueinfo)
        .long SYMBOL_NAME(sys_rt_sigsuspend)
        .long SYMBOL_NAME(sys_pread)            /* 180 */
        .long SYMBOL_NAME(sys_pwrite)
        .long SYMBOL_NAME(sys_chown16)
        .long SYMBOL_NAME(sys_getcwd)
        .long SYMBOL_NAME(sys_capget)
        .long SYMBOL_NAME(sys_capset)           /* 185 */
        .long SYMBOL_NAME(sys_sigaltstack)
        .long SYMBOL_NAME(sys_sendfile)
        .long SYMBOL_NAME(sys_ni_syscall)               /* streams1 */
        .long SYMBOL_NAME(sys_ni_syscall)               /* streams2 */
        .long SYMBOL_NAME(sys_vfork)            /* 190 */
        .long SYMBOL_NAME(sys_getrlimit)
        .long SYMBOL_NAME(sys_mmap2)
        .long SYMBOL_NAME(sys_truncate64)
        .long SYMBOL_NAME(sys_ftruncate64)
        .long SYMBOL_NAME(sys_stat64)           /* 195 */
        .long SYMBOL_NAME(sys_lstat64)
        .long SYMBOL_NAME(sys_fstat64)
        .long SYMBOL_NAME(sys_lchown)
        .long SYMBOL_NAME(sys_getuid)
        .long SYMBOL_NAME(sys_getgid)           /* 200 */
        .long SYMBOL_NAME(sys_geteuid)
        .long SYMBOL_NAME(sys_getegid)
        .long SYMBOL_NAME(sys_setreuid)
        .long SYMBOL_NAME(sys_setregid)
        .long SYMBOL_NAME(sys_getgroups)        /* 205 */
        .long SYMBOL_NAME(sys_setgroups)
        .long SYMBOL_NAME(sys_fchown)
        .long SYMBOL_NAME(sys_setresuid)
        .long SYMBOL_NAME(sys_getresuid)
        .long SYMBOL_NAME(sys_setresgid)        /* 210 */
        .long SYMBOL_NAME(sys_getresgid)
        .long SYMBOL_NAME(sys_chown)
        .long SYMBOL_NAME(sys_setuid)
        .long SYMBOL_NAME(sys_setgid)
        .long SYMBOL_NAME(sys_setfsuid)         /* 215 */
        .long SYMBOL_NAME(sys_setfsgid)
        .long SYMBOL_NAME(sys_pivot_root)
        .long SYMBOL_NAME(sys_mincore)
        .long SYMBOL_NAME(sys_madvise)
        .long SYMBOL_NAME(sys_getdents64)       /* 220 */
        .long SYMBOL_NAME(sys_fcntl64)
        .long SYMBOL_NAME(sys_ni_syscall)       /* reserved for TUX */
        .long SYMBOL_NAME(sys_ni_syscall)       /* Reserved for Security */
        .long SYMBOL_NAME(sys_gettid)
        .long SYMBOL_NAME(sys_readahead)        /* 225 */


</PRE>
<H3>IRQ Event</H3>

<P>When an IRQ comes, the task that is running is interrupted in
order to service the IRQ Handler.
<P>
<P>After the IRQ is handled, control returns backs exactly to point
of interrupt, like nothing  happened.
<P>
<P>
<PRE>

           
              Running Task 
             |-----------|          (3)
NORMAL       |   |       | [break execution] IRQ Handler
EXECUTION (1)|   |       |     -------------&gt;|---------| 
             |  \|/      |     |             |  does   |         
 IRQ (2)----&gt;| ..        |-----&gt;             |  some   |      
             |   |       |&lt;-----             |  work   |       
BACK TO      |   |       |     |             |  ..(4). |
NORMAL    (6)|  \|/      |     &lt;-------------|_________|
EXECUTION    |___________|  [return to code]
                                    (5)
               USER MODE                     KERNEL MODE

         User-&gt;Kernel Mode Transition caused by IRQ event
     
</PRE>
<P>The numbered steps below refer to the sequence of events in the
diagram above:
<P>
<P>
<OL>
<LI>Process is executing</LI>
<LI>IRQ comes while the task is running.</LI>
<LI>Task is interrupted to call an "Interrupt handler".</LI>
<LI>The "Interrupt handler" code is executed.</LI>
<LI>Control returns back to task user mode (as if nothing happened)</LI>
<LI>Process returns back to normal execution
</LI>
</OL>
<P>Special interest has the Timer IRQ, coming every TIMER ms to
manage:
<P>
<P>
<OL>
<LI>Alarms</LI>
<LI>System and task counters (used by schedule to decide when stop
a process or for accounting)</LI>
<LI>Multitasking based on wake up mechanism after TIMESLICE time.
</LI>
</OL>
<H2><A NAME="ss3.4">3.4 Multitasking</A>
</H2>

<H3>Mechanism</H3>

<P>The key point of modern OSs is the "Task". The Task is an application
running in memory sharing all resources (included CPU and Memory)
with other Tasks.
<P>
<P>This "resource sharing" is managed by the  "Multitasking Mechanism".
The Multitasking Mechanism switches from one task to another after
a "timeslice" time. Users have the "illusion" that they own all resources.
We can also imagine a single user scenario, where a user can have
the "illusion" of running many tasks at the same time.
<P>
<P>To implement this multitasking, the task uses "the state" variable,
which can be:
<P>
<P>
<OL>
<LI>READY, ready for execution</LI>
<LI>BLOCKED, waiting for a resource
</LI>
</OL>
<P>The task state is managed by its presence in a relative list:
READY list and BLOCKED list.
<P>
<H3>Task Switching</H3>

<P>The movement from one task to another is called ''Task Switching''.
many computers have a hardware instruction which automatically performs
this operation. Task Switching occurs in the following cases:
<P>
<P>
<OL>
<LI>After Timeslice ends: we need to schedule a "Ready for execution"
task and give it access.</LI>
<LI>When a Task has to wait for a device: we need to schedule a new
task and switch to it *
</LI>
</OL>
<P>* We schedule another task to prevent   "Busy Form Waiting", which
occurs when we are waiting for a device instead performing other
work.
<P>
<P>Task Switching is managed by the "Schedule" entity.
<P>
<P>
<PRE>
 
Timer    |           |
 IRQ     |           |                            Schedule
  |      |           |                     ________________________
  |-----&gt;|   Task 1  |&lt;------------------&gt;|(1)Chooses a Ready Task |
  |      |           |                    |(2)Task Switching       |
  |      |___________|                    |________________________|   
  |      |           |                               /|\
  |      |           |                                | 
  |      |           |                                |
  |      |           |                                |
  |      |           |                                |      
  |-----&gt;|   Task 2  |&lt;-------------------------------|
  |      |           |                                |
  |      |___________|                                |
  .      .     .     .                                .
  .      .     .     .                                .
  .      .     .     .                                .
  |      |           |                                |
  |      |           |                                |
  ------&gt;|   Task N  |&lt;--------------------------------
         |           |
         |___________| 
    
            Task Switching based on TimeSlice
 
</PRE>
<P>A typical Timeslice for Linux is about 10 ms.
<P>
<P>
<PRE>

 

 |           |            
 |           | Resource    _____________________________
 |   Task 1  |-----------&gt;|(1) Enqueue Resource request |
 |           |  Access    |(2)  Mark Task as blocked    |
 |           |            |(3)  Choose a Ready Task     |
 |___________|            |(4)    Task Switching        |
                          |_____________________________|
                                       |
                                       |
 |           |                         |
 |           |                         |
 |   Task 2  |&lt;-------------------------
 |           |  
 |           |
 |___________|
 
     Task Switching based on Waiting for a Resource
 
</PRE>
<H2><A NAME="ss3.5">3.5 Microkernel vs Monolithic OS</A>
</H2>

<H3>Overview</H3>

<P>Until now we viewed so called Monolithic OS, but there is also
another kind of OS: ''Microkernel''.
<P>
<P>A Microkernel OS uses Tasks, not only for user mode processes,
but also as a real kernel manager, like Floppy-Task, HDD-Task, Net-Task
and so on. Some examples are Amoeba, and Mach. 
<P>
<H3>PROs and CONTROs of Microkernel OS </H3>

<P>PROS:
<P>
<P>
<UL>
<LI>OS is simpler to maintain because each Task manages a single
kind of operation. So if you want to modify networking, you modify
Net-Task (ideally, if it is not needed a structural update).
</LI>
</UL>
<P>CONS:
<P>
<P>
<UL>
<LI>Performances are worse than Monolithic OS, because you have to
add 2*TASK_SWITCH times (the first to enter the specific Task, the
second to go out from it).
</LI>
</UL>
<P>My personal opinion is that, Microkernels are a good didactic
example (like Minix) but they are not ''optimal'', so not really
suitable. Linux uses a few Tasks, called "Kernel Threads" to implement
a little microkernel structure (like kswapd, which is used to retrieve
memory pages from mass storage). In this case there are no problems
with perfomance because swapping is a very slow job.
<P>
<H2><A NAME="ss3.6">3.6 Networking</A>
</H2>

<H3>ISO OSI levels</H3>

<P>Standard ISO-OSI describes a network architecture with the following
levels:
<P>
<P>
<OL>
<LI>Physical level (examples: PPP and Ethernet)</LI>
<LI>Data-link level (examples: PPP and Ethernet)</LI>
<LI>Network level (examples: IP, and X.25)</LI>
<LI>Transport level (examples: TCP, UDP)</LI>
<LI>Session level (SSL)</LI>
<LI>Presentation level (FTP binary-ascii coding)</LI>
<LI>Application level (applications like Netscape)
</LI>
</OL>
<P>The first 2 levels listed above are often implemented in hardware.
Next levels are in software (or firmware for routers).
<P>
<P>Many protocols are used by an OS: one of these is TCP/IP (the
most important living on 3-4 levels).
<P>
<H3>What does the kernel?</H3>

<P>The kernel doesn't know anything (only addresses) about first
2 levels of ISO-OSI.
<P>
<P>In RX it:
<P>
<P>
<OL>
<LI>Manages handshake with low levels devices (like ethernet card
or modem) receiving "frames" from them.</LI>
<LI>Builds TCP/IP "packets" from "frames" (like Ethernet or PPP ones),
 </LI>
<LI>Convers ''packets'' in ''sockets'' passing them to the right
application (using port number) or</LI>
<LI>Forwards packets to the right queue
</LI>
</OL>
<P>
<PRE>
frames         packets              sockets
NIC ---------&gt; Kernel ----------&gt; Application
                  |    packets
                  --------------&gt; Forward
                        - RX - 
</PRE>
<P>In TX  stage it:
<P>
<P>
<OL>
<LI>Converts sockets or </LI>
<LI>Queues datas into TCP/IP ''packets''</LI>
<LI>Splits ''packets" into "frames" (like Ethernet or PPP ones)</LI>
<LI>Sends ''frames'' using HW drivers
</LI>
</OL>
<P>
<PRE>
sockets       packets                     frames
Application ---------&gt; Kernel ----------&gt; NIC
              packets     /|\    
Forward  -------------------
                        - TX -  


</PRE>
<H2><A NAME="ss3.7">3.7 Virtual Memory</A>
</H2>

<H3>Segmentation</H3>

<P>Segmentation is the first method to solve memory allocation problems:
it allows you to compile source code without caring where the application
will be placed in memory. As a matter of fact, this feature helps
applications developers to develop in a independent fashion from
the OS e also from the hardware.
<P>
<P>
<PRE>
     
            |       Stack        |
            |          |         |
            |         \|/        |
            |        Free        | 
            |         /|\        |     Segment &lt;---&gt; Process    
            |          |         |
            |        Heap        |
            | Data uninitialized |
            |  Data initialized  |
            |       Code         |
            |____________________|  
 
                   Segment  

</PRE>
<P>We can say that a segment is the logical entity of an application,
or the image of the application in memory.
<P>
<P>When programming, we don't care where our data is put in memory,
we only care about the offset inside our segment (our application).
<P>
<P>We use to assign a Segment to each Process and vice versa. In
Linux this is not true. Linux uses only 4 segments for either Kernel
and all Processes.
<P>
<H3>Problems of Segmentation</H3>

<P>
<PRE>
 
                                 ____________________
                          -----&gt;|                    |-----&gt;
                          | IN  |     Segment A      | OUT
 ____________________     |     |____________________|   
|                    |____|     |                    |   
|     Segment B      |          |     Segment B      |
|                    |____      |                    |   
|____________________|    |     |____________________|   
                          |     |     Segment C      |   
                          |     |____________________|
                          -----&gt;|     Segment D      |-----&gt; 
                            IN  |____________________| OUT 
 
                     Segmentation problem


</PRE>
<P>In the diagram above, we want to get exit processes A, and D
and enter process B. As we can see there is enough space for B, but
we cannot split it in 2 pieces, so we CANNOT load it (memory out).
<P>
<P>The reason this problem occurs is because pure segments are continuous
areas (because they are logical areas) and cannot be split.
<P>
<H3>Pagination</H3>

<P>
<PRE>
 
             ____________________
            |     Page 1         |
            |____________________|
            |     Page 2         |
            |____________________| 
            |      ..            |     Segment &lt;---&gt; Process    
            |____________________|
            |     Page n         |
            |____________________|
            |                    |
            |____________________|
            |                    |
            |____________________|  
 
                   Segment  
 
</PRE>
<P>Pagination splits memory in &quot;n&quot; pieces, each one with
a fixed length.
<P>
<P>A process may be loaded in one or more Pages. When memory is
freed, all pages are freed (see Segmentation Problem, before).
<P>
<P>Pagination is also used for another important purpose, "Swapping".
If a page is not present in physical memory then it generates an
EXCEPTION, that will make the Kernel search for a new page in storage
memory. This mechanism allow OS to load more applications than the
ones allowed by physical memory only.
<P>
<H3>Pagination Problem</H3>

<P>
<PRE>
             ____________________
   Page   X |     Process Y      |
            |____________________|
            |                    |
            |       WASTE        |
            |       SPACE        |
            |____________________|  
   
              Pagination Problem
 
</PRE>
<P>In the diagram above, we can see what is wrong with the pagination
policy: when a Process Y loads into Page X, ALL memory space of the
Page is allocated, so the remaining space at the end of Page is wasted.
<P>
<H3>Segmentation and Pagination</H3>

<P>How can we solve segmentation and pagination problems? Using
either 2 policies.
<P>
<P>
<PRE>
 
                                  |      ..            |
                                  |____________________|
                            -----&gt;|      Page 1        |
                            |     |____________________|
                            |     |      ..            |
 ____________________       |     |____________________|
|                    |      |----&gt;|      Page 2        |
|      Segment X     |  ----|     |____________________|
|                    |      |     |       ..           |
|____________________|      |     |____________________|
                            |     |       ..           |
                            |     |____________________|
                            |----&gt;|      Page 3        |
                                  |____________________|
                                  |       ..           |
 
</PRE>
<P>Process X, identified by Segment X, is split in 3 pieces and
each of one is loaded in a page.
<P>
<P>We do not have:
<P>
<P>
<OL>
<LI>Segmentation problem: we allocate per Pages, so we also free
Pages and we manage free space in an optimized way.</LI>
<LI>Pagination problem: only last page wastes space, but we can decide
to use very small pages, for example 4096 bytes length (losing at
maximum 4096*N_Tasks bytes) and manage hierarchical paging (using
2 or 3 levels of paging)
</LI>
</OL>
<P>
<PRE>
 
 

                          |         |           |         |
                          |         |   Offset2 |  Value  |
                          |         |        /|\|         |
                  Offset1 |         |-----    | |         |
                      /|\ |         |    |    | |         |
                       |  |         |    |   \|/|         | 
                       |  |         |    ------&gt;|         |
                      \|/ |         |           |         |
 Base Paging Address ----&gt;|         |           |         |
                          | ....... |           | ....... |
                          |         |           |         |    
 
                     Hierarchical Paging
</PRE>
<HR>
<A HREF="KernelAnalysis-HOWTO-4.html">Next</A>
<A HREF="KernelAnalysis-HOWTO-2.html">Previous</A>
<A HREF="KernelAnalysis-HOWTO.html#toc3">Contents</A>
</BODY>
</HTML>