Chapter 2 - Building and Running: The Hello World Module

 The Hello World Module

Hello world is a very simple kernel module, and this post walks through it. I am assuming that on your distribution (Debian, Ubuntu, Fedora, CentOS, etc.) you have installed all the dependency packages, the kernel source, and the header files of the running kernel. The sample hello.c kernel module can be written as below:

#include <linux/module.h>
#include <linux/init.h>

MODULE_LICENSE("Dual BSD/GPL");
MODULE_AUTHOR("NARESH BHAT");

static int __init hello_init(void) {
    printk(KERN_ALERT "Hello World!\n");
    return 0;
}

static void __exit hello_exit(void) {
    printk(KERN_ALERT "Good Bye, Cruel World!\n");
}

module_init(hello_init);
module_exit(hello_exit);

  • When the module is loaded, the running kernel calls hello_init; when the module is unloaded, it calls hello_exit. The module_init and module_exit kernel macros register these two functions.
  • The module must be compiled against the same kernel version that is running.
  • module_init marks the module's entry point and module_exit marks its exit point.
  • The special macros MODULE_AUTHOR and MODULE_LICENSE state the author of the module and the license it is released under.
  • The printk function is defined in the Linux kernel and made available to modules.
  • The kernel needs its own printing function because it runs by itself, without the help of the C library.
  • After insmod has loaded the module, it is linked to the kernel and can access the kernel's public symbols.
  • You can use the insmod/rmmod utilities to load/unload the kernel module.
  • The messages from printk go into one of the system log files, such as /var/log/messages, and can also be read with dmesg.
Compiling and Loading

The Makefile can be written as below

obj-m := hello.o
all:
	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(PWD) modules
clean:
	$(MAKE) -C /lib/modules/$(shell uname -r)/build M=$(PWD) clean

How does the makefile work?
The assignment obj-m := hello.o states that there is one module to be built from the object file hello.o; the kernel build system handles the rest. The resulting module is named hello.ko after being built from the object file.
The make command starts by changing to the directory given after -C (that is, the kernel source directory), where it finds the kernel's top-level makefile. The M= option causes that makefile to move back into the module source directory before trying to build the modules target. This target, in turn, refers to the list of modules found in the obj-m variable.
The kernel developers have developed a sort of makefile idiom that makes life easier for those building modules outside of the kernel tree.

Loading and Unloading Modules

After the module is built, the next step is to load it.
  • The insmod utility inserts the module. It loads the module code and data into the kernel, which in turn performs a function similar to that of ld: it links any unresolved symbol in the module to the symbol table of the kernel.
  • insmod can also assign values to module parameters at load time. Load-time configuration gives the user more flexibility than compile-time configuration, which is still used sometimes.
What actually happens when insmod is used with hello.ko?
  • insmod relies on a system call defined in kernel/module.c.
    • The function sys_init_module allocates kernel memory to hold the module (the memory is allocated with vmalloc)
    • It then copies the module text into that memory region
    • It resolves kernel references in the module via the kernel symbol table
    • It then calls the module's initialisation function (the one registered with module_init)
  • System call implementations are prefixed with sys_
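
The symbol resolution step can be pictured with a toy userspace model. This is only a sketch of the idea, not the kernel's implementation: the table layout and the names demo_symbol, demo_resolve and demo_link_module are hypothetical and exist just for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for one entry of the kernel symbol table (hypothetical layout). */
struct demo_symbol {
    const char *name;
    void *addr;
};

/* Dummy objects whose addresses play the role of exported kernel symbols. */
static int demo_printk_target;
static int demo_kmalloc_target;

static const struct demo_symbol demo_symtab[] = {
    { "printk",  &demo_printk_target },
    { "kmalloc", &demo_kmalloc_target },
};

/* Look up one name; returns NULL when the symbol is not exported. */
void *demo_resolve(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(demo_symtab) / sizeof(demo_symtab[0]); i++)
        if (strcmp(demo_symtab[i].name, name) == 0)
            return demo_symtab[i].addr;
    return NULL;
}

/* Resolve every undefined reference of a "module".
 * Returns 0 on success, -1 as soon as one symbol cannot be found,
 * which is the case where a real load attempt would be rejected. */
int demo_link_module(const char *const *undefined, size_t count, void **resolved)
{
    size_t i;

    for (i = 0; i < count; i++) {
        resolved[i] = demo_resolve(undefined[i]);
        if (resolved[i] == NULL)
            return -1;
    }
    return 0;
}
```

Calling demo_link_module with the names "printk" and "kmalloc" succeeds, while a list containing a name that is not in the table fails. That mirrors why a module that references an unexported symbol cannot be loaded.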
What is the modprobe utility? Where is it used?
  • modprobe, like insmod, loads a module into the kernel. In addition, it checks whether the module references any symbols that are not currently defined in the kernel; in other words, it checks the module's dependencies
  • If any such references are found, modprobe looks for other modules in the current module search path that define the relevant symbols and loads them as well
  • rmmod removes a module from the kernel; the module's exit function (registered with module_exit) is called as part of the removal
  • lsmod lists the modules currently loaded in the kernel
Version dependency: what happens while building the module?
  • One of the steps in the module build process is to link your module against a file (called vermagic.o) from the current kernel tree
    • This object contains a fair amount of information about the kernel the module was built for, including the target kernel version, the compiler version and the settings of a number of important configuration variables
    • When the module is loaded, the kernel checks this information, including the processor-specific configuration options, and makes sure that it matches the running kernel
  • If you want to build a module against a specific kernel version, set a KERNELDIR variable in the makefile to that kernel's source directory and pass it to -C
  • Version-related definitions such as LINUX_VERSION_CODE and KERNEL_VERSION are found in linux/version.h
  • The header file linux/module.h automatically includes linux/version.h
  • module.h contains the definitions of symbols and functions needed by loadable modules
  • init.h is needed for the initialisation and cleanup functions
  • You need to include moduleparam.h to enable passing parameters to the module at load time
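
The version macros pack the three version numbers into one integer so that code can compare kernel versions with ordinary operators. Below is a minimal userspace sketch of that arithmetic, assuming the classic packing of (major << 16) + (minor << 8) + patch. DEMO_KERNEL_VERSION and demo_is_at_least_2_6 are hypothetical stand-ins of mine, not kernel API.

```c
/* Hypothetical stand-in that mirrors how KERNEL_VERSION packs a version
 * triple into a single integer (assumed classic layout). */
#define DEMO_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Returns 1 when the given version code is 2.6.0 or newer, else 0.
 * In a real module the argument would be LINUX_VERSION_CODE. */
int demo_is_at_least_2_6(int version_code)
{
    return version_code >= DEMO_KERNEL_VERSION(2, 6, 0);
}
```

In module source the same comparison is normally done at preprocessing time, with #if LINUX_VERSION_CODE >= KERNEL_VERSION(2, 6, 0), so that code for other kernel versions is not compiled at all.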
The Kernel Symbol Table
  • The export macros give you a convenient way to manage the visibility of your symbols
  • Exporting only what is needed reduces namespace pollution
  • It also promotes proper information hiding
  • If your module needs to export symbols for other modules to use, the following macros are used
EXPORT_SYMBOL(name);
EXPORT_SYMBOL_GPL(name);

The _GPL version makes the symbol available to GPL-licensed modules only.
  • Symbols must be exported in the global part of the module's file, outside of any function, because each macro expands to the declaration of a special-purpose variable
  • This variable is stored in a special part of the module executable (an "ELF section") that the kernel uses at load time to find the variables exported by the module
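
As a rough illustration of the export macros, here is a sketch of a module that exports one function. The function demo_add and the module itself are hypothetical; the code builds only against kernel headers with a makefile like the one shown earlier, not as an ordinary userspace program.

```c
#include <linux/module.h>
#include <linux/init.h>

MODULE_LICENSE("Dual BSD/GPL");

/* Hypothetical helper; a shared header would normally carry this prototype. */
int demo_add(int a, int b);

int demo_add(int a, int b)
{
    return a + b;
}
/* Exported at file scope, outside of any function. */
EXPORT_SYMBOL(demo_add);

static int __init demo_export_init(void)
{
    return 0;
}

static void __exit demo_export_exit(void)
{
}

module_init(demo_export_init);
module_exit(demo_export_exit);
```

A second module that declares extern int demo_add(int a, int b); can call the function once this module is loaded. This is the kind of dependency that modprobe resolves by loading the exporting module first.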
Initialisation and Shutdown

static int __init initialisation_function(void)
{
        /* initialisation code here */
        return 0;
}
module_init(initialisation_function);
  • The initialisation function should be declared static, since it is not meant to be visible outside its own file
  • The __init token tells the kernel that the function is used only at initialisation time; once the module is loaded and initialisation is done, the loader drops the function and frees its memory
  • There is a similar tag, __initdata, for data used only during initialisation
  • You may also come across __devinit and __devinitdata in older kernel source; these translate to __init and __initdata only if the kernel has not been configured for hotpluggable devices
  • module_init is mandatory: it marks the module's entry point, and without it your initialisation function is never called
  • Most of the registration functions that a module calls during initialisation are prefixed with register_
static void __exit cleanup_function(void)
{
        /* cleanup code here */
}
module_exit(cleanup_function);

  • The __exit modifier marks the code as being for module unload only (by causing the compiler to place it in a special ELF section)
  • The code marked __exit is discarded if your module is built into the kernel or the kernel is configured to disallow module unloading. Hence the function can be called only at module unload or system shutdown time; any other use is an error
  • module_exit is necessary to enable the kernel to find your cleanup function
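Error handling
  • If any registration fails during initialisation, the module must undo everything it registered before the failure
  • Error recovery is sometimes best handled with the goto statement
  • Error codes are negative numbers belonging to the set defined in <linux/errno.h>
  • To return your own error codes rather than passing along those you get from other functions, use the symbolic values from that header, such as -ENODEV and -ENOMEM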
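
The goto-based recovery pattern can be tried out in plain userspace C. The sketch below is an assumption-laden simulation: demo_register, demo_unregister, demo_registered and demo_fail_at are hypothetical stand-ins for real registration calls, and the userspace <errno.h> replaces <linux/errno.h>.

```c
#include <errno.h>

/* Hypothetical resources: demo_registered[i] is 1 while resource i is held. */
int demo_registered[3];
/* Index of the registration step that should fail, or -1 for none. */
int demo_fail_at = -1;

static int demo_register(int idx)
{
    if (idx == demo_fail_at)
        return -ENOMEM;        /* negative error code, kernel style */
    demo_registered[idx] = 1;
    return 0;
}

static void demo_unregister(int idx)
{
    demo_registered[idx] = 0;
}

/* Registers three resources in order. On failure, the labels unwind
 * only what was already registered, in reverse order. */
int demo_init(void)
{
    int err;

    err = demo_register(0);
    if (err)
        goto fail_first;
    err = demo_register(1);
    if (err)
        goto fail_second;
    err = demo_register(2);
    if (err)
        goto fail_third;
    return 0;                  /* success */

fail_third:
    demo_unregister(1);
fail_second:
    demo_unregister(0);
fail_first:
    return err;                /* propagate the error code */
}
```

Because each label falls through to the next, a failure at any step releases exactly the resources acquired before it, and the cleanup code is written only once.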
Module Parameters

  • Parameter values can be assigned at load time by insmod or modprobe
  • modprobe can also read parameter assignments from its configuration file, /etc/modprobe.conf (files under /etc/modprobe.d/ on current distributions)
  • Parameters are declared with the module_param macro, defined in moduleparam.h
  • module_param takes three arguments: the name of the variable, its type, and a permission mask for the corresponding sysfs entry
  • The macro should be placed outside of any function; it is typically found near the head of the source file
  • To declare an array of parameters, use module_param_array(name, type, num, perm);
    • name - name of the array (and of the parameter)
    • type - type of the array elements
    • num - integer variable that receives the number of values supplied (recent kernels take a pointer to it)
    • perm - the usual permission value
  • The module loader refuses to accept more values than will fit in the array
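
A minimal sketch of a parameterised hello module follows. The parameter names whom and howmany and the function names are my own choices for illustration, and the code builds only against kernel headers.

```c
#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/init.h>

MODULE_LICENSE("Dual BSD/GPL");

/* Defaults used when no value is given at load time. */
static char *whom = "world";
static int howmany = 1;

/* Declared outside any function; 0444 makes the sysfs entries read-only. */
module_param(howmany, int, 0444);
module_param(whom, charp, 0444);

static int __init hellop_init(void)
{
    int i;

    for (i = 0; i < howmany; i++)
        printk(KERN_ALERT "(%d) Hello, %s\n", i, whom);
    return 0;
}

static void __exit hellop_exit(void)
{
    printk(KERN_ALERT "Goodbye, cruel world\n");
}

module_init(hellop_init);
module_exit(hellop_exit);
```

Assuming the module is built as hellop.ko, it could be loaded with insmod hellop.ko howmany=3 whom="Mom" to print the greeting three times.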
Advantages and Disadvantages of Userspace drivers
Advantages:
  • The full C library can be linked in
  • You can run a conventional debugger on the driver code
  • If a userspace driver hangs, you can simply kill it
  • User memory is swappable, unlike kernel memory
  • If you want to write a closed-source driver, the user space option makes it easier

Disadvantages:
  • Interrupts are not available in user space
  • Direct access to memory is possible only by mmapping /dev/mem, and only a privileged user can do that
  • I/O ports are available only after calling ioperm or iopl
  • Response time is slower, because a context switch is required between the client and the hardware
  • If the driver has been swapped to disk, response time is unacceptably slow
  • The most important devices, including network interfaces and block devices, can't be handled in user space


Kernel Modules Versus Applications
  • Every kernel module registers itself in order to serve future requests, an approach similar to event-driven programming. Not all applications are event-driven, but every kernel module is
  • The module's initialisation function terminates immediately after registering; its job is only to prepare for later invocation of the module's functions
  • A kernel module must use its exit function to do a clean exit, releasing every resource and freeing all memory it holds; anything left behind remains in the system until reboot. An application can be lazy about releasing resources at exit without affecting the system much
  • An application can call functions it doesn't define; the linking stage resolves external references using the appropriate library of functions. printf is one such function, defined in libc
  • A module, on the other hand, is linked only to the kernel, and the only functions it can call are the ones exported by the kernel; there are no libraries to link to
  • A kernel module can be unloaded, so you can test successive versions of a driver without rebooting, which cuts down development time
  • A segmentation fault in an application is harmless; a kernel fault kills the current process at least, if not the whole system
User Space and Kernel Space
  • A module runs in kernel space, whereas applications run in user space
  • The operating system must prevent unauthorised access to resources, which is why kernel code runs at a higher privilege level than user code
  • Each mode also has its own memory mapping and its own address space
  • Unix transfers execution from user space to kernel space whenever an application issues a system call or is suspended by a hardware interrupt
Concurrency in the Kernel
  • One way in which kernel programming differs from application programming is the issue of concurrency. Most applications, with the notable exception of multithreaded applications, run sequentially from beginning to end without needing to worry about what else might be happening
  • Linux can run on SMP systems, with the result that your driver could be executing concurrently on more than one CPU
  • Linux kernel code, including driver code, must be reentrant - it must be capable of running in more than one context at the same time
  • Data structures must be designed carefully to keep multiple threads of execution separate, guarding against memory corruption and race conditions
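The current process
  • Although kernel modules don't execute sequentially as applications do, most actions performed by the kernel are done on behalf of a specific process
  • Kernel code can refer to the current process by accessing the global item current, defined in <asm/current.h>
  • The current pointer refers to the process that is currently executing; it points to a struct task_struct, which is defined in <linux/sched.h>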
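
As a small sketch of how current can be used, the module below prints the name and process ID of whichever process triggers its load. The function names are hypothetical, and the code builds only against kernel headers.

```c
#include <linux/module.h>
#include <linux/init.h>
#include <linux/sched.h>

MODULE_LICENSE("Dual BSD/GPL");

static int __init demo_current_init(void)
{
    /* The init function runs on behalf of the process loading the
     * module, so current refers to that process (typically insmod). */
    printk(KERN_INFO "The process is \"%s\" (pid %i)\n",
           current->comm, current->pid);
    return 0;
}

static void __exit demo_current_exit(void)
{
}

module_init(demo_current_init);
module_exit(demo_current_exit);
```

The comm field holds the command name of the process and pid holds its process ID; both are members of struct task_struct.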
A few other details
  • Applications are laid out with a very large stack area, which is used to hold automatic variables and the function call history
  • The kernel has a very small stack, which can be as small as a single 4096-byte page and is shared by your functions and the entire kernel-space call chain. It is never a good idea to declare large automatic variables; if you need larger structures, allocate them dynamically at call time
  • Kernel code can't do floating point arithmetic
  • Enabling floating point would require the kernel to save and restore the floating point processor's state on each entry to and exit from kernel space. That extra overhead is not worthwhile
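
To illustrate the advice about large variables, here is a sketch that allocates a buffer dynamically instead of declaring it on the stack. The use of kmalloc and kfree, the buffer size and all the names are assumptions of mine rather than something taken from the notes above; the code builds only against kernel headers.

```c
#include <linux/module.h>
#include <linux/init.h>
#include <linux/slab.h>

MODULE_LICENSE("Dual BSD/GPL");

/* Hypothetical size: far too large to declare as an automatic variable. */
#define DEMO_BUF_SIZE 8192

static char *demo_buf;

static int __init demo_buf_init(void)
{
    /* Allocate from kernel memory rather than the small kernel stack. */
    demo_buf = kmalloc(DEMO_BUF_SIZE, GFP_KERNEL);
    if (!demo_buf)
        return -ENOMEM;
    return 0;
}

static void __exit demo_buf_exit(void)
{
    /* Release the buffer so nothing is left behind after unload. */
    kfree(demo_buf);
}

module_init(demo_buf_init);
module_exit(demo_buf_exit);
```

The allocation is checked and the error is reported with a negative code from <linux/errno.h>, and the exit function frees the memory so that it does not stay allocated until reboot.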


























