Sunday, August 22, 2010

A Sneak-Peek into Linux Kernel - Chapter 1: Introduction

The source code of Linux Kernel was something that fascinated me eight years ago and I am still addicted and I follow changes that happen to it. I am planning to write a series of post to explore the source code of the kernel. I hope that this would help budding developers and fans to understand and change the Linux Kernel and improve the operating system. But if you do so, please remember that you do it at your own risk. There is a general checkpoint in your evolution as a C programmer to check if you are in a position to change the kernel source and recompile it. Ask yourself: how many segmentation fault do I get, when I write a complex C code? how many times I got problems due to dangling pointer? If the numbers you get as answers to these questions are significantly high, it's a good idea to wait. Apart from that, I think that writing these articles would help me revisit many portions of the kernel and refresh my knowledge.

Here are some questions that I might ask myself with the answers. Who are you to write it? Have you submitted patches to kernel? No, yet to. Have you modified and recompiled the kernel? Yes, several times. Most of the changes are very specific to the kind of work I am doing. Submitting that as a patch might not help the community. What Linux do you use? I use a Debian 5.0. Have you been paid to modify the kernel? Yes, as part of my job or as a freelance. Do you like doing that? I love doing that.

The order in which I go through the kernel is in the same order in which Robert Love structured his book (because I personally like that book), with some variations here and there. You would see the exact copy of Linux Kernel code in many places. Copyrights of the code belong to its developers (check github) under GPL or one of its variants.

Chapter 1: Introduction - Tasks or Processes

In Linux terminology, task stands for process. Wikipedia defines a process as an instance of a computer program that is being executed. Lets call it task going forward, because many of the variables in kernel source are named so. In Linux, a task can have multiple concurrent threads, a task can spawn multiple children-tasks, and each task operates on virtual memory with most of the time no regards to the other tasks. A task is created by a clone() or fork() system calls and terminated by an exit() system call.

In Linux, the information about each task is preserved in a structure: struct task_struct in linux/sched.h. This is a huge structure because of the information it has to store. Some important variables in that structure are:
  • volatile long state; - defines the state of the task. Negative means the task is not runnable; zero means the task is runnable and can be scheduled; and positive means the task is stopped or suspended for some reason.
  • pid_t pid; - unique process id. pid_t is usually int.
  • struct mm_struct* mm; - user address space for the task
  • struct files_struct *files; - pointer to the files kept open by this task
  • struct thread_struct thread; - this is the CPU specific state of the current task. We will look at this later
  • struct held_lock held_locks[MAX_LOCK_DEPTH]; - array of locks held by the task. Typical value of MAX_LOCK_DEPTH is 48
  • struct task_struct *parent; - pointer to the parent of the current task (which would wait() for it)
  • struct list_head children; - list of children of current task
  • struct list_head sibling; - list of siblings of current task
Instances of struct task_struct is dynamically created by the kernel. Also kernel uses a macro called current to determine the struct task_struct of the current task. In a processor with less registers, like Pentium, the pointer to struct task_struct is stored at the top of the stack (if it grows up) and fetched by accessing stack pointer (ESP in case of x86). In a RISC processor like PowerPC or MicroBlaze with a lot of registers, the pointer to struct task_struct is stored in the register (r1 usually) for fast access.

The state of any task can be changed by the kernel using the macro: set_task_state(task, state); which is defined as follows:

#define set_task_state(tsk, state_value) \
set_mb((tsk)->state, (state_value))

Here set_mb is again architecture specific, just like current macro. If you are interested, you can find all the architecture specific macros in folder arch/##architecture##. Specifically, the implementation details of set_mb() can be found in asm/system.h and asm/barrier.h in every architecture.

Now let us write some simple code using what we learned. The following code traces back the ancestry of the current task and prints them in the kernel log:

struct task_struct* task;
for (task = current; task != init_task; task = task->parent)
printk("%s-%d\n", task->comm, task->pid);

To print all the siblings of the current task in the kernel log:

struct task_struct* sibpos;
list_for_each_entry(sibpos, &current->sibling, children)
printk("%s-%d\n", sibpos->comm, sibpos->pid);
The code listed above are computationally expensive, and should be used only for debug purposes. That's it for now. In the next chapter (expect it on 5th of September), we will see how a process can be created and how threads in Linux kernel are different from processes (are they different?).