Wednesday, September 29, 2010

A Sneak-Peek into Linux Kernel - Chapter 3: Process Termination

Finally I found some time to get back to this effort of writing about the Linux kernel. This chapter is about how a process or task gets terminated.

In Linux, a task is terminated by the exit() system call, made either explicitly by the task or implicitly when the main() function of the task returns. After the task is terminated, its parent has to be informed about the termination through the SIGCHLD signal. The major chunk of the exit() operation is performed in the do_exit() function in kernel/exit.c.

The first major step in the do_exit() function is to validate the credentials of the exiting task. The validation happens in kernel/cred.c with this piece of code:

if (tsk->cred == tsk->real_cred) {
        if (unlikely(read_cred_subscribers(tsk->cred) < 2 ||
                     creds_are_invalid(tsk->cred)))
                goto invalid_creds;
} else {
        if (unlikely(read_cred_subscribers(tsk->real_cred) < 1 ||
                     read_cred_subscribers(tsk->cred) < 1 ||
                     creds_are_invalid(tsk->real_cred) ||
                     creds_are_invalid(tsk->cred)))
                goto invalid_creds;
}
Oh.. yes; kernel developers never shy away from the usually dreaded goto statement. If they want the control to go somewhere, they just goto there.

If the validation is successful, the exit_irq_thread() function, implemented in kernel/irq/manage.c, is called to set the IRQTF_DIED flag, which prevents any further attempt to wake up the thread. Then the PF_EXITING flag is set on the task_struct of the exiting task, and the task's pi futex state is cleaned up. A futex (fast user-space mutex) is a primitive that the Linux kernel uses to implement locks. A pi futex, or to be technically correct a PI-enabled futex, is a Priority-Inheritance-enabled futex - a lightweight variant that we will discuss later.

Following this, do_exit() invokes the exit_mm() function, which tears down the user memory space allocated to the process. exit_mm() first releases the user space by calling mm_release() in kernel/fork.c. mm_release() drops the register state saved by the task and informs a parent sleeping in vfork(). The user-space thread-id field is wiped if the exit is normal. On the contrary, if the exit is due to a signal like a segmentation fault or bus fault (checked using the PF_SIGNALED flag of the task_struct), it is not wiped, so that this information can be written to the core dump. This is followed, in order, by invocations of exit_sem(), exit_files(), and exit_fs(), which dequeue the task from any semaphore queues and drop its references on open files and filesystem data, respectively. Then the task status is set: tsk->state = TASK_DEAD.

As can be observed here, the task_struct instance of the exiting task is set to the TASK_DEAD state but is not entirely wiped off. That means the parent can still gather information about the finished task. Removal of the task_struct altogether is managed by invoking release_task().

RCU stands for Read-Copy-Update, a synchronization mechanism used in Linux; release_task() performs part of its teardown between rcu_read_lock() and rcu_read_unlock(), that is, inside a read-side critical section - a real piece of OS kernel code that establishes a critical section. release_task() also calls the do_notify_parent() function, which notifies the parent with SIGCHLD. This is followed by a final disposal of the task through a call to the architecture-specific release_thread() function. On x86, it involves freeing up the vm86 IRQ for this task; on ARM or PowerPC, this function does nothing (seriously). And thus a task reaches its demise.

So in the past three episodes, we discussed the rise and fall of a task or process. But what happens in between is more interesting. The next chapter will be on task scheduling.

Wednesday, September 01, 2010

A Sneak-Peek into Linux Kernel - Chapter 2: Process Creation

In the last chapter, we looked at the basics of a process or task in the Linux kernel, with a brief overview of struct task_struct. Now we are going to discuss how a process or task gets created.
In Linux, a new task can be created using the fork() system call. fork() creates an almost exact copy of the parent process: the task descriptor is duplicated, and the differences are the pid (which is unique), the parent pointer, and the resource-utilization counters (set to 0 initially). But there are a lot of parameters in the task descriptor that must not be copied to the child. So the forking operation is implemented in the kernel using the clone() system call, which in turn calls the do_fork() function in kernel/fork.c. do_fork() predominantly uses an unsigned long bitmask called clone_flags to determine which resources need to be shared and which copied. The first call made by do_fork() is copy_process(), which begins with the following code:

retval = security_task_create(clone_flags);
The first step of copy_process() is to call security_task_create(), implemented in security/security.c. In this function, a structure called struct security_operations is used. The function takes clone_flags as input and determines whether the current process has sufficient permission to create a child process. This is done by calling the selinux_task_create() function in security/selinux/hooks.c, which in turn calls current_has_perm(); that function takes clone_flags and checks several permissions using the access vector cache - a component that caches access-decision computations to avoid doing them over and over again. If security_task_create() returns a nonzero value, copy_process() aborts and does not create the new task.

p = dup_task_struct(current);
retval = copy_creds(p, clone_flags);
The next step is to duplicate the struct task_struct by calling dup_task_struct(). Once the struct task_struct is duplicated, the pointers to parent tasks, which make no sense in the child, are set to appropriate values, as explained in the subsequent paragraph. This is followed by a call to copy_creds(), implemented in kernel/cred.c. copy_creds() copies the credentials of the parent to the child, and at this point a new clone_flags parameter called CLONE_THREAD has to be introduced.

In Linux, there is no fundamental differentiation between threads and tasks (or processes). They are both created and destroyed in the same way, but handled a little differently. The CLONE_THREAD flag says that the child has to be placed in the same thread group as the caller. The parent of the child process in this case is the parent of the task that called clone(), not the task that called clone() itself. The process and session keyrings (handled later) are shared between all the threads in a process. This explains why struct task_struct has two parent pointers, namely real_parent and parent. When the CLONE_THREAD flag is set, real_parent points to the task that invoked clone(), but beyond that this "real parent" does not have any control over the newly created task. The parent of the newly created task is the task that spawned the whole group of threads. The getppid() system call returns that parent, and only that parent gets SIGCHLD on termination of the child, if it makes a wait() system call. So copy_creds() has to copy the credentials and keyrings of the common parent (not the real parent) for all these threads. Threads in Linux, then, are processes that share a parent and user address space. Note that the threads we are talking about here are user threads that a program creates using the pthread or ZThread libraries. They are completely different from the kernel threads created by the Linux kernel for its internal operations.

After this, the function sets the CPU time utilized by the child to zero and splits the parent's remaining timeslice between parent and child. Then the sched_fork() function, defined in kernel/sched.c, is invoked. Process scheduling is a separate topic of discussion.

In this chapter, we looked at how a task is created in the Linux kernel, or rather what important operations are performed during task creation. In the next chapter, we will see task termination. In the meantime, you can put printk() statements in the copy_process() function of kernel/fork.c, check the kernel log, and observe how it works.

Sunday, August 22, 2010

A Sneak-Peek into Linux Kernel - Chapter 1: Introduction

The source code of the Linux kernel is something that fascinated me eight years ago, and I am still addicted: I follow the changes that happen to it. I am planning to write a series of posts to explore the source code of the kernel. I hope this will help budding developers and fans understand and change the Linux kernel and improve the operating system. But if you do so, please remember that you do it at your own risk. There is a general checkpoint in your evolution as a C programmer to see whether you are in a position to change the kernel source and recompile it. Ask yourself: how many segmentation faults do I get when I write complex C code? How many times have I had problems due to dangling pointers? If the numbers you get as answers are significantly high, it's a good idea to wait. Apart from that, I think writing these articles will help me revisit many portions of the kernel and refresh my knowledge.

Here are some questions I might ask myself, with answers. Who are you to write this? Have you submitted patches to the kernel? No, not yet. Have you modified and recompiled the kernel? Yes, several times; most of the changes are very specific to the kind of work I am doing, and submitting them as patches might not help the community. What Linux do you use? I use Debian 5.0. Have you been paid to modify the kernel? Yes, as part of my job or as a freelancer. Do you like doing that? I love doing that.

The order in which I go through the kernel follows the order in which Robert Love structured his book (because I personally like that book), with some variations here and there. You will see exact copies of Linux kernel code in many places. Copyrights of the code belong to its developers (check github) under the GPL or one of its variants.

Chapter 1: Introduction - Tasks or Processes

In Linux terminology, task stands for process. Wikipedia defines a process as an instance of a computer program that is being executed. Let's call it a task going forward, because many of the variables in the kernel source are named so. In Linux, a task can have multiple concurrent threads, a task can spawn multiple child tasks, and each task operates on its own virtual memory, most of the time with no regard to other tasks. A task is created by the clone() or fork() system call and terminated by the exit() system call.

In Linux, the information about each task is preserved in a structure: struct task_struct in linux/sched.h. This is a huge structure because of the information it has to store. Some important variables in that structure are:
  • volatile long state; - defines the state of the task. Negative means the task is not runnable; zero means the task is runnable and can be scheduled; and positive means the task is stopped or suspended for some reason.
  • pid_t pid; - unique process id. pid_t is usually int.
  • struct mm_struct* mm; - user address space for the task
  • struct files_struct *files; - pointer to the files kept open by this task
  • struct thread_struct thread; - this is the CPU specific state of the current task. We will look at this later
  • struct held_lock held_locks[MAX_LOCK_DEPTH]; - array of locks held by the task. Typical value of MAX_LOCK_DEPTH is 48
  • struct task_struct *parent; - pointer to the parent of the current task (which would wait() for it)
  • struct list_head children; - list of children of current task
  • struct list_head sibling; - list of siblings of current task
Instances of struct task_struct are dynamically created by the kernel. The kernel also uses a macro called current to obtain the struct task_struct of the current task. On a processor with few registers, like the Pentium, the pointer to the current task's bookkeeping structure is kept at one end of the kernel stack and fetched by masking the stack pointer (ESP in the case of x86). On a RISC processor with many registers, like PowerPC or MicroBlaze, the pointer to the current task is kept in a dedicated register for fast access.

The state of any task can be changed by the kernel using the macro: set_task_state(task, state); which is defined as follows:

#define set_task_state(tsk, state_value) \
set_mb((tsk)->state, (state_value))

Here set_mb is again architecture-specific, just like the current macro. If you are interested, you can find the architecture-specific macros under the folder arch/<architecture>. Specifically, the implementation details of set_mb() can be found in asm/system.h and asm/barrier.h for each architecture.

Now let us write some simple code using what we learned. The following code traces back the ancestry of the current task and prints them in the kernel log:

struct task_struct *task;
for (task = current; task != &init_task; task = task->parent)
        printk(KERN_INFO "%s-%d\n", task->comm, task->pid);

To print all the siblings of the current task in the kernel log:

struct task_struct *sibpos;
list_for_each_entry(sibpos, &current->parent->children, sibling)
        printk(KERN_INFO "%s-%d\n", sibpos->comm, sibpos->pid);
The code listed above is computationally expensive and should be used only for debugging. That's it for now. In the next chapter (expect it on the 5th of September), we will see how a process is created and how threads in the Linux kernel differ from processes (do they differ?).

Friday, July 23, 2010

Virtual Machine 2.0

Latvian programmer and blogger Peter Krumins (Pete) has recently announced a startup he founded with James Halliday. I came to know about Pete a couple of years ago. As many of you might know, I am a big fan of computing and programming, and Pete's blog is a must-read for anybody with a similar interest. James Halliday is known for his work on node.js, an easy and practical way to build scalable network programs. He authored DNode, the node.js library that enables asynchronous, bidirectional RMI for node.js and the browser. Anybody who knows something about computing would call this an amazingly smart idea. When I started dipping my toes into Haskell-based parallel computing, I admired his simple and neat demonstration of the principle of the Matryoshka Doll - a beautiful abstraction that I would certainly use someday.

I digress; coming back to the original subject, the startup founded by Pete and James Halliday is called StackVM. We know that virtualization has a lot of potential and is going to play a vital role in the future. StackVM takes it one step further by bringing the VM to the web. It is a much more accessible online virtual machine, sharable and usable from anywhere. That means that at some point I will be able to access my personal, preferred operating system from a public library. StackVM also lets you share your virtual machine, with its state, over the Internet in the same way you share your photos or videos. This means you do not have to type a verbose description of your problem in Ubuntu's forum. Picture = 1000 words; Movie = 1000 pictures; Shared VM = 1000 movies => Shared VM = 1,000,000,000 words.

Fasten your seat belts; this is just the beginning of their scope. As I said earlier, one of the authors wrote asynchronous RMI for node.js. So what do you expect from him? Yes: they are making networking between StackVMs easy - just drag and drop to create any virtual network topology you wish, with firewalls, switches, bells and whistles. They are also making something called "vmcasts" - to explain in Pete's own words: "much like a screencasts, except the computation is recorded, meaning that at any point you can break into the playing vmcast and change the course of computation (and return back to it later)."

So what are the potential uses of such a system? I can think of many, but Pete has listed a few of them:
  • Suppose you're selling software and you want your users to try it before they buy it. Perfect use of StackVM - put your software in the virtual machine and embed it on your products page. The potential customers can try your software before they buy it right from your website!
  • Suppose you're an application developer and have written a program that should work cross-platform. You can easily rent 10 virtual machines with Linux, Windows, MacOS, and other operating systems and test your software. Just drag and drop it into the virtual machines, and you can test your software!
  • Suppose you want to teach someone how to work in Perl in a series of blog posts (like I do), you can embed the terminal with a vmcast in your blog post, and everyone can follow your tutorial, and also try out the examples interactively, in a real shell!
  • You can build a virtual honeypot network and have hackers break into it, then analyse how they did breakins. Or, you can build a huge network and learn routing and networking concepts!
  • Suppose you want to share your work with a group of people. You can easily do it in stackvm! Just send the other people a link to your VM and they can connect to it with any web browser. They'll be able to see what you're doing, comment on your work, and, if you allow it, fix your bugs (think pair programming!)

Running virtual machines on the web requires a lot of infrastructure. They are working on it, and until then we have to use their demo site. StackVM is completely open source, which means any of you smart guys can contribute code and ideas.

To Pete, Good Luck with your startup! Let this small and humble beginning take you to greater heights!!

Monday, March 29, 2010

Night Fox of New Jersey

The movie Ocean's Twelve featured a very interesting character called "The Night Fox", a master thief played by the French leading man Vincent Cassel. In that movie, Cassel steals the Coronation Egg (in fact a fake) by acrobatically avoiding the laser beams. In its predecessor, Ocean's Eleven, "the Amazing Yen" uses his acrobatic skills to rob a casino. I thought such things were possible only in movies.

I was wrong. In New Jersey, a band of acrobatic thieves managed to rob a Best Buy in similar fashion. The thieves never appeared on the security cameras. The entire floor had sensors, but still no alarm was tripped, as the thieves never stepped on the floor. Apparently they left boot prints on the gas pipe they climbed. The NJPD is looking for the thieves.

Thursday, March 25, 2010

Job in 2018

The unemployment numbers frustrate me, and I fear we will be like this for a long time. Oh.. light at the end of the tunnel: a new study makes a convincing argument that by 2018 there will be an overwhelming number of open jobs because of the mass retirement of the baby boomers - and that assumes no change in immigration policy or labour law. Just eight more years! By then, inflation may skyrocket, and that is something to worry about. Poor policymakers - there is always something in the economy that spoils their sleep at night.

Sunday, March 14, 2010

Where is consumer electronics heading?

Here is a graph by International Data Corp. that indicates where we are heading in the consumer electronics market. People want it small and want it to connect to the worldwide web!

Monday, March 08, 2010

Picture = 1024 words

Most of the time, I need a pictorial representation of call graphs to understand profiler information with more clarity. Usually I use it to ensure that I am using memory and CPU efficiently and to find which part of the code I need to optimize to get the best bang for the buck. On my laptop, I use valgrind to generate the call graph and KCachegrind to study its visual representation. When I use Solaris, I use the dtrace script given by my friend here, along with graphviz, to visualize the call graph.

But recently at work, I got a Fedora system for which I was not the system administrator. The system administrator was reluctant to install valgrind or KCachegrind, pointing out that gprof was already installed and I could use that. The only thing I dislike about gprof is that I have to recompile my C sources with an additional option; other than that, I have nothing against it. Even so, I needed a visual representation, so I started writing yacc and lex code to parse the gprof output and create a graphviz file. Before getting deep into it, I noticed that there is something called gprof2dot - a Python program that converts gprof output to a dot file.

gprof2dot does the magic that I wanted. Interestingly, it supports an array of profilers, including valgrind. The source code is open, object oriented, and well organized. Each profiler is parsed by a separate class (like GProfParser) that implements a common Parser class, so we can easily plug in code for our own profiler's output (if we use a different profiler). The only thing that could have been done differently: the author could have used the GvGen dot-file generator library instead of emitting the output file generically.

Now all I need to do is use gprof2dot to generate the dot file, which I can then render with graphviz. Alternatively, I can modify the source code to use the PyDot library and generate the PNG file in a single shot. Once again the world is saved by free and open source software!

Friday, March 05, 2010

Bloom Box and Artificial Photosynthesis

A couple of weeks ago, CBS' 60 Minutes focused on K.R. Sridhar's innovation, the Bloom Box, a small wireless power plant that can generate energy for your house - a form of clean energy. We can envision that in the next 10 years, homes could supply electricity for themselves with the Bloom Box, which looked to me like a miniature monolith (Tycho Magnetic Anomaly-4!).

Now Dan Nocera of MIT has made an absolutely remarkable innovation that artificially mimics photosynthesis. Dr. Nocera has created a catalyst that uses sunlight to split water into hydrogen and oxygen, in a project funded by ARPA-E, which in turn was funded by the stimulus package. The generated hydrogen and oxygen can then be recombined in a fuel cell to produce energy. Please look at the embedded video.

These innovations are impressive. However, I don't know whether they can produce clean, sustainable energy at industrial scale at a low cost. Many clean-energy companies face this problem of huge production costs and of production at scale. If one or both of these technologies work, they could be the Holy Grail for our energy crisis (both present and future). Who knows? Maybe they can be fitted into automobiles - cars, trains and planes - to run them without burning fossil fuels and without polluting the environment. Great independent work by Dr. Sridhar and Dr. Nocera and, more importantly, their teams! Hats off!!

UPDATE: You can watch the CBS' coverage of K.R.Sridhar's Bloom Box here. Sorry, I cannot embed that video.

Tuesday, March 02, 2010

Compiling the extra dimension

Three-dimensional FPGAs and three-dimensional chips are creating a lot of buzz and may be the direction in which VLSI design and reconfigurable computation are headed. Why just a "may be"? A few startup companies have come up with 3D fabrics, and the interesting part is that the third dimension is not the same for everyone. In some architectures, the third dimension is the z-axis: the 3D FPGA is just multiple FPGA fabrics stacked up, so the chip that earlier looked like Manhattan starts looking like Tokyo flyovers, with roads running between different layers. In other FPGA architectures, the third dimension is the time axis. What all this essentially means is that a mash-up multi-dimensional FPGA is certainly a possibility. So what's the problem with this?

Starting a company around a new FPGA architecture and succeeding with it is extremely difficult. The reason is not the quality of your architecture; it is that you may have to make the entire array of computer-aided design tools available to make your architecture programmable. Every company that might potentially use the newly introduced architecture has a tried and tested development process that is strictly defined around the existing technology. They will want to fit that exact process to the new technology, and this is where several newcomers fail. Unless you can convince your customers that the benefit your new technology brings to the table overwhelms the temporary loss of productivity due to the process change, you cannot sell your product.

So the three-dimensional FPGA needs an HDL parser and optimizer, logic minimization, technology mapping, and placement-and-routing tool support to the same extent that Xilinx or Altera provides. The first two are pretty well established and architecture-independent: you can use one of the many implementations of Espresso for logic minimization, and technology mapping can be very similar to how it is done on existing 2D FPGAs, using force-directed scheduling or any other established scheme.

The problem starts with the last two: placement and routing. Most placers and routers used in industry are proprietary. I am fairly confident that Altera's placement algorithm is deterministic, and maybe Xilinx also has deterministic placement; I am not sure. When it comes to routing, most of these tools use negotiated routing, with the negotiation scheme being proprietary. Placement is an NP-hard problem and routing is NP-complete. So if, to achieve a given critical path, your placement and routing take twenty times more CPU cycles than the existing tools, your potential customers are going to take a step back immediately. Well, they want "rapid" prototyping, you see.

If the 3D FPGA happens to be made up entirely of similar CLBs and IO blocks, placement and routing are a little easier. Versatile Place and Route (VPR) is the standard open-source academic tool for placement and routing. It performs simulated-annealing-based placement and Pathfinder-based negotiated routing on array-based FPGAs. How easy is it to modify VPR to support the third dimension? A block is defined in VPR through the following structure:

struct s_block {
        char *name;
        t_type_ptr type;
        int *nets;
        int x;
        int y;
        int z;
};

In this structure, you can give the name of the block; the type of the block, typically CLB or IO block; and the cartesian coordinates of the block's location during placement. Here x and y indicate the coordinates in the x-y plane. What is z? No, it's not the third dimension you are looking for. Rather, z is the number that specifies which sub-block the net refers to: a single block can have multiple sub-blocks, each of which is a logic element. To this structure you may have to add another variable for the coordinate in the third dimension (under a different name, since z is taken). And then it involves changing all the loops in VPR. You may also need to change the placement file format to introduce the extra dimension, and make Pathfinder understand it. It is difficult, but with some effort it is possible to make VPR support 3D FPGA placement and routing. But can its quality compete with the existing 2D FPGAs? Can it be done at the click of a button at a reasonable speed?

In summary, introducing the third dimension in FPGAs is certainly impressive. But if you want to make money out of it, you have to provide the array of necessary tool support so that your potential customers can realize the fruits of the extra dimension without taking much risk or paying much price. On the other hand, the 3D FPGA may evolve from some already established company that is not stuck in marketing myopia. Let's wait and see.

Wednesday, February 24, 2010

Cost of public school education

Earlier I wrote a blogpost about the need for educational reform, citing the increasing number of college dropouts. In it, I mentioned that public universities give excellent education at an affordable cost. Recently Dr. Mark Perry of the University of Michigan wrote an article comparing the per-capita money spent on public school education to Harvard's tuition fee. The results are astounding. A little number crunching removes the fog that surrounds the truth, and when it becomes visible, sometimes it is stranger than fiction. Here is the link to Mark Perry's blogpost. The US seriously needs educational reform.

Monday, February 22, 2010

Growth in computing power

I am running a fever, with a cough and cold that together leave me tired in body and heart. The tiredness unfortunately keeps me away from blogging, as research, work, and study effectively occupy my reduced "active" time. When the going gets tough, I have to give priority to what I am paid for and what I pay for. Somehow I found some time to write this while my Python script runs in the background.

Recently I read a 2002 NBER paper titled The Progress of Computing by Yale economics professor W.D. Nordhaus. It is a good 61-page article that covers the 150-year history of computers. I am always a big fan of the evolution of computers, because it teaches a lot of lessons about what not to do. It's still a serious paper, not a set of anecdotes.

The important observation is the rate at which the power of computation has grown since World War II. Believe it or not, over the last 150 years the growth in computational power is a trillionfold. I don't think any other technology or field can boast that kind of growth. Of course, medicine is a field that grows very fast, although its progress is not as easily measurable as computing's. But growth in medicine accelerated hugely when computers started improving the diagnostic and surgical abilities of doctors, as happened in every other field.

The most interesting part of the essay is the graph shown above, plotted for a forty-year period, which shows the exponential decrease in computer prices (it's a semilog plot). As can be observed, the price of computation from 1970 to 2000 went down by a factor of one million. I would like to see the increase in efficiency and productivity that this reduced cost of computation has brought to various industries. This may arguably be the most important testimonial to man's achievements over the past half-century.

Thursday, February 04, 2010

Inequality and Income distribution of G20

Recently I read an extremely informative article on the G20 website about the growth in personal income among the populations of the G20 countries; there was also a similar NBER working paper. The article is a little long, but it gives deep insight into poverty, inequality and income distribution. The research tracks income growth since 1970. One interesting observation is that the G20 income distribution exactly traces the entire world's income distribution. Since the G20 represents more than 60% of the world's population, this may make sense. But what makes them part of this club is not their population but their economic growth. If that is the case, shouldn't G20 incomes be significantly, or at least notably, larger than the world average? The fact that they are not simply implies that even in fast-growing economies, poverty is a persistent problem. The US, Europe and Japan occupy the highest income segment of the G20, as expected.

Another observation is that the income range in 1980 was more or less the same as in 1970. The seventies were a decade of small, recurring recessions all over the western world, much like the current decade, although nothing since the 1930s has been as bad as the current depression. The seventies also saw the peak of the cold war. In the seventies, more than one-fourth of India and China earned less than $1 per day. China's socialist market economy had not kick-started at that time, and India was still confused, with no stable economic and trade policy to speak of; in fact, it tilted more towards the USSR.

By 1980, China had started its internal liberalization. India was still confused, and the government held most of the industries. Those industries lagged in technological growth, innovation, and productivity. Society was split along various lines and kept inventing new ways not to stick together.

Then came the roaring nineties and their globalization. India made some right choices and jumped on the globalization bandwagon; or rather, it can be argued that India had no other choice. One thing that still bothers me even now is the marked absence of internal liberalization in India. Except in certain sectors such as banking, when the Indian market opened to private investment, it immediately invited global players, so there were few purely Indian players competing with these global giants. At the end of the roaring nineties, Asia was hit by a currency crisis with Thailand at its epicenter. The countries that had been the proud foster children of the IMF fell first, almost pulling down everyone around them. What did India and China do differently? The more suitable question is what they did not do: they did not drink the capital-market liberalization potion. And thus they survived.

The best observation we can make from these curves is that, throughout this period, the shape of India's income distribution curve remained the same. This means that when India's economy started developing through globalization, the growth benefited everybody across the histogram. China's curve, on the other hand, became flatter and flatter, which means the inequality between rich and poor increased constantly and dramatically. Although India did not achieve the income shift that China did, it managed to maintain its income distribution ratio.

Whatever short-term political stupidity may have occurred in India over the past two decades, this could arguably be its greatest achievement since independence. Kudos to India's policymakers!

Note: All images courtesy:

Wednesday, February 03, 2010

3D-FPGA Reinvented

The concept of the 3D chip keeps coming and going. I have not seen many commercial chips that are 3D, but it may become the norm in the future as we find it harder to integrate more transistors into a given area.

If the area is small, stack it up. That is what was done in New York in the mid-nineties. In the future, at least the distant future, that is what they will do in Atlanta and Phoenix. What applies to geography applies to chip design; I find the two extremely similar.

This time, the famous and well-respected innovator Zvi Or-Bach has reinvented the 3D FPGA. FPGAs have a big advantage over ASICs in terms of rapid development cycles and lower initial investment. But in practice, FPGAs are much slower and not dense enough. FPGA architecture and synthesis technology have undergone several changes, taking them closer and closer to ASICs in performance; the 3D FPGA may be yet another such step, as Or-Bach claims.

Nu-PGA tries to increase the interconnect density of the FPGA by moving the antifuses to a layer separate from the configurable logic blocks. What does increased interconnect density mean? A lot, actually. With rich interconnect availability, the bounding box of the chip shrinks considerably, so the placement-and-routing algorithm need not worry as much about where to lay each track, how many tracks to run in each direction, or what the channel width should be. It also speeds up the placement-and-routing process itself. All of this clearly means getting closer to an ASIC.

But the 3D FPGA described above is more like a two-storey building: not so fancy, but then we have never built anything taller than one storey before. I would, however, expect the building to go up fast, with CLBs stacked on top of each other in different layers, or a layer of CLBs and a layer of IO blocks. I seriously think there is good scope for growing tall, since there is more space available up there. Let's see how the market reacts to these innovative ideas and where they take us.

Monday, February 01, 2010

Turned on by COTSon

A few months ago, I wrote an article longing for a multicore processor simulator, preferably a free one. A reader of my blog left a message about COTSon (I wonder why (s)he wanted to remain anonymous). So I started reading more about it, beginning with the white paper in SIGOPS, and I liked what I read.

COTSon is the result of a collaboration between HP and AMD. The best part about COTSon is that it is not just a multicore processor simulator; it can also act as a full-blown system simulator. That means it can simulate a range of hardware models together with the software stack. It is based on AMD SimNow, which performs high-speed instruction-set translation for x86 and AMD Athlon processors. The software stack it can simulate includes everything that runs on x86 and AMD Athlon, including proprietary software such as MATLAB.

What COTSon performs is not cycle-accurate or bit-accurate simulation, which runs all weekend to tell you what is wrong; rather, it is fully functional simulation. So COTSon may act as a first-cut simulation one level above Transaction Level Modeling (TLM). On the other hand, COTSon can also be used as a tool to visualize the paradigms of TLM. Maybe COTSon is what we need in the design cycle in a world where time-to-market is of prime importance.

I am now talking with a friend doing his doctorate at Vrije University about COTSon and how we can learn to use it. We will know the reality only when the rubber hits the road. Let's see how good it is; I will post my analysis in greater detail once it is done.

Saturday, January 02, 2010

Do you feel your boss is more incompetent than you are?

In management, the level to which someone rises in the organizational ladder is governed by the Peter Principle, named after the Canadian psychologist Laurence Peter. Consider a team of a dozen employees, of whom Alice is the most competent. So Alice gets promoted to the next job level. But the fact that Alice was the best among her contemporaries does not mean she will be the best at her new job, which is, say, supervising those who earlier worked alongside her. The Peter Principle says that every new member of an organization climbs the ladder until reaching his or her level of maximum incompetence.

Common sense says that in any organization, the person who performs best in a job should be promoted to the next level. But most of the time, the work one or two levels up is radically different from the current job's responsibilities, so the person who does best at one job may fail big time at the other. For example, Michael Jordan accomplished great things as a player for the Chicago Bulls, but he did not achieve the same success running the Washington Wizards. Other, better examples can be found in our day-to-day lives.

Peter proposed two solutions to this: (i) increase the pay of the best-performing employees rather than promoting them immediately; as a result, some lower-level employees may earn as much as their boss, which also requires a psychological change on the boss's part; (ii) train the potential candidates for the job and promote on the basis of their performance in the training program, whose success depends on how well the program is devised. Personally, I think both can be applied concurrently.

But a recent paper by Alessandro Pluchino et al., published in a journal of computer science and game theory, proposes two dangerous solutions. Pluchino and his team performed an agent-based simulation of a hierarchical organization in which promotions are offered on the basis of performance at a given level, and job responsibilities differ at each level. Based on the results, the paper offers two unconventional solutions: first, promote the most competent and the least competent alternately; second, promote people at random, without considering their competence at the current level.
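A toy version of such an agent-based model can be sketched in a few lines of Python. This is not Pluchino's actual model; the hierarchy size, the promotion mechanics, and the efficiency measure here are all simplified assumptions of mine. The one ingredient carried over is the Peter hypothesis: the new job is so different from the old one that a promoted agent's competence at the new level is drawn afresh at random.

```python
import random

def simulate(strategy, levels=4, width=10, steps=200, seed=0):
    """Toy hierarchical organization: org[l] holds the competences
    (0..100) of the agents at level l; level 0 is the bottom."""
    rng = random.Random(seed)
    org = [[rng.uniform(0, 100) for _ in range(width)] for _ in range(levels)]
    for _ in range(steps):
        # a vacancy opens at some level above the bottom
        lvl = rng.randrange(1, levels)
        pool = org[lvl - 1]
        # choose whom to promote from the level below
        if strategy == "best":
            idx = max(range(len(pool)), key=lambda i: pool[i])
        else:  # "random"
            idx = rng.randrange(len(pool))
        pool.pop(idx)
        # Peter hypothesis: competence at the new level is unrelated to
        # competence at the old one, so it is redrawn at random
        org[lvl][rng.randrange(width)] = rng.uniform(0, 100)
        # the hole left at the lower level is filled by a fresh hire
        pool.append(rng.uniform(0, 100))
    # organizational efficiency: mean competence across all levels
    return sum(sum(level) for level in org) / (levels * width)

print("promote best:  ", round(simulate("best"), 1))
print("promote random:", round(simulate("random"), 1))
```

Under the Peter hypothesis, the "promote the best" strategy tends to score worse in this sketch: it keeps stripping the lower levels of their best people while gaining nothing above, since the promoted agent's new competence is random either way. That is essentially the counter-intuitive effect the paper reports.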

These techniques might have worked wonders among the simulated agents, but humans are complex entities, and the psychological effects of these solutions cannot be overlooked. If people are promoted at random, there is no incentive to be competent, and that attitude may spread through the entire organization, turning it into a zombie. If, instead, the most competent and the least competent are promoted alternately, those below average have an incentive to become intentionally and glaringly incompetent. Most of the time, the failure of a theoretical model in a recurring system can be ascribed to its failure to consider the practical implications of its first-round implementation on the subsequent rounds.

Even Hari Seldon made the reservation that the future cannot be predicted and announced, because of the human tendency to act on the result of the prediction and thereby falsify it. Unless this human nature is taken into account in these kinds of simulations, they will never become practical.