We are designing all of our future development to multicore designs. We believe this is a key inflection point for the industry.
Following the diminishing returns from instruction-level parallelism (ILP) in uniprocessors, the computer architecture community decided that multicore processors, also called chip multiprocessors, are the direction of the future.
I knew the importance of multicore processors even before they became popular in general-purpose computing. Part of my undergraduate research thesis involved implementing digital beamforming on a quad-core SHARC processor. It is now apparent that multicore processors are here to stay and, whether you like it or not, parallel programming is the future of computing. Web programs already run in parallel, managed by web application servers. Embedded systems programming is rapidly introducing parallelism wherever performance matters. Two issues still make parallel programming difficult. The first is the lack of debugging tools, especially for bugs peculiar to concurrent code such as Heisenbugs, which tend to vanish or change behavior when you try to observe them. Multicore vendors and compiler designers are working on parallel debugger extensions to expose Heisenbugs and ease programming, but for now the problem remains clear, present and painful.
The second issue is the lack of simulators for multicore processors. SimpleScalar is certainly an excellent processor simulator, but simulating a chip multiprocessor (CMP) with hundreds of cores is still an open problem for the computer architecture community. Recently, Monchiero et al. of Hewlett-Packard Laboratories proposed a way to simulate large shared-memory CMPs, published in a recent SIGARCH publication.
The best part of this paper is the simplicity of the underlying idea: translate the thread-level parallelism of the software into core-level parallelism in the simulated CMP. The first step is to use an existing full-system simulator to separate the instruction streams belonging to different threads. Next, the instruction flow of each thread is mapped to a different core of the target CMP. The final step is to simulate the synchronization between the cores. With this approach, any multithreaded application can be run in a conventional system simulator and the evaluation extended to any homogeneous multicore processor. I believe this framework will be used in many CMP simulators in the future.
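To make the three steps concrete, here is a toy sketch in Python. It is not the paper's implementation: the trace format, the `("work", cycles)` and `("barrier",)` operations, and the function names `split_by_thread`, `map_to_cores` and `simulate` are all assumptions made up for illustration. It also assumes one thread per core and that every thread reaches every barrier.

```python
from collections import defaultdict

def split_by_thread(trace):
    """Step 1: separate an interleaved trace into per-thread streams."""
    streams = defaultdict(list)
    for thread_id, op in trace:
        streams[thread_id].append(op)
    return dict(streams)

def map_to_cores(streams):
    """Step 2: give each thread's stream its own simulated core."""
    return {core: streams[tid] for core, tid in enumerate(sorted(streams))}

def simulate(core_streams):
    """Step 3: replay each stream on a per-core clock and synchronize
    the cores at barriers (hypothetical stand-in for real sync events)."""
    clock = {core: 0 for core in core_streams}
    pos = {core: 0 for core in core_streams}
    while True:
        waiting = []
        for core, stream in core_streams.items():
            # Run this core until it hits a barrier or runs out of work.
            while pos[core] < len(stream):
                op = stream[pos[core]]
                pos[core] += 1
                if op[0] == "barrier":
                    waiting.append(core)
                    break
                clock[core] += op[1]  # ("work", cycles)
        if not waiting:
            return clock
        # Cores at the barrier all resume when the slowest one arrives.
        release = max(clock[core] for core in waiting)
        for core in waiting:
            clock[core] = release

# Interleaved trace as a full-system simulator might emit it (made-up data).
trace = [
    (0, ("work", 3)), (1, ("work", 5)),
    (0, ("barrier",)), (1, ("barrier",)),
    (1, ("work", 1)), (0, ("work", 4)),
]
finish_times = simulate(map_to_cores(split_by_thread(trace)))
print(finish_times)  # {0: 9, 1: 6}
```

Core 0 reaches the barrier at cycle 3 but has to wait for core 1 until cycle 5, which is why it finishes at cycle 9 rather than 7. A real simulator would also have to model locks, shared-memory timing and more threads than cores, all of which this sketch ignores.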
UPDATED ON 02/02/2010: This might be a viable multicore processor simulator.