Friday, July 31, 2009

Bokode - the barcode killer and much more

At the SIGGRAPH 2009 conference, the MIT Media Lab is going to present a paper titled Bokode: Imperceptible Visual Tags for Camera Based Interaction from a Distance. Here is the abstract of the paper:
We show a new camera based interaction solution where an ordinary camera can detect small optical tags from a relatively large distance. Current optical tags, such as barcodes, must be read within a short range and the codes occupy valuable physical space on products. We present a new low-cost optical design so that the tags can be shrunk to 3mm visible diameter, and unmodified ordinary cameras several meters away can be set up to decode the identity plus the relative distance and angle. The design exploits the bokeh effect of ordinary cameras lenses, which maps rays exiting from an out of focus scene point into a disk like blur on the camera sensor. This bokeh-code or Bokode is a barcode design with a simple lenslet over the pattern. We show that an off-the-shelf camera can capture Bokode features of 2.5 microns from a distance of over 4 meters. We use intelligent binary coding to estimate the relative distance and angle to the camera, and show potential for applications in augmented reality and motion capture. We analyze the constraints and performance of the optical system, and discuss several plausible application scenarios.
I have not read the full paper, but those interested can find it here (5.5 MB), and here is the news release. This paper might change the way the future looks. The Bokode is a kind of barcode design that can be detected by an ordinary camera. The most obvious use is in the retail industry, in place of the barcode, and the Bokode has several advantages that the ordinary barcode lacks. It can be read by an out-of-focus cellphone camera. It has uses in machine vision, and more generally in identifying the locations of different objects in a plane, along with their positions and angles. I am not an imaging expert, but I think the Bokode may be the starting point for many new imaging devices. MIT has also released some cool sketches of projected future scenarios.

Tuesday, July 28, 2009

ARM-wrestling with Intel

The ARM Cortex-A8 is finally going to run at GHz speeds, delivering more than 2000 MIPS. So your netbooks and iPhones may just get faster. If your response is that Intel's Atom is already beyond the GHz mark, here is the best part of the news: the ARM Cortex-A8 does all this while consuming just 640 mW and can run from a supply as low as 1 volt. Currently the iPhone 3GS runs at 600 MHz, powered by an ARM Cortex-A8 processor. Both Intel and ARM know that netbooks and smartphones are the computers of tomorrow, as the PC was back in the eighties.

So both companies are gearing up from opposite directions to capture the market. Intel's x86-based Atom runs at 2 GHz, but the problem is that it's like one of GM's gas-guzzlers: people will not go for a PDA or netbook that drains its battery at a fast rate. Intel has the speed; the problem is the power consumption, which it is working on. It has already announced Medfield, a 32 nm Atom that hits the market in 2010: a smaller chip with lower power consumption, and the best fit to compete with ARM. The Atom codenamed Medfield has already been reported by CNET as the smartphone chip of 2011. The figure (Courtesy: Intel/CNET) shows Intel's strategy.

As far as ARM is concerned, market presence is its huge advantage. Almost all the latest handheld gadgets have ARM inside. ARM developers have more experience in embedded systems and are thus well placed to develop low-power processors; their current task is the opposite one of speeding the processor up to meet the x86 standard. Both ARM and x86 are superscalar architectures, and I think both of them use AMBA interconnects. Starting from the ARMv5TE (introduced in 1999), ARM cores have a DSP instruction set extension, which the Atom also has. But the similarities end there: the Cortex architecture is strikingly different from the x86 architecture. This fall, Texas Instruments is going to sample the OMAP4, with two parallel Cortex-A9 cores in place of a single Atom core. There are already plans for a quad-core Cortex-A9 (see figure, Courtesy: ARM/CNET), which would certainly pose stiffer competition to Medfield.

Friday, July 24, 2009

Recession and the broken Okun's law

In economics, Okun's law states the relationship between unemployment and the GDP gap. It says that for every 1% increase in unemployment, GDP will fall by about 2% of potential GDP. This is an empirical relationship, based on observation rather than theory. So, conversely, it looks reasonable to assume that for every 2% reduction in the GDP gap, a 1% reduction in unemployment can be expected.
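For concreteness, here is that rule of thumb as a tiny Python calculation (the coefficient of 2 is the textbook figure quoted above; empirical estimates vary by country and period):

OKUN_COEFFICIENT = 2.0   # percent of potential GDP lost per 1-point rise in unemployment

def gdp_gap_from_unemployment(delta_unemployment_pct):
    """Okun's rule of thumb: GDP gap (as % of potential GDP) for a rise in unemployment."""
    return OKUN_COEFFICIENT * delta_unemployment_pct

print(gdp_gap_from_unemployment(3.0))   # a 3-point rise in unemployment -> roughly a 6% output gap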

But the problem is that Okun's law does not always work, and it breaks down especially in recessions. As a result, we may get a GDP recovery that ends the recession and even starts growth, while the unemployment rate stays stuck on the verge of 10%. It's bad news, but unfortunately some of the liberal economists I respect (Bradford DeLong, Nouriel Roubini a.k.a. Dr. Doom) suggest exactly that.

UPDATED ON 1st AUGUST: Paul Krugman predicts a jobless recovery based on the recent GDP figures.
UPDATED FOR CORRECTION: Actually, Paul Krugman says Okun's law is behaving itself right now. My misunderstanding. But the question is, will it keep behaving during the GDP recovery? I don't think so.

Thursday, July 23, 2009

Standing on Flights

Those who have been to India might have noticed that people there are allowed to stand in the bus while travelling. I have noticed the same in New York subway trains. What about standing on short flights? A recent survey by Ryanair suggests that 60% of passengers would do so if the ticket were free, and 42% would do so for half-priced tickets. The airline is also considering replacing the normal seats with the kind of vertical seats found on roller coasters in amusement parks. Soon we can expect IT employees going onsite to stand all the way, as the standing tickets would come cheap.

Weighing Scale for Molecules

A group of physicists headed by Dr. Michael Roukes has developed a nanoelectromechanical system (NEMS) based mass spectrometer, which can measure the mass of things as small as a single molecule. The professor and his group have been working on this for the past 10 years at Caltech's Kavli Nanoscience Institute.

In layman's terms, it is a NEMS resonator that keeps oscillating at a particular frequency. You drop a molecule on it, and because of the mass added by the molecule, the resonant frequency of the resonator changes. The change in frequency is mapped to the mass of the molecule. As you can imagine, the frequency change depends on where on the resonator the molecule has landed. To account for this, the molecule has to be dropped several times, and the mean of all the measurements gives the mass. Here is the complete press release. [Image Credit: Caltech/Akshay Naik, Selim Hanay]
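To make the idea concrete, here is a toy model of the frequency-to-mass mapping. For a simple resonator, adding a small mass dm to an effective mass m_eff shifts the resonant frequency by roughly df/f0 = -dm/(2*m_eff) when the molecule lands at the point of maximum displacement; landing elsewhere gives a smaller shift, which is exactly why many landing events have to be averaged. The device numbers below are invented purely for illustration, not taken from the Caltech work.

f0    = 450e6    # resonant frequency of the bare resonator, in Hz (assumed)
m_eff = 1e-18    # effective mass of the resonator, in kg (assumed)

def added_mass(delta_f):
    """Infer the added mass (kg) from a measured downward frequency shift (Hz)."""
    return -2.0 * m_eff * delta_f / f0

print(added_mass(-1e3))   # a 1 kHz downward shift on this toy device ~ 4.4e-24 kg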

Now a curious question. Do they have a nanometric tuning fork to measure the change in standing wave frequency of this nanometric resonator?

Tuesday, July 21, 2009

Linux boots in one second

MontaVista has recently set a record for the boot time of embedded Linux: they booted it in one second! The achievement was made on the Freescale MPC5121E RISC processor. Have a look at the demo from the Freescale Technology Forum. This is undeniably a great achievement.
The application requirements demanded visual feedback of critical real-time data in one second or less from cold power-on. These performance improvements were achieved through a combination of careful tuning of the entire software stack and a highly optimized kernel.
It would be great if the MontaVista team published a whitepaper on what they did to achieve this performance. The usual techniques for speeding up boot include kernel execute-in-place (XIP), which means executing the kernel directly from flash memory, and copying the kernel using DMA. Many times, just increasing decompression speed with a fast decompressor like UCL will do the job. It would not be a surprise if MontaVista used all of these together, along with something special. Let's wait and hope that they throw more light on their careful tuning and kernel optimization.
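MontaVista has not said what it actually did, so purely as an illustration of the kind of knobs involved, a typical embedded boot-time diet touches the kernel configuration and command line along these lines (the option names are real kernel options, but whether MontaVista used any of them is my assumption):

# Kernel configuration fragment (illustrative only)
CONFIG_CC_OPTIMIZE_FOR_SIZE=y    # smaller image: less to copy and decompress
CONFIG_XIP_KERNEL=y              # execute-in-place from flash, where the port supports it
# ...plus stripping out every driver and filesystem the board does not need

# Kernel command line (illustrative only)
quiet lpj=4980736 root=/dev/mtdblock2 ro
# 'quiet' suppresses slow console printing; 'lpj=' presets loops_per_jiffy so the
# delay-calibration loop is skipped during boot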

Monday, July 20, 2009

India Microprocessor - Sensibility with no Sense

The top scientists in India are going to convene on a project to build an India Microprocessor. Network security has become a hot topic in many countries' security meetings, thanks to Chinese hackers, who have broken into several government networks over the past few years. They managed to take down a Russian consulate website and made nearly 70,000 attempts in a single day to penetrate the NYPD network.
“History has shown that the need for defence[sic] security has sparked a chip industry in most nations,” she [Poornima Shenoy, the president of Indian Semiconductor Association] said.
Unlike the US and China, India still does not have chip-making technology, and Zerone seeks to change that.
Ms. Shenoy is right about the history. Not just the microprocessor: almost everything we use now, from the Internet to the mobile phone, has been the output of the need for defense security. India is fundamentally afraid that it might be denied microprocessor technology at some point in time. India has not produced any evidence for that suspicion; it might be confidential. Are we going to develop our own mobile phones, going by the same logic?

The entire story would make sense if it were about some nascent technology, like nanotechnology-based carbon chips or biologically inspired IC manufacturing. Microprocessor technology has been in place for the past three to four decades. The SPARC RISC architecture that they are planning to follow came out in 1986. I am not taking anything away from the SPARC architecture; the point is just that it is not what can be called cutting-edge. This raises a lot of questions.

Why should a country suddenly decide to invest in developing a new processor? It could instead encourage companies in India to make them. What happens when the processor technology changes, or the processor proves inefficient in SPEC benchmarks? Those who know history will be aware that there are many failure stories in processor technology. Is India going to build its own fab to fabricate this processor? Building a fab just to fabricate a single kind of chip is a big waste of money; it may take up to 40 years to recover the investment. What if the processor falls behind over time? Is India planning to build subsequent versions? That would mean keeping a permanent research team on the payroll, and convening a set of passionate designers once is very different from keeping them together permanently. In 1994, Intel's Pentium processor had a small bug in its FPU when performing floating-point division. Intel incurred a huge loss to fix the bug and replace all the processors it had sold. Would India Inc. do the same?

The argument may sound like I am questioning the country's ability to make microprocessors. I do not think that all government-run science projects are inefficient. I rather feel that the government should channel its effort into exploring new cryptographic algorithms, if it wants to improve network security, and into socially progressive technologies like clean energy. Building general-purpose microprocessors in India is better left to private firms, with the government just providing the necessary facilities. And Intel did develop the first made-in-India microprocessor.
“Unless India has its own microprocessor, we can never ensure that networks (that require microprocessors) such as telecom, Army WAN, and microprocessors used in BARC, ISRO, in aircraft such as Tejas, battle tanks and radars are not compromised,” the document points out.
The entire argument is about India not investing directly in the making of a general-purpose processor, the kind we use in PCs and game consoles. I think the Tejas aircraft, battle tanks, and radars would be using application-specific embedded processors, microcontrollers, and digital signal processors; I do not think, and would not prescribe, using general-purpose microprocessors in battle tanks. Making those chips indigenously is a completely different ball game, and it would make sense if India invested in that directly.

India directly putting money into a general-purpose processor architecture might make bold headlines. TV channels might insist that Indian citizens should feel proud of the achievement. What may be the most sensible news for the media makes no sense to me as an engineer.

Thursday, July 16, 2009

Apple, is thy pseudonym Stimulus?

A little strange, but good news! Worldwide integrated circuit sales rose by 16% last quarter, the biggest quarter-on-quarter growth since the second quarter of 1984. More specifically, DSP unit shipments increased by 40% QoQ. One probable reason I can think of is that in the last quarter both the Apple iPhone 3GS and the Palm Pre were introduced, and with the reduced price lines both have been selling like hot cakes. Looking back, the Apple Macintosh was introduced in January 1984, and it too sold like hot cakes - a probable reason for the previous growth record. Apple, is thy pseudonym Stimulus?

Wednesday, July 15, 2009

Intel x86 Processors – CISC or RISC? Or both??

The argument between the CISC and RISC architectures is longstanding. For compiler designers, RISC is a bit of a burden, since the same C code translates to noticeably more lines of RISC assembly than of x86 assembly. But from a purely academic point of view, it is easy to see that RISC wins the argument because of several advantages. The RISC instruction set is very small, which makes it easy to optimize the hardware. Simple instructions executing in a single clock cycle are a typical characteristic of RISC, and they permit aggressive pipelined parallelism. RISC invests more area in registers (some architectures use a technique called register windowing), allowing easier out-of-order execution. OOO and pipelining are possible in CISC, but they are a little clumsy.

One reason that RISC could not win despite all these advantages is Intel. Microsoft is another major reason: during the PC revolution, Windows 95 had no support for RISC processors. Intel, with its CISC-based x86 architecture, blocked all the avenues into general-purpose computing for RISC processors. RISC has a good presence in embedded processing, however, because of its low power, good real-time behavior, and small area.

Two years ago I tried to investigate why Intel did not change its x86 core to a RISC. The findings were astounding, but at the time I did not get around to writing them down in a blog like this. Better late than never. After its success with CISC-based CPUs, Intel entered the RISC zone around 1990 with the introduction of the i960. The i960, however, mainly targeted the embedded systems domain rather than general-purpose computers, understandably due to the lack of software support.

In the general-purpose computing domain, the Intel Pentium ran its IA-32 instructions through two parallel pipelines. The presence of variable-length instructions forced an inherently sequential decode, because each cycle involved first identifying the length of an instruction; a new instruction can begin anywhere within the block of bytes that the processor fetches. As the world moved towards parallel programming, the only advantage CISC enjoyed was software support, and that might not last forever.

Sometimes, just when you think you know where things are heading, a groundbreaking invention changes the entire scenario. One such seminal invention came in the form of the high performance substrate (HPS), introduced by the famous microarchitecture guru Yale Patt. Although I am tempted to explain HPS in detail, I will consider it out of the scope of this blogpost. A very simple (not necessarily accurate) description is that Patt succeeded in converting each CISC instruction into multiple RISC-like instructions, or micro-ops.
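To make the micro-op idea concrete, here is a purely conceptual sketch in Python of how a register-memory CISC-style instruction gets cracked into RISC-like micro-ops. The instruction and micro-op names are invented for illustration; real decoders (HPS, P6) work on binary encodings and are vastly more sophisticated.

def crack(cisc_insn):
    """Split a register-memory CISC-style instruction into RISC-like micro-ops."""
    op, dst, src = cisc_insn                      # e.g. ("ADD", "[count]", "eax")
    if dst.startswith("["):                       # destination is a memory operand
        return [("LOAD",  "tmp", dst),            # read memory into a temporary
                (op,      "tmp", src),            # do the ALU work on registers only
                ("STORE", dst,   "tmp")]          # write the result back to memory
    return [(op, dst, src)]                       # register-only: maps one-to-one

print(crack(("ADD", "[count]", "eax")))
# [('LOAD', 'tmp', '[count]'), ('ADD', 'tmp', 'eax'), ('STORE', '[count]', 'tmp')]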

Intel demonstrated its quick reflexes by implementing this in its P6 architecture. Like any successful, innovative company, Intel is good at adapting to a new wave: it did so by jumping from the memory business to microprocessors back in the eighties, and it did so again by adopting the HPS idea. Intel's first IA-32-to-micro-op decoder appeared in the Pentium Pro. The P6 architecture contained three parallel decoders that simultaneously decode CISC instructions into micro-ops, resulting in deeply pipelined execution (see figure). This instruction-decoding hardware can become extremely complex, but as feature sizes shrank at a very fast rate, Intel did not face any significant performance problem with the approach.

Now we are in the post-RISC era, where processors have the advantages of both the RISC and CISC architectures. The gap between RISC and CISC has blurred significantly, thanks to the scale of integration possible today and the increased importance of parallelism, and trying to jot down the differences between the two is no longer very relevant. Intel's Core 2 Duo can execute more than one CISC instruction per clock cycle, and that speed lets CISC instructions be pipelined aggressively. On the other hand, RISC instructions are also becoming complex (CISC-like) to take advantage of increased processing speed, and RISC processors also use complicated hardware for superscalar execution. So at present, classifying a processor as RISC or CISC is almost impossible, because the instruction sets all look similar.

Intel stayed with CISC even when the whole world moved towards RISC, and it enjoyed the advantage of software support. When the situation started favoring RISC with the advent of parallel processing, Intel used micro-op converters to exploit the pipelining advantages of RISC. Current Intel processors have a highly advanced micro-op generator and intricate hardware to execute complex instructions in a single cycle: a powerful CISC-RISC combination.

Monday, July 13, 2009

Atmel's battery authentication IC - a reality check

Atmel has introduced a cryptographic battery authentication IC, the AT88SA100S, in an attempt to curb the market for counterfeit batteries, which cause all sorts of problems that tarnish the brand value of the original equipment manufacturer (OEM). The idea is essentially to digitally sign the battery: the OEM gets a signature that the quacks cannot forge.
The AT88SA100S CryptoAuthentication™ IC is the only battery authentication IC that uses a SHA-256 cryptographic engine...
SHA-256! An excellent hashing algorithm, developed and recommended by the NSA itself. I don't think there are many commercial hardware implementations of SHA-256. SHA-2 style hashing like SHA-256 requires many more registers and gates than SHA-1 implementations; as a result the die size and the critical path grow, and the achievable operating frequency drops. Frequency specifications are not given in this press release. But here comes the most important claim:
...a SHA-256 cryptographic engine and a 256-bit key that cannot be cracked using brute force methods.
Now that's interesting. I am not a professional cryptanalyst or a professor of mathematics. But what I know is that any N-bit hash function can be cracked through brute force with at most 2^N trials - in this case 2^256 trials. A collision attack can be done in about 2^(N/2) trials - in this case 2^128 trials. For a 256-bit hash, a 50% probability of a random collision is reached through a birthday attack after roughly 4 x 10^38 attempts. These are admittedly very large numbers of trials that may take years of computation, but you still cannot categorically claim that brute force is impossible. Maybe in their implementation brute force is simply not allowed - something like waiting for 3 unsuccessful attempts and then self-destructing. Such a scheme would not pass muster: I don't want my iPhone to be broken just because I tried putting in a phony battery. I need more detail.
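Just to put those trial counts in perspective, a quick back-of-the-envelope check in Python (the 50% figure uses the standard birthday-bound approximation of about 1.1774 * sqrt(2^N)):

import math
N = 256
brute_force = 2.0 ** N         # exhaustive-search upper bound
collision   = 2.0 ** (N / 2)   # generic birthday-attack cost
half_prob   = 1.1774 * math.sqrt(2.0 ** N)
print(f"2^256 ~ {brute_force:.2e}, 2^128 ~ {collision:.2e}, 50% collision ~ {half_prob:.2e}")
# 2^256 ~ 1.16e+77, 2^128 ~ 3.40e+38, 50% collision ~ 4.01e+38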
The 256-bit key is stored in the on-chip SRAM at the battery manufacturer’s site and is powered by the battery pack itself. Physical attacks to retrieve the key are very difficult to effect because removing the CryptoAuthentication chip from the battery erases the SRAM memory, rendering the chip useless.

Challenge/response authentication. Battery authentication is based on a "challenge/response" protocol between the microcontroller in the portable end-product (host) and the CryptoAuthentication IC in the battery (client).
The first point makes a lot of sense. The key lives in SRAM powered by the battery itself: pull the chip off the battery and the CMOS SRAM cells lose power, and with it the memory. The second point is that it uses challenge/response authentication. It is somewhat like UNIX password authentication - the user supplies a password, it is hashed, and the hash is compared with the stored hash on the server - except that here, I think, the host sends a challenge and the battery replies with a hash computed from it. The device must then hold some table of battery manufacturer IDs and their keys or hashes. How secure is that table?
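Here is a generic challenge/response sketch using SHA-256, just to illustrate the protocol idea. Atmel's actual message formats, key handling, and the AT88SA100S command set live in its datasheet, not here; note also that where the host keeps or derives its copy of the key is exactly the open question raised above.

import hashlib, os

SHARED_KEY = os.urandom(32)   # 256-bit secret provisioned into the battery IC (assumed)

def client_response(challenge, key=SHARED_KEY):
    """What the battery-side IC would compute: a digest over the key and the challenge."""
    return hashlib.sha256(key + challenge).digest()

def host_authenticate(key=SHARED_KEY):
    """Host side: send a fresh random challenge and check the reply."""
    challenge = os.urandom(16)                 # never reused, so recorded replies fail
    response = client_response(challenge)      # this travels over the battery bus
    return response == hashlib.sha256(key + challenge).digest()

print(host_authenticate())   # True only for a battery that actually holds the key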

The security of a system lies in its overall implementation rather than in the strength of the cryptographic algorithm it uses for communication; the algorithm is just one part of it, not all of it.

Overall, Atmel has done a good job. Soon we can expect electronic devices to accept only authentic batteries that will not leak and spoil the device. Soon we can say Auf Wiedersehen to counterfeit batteries and their makers.

Mukherjee at Minsky moment - a clarification

A few friends of mine reacted caustically over the phone to my previous blog post, blaming me for not understanding the importance of reducing the fiscal deficit. One of them is a professional economist working at a small south Indian university.

Let me clarify my position. In normal times, when foreign banks are not closing at a fast rate and affecting jobs all around the world, I am all for reducing the fiscal deficit and, if required, even building up a fiscal surplus. But during the current crisis, I would advise against fiscal retrenchment. The problem in India is not the fiscal deficit as such, but its distribution and how it is funded. Here is the take of Roubini's Global EconoMonitor on India's situation:
However, the annual growth rate for Community, Social and personal services has remarkably increased to 13.1% in 2008-09 as compared to 6.8% in 2007-08 reflecting the impact of increased expenditures by the Government through financing schemes like NREGS. It is important to notice that such expenditures have not only increased the fiscal deficit beyond the estimated budget for 2009-10, but only 9% of the Indian workforce engaged in Community, Social, and Personal services expected to be benefited through it. Thus the excess flow of subsidized bank credits to GoI for financing the budget deficit is ultimately restraining the economic growth.
Herbert Hoover tried fiscal retrenchment during a downturn, and that was one of the prime reasons for the Great Depression. FDR's later fiscal retrenchment caused a double-dip depression. Both times, many economists and Wall Street welcomed the move, and in hindsight it turned out to be a terrible choice. Here is the President's chief economic advisor talking about the lessons from the Great Depression. I am just happy that India is not repeating that mistake.

Sunday, July 12, 2009

Mukherjee at Minsky Moment

In India, Mr. Pranab Mukherjee presented the first budget of his term on July 6th. The budget speech was criticized heavily by economic luminaries, and the Sensex responded by falling 870 points. Pranab Mukherjee has gone on a spending spree on infrastructure and on building a social safety net. Dr. Jayaprakash Narayan called it a lackluster budget, with the fiscal deficit crossing Rs. 10 lakh crore. While I agree with Dr. Narayan's point on the reduced allocation to healthcare, I beg to disagree with his view on the country's deficit control.

I accept that India is carrying a huge debt. But let us not blame Mr. Mukherjee for this: we have been running huge fiscal deficits for the past several years, and we are currently in the middle of a great recession. This is not the right time for deficit control. Mukherjee did the right job, a Keynesian economist's job, by increasing fiscal spending to soften the impact of the downturn. The fiscal deficit is something we have to control in the long run; in the middle of a Minsky moment, Pranab Mukherjee has done a decent job by not minding it this time.

However, once the recession is over, the finance minister (whoever it is at that time) must make sure to reduce the fiscal deficit in the same Keynesian style with which spending was increased this year. Don't care about the reaction of the Sensex; don't care about Narayan's comments. You have done a good job at a bad time, Mr. Mukherjee! Fiscal deficit control? Previous finance ministers should have done it. Future finance ministers should do it. Not this finance minister.

Wednesday, July 08, 2009

Connection Machines – Prelude to Parallel Processing

Computer architecture entered a new phase with the stored-program concept and programmable, general-purpose computing architectures. The credit for this development goes to John von Neumann, Grace Hopper, and Howard Aiken. Once the general computing architecture was well established, it became relatively easy to build successive generations of computers and, eventually, microprocessors.

However, there was a problem with this primordial architecture. Unlike human intelligence, it relied massively on a single powerful processor that worked through the stored program in sequential order. The first computer to depart from this view and behave a little more like the human brain was the Connection Machine (CM).

In the early eighties, Danny Hillis, a graduate student at the MIT Artificial Intelligence Lab, designed a highly parallel supercomputer that incorporated 65,536 processors. The design was commercially manufactured as the CM-1 by Thinking Machines Corporation (TMC), the company Hillis created. The thousands of processors that made up the CM-1 were extremely simple one-bit processors connected together in a complex hypercube network. The routing mechanism between the processors was worked out by Nobel laureate Richard Feynman himself. In 1985, the CM-1 was a dream SIMD machine for labs working on artificial intelligence (AI). But it had some practical problems. First, it was too expensive a machine to be purchased by budding AI labs. Second, it did not have a FORTRAN compiler, FORTRAN being the favorite programming language of scientists at the time. Third, it had no floating-point hardware, a must for scientific analysis. So the CM-1, although a parallel-processing marvel, was too immature to face the market. Realizing the mistakes made in the design of the CM-1, Thinking Machines released the CM-2, which had a floating-point processor and a FORTRAN compiler. But it still did not fly. Evidently, Danny Hillis was building a machine for a future that the present had no use for.

In the early nineties, Thinking Machines introduced the CM-5, which featured in the control station in Steven Spielberg's Jurassic Park. It is considered not only a technological marvel, but also a totally sexy supercomputer (see figure). Instead of simple processors, the CM-5 had a cluster of powerful SPARC processors. They also moved away from the hypercube concept and built the data network as a fat tree. The CM-5 is a synchronized MIMD machine combined with some of the best aspects of SIMD. The system can support up to 16,384 SPARC processors. The processing nodes and data networks interact through 1-micron standard-cell CMOS network interfaces with clock synchronization. The raw bandwidth for each processing node was 40 MBps, but in a fat tree, as you go up the levels, the cumulative bandwidth can reach several GBps (if all of this reminds you of Beowulf, you are not alone!). TMC guaranteed that the CM-5 was completely free of the fetch-deadlock problem that can occur between multiple processors (using this).

Although it looks like a great architecture, from a purely technical standpoint it is evident that TMC toned down its idea of a completely parallel machine, simply because in the eighties it did not sell - partly due to the lack of market readiness and partly due to some blows in the design. Second, the failure of the earlier CM series took a toll on TMC's strategy. When the CM-5 was introduced, the future of the company depended on the sales of that one supercomputer. Los Alamos National Laboratory bought one. The Jurassic Park set bought one. I cannot think of any other major customer. If the CM-5 had survived, it would have had to fight the likes of the Intel Paragon and Beowulf clusters for market space.

After the Cold War, DARPA cut down its funding for high-performance computing, which fell as the final blow on Thinking Machines. One fine morning in 1994, TMC filed for Chapter 11 bankruptcy protection. Inc. magazine gives an alternative explanation for the failure of Thinking Machines. It is a good read, but I cannot buy its opinion of Danny Hillis. Perhaps Inc. should restrict itself to its primary aim of advising budding entrepreneurs and refrain from measuring scientific minds. Nobody can deny that Danny Hillis was a genius; the problem was that he was an out-of-this-world freak who could not become a good businessman. Currently he is working on an ambitious project to build a monumental mechanical clock that will run for millennia.

The path Thinking Machines took was certainly not the way to build a successful enterprise. But the architecture it introduced in the early eighties was truly a stroke of genius that every computer architect should study and understand.

Tuesday, July 07, 2009

Robot Democracy

The previous blogpost has stimulated some of the philosophical gyri of my brain.

After long, inhuman experimentation, we finally figured out that democracy is the best way to govern humans. But we do not extend democratic rights to other organisms like cats and dogs, or to the machines working on the assembly lines of Ford Motors. That is not surprising, because humans are far superior to other organisms and machines in terms of intelligence, general awareness, creativity, and common sense.

Now assume that, thousands of years from now in a science-fictional world, robots also become more and more intelligent - not just in terms of computational power, but also in terms of general awareness, creativity, and so on. They become smart enough to understand that there is no such thing as free lubricant oil, and so take up roles as professors and surgeons to establish themselves in society. In such a world, will this civilized society extend democratic rights to artificial intelligence? Or will these robots still have to toil under despotism? If a humanoid becomes visibly smarter than the dumbest human who holds democratic rights, could that humanoid be considered for promotion on the basis of its artificial intelligence? The answers to these absolutely crazy questions would determine whether there will be robotic terrorism and a Terminator-style man-versus-machine war in the future.

So why shouldn't I sit down and write a science fiction story about a robot that becomes a lawyer and fights for robot rights in a Gandhian way? In every science fiction story, robots fight humans to control the world. This time, let one fight for its rights and free will through Ahimsa.

Future belongs to carbon based lifeforms

Many science fiction novels and movies are based either on robots ruling mankind in the distant future or on warfare between humans and robots. Going by recent technological advancements, one thing seems as clear as a beacon: the future world is going to be ruled by carbon-based lifeforms, in one form or the other.

Sunday, July 05, 2009

Hardware / Software Partitioning Decision

The most important part of a hardware/software partitioning scheme is determining which parts of the software should be moved to the FPGA. The problem becomes complex in a system with multiple applications running at a time. Kalavade et al. have given a set of thumb rules for deciding whether a given node can be moved to hardware or not. Here they are:
  • Repetition of a node: How many times does a given type of node occur across all applications? The higher this number, the better it is to implement the node in hardware.
  • Performance-area ratio of a node: What is the performance gain from implementing a given node in hardware, relative to the area penalty of that implementation? The higher the ratio, the better the candidate for hardware.
  • Urgency of the node: How many times does the given node appear in the critical path of the applications? The higher this number, the better the overall performance will be if the node is moved to hardware.
  • Concurrency of the node: How many concurrent instances of the given node can potentially run at a time (on average)? Hardware is always good at doing things in parallel.
By weighing these four factors, we can shortlist the nodes that qualify for hardware implementation; a toy scoring sketch follows below.
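As an illustration only, the four rules can be rolled into a naive weighted score like the one below. The weights and node statistics are invented, and Kalavade et al. actually formulate the selection as a proper optimization problem rather than a weighted sum.

def hardware_score(node, w_rep=1.0, w_perf_area=1.0, w_urgency=1.0, w_conc=1.0):
    """Higher score = stronger candidate for moving the node into the FPGA."""
    return (w_rep       * node["repetitions"]           # occurrences across all applications
          + w_perf_area * node["perf_gain"] / node["area_cost"]
          + w_urgency   * node["critical_path_hits"]    # appearances on critical paths
          + w_conc      * node["avg_concurrency"])      # concurrent instances at a time

nodes = {
    "fft":    {"repetitions": 12, "perf_gain": 8.0, "area_cost": 2.0,
               "critical_path_hits": 9, "avg_concurrency": 4},
    "parser": {"repetitions": 3,  "perf_gain": 1.5, "area_cost": 3.0,
               "critical_path_hits": 1, "avg_concurrency": 1},
}
for name, stats in sorted(nodes.items(), key=lambda kv: -hardware_score(kv[1])):
    print(name, round(hardware_score(stats), 2))   # fft 29.0, parser 5.5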

Saturday, July 04, 2009

Numeric definite integral in FPGA

Today, when I looked at the comp.arch.fpga USENET feed, I saw an interesting question: how do you perform integration in an FPGA? Most of the time the answer would be that, since an FPGA does not handle continuous functions, summation is the approximation for integration. At first glance this may look like the right answer, but when we look at the purpose of integration, we find that it is too crude an approximation.

The purpose of integration is usually to find the area under a curve. To evaluate that numerically, we would normally go for adaptive quadrature algorithms, like Simpson's quadrature, Lobatto quadrature, Gauss-Kronrod quadrature, and so on. If you use MATLAB, you should be familiar with the quad functions, which implement these adaptive quadrature algorithms.

In an FPGA, however, we never get the input as a function; we get the outputs of a function. For example, you would never be handed f(x) = sin(x) + 20 for 0 <= x <= pi. Instead you get the value of f(x) for each discrete value of x. In that case, the integral over one sample interval can be approximated as the area of the triangle formed between the two adjacent sample values plus the area of the rectangle formed by the lower of the two values and the x-axis (y = 0). This has to be done for each new value of x (which usually arrives with each CLK) and accumulated. Once we have the accumulated result, we multiply it by the x-axis spacing between samples; in an FPGA the x-axis is usually time, so the CLK period is the scale factor.

So at each CLK (somebody please tell me how to use LaTeX with blogger):
int(n) = int(n-1) + (diff(f(n), f(n-1)) >> 1) + min(f(n), f(n-1))
The function is not as difficult to implement as it looks. The (diff(f(n), f(n-1)) >> 1) term gives the area of the triangle, and min(f(n), f(n-1)) gives the area of the rectangle; "int" is the integral, i.e. the area under the curve up to that point. The formula works as-is if the CLK frequency is 1 Hz; for other frequencies, the final result just needs to be multiplied by the CLK period.

For a monotonically increasing function [f(n) >= f(n-1) for 0 <= n <= N], the formula reduces to:
int(n) = int(n-1) + ((f(n) - f(n-1)) >> 1) + f(n-1)
This works neatly, but it is up to the implementer to decide whether they need a plain summation or the area under the curve.
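As a quick sanity check, here is a small software model of the recurrence (a trapezoidal accumulator with the divide-by-two done as a right shift, as it would be in the FPGA). The test function and sample count are made up for illustration.

import math

def trapezoid_accumulate(samples):
    acc = 0
    for prev, cur in zip(samples, samples[1:]):
        acc += (abs(cur - prev) >> 1) + min(cur, prev)   # the int(n) update above
    return acc

dt = 2 * math.pi / 1000                                  # the "CLK period" on the x-axis
samples = [int(1000 * math.sin(i * dt) + 2000) for i in range(1001)]
print(trapezoid_accumulate(samples) * dt)                # close to 4000*pi ~ 12566 (exact: 12566.37)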

Friday, July 03, 2009

USB 3.0 – knowns and unknowns

One of the biggest pieces of technology news that I keep seeing in journals is the advent of USB 3.0. It is not up to the hype of the iPhone, of course, but certainly all the major news feeds I subscribe to cover it. Linux has come up with a driver for USB 3.0, and Windows 7 will have USB 3.0 support. All of this could help you in 2010 with transferring HD videos and large databases at high speed.

A year ago, when some device drivers on my Debian installation got accidentally messed up, I sat down to write the parallel port and USB device drivers myself. The parallel port worked great, but while I was on the read method of the USB driver's file_operations, a power fluctuation at home crashed my computer. Still, that exercise gave me a chance to understand the USB architecture.

When the USB 3.0 specification was released, I wanted to know how they could claim it would be backward compatible. It is all very simple: USB 3.0 has the same bus structure as USB 2.0, with a SuperSpeed structure added in parallel. The baseline topology is the same as USB 2.0, a tiered star topology; no wonder it is backward compatible. In the PHY layer, USB 3.0 carries eight signal wires instead of the four in previous versions: four are the usual USB 2.0 wires, and the other four are two SuperSpeed transmitter wires and two SuperSpeed receiver wires. So at the physical layer, USB 3.0 is essentially USB 2.0 tied to a speedier architecture across the entire PHY, with its own dedicated spread spectrum clocking (which is known to reduce EMI). This layer also provides shift-register-based scrambling. When you design and develop peripherals for USB 3.0, I would advise you to turn scrambling off, do your unit testing, and then enable it again (just as in PCIe development); otherwise you will have a tough time validating the results.
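Since the scrambler is just a linear feedback shift register XORed onto the data stream, here is a generic serial sketch of the idea in Python. The polynomial x^16 + x^5 + x^4 + x^3 + 1 is the PCIe-style one; the exact bit ordering, seeding, and byte-parallel advance used by real USB 3.0 silicon are defined in the specification, so treat this only as an illustration of the principle.

def lfsr_scramble(data, seed=0xFFFF):
    state = seed
    out = bytearray()
    for byte in data:
        scrambled = 0
        for bit in range(8):                      # serial, LSB first (an assumption)
            lfsr_out = (state >> 15) & 1          # take the MSB of the 16-bit LFSR
            scrambled |= (((byte >> bit) & 1) ^ lfsr_out) << bit
            state = ((state << 1) & 0xFFFF) ^ (0x0039 if lfsr_out else 0)
        out.append(scrambled)
    return bytes(out)

payload = b"SuperSpeed packet payload"
coded = lfsr_scramble(payload)
print(lfsr_scramble(coded) == payload)   # True: descrambling is the same XOR operation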

One of the important features of USB 3.0 is its power management architecture. Power management is done at three loosely coupled levels: localized link power management, USB device power management, and USB function power management. You can enable a remote wake feature and wake up a device remotely.

One thing that remains completely up in the clouds is the USB chip architecture (a proprietary design); I did not find any document on the Internet about it, and it is a complete mystery how they deliver that SuperSpeed. The USB 3.0 host controller interface (HCI) specification does not seem to be completely open yet. From what we do know, USB 3.0 looks a lot like a PCI-SIG design, more specifically the PCI Express 2.0 architecture (5 Gbps) packaged differently, although Intel denies it. Both use the same encoding scheme, shift-register-based scrambling, spread spectrum clocking, and so on. So if you know the PCIe 2.0 architecture, you already know more than 50% of USB 3.0. Having said all this, there is a lot that remains hidden, and many of the melodies are still unheard. As Keats says,
Heard melodies are sweet, but those unheard are sweeter.
Let’s wait with all ears to listen to the unheard, whenever it gets loud enough.