Tuesday, June 30, 2009

Ride on the STBus - a brief look at the different protocols

System on Chip (SoC) communication architecture is a vital component of an SoC: it interconnects heterogeneous components and IP blocks and supplies a mechanism for data and control transfer. It is a significant factor in the performance and power consumption of the chip, especially when the feature size is in nanometers. As a result, a given architecture must ensure that an agreed QoS is delivered through arbitration mechanisms. Recently I took some interest in going through the VSIA (Virtual Socket Interface Alliance) standards for intra-chip communication and the closely aligned bus architecture of STMicroelectronics (STBus) to see how it fits in an automobile collision avoidance system (Blogger sucks. I cannot upload a PDF! Ridiculous!!).

The STBus IP connects an initiator and a target. The initiator (master) initiates the communication by sending a service request to the target and waits for a response. The target receives the service request, validates it, processes it, and sends back the response. There are configuration registers to change the behavior of request and response handling; they allow the bus behavior to be changed and the traffic to be adjusted depending on priority, bandwidth, etc.


STBus Block Diagram




There are three protocols in which the STBus can operate: Type 1, Type 2, and Type 3. Type 1 is for simple, low-performance access to peripherals. Type 2 is a little more complex, with support for pipelined operation. Type 3 extends support to asynchronous interactions and complex operations of varied sizes. All these are implemented using a shared-multiplexer or crossbar-multiplexer based architecture. After going through the characteristics of these protocols, it was easy for me to decide that the Type 1 and Type 2 interfaces suffice for my collision avoidance system.

The Type 1 protocol is the best candidate for general-purpose I/O with very minimal operations. It has a very simple handshake, with each packet containing the type of transaction (request or response), the position of the last cell, the address of the operation, and the related data. All peripherals are required to instantiate these mandatory signals.
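As a toy model of that handshake (this is my own Python sketch, not STBus RTL, and the field names are mine rather than the official signal names), a Type 1 exchange can be pictured as:

```python
from dataclasses import dataclass

@dataclass
class Type1Cell:
    # Illustrative model of an STBus Type 1 cell; field names are
    # made up for this sketch, not the official STBus signal names.
    is_request: bool   # type of transaction: request or response
    last: bool         # marks the last cell of the packet
    address: int       # address of the operation
    data: int          # related data (payload)

def simple_target(cell: Type1Cell, memory: dict) -> Type1Cell:
    """A trivial target: a read request returns the stored word (0 if absent)."""
    value = memory.get(cell.address, 0)
    return Type1Cell(is_request=False, last=True,
                     address=cell.address, data=value)
```

For example, sending `Type1Cell(True, True, 0x10, 0)` to a target whose memory holds `{0x10: 42}` yields a response cell carrying 42.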

The Type 2 protocol has everything in Type 1, plus pipelining, source labeling, prioritization, etc. Another important feature of Type 2 is that a transaction can be split into two parts: the request and the response. So once the initiator sends the request part, it can carry on with other activities without waiting for the response, since the response is like a separate transaction. This property certainly adds to the performance of the system. Pipelined transactions arise as a result of this transaction splitting: a Type 2 initiator can send consecutive transactions as in a pipeline. The most important point to note here is that the order of transactions has to be maintained with care. Responses are expected in the same sequence in which the requests were sent. As a result, the initiator-target pairing has to be maintained until all responses are received; establishing contact with a new target is forbidden while the existing target still has outstanding requests.
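The ordering constraint can be pictured with a toy Python model (again my own sketch, not STBus RTL): Type 2 carries no transaction IDs, so the only way an initiator can pair a response with its request is strict first-in first-out order.

```python
from collections import deque

class Type2Initiator:
    """Toy model of STBus Type 2 split transactions: the initiator keeps
    working after a request, but responses must come back in request order."""
    def __init__(self):
        self.outstanding = deque()  # addresses awaiting a response, FIFO

    def send_request(self, address):
        # Fire the request part and continue with other activities.
        self.outstanding.append(address)

    def receive_response(self, address):
        # No transaction IDs in Type 2: the oldest outstanding request
        # is the only one this response can belong to.
        expected = self.outstanding.popleft()
        if address != expected:
            raise RuntimeError("out-of-order response: got %#x, expected %#x"
                               % (address, expected))
        return expected
```

With this model the initiator can queue several requests back to back, as in a pipeline, and the FIFO enforces the in-order response rule described above.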

The Type 3 protocol is a more advanced protocol, prescribed only if both interacting systems are intelligent enough to cope with the complex handshakes between initiator and target. Two main features of Type 3 are shaped packets and out-of-order transaction management. Shaped packets allow request and response packets of varied sizes to be transferred, which is certainly an improvement in bandwidth usage. Out-of-order transaction management allows transactions to be processed in any order, because in the Type 3 protocol each transaction is earmarked with a four-byte transaction ID.
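Contrast that with a toy model of Type 3 (once more my own illustrative sketch): because every request carries a transaction ID, responses can be matched in any order, and no FIFO discipline is needed.

```python
class Type3Initiator:
    """Toy model of STBus Type 3 out-of-order transactions: each request
    is tagged with a transaction ID, so responses may return in any order."""
    def __init__(self):
        self.pending = {}   # transaction ID -> address of the request
        self.next_tid = 0

    def send_request(self, address):
        tid = self.next_tid
        self.next_tid += 1
        self.pending[tid] = address
        return tid

    def receive_response(self, tid, data):
        # Any order is fine: the ID alone pairs response with request.
        address = self.pending.pop(tid)
        return address, data
```

Here the target is free to answer a later, quicker request before an earlier, slower one, which is exactly what the ordering rule of Type 2 forbids.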

In the collision avoidance system I was talking about, all the peripherals and external registers can be connected through the Type 1 protocol. Type 2 is well positioned between my baseband demodulator and the signal processor or FPGA. The baseband demodulator gives me complex baseband signal values for a 64-element array, on which I have to perform power spectrum estimation. My transaction size is going to be fixed, and the initiator (baseband modem) is coupled with the target (DSP/FPGA). Spectral estimation is usually a complex process and may take more time than baseband demodulation, so Type 2 allows me to split the request and response and pipeline the flow to the processor. I may need to add a buffer at the target end and latency at the initiator end to balance the difference in speed. Since in Type 2 I cannot resend a message that is already delivered, I have to piggyback the acknowledgement to decide whether the initiator can send another asynchronous packet or go to a wait state.

Friday, June 26, 2009

A Quick Tour of the FORst

In an earlier post about VLSI routing, I promised a little wonkish post on the Obstacle Avoiding Rectilinear Steiner Minimum Tree (OARSMT). This may be it, but still not so wonkish. In this post, I will try to explain FORst, one of the earlier methods (2004) to solve the OARSMT problem.

OARSMT is an NP-complete problem, so no polynomial-time algorithm for it is known. VLSI designers therefore use different heuristics to derive a solution for OARSMT, or simply RSMT. The simplest of all is to derive a rectilinear minimum spanning tree and take that as the approximation for the RSMT. Theoretically, it is a 1.5-approximation, i.e. the wire length can be up to 50% more if we take the rectilinear minimum spanning tree to be the RSMT, although in practice it is a lot better.
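The 50% figure comes from a classic tight example: four terminals at the tips of a cross, where the rectilinear MST is exactly 1.5 times the length of the Steiner tree obtained by adding the center point. A quick Python check:

```python
def manhattan(a, b):
    """Rectilinear (Manhattan) distance between two points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

# Four terminals at the tips of a cross.
terminals = [(0, 1), (1, 0), (2, 1), (1, 2)]

def mst_length(points):
    """Prim's algorithm over the terminals only, with Manhattan weights."""
    in_tree, total = {points[0]}, 0
    while len(in_tree) < len(points):
        d, best = min((manhattan(p, q), q)
                      for p in in_tree for q in points if q not in in_tree)
        total += d
        in_tree.add(best)
    return total

mst = mst_length(terminals)                             # three edges of length 2
steiner = sum(manhattan(t, (1, 1)) for t in terminals)  # four edges of length 1
print(mst, steiner, mst / steiner)  # 6 4 1.5
```

The MST must walk around the cross (total length 6), while one extra Steiner point at the center connects everything with length 4, giving exactly the 1.5 ratio.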

In this paper, the authors derive a 3-step heuristic algorithm for the OARSMT problem. The first step is to split the entire graph into several subgraphs depending on the location of the terminals. For this we have to construct a full Steiner tree, one in which all terminals are leaves. Hwang's theorem is used to construct such a tree. Once done, we just need to wipe out all edges that pass over one or more obstacles. The nodes in the resulting subtrees then constitute the subgraphs.
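A rough sketch of that first step (the helper names are mine, obstacles are simplified to axis-aligned rectangles, and the intersection test is a crude bounding-box overlap, so this is only illustrative, not the paper's algorithm):

```python
def crosses_obstacle(edge, obstacle):
    """True if the bounding box of an edge overlaps a rectangular obstacle
    (a crude stand-in for a real segment/rectangle intersection test)."""
    (x1, y1), (x2, y2) = edge
    ox1, oy1, ox2, oy2 = obstacle  # lower-left and upper-right corners
    return not (max(x1, x2) < ox1 or min(x1, x2) > ox2 or
                max(y1, y2) < oy1 or min(y1, y2) > oy2)

def split_into_subgraphs(nodes, edges, obstacles):
    """Drop tree edges that pass over an obstacle, then return the
    connected components of what remains as the subgraphs."""
    kept = [e for e in edges
            if not any(crosses_obstacle(e, ob) for ob in obstacles)]
    # Union-find over the surviving edges.
    parent = {n: n for n in nodes}
    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n
    for a, b in kept:
        parent[find(a)] = find(b)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return list(groups.values())
```

For instance, a chain of four nodes whose middle edge crosses an obstacle splits into two subgraphs of two nodes each.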

The second step is RSMT construction. The subgraphs we constructed are free of obstacles, so we can use any heuristic to construct the RSMTs. The original paper uses a combination of ant colony optimization (ACO) and greedy approximation. To be more specific, ACO is used on subgraphs with fewer terminals, where accuracy matters, while the greedy method is used to connect larger sets of terminals, as it produces results faster.

The third step is to combine all the RSMTs: the nearest nodes of every RSMT are joined to those of the adjacent subgraphs. The paper gives experimental evidence that even a large graph with several obstacles can be routed in a short time. A typical nanometer integrated circuit fabric has thousands of nodes and hundreds of obstacles like power lines, IPs, etc. All those cases can be treated as large-scale OARSMT problems, and FORst is the right candidate if you are looking for a solution.

FORst is now nearly five years old, and there have been several improvements and improved heuristics resulting in more and more efficient implementations. However, we just cannot skip our FORst. It's like the 8088 processor: however outdated, you always learn it before proceeding to the architecture of the P4.

Sunday, June 21, 2009

Human Development Index - where the Indian tiger trails

In the late eighties, India was on the verge of self-annihilation. The mire of the license raj fettered India's arms from welcoming the private sector to do business in India. The government engine was at the peak of its inefficiency. The Indian Rupee was inconvertible, and high tariffs prevented cheap foreign goods and cutting-edge foreign technologies from making inroads into India. The country suffered from a sluggish growth rate and was unable to manage its fiscal expenditure.

All this changed in the early nineties, when India jumped on the globalization bandwagon, creating a fair ground for Indian and global players to do business in India. As a result, the Indian economy registered steady growth on par with most Asian nations.

With all these facts in mind, India's former President Dr. Abdul Kalam predicted that India would become a developed nation by 2020. Not so fast. A country's growth and status are not measured just by stock exchange values or GDP growth. They are also measured by social factors like life expectancy, infrastructure growth, literacy, gender equality, the rich-poor ratio, poverty alleviation, etc. While there is no doubt about India's GDP growth, its social indicators are still abysmal.

One statistical measure of a country's growth beyond its GDP, in terms of the different factors mentioned above, is the Human Development Index (HDI). The HDI was developed in 1990 and accepted by the United Nations as the standard means of measuring a country's growth in terms of several social factors along with per capita GDP. Development economists like Amartya Sen and Mahbub ul Haq were instrumental in developing this index. The UN publishes the HDI and its yearly growth as part of its annual Human Development Report.
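For reference, the HDI of that era (before the 2010 revision) is the plain average of three component indices. A sketch of the computation, with the min/max goalposts the UN reports used, quoted from memory, so treat the exact constants as approximate:

```python
from math import log10

def index(value, lo, hi):
    """Normalize a raw value against its goalposts to the range [0, 1]."""
    return (value - lo) / (hi - lo)

def hdi(life_expectancy, adult_literacy, gross_enrolment, gdp_per_capita):
    """Pre-2010 HDI: arithmetic mean of the life expectancy, education,
    and (log-scaled) GDP indices. Goalposts quoted from memory."""
    life = index(life_expectancy, 25, 85)               # years
    education = (2 / 3) * index(adult_literacy, 0, 100) \
              + (1 / 3) * index(gross_enrolment, 0, 100)  # percentages
    income = index(log10(gdp_per_capita), log10(100), log10(40000))  # PPP US$
    return (life + education + income) / 3
```

Note how the income index uses a logarithm: doubling per capita GDP moves the HDI far less than doubling life expectancy or literacy would, which is part of why an exponential GDP curve does not translate into an exponential HDI curve.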


The HDI growth of India is extremely pathetic compared to other developing countries. The chart above gives a comparison of HDI growth from 1975 to 2005 for the BRIC countries. It can be observed that India is the only country in this group below the world average. Although globalization might have increased the per capita income of the Indian middle class, it is not an absolute panacea to cure all worries. The country and the states must ensure that they commit themselves to the growth of the society as much as to that of the economy.


The chart above compares the growth of India in terms of GDP per capita and HDI. Obviously the GDP growth is an exponential curve. But the HDI has merely grown to about 1.5 times its 1975 value. With the way the HDI is measured, you cannot expect an exponential curve; still, the growth is nowhere near what you would expect of a country that anticipates becoming a superpower. In a different context, the Harvard economist Edward Glaeser has commented about India, "So why do governments that cannot manage the basics of public hygiene think that they can micro-manage an economy?".

In order to ensure the growth of the society, the Indian central government has to publish the HDI growth rate of all states. The well-performing states must be rewarded, and the badly performing states need to be advised to tune their social reforms. All districts have to publish this figure, and the district collector can be rated on this basis. During elections, the election commission along with the human resources ministry can publish this figure for each constituency. Let the voters know what their representative has been doing in the past term. Unless we take some extremely focused measures, we cannot become a superpower by 2020 - not when more than 34% of the country lives on less than $1 per day. After all, as the Nobel laureate economist Gary Becker says, the only purpose of economics is to understand and alleviate poverty.

Tuesday, June 16, 2009

VLSI Routing - the most interesting puzzle of the last decade

One of the most important tasks in the physical design of VLSI circuits is routing. The speed and power consumption of the chip depend upon how the different components are routed using the shortest possible wiring. The speed at which an optimal or semi-optimal route is derived is crucial for the rate of chip production at scale. To solve this problem, we have to delve into graph theory.


Now let's start with the minimum spanning tree. In a connected, undirected, weighted graph, a spanning tree is defined as a subgraph that is a tree and connects all the vertices together. A graph can have many spanning trees; one of them is a minimum spanning tree if the sum of the weights of its edges is the least among all the spanning trees. So in a simple VLSI layout in which we need to connect all nodes, you can very well assert that the minimum spanning tree is the best way to route the different nodes. The minimum spanning tree problem is easy to solve using any of the standard algorithms (Prim's, Kruskal's, etc.).
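As a sketch of one of those standard algorithms, here is Kruskal's method in Python: sort the edges by weight and add each one unless it would close a cycle.

```python
def kruskal(num_nodes, edges):
    """edges: list of (weight, u, v) tuples over nodes 0..num_nodes-1.
    Classic Kruskal's algorithm: take edges in order of weight, skipping
    any that would close a cycle (detected with union-find)."""
    parent = list(range(num_nodes))
    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n
    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:          # different components: safe to connect
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

# A 4-node example: the MST keeps the three cheapest non-cyclic edges.
edges = [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3)]
mst = kruskal(4, edges)
print(sum(w for w, _, _ in mst))  # total weight 7
```

The edge of weight 3 is skipped because nodes 0 and 2 are already connected through the two cheaper edges, leaving a tree of total weight 1 + 2 + 4 = 7.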

But in practice, layout routing is not always that easy. The complication comes when the graph has a set of nodes V and we only need to connect a subset N of V. In such a situation, we have the option to use the nodes that are in V but not in N, provided it yields a better solution. The minimum spanning tree over the nodes in N is not necessarily the optimal solution, because by using the nodes that are not in N, we could actually create shorter routes. So we have to go for a Steiner tree. A Steiner tree is a tree that connects the N nodes of a graph using extra nodes, in an attempt to reduce the total length. Solving a Steiner tree problem is not so simple, because the Steiner tree problem is NP-complete. Under certain circumstances the problem can be solved in polynomial time, but in general heuristic algorithms have to be used. To get more details about the Steiner tree problem, please refer to this book. A typical VLSI layout has V nodes of which only N need to be connected, although you can use all the V nodes to ensure the shortest routing. So the Steiner tree problem resembles VLSI routing more closely than the spanning tree problem does.

There is another practical factor that we overlooked in the previous paragraph. In VLSI routing, we cannot simply take any arbitrary route to connect two nodes. In other words, the shortest route between two nodes is not necessarily the straight line connecting them, because once placement is over, routing is allowed only through the rectilinear grid in between the cells. So the problem to solve is not really the Steiner tree, but the rectilinear Steiner tree: in a VLSI layout, given a set of terminals, the rectilinear Steiner minimum tree interconnects all the terminals through some additional nodes, using only horizontal and vertical segments.
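A classical result (Hanan's theorem) that rectilinear heuristics exploit is that an optimal rectilinear Steiner tree can always be found whose extra nodes lie on the grid formed by drawing a horizontal and a vertical line through every terminal. Enumerating those candidate points is a common first step, sketched here:

```python
def hanan_points(terminals):
    """Candidate Steiner points on the Hanan grid: intersections of the
    horizontal and vertical lines through the terminals, minus the
    terminals themselves."""
    xs = sorted({x for x, _ in terminals})
    ys = sorted({y for _, y in terminals})
    existing = set(terminals)
    return [(x, y) for x in xs for y in ys if (x, y) not in existing]

# Three terminals in an L shape: one Hanan point completes the corner.
print(hanan_points([(0, 0), (0, 2), (3, 0)]))  # [(3, 2)]
```

The payoff is that a heuristic only needs to consider this finite grid of candidates instead of every point in the plane.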

But as the size of the chip reduces, as we move from micrometer-scale to nanometer-scale design, new problems crop up. VLSI routing in a nanometer design has to consider various obstacles like power lines, pre-routed nets, jumpers, etc. As a result, the obstacle-avoiding rectilinear Steiner minimal tree (OARSMT) has become an important problem in recent years. The first practical approach to OARSMT was presented in 2004 under the name FORst. Since then there have been a lot of interesting heuristics and approximations to solve this problem. In another wonkish post, I will explain FORst and the recent developments that optimize OARSMT solutions.

Monday, June 15, 2009

Germany might lose two decades

A few days ago, I wrote an article on how Germany's current account surplus may help its economy. But a more careful analysis shows that its entire economy is grounded on exports alone, and the domestic market is extremely weak. This is not a good sign if what they are looking for is a fast recovery. This is what happened with Japan in the 90s. Here is Paul Krugman in an interview to The Guardian:
Germany has huge inadequacy of domestic demand. Their economic recovery in the first seven years of this decade rested on the emergence of gigantic current account surplus.

How is it possible that Germany, which did not have a house price bubble, is having a steeper GDP fall than anyone else in the major economies?

The answer is that they depended upon exporting to the bubble regions of Europe, so they actually got side-swiped by the loss of those exports worse than the bubble regions themselves got hit.

It's Germany on a global scale that is the concern. We worry about the drag on world demand from the global savings coming out of east Asia and the Middle East, but within Europe there's a European savings glut which is coming out of Germany. And it's much bigger relative to the size of the economy.

Germany can still get a better recovery if it does not commit the mistakes of Japan, like inadequate fiscal stimulus, premature quantitative easing, etc. But Germany has already made the first of those two mistakes. And it is highly likely that it will make the second in the near future.

The bad news is, there is a big chance that Germany will lose two decades.

Palm Pre components and surprise factors

Last week, iSuppli posted a preliminary cost analysis of the Palm Pre, the most touted iPhone-killer. iSuppli simply did an excellent job. The components used in the Palm Pre show that the design of the device is very similar to that of the iPhone.


Source: iSuppli Corporation

It uses integrated circuits from TI, Samsung, Qualcomm, and Sony. As you can observe, the display is the costliest of all components, at around $40. In the iPhone too, the display is the costliest, with an estimated cost of $60.

There are two surprising components in the design. The power management of the Qualcomm baseband processor is done by Maxim's MAX8695 and not Qualcomm's PM6650, the obvious choice for power management with Qualcomm chipsets. The second surprise is the 8GB MoviNAND flash memory from Samsung, instead of the usual MLC NAND memory. The second choice seems justified by MoviNAND's high performance, faster development time (the time-to-market factor), and easy integration with other components. The first design choice, though, still surprises me. I do believe the designers must have had a good reason to choose Maxim's power management component; I just cannot guess it.

Perhaps a more difficult guess to make is whether Palm Pre, the Kryptonite would eventually throw out iPhone, the Superman.

Cash for clunkers - Does India need one such scheme?

The US Congress has approved the "Cash for clunkers" bill. Under this bill, the government buys old gas-guzzling vehicles (both cars and trucks) for a reasonable amount and gives the seller an incentive to buy a new environmentally-friendly vehicle. By environmentally-friendly vehicle I don't really mean a hybrid car, but any car with better mileage. This bill is based on the proposal made by the Princeton economist Alan Blinder a year ago in a New York Times Op-Ed.

According to Alan Blinder, this scheme brings multiple advantages, the most important of them: a cleaner environment, as we can get rid of old polluting vehicles easily; an effective stimulus for the auto manufacturing industry, since people who sell their old car usually buy a new one with better mileage; and reduced dependence on fossil fuels.

This scheme has been criticized by the University of Chicago professor, Steven Levitt:
If any vehicles are going to qualify under this program, I suspect it will be because enterprising people who already plan to buy new cars will go out and buy old junkers on the used-car market and then trade them in under the program. But those transactions won’t represent incremental new car sales; it will just be a way for people who were already going to buy a car to rip off the government.

One thing will happen: entrepreneurs will play the role of the middleman, buying old beaters and then reselling them to people who are about to buy new cars, skimming off a little profit along the way.

But of course, every scheme and every bill has exploiters and scope for abuse. During implementation, the US government has to bring in restrictions to ensure that greedy miscreants do not cash in on it.

Now for the bigger question. Does India need such a scheme? The problem is smaller and different in India compared to the US. There has been a huge increase in the number of cars only after 2000, so there are not many gas-guzzling cars on Indian roads. The graphs below (courtesy: Automobile India) show car sales in units from 2000 until 2006-07.



In 2007-08, domestic car sales increased by nearly 12%. In 2008-09 the growth shrank to 3.44%, in spite of a 54% increase in car exports. So even though the Indian car industry needs a stimulus to catch up with its old figures, it is still better off than the rest of the world. Moreover, much of our environmental pollution is attributable to bad roads, adulterated petrol and clogged-up traffic rather than to the cars themselves.

What about trucks in India? There are still a lot of old, polluting trucks in India (I don't have any statistics to back this up. Readers, help). I would also assume that, cumulatively, trucks in India travel more miles in a year than all cars in India put together. So India can still introduce a Cash-for-clunkers scheme to encourage truck owners to refurbish their clunkers. It would certainly help the truck industry and the environment. Additionally, if truck owners replace their clunkers with new trucks of better tonnage, it would speed up goods movement within India. With proper regulation, if India introduces a Cash-for-clunkers scheme for trucks, I am confident that it would be welcomed by environmentalists, the All-India Interstate Lorry Owners Association, truck makers, and more importantly the citizens of India.

Saturday, June 13, 2009

A Verilog HDL library for fixed point-floating point conversion

Generally when people ask for my advice on learning Verilog HDL, I prescribe them this book, this book, this book or this book. But then, learning a programming language is not complete without practice. So I suggest that they write synthesisable, vendor-neutral Verilog code for a simple, fixed-point RISC microprocessor and test it by writing a testbench. Later on they can expand the processor to include pipelined processing.

But in recent days, floating-point processors have been slowly creeping into the scene, displacing their fixed-point counterparts, although the choice between the two is extremely application-specific.
So I additionally ask Verilog newbies to try out a floating-point RISC processor. But why should they need to write a separate testbench for the floating-point processor if it is functionally similar to the fixed-point one they wrote at the beginning? I searched around and found an extremely useful library (AFFHL).

Frankly, the name of this library is not so cool, but its design and operation are impressive. I managed to convert the comprehensive testbench that I wrote for the fixed-point processor into a testbench for the floating-point processor. This hardware library can also convert floating point to fixed point, although I have not yet tried it. Simulink has a toolbox for floating-to-fixed-point conversion, but an HDL library for it is simply wonderful.
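To make the conversion concrete: a fixed-point word in Qm.n format is just an integer scaled by 2^n. Here is a small software model of the arithmetic such a library has to implement in hardware (my own sketch, not AFFHL code):

```python
def float_to_fixed(value, frac_bits, total_bits=16):
    """Quantize a real number to a signed two's-complement fixed-point
    word with `frac_bits` fractional bits, saturating on overflow."""
    scaled = round(value * (1 << frac_bits))
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))

def fixed_to_float(word, frac_bits):
    """Recover the real value a fixed-point word represents."""
    return word / (1 << frac_bits)

# Q8.8 example: 1.5 becomes 0x0180; round-tripping recovers it exactly.
w = float_to_fixed(1.5, frac_bits=8)
print(hex(w), fixed_to_float(w, frac_bits=8))  # 0x180 1.5
```

A testbench doing this conversion at the boundary is what lets the same stimulus drive both the fixed-point and the floating-point versions of the processor.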

Unfortunately, I don't think VHDL has such a library (please correct me if I am wrong). In the future you may see a VHDL library based on the AFFHL design on this page, provided the designers permit.

Friday, June 12, 2009

Analysis of current account deficit

One of the problems complicating this recession is that the US and the UK ran huge current account deficits even before the recession. The current account deficit is roughly the difference between exports and imports, adjusted for foreign aid, interest payments, dividend payments, etc. So as a country starts spending more money to recover from the recession, it gets into a fiscal deficit.

The empirical relationship between the current account deficit and the fiscal deficit is negative for countries that have accumulated huge debt, like the US, under Barro-Ricardo equivalence. So as fiscal spending increases, it is normal to expect the US to reduce its current account deficit (53% deficit). But then, it would come through a reduction in imports, as more products are made domestically rather than imported - unless, of course, as Krugman says, the US finds a completely different planet to export its goods to.

I tried to find whether any country in the past had faced such a huge current account deficit, accumulated debt, and a recession at the same time. It was Germany after the First World War, owing to the war indemnities imposed on it.


So I tried to see how Germany is doing today (see the graph above; courtesy: Robbins Lecture 2009 PDF). WOW!!! More than $250 billion of current account surplus, which is surprising when compared with the others in the European Union. Germany's surplus is more than 9% of its GDP. The only other country with a similar figure is China. In contrast, the US has a current account deficit of 53% of its GDP. If some tout China to become the epicenter of the world economy at the end of the recession, watch out. Even Germany has a chance. And personally, I trust the figures reported by the Federal Republic of Germany over those reported by the People's Republic of China.

But for all this, Germany has to come out of the recession first. The German government was smart enough to maintain a good balance-of-payments surplus, but it does not seem to be doing a good job handling the recession.