THE METHODOLOGY
The Ministry of Corporate Affairs (MCA) provides monthly master data on newly registered companies. We used the data available from November 2013 to July 2015. It shows each new company's address, its type of activity (manufacturing or services) and how much capital was invested.
We matched the pincode in each company's address against the pincode directory available at Data.gov.in, an open data initiative by the government, to get the district name for each address.
To put the number of companies formed in context, we computed it per 100,000 people in each district. The district-wise population data was taken from Census 2011.
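The two steps above (pincode-to-district lookup, then a per-100,000 rate) can be sketched roughly as follows. This is a minimal illustration, not the code used for the analysis: the function name `companies_per_100k`, the pincodes, the district names and the population figures are all made-up placeholders.

```python
from collections import Counter

# Placeholder inputs (assumptions for illustration only). The real inputs
# would be the MCA master data, the Data.gov.in pincode directory and
# Census 2011 district populations.
pincode_to_district = {
    "100001": "District A",
    "100002": "District A",
    "200001": "District B",
}
district_population = {"District A": 2_000_000, "District B": 500_000}
company_pincodes = ["100001", "100002", "100001", "200001", "999999"]


def companies_per_100k(pincodes, pin_map, population):
    """Count companies per district and scale to a rate per 100,000 people.

    Returns (rates, unmatched), where unmatched lists the pincodes that
    were not found in the directory.
    """
    counts = Counter()
    unmatched = []
    for pin in pincodes:
        district = pin_map.get(pin)
        if district is None:
            unmatched.append(pin)  # set aside for manual checking
        else:
            counts[district] += 1
    rates = {d: counts[d] * 100_000 / population[d] for d in counts}
    return rates, unmatched


rates, unmatched = companies_per_100k(
    company_pincodes, pincode_to_district, district_population
)
```

Pincodes that fail the lookup are collected rather than dropped, so they can be checked by hand afterwards.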
We faced a few problems though:
(a) The number of districts in the pincode directory and the Census didn't match. The Census has 640 districts, while the pincode directory has 631. For instance, Tiruppur is a district in the Census, but not in the pincode directory.
(b) District names in the two data sources didn't always match. Here's one example: Ahmedabad in the pincode directory, and Ahmadabad in the Census. We faced such issues in at least 100 districts.
(c) Many of the pincodes given by the companies didn't exist in the pincode directory at all. We looked at each such address for other details, like the district and area name, to narrow it down to a pincode that is in the directory.
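Spelling mismatches like the one in (b) can be flagged automatically before a manual check. The sketch below uses approximate string matching from Python's standard `difflib` module. This is an assumed technique chosen for illustration; the article does not say how the names were actually reconciled. The helper `match_district` and the 0.8 similarity cutoff are hypothetical.

```python
from difflib import get_close_matches

# District names as spelt in the Census (short illustrative list).
census_districts = ["Ahmadabad", "Tiruppur", "Mumbai"]


def match_district(name, candidates, cutoff=0.8):
    """Return the candidate closest to `name`, or None if nothing is
    similar enough. The comparison ignores case; the cutoff of 0.8 is an
    assumption and would need tuning on real data.
    """
    by_lower = {c.lower(): c for c in candidates}
    hits = get_close_matches(name.lower(), list(by_lower), n=1, cutoff=cutoff)
    return by_lower[hits[0]] if hits else None


# Pincode-directory spelling mapped to the Census spelling.
matched = match_district("Ahmedabad", census_districts)

# A name with no close counterpart is left for manual review.
not_found = match_district("Xyz", census_districts)
```

A match found this way is only a suggestion: distinct districts with similar names can score highly, so each suggested pair still needs a human check.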
Even after working through these three problems, we still had around 500 out of 132,000 companies (roughly 0.4%) whose address couldn't be matched with a Census district. These 500-odd firms accounted for less than 1.5% of the total initial investment made by all the companies.
(Analysis done by John Samuel Raja and Avinash Celestine at How India Lives. Project coordinated by NS Ramnath)