SOI Tax Stats - Corporation Source Book: Data File
A new format of the Corporation Source Book data is available beginning with 2004. It lets users download the entire Source Book as a single file, in a ready-to-use format that most statistical software packages can read. The files are described below.
Description of files contained in the compressed file:
Beginning with Year 2008, the compressed file contains six files: five Comma Separated Value (.csv) files that contain the source data available in the individual Excel files, and one Excel file that contains the documentation. An extraction utility capable of decompressing .zip files is required; if one is unavailable, please contact Statistical Information Services for help. Files 08sb1.csv and 08sb2.csv contain the aggregated data for all returns (‘with net income’ and ‘with and without net income’) and all sector, major, and minor industry codes (tables 1 and 2 of the published data). Files 08sb3.csv, 08sb4.csv and 08sb5.csv contain the aggregated data for all 1120S returns (tables 3, 4 and 5) and all sector codes. The sixth file, the Excel documentation file, contains variable definitions, NAICS industry code titles, asset class definitions, special data indicator definitions, and changes to variables from the previous year.
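If no separate extraction utility is at hand, Python's standard `zipfile` module can open the archive. The sketch below groups the .csv members by the returns they cover and extracts only the .csv files. It assumes that the 2008 naming pattern (a two-digit year followed by `sb1` through `sb5`) carries over to later years; the function names are hypothetical, and the archive path is whatever file you downloaded.

```python
import re
import zipfile

# ASSUMPTION: naming pattern generalized from the 2008 files (08sb1.csv ... 08sb5.csv):
# a two-digit year, "sb", then a file number from 1 to 5.
SB_PATTERN = re.compile(r"(?:^|/)(\d{2})sb([1-5])\.csv$", re.IGNORECASE)


def group_source_book_csvs(archive):
    """Group the members of a Source Book archive (2008-and-later layout).

    `archive` may be a filesystem path or a binary file-like object.
    Files 1-2: all returns (tables 1 and 2).
    Files 3-5: 1120S returns (tables 3, 4 and 5).
    """
    groups = {"all_returns": [], "1120s_returns": [], "other": []}
    with zipfile.ZipFile(archive) as zf:
        for name in sorted(zf.namelist()):
            match = SB_PATTERN.search(name)
            if match is None:
                # Not a Source Book .csv file, e.g. the Excel documentation file.
                groups["other"].append(name)
            elif int(match.group(2)) <= 2:
                groups["all_returns"].append(name)
            else:
                groups["1120s_returns"].append(name)
    return groups


def extract_csvs(archive, dest_dir):
    """Extract only the .csv members into dest_dir and return their names."""
    with zipfile.ZipFile(archive) as zf:
        members = [n for n in zf.namelist() if n.lower().endswith(".csv")]
        zf.extractall(dest_dir, members=members)
    return members
```

For example, `group_source_book_csvs("downloaded_archive.zip")` (a hypothetical file name) returns the grouped member names without writing anything to disk, so you can check what the archive holds before calling `extract_csvs`.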
Years 2007 and earlier: the compressed file contains three files. Two Comma Separated Value (.csv) files contain the source data available in the individual Excel files. The first .csv file contains the aggregated data for all returns (with net income as well as with and without net income) and all sector, major, and minor industry codes (tables 1 and 2 of the published data). The second .csv file contains the aggregated data for all 1120S returns (tables 3, 4 and 5) and all sector codes. The third file is an Excel file with documentation on variable definitions, NAICS industry code titles, asset class definitions, special data indicator definitions, and changes to variables from the previous year.
Apart from the variable names in the first row, the CSV (*.csv) files contain only numeric data. Columns are separated by commas, and each row ends in a carriage return. The first observation (row) contains the variable names. Each data variable (column) is preceded by an indicator variable with the same name plus the suffix "_IND". (Note: the variables identifying the year, table number, industry code, and asset class have no indicator variable.) The indicator value describes the statistical reliability of the associated variable's data, whether that value was rounded to zero, or whether it was changed to prevent disclosure. The indicator values are described in the documentation file.
For additional information and explanatory notes about the limitations of the data, please check HERE.
Page Last Reviewed or Updated: January 23, 2012