SOI Tax Stats - Corporation Source Book: Data File
A new format of the Corporation Source Book data is available beginning with the 2004 data. It lets users download the entire Source Book as a single file, in a ready-to-use format readable by most statistical software packages. Please read the information below for a description of these files.
Download the compressed file
Description of files contained in the compressed file:
Each compressed file contains six files: five Comma Separated Value (.csv) files holding the source data available in the individual Excel files, and one Excel (.xls) file holding the documentation.
Files sb1.csv and sb2.csv contain the aggregated data for all returns (‘with net income’ and ‘with and without net income’) and all sector, major, and minor industry codes; these correspond to tables 1 and 2 of the published data. Files sb3.csv, sb4.csv and sb5.csv contain the aggregated data for all 1120S returns and all sector codes (tables 3, 4 and 5).
The sixth file, the Excel file, documents variable definitions, NAICS industry code titles, asset class definitions, special data indicator definitions, and changes to variables from the previous year.
Please note: an extraction utility capable of decompressing .zip files is required. If an extraction utility is unavailable, you may download a WinZip reader or contact the Statistical Information Services for help.
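For users who prefer to read the data without a separate extraction utility, the following is a minimal Python sketch that opens a compressed file and loads whichever of the five .csv files it finds. The function name read_source_book is hypothetical, the text encoding (UTF-8) is an assumption not stated on this page, and the name of the compressed file is not given here, so the caller must supply it.

```python
import csv
import io
import zipfile

# Member names as described above. The archive name is not given in the text.
CSV_MEMBERS = ["sb1.csv", "sb2.csv", "sb3.csv", "sb4.csv", "sb5.csv"]


def read_source_book(archive):
    """Return {member name: list of row dicts} for each CSV found in the archive.

    `archive` may be a path or a file-like object holding a .zip file.
    (Hypothetical helper; not part of any IRS tooling.)
    """
    tables = {}
    with zipfile.ZipFile(archive) as zf:
        # Index members by lowercase base name, in case the archive stores
        # the files in a subfolder or with different letter case.
        members = {name.rsplit("/", 1)[-1].lower(): name for name in zf.namelist()}
        for wanted in CSV_MEMBERS:
            if wanted not in members:
                continue  # a given archive may not match the description exactly
            with zf.open(members[wanted]) as raw:
                # Assumed encoding: UTF-8 (utf-8-sig also tolerates a leading BOM).
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                # The first row holds variable names, so DictReader keys rows by them.
                tables[wanted] = list(csv.DictReader(text))
    return tables
```

Every value is returned as a string. Converting values to numbers is left to the caller, who can consult the documentation file for the meaning of each variable.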
The CSV (.csv) files contain only numeric data, apart from the first observation (row), which contains the variable names. Columns of data are separated by commas, and each row of data ends in a carriage return. Each variable (column) is preceded by an indicator variable sharing the same name as the data variable but suffixed with "_IND." (Note: variables identifying the year, table number, industry code, and asset class do not have an indicator variable.) The value of the indicator describes the statistical reliability of the data in its associated variable, whether the variable's data value was rounded to zero, or whether the value was changed to prevent disclosure. The indicator values are described in the documentation file.
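The pairing of data variables with their indicator variables can be sketched in Python as follows. This is an illustration under assumptions: the suffix is taken to be "_IND" (treating the period in the text as sentence punctuation), matching is case-sensitive, and pair_indicators is a hypothetical helper name. The actual variable names and indicator values are defined in the documentation file.

```python
IND_SUFFIX = "_IND"  # assumed suffix; the trailing period is taken as punctuation


def pair_indicators(fieldnames):
    """Map each data column to its indicator column, or None if it has none.

    `fieldnames` is the list of variable names from the first row of a CSV.
    Indicator columns themselves are left out of the returned keys.
    """
    names = set(fieldnames)
    pairs = {}
    for name in fieldnames:
        # Skip a column that is the indicator of another column in the file.
        if name.endswith(IND_SUFFIX) and name[: -len(IND_SUFFIX)] in names:
            continue
        indicator = name + IND_SUFFIX
        # Identifier variables (year, table number, industry code, asset
        # class) have no indicator, so they map to None.
        pairs[name] = indicator if indicator in names else None
    return pairs
```

With the header row of a file passed in (for example, the fieldnames attribute of a csv.DictReader), the result shows which columns carry an indicator that should be checked before the data value is used.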