SOI Tax Stats - Papers - 2007 American Statistical Association Conference


This section contains selected papers presented at the 2007 annual meetings of the American Statistical Association. The papers were written by members of the Statistics of Income Division of the IRS and others.

The views expressed here are those of the authors and are not necessarily the official positions of the Internal Revenue Service.

All papers are available as PDF files. A free Adobe Acrobat Reader is available for download, if needed.


Measuring Disclosure Risk and an Examination of the Possibilities of Using Synthetic Data in the Individual Income Tax Return Public Use File (PDF)
Sonya Vartivarian, John L. Czajka, and Michael Weber
Each year, the Statistics of Income (SOI) Division of the Internal Revenue Service (IRS) draws a sample of individual and sole proprietorship tax returns, abstracts and edits a large number of data items, and prepares a microdatabase that the Treasury Department and the Congress use for tax policy analysis.

Measuring the Quality of Service to Taxpayers in Volunteer Sites (PDF)
Kevin Cecco, Ronald Walsh, and Rachael Hooker
There are nearly 12,000 locations nationwide where low-income, elderly, and military taxpayers can receive assistance in satisfying their tax responsibilities from non-Internal Revenue Service (IRS) volunteers. The Stakeholder Partnerships, Education and Communication (SPEC) organization of the IRS is responsible for building and maintaining partnerships with the stakeholders in local communities who oversee these locations. A key to maintaining these relationships is SPEC's ability to measure the overall quality of service provided by the volunteers who staff these locations. The Statistics of Income (SOI) Division of the IRS provides general statistical consulting services to various internal IRS customers. This paper will detail SOI's attempt to find a viable solution to SPEC's quality needs while balancing logistical issues, available resources, and customer expectations.

Evaluating Alternative One-Sided Coverage Intervals for an Extreme Binomial Proportion (PDF)
Yan K. Liu and Phillip S. Kott
The interval estimation of a binomial proportion is difficult, especially when the proportion is extreme (very small or very large) compared to the sample size. Most of the methods proposed in the literature implicitly assume simple random sampling. These interval-estimation methods are not immediately applicable to data derived from a complex sample design. Some recent papers have addressed this problem, proposing modifications for complex samples. Matters are further complicated when a one-sided coverage interval is desired. This paper provides an extensive review of methods for constructing coverage intervals of a binomial proportion under both simple random and complex sample designs. It also evaluates the empirical performances of different methods for constructing one-sided coverage intervals for an extreme proportion under stratified simple random sampling.
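
To make the one-sided interval problem concrete, the following is a minimal Python sketch of two textbook upper bounds for a binomial proportion under simple random sampling: the exact (Clopper-Pearson) bound and the Wilson score bound. It is an illustration only, not the authors' procedure; the abstract does not say which methods the paper evaluates, and the function names and the example counts are assumptions. Neither function adjusts for a complex sample design.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def clopper_pearson_upper(x, n, alpha=0.05):
    """Exact one-sided upper bound: the p at which P(X <= x | p) = alpha.

    Assumes simple random sampling. Solved by bisection, since the
    binomial CDF is decreasing in p.
    """
    if x >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if binom_cdf(x, n, mid) > alpha:
            lo = mid  # bound lies above mid
        else:
            hi = mid  # bound lies at or below mid
    return (lo + hi) / 2


def wilson_upper(x, n, alpha=0.05):
    """One-sided Wilson score upper bound under simple random sampling."""
    z = NormalDist().inv_cdf(1 - alpha)
    p_hat = x / n
    centre = p_hat + z * z / (2 * n)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return (centre + half_width) / (1 + z * z / n)


if __name__ == "__main__":
    # Hypothetical extreme case: 1 event observed in 500 sampled units.
    print(clopper_pearson_upper(1, 500))
    print(wilson_upper(1, 500))
```

With zero observed events, the exact bound reduces to 1 - alpha^(1/n) and the Wilson bound to z^2 / (n + z^2), which gives a quick check on both functions.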

SOI Develops Better Survey Questions Through Pretesting (PDF)
Diane Milleville and Tara Wells
Recently, the Statistics of Income (SOI) Division of the Internal Revenue Service helped a customer develop a survey. Without any prior knowledge of the survey's topic, SOI found it difficult to write "good" survey questions. Through the use of pretesting, SOI gradually became more familiar with the topic, determined how to phrase the questions, and understood which questions to include in the survey. Using cognitive interviewing, along with an Intranet application, SOI was able to obtain feedback from a small subset of the survey population. The survey content evolved into a set of well-developed questions that were easily understood by the participants. With continued research on survey question development, SOI will benefit from pretesting in future survey projects.

Using the Statistics of Income Division's Sample Data To Reduce Measurement and Processing Error in Small-Area Estimates Produced from Administrative Tax Records (PDF)
Kimberly Henry, Partha Lahiri, and Robin Fisher
The large Individual Master File constructed by the Internal Revenue Service (IRS) has been used in the past to produce various income-related statistics for small geographic areas. Previous research using the Statistics of Income Division's (SOI) Form 1040 sample, a large national sample of cleaned administrative tax records, suggests the IRS data are subject to various measurement and processing errors. Thus, small-area estimates based on the IRS data, though free from the sampling error problem typical in small-area estimation, are subject to various nonsampling errors. The SOI sample can potentially be used to reduce nonsampling errors in the IRS-based small-area estimates. We propose an empirical best prediction (EBP) method to improve the IRS-based small-area estimates by exploiting the complementary strengths of the IRS and SOI data.

An Empirical Evaluation of Various Direct, Synthetic, and Traditional Composite Small-Area Estimators (PDF)
Kimberly Henry, Michael Strudler, and William Chen
Currently, the Statistics of Income (SOI) Division of the Internal Revenue Service uses the Individual Master File (IMF), administrative data for the population of Form 1040 tax returns, to produce totals of various tax return variables at the state level. Previous research based on SOI's Form 1040 sample, a large national sample of cleaned administrative tax records, suggests that the IRS data are subject to various kinds of errors, which has led to alternative approaches. This paper compares alternative direct, synthetic, and traditional composite estimators of state-level totals for several variables and evaluates the alternatives using various robust criteria.
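
As a rough illustration of what a composite estimator does, the Python sketch below combines a direct (sample-based) estimate with a synthetic (model-based) estimate using a weight on the direct estimate. The weight shown is the standard textbook choice that minimizes mean squared error when the two estimators' errors are uncorrelated; it is not necessarily the weighting used in the paper. The function name, argument names, and the numbers in the example are all hypothetical.

```python
def composite_estimate(direct, synthetic, var_direct, mse_synthetic):
    """Weighted combination of a direct and a synthetic small-area estimate.

    Assumes the direct estimate is unbiased with variance var_direct and
    that the errors of the two estimates are uncorrelated. Under those
    assumptions, the MSE-minimizing weight on the direct estimate is
    mse_synthetic / (var_direct + mse_synthetic).
    """
    total = var_direct + mse_synthetic
    if total <= 0:
        raise ValueError("var_direct + mse_synthetic must be positive")
    weight = mse_synthetic / total
    return weight * direct + (1 - weight) * synthetic, weight


if __name__ == "__main__":
    # Hypothetical state-level total: a noisy direct estimate and a
    # more stable but possibly biased synthetic estimate.
    estimate, weight = composite_estimate(
        direct=100.0, synthetic=80.0, var_direct=25.0, mse_synthetic=75.0
    )
    print(estimate, weight)
```

The more precise the direct estimate is relative to the synthetic one, the closer the weight moves to 1, so states with large samples lean on their own data while states with small samples borrow more from the synthetic estimate.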
