This section contains selected abstracts from papers to be given at the annual meetings of the American Statistical Association, August 3–7, 2008. These abstracts, and the papers to follow, were written by members of the Statistics of Income Division of the IRS and others.
The views expressed here are those of the authors and are not necessarily the official positions of the Internal Revenue Service.
Variance Estimation for an Estimator of Between-Year Change in Totals from Two Stratified Bernoulli Samples
Kimberly A. Henry and Valerie L. Testa and Richard Valliant
This paper provides the theoretical framework for estimating the variance of the difference in two years' totals estimated under the stratified Bernoulli sample design. We provide a design-unbiased estimator that accounts for two practical problems: a large overlap of sample units between two years' samples and "stratum jumpers," which are population and sample units that shift across strata from one year to another. Both problems affect estimating the covariance term in the variance of the difference. The estimator is applied to data from the Statistics of Income Division's individual tax return sample. Naïve variance estimates using only the separate years' variances are compared to show the effect of ignoring the estimated covariance.
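The role of the covariance term can be illustrated with a minimal sketch. It is not the authors' estimator. It assumes the textbook Horvitz-Thompson total and variance estimator for a single Bernoulli sample, and it takes the covariance estimate as a given input, since the abstract does not say how overlap and stratum jumpers enter it. All function names and numbers are hypothetical.

```python
def bernoulli_total_and_variance(sample):
    """Horvitz-Thompson total and its textbook variance estimate.

    `sample` is a list of (y, pi) pairs: the observed value and the
    selection probability of the unit's stratum. Units are assumed to be
    selected independently (Bernoulli sampling).
    """
    total = sum(y / pi for y, pi in sample)
    variance = sum((1.0 - pi) / pi**2 * y**2 for y, pi in sample)
    return total, variance


def variance_of_change(var_year1, var_year2, cov_years):
    """Var(T2 - T1) = Var(T1) + Var(T2) - 2 Cov(T1, T2)."""
    return var_year1 + var_year2 - 2.0 * cov_years


def naive_variance_of_change(var_year1, var_year2):
    """Treats the two years' samples as independent (covariance ignored)."""
    return var_year1 + var_year2


# Hypothetical one-year sample: (value, selection probability).
example_total, example_variance = bernoulli_total_and_variance(
    [(100.0, 0.5), (200.0, 0.5), (50.0, 0.25)]
)

# Hypothetical variance and covariance estimates for two overlapping years.
var_1, var_2, cov_12 = 4.0e6, 5.0e6, 3.0e6
full = variance_of_change(var_1, var_2, cov_12)
naive = naive_variance_of_change(var_1, var_2)
overstatement = naive / full
```

With a large positive covariance, as one would expect when most sample units appear in both years, the naive figure in this made-up example is three times the covariance-adjusted one.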
Defining Business Data Needs
A national tax agency's data can be leveraged for statistical research through proper identification, capture, processing, and integration. Access and budgetary constraints can be transcended with a partnership of analysts. Important questions include what fosters entrepreneurship; why the nonprofit sector is increasingly significant; the role of outsourcing and independent contractors; and the effect of true effective tax rates on economic performance. Barriers to research include the following: public and private researchers are inadequately aware of the roles each needs to play; budgets and legal authorizations limit public analysts' access and data quality options; and private researchers often do not know or appreciate these constraints or the vast potential of the data, including its limitations. Proposed measures might serve the research community's needs outside the historical zero-sum game of access and barriers.
Methodological Limitations in Producing Subnational Tabulations of Unincorporated Business Activity That Partnerships and Sole Proprietorships Report on Returns
The Statistics of Income (SOI) Division generally compiles statistics based on stratified probability samples, using such classes as size of income, presence or absence of a specific form or schedule, and business activity. In addition, it evaluates these estimates by comparing them with information it collects from extracts of the population of filers. This paper addresses the methodological limitations associated with producing state- and county-level tabulations of unincorporated business activity from the information on a Form 1065 or a Form 1040, Schedule C. It presents trends and comparisons among states, counties, and business activities across five tax years, 2001 to 2005. Discussion includes the methodology for assigning entities to states and counties, as well as comparisons at the national level between these data and SOI estimates drawn from samples.
Statistics from Individual Income Tax Returns: Populations, Samples, and Processing of Individual Income Tax Returns at Statistics of Income
Michael E. Weber and David P. Paris and Peter J. Sailer
Statistics from Individual Income Tax Returns have been produced by the Statistics of Income Division since 1917. This paper discusses the statistics generated from the yearly filing of individual income tax returns from the early 1960s to the present. It traces changes made to the yearly sampling plan, from a single cross section used for national estimates to the current configuration, which includes a cross section for national estimates, a Continuous Work History Sample panel, multiple stratified high-income cohort panels, an expanded cross-section sample for state-level estimates, and a nonfiler sample. The paper also explains how these samples are processed from raw administrative data into perfected data, and it provides definitions of SOI terminology and a description of the various products derived from these samples.
SOI/IRS Sales of Capital Assets Sample Redesign for Tax Year 2007
Yan K. Liu and Michael Strudler and Jana Scali and Janette Wilson
The Statistics of Income (SOI) Division of the IRS developed a stratified sample of individual returns to study Form 1040 Sales of Capital Assets (SOCA) in Tax Year 1999. It was a cross-sectional sample from the population of all individual returns for Tax Year 1999. Because of the high processing cost of SOCA returns, no other SOCA cross-sectional sample has been drawn since. However, the 1999 SOCA cross-sectional sample is outdated, as there have been many economic changes that affect capital gains. Therefore, it was decided to start a new SOCA cross-sectional sample for Tax Year 2007. This paper discusses how the new sample is designed, including determining stratum boundaries, allocating sample sizes to strata, and balancing cost and precision. Neyman allocation is used, and the cost and variance estimates it requires are obtained from related SOI data sources.
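For readers unfamiliar with the allocation step, the following is a minimal sketch of textbook Neyman allocation, in which stratum h receives a share of the sample proportional to N_h * S_h. The optional cost argument gives the standard cost-weighted optimum allocation (proportional to N_h * S_h / sqrt(c_h)), which reduces to Neyman allocation when all unit costs are equal. The function name, stratum sizes, standard deviations, and costs are hypothetical and are not taken from the SOCA design.

```python
import math


def allocate_sample(total_n, stratum_sizes, stratum_sds, unit_costs=None):
    """Allocate `total_n` sample units across strata.

    With unit_costs=None this is Neyman allocation:
        n_h = n * N_h * S_h / sum_k(N_k * S_k)
    With unit costs c_h, each weight is divided by sqrt(c_h).
    Results are not rounded and are not capped at N_h; a real design
    would need both steps.
    """
    if unit_costs is None:
        unit_costs = [1.0] * len(stratum_sizes)
    weights = [
        size * sd / math.sqrt(cost)
        for size, sd, cost in zip(stratum_sizes, stratum_sds, unit_costs)
    ]
    weight_sum = sum(weights)
    return [total_n * w / weight_sum for w in weights]


# Hypothetical strata: population counts and standard deviations.
sizes = [1000, 500, 100]
sds = [10.0, 40.0, 200.0]

neyman = allocate_sample(500, sizes, sds)
# Hypothetical per-return processing costs, higher in the upper strata.
cost_weighted = allocate_sample(500, sizes, sds, unit_costs=[1.0, 4.0, 16.0])
```

In this made-up example the small, highly variable top stratum receives as many units as the middle stratum under Neyman allocation, and fewer once its higher processing cost is taken into account.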
Attrition in the Tax Years 1999–2005 Individual Income Tax Return Panel
Policy research increasingly relies on panel data, but attrition can undermine the validity of many policy analyses and distort their results. Using the Individual Income Tax Return Panel for Tax Years 1999–2005, this paper examines panel attrition. It tests the randomness of attrition and evaluates the implications of the results. It assesses the observed attrition rate to determine how predictable it is over time. Finally, it presents several tabular representation problems that hinder analysis.
Coevolution of Multivariate Optimal Allocations and Stratum Boundaries
Coevolution in evolutionary algorithms allows solutions to two interdependent optimization problems to be determined simultaneously, similar to the evolution of symbiotic species in nature. There are many methods for determining multivariate optimal allocations in stratified sampling, including the use of an evolutionary algorithm (Day, 2006). There are also widely accepted methods for determining optimal stratum boundaries. This work presents a method for simultaneous determination of multivariate optimal allocations and stratum boundaries using the concept of coevolution in evolutionary algorithms.