Florida Institute of Technology - College of Business
ABTA Analytics Dashboard

Technical Limitations of Data

Data users should carefully consider the quality of the information they are using. All data are subject to errors.

Errors in the data include sampling errors and nonsampling errors. Sampling error affects items collected from a sample of the population rather than from the entire population - whether that population consists of students, patients, or lane miles. Data based on samples are estimates that would differ somewhat from data based on a complete enumeration of all members of the population. For example, data submitted to meet federal requirements, such as those of the Department of Transportation's Federal Highway Administration, often include results based on analysis of only a portion of the entire population: not all bridges may be assessed, nor all miles of road tested for condition, in a given year.
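
To illustrate the size of sampling error, the short Python sketch below compares a sample-based estimate with a full enumeration; the population values, sample size, and condition scores are invented for illustration.

    import random

    # Hypothetical population: condition scores for every lane mile in a state.
    random.seed(42)
    population = [random.gauss(70, 10) for _ in range(10_000)]

    # Only a portion of the population is tested in a given year.
    sample = random.sample(population, 200)

    true_mean = sum(population) / len(population)
    sample_mean = sum(sample) / len(sample)

    # The standard error of the sample mean approximates how far the
    # estimate is likely to fall from the full-enumeration value.
    sample_var = sum((x - sample_mean) ** 2 for x in sample) / (len(sample) - 1)
    std_error = (sample_var / len(sample)) ** 0.5

    print(f"Full enumeration mean: {true_mean:.2f}")
    print(f"Sample estimate:       {sample_mean:.2f} (standard error ~{std_error:.2f})")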

Nonsampling errors occur in the collection and processing of data and are often difficult to measure and identify. They may be random or may run in a consistent direction that biases the data. Nonsampling errors arise when data elements are missed or counted erroneously (e.g., counted more than once), or result from nonresponse and processing errors. While every effort is made to keep these errors to a minimum, some mistakes and inconsistencies in official reporting, or in the handling of particular items, escape detection.
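
Some of these errors can be caught with basic processing checks before they bias totals. The minimal sketch below (record ids and fields are hypothetical) flags duplicate records and nonresponse in a submitted batch:

    # Hypothetical batch of submitted bridge records.
    records = [
        {"id": "BR-001", "condition": 82},
        {"id": "BR-002", "condition": None},  # nonresponse: no value reported
        {"id": "BR-003", "condition": 77},
        {"id": "BR-001", "condition": 82},    # duplicate: counted more than once
    ]

    seen, duplicates, missing = set(), [], []
    for rec in records:
        if rec["id"] in seen:
            duplicates.append(rec["id"])
        seen.add(rec["id"])
        if rec["condition"] is None:
            missing.append(rec["id"])

    print("Duplicate ids:", duplicates)  # ['BR-001']
    print("Missing values:", missing)    # ['BR-002']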

Each state department must determine the level of effort it puts into responding to the various surveys and data requests made by the federal government, national associations, professional organizations, and others. Data are collected, stored, processed, analyzed, and formatted to meet the needs of different organizations, and each step of that process is an opportunity for errors that affect the accuracy of the data.

In addition to being subject to errors, the data may reflect inconsistent policies or assumptions in both reporting and processing. For example, the Census Bureau collects data on the finances of municipalities and townships. In the 2002 census, nearly 18% of municipalities and townships did not respond, and the missing data were filled in by substituting the most recently reported data, adjusted by growth rates calculated from similar municipalities and townships. Other government agencies, such as the Federal Highway Administration, may substitute prior-year data without making any adjustments.
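
The difference between these two substitution policies is easiest to see side by side. A minimal sketch follows; the reported value, peer growth rates, and agency labels are invented for illustration.

    # Most recently reported finances for a nonrespondent (hypothetical).
    prior_year_value = 1_200_000

    # Census-style imputation: carry the last report forward, adjusted by a
    # growth rate derived from similar municipalities that did respond.
    peer_growth_rates = [0.031, 0.028, 0.035]
    avg_growth = sum(peer_growth_rates) / len(peer_growth_rates)
    growth_adjusted = prior_year_value * (1 + avg_growth)

    # FHWA-style imputation: substitute the prior-year value unchanged.
    carry_forward = prior_year_value

    print(f"Growth-adjusted imputation: {growth_adjusted:,.0f}")  # 1,237,600
    print(f"Unadjusted carry-forward:   {carry_forward:,.0f}")    # 1,200,000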

Developing comparable data that are consistent across all states requires a comprehensive data collection methodology and significant support in the form of data definitions, data templates, survey testing, training, availability of subject matter experts, follow-up, and timely analysis of the data.

Users should also be aware of the policies and assumptions that determine how data are handled. While these are typically not errors in themselves, their application may produce unexpected results. For example, USDOT wants to understand transportation costs regardless of which organizational unit within a state bears them, so it collects costs by transportation category. Different states, however, may spread their transportation costs across multiple departments: highway patrol and safety, for instance, are not always under the Department of Transportation's jurisdiction, which skews cost comparisons drawn along organizational boundaries rather than by function.
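
Rolling costs up by function rather than by the department that happens to bear them makes such comparisons possible. In the hypothetical sketch below, the department names, functions, and amounts are invented for illustration.

    # Cost records as (department, function, amount in $ millions).
    state_a = [
        ("Dept of Transportation", "highway maintenance", 450),
        ("Dept of Transportation", "highway patrol", 120),
    ]
    state_b = [
        ("Dept of Transportation", "highway maintenance", 430),
        ("Dept of Public Safety", "highway patrol", 115),  # same function, different department
    ]

    def total_by_function(costs):
        totals = {}
        for _dept, function, amount in costs:
            totals[function] = totals.get(function, 0) + amount
        return totals

    print("State A:", total_by_function(state_a))
    print("State B:", total_by_function(state_b))  # comparable despite organizational differences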

In summary, data users should be aware of the errors, assumptions, and policies to which the data are subject. Users should review the data to make sure it makes sense historically. While reporting methodologies include procedures to standardize data and to minimize errors, it is impossible to avoid data problems entirely; some data processing procedures themselves, such as clerical checking, computer editing, and imputation, introduce error into the data. Knowledge of the types of errors, the extent of errors that may be present, and the policies and assumptions adopted in handling the data contributes to a more meaningful understanding of the results.
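
One practical way to review data historically is a simple year-over-year reasonableness check. The sketch below flags values that change by more than a chosen threshold; the series and the 25% threshold are invented for illustration.

    # Hypothetical annual series; the 2021 value looks suspect, so both the
    # jump into 2021 and the drop back out of it will be flagged.
    series = {2019: 980, 2020: 1010, 2021: 2150, 2022: 1060}

    threshold = 0.25  # flag year-over-year changes greater than 25%
    years = sorted(series)
    for prev, curr in zip(years, years[1:]):
        change = (series[curr] - series[prev]) / series[prev]
        if abs(change) > threshold:
            print(f"{curr}: {change:+.0%} vs {prev} - review before use")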