Environmental projects require data collection and analysis, so data quality and integrity are critical components of our work. TRC teams often discuss how much field data are needed to make remedial management decisions for a project. Several variables, some within your control and some not, affect this decision:
- You may inherit a project with an existing dataset and aren’t sure if you need to collect additional data, or
- You may be starting a new project and need to collect data to characterize site conditions, but don’t want to exceed your budget in the process.
Collecting and evaluating data of high quality and integrity takes extensive planning, multiple mobilized personnel, training, and long days in the field. Given the effort that goes into data collection and interpretation, it is important that our investigations produce a technically robust, interpretable dataset that is large enough to be evaluated statistically.
Accomplishing Statistical Goals Using EPA’s ProUCL 5.2 Software
One goal of statistics is to draw meaningful inferences about a population from limited sample data taken from that population. Although limited resources and budgets usually make it impossible to sample an entire population, your dataset must still be large enough to yield technically defensible, robust statistics.
Statistical software, such as the United States Environmental Protection Agency’s (USEPA’s) ProUCL 5.2, can be used to interpret environmental data. A common problem users face is having too few data to calculate meaningful statistics and estimates. To avoid this issue, ProUCL’s User’s Guide suggests that “at least 10 observations should be collected to compute UCLs and various other limits.”
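As a rough illustration of why this minimum matters, the following Python sketch computes one of the UCLs that ProUCL reports for normally distributed data, the Student’s t 95% UCL of the mean, UCL = x̄ + t(0.95, n−1) · s/√n, and refuses to run with fewer than 10 observations. The function names are hypothetical, and the t critical value comes from a Cornish-Fisher approximation (accurate to about three decimals for 9 or more degrees of freedom) rather than an exact quantile. ProUCL itself selects among many UCL methods based on the data distribution and the presence of non-detects; this sketch does not attempt that.

```python
import math
from statistics import NormalDist, mean, stdev

# Mirrors the ProUCL User's Guide recommendation of at least 10 observations.
MIN_OBSERVATIONS = 10


def t_critical_95(df: int) -> float:
    """Approximate one-sided 95% Student's t quantile (hypothetical helper).

    Uses a Cornish-Fisher expansion around the standard normal quantile.
    This is an approximation, accurate to roughly three decimals for df >= 9.
    """
    z = NormalDist().inv_cdf(0.95)
    v = float(df)
    return (
        z
        + (z**3 + z) / (4 * v)
        + (5 * z**5 + 16 * z**3 + 3 * z) / (96 * v**2)
        + (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / (384 * v**3)
    )


def student_t_ucl95(values: list[float]) -> float:
    """Hypothetical helper: 95% Student's t UCL of the mean for detected data."""
    n = len(values)
    if n < MIN_OBSERVATIONS:
        raise ValueError(
            f"only {n} observations; at least {MIN_OBSERVATIONS} are recommended"
        )
    # UCL = sample mean + t(0.95, n-1) * sample standard deviation / sqrt(n)
    return mean(values) + t_critical_95(n - 1) * stdev(values) / math.sqrt(n)
```

For example, calling `student_t_ucl95` on ten detected concentrations returns a single UCL value, while a list of nine raises a `ValueError`. Because both the t multiplier and the s/√n term shrink as n grows, smaller datasets produce UCLs that sit farther above the mean and are less useful for decision making.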
Why A Sample Size of 10?
A sample size of 10 or more gives greater confidence in the statistical results. When sample sizes are small, errors can occur when calculating statistical limits such as upper tolerance limits (UTLs) and upper confidence limits (UCLs). Errors may also occur when a large proportion of the data points are non-detects. An example error message is shown in the following image:
With a sample size of 10 or more, you can be more confident that your sampling team’s work has produced a dataset that supports quality decisions and recommendations for your site.
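Small sample sizes and a high proportion of non-detects can both be flagged before the data ever reach ProUCL. The Python sketch below assumes each result is stored with a detected/non-detect flag. The `Sample` and `screen_dataset` names are hypothetical, the 10-observation default follows the User’s Guide recommendation, and the 50% non-detect threshold is an illustrative assumption, not a ProUCL rule; set it to whatever your project’s data quality objectives require.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One analytical result; `detected` is False for a non-detect (ND)."""

    value: float  # detected concentration, or the reporting limit for an ND
    detected: bool


def screen_dataset(
    samples: list[Sample],
    min_observations: int = 10,  # ProUCL User's Guide recommendation
    max_nondetect_fraction: float = 0.5,  # illustrative assumption only
) -> list[str]:
    """Hypothetical pre-check to run before exporting data to ProUCL.

    Returns human-readable warnings; an empty list means nothing was flagged.
    """
    warnings = []
    n = len(samples)
    if n < min_observations:
        warnings.append(
            f"{n} observations; at least {min_observations} are recommended"
        )
    if n:
        # Fraction of results flagged as non-detect.
        nd_fraction = sum(not s.detected for s in samples) / n
        if nd_fraction > max_nondetect_fraction:
            warnings.append(
                f"{nd_fraction:.0%} non-detects exceeds "
                f"{max_nondetect_fraction:.0%} threshold"
            )
    return warnings
```

Returning a list of warnings rather than raising on the first problem lets a reviewer see every issue with a dataset at once, which helps when deciding whether additional sampling is needed.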
Resources:
For more information, see the ProUCL Version 5.2.0 User Guide: Statistical Software for Environmental Applications for Data Sets With and Without Nondetect Observations, published by USEPA in 2022 (USEPA, 2022). This document provides further guidance and best practices for using the ProUCL 5.2 software.
TRC’s Experts Can Help
Understanding your project’s data requirements, especially for calculating statistics, can be an overwhelming task. TRC’s Risk Assessment Team and data specialists like Patti Meadows can help you overcome these challenges.