The statistical analyses performed by this tool have been developed by the Database Team to provide screening-level summaries of BMP performance based on water quality concentrations. Analyses based on loads and/or volumes are not included in this tool, but are important considerations in evaluating the performance of BMP types that provide significant volume reduction.

To minimize errors and maximize computational efficiency, the tool uses a set of statistical methods that can be implemented efficiently for a broad range of influent and effluent data sets with varying size and complexity. In addition, the data sets have been initially screened by the Database Team for appropriateness for this type of analysis. The statistical summaries provided here may, in some cases, differ from BMP performance summaries resulting from more detailed analyses conducted by the Database Team.

The statistical summaries provided in this tool are intended to provide a general summary of BMP performance for the set of BMPs selected by the user. While reasonable effort has been made to generate representative summaries of BMP performance, use of these analysis results is solely at the risk and option of the user.

The statistical methods used in developing these summaries are described below.
A subset of the results of these methods has been verified against other statistical packages such as SciPy^{[1]} and R^{[2]}.

The median and interquartile range values are presented to provide a non-parametric description of the central tendency and spread of the data set. An advantage of non-parametric statistics is that they do not require assumptions about the distribution of the underlying data.
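As an illustration, the median and interquartile range can be computed with Python's standard library. This is a minimal sketch, not the tool's own code: the function name `median_and_iqr`, the sample values, and the "inclusive" quartile convention are all assumptions, and other quartile conventions give slightly different results for small data sets.

```python
import statistics

def median_and_iqr(values):
    """Return (median, first quartile, third quartile) of the values."""
    # method="inclusive" is an assumed quartile convention, not necessarily the tool's.
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return med, q1, q3

# Hypothetical influent concentrations, for illustration only.
influent = [12.0, 30.0, 18.0, 45.0, 22.0]
median, q1, q3 = median_and_iqr(influent)
iqr = q3 - q1  # interquartile range: spread of the middle half of the data
```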

The mean and standard deviation are also presented.
Simple substitution of one-half of the detection limit values has been used for non‐detects^{[3]}.
The percent of non‐detects in a given data set provides some insight into the potential bias introduced by this substitution.
The percentages of influent and effluent non-detect results should be reviewed before drawing conclusions regarding the validity of the statistics.
This is particularly the case for parameter groups such as dissolved metals where non-detect results are most prevalent.
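The substitution rule and the non-detect percentage can be sketched as follows. The data layout (a list of `(value, is_non_detect)` tuples with the detection limit stored as the value for non-detects) and the function names are hypothetical and chosen only for illustration.

```python
def substitute_non_detects(results):
    """Replace each non-detect with one-half of its detection limit.

    `results` is a list of (value, is_non_detect) tuples, where `value` holds
    the detection limit when `is_non_detect` is True (a hypothetical layout).
    """
    return [value / 2 if is_nd else value for value, is_nd in results]

def percent_non_detect(results):
    """Percentage of results flagged as non-detect."""
    return 100.0 * sum(1 for _, is_nd in results if is_nd) / len(results)

# Hypothetical effluent results: two detected values and two non-detects.
effluent = [(5.0, False), (2.0, True), (8.0, False), (2.0, True)]
substituted = substitute_non_detects(effluent)
nd_percent = percent_non_detect(effluent)
```

A high `nd_percent` means that the mean and standard deviation depend heavily on the substituted values rather than on measured concentrations.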

Results from the Mann-Whitney and Wilcoxon tests provide information about the statistical significance of the difference between the influent and effluent distributions. The Mann-Whitney test applies to independent data sets, whereas the Wilcoxon test applies to paired values. These tests are evaluated at the 0.05 and 0.10 significance levels with the null hypothesis that "the influent and effluent data are sampled from the same distribution." Welch's t-test (unequal variance) provides comparable information on the statistical significance of the difference between the influent and effluent mean concentrations. In each case, the null hypothesis may be rejected for p-values less than the indicated significance level.
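The three tests can be run with SciPy as sketched below. This is an illustrative example under stated assumptions rather than the tool's implementation: the wrapper name `compare_influent_effluent`, the two-sided alternative, and the sample concentrations are hypothetical. The Mann-Whitney and Welch tests receive all available samples, while the Wilcoxon test receives only the event-paired values.

```python
from scipy import stats

def compare_influent_effluent(influent, effluent, paired_influent, paired_effluent):
    """Return the p-value of each test for the given samples."""
    return {
        # Independent samples: uses every influent and effluent value.
        "mann_whitney": stats.mannwhitneyu(
            influent, effluent, alternative="two-sided"
        ).pvalue,
        # Paired samples: uses only events with both influent and effluent.
        "wilcoxon": stats.wilcoxon(paired_influent, paired_effluent).pvalue,
        # Welch's t-test: equal_var=False drops the equal-variance assumption.
        "welch_t": stats.ttest_ind(influent, effluent, equal_var=False).pvalue,
    }

# Hypothetical concentrations; the last influent value has no effluent pair.
paired_in = [40.0, 55.0, 62.0, 48.0, 71.0, 66.0, 53.0, 80.0]
paired_out = [12.0, 20.0, 25.0, 15.0, 30.0, 23.0, 19.0, 35.0]
all_in = paired_in + [58.0]
all_out = paired_out

p_values = compare_influent_effluent(all_in, all_out, paired_in, paired_out)
# Reject the null hypothesis where the p-value is below the significance level.
decisions = {
    name: {alpha: p < alpha for alpha in (0.05, 0.10)}
    for name, p in p_values.items()
}
```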

In some cases, the Mann-Whitney and Wilcoxon hypothesis test results produce conflicting conclusions regarding statistically significant differences. Such cases are more likely to occur where there are imbalances in the number of influent and effluent samples for a particular data set because the Mann-Whitney test operates on the entire data set whereas the Wilcoxon test only operates on data pairs. For BMPs with long residence times and/or permanent pools (e.g., wet ponds), the paired storm event hypothesis test results relying on the Wilcoxon test may be less representative than the Mann-Whitney test because of variations in sampling program designs for collection of influent and effluent samples that may not enable event-based pairing of monitoring data. For example, influent for a storm event on a particular date may mix with water from a previous event that has been stored since the previous storm. Thus, in cases where the Mann-Whitney and Wilcoxon test results conflict for BMPs with permanent pools, the Mann-Whitney results may provide a better indicator of pollutant removal performance.

Box plots (or box and whisker plots) provide a schematic representation of the central tendency and spread of the influent and effluent data sets. For each set of analysis results, the influent box plots are provided on the left and the effluent box plots are provided on the right. A key to the box plots is provided below.

Quantile plots illustrate the empirical distribution of the data. A comparison of the influent and effluent quantile plots shows differences among all quantiles (not just the median) and whether the influent and effluent data sets are similarly distributed. Although the influent and effluent concentrations in a quantile plot are not paired values, the relative position and slope of the two populations can indicate the effectiveness of the BMP. The linearity of each series on these plots also provides an indication of whether or not it is well fit by a lognormal distribution.

The plots presented in these analysis results are developed in a manner modeled after the open‐source R statistical package.
The quantiles are computed using functions based on Wichura (1988)^{[4]}.
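The coordinates of a lognormal quantile plot can be sketched as follows. Several details here are assumptions rather than statements about the tool: the plotting-position formula is patterned after R's `ppoints`, Python's `statistics.NormalDist.inv_cdf` stands in for the inverse normal distribution function, and the function names and sample data are hypothetical.

```python
import math
from statistics import NormalDist

def plotting_positions(n):
    """Cumulative probabilities for n ordered values (assumed ppoints-style rule)."""
    a = 3 / 8 if n <= 10 else 1 / 2
    return [(i - a) / (n + 1 - 2 * a) for i in range(1, n + 1)]

def lognormal_quantile_pairs(values):
    """Return (standard normal quantile, log concentration) for each sorted value."""
    ordered = sorted(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    probs = plotting_positions(len(ordered))
    return [(std_normal.inv_cdf(p), math.log(v)) for p, v in zip(probs, ordered)]

# Hypothetical concentrations; values must be positive to take logarithms.
pairs = lognormal_quantile_pairs([22.0, 12.0, 45.0, 18.0, 30.0])
z_scores = [z for z, _ in pairs]
```

If the points in `pairs` fall close to a straight line, the data are reasonably described by a lognormal distribution.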

Influent vs. effluent scatterplots depict paired data to provide an indication of how effluent concentrations may be related to the influent concentrations. Data points below the 45 degree line indicate removals whereas data points above the 45 degree line indicate increases. A diamond symbol is used if both the influent and effluent are non-detect. If only the influent or effluent is non-detect, then a triangle symbol pointing downward or to the left, respectively, is used. Because these plots require sample concentrations for both influent and effluent, performance may be under-represented for facilities that discharge infrequently, such as bioretention facilities or other infiltrating BMPs (i.e., facilities that lack effluent samples to pair with influent samples).
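The reading of the scatterplot described above can be expressed as a small sketch. The function names and the returned labels are hypothetical; the marker assignments follow the description in the text, and the label for fully detected pairs is a placeholder because the text does not name that symbol.

```python
def classify_pair(influent, effluent):
    """Locate a paired result relative to the 45 degree (1:1) line."""
    if effluent < influent:
        return "removal"    # point plots below the line
    if effluent > influent:
        return "increase"   # point plots above the line
    return "no change"      # point plots on the line

def marker_for(influent_nd, effluent_nd):
    """Choose the plot symbol from the non-detect flags of a pair."""
    if influent_nd and effluent_nd:
        return "diamond"
    if influent_nd:
        return "triangle-down"
    if effluent_nd:
        return "triangle-left"
    return "default"  # placeholder label for pairs with both values detected

# Hypothetical pair: influent detected at 40, effluent non-detect plotted at 5.
label = classify_pair(40.0, 5.0)
symbol = marker_for(False, True)
```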

The time series plot presented in the statistical summaries is a simple scatter plot showing the influent and effluent concentrations collected on a given date. In most cases, paired data are available that have been collected from the same BMP for the same storm event; however, this is not the case for all studies.

Not all water quality records included in the BMP database are included in the data sets used by this statistical tool. An initial screening has been conducted by the Database Team to identify water quality data that are reasonably appropriate for analysis. Records that pass this initial screening are identified by:

- 'Inc' or 'Yes' value in the 'Initial Analysis Screening' field in the 'WATER QUALITY' table; and
- 'Yes' value in the 'Use in BMP WQ Analysis?' field in the 'MONITORING STATIONS' table of the main Database.

Some additional screening criteria have been applied to the resulting data set so that the data used in the statistical analysis tool are internally consistent and reasonably appropriate for comparison of influent and effluent concentrations. These screening criteria are described below.

- Only those water quality records representing influent or effluent samples as identified by the 'Monitoring Station Type' field are selected for analysis. Samples taken at other monitoring stations (e.g., intermediate points in a treatment train) are not included. For some studies, such as those of porous pavement BMPs, effluent from reference watersheds is used in place of an "influent" monitoring location.
- For all BMP types except for Wetland Basins and Retention Ponds, grab samples are excluded when composite samples are available for the same event. The majority of studies in the Database use flow-weighted composite samples to obtain a representative event mean concentration.
- For Wetland Basins and Retention Ponds, grab samples are included in the analysis data set because flow-weighted composites are not available for these BMP types.
- For bacteria, grab samples are included for all BMP types because many researchers do not collect composite samples for bacteria due to short sample hold times.
- A small number of studies in the Database have multiple influent or effluent samples taken for the same storm event that were not composited by the data provider. For storms with multiple samples, the average (arithmetic mean) of the water quality results was used as a representative value for that storm event.
- Samples that have invalid values in the 'WQ Analysis Value' field in the 'WATER QUALITY' table and have not been otherwise identified as non‐detects have been excluded from the analysis dataset.
- Samples that have NULL values in the 'Sample Date' field in the 'WATER QUALITY' table are considered incomplete and have been excluded from the analysis dataset to avoid errors in plotting time series data. This condition applies to a few older (e.g., 1980s) studies extracted from literature based on an event number (rather than a date) and included in the initial release of the database in 1999.
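Three of the rules in the list above (station type selection, exclusion of records without a sample date, and averaging of multiple samples per storm) can be sketched as follows. The record layout, the dictionary keys, the station type labels, and the use of the sample date to identify a storm event are hypothetical simplifications, not the Database's actual schema.

```python
from collections import defaultdict
from statistics import fmean

def event_means(records):
    """Average the results for each (station type, sample date) combination.

    Each record is a dict with the hypothetical keys 'station_type',
    'sample_date', and 'value'.
    """
    grouped = defaultdict(list)
    for rec in records:
        if rec["sample_date"] is None:
            continue  # incomplete record: no sample date
        if rec["station_type"] not in ("influent", "effluent"):
            continue  # e.g., an intermediate point in a treatment train
        grouped[(rec["station_type"], rec["sample_date"])].append(rec["value"])
    # Arithmetic mean as the representative value for each storm event.
    return {key: fmean(values) for key, values in grouped.items()}

# Hypothetical records for illustration only.
records = [
    {"station_type": "influent", "sample_date": "2005-04-12", "value": 10.0},
    {"station_type": "influent", "sample_date": "2005-04-12", "value": 20.0},
    {"station_type": "effluent", "sample_date": "2005-04-12", "value": 6.0},
    {"station_type": "intermediate", "sample_date": "2005-04-12", "value": 9.0},
    {"station_type": "effluent", "sample_date": None, "value": 4.0},
]
means = event_means(records)
```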

^{1}SciPy is a Python-based ecosystem of open-source software for mathematics, science, and engineering.
More information can be found at https://www.scipy.org/

^{2}R is a free software environment for statistical computing and graphics.
More information can be found at https://www.r-project.org/

^{3}Note that other, more specific analyses conducted by the Database Team have used more advanced approaches for dealing with non-detects, which may lead to different results.
A simpler method was selected for this analysis to provide a more general tool for use with a variety of data sets.

^{4}Wichura, M. J. (1988) Algorithm AS 241: The Percentage Points of the Normal Distribution. Applied Statistics, 37, 477–484.