Freshwater ecosystems provide many benefits to society, including food, water, flood control, aesthetics, and recreation (Finlayson et al. 2005). Despite state and federal regulations aimed at protecting these resources, 52% of all assessed streams in the United States are impaired (USEPA 2012). Recognizing the degraded state of their waters, many public stakeholders across the world have formed watershed associations in efforts to improve the health of their local watersheds (Cline and Collins 2003). Watershed groups have been shown to enhance their community’s chance of receiving funds to improve their watershed and to develop programs to protect and enhance water quality (Cline and Collins 2003). However, monitoring water quality is not only complex logistically, it is also expensive (Maas et al. 1991). Many watershed groups turn to citizen science to both engage the public and collect large amounts of data that they need to address their concerns. For example, a 9th grade class in New Jersey worked with their local watershed partnership to determine their community’s willingness to pay for the restoration of ecosystem services in the watershed (Nicosia et al. 2014).
Citizen science involves engaging and collaborating with members of the public to gather data to address scientific problems (Cohn 2008; Dickinson et al. 2012; Miller-Rushing et al. 2012). Recently, citizen science projects have grown from a few examples to thousands (Conrad and Hilchey 2009; Shirk et al. 2012). Many benefits have been identified as reasons for including citizens in scientific work, including increased public knowledge of science, ability to capture large amounts of data across space and time, advancement of scientific knowledge, lowered cost of collection and processing, increased social capital, and government and ecosystem benefits (Bonney et al. 2009; Cohn 2008; Conrad and Hilchey 2011; Silvertown et al. 2009). Kolok and Schoenfuss (2011) specifically describe citizen science as a meaningful approach for monitoring waterways. Despite all of the benefits that citizen science projects provide, there are still continued concerns about the validity and subsequent application of the data collected (Bonney et al. 2014; Conrad and Hilchey 2011; Dickinson et al. 2010; Kim et al. 2011; Kolok et al. 2011). Specifically, Conrad and Hilchey (2009) note that citizen science data often are not used in the decision-making process, either because of concerns with collection methods or the inability to get data to decision-makers.
Many examples of citizen science for monitoring water quality exist in the peer-reviewed scientific literature (Au et al. 2000; Buytaert et al. 2014; Canfield et al. 2002; Fore et al. 2001; Lottig et al. 2014; Maas et al. 1991; Nicholson et al. 2002; Overdevest et al. 2004; Peckenham et al. 2012; Savan et al. 2003) and more certainly exist in practice. In assessing the validity of volunteer-collected and volunteer-analyzed water chemistry data, Nicholson et al. (2002) found mixed results depending on the variable, though their assessment was between yearly means of two different datasets, not direct sample comparison. Savan et al. (2003) found that 40% of their citizen science program’s water chemistry variables failed quality control checks, leading them to use biological measures of water quality over chemical measures. Fore et al. (2001) and Canfield et al. (2002) both found no significant difference between volunteer-collected biological and chemical water quality data, while Maas et al. (1991) chose to run their water chemistry samples through a university lab to avoid volunteer error. Au et al. (2000) found that local high school students were able to evaluate toxicity of Escherichia coli (E. coli) similarly to experts after they were trained, and Peckenham et al. (2012) determined that middle to high-school aged students were able to accurately analyze pH and conductivity, but that additional quality assurance was needed for hardness, chloride, and nitrate testing. There is still a need to assess current water monitoring programs and provide examples of applications of citizen science to collect and analyze water quality data for improved watershed management.
The overall goal of this research is to address two of the main issues surrounding citizen science: data validity and data application. The first objective is to compare volunteer-collected and volunteer-analyzed water quality data to volunteer-collected and laboratory-analyzed water quality data to assess the validity of the volunteer-analyzed data. The second objective is to provide a proof of concept of how the data collected by volunteers can be used by watershed groups to target management strategies and priorities.
The Wabash River Enhancement Corporation (WREC) is a 501c3 nonprofit agency established in 2004 and based in Lafayette, Tippecanoe County, Indiana (www.wabashriver.net). The goal of WREC is to lead efforts within the community to improve and enhance the local Wabash River corridor as well as to engage and educate the community in the implementation of projects, programs, and activities that enhance the Wabash River ecosystem. WREC has established many programs to achieve their goals, including cost-share programs for urban and agricultural best management practices, green business certification, and riverfront development.
In 2009, WREC—in partnership with researchers at Purdue University—established a citizen science water quality monitoring program called the Wabash Sampling Blitz (Blitz). The two main goals of the Blitz are first, to provide a hands-on opportunity for the community to experience the Wabash River and its tributaries through citizen science, and second, to obtain uniform, simultaneous water quality data throughout the area that WREC serves. This large-scale and simultaneous collection of data can help by providing “hot spot” identification for future watershed priorities (Kolok et al. 2011). The Blitz occurs twice a year in the spring (April) and fall (September), when approximately 250 volunteers sample 206 sites within the Region of the Great Bend of the Wabash River Watershed (Figure 1). Land use in the watershed is mostly row-crop agriculture (corn and soybean), but the watershed is also host to the urban areas of West Lafayette and Lafayette (~100,000 population). Since its establishment in 2009 and until the fall of 2013, the Blitz has benefited from the contribution of 889 unique community volunteers (192 repeat volunteers) giving more than 3,000 hours of their time (Figure 2).
The Blitz is held for approximately four hours on one afternoon. Volunteers may arrive at any time during the sampling window. Volunteers are pre-assigned to one of three staging locations where they either meet up with or are matched with at least one other sampling partner. Staging location organizers detail the sampling methods and objectives with volunteer groups. Volunteers then travel in their own vehicles to 3–4 sites where they collect in-stream water samples, measure water transparency with a transparency tube, and measure in-stream water temperature. Volunteers then return to their staging location where additional volunteers help them filter a portion of their samples to use in subsequent lab analyses. The remainder of each water sample is used to test for nutrients and contaminants on-site using field test strips. Volunteers then color in selected constituent (nitrate+nitrite-N and water temperature) levels on a map of the watershed so they can easily compare their results to other portions of the watershed, as well as to data from the previous year. The constituents tested in lab and by participants have varied from year to year depending on funding availability, but many have been consistently analyzed (Table 1). The Purdue University Soil Science Laboratory used an AQ2 Discrete Analyzer to measure concentrations of ammonia (mg/L; AQ2 method EPA-103-A Rev. 10), nitrate+nitrite-N (mg/L; AQ2 method EPA-114-A Rev. 9), and orthophosphate-P (mg/L; AQ2 method EPA-118-A Rev. 5; subsequently converted to orthophosphate). Dissolved organic carbon concentration (mg/L) was measured with a Shimadzu TOC-V CSH. Field test strips were used by volunteers to determine concentrations for nitrate+nitrite-N (mg/L; Hach Aquacheck Cat. 27454-25) and orthophosphate (mg/L; Hach Aquacheck Cat. 27571-50), and pH levels (Sigma P-4411). Only the spring 2010 samples were analyzed using WaterWorks Nine-Way Test Kits, which included pH, nitrate+nitrite-N, and other tests.
Volunteers also used a transparency tube fitted with a Secchi disc and marked in centimeters to record in-stream transparency, and took water temperature readings (°C) using alcohol thermometers.
|Lab analysis (completed by professionals)||Fall 2009||Spring 2010||Fall 2010||Spring 2011||Fall 2011||Spring 2012||Fall 2012||Spring 2013||Fall 2013|
|Dissolved organic carbon (DOC; mg/L)||x||x||x||x||—||x||x||—||x|
|On-site analysis (completed by volunteers)||Fall 2009||Spring 2010||Fall 2010||Spring 2011||Fall 2011||Spring 2012||Fall 2012||Spring 2013||Fall 2013|
|Nitrate+nitrite-N strip (mg/L)||x||x||x||x||x||x||x||x||x|
|Orthophosphate strip (mg/L)||x||—||—||—||—||—||x||x||x|
|Transparency tube (cm of visibility)||—||—||x||x||x||x||x||x||x|
Comparison of volunteer and lab-determined data
For this study, the volunteer-collected nitrate+nitrite-N and orthophosphate sample concentrations were compared to the lab-determined sample concentrations because these two variables had the most field and lab data available. The field test strips are used on unfiltered samples while the lab analyses are performed on samples filtered through a 0.45 micron glass fiber filter. While this should not impact the nitrate+nitrite-N comparisons, it could lead to volunteer overestimation of orthophosphate due to the affinity of phosphorus to sorb to sediments (Zhou et al. 2005). However, given the high transparency of the water samples on average (Table 3), this may not have a large influence on the readings. Another issue with comparing these datasets is that the field test strips used by volunteers have a binned, colored scale comparison for volunteers to read the level of the constituent. These bins essentially make the data provided by volunteers categorical. Thus, to compare the two datasets, the lab-determined data were binned to match the test strips, making them categorical as well (Table 2). The test strip scales provide a single concentration value associated with each color, which was assumed to represent the mid-point of the represented concentration range. Bins were therefore centered around the test strip concentration values. For example, the first nitrate+nitrite-N bin ranges from zero to halfway between the first and second concentration value (0.5). Because nitrate and nitrite are combined from the lab analysis, the field strip nitrate and nitrite values were also added together. Because nitrate concentrations are larger and because most volunteer-read nitrite values were close to zero, nitrate values were used to make the bins.
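The binning rule described above can be sketched in a few lines of Python. This is only an illustration, not the code used in the study: the scale values in `STRIP_VALUES` are hypothetical placeholders (the actual values belong to Table 2), the function names are ours, and assigning a value that falls exactly on a bin edge to the upper bin is an assumption.

```python
from bisect import bisect_right

# Hypothetical strip colour-scale values (mg/L); the study's real values are in Table 2.
STRIP_VALUES = [0, 1, 2, 5, 10, 20, 50]

def bin_edges(scale):
    """Edges between adjacent bins: midpoints of neighbouring scale values."""
    return [(lo + hi) / 2 for lo, hi in zip(scale, scale[1:])]

def to_strip_bin(lab_value, scale=STRIP_VALUES):
    """Return the strip scale value whose bin contains a continuous lab value.

    Assumption: a value exactly on an edge is assigned to the upper bin.
    """
    return scale[bisect_right(bin_edges(scale), lab_value)]

# Example: convert continuous lab concentrations to strip categories.
lab_values = [0.3, 0.5, 4.2, 60.0]
binned = [to_strip_bin(v) for v in lab_values]
```

With these placeholder values the first edge is 0.5, matching the example in the text; values above the last edge fall into the highest category.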
|Nitrate-N test strip scale value (mg/L)||Assigned nitrate+nitrite bins (mg/L)||Orthophosphate test strip scale value (mg/L)||Assigned orthophosphate bins (mg/L)|
|Water Quality Variable||Fall Mean||Spring Mean||p-value|
|Dissolved organic carbon (mg/L)||3.66||2.88||1.54e-05|
Once the datasets were categorized as described in Table 2, three measures of agreement were used to determine how well the volunteer-read data compared with actual lab concentrations: The percent agreement, the unweighted and weighted Cohen’s Kappa Statistic (hereafter referred to as Kappa), and the unweighted and weighted Bangdiwala B statistic (hereafter referred to as B). Three measurements were chosen to provide more certainty of the conclusions and to demonstrate multiple methods that can be used to assess agreement. As a first order evaluation, the exact percent agreement and the percent agreement within one category were calculated to provide a straightforward assessment of the percent of volunteer-read observations that fell exactly into the same bin as lab-analyzed values (exact percent agreement) or the percent of observations that fell into the same bin or one bin higher or lower of lab-analyzed values (percent agreement within one category). The Kappa statistic and the B statistic are two different ways to evaluate the agreement between two independently classified observations, provided the datasets have the same categories (Munoz and Bangdiwala 1997). The B and Kappa statistics both go beyond percent agreement by taking into consideration that some agreement could occur by chance (Banerjee et al. 1999; Munoz and Bangdiwala 1997). The B statistic is calculated based on a graphical “area of agreement” whereas the Kappa statistic is based on the observed proportion of agreement (Munoz and Bangdiwala 1997). The higher the Kappa and B statistics, the better the agreement between the two datasets. The Kappa statistic is also known to be more conservative in measuring agreement when most values fall into one category, known as the prevalence problem, which is important for interpretation (Hallgren 2012; Viera and Garrett 2005). The unweighted statistics only compare how well the two observation sets match up for each category bin. 
The weighted statistics consider how far the observations are from exact agreement (i.e., within one or two categories). More weight is thus given to agreement in categories closer to exact agreement. The weighted and unweighted Kappa and B statistics were determined for each Blitz, as well as for all Blitz events combined using the ‘vcd’ package of R (Meyer et al. 2014). The interpretation guidelines developed by Munoz and Bangdiwala (1997) were then used to determine the qualitative level of agreement. Lastly, to further evaluate the levels of agreement between the field and lab concentrations, bubble plots were created in R. These plots provide a visual interpretation of agreement between two observed datasets. Perfect agreement is shown along the upward diagonal of the plot, with the number of data points that fall within a category provided in each bubble and proportional to bubble size.
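As a rough illustration of these agreement measures, the sketch below computes them from a square contingency table in plain Python. It is not the 'vcd' code used in the study. The function name and the example counts are hypothetical, the linear (equal-spacing) weights are an assumption about the weighting scheme, and only the unweighted B statistic is shown.

```python
def agreement_stats(table):
    """Agreement measures for a square table.

    table[i][j] = number of samples in volunteer bin i and lab bin j.
    Requires at least two categories.
    """
    k = len(table)
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(table[i][j] for i in range(k)) for j in range(k)]
    cells = [(i, j) for i in range(k) for j in range(k)]

    def kappa(weight):
        # Chance-corrected agreement: (observed - expected) / (1 - expected).
        observed = sum(weight(i, j) * table[i][j] for i, j in cells) / n
        expected = sum(weight(i, j) * rows[i] * cols[j] for i, j in cells) / n**2
        return (observed - expected) / (1 - expected)

    return {
        "exact": sum(table[i][i] for i in range(k)) / n,
        "within_one": sum(table[i][j] for i, j in cells if abs(i - j) <= 1) / n,
        "kappa": kappa(lambda i, j: 1.0 if i == j else 0.0),
        # Assumed linear weights: full credit on the diagonal, less with distance.
        "kappa_weighted": kappa(lambda i, j: 1.0 - abs(i - j) / (k - 1)),
        # Unweighted Bangdiwala B: squared diagonal counts over marginal products.
        "b": sum(table[i][i] ** 2 for i in range(k))
             / sum(rows[i] * cols[i] for i in range(k)),
    }

# Hypothetical counts for three bins (rows: volunteer-read, columns: lab).
example = [[20, 5, 0],
           [10, 30, 5],
           [0, 5, 25]]
stats = agreement_stats(example)
```

In this made-up table the weighted Kappa exceeds the unweighted Kappa because every disagreement is off by only one bin, which is the behaviour the weighted statistics are meant to capture.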
The second objective of this research was to utilize data collected from the Blitz events (both volunteer-analyzed and lab-analyzed) to help target outreach and education within the watershed. Because multiple variables were available over the period of nine Blitz events over five years at 206 sites, multivariate cluster analyses techniques were employed to examine the large dataset. Cluster analysis is a multivariate statistical technique that can aid in interpreting very large datasets by grouping objects (e.g., sampling sites) with similar characteristics together, and is a common tool used in riverine systems (Bierman et al. 2011). While many studies have used cluster analyses techniques to interpret water chemistry data (Alberto et al. 2001; Daughney et al. 2012; Güler et al. 2002; Kim et al. 2005; Mavukkandy et al. 2014; Najar et al. 2012; Pati et al. 2014; Shrestha and Kazama 2007; Simeonov et al. 2003; Singh et al. 2004; Singh et al. 2005; Templ et al. 2008; Wang et al. 2013; Wang et al. 2014), none of these studies were conducted as part of a citizen science effort or used data collected by citizen science volunteers.
Six variables that had the most data available were used in the cluster analyses. Variables were first pre-tested to see if the fall and spring Blitz samples had different means, as temporal variation has been shown to be important in cluster analysis (Gamble and Babbar-Sebens 2012). All observations for a given variable in the spring were tested versus all observations for a given variable in the fall using two sample t-tests (Table 3). All six variables showed significant differences in fall and spring values, so these were separated into different variables (i.e., fall pH and spring pH were treated as two separate variables). The individual sampling event observations were averaged into fall and spring variables for each location, because some sites were not sampled in a given year owing to low/high water levels, site inaccessibility, or a lack of volunteers. Thus, the final cluster analysis was completed to categorize each sampling location based on its average spring and fall water quality (12 variables, 206 sites).
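The seasonal pre-test can be illustrated with a short two-sample t computation. The text does not say whether a pooled-variance or Welch test was used, so the Welch form below is an assumption, and the concentrations are invented for the example. Converting the statistic to a p-value requires the t distribution, which is omitted here.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Welch two-sample t statistic and approximate degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df

# Hypothetical fall and spring observations of one variable (mg/L).
fall = [3.9, 3.4, 3.7, 3.6]
spring = [2.8, 3.0, 2.9, 2.7]
t_stat, dof = welch_t(fall, spring)
```

A large absolute t statistic relative to its degrees of freedom indicates a seasonal difference, which is the condition under which a variable was split into separate fall and spring variables.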
There are many types of clustering techniques; however, for this project hierarchical clustering was employed as it has been previously applied to the classification of water quality data and is the most common approach (Shrestha and Kazama 2007). Hierarchical clustering connects similar data points based on a chosen distance measure and seeks to minimize within-cluster variation and to maximize between-cluster variation. There are two main types of hierarchical clustering: Agglomerative, starting with individual data and grouping like observations, and divisive, starting with all data in one group and then dividing into groups. An agglomerative technique was used because these methods are very efficient (Alberto et al. 2001) and often have been used for water chemistry clustering. Before the cluster analysis was completed, the variables were transformed to achieve a normal distribution using either a log10 transformation or a three-parameter lognormal or log10 transformation (Table 4) and then standardized to meet the normality and equal variance assumptions of cluster analysis (Güler et al. 2002). The cluster analysis was completed using R statistical software employing Ward’s Method with a Euclidean distance measure (Alberto et al. 2001; Güler et al. 2002; Kim et al. 2005; Shrestha and Kazama 2007; Simeonov et al. 2003; Singh et al. 2004; Singh et al. 2005). Cluster numbers were determined using Dmax*0.66 as the cutoff criterion, where Dmax is the maximum distance between clusters (Singh et al. 2005). A subsequent principal components analysis (PCA) was used to identify important variables in the cluster analysis (see the Supplementary Materials for details). Principal components analysis is most often employed to reduce a large dimension dataset into smaller dimensions by creating combinations of variables called principal components (Güler et al. 2002).
Boxplots summarizing the distribution of the variables which contributed most to principal component loadings were constructed to examine the results of the cluster analysis.
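To make the clustering step concrete, the sketch below implements a naive agglomerative procedure with a Ward-type linkage and the Dmax*0.66 cutoff in plain Python. It is not the R code used in the study. The height convention (the square root of twice the increase in within-cluster sum of squares, so that two single points merge at their Euclidean distance) is an assumption, and the two-dimensional points are hypothetical stand-ins for standardized site data.

```python
import math

def ward_height(a, b):
    """Ward-type merge height for clusters given as (size, centroid, members)."""
    (na, ca, _), (nb, cb, _) = a, b
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return math.sqrt(2.0 * na * nb / (na + nb) * sq_dist)

def ward_cluster(points, stop_height=None):
    """Merge the closest pair repeatedly; stop once the next merge exceeds stop_height."""
    clusters = [(1, tuple(p), [i]) for i, p in enumerate(points)]
    heights = []
    while len(clusters) > 1:
        h, i, j = min(
            (ward_height(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if stop_height is not None and h > stop_height:
            break
        (na, ca, ma), (nb, cb, mb) = clusters[i], clusters[j]
        centroid = tuple((na * x + nb * y) / (na + nb) for x, y in zip(ca, cb))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((na + nb, centroid, ma + mb))
        heights.append(h)
    return clusters, heights

# Hypothetical standardized observations for seven sites (two variables each).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (20, 0), (21, 0)]

_, all_heights = ward_cluster(points)          # full tree, to find Dmax
d_max = max(all_heights)
clusters, _ = ward_cluster(points, stop_height=0.66 * d_max)
groups = sorted(sorted(members) for _, _, members in clusters)
```

The procedure is run twice: once to completion to obtain Dmax, and once more stopping at the cutoff. In practice a library routine would replace this cubic-time loop, but the logic of cutting the tree at a fixed fraction of the largest merge height is the same.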
|Normalization Technique||Water Quality Variable Transformed|
|no transformation (already normally distributed)||fall pH, spring pH, fall temperature, spring temperature|
|log10 transformation||fall dissolved organic carbon, spring dissolved organic carbon|
|three parameter lognormal transformation; shift parameter determined using a quantile lower bound estimator||fall nitrate+nitrite-N, spring nitrate+nitrite-N, fall orthophosphate, spring orthophosphate|
|three parameter log10 transformation; shift parameter estimated as 1 plus data maximum||fall turbidity, spring turbidity|
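The transformation and standardization steps can be sketched as follows. This is a minimal illustration with hypothetical function names: the shift parameter is passed in rather than estimated, because the estimators named in Table 4 are not detailed in the text, and the use of the sample standard deviation for standardization is an assumption.

```python
import math
from statistics import mean, stdev

def shifted_log10(values, shift=0.0):
    """log10 of (value - shift); shift=0 gives the plain log10 transformation."""
    return [math.log10(v - shift) for v in values]

def standardize(values):
    """Rescale to zero mean and unit (sample) standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical concentrations (mg/L) for one variable across three sites.
raw = [1.0, 10.0, 100.0]
z_scores = standardize(shifted_log10(raw))
```

Standardizing after the transformation puts every variable on a common scale, so that no single variable dominates the Euclidean distances used in the cluster analysis.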
Comparison of volunteer and lab-determined data
There was good agreement for nitrate+nitrite-N between the volunteer-analyzed and lab-analyzed data (Figure 3). The exact (same bin) percentage agreement between the two datasets for nitrate+nitrite-N was 55% and went up to 84% if considering agreement within one category, i.e., the volunteer-read concentrations fell within one bin of the lab-determined values (Table 5). The Kappa and B statistics show that field strip volunteer-read nitrate+nitrite-N concentration data agree moderately to substantially well (Munoz and Bangdiwala 1997) with lab-determined values most of the time (Figures 3 and 4). As seen in Figure 3, most observations fell into the lowest bin. This was especially true for the fall of 2013, for which the B statistics are high while the Kappa statistics are low. This is likely because the Kappa statistic does not do well when there are very few categories (Viera and Garrett 2005). The low Kappa and B values in the spring of 2010 are likely due to the fact that different test strips were used in this Blitz than in all other Blitz events. This change in strips may have led to incorrect readings by volunteers, or these strips could have been faulty. Because of this, overall statistics were calculated both with (“All”) and without (“All-S10”) those values.
|Event||Exact Agreement||Within One Category||Exact Agreement||Within One Category|
The bubble plot illustrating the agreement between volunteer- and lab-analyzed orthophosphate values shows that the range of the data was even lower than that of the nitrate+nitrite-N data (Figure 5). The overall percentage agreement was only 33%, but went up to 99% if considering agreement +/- one bin (Table 5). The Kappa and unweighted B statistics for the orthophosphate comparison are fair to moderate overall (Figure 6). This is because most of the data were below 5 mg/L, thus putting them into one of the lowest two bins (Figure 5). The weighted B statistic is very good because all except a few of the data points were within the lower two categories. Because the actual orthophosphate levels fell primarily into the lower category, the results of this comparison may not be broadly transferable to other studies. Although the majority of samples were overestimated, the majority were also within +/- one bin of the lab value, indicating that the volunteers were not estimating values completely incorrectly. This consistent overprediction bias by the volunteers could be due to the fact that most samples were actually in the lowest category, or to the fact that the volunteers were reading unfiltered samples while the lab data were for filtered samples.
Ward’s Method using Euclidean distance measures was applied to the water quality data in order to group the sampling sites into similar clusters. Applying the Dmax*0.66 criterion, three distinct clusters emerged. To evaluate cluster membership a PCA was performed on the variables (see Supplementary Materials). The variables contributing the greatest loading to the first three principal components were summarized using boxplots, grouped by cluster (Figure 7) and included: Spring and fall DOC, spring nitrate+nitrite-N, spring temperature, and fall and spring orthophosphate. Additionally, the cluster means are summarized in a spider plot to visualize cluster separation (Figure 8). Boxplots for all other variables are included in the Supplementary Materials.
The clusters were mapped within the watershed (Figure 9) and show a striking similarity to the land use of the watershed (Figure 1), which can serve as a reasonable way to evaluate the quality of the clusters (Templ et al. 2008). In comparing the land-use percentages of each cluster, Cluster 1 had the greatest percentage of urban and suburban land use, Cluster 2 had the greatest percentage of agricultural land use, and Cluster 3 had a fairly even mix of all land-use types.
The results showed that Cluster 1 generally had the highest fall and spring DOC, highest spring temperature, and highest spring orthophosphate. Additionally, this cluster had low spring nitrate+nitrite-N, lower fall orthophosphate concentrations, and lower transparency than the other clusters. Cluster 2 was characterized by some of the highest fall and spring nitrate+nitrite-N, generally higher fall orthophosphate, and lower DOC and greater transparency values compared to cluster 1. Cluster 3 was the relatively “cleanest” cluster, having generally lower nutrients and DOC compared with the other two clusters while maintaining high transparency, average spring temperatures, and the lowest fall temperatures. The pH did not seem to vary greatly across the clusters.
Discussion and Conclusions
Citizen science data
The greatest challenge in comparing volunteer-determined and lab-analyzed water quality data was that the test strip methods of measuring nitrate+nitrite-N and orthophosphate as read by volunteers created categorical datasets because volunteers picked values only on the scale provided within the strips. This can create a challenge in analyzing samples using common statistical methods. Similarly to Peckenham et al. (2012), we chose to address this issue by binning the continuous lab data into comparable categories; we then used multiple types of agreement analysis methods to compare the volunteer-read and lab-tested data. The results demonstrated that for nitrate+nitrite-N, volunteers were consistently able to estimate concentrations using field test strips with moderate to substantial agreement to lab values, although the potential biases of volunteer-read data were not evaluated. This is consistent with a study that demonstrated that nitrate test strips showed good precision when read by students (Peckenham and Peckenham 2014). Agreement between volunteer-read data and lab-analyzed data across the Blitz events and overall in this study supports the conclusion that citizen-collected data can be scientifically valid for water quality assessment, which is key to demonstrating the usefulness of citizen science (Bonter and Cooper 2012). In addition to providing meaningful data, the fact that participants were able to accurately evaluate on-site nitrate+nitrite-N concentrations enhances the educational outcomes for the Blitz participants (Jordan et al. 2012).
The validity of orthophosphate observations measured on-site by volunteers was more difficult to assess considering there was little variability of measurements within the test strip categories (most of the lab-measured orthophosphate concentrations were very low) and there was a consistent overestimation by the volunteers. Similarly to this finding, Peckenham and Peckenham (2014) found that overestimation occurred when using nitrate+nitrite test strips when actual concentrations were low. However, some of the overestimation in orthophosphate comparisons could also result from the fact that the test strips measured orthophosphate in unfiltered water (e.g., with more sediment-bound phosphate) and the lab analysis was performed on filtered samples (e.g., with less sediment-bound phosphate). This, combined with the fact that most of the data fell into the lowest test strip category, suggests that future orthophosphate testing—especially in this watershed—would benefit from test strips that have more categories between 0 and 5 mg/L and from testing samples that have been filtered for better comparison. Overall, the results support the idea that water quality data observed by volunteers can be acceptable for an educational experience and informative for watershed groups.
Cluster interpretation for watershed management
The cluster analysis and characterization was completed only for volunteer-collected water quality data. Similarly to other studies (Kim et al. 2005; Shrestha and Kazama 2006; Simeonov et al. 2003; Singh et al. 2004; Varol et al. 2012), the cluster analysis revealed unique primary management zones (Figure 8) that can be used to target education and conservation strategies in the future, as follows:
- Cluster 1: Urban/suburban management zone with high spring and fall DOC and generally lower nutrients and transparency.
- Cluster 2: Agricultural management zone with the highest nutrients and lower DOC.
- Cluster 3: Minimal management zone with the greatest transparency and lower nutrient and DOC concentrations.
These general relationships are useful for identifying the persistent water quality impacts associated with different land uses (Foley et al. 2005) and serve to confirm targeted conservation strategies within the watershed. For example, although nutrient management is certainly important for agricultural land (Vitousek et al. 2010), sediment pollution may be a greater problem in urban streams (Taylor and Owens 2009) to the extent that transparency reflects sediment load. In contrast, exceptions to the general land use pattern can help to identify areas which might have specific polluters that are unrelated to land use. For example, one sampling site that falls into Cluster 2 is primarily forested and urban land, not agriculture. This specific area is host to a golf course, and previous research has shown that nutrient loadings from golf courses can be similar to those of agriculture (King et al. 2007), which likely explains why the site would fall into a cluster with primarily agricultural land use. Another sampling site that is primarily forest and urban also was placed into Cluster 2. A small wastewater treatment plant is located within this site, and the constant nitrate+nitrite-N signals likely explain its inclusion in this category. Overall, these clusters help provide WREC with insight into the specific water quality concerns seen throughout the watershed, so that management and education strategies can be improved (Brezonik et al. 1999).
The cluster results also can be used by WREC to determine which sites need no further testing given a shortage of volunteers or a reduction in budget (Wang et al. 2014). Lastly, this cluster analysis demonstrates how volunteer-collected and tested data (transparency, temperature, pH) can be used along with volunteer-collected and lab-tested data (nutrients, DOC) to perform more complex and informative analyses of water quality data.
Citizen science approach for water quality monitoring
Watershed groups exist in all parts of the US and the world, and many operate as nonprofit organizations (Lubell et al. 2002). By collaborating with local citizen scientists, these groups can not only maximize their resources but also educate and involve the local community in water protection efforts (Cline and Collins 2003). Such groups, along with other citizen science-based organizations, are under increasing pressure to show the effectiveness of their programs (Conrad and Hilchey 2011). Overall, our research illustrates that citizen science-produced data can be highly valuable for use by watershed groups. Twice a year, hundreds of citizen scientists in Indiana help to sample 206 sites to provide a snapshot of water quality conditions in the Great Bend of the Wabash River Watershed that would otherwise not be achievable. By utilizing relatively inexpensive field test strips, volunteers are able to instantly evaluate the quality of the water they are sampling, which provides not only important data but also a great educational opportunity. The test strips are inexpensive compared to lab analyses, and our analyses show that they can be informative to water quality managers even when read by members of the public. The cluster analysis provides a replicable example of how citizen science-collected data can be used to further inform watershed management decisions. Overall, this work supports the increasing body of scientific knowledge demonstrating that citizen scientists can contribute worthwhile data which can easily be used in planning by watershed groups.