Introduction

Many marine ecosystems are at risk of degradation and decline as a result of multiple interacting global, regional, and local stressors, including climate change, pollution, and overfishing (; ; ). To detect and fully understand the impact of stressors on marine ecosystems, and to provide evidence that supports conservation measures, baseline information and continued monitoring over large spatiotemporal scales, covering a diverse array of species, are often required (; ). The establishment and persistence of large, long-term monitoring programmes, however, require sustained financial investment alongside a trained “workforce” ().

In recent decades, citizen science, widely defined as public participation in scientific research, has emerged as a powerful and cost-effective means of generating extensive ecological data sets across a multitude of ecosystems (; ; Earp and Liconti 2019). Involvement in such projects can increase environmental stewardship as participants develop a greater appreciation of natural environments (; ), and can enhance participant wellbeing through mental stimulation, social interaction, and improved fitness (). In the marine realm, citizen scientists have participated, both in person and more recently online, in monitoring species and environmental conditions across ecosystems including coral reefs (; ; ), seagrass meadows (), temperate rocky reefs (; ), and many more (; ; ). However, despite its capacity to broaden the scope of monitoring initiatives, citizen science has not been fully embraced as a valid means of scientific investigation (), primarily due to concerns regarding data quality (; ; ; ).

The accuracy of citizen science–generated data may vary depending on the knowledge and/or experience of the participants, the task level, and the ecosystem in question (). Techniques suggested to minimize error and/or biases and increase the robustness of citizen science data include retaining trained and/or experienced participants, training participants to use standardized protocols, reducing the taxonomic resolution or complexity of the task/protocol, verifying data (i.e., the checking of record correctness by experts), and comparing data collected by citizen scientists and professional scientists (Baker et al. 2021; ; ; ; ; ; ; ).

A growing body of research has compared data collected by citizen and professional scientists to demonstrate that citizen scientists can generate data of similar and potentially greater quality than professional scientists (; ; ; ; ; ; ; ). However, Koss et al. () found that on subtidal rocky shores, estimates of percentage cover of algal species differed significantly between citizen and professional scientists, potentially due to misidentification and/or confusion between morphologically similar species. Here, we use data collected as part of an intertidal ecology experiment conducted by a marine citizen science program to examine whether the use of a simple, low-taxonomic-resolution algal monitoring protocol allows citizen scientists to generate data comparable to those of professional scientists. To do this, we provide a novel means of assessing the reliability of field estimates of algal percentage cover generated by different types of observers (e.g., citizen scientists, professional scientists, and combined units of citizen scientists working with professional scientists), by comparing them to digitally derived baseline estimates.

Firstly, we assessed differences between algal percentage cover estimates generated by a subgroup of citizen scientists in the field and digital baseline estimates generated by a professional scientist using three different digital analysis techniques. We then investigated differences between estimates of algal percentage cover generated by trained citizen scientists, professional scientists, and combined units in the field, and a single digital baseline estimate generated by a professional scientist (which represents a consistent method of estimation without observer bias). We also explored whether differences between field and digital baseline estimates of algal percentage cover were influenced by the level of algal coverage. This provided a simple means by which citizen science programmes with similar monitoring protocols may investigate the quality of their data, and ultimately, further evidence of the robustness of citizen science data to support its use in research and management.

Materials and Methods

Citizen scientists

The citizen scientists involved in this study were recruited and trained as part of the Capturing our Coast marine citizen science program (www.capturingourcoast.co.uk) that ran in the United Kingdom from 2015 to 2018. All citizen scientists attended a one-day training session delivered by professional marine scientists that covered rocky shore ecology, species identification, and monitoring methods. Training was supported by the distribution of reference materials and ongoing support (including field refreshers) from program staff to aid data quality and participant retention.

Study system

The study was conducted at moderately exposed to exposed shores (, ) across the United Kingdom, including the west coast of Scotland (3 sites), north Wales (4 sites), northeast England (3 sites) and southwest England (2 sites) (Supplemental files 1 and 2: Appendix A). Sites were selected based on their proximity to Capturing our Coast training hubs, and thus trained citizen scientists, as well as their accessibility for participants from a range of demographics. At each site, surveys were undertaken on the low shore, where algal communities were dominated by the canopy-forming macroalga Fucus serratus.

Field surveys

Data were collected as part of an intertidal ecology experiment exploring rates of recovery from manipulated disturbances. Experimental plots of 0.25 m² were surveyed seasonally from autumn 2016 to spring 2018 by observer units comprising (1) a group of 2 to 3 trained citizen scientists, (2) a professional scientist (i.e., an individual with a formal qualification in marine science who had undergone Capturing our Coast staff training), or (3) a combination of a citizen scientist and a professional scientist working together. Firstly, a non-gridded quadrat was placed over the plot, and a top-down photograph of the quadrat was taken. Algal percentage cover (0–100%) was then assessed by placing a 0.25 m² gridded quadrat with 100 squares over the plot and estimating the number of squares covered by live, attached algae (including all canopy, turf, and encrusting species). To account for individual learning styles and approaches to visualisation (), participants could estimate percentage cover using various methods explained during the one-day training session, including summing up partially covered squares to make whole squares, or considering covered areas as patches and visualising the number of squares they would cover if all the patches were moved together. The visualisation method utilised by different participants was not recorded, nor was the date of the most recent survey experience, or the approximate number of surveys they had undertaken previously. The simple protocol employed was selected over others to facilitate the collection of robust data while encouraging involvement from a broad range of participants (Dethier et al. 1992; ).

Differences between field estimates and digital baseline estimates generated using three different techniques

Quadrat photographs were analysed by one professional scientist using Coral Point Count with Excel extensions [v4.1] (). The point-intercept method was used to generate digital baseline estimates of algal percentage cover that were then compared with field estimates. To do this, a digital grid was positioned over each quadrat image in three different ways (Figure 1), and the underlying substrate at each grid intercept was identified as one of three categories: (1) live attached algae (including all canopy, turf, and encrusting species); (2) other (i.e., bedrock, dead encrusting algae, invertebrates); or (3) the quadrat frame.

Figure 1 

Comparison of the three different methods of digital grid placement. Grey squares represent the quadrat frame and coloured squares represent the digital grid outline (not including the intercepts). (a) Method 1 (purple) had a 10x10 point grid, (b) Method 2 (green) had a 10x10 point grid, (c) Method 3 (yellow) had a 13x13 point grid.

The first method of digital grid placement involved placing a 10x10 point grid within the largest possible area of the plot while including as little of the quadrat frame as possible (Figure 1a). Algal percentage cover was then calculated as (Total algal intercepts / (Total number of intercepts – Total quadrat frame intercepts)) × 100. The second method involved placing a 10x10 point grid within the largest possible area of the plot while excluding all of the quadrat frame (Figure 1b). Because all 100 intercepts of this grid fell within the plot, algal percentage cover was given as the total number of algal intercepts. The third method involved placing a 13x13 point grid across the entire area of the plot while including as little of the quadrat frame and beyond as possible (Figure 1c). A 13x13 point grid was used because it was found to be the minimum grid size that ensured at least 100 intercepts fell within the plot area. Intercepts identified as quadrat frame or beyond were removed, and of the remaining intercepts, 100 were randomly selected to give an estimate of algal percentage cover.
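The three calculations described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the workflow used in the study (intercepts were scored in Coral Point Count with Excel extensions). The label strings, function names, and the use of a seeded random generator are assumptions introduced here, and "frame" stands for any intercept on the quadrat frame or beyond it.

```python
import random

# Hypothetical labels for the three intercept categories described in the text.
ALGAE, OTHER, FRAME = "algae", "other", "frame"


def cover_method_1(labels):
    """10x10 grid that may overlap the frame: frame intercepts leave the denominator."""
    usable = [label for label in labels if label != FRAME]
    if not usable:
        raise ValueError("no intercepts fall inside the plot")
    return 100.0 * usable.count(ALGAE) / len(usable)


def cover_method_2(labels):
    """10x10 grid wholly inside the frame: with 100 intercepts the count is the percentage."""
    if len(labels) != 100 or FRAME in labels:
        raise ValueError("method 2 expects exactly 100 intercepts, none on the frame")
    return float(labels.count(ALGAE))


def cover_method_3(labels, rng):
    """13x13 grid over the whole plot: drop frame intercepts, then subsample 100 at random."""
    usable = [label for label in labels if label != FRAME]
    if len(usable) < 100:
        raise ValueError("fewer than 100 intercepts fall inside the plot")
    sample = rng.sample(usable, 100)  # sampling without replacement
    return float(sample.count(ALGAE))


# Example: 45 algal, 45 other, and 10 frame intercepts under method 1.
print(cover_method_1([ALGAE] * 45 + [OTHER] * 45 + [FRAME] * 10))  # 50.0
```

Passing an explicit generator such as random.Random(1) to cover_method_3 makes the random subsample repeatable, which may be useful if baseline estimates need to be regenerated later.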

Differences between estimates of algal percentage cover generated by two randomly selected trained citizen scientist units in the field were compared with digital baseline estimates (generated by a professional scientist using the three different methods of grid placement) across 15 plots. Differences were determined by subtracting the digital baseline estimate from the field estimate.

The three methods of grid placement (and thus baseline estimate generation) were selected because they are less subjective, complex, and time consuming to implement compared with photograph manipulation to make the quadrats and digital grids align (which is not always possible due to the angle of some photographs) or with digital area analysis (which would be challenging given the patchy nature of certain algal species). The three methods were compared over a limited number of quadrats to ensure they generated analogous data and that the method selected for subsequent analyses represented a robust means of baseline estimation.

Differences between field estimates and digital baseline estimates generated using one technique across different observer units and levels of algal cover

Observer units that surveyed and photographed ≥ 8 plots per survey (i.e., per site, per season) were identified. This comprised 11 citizen scientist units (i.e., a group of 2 to 3 citizen scientists that had attended a one-day Capturing our Coast training session), 18 professional scientist units (i.e., an individual with a formal qualification in marine science that had undergone Capturing our Coast staff training), and 13 combined units (i.e., one citizen scientist and one professional scientist working together), giving a total of 42 observer units. From each observer unit, eight of the surveyed plots were randomly selected for digital analysis (n = 336 plots). The low quality of one photograph meant digital analysis was not possible, and because no substitute plot from the same observer unit was available, n = 7 was used for one citizen scientist observer unit.

Digital analysis was conducted by one professional scientist who generated a baseline estimate of algal percentage cover using the third method of grid placement (Figure 1c). The third method of digital grid placement was selected over other methods because it incorporated the greatest area of the plot and ensured 100 intercepts within each plot. Differences between field and digital baseline estimates of algal percentage cover were determined by subtracting the digital baseline estimate from the field estimate. A categorical variable for the level of algal cover was generated based on the field estimates of total algal percent cover, with 0% to 25% considered low, 26% to 50% considered low-mid, 51% to 75% considered mid-high, and 76% to 100% considered high.
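As a minimal illustration of the two derived variables described above, the following Python sketch computes the signed difference and assigns a cover category. The function names are hypothetical. It assumes field estimates are whole numbers of squares, so the class boundaries (25/26, 50/51, 75/76) are unambiguous; a fractional value such as 25.5 would fall into the next class up under this sketch.

```python
def signed_difference(field_pct, digital_pct):
    """Field estimate minus digital baseline; negative means the field value was lower."""
    return field_pct - digital_pct


def cover_level(field_pct):
    """Assign the four-level cover category from the field estimate (0-100%)."""
    if not 0 <= field_pct <= 100:
        raise ValueError("percentage cover must lie between 0 and 100")
    if field_pct <= 25:
        return "low"
    if field_pct <= 50:
        return "low-mid"
    if field_pct <= 75:
        return "mid-high"
    return "high"


# Example: a field estimate of 40% against a digital baseline of 47%.
print(signed_difference(40, 47), cover_level(40))  # -7 low-mid
```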

Data analyses

To investigate differences between estimates of algal percentage cover generated by a subgroup of citizen scientists in the field and digital baseline estimates generated by a professional scientist using three different digital analysis techniques, a linear mixed effect model (LMER) was used. Method type (i.e., field, digital method 1, digital method 2, digital method 3) was set as a fixed factor, and quadrat ID and date were set as random factors. Site and season were not considered factors in this analysis because we were not investigating spatiotemporal variability in algal coverage, but rather variability across methods of estimating algal coverage. Data were converted to proportions (divided by 100), and logit transformed prior to analysis, with 0.025 added or subtracted from proportions equal to 0 or to 1 (; ). Data were visualised as the difference between the field estimate and each of the three different digital baseline estimates.
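The transformation described above can be illustrated with a short Python sketch. It is not the R code used for the analysis; the function name and the eps parameter are assumptions introduced here, with eps set to the 0.025 adjustment stated in the text. Only proportions of exactly 0 or 1 are shifted, because the logit is undefined at those two values.

```python
import math


def logit_percent(pct, eps=0.025):
    """Convert a percentage to a proportion and logit-transform it.

    Proportions of exactly 0 or 1 are moved inward by eps first,
    since log(p / (1 - p)) is undefined at those values.
    """
    p = pct / 100.0
    if p == 0.0:
        p += eps
    elif p == 1.0:
        p -= eps
    return math.log(p / (1.0 - p))


# Example: 50% cover maps to 0 on the logit scale.
print(logit_percent(50))  # 0.0
```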

To determine whether differences between estimates of algal percentage cover generated by trained citizen scientists, professional scientists, and combined units in the field and digital baseline estimates generated by one professional scientist using the third method of digital grid placement (Figure 1c) were significant, and whether differences were influenced by the level of algal cover, an LMER was used. Observer unit (i.e., citizen scientists, professional scientist, combined), algal cover (i.e., low, low-mid, mid-high, high) and the interaction between these factors were set as fixed factors, and quadrat ID and date were set as random factors. As above, site and season were not considered factors in the analysis because the aim was to understand variability in the capacity of different observer units to correctly estimate algal cover. Data were square root transformed prior to analysis. Data were visualised as the difference between the field estimate and the digital baseline estimate, with lower variation assumed to indicate greater accuracy of field estimates.
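Differences are summarised in this study as a mean ± 1 standard error per group. A minimal Python sketch of that summary follows; it is not the authors' R code, the function names and the (group, difference) record layout are assumptions, and the example values are invented for illustration rather than taken from the study.

```python
import math
from collections import defaultdict


def mean_and_se(values):
    """Return the mean and the standard error (sample SD / sqrt(n))."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed for a standard error")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)


def summarise_by_group(records):
    """Group (group_name, difference) pairs and summarise each group."""
    groups = defaultdict(list)
    for group, difference in records:
        groups[group].append(difference)
    return {group: mean_and_se(diffs) for group, diffs in groups.items()}


# Invented example values (field minus digital baseline, in percentage points).
example = [("citizen", -4.0), ("citizen", -2.0),
           ("professional", 1.0), ("professional", -1.0)]
print(summarise_by_group(example))
```

The same function could be applied with cover level as the grouping key to summarise differences by level of algal cover.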

All analyses and plotting were undertaken in the statistical software R [v.4.1.0] (). Models were generated using the lme4 package (), and model fits were determined through visual examination of the quantile-quantile (QQ) and residual versus fitted values plots. Type II sum of squares were calculated using the Anova function of the car package (). Post hoc Tukey-adjusted comparisons were generated for individual fixed effects using the glht function of the multcomp package (). Graphs were produced using the ggplot2 package ().

Results

Differences between field estimates and digital baseline estimates generated using three different techniques

Differences between field and digital baseline estimates of algal cover were consistent neither across quadrats nor across the three digital baseline estimation methods, although in general, field estimates were lower than digital baseline estimates (Figure 2). However, estimates of algal percentage cover did not differ significantly across the four estimation methods, that is, the field method and the three digital methods (χ2 = 4.8105, df = 3, p = 0.1862).

Figure 2 

Differences between field estimates of algal percentage cover generated by trained citizen scientists and digital baseline estimates generated by a professional scientist using the three different methods of grid placement across 15 plots. Coloured circles represent differences, calculated as a field estimate minus a digital baseline estimate, per digital baseline method (Figure 1). Black squares and error bars represent the mean difference across the three baseline estimation methods ±1 standard error.

Differences between field estimates and digital baseline estimates generated using one technique across different observer units and levels of algal cover

Differences between field and digital baseline estimates of algal percentage cover were, on average, greater for citizen scientist units (mean ± 1 SE; –3.68 ± 1.07) compared with professional and combined units (mean ± 1 SE; –0.74 ± 1.06 and –1.21 ± 1.29 respectively), although the range of difference values was greater for the latter two units (Figure 3a; Supplemental file 3: Appendix A). However, overall, differences did not significantly differ across the different observer units (χ2 = 0.1124, df = 2, p = 0.9453; Supplemental file 4: Appendix A), nor did different types of observers vary in their capacities to estimate algal percentage cover based on the field-estimated level of algal cover (χ2 = 7.1522, df = 6, p = 0.3070).

Figure 3 

Differences between field and digital baseline estimates of algal percentage cover across (a) three different types of field observer units and (b) different levels of algal cover. Yellow circles represent differences, calculated as field estimates minus digital baseline estimates (generated using the third method of digital grid placement; Figure 1c) per plot. Black squares and error bars represent the mean variation across all plots ±1 standard error. Values in turquoise represent the number of plots. Significant post hoc Tukey-adjusted comparisons for single fixed effects are indicated as *** = p < 0.05.

Differences between field and digital baseline estimates of algal percentage cover were, however, found to be related to the level of algal cover alone (χ2 = 58.5496, df = 3, p < 0.001). Average differences were greatest for plots with a medium level of algal cover (mean ± 1 SE; low-mid (26–50%) –6.40 ± 3.16 and mid-high (51–75%) –7.02 ± 2.21) compared with plots with high (76–100%) and low (0–25%) levels of cover (mean ± 1 SE; 0.78 ± 0.47 and –1.06 ± 1.69, respectively; Figure 3b). Pairwise post hoc analyses revealed that the most significant differences were between plots with a high (76–100%) level of algal cover compared with those with low-mid (26–50%) and mid-high (51–75%) levels of cover (Figure 3b; Supplemental file 4: Appendix A).

Discussion

Concerns regarding data quality remain a key barrier to the wider use of citizen science datasets (; ). Therefore, data verification and comparisons with professionally generated data are critically important to determine data validity and to promote confidence in the conclusions. This study demonstrates a novel means of digitally assessing field data collected by citizen scientists, and places variation in the data collected by trained citizen scientists and professional scientists in the same context. We demonstrate that when using a simple, low-taxonomic-resolution field-monitoring protocol, trained citizen scientists can generate estimates of algal percentage cover comparable to those of professional scientists. We also show that irrespective of the type of observer, field estimates of algal cover are most variable in plots with medium (26–50% and 51–75%) levels of algal cover. Here we discuss these trends, identify areas for further research, and provide considerations for future citizen science programmes. We also highlight the strengths and limitations of digital verification and field approaches to citizen science.

Trained citizen scientists within this study generated data that were comparable to those of professional scientists, which is consistent with findings of previous studies involving marine citizen scientists (; ; ; ; ). In this case, we believe the result is likely due, in part, to the use of a simple, low-taxonomic-resolution field-monitoring protocol (i.e., estimating total algal percentage cover as opposed to functional group or species-level cover). Estimating percentage cover is a simple yet effective means of gathering data from intertidal environments, which is more accurate (i.e., closer to digital baseline estimates) and repeatable (i.e., less intra- and interobserver variation), and is less time-consuming to implement (thus allowing for greater replication) compared with methods such as point quadrats (; ). The low taxonomic resolution of the protocol reduced the scope for error through misidentifications (e.g., confusion between morphologically similar species) (; ; ), as well as the likelihood of overestimations by participants who may be eager to report rare species (). Participant training is also known to increase the quality of the data generated (), and coupled with the provision of ongoing participant support, is also likely to have contributed to the generation of comparable data within this study, although comparisons with untrained participants are required to support this.

Although not significant, trained citizen scientists tended to underestimate algal percentage cover compared with both digital baseline estimates and other field observer units (i.e., professional scientists and combined units), yet in the laboratory, they overestimated algal cover in to-scale photos compared with professional scientists (Grist et al. 2019 unpublished data in ). We suggest that these differences, plus the small range of difference values observed for citizen scientists, may be due to the perceived importance of real-life field data that results in extra care being taken to get the data right, as well as the ability to physically manipulate specimens in the field. However, the experience level (e.g., previous monitoring experience) of citizen scientists was not considered, and thus further investigations are required to see if and how this may influence the findings. Similarly, an individual’s interpretation of instructions, or their experience on the training day, could have influenced the data they generated (), but it was not possible to assess this in our analysis.

On average, differences in estimates generated by professional scientists were lower than differences in estimates generated by citizen scientists. This may be explained by the higher experience level of professional scientists, who often undertake regular field monitoring as part of their employment, and therefore the data they generate are often considered to be more accurate and consistent (). However, the range of differences in estimates was slightly greater for professional scientists owing to a small number of outliers. While the cause of these outliers remains unclear, they could be attributed to survey fatigue, a survey-specific factor (e.g., becoming time/tide limited or being distracted by participants they are supervising) () or errors in image processing, although the latter is unlikely. These outliers demonstrate, like others (e.g., ), that professionally generated data may also contain discrepancies.

Pairing citizen scientists with professional scientists (i.e., combined units) was shown to reduce some of the observed disparity between citizen and professionally generated field data. While we acknowledge that such pairings may not be a feasible and/or cost-effective option for all citizen science projects, where possible, they could be suggested as a beneficial means of supporting citizen scientists with their transition from training to independent surveying, and have been suggested as a means of increasing the appeal of citizen science programmes (). Such pairings may also be beneficial at certain time intervals to re-align citizen scientists and professional scientists, allowing citizen scientists to continue their development and granting professional scientists a dedicated opportunity to reflect on their own surveying skills and potentially enhance their teaching practices.

Irrespective of observer unit, estimates of algal percentage cover were most variable in plots with medium (26–50% and 51–75%) levels of algal cover. This indicates that some observers may find these plots more challenging to survey, potentially because of the patchy nature of algae within these plots. The different visualisation methods participants may have used to determine algal cover in the field may also have contributed to some of the observed variation in plots with medium levels of algal cover. Investigations into whether different field estimation techniques influenced the data generated were not possible here but would be beneficial for future efforts to determine the most appropriate technique to estimate percentage cover. Additionally, investigations into other potential drivers of the variability in plots with medium coverage are required, for example, the number of recent surveys a participant has undertaken, as those with more recent survey experience may “have got their eye in” and therefore generate more accurate estimates. In the meantime, future citizen science project facilitators should consider placing greater emphasis on training all participants (i.e., professionals and citizen scientists) to monitor plots with medium levels of coverage, and increasing replication in ecological studies to account for this variability.

Accurate estimates of percentage cover generated using simple, low-taxonomic-resolution protocols, although limited in terms of resolution, can be beneficial to scientists addressing certain ecological questions, for example, by identifying areas where changes are occurring (e.g., large scale losses/recovery of canopy algae) so that scientists may target these areas for further investigation. In addition, the use of these protocols provides important field experience for citizen scientists, allowing them to become confident employing a monitoring protocol in a realistic setting. The complexity of the protocol can then be enhanced depending on the confidence of observers, the quality of the data generated, and the requirement of the area, although data verification techniques would need to be adapted, and further training provided.

Strengths and limitations of digital and field approaches to citizen science

Owing to recent improvements in photographic technology, the analysis of digital imagery has become a common method of monitoring marine environments (; ) that can generate reliable estimates of abundance for low-resolution taxonomic groups (e.g., coral cover) (). Here, we have shown that digital analysis of photographs also represents a beneficial means of validating certain field monitoring data and understanding observer variation. All three methods of digital analysis generated comparable estimates of algal percentage cover that correlated well with field estimates despite slight differences in the approach (i.e., visual estimation versus point-intercept).

The use of digital analysis approaches is expected to increase in the coming years due to technological advances such as mobile applications and machine learning (). Such techniques can be considered a cost-effective means of generating monitoring data that constitute a permanent visual record (). Furthermore, they can reduce observer variability: field quadrats are often monitored once per time point, and usually by different observers over time and across sites, whereas digital analysis of a photograph can be done multiple times and by multiple people to minimise observer error and/or biases. Digital analysis can also increase the scope of citizen science projects because it allows for participation by individuals who may otherwise be unable (e.g., due to their location or physical ability). For example, online platforms (e.g., Zooniverse; www.zooniverse.org) have allowed citizen scientists to participate remotely in research by identifying, classifying, and marking factors of interest in images, videos, and audio recordings (), in turn reducing the time-consuming nature of digital analysis for professional scientists. Further research is required to understand the variation in data generated digitally by citizen scientists compared with professional scientists (but see ), particularly in terms of percentage cover.

A major limitation of digital analysis, and therefore a strength of field monitoring, is its limited capacity to increase taxonomic resolution. Although this was not a problem during this study, it could become an issue should a more complex field protocol be employed (i.e., examining functional groups or a greater taxonomic resolution) because the three-dimensional nature of algal canopies would result in sub-canopy individuals being obscured from the photograph and thus inaccurately quantified, while smaller, cryptic, and finely branched species may go undetected. Additionally, digital baseline estimates could have been influenced, to a degree, by difficulty in determining the status (i.e., dead or alive) of encrusting species such as Lithophyllum sp. on some photographs. While this was minimised by using high resolution photographs, and would likely have influenced digitally generated estimates across all observer units and levels of algal cover equally, future efforts could consider excluding encrusting algae from estimates of algal abundance to minimise this issue. Furthermore, there are several, often unquantified, benefits that can be derived from field citizen science approaches, meaning they cannot simply be replaced by digital approaches. For example, being outdoors can improve fitness and allow for social interaction among participants that can in turn reduce stress levels and enhance mental wellbeing (). Field approaches also allow participants to witness the natural world and the threats it faces firsthand, which may lead to increased environmental stewardship, positive behavioural change, and greater acceptance of management strategies among participants compared with those involved only in digital approaches (; ).

Conclusion

Confidence in the data generated by citizen science programmes is critically important if citizen science is to be fully embraced by the scientific community and used to inform management initiatives. While there is natural variation among observers, digital analyses and placing data generated by citizen scientists in the same context as professionally generated data are effective means of quantifying and examining this variation and ensuring data quality. The low taxonomic resolution estimates of algal percentage cover generated by citizen scientists as part of the Capturing our Coast programme can be considered accurate, with reported changes over time more likely to represent actual changes in marine ecological communities as opposed to differences among observer units. The validity of the data generated was likely due to the use of a simple, low-taxonomic-resolution monitoring protocol alongside effective training of participants and ongoing support and resources that collectively reduced the scope for error. Current and future citizen science projects, including those in the terrestrial realm, would benefit from adopting similar approaches to ensure and evaluate data quality, to strengthen training and protocols, and to promote the wider application of citizen science data.

Data Accessibility Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Supplementary Files

The supplementary files for this article can be found as follows:

Supplemental File 1: Appendix A

Coordinates and exposure of sites where surveys of total algal percentage cover were undertaken. Exposure was determined based on wave fetch values extracted from Burrows (), with sheltered sites having wave fetch values of < 2, moderately exposed sites between 2 and 3.5, and exposed > 3.5 (). DOI: https://doi.org/10.5334/cstp.483.s1

Supplemental File 2: Appendix A

Locations of the twelve moderately exposed and exposed shores across the United Kingdom where surveys of total algal percentage cover were undertaken. DOI: https://doi.org/10.5334/cstp.483.s2

Supplemental File 3: Appendix A

Differences between field and digital baseline estimates of algal percentage cover per individual field observer unit. Circles show individual algal percentage cover estimates with colours representing the different surveyor types. Black squares and error bars represent the mean ± 1 standard error. N quadrats per observer unit = 8 except for unit 7 where N = 7. DOI: https://doi.org/10.5334/cstp.483.s3

Supplemental File 4: Appendix A

Post hoc Tukey-adjusted comparisons generated for single fixed effects to investigate differences between field and digital baseline estimates of total algal percentage cover across (a) field observer units, and (b) the level of algal coverage. Significant values (p < 0.05) are indicated in bold. DOI: https://doi.org/10.5334/cstp.483.s4