Introduction

Citizen science, the participation of non-professional community members in the scientific process, is a popular and growing discipline with numerous benefits to participants and to the scientific community (; ). Through citizen science, participants gain new knowledge and skills (Ellis 2011; ), have enjoyable social experiences (), and can contribute to conservation and natural resource management (; ). In turn, the professional scientific community can collect information relatively inexpensively over large geographic areas and long timescales, and can generate public awareness and support for science-based policy and management decisions (; ). These benefits can be realized only if citizen science projects are intentionally designed to achieve research objectives (), and if participants have the skills, knowledge, and training to collect high-quality data (; ).

Project design for citizen science can range widely. Some programs, such as eBird () and iNaturalist (), are designed around web or phone applications through which large numbers of volunteers upload or document species detections at their convenience, generating copious data points across a large spatial scale. These programs are not necessarily designed with a local research question in mind, but the data they generate can be used for a variety of applications afterward (e.g., ; ; ; ; ). Other large-scale projects, such as Project FeederWatch (), are designed to fulfill a purpose, such as monitoring feeder birds in the winter, and the data can be used to answer many research questions related to that purpose (). Other programs are regional or local in scale, relying on a smaller number of volunteers who are trained to collect data in a specific location and during a defined timeline (e.g., ; ; ). There are benefits and costs across the spectrum of citizen science project types (). Larger observational programs generate large amounts of data in a very cost-effective manner (), but they may not be designed to answer smaller-scale or site-specific research questions, and their data can have spatial or temporal biases and quality issues (; ). More targeted citizen science projects can generate more detailed data that can readily answer the desired research questions, but they require much more time and investment in training, coordination, and oversight of volunteers (). These projects can offer enriching social experiences, more personalized learning outcomes, and a sense of personal contribution for participants (). However, the additional time required for participation can be a barrier for some prospective volunteers (). For projects that require a substantial time investment from citizen scientists and project leads, it is especially important to maximize learning benefits for volunteers and to ensure that data are accurate and can meet project needs.

There is a growing body of literature investigating the quality and accuracy of citizen science data (; ; ; ), but findings vary among projects. In many instances, citizen science data are found to be equivalent to data generated by professional scientists (; ; ; ). However, some studies show citizen scientists falling behind professional scientists in rates of correct species identification (), or demonstrate lower data accuracy for citizen scientists during the early part of a project (). Unsurprisingly, many projects document an improvement in accuracy and species identification skills after specialized training (; ; ; ; ; ), but training strategies vary, and training programs must be tailored to meet individual project objectives. Quality and usability of data are influenced by project design as well as by the skills and training of citizen scientists (), so projects should be designed with standardized protocols and pre-determined analytical methods in addition to assessing data quality. There are also quantitative methods that can account for variation in observer skills or detection probabilities (; ), and identifying these patterns can guide data analysis decisions that remove such biases. Data quality concerns are one of the most common barriers to mainstream use of citizen science datasets (), and without an evaluation of data accuracy and quality, many citizen science projects are met with skepticism by scientists and decision-makers (). To build credibility for individual projects and trust for citizen science as a whole, citizen science practitioners should consider incorporating a process for evaluating data quality and accuracy into their project design.

Birds are the focus of many citizen science projects (). As a group, birds lend themselves well to study by members of the community. Recreational birding is a widespread hobby (), and birds are easily observable in locations close to where people live, work, and recreate (), so they are often of interest to the general public. Birds are also commonly used as ecological indicators, providing information about the overall ecological health of a site based on the presence or absence of certain species (). Capture or measurement is usually not required to identify them to the species level, unlike other taxonomic groups such as small mammals or insects (). For these reasons, birds are the subject of some of the oldest ongoing citizen science projects, such as the Christmas Bird Count () and the North American Breeding Bird Survey (). They are also the focus of some of the most popular citizen science programs. For example, eBird accumulated one billion bird observations from more than 680,000 observers from its launch in 2002 to spring 2021 (). Despite their popularity in citizen science, birds can present challenges to research and monitoring. Birds move quickly, and surveyors must learn a variety of songs, calls, and plumages to identify them accurately in the field (). Bird species identification errors have been noted in studies conducted by professional biologists (; ; ) and in citizen science surveys (; ). Given the popularity of birds as subjects of study and conservation, and the potential for quality issues in bird data, developing best practices for training programs in bird-focused citizen science projects will provide widespread benefit.

Because bird identification can be challenging, the opportunity to enhance birding skills can be a significant incentive for participation in a bird-focused project (; ). Effective learning opportunities are a win-win for citizen science practitioners who wish to offer an enjoyable experience for participants and ensure that high-quality data are collected (). While consideration of the learning process is not yet widespread within citizen science (), recent research has identified some methods to enhance learning within citizen science programs. For example, learning goals should be established and articulated at the beginning of the project, and these goals should align with project needs (). Additionally, practitioners should recognize that learning is a complex process influenced by social systems, and should consider the motivation, interest, and background of project participants (). Assessments are an important but underutilized method to understand whether learning outcomes are being achieved (), and assessments should be embedded within the learning experience and measure both demonstrated knowledge and performance of necessary skills (). Information from assessments and other feedback should be used to alter and improve training program design to ensure that participants learn necessary skills while having enjoyable and enriching experiences ().

We evaluated data accuracy and improvement in species identification in a citizen science bird monitoring program in the Salt Lake City, Utah (UT) metropolitan area. Participants in this program learned to conduct avian point count surveys using the Integrated Monitoring in Bird Conservation Regions protocol developed by the Bird Conservancy of the Rockies (), and both citizen scientist volunteers and professional biologists collected point count data. It is essential to our program that point count data are high quality and consistent; we strive to generate accurate and reliable information to shape on-the-ground habitat restoration and management activities. We hypothesized that, if data quality differed between these two groups, citizen scientists would have a lower probability of detecting birds and would identify fewer species than professional biologists. We also hypothesized that citizen scientists might produce biased distance estimates, either because they are less able to detect birds farther from the sampling point, or because they make measurement and estimation errors during the survey. We compared species detections, bird counts, and distance measurements by citizen scientists and professional biologists to test these hypotheses. We also analyzed three years of volunteer learning assessments to investigate improvement in bird identification skills by citizen scientists. Based on our results, we provide recommendations for other citizen science programs interested in assessing training methods and enhancing data quality.

Methods

Project overview

Tracy Aviary’s Breeding Season Bird Survey is an ongoing bird monitoring program that began in 2011. We conduct avian point count surveys at study sites across the Salt Lake City region, analyze bird data according to individual management and restoration questions at each site, and provide results to partner organizations, state agencies, and municipalities to help achieve goals of enhancing and preserving bird habitat. For this study, we assessed citizen science data accuracy and changes in species identification skills for three years of Tracy Aviary’s Breeding Season Bird Survey program from 2019 to 2021.

Each year from 2019 to 2021, we recruited 30 to 40 citizen scientists to the program. To recruit participants, we advertised to past participants in Tracy Aviary’s citizen science projects, to the local Audubon chapter email list, to local birding Facebook groups, and through posts on Tracy Aviary’s social media account. Citizen scientists were trained to conduct point count surveys using the Integrated Monitoring in Bird Conservation Regions (IMBCR) point-transect protocol developed by the Bird Conservancy of the Rockies (). Surveys were conducted in teams of two; citizen scientists were paired with other citizen scientist volunteers or professional biologists employed at Tracy Aviary or the Utah Division of Wildlife Resources. These teams conducted point counts at sampling sites throughout the breeding season (April 15 through July 10) each year. The team navigated to a series of sampling points within a study site, and conducted point counts at those sampling points between sunrise and approximately 10am. The observer of the team identified all birds seen and heard at the point during a six-minute point count, and noted the direction, detection type (e.g., visual, singing, or calling), the exact distance using a laser rangefinder, and any other information they could determine about the bird (e.g., age and sex). The recorder of the team wrote all of the observations on the datasheet, noted the minute during the survey (one through six) when the observation was made, and also noted weather and site variables, such as wind speed, cloud cover, ambient noise levels, and presence of water or snow. Sites were visited five to nine times during the breeding season, allowing multiple observers to conduct point counts in the same locations.

Training program

Citizen scientists were trained using a combination of learning methods over a two-month period each year. They attended an initial training session at the end of February, where they were given an introduction to the goals of the program and an overview of the protocol. During this initial training session, all participants took a pre-assessment that tested their ability to identify 20 birds by sight and sound.

During the next nine weeks, citizen scientists had access to weekly online training material that taught them how to identify birds they were likely to encounter at the project study sites. Each weekly online training session also included a quiz so participants could test their knowledge after reviewing the material. During this time, they also attended a minimum of four field training sessions, where staff members met them at project study sites so they could learn to use the survey equipment, practice navigating to point count locations, and conduct practice surveys.

At the end of the nine-week training period, participants took a post-assessment to again test their ability to identify 20 bird species likely to be encountered at project study sites. They were also evaluated by staff members during a field assessment, in which they conducted practice point count surveys in the field. During the field assessment, staff members rated participants on a number of criteria required to complete the surveys, including their ability to navigate to the point count location, to detect and identify birds in the field, and to take accurate distance measurements. Citizen scientists who correctly identified at least 80% of the bird species on their post-assessment and who met all criteria on the field assessment could sign up for surveys as an observer, the role responsible for detecting and identifying all birds during the survey. Citizen scientists who did not correctly identify at least 80% of bird species, did not pass every element of the field assessment, or were not yet comfortable conducting surveys as an observer could sign up as a recorder, the role responsible for recording data and noting weather and site variables.

Three professional biologists employed at Tracy Aviary also completed data collection for the Breeding Season Bird Survey program. All biologists had at least four years of professional experience identifying birds and conducting scientific surveys prior to completing point count surveys with Tracy Aviary during 2019–2021.

Citizen scientists and Tracy Aviary staff biologists completed surveys during April 15 through July 10 each year, visiting sampling points within each study site five to nine times during the season. All citizen scientist and professional biologist participants provided consent to include their data in the project analyses. Participants were informed of the goals of this project, procedures for how data would be gathered and used, and their ability to withdraw from the project if they desired.

Evaluation of volunteer species identification

To evaluate how citizen science volunteers acquired species identification skills, we compared pre- and post-assessment scores for bird identification by sound for 49 citizen scientists who participated in the program during 2019 to 2021. Some citizen scientists participated in the program for multiple years, and we used only the first year of pre- and post-assessment data for these participants to avoid biasing our results toward individuals who repeated the training program. We used a paired t-test to evaluate differences between the percentage of bird species that citizen scientists were able to identify by sound during pre- and post-assessments. We set the statistical significance level (α) at 0.01.
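
To make the comparison concrete, the following is a minimal sketch of this paired test in Python, assuming one row per citizen scientist; the file and column names are illustrative assumptions, not part of our actual workflow.

```python
# Minimal sketch of the pre/post paired t-test described above.
# The file "pre_post_assessments.csv" and the columns "pre_pct" and
# "post_pct" are hypothetical: one row per citizen scientist holding the
# percentage of species correctly identified by sound in each assessment.
import pandas as pd
from scipy import stats

scores = pd.read_csv("pre_post_assessments.csv")

# Paired t-test: each participant contributes one pre and one post score.
t_stat, p_value = stats.ttest_rel(scores["post_pct"], scores["pre_pct"])

print(f"mean pre-assessment:  {scores['pre_pct'].mean():.1f}%")
print(f"mean post-assessment: {scores['post_pct'].mean():.1f}%")
print(f"t = {t_stat:.2f}, p = {p_value:.5f} (alpha = 0.01)")
```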

Evaluation of data quality

To evaluate citizen science data quality, we compared several metrics of point count surveys conducted by citizen scientists with surveys conducted by professional biologists. We compiled data from all point counts conducted during 2019 to 2021 at point count sampling locations within five sampling sites in our program. These sampling sites were all located adjacent to the Jordan River, UT. They ranged in size from 3.24 to 102 ha and included 2 to 26 sampling points per site, for a total of 46 sampling points across the five sites. Surveyors visited each point five to nine times per year. Fifteen citizen scientists and three professional biologists conducted point count surveys, and point counts were conducted in roughly equal numbers by professional biologists and citizen scientist observers. We paired point counts conducted by professional biologists and citizen scientists at the same point count location, aligning visits so that they were performed near the same time period within the survey window each year. We eliminated any instances that did not have both a citizen scientist and a professional biologist point count survey for the same location and time period, resulting in a total of 283 individual point counts conducted by citizen scientists and 283 individual point counts conducted by professional biologists during the three-year study period.
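
As an illustration of this pairing step, the sketch below matches each citizen scientist point count to the professional biologist point count at the same sampling point, in the same year, with the nearest survey date, and drops surveys that have no counterpart. The file name, column names, and the 14-day matching window are assumptions for the example only.

```python
# Sketch of pairing citizen scientist and professional biologist point
# counts. Columns "point_id", "date", and "observer_type" are hypothetical.
import pandas as pd

counts = pd.read_csv("point_counts.csv", parse_dates=["date"])
counts["year"] = counts["date"].dt.year

pros = counts[counts["observer_type"] == "professional"].sort_values("date")
vols = counts[counts["observer_type"] == "citizen_scientist"].sort_values("date")

# For each citizen scientist survey, find the professional survey at the
# same point and in the same year with the closest date (here within an
# assumed 14-day window); surveys with no match are dropped.
pairs = pd.merge_asof(
    vols, pros,
    on="date", by=["point_id", "year"],
    direction="nearest",
    tolerance=pd.Timedelta("14D"),
    suffixes=("_vol", "_pro"),
).dropna(subset=["observer_type_pro"])

print(f"{len(pairs)} paired surveys retained")
```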

We calculated the total number of species detected, the total number of individual birds detected, and the average detection distance for each point count survey conducted by a citizen scientist observer or a professional biologist observer. When birds were detected but not identified to the species level (i.e., they were identified only to a family group [e.g., “unknown warbler”] or were not identified at all [e.g., “unknown bird”]), they were not included in the total number of species detected, but they were included in the count of individual birds and in the calculation of average distances. Many sampling methodologies, especially non-invasive wildlife surveys such as point counts, result in imperfect detection (). We assumed that because most birds establish territories and stay within set home ranges during the entirety of the breeding season, multiple surveyors visiting the same point location within this time should detect roughly the same number of individual birds and species, even if detection rates are less than one for a given survey ().
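
These per-survey metrics might be derived along the lines of the sketch below, which assumes one row per detected bird with hypothetical columns for a survey identifier, a species code (with an “unknown …” placeholder for birds not identified to species), and the measured distance.

```python
# Sketch of the per-survey metrics described above. The file and the
# columns "survey_id", "species", and "distance" are hypothetical, as is
# the "unknown ..." labeling convention for unidentified birds.
import pandas as pd

detections = pd.read_csv("detections.csv")

# Birds identified only to family or not at all (e.g., "unknown warbler",
# "unknown bird").
unknown = detections["species"].str.startswith("unknown")

# Species richness excludes unknowns; bird counts and average distances
# include every detection.
richness = (
    detections[~unknown]
    .groupby("survey_id")["species"]
    .nunique()
    .rename("n_species")
)
totals = detections.groupby("survey_id").agg(
    n_birds=("species", "size"),
    mean_distance=("distance", "mean"),
)
metrics = totals.join(richness).fillna({"n_species": 0})
```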

We then used paired t-tests to evaluate differences in the total number of species detected, the total number of individual birds detected, and the average detection distance for point count surveys conducted by citizen science observers and professional biologists at the same point count location. For metrics found to be statistically significantly different between citizen scientists and professional biologists, we performed repeated measures ANOVAs to ensure that individual observer differences within each group were not driving between-group differences. Because we paired these metrics at each point count location, we eliminated any sampling points where we did not have point count data from every observer in the group. We set the statistical significance level (α) at 0.01.
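
In Python, these comparisons could look roughly like the sketch below. The paired t-test takes one row per paired survey; the repeated measures ANOVA (shown here via statsmodels’ AnovaRM) treats each sampling point as the subject and the individual observer as the within-subject factor, which is why points lacking data from every observer in the group must be dropped. All file and column names are hypothetical.

```python
# Sketch of the paired t-tests and repeated measures ANOVA described above.
# File and column names are hypothetical.
import pandas as pd
from scipy import stats
from statsmodels.stats.anova import AnovaRM

# One row per paired survey, with per-survey metrics for each observer type.
paired = pd.read_csv("paired_metrics.csv")
t_stat, p_value = stats.ttest_rel(
    paired["n_species_citizen"], paired["n_species_professional"]
)
print(f"species per survey: t = {t_stat:.2f}, p = {p_value:.5f}")

# One row per survey within a single observer group, with columns
# "point_id", "observer", and "n_species"; repeat visits to the same point
# by the same observer are averaged (aggregate_func="mean") so the design
# is balanced.
group = pd.read_csv("citizen_scientist_surveys.csv")
rm = AnovaRM(
    group, depvar="n_species", subject="point_id",
    within=["observer"], aggregate_func="mean",
).fit()
print(rm)
```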

Results

Volunteer species identification

We found a significant increase in bird species song and call identification for citizen scientists after they completed the training program. Citizen scientists identified an average of 30.1 percentage points more bird songs and calls in their post-assessment (M = 72.7%, SE = 4.1) than in their pre-assessment (M = 42.6%, SE = 4.8; t(48) = 1.68, p < 0.00001) (Figure 1).

Figure 1 

The average percentages and standard errors of bird species songs and calls identified by 49 citizen scientists in assessments taken before (“pre-assessment”) and after (“post-assessment”) a nine-week training period during 2019–2021 in Tracy Aviary’s breeding season bird survey program. The asterisk (*) represents statistical significance.

Data quality

The average number of birds detected per survey was not statistically different between professional biologists (M = 24.51, SE = 0.893) and citizen scientists (M = 27.99, SE = 1.434; t(282) = –2.354, p = 0.02) (Figure 2). The average detection distance was also not statistically different between professional biologists (M = 96.44, SE = 3.06) and citizen scientists (M = 97.14, SE = 3.00; t(282) = –0.2504, p = 0.8) (Figure 2). The average number of bird species detected per survey was significantly different between observer types; professional biologists detected more species (M = 11.43, SE = 0.185) than citizen scientists (M = 9.95, SE = 0.159; t(282) = 7.07, p < 0.00001) (Figure 2). We did not find evidence that any individual observer or observers within each group caused this difference; there was no statistically significant difference in species detections between individual observers within the professional biologist group (F(18, 3) = 1.08, p = 0.37) or individual observers within the citizen scientist group (F(82, 1) = 3.18, p = 0.08) (Figure 2).

Figure 2 

The averages and standard errors of (a) the number of bird species detected, (b) the number of birds detected, and (c) the estimated distance to birds in point count surveys conducted by professional biologists and citizen scientists at the same point count locations during 2019–2021 in Tracy Aviary’s breeding season bird survey program (n = 566). The asterisk (*) represents statistical significance.

Discussion

Citizen science projects can provide numerous benefits to participants and the scientific community (; Ellis 2011; ), but data quality, accuracy, and usability remain barriers to widespread use of citizen science data (; ; ). Using three years of data from a citizen science bird monitoring project in Salt Lake City, UT, we assessed a volunteer learning outcome and compared species detections, numbers of birds detected, and distance measurements between point counts by citizen scientists and professional biologists. Our results demonstrate that significant species identification skill acquisition can be achieved through a rigorous training program; citizen scientists could identify an average of 30.1 percentage points more bird songs and calls after they received training. By several metrics, point count data collected by citizen scientists in our program were equivalent to data collected by professional biologists. Citizen scientist participants detected similar numbers of birds, and estimated similar distances to bird observations, as professional biologists performing point counts in the same locations, but they detected an average of 1.48 fewer species per survey. Our findings emphasize the importance of evaluating training programs and data accuracy for citizen science projects.

Overall, we found few differences between point counts conducted by citizen scientists and those conducted by professional biologists. Because the average number of birds per survey was equivalent between the two observer types, the lower number of species appears to result from citizen scientists being unable to identify as many birds to the species level, rather than from a lower detection rate for birds at the point count location. Citizen scientists detected and estimated distances to the same number of birds, but they used more “unknown bird” codes per survey. Within our dataset, citizen scientists used “unknown bird” codes 366 times, while professional biologists used these codes 130 times. Birds that were not identified to the species level were not counted toward the final species total.

The point count surveys in our program are designed to be analyzed using occupancy modeling or distance sampling analyses that produce site occupancy or abundance estimates (). Both of these analysis methods assume that the probability of detecting a species in a given survey is less than one, so detection probability is estimated alongside occupancy or abundance during data analysis (; ). As long as surveys are otherwise conducted without biases, the slightly lower ability of citizen scientists to identify species in a given survey can be incorporated into the detection probability for that species by including observer type as a covariate (; ). Certain bird species and taxonomic groups can be more challenging than others to identify (; ). We found that the species detected less often by volunteers were species with more traditionally challenging identification characteristics, such as warblers or Empidonax flycatchers (). For example, in our dataset, professional biologists detected 5 times more Wilson’s Warblers (Cardellina pusilla), 4.4 times more Yellow-breasted Chats (Icteria virens), and 1.4 times more Yellow-rumped Warblers (Setophaga coronata) than citizen scientists. A MacGillivray’s Warbler (Geothlypis tolmiei), Orange-crowned Warbler (Leiothlypis celata), and Dusky Flycatcher (Empidonax oberholseri) were all detected by professional biologists but not citizen scientists. While a difference of 1.48 species detected per survey may not seem large, the tendency for certain species to go unidentified more often by citizen scientists than by professional biologists would result in different detection probabilities between the two observer types. Especially because the presence of many of these species can be used to evaluate riparian health (), it is important to generate accurate estimates of their distribution and abundance within our study sites. Because we were able to identify this pattern, citizen science data can still provide unbiased estimates of species occupancy and abundance with a minor addition to the analysis.
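
To illustrate the idea of folding observer type into detection probability, the toy sketch below fits a minimal single-season occupancy model by maximum likelihood, with a logit-linear detection model that includes an observer-type covariate. It is a simplified illustration with made-up detection histories, not the analysis applied to our program’s data.

```python
# Toy sketch of a single-season occupancy model in which detection
# probability depends on observer type (0 = professional biologist,
# 1 = citizen scientist). Data are invented for illustration only.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # inverse logit

# y[i, j] = 1 if the species was detected at site i on visit j, else 0.
# obs[i, j] = observer type for that visit.
y = np.array([[1, 0, 1], [0, 0, 0], [1, 1, 0], [0, 1, 0]])
obs = np.array([[0, 1, 0], [1, 1, 0], [0, 0, 1], [1, 0, 1]])

def neg_log_likelihood(params, y, obs):
    logit_psi, b0, b1 = params
    psi = expit(logit_psi)           # occupancy probability
    p = expit(b0 + b1 * obs)         # visit-level detection probability
    # Probability of each site's detection history, marginalized over the
    # latent occupancy state (sites never detected may be unoccupied).
    det_hist = np.prod(p**y * (1 - p) ** (1 - y), axis=1)
    never_detected = (y.sum(axis=1) == 0).astype(float)
    site_lik = psi * det_hist + (1 - psi) * never_detected
    return -np.sum(np.log(site_lik))

result = minimize(neg_log_likelihood, x0=[0.0, 0.0, 0.0], args=(y, obs))
logit_psi, b0, b1 = result.x
print("occupancy estimate:", expit(logit_psi))
print("detection (professional):", expit(b0))
print("detection (citizen scientist):", expit(b0 + b1))
```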

Our findings are similar to those of other studies that have examined the quality of citizen science data in comparison with data gathered by professional scientists. In many studies, citizen scientists produce data similar to that of professional scientists (; ), but they are more likely to fall behind professionals when performing more difficult or specialized skills, such as difficult species identification () or estimates of abundance (). For example, a study by Newman et al. () found that citizen scientist participants were able to identify 16–20% fewer invasive plant species than professionals. Even when volunteers perform worse than professional scientists, their data can often still be used in meaningful ways (; ; ). This is true for our study; even though our volunteers identified slightly fewer bird species, their data were otherwise unbiased and could be used by partners to achieve conservation objectives. Because we evaluated our program’s data and made adjustments during the analysis to ensure its quality, our partners can have high confidence in our findings, and citizen scientist participants can be assured that their work is contributing to meaningful bird conservation efforts.

As with many citizen science projects, our program has multiple objectives, which include providing an enriching experience for participants as well as generating high-quality data. Through conversations with participants and post-project surveys, we know that many people are attracted to our project in part because they are interested in improving their bird identification skills. Having an effective and enjoyable training program is essential for the success of our project, and evaluations of the program enable us to modify our methods to ensure they are successful (). Over the twelve years that our bird monitoring project has been running, we have been able to use yearly feedback, training assessments, and investigations of data quality to improve our training program. We have found that a combination of online written lessons, quizzes, and in-person field practice works best for our training program; this variety of methods ensures that participants with different preferred modes of learning can each gain the information they need (). We have also experimented with the order in which species are introduced throughout the training period, and found that our participants did best when similar species were taught together and when species were introduced in the online or written material at the same time they could be observed in the field. For example, migratory species are taught only after they arrive in the area.

While some participants may be drawn to an intensive training program that teaches difficult species identification skills, the time commitment may deter some potential volunteers from participating (). Broadly, participation in citizen science is biased toward white, more highly educated, and older individuals (; ; ). Underrepresentation of traditionally marginalized communities in citizen science can mean that important knowledge and ways of thinking are overlooked (). A lack of inclusion can also lead to biases in the data and lowered long-term viability for projects (). These communities also miss out on the benefits of participation in citizen science, such as acquiring new skills or contributing to local conservation efforts (; Ellis 2011). Anecdotally, our program tends to attract older, white, more affluent, and retired participants. In part, this may be due to the amount of time needed to complete the training program. The requirement of assessments may also deter participants who are newer to birding or who feel unsure of their bird identification skills. One way we have tried to lower this barrier is by pairing participants and offering the recorder role in our data collection process. The recorder is not responsible for identifying birds or estimating distances, but writes down the point count data generated by the observer and keeps time for the survey. This role provides a lower-pressure way to participate in data collection, and may appeal to newer birders who do not yet feel confident in their bird identification skills. In the process of growing our citizen science program, we have also developed other projects that may be more appealing to members of marginalized communities for a number of reasons (): some are developed in partnership with community groups that serve these communities, some are offered in multiple languages, and some require a much smaller time commitment for the training process. It is important for citizen science projects to consider their desired audience and to pursue opportunities to increase accessibility for all members of the community when designing their training programs.

There are several limitations of our study that should be considered, especially if other groups are interested in replicating this process or furthering research in this area. First, our evaluation of learning was very limited in scope. We were able to assess only one outcome: how species identification skills changed over the course of our training program. The assessment we used for this outcome was a written assessment in which citizen scientists listened to audio recordings of bird songs and calls, which is not the same context in which citizen scientists would actually apply these skills during our project. Including both a performance assessment like this written test and an authentic assessment in which citizen scientists actually perform the skills needed for the project would provide a more holistic measure of whether participants gained species identification skills (). Citizen science has the potential to foster a variety of learning outcomes, from species identification and data collection skills to scientific reasoning and critical thinking skills (). Future work should investigate multiple learning objectives and consider additional data sources, such as surveys and feedback forms, to investigate the breadth of learning outcomes that are possible. A second limitation is that our method for evaluating data quality can be used only by projects in which both citizen scientists and professional biologists collect data in the same locations. Many projects rely exclusively on citizen scientists for data collection, or do not conduct repeated surveys in the same locations. By using point counts performed by professional biologists as our standard, we also made a major assumption that professional biologists collect high-quality data. We felt confident in this assumption given the amount of training and background experience held by the professional biologists in our group, but this assumption should be considered before embarking on a similar analysis.

Other practitioners interested in evaluating learning in their own projects should consider the following questions: 1) What learning goals do we have for participants in this project? and 2) When during the training process or project duration are these goals attained? Assessments or surveys should be undertaken before, after, and even during the learning period to measure these changes, and they should be designed to directly measure the desired learning goal or goals. Practitioners interested in evaluating data quality in their own projects should consider the following questions: 1) What are our data quality needs to meet project goals and answer research questions? 2) Where might biases exist in the data set? and 3) Is there a desired or standard data set that can be used to evaluate project data? If no such data set exists, practitioners could generate these data on their own, or may need to rely on an evaluation of data collection skills rather than a direct comparison of data sets to understand potential data quality limitations.

Conclusion

A common goal in citizen science projects is for data to be used to advance scientific research and influence science-based policy and management decisions. Even with mounting evidence that citizen scientists can collect high-quality and accurate data (; ; ; ), citizen science has not yet achieved widespread use (), and valuable data are sometimes collected without being translated to on-the-ground application (). Our findings emphasize the importance of evaluating training programs and data accuracy for citizen science projects. Incorporating such assessments into project design should be standard practice for citizen science programs. These exercises will help ensure that project objectives and data quality needs are being met (), will identify any need to modify data analysis methods to account for differing data quality or biases (), and will ultimately increase trust in project findings and in citizen science as a whole ().

Data Accessibility Statement

Data can be accessed in the online supplementary material.

Supplementary Files

The Supplementary files for this article can be found as follows:

Supplemental File 1

Scientific data for pre- and post-assessment species identification comparison. DOI: https://doi.org/10.5334/cstp.604.s1

Supplemental File 2

Scientific data for comparison between citizen scientist and professional point counts. DOI: https://doi.org/10.5334/cstp.604.s2