Introduction

Citizen science (CS) engages the public in scientific research, and thus it is critical that volunteers on CS projects be proficient in project-specific science inquiry skills so that they can contribute high-quality data and meet both learning and scientific outcomes (). The National Research Council (NRC) has defined science inquiry skills, broadly, as all tasks required to pursue the work of science (), and these skills are a defining attribute used to characterize models of CS (; ; ). Assessment of volunteers’ proficiency with science inquiry skills can lead to improvements in data quality and confidence in volunteers’ efforts while supporting CS projects’ learning goals ().

Despite this, recent research has documented a clear misalignment between what the field of CS says about the fundamental importance of skilled volunteers and the assessment efforts undertaken to ensure volunteers have those necessary skills (; ; ; ). For example, in their review of 327 CS projects’ stated goals and objectives, Phillips et al. () found the majority (59%) centered on influencing volunteers’ skills related to data collection and monitoring, implying the importance of those skills. Yet, of the 72 projects that responded to the questions about evaluation, only 28% measured outcomes related to skills—making it the least measured outcome across all projects. Similarly, only 4 of the 36 CS project proponents (either scientific leads or data managers) interviewed by Bowser et al. () conducted volunteer testing or skill assessment, and Burgess et al. () found that, of the 125 CS projects they surveyed, 30.9% conducted post-tests, which provide a measure of confidence in data collectors’ abilities. Possible barriers to assessing these skills include lack of time, of staff, of expertise, of funding, and of supporting resources ().

Given limited assessment efforts, it is not surprising that published research on volunteers’ proficiency with science inquiry skills is limited in scope and depth. We found no articles examining what project leaders think about their volunteers’ skill proficiency. The majority of skill assessments around data collection procedures are self-reported by participants and are project-specific (; ; ; ; ). These studies have found that most volunteers can perform basic skills, such as observing species, identifying species, and collecting data in a standardized manner (; ; ).

In a systematic literature review of participant outcomes of biodiversity CS projects, Peter et al. () found six studies that addressed the acquisition of new scientific skills. Among these, only two studies investigated skill gains across multiple biodiversity CS projects; however, neither paper specified the kind of scientific skills gained, reporting only that the changes were positive (; ). This lack of specificity, together with a reliance on self-reported outcomes that can bias toward socially desirable answers (), suggests such results should be interpreted conservatively.

Analyses of assessments across projects provide insight into broad trends associated with volunteers’ skill proficiency, while direct measures of skills (i.e., beyond self-reported measures) are important complements to self-reported volunteer data and can help validate the quality of data collected by volunteers (). Thus, direct measures of skills across multiple projects would be particularly useful to expand both the scope and depth of research on volunteers’ proficiency with science inquiry skills.

The Present Research

Funded by the National Science Foundation (DRL #1713424), our team studied volunteer skill assessment processes and impacts within the context of CS and published a series of articles based on the results. All of the papers center on the development and use of embedded assessment (EA): activities integrated into the learning experience that allow learners to demonstrate their science competencies for assessment purposes (; ; ; ).

Within the context of this larger study, we worked collaboratively with staff of 10 CS projects to identify and articulate science inquiry skills common across the projects, and then to develop and implement assessment measures for those skills that could be used by more than one project (see ). Here, we examined findings from the projects’ implementation of these shared measures to explore volunteers’ scientific observation skills in natural settings. Our definition of this overarching skill is based on Eberbach and Crowley’s () paper on how scientific observation evolves. As they wrote, “Scientific observation is not a domain-general practice, but one that goes hand in hand with disciplinary knowledge, theory, and practice” (p. 41), in contrast to “everyday observations as those that occur with little or no knowledge of the constraints and practices of scientific disciplines” (p. 46).

Breaking Down Science Inquiry Skills for Assessment

Because science inquiry skills are so broad, it is necessary to move beyond simple lists (e.g., observation, exploration, questioning, prediction, experimentation, argumentation, interpretation, and synthesis) and to conceptualize skills in practice and from the perspective of the volunteer. Likewise, attention and intention are needed to break down broad ideas about science inquiry skills into the smaller, tangible, and measurable underlying dimensions. In our research, we broke down scientific observation into two skills.

The first skill assessed is notice relevant features. Eberbach and Crowley () define “noticing” as “[using] existing knowledge to notice and organize key features that support inferences about deep principles and relationships within biological systems.” That is, an observer is able to match what they see with their knowledge of disciplinary structure. Often, relevant features are used either to distinguish the animal from the background environment (e.g., in a photo) or to accurately identify the organism at a prescribed taxonomic level, such as species.

The second skill assessed is record standard observations. Eberbach and Crowley () define this skill as “record[ing] observations using established disciplinary procedures, standards, and representations.” Standard observations can be spatial (e.g., GPS coordinates), temporal (e.g., date and time of day), environmental (e.g., cloud cover, ground cover), or biological (e.g., species identification or percent of tree leaf senescence).

Volunteers may vary in their proficiency at these skills based on volunteer experience and project data collection procedures. Furthermore, CS project leaders may harbor misconceptions about volunteer skill proficiency, such as assuming that the extent of participation results in improvements. To understand proficiency and associated factors, we asked the following questions:

  1. What do project leaders think about their volunteers’ skill proficiency?
  2. To what extent are volunteers proficient at dimensions of scientific observation?
  3. What differences exist in volunteer skill proficiency based on volunteer experience and project data collection procedures?

Methods

For this research, data were collected from two different groups (CS project leaders and CS volunteers) using two distinct approaches (interviews and embedded assessments, respectively). The following sections explain the methods used with each group of participants.

Project leader participants

Data collection

The 10 CS project leaders in this study participated in three semi-structured interviews via the Zoom video conferencing platform and two in-person meetings. They were selected for our study because all their projects focused on the skill of scientific observation, and are representative of the variety in CS projects, incorporating monitoring, species identification, and image classification. The purpose of the first interview in the fall of 2018 was to gather initial information about project activities and targeted science inquiry skills. The two in-person meetings, in December 2018 and in December 2019, focused on the development of the shared embedded assessments. All interviews lasted between 30 and 60 minutes and were recorded and later transcribed verbatim. Project leaders were compensated annually for three years with stipends for their participation in the overall NSF research project, which included the development and implementation of the embedded assessment into their CS projects.

Coding

To analyze the interview and meeting notes, three researchers used a six-step collaborative qualitative analysis () to develop an inductive scheme documenting the assumptions that CS project leaders had about their volunteers’ skill proficiency. Interview and in-person meeting notes were then coded and analyzed using NVivo12. Using consensus coding, two researchers coded each interview independently and then compared codes; all disagreements were discussed, and the final codebook was agreed upon (see Appendix A).

Volunteer participants

Data collection

The volunteers involved in this study were recruited by the CS project leaders who were implementing embedded assessments of the two skills. In total, 176 volunteers from seven CS projects participated in the embedded assessments between July 2019 and October 2020. Note that the three remaining projects did not implement these particular assessments and so did not have any volunteer data to include in this part of the study. Volunteers were not compensated because the assessments were embedded into their projects and were not an additional burden to complete.

Seventy-eight volunteers from five citizen science projects participated in the record standard observations embedded assessment. All five projects used data collection procedures that ask volunteers to record at least three categories of standard observations (including spatial, temporal, environmental, and biological) for organisms (plants and animals).

Ninety-eight volunteers from three CS projects participated in the notice relevant features for taxonomic identification embedded assessment. In these three projects, the data collection procedures ask volunteers to identify animals (insects and mammals) to the lowest possible taxonomic rank.

Embedded assessment instruments

The instruments used to assess these volunteer skills were co-developed with the participating CS project leaders. These embedded assessments are meant to determine whether an adult volunteer can accurately notice relevant features for taxonomic identification or record standard observations, not whether they do so consistently within the parameters of the CS project (see for more details on the development and validation of these assessments). Brief descriptions of each instrument are included below:

  • Notice relevant features. This assessment presents volunteers with images that replicate typical photos of organisms taken in the field by volunteers or from camera traps. Assessment participants are then asked to identify the organism(s) they see in the photos, just as they would in the actual project protocol, and to list the relevant features they noticed during the identification process.
  • Record standard observations. This assessment presents a video clip to simulate the first-person perspective of a volunteer collecting field data. Assessment participants are then asked to record the standard observations (i.e., date, location, ground cover) that are requested on the project’s data sheet.

Assessment scoring

The embedded assessments administered by the seven CS projects were originally scored by the respective CS project leaders (see Supplemental File 1: Appendices A–C for example scoresheets). One of our team members verified the scores and then aggregated the data in Excel by measure: notice relevant features and record standard observations. Missing data were coded as zero. Volunteers participating in their first field season were coded as new volunteers, and volunteers who had participated in two or more seasons were coded as returning volunteers.
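To make this coding scheme concrete, the sketch below shows how such scores might be aggregated programmatically. It is a minimal illustration, not the study’s actual workflow (which used Excel), and the file name and column names (score, seasons_participated, measure) are hypothetical assumptions.

```python
import pandas as pd

# Hypothetical export of the project scoresheets; file and column names are
# illustrative assumptions, not the study's actual data structure.
scores = pd.read_csv("volunteer_scores.csv")

# Code missing assessment responses as zero, per the scoring rule described above.
scores["score"] = scores["score"].fillna(0)

# Flag volunteer experience: first field season = new, two or more seasons = returning.
scores["experience"] = scores["seasons_participated"].apply(
    lambda n: "new" if n <= 1 else "returning"
)

# Aggregate by measure (notice relevant features vs. record standard observations)
# and by experience group.
summary = scores.groupby(["measure", "experience"])["score"].mean()
print(summary)
```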

Data analysis

To explore the extent of volunteers’ ability to notice relevant features for taxonomic identification and to record standard observations, the percentage of accurate answers was calculated for each skill. Three independent t-tests were conducted to determine whether there was (1) a difference in accuracy of species identification between volunteers who correctly noticed one or more relevant features of an organism and those who did not, (2) a difference in ability to accurately notice relevant features and identify species between new and returning volunteers, and (3) a difference in volunteer accuracy in recording standard observations of animals versus plants. Cohen’s d was calculated for each t-test to determine the effect size of the comparison between the two means.
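The following is a minimal sketch of this kind of analysis, not the authors’ actual analysis code. It assumes per-volunteer accuracy is coded as 1 (correct) or 0 (incorrect), uses scipy for the independent-samples t-test, and computes Cohen’s d with the pooled standard deviation; the arrays are illustrative, not the study’s data.

```python
import numpy as np
from scipy import stats

def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation of the two groups."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2)
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Illustrative accuracy scores (1 = correct, 0 = incorrect) for two hypothetical
# groups, e.g., volunteers who noticed at least one relevant feature vs. those who did not.
noticed = np.array([1, 1, 1, 0, 1, 1, 1, 1, 1, 1])
did_not = np.array([1, 0, 1, 0, 0, 1, 1, 0, 0, 1])

# Independent-samples t-test comparing mean accuracy between the two groups.
t_stat, p_value = stats.ttest_ind(noticed, did_not)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}, d = {cohens_d(noticed, did_not):.2f}")
```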

Results

As previously stated, data were collected from two different groups, CS project leaders and CS volunteers, and analyzed separately. The following sections present the results of the analysis for each group of participants.

Project leader assumptions about skills

Our inquiry with project leaders revealed two common themes in the assumptions they hold about their volunteers’ skill proficiency.

Assumption #1: Volunteers may not need training in order to perform the necessary skills to participate at the start.

Interviews indicated a range of perspectives in relation to how project leaders think about the skills volunteers do and do not possess and their proficiency at these skills when entering the project. All ten leaders made statements relating to this theme. Four out of the ten project leaders reported that their projects required no training before volunteers could start participating. Some project leaders assumed that volunteers understand the protocol (based on the provided information) and can perform the basic skills necessary for participation such as measurement. As one project leader explained:

Because it’s such a voluntary program, people opt in because of a connection to content [i.e., they are already interested in the topic]…When we are expecting people to opt in from the content side, perhaps we take some shortcuts [with our training], which may be problematic.

Thus, some provide training only on content knowledge or on more complex skills, such as navigating to a field site or identifying a species, while others provide no training at all. The spectrum of perspectives on this assumption is demonstrated through quotes from two project leaders:

We don’t do formal training.

When people first sign up … we mail printed materials with written detailed instructions and examples of how to count… Tips and tricks for how to identify [species] is in a mailed handbook that we give them. The volunteers go through an [in-person] training, learning about the protocol that we use and [species] identification, and then we do a practice survey. There is also online training courses if there is not a local chapter, or if people would want to refresh their skills or familiarity or practice anything.

Assumption #2: Practice makes perfect.

Four project leaders also assume that volunteer proficiency will improve over time and with experience in the project. For example, one project leader stated, “I do feel like you could think of [species] identification as a specialized skill, maybe, and that gets better over time, I would assume,” while another stated, “What we think…[is by] doing [this] through time, you get better at noticing the [animal].”

Project leaders often justified their assumptions by explaining that their projects provide written documents that volunteers could reference if they had questions. For example, one project leader remarked, “I definitely went in with the assumption that people were using the help resources way more than they actually are.” Another project leader explained this more fully:

…my assumption would be that the longer you participated, the more time you spent looking at the identification guides, the more time you’ve spent possibly doing research elsewhere to try to figure out if you’re submitting a photo or your set of photos. And [if] you’re trying to figure out which of the species it is, you might start with our identification guides. We have some resources on our site, but some of our participants also start Googling and go looking for other information and start trying to build their own knowledge. And so I would expect it would be from that research that they’re doing on our site, on other sites, where they’re being exposed to that language and exposed to how others are describing it. That would be my hunch.

One project leader revealed that they had previously assumed that practice makes perfect, but by conducting an evaluation prior to our work together, they had discovered the active role that project leaders need to play in training volunteers to be proficient at inquiry skills.

…we had done a study of our volunteers at how accurate they were in identifying these animals. The assumption that we had was that they would improve over time, having seen so many images. And what we found was that they weren’t…really improving in their skills for identifying animals over time [when they] weren’t getting feedback from us. So, that’s why that feature was implemented…we felt that giving them feedback would improve their performance.

Volunteers’ scientific observation skills

This study found that the majority of CS volunteers on the seven study projects had the necessary skill proficiency to collect accurate scientific observations. Specifically, 72% of volunteer responses accurately recorded standard observations, 81% of volunteers among the three projects accurately identified species, and 65% of volunteers could accurately notice at least one feature that is considered relevant to identifying the organism. Many of the volunteers could notice two or more features (Figure 1).

Figure 1 

Percentage of volunteers who notice relevant features.

Results also indicated that volunteers who correctly noticed at least one relevant feature of an organism were more likely to identify the species accurately (M = 0.97, SD = 0.17) than those who did not (M = 0.59, SD = 0.49), t(291) = 9.24, p < .05. The effect size for this analysis (d = 1.03) exceeds Cohen’s () convention for a large effect (d = 0.80).

We did find a small but significant difference in volunteers’ ability to record different kinds of biological observations accurately. That is, volunteer observations of animals (M = 0.76, SD = 0.43) were more likely to be accurate than their observations of plants (M = 0.66, SD = 0.47), t(396) = 2.07, p < .05, Cohen’s d = 0.22. Returning volunteers were significantly more likely to notice relevant features (M = 1.17, SD = 0.93) than new volunteers (M = 0.93, SD = 1.02), t(431) = 2.57, p < .05, Cohen’s d = 0.24. In addition, returning volunteers were more likely to identify an organism accurately to the species level (M = 0.87, SD = 0.33) than new volunteers (M = 0.71, SD = 0.45), t(293) = 2.57, p < .001, Cohen’s d = 0.24.

Discussion

We set out to explore the assumptions that project leaders have about their volunteers’ science inquiry skill proficiency and to investigate volunteers’ actual proficiency at scientific observation, a skill that is fundamental to and shared by many projects. We piloted two shared embedded assessment tools focused on dimensions of scientific observation in natural settings, notice relevant features for taxonomic identification and record standard observations, to answer questions about the extent to which volunteers can perform the skills and about differences in skill proficiency based on volunteer experience and data collection procedures.

While our previous work identified organizational barriers to evaluation (), this study is novel in that it identifies organizational assumptions that function as conceptual barriers to measuring skills. First, some CS projects assume that volunteers come to a project with the necessary skills to participate (without needing training), and second, they assume that volunteers improve in those skills over time through continued participation. These assumptions could influence the way in which CS projects ask for volunteer involvement. For instance, the assumption that volunteers come to a project with the needed skills could mean that projects do not find it necessary to train volunteers, which could lead to volunteers making mistakes and not collecting data accurately. These types of conceptual barriers may stand in the way of CS project efforts to assess volunteer skill.

Our research did indicate that, overall, the majority of volunteers are proficient in skill dimensions measured in this study: notice relevant features for taxonomic identification and record standard observations. The percent accuracy rates reported in this study (between 65% and 81%) are similar to acceptable success rates reported in data validation studies (65–85% and 72%) (; , respectively). This provides empirical evidence to support the assumption held by some project leaders that their volunteers have the necessary skills to participate in the CS project.

However, the findings also suggest there could be nuances in volunteer skill proficiency based on data collection procedures and the skill assessed. In this study, volunteers were more likely to accurately record standard observations of animals than plants, and volunteers who were more accurate at the skill notice relevant features were also more likely to be accurate at the complex skill of taxonomic identification. These findings suggest the importance of assessing volunteer skills so that project leaders understand the training needs of their volunteers.

Not surprisingly, returning volunteers were more accurate in their observations than new volunteers. However, this was a cross-sectional study that collected data from many individuals at one point in time, and thus it does not provide evidence about whether returning volunteers improved over time; another possible explanation is that people who are better at the skill tend to stay involved in the project longer. Assuming volunteers will increase proficiency at a skill over time without additional support from the CS project may influence how CS projects design their onboarding and training of volunteers, decreasing the likelihood of continuing education and corrective feedback. This is at odds with recommended best practices (), which advocate for ubiquitous learning design considerations that include building learning supports, such as training and frequent opportunities for feedback, into the project. While practice is integral to learning a new skill, practice alone is not enough; for example, feedback coupled with corrective action and/or reinforcement is an additional step commonly recognized in the behavior change literature as necessary for learning ().

Conclusions

Our study contributes to the citizen science field in three fundamental ways. One, it documents assumptions held by some project leaders that serve as conceptual barriers to implementing assessments of volunteers’ skills. Two, it adds further evidence to the credibility of volunteer-collected data. Three, it demonstrates the value of cross-project analyses using a shared assessment tool.

As previously mentioned, the assumptions held by some project leaders could contribute to the limited efforts at skill assessment by CS projects and, furthermore, to the gap between intended and assessed skill-based outcomes in CS projects. Resolving these types of conceptual barriers is a first and crucial step for projects to implement assessments of volunteers’ skills.

The higher accuracy rates of returning volunteers compared with new volunteers suggest the need for future research into the reasons for this difference (e.g., practice, or less skilled volunteers dropping out after their first year). In addition, our study points to data collected about volunteers as critical for future investigations. A limitation of our study was that we could not conduct an analysis with volunteer training as a variable because only one project collected that information.

There are many ways that this focused examination of skills can be applied to different stages of CS project design and operation, as outlined by Davis et al. (). For instance, skill assessment can inform the type of data a CS project collects, keeps, and discards. It can also inform volunteer recruitment strategies, training topics, and training delivery methods. Identifying skill gaps and updating training or program protocols to fill those gaps can improve data quality proactively. Assessing skills during ongoing participation supports targeted and useful feedback to participants, which can be valuable for volunteer retention and continuous performance improvement (). In team-based projects, a skill assessment could be utilized to form data collection groups, distributing volunteers proficient in the necessary skill across all groups. When skill assessments highlight regularly occurring issues that might otherwise go undetected, data validation processes can include steps to address those issues specifically. Skill assessments can also be used in analyses; this could include developing models that weight data based on assessed skill levels (; ). Beyond the implications for individual projects, volunteers who demonstrate proficiency in a certain skill for one project could be considered pre-qualified for another, thereby streamlining the training process for organizations that are often resource-strapped. Additionally, this facilitated process could be utilized by research projects that use badging and micro-credentials (; ).

The need to assess skills and the challenges associated with measuring them are not unique to CS. Indeed, both formal and informal science education have seen calls encouraging researchers and evaluators to begin using performance as a key metric of skill (; ; ). Skill assessments like those reported in our study are poised to make significant contributions to science education at large.

Data Accessibility Statement

The data used in the research project have not been made available, in accordance with our Institutional Review Board’s determination about the best way to protect confidentiality.

Supplementary File

The supplementary file for this article can be found as follows:

Supplemental File 1

Appendices A–C. DOI: https://doi.org/10.5334/cstp.628.s1