Introduction

Public participation in science offers participants opportunities to engage by designing a study, collecting data, analyzing data, and/or interpreting results (; ; ). Such engagement ranges from top-down, scientist-driven projects (often referred to as citizen science) to bottom-up, community-driven projects (often referred to as community science). Because there is no globally accepted term encompassing the full range of public participation in science, we default to the term citizen science, which we define as scientific work that individuals can join or leave at will, for which participants are not specifically paid, and that results in scientific findings.

In the United States of America (USA) alone, citizen science may now involve tens of millions of participants (), making it a major conduit of scientific information and praxis (). Increasing participation in citizen science is seen as a promising route to diversifying science (; ; ), particularly through the inclusion of genders and races/ethnicities that are underrepresented in the Science, Technology, Engineering, and Mathematics (STEM) workforce ().

However, the limited research to date suggests that citizen science participants tend to be white (; ; ) and highly educated (; ; ), at least in the USA. The pattern is less clear for gender (; ; ). This pattern of underrepresentation may also hold more broadly in the USA. For instance, Schlachter () reported that white, college-educated women are overrepresented in volunteer efforts, suggesting the problem may reflect trends beyond science and citizen science.

Over the past 25 years, publications involving citizen science have increased by orders of magnitude (), and across the landscape of public participation in science, equitable inclusion has been recognized as necessary (e.g., ; ). Given these developments, we return to the question of representation in citizen science to examine three primary questions:

  1. Are citizen science participants demographically different from the communities from which they are drawn?
  2. Does the structure or focus of the project explain variation in participant demographics?
  3. Are participant demographic trends changing over time?

In particular, we examined whether the overarching focus of the project (e.g., biodiversity), how the public is engaged (e.g., online versus in person), and where the project is located geographically influence the demographics of participants. To date, individual studies of demographic trends in citizen science, with the exception of NASEM (), have focused on only a few projects and/or demographic measures, so a more comprehensive picture has remained elusive. Although NASEM () reported descriptive statistics (including gender, age, race/ethnicity, education, and previous citizen science experience) for 68 projects, they did not quantitatively examine how project factors co-vary with, or potentially influence, participant demographics, or whether demographics are changing as social pressures for such change (e.g., ; ) increase.

In this paper, we present a meta-analysis of peer-reviewed articles published between 2000 and June 2022. Our analysis is exploratory rather than hypothesis-driven. However, given the increasing number of calls for more diversity within citizen science (e.g., ; ), our study sheds light on whether diversification is being realized. While our results reinforce some prior findings, we also find both temporal trends and high variation in participant demographics, some of which are associated with the structure and/or focus of the project. These results offer indications of how the citizen science landscape is changing and how it may become more inclusive.

Methods

Article set

Rather than repeat the search conducted by NASEM (), we incorporated their sample set of publications directly into this study. For consistency, we also used their topic field search terms:

  • (“citizen science” or “community science” or “PSSR” or “public participation in scientific research” or “crowd-sourc*”) and
  • (“evaluation” or “survey” or “motivation”).

We extended the NASEM dataset to cover the period 2017-01-01 to 2021-11-05, searching within Web of Science (WOS: https://webofscience.com) and restricting our search to articles in English.

Our search returned 1,474 unique articles. We excluded articles that did not report on citizen science projects, where citizen science is defined as voluntary participation in scientific research, and further refined our article set to include only those that contained quantifiable demographic data. To this set we added one paper published after our literature search: Allf et al. (). Within the citizen science publications, demographic data were collected through surveys, interviews, participant lists, and/or program databases. Our final sample set included 134 peer-reviewed papers that collectively described 157 unique projects (Figure 1; Supplemental File 2). Where a project was featured in multiple publications (e.g., two articles featured NestWatch), we used the most recent publication. Across all studies used in the analysis, there were a total of 151,854 unique within-project participants (because participant identity is protected, it was impossible to determine whether a single person participated in multiple projects).

Figure 1 

Counts of articles and projects, demonstrating effects of filtering by project attribute, reported statistics, and geographic location on sample size for each analysis.

To determine whether publications reporting demographic data of citizen science participants are growing in proportion to citizen science publications as a whole, we compared the annual publication rate of the citations included in this paper with that of citations returned by a WOS topic search for “citizen science” over the years 2000–2021. Because of the rapid rise in publications within the field, we used a log scale, offset by 1.001 to avoid undefined values, and we assessed fit with R² values.
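The trend comparison can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the 1.001 offset is added to each annual count before taking natural logs (the log base rescales the fitted slope but leaves R² unchanged), fits an ordinary least-squares line against year, and reports R². The citation counts are hypothetical.

```python
import math

OFFSET = 1.001  # assumed to be added to each annual count so that log(0) never occurs


def log_offset(counts, offset=OFFSET):
    """Natural log of (count + offset) for each annual citation count."""
    return [math.log(c + offset) for c in counts]


def linear_fit_r2(x, y):
    """Ordinary least-squares line y = intercept + slope * x, returned with its R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical annual citation counts, for illustration only (not the study's data).
years = list(range(2015, 2022))
counts = [0, 2, 5, 9, 14, 30, 41]
slope, intercept, r2 = linear_fit_r2(years, log_offset(counts))
print(f"slope = {slope:.3f} log-units per year, R^2 = {r2:.3f}")
```

Fitting the same function to both citation series (WOS and this paper's set) and comparing slopes is one way to ask whether one series grows faster than the other.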

Demographic data

We collected data on every demographic variable reported in the papers. In total, we collected information on 28 variables (Supplemental File 1: Supplemental Table 1), representing 10 broad demographic categories. Where possible, we coalesced slightly divergent formulations into a single category. For example, mean and median age were strongly correlated (R² = 86.7%; n = 7 projects), allowing us to combine them into a single category: “central tendency” of age.

Twelve projects reported diverse gender categories (including transgender, gender diverse, other, mixed, non-conforming, and/or non-binary). Because these participants made up a small percentage of each participant population (0.1–1%; n = 12), and most studies (n = 117) reported gender only as binary male/female, we treated gender as binary in our analysis and reported % female. For papers that presented gender data only as % male (n = 14), we calculated % female as 100% – % male, assuming that participants who were not male were overwhelmingly female. This decision increased our gender sample size to 143 projects.

To estimate retiree participation for projects that did not explicitly report percent retirees but did provide age-binned participation, we assumed that 65% of participants aged 60+ (i.e., 60 or over) and 90% of participants aged 65+ were retirees. When projects reported both the number of people aged 60+ and the number aged 65+, we used the latter. We assumed that all ages within a reported age bin were equally likely (e.g., if 40% of the participant population was aged 55–64, each year of age represents 4% of the total participant population), and calculated the percent of the population aged 60+ or 65+ accordingly. We verified our estimation assumptions with the 10 papers reporting both percent retirees and age-binned participation (R² = 94.9%; Supplemental File 1: Supplemental Figure 1). Including our estimated retirees increased the number of projects with retiree data from 14 to 65.
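The retiree estimate can be sketched as below, assuming age bins are given as (lower, upper, percent) with inclusive integer bounds and `None` marking an open-ended top bin such as 65+. The 65% and 90% factors come from the text; the rule for choosing between them (use the 65+ factor whenever a bin starts at 65) and the refusal to split an open-ended bin are hypothetical simplifications, and the function names are illustrative only.

```python
# Retiree fractions stated in the text: 65% of people aged 60+, 90% of people aged 65+.
RETIREE_FRACTION = {60: 0.65, 65: 0.90}


def share_at_or_above(bins, threshold):
    """Percent of participants aged >= threshold, treating ages within a closed bin as equally likely.

    bins: list of (lower, upper, percent); upper is inclusive, None marks an open-ended bin.
    """
    total = 0.0
    for lower, upper, percent in bins:
        if upper is None:
            # Hypothetical simplification: an open-ended bin cannot be split uniformly.
            if threshold > lower:
                raise ValueError("threshold falls inside an open-ended bin")
            total += percent
        elif threshold <= lower:
            total += percent  # the whole bin is at or above the threshold
        elif threshold <= upper:
            width = upper - lower + 1  # e.g. 55-64 spans 10 single years of age
            years_above = upper - threshold + 1  # e.g. 60-64 is 5 of those years
            total += percent * years_above / width
    return total


def estimate_retiree_percent(bins):
    """Estimated % retirees; the threshold-choice rule here is a hypothetical stand-in."""
    has_65_break = any(lower == 65 for lower, _, _ in bins)
    threshold = 65 if has_65_break else 60
    return RETIREE_FRACTION[threshold] * share_at_or_above(bins, threshold)


# The text's example: 40% aged 55-64 means 4% per year, so 20% are aged 60-64.
print(share_at_or_above([(55, 64, 40.0)], 60))
# Hypothetical project with a bin break at 65.
print(estimate_retiree_percent([(18, 34, 20.0), (35, 54, 40.0), (55, 64, 25.0), (65, None, 15.0)]))
```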

Finally, we restricted all analyses to variables that were reported consistently across projects and for which we had data from 15 or more projects. This decision reduced our dataset to five demographic variables (Supplemental File 1: Supplemental Table 2): gender (presented as % female); race/ethnicity (recorded variously as the proportion of participants in one or more of the USA census categories for underrepresented groups, for minorities, and/or for the majority group, and presented as % white non-Hispanic); age (collected as a central tendency measure and presented as a grand mean and standard deviation); estimated retiree participation (presented as % estimated retirees); and education (presented as % with a graduate or professional degree, because data for these degrees were reported more consistently across papers than data for associate’s/bachelor’s degrees).

Project attribute data

We collected information on six project attributes as possible explanatory factors of project demographics. Project information was drawn primarily from the publication(s), with additional information from project websites as needed and available. For categorical coded variables (e.g., project focus), two coders independently categorized each project. Disagreements (~6.5% of the total) were resolved by a third coder followed by consensus through discussion.

Sample size: number of participants for which demographic data were reported. If an approximation was given, e.g., “about 90,” we used that number.

Project location: reported geographic area over which participants were recruited. In some cases, this area was national (e.g., the USA) whereas in others it was more local (e.g., Petersburg, Virginia).

Geo-scheme: We also used project location to group projects according to the United Nations sub-region geo-scheme (https://unstats.un.org/unsd/methodology/m49/). This broad geographic scale ensured we had multiple projects in each area category and allowed inclusion of location as a variable in our models. Projects that took place over multiple geo-schemes (21 projects, or 13.4%) were included in one “mixed” geo-scheme. These projects were primarily online, worldwide projects (17 projects, or 81.0% of the mixed category), such as Foldit ().

Project access: whether participants accessed the project and collected/analyzed data solely online (e.g., CosmoQuest (); 21 projects, or 13.4%) or performed some hands-on activity (e.g., Audubon’s Christmas Bird Count (); 136 projects, or 86.6%).

Project focus: disciplinary focus of the project categorized as: physical science, health, biodiversity, other, or unknown (descriptions in Supplemental File 1: Supplemental Table 3).

Project year: Because the actual year of data collection could be determined for only 67.5% of projects (n = 106), whereas the publication year could be determined for all projects, we used publication year as a proxy for data collection year. We made two simplifying assumptions: that publication year lags data collection by some unknown amount (for the 106 projects with a known data collection year, the mean time to publication was 2.7 years, standard deviation (SD) = 1.5 years), and that projects with different attributes (e.g., foci) did not vary systematically in publication lag.

To assess how representative our sample is of the fast-growing citizen science community, we compared the frequency distribution of project foci in our sample with that in SciStarter, which “hosts one of the largest online, searchable catalogues of citizen science projects” (). Because SciStarter does not categorize project foci the same way we did, we mapped SciStarter categories to our foci (Supplemental File 1: Supplemental Table 4).

Geographic comparison

For projects located in the USA and/or Canada for which the paper gave an explicit “recruitment” geography, we used project location as the basis for collecting census demographics of the resident population, which we refer to as the “geographic comparison.” We restricted this sample set to the USA and Canada because we could readily access federal census data (USA: https://www.census.gov/; Canada: https://www12.statcan.gc.ca/census-recensement/index-eng.cfm). Where census data were included in the study itself, we used those data (n = 5 projects). For all other projects, we created a theoretical “participant pool” of all residents within the project location(s). While this pool is not necessarily the target audience for a particular project, it is reasonable to assume that most projects in the area would be accessible to people within the pool. We collected census data from the year nearest to the publication year for: age (median), gender (% female), estimated retiree participation (% estimated retirees), education (% with a graduate or professional degree), and race/ethnicity (USA: % white, one race; Canada: % not a visible minority). When projects spanned both the USA and Canada, we calculated population-weighted averages.

Analysis

Geographic baseline

To determine whether citizen science participants are representative of their relevant geographies, we compared the reported participant demographic data with the census-derived geographic comparison representing the theoretical pool of participants. Gender, race/ethnicity, estimated retiree participation, and education data were collected as percentages, so we calculated the signed percent difference ((observed – expected) / expected × 100). Age data were collected as a numerical value, so we calculated the residual (observed – expected) in years. With these definitions, positive values indicate that participants were more female, white, educated, or older than the geographic comparison (Figure 3).
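The comparison metrics and the project-level summaries reported in the Results can be sketched as follows. This sketch uses the signed convention that Figure 3 plots (positive values mean participants are more female, white, educated, or older than the census comparison); taking the absolute value of `percent_difference` gives an unsigned percent error. The use of the sample SD and the example numbers are assumptions for illustration.

```python
import statistics


def percent_difference(observed, expected):
    """Signed percent difference from the census value (positive = participants exceed census)."""
    return (observed - expected) / expected * 100.0


def age_residual(observed, expected):
    """Signed age difference in years (positive = participants older than census)."""
    return observed - expected


def summarize(differences, participant_counts):
    """Unweighted mean, participant-count-weighted mean, and unweighted SD across projects."""
    unweighted_mean = statistics.mean(differences)
    weighted_mean = sum(d * n for d, n in zip(differences, participant_counts)) / sum(participant_counts)
    unweighted_sd = statistics.stdev(differences)  # sample SD (an assumption); needs >= 2 projects
    return unweighted_mean, weighted_mean, unweighted_sd


# Hypothetical projects: (% female among participants, % female in census pool, participants).
projects = [(62.0, 51.0, 120), (45.0, 50.5, 40), (50.0, 50.0, 300)]
diffs = [percent_difference(obs, exp) for obs, exp, _ in projects]
print(summarize(diffs, [n for _, _, n in projects]))
```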

Linear modeling

We used generalized linear modeling to explore the extent to which variation in demographics is explainable by project attributes. Because few projects were published in the early part of our sample period (Supplemental File 1: Supplemental Figure 2; 120 of the 157 projects [76.4%] were published between 2017 and 2021), we restricted our modeling analyses to data published in the years 2011–2021. Furthermore, we combined the 2011–2016 data into one “year” of data, as the total number of projects in this period was small (n = 11), and doing so did not affect which models were selected as the top model (Supplemental File 1: Supplemental Table 5). The resulting 2011–2021 sample comprised 131 projects. Ten projects (6.4% of the 157 projects) were subsequently dropped because of “unknown” (2 projects) or “other” (8 projects) focus, leaving 121 projects for our modeling effort (77.1% of the 157 projects, with 107,884 unique within-project participants).
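The subsetting step can be sketched as below, assuming each project is a dictionary with hypothetical `pub_year` and `focus` keys. Publication years outside 2011–2021 are excluded, years 2011–2016 are pooled into a single value labeled 2016 (the label used in the Figure 4 caption), and projects with “unknown” or “other” focus are dropped. This illustrates the rules described above; it is not the authors' code.

```python
def model_year(pub_year):
    """Year used in modeling, or None if the project falls outside 2011-2021."""
    if pub_year < 2011 or pub_year > 2021:
        return None
    # 2011-2016 are pooled into one period, labeled 2016.
    return max(pub_year, 2016)


def modeling_subset(projects):
    """Keep projects inside the modeling window whose focus is neither 'unknown' nor 'other'."""
    kept = []
    for project in projects:
        year = model_year(project["pub_year"])
        if year is None or project["focus"] in ("unknown", "other"):
            continue
        kept.append({**project, "year": year})
    return kept


# Hypothetical projects, for illustration only.
projects = [
    {"name": "A", "pub_year": 2008, "focus": "biodiversity"},
    {"name": "B", "pub_year": 2013, "focus": "health"},
    {"name": "C", "pub_year": 2019, "focus": "other"},
    {"name": "D", "pub_year": 2021, "focus": "physical science"},
]
for p in modeling_subset(projects):
    print(p["name"], p["year"])
```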

Age central tendency and percent estimated retirees were highly correlated (R² = 91.5%; n = 4 projects, Supplemental File 1: Supplemental Figure 3). Because age data were often presented as an average without a measure of variation (n = 14), age binning varied across publications and projects (except for a tendency to report bins with a break at age 60 and/or 65), and age central tendency data (n = 14) were less prevalent than percent estimated retirees (n = 65), we elected to use percent estimated retirees in our models.

We modeled the relationship between each demographic variable and the fixed effects of time (represented by publication year), project access, and project focus. Because our response data consisted of the number of participants, Y, identified as belonging to a demographic group out of a total sample, n, within each project, we modeled our data using quasi-binomial Generalized Linear Models (GLMs; ) with a logit link function, in which the fixed effects alter the probability, p, that a participant in a given project belongs to the demographic group. We used quasi-binomial rather than binomial GLMs because initial examination of our data indicated that the response was overdispersed relative to a standard binomial distribution. Geo-scheme was added as a random effect to account for potential variation in demographics due to location.

Models took the general form of:

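$$Y_i \sim \mathrm{QuasiBinomial}(n_i, p_i)$$

$$\operatorname{logit}(p_i) = \beta_0 + \beta_{\text{access}[i]} + \beta_{\text{focus}[i]} + \beta_{\text{year}}\,\text{year}_i + \alpha_{\text{geo-scheme}[i]}$$

where $\beta_{\text{access}[i]}$ is the categorical effect of project access, $\beta_{\text{focus}[i]}$ is the categorical effect of project focus, $\beta_{\text{year}}$ is the trend coefficient through time (multiplying the publication year, $\text{year}_i$), and $\alpha_{\text{geo-scheme}[i]}$ is the random effect of geo-scheme for observation $i$, which is modeled as a normal distribution on the scale of the link function:

$$\alpha_{\text{geo-scheme}[i]} \sim N(0, \sigma_{\text{geo}})$$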
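The overdispersion that motivates the quasi-binomial family can be illustrated with the Pearson dispersion estimate: the sum of squared Pearson residuals divided by the residual degrees of freedom, which is the conventional scale estimate for quasi-binomial GLMs. Values well above 1 indicate more variation among projects than a binomial model allows. This sketch ignores the geo-scheme random effect and uses hypothetical counts with an intercept-only fit (one parameter).

```python
def pearson_dispersion(successes, totals, fitted_p, n_params):
    """Pearson chi-square divided by residual degrees of freedom.

    successes: Y_i, participants in the demographic group in project i
    totals:    n_i, participants with demographic data in project i
    fitted_p:  fitted probabilities p_i from a model with n_params coefficients
    """
    chi2 = 0.0
    for y, n, p in zip(successes, totals, fitted_p):
        expected = n * p
        variance = n * p * (1.0 - p)  # binomial variance for project i
        chi2 += (y - expected) ** 2 / variance
    return chi2 / (len(successes) - n_params)


# Hypothetical projects: participants in a demographic group (Y_i) out of n_i.
successes = [30, 70, 45, 10]
totals = [100, 100, 100, 100]
# Intercept-only fit: every project gets the pooled proportion.
pooled = sum(successes) / sum(totals)
phi = pearson_dispersion(successes, totals, [pooled] * len(successes), n_params=1)
print(f"dispersion = {phi:.1f}")  # values well above 1 signal overdispersion
```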

Because studies with lower participant sample sizes may be more likely to exhibit biased demographics by chance (), we weighted each project in the model by participant count. To ensure projects with large participant counts did not “swamp” smaller projects, we capped the maximum weight (i.e., participant count). We explored caps of 75, 100, and 125. Because this range of potential caps did not affect which models were selected (Supplemental File 1: Supplemental Tables 6 and 7), we used 100 as our maximum weight cap.
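The weight cap amounts to truncating each project's participant count, as in this minimal sketch. The counts are hypothetical, and the loop simply mirrors the 75/100/125 sensitivity check described above.

```python
def capped_weights(participant_counts, cap=100):
    """Model weight per project: its participant count, truncated at `cap`."""
    return [min(n, cap) for n in participant_counts]


counts = [12, 90, 450, 20000]  # hypothetical participant counts
for cap in (75, 100, 125):
    print(cap, capped_weights(counts, cap))
```

With a cap of 100, a project with 20,000 participants carries the same weight as one with 100, while projects below the cap keep their full counts.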

With quasi-binomial GLMs, the Akaike Information Criterion is not defined because quasi-likelihood models have no full likelihood; therefore, we selected variables using the drop1 function in R (). The drop1 function refits the model with each term removed in turn and computes the resulting change in deviance; for quasi-likelihood models, this change, scaled by the estimated dispersion, is compared with an F distribution to obtain an F-statistic and associated p-value as an indication of variable importance. For each demographic variable, we started with the full model, with all explanatory variables included and no interaction terms. If all variables had a p-value < 0.05, we used that as the best model. If any variable had a p-value ≥ 0.05, we removed the variable with the largest p-value, then reassessed, repeating until all remaining variables had a p-value < 0.05. We validated the final model by plotting the deviance residuals against predicted response values and independent variables to check for violations of model assumptions ().
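The selection loop can be sketched as backward elimination driven by drop1-style p-values. The callable `drop1_pvalues` is hypothetical: it stands in for refitting the model with the given terms and running an F-test drop1, and here it is replaced by a fixed lookup table so the loop can run on its own.

```python
def backward_eliminate(terms, drop1_pvalues, alpha=0.05):
    """Drop the least significant term until every remaining term has p < alpha.

    drop1_pvalues(terms) is a hypothetical callable returning {term: p_value}
    for a model fitted with exactly those terms.
    """
    terms = list(terms)
    while terms:
        pvals = drop1_pvalues(terms)
        worst = max(terms, key=lambda t: pvals[t])
        if pvals[worst] < alpha:
            break  # every remaining term is significant: this is the selected model
        terms.remove(worst)
    return terms


def fake_drop1(terms):
    """Hypothetical p-values that change as terms are removed (illustration only)."""
    table = {
        ("access", "focus", "year"): {"access": 0.001, "focus": 0.40, "year": 0.03},
        ("access", "year"): {"access": 0.0005, "year": 0.01},
    }
    return table[tuple(sorted(terms))]


print(backward_eliminate(["year", "access", "focus"], fake_drop1))
```

Recomputing p-values after every removal matters because a term's significance can change once a correlated term leaves the model.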

All modeled analyses were conducted in R (version 4.2.1, www.r-project.org, accessed 6 Aug 2022). To create the models, we used the package lme4 (). We used the confint function to calculate confidence intervals for each predictor (base R; with level = 0.95, using a profile method and 1000 simulations). To display the relationship between the predictors and the demographic variables, we used the package jtools ().

Results

Among the citations returned by our search terms, publications containing citizen science participant demographic information increased exponentially over time, faster than the publications returned by a WOS topic search for “citizen science” (Figure 2). This suggests that studies reporting demographic data (~7.5% of all citations reviewed for this paper) are on the rise. However, the percentage of papers reporting participant demographics is still very small: ~2.2% in 2021.

Figure 2 

Log (citations, offset by 1.001 to avoid undefined values) for citations returned by a Web of Science topic search for “citizen science” (“WOS”) and citations included in this paper (“This paper”), 2000–2021. Equations and R² values for lines of best fit (“trend”) are also displayed.

To understand the degree to which our sample captured citizen science as a whole, we compared project foci in this study with those in SciStarter (1,599 projects, data collected on 25 Sept 2022; Supplemental File 1: Supplemental Table 4). A similarly small minority of projects in both datasets were online (SciStarter 11.9%; this study 13.2%). Biodiversity was the dominant focus both in our project set (57.7% of projects with a known focus) and in SciStarter (54.9%), followed by human health (SciStarter 26.5%; this study 27.6%) and physical science (SciStarter 5.6%; this study 8.3%).

Geographic comparison

To assess project reach into the relevant geographic communities, we used the subset of projects for which geographic location was specifically reported, restricted our analysis to the USA and Canada, and used federal census data to create a theoretical pool of participants within each project’s geographic reach. Although this resulted in a small sample set (n = 21 projects), it provides a cleaner comparison than wider (e.g., USA-wide) geographies, which may be less relevant, or irrelevant, to smaller projects.

Within this subset, participants in just over half the projects were more female than would be expected from their geographic comparison (10/17, or 58.8%; Figure 3a), and the mean for these projects was slightly above no difference, with high variability (participant-count-weighted mean: 2.4%; unweighted: 3.5%; unweighted SD = 18.8%). Participants in all projects were more white than their geographic comparison (n = 6; Figure 3a), with a mean 29.3% (unweighted) to 30.5% (weighted) above no difference (unweighted SD = 13.2%), suggesting a statistical and/or social bias against non-white citizen science participants, albeit with a very small sample size. Participants in all but one project (4/5, or 80%; Figure 3a) were more highly educated (i.e., more often held a graduate or professional degree) than would be expected from the geographic comparison, with a mean 23.0% (unweighted) to 33.9% (weighted) above no difference (unweighted SD = 15.8%).

Figure 3 

Differences between citizen science and geographic comparison (census) demographics, reported as percent difference (a: female [% female], white [% white], grad. [% with graduate/professional degree]) or as the residual (b: age [years]). Values on the line at zero indicate no difference between citizen science and the geographic comparison, while values above the line at zero indicate project participants were more female, white, educated (a), or older (b) than the geographic comparison (and the converse is true for values below the line).

Participants from all but one project were older than would be expected from the geographic comparison (9/10, or 90%; Figure 3b), with a mean 17.2 (unweighted) to 18.5 (weighted) years above no difference (unweighted SD = 9.1 years). Because census data include all residents in the geographic pool, including youth, this result can be interpreted to suggest that citizen science projects primarily target adults. Supporting this interpretation, some papers stated that they limited participation in their demographic surveys to people aged either 16 and over (n = 3) or 18 and over (n = 19) because of constraints on including children as research subjects. Estimated retiree data were excluded owing to small sample size.

Modeling results

General

To explore potential within-project drivers of demographic biases (statistical and social), we modeled the influence of project attributes on demographics using quasi-binomial models. This analysis was restricted to projects with identifiable attributes and participant counts (n = 121 projects). Selected models for each demographic variable contained one or two of our predictor variables (Table 1), and every predictor variable was selected in at least one model. Here we report demographic statistics over all projects, as well as model-specific results.

Table 1

Results of quasi-binomial models of the proportion of individuals in citizen science projects who fell into different demographic categories (gender [% female], race/ethnicity [% white], education [% with a graduate or professional degree], and estimated retiree participation [% estimated retiree]). For each coefficient (Coeff) in the selected models, variable estimates (Est), standard errors (SE), the lower (2.5% CI) and upper (97.5% CI) confidence intervals (CI), and p-values are shown. Significance codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1. Significance is assessed against hands-on for project access and biodiversity for focus.


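| Coeff | Est | SE | 2.5% CI | 97.5% CI | p-value |
|---|---|---|---|---|---|
| Female (n = 115) | | | | | |
| Intercept | –517.18 | 88.72 | –693.08 | –344.92 | 5.46 × 10⁻⁸ *** |
| Year | 0.26 | 0.04 | 0.17 | 0.34 | 5.59 × 10⁻⁸ *** |
| Project access | | | | | 1.88 × 10⁻⁵ *** |
| Online | 0.62 | 0.13 | 0.36 | 0.89 | 1.07 × 10⁻⁵ *** |
| White (n = 14) | | | | | |
| Intercept | 2.96 | 0.38 | 2.29 | 3.80 | 8.23 × 10⁻⁶ *** |
| Focus | | | | | 0.00 ** |
| Health | –1.24 | 0.38 | –2.09 | –0.56 | 0.01 ** |
| Grad (n = 33) | | | | | |
| Intercept | –0.75 | 0.12 | –0.98 | –0.53 | 2.91 × 10⁻⁷ *** |
| Project access | | | | | 0.00 *** |
| Online | 0.56 | 0.12 | 0.32 | 0.80 | 9.05 × 10⁻⁵ *** |
| Retiree (n = 53) | | | | | |
| Intercept | –0.75 | 0.11 | –0.96 | –0.54 | 7.70 × 10⁻⁹ *** |
| Focus | | | | | 1.66 × 10⁻¹² *** |
| Health | –1.48 | 0.14 | –1.76 | –1.21 | 2.43 × 10⁻¹⁴ *** |
| Physical sciences | –0.97 | 0.47 | –2.00 | –0.12 | 0.0444 * |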
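To read the logit-scale estimates in Table 1 as proportions, apply the inverse logit to the linear predictor. The sketch below does this for each reference level (intercept alone) and for intercept plus one coefficient, with the geo-scheme random effect set to zero. Because it uses the rounded estimates and ignores the random effect, the resulting proportions are rough and need not match the raw means in Table 2 or the fitted values in Figure 5. The female model is omitted: its year coefficient multiplies the calendar year, so rounding 0.26 to two decimals can shift the linear predictor by up to roughly 10 logit units.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


# Rounded logit-scale estimates copied from Table 1; geo-scheme effect set to zero.
linear_predictors = {
    "retiree, biodiversity": -0.75,
    "retiree, health": -0.75 - 1.48,
    "retiree, physical sciences": -0.75 - 0.97,
    "grad, hands-on": -0.75,
    "grad, online": -0.75 + 0.56,
    "white, biodiversity": 2.96,
    "white, health": 2.96 - 1.24,
}

for label, eta in linear_predictors.items():
    print(f"{label}: {inv_logit(eta):.1%}")
```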

Gender

In contrast to the results from the geographic comparison (Figure 3a), male participants slightly outnumbered female participants across all projects with (binary) gender information (n = 143 projects; 45.6% female, compared with 51.3% female for USA adults, U.S. Census Bureau 2020; Supplemental File 1: Supplemental Table 2). In our models, gender was best explained by publication year and access type (Table 1). Projects published more recently had a higher proportion of women (Figure 4). Women were also significantly more represented in online than in hands-on projects (Figure 5a).

Figure 4 

Fitted values (mean and 95% confidence interval) for the year variable from the selected model for proportion of citizen science participants that were female, along with original (raw) data. Predictions are calculated based on varying the year while holding other predictors constant. The year 2016 represents data from 2011–2016.

Figure 5 

Fitted values (mean and 95% confidence interval) for the selected model for each demographic variable (a = proportion of citizen science participants that were female, b = proportion of participants that were white, c = proportion of participants with a graduate/professional degree, and d = proportion of participants that were estimated to be retired), along with the original/raw data. Predictions are calculated based on varying the predictor shown while holding other predictors constant. For project focus, “Bio.” = biodiversity and “Physical” = physical sciences. Significance is assessed against hands-on for project access and biodiversity for focus.

Race/ Ethnicity

White non-Hispanic participation was quite high (88%, compared with 64% of USA adults; ), although few projects reported it (n = 17 projects; Supplemental File 1: Supplemental Table 2). Only one online project and only one physical science project reported participant race/ethnicity data, which precluded modeling those categories. However, we did include these points in graphic comparisons (Figure 5b).

With those projects omitted, race/ethnicity was best explained by project focus. Projects with a health focus had a higher proportion of non-white participants (~17%) than those with a biodiversity focus (~8%; Figure 5b), but these values are still low compared with the USA adult non-white population (35.9%, ). The single physical science project in our sample reported 90% white participants.

Education

The average proportion of the participant population with an advanced degree (graduate or professional) over the entire dataset was substantially higher (n = 40 projects, weighted: 43%, unweighted: 35%, Supplemental File 1: Supplemental Table 2) than for the USA population aged 25+ (USA: 13%, ). Education level was best explained by project access, with online projects reporting a significantly higher proportion of people with advanced degrees than hands-on projects (Figure 5c).

Age and estimated retiree participation

The mean weighted age across all participants in all projects for which age was directly reported (n = 52) was 48.1, ~9 years older than for the USA as a whole (38.8; U.S. Census Bureau 2022; Supplemental File 1: Supplemental Table 2). Participants in projects with a biodiversity focus had the highest mean age, while health projects had the lowest (Table 2).

Table 2

For each demographic variable (gender [% female], race/ethnicity [% white], education [% with graduate/professional degree], estimated retiree participation [% retirees], and age [years]), the project count, mean, and SD by project focus and access. Means are in % and SDs in percentage points, except for age, for which both are in years. NA indicates we did not calculate descriptive statistics because only one or all but one of the projects were in the sub-category.


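| Variable | Attribute | Category | Count | Mean (% or years) | SD (% pts or years) |
|---|---|---|---|---|---|
| % Female | Project focus | Biodiversity | 80 | 45.7 | 19.4 |
| | | Health | 41 | 52.8 | 21.5 |
| | | Physical science | 12 | 30.4 | 11.1 |
| | Project access | Online | 19 | 45 | 21 |
| | | Hands-on | 124 | 45.5 | 20.8 |
| % White | Project focus | Biodiversity | 12 | 92.0 | 15.9 |
| | | Health | 4 | 73.5 | 15.9 |
| | | Physical science | 1 | NA | NA |
| | Project access | Online | 1 | NA | NA |
| | | Hands-on | 16 | NA | NA |
| % Grad degree | Project focus | Biodiversity | 26 | 38.1 | 15.5 |
| | | Health | 11 | 25.6 | 17.1 |
| | | Physical science | 2 | NA | NA |
| | Project access | Online | 5 | 31.7 | 21.3 |
| | | Hands-on | 35 | 35.8 | 16.3 |
| % Retirees | Project focus | Biodiversity | 36 | 23.5 | 17.8 |
| | | Health | 21 | 17.7 | 19.7 |
| | | Physical science | 5 | 16.2 | 19.6 |
| | Project access | Online | 8 | 12.2 | 18.5 |
| | | Hands-on | 57 | 22.3 | 18.7 |
| Age (years) | Project focus | Biodiversity | 32 | 50.9 | 8.5 |
| | | Health | 11 | 39.4 | 9.3 |
| | | Physical science | 6 | 48.4 | 8.1 |
| | Project access | Online | 6 | 41.4 | 4.3 |
| | | Hands-on | 47 | 49 | 9.6 |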

Across all projects, estimated retirees averaged 21% of all participants (n = 65 projects, Supplemental File 1: Supplemental Table 2), slightly higher than USA statistics when youths are excluded (17.4% of USA adults in 2019 were categorized as retired; Administration on Aging 2021). Estimated retiree participation was best explained by project focus. Biodiversity projects had a significantly higher proportion of estimated retirees than health projects (Figure 5d).

Discussion

To our knowledge, this paper is the most comprehensive meta-analysis of citizen science participant demographics to date, incorporating 157 projects that collectively represent 151,854 unique within-project participants. The strongest signals we found across the English-language, peer-reviewed literature reporting participant demographics are threefold: high participation from educated, white adults; a rising tide of female participation; and an influence of project focus on participant demographics. We found that health-focused projects attracted a younger, relatively more diverse participant base than biodiversity projects, whose participants were older and whiter.

These patterns are echoed in the STEM workforce, at least in the USA. Women make up only ~34% of USA STEM workers (). Black/African American, Hispanic/Latino, and American Indian/Alaska Native people constitute ~30% of employed USA workers but only ~23% of STEM workers (). The USA volunteering population also shows demographic overrepresentation: Schlachter () reported that volunteers were disproportionately middle-aged, white non-Hispanic females with a college degree. If there are demographic biases in both the STEM workforce and the volunteering population, citizen science may sit at the intersection of those two groups and display some of the same (social and/or statistical) biases.

In their citizen science meta-analysis, Theobald et al. (2013) found that only ~12% of the 388 biodiversity projects they reported on published data in the scientific literature. Our study indicates that only single-digit percentages of English-language citizen science papers include participant demographics. In a recent survey of 140 German citizen science projects, Moczek et al. (2021) found that only 13–26% of projects reported information on participant age, gender, or education level. Thus, the demographics of the vast majority of projects, and certainly of participants, remain invisible. As Pateman et al. () opine: “practitioners [need to] document and publish participant demographics.” In fact, even in this study of 157 projects, sample sizes for certain variables (e.g., race/ethnicity, n = 30 projects) were quite small. It is therefore important to consider whether the results found in this research are generally applicable.

Findings with respect to gender have been mixed across the literature. Pateman et al. () found that males in Great Britain were more likely than females to participate in citizen science. In a broad review of published citizen science projects, NASEM () also found that males predominated. Our data suggest wide variation in gender (Figures 3a, 4, 5a) and that the proportion of women participants overall is rising (Figure 4). However, we also found evidence that women were persistently underrepresented in particular project foci, namely the physical sciences (Table 2). Male overrepresentation appears to be particularly pronounced in astronomy, where participants in the five projects reported on herein (including both hands-on and online) were only 19–26% female. These disparities can be even more stark. For instance, Curtis () found that only 13.2% of participants in the physics-based online game Quantum Moves were female (N = 674).

Why might this be? Cooper and Smith () suggest that gender patterns may be related to the degree to which citizen science project activities are competitive or authoritative (male-dominated) versus supportive or participatory (female-dominated). Paleco et al. () reported more men participating in technical events (e.g., on data quality) and more women participating in social science events. These studies suggest that gender-based participation may have roots in project focus and/or in project activities or organization. Similar patterns appear in USA science graduate degrees, particularly in mathematics, computer sciences, engineering, physical sciences, and geosciences, all of which continue to trend male despite the continued accession of women (). Although the rates predicted by our modeling are probably overestimates (e.g., Figure 4), we believe the trend toward greater female participation over time is real. This suggests that citizen science may be able to push boundaries and advance equality, at least for some demographic groups.

Educated white adults

Our analysis indicated an overrepresentation of participants holding graduate/professional degrees, at least relative to the USA average, with extremely high percentages in two online projects (Citizen Sort: 61.8%, ; The COVID-19 Citizen Science Study: 45.4%, ). Allf et al. () reported that SciStarter members who elected to disclose this information (n = 423) were also highly educated, with 53% holding graduate/professional degrees. That citizen science participants tend to hold college or advanced degrees is a result repeatedly demonstrated across the citizen science literature (e.g., ; ; ).

Studies of regional-to-national populations have indicated that citizen science is practiced more often by people who are white and/or who enjoy moderately high socioeconomic status (i.e., middle class; ; ). This finding is repeated in surveys of citizen science participants, which also indicate an overrepresentation of older, white non-Hispanic people (; ; ).

Our overall findings reinforce those in the literature: a propensity, even a preponderance, of educated, white adults (Figures 3, 5). Our modeling results (n = 121 projects) suggest that project focus may explain some of this statistical and/or social bias, in that health projects had a higher proportion of non-white (Figure 5b) and younger (Figure 5d) participants than biodiversity projects. One reason may be that health projects often sit at the intersection of public health, environmental justice, and geographic community (e.g., the Flint water crisis: ; STI/HIV research in LGBTQ+ communities: ), and in that sense are centering in the margins (). The converse is also likely true: biodiversity projects may differentially attract older, more affluent, white participants, namely those with the luxury of time to participate.

There are, however, examples of biodiversity-focused citizen science with diverse participants (e.g., Open Air Laboratories network, ), which may be explained by intentional partnerships with community organizations whose mission is to support underserved and marginalized groups (). Collectively, these studies suggest that it may be possible to diversify participation in citizen science, and specifically to increase non-white participation, through purposeful recruitment: reaching out directly to communities, aligning projects with community interests and goals, and creating projects that do not require significant amounts of time and/or a specific schedule.

Social stratification and solution strategies

Underrepresentation in citizen science, as in science more broadly, is likely driven by a combination of multifaceted, intersectional factors. Barriers to participation for those who are not white, less wealthy, and/or less highly educated include time constraints that make citizen science a low priority (; ; ), the economic necessity of working rather than taking days off (; ; ), and/or feeling unwelcome in scientific spaces (; ). Dawson () used focus groups and interviews to explore why low-income, minority populations may be broadly excluded from science. Interviewees described a combination of factors, including lack of awareness, feelings of powerlessness, and predetermined exclusion as a function of race/ethnicity. They imagined people who participate in science as not being like them (in particular, as people who were white, had free time, and had a comfortable income). Dawson () describes these as forms of social stratification in which participation builds cultural capital that promotes further participation, whereas non-participation reinforces exclusion. Consistent with this, many studies have identified prior participation in citizen science as a significant predictor of participation (; ; ). The question becomes: how can public participation in informal science activities, including but not limited to citizen science, become more representative of the geographic public from which participants are drawn?

Strategies that have been suggested for diversifying citizen science, and specifically for increasing participation by minoritized and marginalized groups, include: constructing authentic community science centered within the traditions of social and environmental justice (; ); framing citizen science around the location, language, norms, and priorities of communities (; ); reducing barriers to participation by providing both online and hands-on ways to participate (; ); incorporating multiple kinds of knowledge (; ,); providing role models who are themselves from underrepresented groups (); and using recruitment strategies tailored to diverse audiences (; ). We suggest that significant demographic shifts in the “citizen and community science population” are most likely to occur when place-based community science centered in the margins increases everywhere, at the same time that large-scale citizen science projects achieve a more diverse participant base. We posit that the latter may require scientists who design public data collection projects to become more accepting of participants without college credentials (), and more willing to take the time and make the effort to engage honestly with individuals and groups on their terms ().

Conclusion

This study indicates that citizen science has yet to realize its promise of equal and equitable representation, at least as captured in peer-reviewed English-language publications. To thoroughly investigate the complex, intersectional topic of diversity in citizen science, more demographic data need to be published. Nonetheless, increasing participant diversity benefits both marginalized groups (e.g., ; ) and science (e.g., ; ). We therefore suggest that all projects commit to the goal of achieving representation that matches the demographics of their geographic area, and further, that the big tent of citizen science (as discussed in ) collectively work to welcome locally based community projects created by and for those most marginalized from science.

Data Availability Statement

The data are available in Supplemental File 2: Dataset.

Supplemental Files

The supplemental files for this article can be found as follows:

Supplemental File 1: Appendix

Information and data related to the research. DOI: https://doi.org/10.5334/cstp.610.s1

Supplemental File 2: Dataset

Data related to the research. DOI: https://doi.org/10.5334/cstp.610.s2