Introduction

Citizen science programs are a potential way for recreational and protected areas, such as U.S. national parks, to engage with their visitors (e.g., through bio-blitzes, bird counts, and picture post stations) (; ; ). The participants for these programs are generally local study area residents, but tourists may also be effective participants. While other studies have compared the scientific outcomes (i.e., data products and other scientific research products) produced by different groups of participants (; ), tourist participants and their effect on scientific outcomes have not been explicitly examined. This paper explores the potential for tourist participants to engage in citizen science programs by comparing species observation data collected by tourist and resident participants in the Map of Life-Denali citizen science program. This comparison allows us to better understand if tourists can be effective participants and produce similar scientific outcomes to resident participants.

Scientific outcomes of citizen science include datasets and data products such as analysis and modeling results, peer-reviewed papers, and reports (). For this paper, we focus on the datasets collected by participants, specifically species occurrence data, which is commonly collected in citizen science programs (; ). These data types often include taxonomic information, spatial information about where the observation occurred, temporal information about when the observation occurred, and sometimes species absences (; ). The geographic aspects of species occurrence data give it properties concerning data quality that are not often encountered in other types of volunteer-generated data (; ).

Species observation data is generally collected and depicted as point data and can be aggregated to create species range and habitat maps and to inform species distribution models (). While researchers have long been collecting species data through surveys, GPS collar tracking, and other means, citizen science participants can now collect similar point-based data with additional taxonomic information through phone-based applications. Smartphone-based citizen science applications, like Map of Life, iNaturalist, or eBird, are examples of platforms that allow participants to collect these data accurately. These data are often uploaded to online data hubs like GBIF and are free to researchers and the general public (). Species observation data collected through these apps have been used for species modeling and other conservation biology efforts ().

These data, however, can be prone to observer error, such as false observations, inaccurate location data, or incorrect species identification (; ). Previous studies have examined the quality of citizen science data compared with authoritative datasets and have developed various methods to do so (; ; ; ; ). These methods range from an expert review of data records to data quality flagging, statistical analysis, and determining data fitness for use. For example, , used a control data comparison method that involved taking a subset of data collected by the most experienced participants as control data and then comparing the spatial accuracy of the participant data to the control data.

The types and severity of observer error may be influenced by a participant’s characteristics, including demographics (age, education, etc.), training, or familiarity with the study area and discipline (; ). Studies have used some of the validation methods described above to compare data collected by different types of participants. These studies have focused mainly on age, education, and level of project-specific training. For example, Delaney et al. () found that more highly educated participants produce more accurate data through an expert review of their data. We aim to add to the understanding of how participant characteristics impact the scientific outcomes of a citizen science project by examining the differences between tourist and resident participants and their effect on data accuracy.

We focus this research on a comparison between tourists and residents because citizen science programs tend to use either participants who live near the study area or tourists who can participate in training or commit to longer-term involvement through an ecotourism program (; ; ; ; ). In these programs, tourists intend to devote their trip to participating in a research project, such as Earthwatch trips, gap year trips for students, etc. (; ; ; ; ). These programs have proven to be impactful for science and the participants. For example, in a study of ecotourist participants by Weaver (), participants felt they learned more about the environmental problems of the study area and improved their personal well-being. Ecotourism programs, however, are not accessible to a more general tourist population; they can be costly and time-consuming.

In some geographic settings, such as National Parks, historical sites, and popular urban centers, the typical tourists often visit without the intention of volunteering as a significant part of their trip or even a part of their trip. Regardless, these typical tourists are eager to learn more about the area they are visiting (; ). These typical tourists represent a large pool of potential citizen science participants (); the National Park Service alone expects an 8–23% increase in visits in the next few years (). To reach a wider tourist audience who may not be seeking out ecotourism opportunities or may not have the money or time to dedicate to an ecotourism trip, these potential tourist participants can become involved in less time-consuming and less financially intensive citizen science programs.

Some hesitancy around recruiting tourists to participate in citizen science programs may be because of the possibility of unique spatial and temporal biases. In particular, with species identification, participants familiar with the study area or subject are shown to produce more accurate data (). Additionally, tourist destinations have a time of year that is more popular for tourists, and tourists may not venture away from popular sightseeing areas. However, there are established protocols for assessing some of these issues. For example, prequalifying skills or knowledge assessment is one way to determine participant knowledge/skills and the potential for accurate data collection (). If participants do not already possess the skills or knowledge needed for the programs, some programs offer training sessions on using instruments or correctly collecting data. If these training sessions are short and regularly offered (or on-demand), more tourists may be able to participate.

Additionally, smartphone applications and other technologies make it more feasible for tourists to participate in citizen science programs. Mobile device–based citizen science applications, such as eBird, iNaturalist, GLOBE, and Map of Life (), are free and easy to operate. This technology allows for accurate, efficient, and cost-effective data collection for citizen science programs, especially when participants collect spatial data. Many smartphones have built-in location systems that can be used with or without phone service (; ). Many people already know how to use smartphones and therefore do not need instrument training. With simple in-app tutorials, any smartphone user could potentially contribute valuable data in popular tourist areas ().

To examine the possibility of tourists participating in citizen science programs, we focused on assessing the accuracy of species data collected by tourists through the Map of Life-Denali program. We hypothesized that although tourist participants are likely less familiar with the park wildlife than the resident participants at the start of their participation in the program, the tourist and resident participants would collect comparable datasets using a simple mobile app. To address this hypothesis, we analyzed the results of a wildlife knowledge quiz embedded into a pre-visit survey to assess the local resident and tourist participants’ knowledge about the park and wildlife. We then compared the species occurrence data collected by the two groups through the Map of Life (MOL) mobile application to see if the tourists, despite their unfamiliarity with the park, still produced data similar to that of knowledgeable local participants.

Methods

Study area

This study was conducted in Denali National Park and Preserve (Denali NP&P) (Figure 1). The park covers 6 million acres and is home to charismatic megafauna such as Grizzly bears, moose, caribou, Dall sheep, and wolves. The wildlife viewing opportunities and landscapes of Denali NP&P attract ~500,000 summer visitors each year. Most (90%) of the visitors are tourists (i.e., not from Alaska) who spend, on average, three days in the park (). This large number of tourist visitors in Denali created a potential pool of participants for this study.

Figure 1 

Denali National Park and Preserve. The park road provides the main point of access into the park and meanders through prime habitat for much of Denali’s wildlife, and it is only accessible for visitors via bus (Park Map obtained from https://www.nps.gov/carto/).

Denali NP&P officials were supportive of this research for two main reasons. First, during the Centennial for the National Park Service (2016) and Denali NP&P 100th anniversary (2017) celebrations, park managers welcomed projects that directly engaged the public in park research initiatives. Second, there was potential for this project to support the park’s Road Ecology Program (REP). The REP actively monitors how the ecology along the park road is impacted by the transportation system (). Data from the park’s Ride Observation and Record (ROAR) program are currently used to inform the REP’s monitoring and research. ROAR participants are local residents or seasonal workers compensated for riding the park buses and recording wildlife sightings and behavior. Denali NP&P was interested in exploring the use of citizen science to complement the data collection efforts of the ROAR program and wanted to explore how a citizen science program focused on collecting species occurrence data could expand the spatial and temporal extent and scale of the ROAR data.

Data collection and analysis

The key instrument in the Map of Life-Denali citizen science program is the Map of Life mobile app, designed and developed by the Jetz lab at Yale University () (Figure 2). A Denali-specific sub-app (called Map of Life-Denali) was developed to be used without a Wi-Fi or cellular data connection; both are limited in the park. The customized home page includes links to the surveys used in this research. An information page with park-specific information, such as animal safety warnings and a description of the citizen science project, is featured in the Map of Life-Denali application. When a participant searches for a specific animal, identifying photos and detailed species information appear in the application. The application uses the phone’s internal GPS to capture spatial data, allowing participants to record the precise location of their wildlife observations while touring the park. The participants were encouraged to collect data while riding the buses into the park, going on hikes, or camping there. Thus, most of the data collected for the project is concentrated around the park road corridor and does not cover the vast backcountry areas of the park. This particular spatial bias was welcome because the park was interested in seeing how the data compare with the ROAR program data and whether they may be helpful to the Road Ecology Program.

Figure 2 

Map of Life-Denali mobile application: home page, species information page, and record observation page (mol.org).

Participants were recruited through signage advertising the citizen science program posted at the park visitor centers, the bus depot, campgrounds, the community library, the park employee information board, the local coffee shop, and hotels. Additionally, the authors personally recruited participants while visitors waited to board buses to enter the park. Participation in the program and research study was open to anyone over 18. The Arizona State University Institutional Review Board (STUDY00003874) and the National Park Service (DENA-2016-SCI-0002) approved this study and the data collection methods.

The pre-visit survey was linked in the MOL app, and participants were instructed to take the pre-visit survey right after downloading the app. Through the pre-visit survey, participants self-identified as either tourists (non-Alaska residents) or residents (Alaska residents, including seasonal workers). The survey also included a short wildlife quiz that asked participants multiple-choice and true/false questions about the park’s wildlife and ecosystems to determine how much the participant knew about the park (Table 1). The authors and the park social scientist developed the quiz. The quiz format was influenced by the quizzes given to participants in the Portland Urban Coyote Project (). The quiz section of the pre-survey was analyzed using descriptive statistics (i.e., mean). The pre-visit survey also included questions that do not pertain to this particular paper’s scope; a comprehensive survey analysis and a full copy of the survey are published in .

Table 1

Wildlife quiz questions from the pre-visit survey.


QUESTION / QUESTION CHOICES

What should you do if you are confronted by a moose?
  • Stand your ground
  • Throw rocks
  • Run
  • Play dead
  • I don’t know

True or False, both male and female Caribou have antlers.
  • True
  • False
  • I don’t know

Which bird turns all white in the winter?
  • Ptarmigan
  • Common Loon
  • Mountain Chickadee
  • Gyrfalcon
  • I don’t know

What is an indication of climate change?
  • Tree line moving higher in elevation.
  • Glacier melt
  • Wildlife behavior changes
  • All of the above
  • I don’t know

What are the two major ecosystems in the park?
  • Tundra and Taiga (Boreal Forest)
  • Tundra and Rainforest
  • Taiga and Temperate Forest
  • I don’t know

How far must you stay away from a bear?
  • 100 Yards (meters)
  • 300 Yards (meters)
  • 25 yards (meters)
  • 700 yards (meters)
  • I don’t know

True or False, Denali’s wolf population has lost nearly ⅔ of its previous population levels.
  • True
  • False
  • I don’t know

The species occurrence data used in this study were collected with the MOL app in the summer of 2016. These data contain each wildlife observation’s geographic coordinates, taxonomic information for the species, a time stamp, and a unique observer ID. These data were retrieved from the Map of Life server on September 30th, 2016, and cover the period from June 25th to September 25th, 2016.

There are various methods to compare spatial datasets; we chose to employ relatively simple spatial analysis methods that can be done with open-source software and replicated by other citizen science practitioners with potentially limited experience analyzing spatial data. First, we performed exploratory data analysis using heat map overlays; then, we used Ripley’s L function to explore the relationship between the datasets further. We used these methods to compare resident and tourist data for three megafauna in Denali: Ursus arctos (Grizzly bear), Rangifer tarandus (caribou), and Alces alces (moose).

The heat maps show, in raster layers, the density of wildlife observations made throughout the study period for residents and tourists. The value of each cell in the raster layer represents the number of species observations made in that cell. The raster layers were then used in a spatial overlay analysis to identify where the species observations from these two groups overlap. This analysis shows if the residents’ and tourists’ species observations for bear, caribou, and moose are in similar geographic locations, and allowed us to visually compare the species observations collected by the tourists with the resident-collected data.

To create the comparative heat maps, the species occurrence point data was converted to raster format through the Kernel Density tool in Esri’s ArcMap (this method can also be done with QGIS, an open-source spatial analysis software). A raster consists of rows and columns of cells where each cell contains a value; in this case, the cell values represent the number of species observations made within that cell. The Kernel Density tool calculates the density of points around each output raster cell; thus, a smooth surface is created. The chosen cell size was 1,000 meters because of the level of error in the Map of Life-Denali application (participants can choose how far away they are from the wildlife; 1,000 meters is the farthest option). The cells with the highest values contain the points; cell values decrease with distance from a point and reach zero at the limit of the search radius. Each cell’s values in the two raster layers are added together via the Overlay tool in Esri’s ArcMap to create a new raster layer, reflecting where these raster layers correspond.
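The point-to-raster step can be illustrated with a minimal sketch in plain Python. It is not the ArcMap workflow the study used. It assumes projected coordinates in meters and a quartic kernel, and the function name, the grid arguments, and the 3,000-meter search radius are hypothetical choices made for illustration only.

```python
import math

def kernel_density_raster(points, x_min, y_min, n_cols, n_rows,
                          cell_size=1000.0, radius=3000.0):
    """Return a row-major grid of density values (observations per square meter).

    Assumption: a quartic kernel with a hypothetical search radius; the
    study's actual tool settings are not stated in the text.
    """
    # Constant that makes the quartic kernel integrate to 1 over its disk.
    scale = 3.0 / (math.pi * radius ** 2)
    grid = [[0.0] * n_cols for _ in range(n_rows)]
    for row in range(n_rows):
        cy = y_min + (row + 0.5) * cell_size      # cell-center y
        for col in range(n_cols):
            cx = x_min + (col + 0.5) * cell_size  # cell-center x
            total = 0.0
            for px, py in points:
                d2 = (px - cx) ** 2 + (py - cy) ** 2
                if d2 < radius ** 2:
                    # Contribution falls smoothly to zero at the search radius.
                    total += scale * (1.0 - d2 / radius ** 2) ** 2
            grid[row][col] = total
    return grid
```

In this sketch the cell containing an observation receives the highest value, and cells beyond the search radius stay at zero, which mirrors the behavior described above.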

The area of the resulting overlay raster was calculated to determine how much the tourists’ species observations coincide with the residents’ species observations. This area is determined by counting the number of cells whose value does not equal zero (if the cell equals zero, the underlying raster layers do not overlap at all in that cell). To calculate how much the tourists’ species observations coincide with the residents’ species observations, the area of the overlap raster was divided by the total area of the tourist raster layer.
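The overlap percentage can be sketched as follows. This is an illustration in plain Python, not the ArcMap procedure. It assumes both rasters share the same grid and treats a cell as overlapping when both layers are nonzero there, which is one reading of the description above. The function name is hypothetical.

```python
def overlap_share(tourist, resident, cell_size=1000.0):
    """Return (overlap area in km^2, percent of the tourist area that overlaps).

    Assumption: both grids have identical dimensions and alignment, and a
    cell counts as overlap only when both layers are nonzero in that cell.
    """
    tourist_cells = 0
    overlap_cells = 0
    for t_row, r_row in zip(tourist, resident):
        for t, r in zip(t_row, r_row):
            if t > 0:
                tourist_cells += 1
                if r > 0:
                    overlap_cells += 1
    cell_area_km2 = (cell_size / 1000.0) ** 2
    overlap_area = overlap_cells * cell_area_km2
    share = 100.0 * overlap_cells / tourist_cells if tourist_cells else 0.0
    return overlap_area, share
```

With 1,000-meter cells, each nonzero cell contributes 1 km² to the reported area.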

To further compare the tourist and resident data, we performed a Ripley’s L function analysis (a variation of Ripley’s K function, also referred to as Besag’s L function). Ripley’s K function and its variations (e.g., the L function) compare spatial data made up of points. Specifically, the function determines if point datasets have a similar spatial distribution or if the distribution of points is spatially random at a given distance from each other (). The function determines this by putting a buffer around a random point in the dataset and computing the proximity of other points, and it repeats this at multiple distances. If multiple points are found in the proximity, then the spatial pattern is likely clustered and not random. The bivariate L function is a preferred modification of the K function because it can produce more interpretable results ().

The function output includes a graph showing if the included datasets are clustered together or dispersed at given distances. If the graph shows the data is clustered, it indicates a similar point pattern and distribution between the datasets. If the graph shows that the data is dispersed, it indicates the datasets do not have a similar point pattern and are not spatially similar. A mathematical formulation of the L function can be found in Bailey and Gatrell (). In ecology, the Ripley L function, or variations of it, has been used to compare human-wildlife relations, predator-prey interactions, etc. (Hasse 1995; ; ; ). This analysis allowed us to quantify how the tourists’ observations were distributed relative to the observations made by residents, and to determine if the tourists were making observations that were spatially similar to those of residents.
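To make the bivariate (cross-type) L function concrete, the sketch below computes a naive estimate in plain Python. It counts tourist-resident pairs within distance r, scales the count to a K value, and transforms it so that a spatially independent pattern gives roughly L(r) = r. It applies no edge correction, so it is an illustration rather than a substitute for a statistical package, and the function and argument names are hypothetical.

```python
import math

def cross_l(points_a, points_b, area, radii):
    """Naive cross-type L estimate at each distance in radii.

    Assumption: no edge correction, so values near the window boundary
    are biased low; 'area' is the area of the study window.
    """
    n_a, n_b = len(points_a), len(points_b)
    values = []
    for r in radii:
        # Count pairs (one point from each group) no farther apart than r.
        pairs = sum(1 for ax, ay in points_a for bx, by in points_b
                    if math.hypot(ax - bx, ay - by) <= r)
        k = area * pairs / (n_a * n_b)         # cross-type K estimate
        values.append(math.sqrt(k / math.pi))  # L transform
    return values
```

Values of L(r) above r suggest that the two groups' observations occur closer together than expected under independence, and values below r suggest the opposite.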

To account for significant differences in sample size between the residents and tourists, a random sample was taken from the more plentiful tourist data to match the smaller sample size of the resident data. The random sample was derived from the Biogeography add-on sampling tool in Esri’s ArcMap Version 10.3, which selects a random sample from the existing point data. Ripley’s L function was run in R 4.0.2 using the spatstat package and the Lcross function. The function’s confidence envelopes (the range of values expected under the simulated null model) were calculated with 999 simulations of the model of complete spatial randomness.
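The subsampling and envelope steps can be sketched in plain Python as follows. This is a simplified stand-in for the ArcMap and spatstat workflow described above. It assumes a rectangular study window, holds one group fixed while simulating the other as uniformly random points, and uses no edge correction. All function names, the window, and the seed values are hypothetical.

```python
import math
import random

def cross_l_at(points_a, points_b, area, r):
    """Uncorrected cross-type L value at a single distance r."""
    pairs = sum(1 for ax, ay in points_a for bx, by in points_b
                if math.hypot(ax - bx, ay - by) <= r)
    k = area * pairs / (len(points_a) * len(points_b))
    return math.sqrt(k / math.pi)

def match_sample_size(larger_group, target_n, seed=0):
    """Draw a random sample (without replacement) of target_n points."""
    return random.Random(seed).sample(larger_group, target_n)

def csr_envelope(fixed_points, n_random, window, r, n_sims=999, seed=0):
    """Pointwise (min, max) envelope of L at distance r under random placement.

    Assumption: 'window' is (x_min, y_min, x_max, y_max) and the simulated
    group is scattered uniformly inside it; the other group stays fixed.
    """
    rng = random.Random(seed)
    x0, y0, x1, y1 = window
    area = (x1 - x0) * (y1 - y0)
    sims = []
    for _ in range(n_sims):
        pts = [(rng.uniform(x0, x1), rng.uniform(y0, y1))
               for _ in range(n_random)]
        sims.append(cross_l_at(pts, fixed_points, area, r))
    return min(sims), max(sims)
```

An observed L value above the upper bound of the envelope would be read as clustering between the two groups at that distance, which is how the graphs in the Results section are interpreted.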

In addition to analyzing the tourist and resident data with Ripley’s L function, we compared the Map of Life–collected data with ROAR program species data, which park biologists actively use for monitoring. This analysis helped us understand the potential for using species data collected by tourist participants to enhance an authoritative dataset like the ROAR program data. The ROAR data used in this analysis were also collected during the summer of 2016 and included taxonomic information, location information, and a time stamp. This additional analysis is an initial assessment of the data accuracy of the Map of Life data. A comprehensive data fitness-for-use analysis of the full participant collected dataset is presented in Fischer et al. ().

Results

We first analyzed the pre-visit quiz results to assess the local resident (n = 22) and tourist (n = 117) participants’ knowledge about the park and wildlife. The residents scored an average of 89.79% on the quiz (meaning they correctly answered 89.79% of the quiz questions), and the tourists scored 59.84% (a difference of 29.95 percentage points between the quiz scores). Many of the resident participants answered every question correctly. The pre-visit quiz shows us that the resident participants are knowledgeable about the wildlife, so we expected to get accurate data from this group and thus treated the resident data like control data (similar to ). This quiz also indicates that data from tourist participants may contain some errors (likely species identification errors).

The tourist and resident Grizzly bear observations reflect similar hot spots determined by the overlay layers (Figure 3a). These raster layers were derived from 21 point observations from the residents and a random sample of 21 point observations from the tourists. Darker colors represent hot spot areas, that is, areas with a high density of observation points. The total area of overlap is 502 km². The tourists’ observations overlap 66.93% with the residents’ observations; thus, most tourist observations overlap with the resident observations. The darker red area represents the overlay raster layer, where there is a greater consistency amongst these data. Some hot spots in the tourist data are inconsistent with the resident data and vice versa. However, the overlay shows consistent hot spots in known areas of Grizzly bear habitat in the park.

Figure 3 

Heatmaps of each of the study species showing overlay between the tourist and resident collected data. (a) Grizzly bear observations from tourist (green) and resident (blue) volunteers (with resampled tourist data due to differences in sample size-resampling methods described in the Methods section). The overlay is shown in red. (b) Caribou Observations from tourist and resident volunteers. (c) Moose observations from tourist and resident volunteers.

The caribou observations made by tourist and resident participants are represented in Figure 3b. This map shows areas of hot spots from each participant group, and the areas where these data overlap are shown in red. The total area of the overlay with the resampled tourist data is 613 km². The tourist observations overlap 70.13% with the resident observations; thus, most of the observations overlap. The hot spot areas are not as pronounced as they were with the Grizzly bear observations. The area with greater consistency among the two samples (darker red areas) is in the park’s known areas of summertime caribou habitat.

Tourist and resident moose observations are depicted in Figure 3c. Areas of darker blue (resident observations) or green (tourist observations) represent hot spots with a higher density of observation points. The total area of the overlay is 422 km². The tourist observations overlap 65.73% with the resident observations. The overlay area (red) shows where the two sets of observations overlap, and the darker red area indicates a higher consistency amongst these data. A high density of moose observations is expected near the park entrance area (upper right-hand side of the map) because this is well-known moose habitat. The results from all heat map analyses show an overlap of the datasets. Thus, we expected to see a correlation between the datasets with Ripley’s L function analysis.

Results from the L function analysis on the Grizzly bear observations (Figure 4a) show that the tourist (T) and resident (AK) data are significantly clustered at all considered distances (r) (the solid black line representing the species data in the graph is above the upper bound of the envelope, which marks the expected range of values). L(r) values larger than the confidence envelope indicate that the tourist data pattern more closely followed the pattern of the resident data than expected if it were random points; that is, the tourist data and the resident data are spatially similar. The caribou observations (Figure 4b) and moose observations (Figure 4c) also show clustering between the tourist- and resident-collected data. The clustering of the tourist and resident datasets with all three species indicates that the tourist-collected data is on par with the resident data, at least for these megafauna.

Figure 4 

Results from L function analysis of tourist and resident (a) Grizzly bear, (b) caribou, and (c) moose observations. The solid black line on each graph shows the observed points. The dotted red line is the expected random pattern, and the grey area bounded by the high/low lines shows the confidence envelope. In each graph, the black line is above the envelope across the considered distances in kilometers (r), indicating the two point datasets are clustered and have similar spatial patterns.

We used Ripley’s L function to compare the point patterns of our three study species (bear, caribou, and moose) collected through Map of Life and the ROAR program to examine the accuracy of the Map of Life data as a whole (with both tourist and resident observations). Park biologists consider the ROAR program data to be accurate and authoritative and use them for monitoring reports. Figure 5a shows the bear observations comparison of the Map of Life data and the ROAR program data, and the data are significantly clustered at all considered distances. Figure 5b shows the Map of Life and ROAR caribou observations, and Figure 5c shows the Map of Life and ROAR moose data. Both also show clustering. The MOL and ROAR data clustering with all three species indicates that the MOL data is spatially similar to the ROAR data for these species.

Figure 5 

Results from L function analysis of the Map of Life and ROAR program data comparison of Grizzly bear, caribou, and moose observations. Panel a compares the Map of Life and ROAR bear observations, and the two datasets are clustered. Similarly, in panels b and c, the caribou and moose observation data are also clustered.

Discussion

Using both a map overlay technique and spatial statistics, this study shows that despite tourist participants being at first unfamiliar with the park wildlife (as indicated by the pre-visit quiz), tourist and resident participants produce similar species occurrence data. We recognize that many factors can affect the ability of tourists to be effective citizen science participants. Our study focused on a simple-to-use mobile app and analyzed data on easy-to-identify megafauna. If projects wish to engage tourists in their program, they need to design the program to be done by participants who may not be familiar with the study and may have limited time to spend on the project but likely have a lot of enthusiasm to learn more about the place they are visiting. We suspect we would not see such high clustering and overlap in the data if we performed an analysis comparing tourist and resident data on lesser-known species, such as ground squirrels or some bird species. More studies should be done to understand how to engage tourists in citizen science effectively and how projects can be designed with these participants in mind.

The two methods we used to compare the datasets were reasonably simple to implement and use in this data exploration stage. We hope that citizen science project practitioners will utilize these or similar methods to perform data quality checks on their data and compare different groups of participants to ensure data accuracy across the project. For example, these simple checks can help program facilitators determine whether to retrain participants or adjust training and recruitment methods. These checks can also be used to compare citizen science data with authoritative datasets, as we did with the MOL and ROAR data. The heat maps visually represented the tourist and resident data for each species. The heat maps also help determine if the tourists are misidentifying species. For example, if a tourist consistently misidentifies a caribou as a Grizzly bear, this sighting may be an outlier in the Grizzly bear overlay analysis. The L function was helpful in statistically verifying what is visualized in the heat maps.

Our analysis had some limitations and challenges. The limited timeframe in which these data were collected resulted in small sample sizes for the surveys and the species data. The sample sizes captured in this case study reflect the proportion of tourists versus residents who visit the park (every year, on average, 15% of park visitors are Alaska residents (), and 15% of our participants were residents). However, for spatial data analysis purposes, the difference in sample size between the two groups was addressed by choosing random samples from the tourist dataset to match the number of observations in the resident dataset. The random samples were then averaged to create a final assessment of, on average, how much the tourist samples overlay with the resident samples. This method should be further refined and possibly replaced by an automated bootstrapping approach.
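One way such an automated approach might look is sketched below in plain Python. It repeats the random draw from the tourist data many times and summarizes the overlap share across draws. This is repeated random subsampling rather than a formal bootstrap, and it marks a cell as occupied when it contains at least one observation instead of using kernel density. The function names, the number of draws, and the percentile summary are assumptions for illustration.

```python
import random

def occupied_cells(points, cell_size=1000.0):
    """Set of (column, row) grid cells that contain at least one point."""
    return {(int(x // cell_size), int(y // cell_size)) for x, y in points}

def mean_overlap_share(tourists, residents, n_draws=1000, seed=0):
    """Mean and 2.5th/97.5th percentile overlap share (%) across random draws.

    Assumption: each draw samples as many tourist points as there are
    resident points, without replacement.
    """
    rng = random.Random(seed)
    resident_cells = occupied_cells(residents)
    shares = []
    for _ in range(n_draws):
        sample = rng.sample(tourists, len(residents))
        cells = occupied_cells(sample)
        # Share of the sampled tourist cells that residents also occupy.
        shares.append(100.0 * len(cells & resident_cells) / len(cells))
    shares.sort()
    mean = sum(shares) / len(shares)
    low = shares[int(0.025 * (n_draws - 1))]
    high = shares[int(0.975 * (n_draws - 1))]
    return mean, low, high
```

Reporting the spread across draws alongside the mean would show how sensitive the overlap percentage is to which tourist observations happen to be sampled.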

Conclusion

Researching the capabilities of participants and characteristics of the data they collect, as well as measuring outcomes of citizen science programs, is a prerequisite to ensuring successful programs for both the participants and the scientists. While citizen science data are being more widely accepted as an accurate and valuable source of data, many variables can affect the quality of a particular citizen science dataset, including participant characteristics. However, a properly designed program that considers participants’ characteristics can diminish the potential effects on scientific outcomes. The program we studied was designed to be simple and accessible to many participants, particularly those unfamiliar with the area in which they are collecting data. This research shows the potential for tourists to be effective participants and support research and monitoring in national parks and other tourist areas. This paper also shows that the participant-collected dataset (Map of Life) is similar to the authoritative data (ROAR program) for the species we focused on. This result shows the potential for citizen science data to be used in conjunction with authoritative data to create more spatially and temporally robust datasets.

By researching an untapped pool of participants and showing the potential for engaging them in citizen science, we hope that more programs will consider partnering with recreation areas like national parks and including tourists in their programs. However, we temper this call to action with caution. Practitioners must consider the program’s design if they wish to engage tourists. A citizen science program that includes intensive training or a complicated task is likely not appropriate for tourists. Designing for the participants is key to the success of the project both in terms of participant outcomes and scientific outcomes (; ). Connecting with tourists, particularly in national parks, can enhance the tourist experience and improve the program’s scientific outcomes. Tourists can also help projects collect data on a larger spatial and temporal scale. Additionally, the experience of participating in a citizen science program can be transformative and provide people with a sense of stewardship and intensify their connection to the study area (; ), thus encouraging more people to be engaged, global citizens.

Data Accessibility Statement

Species occurrence data collected through the mobile application are available from the authors upon request. Data from Denali National Park’s ROAR program and the pre-visit quiz survey are not publicly available.