Introduction and Background

One of the most inflammatory topics in animal welfare and wildlife conservation is free-roaming domestic cat (Felis catus) management (; ). Free-roaming domestic cats, as we use the term here, cover a behavioral spectrum from fractious feral animals to sociable outdoor pets. There is a need to control their populations to protect threatened and endangered wildlife species (), safeguard public health from disease and injury (), and curtail nuisance behaviors such as noisy mating activities (), urine spraying, and fighting with pets ().

The health and well-being of free-roaming cats themselves is an important issue to cat welfare proponents, as well as to the public, which generally supports non-lethal management of cats (). Cat advocates, focused on reducing feline deaths, champion spay/neuter efforts (often called trap-neuter-return, or TNR, and community cat programs) with the aim of shrinking free-roaming cat populations over time, as sterilized adults do not replace themselves with new kittens (). While evidence supports the argument that spay/neuter efforts are linked to reductions in animal shelter intake and euthanasia (), no long-term field research has demonstrated that these programs affect the metric of concern to conservation stakeholders: the sheer number of free-roaming cats on the landscape. Two controlled studies of spay/neuter interventions in North America ran for only a year apiece and concluded that the treatment areas did not experience a population decline (; ). Cat advocates dismiss this research as having failed to collect data for a sufficient amount of time.

There is a need for long-term monitoring of the potential effects of spay/neuter programs on free-roaming cat population sizes. To be acceptable to all stakeholders, this research must combine science-based population estimation methods and multi-year time spans. Such an initiative could answer questions such as whether spay/neuter programs can reduce population sizes of free-roaming cats, what proportion of cats in an area need to be sterilized to see a population reduction, and how long it takes to see that decline. While computer simulations have produced widely varying answers, these questions have never been field tested. Further, these answers are likely to vary based on location-specific context, so there is a need to replicate the same style of cat population monitoring in multiple locations so program efficacy can be compared based on factors such as climate and level of urbanization.

The gold standard for quantifying animal populations is photographic mark-recapture modeling, which estimates a population size based on how often animals in an area are detected and photographed, often combined with spatial data. An animal is “captured” and “marked” when it is first photographed and identified, and “recaptured” when photographed again, creating sighting histories for analysis. While statistical methods are being developed to perform this type of modeling with only partial identification of individuals (), the ability to accurately identify each individual is preferable.
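As a minimal illustration of this logic (not the spatially explicit models typically used in practice), the simplest closed-population estimator can be computed in R from two sampling occasions; the counts below are hypothetical.

# Hypothetical two-occasion illustration of mark-recapture logic.
# n1 = cats identified on occasion 1, n2 = cats identified on occasion 2,
# m2 = cats photographed and matched on both occasions ("recaptures").
n1 <- 40; n2 <- 35; m2 <- 14
N_hat <- ((n1 + 1) * (n2 + 1)) / (m2 + 1) - 1   # Chapman's bias-corrected Lincoln-Petersen estimator
round(N_hat)                                    # estimated population size, here about 97 cats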

Photo identification using an animal’s natural marks has been tested across many terrestrial and marine species (e.g., ; ; ; ). These studies sometimes compare the accuracy of identifiers from two groups: those who are considered experts or professionals, and those presumed to be less experienced, such as citizen science volunteers (). However, despite an abundance of free-roaming cat population research that relies on photos as data (e.g. ), and individual photo identification studies in other felids (; ), no whole-body photo identification work has been published about domestic cats.

Researchers from the world of machine learning and computer vision have had success in testing algorithms for identifying domestic cats by their noses () and faces (; ), but these approaches rely on high-resolution, well-lit photos of selected areas of the body in cooperative subjects. Such images would not be obtainable by field biologists collecting free-roaming cat photos via camera traps, smartphones, or possibly even DSLR cameras with telephoto lenses. Artificial intelligence tools are not yet advanced to the point where they can be used for identifying individual domestic cats in real-world research applications.

Field data collection for photographic mark-recapture research commonly uses motion-activated camera traps placed over a study area to obtain photos for building capture histories. However, these methods are not without drawbacks and limitations. Maintaining a camera trap array consumes professional labor, cameras and security boxes require purchasing and maintenance, and time is spent sifting through massive photo collections where many images feature swaying plants or the movement of humans and other non-target species. Camera trap placement permissions, theft, and vandalism are also of concern, especially if used in urban and populated areas.

We propose an alternative to traditional camera trap arrays and data management: engaging volunteers as both smartphone-wielding human camera traps and data processors who identify cats and build capture histories for analysis. While one free-roaming cat research program solicited members of the public to host camera traps on their property as part of the DC Cat Count () and another deploys volunteers in road-based cat counts for the Hayden Island Project in Portland, OR (), no one has yet attempted a holistic, multi-role citizen science approach for studying free-roaming cat populations. To that end, a nonprofit citizen science program, Kitizen Science, is being developed to monitor free-roaming cat population sizes in urbanized areas in collaboration with spay/neuter organizations.

A flourishing body of literature appeals to practitioners to ensure that tapping into crowdsourced free labor does not cause an unacceptable decrease in the rigor of the scientific process (, ). While there is always benefit to ensuring that research methods are subject to transparency, validation, and piloting measures, this is especially true in areas with high conflict between stakeholder groups and where results could affect policy decisions.

In response to these needs, the first validation study of Kitizen Science’s methods tested our intended approach’s most basic assumption: whether citizen science volunteers drawn from a pool of cat advocates can identify individual free-roaming cats in smartphone photos at a reasonable level of accuracy, thereby making them capable of building credible capture histories for mark-recapture analysis. One metric for defining a reasonable level of accuracy is to compare volunteers in a citizen science program’s target demographic against those in the demographic that is already trusted to perform this type of task. Here, we tested the success of cat advocate citizen science volunteers in identifying and matching cats in smartphone photos, and compared them with a reference group of biology and environmental science students—a labor pool commonly trusted to organize data from camera trap research. Further, our study sought to probe the question of cat identification from two perspectives: what makes a volunteer better at cat identification, and what makes a cat photo more identifiable?

Methods

Software and technology

Kitizen Science is based on the Wildbook platform: open source, cloud-based citizen science software developed by the conservation nonprofit Wild Me to support photographic mark-recapture wildlife field studies. Our configuration features numerous additional customizations for use with free-roaming cats in populated environments. A key difference between ours and other Wildbook projects is that we engage citizen science volunteers as both photographers and animal identifiers, whereas other Wildbook projects use artificial intelligence for the latter.

Photograph collection

We obtained photos of cats in June and July 2019 in various locations in Washington and Oregon during the late afternoon and evening when free-roaming cats are known to be active and visible. We shot most photos of cats for this study in the same manner as we would direct volunteers to do for our program, where a person walked along a street or sidewalk and photographed any observed cats. We also included some photos from a rural property where the resident gave us verbal permission to photograph their cats. We included these to supplement the urban/suburban backgrounds of the rest of the photo collection. We used an iPhone XS (Apple, Inc., California, USA) smartphone’s default camera without applying the digital zoom tool or any filters. We also photographed some cats with a Nikon D5300 digital SLR camera with a 70–300 mm AF-S Nikkor zoom lens (Nikon Corporation, Tokyo, Japan) to ascertain physical features less visible with a smartphone. All photos were gathered by a single researcher (SA), allowing us to be confident in the true identities of all cat photos included in the collection, creating a validation study where we can compare volunteer responses to a known true state.

We did not crop or retouch photos to enhance brightness, contrast, lighting, or saturation to make cats more or less identifiable, but we did make alterations to some photos to blur house numbers, street signs, license plates, commercial signage and logos, human faces, cat collar tags with visible text, and other potentially identifying information to protect trademarks and the privacy of residents in areas where we collected photos.

Volunteers

We recruited two types of people to form our test and reference groups. We solicited our test group of cat advocate citizen science volunteers online via cat welfare Facebook Pages and Groups, Twitter, personal contact networks, and an email list of feral cat advocates. These volunteers could be located anywhere so long as they were over 18 and had an internet-connected device. We chose these advertising methods to be similar to how we would recruit future participants for Kitizen Science. We solicited our reference group of student research volunteers at the University of Washington via flyers in environmental and life science buildings on campus, department email lists, and through making announcements in undergraduate wildlife courses. We chose these advertising methods and single-institution scope to mimic how a traditional university-based researcher would seek assistance in processing camera trap images from a photographic mark-recapture study.

During our website signup process, volunteers agreed to an informed consent statement to participate as a research subject, signed up for a user account in agreement with a set of software terms and conditions, completed a demographics and personal background survey, and read an explanation of the study and a set of instructions. Only after completing and clicking through these pages could a volunteer proceed into photo-matching tasks. We created separate enrollment pages for cat advocates and students, with slightly different survey questions, but all other aspects of the matching study were the same for both groups. We used only the English language in recruitment material and on our website. Volunteers could ask questions about the study and instructions via email. Volunteers participated at their own pace within the 9-week period spanning summer and autumn of 2019 that was allotted for each group.

Study website

We organized photo-matching tasks into trials of 120 pairwise photo comparisons, in which our website presented a volunteer with 1 of 50 randomized photos (each of a different cat) to match alongside a randomized photo of a potential match, and the volunteer was asked a simple yes/no question of whether the same cat appeared in both photos (Figure 1). Each trial contained 0 to 5 correct matches for each cat to match within the library of 120 images. This arrangement was chosen as a simplified version of how our citizen science website workflow will function: volunteers will be presented with a cat to match and then look through potential matches in search of the same individual. Volunteers were required to make a binary “yes” or “no” decision on each pair of photos (; ), as previous research found that photo classification participants who were offered an uncertain option overused it (). We restricted volunteers to completing a maximum of 2 trials (240 photo pair comparisons) per 24-hour period to avoid observer fatigue (). For analysis, we excluded incomplete matching trials () on the assumption that the user was merely curious about the project and not putting in genuine effort ().

Figure 1 

A screenshot of our study website that presented volunteers with two cat photos for comparison. Users could zoom/pan to explore detail.

Covariates

To investigate which traits make people better at identifying cats or make a cat photo more identifiable, we summarized matching results at the level of volunteers or cat photos along with covariates of interest for each. With the human models, we examined only the volunteers from our target demographic of cat advocates.

Human model covariates included the random effect of the user, and the fixed effects of gender, whether they care for feral cats, whether they have pet cats, whether they have ever volunteered with cats, their level of education collapsed to whether they hold a bachelor’s degree or higher, whether they have previously participated in other image-classification citizen science projects, and the mean amount of time they spent completing a trial (with outlier pauses of 120 seconds or more removed on the assumption that users had stepped away from the task). We believe that the small bias created by these time deletions, which occasionally undercounted total completion time, likely by seconds, was more acceptable than the bias of over-counting total completion times by up to thousands of minutes.

Cat photo model covariates included the random effect of each photo, and fixed effects of whether the cat’s color/pattern was black, how much of the cat’s face was visible, the proportion of the frame occupied by the cat (a proxy for distance from camera and therefore cat resolution), whether the cat had a differentiator in the form of a visible collar or visible removed ear tip (an indicator of a sterilized feral cat), and the mean amount of time users spent viewing the photo (with outlier pauses of 120 seconds or more removed). Face visibility was grouped into three categories: none (face obscured), partial (one quarter to three quarters of the face visible), or full (face toward camera with markings on both sides visible). To calculate the proportion of the photo’s frame occupied by a cat, the cat’s longest aspect was measured in pixels using the ruler tool in Adobe Photoshop 2020 (Adobe, Inc., California, USA) and divided by the horizontal width of the photo (4032 pixels). We excluded cats’ tails from these measurements as they are not always fully extended or visible. Measurements maximized cat length to account for body positions captured at different angles. We measured walking cats from a cranial point such as snout or front paws to a caudal point such as hind feet or rump, or in seated cats from rump to ears. For cats obscured by objects, our measurement included only the visible portion of the cat. To account for minor variations in measurement, each cat was measured twice and we used the mean number of pixels.
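To make these two derived covariates concrete, the calculations can be sketched in R as below; the measurement values and variable names are ours, for illustration only.

# Proportion of frame: mean of two pixel measurements of the cat's longest
# visible aspect, divided by the photo's horizontal width of 4032 pixels.
frame_width_px  <- 4032
measurements_px <- c(1510, 1530)            # hypothetical repeated measurements
prop_frame <- mean(measurements_px) / frame_width_px

# Mean viewing time, after removing outlier pauses of 120 seconds or more
# on the assumption that the user had stepped away from the task.
viewing_times_sec <- c(8, 12, 5, 430, 9)    # hypothetical seconds per view
mean_view_time <- mean(viewing_times_sec[viewing_times_sec < 120])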

After data collection was completed, we discovered that one cat photo in our collection had a black cat hiding in the shadows under bushes in the background. This cat was difficult to see, and we decided to retain this photo’s matching results in our analysis on the assumption that the second cat was very unlikely, although not impossible, to have been spotted by volunteers and confused with the cat in the center of the frame that was to be compared.

Analysis

We extracted user response data from our website and converted all pairwise comparison responses to a binary of correct/incorrect, classifying each as a true positive (correct match), true negative (correct non-match), false positive (a volunteer indicated two cat photos were a match when they were not), or false negative (a volunteer indicated two cat photos were not a match when they were). We compared the results of our cat advocates and student volunteers and summarized volunteer survey demographics for each group.
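A sketch of this conversion in base R (the column names are hypothetical) follows:

# Each row is one pairwise comparison: the volunteer's yes/no answer and
# whether the two photos truly showed the same cat.
responses <- data.frame(
  answer = c("yes", "no", "yes", "no"),
  truth  = c(TRUE,  TRUE, FALSE, FALSE)
)
responses$correct <- (responses$answer == "yes") == responses$truth
responses$type <- ifelse(responses$truth,
                         ifelse(responses$answer == "yes", "true positive",  "false negative"),
                         ifelse(responses$answer == "yes", "false positive", "true negative"))
table(responses$type)   # counts of each outcome type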

We used Generalized Linear Mixed Models (GLMMs) to analyze mixed effects data, along with Generalized Linear Models (GLMs) for those without random effects. We modeled our response variable under the binomial distribution, where the number of pairwise photo comparisons was treated as trials and the number of correct matches as successes. Our GLMMs included 1) models with the random effect plus one predictor variable, 2) a full model with all potential predictors, and 3) an intercept-only GLM. Based on Akaike Information Criterion corrected for small sample sizes (AICc) rankings of these models and which variables emerged as common to the best-performing models, additional combinations were created and compared. For our human accuracy question, we fit ten additional models that included the following combinations of covariates: user, whether they have pet cats, and gender; user, whether they have pet cats, and whether they have participated in other image classification citizen science projects; user, whether they have pet cats, and the mean amount of time spent completing a trial; user, whether they have pet cats, and whether they care for feral cats; user, whether they have pet cats, and whether they hold a bachelor’s degree or higher; user, whether they have pet cats, and whether they have ever volunteered with cats; user, whether they have pet cats, whether they have ever volunteered with cats, and whether they care for feral cats; user, whether they have pet cats, whether they have ever volunteered with cats, and whether they hold a bachelor’s degree or higher; user, whether they have pet cats, whether they have ever volunteered with cats, whether they care for feral cats, and whether they hold a bachelor’s degree or higher; and user, whether they have pet cats, whether they hold a bachelor’s degree or higher, and whether they care for feral cats. For our cat photograph identifiability question, we fit four additional models that included the following combinations of covariates: photo and the proportion of the frame occupied by the cat; photo and whether a cat had a visible collar or removed eartip; photo and the mean time users viewed the photo; and photo, proportion of the frame occupied by the cat, and the mean time users viewed the photo. Upon determining the best model for each question, a corresponding GLM was built without the random effect to determine its importance. Lastly, we re-ranked our models by AICc to determine the new best model for each question.
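A condensed sketch of this model-fitting and ranking workflow in R with the lme4 and MuMIn packages is shown below; the data frame and column names are hypothetical, and only three of the candidate models are shown.

library(lme4)    # glmer() for GLMMs
library(MuMIn)   # AICc() and model.sel() for information-theoretic ranking

# volunteer_summary: one row per cat advocate, with counts of correct and
# incorrect pairwise comparisons plus survey covariates (hypothetical names).
m_full <- glmer(cbind(correct, incorrect) ~ pet_cats + volunteer_ever + degree +
                  feral_cats + gender + citsci + time_viewing + (1 | user),
                data = volunteer_summary, family = binomial)
m_pet  <- glmer(cbind(correct, incorrect) ~ pet_cats + (1 | user),
                data = volunteer_summary, family = binomial)
m_null <- glm(cbind(correct, incorrect) ~ 1,
              data = volunteer_summary, family = binomial)

model.sel(m_full, m_pet, m_null)   # rank candidate models by AICc
anova(m_pet, m_null)               # gauge the contribution of the random effect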

We performed analyses using R version 3.6.2 () in RStudio version 1.2.5033 () with the packages lme4 (), MuMIn (), and plyr ().

Results

In total, 151 cat advocates and 17 life science university students participated in this study by completing a total of 37,800 pairwise photo comparisons using our online platform (See Supplemental Files 1–5 for data analyzed).

Our volunteer background and demographics survey revealed similarities and differences between groups. The mean age was 47.1 (range 19–76) for cat advocates and 23.9 (range 18–50) for students. In both volunteer groups, the majority were women (cat advocates 90.7%, students 82.3%) and selected white as their race/ethnicity (cat advocates 90.1%, students 70.6%). Many in both groups had a pet cat or cats (cat advocates 90.1%, students 58.8%), with 35.1% of cat advocates and no students reporting that they cared for feral/free-roaming cats. Only 7.9% of cat advocates had previously participated in an online citizen science project doing image identification or classification, compared with 23.5% of students. A majority (62.2%) of cat advocates held a bachelor’s degree or higher. Regarding experience volunteering with cats, 57.6% of cat advocates were currently involved, 17.9% had previously volunteered, and 24.5% reported having never volunteered with cats. Among students, 17.6% had previously volunteered to do image identification or classification as part of research that was not online citizen science, such as viewing camera trap images for University of Washington wildlife researchers. The student group was composed largely of undergraduates (82.3%), with a few master’s and doctoral students (5.9% and 11.8%, respectively). See volunteer demographics summarized in Table 1.

Table 1

A summary of survey responses from volunteers.


      | CAT ADVOCATES | STUDENTS
Number | 151 | 17

Are you currently involved in volunteering with cats in some way?
      Yes | 57.6% | –
      No | 24.5% | –
      Not now, but in the past | 17.9% | –

Do you have a disability or personal limitation (such as being a parent/caregiver) that prevents you from volunteering with cats in a typical offline setting like a shelter?
      Yes | 8.6% | –
      No | 78.2% | –
      Sometimes | 13.2% | –

Do you currently have a cat/cats in your care?
      Yes, a pet cat/cats | 58.3% | 58.8%
      Yes, I care for feral/free-roaming cats | 3.3% | 0.0%
      Yes, a pet cat/cats AND Yes, I care for feral/free-roaming cats | 31.8% | 0.0%
      No | 6.6% | 41.2%

Have you ever participated in an online citizen science project doing image identification or classification?
      Yes | 7.9% | 23.5%
      No | 92.1% | 76.5%

Have you ever volunteered to do image identification or classification as part of research that is NOT online citizen science, such as viewing camera trap images for UW wildlife researchers?
      Yes | – | 17.6%
      No | – | 82.4%

Age
      Mean (range) | 47.1 (19–76) | 23.9 (18–50)

Retired
      Yes | 21.2% | 0.0%
      No | 78.8% | 100.0%

Gender
      Man | 8.0% | 11.8%
      Woman | 90.7% | 82.3%
      Nonbinary/Other | 1.3% | 5.9%

Race/ethnicity
      White | 90.1% | 70.6%
      All other options (including mixed race selections that included white) | 9.9% | 29.4%

Highest level of education
      Less than bachelor’s degree | 33.8% | –
      Bachelor’s degree or higher | 66.2% | –

What is your current standing in school?
      Undergraduate | – | 82.3%
      Master’s Student | – | 5.9%
      Doctoral Student | – | 11.8%

Cat advocates’ matching attempts (n = 34,080) were correct 98.1% of the time, compared with students’ 97.5% (n = 3,720). Among cat advocates, there were 33,089 true negatives, 329 true positives, 143 false negatives, and 519 false positives. The students had 3,594 true negatives, 34 true positives, 10 false negatives, and 82 false positives. While both groups performed well, students made 27.3% more errors than cat advocates, a statistically significant difference by a chi-squared test (χ2 = 4.831, df = 1, p = 0.0280) at an alpha of 0.05. Considering the two error types, which add confusion to animal capture histories and affect population estimates, cat advocates were 3.6 times more likely to make false positive than false negative errors, while students were 8.2 times more likely.
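The group comparison reported above can be reproduced in R from these counts; the sketch below assumes a chi-squared test without continuity correction, which matches the reported statistic.

# Correct vs. incorrect pairwise comparisons by group, from the counts above.
results <- matrix(c(33089 + 329, 143 + 519,   # cat advocates: correct, errors
                    3594 + 34,   10 + 82),    # students: correct, errors
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("cat advocates", "students"),
                                  c("correct", "error")))
chisq.test(results, correct = FALSE)          # X-squared = 4.83, df = 1, p = 0.028

# Error-type bias within each group: false positives per false negative.
519 / 143   # cat advocates, ~3.6
82 / 10     # students, 8.2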

Our best model (by AICc) to determine what influences a human’s ability to match cat photos accurately was the GLMM containing user as a random effect and, as fixed effects, whether one had a pet cat, whether one had volunteered with cats previously, and whether one held a bachelor’s degree or higher (Table 2). Reporting a pet cat increased a volunteer’s cat photo identification accuracy, while holding less than a bachelor’s degree or having ever volunteered with cats reduced accuracy. Model diagnostics supported the model as fitting the data. In an ANOVA test between our best model and an intercept-only GLM to determine the importance of including user as a random effect, our best model was significant (p < 0.05). A chi-squared goodness-of-fit test indicated no lack of fit (p > 0.05). The model was underdispersed due to the high accuracy rate of most volunteers.

Table 2

Models predicting which personal traits affect cat advocacy citizen science volunteers’ accuracy in matching cat photos, ranked by AICc values. AICc values are used in model comparison and selection. The lowest score represents the most plausible model of those considered. The weight values are the relative likelihood of a model.


MODEL | DF | AICc | Weight

GLMMs
user + pet_cats + volunteer_ever + degree | 5 | 691.5 | 0.289
user + pet_cats + volunteer_ever + degree + feral_cats | 6 | 692.6 | 0.170
user + pet_cats + degree | 4 | 693.2 | 0.127
user + pet_cats + degree + feral_cats | 5 | 693.3 | 0.119
user + pet_cats + volunteer_ever | 4 | 694.4 | 0.071
user + pet_cats | 3 | 695.1 | 0.050
user + pet_cats + feral_cats | 4 | 695.1 | 0.048
user + pet_cats + volunteer_ever + feral_cats | 5 | 695.3 | 0.045
user + pet_cats + time_viewing | 4 | 695.3 | 0.044
user + pet_cats + citsci | 4 | 697.0 | 0.018
user + pet_cats + gender | 5 | 698.7 | 0.008
user + pet_cats + volunteer_ever + degree + feral_cats + gender + citsci + time_viewing | 10 | 699.9 | 0.004
user + degree | 3 | 700.7 | 0.003
user + time_viewing | 3 | 702.1 | 0.001
user + feral_cats | 3 | 702.4 | 0.001
user + volunteer_ever | 3 | 703.0 | 0.001
user + citsci | 3 | 704.1 | 0.001
user + gender | 4 | 705.9 | 0.000

GLMs
intercept-only null model | 1 | 880.4 | 0.000

Our best model (by AICc) to determine what makes a cat photo more identifiable was the GLMM that contained photo as a random effect and whether the cat was black as a fixed effect (Table 3). If a cat was a color other than black, its likelihood of being correctly identified increased. Model diagnostics supported the model as fitting the data. In an ANOVA test between our best model and an intercept-only GLM to determine the importance of including photo as a random effect, our best model was significant (p < 0.05). A chi-squared goodness-of-fit test indicated no lack of fit (p > 0.05). As above, this model was underdispersed due to the high matching accuracy rate of most volunteers.

Table 3

Models predicting which cat photo traits were linked to accurate matching by cat advocacy citizen science volunteers, ranked by AICc values. AICc values are used in model comparison and selection. The lowest score represents the most plausible model of those considered. The weight values are the relative likelihood of a model.


MODEL | DF | AICc | Weight

GLMMs
photo + black | 3 | 337.6 | 0.366
photo + black + prop_frame | 4 | 338.2 | 0.263
photo + black + time_viewed | 4 | 339.5 | 0.139
photo + black + prop_frame + time_viewed | 5 | 339.9 | 0.113
photo + black + tip_or_collar | 4 | 339.9 | 0.113
photo + black + face + prop_frame + tip_or_collar + time_viewed | 8 | 345.8 | 0.006
photo + tip_or_collar | 3 | 351.7 | 0.000
photo + prop_frame | 3 | 351.9 | 0.000
photo + time_viewed | 3 | 352.2 | 0.000
photo + face | 4 | 352.9 | 0.000

GLMs
intercept-only null model | 1 | 558.0 | 0.000

Discussion

Here, we demonstrate that our sample of cat advocate citizen science volunteers was not only adequate at identifying and matching individual cats in smartphone photos, but performed better at the task than our sample of life science university students. These findings support our plan to engage cat advocates in matching cats and building capture histories for use in photographic mark-recapture population estimation. Further, by casting a wide net through online citizen science, we attracted nearly nine times as many participants as responded to a well-advertised student volunteer position at a large university.

Our results reveal that in our program’s target demographic of cat advocates, participants who were best at identifying and matching cats in smartphone photos are those who reported having a pet cat, holding a bachelor’s degree or higher, and having not previously volunteered with cats. The finding that a user having volunteered with cats reduced their level of photo identification accuracy was unexpected. One potential explanation for this seemingly paradoxical result may be that participants with cat volunteering experience (one metric of expertise in cats) were over-confident in their abilities and less careful in determining matches. However, a different metric of expertise, having a pet cat, increased users’ cat identification accuracy. From an applied perspective, this suggests that volunteer recruitment from an audience of cat advocates with pet cats is important, but recruiting those who already have experience volunteering with cats is unnecessary or even counter-productive. This allows us to solicit future volunteers from a broader audience of people who like cats while not incentivizing us to poach volunteers from cat welfare organizations.

Defining (and self-defining) expertise in animal photo identification is a complex phenomenon that has been explored by previous researchers. In a study that asked attendees at a herpetofauna conference to sort images of newts, Austen et al. () noted that the volunteer with the highest accuracy score self-rated their newt identification skills as worse than their peers, while the volunteer with the lowest accuracy score self-rated as possessing the same level of expertise as their peers. Overall, however, no statistically significant difference in accuracy was detected when comparing groups of volunteers based on how they self-assessed their abilities.

Other research has also asked the question of which human traits lead to better success in photo identification. Similarly to our finding that possessing a bachelor’s degree improved cat identification accuracy, Delaney et al. () discovered that education was an important predictor of accuracy in field-based species-level identification among participants in a marine invasive species program, where volunteers with two years of university education exceeded 95% accuracy. Testing the effect of an online training program on species-level identification, researchers found that volunteers with a background in biology had a higher percentage correct than those without, although the latter’s accuracy rose to a similar level with the addition of training (). In a validation study of the potential of using citizen science volunteers for identifying individual Andean bears, Van Horn et al. () found no personal characteristics that had a meaningful effect on participant accuracy.

Both identification error types introduce bias into capture histories for photographic mark-recapture population estimation, with a false positive trend resulting in an underestimation of the true population size () through an increase in the capture probability (). In our study, cat advocates were 3.6 times more likely, and students 8.2 times more likely, to make false positive than false negative errors in cat identification. Looking at identification errors in camera trap images of giant pandas, Zheng et al. () discovered a mean bias of 1.58 toward false positives. Conversely, using camera trap photos of snow leopards, researchers found that capture events were 3 times more likely to be false negatives than false positives (). Testing images of salamanders, Chesser () found that observers were 185 times more likely to make false negative identification errors than false positives. An Andean bear photo identification study found a bias toward false negatives (). In a study of the feasibility of using natural markings to identify giant anteaters, 87.5% of errors were false negatives (). In seeking to identify polar bears by whisker spots, researchers found that both experienced and inexperienced participants were more likely to make false negative than false positive errors (). These varied findings underscore the need for species-specific validation of photo identification that includes reporting not just general accuracy but also error-type trends. Chesser () noted that false positives, as was the trend in our cat study, could be viewed as the preferable problem because they are less time-consuming than false negatives to identify and correct with intervention.
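To illustrate why the direction of this bias matters, the hypothetical Lincoln-Petersen sketch from the Introduction can be extended: a false positive wrongly merges two different cats into one identity, adding a spurious recapture and pulling the estimate downward.

# Hypothetical illustration of how a single false-positive match deflates a
# two-occasion estimate by inflating the apparent recapture count.
lp_estimate <- function(n1, n2, m2) ((n1 + 1) * (n2 + 1)) / (m2 + 1) - 1
lp_estimate(n1 = 40, n2 = 35, m2 = 14)   # correct capture histories: ~97 cats
lp_estimate(n1 = 40, n2 = 35, m2 = 15)   # one false-positive match:  ~91 cats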

Unsurprisingly, many previous studies of photo identification in wildlife qualitatively report that “distinctive” animals in “high quality” photos are most readily identified. However, the effect of any metric of visual distinctiveness or photo quality is rarely explored at an analytical level (). In our study, we found that whether a cat was black was the key covariate in determining its identifiability, outweighing other factors such as face visibility or the presence of a collar or ear tip. However, this result should not be construed as demonstrating that black cats as a whole are unidentifiable and that all photos containing black cats must be removed from photographic mark-recapture photo data collections. We would argue that thresholds for whether to include photos in building capture histories should not be based on coat color alone, but should take into consideration that some black cats and some black cat photos will be sufficiently distinctive for identification, just as some nonblack cats will fall below a threshold for inclusion. (Figures 2 and 3 show our four least and four most identifiable cat photos.)

Figure 2 

The four least identifiable cat photos in our study, all of which had solid black fur.

Figure 3 

The four most identifiable cat photos in our study, none of which had solid black fur. We share these sets of images to illustrate that by-eye identifiability among domestic cats does not require a cat to be close to the camera, uniform in its pose, or even easy to initially notice within the frame.

Species-level identification accuracy in citizen science has been well explored in the literature. In analyzing Snapshot Serengeti, researchers tested untrained citizen science volunteers in species identifications and compared their success with expert opinion (). There was a 97.9% overall agreement between these two groups of classifiers, although accuracy was lower in less-common species. Similarly, Clare et al. () reported 93.4% accuracy in crowdsourced species identifications, also with higher error rates in less-common species. In comparing the skills of volunteers self-reporting no biology background, some biology background, and professional biologists, both with and without training, Katrak-Adefowora, Blickley, and Zellmer () found that there was no statistically significant difference among five of the six groups, and only those without a biology background and without training displayed a low proportion of correct answers.

Apart from supporting our program’s intended approach and establishing that individual domestic cats are generally highly identifiable from smartphone photos, our results add nuance to the way that we delineate experts versus non-experts in animal identification. People with professional experience or studying for life science degrees may not necessarily be experts at the task of identifying individuals of a given species, and enthusiastic non-professionals may possess higher levels of competency in some instances. Researchers should explore whether a hobby or subculture exists around a species or study system of interest and consider working with individuals from such groups not just to save costs and engage the public in research, but to potentially improve the quality of their data. Regardless of the entity responsible for animal identification—citizen science volunteers, student assistants, professional researchers, or artificial intelligence algorithms—validation work and pre-testing of accuracy rates and error trends should accompany all research that relies on photo identification.

Conclusions

Discussions about the validity of citizen science approaches are often framed in terms of untrained members of the public versus professional scientists. There is a notable lack of acknowledgment that data collection and interpretation in traditional, university-based research may be performed by another type of nonprofessional: students. Student research assistants, with varying and unknown levels of skill, training, and dedication, complete a sizable amount of grunt work in some fields, which can include sifting through photos in camera trap–based population ecology. We argue that validation studies of citizen science approaches should compare the performance of their participants not necessarily against that of professional scientists, but with those who otherwise might be tasked with a specific responsibility.

Our study sought to demonstrate that a group of people from our program’s target demographic could identify individual cats at a similar level of success as students from a major research university. Our citizen science volunteers outperformed the students both in terms of quality and quantity of work. We do not argue that this would be the case in every discipline or situation, but we cannot make data quality comparisons without university-based researchers subjecting their processes to the same level of skepticism and scrutiny.

Data Accessibility Statements

Data analyzed during this study are included in this published article and its supplemental data files.

Supplementary Files

The supplementary files for this article can be found as follows:

Supplemental File 1

aeluro_photos.csv. Cat photo matching results. DOI: https://doi.org/10.5334/cstp.465.s1

Supplemental File 2

aeluro_demographicscat.csv. Cat advocate demographics. DOI: https://doi.org/10.5334/cstp.465.s2

Supplemental File 3

aeluro_demographicsstudents.csv. Student demographics. DOI: https://doi.org/10.5334/cstp.465.s3

Supplemental File 4

aeluro_matchingcat.csv. Matching results from cat advocates. DOI: https://doi.org/10.5334/cstp.465.s4

Supplemental File 5

aeluro_matchingstudent.csv. Matching results from students. DOI: https://doi.org/10.5334/cstp.465.s5