Introduction

Technologies that allow computers and machines to perform tasks normally requiring human intelligence are often referred to as artificial intelligence (AI). These technologies allow machines to complete tasks with traits or capabilities ordinarily associated with human cognition, such as reasoning, problem solving, common-sense knowledge management, planning, learning, translation, perception, vision, speech recognition, and social intelligence (). Research in AI is increasing rapidly: between 1996 and 2017, the annual publication rate of AI papers grew faster than that of papers overall, and faster than that of papers in computer science (see the growth of annually published papers by topic in Shoham et al. []). This growth in AI publications has prompted researchers to critically explore the potential promises and risks of AI (; ; ) as well as ethics and responsibilities (; ; ; ).

AI has been used in citizen science projects for about 20 years. It was first used in this context in 2000, in collaborative AI databases such as the Generic Artificial Consciousness (GAC)/Mindpixel Digital Mind Modeling Project () and the Open Mind Common Sense project (). In these models, user-submitted propositions were meant to create a database of common-sense knowledge that could function as a kind of digital brain. This relationship between collective knowledge and algorithmic processing evolved in many directions and, in 2019, is predominantly represented by machine learning, especially applied to computer vision, which includes diverse methods of automatically identifying objects from digital photographs. For example, the iNaturalist platform, a citizen science project and online social network, is designed to enable citizen scientists and ecologists alike to upload observations from the natural world, such as images of animals and plants (). The platform is one among many () that include an automated species/taxon-identification machine-learning algorithm applied to computer vision (). Images can be identified via an AI model that has been trained on the large database of “research grade” observations on iNaturalist (; [https://www.inaturalist.org/pages/help#quality]).

The same types of machine-learning algorithms used by iNaturalist’s community of users are also helping ecologists to classify millions of underwater snapshots of corals via the XL Catlin Global Reef Record project (). Currently, AI researchers, whether in citizen science or more broadly, tend to test their algorithms on a few standard data sets. For instance, image-recognition software is generally tested on ImageNet (for examples see ), a database of around 14 million photographs () including people, scenes, and objects, as well as plants and animals. In the field of biodiversity, in 2017 iNaturalist made one of its data sets of 5,000 photographs of birds, mammals, amphibians, and other taxonomic groups available for attendees of the Computer Vision and Pattern Recognition Conference in Honolulu, Hawaii, to train and test computer-vision algorithms ().

With the proliferation of connected devices and increased data collection, AI technology has the potential to dramatically impact society, including business and the workforce. The benefits of a prudent and planned approach to AI are manifold, from increasing user engagement in scientific activities to producing better scientific outcomes. As with any endeavour that could impact human well-being, it is important to examine the risks and opportunities of AI before developing citizen science projects that include it, in order to make informed decisions. For example, before we design and deploy computer-vision technology, we may want to ask the question: How do we acknowledge, respect, and reward the people whose data and expertise have helped to train the computer-vision algorithms? Data in citizen science are usually open and accessible to participants. However, to prevent the concentration of wealth and power in the hands of the AI companies controlling the data-processing technology, the regulation of data ownership requires more thought. If access to AI resources is restricted by commercial interests, data contributors (i.e., citizens) may be excluded from decisions about data use or from involvement in research that uses AI. Therefore, it is important that AI computing resources are openly accessible and available to all, creating opportunities for citizens to be involved in AI research and to understand how the data they collect are being used.

Intergovernmental agencies, technologists, and conservationists have identified the need to coordinate the creation and use of technologies to solve global problems (; ). The citizen science community is well positioned to contribute in a variety of ways to global coordination initiatives, such as the United Nations Sustainable Development Goals (https://sustainabledevelopment.un.org/), whether through providing methodologies or contributing data not otherwise obtainable (). Innovative solutions such as AI are required to make meaning of large datasets, and citizen science has a significant role to play in ensuring that data are collected, analysed, and interpreted in meaningful ways that benefit everyone. Here, we provide a systematic overview of AI technologies currently being implemented in citizen science. We then explore potential opportunities and risks that may arise as technologies evolve. Lastly, we provide recommendations to ensure that the opportunities and risks of AI use are adequately identified. It is our intention for this article to serve as a practical introduction to how AI is used in citizen science, and for it to elicit more in-depth discussions about AI use by members of the citizen science community.

Our Approach for This Essay

To explore the current use, opportunities, and risks of AI in citizen science, we elected to conduct a systematic overview () of the use of AI in citizen science. Our overview is intended to provide readers with a broad understanding of AI and its applicability to citizen science, rather than an exhaustive list of citizen science projects applying AI. We did, however, want to ensure that we captured the diversity of AI technologies being included in citizen science. To develop a broad understanding of current AI use in citizen science, we queried two technology-focused academic literature databases, the Association for Computing Machinery Digital Library (ACM DL: [https://dl.acm.org/]) and the Institute of Electrical and Electronics Engineers (IEEE Xplore: [https://ieeexplore.ieee.org]) databases, using the terms “artificial intelligence” and “citizen science.” The ACM DL and IEEE Xplore databases returned 92 and 8 articles respectively. We reviewed these articles to understand whether and how AI was being implemented, without assessing the quality of the research, as this was not relevant to our aims. We found that some form of AI was used in citizen science in 50 of the ACM DL articles and 6 of the IEEE Xplore articles. We identified the following types of AI in those papers: automated reasoning and machine learning; computer vision and computer hearing; knowledge representation and ontologies; natural language processing; and robotic systems. These types are defined and described below. Given the interdisciplinary nature of citizen science research and associated publishing, we supplemented the ACM DL and IEEE Xplore query results with additional peer-reviewed literature drawn from our collective knowledge. The authors are involved in citizen science globally, with particularly extensive knowledge of projects across Europe, Australia, and the United States. We decided that, for a specific AI technology to be considered currently applied in citizen science, at least one article explicitly discussing its use in a citizen science project must have been published in the academic literature.

Current Applications of AI in Citizen Science

In this section we provide an overview of citizen science, AI, and how the two currently interplay. To set the stage, we begin by broadly describing citizen science and AI. Then we describe the types of AI already being applied in citizen science and highlight the use of these technologies by describing associated exemplary projects.

Citizen science can be described as work undertaken by civic educators and scientists together with citizen communities to advance science, foster a broad scientific mentality, and/or encourage democratic engagement, which allows society to deal rationally with complex modern problems (). Put simply, it involves public participation and collaboration in scientific research with the aim of increasing scientific knowledge. The citizen science community occasionally uses supporting technologies that allow computers and machines to function intelligently, achieving particular traits or capabilities often associated with AI.

AI can be described as intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and other animals. In computer science, AI research is defined as the study of “intelligent agents”: any devices that perceive their environment and take actions that maximise the chance of successfully achieving goals (). Colloquially, the expression artificial intelligence is applied when a machine mimics cognitive functions that people associate with human minds, such as learning and problem solving (). AI can be a challenging concept for humans (). Intrinsically, humans want to believe that the wonders of the mind (for example, in identifying species or sounds) are inaccessible to material processes: that minds are, if not literally miraculous, then mysterious in ways that defy natural science. This is, among other motives, because of something truly unsettling to a human mind: competence without comprehension ().

Below we provide a description of the technologies commonly used in citizen science that allow machines to complete tasks and achieve particular traits or capabilities that are often referred to as AI, such as machine learning. Real-world examples are provided, with references, so that people less familiar with the AI technologies will have a way to conceptualise use of these AI types and their impacts.

Automated reasoning and machine learning

Automated reasoning is an area of computer science and mathematical logic dedicated to understanding different aspects of reasoning. Automated reasoning helps to produce computer programs that allow computers to reason semi-automatically or entirely automatically. Machine learning uses statistical techniques to give computers the ability to “learn” (i.e., progressively improve performance on a specific task) from data. With machine learning, programs can be designed to learn things on their own. One program, for example, can learn to detect specimens of a specific taxon in pictures. It is not necessary to hand-code rules describing what that taxon looks like; given example images, the program will learn to recognise it itself. One motivation for research in this area is the desire to design programs that simulate empathy and improve a program’s understanding of human nature (). The machine interprets the emotional state of humans and adapts its behaviour to them, attempting to give an appropriate response to the human’s emotional state (; ; ; ). One common machine-learning approach involves the application of deep-learning techniques (artificial neural networks with many layers), which have been shown to be effective and efficient in addressing classification-type problems such as identifying objects or categorising digital imagery.
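
To make this concrete, the following is a minimal sketch, in Python with the open-source PyTorch and torchvision libraries, of how such a taxon detector could be trained by fine-tuning a pretrained image classifier. The folder layout, class labels, and hyperparameters are illustrative assumptions, not taken from any particular project.

    import torch
    from torch import nn
    from torchvision import datasets, models, transforms

    # Assumed (hypothetical) layout: data/train/<taxon_name>/*.jpg
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])
    train_set = datasets.ImageFolder("data/train", transform=transform)
    loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

    # Start from a network pretrained on ImageNet and replace its final
    # layer so it outputs one score per taxon found in the folders.
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    model.fc = nn.Linear(model.fc.in_features, len(train_set.classes))

    optimiser = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    model.train()
    for epoch in range(3):  # a few passes over the data, for illustration
        for images, labels in loader:
            optimiser.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            optimiser.step()

The example images play the role of the citizen-contributed observations discussed throughout this paper: the program is never given rules for what the taxon looks like, only labelled examples.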

Computer vision and computer hearing

Computer vision and hearing are interdisciplinary fields that explore how computer algorithms and systems can classify and/or identify content and achieve high-level understanding from digital images, videos, or audio recordings. They can broadly be considered subfields of AI and machine learning; they may involve specialised methods as well as general learning algorithms. We distinguish computer vision from machine learning because of the high number of applications using computer vision specifically, but we would like to make clear that they are not separate fields of research. Computer vision and computer hearing are used on citizen science data and camera-trap data, to assist or replace citizen scientists in fine-grain image classification for taxon/species detection and identification (plant or animal). A good example of this is iNaturalist (discussed above), built on the concept of mapping and sharing observations of biodiversity across the globe. As of July 2018, iNaturalist users have contributed more than 14,000,000 observations of plants, animals, and other organisms worldwide. In addition to observations being identified by the user community, iNaturalist includes an automated species identification tool based on computer vision. Images can be identified via an AI model, which has been trained on the large database of “research grade” observations on iNaturalist (). A broader taxon such as a genus or family is typically provided if the model cannot decide what the species is. If the image has poor lighting, is blurry, or contains multiple subjects, it can be difficult for the model to determine the species and it may decide incorrectly. Multiple species suggestions are typically provided, with the species the algorithm considers the closest match placed at the top of the list of suggested matches. iNaturalist still relies on experts to validate users’ recordings, but deep convolutional neural networks are reducing the amount of repetitive expert input required. Currently, limited availability of experts remains one of the biggest bottlenecks in the growth of validated user observations (). Computer vision and computer hearing also can be used to automatically annotate previously collected data on undescribed or undiscovered species (; ).
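
The fallback behaviour just described (suggest a species when confident, otherwise a broader taxon) can be sketched as follows. This is a hypothetical Python fragment for illustration only; the species list, genus mapping, and confidence threshold are our own assumptions, not iNaturalist’s actual implementation.

    import numpy as np

    # Hypothetical label set and species-to-genus mapping.
    SPECIES = ["Apis mellifera", "Apis cerana", "Bombus terrestris"]
    GENUS = {"Apis mellifera": "Apis", "Apis cerana": "Apis",
             "Bombus terrestris": "Bombus"}

    def suggest(scores, threshold=0.7):
        """Return ranked species suggestions, or a broader taxon if unsure."""
        probs = np.exp(scores) / np.exp(scores).sum()  # softmax over scores
        ranked = sorted(zip(SPECIES, probs), key=lambda p: -p[1])
        if ranked[0][1] >= threshold:
            return {"level": "species", "suggestions": ranked}
        # No confident species: aggregate probability mass by genus.
        by_genus = {}
        for species, p in ranked:
            by_genus[GENUS[species]] = by_genus.get(GENUS[species], 0.0) + p
        best = max(by_genus, key=by_genus.get)
        return {"level": "genus", "suggestion": best,
                "confidence": by_genus[best]}

    # Two similar species split the probability: falls back to genus "Apis".
    print(suggest(np.array([2.0, 1.8, 0.1])))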

Knowledge representation and ontologies

Knowledge representation is the field of AI dedicated to representing information about the world in a form that a computer system can utilise to solve complex tasks, such as assessing environmental impact or holding a dialogue in a natural language.

“Ontology,” in philosophy, refers to the set of “things” that a person believes to exist. In AI, it has proven convenient to extend the term “ontology” beyond this primary meaning and use it for the set of “things” that a computer program must be able to deal with to do its job (). An ontology then encompasses a representation of the categories, properties, and relations among the concepts, data, and entities of a domain (). Several organisations are working on the development of a recommendation on how to represent data and metadata in citizen science. This work is based on previous efforts by the US Citizen Science Association’s (CSA) international Data and Metadata Working Group, whose aim is to promote collaboration in citizen science through the development and/or improvement of international standards for citizen science data and metadata. The working group collaborates on citizen science at the international level and has become a coordinating umbrella group spanning many thematic and geographically distributed organisations that provide relevant complementary work. Contributions have been provided by the European Citizen Science Association (ECSA), the COST Action 15212’s Working Group 5 (“Improve data standardization and interoperability”), and the Australian Citizen Science Association (). These organisations also address the definition of interoperability standards for data exchange, reusability, and compatibility in citizen science. They contributed to defining core building blocks of these interoperability standards, and outlined the way ahead based on the CSA Data and Metadata Working Group’s previous work. Providing guidance on how to use standards across communities with varying knowledge and technical expertise will support uptake of project results and improve project sustainability.
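
As an illustration of what a machine-usable ontology fragment looks like, here is a minimal sketch in Python using the open-source rdflib library. The namespace, classes, and properties are invented for this example; they are not the vocabulary of the CSA Data and Metadata Working Group.

    from rdflib import Graph, Literal, Namespace, RDF, RDFS, URIRef

    # A hypothetical citizen science namespace (not a published standard).
    CS = Namespace("http://example.org/citizen-science#")

    g = Graph()
    g.bind("cs", CS)

    # Classes: the categories the ontology covers.
    for cls in (CS.Project, CS.Observation, CS.Participant):
        g.add((cls, RDF.type, RDFS.Class))

    # A property: the relation between an observation and its contributor.
    g.add((CS.observedBy, RDFS.domain, CS.Observation))
    g.add((CS.observedBy, RDFS.range, CS.Participant))

    # An instance: one observation linked to the participant who made it.
    obs = URIRef("http://example.org/obs/1")
    g.add((obs, RDF.type, CS.Observation))
    g.add((obs, CS.observedBy, URIRef("http://example.org/user/42")))
    g.add((obs, RDFS.label, Literal("Bombus terrestris sighting")))

    print(g.serialize(format="turtle"))

Because the categories and relations are explicit and machine-readable, two projects that adopt the same vocabulary can exchange observations without bespoke conversion code, which is the interoperability goal described above.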

Natural language processing

Natural language processing (NLP) is an area of computer science and AI concerned with the interactions between computers and human (natural) languages. In particular, NLP considers how to program computers to process and analyse large amounts of natural language data ().
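
As a tiny illustration of the kind of task involved, the following pure-Python sketch scans free-text citizen posts for known terms of interest (the term list is invented for this example); real NLP pipelines add tokenisation, parsing, and statistical models on top of such basics.

    # Hypothetical gazetteer of terms we want to detect in posts.
    TERMS = {"monarch butterfly", "humpback whale", "aurora"}

    def extract_mentions(text):
        """Return the known terms mentioned in a free-text post."""
        lowered = text.lower()
        return [term for term in TERMS if term in lowered]

    posts = [
        "Saw a Monarch Butterfly near the river this morning!",
        "Amazing aurora over the lake tonight.",
    ]
    for post in posts:
        print(extract_mentions(post))  # ['monarch butterfly'] then ['aurora']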

Robotic systems

Robotics is an interdisciplinary branch of engineering and science that includes mechanical engineering, electronic engineering, information engineering, computer science, and others. Robotics deals with the design, construction, operation, and use of robots and computer systems for their control, sensory feedback, and information processing abilities ().

Categorisation of Applied Uses of AI

As discussed above, there are a number of types of AI techniques, and a number of ways in which each type can be applied across science disciplines (e.g., ; ; ). To better understand how AI is currently used in citizen science, and the possible extensions of its current use into the future, we divided uses into three broad and overlapping categories (Tables 1 and 2). These categories are arbitrary and group an otherwise long list of uses. The first category is assisting or replacing humans in completing tasks, which means that AI is enabling tasks traditionally done by people to be partly or completely automated. The second category of AI use is associated with influencing human behaviour. Human behaviour is a major source of data in the current digital economy and in the training of AI. At the same time, it is also one of the main objects of data science, in the sense that many data science, AI, and citizen science models are aimed at influencing human behaviour (e.g., through personalisation and behavioural segmentation, or providing people with a means to be comfortable with citizen science and get involved). The third category of AI use relates to having improved insights as a result of using AI to enhance data analysis. For example, AI can now offer greater insights from data to inform research and policies, thanks to the training of computer-vision and computer-hearing algorithms using citizen science data. AI also can facilitate sharing the meaning of terms among machines thanks to the use of ontologies.

Table 1

Summary of the categories of AI used in citizen science and their applications. (The list of categories is not ranked in terms of importance.)

Description of instances where AI is applied | Types of AI | Examples of citizen science software applications

Applied use and impact: Assisting or replacing humans in completing tasks

Improving image or audio classification | Computer vision and computer hearing | Computer vision and computer hearing can be applied to photographic images (e.g., from cameras that are triggered by motion detection) or acoustic data, to assist or replace citizen scientists in classifying images or sounds for species detection and identification (). Examples include the citizen science biodiversity project iNaturalist (; ); improvement of species monitoring and automatic annotation of previously collected data on undescribed or undiscovered species (; ); and automatic detection of acoustic events such as bat vocalisations from audio recordings ().
Accelerating the digitisation of biodiversity research specimens | Computer vision and computer hearing | In digitising museum specimens, computer vision can assist citizens with tasks related to identifying labels, sorting handwritten versus typed labels, capturing label data, parsing information into field notes, normalising data, and minimising duplication. Examples include Leafsnap, for the identification of tree species in the north-eastern United States (), and SPIDA, for the identification of one family of Australasian ground spiders ().
Verifying the accuracy and consistency of contributors’ submissions | Automated reasoning and machine learning | The citizen science biodiversity projects eBird () and iNaturalist.
Providing more rapid response to complex modern problems | Automated reasoning and machine learning | The citizen science monitoring project Citclops, for early warning of harmful algal blooms ().
Applied use and impact: Influencing human behaviour

Extending the social impact of citizen science | Robotic systems | A community-oriented robotic system designed to extend the social, educational, economic, and health benefits of citizen science to a more general public ().
Using social media for collaborative species identification and occurrence | Natural language processing; knowledge representation and ontologies | Using specific social media to engage participants in contributing their observations over a long time period ().
Applied use and impact: Improving insights

Training of computer-vision and computer-hearing algorithms using citizen science data | Computer vision and computer hearing | Data collected by citizens are used by knowledge engineers (people who integrate knowledge into computer systems to solve complex problems normally requiring a high level of human expertise) to train AIs. Examples include the citizen science biodiversity projects iNaturalist (), Leafsnap, and Pl@ntNet (as discussed in ).
Facilitating sharing the meaning of terms | Knowledge representation and ontologies | Citizen science associations and projects based in the US, Europe, and Australia working together to design an ontology to represent knowledge in the domain of citizen science ().
Mining social-network data | Natural language processing | Citizen science projects can collect and analyse Twitter/Google data about health or the environment. An example is Aurorasaurus, a project to collect auroral observations ().

Table 2

Summary of new applications of AI in citizen science likely to appear in the near future.

Description of instances where AI is likely to be applied | Types of AI | Examples of citizen science software applications

Applied use and impact: Assisting or replacing humans in completing tasks

Filtering out hard, repetitive, routine, or mundane tasks | Automated reasoning and machine learning | Software applications that allow citizen scientists to focus on more engaging tasks, for example observing interactions, or developing/contributing to innovative projects in the field.
Providing training/support | Automated reasoning and machine learning | AI systems that can be used in regions where citizen science training/support by humans is limited, such as when direct access to people with expertise is limited and/or human-language barriers exist.
Identifying species | Computer vision and computer hearing | AI tools that can instantly classify species based on images or sounds.
Applied use and impact: Influencing human behaviour

Describing and formally representing the domain of citizen science in all languages | Knowledge representation and ontologies | An ontology that can facilitate the creation of new citizen science applications in any language and the translation of existing applications into any language.
Making information and data more accessible in citizen science applications | Automated reasoning and machine learning; natural language processing | Applications using machine learning and natural language processing to overcome information overload in citizen science platforms.
Providing an easy, engaging, and enjoyable citizen scientist experience with AI-based virtual assistance | Automated reasoning and machine learning | Virtual/simulated environments in which citizens interact with AI to test tasks before real-world deployment.
Notifying citizens about what is likely to occur near them, or what/when they could observe | Automated reasoning and machine learning | Mobile apps providing satellite-based information to citizen scientists (e.g., satellite-overpass maps). Applications that provide contextual information to citizens: what is measured, why, when, and where.
Adaptively managing and changing citizen science activities | Automated reasoning and machine learning | Trigger services prompting citizens to measure at certain times/frequencies (e.g., measuring at a satellite overpass, or triggering a measurement for a certain monitoring request). Environmental data can be used to change the frequency or timing of monitoring by citizens, for example when an AI detects that there will be no satellite coverage due to cloud cover and alerts citizens to provide more observations at that particular time and location. AI models that draw on information theory and statistics to help prioritise effort in field work.
Motivating citizen scientists to participate | Automated reasoning and machine learning | Applications providing personalised reward models that make tools appealing to users. AI that optimises reward models to reflect the personality of the individual. Applications introducing context, information requirements, and gamification aspects.
Providing personalised notifications to increase engagement | Automated reasoning and machine learning | Notifications about collecting or analysing data, provided when and where appropriate and with personalised frequency.
Applied use and impact: Improving insights

Improving data quality control | Automated reasoning and machine learning | Applications that quality-control data using cross-checks between citizen science and other in-situ methods, addressing issues in the data that cannot be addressed by internal quality control (e.g., combining citizen data with satellite data).
Validating outputs through automatic procedures | Automated reasoning and machine learning | Machine-learning algorithms trained to filter out irrelevant data.

Future Applications of AI in Citizen Science

In addition to more people integrating AI into a wider diversity of projects and improving existing methods, we foresee a wider array of AI technologies being applied to citizen science, which we explore in the section below. We have created two scenarios relating to the different potential impacts of AI on citizen science and, potentially, on society more broadly. The first scenario describes a future in which AI competence is inferior to human competence in relation to citizen science tasks. The second scenario describes a future in which AI competence equals or surpasses human capability in relation to citizen science ().

Scenario one: AI for engaging citizens

Imagine we have a project with a large dataset of images, and computer scientists apply computer vision to identify objects of interest in the images. Citizen scientists can be engaged to identify objects and to train the algorithms to improve their accuracy rates. Apart from improving automated image classification, AI proves a very effective tool for engaging and connecting people with science. AI benefits the amateur participants and creates a more inclusive, inspiring, and impactful scientific practice.

Scenario two: AI for engaging citizens and as basis for new applications

Imagine a scenario similar to the one outlined above, with one key difference: AI computer-vision techniques can identify objects in images with a competence equal or superior to that of humans. AI tools can instantly analyse and identify animals and plants in our environment, without the need for human-based methods of classification. In this case, AI is not only a tool to engage citizens; it also opens the possibility of creating new applications based on automatic nature classification.

Opportunity exploration

The positive impact of AI is clear from Scenario one, with AI proving an effective tool to engage and connect people with science. The positive impact related to Scenario two is potentially less clear, if the “human training AI” relationship is removed. However, imagine being in a forest and encountering a rare type of mushroom, wondering whether it would be advisable to pick it up and add it to your dinner plans, or whether this might lead to serious food poisoning. A tool for nature classification would come in handy. You could then point your phone at the mushroom, snap a photo, and the tool would instantly tell you everything there is to know about it, including whether cooking it is a wise choice. Some organisations are working on exactly this (), training their AI algorithms on the huge amounts of past data and observations collected by scientists and citizens worldwide. AI tools that can instantly classify species could be valuable in other ways. For instance, plant-recognition software and similar tools appear to be awakening botanical interest among much of the general population, sparking their curiosity about the natural world. Furthermore, computer algorithms trained to classify dried plants could help researchers to process herbarium samples, a task that often requires hours of human work ().

Deep learning can be combined with massive-scale citizen science to improve large-scale image classification. Sullivan et al. () showed how citizens and AI excel at different types of classifications and that citizen output can be used to augment and improve deep-learning models. These authors speculated that the integration of scientific tasks into established computer games will become a common approach to harnessing the brain processing power of humans. They concluded that intricate designs of citizen science games that feed directly into machine-learning models, through techniques such as reinforcement learning, have the power to rapidly leverage the output of large-scale science efforts. Other examples of citizen-annotated data with the potential to inform AI in the future are the projects administered on websites such as Zooniverse (https://www.zooniverse.org/) and DigiVol (http://digivol.org), where citizens transcribe and annotate museum collection information (). Apart from extensions of current use, new applications of AI in citizen science are likely to appear in the near future, as summarised above (Table 2). We believe that a wide array of AI applications have the potential to provide new opportunities and positive impact.

Risks exploration

The exploration of risks related to the use of AI in citizen science is driven, at least in part, by the recognition of an existential risk from artificial general intelligence (AGI) (; ; ): the hypothesis that substantial progress in AGI could someday, among other impacts, result in human extinction or some other unrecoverable global catastrophe. Even if this risk is small and the use of AI in citizen science is limited, the potentially significant negative consequences for humanity should be reason enough to highlight concerns about the possible impact of AGI ().

In relation to the use of AGI, Dennett highlights the importance of distinguishing between peripheral and central intellectual powers, and of not prematurely ceding authority to AI. “So far, there is a fairly sharp boundary between machines that enhance our ‘peripheral’ intellectual powers (of perception, algorithmic calculation, and memory) and machines that at least purport to replace our ‘central’ intellectual powers of comprehension (including imagination), planning, and decision-making” (2017; p. 402). Citizen science’s use of AI can contribute to the danger of overestimating AI tools, “prematurely ceding authority to them far beyond their competence.”

Ethical concerns commonly associated with robots and other artificially intelligent systems are typically divided into two groups: (1) the moral behaviour of humans as they design, construct, use, and treat artificially intelligent beings, and (2) the moral behaviour of artificial moral agents/machines (AMAs), or machine ethics (; ; ; ; ; ; ; ; ). In this paper we focus on the first group, given that the presence of AMAs in citizen science is currently very limited.

As the use of AI grows and humans increasingly rely on machines to complete tasks, it is important that the citizen science community gathers data on how AI is used and on the ethical considerations that arise. In contemplating this scenario, we give an overview of AI risks that are specific to citizen science (and sometimes broader) and that are important to consider into the future.

With respect to citizen engagement in citizen science, there is a risk that citizens will disengage if:

  • when contributing expertise to develop and train AI, they are not properly and fairly acknowledged, respected, and rewarded;
  • they think that new technologies could be driven more by short-term commercial necessity than longer-term social good;
  • they are not comfortable sharing their data because of concerns that their data might be unfairly appropriated (especially for commercial purposes);
  • they are forced (because of ethical considerations) to provide too-frequent re-confirmation of their willingness to share their data openly. (See GDPR () as an example of where good intention can sometimes become burdensome.)

Technology giants like Google and Facebook () are emerging as likely oligopolists in the new world of digital advertising (; ), monetising personal data by offering target-oriented advertising services (; ). Their competitive advantage is largely due to their exclusive access to the personal data used to train their algorithms (; ). Himel and Seamans noted that “Artificial intelligence (“AI”) relies on the use of large datasets to train AI algorithms. Access to such data is therefore a critical resource, the lack of which may create barriers to entry for both AI startups and established firms developing AI technologies” (). It is now recognised that existing regulatory frameworks for anti-competitive behaviour have neither adequately evaluated the risk of, nor intervened to prevent, data oligopoly, owing to a lack of recognition of the critical value of data (; ). This is a key lesson for citizen science: there is a risk that, as AI-based services arise in the field of citizen science, the same restrictive data policies used by technology giants could be used to create similar oligopolies.

It is possible that citizen science AI startups that lack a long-term funding model will adopt revenue models to monetise their “value-added” services, i.e., algorithmic intellectual property (; ; ). Where citizens indeed value such services, the market should be left to determine the viability of such revenue models. Citizens engage in citizen science and contribute data for a number of reasons, including public good, curiosity, fun, prestige, and the desire to name their own species (). When citizens contribute data for the public good, we recommend that an open-data policy be adopted by default, to mitigate the risk of creating new oligopolies in which citizens have no choice but to pay for services created from data they contributed. That is, in partnering with technology startups, it should be agreed up front that all data contributed by citizen scientists will be made openly available via Creative Commons licensing. We also recommend exploring whether fragmenting solutions hinders effectiveness in delivering the outcomes that users want. It is much easier to contribute expertise in the context of one large, well-connected system than through dozens of discrete systems, each with its own quirks.

One of the drawbacks of using some AI approaches, for example deep-learning techniques, is that they are opaque. Specifically, the limitation is the difficulty of explaining, in human terms, the results of large and complex models, such as why a certain decision was reached. The risk is treating AI as a final authority. For example, validation mechanisms could be established for the automatic verification, by AI, of the accuracy of data submissions. If this becomes the case, the lack of transparency in reasoning, coupled with our tendency to trust in technology, will inhibit critical debate about the decisions reached by AI. Among other constraints, regulators will need rules and choice criteria to be clearly explainable to meet transparency and accountability requirements. Some nascent approaches to increasing model transparency, including local interpretable model-agnostic explanations (LIME), which attempt to identify which parts of the input data a trained model relies on most to make predictions, may help to resolve this explanation challenge in many cases (; ). The general recommendation, at least in the short term, is to treat AI as a tool whose outputs ideally may be further validated or overturned by human experts. With respect to the human relationship with machines, recommendations should be provided about which processes and tasks should be carried out by humans and which by machines, as well as about how best to manage the replacement or augmentation of humans by machines.
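
As an indication of how such an approach works in practice, the sketch below uses the open-source lime package with a scikit-learn classifier; the iris data set merely stands in for tabular citizen science observation features, and the classifier choice is our own assumption.

    # pip install lime scikit-learn
    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from lime.lime_tabular import LimeTabularExplainer

    iris = load_iris()
    model = RandomForestClassifier(n_estimators=100, random_state=0)
    model.fit(iris.data, iris.target)

    explainer = LimeTabularExplainer(
        iris.data,
        feature_names=iris.feature_names,
        class_names=list(iris.target_names),
        discretize_continuous=True,
    )
    # Explain one prediction: which input features pushed the model
    # towards (or away from) its decision, and by how much.
    exp = explainer.explain_instance(iris.data[0], model.predict_proba,
                                     num_features=4)
    for feature, weight in exp.as_list():
        print(f"{feature}: {weight:+.3f}")

The printed feature weights are exactly the kind of human-readable justification that could accompany an AI verdict on a citizen’s submission, so that experts can contest it rather than defer to it.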

Even if open-source machine-learning toolsets are becoming increasingly available for all to use, an issue with the ethics policies of technology giants such as Google, Microsoft, Amazon, Facebook, IBM, Apple, Baidu, Alibaba, and Tencent (Google’s DeepMind, for instance) is that we hardly know what their ethics panels are all about (); they are not transparent to public observers. Publicly accountable ethics panels should supervise the processes by which AI augments the way people think or takes over certain cognitive tasks. Also, in a data economy where AI algorithms often use personal data as training sets, the ability of AI algorithms to spot patterns makes them very effective at re-identifying personal data in “anonymised” data sets, causing significant concerns about individual and group privacy. The risks related to the AI industry are not limited to ethics; a separate risk exists of the AI industry dictating the general direction of citizen science.

Finally, there is an emerging issue of gender and racial bias in AI. Leavy () highlighted the over-representation of white men in the design of technologies. Machines also largely reflect the values of their creators, which can be deeply embedded in machine algorithms. For example, facial recognition software works best for those who are white and male (). These gender and racial biases can be reflected in naming, ordering, and descriptions. The risk is that technologies developed for use by citizen scientists (applications and platforms, for example) may alienate users if not tailored to their needs. In addition, there is the risk of embedding western views of science and taxonomy into AI, which may preclude ways of grouping organisms according to indigenous knowledge frameworks or alternative cultures. Citizen science presents a special opportunity to engage a wider cohort in training algorithms, which would help to avoid extending to algorithms the existing biases that are entrenching gender and racial discrimination in modern society.

Discussion and Recommendations

Writing about the opportunities and risks of AI in citizen science is difficult. Citizen science is not settled science, despite the growing body of research. AI is not settled science either; it inherently belongs to the frontier, not to the textbook, and therefore referencing AI literature, in particular in relation to the human social context, has clear limits. In this paper, we did not write about the AI field in general, but confined ourselves to the field of its application to citizen science, where we can knowingly or unknowingly encounter AI. At times the very terminology can be alienating, and terms such as “AI” should be carefully chosen and well defined. The expression “machine learning” can often be a useful alternative. For example, machine learning applied to computer vision, which is the most common AI technology in citizen science:

  • is used by biodiversity projects to verify the accuracy/consistency of contributors’ submissions (coming, for example, from iNaturalist, which has created one of the world’s largest networks of citizen scientists, who have collected over 25 million records of rare and common species around the world);
  • supports citizen science monitoring projects in early warning of harmful algal blooms; and
  • identifies the taxon of an organism in a photo so that it can be monitored more easily.

Even in the reduced domain of citizen science, rapid advances in AI and the development of improved sensing systems offer the chance to introduce something dramatically new. Many people now engage with citizen science apps on their smartphones daily. As the list of applications grows, so too does awareness of AI in our lives. As a result, technologists pushing for the next big thing in automation now face more questions about what the public really wants. The small group of companies that are investing billions of dollars in machine learning find themselves having to address the question of how to deal with the public’s perception of AI ().

A big part of citizen science is about connecting people to science, nature, and discovery, and about empowering human minds, mainly through education. Many established citizen science programmes see AI as having a role in this, and some of the biggest names in technology are now entering the citizen science sector through these programmes. Advocates of AI say that technology can make people’s lives easier by filtering out hard/repetitive/mundane tasks, so that volunteer efforts can focus on more engaging tasks.

Let us consider projects where AI can streamline the identification of user observations, thus increasing the total number of records being identified. On the one hand, we can see risks associated with the unnecessary use of AI. While AI may provide identification help for projects where citizen scientists contribute data, and may increase the number of validated user recordings, there are other ways, apart from AI, to increase the expertise of a citizen science system. These include increasing expertise amongst users, improving the connectivity between experts, and providing more incentive for experts to participate. Moreover, it is not clear that increasing validated user recordings through AI helps in progressing citizen science or in connecting more people to nature. Connecting and incentivising more human expertise, instead, is likely to progress citizen science and connect more people to nature. According to this vision, AI does not necessarily improve users’ overall experience (e.g., their general interest, knowledge, or ability to recognise the same organism next time).

On the other hand, we can see opportunities associated with the ability to tackle global-scale challenges. There is little prospect of experts and new citizen scientists by themselves delivering the volumes of data that we need to monitor and understand earth systems, including biodiversity. We need this information for conservation, food security, and many other aspects, for example those related to the Sustainable Development Goals. We should be evaluating the risks associated with the introduction of AI, but we also should consider the risk of ignoring the tools we have to deliver much more data, in a much more usable form, much more quickly.

Since the turn of the millennium, a brute-force approach has been applied to the technology of machine learning, in which huge volumes of data are analysed to look for patterns (). Thanks to increasing citizen engagement and technological improvement, larger repositories of citizen-collected data are now available. As highlighted earlier, larger data repositories available for training AI are a potential risk. To address this, we recommend following the practices below whenever using people’s data for AI training:

  • An ethics framework about AI use should be created and applied (e.g., ; ; ).
  • A data stewardship plan (e.g., ) should inform citizens about plans for and expected outcomes of using data for AI training.
  • Good anonymity practices should be adopted. It is important to evaluate to what extent the patterns of information captured may reveal personal information even if names or personal details are not retained. For example, all of the observations in certain areas may derive from a single individual. If any information about their movements is incorporated into the AI training, there is a risk (albeit very small) of revealing personal information about that individual (a minimal sketch of such a check follows this list). Anonymity management should be part of the documentation/information provided beforehand to citizens.
  • Citizens should be given a standard opt-in/opt-out option (opt-in being best practice).
  • Designers should be diverse in ethnicity, gender, and disciplines. This addresses issues such as “data bias”.
  • Measures of success should be clear. Saying that AI is “successful” in engaging citizens is not enough. Measurements should exist to determine whether citizen science is helping people to engage with nature.
  • It should be possible to delete one’s data from an AI system (untrain the system).
  • It should be possible to challenge the AI. For example, if the number one expert in nudibranchs finds that an AI incorrectly identifies the image of a nudibranch on their phone, who do they call? Who do they talk to? Is there a phone number? A feedback link? How is that handled?
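
Returning to the anonymity practice above, the following is a minimal sketch (Python with pandas; the column names and grid size are illustrative assumptions) that flags map grid cells in which every observation derives from a single contributor, the situation in which AI training data could reveal an individual’s movements.

    import pandas as pd

    # Hypothetical observation records: contributor ID plus coordinates.
    obs = pd.DataFrame({
        "user_id": ["u1", "u1", "u2", "u3", "u1"],
        "lat":     [47.10, 47.11, 47.10, 47.50, 47.51],
        "lon":     [8.20,  8.21,  8.20,  8.90,  8.91],
    })

    # Bin observations into roughly 0.1-degree grid cells.
    obs["cell"] = list(zip((obs.lat * 10).round().astype(int),
                           (obs.lon * 10).round().astype(int)))

    # Count distinct contributors per cell; cells with a single
    # contributor carry a re-identification risk and may need
    # aggregation or exclusion before AI training.
    contributors = obs.groupby("cell")["user_id"].nunique()
    risky = contributors[contributors == 1]
    print("Cells dominated by one contributor:", list(risky.index))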

Conclusion

Most people today are only somewhat aware of the rise of AI and its potential impact on their lives. In this paper we discuss this impact in relation to the use of AI in citizen science. It is true that, for all their potential, AI technologies still have many limitations. Current AI limitations include not just issues related to data requirements, but also: (1) regulatory obstacles; (2) lack of social and user acceptance; (3) the challenge of labelling training data (which often must be done manually by citizens and is necessary for supervised learning); (4) the difficulty of obtaining data sets that are sufficiently large and comprehensive to be used for training; (5) the difficulty of explaining in human terms the results from large and complex models (Why was a certain decision reached?); (6) the generalisability of learning (AI models continue to have difficulties in carrying their experiences from one set of circumstances to another); and (7) the risk of bias in data and algorithms (). Societal concern and regulation, for example about safety, privacy, and use of personal data, can constrain AI use in the public and social sectors if these issues are not properly addressed.

At the same time, the scale of the potential economic and societal impact of AI creates an incentive for all the participants (AI innovators, AI-using organisations, citizens, scientists, and policy-makers) to ensure an AI environment that is friendly and can effectively and safely achieve economic and societal benefits. The potential value that could be harnessed provides the incentive for technology developers, companies, policy makers, and users to try to tackle current AI issues ().

At present, the impact of AI on citizen science is limited, but it is indubitable that technological developments will gather momentum in the next few decades. We anticipate that the result will be all the applications of AI described in this paper and many more. If citizen science is to continue to make meaningful contributions to society and science in the near future, it will not only need to make sense of AI, it also will need to incorporate AI in a meaningful and considered way in future projects.

There is no question that AI potentially introduces significant risks for society and democracy, and ethical considerations regarding how we might retain some control in “central” intellectual powers should be carefully considered by policymakers and legislators.

However, at the same time, we are facing tremendous global-scale challenges across areas of human and planetary health. This means we have a moral obligation to make benign use of AI and every other appropriate and sustainable technology at our disposal to accelerate the collection of the data needed to understand our environment, and to use this greater understanding to push for evidence-based decision making that puts appropriate mitigation and safeguards in place. Therefore, the authors urge the citizen science community to implement AI, but in a careful way (i.e., only to enhance our “peripheral” intellectual powers). Carefully used, AI is an important tool for accelerating citizen science and, ultimately, for massively scaling scientific research.