Dynamics of Engagement in Citizen Science: Results from the “Yes, I do!”-Project

Citizen science projects need to attract citizens and motivate them to dedicate their energy and time to science. Recruiting enough participants and keeping them engaged throughout the project is often a big challenge for the scientists involved. In this paper, recruiting and engagement strategies are evaluated for a successful midsize online citizen science project in the field of humanities. Quantitative measures are applied to track the quantity and quality of citizens’ contributions over time, which allows us to understand the dynamics of engagement in citizen science. The study shows that monitoring the level of activity and the quality of contributions provides useful insights about a project’s dynamics. We found that a small core group of volunteers was responsible for most of the input to the project, that their transcriptions were very accurate from the start, and that prompt feedback on their performance was important to keep activity levels high.


Introduction
Interest in and use of online citizen science projects have increased in the last ten years (Preece 2016; Wald, Longo, and Dobell 2016; Wiggins and Crowston 2015), and so has the number of published case studies reporting on and comparing specific online citizen science projects (Aristeidou, Scanlon, and Sharples 2017; Curtis 2018; Masters et al. 2016). Prior studies on citizen science have observed that large numbers of people make only small contributions to a citizen science project, while a relatively small group of volunteers contributes the largest part (Sauermann and Franzoni 2015). The sustained engagement of volunteers throughout a project is important for online citizen science to benefit from the time and human resource efficiency (Franzoni and Sauermann 2014; Shirk et al. 2012) and quality outcomes (Nov, Arazy, and Anderson 2014) that result from participants learning through repeated contributions.
Despite the uneven distribution of contributions and the importance of sustained engagement, little has been said about how projects should evaluate the effectiveness of their recruitment and engagement strategies in achieving project objectives (Curtis 2018; West and Pateman 2016; Wiggins 2013), including both the quantity and quality of data. Because of their voluntary character, lack of formal contracts, and absence of monetary rewards, recruiting and engaging citizens throughout the course of a citizen science project are the biggest challenges that these projects have to face (Frensley et al. 2017; West and Pateman 2016). Moreover, the wide variety of citizen science projects, with different project designs varying in their goals (educational, scientific, conservation, preservation), tasks (data collection, processing, analysis, and validation), duration, and governance (Blohm et al. 2018), makes it challenging to unravel how citizens can be converted into committed volunteers. Though recommendations have been published on how to best achieve sustained participation in citizen science (West and Pateman 2016), to our knowledge the link between recruiting and engaging strategies and the resulting project outcomes (i.e., number and quality of submitted contributions) has hardly been studied on the basis of empirical (meta)data.
Unlike prior work that has focused on the motivations of citizens to become and remain engaged (Eveleigh et al. 2014), the objective of this paper is to examine the dynamics of engagement and to assess the effectiveness of engagement strategies adopted by a successful midsize citizen science project. We study a collaborative (Bonney et al. 2009) citizen science project in the field of humanities, where citizens volunteered to transcribe handwritten Dutch premarital registers from the 17th, 18th, and 19th centuries. The project is considered successful because it achieved its objective of transcribing 90,000 premarriage acts in two years.
We contribute to the understanding of sustained engagement and its dynamics in midsize citizen science projects in three ways. First, most published research about online citizen science has focused on environmental, biomedical, and technology-related topics (Frensley et al. 2017), while less attention has been paid to projects in other disciplines such as the humanities (Dobreva 2016). Our study covers a humanities project and shows the possibilities for sustained engagement. Second, we present quantitative measures to track citizens' activity and the quality of their contributions throughout a project. In particular we propose using the optimal string alignment (OSA) distance to measure the quality of citizen transcriptions. Finally, we evaluate the effectiveness of recruitment and engagement strategies in achieving a project's goals.

Sustained Engagement of Volunteers in Citizen Science
Citizen science literature focused on motivation has mostly discussed the motivations of people to join and to stay engaged in a project for an extended period of time (Eveleigh et al. 2014; Raddick et al. 2010; Rotman et al. 2014). Only recently have researchers begun to study the impact of different strategies to recruit participants and to influence their sustained engagement throughout a project (West and Pateman 2016). The voluntary character of citizen science, the effect of personal circumstances (West and Pateman 2016), a possible lack of time (Frensley et al. 2017), and the existence of other initiatives competing for volunteers' attention (Simula 2013) seem to be the main hurdles to their sustained participation. Therefore, project organizers use various means to avoid a high loss of volunteers whose time and effort are needed to keep a project moving. To evaluate the effectiveness of various engagement strategies in an online citizen science project, we rely mostly on related research in the fields of information and computer science, without excluding learnings from relevant cases in the natural sciences. In the following pages, we define the concept of engagement and review the strategies most commonly used in citizen science and the measures of engagement proposed in the literature.

Engagement strategies in citizen science
Engagement is a psychological term often used in human resource management (Schaufeli 2014) and in the study of online services and their users (Lehmann et al. 2012). Overall, engagement refers to a positive affective-cognitive feeling that is observable in people's perseverance in pursuing an activity that requires time, effort, and/or concentration (Lehmann et al. 2012; Schaufeli 2014; Simpson 2009).
The citizen science literature has reported on the recruiting and engagement strategies of specific projects (Causer and Wallace 2012) and connected these strategies to fields such as volunteering (West and Pateman 2016) and marketing. It is, for instance, recommended to recruit participants throughout a project to fill in the gap left by people leaving and to increase the size of the group to make up for the uneven distribution of effort. Therefore, recruiting is seen as a recurring activity through which a project is advertised via various means to reach the greatest number of potential participants (West and Pateman 2016). To attract large numbers of people, publicity can be sought through press releases and with broad online presence (Causer and Wallace 2012), but also by joining well-established online citizen science platforms (Sauermann and Franzoni 2015). However, not all projects can afford the costs of continuously promoting the project and recruiting new people.
To help interested participants become engaged volunteers, the citizen science literature points to prompt feedback and regular support as necessary factors for engaging participants in long-term collaborations (Causer and Wallace 2012; Cooper et al. 2007). Feedback, rewards, and recognition foster engagement because they indicate the worth of one's efforts (Crawford et al. 2014). Similarly, the use of forums and other means for volunteers to communicate with each other is also a strategy used to increase engagement (Causer and Wallace 2012; West and Pateman 2016) because it allows for reciprocal feedback and recognition. The level of task challenge, determined by the type of task and its fit with volunteers' abilities (Crawford et al. 2014), also can be influenced by the project organizers, through the design of tasks and their level of granularity (Nov et al. 2014), and by providing opportunities for learning, such as guidelines and training (Causer and Wallace 2012; West and Pateman 2016). Adequate tasks and skills acquired through learning influence volunteers' sense of competence and feeling of being cared for, which positively affect their sustained engagement (Crawford et al. 2014; Newton, Becker, and Bell 2014).

Measuring engagement in citizen science
Recently, a few researchers have proposed quantitative measures to compare projects and to create volunteer profiles based on their levels of engagement (Table 1). Most of these measures are based on the time dimension of engagement, including the time that volunteers spend in a project, the number of hours they dedicate to a task, and the periods of inactivity (Aristeidou et al. 2017). The effort dimension of engagement also has been measured, for instance with the median of the number of submitted tasks (Cox et al. 2015) or Gini coefficients and Lorenz curves that indicate the distribution of effort among the participants of a project (Sauermann and Franzoni 2015). Finally, the extent to which volunteers are concentrated on a project task also has been considered in relation to the time (hours) devoted to a task on an active day.
Next to engagement, the assessment of citizen science projects usually covers the calculation of essential indicators such as project appeal (Table 1), measured by the number of participants who have contributed throughout a project, and the quantification of engagement strategies such as communication, determined by the number of communicative events during the project (Cox et al. 2015). However, all of these measures insufficiently capture the dynamics of engagement throughout a project. By using these measures, we cannot evaluate the effect of specific strategies on changing levels of engagement within the project.
Note to Table 1: Project active period is project duration excluding (if any) periods of project inactivity.
In the following pages we introduce the research context, the citizen science project under study, the data collected, and the measures used to examine the project dynamics and to assess the adopted strategies. We then present the actual strategies used in the project, the engagement dynamics resulting from the data analysis, the conclusions of our study, and recommendations for future research.

Research context
The Amsterdam premarriage acts are the topic of research from which the "Yes, I do!"-project (Dutch: Ja, ik wil!) emerged. These acts stem from the compulsory registration (from the Council of Trent [1563] onwards) of intended marriages at least three Sundays before the actual marriage. These registrations began in parallel with strengthened parental authority, increasing church and state interference in marriages, and control of marriage procedures (De Moor and van Zanden 2010). From the late 1570s, couples intending to marry in Amsterdam had to register the banns with the Commissarissen van Huwelijkse Zaken, aldermen specifically appointed for dealing with all issues of marriage and divorce, after which the marriage would be announced three times, hence giving opponents to the marriage the opportunity to speak out. Such registration marked the start of the process for a couple to change their relationship into an "official" or "regular" marriage.
These registers are available for the period 1578-1811 without interruption. The registration of each intended marriage contains information about the ages of both husband and wife (in the case of first marriages), marital status, place of origin, place of residence (in Amsterdam), the occupation of the husband (only in the period 1600-1715), the religion of both partners (only in the period 1755-1811), the names of their witnesses, and whether parents were still alive and agreed with the marriage (in the case of first marriages). The information density, completeness, and length of the period covered make these registers a unique source to study marriage behavior in a complete city.
In February 2014, the "Yes, I do!"-project was launched on velehanden.nl, a crowdsourcing platform for the heritage sector that relies on online volunteers to enter and process data. The Vele Handen-platform started in 2011 following a successful first project of the Amsterdam City Archives and Picturae, a company specialized in the digitization of audiovisual materials. Since that time about 100 projects have been initiated through this platform, which currently includes almost 16,000 registered volunteers. The crowdsourced tasks in this platform range from the identification and classification of historical objects (such as photographs) to the transcription of archival documents, such as in the Yes, I do!-project.
The objective of the "Yes, I do!"-project was to digitally transcribe data from more than 90,000 premarriage acts in two years. The project resulted in a dataset covering detailed information about 20% of all Amsterdam premarriage acts (consisting of a total of approximately 900,000 individuals -thus 450,000 couples -who intended to get married in the period 1578-1811). This huge amount of data on early modern marriage practices and many related issues is now available for further research.
Given that reading and transcribing historical sources is difficult without instruction and training, in particular handwriting as early as the sixteenth and seventeenth century, a high dropout rate and a low transcription accuracy were expected. To limit the dropout rate and to ensure a good level of accuracy, the project team (led by one of the authors of this article) adopted several recruiting and engagement strategies, including project promotion through contacts and websites, outreach activities such as lectures, training workshops, feedback, and rewards, which we examine in this paper.
The "Yes, I do!"-project registered 521 interested participants in the course of two years, of whom 182 people submitted at least one transcribed scan (a scan contained 5.7 acts on average, resulting in 192,307 entries). Among those active volunteers we distinguish a "core" group of 21 people who entered more than 1,000 acts (91% of the acts in total), a larger group of active volunteers whom we call "outer core" (100-1,000 acts), a group at the "periphery" which contributed less often (10-100 acts), and a "rest" group that submitted fewer than ten acts (Table 2).
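The grouping above can be expressed as a simple threshold rule. The following Python sketch is only an illustration: the function name is hypothetical, and the handling of the exact boundary values (10, 100, and 1,000 acts) is an assumption, because the ranges given in the text do not state which group those values belong to.

```python
def classify_volunteer(acts_entered: int) -> str:
    """Assign an engagement group from a volunteer's total number of entered acts."""
    # Assumption: exactly 1,000 acts counts as "outer core" ("more than 1,000"
    # is required for the core), and the lower bound of each range is inclusive.
    if acts_entered > 1000:
        return "core"
    if acts_entered >= 100:
        return "outer core"
    if acts_entered >= 10:
        return "periphery"
    return "rest"


if __name__ == "__main__":
    # Hypothetical act counts, not project data.
    for count in (2500, 400, 35, 3):
        print(count, classify_volunteer(count))
```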

Data collection
The whole participation procedure -from logging in all the way to the request for a reward -was registered via the metadata collected throughout the project. These data allow us to evaluate the dynamics of volunteers' engagement and the evolution of quality over time. The raw data contain three transcriptions for each act: two by the volunteers, and one checked or corrected by the appointed controllers (Figure 1). The transcription itself is split into a number of fields, such as groom age, bride age, and groom address. Each entry can be linked to a scan identifier and a volunteer or project member identifier; in combination with the timestamps on the entries, these data allow us to chart activity. Because we can trace data entry all the way to the (anonymized) individual volunteers, we can examine the dynamics of engagement by tracking the amount and quality of entries throughout the project.

Data analysis
The analysis of data includes the quantitative measurement of two essential aspects of citizen science discussed above: 1) sustained engagement of volunteers, and 2) the quality of the contributions. We also identify the engagement strategies and qualitatively assess their effect on engagement.

Sustained engagement
The measures summarized in Table 1 provide a good view of the level of volunteer engagement for a project overall and allow comparisons among projects (see measures for the "Yes, I do!"-project in supplementary file). However, such measures do not allow tracking the engagement of volunteers throughout a project nor assessment of the effect of specific strategies on engagement within the project. Project appeal, measured as the average number of people who have contributed throughout a project (Cox et al. 2015), cannot be used to assess recruiting strategies during a project. Therefore, we propose to complement project appeal with two additional measures. The first is the ratio of new volunteers (RN), that is, the percentage of new volunteers in relation to the total number of volunteers registered in the same period (day, week, month) of the project. The second, in the case of projects running on multi-project platforms, is the ratio of platform members (RP), that is, the percentage of project volunteers who already were members of the platform before the start of the project. Such measures (Table 3) can be used to track participation over time and to assess whether fluctuations are related to recruiting and outreach strategies.
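To make the two ratios concrete, the sketch below computes RN per period and RP for a set of volunteers. It is a minimal illustration under stated assumptions, not the computation used in the project: the function names and the input format (pairs of period label and volunteer identifier) are hypothetical, a volunteer is treated as "new" in the first period in which he or she submits an entry, and the denominator is the number of volunteers active in that period.

```python
from collections import defaultdict


def ratio_new_volunteers(entries):
    """Return {period: RN in percent}.

    entries: iterable of (period, volunteer_id) pairs, where period labels
    sort chronologically (for example "2014-02").
    """
    active = defaultdict(set)
    for period, volunteer_id in entries:
        active[period].add(volunteer_id)

    seen = set()  # volunteers active in any earlier period
    rn = {}
    for period in sorted(active):
        new = active[period] - seen
        rn[period] = 100.0 * len(new) / len(active[period])
        seen |= active[period]
    return rn


def ratio_platform_members(active_ids, prior_member_ids):
    """Return RP in percent: share of active volunteers who were platform members beforehand."""
    active = set(active_ids)
    if not active:
        raise ValueError("no active volunteers in this period")
    return 100.0 * len(active & set(prior_member_ids)) / len(active)


if __name__ == "__main__":
    # Hypothetical identifiers, not project data.
    log = [("2014-02", "a"), ("2014-02", "b"), ("2014-03", "a"), ("2014-03", "c")]
    print(ratio_new_volunteers(log))
    print(ratio_platform_members({"a", "b", "c", "d"}, {"a", "x"}))
```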
Other measures, such as activity ratio (A), variation in periodicity (V), the median number of contributions per volunteer (PC), and the median time of active participation (SE) (Aristeidou et al. 2017; Cox et al. 2015) indicate overall project activity and are useful to compare engagement across projects. However, it is important to consider the fluctuation of engagement throughout a project. Activity differs among participants and changes over time, including periods with more or fewer transcriptions or even temporary inactivity. Therefore, we believe that tracking the actual level of activity (LA) over time is a straightforward way to evaluate engagement strategies. Similarly, the distribution of effort measured with the Gini coefficient (Cox et al. 2015; Sauermann and Franzoni 2015) is an important indicator to understand engagement but cannot be linked to the strategies applied by the project team.
Engagement understood as the extent to which someone is involved in a task can be measured by the time devoted (D) to a task on a daily basis. This measure also can be tracked over time if a project collects this information, but that was not the case in the "Yes, I do!"-project. An alternative way to track engagement in terms of effort and concentration is the share of contributions made over time across different types of tasks (TI), as well as the growth in accuracy (AG). Together these measures indicate the dedication to the project and the learning effect.
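As an illustration of the effort-distribution indicator mentioned above, the Gini coefficient can be computed from the number of contributions per volunteer. The sketch below uses a standard rank-based formula for sorted values; it is not taken from the cited studies, and the function name and input format are hypothetical.

```python
def gini(contributions):
    """Gini coefficient of per-volunteer contribution counts.

    0 means every volunteer contributed equally; values close to 1 mean
    that a few volunteers account for almost all contributions.
    """
    values = sorted(contributions)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one volunteer and one contribution")
    # Rank-weighted sum over the ascending values (ranks start at 1).
    weighted = sum(rank * value for rank, value in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


if __name__ == "__main__":
    # Hypothetical counts, not project data.
    print(gini([5, 5, 5, 5]))
    print(gini([0, 0, 0, 10]))
```

For n volunteers the largest value this formula can take is (n - 1)/n, reached when a single volunteer makes all contributions.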
Finally, given the importance of feedback as an engagement strategy in citizen science (Causer and Wallace 2012; Eveleigh et al. 2014; Wal et al. 2016), we measure the extent to which feedback is provided promptly by tracking the number of days between data entry (in our case by the second volunteer) and the moment that both entries were compared and verified, validated, or in our case corrected (FD).

Table 3 (partially recovered). Data items, with the measures that use them in parentheses:
• Total number of active volunteers in every period of the project (day or week or month) (RN).
• Number of active volunteers that join at each period (day or week or month) of the project (RP).
• Number of volunteers who were already members of the platform before the project (RP).
• Number of submitted contributions in each period (day or week or month) of the project (LA, TI).
• Number of insertions, deletions, substitutions, and/or transpositions made in the data of each submission (AG).
• Date of each submitted data contribution (LA, TI, FD).
• Date of each submitted data correction (verification or validation) (FD).
Measure descriptions recovered from the table:
RP, ratio of platform members: Percentage of volunteers who were members of the platform before the project in relation to the total number of volunteers of the project in the same period (day, week, month) during the project.
AG, accuracy growth: Growth in accuracy estimated using regression (see next section "Quality"; Figure 9).
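The feedback delay (FD) can be derived directly from the two dates listed for it: the date of the data contribution and the date of its correction. The sketch below is a minimal illustration; the function name, the record format, and the choice to summarize delays as a monthly median are assumptions and do not describe the project's own processing.

```python
from datetime import date
from statistics import median


def median_feedback_delay_by_month(records):
    """Return {"YYYY-MM": median delay in days}, keyed by the month of data entry.

    records: iterable of (entered_on, corrected_on) pairs of datetime.date.
    """
    delays = {}
    for entered_on, corrected_on in records:
        month = entered_on.strftime("%Y-%m")
        delays.setdefault(month, []).append((corrected_on - entered_on).days)
    return {month: median(days) for month, days in sorted(delays.items())}


if __name__ == "__main__":
    # Hypothetical dates, not project data.
    sample = [
        (date(2014, 6, 1), date(2014, 6, 4)),
        (date(2014, 6, 10), date(2014, 6, 15)),
        (date(2014, 7, 1), date(2014, 7, 31)),
    ]
    print(median_feedback_delay_by_month(sample))
```

Plotting such a series next to the number of entries per period is one way to inspect whether longer delays coincide with lower activity.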

Quality
Engagement in citizen science, including sustained participation and an increasing number of completed tasks, also affects the development of skills that, in turn, contribute to data quality (Cooper et al. 2007), which is essential for the reliability of scientific research. Citizen science projects ensure the quality of citizen contributions, for instance, by comparing them with prior literature or scientific standards (Riesch and Potter 2014), and in the case of transcriptions, having peers or experts review and correct them (Brumfield 2012). However, to our knowledge no method has been proposed to measure the quality of citizens' transcriptions against such peers or experts and to track quality levels over time.
In the transcription of manuscripts, quality is expressed in terms of accuracy, understood as the extent to which a transcription matches the original manuscript. In our case, we are able to measure the accuracy of contributions because a team of experienced controllers provided the final corrected entries -a "ground truth" -which have been used as a benchmark value. It would be undesirable, however, to categorize entries as entirely right or wrong, because some entries contain far more mistakes than others. We therefore use a string distance to measure the agreement between the volunteers' entries and the final corrected entries. Specifically, we use the optimal string alignment (OSA) distance, which counts the number of insertions, deletions, substitutions, and transpositions needed to go from one string to the next. We use a normalized version of the OSA distance, which divides the distance by the maximum possible distance to correct for the number of mistakes that the volunteers could make in a transcription (van der Loo 2014). These normalized distances are expressed as a similarity, where a score of one is a perfect match and zero is the maximum number of possible mistakes.
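The similarity score described above can be sketched in a few lines of Python. This is an illustrative implementation of the OSA distance (the restricted variant of the Damerau-Levenshtein distance), not the code used in the study. It assumes that the maximum possible distance between two strings equals the length of the longer one; the function names are hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions, and transpositions of adjacent characters."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all characters of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all characters of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def osa_similarity(a: str, b: str) -> float:
    """Normalized similarity: 1.0 is a perfect match, 0.0 the maximum distance."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - osa_distance(a, b) / longest


if __name__ == "__main__":
    # Hypothetical strings, not entries from the registers.
    print(osa_similarity("abcd", "abdc"))
```

Because the score is scaled by string length, a single slip in a long address lowers the similarity less than the same slip in a short one.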
We compare the entries on the transcription of the groom's address. Though the names of the bride and groom would have been an ideal test, these were already provided in the index to the premarital registers, so the volunteers did not transcribe them. We therefore chose the groom's address because it is present in 97 percent of the cases and there was limited scope for guessing based on a few letters, as would for instance be the case with the place of birth. The accuracy measure is illustrated in Table 4. Groom address 1 and 2 are the volunteers' submissions, and groom address final refers to the corrected entry. The "sim" columns provide the OSA string similarity score.

Recruiting and outreach
Recruiting efforts for the "Yes, I do!"-project took place mostly before the start and during the first few months of the project. The project team set out a long-term communication plan to identify target groups (e.g., genealogists or visitors of archives), contact moments, and several communication instruments. For instance, before the start of the project, the City Archives of Amsterdam and the Central Bureau for Genealogy of the Netherlands were contacted, amongst others, to promote the project among their members. Announcements of the upcoming project were included in several amateur historian journals, and people potentially interested in the topic, such as visitors of archives, were offered attractive postcards with witty texts and a short explanation of the project goals to attract them to participate. The project also benefitted from being part of the velehanden.nl crowdsourcing platform and hence being announced on its website and newsletters. This resulted in about 47 percent of the participants (for which it could be determined) already being members of the platform before the start of the "Yes, I do!"-project.
The recruiting strategy was successful in attracting volunteers at the start of the project, as half of the participants (93) joined in the first four months. Throughout the project a total of 521 people registered to participate. Few new volunteers were recruited from mid-2014 onwards, with the exception of a spike in February 2015, when half of the active volunteers were new volunteers. The paleography workshop held during that month is unlikely to be behind this increase, as it attracted 12 participants and was aimed at volunteers who already were active in the project. A more likely explanation according to the project leader was the mention of the project in the popular Dutch genealogy magazine Gen (Van Weeren and Boele 2016) and in science magazines (amongst others EOS Wetenschap [Van der Kraan 2016]) and on the citizen science website iedereenwetenschapper.nl.
A total of 182 active volunteers transcribed one or more scans (Table 2); the number of active volunteers, however, varied over the course of the project. After the initial enthusiasm of the first month (February 2014), the number of active volunteers declined from 57 in February to 36 by May 2014 (Figure 2).
The project experienced a rapid growth in data entry right from the start (Figures 3 and 4). However, the number of entries dropped significantly after five months, from an average of more than 2,000 acts per week in April-June to an average of less than 1,000 acts per week in September 2014. The drop in total activity between June and September 2014 (Figure 3) coincided with a drop in the number of volunteers (Figure 2). The level of activity recovered after September 2014 and stayed more or less stable at about 2,000 acts per week until the end of the project. The high number of transcriptions throughout 2015 was entered by a relatively stable number of volunteers (Figure 2) of about 30 people on average from March 2015 onwards. In other words, while the number of entries remained high after the second half of 2014 (Figure 3), the number of active volunteers did not recover its initial burst (Figure 2). This means that the high number of entries was maintained by a small group of volunteers (Table 2). This is also confirmed by the plots on user activity (Figure 5), which show the skewness of the distribution in number of entries among participants. This means that the increased activity per volunteer was key in understanding overall activity levels in the second half of the project.
A number of outreach activities also were undertaken to keep people interested by satisfying volunteers' appetite for knowledge about the historical period and related topics (Table 5). These activities mostly involved lectures given by well-known Dutch historians. They were announced via the project's website (www.collective-action.info/Ja-ik-wil) and the velehanden.nl platform's newsletter. Though these lectures were free and open to anyone, project volunteers had priority due to the limited space. The introductory lecture by De Moor (March 2014) was followed by an increase in the number of entered transcriptions (Figure 3) despite the decline in active volunteers (Figure 2). The lecture by Prak (October 2014) also led to an increase in activity after the event. However, in general we do not observe any systematic relation between the outreach events and the activity level of volunteers measured by the number of entries. Moreover, we do not know what would have happened without these events, and whether the project might have fizzled out. Answering this question would require comparing the activity of those who attended and those who did not attend the events, or alternatively comparing this project and a project without outreach events. Despite the initial recruiting success, the number of active participants was limited. Therefore, key to understanding the success of the "Yes, I do!"-project is to examine how the project organizers managed to keep participants engaged.

Sustained engagement
To avoid the risk of losing momentum after the initial burst of enthusiasm and to keep people engaged, the project organizers took the following actions. First, from the beginning of the project citizen participants could choose their preferred historical period of data entry, allowing them to adapt the difficulty of the task to their self-perceived level of paleography skills. Second, to enhance the transcription skills of the volunteers, paleography workshops were organized throughout the project. Third, through an online blog, advice was given to volunteers in case of particular difficulties in reading the acts. Fourth, the data entry and control procedure fulfilled the double role of providing feedback and rewarding participants.
Each time a scan was entered the participant received three points. Each time a scan was checked, the volunteer who had originally entered the data received another three points, regardless of the quality of the original entry. Controllers also received three points per controlled scan. The number of points of each participant was visible on the project's environment on the velehanden-platform. These points could be exchanged for books at the project's bookstore. The sections that follow show the effect of the described actions on sustained engagement in terms of activity and quality dynamics throughout the project.

Task-competence fit
The tasks in the "Yes, I do!"-project involved reading and digitally transcribing information from the handwritten premarriage acts. These tasks differed in their level of difficulty. For instance, early 17th-century handwriting is substantially harder to read than that from the late 18th century and is also more likely to contain unfamiliar words. To ensure a good fit between the difficulty of the task and the participants' self-perceived competence, citizen participants could select their preferred historical period of data entry (Figure 6).
Volunteers began with the easiest transcription tasks, which included the scans of the period 1751-1811 (Figure 7, bottom right chart). As the project progressed, and volunteers were trained to tackle the more difficult 17th-century handwriting, the share of transcribed acts from that period (Figure 7, upper charts) increased and became the most popular task by mid-2014. Because of the popularity of these harder-to-transcribe acts, in the early months of 2015 the share of transcribed 18th-century acts slowed down considerably. As a result of a high task interest (TI) in 17th-century acts, these tasks were actually completed earlier and a substantial number of acts from the second half of the 18th century (easier to transcribe) had to be completed at the end of the project.

Training
Given the difficulty of the tasks and the need to achieve a high transcription accuracy, the project organizers offered paleography workshops, for both basic and advanced levels (Table 6). The workshops took place in the evening (after 7:00 pm) and were held at a central location (Utrecht) to facilitate participation. Volunteers were also invited to "entry café" sessions, where they could work together in a co-working space, instead of participating from home. These learning opportunities were not only meant to improve accuracy, but also to reward volunteers in terms of human capital. No clear pattern has been observed in the level of activity that could be directly linked to these workshops (Figure 3).

Feedback
The project did not include a formal procedure to give volunteers substantive feedback on the quality of their contributions. Unless volunteers posed a very specific question, they did not receive any feedback on the content or accuracy of their entries. However, the quality control process can be considered a feedback method. Feedback  involved receiving points once a scan was checked (regardless of its quality). Whenever an entry had been controlled, volunteers saw their number of points increase. Despite this limited type of feedback, the analysis of data indicates an effect on the level of activity. It seems that the volunteers' activity was influenced by the delay between the entry of their transcription and the final quality control. Because volunteers did not get credit for their submissions until these were checked, this delay could have demotivated them. Figure 8 includes two charts which show that as the delay increased (upper chart), the number of entries dropped sharply (bottom chart). Once the delay between entry and verification was reduced, volunteers' activity increased again.
This reaction to feedback delays seems to point to a need for recognition from the experts. Given that most volunteers still had not exchanged their points for books at the end of the project, volunteers were apparently not so much interested in the material rewards they could buy as in the implicit meaning of receiving those points: an indication of recognition or immaterial reward. Because trading points for books or other benefits would lead to a drop in the total number of points collected, this would also affect their reputation as an active volunteer. Many highly active volunteers seemed to prefer keeping their number of points (also visible on their profile on the velehanden-site) over material rewards.
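The relation between verification delay and activity described in this section can be tracked with a small amount of code. The following is a minimal sketch, not the procedure used in the project: it assumes that each entry carries two timestamps (when it was entered and when it was checked), measures the delay in days, and aggregates by calendar month. The function name and the input layout are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def monthly_delay_and_activity(entries):
    """Summarize entries per calendar month of entry.

    `entries` is an iterable of (entered_at, checked_at) datetime pairs
    (a hypothetical layout). Returns a dict that maps (year, month) to
    (number of entries, mean delay in days until the entry was checked).
    """
    delays = defaultdict(list)
    for entered_at, checked_at in entries:
        # Delay between entry and quality control, expressed in days.
        delay_days = (checked_at - entered_at).total_seconds() / 86400
        delays[(entered_at.year, entered_at.month)].append(delay_days)
    return {month: (len(values), mean(values))
            for month, values in sorted(delays.items())}


# Illustrative data only, not taken from the project.
example = [
    (datetime(2014, 3, 1), datetime(2014, 3, 3)),
    (datetime(2014, 3, 10), datetime(2014, 3, 14)),
    (datetime(2014, 4, 1), datetime(2014, 4, 11)),
]
summary = monthly_delay_and_activity(example)
```

Plotting the two values per month side by side would give a view comparable in spirit to the two charts of Figure 8, so that a growing backlog in quality control can be noticed while the project is still running.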

Recognition
An essential aspect of the interaction between project organizers and volunteers was the availability of the organizers and their prompt reaction in dealing with volunteers' queries. According to the organizers, this was perceived as being more important than giving content-related feedback on the entries. However, it would not be correct to conclude that volunteers did not care about getting feedback or about the quality of their transcriptions. Throughout the project, intensive communication took place to support the volunteers in their tasks and to help them deal with very specific problems. Communication took place via the project's blog, email, and project website. Messages on the project's blog and website remained available throughout the project, and whenever a common error or problem was identified by the project organizers, it was included in the project's volunteer manual. Moreover, organizers put a lot of time and effort into explaining the goals of the project and reporting on the progress achieved. For instance, the project's website (www.collective-action.info/Ja-ik-wil) was an important place for announcing intermediate results, such as stories and background related to particular archival findings of the volunteers.

Quality
The objective of the "Yes, I do!"-project was not only to engage citizens in the transcription of more than 90,000 premarriage acts and to achieve that in two years, but also to make these transcriptions available for further research. This meant that transcriptions needed to be as accurate as possible. For this purpose, project staff provided training to participants and used the velehanden.nl platform's quality control procedure. The development in the monthly average accuracy is shown in Figure 9. Accuracy increased in the early months of the project, from 0.86 in February to 0.93 in March 2014, with the average level of accuracy remaining relatively stable above 0.90 for the rest of the project. The overall improvement in the early months of the project was due to both individual volunteers improving and the exit of less accurate volunteers.
Generally, the most active core volunteers were also the most accurate. Because there were substantial differences in the difficulty of the transcription tasks, some of the progress made by the volunteers cannot be seen in Figure 9. As the project progressed, volunteers increasingly chose to transcribe more difficult scans (Figure 7). Volunteers did not perform worse on these more challenging tasks, so improvements in the aggregated data might be masked by this shift towards more difficult scans. In Table A1 (supplementary file) we investigate the volunteers' improvements in accuracy in more detail. The analysis in Table A1 shows that controlling for handwriting period to capture task difficulty does not change the estimate of the improvement rate substantially.
Overall, volunteers improved as they entered more scans at a rate of 0.01 (see the estimate on log(n entries) in Table A1 in the supplementary file, ranging from 0.009-0.016, which we round here to 0.01). This means that for every doubling of the number of entries, accuracy improved by 0.01 * log(2) ≈ 0.007 points (using the natural logarithm). To get a feeling for the size of this improvement, a difference of 0.1 would be equivalent to the difference between transcribing the address "Blomgracht" as "Blomgraft" (two edits; similarity 0.8) and the transcription "blomgracht" (one edit; similarity 0.9). The improvement rate as the volunteers transcribed more acts was therefore not very high, but keep in mind that the starting accuracy was over 0.8.
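The similarity figures in the "Blomgracht" example can be reproduced with a short sketch. It assumes that similarity is defined as 1 minus the optimal string alignment (OSA) distance divided by the length of the longer string; this normalization matches the numbers quoted above but is an assumption, as it is not spelled out in this section. The function names are illustrative only.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between a and b.

    Insertions, deletions, substitutions, and transpositions of two
    adjacent characters each count as one edit.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters of a
    for j in range(cols):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Transposition of two adjacent characters.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def osa_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; the normalization is an assumption (see text)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1 - osa_distance(a, b) / longest


two_edits = osa_similarity("Blomgracht", "Blomgraft")  # 2 edits on 10 characters
one_edit = osa_similarity("Blomgracht", "blomgracht")  # 1 edit on 10 characters
```

Note that the comparison is case-sensitive, which is why "blomgracht" counts as one edit away from "Blomgracht".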
Figure 9: Accuracy over time (monthly average accuracy measured by the optimal string alignment (OSA, see page 11) string distance metric between the volunteers' transcription and the final correction of the groom's address), by volunteer group (core: >1,000 scans; not core: ≤1,000 scans).
In Figure 10 we plot the starting accuracies and improvement rates of the volunteers against the number of entries to investigate the relationship between activity and accuracy. As an example, one volunteer has been highlighted in the figure: This volunteer contributed about 5,000 scans, and started with an accuracy of only 0.4, but improved very strongly, by a rate of 0.0006 for every 1 percent increase of entries, implying an estimated end-accuracy of 0.9 (0.4 + 0.06 * log(5000)).
Overall, though, it can be seen that there was no strong relationship between the number of entries made by a volunteer and the improvement rate. Very active volunteers did tend to have somewhat higher starting accuracies, which could be one reason to stay in the project and hence become very active.
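The arithmetic behind these improvement rates can be checked with a few lines of code. The sketch below assumes the log-linear form implied by the figures in this section, accuracy = starting accuracy + rate * log(number of entries), with the natural logarithm; the function names are illustrative, and the regression models themselves are in Table A1 of the supplementary file.

```python
import math


def predicted_accuracy(start: float, rate: float, n_entries: int) -> float:
    """Accuracy after n_entries under the assumed log-linear learning curve."""
    if n_entries < 1:
        raise ValueError("n_entries must be at least 1")
    return start + rate * math.log(n_entries)


def gain_for_growth(rate: float, factor: float) -> float:
    """Accuracy gain when the number of entries is multiplied by `factor`."""
    return rate * math.log(factor)


# Average volunteer: rate 0.01, effect of doubling the number of entries.
doubling_gain = gain_for_growth(0.01, 2)        # about 0.007

# Highlighted volunteer: start 0.4, rate 0.06, about 5,000 scans.
one_percent_gain = gain_for_growth(0.06, 1.01)  # about 0.0006
end_accuracy = predicted_accuracy(0.4, 0.06, 5000)  # about 0.91
```

The same coefficient can thus be read in two ways: as the gain per doubling of entries (rate * log(2)) or as the gain per 1 percent increase in entries (approximately rate / 100).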

Discussion
After an initial burst of enthusiasm, people might lose interest in online citizen science projects, as has been shown in the "Yes, I do!"-project. This is why prior studies have recommended recruiting participants throughout the course of a project (West and Pateman 2016). Because not all projects can afford the costs of continuously promoting the project, project organizers need to engage existing volunteers so that they continue to contribute for extended periods of time. Our research shows that engagement strategies which contribute to individuals' sense of worth (Crawford et al. 2014), such as outreach events, training, feedback, and recognition, play an important role in sustaining activity in citizen science.
Although it is hard to pinpoint the exact effect of the outreach events, these may have played an essential role in keeping the momentum of the project going, partly because they enhanced the community formation among the volunteers but also because they may have been considered as tokens of appreciation by the volunteers. The most active volunteers in this project were clearly well informed before the project but considered their participation also as a challenge. Hence, the provision of training also encouraged volunteers to take up challenging tasks, such as the transcription of 17th-century texts, which are difficult to decipher. We have seen how volunteers started with easier tasks (late 18th-century transcriptions) but quickly became engaged and started to transcribe more difficult scans.
In their learning process, feedback proved to be an essential element for the project's volunteers, as in any other learning process. Prior research has shown that formative feedback in citizen science projects has a positive effect not only on citizens' learning but also on their long-term engagement or retention (Wal et al. 2016). However, due to limited resources, most projects limit their feedback to acknowledging and indicating the correctness of citizens' contributions. We have provided quantitative empirical evidence from the "Yes, I do!"-project indicating that the speed at which feedback is given also influences citizens' engagement, and that delays in feedback result in declining activity. These findings are in line with earlier research pointing to citizens' concerns about the uncertainty of the quality of their contributions (Eveleigh et al. 2014), but they also show that volunteers are eager to move on, possibly also stimulated by the recognition or immaterial rewards that this delivers.
Material incentives were not very important for a highly demanding project like "Yes, I do!", where people seemed to place more value on the recognition of their expertise by professionals. The value of professional recognition has also been discussed in other online contexts, for instance in firm-hosted user communities and especially among innovative users (Jeppesen and Frederiksen 2006). Immaterial rewards appear to be more rewarding than material incentives in situations with high intellectual demands. This may also translate into high demands towards the project organizers to support the formation of a "volunteer community."

Recommendations for practice
Based on the success of the "Yes, I do!"-project, we recommend that organizers of online citizen science projects develop their own communication plans and recruiting strategies before starting the project, to identify target groups and determine how and when to approach them. However, communication is needed before, during, and even after the project, and agile planning is necessary. Plans may have to be adjusted throughout the project, as organizers will learn during the project what works best in their specific project and with their specific community of volunteers. The project coordinators should regularly report the progress made to the volunteers, both collectively and on an individual level.
We also recommend that online citizen science projects that want to support learning and sustained engagement propose clear and manageable tasks, reply quickly to questions, requests, and problems of volunteers, encourage volunteers to answer other people's questions, and provide the means for the creation of a project community. In other words, we recommend that they challenge their volunteers and let them show off their expertise. Volunteers appreciate it when their built-up expertise is consulted during the project and, over time, they may even want to take over tasks from project organizers. This, again, can also be a form of recognition of the volunteer's expertise. Moreover, volunteers from whom a substantial intellectual effort is demanded very much value the interaction with researchers and the learning that goes with it. Reciprocity between researchers and volunteers is therefore essential to making a project successful, and hence project organizers should be aware of the time and effort it requires.
Finally, one might question whether the effort of attracting a large group of volunteers is worthwhile for medium-sized projects, especially when most of the output is delivered by a small core group of volunteers. Based on the experience in the "Yes, I do!"-project, we believe that a critical mass of participants is needed to ensure the functioning of the "core", in order to have a pool to recruit active volunteers from, but also as a necessary part of the reputation mechanism. If there is no outer group to compare their own achievements to, the most active volunteers may be less motivated. However, future research is needed to examine the dynamics between the multiple layers within the volunteer community. For instance, it would be useful to learn whether the mass of peripheral participants can act as a reservoir of volunteers who can potentially be engaged and become more active, and to what extent the core group can encourage them to do so.

Limitations and future research
This study may be limited by the type of project examined. The potential reach of the "Yes, I do!"-project was limited by the language of the manuscripts (17th- to 19th-century Dutch), hence reducing the number of potentially interested and skilled participants. Because the number of studies reporting on projects with such a relatively small reach is still limited, future research could focus on similar projects. A second limitation of our study might be the retrospective analysis of the data, instead of monitoring the engagement dynamics in real time as they happened. Moreover, because all volunteers became part of the same community and were invited to the same outreach programs, it is difficult to assess the effectiveness of those programs, because we do not know how the project would have progressed without them. Future research should therefore take a broader look and compare various citizen science projects, with different community sizes, types of volunteers, incentive structures, and communication methods to help single out which strategies work for different types of projects.

Conclusion
The "Yes, I do!"-project can be considered a success because the objective of transcribing and creating a high-quality dataset with 90,000 premarriage acts was achieved within the planned two years. The project attracted a considerable number of interested people but was primarily carried out by a small core group of engaged volunteers, as 11.5 percent of the total number of volunteers transcribed 91 percent of the scans. Recruiting and outreach strategies seem to have been effective, as 47 percent of the participants were recruited from the existing pool of volunteers in the velehanden.nl platform and 53 percent registered as new volunteers during the project (excluding users for whom no registration date is known). These newly registered volunteers contributed substantially, sending in 51 percent of all entries throughout the project.
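The concentration of contributions reported here (a small share of volunteers producing most of the scans) is straightforward to monitor during a project. The sketch below is illustrative only: the function name and the toy data are hypothetical, and using the ">1,000 scans" core threshold from the caption of Figure 9 for this purpose is an assumption.

```python
def core_concentration(scans_per_volunteer, core_threshold=1000):
    """Return (share of volunteers in the core, share of scans by the core).

    A volunteer counts as core when they transcribed more than
    `core_threshold` scans (an assumed cut-off, see the lead-in).
    """
    counts = list(scans_per_volunteer)
    total_scans = sum(counts)
    if not counts or total_scans == 0:
        raise ValueError("at least one volunteer with one scan is required")
    core = [c for c in counts if c > core_threshold]
    return len(core) / len(counts), sum(core) / total_scans


# Toy data, not from the project: 10 volunteers, 10,000 scans in total.
toy_counts = [5000, 3000, 1200, 400, 200, 100, 50, 30, 15, 5]
volunteer_share, scan_share = core_concentration(toy_counts)
```

In the toy data, 3 of 10 volunteers exceed the threshold and together account for 9,200 of the 10,000 scans.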
Giving participants the autonomy to choose the type of task and training their paleography skills have proven to be effective strategies to engage participants and to ensure that difficult tasks were completed on time and with high quality (average OSA of 0.9). The point system that was initially set up as a way to claim material rewards (books) turned out to be a feedback and recognition system for volunteers. Feedback and recognition also have a positive influence on engagement, as shown by the effect that the speed at which entries were checked and volunteers rewarded with points had on the level of activity throughout the project. Quality control and its corresponding feedback notifications should therefore be given as soon as possible after volunteers' submissions; otherwise, volunteers may stop contributing altogether. The project also resulted in high-quality transcriptions, mainly thanks to volunteers being highly accurate right from the start and improving slightly over the course of the project.
Overall, our research expands the citizen science literature by adding the study of a successful midsized online citizen science project in the humanities. Moreover, this study contributes to a better understanding of the dynamics of engagement by measuring citizens' activity and the quality of their contributions over time. The evaluation of the project shows that data processing by volunteers in a citizen science project, even with a highly demanding task involving early modern paleography skills, can be very successful, both in terms of quantity and quality, if tasks are manageable, volunteers are engaged through outreach events and training, and feedback and recognition are provided promptly.

Note
1 Though the timestamps in the metadata allow us in principle to measure time in seconds, all measures in Tables 1, 3, and B (supplementary file), as well as Figure 8, are in days. When measuring engagement and quality over time, we aggregate to either weekly or monthly data to deal with data volatility.

Supplemental File
The supplementary file for this article can be found as follows: Appendix. Regression models of mean daily transcription accuracy. DOI: https://doi.org/10.5334/cstp.212.s1