Event Abstract

Data Quality in Citizen Science Projects: Challenges and Solutions

  • 1 WasserCluster Lunz GmbH, Austria

Introduction

Data quality is one of the greatest challenges in Citizen Science (CS) projects. Insufficient data quality originates from the attempt to reduce the effort in data acquisition through the use of trained volunteers. Yet the involvement of volunteers adds a chaotic component to the study design, as neither the sample size nor the temporal and spatial distribution of the collected data can be determined a priori. Variability among volunteers in knowledge, skills, and motivation, and over-simplification of tasks reduce data quality further. Data quality, however, determines the acceptance of results by both the scientific community and the stakeholders in environmental management. Thus, strategies are needed to identify potential sources of error in the research design and to control data quality in CS projects. The current paper is the result of a workshop held during the 2nd Austrian Citizen Science Conference. It identifies categories of data quality problems and offers solutions for different types of ecological CS projects.

Types of data quality problems

Data quality problems were grouped into four categories: (1) the scientific value of the data, (2) the objective bias, (3) the subjective bias, and (4) the quality control. The scientific value of the data addresses questions of sample size, completeness of data sets, and the explanatory power of simplified data required for scientific analyses (Suppl. Table 1). Hence, this category questions whether a scientific project is suited for CS in principle. The objective bias arises from the individuality of the volunteers: their personal skills, knowledge, and attitude towards the project, or the amount of time they are willing to invest (Suppl. Table 2). In global projects, cultural differences among participants may present additional challenges.
The subjective bias originates from the specific research subject rather than from the individual observer, but may be interlinked with the objective bias (Suppl. Table 3). Seasonal and daily fluctuations in the studied parameters, such as species occurrence or water chemistry, may create a temporal bias. Differences in the accessibility of locations (e.g. lake shoreline vs. open-water zone) or in the probability of animal-volunteer encounters may create spatial patterns that were not anticipated in a study design planned a priori. Finally, the quality control includes tools for identifying and correcting errors (Suppl. Table 4). This category is partly linked to the first three categories, but additionally addresses errors that may also occur in traditional research projects, such as errors in data entry or misidentification of rare and unexpected species.

Problems in data quality differ among the types of ecological CS projects:

a) Monitoring of species: Volunteers are trained in species identification and record species occurrence and frequency. Specific problems are misidentifications, especially of rare species (over-motivation), and under-estimation of abundant species (under-motivation). The research tasks usually require highly skilled and highly motivated volunteers.
b) Animal sightings: Volunteers report animal encounters. Specific problems are the temporal and spatial bias of animal-volunteer encounters and the lack of absence data. Volunteer skills are usually less important because the tasks are simple.
c) Environmental observations (e.g. phenology): Volunteers report observations of environmental changes. Specific problems are the large amount of data needed and the spatial bias.
d) Environmental quality analyses (e.g. water, air): Volunteers measure environmental quality parameters. Specific problems are the accuracy of simplified methods compared with established analytical methods, sources of contamination during sampling, and the subjective bias.
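
The data-entry errors and doubtful identifications named under quality control are the kind of problem that automatic checks on submitted records can catch early. The following is a minimal sketch of such a check, not a method described in the workshop. The record layout, the species lists, and the function name `validate` are all hypothetical and would have to be replaced by the project's own definitions.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Hypothetical reference data: species expected in the study area and
# species that should always be confirmed by an expert before acceptance.
KNOWN_SPECIES = {"Salmo trutta", "Cottus gobio", "Austropotamobius torrentium"}
RARE_SPECIES = {"Austropotamobius torrentium"}


@dataclass
class Record:
    """One volunteer submission (hypothetical layout)."""
    observer_id: str
    species: str
    count: Optional[int]
    observed_on: Optional[date]
    latitude: Optional[float]
    longitude: Optional[float]


def validate(record: Record, today: date) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    # Completeness checks: reject records with missing mandatory fields.
    if record.count is None or record.observed_on is None:
        problems.append("incomplete: count and date are required")
    if record.latitude is None or record.longitude is None:
        problems.append("incomplete: coordinates are required")
    elif not (-90 <= record.latitude <= 90 and -180 <= record.longitude <= 180):
        problems.append("invalid: coordinates out of range")
    # Plausibility checks: catch typical data-entry errors.
    if record.count is not None and record.count < 0:
        problems.append("invalid: negative count")
    if record.observed_on is not None and record.observed_on > today:
        problems.append("invalid: observation date lies in the future")
    # Identification checks: flag records for review instead of rejecting them.
    if record.species not in KNOWN_SPECIES:
        problems.append("review: species not on the project list")
    elif record.species in RARE_SPECIES:
        problems.append("review: rare species, needs expert confirmation")
    return problems
```

In this sketch, "incomplete" and "invalid" results would be returned to the volunteer at entry time, whereas "review" results would be passed on to an expert, so that unexpected but genuine sightings are not discarded automatically.
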
Potential solutions and strategies

Suppl. Tables 1-4 list reasons and solutions for various data quality problems in ecological CS projects based on the workshop discussions. Data quality problems and solutions, especially regarding the objective bias, have been addressed by numerous authors. The use of registered participants with certificates and the ongoing training of volunteers are seen as prerequisites for successful CS projects (e.g. Gouveia et al., 2004; Cohn, 2008; Dickinson et al., 2010). Many authors also stress the importance of external communication experts and pilot tests to optimize working protocols (Bonney et al., 2009). Automatic filters in online data forms, which are meant to prevent wrong data entry or incomplete data sets, are described by e.g. Bonney et al. (2009) and Bonter and Cooper (2012). Before developing a CS project, scientists need to check the suitability of the research question for CS and the sample size required for scientific analyses (e.g. Conrad and Hilchey, 2011). These questions are especially critical for projects dealing with animal sightings because, for example, the lack of absence data prevents scientific analysis of the data. In environmental quality projects, simplification of tasks and adaptations of methods may reduce the scientific output markedly. In such projects, simplified methods have to be validated by comparison with established methods (e.g. Au et al., 2000; Fore et al., 2001). In addition, spatial and temporal patterns of the required data have to be considered before the project starts and need to be addressed in working protocols to reduce the subjective bias.

Conclusions

Many CS projects are far more labor-intensive than expected if they are to guarantee the data quality required for scientific analyses. Much of the work goes into the recruitment, proper training, and continuous motivation of the volunteers. The preparation phase is especially important in CS projects.
No-go criteria defined a priori, such as the minimum sample size or the minimum number of sampling sites required for scientific analyses, prevent the collection of large amounts of data that are neither publishable nor usable for other purposes (e.g. for environmental management). Data quality assurance requires CS projects to approach the research design differently from traditional research. After developing the research design, scientists need to reconsider the whole concept from the perspective of the data quality problems that the use of volunteers may create. In this step, scientists need to ask which quality problems can be handled within the proposed research design and which problems may require adaptations of the research concept. Finally, scientists need to critically review whether such adaptations may threaten the scientific output of the project. CS can thus provide scientifically sound data if potential problems and restrictions are considered in advance and addressed through the application of adequate quality control tools.
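
The no-go criteria described above can be written down as an explicit check that is run against pilot data or early submissions. The sketch below is one possible form, under stated assumptions: the thresholds `MIN_RECORDS_PER_SITE` and `MIN_SITES`, the function name `check_no_go`, and the `(site_id, observer_id)` record layout are hypothetical, and real thresholds have to be derived from the planned statistical analysis.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical thresholds; real values must come from the planned analysis.
MIN_RECORDS_PER_SITE = 30
MIN_SITES = 5


def check_no_go(records: Iterable[Tuple[str, str]],
                min_records_per_site: int = MIN_RECORDS_PER_SITE,
                min_sites: int = MIN_SITES) -> List[str]:
    """Check (site_id, observer_id) records against the no-go criteria.

    Returns the violated criteria; an empty list means the project may go on.
    """
    # Count how many records each sampling site has received.
    per_site = Counter(site for site, _observer in records)
    # Sites that reach the minimum sample size.
    sufficient = [s for s, n in per_site.items() if n >= min_records_per_site]
    violations = []
    if len(sufficient) < min_sites:
        violations.append(
            f"only {len(sufficient)} of {min_sites} required sites reach "
            f"{min_records_per_site} records"
        )
    return violations


# Usage with small illustrative numbers (not real data).
pilot = [("A", "v1")] * 3 + [("B", "v2")] * 2 + [("C", "v1")]
print(check_no_go(pilot, min_records_per_site=2, min_sites=3))
```

Stating the criteria in this explicit form before data collection starts makes the decision to continue, adapt, or stop the project independent of how the collected data happen to look.
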

Acknowledgements

The conference was funded by the Austrian Ministry of Science, Education, and Research and the Ludwig-Boltzmann-Gesellschaft. We thank the workshop participants R. Brodschneider, M. Cieslinski, B. Heinisch-Obermoser, S. Kainzinger, S. Kroop, G. Leimüller, S. Loiselle, I. Maggini, A. Maringer, K. Premke, S. Pysarczuk, K. Raab-Oertel, B. Rotter, J. Rüdisser, T. Schauppenlehner, F. Schublach, U. Steiner, M. Theurl, T. Walter, K. Wölfel, and R. Zink for their contributions.

References

Au, J., Bagchi, P., Chen, B., Martinez, R., Dudley, S. A., and Sorger, G. J. (2000). Methodology for public monitoring of total coliforms, Escherichia coli and toxicity in waterways by Canadian high school students. J. Environ. Manage. 58, 213–230. doi:10.1006/jema.2000.0323
Bonney, R., Cooper, C. B., Dickinson, J., Kelling, S., Phillips, T., Rosenberg, K. V., et al. (2009). Citizen science: a developing tool for expanding science knowledge and scientific literacy. BioScience 59, 977–984. doi:10.1525/bio.2009.59.11.9
Bonter, D., and Cooper, C. B. (2012). Data validation in citizen science: a case study from Project FeederWatch. Front. Ecol. Environ. 10, 305–307.
Cohn, J. P. (2008). Citizen science: can volunteers do real research? BioScience 58, 192–197.
Conrad, C. C., and Hilchey, K. G. (2011). A review of citizen science and community-based environmental monitoring: issues and opportunities. Environ. Monit. Assess. 176, 273–291. doi:10.1007/s10661-010-1582-5
Dickinson, J. L., Zuckerberg, B., and Bonter, D. N. (2010). Citizen science as an ecological research tool: challenges and benefits. Annu. Rev. Ecol. Evol. Syst. 41, 149–172. doi:10.1146/annurev-ecolsys-102209-144636
Fore, L. S., Paulsen, K., and O’Laughlin, K. (2001). Assessing the performance of volunteers in monitoring streams. Freshw. Biol. 46, 109–123.
Gouveia, C., Fonseca, A., Camara, A., and Ferreira, F. (2004). Promoting the use of environmental data collected by concerned citizens through information and communication technologies. J. Environ. Manage. 71, 135–154.

Keywords: citizen science, data quality, objective bias, subjective bias, quality control

Conference: Austrian Citizen Science Conference 2016, Lunz am See, Austria, 18 Feb - 19 Feb, 2016.

Presentation Type: Oral Presentation

Topic: Citizen Science - Quo vadis?

Citation: Weigelhofer G and Pölz E (2016). Data Quality in Citizen Science Projects: Challenges and Solutions. Front. Environ. Sci. Conference Abstract: Austrian Citizen Science Conference 2016. doi: 10.3389/conf.FENVS.2016.01.00011

Copyright: The abstracts in this collection have not been subject to any Frontiers peer review or checks, and are not endorsed by Frontiers. They are made available through the Frontiers publishing platform as a service to conference organizers and presenters.

The copyright in the individual abstracts is owned by the author of each abstract or his/her employer unless otherwise stated.

Each abstract, as well as the collection of abstracts, are published under a Creative Commons CC-BY 4.0 (attribution) licence (https://creativecommons.org/licenses/by/4.0/) and may thus be reproduced, translated, adapted and be the subject of derivative works provided the authors and Frontiers are attributed.

For Frontiers’ terms and conditions please see https://www.frontiersin.org/legal/terms-and-conditions.

Received: 05 Apr 2016; Published Online: 08 Sep 2016.

* Correspondence: Dr. Gabriele Weigelhofer, WasserCluster Lunz GmbH, Lunz am See, 3293, Austria, gabriele.weigelhofer@wkl.ac.at