At the beginning of 2020, the bush fires in Australia went out of control, killing thousands of wild animals and plants. At the same time, significant parts of the Brazilian Amazon rain forest were burning. Events like these obviously destroy biodiversity on a large scale. But natural habitats in Europe, too, are endangered by natural and human influences. The ongoing debate about the indicators, scope and consequences of climate change adds further urgency to the question of how biodiversity is developing over time.
While there is an increasing demand to measure biodiversity (and its loss) accurately, the number of taxonomic experts has been declining over the past decades due to budget cuts. This raises the question of how emerging technologies can help us measure biodiversity more efficiently. At the Biodiversity Next conference 2019 in Leiden (NL), naturalists and biologists from all over the world (including myself) met to find ways to speed up biodiversity research and build a worldwide network of biodiversity data resources.
Automating Observation & Interpretation
One very time-consuming task in biodiversity research is data collection. Traditionally, a scientist might have spent hours waiting for a single observation, scaring away the most timid animals and thereby distorting the data. Machine observations free researchers from tedious tasks and make some rare observations possible in the first place. Researchers have been using camera traps to monitor bigger animals like lions or antelopes. But once huge numbers of images have been collected, the problem remains that the amount of information exceeds the capacity of human interpreters, and only a small percentage of the collected material is relevant at all. That’s why automating the collection of data will only reach its full potential if data analysis can also be automated to a certain degree.
The application of artificial intelligence (AI) to the interpretation of automatically collected observation data is a popular and fast-growing research field. At the Biodiversity Next conference, Sara Beery was the most important voice from this area. She works closely with Microsoft AI for Earth and Google Research/Wildlife Insights. Other projects like PlantNet, iNaturalist and Google Photos combine huge image databases with massive computing power to assist users with species identification from images. Australia’s national science research agency CSIRO launched another very promising project: by linking the National Research Collections Australia (NRCA) with leading data scientists from the country’s Data 61 project, they hope to begin a new program of work focused on using AI and machine learning to build a framework and tools that can help identify a specimen from an image. The framework will include AI models that have been trained by expert taxonomists, thus providing a high level of accuracy.
Another benefit of using data science methods in biodiversity research is their extensibility. Once a comprehensive base of biodiversity data with a sufficient degree of granularity is available, databases can be linked to other resources like climate databases or soil databases. Such combined databases allow for conclusions about the responses of organisms to abiotic factors, which could bring highly relevant insights to the current climate change debate. One example is the successful creation of a high-resolution climatic map of the flora of the Netherlands, showing the response of wild plants to climate patterns. According to the authors (Sparrius & van der Hak, 2019), the map was created by combining open data from various sources.
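To make the idea of linking databases concrete, here is a minimal sketch in Python. It is not the method used for the Dutch flora map; it only assumes that occurrence records and climate values share a common grid cell identifier. The species names, cell identifiers, temperatures and the function name mean_temperature_by_species are all invented for illustration.

```python
from collections import defaultdict
from statistics import mean

# Invented occurrence records: (species, grid cell where it was observed).
occurrences = [
    ("Species A", "cell_01"),
    ("Species A", "cell_02"),
    ("Species B", "cell_02"),
    ("Species B", "cell_03"),
    ("Species B", "cell_04"),  # this cell has no climate record
]

# Invented climate table: grid cell -> mean annual temperature in °C.
climate = {"cell_01": 9.0, "cell_02": 10.0, "cell_03": 11.0}


def mean_temperature_by_species(occurrences, climate):
    """Join occurrences to climate data on the grid cell and average per species."""
    temperatures = defaultdict(list)
    for species, cell in occurrences:
        # Skip observations from cells without climate data.
        if cell in climate:
            temperatures[species].append(climate[cell])
    return {species: mean(values) for species, values in temperatures.items()}


profile = mean_temperature_by_species(occurrences, climate)
print(profile)
```

With these toy values, Species A averages 9.5 °C and Species B 10.5 °C across the cells where they were observed. A real analysis would need to handle sampling bias, spatial resolution and many more abiotic variables.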
Digitising the Entomological Collection
With around two million specimens, the Entomological Collection of ETH Zurich is among the largest scientific insect collections in Switzerland. It is the result of more than 170 years of collecting by numerous natural scientists. In 2017, the Entomological Collection began photographing insect specimens and their associated data labels as part of ETH Library’s IMAGO project, thus creating extensive collections of correctly identified images with well-curated metadata that can serve as training data sets for machine learning algorithms. In this way, 150 000 specimens from the Palaearctic Macrolepidoptera collection (butterflies and moths), with an emphasis on Switzerland, were photographed and their label data recorded in a database.
The Entomological Collection of ETH Zurich provides one of the most comprehensive image data sets available for the Central European insect fauna. With rare specimens being over-represented, it constitutes key primary data for biodiversity research, for instance for analysing the distribution and migration of species. To make this research data easy to reuse, one of the project’s explicit aims is to make it freely accessible online.
Currently, collections handle specimens in the following way: at regular intervals, new batches of specimens arrive at the collection, be it from donations or from scientific projects. These batches often consist of tens of thousands of specimens. Each specimen has to be integrated into the institution’s existing main collection. Since most of the incoming specimens have not been identified properly, an expert usually checks the identification, sorts the specimens and correctly integrates them into the existing collection. This work consists largely of repetitive tasks such as rearranging specimens, transferring them to new drawers and printing new labels. Given the increasing shortage of experts, these tedious but important steps often become a bottleneck in the workflow or are skipped completely. As a consequence, most specimens are databased without prior determination by experts, resulting in incorrect entries or data repositories which cannot be used for scientific studies.
Improving Workflows with Artificial Intelligence
To address these persistent challenges, the Entomological Collection started a collaboration with ETH Library Lab to develop a mobile application that could make the workflow of classifying specimens in natural history collections more efficient and less dependent on the availability of taxonomic experts. ETH Library Lab supports the fellows working on the project and provides the necessary resources and oversight. As collection manager of the Entomological Collection of ETH Zurich, I contribute my domain knowledge in the field of taxonomy. Two ETH Library Lab fellows have already contributed to the project:
Ankit Dhall has a background in computer vision and machine learning. He carried out his research for his master’s thesis, Learning representations for images with hierarchical labels, in Professor Andreas Krause’s Learning & Adaptive Systems Group at ETH Zurich. Ankit Dhall pre-processed the images and restructured the metadata from the Entomological Collection of ETH Zurich for machine learning. His work exploits label-hierarchy knowledge (i.e. the taxonomic hierarchies) to improve the performance of off-the-shelf image classifiers. In addition, his thesis shows that embedding-based models, more commonly used in natural language processing, can jointly embed label hierarchies and images to achieve improvements in image classification tasks. Thanks to the hierarchical information, Ankit Dhall’s CNN-based (convolutional neural network) classifiers and the entailment models both outperform the hierarchy-agnostic classifier on the ETH butterfly image dataset.
Barry Sunderland holds a Bachelor of Mechanical Engineering from the University of Limerick, Ireland, and worked for several years as a project engineer in America and Europe. He first got involved with the Automated Species Identification project through his capstone project for the Data Science Bootcamp at Propulsion Academy in Zurich. His main goal now at the Library Lab is to bring Ankit Dhall’s research into practice. In parallel to improving the classification model, Barry Sunderland will deploy the model as a cloud-based API that serves the species classifications to the mobile app.
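The intuition behind using the taxonomic hierarchy, as in the research described above, can be illustrated with a small Python sketch. This is not the method from Ankit Dhall’s thesis, which trains CNN classifiers and embedding models; it is only a simplified, assumed decoding rule that multiplies a species score by the scores of its genus and family. The scores are invented, and the names PARENT, path_to_root, hierarchy_aware_score and predict are hypothetical.

```python
# Toy taxonomy: each label points to its parent (species -> genus -> family).
PARENT = {
    "Pieris rapae": "Pieris",
    "Pieris napi": "Pieris",
    "Vanessa atalanta": "Vanessa",
    "Pieris": "Pieridae",
    "Vanessa": "Nymphalidae",
}

# Invented classifier scores for one image, one score per label at each level.
scores = {
    "Vanessa atalanta": 0.40, "Pieris rapae": 0.35, "Pieris napi": 0.25,
    "Pieris": 0.80, "Vanessa": 0.20,
    "Pieridae": 0.80, "Nymphalidae": 0.20,
}
species = ["Pieris rapae", "Pieris napi", "Vanessa atalanta"]


def path_to_root(label):
    """Return the label followed by all of its ancestors."""
    path = [label]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def hierarchy_aware_score(label, scores):
    """Multiply the score of a label with the scores of all its ancestors."""
    result = 1.0
    for node in path_to_root(label):
        result *= scores.get(node, 0.0)
    return result


def predict(scores, candidates):
    """Pick the candidate whose whole taxonomic path is most plausible."""
    return max(candidates, key=lambda s: hierarchy_aware_score(s, scores))


flat_prediction = max(species, key=lambda s: scores[s])
hierarchical_prediction = predict(scores, species)
print(flat_prediction, "->", hierarchical_prediction)
```

In this toy case, the species-level scores alone favour Vanessa atalanta, while the genus and family scores point clearly to Pieris, so the hierarchy-aware rule selects Pieris rapae instead. The example only shows how higher taxonomic levels can correct an uncertain species-level guess.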
For developing the mobile app, ETH Library Lab has enlisted the support of Propulsion Academy, continuing the rewarding collaboration between the two groups and ensuring that the project reaches production as soon as possible. When released, the app will assist untrained users and citizen scientists as well as experts in research collections, enabling the classification of hundreds of specimens in a day. On a more general level, the project will contribute to the development of new concepts for efficient workflows in natural history collections and for the rapid mobilisation of biodiversity data.
Combining new technologies with natural history collections provides interesting opportunities for biodiversity research. ETH Zurich is an ideal place to develop innovative applications within and beyond academic research. For example, AI-equipped drones may help farmers to detect insect pests in crops in the near future. With our Automated Species Identification project, we hope to make the workflows in collections more efficient, allowing scientists to spend more time focussing on their research. We also hope to make biodiversity data from natural history collections openly accessible and easier to analyse.