Proteomics data collection for AI

Collection of open-source and categorized datasets in the field of Proteomics

The project aims to build an all-in-one standard collection of pre-selected, openly available datasets in the field of proteomics and to provide them for machine learning applications. The datasets will be pre-categorized, indexed and cleaned, so that they are immediately ready for use in appropriate projects.

Many open-source datasets are available, but they are not collected in one place. Moreover, the datasets are often poorly described, are not categorized by the subfield (e.g. of Proteomics) to which they belong, and are often very dirty, i.e. they require substantial cleaning before use.

In a further step, the dataset collections could be containerized per Proteomics sub-field, together with their descriptive metadata. In this way, research scientists and students in the field of biochemistry could simply load the respective container or package with a single Python or R command and use the data directly as training and test sets for machine learning.
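
A minimal sketch of what such a loader could look like in Python. Everything here is hypothetical: the project does not define an API, so the `ProteomicsDataset` container, the in-memory `CATALOG` index, and the `register`, `list_datasets` and `load_dataset` functions are illustrative names only, and the records are placeholders rather than real data. It only shows the idea of datasets indexed per sub-field with their metadata and returned as a ready-made train/test split.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ProteomicsDataset:
    """Hypothetical container: one cleaned dataset plus descriptive metadata."""
    name: str
    subfield: str  # Proteomics sub-field the dataset was categorized into
    source: str    # where the original open-source data came from
    records: list = field(default_factory=list)


# Hypothetical in-memory index: sub-field -> dataset name -> dataset.
CATALOG: dict[str, dict[str, ProteomicsDataset]] = {}


def register(ds: ProteomicsDataset) -> None:
    """Add a dataset to the index under its sub-field."""
    CATALOG.setdefault(ds.subfield, {})[ds.name] = ds


def list_datasets(subfield: str) -> list[str]:
    """Return the names of all datasets indexed under a sub-field."""
    return sorted(CATALOG.get(subfield, {}))


def load_dataset(subfield: str, name: str, test_fraction: float = 0.2, seed: int = 0):
    """Return (train, test, metadata) using a reproducible seeded split."""
    ds = CATALOG[subfield][name]
    records = list(ds.records)            # copy so the stored order is untouched
    random.Random(seed).shuffle(records)  # same seed -> same split everywhere
    n_test = int(len(records) * test_fraction)
    return records[n_test:], records[:n_test], ds


if __name__ == "__main__":
    # Placeholder records (integers), not real proteomics data.
    register(ProteomicsDataset("toy", "example-subfield", "placeholder", list(range(10))))
    train, test, meta = load_dataset("example-subfield", "toy")
    print(len(train), len(test), meta.subfield)  # 8 2 example-subfield
```

Fixing the seed in `load_dataset` is a deliberate choice in this sketch: two groups that load the same dataset with the same seed obtain identical training and test sets, which is one way a shared collection could make results between groups more comparable.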

Mann et al. (2021) describe the increasing importance of transfer learning and of more transparent open-source architectures that allow different data to be combined. This is especially needed for OMICS data, which are nowadays used extensively in precision medicine. To date, the search for appropriate open-source datasets and their cleaning are still done manually and from scratch for every project (unless the respective research institute has set up an internal database). This approach is very time-consuming and also introduces bias between research groups and institutions, since not all of them use the same data for training and testing their AI/ML models. This problem can be addressed by establishing a standard collection of open-source datasets and making them easily available via containers or packages, which is the main goal of this project.

Project Owner

Kristina Djordjevic

Master's student in Medical Informatics at FHNW, Muttenz
