Machine Learning Prediction of CITE-seq Protein Expression from scRNA-seq Data

ECBME 4060 (2022) Project

Our project is based on data from the Kaggle competition "Open Problems - Multimodal Single-Cell Integration" and consists of predicting cell surface protein expression (measured by CITE-seq) from single-cell RNA expression data (scRNA-seq).

Our main goal was to understand and evaluate how patient-specific, or patient-independent, the genetic information is.
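One common way to probe how patient-specific a model is: train on all donors but one, test on the held-out donor, and repeat for every donor. The sketch below illustrates that idea only. The source does not describe its evaluation split, and the function name `leave_one_donor_out` and the donor labels are hypothetical.

```python
import numpy as np

def leave_one_donor_out(donor_ids):
    """Yield (held_out_donor, train_idx, test_idx) for each distinct donor.

    Hypothetical helper: cells from the held-out donor form the test set,
    and cells from all other donors form the training set.
    """
    donor_ids = np.asarray(donor_ids)
    for donor in np.unique(donor_ids):  # np.unique returns donors in sorted order
        test_mask = donor_ids == donor
        yield donor, np.flatnonzero(~test_mask), np.flatnonzero(test_mask)

# Hypothetical donor labels for six cells from three donors.
donors = ["d1", "d1", "d2", "d3", "d2", "d3"]
for donor, train_idx, test_idx in leave_one_donor_out(donors):
    print(donor, train_idx.tolist(), test_idx.tolist())
# d1 [2, 3, 4, 5] [0, 1]
# d2 [0, 1, 3, 5] [2, 4]
# d3 [0, 1, 2, 4] [3, 5]
```

If scores on held-out donors are much lower than scores on a random split of cells, that gap suggests the learned mapping is partly patient-specific.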

We built machine learning models (LR, MLP, XGBoost and SVM) combined with dimensionality reduction by PCA and SVD, and improved performance by 10% over the best score on Kaggle.
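A minimal sketch of the reduce-then-regress idea, using NumPy only. PCA is computed here as an SVD of the mean-centered expression matrix, which shows how the two techniques relate. Ridge regression stands in for the linear model (the source only says "LR"). Predictions are scored by the Pearson correlation across proteins, computed per cell and averaged; this is our understanding of the competition metric, not a confirmed detail of the project. The data are synthetic, and all function names are hypothetical.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return column means and the top principal axes of X (cells x genes)."""
    mean = X.mean(axis=0)
    # PCA via SVD of the centered matrix: rows of Vt are the principal axes,
    # ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]

def transform_pca(X, mean, components):
    """Project cells onto the principal axes."""
    return (X - mean) @ components.T

def fit_ridge(Z, Y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    z_mean, y_mean = Z.mean(axis=0), Y.mean(axis=0)
    Zc, Yc = Z - z_mean, Y - y_mean
    W = np.linalg.solve(Zc.T @ Zc + alpha * np.eye(Z.shape[1]), Zc.T @ Yc)
    b = y_mean - z_mean @ W
    return W, b

def mean_per_cell_pearson(Y_true, Y_pred):
    """Average over cells of the Pearson correlation across proteins."""
    t = Y_true - Y_true.mean(axis=1, keepdims=True)
    p = Y_pred - Y_pred.mean(axis=1, keepdims=True)
    r = (t * p).sum(axis=1) / np.sqrt((t**2).sum(axis=1) * (p**2).sum(axis=1))
    return r.mean()

# Synthetic stand-in data: 200 cells, 500 "genes", 20 "proteins",
# all driven by a shared 10-dimensional latent state plus small noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 10))
X = latent @ rng.normal(size=(10, 500)) + 0.1 * rng.normal(size=(200, 500))
Y = latent @ rng.normal(size=(10, 20)) + 0.1 * rng.normal(size=(200, 20))

train, test = slice(0, 150), slice(150, 200)
mean, comps = fit_pca(X[train], n_components=10)  # fit the reduction on training cells only
W, b = fit_ridge(transform_pca(X[train], mean, comps), Y[train], alpha=1.0)
Y_hat = transform_pca(X[test], mean, comps) @ W + b
score = mean_per_cell_pearson(Y[test], Y_hat)
print(f"mean per-cell Pearson r: {score:.3f}")
```

Reducing thousands of genes to a few components keeps the regression small and well-conditioned. Fitting the PCA on training cells only avoids leaking information from the test set.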

Quentin Chappat
Senior AI Engineer at Balbix
