Machine Learning Prediction of CITE-seq Protein Expression from scRNA-seq Data
ECBME 4060 (2022) Project
Our project uses the data from the Kaggle competition Open Problems - Multimodal Single-Cell Integration. The task is to predict cell surface protein expression (CITE-seq) from single-cell RNA expression data (scRNA-seq).
Our main goal was to understand and evaluate how patient-specific, as opposed to patient-independent, the genetic information is.
We built machine learning models (LR, MLP, XGBoost, and SVM) with PCA and SVD used to reduce the feature space, and improved performance by 10% compared to the best score on Kaggle.
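As an illustration of the general approach, the sketch below chains SVD-based reduction with a linear regressor and scores it with one donor held out at a time, which is one way to probe how patient-specific the signal is. It is not the project's actual code: the data is synthetic, the donor labels, the sizes, the `n_components` value, and the helper names `mean_rowwise_pearson` and `leave_one_donor_out` are all assumptions made for this example, and row-wise Pearson correlation is used as an assumed evaluation metric.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the Kaggle data): RNA and protein levels
# are both driven by a small shared latent state, plus noise.
n_cells, n_genes, n_proteins, n_latent = 300, 200, 10, 5
latent = rng.normal(size=(n_cells, n_latent))
rna = latent @ rng.normal(size=(n_latent, n_genes))
rna += 0.1 * rng.normal(size=rna.shape)
protein = latent @ rng.normal(size=(n_latent, n_proteins))
protein += 0.1 * rng.normal(size=protein.shape)
donor = np.arange(n_cells) % 3  # hypothetical donor label per cell


def mean_rowwise_pearson(y_true, y_pred):
    """Average, over cells, of the Pearson correlation across proteins."""
    yt = y_true - y_true.mean(axis=1, keepdims=True)
    yp = y_pred - y_pred.mean(axis=1, keepdims=True)
    num = (yt * yp).sum(axis=1)
    den = np.sqrt((yt ** 2).sum(axis=1) * (yp ** 2).sum(axis=1))
    return float(np.mean(num / den))


def leave_one_donor_out(rna, protein, donor, n_components=20):
    """Train on all donors but one, score on the held-out donor."""
    scores = {}
    for d in np.unique(donor):
        test = donor == d
        model = make_pipeline(
            TruncatedSVD(n_components=n_components, random_state=0),
            LinearRegression(),  # handles multi-output targets directly
        )
        model.fit(rna[~test], protein[~test])
        pred = model.predict(rna[test])
        scores[int(d)] = mean_rowwise_pearson(protein[test], pred)
    return scores


scores = leave_one_donor_out(rna, protein, donor)
for d, s in scores.items():
    print(f"held-out donor {d}: mean per-cell Pearson = {s:.3f}")
```

A large gap between held-out-donor scores and scores from a random cell-level split would suggest that the learned mapping is patient-specific. The linear regressor can be swapped for any of the other model families listed above.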