Projects | Quentin Chappat

MRI Motion Artefacts Simulation and Correction using Complex-Valued Deep Learning

Tue, 16 May 2023 09:12:35 +0000

Magnetic resonance imaging (MRI) has become an indispensable tool in modern medicine, providing unparalleled soft tissue contrast for diagnosis and treatment planning. However, MRI scans are notoriously sensitive to patient motion, which can severely degrade image quality. We set out to tackle this long-standing challenge by applying state-of-the-art deep learning techniques directly on raw k-space MRI data.

We were eager to test novel deep network architectures for simultaneous motion correction and high-fidelity image reconstruction. Our work leveraged the massive open-source fastMRI dataset from Facebook AI Research & NYU, comprising over 10,000 raw brain MRI scans, as training data. We modified the TorchIO library to simulate realistic motion artifacts on the raw complex k-space by applying randomized elastic transformations. This created motion-corrupted scans that formed critical augmented training data for our deep networks.
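To give a feel for what corrupting raw k-space means, here is a minimal NumPy sketch. It is not our modified TorchIO code, and it swaps the elastic transformations for the simplest possible case: random rigid in-plane shifts applied to a subset of k-space lines through the Fourier shift theorem. The function name, the parameters (corrupted_fraction, max_shift) and the square phantom are all assumptions made for illustration.

```python
import numpy as np


def simulate_translation_motion(image, corrupted_fraction=0.3, max_shift=4.0, seed=0):
    """Corrupt a random subset of k-space lines with rigid in-plane shifts.

    Shifting the object by (dy, dx) pixels multiplies its spectrum by
    exp(-2j*pi*(ky*dy + kx*dx)) (Fourier shift theorem).
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = image.shape
    kspace = np.fft.fft2(image)

    # Spatial frequencies in cycles per pixel along each axis.
    ky = np.fft.fftfreq(n_rows)
    kx = np.fft.fftfreq(n_cols)

    # Pick which k-space rows were "acquired" while the subject was displaced.
    n_corrupted = int(round(corrupted_fraction * n_rows))
    moved_rows = rng.choice(n_rows, size=n_corrupted, replace=False)

    corrupted = kspace.copy()
    for row in moved_rows:
        dy, dx = rng.uniform(-max_shift, max_shift, size=2)
        corrupted[row, :] *= np.exp(-2j * np.pi * (ky[row] * dy + kx * dx))
    return corrupted


# Usage: a toy square "phantom" stands in for a real scan.
phantom = np.zeros((64, 64))
phantom[20:44, 20:44] = 1.0
corrupted_kspace = simulate_translation_motion(phantom)
corrupted_image = np.fft.ifft2(corrupted_kspace)  # complex-valued, shows ghosting
```

Because each corrupted line sees a different displacement, the inverse transform no longer describes a single consistent object, which is what produces the ghosting typical of motion artifacts.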

At the core of our methodology lies AFT-Net, a cutting-edge deep network combining artificial Fourier transform layers with convolutional units optimized for complex MRI data. We trained customized AFT-Net models end-to-end to learn mappings from motion-corrupted inputs directly to target motion-free image reconstructions.

The results have us thrilled about the possibilities: our networks reliably corrected simulated motion artifacts, achieving a structural similarity (SSIM) above 0.9 against ground truth on test cases. This is a very promising outcome that demonstrates the power of data-driven deep learning to combat this age-old MRI challenge.

While more work remains in validating on real motion-corrupted data, our study highlights the potential for efficient and robust motion correction through deep learning on raw k-space. This could truly prove to be a game changer in improving diagnostic reliability of MRI scans along with reducing costs.

Robust Multi-Omics Prediction for RNA Expression & Protein Surface Levels

Fri, 21 Apr 2023 12:52:49 +0000

Understanding how DNA, RNA, and proteins interact within individual cells can provide important insights into cellular function and disease. Recent advances in single-cell genomics have enabled the measurement of multiple molecular modalities within the same cell. In this project, we developed machine learning models to predict RNA expression from chromatin accessibility data and protein levels from RNA expression in single hematopoietic stem and progenitor cells.

Background

Multi-omics single-cell data provides a unique opportunity to model the complex regulatory relationships between different layers of molecular information. Chromatin accessibility indicates which regions of DNA are available for transcription, directly influencing gene expression. Likewise, RNA levels serve as a template for protein synthesis. By training models to predict across modalities, we can better understand these processes.

Methods

We used two large datasets from the Kaggle competition Open Problems - Multimodal Single-Cell Integration, which we had already worked on during the Fall semester:

Multiome: 105,942 cells, 228,942 genomic features, 23,418 RNA targets
CITEseq: 70,988 cells, 22,050 genes, 38 protein targets

To handle the large data sizes and high sparsity, we converted the data to sparse matrices using scipy.sparse.csr_matrix. This allowed for significant memory optimization, as sparse matrices only store nonzero values.
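The memory saving can be checked on a toy example. The sketch below uses a small synthetic count matrix (made-up sizes and sparsity, not the competition data) and compares the dense footprint with the three arrays a CSR matrix actually stores.

```python
import numpy as np
from scipy import sparse

rng = np.random.default_rng(0)

# Toy stand-in for a cells x features count matrix with roughly 95% zeros.
dense = rng.poisson(0.05, size=(1000, 2000)).astype(np.float32)
X = sparse.csr_matrix(dense)

dense_bytes = dense.nbytes
# CSR keeps three arrays: nonzero values, their column indices, and row pointers.
sparse_bytes = X.data.nbytes + X.indices.nbytes + X.indptr.nbytes

print(f"nonzero fraction: {X.nnz / (X.shape[0] * X.shape[1]):.3f}")
print(f"dense: {dense_bytes / 1e6:.1f} MB, CSR: {sparse_bytes / 1e6:.1f} MB")
```

The gain grows with sparsity; for a matrix that is mostly nonzero, CSR would actually cost more than the dense array because of the extra index storage.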

We then performed extensive preprocessing, including normalization, scaling, and dimensionality reduction using truncated singular value decomposition (SVD). SVD allowed us to extract the most important signals and reduce the data to tractable sizes for modeling.
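A minimal sketch of the reduction step, assuming scikit-learn's TruncatedSVD (which accepts sparse input directly). The matrix is random and the number of components (64) is an illustrative choice, not the value used in the project.

```python
import numpy as np
from scipy import sparse
from sklearn.decomposition import TruncatedSVD

# Toy sparse input; the real input would be the normalized cell x feature matrix.
X = sparse.random(500, 3000, density=0.02, format="csr", random_state=0, dtype=np.float32)

# n_components=64 is an assumption made for this example.
svd = TruncatedSVD(n_components=64, random_state=0)
X_reduced = svd.fit_transform(X)

print(X_reduced.shape)
print(f"explained variance kept: {svd.explained_variance_ratio_.sum():.2%}")
```

Unlike PCA, truncated SVD does not center the data first, so the input never has to be densified.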

For each dataset, we compared multiple regression algorithms, including elastic net, LightGBM, and neural networks. Models were trained on early time points and evaluated on their ability to predict unseen future time points.
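The split by time point can be sketched as follows. The day values, array shapes and variable names are hypothetical; the point is only that the latest time point is held out entirely instead of shuffling cells at random.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical metadata: one measurement day per cell (values are made up).
n_cells = 1000
day = rng.choice([2, 3, 4, 7], size=n_cells)
X = rng.normal(size=(n_cells, 20))
y = rng.normal(size=(n_cells, 5))

# Hold out the latest day so that evaluation mimics predicting an unseen
# future time point rather than interpolating within days already seen.
held_out_day = day.max()
train_mask = day < held_out_day
test_mask = ~train_mask

X_train, y_train = X[train_mask], y[train_mask]
X_test, y_test = X[test_mask], y[test_mask]
```

A random split would leak cells from the held-out day into training and tend to give an overly optimistic score.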

We performed hyperparameter optimization using RandomizedSearchCV and Bayesian Optimization to find the best model configurations. Regularization and neural architecture tuning were critical to prevent overfitting.
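As an illustration of the randomized search, here is a small sketch with an elastic net on synthetic data. The search ranges, number of iterations and scoring are assumptions chosen for the example, not the settings we actually used.

```python
import numpy as np
from scipy.stats import loguniform, uniform
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import RandomizedSearchCV

rng = np.random.default_rng(0)

# Synthetic regression problem: only the first 5 of 50 features matter.
X = rng.normal(size=(300, 50))
true_coef = np.zeros(50)
true_coef[:5] = [2.0, -1.5, 1.0, 0.5, -0.5]
y = X @ true_coef + rng.normal(scale=0.1, size=300)

# Sample the regularization strength on a log scale and the L1/L2 mix uniformly.
param_distributions = {
    "alpha": loguniform(1e-4, 1e1),
    "l1_ratio": uniform(0.05, 0.9),  # uniform(loc, scale) covers [0.05, 0.95]
}

search = RandomizedSearchCV(
    ElasticNet(max_iter=10_000),
    param_distributions=param_distributions,
    n_iter=20,
    cv=3,
    scoring="r2",
    random_state=0,
)
search.fit(X, y)

print(search.best_params_)
print(f"best cross-validated R^2: {search.best_score_:.3f}")
```

Sampling alpha from a log-uniform distribution spreads the trials evenly across orders of magnitude, which matters because the useful range of a regularization strength is rarely known in advance.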

Results

The final models demonstrated excellent predictive performance in cross-validation, with Pearson correlations above 0.9 between predicted and true labels. This suggests chromatin accessibility is highly indicative of gene expression patterns, and RNA levels accurately reflect protein abundance.
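For reference, a Pearson score of this kind can be computed as in the sketch below. It assumes the correlation is taken per cell (per row) and then averaged; that convention, the function name and the synthetic data are assumptions for illustration.

```python
import numpy as np


def mean_rowwise_pearson(y_true, y_pred):
    """Average Pearson correlation between matching rows of two 2-D arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    # Center each row, then correlate row by row.
    t = y_true - y_true.mean(axis=1, keepdims=True)
    p = y_pred - y_pred.mean(axis=1, keepdims=True)
    numerator = (t * p).sum(axis=1)
    denominator = np.sqrt((t ** 2).sum(axis=1) * (p ** 2).sum(axis=1))
    return float(np.mean(numerator / denominator))


rng = np.random.default_rng(0)
truth = rng.normal(size=(100, 38))  # e.g. 38 protein targets per cell
noisy = truth + rng.normal(scale=0.3, size=truth.shape)

print(round(mean_rowwise_pearson(truth, truth), 3))
print(round(mean_rowwise_pearson(truth, noisy), 3))
```

Note that a row with constant values has zero variance and would produce a division by zero; real data would need a guard for that case.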

Conclusions

This work showed that RNA expression can be predicted from chromatin accessibility, and protein surface levels from RNA expression. Our models leveraged single-cell multi-omics measurements to learn regulatory relationships. The success highlights the wealth of information encoded across modalities. Integrative computational analysis will be critical for maximizing insights. Future work should investigate more modalities and model dynamics over time.

Overall this project demonstrated the power of combining cutting-edge genomics with machine learning to uncover new biology. Predictive modeling across layers of cellular information represents an exciting area for continued methodology development and discovery.

Machine Learning Prediction of CITE-seq Protein Expression from scRNA-seq Data

Wed, 21 Dec 2022 12:52:49 +0000

Our project is based on the data available from the Kaggle competition Open Problems - Multimodal Single-Cell Integration and consists of predicting cell surface protein expression (CITE-seq) from single-cell RNA expression data (scRNA-seq).

Our main goal was to understand and evaluate how patient-specific, or patient-independent, the genetic information is.

We built machine learning models (LR, MLP, XGBoost & SVM) with dimensionality reduction using PCA and SVD, and improved performance by 10% compared to the best score on Kaggle.

Introduction to personal finance

Thu, 16 Jun 2022 19:57:49 +0000

With a friend, I organized a conference on personal finance to introduce the subject to Télécom Paris' engineering students and explain to them why they should invest as soon as possible, where they could start investing, and how they could develop their portfolio in the future.

We first explained how to manage their money through planning (for retirement, for safety in case of unforeseen events, etc.). Then, we explained why it is important to start as early as possible in order to earn the highest returns and avoid losing money to inflation. Finally, we introduced investment options in various areas (funds, real estate, etc.).

Sentiment Analysis Of Tweets

Thu, 16 Jun 2022 09:28:14 +0000

With a group of 3 other students, we competed on Kaggle for 2 weeks against other teams from our school to analyse tweets and classify their sentiment as positive, neutral or negative using natural language processing (NLP) techniques.

We first carried out extensive preprocessing of the tweets using many NLP techniques (regular expressions, smiley conversion, etc.). Then, we developed a natural language processing model using Google’s BERT model and the Hugging Face libraries to predict the sentiment of each tweet.
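The kind of preprocessing mentioned above can be sketched with the standard re module. The emoticon table, the regular expressions and the function name are illustrative assumptions; the actual rules we used were more extensive and are not reproduced here.

```python
import re

# Illustrative emoticon map; the project's actual conversion table is not given.
EMOTICONS = {":)": "smile", ":-)": "smile", ":(": "sad", ":-(": "sad", ":D": "laugh"}

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
# Longest emoticons first so that ":-)" is matched as a whole.
EMOTICON_RE = re.compile(
    "|".join(re.escape(e) for e in sorted(EMOTICONS, key=len, reverse=True))
)
SPACE_RE = re.compile(r"\s+")


def clean_tweet(text):
    text = URL_RE.sub(" ", text)        # drop links
    text = MENTION_RE.sub(" ", text)    # drop user mentions
    text = EMOTICON_RE.sub(lambda m: f" {EMOTICONS[m.group(0)]} ", text)
    text = text.replace("#", "")        # keep the hashtag word, drop the symbol
    return SPACE_RE.sub(" ", text).strip().lower()


print(clean_tweet("@bob Loving the new phone :) #happy https://t.co/xyz"))
# -> loving the new phone smile happy
```

Converting emoticons to words before tokenization keeps a strong sentiment signal that would otherwise be split into punctuation tokens.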

Worldwide Temperature Prediction

Thu, 16 Jun 2022 09:12:35 +0000

With 3 other classmates, we worked on predicting worldwide temperature, starting with data from dry and mild temperate climate locations.
We analyzed and cleaned the data, then trained machine learning models (Lasso Regression and XGBoost Regressor) to predict temperature at locations anywhere on the globe, knowing that our training data was incomplete near the equator.

Wine Ratings Prediction using Vivino.com database

Thu, 16 Jun 2022 08:53:54 +0000

The project aims to predict the ratings of French red wines using machine learning and web-scraped data from the website Vivino.com.

The main goal is to help customers to find the best bottle for them according to some of their own criteria. We assume that one possible goal for the readers of the project is also to understand what really drives the ratings of wine on Vivino.com.

It could also help wine-growers estimate how a newly released wine might be received by customers.

We first accessed the website’s database using API requests and built a structured JSON file.

We then trained and improved machine learning models (LR, SVM & XGBoost) to predict wines’ ratings using their respective chemical properties (acidity, tannin, intensity & sweetness) and prices.

Automatic deforestation detection through satellite imagery?

Sun, 24 Oct 2021 10:12:42 +0000

The goal of this project was to study how satellite imagery can meet the needs of forest monitoring for environmental issues. I worked with 5 other classmates on automatic detection of deforestation in Amazonia. We studied satellite imagery databases from NASA and ESA covering the infrared and visible spectrum.

After analyzing the available data sources and their advantages and disadvantages, the objective was to develop a model in Python to update a deforestation map with image analysis approaches. We developed solutions in Python using clustering algorithms like k-means to detect deforestation.
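A minimal sketch of pixel clustering with k-means, assuming scikit-learn and a synthetic two-band image (red and near-infrared). The reflectance values are made up, and using NDVI to decide which cluster is forest is an assumption added for this example, not necessarily the rule we applied.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic 2-band image (red, near-infrared) standing in for a satellite tile.
# Vegetation reflects strongly in near-infrared and weakly in red.
height, width = 40, 40
image = np.empty((height, width, 2))
image[:, :20] = rng.normal([0.05, 0.50], 0.02, size=(height, 20, 2))  # forest
image[:, 20:] = rng.normal([0.25, 0.30], 0.02, size=(height, 20, 2))  # cleared land

# Cluster pixels by their spectral values, ignoring spatial position.
pixels = image.reshape(-1, 2)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(pixels)
labels = kmeans.labels_.reshape(height, width)

# Name the clusters by NDVI = (NIR - red) / (NIR + red) of each cluster center.
red, nir = kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1]
forest_cluster = int(np.argmax((nir - red) / (nir + red)))
forest_mask = labels == forest_cluster

print(f"forest fraction: {forest_mask.mean():.2f}")
```

Comparing forest masks computed on images of the same area taken at different dates would then indicate where the map needs updating.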

Adjuteo - Developing a local social network for students

Sun, 24 Oct 2021 07:31:00 +0000

Adjuteo is a social network (coded on Android and web) that I developed with 7 classmates during a school project called TCLP (Thematic Collaborative Learning Project).

The idea was to simplify and centralize all requests for help by facilitating mutual aid and swaps between students of the same campus or residence.

We developed a website and an Android application that included a swipe system, a recommendation algorithm, gamification to encourage users to help others, an automatic tag creation system to categorize every publication, etc.

I worked on the development of the Android application (backend and frontend) and the recommender system (coded in Python).