Breast Cancer comprises multiple subtypes implicated in prognosis. Existing stratification methods rely on the expression quantification of small gene sets. Next Generation Sequencing promises large amounts of omic data in the next years. In this scenario, we explore the potential of machine learning and, particularly, deep learning for breast cancer subtyping. Due to the paucity of publicly available data, we leverage on pan-cancer and non-cancer data to design semi-supervised settings. We make use of multi-omic data, including microRNA expressions and copy number alterations, and we provide an in-depth investigation of several supervised and semi-supervised architectures Obtained accuracy results show simpler models to perform at least as well as the deep semi-supervised approaches on our task over gene expression data. When multi-omic data types are combined together, performance of deep models shows little (if any) improvement in accuracy, indicating the need for further analysis on larger datasets of multi-omic data as and when they become available. From a biological perspective, our linear model mostly confirms known gene-subtype annotations. Conversely, deep approaches model non-linear relationships, which is reflected in a more varied and still unexplored set of representative omic features that may prove useful for breast cancer subtyping.

Investigating Deep Learning based Breast Cancer Subtyping using Pan-cancer and Multi-omic Data

Cascianelli S.;Canakoglu A.;Carman M.;Nanni L.;Pinoli P.;Masseroli M.
2022-01-01

Abstract

Breast Cancer comprises multiple subtypes implicated in prognosis. Existing stratification methods rely on the expression quantification of small gene sets. Next Generation Sequencing promises large amounts of omic data in the next years. In this scenario, we explore the potential of machine learning and, particularly, deep learning for breast cancer subtyping. Due to the paucity of publicly available data, we leverage on pan-cancer and non-cancer data to design semi-supervised settings. We make use of multi-omic data, including microRNA expressions and copy number alterations, and we provide an in-depth investigation of several supervised and semi-supervised architectures Obtained accuracy results show simpler models to perform at least as well as the deep semi-supervised approaches on our task over gene expression data. When multi-omic data types are combined together, performance of deep models shows little (if any) improvement in accuracy, indicating the need for further analysis on larger datasets of multi-omic data as and when they become available. From a biological perspective, our linear model mostly confirms known gene-subtype annotations. Conversely, deep approaches model non-linear relationships, which is reflected in a more varied and still unexplored set of representative omic features that may prove useful for breast cancer subtyping.
Deep learning
Genomics
Multi-omics
Semi supervised learning
Variational autoencoder
File in questo prodotto:
File Dimensione Formato  
09280414.pdf

Accesso riservato

: Publisher’s version
Dimensione 4.13 MB
Formato Adobe PDF
4.13 MB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11311/1159915
Citazioni
  • ???jsp.display-item.citation.pmc??? 3
  • Scopus 5
  • ???jsp.display-item.citation.isi??? 5
social impact