Exposing and characterizing subpopulations of distinctly regulated genes by k-plane regression

Frasca, F.; Matteucci, M; Morelli, Mj; Masseroli, M

doi:10.1007/978-3-030-34585-3_20

Understanding the roles and interplays of histone marks and transcription factors in the regulation of gene expression is of great interest in the development of non-invasive and personalized therapies. Computational studies at genome-wide scale represent a powerful explorative framework, allowing to draw general conclusions. However, a genome-wide approach only identifies generic regulative motifs, and possible multi-functional or co-regulative interactions may remain concealed. In this work, we hypothesize the presence of a number of distinct subpopulations of transcriptional regulative patterns within the set of protein coding genes that explain the statistical redundancy observed at a genome-wide level. We propose the application of a K-Plane Regression algorithm to partition the set of protein coding genes into clusters with specific shared regulative mechanisms. Our approach is completely data-driven and computes clusters of genes significantly better fitted by specific linear models, in contrast to single regressions. These clusters are characterized by distinct and sharper histonic input patterns, and different mean expression values.