Reveal latent common and specific patterns from genome-wide omics data across diverse interrelated biological conditions
2019-06-10

With the rapid development of high-throughput biotechnology, a huge amount of genome-wide omics data is being generated in diverse interrelated biological scenarios. How to simultaneously extract the shared and the data-specific patterns from pairwise or multiple datasets is becoming a challenging and essential issue. To this end, Shihua Zhang, a professor at the Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China, proposed a flexible matrix factorization framework, CSMF, to simultaneously reveal such common and specific patterns from data generated under interrelated biological scenarios.

 

CSMF combines integration and comparative analysis into one paradigm

“High-throughput biological technologies such as RNA-seq, ChIP-seq, and the currently popular single-cell RNA-seq are rapidly accelerating the accumulation of genome-wide omics data in diverse interrelated biological scenarios such as cells, tissues, and time series,” says Dr. Zhang. “Integration and differential analysis are two common paradigms for exploring and analyzing such data. However, current integrative methods usually ignore the differential part, and typical differential analysis methods either fail to identify combinatorial patterns of difference or require matched dimensions of the data. What makes CSMF special compared to these approaches is that it combines integration and comparative analysis into one paradigm to reveal interpretable biological patterns. It requires only one matched dimension and, from a mathematical point of view, is suitable for analyzing data generated by different techniques.”
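To make the idea concrete, the following is a minimal sketch of a common-and-specific non-negative factorization for two datasets that share only their row dimension (for example, genes). It is an illustration under stated assumptions, not the published CSMF algorithm: the function name `common_specific_nmf`, the rank parameters, and the plain multiplicative updates are choices made here for illustration, and the ranks are assumed to be known in advance. Each dataset is approximated as a shared basis times dataset-specific coefficients, plus a basis that belongs to that dataset alone.

```python
import numpy as np


def common_specific_nmf(X1, X2, k_common, k_spec1, k_spec2,
                        n_iter=200, seed=0, eps=1e-9):
    """Illustrative sketch (not the authors' code).

    Approximates X_i ~= Wc @ Hc[i] + Ws[i] @ Hs[i] with all factors >= 0.
    X1 (m x n1) and X2 (m x n2) only need to share the row dimension m.
    """
    if X1.shape[0] != X2.shape[0]:
        raise ValueError("datasets must share the row dimension")
    rng = np.random.default_rng(seed)
    Xs = [X1, X2]
    k_specs = (k_spec1, k_spec2)
    m = X1.shape[0]

    # Random positive initialisation of every factor.
    Wc = rng.random((m, k_common)) + eps                      # common basis
    Ws = [rng.random((m, k)) + eps for k in k_specs]          # specific bases
    Hc = [rng.random((k_common, X.shape[1])) + eps for X in Xs]
    Hs = [rng.random((k, X.shape[1])) + eps for k, X in zip(k_specs, Xs)]

    def loss():
        # Sum of squared reconstruction errors over both datasets.
        return sum(np.linalg.norm(X - Wc @ Hc[i] - Ws[i] @ Hs[i]) ** 2
                   for i, X in enumerate(Xs))

    history = [loss()]
    for _ in range(n_iter):
        # 1) Coefficients: one stacked multiplicative update per dataset.
        for i, X in enumerate(Xs):
            W = np.hstack([Wc, Ws[i]])
            H = np.vstack([Hc[i], Hs[i]])
            H *= (W.T @ X) / (W.T @ W @ H + eps)
            Hc[i], Hs[i] = H[:k_common], H[k_common:]
        # 2) Specific bases: each one only sees its own dataset.
        for i, X in enumerate(Xs):
            R = Wc @ Hc[i] + Ws[i] @ Hs[i]
            Ws[i] *= (X @ Hs[i].T) / (R @ Hs[i].T + eps)
        # 3) Common basis: numerator and denominator pool both datasets.
        num = sum(X @ Hc[i].T for i, X in enumerate(Xs))
        den = sum((Wc @ Hc[i] + Ws[i] @ Hs[i]) @ Hc[i].T
                  for i in range(len(Xs)))
        Wc *= num / (den + eps)
        history.append(loss())
    return Wc, Ws, Hc, Hs, history


# Hypothetical usage on random non-negative data with 50 shared rows.
demo_rng = np.random.default_rng(42)
A, B = demo_rng.random((50, 20)), demo_rng.random((50, 15))
Wc, Ws, Hc, Hs, history = common_specific_nmf(A, B, 3, 2, 2, n_iter=100)
print(f"loss: {history[0]:.2f} -> {history[-1]:.2f}")
```

In this sketch the columns of `Wc` play the role of common patterns and the columns of `Ws[0]` and `Ws[1]` play the role of dataset-specific ones. Nothing here forces the two kinds of pattern apart or chooses the ranks, so a practical method would need additional machinery for both.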

 

Understanding complex biological systems by learning latent common and specific patterns

“Extensive analysis yields novel insights into hidden combinatorial patterns embedded in four interrelated multi-modal datasets, which include ChIP-seq data from different cell lines, RNA-seq data from different cancers, RNA-seq data from different cancer subtypes, and single-cell RNA-seq data from embryonic stem cell differentiation at different time points,” says Dr. Zhang. “For example, by comparing transcriptional profiles of two types of cancer or various subtypes of breast cancer, CSMF identified common modules enriched in cancer hallmarks, as well as cancer-specific or subtype-specific biological modules, which are enriched in specific biological pathways and biological function networks.”

 

Finally, Dr. Zhang says, “We believe that CSMF will become a powerful tool for analyzing multiple large biological datasets in the near future, which will advance our understanding of complex biological systems.”

 

 

 

 
