Ongoing projects


Mapping and modelling trans-omic networks by trans-omic data integration

This is a major initiative in our group to integrate trans-omics datasets generated during the differentiation of mouse embryonic stem cells (mESCs). The goal is to investigate the cross-talk among signalling cascades and epigenomic, transcriptomic and proteomic regulation, including the feedback regulation between them. The high-throughput data that we are seeking to integrate include time-course mass spectrometry-based proteomics and phosphoproteomics, and next-generation sequencing-based RNA-seq and ChIP-seq data.
People: Pengyi Yang, Sean Humphrey, Dinuka Perera
Resource: The Stem Cell Atlas


Mixture modelling in single-cell RNA-seq data

We are working closely with Prof. Jean Yang's group on mixture modelling of single-cell RNA-seq (scRNA-seq) data. This includes developing novel statistical models that capture aspects unique to scRNA-seq data. We are applying these models to understand cell differentiation and tissue development in human and mouse.
People: Taiyun Kim, Shila Ghazanfar, Yingxin Lin, Jean Yang, Pengyi Yang
Relevant publications: here
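As a minimal illustration of the mixture-modelling idea (not the group's model), the sketch below fits a two-component Gaussian mixture by expectation-maximisation to one gene's log-expression across cells, separating a low ("off") component from a high ("on") component. The simulated data and the function names are hypothetical; models for real scRNA-seq counts often also need to handle dropouts and over-dispersion, which this sketch ignores.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def fit_two_component_gmm(xs, n_iter=200):
    """Fit a two-component 1-D Gaussian mixture to xs by expectation-maximisation (EM).

    Returns (weights, means, standard deviations), one entry per component.
    """
    n = len(xs)
    ordered = sorted(xs)
    overall_mean = sum(xs) / n
    overall_sd = math.sqrt(sum((x - overall_mean) ** 2 for x in xs) / n) or 1.0
    # Start the two components at the lower and upper quartiles of the data
    mu = [ordered[n // 4], ordered[(3 * n) // 4]]
    sigma = [overall_sd, overall_sd]
    weight = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability (responsibility) of each component for each cell
        resp = []
        for x in xs:
            dens = [weight[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens] if total > 0 else [0.5, 0.5])
        # M-step: re-estimate weights, means and standard deviations from the responsibilities
        for k in range(2):
            nk = max(sum(r[k] for r in resp), 1e-12)
            weight[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk
            sigma[k] = math.sqrt(max(var, 1e-6))  # floor avoids a collapsed component
    return weight, mu, sigma


if __name__ == "__main__":
    # Hypothetical log-expression of one gene across 500 cells: 300 "off", 200 "on"
    rng = random.Random(0)
    xs = [rng.gauss(0.0, 0.5) for _ in range(300)] + [rng.gauss(5.0, 1.0) for _ in range(200)]
    weights, means, sds = fit_two_component_gmm(xs)
    print([round(m, 2) for m in means], [round(w, 2) for w in weights])
```

The fitted weights estimate the fraction of cells in each expression state, and the per-cell responsibilities from the E-step can be used to assign cells to a state.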


Identification of transcription factor target genes using machine learning

We have recently developed an adaptive sampling approach for learning from positive-unlabeled (PU) data. We are extending this semi-supervised learning approach to identify transcription factor target genes in differentiating mouse embryonic stem cells (mESCs) by integrating transcriptomics, proteomics and epigenomics data.
People: Dinuka Perera, Chendong Ma, Pengyi Yang
Relevant publications: here
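For readers new to positive-unlabeled (PU) learning, the sketch below shows the general idea of iteratively re-weighting which unlabeled instances are sampled as negatives. It is a simplified, hypothetical illustration and not the group's adaptive sampling implementation: the nearest-centroid classifier, the sampling weights, the number of rounds and all names are assumptions made for brevity.

```python
import math
import random


def centroid(points):
    """Coordinate-wise mean of a list of equal-length feature vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pu_iterative_sampling(positives, unlabeled, n_iter=5, seed=0):
    """Hypothetical PU sketch: score each unlabeled instance for being positive.

    Each round samples training positives and negatives with weights given by the
    previous round's probabilities, fits a nearest-centroid classifier, and
    re-scores every instance. Returns P(positive) for each unlabeled instance.
    """
    rng = random.Random(seed)
    data = positives + unlabeled
    pos_idx = list(range(len(positives)))
    unl_idx = list(range(len(positives), len(data)))
    # Initial belief: labelled positives are positive, unlabeled treated as negative
    p_pos = [1.0] * len(positives) + [0.0] * len(unlabeled)
    k = len(positives)  # training-sample size per class
    for _ in range(n_iter):
        # Small constant keeps every weight positive so choices() never sees all zeros
        pos_sample = rng.choices(pos_idx, weights=[p_pos[i] + 1e-9 for i in pos_idx], k=k)
        neg_sample = rng.choices(unl_idx, weights=[1.0 - p_pos[i] + 1e-9 for i in unl_idx], k=k)
        c_pos = centroid([data[i] for i in pos_sample])
        c_neg = centroid([data[i] for i in neg_sample])
        # Closer to the positive centroid -> higher probability of being positive
        p_pos = [sigmoid(sq_dist(x, c_neg) - sq_dist(x, c_pos)) for x in data]
    return p_pos[len(positives):]
```

With reasonably separated classes, unlabeled instances that resemble the labelled positives (for example, unannotated target genes) receive high scores, because after the first round they are rarely drawn as negatives.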

Potential new projects

Machine learning application in trans-omics

Building on our project on mapping and modelling trans-omic networks, a PhD position is available to develop and apply machine learning models to multi-layered trans-omics datasets generated by state-of-the-art mass spectrometry (MS) and next-generation sequencing (NGS) platforms. Our research project, funded by the Australian Research Council (ARC), aims to develop novel machine learning algorithms to analyse and integrate large-scale MS-based omics data with omics data from ultra-fast NGS, generated from complex biological systems. Characterising signalling cascades, transcriptional networks and translational protein networks, and the cross-talk among them, is critical for a comprehensive understanding of complex biological systems. Our large-scale, multi-layered trans-omics data, generated on state-of-the-art technological platforms, provide a unique opportunity to uncover novel biology and molecular mechanisms that are critical for treating complex diseases and for personalised medicine.
Contact Dr. Pengyi Yang to discuss specific aspects of this project.

Understanding the formation of fat tissue

Understanding how fat tissue forms is critical for treating various metabolic and obesity-related diseases. In collaboration with Prof. David James's lab, we have developed several machine learning approaches to elucidate the signalling cascades in fat cells (Humphrey et al. 2013 Cell Metab.). We plan to globally profile the differentiation of 3T3-L1 fat cells and to characterise the underlying trans-omic networks using various statistical modelling approaches. You will be involved in developing key statistical models to integrate trans-omics data and infer the feedback regulation among them during fat cell maturation (i.e. adipogenesis).
Contact Dr. Pengyi Yang, Dr. Sean Humphrey, or Prof. David James for more details.

Application of deep learning in bioinformatics

Deep learning is a recent development in machine learning that has been successfully applied to many bioinformatics problems. We are interested in its application to transcription factor (TF) binding site identification and target gene prediction. Advances in chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) allow TF binding sites to be profiled genome-wide in a population of cells. The massive amount of sequencing data generated by such genome-wide TF profiling requires sophisticated computational algorithms to identify TF binding sites accurately. In this project, we aim to develop and apply cutting-edge deep learning models that predict TF binding sites by integrating ChIP-seq data with other biological knowledge. You will be involved in all aspects of the development, including algorithm design, implementation and testing.
Contact Dr. Pengyi Yang or Dr. Ashnil Kumar to discuss specific aspects of this project.
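Deep learning models for TF binding commonly take one-hot encoded DNA as input, and their first convolution layer scans the sequence with position-specific filters. The sketch below hand-codes that first step for a single filter: one-hot encoding, convolution, ReLU and global max pooling. The filter is hand-set to reward the E-box CACGTG purely for illustration; in a trained model the filter weights are learned from data such as ChIP-seq peaks, and all function names here are hypothetical.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as a list of [A, C, G, T] indicator vectors; other symbols (e.g. N) become all zeros."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def motif_filter(motif, match=1.0, mismatch=-1.0):
    """Hand-set filter of width len(motif): +match for the motif base, mismatch otherwise."""
    return [[match if b == base else mismatch for base in BASES] for b in motif.upper()]


def conv1d(x, w):
    """'Valid' 1-D cross-correlation, the operation a CNN convolution layer computes."""
    width = len(w)
    return [
        sum(x[i + j][c] * w[j][c] for j in range(width) for c in range(len(BASES)))
        for i in range(len(x) - width + 1)
    ]


def max_motif_score(seq, motif):
    """Convolution -> ReLU -> global max pooling, as in a single-filter CNN layer."""
    activations = [max(0.0, a) for a in conv1d(one_hot(seq), motif_filter(motif))]
    return max(activations, default=0.0)
```

For example, `max_motif_score("TTTCACGTGAAA", "CACGTG")` peaks at the exact match, while a sequence lacking the motif scores 0.0 after the ReLU. A real model stacks many learned filters and further layers on top of this scan.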

Pathway reconstruction and annotation using statistical models and trans-omics data sets

While all cells of a given organism share the same DNA sequence, and therefore the same genes, each cell type has only a subset of those genes “turned on”. Genes are commonly annotated into pathways to summarise their collective effect in biological systems. A major drawback of current pathway annotation is that it is NOT cell type-specific. We propose to identify and curate cell type-specific pathways, for example in embryonic stem cells (ESCs), using trans-omics datasets. The key assumption is that genes within a pathway should show correlated changes in their expression profiles when perturbed. We have collected time-course ESC differentiation data profiled at both the proteome and transcriptome levels. Following this assumption, we aim to (1) identify pathways that are regulated specifically during ESC differentiation; and (2) curate ESC-specific pathways using statistical learning. This project will expose you to the development of cutting-edge statistical learning methods and their application to state-of-the-art biomolecular data. It sits at the heart of the interdisciplinary research conducted in our group.
Contact Dr. Pengyi Yang or Prof. Jean Yang to discuss specific aspects of this project.
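The key assumption of this project (genes in the same pathway should show correlated expression changes when perturbed) can be checked for a candidate gene set by comparing its mean pairwise correlation with that of random gene sets of the same size. The sketch below is a hypothetical illustration on simulated time-course profiles; it is not the group's curation method, and a real analysis would also need to correct for testing many pathways.

```python
import itertools
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def mean_pairwise_corr(profiles):
    """Average Pearson correlation over all pairs of profiles."""
    pairs = list(itertools.combinations(profiles, 2))
    if not pairs:
        raise ValueError("need at least two profiles")
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


def coexpression_test(gene_set, profiles, n_perm=1000, seed=0):
    """Compare a gene set's mean pairwise correlation with random gene sets of the same size.

    profiles maps a (hypothetical) gene name to its time-course expression values.
    Returns (observed mean correlation, empirical one-sided p-value).
    """
    rng = random.Random(seed)
    observed = mean_pairwise_corr([profiles[g] for g in gene_set])
    genes = list(profiles)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, len(gene_set))
        if mean_pairwise_corr([profiles[g] for g in sample]) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value away from exactly zero
    return observed, (hits + 1) / (n_perm + 1)
```

A gene set whose members rise or fall together across the time course yields a high observed correlation and a small p-value, whereas an arbitrary set of genes does not.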

Openings

PhD scholarships ($39,100 tuition fee plus $25,861 stipend contribution for the duration of the PhD) and honours projects are immediately available. Successful candidates will undertake exciting new projects in the general areas of Systems and Computational Biology. All projects require programming skills in at least one programming or scripting language (e.g. R, Perl, Python, Java, C++, C or Matlab). There are ample opportunities to work closely with biologists, statisticians, bioinformaticians and computer scientists.

To apply: please send your CV by email to Dr. Pengyi Yang.

More information about research and student opportunities at the University of Sydney can be found here.