Software
Most of the software and packages under development are hosted at
https://github.com/PengyiYang/
R packages
- AdaSampling (https://CRAN.R-project.org/package=AdaSampling)
Implements the adaptive sampling procedure, a framework for both positive unlabeled learning and learning with class label noise.
Reference:
Yang, P.†, Ormerod, J., Liu, W., Ma, C., Zomaya, A., Yang, J. (2018)
AdaSampling for positive-unlabeled and label noise learning with bioinformatics applications.
IEEE Transactions on Cybernetics, doi:10.1109/TCYB.2018.2816984
[PDF], [Repo]
- ClueR (https://CRAN.R-project.org/package=ClueR)
CLUster Evaluation (CLUE) is a computational method for identifying the optimal number of clusters in a given
time-course dataset clustered by the cmeans or kmeans algorithm, and for subsequently identifying key kinases or
pathways from each cluster. Its implementation in R is called ClueR.
Reference:
Yang, P.†, Zheng, X., Jayaswal, V., Hu, G., Yang, J. & Jothi, R. (2015).
Knowledge-based analysis for detecting key signaling events from time-series phosphoproteomics data.
PLoS Computational Biology, 11(8), e1004403.
[Pubmed], [PDF]
- directPA (https://cran.r-project.org/package=directPA)
Direction analysis is a set of tools designed to identify combinatorial effects of multiple treatments
and/or perturbations on pathways and kinases profiled by microarray, RNA-seq, proteomics, or phosphoproteomics.
Reference:
Yang, P.✢, Patrick, E.✢, Tan, S., Fazakerley, D., Burchfield, J., Gribben, C., Prior, M., James, D. & Yang, J. (2014).
Direction pathway analysis of large-scale proteomics data reveals novel features of the insulin action pathway.
Bioinformatics, 30(6), 808-814.
[Pubmed], [PDF]
Shiny apps
- KinasePA (http://shiny.maths.usyd.edu.au/KinasePA)
Kinase perturbation analysis (KinasePA) is a web tool that allows you to identify key kinases that are
perturbed in two treatments compared to control conditions (such as basal or unstimulated conditions).
Description:
The input should be a comma-separated (CSV) file. Each row is a phosphorylation
site, and the columns are treatment 1 vs control and treatment 2 vs control. The values
should be log2 fold changes. Here is an example file.
KinasePA has also been incorporated into the "directPA" R package. Install the package to find out more:
install.packages("directPA")
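The input layout described above (phosphorylation sites in rows, one log2 fold-change column per treatment-vs-control comparison) can be sketched as follows. This is a minimal Python illustration of preparing such a file, not part of KinasePA or directPA. The site labels, the intensities, the header names and the leading site-identifier column are all assumptions made for the example; check the example file for the exact layout the tool expects.

```python
import csv
import io
import math


def log2_fold_change(treated, control):
    """Return log2(treated / control) for two positive intensities."""
    return math.log2(treated / control)


def build_input_csv(intensities):
    """Render a comma-separated table of log2 fold changes.

    `intensities` maps a site label to (control, treatment1, treatment2).
    Each output row is one phosphorylation site; the two value columns are
    treatment 1 vs control and treatment 2 vs control.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header names are hypothetical; they are not prescribed by the tool.
    writer.writerow(["site", "treatment1_vs_control", "treatment2_vs_control"])
    for site, (control, t1, t2) in intensities.items():
        writer.writerow([
            site,
            f"{log2_fold_change(t1, control):.3f}",
            f"{log2_fold_change(t2, control):.3f}",
        ])
    return buffer.getvalue()


# Made-up intensities for two hypothetical sites.
example = {
    "SITE_A": (100.0, 400.0, 200.0),
    "SITE_B": (80.0, 160.0, 40.0),
}

csv_text = build_input_csv(example)
print(csv_text)
```

Writing `csv_text` to a file gives a table in the shape the description asks for. Negative values indicate sites that decrease relative to control.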
Reference:
Yang, P., Patrick, E., Humphrey, S., Ghazanfar, S., James, D., Jothi, R. & Yang, J. (2016).
KinasePA: Phosphoproteomics data annotation using hypothesis driven kinase perturbation analysis.
Proteomics, 16(13), 1868-1871.
Standalone
- PUEL (https://github.com/PengyiYang/KSP-PUEL)
PUEL is an implementation of a positive-unlabeled ensemble learning model for kinase-substrate prediction using
kinase recognition motifs and dynamic phosphoproteomics data.
Prediction results for Akt, mTOR, AMPK, and ERK from different organisms using large-scale phosphoproteomics data
are available from here.
Reference:
Yang, P., Humphrey, S., James, D., Yang, J. & Jothi, R. (2016).
Positive-unlabeled ensemble learning for kinase substrate prediction from dynamic phosphoproteomics data.
Bioinformatics, 32(2), 252-259.
- SSO (http://www.maths.usyd.edu.au/u/pengyi/software/Sampling.html)
Sample subset optimization (SSO) is a sampling technique that uses an evolutionary algorithm to optimize sample subsets for learning from
imbalanced datasets. See the reference below for details.
Reference:
Yang, P.†, Yoo, P., Fernando, J., Zhou, B., Zhang, Z. & Zomaya, A. (2014).
Sample subset optimization techniques for imbalanced and ensemble learning problems in bioinformatics applications.
IEEE Transactions on Cybernetics, 44(3), 445-455.
[IEEE Xplore], [PDF]
Projects hosted on Google Code
Other resources
- Prediction results for Akt, mTOR, AMPK, and ERK from different organisms using large-scale phosphoproteomics data
are available from here.