威尼斯wnsr666 Academic Seminar
Title: Machine Learning in Protein Structure Prediction
Speaker: Sheng Wang, Ph.D.
Research Scientist,
King Abdullah University of Science and Technology,
Thuwal, Saudi Arabia
Time: Tuesday, November 14, 2017, 13:30-14:30
Venue: Room 411, Jinguang Life Sciences Building
Abstract:
Proteins, linear chains of amino acids translated from genes, are the main working molecules in the cell. The sequence of amino acids determines each protein’s unique 3-dimensional (3D) structure and its specific function. Predicting the 3D structure of a given protein sequence has been one of the most challenging problems in computational biology over the past 50 years, especially without any template information, i.e., ab initio folding.
Recently, ab initio protein folding using 2D predicted contacts and 1D predicted secondary structure as restraints has made some progress, but it requires accurate contact and secondary structure prediction, which existing methods achieve only for large protein families with thousands of sequence homologs. To improve prediction for small protein families, we employ deep learning, a technique from computer science that can learn complex patterns from large datasets and has revolutionized object and speech recognition, machine translation, and the game of Go.
Our approach to contact prediction differs from existing methods mainly in (1) formulating contact prediction as a pixel-level image labeling problem instead of an image-level classification problem; (2) simultaneously predicting all contacts of an individual protein to make effective use of contact occurrence patterns; and (3) integrating both one-dimensional and two-dimensional deep convolutional neural networks to effectively learn complex sequence-structure relationships, including high-order residue correlations. The 1D deep network can also be applied directly to predict secondary structure.
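To illustrate what "pixel-level labeling" means here, the following is a minimal NumPy sketch, not the speaker's network. It assumes per-residue features are paired up by simple concatenation and scored by a single shared logistic unit; the function names, feature sizes, and random untrained weights are all hypothetical. The point is only that the output is one probability per residue pair (an L x L map) rather than one label per protein.

```python
import numpy as np

def pairwise_features(seq_feats):
    """Turn per-residue features (L, d) into per-pair features (L, L, 2d)."""
    L, d = seq_feats.shape
    rows = np.broadcast_to(seq_feats[:, None, :], (L, L, d))  # features of residue i
    cols = np.broadcast_to(seq_feats[None, :, :], (L, L, d))  # features of residue j
    return np.concatenate([rows, cols], axis=-1)

def contact_probabilities(pair_feats, weights, bias):
    """Apply the same logistic unit to every (i, j) pixel, giving an (L, L) map."""
    logits = pair_feats @ weights + bias
    probs = 1.0 / (1.0 + np.exp(-logits))
    return 0.5 * (probs + probs.T)  # average with the transpose: contacts are symmetric

rng = np.random.default_rng(0)
seq_feats = rng.normal(size=(6, 4))  # hypothetical: 6 residues, 4 features each
weights = rng.normal(size=8)         # hypothetical, untrained weights
contact_map = contact_probabilities(pairwise_features(seq_feats), weights, 0.0)
print(contact_map.shape)  # (6, 6)
```

In a real model the shared logistic unit would be replaced by stacked 2D convolutions, so that each pixel's prediction can also depend on its neighbours in the map.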
Our one-dimensional deep convolutional neural network achieves state-of-the-art accuracy of ~84% in protein secondary structure prediction, breaking the ~80% ceiling that had stood for decades. Our contact prediction method performed best in CASP12 in terms of F1 score on 38 free-modeling targets. After CASP12, we have been testing our method in CAMEO, a fully automated online blind test, in which we successfully predicted ab initio 10 proteins with novel folds. Finally, we demonstrate that a deep transfer learning method can be readily applied to predict membrane protein structures.
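As a rough illustration of how an F1 score can be computed for a predicted contact map, here is a small NumPy sketch. The abstract does not give the exact CASP12 evaluation protocol, so the choices below (scoring the top-k most confident pairs, and ignoring pairs closer than `min_sep` residues in sequence) are assumptions, and the function name `contact_f1` is hypothetical.

```python
import numpy as np

def contact_f1(pred_probs, true_contacts, top_k, min_sep=6):
    """F1 of the top_k most confident predicted contacts (assumed protocol)."""
    L = true_contacts.shape[0]
    i, j = np.triu_indices(L, k=min_sep)  # pairs with j - i >= min_sep
    scores = pred_probs[i, j]
    labels = true_contacts[i, j].astype(bool)
    k = min(top_k, scores.size)
    if k == 0 or labels.sum() == 0:
        return 0.0
    chosen = np.argsort(-scores)[:k]      # indices of the k highest scores
    true_pos = labels[chosen].sum()
    precision = true_pos / k
    recall = true_pos / labels.sum()
    if precision + recall == 0:
        return 0.0
    return float(2 * precision * recall / (precision + recall))

# Toy example: two true contacts, two predictions, one of them correct.
true_contacts = np.zeros((8, 8), dtype=int)
true_contacts[0, 5] = true_contacts[1, 6] = 1
pred_probs = np.zeros((8, 8))
pred_probs[0, 5], pred_probs[2, 7] = 0.9, 0.8
score = contact_f1(pred_probs, true_contacts, top_k=2, min_sep=2)
print(score)  # 0.5
```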
All faculty and students are welcome to attend!