
Please use this identifier to cite or link to this item: http://nchuir.lib.nchu.edu.tw/handle/309270000/153798

Title: REALoc: Reliable and effective methods to assist predicting human protein subcellular localization
Original title: 採用可靠與有效的方法輔助預測人類蛋白質亞細胞位置
Author: Sun, Han-Hao (孫翰豪)
Contributors: 朱彥煒
Keywords: human protein;subcellular localization;singleplex;multiplex;machine learning
Date: 2013
Issue Date: 2013-11-19 12:02:32 (UTC+8)
Publisher: Institute of Genomics and Bioinformatics (基因體暨生物資訊學研究所)
Abstract: Protein subcellular localization is an important part of biological research, because both drug development and the study of protein function rely on localization information. Many subcellular localization prediction tools have been developed, but most were trained on eukaryotic or prokaryotic data, and predictors dedicated to human proteins are rare.
We developed REALoc, a system that predicts the subcellular localization of both singleplex and multiplex human proteins. It has a two-layer architecture that integrates two kinds of machine learning methods, one-to-one and many-to-many. The system uses many sequence-based and function-based features, including amino acid composition and surface accessibility, together with features we developed: the weighted sign AAindex, the sequence similarity profile, and Gene Ontology information selected by regular-mRMR feature selection. Five-fold cross-validation was performed with iLoc-Hum on a training dataset covering six locations (cell membrane, cytoplasm, endoplasmic reticulum/Golgi apparatus, mitochondrion, nucleus, and secreted). The overall absolute true success rate of REALoc is 75.34% on the training dataset and 57.14% on the independent testing dataset, about 10% higher than the four other prediction systems it was compared with. Finally, we analyzed the performance of the two decision mechanisms, vote and GANN, in predicting single and multiple locations, and used motifs to investigate the relationship between protein-protein interaction and subcellular localization.
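The abstract reports an "absolute true success rate" without defining it. The sketch below assumes the usual multi-label reading, in which a protein counts as correct only when its predicted set of locations matches the annotated set exactly; the thesis itself may define the measure differently. The function name and the toy data are hypothetical and are not taken from REALoc.

```python
from typing import Sequence, Set


def absolute_true_rate(true_sets: Sequence[Set[str]],
                       predicted_sets: Sequence[Set[str]]) -> float:
    """Fraction of proteins whose predicted location set equals the true set.

    Assumed definition: exact set match per protein (no partial credit).
    """
    if len(true_sets) != len(predicted_sets):
        raise ValueError("true_sets and predicted_sets must have equal length")
    if not true_sets:
        raise ValueError("at least one protein is required")
    hits = sum(1 for t, p in zip(true_sets, predicted_sets) if set(t) == set(p))
    return hits / len(true_sets)


# Hypothetical toy data: two singleplex and two multiplex proteins.
true_locations = [
    {"Nucleus"},
    {"Cytoplasm", "Nucleus"},
    {"Mitochondrion"},
    {"Cell membrane", "Secreted"},
]
predicted_locations = [
    {"Nucleus"},                                  # exact match
    {"Cytoplasm"},                                # one location missed: a miss
    {"Mitochondrion"},                            # exact match
    {"Cell membrane", "Secreted", "Cytoplasm"},   # one extra location: a miss
]

rate = absolute_true_rate(true_locations, predicted_locations)
print(f"{rate:.2%}")  # prints 50.00%
```

Under this reading, a multiplex protein with one missing or one extra location scores the same as a completely wrong prediction, which is why the measure is strict for multi-location proteins.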
Appears in Collections: [By Data Type] Theses and Dissertations
