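|Abstract: ||The subcellular location of proteins has long been an important part of biological research; both drug development and the study of protein function rely on subcellular location information. We developed REALoc, a system that can predict both Singleplex and Multiplex human proteins. It has a two-layer architecture that integrates one-to-one and many-to-many machine learning methods and uses many sequence-based and function-based features. Besides amino acid composition and surface accessibility, these include the weighted sign AAindex and sequence similarity profile that we developed, as well as Gene Ontology information selected with regular-mRMR feature selection.|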
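REALoc predicts six subcellular locations (cell membrane, cytoplasm, endoplasmic reticulum/Golgi apparatus, mitochondrion, nucleus, and extracellular) and was compared with four related prediction websites. REALoc achieved an absolute true success rate of 75.34% in five-fold cross-validation on the training dataset and 57.14% on the independent test dataset, more than 10% higher than the other prediction systems. Finally, we analyzed the performance of the Vote and GANN models on single-location and multi-location prediction, and also tested the relationship between protein-protein interaction and subcellular location.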
Protein subcellular localization is an important part of biological research; it can support drug development and the exploration of protein function. Many subcellular localization prediction tools have been developed, but most of them were trained on eukaryotic or prokaryotic data, and predictors for human proteins are rare.
We established a system, called REALoc, to predict the subcellular localization of both Singleplex and Multiplex human proteins. It is based on a two-layer architecture that integrates two different machine learning methods, one-to-one and many-to-many. The system includes many sequence-based and function-based features, such as amino acid composition and surface accessibility. In addition, we developed a series of computed features: the weighted sign AAindex, the sequence similarity profile, and regular-mRMR feature selection for Gene Ontology. Five-fold cross-validation was performed with iLoc-Hum on the training dataset, which covers six location sites (Cell membrane, Cytoplasm, Endoplasmic reticulum/Golgi apparatus, Mitochondrion, Nucleus, Secreted). The overall absolute true success rate of REALoc is 75.34% on the training dataset and 57.14% on the testing dataset, about 10% higher than the other four prediction systems. Finally, this study discusses the performance of the two decision mechanisms, vote and GANN, for predicting single and multiple locations. The relationship between protein-protein interaction and subcellular localization was also investigated using motifs.
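The absolute true success rate reported above is a strict measure for multi-location prediction. The sketch below illustrates it under the assumption that it follows the usual definition in the multi-label localization literature: a protein counts as correct only when its predicted location set matches the annotated set exactly. The function name `absolute_true_rate` and the toy data are hypothetical, chosen for illustration only, and are not taken from REALoc.

```python
from typing import Sequence, Set


def absolute_true_rate(true_sets: Sequence[Set[str]],
                       pred_sets: Sequence[Set[str]]) -> float:
    """Fraction of proteins whose predicted location set equals the true set.

    Assumed definition: a prediction with any missing or extra location
    counts as wrong, even if it overlaps the true set.
    """
    if len(true_sets) != len(pred_sets):
        raise ValueError("true_sets and pred_sets must have the same length")
    if not true_sets:
        raise ValueError("at least one protein is required")
    exact = sum(1 for t, p in zip(true_sets, pred_sets) if set(t) == set(p))
    return exact / len(true_sets)


# Toy example (made-up labels, not data from the study).
truth = [
    {"Nucleus"},                     # single-location protein
    {"Cytoplasm", "Nucleus"},        # multi-location protein
    {"Secreted"},
    {"Mitochondrion"},
]
predicted = [
    {"Nucleus"},                     # exact match
    {"Nucleus"},                     # under-prediction: counted as wrong
    {"Secreted"},                    # exact match
    {"Cytoplasm", "Mitochondrion"},  # over-prediction: counted as wrong
]

rate = absolute_true_rate(truth, predicted)
print(f"absolute true success rate: {rate:.2%}")
```

Because partial matches earn no credit, this measure is harder to score well on for proteins with multiple locations than a per-label accuracy would be.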