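|Abstract: ||Each multimeric form of a protein quaternary structure complex plays its own important role in the cell. For example, transcription factors with a dimeric structure take part in gene regulation, while trimeric glycoproteins associated with viral infection are linked to the human immunodeficiency virus. Being able to classify protein quaternary structure complexes would therefore be of considerable help to proteomics research in the post-genomic era. Prediction systems that distinguish monomeric from multimeric sequences are currently uncommon. This study therefore designs a two-layer machine learning architecture and develops PClass, a classification and prediction system for protein quaternary structure complexes. Complexes are divided into five classes: monomer, dimer, trimer, tetramer, and other subunit counts. The first layer proposes a new model selection method that combines a support vector machine with a bootstrap framework. For each class, features are encoded from sequence composition, entropy, and accessible surface area, yielding multiple feature modules; these are evaluated and the best-performing model is chosen as the feature module for that class, reaching a Matthews correlation coefficient above 70%. The second layer integrates the first-layer feature modules and applies six machine learning methods to improve prediction performance, raising the prediction accuracy for every class by more than 10%. Finally, the system is validated on real cases: dimeric transcription factors and trimeric glycoproteins associated with viral infection.|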
Protein quaternary structure complexes, also known as multimers, play important roles in the cell. For example, transcription factors with a dimer structure are involved in gene regulation, whereas trimeric glycoproteins associated with viral infection are related to the human immunodeficiency virus. Classifying protein quaternary structure complexes would therefore be of great help to proteomics research in the post-genome era. Classification systems for protein quaternary structures have not yet been widely developed, so in this study we designed a two-layer machine learning architecture and developed the classification system PClass. Protein quaternary structure complexes are divided into five categories: monomer, dimer, trimer, tetramer, and an other-subunits class. In the first layer, we propose a new model selection method that uses a support vector machine within a bootstrap framework. Each type of complex is encoded with sequence, entropy, and accessible surface area features, generating multiple feature models, and the model that performs best under evaluation is selected as the feature model for that type of complex. At this stage, the best performance reaches a Matthews correlation coefficient (MCC) of 70% or more. The second layer combines the first-layer models through an integration mechanism and uses six machine learning methods to improve prediction performance, improving MCC by more than 10%. Finally, we evaluated the classification system on transcription factors with a dimer structure and virus-infection-associated glycoproteins with a trimer structure.
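
As a rough illustration of two quantities named above, the following Python sketch computes a simple sequence-composition encoding with its Shannon entropy, and the MCC of a binary one-class-versus-rest classifier. It is not the PClass implementation: the function names (`composition`, `shannon_entropy`, `mcc`), the choice of a 20-letter amino acid composition, and the use of Shannon entropy over that composition are assumptions made for illustration only. The abstract does not specify how the entropy feature is actually defined.

```python
import math
from collections import Counter

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Return the fraction of each standard amino acid in seq (hypothetical encoding)."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def shannon_entropy(freqs):
    """Shannon entropy in bits of a frequency vector (assumed entropy feature)."""
    return -sum(p * math.log2(p) for p in freqs if p > 0)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    feats = composition("MKVLAAGIVGLLLA")  # made-up example sequence
    print(round(shannon_entropy(feats), 3))
    print(round(mcc(tp=45, tn=40, fp=10, fn=5), 3))  # made-up counts
```

MCC ranges from -1 to 1, so a figure such as "70%" corresponds to a value of about 0.70 on this scale.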