
Please use this identifier to cite or link to this item: http://nchuir.lib.nchu.edu.tw/handle/309270000/154561

Title: Mandarin Speech Synthesis based on Spectrum Reform (基於頻譜改造之中文語音合成)
Author: Lin, Shang-Yi (林尚毅)
Contributors: 余明興
Institute of Networking and Multimedia (資訊網路多媒體研究所)
Keywords: speech synthesis; spectrum reform; liaison
Date: 2012
Issue Date: 2013-11-21 10:56:39 (UTC+8)
Publisher: Institute of Networking and Multimedia (資訊網路多媒體研究所)
Abstract:
This thesis investigates synthesis methods for speech synthesis systems. We adjust the prosody of recorded human speech, aiming for synthesized speech that closely reproduces the original voice. The focus of the thesis is prosody adjustment, covering pitch, duration, and volume, together with the reproduction of liaison, the transitional segments that connect adjacent syllables.
For pitch, duration, and volume adjustment, we build on the widely used pitch synchronous overlap and add (PSOLA) framework and modify the discrete Fourier transform (DFT) spectrum and the discrete cosine transform (DCT) spectrum, aiming for speech that sounds better than plain PSOLA output. When adjusting pitch with the DFT, we rebuild the signal from sinusoids and change their frequencies, which gives less distortion than applying the inverse DFT directly. We also use the DCT, which is rarely used in speech synthesis, to increase spectral resolution and obtain better sound quality.
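As a rough illustration of the sinusoidal resynthesis idea, the sketch below takes the DFT of one pitch period, reads off the amplitude and phase of each harmonic, and rebuilds the period at a different length, which changes the fundamental frequency. It is not taken from the thesis: the function names `dft` and `resynthesize_period` are hypothetical, the frame is assumed to hold exactly one pitch period, and each harmonic keeps its original amplitude, so the spectral envelope moves with the pitch, something a complete system would probably need to correct.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) DFT of a real-valued frame (fine for one pitch period)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def resynthesize_period(frame, new_len):
    """Rebuild one pitch period with new_len samples from its harmonics.

    Assumption: `frame` contains exactly one pitch period, so DFT bin k is
    the k-th harmonic. Each harmonic is regenerated as a cosine whose
    frequency is k cycles per new period.
    """
    n = len(frame)
    spec = dft(frame)
    # Keep only harmonics strictly below the Nyquist limit of both lengths.
    max_harmonic = (min(n, new_len) - 1) // 2
    out = []
    for t in range(new_len):
        sample = spec[0].real / n  # DC component
        for k in range(1, max_harmonic + 1):
            amplitude = 2.0 * abs(spec[k]) / n
            phase = cmath.phase(spec[k])
            sample += amplitude * math.cos(2.0 * math.pi * k * t / new_len + phase)
        out.append(sample)
    return out


# Example: an 80-sample period with two harmonics, shortened to 64 samples.
period = [
    math.cos(2 * math.pi * t / 80) + 0.5 * math.sin(4 * math.pi * t / 80)
    for t in range(80)
]
shorter = resynthesize_period(period, 64)
```

Shortening an 80-sample period to 64 samples raises the pitch by a factor of 80/64 = 1.25. In a PSOLA-style pipeline the rebuilt periods would then be windowed and overlap-added.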
For liaison reproduction, we again build on the PSOLA framework and use the well-known linear predictive coding (LPC) to model the oral cavity and generate the transition spectrum of the liaison segment. The liaison waveform is then synthesized from that spectrum, so that a synthesizer that concatenates individual syllables can also produce the liaison segments that occur in natural human speech.
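One way to picture an LPC-derived transition spectrum is sketched below, under assumptions the abstract does not confirm. The code fits an LPC model to the last frame of one syllable and the first frame of the next with the Levinson-Durbin recursion, linearly interpolates the reflection coefficients (a convex mix of values inside (-1, 1) stays inside that range, so the intermediate filters remain stable), and evaluates the resulting spectral envelopes. All function names are hypothetical, and the interpolation scheme is a swapped-in textbook technique, not necessarily the one used in the thesis.

```python
import cmath
import math


def autocorr(x, order):
    """Autocorrelation of x for lags 0..order."""
    return [sum(x[t] * x[t + lag] for t in range(len(x) - lag))
            for lag in range(order + 1)]


def levinson(r, order):
    """Levinson-Durbin recursion.

    Returns (a, refl) where A(z) = 1 + sum(a[i] * z**-i) is the prediction
    error filter and refl holds the reflection coefficients.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    refl = []
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err
        refl.append(k)
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, refl


def step_up(refl):
    """Convert reflection coefficients back to predictor coefficients."""
    a = [1.0]
    for k in refl:
        prev = a + [0.0]
        last = len(prev) - 1
        a = [prev[j] + k * prev[last - j] for j in range(len(prev))]
    return a


def spectral_envelope(a, n_bins):
    """Magnitude of 1/A(e^{jw}) on n_bins frequencies in [0, pi); gain omitted."""
    env = []
    for b in range(n_bins):
        w = math.pi * b / n_bins
        denom = sum(c * cmath.exp(-1j * w * i) for i, c in enumerate(a))
        env.append(1.0 / abs(denom))
    return env


def transition_filters(frame_a, frame_b, order, steps):
    """Intermediate LPC filters between two frames (assumed scheme)."""
    _, ka = levinson(autocorr(frame_a, order), order)
    _, kb = levinson(autocorr(frame_b, order), order)
    filters = []
    for s in range(1, steps + 1):
        w = s / (steps + 1)
        mixed = [(1.0 - w) * x + w * y for x, y in zip(ka, kb)]
        filters.append(step_up(mixed))
    return filters
```

Calling `spectral_envelope` on each filter returned by `transition_filters` gives a sequence of envelopes that moves gradually from the first syllable's spectrum toward the second's. Each envelope could then drive a frame of the synthesized liaison waveform.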
Finally, we combine syllable-based synthesis units with prosody parameters extracted from human utterances to generate 107 synthesized sentences. These sentences are used in two experiments, one on intelligibility and one on naturalness. Intelligibility is scored by recall, and naturalness is rated by mean opinion score (MOS). These two measures are used to assess the performance of the methods described above.
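The two measures can be computed along the following lines. This is only a sketch: the abstract does not say how a correctly recognized syllable is counted, so the example assumes that a listener's transcription is compared with the reference as a multiset of syllables, and that MOS ratings lie on the usual 1 to 5 scale. The function names and the sample syllables are hypothetical.

```python
from collections import Counter


def syllable_recall(reference, heard):
    """Fraction of reference syllables that the listener reported.

    Assumption: order is ignored and repeated syllables are matched at most
    as many times as they occur in the reference.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    matched = Counter(reference) & Counter(heard)  # multiset intersection
    return sum(matched.values()) / len(reference)


def mean_opinion_score(ratings):
    """Arithmetic mean of listener ratings on a 1 to 5 scale."""
    if not ratings:
        raise ValueError("ratings must not be empty")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie between 1 and 5")
    return sum(ratings) / len(ratings)


# Hypothetical data for one synthesized sentence.
reference = ["jin", "tian", "tian", "qi", "hao"]
heard = ["jin", "tian", "qi", "gao"]
recall = syllable_recall(reference, heard)
mos = mean_opinion_score([4, 5, 3, 4])
```

In this made-up example the listener recovered three of the five reference syllables, and the four ratings average to 4.0. A stricter intelligibility score could align the two sequences position by position instead.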
Appears in Collections: [By Document Type] Master's and Doctoral Theses

Files in This Item:

File                        Description   Size     Format
nchu-101-7099083007-1.pdf                 7012Kb            314
index.html                                0Kb      HTML     141


 

