Mandarin Tone Classification Using CNN/DNN (以深層卷積神經網路對中文語調進行分類)

Shih-Che Chen; 陳釋澈

Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/70814

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	張智星(Jyh-Shing Jang)
dc.contributor.author	Shih-Che Chen	en
dc.contributor.author	陳釋澈	zh_TW
dc.date.accessioned	2021-06-17T04:39:30Z	-
dc.date.available	2021-08-09
dc.date.copyright	2018-08-09
dc.date.issued	2018
dc.date.submitted	2018-08-07
dc.identifier.citation	[1] Y. Chao, “A system of tone letters,” Le Maître Phonétique, vol. 45, 1980. [2] XU Li, ZHANG Wenle, ZHOU Ning, LEE Chaoyang, LI Yongxin, CHEN Xiuwu, ZHAO Xiaoyan, “Mandarin Chinese Tone Recognition with an Artificial Neural Network,” Journal of Otology, vol. 1, no. 1, pp. 30–34, Jun. 2006. [3] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel, “Backpropagation applied to handwritten zip code recognition,” Neural Computation, vol. 1, no. 4, pp. 541–551, Dec. 1989. [4] Charles Chen, Razvan C. Bunescu, Li Xu, Chang Liu, “Tone Classification in Mandarin Chinese Using Convolutional Neural Networks,” Interspeech, 2016. [5] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998. [6] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, “ImageNet classification with deep convolutional neural networks,” Proceedings of the 25th International Conference on Neural Information Processing Systems, pp. 1097–1105, December 3–6, 2012, Lake Tahoe, Nevada. [7] Matthew D. Zeiler, Rob Fergus, “Visualizing and understanding convolutional networks,” CoRR, abs/1311.2901, 2013. Published in Proc. ECCV, 2014. [8] K. Simonyan, A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” CoRR, abs/1409.1556, 2014. [9] Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich, “Going deeper with convolutions,” CoRR, abs/1409.4842, 2014. [10] Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, “Deep residual learning for image recognition,” arXiv preprint arXiv:1512.03385, 2015.
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/70814	-
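dc.description.abstract	In Mandarin, tone plays a very important role: the same syllable, pronounced with a different tone, takes on a completely different meaning. Whether a speaker's native language is Chinese can often be recognized from the tones of the words they speak. This thesis therefore proposes a method for classifying speech tones: the audio signal is first converted into a spectrogram, the spectrogram is treated as an image and fed into existing image-recognition convolutional neural network architectures to train tone classification models, and the effectiveness of off-the-shelf image-recognition models on tone classification is compared. The result is a tone classification architecture that reaches a reasonable level of accuracy without requiring many processing steps on the audio. This architecture can be applied to Mandarin teaching systems, offering a new option for language instruction.	zh_TW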
dc.description.abstract	In Mandarin Chinese, tone plays an important role: the same syllable spoken with different tones can carry entirely different meanings. Non-native speakers of Mandarin can often be identified by their tone patterns. We therefore propose a method for tone classification. We first convert the audio signal into a spectrogram, treat the spectrogram as an image, feed it into existing image-recognition convolutional neural network architectures, and train tone classification models. We then compare how different image-recognition models perform on tone classification. This approach achieves good accuracy without extensive processing of the audio signal. The resulting tone classification architecture can be applied to Mandarin teaching systems, offering a new option for language instruction.	en
dc.description.provenance	Made available in DSpace on 2021-06-17T04:39:30Z (GMT). No. of bitstreams: 1 ntu-107-R03922016-1.pdf: 7043019 bytes, checksum: 39a98aa8092804ccae2a843505a70ff6 (MD5) Previous issue date: 2018	en
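dc.description.tableofcontents	Oral Examination Committee Certification (p. i); Acknowledgements (p. ii); Chinese Abstract (p. iii); Abstract (p. iv); List of Figures (p. v); List of Tables (p. xi); Chapter 1 Introduction (p. 1); Research Motivation (p. 1); Overview and Categories of Mandarin Tones (p. 2); Overview of Research Direction and Methods (p. 4); Chapter Outline (p. 5); Chapter 2 Literature Review (p. 6); Chapter 3 System Architecture (p. 9); Section 1 Feature Extraction (p. 10); Subsection 1 Spectrogram (p. 10); Subsection 2 Short-Time Fourier Transform (p. 10); Subsection 3 Zero Padding (p. 11); Subsection 4 Interpolation (p. 12); Section 2 Convolutional Neural Network (p. 12); Subsection 1 Basic Architecture of Convolutional Neural Networks (p. 13); Convolution Layer (p. 13); Pooling Layer (p. 14); Fully Connected Layer (p. 15); Activation Function (p. 15); Subsection 2 LeNet-5 (p. 17); Subsection 3 AlexNet (p. 18); Rectified Linear Unit (ReLU) (p. 19); Dropout (p. 19); Overlapping Pooling (p. 20); Multi-GPU Training (p. 20); Subsection 4 ZFNet (p. 20); Subsection 5 VGG (p. 21); Subsection 6 GoogLeNet (p. 22); Inception Structure (p. 24); Auxiliary Classifier (p. 24); Subsection 7 ResNet (p. 25); Chapter 4 Datasets and Preprocessing (p. 27); Section 1 Experimental Data (p. 27); Subsection 1 Monosyllabic Corpus (p. 27); Subsection 2 Tang Poetry Verse Corpus (p. 28); Subsection 3 龎帝 Language Teaching Materials (p. 29); Section 2 Data Preprocessing (p. 29); Subsection 1 Labeling (p. 29); Uni-tone (p. 29); Bi-tone (p. 29); Subsection 2 Segmentation (p. 30); Manual Segmentation (p. 30); Automatic Segmentation by Program (p. 30); Subsection 3 Partitioning (p. 30); Section 3 Statistics of the Processed Data (p. 31); Subsection 1 Monosyllabic Corpus (p. 31); Subsection 2 Tang Poetry Verse Corpus (p. 32); Subsection 3 龎帝 Language Teaching Materials (p. 34); Section 4 Data Preparation for Transfer Learning (p. 35); Subsection 1 Identical Channel Values (p. 35); Subsection 2 Zero-Padded Channel Values (p. 36); Subsection 3 Low-Pass and High-Pass Filters (p. 36); Chapter 5 Experimental Parameters and Results (p. 38); Section 1 Experimental Parameters (p. 38); Subsection 1 Spectrogram Parameters (p. 38); Subsection 2 Neural Network Parameters (p. 38); LeNet-5 (p. 38); AlexNet (p. 38); ZFNet (p. 39); VGG-16 (p. 39); VGG-19 (p. 39); GoogLeNet (p. 40); ResNet-34 (p. 40); ResNet-50 (p. 40); Subsection 3 Training Environment (p. 41); Computer 1 (p. 41); Computer 2 (p. 41); Section 2 Experimental Results (p. 41); Subsection 1 Uni-tone Label Models on the Monosyllabic Corpus (p. 41); Training Log (p. 41); Model Comparison (p. 45); Gender Comparison (p. 46); Volume and Duration (p. 46); Confusion Matrix (p. 49); Error Analysis (p. 51); Subsection 2 Uni-tone Label Models on the Tang Poetry Verse Corpus (p. 55); Training Log (p. 55); Model Comparison (p. 59); Gender Comparison (p. 60); Volume and Duration (p. 60); Confusion Matrix (p. 61); Comparison of Results by Training Corpus (p. 62); Subsection 3 Uni-tone Label Models with Spectrogram Interpolation on the Monosyllabic Corpus (p. 63); Model Comparison (p. 63); Comparison of Spectrogram Processing (p. 64); Volume and Duration (p. 65); Subsection 4 Transfer Learning with Uni-tone Labels on the Monosyllabic Corpus (p. 65); Training Log (p. 65); Comparison of Results (p. 68); Subsection 5 Comparison with Previous Approaches (p. 69); Subsection 6 Bi-tone Label Models (p. 70); Training Log (p. 70); Confusion Matrix (p. 71); Results of Tone Label Conversion (p. 73); Subsection 7 Hierarchical Uni-tone Label Models on the Monosyllabic Corpus (p. 74); Accuracy Comparison (p. 74); Confusion Matrix (p. 75); Per-Tone Accuracy (p. 75); Chapter 6 Conclusions and Future Work (p. 77); Section 1 Conclusions (p. 77); Section 2 Future Work (p. 78); References (p. 79)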
dc.language.iso	zh-TW
dc.subject	聲調分類	zh_TW
dc.subject	華語	zh_TW
dc.subject	卷積神經網路	zh_TW
dc.subject	影像識別	zh_TW
dc.subject	頻譜	zh_TW
dc.subject	spectrogram	en
dc.subject	tone classification	en
dc.subject	Mandarin Chinese	en
dc.subject	image recognition	en
dc.subject	convolutional neural network	en
dc.title	以深層卷積神經網路對中文語調進行分類	zh_TW
dc.title	Mandarin Tone Classification Using CNN/DNN	en
dc.type	Thesis
dc.date.schoolyear	106-2
dc.description.degree	Master's
dc.contributor.oralexamcommittee	王新民(Hsin-Min Wang),楊奕軒(Yi-Hsuan Yang)
dc.subject.keyword	聲調分類,頻譜,影像識別,卷積神經網路,華語	zh_TW
dc.subject.keyword	tone classification,spectrogram,image recognition,convolutional neural network,Mandarin Chinese	en
dc.relation.page	80
dc.identifier.doi	10.6342/NTU201802657
dc.rights.note	Paid license
dc.date.accepted	2018-08-07
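dc.contributor.author-college	College of Electrical Engineering and Computer Science	zh_TW
dc.contributor.author-dept	Graduate Institute of Computer Science and Information Engineering	zh_TW
Appears in Collections:	Department of Computer Science and Information Engineering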
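
The abstract describes turning an audio signal into a spectrogram and treating it as an image for a convolutional network. The following is a minimal sketch of that front end only, not the thesis's implementation: the frame length, hop size, Hann window, log scaling, and the function names `stft_magnitude`, `pad_frames`, and `to_image` are all assumptions made for illustration, since the record does not state the actual parameters. The CNN itself is not shown.

```python
import cmath
import math


def stft_magnitude(signal, frame_len=64, hop=32):
    """Short-time Fourier transform magnitudes.

    Returns one list per frame, holding bins 0..frame_len // 2.
    frame_len and hop are illustrative values, not the thesis's settings.
    """
    # A Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(frame_len)]
        # Naive O(N^2) DFT keeps the sketch dependency-free;
        # an FFT library would be used in practice.
        spectrum = []
        for k in range(frame_len // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def pad_frames(frames, target_frames):
    """Zero-pad (or truncate) along time so every spectrogram has the same width."""
    n_bins = len(frames[0]) if frames else 0
    padded = [list(f) for f in frames[:target_frames]]
    while len(padded) < target_frames:
        padded.append([0.0] * n_bins)
    return padded


def to_image(frames):
    """Map log-magnitudes to 0..255 so the result can be handled as a grayscale image."""
    logs = [[math.log1p(v) for v in f] for f in frames]
    peak = max((v for f in logs for v in f), default=0.0) or 1.0
    return [[int(round(255 * v / peak)) for v in f] for f in logs]
```

For example, `to_image(pad_frames(stft_magnitude(samples), 10))` yields a fixed-size grid of 10 time frames by 33 frequency bins that could be resized and passed to an image classifier; the target width of 10 is again only a placeholder.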

Files in This Item:

File	Size	Format
ntu-107-1.pdf (not authorized for public access)	6.88 MB	Adobe PDF

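Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.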