Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89881
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 張智星 | zh_TW |
dc.contributor.advisor | Jyh-Shing Roger Jang | en |
dc.contributor.author | 林育駿 | zh_TW |
dc.contributor.author | Yu-Chun Lin | en |
dc.date.accessioned | 2023-09-22T16:31:27Z | - |
dc.date.available | 2023-11-09 | - |
dc.date.copyright | 2023-09-22 | - |
dc.date.issued | 2023 | - |
dc.date.submitted | 2023-08-12 | - |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89881 | - |
dc.description.abstract | 台語語音辨識主要面對問題分為: 1. 缺乏大量且公開的台語語料集,2. 台語文字書寫系統不統一,前者導致進行語音辨識的任務上面臨資料不足,後者造成輸出格式不統一且難以讀解。本研究以台語語音辨識結合中文翻譯為任務,透過預訓練語音模型結合端到端深度學習模型的架構,建立台語語音翻譯模型。以少量台語語音配對中文文本語料為基礎,透過大量蒐集網路台語語音資料進行半監督式學習,並設計資料清洗演算法,改善台語語音翻譯系統以及台語語料。研究探討主要分為端到端語音翻譯模型、預訓練語音模型特徵、疊代訓練方法以及語料清洗四種改進方向。根據實驗結果,驗證上述方法皆能有效改善台語語音翻譯中文的表現。 | zh_TW |
dc.description.abstract | The challenges in Taiwanese speech recognition fall into two main categories: 1) the lack of large, publicly available Taiwanese speech corpora, and 2) the lack of a standardized writing system for Taiwanese. The former leaves speech recognition tasks short of training data, while the latter leads to inconsistent output formats that are difficult to interpret. In this study, we address the task of combining Taiwanese speech recognition with Chinese translation, and propose a framework that integrates pretrained speech models with end-to-end deep learning models to build a Taiwanese speech translation system. Starting from a limited amount of paired Taiwanese speech and Chinese text, we apply semi-supervised learning to a large collection of Taiwanese speech data gathered from the internet and design data cleaning algorithms that improve both the Taiwanese speech translation system and the Taiwanese speech corpora. The research explores four main directions of improvement: end-to-end speech translation models, pretrained speech model features, iterative training methods, and data cleaning. Experimental results confirm that each of these approaches effectively improves the performance of Taiwanese-to-Chinese speech translation. | en |
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-09-22T16:31:27Z No. of bitstreams: 0 | en |
dc.description.provenance | Made available in DSpace on 2023-09-22T16:31:27Z (GMT). No. of bitstreams: 0 | en |
dc.description.tableofcontents | Acknowledgements
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
1.1 Research Motivation
1.2 Contributions
1.3 Thesis Organization
Chapter 2: Related Work
2.1 Machine Translation and Speech Recognition
2.1.1 Machine Translation
2.1.1.1 Rule-based Machine Translation
2.1.1.2 Statistical Machine Translation
2.1.1.3 Neural Machine Translation
2.1.2 Speech Recognition
2.1.2.1 Sequence-to-Sequence Models
2.1.2.2 Connectionist Temporal Classification (CTC)
2.1.2.3 RNN-Transducer
2.1.2.4 LAS
2.1.3 Speech Translation
2.2 Speech Features
2.2.1 Log-mel Spectrogram
2.2.2 Mel-frequency Cepstral Coefficients (MFCC)
2.2.3 Self-supervised Speech Model Features
2.3 Research on Low-resource Languages
2.3.1 Feature Extraction with Self-supervised Speech Models
2.3.2 Semi-supervised Training Methods
2.3.3 Data Augmentation
Chapter 3: Datasets and Task Description
3.1 Datasets
3.1.1 Taiwanese Datasets
3.1.1.1 TAT Dataset
3.1.1.2 TAI YouTube Dataset
3.1.2 English Dataset: LibriSpeech
3.1.3 Chinese Dataset: Common Voice Chinese
3.2 Task Description
3.2.1 Taiwanese Speech Translation Task
3.2.2 Normalization Methods
3.2.3 Evaluation Metrics
3.2.3.1 Character Error Rate (CER)
3.2.3.2 Bilingual Evaluation Understudy (BLEU)
Chapter 4: Methodology
4.1 End-to-end Taiwanese Speech Translation System
4.1.1 Feature Extraction with wav2vec 2.0
4.1.2 Conformer End-to-end Speech Translation Model
4.2 Fine-tuning Pretrained Speech Models
4.3 Semi-supervised Iterative Training
4.4 Corpus Cleaning System
4.4.1 Labeled Corpus Cleaning
4.4.2 Text Processing
4.4.2.1 Language Model Filter (LM filter)
4.4.2.2 Speaking Rate Filter (SR filter)
4.4.3 Audio Processing
4.4.3.1 Voice Activity Detection
4.4.3.2 Language Identification
Chapter 5: Experiments and Discussion
5.1 Experiment Roadmap and Settings
5.1.1 Experiment Roadmap
5.1.2 Experiment Settings
5.1.2.1 Training Hardware Environment
5.1.2.2 Experiment Parameter Settings
5.2 End-to-end Speech Translation Model
5.2.1 Model Training and Error Analysis
5.2.1.1 Error Analysis
5.2.1.2 Evaluation Metric Analysis
5.2.2 Downstream Model Comparison
5.2.3 Speech Feature Comparison
5.3 Self-supervised Pretrained Speech Models
5.3.1 Pretrained Speech Model Comparison
5.3.2 Fine-tuning the Taiwanese Pretrained Speech Model
5.4 Iterative Improvement with Semi-supervised Learning
5.4.1 Unlabeled Corpus Augmentation
5.5 Data Cleaning Pipeline Improvements
5.5.1 Labeled Corpus Cleaning
5.5.2 Data Cleaning Pipeline: Text Processing
5.5.3 Data Cleaning Pipeline: Audio Processing
5.6 Experiment Summary and Comparison
Chapter 6: Conclusion and Future Work
6.1 Conclusion
6.2 Future Work
References | - |
dc.language.iso | zh_TW | - |
dc.title | 通過半監督學習改進端到端台語至中文語音翻譯 | zh_TW |
dc.title | Improving End-to-end Taiwanese-to-Chinese Speech Translation by Semi-supervised Learning | en |
dc.type | Thesis | - |
dc.date.schoolyear | 111-2 | - |
dc.description.degree | 碩士 | - |
dc.contributor.oralexamcommittee | 王新民;廖元甫 | zh_TW |
dc.contributor.oralexamcommittee | Hsin-Min Wang;Yuan-Fu Liao | en |
dc.subject.keyword | 自動語音辨識,自監督式學習,端到端語音辨識,機器翻譯, | zh_TW |
dc.subject.keyword | Automatic speech recognition,Self-supervised learning,End-to-end speech recognition,Machine translation, | en |
dc.relation.page | 73 | - |
dc.identifier.doi | 10.6342/NTU202301825 | - |
dc.rights.note | Authorized (open access worldwide) | - |
dc.date.accepted | 2023-08-13 | - |
dc.contributor.author-college | 電機資訊學院 | - |
dc.contributor.author-dept | 資訊工程學系 | - |
Appears in Collections: | 資訊工程學系
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-111-2.pdf (available online after 2028-08-08) | 2.8 MB | Adobe PDF |
Unless otherwise indicated in their copyright terms, all items in this repository are protected by copyright, with all rights reserved.