Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/86959

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 李琳山 | zh_TW |
| dc.contributor.advisor | Lin-shan Lee | en |
| dc.contributor.author | 張致強 | zh_TW |
| dc.contributor.author | Chih-Chiang Chang | en |
| dc.date.accessioned | 2023-05-02T17:04:52Z | - |
| dc.date.available | 2023-11-09 | - |
| dc.date.copyright | 2023-05-02 | - |
| dc.date.issued | 2022 | - |
| dc.date.submitted | 2023-01-13 | - |
| dc.identifier.citation | D. M. Eberhard, G. F. Simons, and C. D. Fennig, Eds., Ethnologue: Languages of the World, 25th ed. Dallas, Texas: SIL International, 2022. [Online]. Available: http://www.ethnologue.com
F. Och. (2006) Statistical machine translation live. Google AI Blog. Google. [Online]. Available: https://ai.googleblog.com/2006/04/statistical-machine-translation-live.html Q. V. Le and M. Schuster. (2016) A neural network for machine translation, at production scale. Google AI Blog. Google. [Online]. Available: https://ai.googleblog.com/2016/09/a-neural-network-for-machine.html Y. Wu, M. Schuster, Z. Chen, Q. V. Le, M. Norouzi, W. Macherey, M. Krikun, Y. Cao, Q. Gao, K. Macherey et al., “Google’s neural machine translation system: Bridging the gap between human and machine translation,” ArXiv preprint, vol.abs/1609.08144, 2016. [Online]. Available: https://arxiv.org/abs/1609.08144 A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, “Attention is all you need,” in Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA, I. Guyon, U. von Luxburg, S. Bengio, H. M. Wallach, R. Fergus, S. V. N. Vishwanathan, and R. Garnett, Eds., 2017, pp. 5998–6008. [Online]. Available: https://proceedings.neurips.cc/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html I. Caswell and B. Liang. (2020) Recent advances in google translate. Google AI Blog. Google. [Online]. Available: https://ai.googleblog.com/2020/06/recent-advances-in-google-translate.html M. X. Chen, O. Firat, A. Bapna, M. Johnson, W. Macherey, G. Foster, L. Jones, M. Schuster, N. Shazeer, N. Parmar, A. Vaswani, J. Uszkoreit, L. Kaiser, Z. Chen, Y. Wu, and M. Hughes, “The best of both worlds: Combining recent advances in neural machine translation,” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Melbourne, Australia: Association for Computational Linguistics, 2018, pp. 76–86. [Online]. Available: https://aclanthology.org/P18-1008 R. Cattoni, M. A. D. Gangi, L. Bentivogli, M. Negri, and M. 
Turchi, “Must-c: A multilingual corpus for end-to-end speech translation,” Comput. Speech Lang., vol. 66, p. 101155, 2021. [Online]. Available: https://doi.org/10.1016/j.csl.2020.101155 M. Cettolo, J. Niehues, S. Stüker, L. Bentivogli, R. Cattoni, and M. Federico, “The IWSLT 2015 evaluation campaign,” in Proceedings of the 12th International Workshop on Spoken Language Translation: Evaluation Campaign, Da Nang, Vietnam, 2015. [Online]. Available: https://aclanthology.org/2015.iwslt-evaluation.1 E. Ansari, A. Axelrod, N. Bach, O. Bojar, R. Cattoni, F. Dalvi, N. Durrani, M. Federico, C. Federmann, J. Gu, F. Huang, K. Knight, X. Ma, A. Nagesh, M. Negri, J. Niehues, J. Pino, E. Salesky, X. Shi, S. Stüker, M. Turchi, A. Waibel, and C. Wang, “FINDINGS OF THE IWSLT 2020 EVALUATION CAMPAIGN,” in Proceedings of the 17th International Conference on Spoken Language Translation. Online: Association for Computational Linguistics, 2020, pp. 1–34. [Online]. Available: https://aclanthology.org/2020.iwslt-1.1 N. Arivazhagan and C. Cherry. (2021) Stabilizing live speech translation in google translate. Google AI Blog. Google. [Online]. Available: https://ai.googleblog.com/2021/01/stabilizing-live-speech-translation-in.html R. Al-Khanji, S. El-Shiyab, and R. Hussein, “On the use of compensatory strategies in simultaneous interpretation,” Meta: Journal des traducteurs/Meta: Translators’Journal, vol. 45, no. 3, pp. 548–557, 2000. B. Moser-Mercer, A. Künzli, and M. Korac, “Prolonged turns in interpreting: Effects on quality, physiological and psychological stress (pilot study),” Interpreting, vol. 3, no. 1, pp. 47–64, 1998. L. Dong and B. Xu, “CIF: continuous integrate-and-fire for end-to-end speech recognition,” in 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020. IEEE, 2020, pp. 6079–6083. [Online]. Available: https://doi.org/10.1109/ICASSP40776.2020.9054250 R. 
Salakhutdinov, “Deep learning,” in The 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, New York, NY, USA - August 24 - 27, 2014, S. A. Macskassy, C. Perlich, J. Leskovec, W. Wang, and R. Ghani, Eds. ACM, 2014, p. 1973. [Online]. Available: https://doi.org/10.1145/2623330.2630809 D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Y. Bengio and Y. LeCun, Eds., 2015. [Online]. Available: http://arxiv.org/abs/1412.6980 Y. LeCun, B. E. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. E. Hubbard, and L. D. Jackel, “Backpropagation applied to handwritten zip code recognition,” Neural Comput., vol. 1, no. 4, pp. 541–551, 1989. [Online]. Available: https://doi.org/10.1162/neco.1989.1.4.541 O. Abdel-Hamid, A. Mohamed, H. Jiang, and G. Penn, “Applying convolutional neural networks concepts to hybrid NN-HMM model for speech recognition,” in 2012 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2012, Kyoto, Japan, March 25-30, 2012. IEEE, 2012, pp. 4277–4280. [Online]. Available: https://doi.org/10.1109/ICASSP.2012.6288864 T. N. Sainath, A. Mohamed, B. Kingsbury, and B. Ramabhadran, “Deep convolutional neural networks for LVCSR,” in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2013, Vancouver, BC, Canada, May 26-31, 2013. IEEE, 2013, pp. 8614–8618. [Online]. Available: https://doi.org/10.1109/ICASSP.2013.6639347 D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” in 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, May 7-9, 2015, Conference Track Proceedings, Y. Bengio and Y. LeCun, Eds., 2015. [Online]. Available: http://arxiv.org/abs/1409.0473 T. Luong, H. Pham, and C. D. 
Manning, “Effective approaches to attention based neural machine translation,” in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Lisbon, Portugal: Association for Computational Linguistics, 2015, pp. 1412–1421. [Online]. Available: https://aclanthology.org/D15-1166 K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,”in 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016. IEEE Computer Society, 2016, pp.770–778. [Online]. Available: https://doi.org/10.1109/CVPR.2016.90 L. J. Ba, J. R. Kiros, and G. E. Hinton, “Layer normalization,” ArXiv preprint, vol.abs/1607.06450, 2016. [Online]. Available: https://arxiv.org/abs/1607.06450 C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in 2016 IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2016, Las Vegas, NV, USA, June 27-30, 2016. IEEE Computer Society, 2016, pp. 2818–2826. [Online]. Available: https://doi.org/10.1109/CVPR.2016.308 J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Minneapolis, Minnesota: Association for Computational Linguistics, 2019, pp.4171–4186. [Online]. Available: https://aclanthology.org/N19-1423 L. Dong, S. Xu, and B. Xu, “Speech-transformer: A no-recurrence sequence-to-sequence model for speech recognition,” in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2018, Calgary, AB, Canada, April 15-20, 2018. IEEE, 2018, pp. 5884–5888. [Online]. Available: https://doi.org/10.1109/ICASSP.2018.8462506 A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. 
Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby,“An image is worth 16x16 words: Transformers for image recognition at scale,”in 9th International Conference on Learning Representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021. OpenReview.net, 2021. [Online]. Available: https://openreview.net/forum?id=YicbFdNTTy A. Graves, S. Fernández, F. J. Gomez, and J. Schmidhuber, “Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks,” in Machine Learning, Proceedings of the Twenty-Third International Conference (ICML 2006), Pittsburgh, Pennsylvania, USA, June 25-29, 2006, ser. ACM International Conference Proceeding Series, W. W. Cohen and A. W. Moore, Eds., vol. 148. ACM, 2006, pp. 369–376. [Online]. Available: https://doi.org/10.1145/1143844.1143891 A. Bérard, O. Pietquin, L. Besacier, and C. Servan, “Listen and translate: A proof of concept for end-to-end speech-to-text translation,” in NIPS Workshop on end-to-end learning for speech and audio processing, 2016. M. Sperber and M. Paulik, “Speech translation and the end-to-end promise: Taking stock of where we are,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Online: Association for Computational Linguistics, 2020, pp. 7409–7421. [Online]. Available: https://aclanthology.org/2020.acl-main.661 R. J. Weiss, J. Chorowski, N. Jaitly, Y. Wu, and Z. Chen, “Sequence-to-sequence models can directly translate foreign speech,” in Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Stockholm, Sweden, August 20-24, 2017, F. Lacerda, Ed. ISCA, 2017, pp.2625–2629. [Online]. Available: http://www.isca-speech.org/archive/Interspeech_2017/abstracts/0503.html Y. Tang, J. M. Pino, C. Wang, X. Ma, and D. 
Genzel, “A general multi-task learning framework to leverage text data for speech to text tasks,” in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021. IEEE, 2021, pp. 6209–6213. [Online]. Available: https://doi.org/10.1109/ICASSP39728.2021.9415058 Y. Tang, J. Pino, X. Li, C. Wang, and D. Genzel, “Improving speech translation by understanding and learning from the auxiliary text translation task,” in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Online: Association for Computational Linguistics, 2021, pp.4252–4261. [Online]. Available: https://aclanthology.org/2021.acl-long.328 S. Bansal, H. Kamper, K. Livescu, A. Lopez, and S. Goldwater, “Pretraining on high-resource speech recognition improves low-resource speech-to-text translation,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Minneapolis, Minnesota: Association for Computational Linguistics, 2019, pp. 58–68. [Online]. Available: https://aclanthology.org/N19-1006 A. Anastasopoulos and D. Chiang, “Tied multitask learning for neural speech translation,” in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). New Orleans, Louisiana: Association for Computational Linguistics, 2018, pp. 82–91. [Online]. Available: https://aclanthology.org/N18-1008 H. Le, J. Pino, C. Wang, J. Gu, D. Schwab, and L. 
Besacier, “Lightweight adapter tuning for multilingual speech translation,” in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers). Online: Association for Computational Linguistics, 2021, pp. 817–824. [Online]. Available: https://aclanthology.org/2021.acl-short.103 X. Li, C. Wang, Y. Tang, C. Tran, Y. Tang, J. Pino, A. Baevski, A. Conneau, and M. Auli, “Multilingual speech translation from efficient finetuning of pretrained models,” in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers). Online: Association for Computational Linguistics, 2021, pp. 827–838. [Online]. Available: https://aclanthology.org/2021.acl-long.68 Y. Jia, M. Johnson, W. Macherey, R. J. Weiss, Y. Cao, C. Chiu, N. Ari, S. Laurenzo, and Y. Wu, “Leveraging weakly supervised data to improve end-to-end speech-to-text translation,” in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2019, Brighton, United Kingdom, May 12-17, 2019. IEEE, 2019, pp. 7180–7184. [Online]. Available: https://doi.org/10.1109/ICASSP.2019.8683343 J. Pino, L. Puzon, J. Gu, X. Ma, A. D. McCarthy, and D. Gopinath, “Harnessing indirect training data for end-to-end automatic speech translation: Tricks of the trade,” in Proceedings of the 16th International Conference on Spoken Language Translation. Hong Kong: Association for Computational Linguistics, 2019. [Online]. Available: https://aclanthology.org/2019.iwslt-1.18 J. M. Pino, Q. Xu, X. Ma, M. J. Dousti, and Y. Tang, “Self-training for end-to-end speech translation,” in Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020, H. Meng, B. Xu, and T. F. Zheng, Eds. ISCA, 2020, pp.1476–1480. 
[Online]. Available: https://doi.org/10.21437/Interspeech.2020-2938 G. E. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” ArXiv preprint, vol. abs/1503.02531, 2015. [Online]. Available: https://arxiv.org/abs/1503.02531 Y. Kim and A. M. Rush, “Sequence-level knowledge distillation,” in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Austin, Texas: Association for Computational Linguistics, 2016, pp. 1317–1327. [Online]. Available: https://aclanthology.org/D16-1139 Y. Liu, H. Xiong, J. Zhang, Z. He, H. Wu, H. Wang, and C. Zong, “End-to-end speech translation with knowledge distillation,” in Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019, G. Kubin and Z. Kacic, Eds. ISCA, 2019, pp. 1128–1132. [Online]. Available: https://doi.org/10.21437/Interspeech.2019-2582 M. Gaido, M. A. D. Gangi, M. Negri, and M. Turchi, “On knowledge distillation for direct speech translation,” in Proceedings of the Seventh Italian Conference on Computational Linguistics, CLiC-it 2020, Bologna, Italy, March 1-3, 2021, ser. CEUR Workshop Proceedings, J. Monti, F. Dell’Orletta, and F. Tamburini, Eds., vol. 2769. CEUR-WS.org, 2020. [Online]. Available: http://ceur-ws.org/Vol-2769/paper_28.pdf K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu, “Bleu: a method for automatic evaluation of machine translation,” in Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Philadelphia, Pennsylvania, USA: Association for Computational Linguistics, 2002, pp. 311–318. [Online]. Available: https://aclanthology.org/P02-1040 H. He, J. Boyd-Graber, and H. Daumé III, “Interpretese vs. translationese: The uniqueness of human strategies in simultaneous interpretation,” in Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 
San Diego, California: Association for Computational Linguistics, 2016, pp. 971–976. [Online]. Available: https://aclanthology.org/N16-1111 J. Gu, G. Neubig, K. Cho, and V. O. Li, “Learning to translate in real-time with neural machine translation,” in Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers. Valencia, Spain: Association for Computational Linguistics, 2017, pp. 1053–1062. [Online]. Available: https://aclanthology.org/E17-1099 K. Cho and M. Esipova, “Can neural machine translation do simultaneous translation?” ArXiv preprint, vol. abs/1606.02012, 2016. [Online]. Available: https://arxiv.org/abs/1606.02012 M. Ma, L. Huang, H. Xiong, R. Zheng, K. Liu, B. Zheng, C. Zhang, Z. He, H. Liu, X. Li, H. Wu, and H. Wang, “STACL: Simultaneous translation with implicit anticipation and controllable latency using prefix-to-prefix framework,”in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: Association for Computational Linguistics, 2019, pp. 3025–3036. [Online]. Available: https://aclanthology.org/P19-1289 C. Cherry and G. F. Foster, “Thinking slow about latency evaluation for simultaneous machine translation,” ArXiv preprint, vol. abs/1906.00048, 2019. [Online]. Available: https://arxiv.org/abs/1906.00048 N. Arivazhagan, C. Cherry, W. Macherey, C.-C. Chiu, S. Yavuz, R. Pang, W. Li, and C. Raffel, “Monotonic infinite lookback attention for simultaneous machine translation,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: Association for Computational Linguistics, 2019, pp. 1313–1323. [Online]. Available: https://aclanthology.org/P19-1126 A. Grissom II, H. He, J. Boyd-Graber, J. Morgan, and H. 
Daumé III, “Don’t until the final verb wait: Reinforcement learning for simultaneous machine translation,”in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Doha, Qatar: Association for Computational Linguistics, 2014, pp. 1342–1352. [Online]. Available: https://aclanthology.org/D14-1140 F. Dalvi, N. Durrani, H. Sajjad, and S. Vogel, “Incremental decoding and training methods for simultaneous translation in neural machine translation,” in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers). New Orleans, Louisiana: Association for Computational Linguistics, 2018, pp. 493–499. [Online]. Available: https://aclanthology.org/N18-2079 C. Raffel, M. Luong, P. J. Liu, R. J. Weiss, and D. Eck, “Online and linear-time attention by enforcing monotonic alignments,” in Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, ser. Proceedings of Machine Learning Research, D. Precup and Y. W. Teh, Eds., vol. 70. PMLR, 2017, pp. 2837–2846. [Online]. Available: http://proceedings.mlr.press/v70/raffel17a.html C. Chiu and C. Raffel, “Monotonic chunkwise attention,” in 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, 2018. [Online]. Available: https://openreview.net/forum?id=Hko85plCW X. Ma, J. M. Pino, J. Cross, L. Puzon, and J. Gu, “Monotonic multihead attention,”in 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net, 2020. [Online]. Available: https://openreview.net/forum?id=Hyg96gBKPS B. Zheng, R. Zheng, M. Ma, and L. 
Huang, “Simultaneous translation with flexible policy via restricted imitation learning,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Florence, Italy: Association for Computational Linguistics, 2019, pp. 5816–5822. [Online]. Available: https://aclanthology.org/P19-1582 B. Zheng, R. Zheng, M. Ma, and L. Huang, “Simpler and faster learning of adaptive policies for simultaneous translation,” in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Hong Kong, China: Association for Computational Linguistics, 2019, pp. 1349–1354. [Online]. Available: https://aclanthology.org/D19-1137 B. Zheng, K. Liu, R. Zheng, M. Ma, H. Liu, and L. Huang, “Simultaneous translation policies: From fixed to adaptive,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Online: Association for Computational Linguistics, 2020, pp. 2847–2853. [Online]. Available: https://aclanthology.org/2020.acl-main.254 Y. Miao, P. Blunsom, and L. Specia, “A generative framework for simultaneous machine translation,” in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Online and Punta Cana, Dominican Republic: Association for Computational Linguistics, 2021, pp. 6697–6706. [Online]. Available: https://aclanthology.org/2021.emnlp-main.536 L. Yu, J. Buys, and P. Blunsom, “Online segment to segment neural transduction,”in Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing. Austin, Texas: Association for Computational Linguistics, 2016, pp.1307–1316. [Online]. Available: https://aclanthology.org/D16-1138 L. Huang, C. Cherry, M. Ma, N. Arivazhagan, and Z. He, “Simultaneous translation,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: Tutorial Abstracts. 
Online: Association for Computational Linguistics, 2020, pp. 34–36. [Online]. Available: https://aclanthology.org/2020.emnlp-tutorials.6 H. He, A. Grissom II, J. Morgan, J. Boyd-Graber, and H. Daumé III, “Syntax-based rewriting for simultaneous machine translation,” in Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. Lisbon, Portugal: Association for Computational Linguistics, 2015, pp. 55–64. [Online]. Available: https://aclanthology.org/D15-1006 R. Zhang, C. Zhang, Z. He, H. Wu, and H. Wang, “Learning adaptive segmentation policy for simultaneous translation,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). Online: Association for Computational Linguistics, 2020, pp. 2280–2289. [Online]. Available: https://aclanthology.org/2020.emnlp-main.178 J. Chen, R. Zheng, A. Kita, M. Ma, and L. Huang, “Improving simultaneous translation by incorporating pseudo-references with fewer reorderings,” in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Online and Punta Cana, Dominican Republic: Association for Computational Linguistics, 2021, pp. 5857–5864. [Online]. Available: https://aclanthology.org/2021.emnlp-main.473 Y. Ren, J. Liu, X. Tan, C. Zhang, T. Qin, Z. Zhao, and T.-Y. Liu, “SimulSpeech: End-to-end simultaneous speech to text translation,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Online: Association for Computational Linguistics, 2020, pp. 3787–3796. [Online]. Available: https://aclanthology.org/2020.acl-main.350 X. Ma, J. Pino, and P. Koehn, “SimulMT to SimulST: Adapting simultaneous text translation to end-to-end simultaneous speech translation,” in Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing. 
Suzhou, China: Association for Computational Linguistics, 2020, pp. 582–587. [Online]. Available: https://aclanthology.org/2020.aacl-main.58 X. Ma, Y. Wang, M. J. Dousti, P. Koehn, and J. M. Pino, “Streaming simultaneous speech translation with augmented memory transformer,” in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021. IEEE, 2021, pp. 7523–7527. [Online]. Available: https://doi.org/10.1109/ICASSP39728.2021.9414897 L. Dong, F. Wang, and B. Xu, “Self-attention aligner: A latency-control end-to-end model for ASR using self-attention network and chunk-hopping,” in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2019, Brighton, United Kingdom, May 12-17, 2019. IEEE, 2019, pp. 5656–5660. [Online]. Available: https://doi.org/10.1109/ICASSP.2019.8682954 E. Tsunoo, Y. Kashiwagi, T. Kumakura, and S. Watanabe, “Transformer ASR with contextual block processing,” in IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019, Singapore, December 14-18, 2019. IEEE, 2019, pp. 427–433. [Online]. Available: https://doi.org/10.1109/ASRU46091.2019.9003749 Z. Tian, J. Yi, Y. Bai, J. Tao, S. Zhang, and Z. Wen, “Synchronous transformers for end-to-end speech recognition,” in 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2020, Barcelona, Spain, May 4-8, 2020. IEEE, 2020, pp. 7884–7888. [Online]. Available: https://doi.org/10.1109/ICASSP40776.2020.9054260 C. Wu, Y. Wang, Y. Shi, C. Yeh, and F. Zhang, “Streaming transformer-based acoustic models using self-attention with augmented memory,” in Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020, H. Meng, B. Xu, and T. F. Zheng, Eds. ISCA, 2020, pp. 2132–2136. [Online]. Available: https://doi.org/10.21437/Interspeech.2020-2079 Y. Shi, Y. Wang, C. Wu, C. Yeh, J. Chan, F. Zhang, D. 
Le, and M. Seltzer,“Emformer: Efficient memory transformer based acoustic model for low latency streaming speech recognition,” in IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2021, Toronto, ON, Canada, June 6-11, 2021. IEEE, 2021, pp. 6783–6787. [Online]. Available: https://doi.org/10.1109/ICASSP39728.2021.9414560 J. Chen, M. Ma, R. Zheng, and L. Huang, “Direct simultaneous speech-to-text translation assisted by synchronized streaming ASR,” in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. Online: Association for Computational Linguistics, 2021, pp. 4618–4624. [Online]. Available: https://aclanthology.org/2021.findings-acl.406 X. Zeng, L. Li, and Q. Liu, “RealTranS: End-to-end simultaneous speech translation with convolutional weighted-shrinking transformer,” in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. Online: Association for Computational Linguistics, 2021, pp. 2461–2474. [Online]. Available: https://aclanthology.org/2021.findings-acl.218 H. Nguyen, Y. Estève, and L. Besacier, “Impact of encoding and segmentation strategies on end-to-end simultaneous speech translation,” in Interspeech 2021, 22nd Annual Conference of the International Speech Communication Association, Brno, Czechia, 30 August - 3 September 2021, H. Hermansky, H. Cernocký, L. Burget, L. Lamel, O. Scharenborg, and P. Motlícek, Eds. ISCA, 2021, pp.2371–2375. [Online]. Available: https://doi.org/10.21437/Interspeech.2021-608 Q. Dong, Y. Zhu, M. Wang, and L. Li, “Learning when to translate for streaming speech,” in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2022, Dublin, Ireland, May 22-27, 2022, S. Muresan, P. Nakov, and A. Villavicencio, Eds. Association for Computational Linguistics, 2022, pp. 680–694. [Online]. Available: https://aclanthology.org/2022.acl-long.50 A. Baevski, Y. Zhou, A. Mohamed, and M. 
Auli, “wav2vec 2.0: A framework for self-supervised learning of speech representations,” in Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 2020, virtual, H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, Eds., 2020. [Online]. Available: https://proceedings.neurips.cc/paper/2020/hash/92d1e1eb1cd6f9fba3227870bb6d7f07-Abstract.html D. Liu, M. Du, X. Li, Y. Li, and E. Chen, “Cross attention augmented transducer networks for simultaneous translation,” in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. Online and Punta Cana, Dominican Republic: Association for Computational Linguistics, 2021, pp. 39–55. [Online]. Available: https://aclanthology.org/2021.emnlp-main.4 A. Graves, “Sequence transduction with recurrent neural networks,” ArXiv preprint, vol. abs/1211.3711, 2012. [Online]. Available: http://arxiv.org/abs/1211.3711 M. Müller, A. Rios, and R. Sennrich, “Domain robustness in neural machine translation,” in Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Track). Virtual: Association for Machine Translation in the Americas, 2020, pp. 151–164. [Online]. Available: https://aclanthology.org/2020.amta-research.14 C.-C. Chang, S.-P. Chuang, and H.-y. Lee, “Anticipation-free training for simultaneous machine translation,” in Proceedings of the 19th International Conference on Spoken Language Translation (IWSLT 2022). Dublin, Ireland (in-person and online): Association for Computational Linguistics, 2022, pp.43–61. [Online]. Available: https://aclanthology.org/2022.iwslt-1.5 G. E. Mena, D. Belanger, S. W. Linderman, and J. Snoek, “Learning latent permutations with gumbel-sinkhorn networks,” in 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 -May 3, 2018, Conference Track Proceedings. OpenReview.net, 2018. 
[Online]. Available: https://openreview.net/forum?id=Byt3oJ-0W K. Chousa, K. Sudoh, and S. Nakamura, “Simultaneous neural machine translation using connectionist temporal classification,” ArXiv preprint, vol. abs/1911.11933, 2019. [Online]. Available: https://arxiv.org/abs/1911.11933 R. P. Adams and R. S. Zemel, “Ranking via sinkhorn propagation,” ArXiv preprint, vol. abs/1106.1925, 2011. [Online]. Available: http://arxiv.org/abs/1106.1925 R. Sinkhorn, “A relationship between arbitrary positive matrices and doubly stochastic matrices,” The annals of mathematical statistics, vol. 35, no. 2, pp. 876–879, 1964. D. P. Kingma and M. Welling, “Auto-encoding variational bayes,” in 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, April 14-16, 2014, Conference Track Proceedings, Y. Bengio and Y. LeCun, Eds., 2014. [Online]. Available: http://arxiv.org/abs/1312.6114 E. Jang, S. Gu, and B. Poole, “Categorical reparameterization with gumbelsoftmax,”in 5th International Conference on Learning Representations, ICLR 2017, Toulon, France, April 24-26, 2017, Conference Track Proceedings. OpenReview.net, 2017. [Online]. Available: https://openreview.net/forum?id=rkE3y85ee Q. Ran, Y. Lin, P. Li, and J. Zhou, “Guiding non-autoregressive neural machine translation decoding with reordering information,” in Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Thirty-Third Conference on Innovative Applications of Artificial Intelligence, IAAI 2021, The Eleventh Symposium on Educational Advances in Artificial Intelligence, EAAI 2021, Virtual Event, February 2-9, 2021. AAAI Press, 2021, pp. 13 727–13 735. [Online]. Available: https://ojs.aaai.org/index.php/AAAI/article/view/17618 J. Devlin, M.-W. Chang, K. Lee, and K. 
Toutanova, “BERT: Pre-training of deep bidirectional transformers for language understanding,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Minneapolis, Minnesota: Association for Computational Linguistics, 2019, pp. 4171–4186. [Online]. Available: https://aclanthology.org/N19-1423 J. Libovický and J. Helcl, “End-to-end non-autoregressive neural machine translation with connectionist temporal classification,” in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium: Association for Computational Linguistics, 2018, pp. 3016–3021. [Online]. Available: https://aclanthology.org/D18-1336 J. Chen and J. Zhang, Machine Translation: 14th China Workshop, CWMT 2018, Wuyishan, China, October 25-26, 2018, Proceedings. Springer, 2019, vol. 954. C. Callison-Burch, P. Koehn, C. Monz, and J. Schroeder, “Findings of the 2009 Workshop on Statistical Machine Translation,” in Proceedings of the Fourth Workshop on Statistical Machine Translation. Athens, Greece: Association for Computational Linguistics, 2009, pp. 1–28. [Online]. Available: https://aclanthology.org/W09-0401 M. Elbayad, L. Besacier, and J. Verbeek, “Efficient wait-k models for simultaneous machine translation,” in Interspeech 2020, 21st Annual Conference of the International Speech Communication Association, Virtual Event, Shanghai, China, 25-29 October 2020, H. Meng, B. Xu, and T. F. Zheng, Eds. ISCA, 2020, pp. 1461–1465. [Online]. Available: https://doi.org/10.21437/Interspeech.2020-1241 J. Gu, J. Bradbury, C. Xiong, V. O. K. Li, and R. Socher, “Non-autoregressive neural machine translation,” in 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings. OpenReview.net, 2018. [Online]. Available: https://openreview.net/forum?id=B1l8BtlCb C. Zhou, J. Gu, and G. Neubig, “Understanding knowledge distillation in non-autoregressive machine translation,” in 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net, 2020. [Online]. Available: https://openreview.net/forum?id=BygFVAEKDH L. Biewald, “Experiment tracking with weights and biases,” 2020, software available from wandb.com. [Online]. Available: https://www.wandb.com/ X. Ma, M. J. Dousti, C. Wang, J. Gu, and J. Pino, “SIMULEVAL: An evaluation toolkit for simultaneous translation,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Online: Association for Computational Linguistics, 2020, pp. 144–150. [Online]. Available: https://aclanthology.org/2020.emnlp-demos.19 M. Post, “A call for clarity in reporting BLEU scores,” in Proceedings of the Third Conference on Machine Translation: Research Papers. Brussels, Belgium: Association for Computational Linguistics, 2018, pp. 186–191. [Online]. Available: https://aclanthology.org/W18-6319 P. Koehn, “Statistical significance tests for machine translation evaluation,” in Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing. Barcelona, Spain: Association for Computational Linguistics, 2004, pp. 388–395. [Online]. Available: https://aclanthology.org/W04-3250 Z.-Y. Dou and G. Neubig, “Word alignment by fine-tuning embeddings on parallel corpora,” in Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume.
Online: Association for Computational Linguistics, 2021, pp. 2112–2128. [Online]. Available: https://aclanthology.org/2021.eacl-main.181 W. Maass, “Networks of spiking neurons: the third generation of neural network models,” Neural networks, vol. 10, no. 9, pp. 1659–1671, 1997. L. Dong, C. Yi, J. Wang, S. Zhou, S. Xu, X. Jia, and B. Xu, “A comparison of label-synchronous and frame-synchronous end-to-end models for speech recognition,” ArXiv preprint, vol. abs/2005.10113, 2020. [Online]. Available: https://arxiv.org/abs/2005.10113 C. Yi, S. Zhou, and B. Xu, “Efficiently fusing pretrained acoustic and linguistic encoders for low-resource speech recognition,” IEEE Signal Processing Letters, vol. 28, pp. 788–792, 2021. C.-C. Chang and H.-y. Lee, “Exploring continuous integrate-and-fire for adaptive simultaneous speech translation,” Interspeech, 2022, to be published. C. Wang, Y. Tang, X. Ma, A. Wu, D. Okhonko, and J. Pino, “Fairseq S2T: Fast speech-to-text modeling with fairseq,” in Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: System Demonstrations. Suzhou, China: Association for Computational Linguistics, 2020, pp. 33–39. [Online]. Available: https://aclanthology.org/2020.aacl-demo.6 Y. N. Dauphin, A. Fan, M. Auli, and D. Grangier, “Language modeling with gated convolutional networks,” in Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017, ser. Proceedings of Machine Learning Research, D. Precup and Y. W. Teh, Eds., vol. 70. PMLR, 2017, pp. 933–941. [Online]. Available: http://proceedings.mlr.press/v70/dauphin17a.html A. Mohamed, D. Okhonko, and L. Zettlemoyer, “Transformers with convolutional context for asr,” ArXiv preprint, vol. abs/1904.11660, 2019. [Online]. Available: https://arxiv.org/abs/1904.11660 S.-P. Chuang, Y.-S. Chuang, C.-C. Chang, and H.-y. 
Lee, “Investigating the reordering capability in CTC-based non-autoregressive end-to-end speech translation,” in Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. Online: Association for Computational Linguistics, 2021, pp. 1068–1077. [Online]. Available: https://aclanthology.org/2021.findings-acl.92 E. Ansari, A. Axelrod, N. Bach, O. Bojar, R. Cattoni, F. Dalvi, N. Durrani, M. Federico, C. Federmann, J. Gu, F. Huang, K. Knight, X. Ma, A. Nagesh, M. Negri, J. Niehues, J. Pino, E. Salesky, X. Shi, S. Stüker, M. Turchi, A. Waibel, and C. Wang, “FINDINGS OF THE IWSLT 2020 EVALUATION CAMPAIGN,” in Proceedings of the 17th International Conference on Spoken Language Translation. Online: Association for Computational Linguistics, 2020, pp. 1–34. [Online]. Available: https://aclanthology.org/2020.iwslt-1.1 D. S. Park, W. Chan, Y. Zhang, C. Chiu, B. Zoph, E. D. Cubuk, and Q. V. Le, “Specaugment: A simple data augmentation method for automatic speech recognition,” in Interspeech 2019, 20th Annual Conference of the International Speech Communication Association, Graz, Austria, 15-19 September 2019, G. Kubin and Z. Kacic, Eds. ISCA, 2019, pp. 2613–2617. [Online]. Available: https://doi.org/10.21437/Interspeech.2019-2680 T. Kudo and J. Richardson, “SentencePiece: A simple and language independent subword tokenizer and detokenizer for neural text processing,” in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. Brussels, Belgium: Association for Computational Linguistics, 2018, pp. 66–71. [Online]. Available: https://aclanthology.org/D18-2012 T. Kudo, “Subword regularization: Improving neural network translation models with multiple subword candidates,” in Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Melbourne, Australia: Association for Computational Linguistics, 2018, pp. 66–75. [Online].
Available: https://aclanthology.org/P18-1007 D. Hendrycks and K. Gimpel, “Gaussian error linear units (GELUs),” ArXiv preprint, vol. abs/1606.08415, 2016. [Online]. Available: https://arxiv.org/abs/1606.08415 S. Kim, T. Hori, and S. Watanabe, “Joint CTC-attention based end-to-end speech recognition using multi-task learning,” in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP 2017, New Orleans, LA, USA, March 5-9, 2017. IEEE, 2017, pp. 4835–4839. [Online]. Available: https://doi.org/10.1109/ICASSP.2017.7953075 C. Chiu, A. Kannan, R. Prabhavalkar, Z. Chen, T. N. Sainath, Y. Wu, W. Han, Y. Zhang, R. Pang, S. Kishchenko, P. Nguyen, A. Narayanan, H. Liao, and S. Zhang, “A comparison of end-to-end models for long-form speech recognition,” in IEEE Automatic Speech Recognition and Understanding Workshop, ASRU 2019, Singapore, December 14-18, 2019. IEEE, 2019, pp. 889–896. [Online]. Available: https://doi.org/10.1109/ASRU46091.2019.9003854 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/86959 | - |
| dc.description.abstract | 同步自動語音翻譯(Simultaneous Speech Translation)是一個以機器達成串流語音翻譯的任務;翻譯系統需要在來源語者說話的同時進行翻譯,因此需要有能力翻譯不完整的輸入、及決定讀取寫出的策略。同步自動語音翻譯講求較短延遲與較佳翻譯品質之間的折衝,亦即要能在低延遲下維持好的翻譯品質。此外,由於現實中的語音輸入是連續的,因此模型能否泛化到能處理長語句也顯得重要。在同步自動語音翻譯的做法中,同步機器翻譯(Simultaneous Machine Translation)是預設存在一串流語音辨識(Streaming Speech Recognition)系統,因此其輸入為文字;端到端同步自動語音翻譯則直接以語音訊號作為輸入。
在同步機器翻譯上,普遍使用一般翻譯資料集來訓練同步機器翻譯模型;但可能因重排序(Reordering)的問題,造成錯誤的學習目標,或非必要的提高延遲。既有的做法常將參考譯文改寫為單調翻譯(Monotonic Translation)來讓機器學習。相對地,本論文則將翻譯切開為「單調翻譯」與「重排序」兩個模型,並在推論階段只保留單調翻譯的部分,以達到同步機器翻譯。透過實驗發現,本論文所提出之作法可以提升英中翻譯在低延遲下的翻譯品質。在端到端同步自動語音翻譯方面,本論文透過實驗發現,既有的基於單調多頭專注(Monotonic Multihead Attention)機制的作法無法泛化到長語句之語音輸入,因而提出基於連續整合發放(Continuous Integrate-and-fire)機制的作法,並由實驗證實泛化到長語句的能力較佳,而且也在低延遲下可超越基於單調多頭專注機制的作法的翻譯品質。綜上所述,本論文提出之方法可以提升同步自動語音翻譯在低延遲或長語句的翻譯品質,使得同步自動語音翻譯更貼近現實應用。 | zh_TW |
| dc.description.abstract | Simultaneous speech translation (SimulST) is the task of translating streaming speech with machines. The translation system has to translate while the source speaker is still speaking, so it must be able to translate incomplete input and to determine a read-write policy. SimulST strives for a better trade-off between latency and translation quality: it is essential to maintain good quality at low latency. Additionally, since real-world speech signals are continuous, it is also crucial that the model generalize to long utterances. Among SimulST approaches, simultaneous machine translation (SimulMT) assumes the existence of a streaming speech recognition system and thus takes text as input, whereas end-to-end SimulST takes speech signals as input directly.
It is common to train SimulMT systems on typical translation datasets. However, due to the reordering between languages, this can cause incorrect learning objectives or unnecessarily high latency. Existing approaches rewrite the reference translation to be monotonic. In contrast, in this thesis the translation process was divided into a monotonic translation part and a reordering part, and only the monotonic translation part was kept during inference to achieve SimulMT. The experiments showed that the proposed approach improved English-to-Chinese translation quality at low latency. For end-to-end SimulST, the continuous integrate-and-fire (CIF) mechanism was adapted to the SimulST task. The experiments showed that existing approaches based on monotonic multihead attention (MMA) failed to generalize to long utterances, while the proposed CIF-based approach generalized better; it also outperformed the MMA-based approaches at low latency. In summary, this thesis proposed methods that improve translation quality at low latency and on long utterances, bringing SimulST closer to practical applications. | en |
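The continuous integrate-and-fire mechanism adapted in this thesis can be sketched roughly as follows. This is an illustrative sketch only, not the thesis's implementation: the function `cif`, its inputs `encoder_states` and `alphas`, and the toy values below are all hypothetical. In a real model the per-frame weights (alphas) are predicted from the encoder states; here they are given directly. A firing is emitted each time the accumulated weight crosses a threshold, which is what makes the read-write policy adaptive.

```python
def cif(encoder_states, alphas, threshold=1.0):
    """Continuous integrate-and-fire (sketch): accumulate weighted encoder
    frames and emit ("fire") an integrated vector whenever the running
    weight crosses the threshold. Assumes each alpha < threshold."""
    fired = []
    acc_w = 0.0                             # accumulated weight
    acc_h = [0.0] * len(encoder_states[0])  # accumulated (weighted) state
    for h, a in zip(encoder_states, alphas):
        if acc_w + a < threshold:
            # keep integrating this frame
            acc_w += a
            acc_h = [x + a * y for x, y in zip(acc_h, h)]
        else:
            # split the weight: `used` completes the current firing,
            # the remainder starts accumulating the next output unit
            used = threshold - acc_w
            fired.append([x + used * y for x, y in zip(acc_h, h)])
            acc_w = a - used
            acc_h = [acc_w * y for y in h]
    return fired

# toy example: 6 identical frames with weight 0.4 each -> crosses 1.0 twice
frames = [[1.0, 1.0]] * 6
out = cif(frames, [0.4] * 6)
print(len(out))  # 2
```

Each fired vector here integrates exactly one unit of weight, so with all-ones frames every emitted component is approximately 1.0; a leftover accumulation below the threshold (the final 0.4 here) is never fired, mirroring how a streaming system waits for more input.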
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-05-02T17:04:52Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2023-05-02T17:04:52Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | 口試委員會審定書 i
誌謝 ii
中文摘要 iii
英文摘要 iv
一、導論 1
1.1 研究動機 1
1.2 研究方向 3
1.3 主要貢獻 4
1.4 章節安排 4
二、背景知識 5
2.1 深層類神經網路 5
2.1.1 簡介 5
2.1.2 卷積式類神經網路 8
2.1.3 遞迴式類神經網路 9
2.1.4 序列至序列模型 9
2.1.5 專注機制 10
2.1.6 轉換器 12
2.1.7 鏈結式時序分類器 16
2.2 自動語音翻譯 19
2.2.1 簡介 19
2.2.2 多任務學習 20
2.2.3 預訓練模型 20
2.2.4 資料增強 20
2.2.5 知識蒸餾 21
2.2.6 評量方法 22
2.3 同步自動語音翻譯 24
2.3.1 緣起:同步口譯 24
2.3.2 同步自動語音翻譯簡介 25
2.3.3 評量方法 26
2.4 本章總結 28
三、同步自動語音翻譯之相關研究 29
3.1 同步機器翻譯之相關研究 29
3.1.1 早期 29
3.1.2 固定性策略前綴至前綴模型 30
3.1.3 調適性策略前綴至前綴模型 31
3.1.4 減少長距離重排序與預期 33
3.2 端到端同步自動語音翻譯之相關研究 35
3.2.1 編碼機制與預決策 36
3.2.2 調適性策略 39
3.3 本章總結 39
四、消除預期影響以改善低延遲同步機器翻譯 40
4.1 簡介 40
4.1.1 重排序之影響與解法 40
4.1.2 重排學習 42
4.2 本論文所提出之框架 44
4.3 模型架構 46
4.3.1 因果式編碼器 46
4.3.2 輔助排序網路 46
4.3.3 長度投射層 47
4.3.4 推論階段 48
4.4 實驗設定 48
4.4.1 語料庫 48
4.4.2 實作細節 49
4.4.3 比較基準 52
4.5 與基準模型之比較實驗 53
4.6 模型輸出分析實驗 57
4.6.1 模型輸出文字之比較 57
4.6.2 輔助排序網路之輸出 59
4.7 模型組成元件之重要性探討 61
4.7.1 甘氏-辛氏運算子 61
4.7.2 參數初始化 62
4.8 其他討論 63
4.9 本章總結 65
五、基於連續整合發放機制之低延遲與長語句同步自動語音翻譯 66
5.1 連續整合發放機制簡介 66
5.1.1 簡史 66
5.1.2 運作模式 67
5.2 模型架構 68
5.2.1 區塊化處理語音編碼器 68
5.2.2 連續整合發放機制 70
5.2.3 解碼器 70
5.3 訓練方式 71
5.3.1 鏈結式時序分類器減損函數 71
5.3.2 數量減損函數 72
5.3.3 延遲減損函數 73
5.3.4 總減損函數 74
5.4 實驗設定 74
5.4.1 語料庫 74
5.4.2 比較基準 75
5.4.3 實作細節 75
5.5 與基準模型之比較實驗 79
5.6 長語句之比較實驗 81
5.7 模型輸出分析實驗 82
5.8 連續整合發放機制之平行化演算法 83
5.9 本章總結 87
六、結論與展望 88
6.1 研究貢獻與討論 88
6.2 未來展望 89
參考文獻 90 | - |
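The auxiliary reordering network (Section 4.3.2) relates to the Gumbel-Sinkhorn operator listed in Section 4.7.1, which builds on Sinkhorn normalization (Sinkhorn, 1964, cited in the references above): repeatedly normalizing the rows and columns of a positive matrix drives it toward a doubly stochastic matrix, a differentiable relaxation of a permutation that can model reordering. The sketch below shows only this plain Sinkhorn iteration, not the thesis's actual operator; the function name, iteration count, and toy matrix are illustrative assumptions.

```python
def sinkhorn_normalize(M, iters=50):
    """Alternately normalize rows and columns of a positive square matrix
    so it approaches a doubly stochastic matrix (rows and columns sum to 1)."""
    A = [row[:] for row in M]  # work on a copy
    n = len(A)
    for _ in range(iters):
        for i in range(n):                     # row normalization
            s = sum(A[i])
            A[i] = [v / s for v in A[i]]
        for j in range(n):                     # column normalization
            s = sum(A[i][j] for i in range(n))
            for i in range(n):
                A[i][j] /= s
    return A

S = sinkhorn_normalize([[2.0, 1.0], [1.0, 3.0]])
print([round(sum(row), 6) for row in S])  # each row sums to ~1.0
```

Because the result stays close to a permutation matrix when the input logits are sharp, such a relaxed permutation can be applied during training to account for reordering and then dropped at inference, which matches the abstract's strategy of keeping only the monotonic part.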
| dc.language.iso | zh_TW | - |
| dc.subject | 端到端 | zh_TW |
| dc.subject | 語音翻譯 | zh_TW |
| dc.subject | 同步翻譯 | zh_TW |
| dc.subject | 重排序 | zh_TW |
| dc.subject | 連續整合發放 | zh_TW |
| dc.subject | Simultaneous Translation | en |
| dc.subject | Speech translation | en |
| dc.subject | End-to-end | en |
| dc.subject | Continuous Integrate-and-fire | en |
| dc.subject | Reordering | en |
| dc.title | 邁向低延遲與長語句之同步自動語音翻譯 | zh_TW |
| dc.title | Towards Simultaneous Speech Translation for Long Utterances at Low Latency | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 111-1 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 李宏毅;鄭秋豫;王小川;陳信宏;簡仁宗 | zh_TW |
| dc.contributor.oralexamcommittee | Hung-yi Lee;Chiu-yu Tseng;Hsiao-Chuan Wang;Sin-Horng Chen;Jen-Tzung Chien | en |
| dc.subject.keyword | 語音翻譯,同步翻譯,重排序,連續整合發放,端到端 | zh_TW |
| dc.subject.keyword | Speech translation,Simultaneous Translation,Reordering,Continuous Integrate-and-fire,End-to-end | en |
| dc.relation.page | 115 | - |
| dc.identifier.doi | 10.6342/NTU202300075 | - |
| dc.rights.note | 同意授權(全球公開) | - |
| dc.date.accepted | 2023-01-16 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 資訊工程學系 | - |
| 顯示於系所單位: | 資訊工程學系 |
文件中的檔案:
| 檔案 | 大小 | 格式 |
|---|---|---|
| ntu-111-1.pdf | 12.28 MB | Adobe PDF |
系統中的文件,除了特別指名其著作權條款之外,均受到著作權保護,並且保留所有的權利。
