Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/19454

Full metadata record

| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 丁肇隆(Chao-Lung Ting) | |
| dc.contributor.author | Pei-Chun Lin | en |
| dc.contributor.author | 林佩君 | zh_TW |
| dc.date.accessioned | 2021-06-08T01:59:48Z | - |
| dc.date.copyright | 2020-08-21 | |
| dc.date.issued | 2020 | |
| dc.date.submitted | 2020-08-18 | |
| dc.identifier.citation | 1. Ou, G. and Y.L. Murphey, Multi-class pattern classification using neural networks. Pattern Recognition, 2007. 40(1): p. 4-18. 2. Read, J., et al., Classifier chains for multi-label classification. Machine Learning, 2011. 85(3): p. 333. 3. Tsoumakas, G. and I. Katakis, Multi-label classification: An overview. International Journal of Data Warehousing and Mining, 2007. 3(3): p. 1-13. 4. Deng, L., The MNIST database of handwritten digit images for machine learning research [best of the web]. IEEE Signal Processing Magazine, 2012. 29(6): p. 141-142. 5. Manning, C.D. and H. Schütze, Foundations of statistical natural language processing. 1999: MIT Press. 6. Bakshi, R.K., et al. Opinion mining and sentiment analysis. in 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom). 2016. IEEE. 7. Husby, S. and D. Barbosa. Topic classification of blog posts using distant supervision. in Proceedings of the Workshop on Semantic Analysis in Social Media. 2012. 8. Chen, Q.-h., et al., Research on Chinese micro-blog sentiment classification based on recurrent neural network. 2017(cst). 9. Ye, F., Sentiment Classification for Chinese Micro-blog Based on the Extension of Network Terms Feature, in Advances in Computer and Computational Sciences. 2018, Springer. p. 231-241. 10. Ma, W.-Y. and K.-J. Chen. Introduction to CKIP Chinese word segmentation system for the first international Chinese Word Segmentation Bakeoff. in Proceedings of the second SIGHAN workshop on Chinese language processing. 2003. 11. Srivastava, N., Improving neural networks with dropout. University of Toronto, 2013. 182(566): p. 7. 12. Zhou, P., et al., Text classification improved by integrating bidirectional LSTM with two-dimensional max pooling. 2016. 13. Kingma, D.P. and J. Ba, Adam: A method for stochastic optimization. arXiv preprint, 2014. 14. 邱建晴, 以卷積神經網路分析部落格社群網站垃圾文章. 2016: p. 1-68. 15. Kim, Y., Convolutional neural networks for sentence classification. arXiv preprint, 2014. 16. Hochreiter, S. and J. Schmidhuber, Long short-term memory. Neural Computation, 1997. 9(8): p. 1735-1780. 17. Kalchbrenner, N., E. Grefenstette, and P. Blunsom, A convolutional neural network for modelling sentences. arXiv preprint, 2014. 18. Yao, Y. and Z. Huang. Bi-directional LSTM recurrent neural network for Chinese word segmentation. in International Conference on Neural Information Processing. 2016. Springer. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/19454 | - |
| dc.description.abstract | 由於網路每天有巨量文章產出,所以正確的文章分類,可以加速讀者在閱讀搜尋上的效率。據痞客邦網站的統計,有近50%的部落格文章未勾選文章所屬類別。本論文提出一自定義損失函數,協助提高這類的文章來進行正確的主題分類。經過本論文所提出之分類系統,可協助痞客邦系統後台自動得知該文章之主題分類 。 文章分別以Jieba斷詞系統及CKIP斷詞系統進行斷詞,實驗結果發現使用Jieba斷詞系統之分類正確率為92.60%,而使用CKIP斷詞系統之正確率為93.35%,顯示繁體中文文章在分類分析時,CKIP斷詞系統為輸入文章斷詞之首選。 斷詞後的文章經過預先訓練的詞向量進行編碼,編碼後輸入長短期記憶模型或卷積神經網路進行訓練。訓練時使用自定義之損失函數,其結果之正確率為93.35%,比傳統使用之損失函數之正確率92.98%有更好的成效。顯示本論文提出之自定義損失函數,可協助部落格文章進行更準確之分類。 | zh_TW |
| dc.description.abstract | A huge number of articles are published on the Internet every day, so well-organized article labels help readers read and search more efficiently. However, according to statistics from the Pixnet website, nearly 50% of blog posts are not labeled by their authors. To address this problem, this thesis proposes a custom loss function for an automatic article labeling system that runs in the website back end and assigns topic labels to unlabeled articles. Articles are segmented with either the Jieba or the CKIP word segmentation system. In our experiments, classification accuracy is 92.60% with Jieba and 93.35% with CKIP, which suggests that CKIP is the preferred segmenter for classifying Traditional Chinese articles. After segmentation, the articles are encoded with pre-trained word vectors and fed into Long Short-Term Memory (LSTM) models or Convolutional Neural Networks (CNNs) for training. Training with our custom loss function yields an accuracy of 93.35%, compared with 92.98% for the categorical_crossentropy loss function. These results indicate that the proposed custom loss function helps classify blog articles automatically and more accurately. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-08T01:59:48Z (GMT). No. of bitstreams: 1 U0001-1708202001344200.pdf: 2395415 bytes, checksum: 6e39d99061d8e6a04246ea2777987228 (MD5) Previous issue date: 2020 | en |
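| dc.description.tableofcontents | Abstract (Chinese); Abstract; Table of Contents; List of Figures; List of Tables. Chapter 1 Introduction: 1.1 Background; 1.2 Introduction to Classification; 1.3 Literature Review; 1.4 Thesis Organization. Chapter 2 Network Architecture: 2.1 Recurrent Neural Network (RNN); 2.2 Long Short-Term Memory Network (LSTM); 2.3 Batch Normalization; 2.4 Dropout; 2.5 Loss Function; 2.6 Convolutional Neural Network. Chapter 3 Input Data Preprocessing and Input Format: 3.1 Introduction to Chinese Word Segmentation; 3.2 Stop Word Removal; 3.3 Building the Dictionary; 3.4 word2vec. Chapter 4 Experimental Results and Discussion: 4.1 Dataset; 4.2 Baseline: Keyword-Based Classification; 4.3 Effect of Word Segmentation Method; 4.4 Effect of Stop Word Removal; 4.5 Effect of Input Article Length; 4.6 Dimension of Input Word Vectors; 4.7 Effect of Network Architecture on Classification; 4.8 Effect of Loss Function; 4.9 Effect of Dropout; 4.10 Per-Class Error Rates. Chapter 5 Conclusion. References. | |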
| dc.language.iso | zh-TW | |
| dc.title | 深度學習應用於部落格文章分類 | zh_TW |
| dc.title | Topic Classification of Blog Posts Using Deep Learning | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 108-2 | |
| dc.description.degree | Master's | |
| dc.contributor.oralexamcommittee | 張瑞益(Ray-I Chang),張恆華(Herng-Hua Chang),黃乾綱(Chien-Kang Huang) | |
| dc.subject.keyword | 自然語言處理,機器學習,社群網站,損失函數,斷詞系統 | zh_TW |
| dc.subject.keyword | Natural language processing, Machine learning, Social network, Loss function, Word segmentation system | en |
| dc.relation.page | 43 | |
| dc.identifier.doi | 10.6342/NTU202003652 | |
| dc.rights.note | Not authorized | |
| dc.date.accepted | 2020-08-19 | |
| dc.contributor.author-college | 工學院 (College of Engineering) | zh_TW |
| dc.contributor.author-dept | 工程科學及海洋工程學研究所 (Graduate Institute of Engineering Science and Ocean Engineering) | zh_TW |
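| Appears in collections: | Department of Engineering Science and Ocean Engineering | |

Files in this item:

| File | Size | Format | |
|---|---|---|---|
| U0001-1708202001344200.pdf (not authorized for public access) | 2.34 MB | Adobe PDF | |

Unless a specific copyright license is indicated, all items in the system are protected by copyright, with all rights reserved.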
