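Please use this Handle URI to cite this item in the NTU Theses and Dissertations Repository: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/19454
Full metadata record
DC field: value [language]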
dc.contributor.advisor: 丁肇隆 (Chao-Lung Ting)
dc.contributor.author: Pei-Chun Lin [en]
dc.contributor.author: 林佩君 [zh_TW]
dc.date.accessioned: 2021-06-08T01:59:48Z
dc.date.copyright: 2020-08-21
dc.date.issued: 2020
dc.date.submitted: 2020-08-18
dc.identifier.citation:
1. Ou, G. and Y.L. Murphey, Multi-class pattern classification using neural networks. Pattern Recognition, 2007. 40(1): p. 4-18.
2. Read, J., et al., Classifier chains for multi-label classification. Machine Learning, 2011. 85(3): p. 333.
3. Tsoumakas, G. and I. Katakis, Multi-label classification: An overview. International Journal of Data Warehousing and Mining, 2007. 3(3): p. 1-13.
4. Deng, L., The MNIST database of handwritten digit images for machine learning research [best of the web]. IEEE Signal Processing Magazine, 2012. 29(6): p. 141-142.
5. Manning, C.D. and H. Schütze, Foundations of statistical natural language processing. 1999: MIT Press.
6. Bakshi, R.K., et al., Opinion mining and sentiment analysis. In 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom). 2016. IEEE.
7. Husby, S. and D. Barbosa, Topic classification of blog posts using distant supervision. In Proceedings of the Workshop on Semantic Analysis in Social Media. 2012.
8. Chen, Q.-h., et al., Research on Chinese micro-blog sentiment classification based on recurrent neural network. 2017 (cst).
9. Ye, F., Sentiment Classification for Chinese Micro-blog Based on the Extension of Network Terms Feature. In Advances in Computer and Computational Sciences. 2018, Springer. p. 231-241.
10. Ma, W.-Y. and K.-J. Chen, Introduction to CKIP Chinese word segmentation system for the first international Chinese Word Segmentation Bakeoff. In Proceedings of the Second SIGHAN Workshop on Chinese Language Processing. 2003.
11. Srivastava, N., Improving neural networks with dropout. University of Toronto, 2013. 182(566): p. 7.
12. Zhou, P., et al., Text classification improved by integrating bidirectional LSTM with two-dimensional max pooling. 2016.
13. Kingma, D.P. and J. Ba, Adam: A method for stochastic optimization. arXiv preprint, 2014.
14. 邱建晴, 以卷積神經網路分析部落格社群網站垃圾文章 (Analyzing spam posts on blog social networking sites with convolutional neural networks). 2016: p. 1-68.
15. Kim, Y., Convolutional neural networks for sentence classification. arXiv preprint, 2014.
16. Hochreiter, S. and J. Schmidhuber, Long short-term memory. Neural Computation, 1997. 9(8): p. 1735-1780.
17. Kalchbrenner, N., E. Grefenstette, and P. Blunsom, A convolutional neural network for modelling sentences. arXiv preprint, 2014.
18. Yao, Y. and Z. Huang, Bi-directional LSTM recurrent neural network for Chinese word segmentation. In International Conference on Neural Information Processing. 2016. Springer.
dc.identifier.uri: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/19454
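dc.description.abstract: Because a huge volume of articles is published online every day, correct article classification helps readers read and search more efficiently. According to statistics from the Pixnet (痞客邦) website, nearly 50% of blog posts have no category selected. This thesis proposes a custom loss function that helps assign such articles to the correct topic. With the proposed classification system, the Pixnet back end can automatically determine the topic of each article.
The articles were segmented with the Jieba and CKIP word segmentation systems. Experiments show a classification accuracy of 92.60% with Jieba and 93.35% with CKIP, indicating that CKIP is the preferred segmenter for Traditional Chinese articles in classification tasks.
The segmented articles are encoded with pre-trained word vectors and then fed into a Long Short-Term Memory model or a Convolutional Neural Network for training. Training with the custom loss function yields an accuracy of 93.35%, higher than the 92.98% obtained with the conventional loss function, which shows that the proposed custom loss function helps classify blog articles more accurately.
[zh_TW, English translation]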
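The abstract compares the Jieba and CKIP segmenters but does not describe how either works internally. As a rough illustration of what dictionary-based Chinese word segmentation does, here is a sketch of forward maximum matching, a deliberately simpler technique than either tool (Jieba, for instance, combines a prefix dictionary with an HMM for unknown words). The function name `fmm_segment` and the mini-dictionary are hypothetical and are not taken from the thesis.

```python
# Toy dictionary-based Chinese word segmentation using forward maximum matching (FMM).
# Hypothetical illustration only: this is NOT the algorithm used by Jieba or CKIP.


def fmm_segment(text: str, vocab: set[str], max_len: int = 4) -> list[str]:
    """Greedily take the longest dictionary word starting at each position.

    Characters not covered by any dictionary word are emitted as single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit or a single char.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


if __name__ == "__main__":
    # Hypothetical mini-dictionary; a real segmenter ships a dictionary of many thousands of entries.
    vocab = {"深度", "學習", "深度學習", "應用", "部落格", "文章", "分類"}
    print(fmm_segment("深度學習應用於部落格文章分類", vocab))
```

Greedy longest-match is easy to follow but cannot resolve ambiguous splits the way statistical segmenters can, which is one reason the choice of segmenter can shift classification accuracy.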
dc.description.abstract: Because a huge number of articles are produced on the Internet every day, well-organized article labels can improve the user experience of reading and searching. However, according to statistics from the Pixnet website, nearly 50% of blog posts are not labeled by their authors. To address this problem, this thesis proposes a custom loss function for an automatic article labeling system running in the website back end. This system can automatically assign accurate labels to unlabeled articles.
We segment the articles with the Jieba and CKIP word segmentation systems. In our experiments, classification accuracy is 92.60% with Jieba and 93.35% with CKIP. Thus, for Traditional Chinese text, CKIP is the preferred word segmentation system.
After word segmentation, the articles are encoded with pre-trained word vectors and fed into Long Short-Term Memory (LSTM) models or Convolutional Neural Networks (CNNs) for training. Training with our custom loss function yields an accuracy of 93.35%, higher than the 92.98% achieved with the categorical_crossentropy loss function. In conclusion, the proposed custom loss function helps classify blog articles automatically and more accurately.
[en]
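The record does not define the thesis's custom loss function, so it cannot be reproduced here. What can be sketched is the baseline it is compared against: categorical cross-entropy, the mean over a batch of the negative log-probability the model assigns to each article's true topic. The function below is a plain-Python sketch under the assumption that predictions are already normalized probabilities (e.g. softmax outputs); it is not Keras's implementation, which for example also rescales rows that do not sum to 1.

```python
import math


def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean categorical cross-entropy over a batch.

    y_true: one-hot rows (1 for the correct topic, 0 elsewhere).
    y_pred: rows of predicted probabilities, assumed to already sum to 1 (e.g. softmax outputs).
    Probabilities are clipped to [eps, 1 - eps] so log(0) never occurs.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    total = 0.0
    for t_row, p_row in zip(y_true, y_pred):
        # With one-hot labels only the true class's term is non-zero.
        total -= sum(t * math.log(min(max(p, eps), 1.0 - eps))
                     for t, p in zip(t_row, p_row))
    return total / len(y_true)


if __name__ == "__main__":
    # Two hypothetical articles, three hypothetical topics.
    y_true = [[0, 1, 0], [1, 0, 0]]
    y_pred = [[0.1, 0.8, 0.1], [0.6, 0.3, 0.1]]
    print(categorical_crossentropy(y_true, y_pred))
```

The loss only looks at the probability of the correct class, so a confident wrong prediction is penalized heavily; a custom loss typically changes how those per-class errors are weighted.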
dc.description.provenance: Made available in DSpace on 2021-06-08T01:59:48Z (GMT). No. of bitstreams: 1. U0001-1708202001344200.pdf: 2395415 bytes, checksum: 6e39d99061d8e6a04246ea2777987228 (MD5). Previous issue date: 2020 [en]
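The provenance entry records the PDF's exact size and MD5 checksum. Because the full text is not publicly available, this only applies to a copy obtained through authorized access. The sketch below uses only the Python standard library to check a local file against both recorded values. The function names and the script name `verify_pdf.py` are hypothetical. MD5 is adequate for detecting accidental corruption but is not a defense against deliberate tampering.

```python
import hashlib
import sys
from pathlib import Path

# Values copied from the dc.description.provenance field of this record.
EXPECTED_SIZE = 2395415  # bytes
EXPECTED_MD5 = "6e39d99061d8e6a04246ea2777987228"


def md5_of_file(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path):
    """True only if both the file size and the MD5 digest match the repository record."""
    p = Path(path)
    # Cheap size check first; hash only when the size already matches.
    return p.stat().st_size == EXPECTED_SIZE and md5_of_file(p) == EXPECTED_MD5


if __name__ == "__main__":
    # Hypothetical usage: python verify_pdf.py /path/to/U0001-1708202001344200.pdf
    if len(sys.argv) == 2:
        print(matches_record(sys.argv[1]))
    else:
        print("usage: python verify_pdf.py PATH_TO_PDF")
```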
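dc.description.tableofcontents:
Abstract (Chinese) i
Abstract ii
Table of Contents iii
List of Figures v
List of Tables vi
Chapter 1 Introduction 1
1.1 Background 1
1.2 Overview of Classification 2
1.3 Literature Review 4
1.4 Thesis Organization 6
Chapter 2 Network Architecture 7
2.1 Recurrent Neural Network (RNN) 7
2.2 Long Short-Term Memory Network (LSTM) 8
2.3 Batch Normalization 11
2.4 Dropout 12
2.5 Loss Function 13
2.6 Convolutional Neural Network 18
Chapter 3 Input Data Preprocessing and Input Format 19
3.1 Introduction to Chinese Word Segmentation 20
3.2 Stop-Word Removal 20
3.3 Building the Dictionary 21
3.4 word2vec 22
Chapter 4 Experimental Results and Discussion 26
4.1 Dataset 26
4.2 Baseline: Keyword-Based Classification 26
4.3 Effect of Different Word Segmentation Methods 30
4.4 Effect of Stop-Word Removal 32
4.5 Effect of Input Article Length 32
4.6 Dimension of Input Word Vectors 34
4.7 Effect of Network Architecture on Classification 34
4.8 Effect of Different Loss Functions 35
4.9 Effect of Dropout 36
4.10 Per-Class Error Rate Analysis 37
Chapter 5 Conclusion 40
References 42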
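Chapter 3 and Sections 4.4 to 4.6 cover removing stop words, building a dictionary, limiting input length, and encoding articles with pre-trained word vectors. The record gives no implementation details, so the following is only a generic sketch of that kind of preprocessing. The helper names, the reserved `<pad>`/`<unk>` ids, and the frequency-ordered vocabulary are assumptions, and `pretrained` stands in for vectors that would come from a trained word2vec model.

```python
from collections import Counter

# Reserved indices: an assumed convention, not taken from the thesis.
PAD_ID, UNK_ID = 0, 1


def remove_stopwords(tokens, stopwords):
    """Drop every token that appears in the stop-word set."""
    return [t for t in tokens if t not in stopwords]


def build_vocab(docs, min_count=1):
    """Map words to integer ids, most frequent first (ties keep first-seen order)."""
    counts = Counter(t for doc in docs for t in doc)
    vocab = {"<pad>": PAD_ID, "<unk>": UNK_ID}
    for word, n in counts.most_common():
        if n >= min_count:
            vocab[word] = len(vocab)
    return vocab


def encode(tokens, vocab, max_len):
    """Convert tokens to ids (unknown words -> UNK_ID), truncate to max_len, right-pad with PAD_ID."""
    ids = [vocab.get(t, UNK_ID) for t in tokens][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


def build_embedding_matrix(vocab, pretrained, dim):
    """One row per vocabulary id: copy the pre-trained vector if available, else zeros."""
    matrix = [[0.0] * dim for _ in range(len(vocab))]
    for word, idx in vocab.items():
        if word in pretrained:
            matrix[idx] = list(pretrained[word])
    return matrix


if __name__ == "__main__":
    stop = {"的"}  # hypothetical stop-word list
    raw = [["部落格", "文章", "的", "分類"], ["美食", "文章"]]  # already-segmented toy articles
    docs = [remove_stopwords(d, stop) for d in raw]
    vocab = build_vocab(docs)
    print(vocab)
    print(encode(["美食", "旅遊", "文章"], vocab, max_len=5))
```

The resulting matrix is the kind of table an embedding layer can be initialized from, so each padded id sequence becomes a sequence of word vectors for the LSTM or CNN. Fixing `max_len` is the knob Section 4.5 varies.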
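dc.language.iso: zh-TW
dc.title: 深度學習應用於部落格文章分類 (Deep Learning Applied to Blog Article Classification) [zh_TW]
dc.title: Topic Classification of Blog Posts Using Deep Learning [en]
dc.type: Thesis
dc.date.schoolyear: 108-2 (ROC academic year 108, second semester)
dc.description.degree: 碩士 (Master's)
dc.contributor.oralexamcommittee: 張瑞益 (Ray-I Chang), 張恆華 (Herng-Hua Chang), 黃乾綱 (Chien-Kang Huang)
dc.subject.keyword: 自然語言處理 (natural language processing), 機器學習 (machine learning), 社群網站 (social networking sites), 損失函數 (loss function), 斷詞系統 (word segmentation system) [zh_TW]
dc.subject.keyword: Natural language processing, Machine learning, Social network, Loss function, Word segmentation system [en]
dc.relation.page: 43
dc.identifier.doi: 10.6342/NTU202003652
dc.rights.note: 未授權 (not authorized for public access)
dc.date.accepted: 2020-08-19
dc.contributor.author-college: 工學院 (College of Engineering) [zh_TW]
dc.contributor.author-dept: 工程科學及海洋工程學研究所 (Graduate Institute of Engineering Science and Ocean Engineering) [zh_TW]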
Appears in collections: 工程科學及海洋工程學系 (Department of Engineering Science and Ocean Engineering)

Files in this item:
U0001-1708202001344200.pdf: 2.34 MB, Adobe PDF (not authorized for public access)
Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.
© NTU Library All Rights Reserved