Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/86206

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 陳靜枝(Ching-Chin Chern) | |
| dc.contributor.author | Yun-Wei Wu | en |
| dc.contributor.author | 吳昀蔚 | zh_TW |
| dc.date.accessioned | 2023-03-19T23:42:15Z | - |
| dc.date.copyright | 2022-09-06 | |
| dc.date.issued | 2022 | |
| dc.date.submitted | 2022-09-01 | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/86206 | - |
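| dc.description.abstract | In recent years, programming problems have grown increasingly complex, and developers tend to turn to developer community platforms (e.g., Stack Overflow) to seek help from other experienced developers. However, answer rates on such platforms have been falling, making information retrieval increasingly difficult for developers. This study therefore aims to identify developers' most critical information needs and to analyze why answer rates are declining, as guidance for future information provision. The experimental data consist of 1,897,336 Python-related question-and-answer discussions posted on Stack Overflow between 2008 and 2021. We train topic models on these questions using Latent Dirichlet Allocation (LDA), select through experiments the model with the best-performing combination of dataset and parameters, and validate the model's effectiveness in categorizing questions using the similarity of the topics' tag distributions. Finally, we extract the forty most important need topics from the model. We then use these topics in subsequent analyses and reach the following conclusions. In the analysis of topic trends, we find that topics with declining discussion, that is, outdated topics, are usually those whose content and applications are relatively fixed, whereas topics with rising discussion are technologies that have emerged in recent years, mostly related to data analysis and machine learning. Furthermore, the analysis of topic features shows that difficult topics are more popular yet have lower answer rates, and should therefore be regarded as the topics with the most urgent information needs. Finally, some askers have higher answer rates, and askers with good asking habits (e.g., attaching code and not overusing tags) are also more likely to receive answers. Overall, this study provides several methods and findings for research on developers' needs. We hope these findings can help improve developers' information retrieval in the future and create a better working environment for developers. | zh_TW |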
| dc.description.abstract | As development problems grow more complex, developers tend to turn to programming communities such as Stack Overflow to seek help from experienced developers. However, the forum’s declining answer rate is making information retrieval increasingly difficult. We therefore aim to identify developers’ critical needs and the reasons for the dropping answer rates, as guidance for supplying the missing information. This study collects 1,897,336 Python-related posts from Stack Overflow and trains Latent Dirichlet Allocation (LDA) topic models on them. We then run experiments to select the best-performing combination of dataset and parameters, and verify the trained model’s effectiveness in categorizing posts using tag similarities. Finally, the forty most critical topics are extracted from the model and used in the subsequent analysis. First, the trend analysis shows that topics with decreasing popularity have stable contents and applications. In contrast, topics with increasing popularity have risen rapidly over the past decade and are mostly related to data analytics. Second, the feature tests reveal that difficult topics are more popular yet have lower answer rates. Thus, the information needs on these topics should be considered the most urgent. Lastly, some askers have higher answer rates than others. Moreover, askers receive more solutions if they have good asking habits, such as attaching code snippets and not overusing tags. This research provides several methods and conclusions on developers’ needs. We expect that its findings can be adopted to better meet developers’ information needs and thus create a better working environment for developers. | en |
| dc.description.provenance | Made available in DSpace on 2023-03-19T23:42:15Z (GMT). No. of bitstreams: 1 U0001-2508202206450300.pdf: 4460632 bytes, checksum: 80058d11556b1241930107091e768aa0 (MD5) Previous issue date: 2022 | en |
| dc.description.tableofcontents | Thesis Oral Defense Committee Certification; Acknowledgements; Abstract (Chinese); THESIS ABSTRACT; Content; List of Tables; List of Figures; Chapter 1 Introduction; 1.1 Background and Motivation; 1.2 Research Objectives; Chapter 2 Literature Review; 2.1 Information Access; 2.2 Communities; 2.3 Stack Overflow (SO); 2.4 Analysis of Developers’ Information Needs; 2.5 Latent Dirichlet Allocation (LDA); 2.6 Reasons for Needs Being Fulfilled; 2.7 Python; Chapter 3 Problem Description; 3.1 Problem Description; 3.2 Topic Modeling; 3.3 Topics Extraction and Validation; 3.4 Trends of Topics; 3.5 Features of Topics; 3.6 Askers’ Backgrounds; Chapter 4 Research Methods; 4.1 Data Source; 4.2 Data Collection and Preprocessing; 4.3 Topic Modeling; 4.4 Topics Extraction and Validation; 4.5 Trends of Topics; 4.6 Features of Topics; 4.7 Askers’ Backgrounds; Chapter 5 Experiments and Results; 5.1 Datasets; 5.2 Model Selection; 5.3 Model Validation; 5.4 Topics Trends; 5.5 Topics Features; 5.6 Askers’ Backgrounds; 5.7 Managerial Implication; Chapter 6 Conclusion and Future Work; 6.1 Conclusion; 6.2 Future Work; Reference; Appendix A Performances of Candidate Datasets and Parameters; Appendix B Results of Topic Trend; Appendix C Topic Trend Charts; Appendix D Normality Tests for Topic Features | |
| dc.language.iso | zh-TW | |
| dc.subject | 隱含狄利克雷分布 | zh_TW |
| dc.subject | Python | zh_TW |
| dc.subject | 趨勢分析 | zh_TW |
| dc.subject | 開發社群 | zh_TW |
| dc.subject | 主題模型 | zh_TW |
| dc.subject | Programming Community | en |
| dc.subject | Topic Modeling | en |
| dc.subject | Python | en |
| dc.subject | Latent Dirichlet Allocation | en |
| dc.subject | Trend Analysis | en |
| dc.title | 運用社群探勘技術探討程式開發者的資訊需求與需求未被滿足之原因 | zh_TW |
| dc.title | Mining Program Developers’ Information Needs and Reasons for Needs Being Unfulfilled in Programming Communities | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 110-2 | |
| dc.description.degree | Master's | |
| dc.contributor.oralexamcommittee | 蕭鉢(Bo Hsiao),黃奎隆(Kwei-Long Huang) | |
| dc.subject.keyword | 主題模型, 隱含狄利克雷分布, 開發社群, 趨勢分析, Python | zh_TW |
| dc.subject.keyword | Topic Modeling, Latent Dirichlet Allocation, Programming Community, Trend Analysis, Python | en |
| dc.relation.page | 109 | |
| dc.identifier.doi | 10.6342/NTU202202794 | |
| dc.rights.note | Authorization granted (open access worldwide) | |
| dc.date.accepted | 2022-09-01 | |
| dc.contributor.author-college | College of Management | zh_TW |
| dc.contributor.author-dept | Graduate Institute of Information Management | zh_TW |
| dc.date.embargo-lift | 2022-09-06 | - |
| Appears in Collections: | Department of Information Management | |
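
The abstract above mentions two measurements: per-topic answer rates and a tag-distribution similarity used to check the topic model. The following is a minimal sketch of how such quantities could be computed, not the thesis's actual procedure. It assumes each post already carries a dominant topic label (for example from a trained LDA model), and it uses cosine similarity as an assumed similarity measure. The records, topic names, and function names are all hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy records: (dominant_topic, tags, has_answer).
# In practice the topic label would come from a trained topic model.
posts = [
    ("pandas", ["python", "pandas", "dataframe"], True),
    ("pandas", ["python", "pandas"], False),
    ("pandas", ["python", "dataframe", "numpy"], True),
    ("web", ["python", "django"], True),
    ("web", ["python", "flask", "django"], True),
]


def answer_rate_by_topic(records):
    """Return {topic: share of that topic's posts with at least one answer}."""
    total, answered = Counter(), Counter()
    for topic, _tags, has_answer in records:
        total[topic] += 1
        answered[topic] += int(has_answer)
    return {topic: answered[topic] / total[topic] for topic in total}


def tag_distribution(records, topic):
    """Return the relative frequency of each tag among one topic's posts."""
    counts = Counter(tag for t, tags, _ in records if t == topic for tag in tags)
    n = sum(counts.values())
    return {tag: c / n for tag, c in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity between two sparse tag distributions (dicts)."""
    dot = sum(p[tag] * q.get(tag, 0.0) for tag in p)
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


rates = answer_rate_by_topic(posts)
similarity = cosine_similarity(
    tag_distribution(posts, "pandas"), tag_distribution(posts, "web")
)
```

In this toy data the two topics share only the generic `python` tag, so their similarity is well below 1. A low similarity between different topics and a high similarity within a topic would be one informal sign that topics separate posts along tag lines.
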
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| U0001-2508202206450300.pdf | 4.36 MB | Adobe PDF | View/Open |

Items in the system are protected by copyright, with all rights reserved, unless otherwise indicated.
