NTU Theses and Dissertations Repository
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/86206
Title: 運用社群探勘技術探討程式開發者的資訊需求與需求未被滿足之原因
Mining Program Developers’ Information Needs and Reasons for Needs Being Unfulfilled in Programming Communities
Authors: Yun-Wei Wu (吳昀蔚)
Advisor: Ching-Chin Chern (陳靜枝)
Keyword: Topic Modeling, Latent Dirichlet Allocation, Programming Community, Trend Analysis, Python
Publication Year: 2022
Degree: Master's (碩士)
Abstract: As software development problems grow more complex, developers increasingly turn to programming communities such as Stack Overflow to seek help from more experienced developers. However, the answer rate on these platforms keeps declining, which makes information retrieval increasingly difficult. This study therefore aims to identify developers' most critical information needs and to analyze why answer rates are falling, as guidance for supplying the missing information.

The dataset consists of 1,897,336 Python-related question-and-answer discussions posted on Stack Overflow between 2008 and 2021. We train topic models on these questions with Latent Dirichlet Allocation (LDA), run experiments to select the best-performing combination of data and parameters, and verify the selected model's effectiveness in categorizing questions using the similarity of the topics' tag distributions. From this model we extract the forty most important need topics.

We then use these topics in three analyses. First, the trend analysis shows that topics with declining discussion are typically those whose content and applications are stable and change little, whereas topics with rising discussion are technologies that have emerged in recent years and mostly relate to data analysis and machine learning. Second, the analysis of topic characteristics shows that difficult topics are more popular yet have lower answer rates, so the information needs on these topics should be considered the most urgent. Third, some askers have higher answer rates than others, and askers with good asking habits, such as attaching code snippets and not overusing tags, are more likely to receive answers.

Overall, this research provides several methods and findings on developers' information needs. We expect them to help improve developers' information retrieval and to create a better working environment for developers.
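The topic-level analyses described in the abstract can be illustrated with a small sketch. It is not the thesis's code: the records, topic labels, and function names below are hypothetical, and each post is assumed to already carry a topic assigned by a trained LDA model. The sketch computes a per-topic answer rate and each topic's yearly share of posts, which are quantities of the kind the answer-rate and trend findings rely on.

```python
from collections import Counter, defaultdict

# Hypothetical toy records: (year, topic label, has at least one answer).
# In the study the topic would come from the trained LDA model; here the
# labels are hard-coded purely for illustration.
posts = [
    (2019, "pandas", True),
    (2019, "pandas", False),
    (2019, "tkinter", True),
    (2019, "tkinter", True),
    (2020, "pandas", False),
    (2020, "pandas", True),
    (2020, "pandas", False),
    (2020, "tkinter", True),
]


def answer_rate_by_topic(records):
    """Return {topic: fraction of its posts that received an answer}."""
    total = Counter()
    answered = Counter()
    for _year, topic, has_answer in records:
        total[topic] += 1
        if has_answer:
            answered[topic] += 1
    return {topic: answered[topic] / total[topic] for topic in total}


def yearly_topic_share(records):
    """Return {topic: {year: share of that year's posts in the topic}}."""
    per_year = Counter()
    per_topic_year = Counter()
    for year, topic, _has_answer in records:
        per_year[year] += 1
        per_topic_year[(topic, year)] += 1
    shares = defaultdict(dict)
    for (topic, year), count in per_topic_year.items():
        shares[topic][year] = count / per_year[year]
    return dict(shares)


rates = answer_rate_by_topic(posts)
shares = yearly_topic_share(posts)
```

In this toy data the "pandas" topic grows in share from 2019 to 2020 while having the lower answer rate, which mirrors the pattern the abstract reports for popular but difficult topics. Using yearly shares instead of raw counts is a design choice made here so that overall growth of the platform does not masquerade as growth of a single topic.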
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/86206
DOI: 10.6342/NTU202202794
Fulltext Rights: Authorization granted (open access worldwide)
Embargo Lift Date: 2022-09-06
Appears in Collections: Department of Information Management (資訊管理學系)

Files in This Item:
File                          Size     Format
U0001-2508202206450300.pdf    4.36 MB  Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.