Forecasting Software Downloads with Dependencies and Descriptions

Li-Sheng Wei (魏立昇)

Please cite this document using this Handle URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95748

Title:	Forecasting Software Downloads with Dependencies and Descriptions (original title: 藉由依賴關係與描述文件預測軟體下載量)
Author:	Li-Sheng Wei (魏立昇)
Advisor:	Hsin-Min Lu (盧信銘)
Keywords:	software popularity, time series forecasting, graph convolutional network, multi-graph convolutional network, BERT
Year of publication:	2024
Degree:	Master's
Abstract:	This study aims to forecast changes in the download volumes of open-source software packages over a future period by leveraging the dependencies between packages and the similarity between their descriptions. In the open-source software ecosystem, a package's popularity is an important metric that developers use to gauge its appeal to users. Compared with usage data and ratings, download counts are easier to obtain and better reflect a package's popularity.

The download counts of open-source packages are related to their influence. More accurate forecasts of download changes could help identify, in advance, which packages are likely to become key, more influential packages. Such forecasts would give package developers valuable market insight and support better-informed decisions when developing and improving their packages, thereby promoting the success and sustainable development of open-source projects.

However, past research has treated download forecasting as a product life cycle prediction problem, relying mainly on the historical download curve. These methods ignore the upstream and downstream (dependency) relationships between packages, as well as useful information that could be extracted from package descriptions to improve forecasting.

This study attempts to fill this research gap. We introduce a multi-graph convolutional model that constructs graphs based on package dependencies, download-trend similarity, and description similarity to improve the accuracy and effectiveness of download forecasting. To study how implicit relationships between packages can improve forecast accuracy, we collected monthly download data, dependencies, and descriptions for popular packages on PyPI, and used them to construct, train, and validate our model.

This study makes two contributions to the field of software popularity forecasting. First, it treats software download forecasting as a task analogous to traffic flow forecasting and shows that applying a multi-graph convolutional model effectively improves forecast accuracy, offering a new direction for future research on forecasting open-source package downloads. Second, on the most popular 30% of packages (about 900), our model achieves better forecasts than traditional methods. In our experiments, it reduces RMSE by about 13.5% relative to the best-performing baseline and substantially outperforms traditional methods, including a basic Gated Recurrent Unit (GRU) model, on the 900 most popular packages.
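The abstract describes the model only at a high level. The following is a minimal NumPy sketch, under stated assumptions, of how the three package graphs (dependencies, download-trend similarity, description similarity) could be built and combined in one multi-graph convolution layer, plus the relative RMSE reduction used to compare against a baseline. All function names, thresholds, the sum-based fusion of graphs, and the random placeholder data are hypothetical; the thesis's actual graph-construction rules, BERT model, and layer design are not specified in this record.

```python
import numpy as np


def normalize_adjacency(a: np.ndarray) -> np.ndarray:
    """Symmetric GCN normalization: D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = a + np.eye(a.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def dependency_graph(n: int, edges: list[tuple[int, int]]) -> np.ndarray:
    """Undirected adjacency from hypothetical (package, dependency) index pairs."""
    a = np.zeros((n, n))
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    return a


def similarity_graph(features: np.ndarray, threshold: float, metric: str) -> np.ndarray:
    """Connect packages whose pairwise similarity exceeds `threshold` (assumed rule)."""
    if metric == "pearson":
        sim = np.corrcoef(features)  # rows = packages, columns = months
    else:  # cosine similarity, e.g. between description embeddings
        unit = features / np.linalg.norm(features, axis=1, keepdims=True)
        sim = unit @ unit.T
    a = (sim > threshold).astype(float)
    np.fill_diagonal(a, 0.0)  # self-loops are added during normalization
    return a


def multi_graph_conv(x, adjs, weights):
    """One layer: ReLU(sum_k A_k X W_k); summing per-graph outputs is an assumed fusion."""
    out = sum(a @ x @ w for a, w in zip(adjs, weights))
    return np.maximum(out, 0.0)


def rmse(y_true, y_pred) -> float:
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def relative_reduction(baseline_rmse: float, model_rmse: float) -> float:
    """Fraction by which the model's RMSE is below the baseline's."""
    return (baseline_rmse - model_rmse) / baseline_rmse


rng = np.random.default_rng(0)
n_packages, n_months, emb_dim, hidden = 5, 12, 8, 4
# Placeholder monthly downloads (log-scaled) and description embeddings.
downloads = np.log1p(rng.integers(1_000, 100_000, size=(n_packages, n_months)))
desc_emb = rng.normal(size=(n_packages, emb_dim))  # stand-in for BERT embeddings

adjs = [
    normalize_adjacency(dependency_graph(n_packages, [(0, 1), (1, 2), (3, 4)])),
    normalize_adjacency(similarity_graph(downloads, 0.5, "pearson")),
    normalize_adjacency(similarity_graph(desc_emb, 0.2, "cosine")),
]
weights = [rng.normal(scale=0.1, size=(n_months, hidden)) for _ in adjs]
h = multi_graph_conv(downloads, adjs, weights)
print(h.shape)
# Hypothetical RMSE values chosen only to illustrate a 13.5% reduction.
print(relative_reduction(1.0, 0.865))
```

In a real forecasting pipeline the layer output `h` would feed a temporal module and an output head that predicts future months, and the download-trend graph should be built from training months only so that test-period information does not leak into the graph.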
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95748
DOI:	10.6342/NTU202402450
Full-text license:	Authorized (campus access only)
Full-text release date:	2025-09-01
Appears in collections:	Department of Information Management

Files in this item:

File	Size	Format
ntu-112-2.pdf (restricted to NTU campus IP addresses; off-campus users should connect through the NTU VPN service)	2.19 MB	Adobe PDF


Unless otherwise indicated, items in this system are protected by copyright, with all rights reserved.
