Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95748

Full metadata record

| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 盧信銘 | zh_TW |
| dc.contributor.advisor | Hsin-Min Lu | en |
| dc.contributor.author | 魏立昇 | zh_TW |
| dc.contributor.author | Li-Sheng Wei | en |
| dc.date.accessioned | 2024-09-16T16:14:16Z | - |
| dc.date.available | 2024-09-17 | - |
| dc.date.copyright | 2024-09-16 | - |
| dc.date.issued | 2024 | - |
| dc.date.submitted | 2024-08-01 | - |
| dc.identifier.citation | Alsmadi, I., & Alazzam, I. (2017). Software attributes that impact popularity. 2017 8th International Conference on Information Technology (ICIT). Bai, S., Kolter, J. Z., & Koltun, V. (2018). An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271. Banks, D., Leonard, C., Narayan, S., Thompson, N., Kramer, B., & Korkmaz, G. (2022). Measuring the Impact of Open Source Software Innovation Using Network Analysis on GitHub Hosted Python Packages. 2022 Systems and Information Engineering Design Symposium (SIEDS). Borges, H., Valente, M. T., Hora, A., & Coelho, J. (2015). On the popularity of GitHub applications: A preliminary note. arXiv preprint arXiv:1507.00604. Box, G. E., & Jenkins, G. M. (1968). Some recent advances in forecasting and control. Journal of the Royal Statistical Society. Series C (Applied Statistics), 17(2), 91-109. Castán-Lascorz, M., Jiménez-Herrera, P., Troncoso, A., & Asencio-Cortés, G. (2022). A new hybrid method for predicting univariate and multivariate time series based on pattern forecasting. Information Sciences, 586, 611-627. Datta, D., & Kajanan, S. (2013). Do app launch times impact their subsequent commercial success? An analytical approach. 2013 International Conference on Cloud Computing and Big Data. Dey, T., & Mockus, A. (2018). Are software dependency supply chain metrics useful in predicting change of popularity of npm packages? Proceedings of the 14th International Conference on Predictive Models and Data Analytics in Software Engineering. Grootendorst, M. (2022). BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv preprint arXiv:2203.05794. Kipf, T. N., & Welling, M. (2016). Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907. Lim, B., & Zohren, S. (2021). Time-series forecasting with deep learning: a survey. Philosophical Transactions of the Royal Society A, 379(2194), 20200209. Lv, M., Hong, Z., Chen, L., Chen, T., Zhu, T., & Ji, S. (2020). Temporal multi-graph convolutional network for traffic flow prediction. IEEE Transactions on Intelligent Transportation Systems, 22(6), 3337-3348. Olivares, K. G., Challu, C., Marcjasz, G., Weron, R., & Dubrawski, A. (2023). Neural basis expansion analysis with exogenous variables: Forecasting electricity prices with NBEATSx. International Journal of Forecasting, 39(2), 884-900. Qiu, S., Kula, R. G., & Inoue, K. (2018). Understanding popularity growth of packages in JavaScript package ecosystem. 2018 IEEE International Conference on Big Data, Cloud Computing, Data Science & Engineering (BCD). Salinas, D., Flunkert, V., Gasthaus, J., & Januschowski, T. (2020). DeepAR: Probabilistic forecasting with autoregressive recurrent networks. International Journal of Forecasting, 36(3), 1181-1191. Sigg, S., Lagerspetz, E., Peltonen, E., Nurmi, P., & Tarkoma, S. (2019). Exploiting usage to predict instantaneous app popularity: Trend filters and retention rates. ACM Transactions on the Web (TWEB), 13(2), 1-25. Standish, T. A. (1984). An essay on software reuse. IEEE Transactions on Software Engineering (5), 494-497. Wang, T., Wu, D., Zhang, J., Chen, M., & Zhou, Y. (2016). Measuring and analyzing third-party mobile game app stores in China. IEEE Transactions on Network and Service Management, 13(4), 793-805. Wu, N., Green, B., Ben, X., & O'Banion, S. (2020). Deep transformer models for time series forecasting: The influenza prevalence case. arXiv preprint arXiv:2001.08317. Yu, B., Yin, H., & Zhu, Z. (2017). Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. arXiv preprint arXiv:1709.04875. Zhu, H., Xiong, H., Ge, Y., & Chen, E. (2014). Discovery of ranking fraud for mobile apps. IEEE Transactions on Knowledge and Data Engineering, 27(1), 74-87. 吳禹辰. (2023). 運用長短期記憶模型及對比學習於產品生命週期預測 [Applying long short-term memory models and contrastive learning to product life cycle prediction]. National Taiwan University. | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/95748 | - |
| dc.description.abstract | 本研究旨在通過利用套件之間的依賴關係以及套件描述文件之間的相似性,預測開源軟體套件在未來一段時間內的下載量變化。在開源軟體生態系中,軟體的熱門程度是開發者用來評估套件對使用者吸引力的重要指標。相較於套件的使用情況與評分,下載量更容易取得且更能反映套件的熱門程度。
開源套件的下載次數與影響力有關,若我們能更加精確地預測開源套件下載量變化,就可能提前識別哪些套件將成為具有更大影響力的關鍵套件。為開源套件開發者提供寶貴的市場洞察,並支持他們在開發與改進套件時做出更明智的決策,從而促進開源項目的成功與持續發展。 然而,傳統的下載量預測研究將其視為產品生命週期預測問題,主要利用下載量變化曲線進行預測。這類方法忽略了套件的上下游關係,以及透過套件描述文件獲取有用資訊以提升模型預測能力的途徑。 在本研究中我們試圖填補這一領域的研究空白。藉由引入多圖卷積模型,通過建構基於套件依賴關係、下載量變化相似度和描述文件相似度的圖,提高下載量預測的準確性與有效性。 為了研究如何透過套件之間的隱性關係提升套件下載量預測模型的準確性,我們收集了PyPI上熱門套件的月下載量資料、套件的依賴關係與描述文件,用以建構、訓練並驗證我們的模型。 最後,本研究在軟體熱門程度變化預測領域做出兩方面的貢獻。首先,我們的研究創新性的將軟體下載量預測視為一種類似交通流量預測的任務,並透過應用多圖卷積模型有效增強預測的準確性,替未來的開源軟體套件下載量預測研究提供了新的思路。其次,我們的模型在最熱門的前30%(約900個)套件中,取得了優於傳統方法的預測結果。 通過實驗驗證,我們的模型比表現最佳的Baseline方法降低了約13.5%的RMSE,並在最熱門的900個套件的下載量預測方面大幅優於傳統方法,包含Gated Recurrent Unit (GRU)模型,為未來的研究提供了有價值的參考和啟發。 | zh_TW |
| dc.description.abstract | This study forecasts changes in the download volume of open-source software packages by leveraging the dependencies between packages and the similarities between package descriptions. In the open-source software ecosystem, a package's popularity is a crucial metric that developers use to evaluate its appeal to users. Compared with usage data and ratings, download volume is easier to obtain and better reflects a package's popularity. The download counts of open-source packages are closely related to their influence. More accurate forecasts of download-volume changes could identify in advance which packages are likely to become influential key packages. This gives open-source package developers valuable market insight and supports more informed decisions when developing and improving their packages, thereby promoting the success and sustainable development of open-source projects. However, past download-forecasting research treats the task as a product life cycle prediction problem and relies mainly on the download-volume curve. Such methods ignore the upstream and downstream relationships between packages, as well as the useful information in package descriptions that could improve a model's forecasting ability. This study attempts to fill that research gap. We introduce a multi-graph convolution model and construct graphs based on package dependencies, download-trend similarity, and description similarity to improve the accuracy and effectiveness of download forecasting. To study how implicit relationships between packages can improve forecast accuracy, we collected monthly download data, dependencies, and descriptions for popular packages on PyPI, and used them to construct, train, and validate our model. This study makes two contributions to software popularity forecasting. First, it treats software download forecasting as a task similar to traffic flow forecasting and improves forecast accuracy by applying a multi-graph convolution model, offering a new direction for future research on forecasting downloads of open-source packages. Second, our model achieves significantly better forecasts than traditional methods for the top 30% (about 900) of the most popular packages. In our experiments, the model reduces RMSE by about 13.5% compared with the best-performing baseline and significantly outperforms traditional methods, including the basic Gated Recurrent Unit (GRU) model, providing a useful reference for future research. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-09-16T16:14:16Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2024-09-16T16:14:16Z (GMT). No. of bitstreams: 0 | en |
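| dc.description.tableofcontents | Acknowledgements i; Chinese Abstract ii; ABSTRACT iv; Table of Contents vi; List of Figures ix; List of Tables x; Chapter 1 Introduction 1; Chapter 2 Literature Review 4; 2.1 Software Popularity 4; 2.2 Time Series Forecasting 5; 2.2.1 Deep Learning-Based Time Series Forecasting Methods 6; 2.2.2 Graph-Based Time Series Forecasting Methods 7; 2.2.2.1 T-MGCN 8; 2.2.2.2 STGCN 13; 2.3 Clustering Models 13; 2.4 Summary of the Literature Review 15; Chapter 3 Research Gap and Research Questions 16; Chapter 4 System Design 16; 4.1 Model 16; 4.1.1 T-MGCN-MIX Network Architecture 17; 4.1.2 Bidirectional Scaling Layer 18; 4.1.3 Input Layer 18; 4.1.4 Graph Convolution Layer 20; 4.1.4.1 Dependency Graph (dependency) 20; 4.1.4.2 Download Trend Pattern Graph (DTW) 21; 4.1.4.3 Description Similarity Graph (Description) 22; 4.1.4.4 Package Functional Topic Similarity Graph (Topic) 24; 4.1.4.5 Merging Multi-Graph Results 25; 4.1.5 Recurrent Layer 25; 4.1.6 Output Layer 26; Chapter 5 Experimental Design 27; 5.1 Dataset 27; 5.1.1 Data Collection 27; 5.1.2 Data Analysis 28; 5.1.2.1 Package Downloads 28; 5.1.2.2 Package Descriptions 31; 5.1.2.3 Package Maintainers 32; 5.1.2.4 Package Dependencies 33; 5.1.2.5 Package Clustering Results 35; 5.2 Parameter Settings 37; 5.3 Experimental Method 37; 5.3.1 Model Comparison 38; 5.3.2 Baseline 39; 5.3.2.1 ARIMA 39; 5.3.2.2 VAR 40; 5.3.2.3 GRU 40; 5.3.3 Comparing Model Forecasting Performance Across Popularity Ranges 41; Chapter 6 Experimental Results 41; 6.1 Results and Discussion 41; 6.1.1 Single-Graph Forecasting Results 45; Chapter 7 Conclusion and Future Work 49; References 50 | - |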
| dc.language.iso | zh_TW | - |
| dc.title | 藉由依賴關係與描述文件預測軟體下載量 | zh_TW |
| dc.title | Forecasting Software Downloads with Dependencies and Descriptions | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 112-2 | - |
| dc.description.degree | Master's | - |
| dc.contributor.oralexamcommittee | 陳建錦;林怡伶 | zh_TW |
| dc.contributor.oralexamcommittee | Chien Chin Chen;Yi-ling Lin | en |
| dc.subject.keyword | 軟體熱門程度,時間序列預測,圖卷積網路,多圖卷積網路,BERT | zh_TW |
| dc.subject.keyword | Software popularity, time series forecasting, graph convolutional network, multi-graph convolutional network, BERT | en |
| dc.relation.page | 52 | - |
| dc.identifier.doi | 10.6342/NTU202402450 | - |
| dc.rights.note | Authorization granted (access restricted to campus) | - |
| dc.date.accepted | 2024-08-05 | - |
| dc.contributor.author-college | College of Management | - |
| dc.contributor.author-dept | Department of Information Management | - |
| dc.date.embargo-lift | 2025-09-01 | - |
| Appears in Collections: | Department of Information Management | |

Files in This Item:

| File | Size | Format | |
|---|---|---|---|
| ntu-112-2.pdf Access restricted to NTU campus IP addresses (off campus, please use the off-campus VPN service) | 2.19 MB | Adobe PDF | |

Unless their copyright terms are specifically indicated, all items in the system are protected by copyright, with all rights reserved.
