Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96071

Full metadata record

| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 吳文方 | zh_TW |
| dc.contributor.advisor | Wen-Fang Wu | en |
| dc.contributor.author | 鄭琳澂 | zh_TW |
| dc.contributor.author | Lin-Cheng Cheng | en |
| dc.date.accessioned | 2024-10-11T16:06:30Z | - |
| dc.date.available | 2024-10-12 | - |
| dc.date.copyright | 2024-10-11 | - |
| dc.date.issued | 2024 | - |
| dc.date.submitted | 2024-10-07 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96071 | - |
| dc.description.abstract | 近年來人工智慧技術風潮盛起,各行各業開始利用機器學習解決自身的難題,包括在房地產業、半導體製造業、金融業、商業和行銷等領域上,機器學習都能做出許多貢獻。在房地產業中,房屋價格是政府與百姓皆相當重視的議題,為此,本研究找到一筆文獻記載的房屋資料數據集,先對資料數據進行預處理,再以八種機器學習模型預測資料集所記載的房屋價格。特別的,本研究先依據機器學習常見的嶺迴歸(Ridge Regression)、Lasso迴歸(Lasso Regression)、彈性網路(Elastic Net)三種迴歸方法,以及LightGBM、XGBoost兩種梯度提升框架,分別建立五種房價預測模型,再以集成學習(Ensemble Learning)中的投票(Voting)法與堆疊(Stacking)法集成上述五種預測模型,使分別形成Voting集成預測模型(本研究第六種模型)與Stacking集成預測模型(本研究第七種模型),最後再使用集成學習(Ensemble Learning)中的混合(Blending)法將第六與第七種模型再次集成,使成最終的Blending集成預測模型(本研究第八種模型)。本研究經測試集分析結果比較後發現,在未經集成的前五種模型中,Lasso迴歸模型的預測效能最佳;在兩種梯度提升框架中,雖XGBoost效能比LightGBM好,但梯度提升框架並不適合用於少量資料集的房屋價格預測。針對三種集成模型,Voting集成模型預測效能確實優於前五種模型,且沒有明顯過度擬合(Over Fitting)情形;Stacking集成模型的效能則僅優於LightGBM與嶺迴歸,顯示集成學習並非提升預測效能的萬靈丹;當然,在八種預測模型中,表現最好的還是Blending集成模型,其係將Voting與Stacking集成模型以最佳混和權重來集成,預測效能自然最佳。 | zh_TW |
| dc.description.abstract | In recent years, the rise of artificial intelligence technology has prompted various industries to adopt machine learning to address their unique challenges. Sectors such as real estate, semiconductor manufacturing, finance, commerce, and marketing have all seen significant contributions from machine learning. In the real estate industry, housing prices are a topic of great concern for both the government and the public. To address this, this study utilized a housing dataset documented in the literature, first preprocessing the data and then applying eight machine learning models to predict the housing prices recorded in the dataset. Specifically, the study first developed five prediction models based on three commonly used machine learning regression methods (Ridge Regression, Lasso Regression, and Elastic Net) and two gradient boosting frameworks (LightGBM and XGBoost). These five models were then integrated using two ensemble learning techniques, Voting and Stacking, resulting in a Voting ensemble prediction model (the study’s sixth model) and a Stacking ensemble prediction model (the study’s seventh model). Finally, the Blending method from ensemble learning was used to integrate the sixth and seventh models into the final Blending ensemble prediction model (the study’s eighth model). After testing and comparing the results, the study found that, among the five non-ensemble models, the Lasso Regression model exhibited the best predictive performance. While XGBoost outperformed LightGBM, gradient boosting frameworks were not well-suited for housing price prediction with small datasets. As for the three ensemble models, the Voting ensemble model’s predictive performance was indeed superior to the five non-ensemble models, without significant overfitting. However, the Stacking ensemble model only outperformed LightGBM and Ridge Regression, indicating that ensemble learning is not a panacea for improving predictive performance. Ultimately, the Blending ensemble model was the best performer among the eight models, as it integrated the Voting and Stacking models with optimal blending weights, resulting in the highest predictive performance. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-10-11T16:06:30Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2024-10-11T16:06:30Z (GMT). No. of bitstreams: 0 | en |
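| dc.description.tableofcontents | Acknowledgements i; Chinese Abstract ii; Abstract iii; Table of Contents v; List of Figures vii; List of Tables ix; Chapter 1 Introduction 1; 1.1 Research Background 1; 1.2 Problem and Motivation 4; 1.3 Research Process and Objectives 5; Chapter 2 Literature Review 7; 2.1 Factors Affecting Housing Prices 7; 2.2 Related Research on Housing Price Prediction 9; 2.3 Variables Affecting Housing Prices 18; 2.4 Literature Review and Discussion of Research Methods 23; 2.4.1 Linear Regression 23; 2.4.2 Decision Trees 24; 2.4.3 Random Forests 25; 2.4.4 Artificial Neural Networks 26; 2.4.5 Support Vector Machines 27; 2.4.5 Extreme Gradient Boosting (XGBoost) 28; 2.4.6 Light Gradient Boosting (LightGBM) 29; 2.4.7 Root Mean Square Error (RMSE) and Mean Square Error (MSE) 31; Chapter 3 Research Methods 33; 3.1 Research Overview 33; 3.2 Research Scope and Limitations 35; 3.3 Research Methods 36; 3.3.1 Lasso Regression 37; 3.3.2 Ridge Regression 38; 3.3.3 Elastic Net 42; 3.3.4 Principles of the Light Gradient Boosting Machine (LightGBM) 46; 3.3.5 Principles of Extreme Gradient Boosting (XGBoost) 49; 3.3.6 Ensemble Learning 52; Chapter 4 Case Analysis and Results 55; 4.1 Data Introduction and Exploration 55; 4.2 Data Preprocessing and Feature Engineering 59; 4.2.1 Analysis and Validation of Key Features 59; 4.2.2 Handling Missing Data and Outliers 67; 4.2.3 Log Transformation of Data 71; 4.3 Building the Prediction Models 75; 4.3.1 Regularized Regression Model Building and Prediction Results 76; 4.3.2 Gradient Boosting Machine Model Building and Prediction Results 78; 4.3.3 Ensemble Learning Model Building and Prediction Results 80; 4.3.4 Comparison of Prediction Results of the Eight Price Prediction Models 84; 4.4 Comparison of Data Preprocessing with the Literature 85; Chapter 5 Conclusion 87; References 89 | - |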
| dc.language.iso | zh_TW | - |
| dc.title | 機器學習與集成學習方法在房價預測上之應用 | zh_TW |
| dc.title | An Application of Machine Learning and Ensemble Learning Methods to the Prediction of Housing Price | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 113-1 | - |
| dc.description.degree | Master's | - |
| dc.contributor.oralexamcommittee | 張瑞益;黃奎隆 | zh_TW |
| dc.contributor.oralexamcommittee | Ray-I Chang;Kwei-Long Huang | en |
| dc.subject.keyword | 機器學習,集成學習,資料預處理,房屋價格預測 | zh_TW |
| dc.subject.keyword | Machine Learning, Ensemble Learning, Data Preprocessing, Housing Price Prediction | en |
| dc.relation.page | 93 | - |
| dc.identifier.doi | 10.6342/NTU202401820 | - |
| dc.rights.note | Not authorized | - |
| dc.date.accepted | 2024-10-07 | - |
| dc.contributor.author-college | College of Engineering | - |
| dc.contributor.author-dept | Institute of Industrial Engineering | - |
| Appears in collections: | Institute of Industrial Engineering | |

Files in this item:

| File | Size | Format | |
|---|---|---|---|
| ntu-113-1.pdf (not authorized for public access) | 3.5 MB | Adobe PDF |
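
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.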
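
The abstract in this record describes a final Blending model that combines the Voting and Stacking ensembles with blending weights chosen for best predictive performance, and the table of contents lists RMSE as an evaluation metric. The following is only a minimal sketch of that idea, under stated assumptions: Voting is taken to be a plain average of base-model predictions, the blend uses a single weight `w` found by grid search on hold-out data, and every number is made up for illustration. The record does not say how the thesis actually selected its weights, and the function names here (`voting_average`, `best_blend_weight`) are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def voting_average(predictions):
    """Voting for regression (assumed here): average the base models per sample."""
    return [sum(column) / len(column) for column in zip(*predictions)]


def best_blend_weight(y_true, pred_a, pred_b, steps=100):
    """Grid-search w in [0, 1] that minimizes RMSE of w*pred_a + (1 - w)*pred_b."""
    best_w, best_err = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        blended = [w * a + (1 - w) * b for a, b in zip(pred_a, pred_b)]
        err = rmse(y_true, blended)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err


# Hypothetical hold-out targets and predictions (illustrative, not thesis data).
y_holdout = [12.0, 12.5, 11.8, 13.1]
base_preds = [
    [12.1, 12.4, 11.9, 13.0],  # stand-in for a regularized regression model
    [11.8, 12.7, 11.6, 13.3],  # stand-in for a gradient boosting model
]
voting_pred = voting_average(base_preds)
stacking_pred = [12.2, 12.3, 12.0, 12.9]  # stand-in for a stacking model's output

weight, error = best_blend_weight(y_holdout, voting_pred, stacking_pred)
print(f"blend weight on voting model: {weight:.2f}, hold-out RMSE: {error:.4f}")
```

Because the grid includes `w = 0` and `w = 1`, the blended RMSE on the hold-out data can never be worse than either input model alone on that same data. This is one reason a weighted blend tends to look best when it is tuned and scored on the same split, so a separate test set is advisable when comparing it against the other models.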
