The Influence of Bounded-Error Data with Feature Importance on Compression and Machine Learning Accuracy

李孟哲; Meng-Che Lee

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89131

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	張瑞益	zh_TW
dc.contributor.advisor	Ray-I Chang	en
dc.contributor.author	李孟哲	zh_TW
dc.contributor.author	Meng-Che Lee	en
dc.date.accessioned	2023-08-16T17:15:40Z	-
dc.date.available	2023-11-09	-
dc.date.copyright	2023-08-16	-
dc.date.issued	2023	-
dc.date.submitted	2023-08-09	-
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89131	-
dc.description.abstract	物聯網技術的出現為人類帶來許多便利的服務，透過感測節點的部署，得以收集各式資料；然而感測節點所收集資料涉及敏感資訊，因此本研究先前提出有限誤差資料(Bounded-Error Data, BED)隱私保護 (BED Privacy Protection, BEDPP) 系統架構，旨在保護用戶資料的隱私。BEDPP系統架構分為兩個子系統：有限誤差物聯網 (Bounded-Error IoT, BEIoT) 以及基於區塊鏈的有限誤差資料市集 (BED Market on Blockchain, BEDMoB)。BEIoT負責資料收集並以先前研究所提出之有限誤差壓縮演算法Bounded-Error Run Length Encoding (BERLE) 或 Bounded-Error Huffman Coding (BEHC) 壓縮資料。BEDMoB負責根據智能合約中的隱私權限，為用戶提供有限誤差資料(Bounded-Error Data, BED)，達成具隱私保護之資料共享服務。BEDPP建構物聯網資料共享流程的同時，可降低節點資料傳輸量以延長物聯網裝置使用壽命。惟該架構在資料收集階段時未考量特徵於後續機器學習應用場景之重要程度，因此本研究針對BEIoT所處之資料收集階段，依據預期之機器學習目標進行特徵重要度衡量 (Feature Importance Measurement, FIM)，以作為資料收集時的誤差分派方針，除提升資料壓縮之成效外更強化BED在機器學習時之可用性。研究成果顯示，相較於採等比例誤差分派方法之原BEIoT系統，加入FIM模型後平均可改善BERLE與BEHC之壓縮率，分別達12.55% 與19.97%，並提升BED於 SVM、k-NN、K-Means 之機器學習表現，分別可達 3.24%、3.44% 及26.20%。	zh_TW
dc.description.abstract	The emergence of Internet of Things (IoT) technology has brought numerous convenient services, and the deployment of sensor nodes makes it possible to collect many types of data. Because these data may involve sensitive information, our previous study proposed the Bounded-Error Data (BED) Privacy Protection (BEDPP) system framework to safeguard user data privacy. BEDPP consists of two subsystems: Bounded-Error IoT (BEIoT) and BED Market on Blockchain (BEDMoB). BEIoT collects data and compresses it with one of our previously proposed bounded-error algorithms, Bounded-Error Run Length Encoding (BERLE) or Bounded-Error Huffman Coding (BEHC). BEDMoB provides privacy-preserving data sharing by serving BED to users according to the privacy permissions recorded in blockchain smart contracts. The BEDPP framework constructs a complete IoT data sharing process and also reduces the amount of data that nodes transmit, prolonging the lifespan of IoT devices. However, the framework did not consider, during data collection, how important each feature is to subsequent machine learning applications. This study therefore adds a Feature Importance Measurement (FIM) module to the data collection stage of BEIoT, which measures feature importance (FI) with respect to the expected machine learning objective. The FIM result serves as the error allocation policy during data collection, improving both the effectiveness of data compression and the usability of BED in machine learning. Compared with the original BEIoT system, which uses a proportional error allocation method, the FIM model improves the compression rates of BERLE and BEHC by 12.55% and 19.97% on average, respectively. It also improves the performance of BED in SVM, k-NN, and K-Means by up to 3.24%, 3.44%, and 26.20%, respectively.	en
dc.description.provenance	Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2023-08-16T17:15:40Z No. of bitstreams: 0	en
dc.description.provenance	Made available in DSpace on 2023-08-16T17:15:40Z (GMT). No. of bitstreams: 0	en
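dc.description.tableofcontents	Abstract (Chinese); Abstract (English); Table of Contents; List of Figures; List of Tables; Chapter 1 Introduction; 1.1 Research Background; 1.2 Research Motivation; 1.3 Research Objectives; Chapter 2 Literature Review; 2.1 Feature Selection Algorithms; 2.2 Bounded-Error Compression Algorithms; 2.3 Machine Learning Models; Chapter 3 Methodology; 3.1 Experimental Procedure and Architecture; 3.1.1 Datasets; 3.1.2 Dataset Processing for Clustering and Classification Algorithms; 3.2 FIM Module; Chapter 4 Experimental Results; 4.1 Effect of FIM-Based Error Allocation on Compression Performance; 4.1.1 Effect of FIM Results on BERLE Performance; 4.1.2 Effect of FIM Results on BEHC Performance; 4.1.3 Comparison of BEHC and BERLE Performance with FIM Results; 4.2 Effect of FIM-Based Compression on Machine Learning Performance; 4.2.1 Effect of FIM-Based BERLE on Machine Learning Performance; 4.2.2 Effect of FIM-Based BEHC on Machine Learning Performance; 4.2.3 Comparison of FIM-Based BERLE and BEHC on Machine Learning Performance; 4.3 General Guidelines for Compression and Machine Learning Performance with FIM Results; Chapter 5 Conclusions and Future Work; 5.1 Conclusions; 5.2 Future Work; 5.3 BERLEHC; 5.3.1 Compression Performance of FIM-Based BERLEHC; 5.3.2 Machine Learning Performance of FIM-Based BERLEHC; References	-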
dc.language.iso	zh_TW	-
dc.subject	壓縮	zh_TW
dc.subject	物聯網	zh_TW
dc.subject	機器學習	zh_TW
dc.subject	特徵重要度	zh_TW
dc.subject	有限誤差	zh_TW
dc.subject	Bounded-Error	en
dc.subject	Machine Learning	en
dc.subject	Compression	en
dc.subject	IoT	en
dc.subject	Feature Importance	en
dc.title	有限誤差資料特徵重要度對壓縮及機器學習準確度之影響	zh_TW
dc.title	The Influence of Bounded-Error Data with Feature Importance on Compression and Machine Learning Accuracy	en
dc.type	Thesis	-
dc.date.schoolyear	111-2	-
dc.description.degree	Master's	-
dc.contributor.oralexamcommittee	王家輝;尹邦嚴;丁肇隆	zh_TW
dc.contributor.oralexamcommittee	Chia-Hui Wang;Peng-Yeng Yin;Chao-Lung Ting	en
dc.subject.keyword	有限誤差,壓縮,特徵重要度,機器學習,物聯網	zh_TW
dc.subject.keyword	Bounded-Error,Compression,Feature Importance,Machine Learning,IoT	en
dc.relation.page	42	-
dc.identifier.doi	10.6342/NTU202303010	-
dc.rights.note	Not authorized	-
dc.date.accepted	2023-08-10	-
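dc.contributor.author-college	College of Engineering	-
dc.contributor.author-dept	Department of Engineering Science and Ocean Engineering	-
Appears in Collections:	Department of Engineering Science and Ocean Engineering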
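
The abstract describes using feature importance as the error allocation policy before bounded-error compression. The idea can be sketched in Python as below. This is a minimal illustration, not the thesis's BERLE or FIM implementation: the function names, the inverse-importance weighting (more important features get tighter bounds), and the choice of the run midpoint as the stored value are all assumptions, since this record does not specify them.

```python
from typing import List, Sequence, Tuple


def allocate_error_bounds(importances: Sequence[float], total_budget: float) -> List[float]:
    """Split an error budget across features (hypothetical policy).

    Assumption: a feature's bound is proportional to the inverse of its
    importance, so more important features are distorted less.
    """
    inverse = [1.0 / max(w, 1e-9) for w in importances]  # guard against zero importance
    scale = sum(inverse)
    return [total_budget * v / scale for v in inverse]


def bounded_error_rle(values: Sequence[float], eps: float) -> List[Tuple[float, int]]:
    """Greedy run-length encoding with a per-value error bound of eps.

    A run is extended while every value in it fits inside a window of
    width 2 * eps; the run is stored as (midpoint, length), so each
    decoded value differs from the original by at most eps.
    """
    if not values:
        return []
    runs: List[Tuple[float, int]] = []
    lo = hi = values[0]
    count = 1
    for v in values[1:]:
        new_lo, new_hi = min(lo, v), max(hi, v)
        if new_hi - new_lo <= 2 * eps:
            lo, hi, count = new_lo, new_hi, count + 1
        else:
            runs.append(((lo + hi) / 2, count))
            lo = hi = v
            count = 1
    runs.append(((lo + hi) / 2, count))
    return runs


def decode(runs: Sequence[Tuple[float, int]]) -> List[float]:
    """Expand (value, length) runs back into a flat series."""
    return [rep for rep, n in runs for _ in range(n)]


# Usage: two features sharing one error budget; importance values are made up.
bounds = allocate_error_bounds([0.5, 0.25, 0.25], total_budget=1.0)
series = [20.0, 20.1, 20.15, 21.0, 21.1]
runs = bounded_error_rle(series, eps=0.1)
restored = decode(runs)
```

Under this assumed policy, a low-importance feature receives a wider bound, which tends to produce longer runs and therefore fewer stored pairs, while a high-importance feature stays closer to its original values.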

Files in This Item:

File	Size	Format
ntu-111-2.pdf (not authorized for public access)	1.9 MB	Adobe PDF

Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.