The Influence of Bounded-Error Data with Feature Importance on Compression and Machine Learning Accuracy

李孟哲; Meng-Che Lee

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89131

Title:	The Influence of Bounded-Error Data with Feature Importance on Compression and Machine Learning Accuracy
Author:	李孟哲 Meng-Che Lee
Advisor:	張瑞益 Ray-I Chang
Keywords:	Bounded-Error, Compression, Feature Importance, Machine Learning, IoT
Year of publication:	2023
Degree:	Master's
Abstract:	Internet of Things (IoT) technology has enabled many convenient services: deploying sensor nodes makes it possible to collect a wide variety of data. Because the collected data can involve sensitive information, our previous work proposed the Bounded-Error Data (BED) Privacy Protection (BEDPP) system framework to protect user data privacy. BEDPP consists of two subsystems: Bounded-Error IoT (BEIoT) and BED Market on Blockchain (BEDMoB). BEIoT collects data and compresses it with one of our previously proposed bounded-error compression algorithms, Bounded-Error Run Length Encoding (BERLE) or Bounded-Error Huffman Coding (BEHC). BEDMoB provides BED to users according to the privacy permissions in smart contracts, delivering a privacy-preserving data sharing service. In addition to establishing an IoT data sharing process, BEDPP reduces the amount of data that nodes transmit, which prolongs the lifespan of IoT devices. However, the framework does not consider, at the data collection stage, how important each feature is to subsequent machine learning applications. This study therefore targets the data collection stage of BEIoT: a Feature Importance Measurement (FIM) module measures feature importance according to the intended machine learning objective, and the result serves as the error allocation policy during data collection, improving compression while also making BED more usable for machine learning. Results show that, compared with the original BEIoT system using proportional error allocation, adding the FIM model improves the compression rates of BERLE and BEHC by 12.55% and 19.97% on average, respectively, and improves the machine learning performance of BED with SVM, k-NN, and K-Means by up to 3.24%, 3.44%, and 26.20%, respectively.
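The abstract does not describe the internals of BERLE or of the FIM-based error allocation, so the following Python sketch is only an illustration of the general idea, not the thesis's method. It assumes (hypothetically) that a total error budget is split across features in inverse proportion to their importance scores, and it uses a generic greedy bounded-error run-length scheme in which consecutive samples are merged into one run as long as a single representative value stays within the feature's error bound. The function names `allocate_error_bounds`, `bounded_error_rle`, and `decode_runs` are invented for this example.

```python
from typing import List, Tuple


def allocate_error_bounds(importances: List[float], total_budget: float) -> List[float]:
    """Split an error budget so that less important features get larger bounds.

    Assumption for illustration: bounds are inversely proportional to importance.
    """
    smoothing = 1e-9  # avoids division by zero for features with zero importance
    inverse = [1.0 / (imp + smoothing) for imp in importances]
    scale = total_budget / sum(inverse)
    return [w * scale for w in inverse]


def bounded_error_rle(values: List[float], eps: float) -> List[Tuple[float, int]]:
    """Greedy run-length encoding with a per-sample absolute error of at most eps.

    A run is extended while (max - min) <= 2 * eps, so the midpoint of the run
    lies within eps of every sample it replaces.
    """
    runs: List[Tuple[float, int]] = []
    if not values:
        return runs
    lo = hi = values[0]
    count = 1
    for v in values[1:]:
        new_lo, new_hi = min(lo, v), max(hi, v)
        if new_hi - new_lo <= 2 * eps:
            lo, hi, count = new_lo, new_hi, count + 1
        else:
            runs.append(((lo + hi) / 2, count))
            lo = hi = v
            count = 1
    runs.append(((lo + hi) / 2, count))
    return runs


def decode_runs(runs: List[Tuple[float, int]]) -> List[float]:
    """Expand (value, count) runs back into a sample sequence."""
    decoded: List[float] = []
    for value, count in runs:
        decoded.extend([value] * count)
    return decoded
```

Under these assumptions, a feature with low importance receives a larger bound, which tends to produce longer runs and therefore fewer (value, count) pairs to transmit, while important features are reconstructed more precisely.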
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/89131
DOI:	10.6342/NTU202303010
Full-text authorization:	Not authorized
Appears in department:	Department of Engineering Science and Ocean Engineering

Files in this item:

File	Size	Format
ntu-111-2.pdf (not authorized for public access)	1.9 MB	Adobe PDF


All items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.
