Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98664
| Title: | 基於生成式人工智慧的敏感圖像資料外洩防護系統 A Data Loss Prevention System for Sensitive Images Based on Generative Artificial Intelligence |
| Author: | 楊智 Chih Yang |
| Advisor: | 張瑞益 Ray-I Chang |
| Keywords: | Sensitive image detection, Generative Artificial Intelligence (GenAI), Visual Privacy Governance, Graph Convolutional Network (GCN) |
| Publication year: | 2025 |
| Degree: | Master's |
| Abstract: | With the proliferation of mobile devices and social media, the risk of personal data leakage through image sharing has become increasingly severe. The problem has two aspects: first, users may inadvertently expose sensitive images on social platforms; second, there is no effective method for proactively detecting images of official documents, such as ID cards, driver's licenses, or passports, that have already been leaked on the Internet. Existing research and tools suffer from outdated model architectures, failure to model the semantic correlations between privacy attributes, rigid preference models, and the inability of traditional Data Loss Prevention (DLP) systems to proactively detect sensitive images spread across the Internet. To address these challenges, this study proposes a comprehensive framework for visual privacy governance and sensitive image detection with two core components: pre-sharing prevention through a GCN-based intelligent privacy recognition module, and post-leak detection through GenAI-based proactive leak detection.
For proactive detection of sensitive documents, this research uses Generative AI to produce highly realistic document samples whose realism, as measured by pHash structural similarity tests, is significantly better than that of mainstream models such as DALL·E 3. It then applies a novel "Dynamic Mask Generation" technique for feature matching, which reduces the number of false-positive matched points by 70.4% compared with the traditional approach and significantly improves matching precision. This is combined with deep learning-based Optical Character Recognition (OCR) to further verify whether the image content actually contains private information. In tests in a real-world online environment, the system achieved 100% precision and 100% recall; on the public IDNet dataset, it achieved 100% precision and 99.7% recall. The system also integrates a Large Language Model (LLM) to provide cybersecurity recommendations. In actual deployment it has detected multiple real-world sensitive data leaks, including enterprise-level incidents, demonstrating high practical value.
For pre-sharing prevention, addressing potential privacy exposure during social sharing, this research introduces a GCN-based intelligent privacy recognition module. It uses a modern vision model as its backbone and a Graph Convolutional Network (GCN) as the classifier to model the co-occurrence relationships between privacy labels, producing more accurate and logically coherent classification results. Experiments show that, compared with the state of the art, this method improves mAP by 6.0 percentage points (52.88% vs. 46.88%) and F1-score by 10% on the VISPR dataset, offering more meaningful privacy protection recommendations. The proposed framework not only proactively detects high-risk sensitive image data already leaked on the Internet but also provides social media users with dynamic, personalized privacy setting recommendations, setting a new benchmark for user-centric digital privacy protection. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98664 |
| DOI: | 10.6342/NTU202504034 |
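| Full-text authorization: | Not authorized |
| Electronic full-text release date: | N/A |
| Appears in collections: | Department of Engineering Science and Ocean Engineering |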
Files in this item:
| File | Size | Format | |
|---|---|---|---|
| ntu-113-2.pdf (not authorized for public access) | 8.19 MB | Adobe PDF |
Unless specific copyright terms are indicated, all items in this system are protected by copyright, with all rights reserved.
