Please use this Handle URI to cite this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98664
Full metadata record
DC Field | Value | Language
dc.contributor.advisor | 張瑞益 | zh_TW
dc.contributor.advisor | Ray-I Chang | en
dc.contributor.author | 楊智 | zh_TW
dc.contributor.author | Chih Yang | en
dc.date.accessioned | 2025-08-18T01:16:15Z | -
dc.date.available | 2025-08-18 | -
dc.date.copyright | 2025-08-15 | -
dc.date.issued | 2025 | -
dc.date.submitted | 2025-08-05 | -
dc.identifier.citation | R. Gross and A. Acquisti, "Information revelation and privacy in online social networks," in Proceedings of the 2005 ACM workshop on Privacy in the electronic society, 2005, pp. 71-80.
N. Vishwamitra et al., "Towards automated content-based photo privacy control in user-centered social networks," in Proceedings of the Twelfth ACM Conference on Data and Application Security and Privacy, 2022, pp. 65-76.
IBM, Cost of a Data Breach Report 2024. Armonk, NY, USA: IBM Security, 2024.
S. Liu and R. Kuhn, "Data loss prevention," IT professional, vol. 12, no. 2, pp. 10-13, 2010.
L. Cheng, F. Liu, and D. Yao, "Enterprise data breach: causes, challenges, prevention, and future directions," WIREs Data Mining and Knowledge Discovery, vol. 7, no. 5, 2017, doi: 10.1002/widm.1211.
H. Jiang, J. Zuo, and Y. Lu, "Connecting Visual Data to Privacy: Predicting and Measuring Privacy Risks in Images," Electronics, vol. 14, no. 4, p. 811, 2025.
H. Adkins, "Cybersecurity Technology Salon: A Dialogue with Google’s VP of Security," in Cybersecurity Technology Salon, New Taipei City, Taiwan, 22 August 2024: National Institute of Cyber Security & Google.
A. Chattopadhyay, D. Christian, A. Ulman, and C. Sawyer, "A middle-school case study: Piloting a novel visual privacy themed module for teaching societal and human security topics using social media apps," in 2018 IEEE Frontiers in Education Conference (FIE), 2018: IEEE, pp. 1-8.
S. Zerr, S. Siersdorfer, and J. Hare, "Picalert! a system for privacy-aware image classification and retrieval," in Proceedings of the 21st ACM international conference on Information and knowledge management, 2012, pp. 2710-2712.
T. Orekondy, B. Schiele, and M. Fritz, "Towards a visual privacy advisor: Understanding and predicting privacy risks in images," in Proceedings of the IEEE international conference on computer vision, 2017, pp. 3686-3695.
G. Yang, J. Cao, Q. Sheng, P. Qi, X. Li, and J. Li, "DRAG: Dynamic region-aware GCN for privacy-leaking image detection," in Proceedings of the AAAI conference on artificial intelligence, 2022, vol. 36, no. 11, pp. 12217-12225.
A. Paradise Vit, Y. Aronson, R. Fraidenberg, and R. Puzis, "Visual Censorship: A Deep Learning-Based Approach to Preventing the Leakage of Confidential Content in Images," Applied Sciences, vol. 14, no. 17, p. 7915, 2024. [Online]. Available: https://www.mdpi.com/2076-3417/14/17/7915.
K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 770-778.
A. Dosovitskiy et al., "An image is worth 16x16 words: Transformers for image recognition at scale," arXiv preprint arXiv:2010.11929, 2020.
Z. Liu, H. Mao, C.-Y. Wu, C. Feichtenhofer, T. Darrell, and S. Xie, "A convnet for the 2020s," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 11976-11986.
W. K. Mutlag, S. K. Ali, Z. M. Aydam, and B. H. Taher, "Feature extraction methods: a review," in Journal of Physics: Conference Series, 2020, vol. 1591, no. 1: IOP Publishing, p. 012028.
S. Leutenegger, M. Chli, and R. Y. Siegwart, "BRISK: Binary robust invariant scalable keypoints," in 2011 International conference on computer vision, 2011: IEEE, pp. 2548-2555.
T. Lindeberg, "Scale invariant feature transform," 2012.
H. Bay, T. Tuytelaars, and L. Van Gool, "Surf: Speeded up robust features," in Computer Vision–ECCV 2006: 9th European Conference on Computer Vision, Graz, Austria, May 7-13, 2006. Proceedings, Part I 9, 2006: Springer, pp. 404-417.
E. Rublee, V. Rabaud, K. Konolige, and G. Bradski, "ORB: An efficient alternative to SIFT or SURF," in 2011 International conference on computer vision, 2011: IEEE, pp. 2564-2571.
M. Muja and D. G. Lowe, "Fast approximate nearest neighbors with automatic algorithm configuration," in VISAPP (1), 2009, pp. 331-340.
D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International journal of computer vision, vol. 60, pp. 91-110, 2004.
F. K. Noble, "Comparison of OpenCV's feature detectors and feature matchers," in 2016 23rd International Conference on Mechatronics and Machine Vision in Practice (M2VIP), 2016: IEEE, pp. 1-6.
Y. Du et al., "Pp-ocrv2: Bag of tricks for ultra lightweight ocr system," arXiv preprint arXiv:2109.03144, 2021.
S. Jain, K. Pulaparthi, and C. Fulara, "Content based image retrieval," Int. J. Adv. Eng. Glob. Technol, vol. 3, no. 10, pp. 1251-1258, 2015.
Z. Wu, Q. Ke, M. Isard, and J. Sun, "Bundling features for large scale partial-duplicate web image search," in 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009: IEEE, pp. 25-32.
J. Matas, O. Chum, M. Urban, and T. Pajdla, "Robust wide-baseline stereo from maximally stable extremal regions," Image and vision computing, vol. 22, no. 10, pp. 761-767, 2004.
B. Den Boer and A. Bosselaers, "Collisions for the compression function of MD5," in Workshop on the Theory and Application of Cryptographic Techniques, 1993: Springer, pp. 293-304.
V. Klima, "Tunnels in hash functions: MD5 collisions within a minute," Cryptology ePrint Archive, 2006.
X. Wang, D. Feng, X. Lai, and H. Yu, "Collisions for hash functions MD4, MD5, HAVAL-128 and RIPEMD," Cryptology ePrint Archive, 2004.
X. Wang and H. Yu, "How to break MD5 and other hash functions," in Annual international conference on the theory and applications of cryptographic techniques, 2005: Springer, pp. 19-35.
T. Xie and D. Feng, "How to find weak input differences for MD5 collision attacks," Cryptology ePrint Archive, 2009.
R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, "High-resolution image synthesis with latent diffusion models," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 10684-10695.
P. Esser, R. Rombach, and B. Ommer, "Taming transformers for high-resolution image synthesis," in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 12873-12883.
D. Bernstein, "Containers and cloud: From lxc to docker to kubernetes," IEEE cloud computing, vol. 1, no. 3, pp. 81-84, 2014.
B. Burns, B. Grant, D. Oppenheimer, E. Brewer, and J. Wilkes, "Borg, omega, and kubernetes," Communications of the ACM, vol. 59, no. 5, pp. 50-57, 2016.
Y. Gao et al., "Retrieval-augmented generation for large language models: A survey," arXiv preprint arXiv:2312.10997, vol. 2, 2023. | -
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98664 | -
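dc.description.abstract | With the spread of mobile devices and social media, the risk of personal data leakage caused by image sharing has grown increasingly severe. The problem has two aspects: users may unintentionally expose sensitive images on social platforms, and there is currently no effective way to proactively detect images of official documents, such as national ID cards, driver's licenses, or passports, that have already leaked onto the internet. Existing research and tools commonly suffer from outdated model architectures, failure to model the semantic correlations among privacy attributes, rigid preference models, and traditional Data Loss Prevention (DLP) systems that cannot proactively detect sensitive images scattered across the internet. To address these challenges, this study proposes a comprehensive framework for visual privacy governance and sensitive image data detection built on two core components: preventive protection through a GCN-based intelligent privacy recognition module, and post-incident detection through GenAI-based proactive leak detection.
For proactive detection of sensitive documents, this study uses generative AI to produce highly realistic document samples whose realism, measured by pHash structural similarity tests, is significantly better than that of mainstream models such as DALL·E 3. An innovative "dynamically generated mask" technique is then used for feature matching; compared with the traditional approach, it reduces the number of false-positive matched keypoints by 70.4%, markedly improving matching precision. Deep-learning-based optical character recognition (OCR) further verifies whether the image content actually contains private information. In tests in a real internet environment, the system achieved 100% precision and 100% recall; on the public IDNet dataset it achieved 100% precision and 99.7% recall. In addition, the system integrates a Large Language Model (LLM) to provide security recommendations, and in actual deployment it has detected multiple real sensitive data leaks, including enterprise-level cases, demonstrating high practical value. For preventive protection, to address potential privacy exposure during social sharing, this study introduces a GCN-based intelligent privacy recognition module that uses a modern vision model as the backbone network and a Graph Convolutional Network (GCN) as the classifier, effectively modeling the co-occurrence relationships among privacy labels and producing more accurate and logically consistent classification results. Experiments show that, compared with the latest research, the method improves mAP on the VISPR dataset by 6.0 percentage points (52.88% vs. 46.88%) and F1-score by 10%, providing more meaningful privacy protection recommendations. The proposed framework not only proactively detects high-risk sensitive image data already leaked on the internet but also gives social media users dynamic, personalized privacy setting recommendations, setting a new benchmark for user-centric digital privacy protection. | zh_TW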
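The pHash structural similarity measurement mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common DCT-based perceptual hash (a 32x32 grayscale input, with the low-frequency 8x8 block thresholded at its median) compared by Hamming distance. This record does not state which pHash variant or parameters the thesis uses, so the function names (`dct_2d`, `phash_bits`, `hamming`, `similarity`) and the sizes are illustrative assumptions.

```python
import math
from statistics import median


def dct_2d(block):
    """Naive, unnormalized 2-D DCT-II of a square matrix (list of lists)."""
    n = len(block)
    # basis[u][x] = cos((2x + 1) * u * pi / (2n))
    basis = [[math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
             for u in range(n)]
    # Transform along each row first, then along each column.
    rows = [[sum(row[x] * basis[u][x] for x in range(n)) for u in range(n)]
            for row in block]
    return [[sum(rows[y][u] * basis[v][y] for y in range(n)) for u in range(n)]
            for v in range(n)]


def phash_bits(gray, hash_size=8):
    """Hash a square grayscale matrix (assumed 32x32) into hash_size**2 bits."""
    coeffs = dct_2d(gray)
    # Keep only the low-frequency corner, which captures coarse structure.
    low = [coeffs[v][u] for v in range(hash_size) for u in range(hash_size)]
    threshold = median(low)
    return [1 if c > threshold else 0 for c in low]


def hamming(a, b):
    """Number of bit positions where two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def similarity(a, b):
    """Fraction of matching bits, from 0.0 (all differ) to 1.0 (identical)."""
    return 1 - hamming(a, b) / len(a)
```

Under these assumptions, a generated document sample whose hash has a high `similarity` to the hash of a reference document shares its coarse layout, which is the sense in which a hash-based score can stand in for structural realism.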
dc.description.abstract | With the proliferation of mobile devices and social media, the risk of personal data leakage through image sharing has become increasingly severe. The problem has two aspects: first, users may inadvertently expose sensitive images on social platforms; second, there is a lack of effective methods for proactively detecting images of official documents, such as ID cards, driver's licenses, or passports, that have already been leaked on the internet. Existing research and tools generally suffer from outdated model architectures, failure to model semantic correlations between privacy attributes, rigid preference models, and the inability of traditional Data Loss Prevention (DLP) systems to proactively detect sensitive images distributed across the internet. To address these challenges, this study proposes a comprehensive framework for visual privacy governance and sensitive image data detection with two core components: personalized privacy management and proactive leak detection.
For proactive sensitive document detection, this research uses Generative AI to produce highly realistic document samples whose realism, as measured by pHash structural similarity tests, significantly exceeds that of samples from mainstream models such as DALL·E 3. It then applies an innovative "Dynamic Mask Generation" technique to feature matching, which reduces false-positive matches by 70.4% compared with traditional methods and significantly improves matching precision. This is combined with deep learning-based Optical Character Recognition (OCR) to verify whether the image content actually contains private information. In real-world online tests, the system achieved 100% precision and 100% recall; on the public IDNet dataset, it achieved 100% precision and 99.7% recall. Furthermore, the system integrates a Large Language Model (LLM) to provide cybersecurity recommendations and has detected multiple real-world sensitive data leaks in actual deployments, including enterprise-level incidents, demonstrating high practical value. For personalized privacy governance, to address potential privacy exposure during social sharing, this research introduces a personalized privacy classification system. It uses a modern vision model as its backbone and a Graph Convolutional Network (GCN) as its classifier to model the co-occurrence relationships between privacy labels, producing more accurate and logically coherent classification results. Experiments show that, compared with the state of the art, this method improves mAP by 6.0 percentage points (52.88% vs. 46.88%) and F1-score by 10% on the VISPR dataset, offering more meaningful privacy protection recommendations.
The framework proposed in this study not only proactively detects high-risk sensitive image data already leaked on the internet but also provides social media users with dynamic and personalized privacy setting recommendations, setting a new benchmark for user-centric digital privacy protection. | en
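The abstract states that a GCN classifier models co-occurrence relationships between privacy labels. One common way to supply such relationships to a GCN, assumed here and not confirmed by this record, is to build a label adjacency matrix from conditional co-occurrence frequencies in the training annotations and then propagate label features over it. The label indices, the 0.4 threshold, and the function names below are hypothetical.

```python
def cooccurrence_adjacency(label_sets, num_labels, threshold=0.4):
    """Build a binary label graph: edge i -> j when P(j present | i present)
    reaches the threshold. Self-loops are always added."""
    counts = [0] * num_labels
    co = [[0] * num_labels for _ in range(num_labels)]
    for labels in label_sets:
        unique = sorted(set(labels))
        for i in unique:
            counts[i] += 1
            for j in unique:
                if i != j:
                    co[i][j] += 1
    adj = [[0.0] * num_labels for _ in range(num_labels)]
    for i in range(num_labels):
        adj[i][i] = 1.0  # self-loop keeps each label's own feature
        for j in range(num_labels):
            if i != j and counts[i] > 0 and co[i][j] / counts[i] >= threshold:
                adj[i][j] = 1.0
    return adj


def propagate(adj, features):
    """One parameter-free propagation step: each label's new feature is the
    mean of the features of its graph neighbours (row-normalized adjacency).
    A real GCN layer would also apply a learned weight matrix and a
    nonlinearity; both are omitted in this sketch."""
    dim = len(features[0])
    out = []
    for row in adj:
        degree = sum(row)  # at least 1.0 because of the self-loop
        out.append([sum(row[j] * features[j][d] for j in range(len(row))) / degree
                    for d in range(dim)])
    return out
```

With the hypothetical annotations `[[0, 1], [0, 1], [0, 2], [1]]`, the matrix comes out asymmetric: label 2 always appears together with label 0, so there is an edge from 2 to 0, but label 0 appears with label 2 only a third of the time, so there is no edge back. That directionality is what lets a classifier treat one attribute as strong evidence for another without assuming the reverse.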
dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-18T01:16:15Z. No. of bitstreams: 0 | en
dc.description.provenance | Made available in DSpace on 2025-08-18T01:16:15Z (GMT). No. of bitstreams: 0 | en
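dc.description.tableofcontents | Oral Defense Committee Certification i
Acknowledgements ii
Chinese Abstract iii
ABSTRACT v
Table of Contents vii
List of Figures x
List of Tables xi
Chapter 1 Introduction 1
1.1 Research Background 1
1.2 Research Motivation and Objectives 2
1.3 Thesis Structure 4
Chapter 2 Literature Review 5
2.1 Development of Image Privacy Protection Techniques 5
2.2 Development of Image Privacy Detection Techniques 6
2.3 Modern Visual Recognition Backbone Architectures 8
2.4 Image Feature Extraction and Matching Techniques 9
2.4.1 BRISK 9
2.4.2 SIFT 10
2.4.3 SURF 11
2.4.4 ORB 12
2.4.5 Keypoint Matching Methods 12
2.4.6 Performance Comparison of Keypoint Algorithms and Rationale for Method Selection 14
2.5 Perceptual Hashing and Structural Similarity Analysis 15
2.6 Optical Character Recognition Techniques 15
2.7 Image Search Techniques and Application Scenarios 16
Chapter 3 System Design and Research Methods 18
3.1 System Architecture 18
3.2 Preventive Protection: GCN-Based Intelligent Privacy Recognition Module 18
3.2.1 Modern Feature Extractor: ConvNeXt 18
3.2.2 Correlation-Aware Classifier: GCN Classification Head 19
3.3 Post-Incident Detection: GenAI-Based Proactive Leak Detection System 19
3.3.1 Input Module 20
3.3.2 Generation Module 21
3.3.3 Search and Filtering Module 21
3.3.4 Recognition Module 23
3.3.5 Labeling and Classification Module 24
3.3.6 Report Generation Module 24
Chapter 4 Experimental Results and Discussion 26
4.1 Preventive Protection: Intelligent Privacy Recognition Model Experiments 26
4.1.1 Experimental Datasets and Preprocessing 26
4.1.2 Model Architecture and GCN Configuration 27
4.1.3 Training and Evaluation Details 27
4.1.4 Model Performance Analysis 28
4.1.5 Visualization of GCN Correlation Modeling 29
4.2 Image Detection Experiments 31
4.2.1 Image Detection Experiment in a Real-World Internet Environment 31
4.2.2 pHash Structural Similarity Experiment 34
4.2.3 Dynamic Mask Precision Optimization Experiment 35
4.2.4 OCR Module Test on the IDNet Dataset 36
4.2.5 Overall System Processing Time and Device Evaluation 37
4.2.6 Discussion of Experimental Results 38
Chapter 5 Conclusion and Future Work 40
References 41 | -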
dc.language.iso | zh_TW | -
dc.subject | Sensitive data detection | zh_TW
dc.subject | Generative artificial intelligence | zh_TW
dc.subject | Visual privacy governance | zh_TW
dc.subject | Graph convolutional network | zh_TW
dc.subject | Generative Artificial Intelligence (GenAI) | en
dc.subject | Visual Privacy Governance | en
dc.subject | Sensitive image detection | en
dc.subject | Graph Convolutional Network (GCN) | en
dc.title | A Sensitive Image Data Leakage Prevention System Based on Generative Artificial Intelligence | zh_TW
dc.title | A Data Loss Prevention System for Sensitive Images Based on Generative Artificial Intelligence | en
dc.type | Thesis | -
dc.date.schoolyear | 113-2 | -
dc.description.degree | Master's | -
dc.contributor.oralexamcommittee | 陳彥廷; 林正偉 | zh_TW
dc.contributor.oralexamcommittee | Yen-Ting Chen; Jeng-Wei Lin | en
dc.subject.keyword | Sensitive data detection, Generative artificial intelligence, Visual privacy governance, Graph convolutional network | zh_TW
dc.subject.keyword | Sensitive image detection, Generative Artificial Intelligence (GenAI), Visual Privacy Governance, Graph Convolutional Network (GCN) | en
dc.relation.page | 43 | -
dc.identifier.doi | 10.6342/NTU202504034 | -
dc.rights.note | Not authorized | -
dc.date.accepted | 2025-08-11 | -
dc.contributor.author-college | College of Engineering | -
dc.contributor.author-dept | Department of Engineering Science and Ocean Engineering | -
dc.date.embargo-lift | N/A | -
Appears in Collections: Department of Engineering Science and Ocean Engineering

Files in this item:
File | Size | Format
ntu-113-2.pdf (not authorized for public access) | 8.19 MB | Adobe PDF


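Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.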
