Enhancing Patent Search Key-word with AI Language Model

高承億; Chen-Yi Kao

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92741

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	陳達仁	zh_TW
dc.contributor.advisor	Dar-Zen Chen	en
dc.contributor.author	高承億	zh_TW
dc.contributor.author	Chen-Yi Kao	en
dc.date.accessioned	2024-06-18T16:07:05Z	-
dc.date.available	2024-06-19	-
dc.date.copyright	2024-06-18	-
dc.date.issued	2024	-
dc.date.submitted	2024-06-11	-
dc.identifier.uri	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/92741	-
dc.description.abstract	當前的專利檢索技術，檢索者需要準確地列出相關關鍵詞，以提高檢索的精確性。專利文件通常使用較為抽象的概念詞彙撰寫，且同一技術元件可能會以不同的詞彙表達，這使得關鍵詞的列出往往無法精準匹配。即使投入大量的時間和人力，檢索者也很難找到所有相關的關鍵詞，以覆蓋所需的專利範圍。本研究聚焦於應用 AI 語言模型於專利檢索的實踐，分析其與傳統專利檢索方法的差異，探討是否能夠透過 AI 語言模型優化現有專利檢索關鍵詞議題，以期節省檢索所需的人力與時間，實現更高效、更完整的專利查找方式。 AI 語言模型將文字檢索轉化為基於 AI 向量值的相似度匹配。所以 AI 語言模型的檢索需要進行前置處理，首先需匯入專利資料並建立「專利 AI 向量資料庫」。當後續進行 AI 專利檢索時，查詢詞需轉換為 AI 向量值，隨後與「專利 AI 向量資料庫」進行相似度比對，根據相似度的高低順序產生相關聯的專利案件結果，最後再透過相關性判斷確認結果的關聯性及正確性。本研究使用兩個案例透過以上的方法分析及比較 TIPO 與 AI 語言模型的檢索結果，驗證 AI 語言模型在專利檢索提供「更準更全」的能力。期望透過 AI 語言模型的優化，為專利檢索帶來革新，並探討其商業化的可能性。	zh_TW
dc.description.abstract	Current patent search techniques require searchers to list relevant keywords accurately in order to improve search precision. Patent documents are often written in abstract conceptual vocabulary, and the same technical component may be described with different terms, so the listed keywords often fail to match precisely. Even with considerable time and effort, searchers struggle to find all the relevant keywords needed to cover the required range of patents. This study focuses on the practical application of AI language models to patent searching. It analyzes how they differ from traditional patent search methods and explores whether they can mitigate the keyword problems of current patent search, with the goal of saving the manpower and time that searching requires and achieving more efficient and more complete patent retrieval. AI language models transform text retrieval into similarity matching based on AI vector values. Retrieval with an AI language model therefore requires preprocessing: patent data must first be imported and a "Patent AI Vector Database" built. In a subsequent AI patent search, the query terms are converted into AI vector values and compared for similarity against the "Patent AI Vector Database", and the related patent cases are returned in order of similarity. The final step involves manually reviewing the results to verify their relevance and accuracy. Using this method, the study analyzes and compares the search results of TIPO and of AI language models in two cases, verifying the AI language model's ability to provide "more accurate and more complete" patent retrieval. The study hopes that optimization with AI language models will bring innovation to patent search, and it explores the potential for commercialization.	en
dc.description.provenance	Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-06-18T16:07:05Z No. of bitstreams: 0	en
dc.description.provenance	Made available in DSpace on 2024-06-18T16:07:05Z (GMT). No. of bitstreams: 0	en
dc.description.tableofcontents	Abstract (Chinese); Abstract (English); Contents; List of Figures; List of Tables; Chapter 1 Introduction; Chapter 2 Literature Review; Traditional Retrieval Techniques; The Evolution of AI in Retrieval; Chapter 3 Research Methods; Optimizing Sample Data Segmentation; Sample Data Vectorization (Embeddings); AI Language Model Technology; Sample Similarity Comparison (Similarity); Chapter 4 Case Studies; Case 1; Case 2; Chapter 5 Conclusion; Future Research Directions; Closing Remarks; References; Appendix; Attachment 1: Reference material for Case 1, "Agriculture AND Image Recognition" (農業 AND 影像辨識); Attachment 2: Reference material for Case 2, "Animal AND Medical Devices" (動物 AND 醫療器材)	-
dc.language.iso	zh_TW	-
dc.subject	AI 向量化	zh_TW
dc.subject	語言模型	zh_TW
dc.subject	相似度比對	zh_TW
dc.subject	Language Model	en
dc.subject	similarity comparison	en
dc.subject	AI vector value	en
dc.title	以AI語言模型優化專利關鍵字檢索	zh_TW
dc.title	Enhancing Patent Search Key-word with AI Language Model	en
dc.type	Thesis	-
dc.date.schoolyear	112-2	-
dc.description.degree	Master's	-
dc.contributor.oralexamcommittee	黃慕萱;李素華	zh_TW
dc.contributor.oralexamcommittee	Mu-Hsuan Huang;Su-Hua LEE	en
dc.subject.keyword	語言模型,AI 向量化,相似度比對	zh_TW
dc.subject.keyword	Language Model,AI vector value,similarity comparison	en
dc.relation.page	48	-
dc.identifier.doi	10.6342/NTU202401067	-
dc.rights.note	Not authorized	-
dc.date.accepted	2024-06-12	-
dc.contributor.author-college	College of Engineering	-
dc.contributor.author-dept	Institute of Industrial Engineering	-
Appears in Collections:	Institute of Industrial Engineering
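
The retrieval flow described in the abstract (build a vector database from patent texts, convert the query into a vector, rank the patents by similarity) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: `embed` is a toy character-bigram stand-in for a real language-model embedding, and the names `build_index` and `search`, as well as the sample patent texts, are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> dict[str, float]:
    """Toy stand-in for a language-model embedding (assumption):
    L2-normalised character-bigram counts, stored as a sparse vector."""
    s = text.lower()
    counts = Counter(s[i:i + 2] for i in range(len(s) - 1))
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {bigram: c / norm for bigram, c in counts.items()}


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    # Both vectors are unit length, so the dot product is the cosine similarity.
    return sum(w * b.get(k, 0.0) for k, w in a.items())


def build_index(patents: dict[str, str]) -> dict[str, dict[str, float]]:
    """Preprocessing step: embed every patent text once (the 'vector database')."""
    return {pid: embed(text) for pid, text in patents.items()}


def search(index: dict[str, dict[str, float]], query: str, top_k: int = 3):
    """Embed the query, score it against every stored vector, rank by similarity."""
    q = embed(query)
    scored = [(pid, cosine(q, vec)) for pid, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical sample data, for illustration only.
patents = {
    "P1": "crop disease detection using image recognition of leaves",
    "P2": "veterinary surgical instrument for small animals",
    "P3": "battery charging circuit for electric vehicles",
}
index = build_index(patents)
results = search(index, "image recognition for agriculture crops", top_k=2)
for pid, score in results:
    print(pid, round(score, 3))
```

In practice the toy `embed` would be replaced by a call to an actual embedding model, and the ranked list would still go through the manual relevance review that the abstract describes.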

Files in This Item:

File	Size	Format
ntu-112-2.pdf (not authorized for public access)	1.62 MB	Adobe PDF

Items in the system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.