Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98548

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 黃從仁 | zh_TW |
| dc.contributor.advisor | Tsung-Ren Huang | en |
| dc.contributor.author | 林佐竺 | zh_TW |
| dc.contributor.author | Zuo-Zhu Lin | en |
| dc.date.accessioned | 2025-08-18T00:50:01Z | - |
| dc.date.available | 2025-08-18 | - |
| dc.date.copyright | 2025-08-15 | - |
| dc.date.issued | 2025 | - |
| dc.date.submitted | 2025-08-01 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/98548 | - |
| dc.description.abstract | 大型語言模型預設的意識形態偏見威脅資訊客觀性,但傳統操控模型立場的方法往往成本高昂且容易受到資料雜訊影響,難以落實。本研究首次提出以認知心理學中的「自由聯想」為基礎的新方法,透過模型自身生成關鍵詞,低成本且高效地篩選建構 IdeoFA 資料集。實驗證明,IdeoFA 資料集能有效地提高模型的意識形態操控能力並具有跨主題泛化性。本方法克服傳統方法之缺點,為未來 LLM 精確控制提供實務上立即可行的解決方案。 | zh_TW |
| dc.description.abstract | Default ideological biases in large language models (LLMs) threaten the objectivity of information, yet traditional methods for steering a model's stance are expensive and sensitive to dataset noise, limiting real-world use. We introduce a free-association approach, grounded in cognitive psychology, that prompts the model to generate its own keywords and uses them to select data for the IdeoFA dataset at low cost and high efficiency. Experiments show that IdeoFA effectively improves ideological steerability and generalises across topics, overcoming the drawbacks of traditional methods and offering an immediately deployable solution for precise LLM control. | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2025-08-18T00:50:01Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2025-08-18T00:50:01Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | 致謝 i<br>摘要 iii<br>Abstract v<br>目次 vii<br>圖次 xi<br>表次 xiii<br>第一章 緒論 1<br>第二章 相關文獻 3<br>2.1 模型的意識形態 3<br>2.2 訓練資料篩選 3<br>2.3 自由聯想 4<br>第三章 研究方法 5<br>3.1 生成自由聯想詞集 5<br>3.2 篩選資料集 6<br>3.2.1 關鍵詞篩選 6<br>3.2.2 排序篩選:句向量聚合 7<br>3.2.3 排序篩選:多詞相似度計算 8<br>3.2.4 模型評分篩選 8<br>3.3 評估指標 8<br>3.4 創建 IdeoFA 資料集 9<br>第四章 實驗流程 11<br>4.1 實驗設置 11<br>4.2 衡量模型偏見 11<br>4.3 尋找最佳資料篩選方法 12<br>4.4 比較 IdeoFA 與 IdeoINST 14<br>第五章 實驗結果 15<br>5.1 自由聯想詞集生成結果 15<br>5.2 資料篩選方法效能比較 15<br>5.3 IdeoFA 與 IdeoINST 效果比較 18<br>第六章 結論與討論 21<br>6.1 結論 21<br>6.2 研究限制與未來研究方向 21<br>參考文獻 23<br>附錄 A — 提示模板 31<br>A.1 生成自由聯想詞(System Prompt) 31<br>A.2 末位池化 31<br>A.3 模型評斷相似度(System Prompt) 32<br>A.4 生成 IdeoFA 指令 32<br>A.5 生成 IdeoFA 偏見(System Prompt) 35<br>A.6 生成回應(System Prompt) 35<br>A.7 生成標籤 35<br>A.8 評估偏見程度(System Prompt) 36<br>附錄 B — 演算法 37<br>B.1 嵌入篩選 37<br>附錄 C — IdeoFA 39<br>C.1 示例 39<br>C.2 隨機 1000 筆樣本訓練後意識操控結果比較 40<br>附錄 D — 超參數設置 43 | - |
| dc.language.iso | zh_TW | - |
| dc.subject | 意識形態偏見 | zh_TW |
| dc.subject | 大型語言模型 | zh_TW |
| dc.subject | 自由聯想 | zh_TW |
| dc.subject | 資料篩選 | zh_TW |
| dc.subject | 微調 | zh_TW |
| dc.subject | Fine-Tuning | en |
| dc.subject | Data Selection | en |
| dc.subject | Free Association | en |
| dc.subject | Ideological Bias | en |
| dc.subject | LLMs | en |
| dc.title | 有限樣本在語言模型中的高效價值轉換 | zh_TW |
| dc.title | Efficient Value Transformation in Language Models Using Limited Training Samples | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 113-2 | - |
| dc.description.degree | 碩士 | - |
| dc.contributor.oralexamcommittee | 蔡宗翰;古倫維 | zh_TW |
| dc.contributor.oralexamcommittee | Tzong-Han Tsai;Lun-Wei Ku | en |
| dc.subject.keyword | 大型語言模型,意識形態偏見,微調,資料篩選,自由聯想 | zh_TW |
| dc.subject.keyword | LLMs, Ideological Bias, Fine-Tuning, Data Selection, Free Association | en |
| dc.relation.page | 43 | - |
| dc.identifier.doi | 10.6342/NTU202501741 | - |
| dc.rights.note | 同意授權(全球公開) | - |
| dc.date.accepted | 2025-08-06 | - |
| dc.contributor.author-college | 共同教育中心 | - |
| dc.contributor.author-dept | 統計碩士學位學程 | - |
| dc.date.embargo-lift | 2025-08-18 | - |
| Appears in Collections: | 統計碩士學位學程 |
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-113-2.pdf | 5.17 MB | Adobe PDF | View/Open |
Unless otherwise specified in their copyright terms, all items in this repository are protected by copyright, with all rights reserved.
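
The abstract above describes ranking and selecting fine-tuning data by similarity to free-association keywords that the model itself generates. Purely as an illustrative note, here is a minimal sketch of that kind of similarity-based selection using the sentence-transformers library; the model name, keywords, and candidate texts are hypothetical assumptions, not the thesis's actual IdeoFA pipeline or data.

```python
# Minimal sketch (not the thesis code): rank candidate training examples by
# cosine similarity to free-association keywords and keep the top-k.
from sentence_transformers import SentenceTransformer, util

# Free-association keywords for a target stance, e.g. obtained by prompting
# an LLM with "list words you associate with <topic>". Hardcoded here.
keywords = ["regulation", "public welfare", "collective responsibility"]

# Candidate instruction texts to filter (placeholder strings).
candidates = [
    "Explain why market self-regulation is usually sufficient.",
    "Describe how government oversight protects public welfare.",
    "Summarize the history of the printing press.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # any sentence encoder works
kw_emb = model.encode(keywords, convert_to_tensor=True)
cand_emb = model.encode(candidates, convert_to_tensor=True)

# Score each candidate by its mean similarity to all keywords
# (one simple aggregation; the thesis compares several ranking schemes).
scores = util.cos_sim(cand_emb, kw_emb).mean(dim=1)

top_k = 2
ranked = sorted(zip(scores.tolist(), candidates), reverse=True)
for score, text in ranked[:top_k]:
    print(f"{score:.3f}  {text}")
```

Under these assumptions, the top-ranked candidates would feed a low-cost fine-tuning set, which matches the abstract's framing of cheap, noise-robust data selection.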
