Soft Reinforcement Learning for Die Bonding Process Parameter Control to Improve Epoxy Bleed Out Quality

Yu-En Li (李郁恩)

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/100216

Title:	Soft Reinforcement Learning for Die Bonding Process Parameter Control to Improve Epoxy Bleed Out Quality
Author:	Yu-En Li (李郁恩)
Advisor:	Chia-Yen Lee (李家岩)
Keywords:	die bonding process, virtual metrology, machine learning, deep reinforcement learning, process control
Year of publication:	2025
Degree:	Master's
Abstract:	In semiconductor manufacturing, process parameter control plays a critical role in improving product yield and quality. This study proposes a deep reinforcement learning (DRL)-based process parameter control framework for the die bonding process, targeting key quality indicators including epoxy bleed-out (EBO) distance and bond line thickness (BLT). The framework integrates a virtual metrology model with a temperature-variation simulation mechanism to generate semi-synthetic quality indicator values. On this basis, a reward function that accounts for both quality deviations and process parameter bound constraints is designed to guide the agent in adjusting process parameters effectively. Experimental results show that the proposed DRL controller outperforms the traditional d-EWMA method in both stability and overall performance. Regarding control strategies, single-parameter adjustment outperforms multi-parameter adjustment in most scenarios, yielding up to a 60.6% increase in total reward in individual experiments while also reducing quality deviations. Regarding prediction models, the MLP outperforms XGBoost in most scenarios, with a total reward improvement of up to 64.4%. In addition, fixed initial conditions improve training stability, raising total reward by up to 50.1% compared with random initialization. Although the overall trends are largely consistent across products, some exceptions are observed, underscoring the need to consider product-specific characteristics together with training conditions when designing reinforcement learning control strategies.
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/100216
DOI:	10.6342/NTU202502896
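Full-text authorization:	Authorized (open to the public worldwide)
Electronic full-text release date:	2027-08-05
Appears in department:	Department of Information Management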

Files in this item:

File	Size	Format
ntu-113-2.pdf (available online after 2027-08-05)	4.59 MB	Adobe PDF

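Unless their copyright terms are specifically stated otherwise, all items in this system are protected by copyright, with all rights reserved.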