Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94554
| Title: | 開發適用於多成分及複雜化學結構之電腦輔助分子設計方法 Development of Computer-Aided Molecular Design Methods for Multicomponent and Complex Chemical Structures |
| Author: | 黃晨軒 Chen-Hsuan Huang |
| Advisor: | 林祥泰 Shiang-Tai Lin |
| Keywords: | computer-aided molecular design, molecular representation, chemical screening, solvent, ionic liquid, carbon capture, comparisons among molecular generative models |
| Publication year: | 2024 |
| Degree: | Doctoral |
| Abstract: | This work is divided into three parts. The first part lays out the conceptual framework of computer-aided molecular design (CAMD) and how it can support the early-stage development of specialty chemicals. Traditionally, such development has relied on researchers' experience and chemical intuition, with repeated trial-and-error synthesis and characterization. Because a new problem often differs from a researcher's past experience, the early stage, when the research direction is still unclear, tends to waste manpower, materials, and funding on redundant experiments. CAMD aims to improve research efficiency by using computation to identify a small pool of candidate chemicals in advance, so that synthesis and characterization can focus on that pool. In this study we established a CAMD procedure with atomic-level detail: the user specifies the desired physicochemical properties, and the procedure uses optimization algorithms and iteration to design molecules that meet them. The procedure has three components: the MARS+ molecular data structure (MDS), property prediction models, and algorithms that search chemical space for new molecules.
In the molecular data structure, a molecule is represented as a mathematical graph. We predefine common atoms and certain functional groups, together with the types and numbers of bonds available to each, as a base element library. When a molecular structure is converted into the MARS+ data structure, its constituent atoms are parsed into these predefined base elements, and their bonding is described by eight arrays containing only zeros and positive integers, along with two string arrays. The element indices array and the parent indices array define how the elements are connected to one another, and the bond order array gives the bond order of each connection. The element type array records the type of each constituent atom. Isomerism is marked by the chirality flag array and two cis-trans flag arrays. The cyclic flag array and the cyclic bond order array record the ring structures in the molecule.
For property prediction, we use quantum chemistry software to compute optoelectronic properties such as the HOMO-LUMO gap, adiabatic ionization potential, adiabatic electron affinity, vertical ionization potential, vertical electron affinity, chemical hardness, and electrophilicity index. In addition, COSMO solvation calculations yield the screening charges of a molecule in a solvent, which are passed to the COSMO-SAC model to calculate activity coefficients for phase equilibrium calculations.
The search algorithm is based on a genetic algorithm (GA) that modifies molecular structures stored in the MARS+ data structure to generate new molecules. Its operations are addition, subtraction, insertion, element change, bond change, cyclization, decyclization, chirality inversion, cis-trans inversion, crossover, combination, and component switch. The physicochemical properties of each new molecule are calculated first, and a fitness function assigns higher fitness to molecules that come closer to the specified properties. A selection algorithm then decides which new molecules survive to the next iteration. The selection algorithms implemented in this work are roulette wheel (RW), simulated annealing (SA), fitness Monte Carlo (FMC), and non-dominated sorting (NS). Repeating the cycle of genetic algorithm, property prediction, and selection algorithm progressively yields molecules that approach the specified physicochemical criteria.
The second part uses the design of novel ionic liquids as CO2 absorbents as an example, showing that our in-house molecular design framework can handle task-specific design. Here the COSMO-SAC model predicts the physical absorption solubility of CO2 in ionic liquids. To validate the model, we collected 4537 experimental data points for 96 ionic liquids and compared them with the COSMO-SAC predictions; the accuracy is sufficient for qualitative or semi-quantitative use. Of the 3500 designed ionic liquids, 80% show CO2 capture performance comparable to that of ionic liquids reported in the literature, and a few perform considerably better than known ionic liquids. The design results suggest that raising CO2 solubility requires restricting the anion of the ionic liquid to fluoride, chloride, bromide, iodide, or hydroxide.
The third part uses the GuacaMol and MolOpt benchmark suites to compare MARS+ with other generative models on goal-directed organic molecule design tasks. GuacaMol mainly evaluates effectiveness, that is, whether a generative model can reach the target given a sufficiently large number of iterations. MolOpt mainly evaluates efficiency, that is, the optimality of the molecules produced within a strictly limited budget of generated molecules. On GuacaMol, MARS+ ranks second, behind only GRAPH_GA. The two perform comparably on most tasks, but MARS+ is clearly worse at searching for constitutional isomers. Generalizing the crossover operator in MARS+ significantly improves its constitutional isomer search, but at a substantial cost in performance on some single-objective tasks. On MolOpt, MARS+ ranks third, behind REINVENT (first) and GRAPH_GA (second). Again the performance of MARS+ and GRAPH_GA is comparable on most tasks, but MARS+ is clearly worse at searching for the drug molecule Celecoxib. The likely main cause is that GRAPH_GA has a ring crossover operator, which ensures that the number of rings does not decrease across the operation.
There are four main directions for future work. First, applying CAMD to other chemical systems that the current MARS+ can already handle or could handle with minor program modifications, such as pharmaceutical cocrystals, double-salt ionic liquids (DSILs), deep eutectic solvents (DESs), optoelectronic materials, biomacromolecules, and polymers. Second, further diversifying the molecular operations, for example by adding a ring crossover operator to MARS+. Third, integrating molecular design with chemical process design to form a holistic design method and make the design tasks more realistic. Fourth, conducting qualitative comparisons of the selection algorithms within MARS+ to better understand their behavior. |
| URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/94554 |
| DOI: | 10.6342/NTU202403528 |
| Full-text authorization: | Authorized (open access worldwide) |
| Appears in collections: | Department of Chemical Engineering |
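
The iteration described in the abstract (generate new molecules, predict their properties, select survivors) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration and is not taken from the thesis: the toy molecule encoding (a tuple of integer codes, not the MARS+ data structure), the stand-in property model `predict_property`, the form of the fitness function, and the reduced set of three operations. Only roulette wheel selection is shown.

```python
import random

# Hypothetical toy encoding: a "molecule" is a tuple of small integers,
# each standing in for one base element. This is NOT the MARS+ data structure.

def predict_property(mol):
    """Stand-in for a real property model: simply sums the element codes."""
    return float(sum(mol))

def fitness(mol, target):
    """Fitness in (0, 1]; equals 1.0 when the predicted property hits the target."""
    return 1.0 / (1.0 + abs(predict_property(mol) - target))

def mutate(mol, rng):
    """Apply one randomly chosen operation to a copy of the molecule."""
    op = rng.choice(["addition", "subtraction", "element_change"])
    items = list(mol)
    if op == "subtraction" and len(items) > 1:
        items.pop(rng.randrange(len(items)))                   # remove one element
    elif op == "element_change":
        items[rng.randrange(len(items))] = rng.randint(1, 5)   # swap one element
    else:
        items.append(rng.randint(1, 5))                        # add one element
    return tuple(items)

def roulette_wheel(population, target, k, rng):
    """Draw k survivors with probability proportional to fitness (with replacement)."""
    weights = [fitness(m, target) for m in population]
    return rng.choices(population, weights=weights, k=k)

def design(seed_mol, target, generations=50, pop_size=20, seed=0):
    """Repeat generate -> evaluate -> select and return the best molecule seen."""
    rng = random.Random(seed)
    population = [seed_mol] * pop_size
    best = seed_mol
    for _ in range(generations):
        offspring = [mutate(m, rng) for m in population]
        candidates = population + offspring
        # Track the best candidate separately, since roulette wheel
        # selection alone may lose it between iterations.
        best = max(candidates + [best], key=lambda m: fitness(m, target))
        population = roulette_wheel(candidates, target, pop_size, rng)
    return best

if __name__ == "__main__":
    result = design((1,), target=12.0)
    print(result, fitness(result, 12.0))
```

Keeping the best-so-far candidate outside the population is a design choice of this sketch: roulette wheel selection is stochastic, so a high-fitness molecule can be dropped in any given iteration.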
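Files in this item:
| File | Size | Format | |
|---|---|---|---|
| ntu-112-2.pdf | 11.68 MB | Adobe PDF | View/Open |
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.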
