Reward Prediction Errors in Reinforcement Learning: Psychosis, Personality, and Modeling Issues

Chia-Tzu Li; 李家慈

Please use this Handle URI to cite this document: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/65388

Title:	Reward Prediction Errors in Reinforcement Learning: Psychosis, Personality, and Modeling Issues (增強學習之酬賞預測誤差：精神病、性格及模型探討)
Author:	Chia-Tzu Li (李家慈)
Advisor:	徐永豐
Co-advisor:	賴文崧
Keywords:	Bayesian estimation, psychosis, reinforcement learning model, reward prediction error, schizophrenia, Tridimensional Personality Questionnaire
Year of publication:	2012
Degree:	Master's
Abstract:	Animals and humans making decisions in uncertain environments must learn the rules of the environment by trial and error, form expectations about it, and use those expectations as the basis for future decisions. Reinforcement learning theory assumes that expectations are updated when they differ from actual experience; this discrepancy is called the reward prediction error (RPE). Interest in the RPE within neuroscience grew after studies found that dopamine neurons can encode the RPE signal. This thesis addresses two issues: (i) the relationship between psychotic symptoms and the RPE in patients with schizophrenia (SZ), and (ii) whether individual differences in personality affect RPE processing.
First, some researchers have proposed that RPE misprocessing caused by an abnormal dopamine system underlies the psychotic symptoms (e.g., hallucinations and delusions) of SZ. To examine this hypothesis, we tested SZ patients and healthy controls using a feedback-based “dynamic rewarding task,” a two-option probabilistic learning task in which the option with the higher reward probability changes from block to block without the subject being told. We fit the experimental data with a (standard) reinforcement learning (RL) model, estimating its parameters with a Bayesian approach. The model fits indicated that SZ patients update their values more rapidly and make more exploratory decisions. We also found that the degree of exploration increases with the severity of the psychotic symptoms. These findings are consistent with the hypothesis that abnormal RPE processing is associated with aberrant dopaminergic activity and subjective psychotic experiences.
Second, because heritable traits might predispose an individual's decision-making behavior, we administered Cloninger's Tridimensional Personality Questionnaire and examined the correlations between its dimension scores and the estimated parameters of the RL model. Among college students, those with higher novelty seeking scores had lower value-updating rates, those with higher reward dependence scores made more exploratory decisions, and task performance differed by gender. No similar patterns were found in SZ patients. Finally, we briefly discuss two modeling issues that are yet to be resolved. The first is the negative correlation between the learning rate parameter and the perseveration parameter in the RL model. The second is scale invariance with regard to the perseveration parameter in the RL model.
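For illustration only: this record does not give the thesis's model equations, so the following is a minimal sketch under stated assumptions, not the author's implementation. It assumes a common delta-rule formulation in which the RPE is the obtained reward minus the current expectation, a learning rate `alpha` scales the update, and a two-option softmax with inverse temperature `beta` governs how exploratory the choices are. The function names, the task settings (`block_len`, `p_high`, `p_low`), and all parameter values are hypothetical, and the perseveration parameter mentioned in the abstract is not modeled.

```python
import math
import random


def rpe_update(values, choice, reward, alpha):
    """Delta-rule update (assumed form); updates `values` in place, returns the RPE."""
    delta = reward - values[choice]   # reward prediction error
    values[choice] += alpha * delta   # move the expectation toward the outcome
    return delta


def softmax_prob_first(values, beta):
    """Probability of choosing option 0 under a two-option softmax rule."""
    return 1.0 / (1.0 + math.exp(-beta * (values[0] - values[1])))


def simulate(n_trials=200, block_len=50, p_high=0.8, p_low=0.2,
             alpha=0.3, beta=3.0, seed=0):
    """Simulate a two-option task whose better option switches every block.

    All settings are hypothetical. Returns the fraction of trials on which
    the currently better option was chosen.
    """
    rng = random.Random(seed)
    values = [0.5, 0.5]               # initial expectations for both options
    better = 0                        # index of the currently better option
    n_better_chosen = 0
    for t in range(n_trials):
        if t > 0 and t % block_len == 0:
            better = 1 - better       # unsignaled switch of the better option
        p_first = softmax_prob_first(values, beta)
        choice = 0 if rng.random() < p_first else 1
        p_reward = p_high if choice == better else p_low
        reward = 1.0 if rng.random() < p_reward else 0.0
        rpe_update(values, choice, reward, alpha)
        n_better_chosen += int(choice == better)
    return n_better_chosen / n_trials
```

In this sketch a larger `alpha` makes expectations track recent outcomes more quickly, and a smaller `beta` makes choices less dependent on the learned values, which is one common way to operationalize "faster updating" and "more exploratory decisions".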
URI:	http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/65388
Full-text license:	Paid license
Appears in department:	Department of Psychology

Files in this item:

File	Size	Format
ntu-101-1.pdf (not currently authorized for public access)	1.67 MB	Adobe PDF

