Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/35240
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 劉長遠(Cheng-Yuan Liou) | |
dc.contributor.author | Chang-Hsian Uang | en |
dc.contributor.author | 汪昌賢 | zh_TW |
dc.date.accessioned | 2021-06-13T06:45:09Z | - |
dc.date.available | 2012-07-28 | |
dc.date.copyright | 2011-07-28 | |
dc.date.issued | 2011 | |
dc.date.submitted | 2011-07-25 | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/35240 | - |
dc.description.abstract | In this thesis, we propose a reinforcement learning (RL) model for motor-behavior control. The model is inspired by the organizational principles of the cerebral cortex and simulates the functions of its sensory and motor areas. Self-organizing maps (SOMs) have proven highly effective at modeling the topological functions of the cortex; exploiting this property, a SOM serves as the sensory intermediate layer through which external environment states stimulate the model, and likewise as the intermediate layer for motor output. Within the model, a SARSA Q-learning algorithm with a neighborhood function is used. With the SOMs as intermediaries, the oversized lookup-table problem that plain reinforcement learning incurs in continuous spaces is resolved, and the final model can map states in a continuous space onto a continuous space of motor behaviors. | zh_TW |
dc.description.abstract | In this thesis, we propose a motor control model based on reinforcement learning (RL). The model is inspired by organizational principles of the cerebral cortex, specifically the cortical maps and functional hierarchy of the brain's sensory and motor areas. Self-organizing maps (SOMs) have proven useful for modeling cortical topological maps: one SOM quantizes the input space of real-valued state information, and a second SOM represents the action space. We use a neighborhood-update version of the SARSA Q-learning algorithm, with the SOMs serving as a practical substrate for the Q-function that avoids an intractably large tabular representation when the state or action space is continuous or very large. The final model can map a continuous input space to a continuous action space. (An illustrative sketch of this architecture follows the metadata table below.) | en |
dc.description.provenance | Made available in DSpace on 2021-06-13T06:45:09Z (GMT). No. of bitstreams: 1 ntu-100-R98922066-1.pdf: 2561577 bytes, checksum: bf757066b856287440ff29a4c39de7d7 (MD5) Previous issue date: 2011 | en |
dc.description.tableofcontents | Acknowledgements
Abstract (Chinese)
Abstract (English)
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
 1.1 Background
 1.2 Motivation and Objectives
 1.3 Thesis Organization
Chapter 2 Brain Models
 2.1 The FARS Model
  2.1.1 Affordances and Action-Oriented Perception
  2.1.2 Object Representation and Responses in the Macaque Cortex
 2.2 The MOSAIC Model
  2.2.1 Feedforward and Feedback Selection
Chapter 3 Reinforcement Learning
 3.1 The Reinforcement Learning Model
  3.1.1 Policy
  3.1.2 Value
 3.2 Temporal Difference (TD) Learning
  3.2.1 The TD(0) Method
 3.3 Q-Learning
  3.3.1 One-step Q-learning
  3.3.2 SARSA Q-learning
 3.4 Generalizing the Q-table to Continuous Spaces
Chapter 4 Self-Organizing Neural Networks
 4.1 The Self-Organizing Map (SOM)
  4.1.1 Background of Kohonen's SOM
  4.1.2 Essentials of Self-Organization
 4.2 The Basic SOM Algorithm
  4.2.1 A SOM Example
 4.3 SOM Implementation
  4.3.1 Topology Preservation
 4.4 The SOM and the Brain
Chapter 5 Self-Organizing Reinforcement Learning Model
 5.1 Combining SOM and Q-learning
  5.1.1 Suggested and Perturbed Actions
  5.1.2 Neighborhood Q-learning
 5.2 The SRLM Algorithm
 5.3 Two-Dimensional Trajectory Object-Grasping Experiment
 5.4 Inverted Pendulum Experiments
  5.4.1 The Inverted Pendulum System
  5.4.2 Dynamics of the Inverted Pendulum System
  5.4.3 Single Inverted Pendulum Experiment
  5.4.4 Double Inverted Pendulum Experiment
Chapter 6 Conclusion
 6.1 Discussion
  6.1.1 Discussion of the Method
  6.1.2 The Delayed-Reward Problem
 6.2 Future Work
 6.3 Conclusion
References | |
dc.language.iso | zh-TW | |
dc.title | 類神經網路自組織增強式學習模型 | zh_TW |
dc.title | Self-Organizing Reinforcement Learning Model | en |
dc.type | Thesis | |
dc.date.schoolyear | 99-2 | |
dc.description.degree | 碩士 (Master) | |
dc.contributor.oralexamcommittee | 吳善全(Shann-Chiuen Wu),蔡駿逸(Chun-Yi Tsai) | |
dc.subject.keyword | 增強式學習,自組織映射圖網路,Q-learning,SARSA,Unsupervised learning | zh_TW |
dc.subject.keyword | Reinforcement learning,Self-Organizing Maps,Q-learning,SARSA,Unsupervised learning | en |
dc.relation.page | 77 | |
dc.rights.note | 有償授權 (paid authorization) | |
dc.date.accepted | 2011-07-25 | |
dc.contributor.author-college | 電機資訊學院 (College of Electrical Engineering and Computer Science) | zh_TW |
dc.contributor.author-dept | 資訊工程學研究所 (Graduate Institute of Computer Science and Information Engineering) | zh_TW |
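The abstract above outlines the whole architecture: a state SOM and an action SOM discretize the two continuous spaces, and a SARSA Q-learning rule with a neighborhood function trains a Q-table defined over the two maps. The minimal Python sketch below illustrates that scheme. It is not the thesis's code: the map sizes, learning constants, 1-D map topology, and epsilon-greedy exploration (standing in for the suggested/perturbed-action mechanism of Section 5.1.1) are all assumptions made for illustration.

```python
# Hypothetical sketch of a SOM-based SARSA learner; all names and
# constants are illustrative assumptions, not the thesis's implementation.
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACTION_DIM = 4, 1   # e.g. a pendulum state vector, a scalar force
N_STATE, N_ACTION = 100, 25    # units in the state map and the action map
ALPHA, GAMMA, SIGMA, EPS = 0.1, 0.95, 1.5, 0.1

# Unit weight vectors. In the full model both maps would first be organized
# with the usual Kohonen learning rule on sampled states and actions.
state_som = rng.uniform(-1.0, 1.0, (N_STATE, STATE_DIM))
action_som = rng.uniform(-1.0, 1.0, (N_ACTION, ACTION_DIM))
Q = np.zeros((N_STATE, N_ACTION))  # one Q-value per (state unit, action unit)


def winner(som, x):
    """Index of the best-matching unit for input vector x."""
    return int(np.argmin(np.linalg.norm(som - x, axis=1)))


def neighborhood(n_units, center):
    """Gaussian neighborhood over unit indices (1-D map topology assumed)."""
    idx = np.arange(n_units)
    return np.exp(-((idx - center) ** 2) / (2.0 * SIGMA ** 2))


def select_action(state):
    """Epsilon-greedy choice of an action unit for a continuous state."""
    s = winner(state_som, state)
    if rng.random() < EPS:
        a = int(rng.integers(N_ACTION))
    else:
        a = int(np.argmax(Q[s]))
    return a, action_som[a]  # the unit's weight vector is the real-valued action


def sarsa_neighborhood_update(state, a, reward, next_state, a_next):
    """SARSA TD update spread over the neighborhoods of both winning units."""
    s = winner(state_som, state)
    s_next = winner(state_som, next_state)
    td_error = reward + GAMMA * Q[s_next, a_next] - Q[s, a]
    # The outer product of the two neighborhood profiles scales the update
    # for every (state unit, action unit) pair near the winners, so nearby
    # units generalize what the winning pair has just learned.
    h = np.outer(neighborhood(N_STATE, s), neighborhood(N_ACTION, a))
    Q += ALPHA * h * td_error
```

In a control loop, `select_action` would be called on each observed state and `sarsa_neighborhood_update` after each (state, action, reward, next state, next action) transition; because the action SOM's weight vectors live in the continuous action space, the learned policy maps continuous states to continuous actions even though Q itself is a finite table.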
Appears in Collections: | 資訊工程學系 (Department of Computer Science and Information Engineering)
Files in This Item:
File | Size | Format |
---|---|---|
ntu-100-1.pdf (currently not authorized for public access) | 2.5 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.