Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/20746

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 吳政鴻(Cheng-Hung Wu) | |
| dc.contributor.author | Fang-Yi Zhou | en |
| dc.contributor.author | 周芳屹 | zh_TW |
| dc.date.accessioned | 2021-06-08T03:01:30Z | - |
| dc.date.copyright | 2017-07-27 | |
| dc.date.issued | 2017 | |
| dc.date.submitted | 2017-07-21 | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/20746 | - |
| dc.description.abstract | This study combines deep learning with dynamic programming to develop a predictive model for near-optimal control policies of production systems. Because dynamic programming is subject to the curse of dimensionality, solving for the optimal control policy of a larger system often takes a very long time. However, optimal dynamic control policies exhibit regular patterns with respect to the features of the system. A method that can extract useful patterns from the optimal control policies of small systems, and use them to predict the optimal control policies of large systems, avoids the time cost of re-solving with dynamic programming or rebuilding the model.
In this study, we consider a three-workstation production system with reliability uncertainty. The objective is to minimize the holding cost over all queues. We construct training samples from an orthogonalized collection of optimal control policies of small-scale systems, combined with the numbers of parallel machines at each workstation, and use these samples to train a deep neural network. Once sufficiently trained, the network is reusable and can efficiently predict near-optimal dynamic control policies when system parameters or capacities change. We validate the network's learning with k-fold cross-validation (k-cv), and apply both its predicted near-optimal policies and the optimal policies solved by dynamic programming in a discrete-event simulation to verify the cost difference. The results show that the proposed dynamic-policy prediction model achieves high prediction accuracy on new systems, and that the cost gap between its control policies and the optimal control policies remains within a very small range. | zh_TW |
| dc.description.abstract | This study presents a dynamic control approach for manufacturing systems that combines dynamic programming (DP) with deep learning. Owing to model complexity, dynamic programming cannot efficiently find optimal control policies for large systems, whereas a deep neural network can predict control rules over a large set of states. In this research, we consider a production system with reliability uncertainties, and the objective is to minimize the average queue length. We construct an optimal policy space by combining the optimal policies of a set of smaller-scale systems, and then use this policy space to train a deep neural network as our policy predictor.
The accuracy of the DNN is validated by k-fold cross-validation (k-cv) over a wide variety of manufacturing systems. Discrete-event simulation is then used to verify the cost difference between the near-optimal policies from deep learning and the optimal policies from dynamic programming. Our results show that the near-optimal policies output by the deep neural network match the optimal dynamic policies with a high degree of accuracy, and that the difference in simulation results is minimal. (Illustrative sketches of the DP and deep-learning steps follow the metadata table below.) | en |
| dc.description.provenance | Made available in DSpace on 2021-06-08T03:01:30Z (GMT). No. of bitstreams: 1 ntu-106-R04546038-1.pdf: 2129569 bytes, checksum: 7f2f3e8560d5ebd1ab263260043a03f7 (MD5) Previous issue date: 2017 | en |
| dc.description.tableofcontents | CONTENTS
Acknowledgements III
Chinese Abstract IV
ABSTRACT V
CONTENTS VI
LIST OF FIGURES VIII
LIST OF TABLES X
Chapter 1 Introduction 1
1.1 Research Background 1
1.1.1 Curse of Dimensionality in Dynamic Programming 1
1.1.2 Modeling Problem in Manufacturing System 2
1.1.3 Barriers between Dynamic Programming Models 4
1.1.4 Motivation and Summary 4
1.2 Research Objectives 6
1.3 Significance of the Thesis 7
1.4 Organization of the Thesis 7
Chapter 2 Literature Review 8
2.1 Dynamic Control of Queueing System 8
2.2 Machine Learning and Dynamic Control 9
2.3 Deep Learning and its Application 11
2.4 Summary 13
Chapter 3 Problem Formulation and Methodology 14
3.1 Dynamic Control Problem Description 14
3.2 Nomenclature and Assumptions 15
3.3 Modeling for Dynamic Programming 16
Chapter 4 Implementation of Deep Learning 25
4.1 Introduction to Optimal Policy Space 25
4.2 The Process of Deep Learning 27
Chapter 5 Result Analysis 33
5.1 Analysis of Testing Result 33
5.2 Implementation of Discrete Event Simulator 40
Conclusion and Future Research 47
5.3 Future Research 47
REFERENCE 49 | |
| dc.language.iso | en | |
| dc.subject | Markov decision process | zh_TW |
| dc.subject | Deep learning | zh_TW |
| dc.subject | Dynamic programming | zh_TW |
| dc.subject | Deep Learning | en |
| dc.subject | Markov decision process | en |
| dc.subject | Dynamic Programming | en |
| dc.title | Dynamic Control of Manufacturing Systems Based on Deep Learning | zh_TW |
| dc.title | Dynamic Control of Manufacturing System – A Deep Learning Approach | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 105-2 | |
| dc.description.degree | Master's | |
| dc.contributor.oralexamcommittee | 黃奎隆,藍俊宏,余承叡 | |
| dc.subject.keyword | Deep learning, Dynamic programming, Markov decision process | zh_TW |
| dc.subject.keyword | Deep Learning, Dynamic Programming, Markov decision process | en |
| dc.relation.page | 51 | |
| dc.identifier.doi | 10.6342/NTU201701817 | |
| dc.rights.note | Not authorized | |
| dc.date.accepted | 2017-07-24 | |
| dc.contributor.author-college | College of Engineering | zh_TW |
| dc.contributor.author-dept | Institute of Industrial Engineering | zh_TW |
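As a concrete illustration of the dynamic-programming step described in the abstracts, the minimal Python sketch below solves a toy version of the control problem by value iteration on a uniformized Markov decision process. It is a simplified stand-in, not the thesis model: it assumes a two-station tandem queue with one flexible server, and all rates, holding costs, and buffer sizes are placeholder values (the thesis studies a three-workstation line with machine-reliability states and parallel machines, which enlarges the state space but leaves the recursion unchanged).

```python
"""Toy value-iteration sketch (NOT the thesis model).

A two-station tandem queue with Poisson arrivals and one flexible server;
machine-reliability states are omitted, and all rates/costs are assumed
placeholders. Uniformization converts the continuous-time MDP into an
equivalent discrete-time one, so the standard Bellman recursion applies.
"""
import itertools

LAM, MU1, MU2 = 0.8, 1.0, 1.2   # arrival / service rates (assumed)
H1, H2 = 1.0, 2.0               # holding cost per job per unit time (assumed)
B = 10                          # buffer capacity at each station (assumed)
BETA = 0.05                     # continuous-time discount rate
UNIF = LAM + MU1 + MU2          # uniformization constant
GAMMA = UNIF / (UNIF + BETA)    # equivalent discrete-time discount factor

states = list(itertools.product(range(B + 1), repeat=2))
ACTIONS = (1, 2)                # station the flexible server works at

def q_value(v, n1, n2, a):
    """One-step holding cost plus expected discounted next-step value."""
    cost = H1 * n1 + H2 * n2
    val = LAM * v[(min(n1 + 1, B), n2)]          # arrival (lost if full)
    s1 = (n1 - 1, n2 + 1) if (a == 1 and n1 > 0 and n2 < B) else (n1, n2)
    val += MU1 * v[s1]                           # station-1 completion
    s2 = (n1, n2 - 1) if (a == 2 and n2 > 0) else (n1, n2)
    val += MU2 * v[s2]                           # station-2 completion
    return cost + GAMMA * val / UNIF

def value_iteration(tol=1e-6):
    v = {s: 0.0 for s in states}
    while True:
        v_new = {(n1, n2): min(q_value(v, n1, n2, a) for a in ACTIONS)
                 for (n1, n2) in states}
        if max(abs(v_new[s] - v[s]) for s in states) < tol:
            return v_new
        v = v_new

v_star = value_iteration()
# Greedy policy: the DP-optimal action in every state.
policy = {s: min(ACTIONS, key=lambda a: q_value(v_star, s[0], s[1], a))
          for s in states}
```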
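The deep-learning step can be sketched in the same spirit. Continuing from the `policy` dictionary produced by the previous sketch, the code below trains a neural-network classifier to map state features to the DP-optimal action and checks generalization with k-fold cross-validation. The scikit-learn model, the (64, 64) architecture, and k = 5 are illustrative assumptions, not the thesis settings: the thesis trains a deeper network with ReLU units, dropout, and the Adam optimizer on features that also encode system parameters such as parallel-machine counts, so a single trained network can predict near-optimal policies for unseen configurations.

```python
"""Policy-prediction sketch (illustrative; continues from `policy` above)."""
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

# Features: here just the two queue lengths. The thesis also encodes
# system parameters (e.g. parallel-machine counts) so one network can
# generalize to configurations it has never been solved for.
X = np.array(list(policy.keys()), dtype=float)
y = np.array(list(policy.values()))

# Small MLP as a stand-in for the thesis's deeper ReLU/dropout/Adam net.
net = MLPClassifier(hidden_layer_sizes=(64, 64), activation="relu",
                    solver="adam", max_iter=2000, random_state=0)

# k-fold cross-validation of prediction accuracy (k = 5 assumed).
scores = cross_val_score(net, X, y, cv=5)
print("k-cv accuracy per fold:", scores.round(3))

net.fit(X, y)  # final fit; the net now stands in for DP on new states
```

In the thesis, classification accuracy alone is not the final test: the residual cost gap between the predicted near-optimal policy and the DP-optimal policy is then quantified with a discrete-event simulator.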
| Appears in Collections: | Institute of Industrial Engineering | |
Files in This Item:
| File | Size | Format |
|---|---|---|
| ntu-106-1.pdf (not authorized for public access) | 2.08 MB | Adobe PDF |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
