Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/73765
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 周承復(Cheng-Fu Chou) | |
dc.contributor.author | Yu-Nuo Juan | en |
dc.contributor.author | 阮昱諾 | zh_TW |
dc.date.accessioned | 2021-06-17T08:09:43Z | - |
dc.date.available | 2020-08-20 | |
dc.date.copyright | 2019-08-20 | |
dc.date.issued | 2019 | |
dc.date.submitted | 2019-08-16 | |
dc.identifier.citation | [1] M. Abadi et al. TensorFlow: Large-scale machine learning on heterogeneous systems.
[2] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
[3] L. Bottou. Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT'2010, pages 177–186, 2010.
[4] L. Bottou. Stochastic gradient descent tricks. In Neural Networks: Tricks of the Trade, Second Edition, pages 421–436, 2012.
[5] C. Brezinski and M. R. Zaglia. Extrapolation Methods: Theory and Practice, volume 2. Elsevier, 2013.
[6] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna. Rethinking the inception architecture for computer vision. In CVPR'16, 2016.
[7] Caffe2. Caffe2: A new lightweight, modular, and scalable deep learning framework.
[8] J. Chen, R. Monga, S. Bengio, and R. Jozefowicz. Revisiting distributed synchronous SGD. arXiv preprint arXiv:1604.00981, 2016.
[9] T. Chen, M. Li, Y. Li, M. Lin, N. Wang, M. Wang, T. Xiao, B. Xu, C. Zhang, and Z. Zhang. MXNet: A flexible and efficient machine learning library for heterogeneous distributed systems, 2015.
[10] T. M. Chilimbi, Y. Suzue, J. Apacible, and K. Kalyanaraman. Project Adam: Building an efficient and scalable deep learning training system. In OSDI, 2014.
[11] P. J. Davis. Interpolation and Approximation. Courier Corporation, 1975.
[12] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, A. Senior, P. Tucker, K. Yang, Q. V. Le, et al. Large scale distributed deep networks. In Advances in Neural Information Processing Systems, 2012.
[13] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR'09, 2009.
[14] S. Hadjis, C. Zhang, I. Mitliagkas, D. Iter, and C. Ré. Omnivore: An optimizer for multi-device deep learning on CPUs and GPUs. arXiv preprint arXiv:1606.04487, 2016.
[15] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR'16, pages 770–778, 2016.
[16] Google Inc. tf_cnn_benchmarks: High performance benchmarks.
[17] D. P. Kingma and J. L. Ba. Adam: A method for stochastic optimization. In ICLR, 2015.
[18] L. Kleinrock. Queueing Systems, 1975.
[19] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
[20] M. Li, D. G. Andersen, J. W. Park, A. J. Smola, A. Ahmed, V. Josifovski, J. Long, E. J. Shekita, and B.-Y. Su. Scaling distributed machine learning with the parameter server. In OSDI, 2014.
[21] S.-H. Lin, M. Paolieri, C.-F. Chou, and L. Golubchik. A model-based approach to streamlining distributed training for asynchronous SGD. In 2018 IEEE 26th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), 2018.
[22] M. Reiser. A queueing network analysis of computer communication networks with window flow control. IEEE Transactions on Communications, 27(8):1199–1209, 1979.
[23] M. Reiser and S. S. Lavenberg. Mean-value analysis of closed multichain queuing networks, 1980.
[24] G. A. Seber and A. J. Lee. Linear Regression Analysis, volume 329. John Wiley and Sons Inc., 2012.
[25] K. Simonyan and A. Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/73765 | - |
dc.description.abstract | 深度神經網路最近在各領域獲得了巨大的成功,並吸引了更多世界各地學者的目光。大量的訓練工作考驗著軟硬體的發展。分散式學習是一種常見的加速方式。在這篇論文中我們會提出解決擴展學習環境的其中一個問題,也會解釋整個模型與背後使用的工具。 | zh_TW |
dc.description.abstract | Deep Neural Networks (DNNs) have been very successful and have drawn more and more attention from researchers all over the world. A huge demand for training jobs is challenging the development of both software tools and hardware systems. Distributed training is a common approach to speeding up these jobs. In this thesis, we propose a new method to address one of the problems in expanding the scale of a training environment, and we also explain the underlying model and the tools behind it. | en |
dc.description.provenance | Made available in DSpace on 2021-06-17T08:09:43Z (GMT). No. of bitstreams: 1 ntu-108-R05944037-1.pdf: 2400923 bytes, checksum: 6e26ee35a35010911b6eb3a7736454ae (MD5) Previous issue date: 2019 | en |
dc.description.tableofcontents | Contents
口試委員會審定書 (Certification by the Oral Examination Committee)
誌謝 (Acknowledgements in Chinese)
Acknowledgements
摘要 (Abstract in Chinese)
Abstract
1 Introduction
1.1 Parameter Server Architecture
1.2 Asynchronous SGD
1.3 Achievements and goals
2 Related Works
3 Throughput Estimation Model
3.1 Queueing Model
3.2 Mean-Value Analysis (MVA)
3.2.1 Known Parameters
3.2.2 Recursive Calculation
3.2.3 Multiple Parameter Servers Calculation
4 Packet Analyzing Tools
4.1 PacketCapture
4.2 DumpProcessor
5 Evaluation
5.1 Local PC Clusters (CPU Only)
5.1.1 Different models and batch size
5.1.2 Limited network bandwidth
5.2 Google Cloud Case (CPU and GPU)
5.2.1 Different models and batch size
5.2.2 Limited network bandwidth
5.3 Comparison
6 Conclusion
Bibliography | |
dc.language.iso | en | |
dc.title | 建立使用非同步隨機梯度下降法的分散式訓練之多參數伺服器模型 | zh_TW |
dc.title | Multi-parameter-server modeling for distributed asynchronous SGD | en |
dc.type | Thesis | |
dc.date.schoolyear | 107-2 | |
dc.description.degree | 碩士 | |
dc.contributor.oralexamcommittee | 梁嘉德,廖婉君,呂政修,吳曉光 | |
dc.subject.keyword | 深度學習,深度神經網路,分散式機器學習,排隊網路,TensorFlow | zh_TW |
dc.subject.keyword | Deep Learning,Deep Neural Networks,Distributed Machine Learning,Queueing Networks,TensorFlow | en |
dc.relation.page | 41 | |
dc.identifier.doi | 10.6342/NTU201903787 | |
dc.rights.note | 有償授權 | |
dc.date.accepted | 2019-08-16 | |
dc.contributor.author-college | 電機資訊學院 | zh_TW |
dc.contributor.author-dept | 資訊網路與多媒體研究所 | zh_TW |
Appears in Collections: | 資訊網路與多媒體研究所
Files in This Item:
File | Size | Format |
---|---|---|
ntu-108-1.pdf (currently not authorized for public access) | 2.34 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated in their copyright terms.