Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/65151
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 郭斯彥 (Sy-Yen Kuo) | |
dc.contributor.author | Yi-Hsiung Chen | en |
dc.contributor.author | 陳義雄 | zh_TW |
dc.date.accessioned | 2021-06-16T23:27:16Z | - |
dc.date.available | 2017-08-09 | |
dc.date.copyright | 2012-08-09 | |
dc.date.issued | 2012 | |
dc.date.submitted | 2012-07-31 | |
dc.identifier.citation | [1] D. Abadi. Data management in the cloud: Limitations and opportunities. IEEE Data Engineering Bulletin, 32(1):3–12, 2009.
[2] D. Agrawal, A. El Abbadi, S. Antony, and S. Das. Data management challenges in cloud computing infrastructures. Databases in Networked Information Systems, pages 1–10, 2010.
[3] M. Armbrust, A. Fox, R. Griffith, A. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, et al. A view of cloud computing. Communications of the ACM, 53(4):50–58, 2010.
[4] D. Borthakur. The Hadoop distributed file system: Architecture and design. Hadoop Project Website, 11:21, 2007.
[5] E. Brewer. Towards robust distributed systems. In Proceedings of the Annual ACM Symposium on Principles of Distributed Computing, volume 19, pages 7–10, 2000.
[6] E. Brewer. CAP twelve years later: How the “rules” have changed. Computer IEEE Computer Magazine, 45(2):23, 2012.
[7] F. Chang, J. Dean, S. Ghemawat, W. C. Hsieh, D. A. Wallach, M. Burrows, T. Chandra, A. Fikes, and R. E. Gruber. Bigtable: A distributed storage system for structured data. ACM Trans. Comput. Syst., 26(2):4:1–4:26, June 2008.
[8] T. C. Chiueh. Introduction to ITRI Cloud OS. Available at http://www.rocusabc.org.tw/upload/ROCUSA/b0ae5a656bede944b998f318b6de8d8e.pdf, last accessed on July 2012.
[9] W. C.-C. Chu, C.-W. Lu, J.-N. Chen, C.-H. Chang, C.-T. Yang, H.-M. Lee, and H.-M. Lee. Cloud computing in Taiwan. Computer, 45(6):48–56, June 2012.
[10] E. F. Codd. A relational model of data for large shared data banks. Commun. ACM, 13(6):377–387, June 1970.
[11] E. F. Codd. Further normalization of the data base relational model. Data Base Systems, pages 33–64, 1972.
[12] B. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears. Benchmarking cloud serving systems with YCSB. In Proceedings of the 1st ACM Symposium on Cloud Computing, pages 143–154. ACM, 2010.
[13] D. Corporation. Big data management for the enterprise. DataStax Enterprise White Paper, March 2012.
[14] A. Davies. High Availability MySQL Cookbook. Packt Pub., 2010.
[15] J. Dean and S. Ghemawat. MapReduce: a flexible data processing tool. Commun. ACM, 53(1):72–77, Jan. 2010.
[16] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels. Dynamo: Amazon’s highly available key-value store. SIGOPS Oper. Syst. Rev., 41(6):205–220, Oct. 2007.
[17] D. Featherston. Cassandra: Principles and application. University of Illinois, 2010.
[18] A. S. Foundation. Sqoop user guide. http://sqoop.apache.org/docs/1.4.1-incubating/SqoopUserGuide.html, last accessed on July 2012.
[19] A. Fox, R. Griffith, et al. Above the clouds: A Berkeley view of cloud computing. Dept. Electrical Eng. and Comput. Sciences, University of California, Berkeley, Rep. UCB/EECS, 28, 2009.
[20] S. Gilbert and N. Lynch. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. SIGACT News, 33(2):51–59, June 2002.
[21] E. Hewitt. Cassandra: The definitive guide. O’Reilly Media, Inc., 2010.
[22] ITRI. The world’s first “all-in-one” cloud computing system. ITRI Today, 2011.
[23] D. Karger, E. Lehman, T. Leighton, R. Panigrahy, M. Levine, and D. Lewin. Consistent hashing and random trees: distributed caching protocols for relieving hot spots on the World Wide Web. In Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Computing, STOC ’97, pages 654–663, New York, NY, USA, 1997. ACM.
[24] A. Lakshman and P. Malik. Cassandra: a decentralized structured storage system. ACM SIGOPS Operating Systems Review, 44(2):35–40, 2010.
[25] N. Leavitt. Will NoSQL databases live up to their promise? Computer, 43(2):12–14, Feb. 2010.
[26] P. Mell and T. Grance. The NIST definition of cloud computing. National Institute of Standards and Technology, 53(6):50, 2009.
[27] J. Pereira and R. Oliveira. An object mapping for the Cassandra distributed database. 2011.
[28] M. Ronström, A. MySQL, and L. Thalmann. MySQL Cluster architecture overview. 2004.
[29] S. Sakr, A. Liu, D. Batista, and M. Alomari. A survey of large scale data management approaches in cloud environments. Communications Surveys Tutorials, IEEE, 13(3):311–336, quarter 2011.
[30] M. Slee, A. Agarwal, and M. Kwiatkowski. Thrift: Scalable cross-language services implementation. Facebook White Paper, 2007.
[31] C. Strozzi. NoSQL - a relational database management system. Web site: http://www.strozzi.it/cgi-bin/CSA/tw7/I/en_US/nosql/Home%20Page, accessed 2010.
[32] R. Tavory. Hector: A high level Java client for Apache Cassandra. http://hector-client.github.com/hector/build/html/index.html, last accessed on July 2012.
[33] A. C. Wiki. High level clients for Cassandra. http://wiki.apache.org/cassandra/ClientOptions, last accessed on July 2012. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/65151 | - |
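dc.description.abstract | With the rapid development of cloud computing and the rise of social networking sites (e.g., Facebook and Twitter), more and more data is stored in the "cloud". Traditionally, data storage and management problems have been solved mainly with relational databases (e.g., MySQL). When a server's resources cannot cope with an excessively large volume of data, however, we must resort to "vertical scaling", that is, upgrading the server's computing power or enlarging its disk storage. The biggest problem with vertical scaling is its high cost: in the cloud computing era data grows at an astonishing rate, so a server may well need to be upgraded again before long. "Horizontal scaling" is a better approach: servers are added to the computing cluster instead of upgrading a single machine. Unfortunately, because of the limitations of their data model, traditional relational databases do not support horizontal scaling well, and so "non-relational" databases have emerged.
Non-relational databases (e.g., Cassandra) are characterized by their distributed nature and the flexibility of their data models; as a result they usually offer high availability, high scalability, and high performance, and have no single point of failure. More and more enterprises are considering converting their traditional databases to non-relational ones, but the conversion is not easy. The first problem is rebuilding the data model. Relational model design typically starts from the data entities and the relations among them, whereas in the non-relational world we should first consider which queries the system must provide and then design the data model to optimize query speed. The second problem is data migration. Before migrating, an enterprise has often accumulated tens of thousands of records, and how to move this data into the new database is a question well worth studying. However, research on non-relational databases is still insufficient and the literature is very scarce, which makes implementation harder. Taking a real industry case as its starting point, this thesis discusses the two problems above in detail and, for migrating data from a MySQL database to a Cassandra database, supports the theory with an implementation and a performance evaluation, in the hope of serving as a reference for future researchers of non-relational databases. | zh_TW |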
dc.description.provenance | Made available in DSpace on 2021-06-16T23:27:16Z (GMT). No. of bitstreams: 1 ntu-101-R99921068-1.pdf: 3618394 bytes, checksum: c76e63d9beb6269e778e87a4789b5300 (MD5) Previous issue date: 2012 | en |
dc.description.tableofcontents | Acknowledgements ii
Chinese Abstract iii
Abstract iv
1 Introduction 1
1.1 Background 1
1.2 Motivation 3
1.3 Contribution 4
1.4 Organization 4
2 Literature Review 5
2.1 Cloud Computing 5
2.1.1 The NIST definition 6
2.1.2 Data Management in Cloud 8
2.2 The CAP Theorem 9
2.2.1 Consistency 10
2.2.2 Availability 10
2.2.3 Partition Tolerance 11
2.3 NoSQL 12
2.3.1 ACID vs. BASE 12
2.3.2 CAP Spectrum 12
3 Technologies 14
3.1 ITRI Cloud OS 14
3.2 MySQL Cluster 16
3.2.1 Architecture 17
3.3 Apache Cassandra 18
3.3.1 Data Model 19
3.3.2 Query Model 20
3.3.3 Distribution, Replication and Fault Tolerance 21
4 Data Modeling 22
4.1 Design Differences Between RDBMS and Cassandra 22
4.1.1 Query Language 22
4.1.2 Referential Integrity 23
4.1.3 Denormalization 23
4.2 Methodology 24
4.3 Design Patterns 26
4.3.1 Materialized Views 26
4.3.2 Secondary Indexes 27
4.3.3 Valueless Column 27
4.3.4 Aggregate Keys 28
4.3.5 Semantic Key 29
5 Case Study: ITRI Cloud OS 30
5.1 Problem Description 30
5.2 Methodology 34
5.3 Implementation Details 35
5.3.1 Server Configuration 35
5.3.2 Access Methods 38
5.3.3 Data Migration 40
5.4 Analysis 42
5.4.1 Benchmarking 42
5.4.2 Result 44
6 Conclusions and Future Work 48
6.1 Conclusions 48
6.2 Future Work 49
A Sample Code 51
A.1 Hector API Usages 51
A.1.1 Retrieve 51
A.1.2 Update 52
A.1.3 Delete 52
A.2 MapReduce Code for Data Migration from HDFS to Cassandra 53
Bibliography 54 | |
dc.language.iso | en | |
dc.title | 基於Cassandra資料庫之雲端資料建模:從SQL到NoSQL | zh_TW |
dc.title | Data Modeling in Cloud with Cassandra: From SQL to NoSQL | en |
dc.type | Thesis | |
dc.date.schoolyear | 100-2 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 雷欽隆 (Chin-Lung Lei), 顏嗣鈞 (Hsu-Chun Yen), 陳俊良 (Chun-Liang Chen), 陳英一 (Ying-Yi Chen) | |
dc.subject.keyword | Non-relational Database, Distributed Database, Cloud Data Processing, Data Modeling | zh_TW |
dc.subject.keyword | NoSQL, Cloud Data Management, Apache Cassandra, Non-relational Database, Distributed Database | en |
dc.relation.page | 56 | |
dc.rights.note | Paid license | |
dc.date.accepted | 2012-07-31 | |
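dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Electrical Engineering | zh_TW |
Appears in Collections: | Department of Electrical Engineering |
Files in This Item:
File | Size | Format | |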
---|---|---|---|
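ntu-101-1.pdf (not currently authorized for public access) | 3.53 MB | Adobe PDF |
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are specifically indicated otherwise.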