Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/53044
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 黃寶儀(Polly Huang) | |
dc.contributor.author | Ronald Kuo-Hua Ho | en |
dc.contributor.author | 何國華 | zh_TW |
dc.date.accessioned | 2021-06-15T16:40:59Z | - |
dc.date.available | 2017-08-16 | |
dc.date.copyright | 2015-08-16 | |
dc.date.issued | 2015 | |
dc.date.submitted | 2015-08-11 | |
dc.identifier.citation | [1] Bergstra, J. A., and C. A. Middelburg. "ITU-T Recommendation G.107: The E-model, a computational model for use in transmission planning." (2003).
[2] ITU-T Recommendation P.862. "Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs." (2001).
[3] Perlicki, K. "Simple analysis of the impact of packet loss and delay on voice transmission quality." Journal of Telecommunications and Information Technology (2002): 53-56.
[4] ITU-T Recommendation P.880. "Continuous evaluation of time varying speech quality." (2004).
[5] Beuran, R., and M. Ivanovici. "User-perceived quality assessment for VoIP applications." No. CERN-OPEN-2004-007 (2004).
[6] Chen, S., Chu, C. Y., Yeh, S. L., Chu, H. H., and Huang, P. "Modeling the QoE of rate changes in SKYPE/SILK VoIP calls." IEEE/ACM Transactions on Networking 22.6 (2014): 1781-1793.
[7] McGraw, K. O., M. D. Tew, and J. E. Williams. "The integrity of Web-delivered experiments: Can you trust the data?" Psychological Science 11.6 (2000): 502-506.
[8] Surowiecki, J. The Wisdom of Crowds. Anchor, 2005.
[9] Oinas-Kukkonen, H. "Network analysis and crowds of people as sources of new organisational knowledge." Knowledge Management: Theoretical Foundation (2008): 173-189.
[10] Yen, Y.-C., et al. "Lab experiment vs. crowdsourcing: A comparative user study on Skype call quality." Proceedings of the 9th Asian Internet Engineering Conference. ACM, 2013.
[11] Mu, M., et al. "Statistical analysis of ordinal user opinion scores." IEEE Consumer Communications and Networking Conference (CCNC), 2012.
[12] Clark, R. A., Podsiadlo, M., Fraser, M., Mayo, C., and King, S. "Statistical analysis of the Blizzard Challenge 2007 listening test results." Proc. BLZ3-2007 (in Proc. SSW6), 2007.
[13] Alonso, O., D. E. Rose, and B. Stewart. "Crowdsourcing for relevance evaluation." ACM SIGIR Forum 42.2 (2008).
[14] Reichl, P., Egger, S., Schatz, R., and D'Alconzo, A. "The logarithmic nature of QoE and the role of the Weber-Fechner law in QoE assessment." IEEE International Conference on Communications (ICC), 2010, pp. 1-5.
[15] Fiedler, M., Hossfeld, T., and Tran-Gia, P. "A generic quantitative relationship between quality of experience and quality of service." IEEE Network 24.2 (2010): 36-41.
[16] Hoßfeld, T., Hock, D., Tran-Gia, P., Tutschku, K., and Fiedler, M. "Testing the IQX hypothesis for exponential interdependency between QoS and QoE of voice codecs iLBC and G.711." Proceedings of the 18th ITC Specialist Seminar on Quality of Experience (2008): 105-114.
[17] ITU-T. E-model, R value calculation. URL: http://www.itu.int/ITU-T/studygroups/com12/emodelv1/calcul.php
[18] Downs, J. S., Holbrook, M. B., Sheng, S., and Cranor, L. F. "Are your participants gaming the system? Screening Mechanical Turk workers." Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (2010): 2399-2402.
[19] ITU-R BS Series. "Methods for Assessor Screening." (2014).
[20] Kotrlik, J. W., and Higgins, C. C. "Organizational research: Determining appropriate sample size in survey research." Information Technology, Learning, and Performance Journal 19.1 (2001): 43.
[21] Willner, O. "How to choose the proper sample size." Technometrics 32.1 (1990): 94-95.
[22] Conte, S. D., and C. de Boor. Elementary Numerical Analysis. McGraw-Hill, 1972.
[23] Lilliefors, H. W. "On the Kolmogorov-Smirnov test for normality with mean and variance unknown." Journal of the American Statistical Association 62.318 (1967): 399-402.
[24] Eerola, T., Lensu, L., Kalviainen, H., Kamarainen, J. K., Leisti, T., Nyman, G., ... and Oittinen, P. "Full reference printed image quality: Measurement framework and statistical evaluation." Journal of Imaging Science and Technology 54.1 (2010).
[25] Pitas, C. N., Moraitis, N., Panagopoulos, A. D., and Constantinou, P. "Speech and video quality assessment of GSM and WCDMA rollout mobile radio access networks in a regulated and competitive market." 9th International Conference on Measurement of Speech, Audio and Video Quality in Networks (MESAQIN), 2010.
[26] Ipeirotis, P. G. "Analyzing the Amazon Mechanical Turk marketplace." XRDS: Crossroads, The ACM Magazine for Students 17.2 (2010): 16-21.
[27] Raake, A. Speech Quality of VoIP: Assessment and Prediction. John Wiley & Sons, 2007.
[28] Walker, J. Q. "Assessing VoIP call quality using the E-model." NetIQ Corporation (2001).
[29] Paolacci, G., J. Chandler, and P. G. Ipeirotis. "Running experiments on Amazon Mechanical Turk." Judgment and Decision Making 5.5 (2010): 411-419.
[30] ITU-T Recommendation P.913. "Methods for the subjective assessment of video quality, audio quality and audiovisual quality of Internet video and distribution quality television in any environment." 2014.
[31] ITU-T Recommendation P.880. "Continuous evaluation of time-varying speech quality." 2014.
[32] ITU-R Recommendation BT.500-13. "Methodology for the subjective assessment of the quality of television pictures." 2003.
[33] International Organization for Standardization. Sensory Analysis: General Guidance for the Selection, Training and Monitoring of Assessors. Selected Assessors. ISO, 1993.
[34] Lorho, G., G. Le Ray, and N. Zacharov. "eGauge—a measure of assessor expertise in audio quality evaluations." Audio Engineering Society Conference: 38th International Conference: Sound Quality Evaluation. AES, 2010.
[35] ITU-R Recommendation BS.1116. "Methods for the subjective assessment of small impairments in audio systems." 2014.
[36] Massey, F. J., Jr. "The Kolmogorov-Smirnov test for goodness of fit." Journal of the American Statistical Association 46.253 (1951): 68-78. | |
dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/53044 | - |
dc.description.abstract | For VoIP services, modeling and measuring user experience has long been a research topic. With the help of crowdsourcing platforms, QoE researchers can conduct large-scale user studies drawing on a larger and more diverse population, and a large amount of subject data can easily be collected within a limited time. However, the details of deriving a reliable QoE model via crowdsourcing are often neglected and deserve attention. This study makes three main contributions. First, a three-stage user study was conducted on a crowdsourcing platform, in which participants rated their satisfaction with several VoIP calls subject to different levels of network delay; the collected data were then used to build a prediction model. The results show that the impact of end-to-end delay on user experience follows an exponential decay.
Second, this thesis analyzes all the user data previously collected by our laboratory and provides methods for checking data reliability. Three main tests are proposed: a cheat-proof test, a normality test, and a convergence test. The cheat-proof test automatically decides, based on users' scores and information, whether their data should be filtered out; the normality test checks whether the distribution of user scores follows a normal distribution; and the convergence test analyzes the convergence of user scores using numerical-analysis techniques. Third, this thesis cross-compares the results of the three tests across different data sets (bit rate, packet loss rate, and network delay), and the effectiveness of the three tests and the characteristics of the data are analyzed and discussed in detail. | zh_TW |
dc.description.abstract | Modeling and measurement of user experience for Voice over IP (VoIP) services has long been a subject of study. With the help of crowdsourcing platforms, researchers studying user perception can perform user studies over a large and diverse population. Moreover, a large amount of subject score data can be collected in a limited time. However, some details of the process of deriving a reliable QoE model with crowdsourcing are often neglected and urgently need to be addressed.
This study provides three main contributions. First, 60 participants were recruited to score emulated Skype calls with different levels of delay, and 44 users' data were adopted to build a closed-form QoE model. The results show that end-to-end delay affects user experience on an exponential scale. Second, taking all of our previous user studies as examples, a set of analysis and quality-control methodologies for user score data is provided to increase the reliability of our study. The proposed methodologies involve three kinds of test: a cheat-proof test, a normality test, and a convergence test. The cheat-proof test details how users' data were screened based on their rating behavior. The normality test shows that the scores in most tracks are normally distributed. The convergence test examines, from a numerical point of view, whether the scores reached a pre-defined convergence criterion. Third, by cross-comparing the results of the three tests, their effectiveness and outcomes are discussed and analyzed across the three data sets (bit rate, loss rate, and delay). | en |
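The abstract reports that end-to-end delay degrades user experience on an exponential scale. As an illustration only (the numbers below are invented, not taken from the thesis), a model of the hypothesized form MOS = a·exp(-b·delay) can be fitted by ordinary least squares on log-transformed scores:

```python
import math

# Hypothetical MOS-vs-delay samples (illustrative, not thesis data).
delays_ms = [0, 200, 400, 600, 800, 1000]
mos = [4.2, 3.6, 3.1, 2.7, 2.3, 2.0]

def fit_exponential(x, y):
    """Least-squares fit of y = a * exp(-b * x), done as a linear
    regression of log(y) on x: log(y) = log(a) - b * x."""
    logs = [math.log(v) for v in y]
    n = len(x)
    mx = sum(x) / n
    ml = sum(logs) / n
    slope = (sum((xi - mx) * (li - ml) for xi, li in zip(x, logs))
             / sum((xi - mx) ** 2 for xi in x))
    a = math.exp(ml - slope * mx)  # intercept back-transformed
    b = -slope                     # decay rate per ms of delay
    return a, b

a, b = fit_exponential(delays_ms, mos)
print(f"MOS ≈ {a:.2f} * exp(-{b:.5f} * delay_ms)")
```

The fitted `a` approximates the zero-delay MOS and `b` the decay rate; with the toy data above, `a` lands near the 4.2 starting score.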
dc.description.provenance | Made available in DSpace on 2021-06-15T16:40:59Z (GMT). No. of bitstreams: 1 ntu-104-R02942102-1.pdf: 3450142 bytes, checksum: 22f40d49a188d9fbb3e02b96e75b4c74 (MD5) Previous issue date: 2015 | en |
dc.description.tableofcontents | Abstract v
Contents vii
Chapter 1 Introduction 1
1.1 Modelling 1
1.2 Data screening, collection and analysis 6
Chapter 2 Literature Reviews 12
2.1 Measuring QoE for delay 13
2.2 Issues in Crowdsourcing 16
2.3 Analysis of data 18
2.4 Psychology background 28
Chapter 3 Pilot Experiments 28
3.1 Evolution of Methodology 29
3.2 Experiment design I 31
3.3 Preliminary Results 33
3.4 Experiment design II 36
3.5 Preliminary Results 38
Chapter 4 Derived Model 39
4.1 Model form 39
Chapter 5 Full-scale Experiment 41
5.1 Experiment design 41
5.2 ANOVA Tests 43
5.3 Model specifics 43
Chapter 6 Data Screening 47
6.1 Cheat-proof test 48
6.2 Results of cheat-proof test 49
6.3 Analysis of outliers 51
6.4 Analysis of data 53
6.5 Results of cheat-proof test for all collected data 55
Chapter 7 Data Convergence 57
7.1 Convergence test 58
7.2 Factors of convergence 64
7.3 Quantifying convergence 67
7.4 Comparison between screened and noise data 70
Chapter 8 Normality of Data 72
8.1 Normality test 73
8.2 Graph approach 75
8.2.1 Frequency histogram with normal distribution overlay 75
8.2.2 CDF and normal 77
8.3 Hypothesis test 78
8.4 Why tests reject normality 81
8.5 Customized T-test 84
8.6 Discussion 86
Chapter 9 Data screening and analysis 87
9.1 Cheat-proof test among data sets 87
9.2 Convergence test among data sets 94
9.3 Normality test among data sets 102
Chapter 10 Discussion 106
Chapter 11 Conclusion 110
References 112 | |
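The table of contents lists a cheat-proof test (Chapter 6) that screens raters by their scoring behavior. One common sketch of this idea, shown here as a hypothetical illustration (the thesis's actual screening rule may differ), flags raters whose scores correlate poorly with the leave-one-out consensus of the other raters:

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation of two equal-length score vectors."""
    mx, my = mean(xs), mean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = (sum((x - mx) ** 2 for x in xs)
           * sum((y - my) ** 2 for y in ys)) ** 0.5
    return num / den if den else 0.0

def screen_raters(scores, threshold=0.0):
    """scores: {rater_id: [score per track]}. Flag raters whose vector
    correlates at or below `threshold` with the mean of all other raters
    (leave-one-out consensus)."""
    flagged = []
    for rater, own in scores.items():
        others = [s for r, s in scores.items() if r != rater]
        consensus = [mean(col) for col in zip(*others)]  # per-track mean
        if pearson(own, consensus) <= threshold:
            flagged.append(rater)
    return flagged

# Toy example: u4 rates against the trend and is flagged.
scores = {
    "u1": [4, 3, 2, 1],
    "u2": [5, 4, 2, 1],
    "u3": [4, 4, 3, 1],
    "u4": [1, 1, 5, 5],
}
print(screen_raters(scores))
```

Leave-one-out consensus keeps a rater's own scores from inflating their agreement measure; the threshold here is a toy choice, not one taken from the thesis.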
dc.language.iso | en | |
dc.title | Building a user-experience model of SKYPE/SILK VoIP calls via crowdsourcing: collection, screening, and analysis of data | zh_TW |
dc.title | Crowdsourcing for QoE Models of SKYPE/SILK Calls:
An Empirical Study on the Collection, Screening, and Analysis of Data | en |
dc.type | Thesis | |
dc.date.schoolyear | 103-2 | |
dc.description.degree | Master's | |
dc.contributor.oralexamcommittee | 葉素玲(Su-Ling Yeh),陳宏銘(Homer H. Chen) | |
dc.subject.keyword | VoIP, User Perception, Crowdsourcing, Psychophysics | zh_TW |
dc.subject.keyword | VoIP, Crowdsourcing, User Perception, QoE, Psychophysics | en |
dc.relation.page | 117 | |
dc.rights.note | Authorized for a fee | |
dc.date.accepted | 2015-08-11 | |
dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
dc.contributor.author-dept | Graduate Institute of Communication Engineering | zh_TW |
Appears in Collections: | Graduate Institute of Communication Engineering |
Files in This Item:
File | Size | Format | |
---|---|---|---|
ntu-104-1.pdf (currently not authorized for public access) | 3.37 MB | Adobe PDF |
All items in this repository are protected by copyright, with all rights reserved, unless otherwise indicated.