Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91391
Title: | 以異質化捉放法估算內容傳遞網路大小的資料分析 Data analysis for CDN server population estimation based on CMR model with heterogeneity |
Author: | 許誠 Cheng Hsu |
Advisor: | 黃寶儀 Polly Huang |
Keywords: | Twitch, Content Delivery Network, Capture-Recapture Models, Clustering, Cormack-Jolly-Seber Models |
Publication year: | 2024 |
Degree: | Master's |
Abstract: | Streaming media has become increasingly popular and important in recent years. Among streaming platforms, Twitch dominates the game streaming market: in 2021, it had an average of 2,778,000 concurrent viewers and 105,000 concurrent streamers. Like other streaming media, Twitch uses a Content Delivery Network (CDN) to serve a massive audience from all around the world. The CDN reduces content delivery latency and is a key factor in the quality of service that viewers experience.
A paper published in 2017 analyzed the architecture of Twitch's CDN in detail, but that experiment was a one-time, high-cost survey. Given Twitch's rapid growth and the lack of a long-term monitoring method, Twitch's CDN remains largely unknown to the public. In our previous work, we used the CJS model, which assumes that every individual shares the same time-dependent survival rate and capture probability, to estimate the CDN size. However, different servers may have different survival rates and capture probabilities. If we assume that every server has its own survival rate and capture probability, the computation overhead of the CJS model may be too high, since many parameters need to be estimated. In addition, maximum likelihood estimation can have a large bias if the sample size is too small [13]. In this research, I cluster the data from 5 countries using the transaction count in different hour periods as the clustering attribute, so that servers with similar survival rates and capture probabilities fall into the same cluster, and I apply the CMR model with heterogeneity to these clustering results.
Next, I use the S_Dbw score [7] to evaluate the clustering results. However, I find that a better S_Dbw score does not lead to a lower error rate in the MLE-CJS model. Instead, a cluster whose Avg/Std of the number of sampled servers is larger than 0.3 tends to have a larger estimation error rate. As a result, clustering results with fewer than 5 clusters tend to have a lower estimation error rate, since they contain fewer clusters with Avg/Std larger than 0.3. |
URI: | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/91391 |
DOI: | 10.6342/NTU202400040 |
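Full-text authorization: | Authorized (open to the public worldwide) |
Appears in collections: | Graduate Institute of Communication Engineering |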
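The Avg/Std check described in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function names, the choice of population standard deviation, and the handling of a zero standard deviation are all assumptions made here for illustration. Only the 0.3 threshold comes from the abstract.

```python
from statistics import mean, pstdev

# Threshold quoted in the abstract; everything else below is an assumption.
AVG_STD_THRESHOLD = 0.3


def avg_std_ratio(sample_counts):
    """Return mean / population standard deviation of the sampled-server counts.

    `sample_counts` is a hypothetical list holding, for one cluster, the number
    of servers sampled on each capture occasion. Returns None when the standard
    deviation is zero, because the ratio is undefined in that case.
    """
    std = pstdev(sample_counts)
    if std == 0:
        return None
    return mean(sample_counts) / std


def flag_clusters(clusters, threshold=AVG_STD_THRESHOLD):
    """Map each cluster name to True if its Avg/Std ratio exceeds the threshold."""
    flags = {}
    for name, counts in clusters.items():
        ratio = avg_std_ratio(counts)
        flags[name] = ratio is not None and ratio > threshold
    return flags


if __name__ == "__main__":
    # Made-up counts, purely to exercise the functions.
    example = {
        "cluster_a": [4, 6],
        "cluster_b": [0] * 19 + [20],
        "cluster_c": [3, 3, 3],
    }
    for name, flagged in flag_clusters(example).items():
        print(name, "flagged" if flagged else "not flagged")
```

Under the abstract's observation, a flagged cluster would be one expected to show a larger CJS estimation error; whether the thesis computes the ratio exactly this way cannot be confirmed from this record alone.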
Files in this item:
File | Size | Format | |
---|---|---|---|
ntu-112-1.pdf | 7.37 MB | Adobe PDF | View/Open |
Items in this system are protected by copyright, with all rights reserved, unless their copyright terms are otherwise indicated.