Please use this Handle URI to cite this document:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/55439

Full metadata record

| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 王奕翔(I-Hsiang Wang) | |
| dc.contributor.author | Hung-Wei Hsu | en |
| dc.contributor.author | 許泓崴 | zh_TW |
| dc.date.accessioned | 2021-06-16T04:02:32Z | - |
| dc.date.available | 2020-08-24 | |
| dc.date.copyright | 2020-08-24 | |
| dc.date.issued | 2020 | |
| dc.date.submitted | 2020-08-18 | |
| dc.identifier.citation | [1] E. L. Lehmann and J. P. Romano, Testing Statistical Hypotheses. Springer Science & Business Media, 2006. [2] O. Zeitouni and M. Gutman, “On universal hypotheses testing via large deviations”, IEEE Transactions on Information Theory, vol. 37, no. 2, pp. 285–290, 1991. [3] P. J. Huber, “A robust version of the probability ratio test”, The Annals of Mathematical Statistics, pp. 1753–1758, 1965. [4] N. Merhav and C.-H. Lee, “A minimax classification approach with application to robust speech recognition”, IEEE Transactions on Speech and Audio Processing, vol. 1, no. 1, pp. 90–100, 1993. [5] L. Devroye, L. Györfi, and G. Lugosi, “A note on robust hypothesis testing”, IEEE Transactions on Information Theory, vol. 48, no. 7, pp. 2111–2114, 2002. [6] J. Ziv, “On classification with empirically observed statistics and universal data compression”, IEEE Transactions on Information Theory, vol. 34, no. 2, pp. 278–286, 1988. [7] M. Gutman, “Asymptotically optimal classification for multiple tests with empirically observed statistics”, IEEE Transactions on Information Theory, vol. 35, no. 2, pp. 401–408, 1989. [8] L. Zhou, V. Y. Tan, and M. Motani, “Second-order asymptotically optimal statistical classification”, Information and Inference: A Journal of the IMA. [9] N. Merhav and J. Ziv, “A Bayesian approach for classification of Markov sources”, IEEE Transactions on Information Theory, vol. 37, no. 4, pp. 1067–1071, 1991. [10] W.-N. Chen and I.-H. Wang, “Anonymous heterogeneous distributed detection: Optimal decision rules, error exponents, and the price of anonymity”, IEEE Transactions on Information Theory, vol. 65, no. 11, pp. 7390–7406, 2019. [11] I. Csiszár, “The method of types [information theory]”, IEEE Transactions on Information Theory, vol. 44, no. 6, pp. 2505–2523, 1998. [12] C. Berge, Topological Spaces: Including a Treatment of Multi-Valued Functions, Vector Spaces, and Convexity. Courier Corporation, 1997. [13] W.-N. Chen, H.-C. Chen, and I.-H. Wang, “On the fundamental limits of heterogeneous distributed detection: Price of anonymity”, in 2018 IEEE International Symposium on Information Theory (ISIT), IEEE, 2018, pp. 1056–1060. | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/55439 | - |
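| dc.description.abstract | In many real-world applications such as crowdsourcing and machine learning, the training data and the testing data are not necessarily generated by the same set of probability distributions. In this thesis, we therefore study a decision method that performs robustly despite such differences in the data-generating distributions, and we use asymptotic analysis to explore the theoretical limits of this framework. Unlike conventional hypothesis testing, where all data-generating distributions are known, we consider a binary hypothesis testing framework in which two training sequences are drawn independently from the distributions P0, P1 of the two possible hypotheses, and we wish to determine whether another testing sequence was drawn from P̃0 or P̃1. We assume that the distributions generating the testing sequence may deviate from P0 and P1, respectively. This deviation describes the difference between the data-generating distributions, and we measure its magnitude by a norm in Euclidean space. Under this deviation, we derive the asymptotically optimal decision rule, analyze its optimal error exponents, compare the error exponents, and characterize the impact of the difference between the data-generating distributions. Finally, we extend the results to the multiple hypothesis testing framework and connect them to the problem of heterogeneous crowdsourcing. | zh_TW |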
| dc.description.abstract | In many real-world applications such as crowdsourcing, machine learning, and distributed detection, the training and testing data may be generated from different distributions. In this thesis, we capture this property by considering a version of statistical classification from empirically observed statistics that is robust to this training-testing mismatch. We explore the fundamental limits of this setting in the asymptotic regime where the number of samples goes to infinity while the ratio of the training to testing sample sizes is fixed. Unlike classical hypothesis testing, where the underlying distributions are available, we first consider a binary setting where only i.i.d. sequences of observations drawn from two candidate distributions P0, P1 are given. The goal is to classify another sequence that is known to be drawn i.i.d. from P̃0 or P̃1, which deviate slightly from P0 and P1, respectively. The deviation models the mismatch between the distributions in the training and testing phases, and its size is measured by the norm of the deviation in Euclidean space. We derive the asymptotically optimal test in this setting and compare its error exponents with those of other regimes of statistical classification problems. We also extend the results to multiple hypothesis testing and relate them to heterogeneous crowdsourcing applications. | en |
| dc.description.provenance | Made available in DSpace on 2021-06-16T04:02:32Z (GMT). No. of bitstreams: 1 U0001-2907202023112000.pdf: 1522513 bytes, checksum: 7a386a97fa9feba537b453f13eb8c599 (MD5) Previous issue date: 2020 | en |
| dc.description.tableofcontents | Acknowledgements (Chinese); Acknowledgements; Abstract (Chinese); Abstract; Contents; List of Figures; 1 Introduction; 2 Related Works and Background; 2.1 Related Works; 2.2 Information Theoretic Basics; 2.2.1 Kullback-Leibler Divergence; 2.2.2 Method of Types; 2.3 Hypothesis Testing Basics; 2.3.1 Classical Binary Hypothesis Testing; 2.3.2 Optimal Test; 2.4 Robust Hypothesis Testing; 2.5 Statistical Classification from Empirically Observed Statistics; 2.6 Multiple Hypothesis Testing; 3 Statistical Classification from Mismatched Training Data; 3.1 Problem Formulation; 3.2 Binary Setting; 3.2.1 Asymptotically Optimal Test; 3.2.2 The Price of Training-Testing Mismatch; 3.2.3 The Price of Insufficient Training Data; 3.3 Extension to Multiple Classes; 3.4 Proofs; 3.4.1 Proofs of theorems in Section 3.2.1; 3.4.2 Proofs of theorems in Sections 3.2.2 and 3.2.3; 3.4.3 Proofs of theorems in Section 3.3; 4 Heterogeneous Crowdsourced Classification; 4.1 Mixture Model; 4.2 Anonymous Model; 5 Conclusion; A Proof of Chapter 2; B Proof of Type Approximations; C Proof of Continuities; D Analysis of Minimizers; Bibliography | |
| dc.language.iso | en | |
| dc.subject | Hypothesis Testing | zh_TW |
| dc.subject | Anonymity | zh_TW |
| dc.subject | Differences in Data-Generating Distributions | zh_TW |
| dc.subject | Crowdsourcing | zh_TW |
| dc.subject | Hypothesis Testing | en |
| dc.subject | Crowdsourcing | en |
| dc.subject | Training-Testing Mismatch | en |
| dc.subject | Anonymity | en |
| dc.title | Robust Classification Based on Deviated Training Data | zh_TW |
| dc.title | Robust Statistical Classification from Mismatched Training Data | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 108-2 (second semester of ROC academic year 108, i.e. 2019–2020) | |
| dc.description.degree | Master's | |
| dc.contributor.oralexamcommittee | 陳柏寧(Po-Ning Chen),洪樂文(Yao-Win Peter Hong) | |
| dc.subject.keyword | Crowdsourcing, Hypothesis Testing, Differences in Data-Generating Distributions, Anonymity | zh_TW |
| dc.subject.keyword | Crowdsourcing, Hypothesis Testing, Training-Testing Mismatch, Anonymity | en |
| dc.relation.page | 106 | |
| dc.identifier.doi | 10.6342/NTU202002069 | |
| dc.rights.note | Paid license | |
| dc.date.accepted | 2020-08-19 | |
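| dc.contributor.author-college | College of Electrical Engineering and Computer Science | zh_TW |
| dc.contributor.author-dept | Graduate Institute of Communication Engineering | zh_TW |
| Appears in Collections: | Graduate Institute of Communication Engineering | |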
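
The abstract builds on classification from empirically observed statistics [6], [7]. As a rough illustration of that classical, non-robust baseline (the special case with no training-testing mismatch), the following is a minimal Python sketch under stated assumptions: it computes empirical types and the generalized Jensen-Shannon (GJS) divergence that appears in Gutman-style tests [7], [8], then assigns the test sequence to the training sequence with the smaller GJS score. This is not the robust test derived in the thesis, which this record does not state, and Gutman's test itself compares the statistic with a threshold rather than taking a plain minimum. All function names are hypothetical; `l2_mismatch` only illustrates measuring a deviation such as P̃0 versus P0 by the Euclidean norm, as the abstract describes.

```python
import math
from collections import Counter


def empirical_type(seq, k):
    """Empirical distribution (type) of seq over the alphabet {0, ..., k-1}."""
    counts = Counter(seq)
    n = len(seq)
    return [counts[a] / n for a in range(k)]


def kl(p, q):
    """KL divergence D(p || q) in nats; symbols with p[a] == 0 contribute 0."""
    return sum(pa * math.log(pa / qa) for pa, qa in zip(p, q) if pa > 0)


def gjs(t_train, t_test, alpha):
    """Generalized Jensen-Shannon divergence
    alpha * D(t_train || m) + D(t_test || m), with m = (alpha * t_train + t_test) / (1 + alpha).
    alpha is the ratio of training length to testing length (assumed > 0)."""
    mix = [(alpha * a + b) / (1 + alpha) for a, b in zip(t_train, t_test)]
    return alpha * kl(t_train, mix) + kl(t_test, mix)


def classify(x0, x1, y, k):
    """Hypothetical minimum-GJS rule: return (label, scores) for test sequence y.
    Assumes both training sequences x0, x1 have the same length."""
    alpha = len(x0) / len(y)
    t_y = empirical_type(y, k)
    scores = [gjs(empirical_type(x, k), t_y, alpha) for x in (x0, x1)]
    label = min(range(2), key=lambda j: scores[j])
    return label, scores


def l2_mismatch(p, p_tilde):
    """Euclidean norm of the deviation between two distributions."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, p_tilde)))
```

For example, with `x0` and `x1` of length 100 and `y` of length 50 (so `alpha = 2`), a `y` whose empirical type matches that of `x0` gets a GJS score of numerically zero against `x0` and is labeled 0.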
Files in This Item:
| File | Size | Format |
|---|---|---|
| U0001-2907202023112000.pdf (restricted: not authorized for public access) | 1.49 MB | Adobe PDF |
Items in this system are protected by copyright, with all rights reserved, unless otherwise indicated.
