Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96123

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 林軒田 | zh_TW |
| dc.contributor.advisor | Hsuan-Tien Lin | en |
| dc.contributor.author | 陳璽安 | zh_TW |
| dc.contributor.author | Si-An Chen | en |
| dc.date.accessioned | 2024-11-14T16:07:36Z | - |
| dc.date.available | 2024-11-15 | - |
| dc.date.copyright | 2024-11-14 | - |
| dc.date.issued | 2024 | - |
| dc.date.submitted | 2024-11-04 | - |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/96123 | - |
| dc.description.abstract | 本論文主要探討如何讓深度學習模型更有效地利用「輔助資訊」,並解決三個關鍵問題。輔助資訊是機器學習可以利用的額外數據,例如天氣、地理位置、表格資料或文字描述,這些資訊可以幫助模型做出更精確及可解釋的預測。然而,使用注意力機制處理這些輔助資訊時,模型會面臨諸多挑戰。
注意力機制是一種常見的深度學習技術,讓模型能夠專注於輸入數據中最相關的部分,並廣泛應用於自然語言處理和其他任務中。儘管注意力模型在處理文字數據上表現優異,當面對不同形式的數據或大量輔助資訊時,還是會遇到困難。本論文針對以下三個挑戰提出了解決方案: 1. 數據模態不兼容:在像時間序列預測這樣的任務中,模型需要處理時間及表格數據,而注意力機制對於這種數據的效果不如處理文字數據那麼好。我們提出了TSMixer架構,通過多層感知機(MLP)來更有效地捕捉時間上的變化,並結合輔助資訊來提高預測精度。 2. 輔助資訊過長:在處理像表格問答這類任務時,表格數據可能非常龐大,遠超出模型能處理的上下文範圍。我們開發了 TableRAG 檢索系統,以「格」為單位在語言模型處理前先篩選出最相關的內容,幫助模型在面對大量數據時更高效地回答正確答案。 3. 泛化能力下降:當模型針對特定任務進行微調時,往往會過度專注於訓練資料,而失去對未知資料的預測能力,這在零樣本多標籤文本分類中尤其明顯。我們提出了「單向微調」框架,透過凍結標籤編碼器,保留標籤語義的豐富性,同時只微調文件編碼器,從而保持模型的零樣本預測能力。 本論文透過這些方法,展示了如何解決這些挑戰,提升注意力機制在整合輔助資訊時的效能,幫助模型在多種任務中更準確地運作。 | zh_TW |
| dc.description.abstract | This dissertation explores how deep learning models can utilize auxiliary information more effectively, and it addresses three key challenges. Auxiliary information refers to additional data that machine learning models can leverage, such as weather, geographical location, tabular data, or text descriptions, which helps improve prediction accuracy and interpretability. However, several challenges arise when attention mechanisms are used to process auxiliary information.
The attention mechanism is a common deep learning technique that enables models to focus on the most relevant parts of the input data; it is widely used in natural language processing and other tasks. Although attention models perform exceptionally well with textual data, they encounter difficulties when dealing with other data types or with large amounts of auxiliary information. This dissertation proposes solutions to the following three challenges: 1. Modality incompatibility: In tasks such as time series forecasting, models need to process temporal and tabular data, which attention mechanisms do not handle as effectively as text. We propose the TSMixer architecture, which uses multi-layer perceptrons (MLPs) to better capture temporal patterns and integrates auxiliary information to improve prediction accuracy. 2. Excessive length of auxiliary information: In tasks such as table-based question answering, the tabular data can be enormous, far exceeding the model's context window. We develop the TableRAG retrieval system, which selects the most relevant content at the cell level before the language model processes the table, allowing the model to answer questions correctly and efficiently even on very large tables. 3. Loss of generalizability: When models are fine-tuned for specific tasks, they often overfit the training data and lose their ability to predict unseen data, which is especially pronounced in zero-shot multi-label text classification. We propose a one-sided fine-tuning framework that freezes the label encoder to retain the richness of label semantics while fine-tuning only the document encoder, thereby preserving the model's zero-shot prediction capability. Through these methods, this dissertation demonstrates how to address these challenges, improving how attention mechanisms integrate auxiliary information and enabling models to operate more accurately across a variety of tasks. (Illustrative sketches of these three approaches follow the metadata table below.) | en |
| dc.description.provenance | Submitted by admin ntu (admin@lib.ntu.edu.tw) on 2024-11-14T16:07:36Z No. of bitstreams: 0 | en |
| dc.description.provenance | Made available in DSpace on 2024-11-14T16:07:36Z (GMT). No. of bitstreams: 0 | en |
| dc.description.tableofcontents | Acknowledgements iii
摘要 v
Abstract vii
Contents xi
List of Figures xv
List of Tables xxi
Chapter 1 Introduction 1
1.1 Learning with Auxiliary Information 1
1.2 Attention-based Models with Auxiliary Information 3
1.3 Challenges in Utilizing Auxiliary Information by Attention-based Models 5
1.4 Dissertation Scope and Statement 7
1.5 Dissertation Organization 8
Chapter 2 Background 11
2.1 Auxiliary Information in Machine Learning 11
2.1.1 Types of Auxiliary Information 12
2.2 Attention Mechanisms 12
2.3 Pre-trained Language Models 13
2.3.1 Transfer Learning with Pre-trained Models 14
2.3.2 Emerging Capabilities of Large Language Models 14
2.4 Summary 15
Chapter 3 Addressing Modality Incompatibility in Time Series Forecasting 17
3.1 Challenges of Transformer-based Models in Multivariate Time Series Forecasting 18
3.2 Related Work 23
3.3 Proposed Method: TSMixer—An All-MLP Architecture for Time Series Forecasting 25
3.3.1 Motivation: Linear Models for Time Series Forecasting 25
3.3.2 TSMixer Architecture 30
3.3.3 Extended TSMixer for Auxiliary Information 33
3.4 Experiments 37
3.4.1 Experimental Setup 37
3.4.2 Multivariate Long-term Forecasting 39
3.4.3 Large-scale Forecasting with Auxiliary Information 42
3.5 Summary 46
Chapter 4 Handling Length Constraints in Large-scale Table Question Answering 49
4.1 Scaling Language Models to Large Tables 50
4.2 Related Work 53
4.3 Proposed Method: TableRAG—A Retrieval-First Table Understanding Framework 54
4.3.1 Motivation 54
4.3.2 Problem Formulation 55
4.3.3 Core Components of TableRAG 55
4.3.4 Token Complexity Analysis 59
4.4 Experiments 62
4.4.1 Experimental Setup 62
4.4.2 Main Result 66
4.4.3 Retrieval Performance Analysis 66
4.4.4 Scalability Test on TabFact 67
4.4.5 Comparison with State-of-the-Art on WikiTableQA 68
4.4.6 Ablation Studies 69
4.5 Summary 72
Chapter 5 Preserving Zero-shot Capability in Supervised Fine-tuning for Multi-label Text Classification 75
5.1 Performance Trade-off in Multi-label Text Classification 76
5.2 Related Work 79
5.2.1 Unsupervised Approaches 80
5.2.2 Supervised Approaches 80
5.3 Proposed Method 81
5.3.1 Motivation 82
5.3.2 Problem Formulation 83
5.3.3 One-sided Fine-tuned Dual Encoder (OF-DE) 83
5.3.4 One-sided Fine-tuned Label-wise Attention Network (OF-LAN) 86
5.3.5 Auxiliary Self-supervised Training on Label Descriptions 88
5.4 Experiments 89
5.4.1 Experimental Setup 89
5.4.2 Main Result 93
5.4.3 Zero-shot Performance Analysis 94
5.4.4 Ablation Study 96
5.5 Summary 97
Chapter 6 Conclusion 99
References 101
Appendix A — Appendix of Chapter 3 123
A.1 Hyperparameters of TSMixer 123
Appendix B — Appendix of Chapter 4 127
B.1 Query Expansion Prompt for Schema Retrieval 127
B.2 Query Expansion Prompt for Cell Retrieval 127
B.3 TableRAG Solver Prompt 128
Appendix C — Appendix of Chapter 5 129
C.1 Results with Additional Metrics 129 | - |
| dc.language.iso | en | - |
| dc.title | 在注意力模型之上有效利用輔助資訊 | zh_TW |
| dc.title | Effective Utilization of Auxiliary Information Over Attention-Based Models | en |
| dc.type | Thesis | - |
| dc.date.schoolyear | 113-1 | - |
| dc.description.degree | 博士 | - |
| dc.contributor.oralexamcommittee | 林智仁;林守德;陳縕儂;李宏毅;王鈺強 | zh_TW |
| dc.contributor.oralexamcommittee | Chih-Jen Lin;Shou-De Lin;Yun-Nung Chen;Hung-Yi Lee;Yu-Chiang Wang | en |
| dc.subject.keyword | 機器學習,深度學習,輔助資訊,大型語言模型,注意力機制,時間序列預測,表格問答,多標籤文本分類, | zh_TW |
| dc.subject.keyword | Machine Learning,Deep Learning,Auxiliary Information,Large Language Model,Attention Mechanism,Time Series Forecasting,Table Question Answering,Multi-label Text Classification, | en |
| dc.relation.page | 129 | - |
| dc.identifier.doi | 10.6342/NTU202404537 | - |
| dc.rights.note | 同意授權(全球公開) | - |
| dc.date.accepted | 2024-11-04 | - |
| dc.contributor.author-college | 電機資訊學院 | - |
| dc.contributor.author-dept | 資訊工程學系 | - |
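
The abstract above presents TSMixer as an all-MLP alternative to attention for multivariate forecasting. The sketch below illustrates the core time-mixing and feature-mixing idea in PyTorch; the layer sizes, normalization placement, and omission of auxiliary covariates are simplifying assumptions made for illustration and do not reproduce the architecture described in Chapter 3 of the dissertation.

```python
# Minimal sketch of MLP time/feature mixing in the spirit of TSMixer.
# All hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn


class MixerBlock(nn.Module):
    def __init__(self, seq_len: int, n_features: int, hidden: int = 64):
        super().__init__()
        # Time mixing: one MLP shared across features, applied along the time axis.
        self.time_mlp = nn.Sequential(nn.Linear(seq_len, seq_len), nn.ReLU())
        # Feature mixing: one MLP shared across time steps, applied along the feature axis.
        self.feat_mlp = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(), nn.Linear(hidden, n_features)
        )
        self.norm1 = nn.LayerNorm(n_features)
        self.norm2 = nn.LayerNorm(n_features)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, n_features)
        y = self.norm1(x).transpose(1, 2)          # (batch, n_features, seq_len)
        x = x + self.time_mlp(y).transpose(1, 2)   # residual time mixing
        x = x + self.feat_mlp(self.norm2(x))       # residual feature mixing
        return x


class TinyTSMixer(nn.Module):
    def __init__(self, seq_len: int, n_features: int, horizon: int, n_blocks: int = 2):
        super().__init__()
        self.blocks = nn.Sequential(*[MixerBlock(seq_len, n_features) for _ in range(n_blocks)])
        self.head = nn.Linear(seq_len, horizon)     # temporal projection to the forecast horizon

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (batch, seq_len, n_features) -> (batch, horizon, n_features)
        x = self.blocks(x)
        return self.head(x.transpose(1, 2)).transpose(1, 2)


if __name__ == "__main__":
    model = TinyTSMixer(seq_len=96, n_features=7, horizon=24)
    print(model(torch.randn(8, 96, 7)).shape)       # torch.Size([8, 24, 7])
```

Applying one MLP along the time axis and another along the feature axis is what lets such a model capture temporal patterns and cross-variate interactions without any attention layers.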
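
For the second challenge, the abstract states that TableRAG selects relevant content at the cell level before the language model sees the table. The following toy sketch shows that retrieve-then-prompt flow; the token-overlap scorer, the example table, and the prompt string are placeholders standing in for the learned retrievers and prompts of the actual system.

```python
# Toy sketch of cell-level retrieval before prompting, in the spirit of TableRAG.
# The scorer and the prompt format are illustrative stand-ins, not the real system.
import re
from collections import Counter

table = {
    "columns": ["product", "region", "sales"],
    "rows": [
        ["widget", "north", "120"],
        ["widget", "south", "95"],
        ["gadget", "north", "310"],
    ],
}

def tokens(text: str):
    return re.findall(r"[a-z0-9]+", text.lower())

def cell_snippets(tbl):
    """Flatten the table into one 'column: value' snippet per cell."""
    for r, row in enumerate(tbl["rows"]):
        for col, val in zip(tbl["columns"], row):
            yield r, col, f"{col}: {val}"

def score(question: str, snippet: str) -> int:
    """Toy relevance score: shared-token count, standing in for a dense retriever."""
    q, s = Counter(tokens(question)), Counter(tokens(snippet))
    return sum((q & s).values())

def retrieve_cells(question: str, tbl, k: int = 4):
    ranked = sorted(cell_snippets(tbl), key=lambda c: score(question, c[2]), reverse=True)
    return ranked[:k]

question = "What are the sales of gadget in the north region?"
context = "; ".join(f"row {r}, {text}" for r, _, text in retrieve_cells(question, table))
# Only the retrieved cells, not the whole table, are placed in the prompt.
prompt = f"Answer using only these table cells: {context}\nQuestion: {question}"
print(prompt)
```

Because only the top-k cells reach the prompt, the prompt length grows with k rather than with the size of the table, which is how a retrieval-first design can cope with tables that exceed the model's context window.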
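
For the third challenge, the abstract describes a one-sided fine-tuning framework that freezes the label encoder and updates only the document encoder. Below is a minimal PyTorch sketch of such a training loop; `TextEncoder`, the dimensions, and the random tensors are hypothetical placeholders for a pre-trained sentence encoder and a real multi-label dataset, not the models or data used in Chapter 5.

```python
# Minimal sketch of one-sided fine-tuning: freeze the label tower, train the document tower.
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    """Toy stand-in for a pre-trained sentence encoder."""
    def __init__(self, vocab: int = 1000, dim: int = 64):
        super().__init__()
        self.emb = nn.EmbeddingBag(vocab, dim)    # mean-pools token embeddings
        self.proj = nn.Linear(dim, dim)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        return self.proj(self.emb(token_ids))      # (batch, dim)

doc_encoder = TextEncoder()
label_encoder = TextEncoder()
label_encoder.requires_grad_(False)                # one-sided: the label tower stays frozen

optimizer = torch.optim.AdamW(doc_encoder.parameters(), lr=1e-4)  # document tower only
criterion = nn.BCEWithLogitsLoss()

docs = torch.randint(0, 1000, (8, 32))             # 8 documents, 32 tokens each (dummy data)
label_texts = torch.randint(0, 1000, (5, 8))       # 5 label descriptions, 8 tokens each
targets = torch.randint(0, 2, (8, 5)).float()      # multi-label ground truth

for _ in range(3):                                 # a few illustrative training steps
    with torch.no_grad():
        label_vecs = label_encoder(label_texts)    # (5, dim); no gradients flow here
    doc_vecs = doc_encoder(docs)                   # (8, dim)
    logits = doc_vecs @ label_vecs.T               # (8, 5) document-label scores
    loss = criterion(logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Keeping the label tower frozen means that descriptions of labels never seen during training are embedded exactly as a purely zero-shot model would embed them, which is how the framework retains zero-shot prediction while still adapting the document side to the supervised task.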
| Appears in Collections: | 資訊工程學系 |
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| ntu-113-1.pdf | 3.51 MB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.