Please use this Handle URI to cite this item:
http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85479

Full metadata record
| DC Field | Value | Language |
|---|---|---|
| dc.contributor.advisor | 陳炳宇(Bing-Yu Chen) | |
| dc.contributor.author | Chao-Hsien Ting | en |
| dc.contributor.author | 丁肇賢 | zh_TW |
| dc.date.accessioned | 2023-03-19T23:17:12Z | - |
| dc.date.copyright | 2022-07-22 | |
| dc.date.issued | 2021 | |
| dc.date.submitted | 2022-07-13 | |
| dc.identifier.uri | http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/85479 | - |
| dc.description.abstract | 視障者通常藉由具正常視力的口述者在理解、選擇並描述影片中的關鍵視覺要素後所製作的口述影像,以聆聽的方式理解原始影片的內容。360° 影片是一種新興的影像傳播形式,觀影者能透過環繞的畫面獲得身歷其境的體驗。然而,360° 影片全方位的特性使得口述者難以掌握整體的視覺內容並剖析空間細節,而這些資訊往往是視障者建立沉浸感的重要因素。透過與專業口述者的討論與發想,我們確立了數個描述 360° 影片的關鍵挑戰,並據此逐步設計出 OmniScribe:一套致力於輔助口述者為 360° 影片創作沉浸式口述影像的系統。OmniScribe 使用 AI 生成的內容感知疊層,幫助口述者更精確地掌握 360° 影片的內容與細節;此外,OmniScribe 讓口述者能夠為視障者製作「空間化口述」並標示「沉浸式標籤」,使閱聽者得以透過我們開發的手機 App 沉浸式地享受口述影像。在一項共計 11 位新手和專業口述者參與的實驗中,我們展示了 OmniScribe 在優化口述影像創作流程上的價值;此外,一項共計 8 位盲人參與的實驗初步證實,相較於標準口述影像,以 OmniScribe 製作的沉浸式口述影像更能使視障者在 360° 影片中獲得沉浸感。最後,我們討論了促進 360° 影片無障礙的設計方向。 | zh_TW |
| dc.description.abstract | Blind or visually impaired (BVI) people typically access videos through audio descriptions (AD) crafted by sighted describers who comprehend, select, and describe the crucial visual content in a video. 360° video is an emerging storytelling medium that enables immersive experiences people might not otherwise encounter in everyday life. However, the omnidirectional nature of 360° videos makes it challenging for describers to perceive the holistic visual content and interpret the spatial information that is essential for creating immersive ADs for blind people. Through a formative study with a professional describer, we identified key challenges in describing 360° videos and iteratively designed OmniScribe, a system that supports the authoring of immersive ADs for 360° videos. OmniScribe uses AI-generated content-awareness overlays to help describers better grasp 360° video content. Furthermore, OmniScribe enables describers to author spatial AD and immersive labels so that blind users can consume the videos immersively with our mobile prototype. In a study with 11 professional and novice describers, we demonstrate the value of OmniScribe in the authoring workflow, and a study with 8 blind participants reveals the promise of immersive AD over standard AD for 360° videos. Finally, we discuss the implications for promoting 360° video accessibility. | en |
| dc.description.provenance | Made available in DSpace on 2023-03-19T23:17:12Z (GMT). No. of bitstreams: 1 U0001-1207202220260100.pdf: 39813501 bytes, checksum: 094a75bfb6398d0f3e444341890acfb9 (MD5) Previous issue date: 2021 | en |
| dc.description.tableofcontents | Acknowledgements; Chinese Abstract; Abstract; List of Figures; List of Tables; Chapter 1 Introduction; Chapter 2 Related Work; 2.1 Video Accessibility and Audio Descriptions; 2.2 Tools to Support Audio Description Authoring; 2.3 Accessibility of Mixed-Reality Content; 2.4 Accessibility of 360° Videos; Chapter 3 Formative Study; 3.1 Interview Procedure; 3.2 Limitation of Recruitment; 3.3 Challenges to Make 360° Videos Accessible; 3.3.1 Hard to perceive holistic 360° content for describers; 3.3.2 Unclear section division conceals the direction information; 3.3.3 Constructing mental map with conceivable information; 3.3.4 Interactivity and agency; 3.4 Design Goals; Chapter 4 System; 4.1 Timeline Panel; 4.2 Description Panel; 4.3 Components for Enhancing Content-Awareness; 4.3.1 View control widgets; 4.3.2 Section division overlay; 4.3.3 Saliency overlay; 4.3.4 Object tracking overlay; 4.3.5 Content map of dynamic objects; 4.4 Components for Creating Immersive Labels; 4.4.1 Authoring spatial AD; 4.4.2 Authoring scene descriptions; 4.4.3 Authoring object descriptions; 4.5 Mobile Prototype for Rendering OmniScribe-Generated Descriptions; 4.6 360° Video Preprocessing Pipeline; 4.6.1 Shot boundary and audio segmentation; 4.6.2 Saliency detection; 4.6.3 Object tracking; 4.6.4 Super resolution; Chapter 5 Evaluation: Using OmniScribe to Create Descriptions; 5.1 Video Materials; 5.2 Participants; 5.3 Tasks; 5.4 Procedure; 5.5 Result: How People Used OmniScribe; 5.5.1 Usability; 5.5.2 Section division overlay; 5.5.3 Object tracking overlay; 5.5.4 Saliency overlay; 5.5.5 Content map; 5.5.6 View control widgets; 5.6 Quality of Authored Audio Descriptions; Chapter 6 Evaluation: Immersive Labels for Consuming 360° Videos; 6.1 Participants; 6.2 Materials and Apparatus; 6.3 Procedure; 6.4 Results; 6.4.1 Spatial ADs; 6.4.2 Scene description; 6.4.3 Object description; 6.5 Analysis; Chapter 7 Discussion and Future Work; 7.1 Co-designing Immersive Experiences with BVI People; 7.2 Immersion and Cognitive Load; 7.3 Interactive Mobile System for 360° Videos; 7.4 OmniScribe Generalizability; Chapter 8 Conclusion; Bibliography | |
| dc.language.iso | en | |
| dc.subject | 電腦視覺 | zh_TW |
| dc.subject | 360°影片 | zh_TW |
| dc.subject | 口述影像 | zh_TW |
| dc.subject | 多媒體 | zh_TW |
| dc.subject | 盲人 | zh_TW |
| dc.subject | 視力障礙 | zh_TW |
| dc.subject | 資訊聲音化 | zh_TW |
| dc.subject | audio description | en |
| dc.subject | computer vision | en |
| dc.subject | sonification | en |
| dc.subject | visual impairment | en |
| dc.subject | blind | en |
| dc.subject | 360° video | en |
| dc.subject | multimedia | en |
| dc.title | OmniScribe: 360°影片沉浸式口述影像創作系統 | zh_TW |
| dc.title | OmniScribe: Authoring Immersive Audio Descriptions for 360° Videos | en |
| dc.type | Thesis | |
| dc.date.schoolyear | 110-2 | |
| dc.description.degree | 碩士 (Master's) | |
| dc.contributor.oralexamcommittee | 林文杰(Wen-Chieh Lin),詹力韋(Liwei Chan) | |
| dc.subject.keyword | 360°影片,口述影像,多媒體,盲人,視力障礙,資訊聲音化,電腦視覺 | zh_TW |
| dc.subject.keyword | 360° video, audio description, multimedia, blind, visual impairment, sonification, computer vision | en |
| dc.relation.page | 56 | |
| dc.identifier.doi | 10.6342/NTU202201432 | |
| dc.rights.note | License granted (worldwide open access) | |
| dc.date.accepted | 2022-07-13 | |
| dc.contributor.author-college | 電機資訊學院 | zh_TW |
| dc.contributor.author-dept | 資訊工程學研究所 | zh_TW |
| dc.date.embargo-lift | 2022-07-22 | - |
| Appears in Collections: | 資訊工程學系 (Department of Computer Science and Information Engineering) | |
Files in This Item:
| File | Size | Format | |
|---|---|---|---|
| U0001-1207202220260100.pdf | 38.88 MB | Adobe PDF | View/Open |
All items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.
