NTU Theses and Dissertations Repository
Please use this identifier to cite or link to this item: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97067
Title: Machine Perception under Low Light Conditions
Authors: 莫易喆
Igor Morawski
Advisor: 徐宏民
Winston H. Hsu
Keyword: low-light machine cognition, low-light image processing, raw image processing, semantic guidance
Publication Year: 2025
Degree: Doctoral
Abstract: Deep learning has greatly advanced the robustness of machine perception. Still, adverse visual conditions, such as low light and various atmospheric conditions, remain challenging for existing methods. Our work focuses on machine perception under low-light conditions using ubiquitous traditional RGB cameras, which are crucial in many real-life applications such as autonomous driving and mixed reality.

Fundamentally, low-light imaging and perception difficulties are caused by the low photon count sensed by the camera, which results in a low signal-to-noise ratio (SNR). Common strategies for increasing the photon count, such as lengthening the exposure time, enlarging the aperture, or using a flash, introduce artifacts and further degrade performance. While directly training on large amounts of low-light data is a straightforward way to improve performance, it is not always feasible, especially when many downstream-task models require separate training. Moreover, collecting and annotating low-light datasets is often prohibitively laborious and expensive. In this thesis, we propose several strategies to address these issues.
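As a back-of-the-envelope illustration of the photon-count argument above (standard shot-noise reasoning, not a derivation from the thesis itself): under a Poisson model for photon arrivals, noise grows as the square root of the signal, so SNR falls as the square root of the photon count.

```latex
% Shot-noise-limited SNR for a pixel that collects N photons (Poisson statistics):
%   signal = N, noise = sqrt(N)
\mathrm{SNR} = \frac{N}{\sqrt{N}} = \sqrt{N},
\qquad
\text{e.g. } 100\times \text{ fewer photons} \;\Rightarrow\; 10\times \text{ lower SNR}.
```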

First, we present an image-enhancement framework and demonstrate that the enhancement model can be optimized under the guidance of an object detector. In contrast to methods that rely on paired image datasets, our method uses object detection annotations and therefore does not require strictly aligned image pairs, which would limit scene diversity to controlled or static environments.
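To make the training setup concrete, here is a minimal PyTorch-style sketch of detector-guided enhancement as described above; the module names and the detector's loss-returning interface are hypothetical stand-ins, not the thesis implementation.

```python
import torch

# Hypothetical setup: `enhancer` is a trainable enhancement network and
# `detector` is a pre-trained object detector whose forward pass returns a
# detection loss given images and ground-truth boxes (a common detection API).
def train_step(enhancer, detector, optimizer, low_light_imgs, gt_boxes):
    enhanced = enhancer(low_light_imgs)   # enhance the low-light input
    loss = detector(enhanced, gt_boxes)   # detection loss supervises enhancement
    optimizer.zero_grad()
    loss.backward()                       # gradients flow back through the detector
    optimizer.step()                      # only enhancer params are in the optimizer,
    return loss.item()                    # so the detector itself stays fixed
```

Note that only object-detection annotations appear in the loop; no aligned low-light/normal-light image pairs are needed.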

Second, we propose to leverage raw sensor data as a more robust modality than traditional sRGB data. Because traditional image signal processors (ISPs) often break down under extreme low-light conditions, we propose a dedicated neural ISP optimized under the guidance of a downstream object detector. Furthermore, we propose a computationally efficient architecture that integrates expert knowledge about traditional ISPs to improve generalization to unseen sensors.
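As an illustration of what a neural ISP operating on raw data can look like in practice, here is a toy sketch (assumed details, not the thesis architecture): Bayer packing followed by a small convolutional head.

```python
import torch
import torch.nn as nn

def pack_rggb(raw):
    """Pack a (B, 1, H, W) RGGB Bayer mosaic (black level already subtracted)
    into a 4-channel, half-resolution tensor, a common first step in learned
    ISPs for raw low-light data."""
    r  = raw[:, :, 0::2, 0::2]
    g1 = raw[:, :, 0::2, 1::2]
    g2 = raw[:, :, 1::2, 0::2]
    b  = raw[:, :, 1::2, 1::2]
    return torch.cat([r, g1, g2, b], dim=1)      # (B, 4, H/2, W/2)

# Toy stand-in for a neural ISP head mapping packed raw data to an RGB image;
# the thesis model is more elaborate and encodes traditional-ISP expert priors.
neural_isp = nn.Sequential(
    nn.Conv2d(4, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 12, 3, padding=1),
    nn.PixelShuffle(2),                          # (B, 12, H/2, W/2) -> (B, 3, H, W)
)
```

Such an ISP can be trained end-to-end with the detector-guided loop sketched earlier, with `neural_isp` taking the place of the enhancer.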

Third, we propose a training strategy that leverages a pre-trained contrastive vision-language model to semantically guide the enhancement model, improving low-light performance without any need for paired or unpaired normal-light images. By exploiting the zero-shot, open-vocabulary capabilities of pre-trained vision-language models, the proposed method scales well across datasets without constraining the set of object categories.
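A hedged sketch of how such semantic guidance could be wired up, using OpenAI's CLIP as one example of a pre-trained contrastive vision-language model; the prompt template and loss shape are illustrative assumptions, not the thesis code.

```python
import torch
import clip  # https://github.com/openai/CLIP

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)    # frozen pre-trained VLM

def semantic_guidance_loss(enhanced, class_names):
    """Score enhanced images against open-vocabulary class prompts.
    `enhanced` is assumed to be resized and normalized for CLIP's input.
    No paired or unpaired normal-light images are involved."""
    prompts = clip.tokenize([f"a photo of a {c}" for c in class_names]).to(device)
    with torch.no_grad():                          # text side needs no gradients
        txt = model.encode_text(prompts)
        txt = txt / txt.norm(dim=-1, keepdim=True)
    img = model.encode_image(enhanced)             # gradients reach the enhancer
    img = img / img.norm(dim=-1, keepdim=True)
    sim = img @ txt.t()                            # cosine similarity per class
    return -sim.max(dim=-1).values.mean()          # pull toward best-matching class
```

Because the class list is just text, extending to new datasets or categories only requires editing `class_names`.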

We present extensive studies to validate the effectiveness of each proposed component. In addition, we provide a large, high-quality dataset of paired raw and sRGB images annotated for low-light object detection. The dataset consists of outdoor scenes captured in an uncontrolled environment and is publicly available for task-based comparison and benchmarking of future low-light enhancement and detection methods.
URI: http://tdr.lib.ntu.edu.tw/jspui/handle/123456789/97067
DOI: 10.6342/NTU202401143
Fulltext Rights: Authorized (access restricted to campus)
Embargo Lift Date: 2025-02-27
Appears in Collections: Department of Computer Science and Information Engineering

Files in This Item:
File           Size      Format
ntu-113-1.pdf  67.53 MB  Adobe PDF (access limited to the NTU IP range)


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
