[paper review] SSDA-YOLO: Semi-supervised domain adaptive YOLO for cross-domain object detection


finallyupper 2024. 7. 18. 22:50

Introduction

This paper proposes SSDA-YOLO, a semi-supervised domain adaptive YOLO that combines the one-stage detector YOLOv5 with domain adaptation to improve cross-domain detection performance.

Contributions

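Proposes SSDA-YOLO, a novel method for the domain adaptive object detection (DAOD) problem.
Newly proposes two penalty functions, a distillation loss and a consistency loss, to improve performance.
Shows improved performance on well-known domain transfer experiments.
Shows, through comparison with the outdated detector Faster R-CNN, that the DAOD field needs more advanced detectors.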

Related Works

1) Object Detection

Two-stage architectures = extract ROIs, then perform bbox classification and regression
One-stage detectors = extract the output bboxes and classes directly from feature maps predicted on the basis of predefined anchors

Two-stage methods are slow but accurate, while one-stage methods are fast but less accurate.

The paper uses YOLOv5, which combines CSPNet, FPN, and Focal Loss, to take a simple and efficient approach.

2) Cross-Domain Object Detection

Existing cross-domain object detection methods have been based on the two-stage detector Faster R-CNN.
- DA-Faster = adopts a Gradient Reversal Layer (GRL)
- SWDA = designs instance-level and image-level alignments to improve performance in the new domain
- SCL = a stacked complementary losses method based on gradient detach
- NLDA = formulates robust learning that can train with noisy labels in the target domain
- MAF (Multi-adversarial Faster R-CNN), ATF (Asymmetric tri-way Faster R-CNN), PA-ATF
- MEAA = a Faster R-CNN-based method consisting of two tailor-made modules, LUAA and MUCA
- UMT = generates fake training images with CycleGAN to alleviate domain bias
- MSDA = focuses on labeled data from multiple source domains and proposes DMSN (improves domain invariance while preserving discriminative power)
- US-DAF = uses DA Faster R-CNN together with multi-label learning that reduces negative transfer during training
- SIGMA = represents source and target data as graphs and reformulates adaptation as a graph matching problem
There have also been attempts to solve DAOD with one-stage detectors.
- EPMDA = adapts FCOS to extract objectness maps accurately
- I3Net = complementary modules designed specifically for the SSD architecture

3) Semi-supervised Domain Adaptation

UDA (Unsupervised Domain Adaptation)
- defines a model that adapts from a labeled source domain to an unlabeled target domain.
- originally used mostly for image classification.
- target domain
  - labels are not seen during training, but the images are used.
- earlier DAOD methods generally treated source and target domain images separately.
- however, if some labeled images related to the target can be obtained
- ⇒ semi-supervised learning (few-shot learning) can be applied to benefit from them.
Examples
- DTPL = proposes a weakly supervised progressive domain adaptation framework in which image-level annotations are provided for target domain images
- MTOR = explores object relations in Mean Teacher (MT), which was originally designed for semi-supervised learning tasks
- UMT = improves Faster R-CNN adaptation by using an unbiased Mean Teacher
- TPKP, TDD, PT = apply knowledge distillation frameworks built on the MT model to resolve cross-domain discrepancy and find target-relevant features
- DAFormer = uses an MT model in a self-training pipeline to tackle the cross-domain semantic segmentation task
Inspired by these principles, the paper adopts the prevalent MT model as a combination of:
- Supervised learning in source dataset
- Unsupervised learning in target dataset
  - unlabeled target training images are style-translated into source-like images (e.g., in global scene appearance) before being fed to the teacher model ❕

Preliminaries and Motivations

State-of-the-art DAOD methods have so far been based on two-stage detectors because these provide the following two branches.

Classification
Localization

e.g., DA-Faster: proposes representations at two levels, instance-level and image-level.

Recent work based on one-stage detectors has also studied how to extract cross-domain features at these two levels. Most of it trains a single detection model on both the source and target domains, which makes the training adversarial and therefore problematic.

Two principal challenges

1) Knowledge Distillation Structure

As mentioned above, earlier methods relied on an adversarial process. Specifically, they use the GRL (Gradient Reversal Layer), a bidirectional operator used to realize two conflicting optimization objectives.

In forward training, it minimizes the classification error
In backpropagation, it maximizes the binary-classification (domain) error so that domain-invariant features are learned
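
As a rough illustration of the GRL idea described above, here is a minimal plain-Python sketch. The class name `GradientReversal` and the scaling factor `lam` are hypothetical names chosen for this example, not taken from the paper or from any library; a real implementation would hook into an autograd framework instead of calling `backward` by hand.

```python
class GradientReversal:
    """Identity in the forward pass; flips (and scales) the gradient in the backward pass."""

    def __init__(self, lam=1.0):
        # lam is an assumed scaling factor for the reversed gradient.
        self.lam = lam

    def forward(self, features):
        # Features pass through unchanged to the domain classifier.
        return list(features)

    def backward(self, grad_output):
        # The gradient coming back from the domain classifier is negated,
        # so the feature extractor is pushed to *increase* the domain loss.
        return [-self.lam * g for g in grad_output]


grl = GradientReversal(lam=0.5)
features = [0.2, -1.0, 3.0]
out = grl.forward(features)

grad_from_domain_classifier = [1.0, -2.0, 0.5]
grad_to_feature_extractor = grl.backward(grad_from_domain_classifier)
print(out, grad_to_feature_extractor)
```

The domain classifier itself still receives the ordinary gradient, so the two objectives conflict only at the point where the gradient crosses this layer.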

More recently, the more robust teacher-student framework has started to be used in DAOD. This distillation structure helps the source detector capture objects in target images well.

However, those methods are all based on Faster R-CNN and distill knowledge from intermediate features.

This paper also uses a teacher-student framework, but distills knowledge based on the final response of the superior one-stage detector YOLOv5. (In this work the student and teacher share the same architecture.)

2) Cross-domain Features Extraction

In a one-stage detection framework, classification and localization are handled jointly.

EPMDA
- based on FCOS
- proposes 1) global and 2) center-aware discriminators that mimic image-level and instance-level feature extraction.
I3Net
- based on SSD
- designs one multi-label classifier and two domain discriminators to compensate image-level and pixel-level features.
Also DA-YOLO
- based on YOLOv3
- uses two adaptive modules, RIA and MSIA (to perform image-level and instance-level adaptation, respectively), together with three domain classifiers
MS-DAYOLO
- based on YOLOv4
- adds a domain adaptation network (DAN) to the backbone and learns domain-invariant features directly.

This paper instead addresses the image-level shift with pseudo cross-generated images, and uses the Mean Teacher model to obtain instance-level target domain features that guide the training of the student model.

Proposed method

4 Main Components

Mean Teacher model with knowledge distillation framework for guiding robust student network updating
pseudo cross-generated training images
updated distillation loss
novel consistency loss

Notation

I^s : the set of source images (containing N object bounding boxes)

There are c object classes in total.

I^t : the set of unlabeled target images

When I^s contains N_s source images, it comes with the set of bbox coordinates B^s and class labels C^s.

📌 Goal: given the datasets D_s and D_t, train a model that performs well on the target domain, thereby solving the DAOD problem.

Mean Teacher Model

The Mean Teacher model was first proposed for semi-supervised learning in image classification. It has a knowledge distillation structure made up of two models with identical architecture, a student and a teacher.

For the domain adaptation task, the student model learns from labeled source domain data with a gradient descent optimizer, while the teacher model is updated by receiving EMA (exponential moving average) weights from the student model.

P_s and P_t are the weight parameters of the student and the teacher, respectively.
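
The EMA update is not written out here, so the following is only a sketch of the usual Mean Teacher form, P_t ← α·P_t + (1 − α)·P_s. The decay name `alpha`, its value, and the dictionary representation of the weights are assumptions made for this example.

```python
def ema_update(teacher, student, alpha=0.99):
    """Return new teacher weights as an exponential moving average of student weights.

    teacher, student: dicts mapping a parameter name to a float (toy stand-ins
    for real tensors). alpha is an assumed decay; closer to 1 means the teacher
    changes more slowly.
    """
    return {
        name: alpha * teacher[name] + (1.0 - alpha) * student[name]
        for name in teacher
    }


teacher_weights = {"w": 1.0, "b": 0.0}
student_weights = {"w": 3.0, "b": 2.0}

# alpha=0.5 is used only to make the arithmetic easy to follow.
teacher_weights = ema_update(teacher_weights, student_weights, alpha=0.5)
print(teacher_weights)
```

Only the student receives gradients; the teacher is never optimized directly and simply trails the student through this average.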

In this work, the teacher model takes only the unlabeled target domain samples D_t as input, while the student model is trained partly on the unlabeled samples I^t. During distillation, bboxes with high probabilities are selected from the teacher model's predictions and used as pseudo labels, and the student model aims to gain robustness by reducing its variance with respect to the target domain.

The teacher model's predictions are filtered by NMS with a threshold and sorted to form pseudo labels, which are passed to the student model to give it instance-level features of the target domain.
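
The pseudo-label step above can be sketched as a confidence filter followed by a greedy, class-wise NMS. This is a simplified plain-Python illustration, not the paper's code: the function names, the `(box, score, cls)` tuple layout, and both threshold values are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_pseudo_labels(preds, conf_thres=0.5, iou_thres=0.5):
    """Keep confident teacher predictions, sorted by score, after greedy class-wise NMS.

    preds: list of (box, score, cls) tuples. Thresholds are assumed values.
    """
    candidates = sorted(
        (p for p in preds if p[1] >= conf_thres),
        key=lambda p: p[1],
        reverse=True,
    )
    kept = []
    for box, score, cls in candidates:
        # Drop a box if a higher-scoring box of the same class overlaps it too much.
        if all(k[2] != cls or iou(box, k[0]) <= iou_thres for k in kept):
            kept.append((box, score, cls))
    return kept


teacher_preds = [
    ((10, 10, 50, 50), 0.90, 0),
    ((12, 12, 52, 52), 0.80, 0),       # heavy overlap with the first box
    ((100, 100, 140, 140), 0.70, 1),
    ((200, 200, 220, 220), 0.20, 1),   # below the confidence threshold
]
pseudo_labels = select_pseudo_labels(teacher_preds)
print(pseudo_labels)
```

The surviving boxes then play the role of ground truth for the student on the target domain.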

Pseudo Training Images Generation

Even with a well-designed distillation network, image-level domain differences remain, because the student model is dominated by the source domain images I^s while the teacher model is guided by target domain features.

To address this, the paper uses the unpaired image translator CUT.

Remedying Cross-Domain Discrepancy

Consistency Loss Function

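The paired images $(I^t, I^t_f)$, a target image and its source-like translated counterpart, are fed into the student model ⇒ their scene-level data distributions differ, but they lie in the same label space.

Hypothesis - when images from both domains are given to the student model, the outputs must be consistent.

⇒ So a constraint is needed to make the two outputs as similar as possible

Possible options

1) Apply intermediate supervision between the two feature maps
2) Apply an error constraint between the final predictions
3) Combine 1 and 2

Option 1, the intermediate supervision strategy, was first studied in convolutional pose machines (CPM) and did help with the vanishing gradient problem (in a supervised setting). This work, however, deals with unsupervised DA penalization, so after feeding in the two inputs (I^t, I^t_f) it wants consistent predictions without using intermediate features.

⇒ So the second constraint is chosen (compute the L2 distance between the two final outputs)

⇒ “consistency loss”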
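
To make the chosen constraint concrete, here is a minimal sketch of an L2 penalty between the student's two final outputs. The function name, the flat-list representation of predictions, and the choice to average over elements (rather than sum) are assumptions for illustration only.

```python
def consistency_loss(pred_real, pred_fake):
    """Mean squared (L2) difference between two equally sized prediction vectors.

    pred_real: student output for the target image I^t (flattened to a list).
    pred_fake: student output for the translated image I^t_f (same layout).
    """
    if len(pred_real) != len(pred_fake):
        raise ValueError("predictions must have the same length")
    squared = [(a - b) ** 2 for a, b in zip(pred_real, pred_fake)]
    return sum(squared) / len(squared)


out_real = [1.0, 2.0]
out_fake = [1.0, 4.0]
loss = consistency_loss(out_real, out_fake)
print(loss)
```

The loss is zero only when the two predictions agree exactly, which matches the hypothesis that both views of the same scene should produce consistent outputs.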

Overall Optimization

At inference time only the trained student model is used, with target images as input. The model is trained end-to-end by jointly optimizing all the losses.
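
The overall objective is not reproduced in this review, so the following is only a sketch of what jointly optimizing the losses could look like. The symbols $\mathcal{L}_{det}$ (supervised detection loss), $\mathcal{L}_{dis}$ (distillation loss), $\mathcal{L}_{con}$ (consistency loss) and the weights $\alpha$, $\beta$ are names assumed for this note, not values or notation confirmed by the source.

```latex
\mathcal{L} = \mathcal{L}_{det} + \alpha \, \mathcal{L}_{dis} + \beta \, \mathcal{L}_{con}
```

Under this reading, $\alpha$ and $\beta$ control how strongly the two penalty terms pull against the supervised detection loss.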

Experiment

detector = YOLOv5-L (Large parameters)

Training

$(I^s, I^s_f)$ with labels
$(I^t, I^t_f)$ without labels

Transfer Experiment Design

*Gain = increase in mAP

*Rel. = relative UDA improvement with respect to the Oracle mAP

Source Only indicates training with labeled source images and directly testing on the target data without domain adaptation.
Oracle indicates training and testing with labeled target images.
Base = the basic SSDA-YOLO framework

1) Real → Virtual adaptation : PascalVOC → Clipart1k

Dataset
- PascalVOC 2007, 2012 datasets (source domain)
- Clipart1k dataset (target domain)
Results
- Faster R-CNN-based methods performed worse than YOLOv5-based methods. This may suggest that using a stronger, more advanced base detector matters for the DAOD task.
*Suffixes on Base: D = distillation, C = consistency, DC = both

2) Normal → Adverse weather adaptation : Cityscapes → Foggy Cityscapes

Dataset
- Cityscapes (source domain)
- Foggy Cityscapes (target domain)
Results
- The paper's Base performed much better than Source Only.
- Adding the distillation loss and the consistency loss to Base gave an mAP of 55.9, much better than TDD.
- When Gain and Rel. are measured in addition to mAP, the results do not differ much from Ours.

3) Self-made yawning datasets using various K-12 course videos : Source Classroom → Target Classroom

Dataset
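- Real course videos were collected in one city, and 30 course videos (from different classrooms) were collected in another city
- These were assigned as source and target school images, respectively (one frame extracted every 3 seconds)
Results
- A gap between Base_DC and Oracle remains, but the method was able to mitigate the accuracy degradation of cross-domain behavior detection in real classrooms.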
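
The "one frame every 3 seconds" sampling mentioned in the dataset description can be sketched as follows. The function name, the assumption of a constant frame rate, and the example duration and fps are all hypothetical; the source does not say how the frames were actually extracted.

```python
def frame_indices(duration_s, fps, every_s=3.0):
    """Indices of frames sampled once every `every_s` seconds from a video.

    Assumes a constant frame rate; duration_s and fps are example inputs.
    """
    total_frames = int(duration_s * fps)
    step = max(1, int(round(every_s * fps)))
    return list(range(0, total_frames, step))


# A hypothetical 10-second clip at 25 fps.
indices = frame_indices(duration_s=10, fps=25)
print(indices)
```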

Conclusion

Proposes a knowledge distillation framework.
Cross-generates pseudo training images via style transfer to reduce the global domain difference.
Designs a consistency loss function to reduce prediction shifts.
