[paper review] F2DNet: Fast Focal Detection Network for Pedestrian Detection


finallyupper 2024. 5. 14. 14:25

Main idea

Focal detection network + fast, lightweight suppression head (based on CSP)
- Focal detection network ⇒ takes the role of the RPN, with per-pixel center and scale regression
- Light suppression head ⇒ trained in a detached setting ∴ light
- Reduces false positives (via fast suppression)
*Conventional: RPN + detection head (bbox heads)
Anchor-free method
*Conventional: single- and multi-stage detectors rely on anchors
Low $MR^{-2}$ in heavy occlusion settings

Background

Pedestrian detection
- Key: the model has to be light and efficient (because of the limited computation power on autonomous vehicles)
RPN
- Replaces the slow selective-search-based region proposal generation with a CNN-based network
- Can be trained end-to-end with the detection head
Two-stage detectors
- (-): the RPN itself was only used as a lightweight proposer of candidate regions
- → the detection head has to refine them afterwards
One-stage detectors
- No RPN
- Detection per patch
- (-) class imbalance

⇒ Both depend on anchors

Anchor-free approaches (one-stage detector)

Instead of the per-patch detection of one-stage detectors, the class is predicted per pixel (→ on a downscaled feature map)
center and scale-based approaches
1. Classify whether each pixel is the center of an object
2. Regress (predict) the scale of that object
(-) more false positives, linked to the penalty-reduced focal loss
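
The two steps above can be sketched as a small decoder that turns a center map and a scale map into boxes. This is only an illustration, not the paper's implementation: the function name `decode_centers`, the stride of 4, the fixed width/height ratio of 0.41, the 0.5 score threshold, and the choice to store the log of the height in input pixels are all assumptions made for the example, and the offset prediction is ignored.

```python
import math

def decode_centers(center_map, log_height_map, stride=4, aspect_ratio=0.41, score_thr=0.5):
    """Turn per-pixel center scores and log-heights into (x1, y1, x2, y2, score) boxes.

    Assumptions (hypothetical, for illustration): the maps live on a grid that is
    `stride` times smaller than the input, heights are stored as log(pixels),
    and the width is derived from the height with a fixed aspect ratio.
    """
    boxes = []
    for row, scores in enumerate(center_map):
        for col, score in enumerate(scores):
            if score < score_thr:
                continue  # step 1: this pixel is not classified as a center
            height = math.exp(log_height_map[row][col])  # step 2: regressed scale
            width = aspect_ratio * height
            # Map the grid cell back to input-image coordinates (cell midpoint).
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            boxes.append((cx - width / 2, cy - height / 2,
                          cx + width / 2, cy + height / 2, score))
    return boxes

center_map = [[0.1, 0.9],
              [0.2, 0.3]]
log_height_map = [[0.0, math.log(100.0)],
                  [0.0, 0.0]]
print(decode_centers(center_map, log_height_map))
```

Every pixel above the threshold becomes a box, which is why a dense per-pixel head of this kind tends to produce extra false positives near true centers.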

Architecture

The focal detection network produces stronger detection candidates than an RPN.

1) Feature Extraction (HRNet as backbone)

Extracts high-resolution features.
High-resolution features are required to draw bboxes accurately.
HOW?
- Take the feature maps from the stages of the backbone
- → upscale all of the feature maps obtained from the different backbone stages to (h/4, w/4) using bilinear interpolation and conv operations
- Interpolation adds no memory cost
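
The upscaling step can be sketched in plain Python as follows. This is a minimal illustration on single-channel maps, not the paper's code: `bilinear_resize` and `fuse_stages` are hypothetical names, the conv operations mentioned above are left out, and the corner-aligned coordinate mapping is one of several conventions (deep learning frameworks offer others).

```python
def bilinear_resize(fmap, out_h, out_w):
    """Resize a 2D feature map (list of rows) with corner-aligned bilinear interpolation."""
    in_h, in_w = len(fmap), len(fmap[0])
    out = []
    for i in range(out_h):
        # Fractional input row that corresponds to output row i.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Blend the four neighbours: first along x, then along y.
            top = fmap[y0][x0] * (1 - wx) + fmap[y0][x1] * wx
            bottom = fmap[y1][x0] * (1 - wx) + fmap[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out

def fuse_stages(stage_maps, out_h, out_w):
    """Bring every stage's map to the same (out_h, out_w) grid and stack them as channels."""
    return [bilinear_resize(m, out_h, out_w) for m in stage_maps]

coarse = [[0.0, 2.0],
          [4.0, 6.0]]
print(bilinear_resize(coarse, 3, 3))
```

The interpolation only computes weighted averages of existing values, so it introduces no learnable parameters.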

2) Focal Detection Network

center and scale-based approaches
→ predicts a center map and a scale map
Similar to CSP (different loss settings)

Loss

(1) Center loss

Cross-entropy loss with focus weights (based on prediction confidence)
- Reduces the contribution of easy samples so that the optimizer focuses on hard samples.
If a false positive is close to a true center, M becomes large, so the $(1-M)^{\beta}$ term becomes small.
→ As a consequence, false positives that lie close to a center are not punished enough (the motivation for adding the Fast Suppression Head).
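
A small numeric sketch makes the effect of the $(1-M)^{\beta}$ term visible. It follows the CSP-style penalty-reduced focal loss as I understand it, where M is a Gaussian mask around ground-truth centers; since F2DNet uses different loss settings, treat the exact form, the function name `center_loss_term`, and the values `beta=4.0` and `gamma=2.0` as assumptions.

```python
import math

def center_loss_term(p, is_center, gaussian_mask=0.0, beta=4.0, gamma=2.0):
    """Loss of one pixel with predicted center probability p (CSP-style sketch).

    gaussian_mask is M at this pixel: close to 1 near a true center, 0 far away.
    """
    eps = 1e-12  # avoids log(0)
    if is_center:
        # Positive pixel: confident (easy) predictions are down-weighted.
        return -((1.0 - p) ** gamma) * math.log(p + eps)
    # Negative pixel: the penalty shrinks as the pixel gets closer to a true center.
    alpha = (1.0 - gaussian_mask) ** beta
    return -alpha * (p ** gamma) * math.log(1.0 - p + eps)

far_fp = center_loss_term(0.9, is_center=False, gaussian_mask=0.0)
near_fp = center_loss_term(0.9, is_center=False, gaussian_mask=0.9)
print(far_fp, near_fp, near_fp / far_fp)
```

With these assumed values, an equally confident false positive at M = 0.9 is penalized roughly ten thousand times less than one far from any center.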
(2) Loss of the focal detection head

Uses vanilla L1 loss as the regression loss instead of the Smooth L1 loss from Fast R-CNN.
- Because this work regresses the log of the height, Smooth L1 loss would shrink the penalty even further and hurt detection (false positives caused by insufficient IoU).
FDN loss = regression + classification + offset loss
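
To see why Smooth L1 shrinks the penalty on log-heights, the two losses can be compared on a concrete error. The sketch below uses the standard Smooth L1 definition from Fast R-CNN; the heights of 100 and 120 pixels are made-up numbers for illustration.

```python
import math

def l1(x):
    """Vanilla L1 loss on a single residual."""
    return abs(x)

def smooth_l1(x):
    """Smooth L1 loss: quadratic for |x| < 1, linear otherwise."""
    ax = abs(x)
    return 0.5 * ax * ax if ax < 1.0 else ax - 0.5

# Hypothetical example: ground-truth height 100 px, predicted height 120 px.
residual = math.log(120.0) - math.log(100.0)  # about 0.18 in log space
print(l1(residual), smooth_l1(residual))
```

Log-height residuals are usually well below 1, so Smooth L1 stays in its quadratic regime and a 20% height error costs far less than it does under plain L1.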

3) Fast Suppression Head

False positives that lie close to a center are not punished enough.
- NMS only prunes overlapping bboxes, so a box whose IoU is below 0.5 may not be removed by NMS.
→ NMS does suppress some of them, but an additional suppression step is still needed.
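
The limitation of NMS described above can be reproduced with a toy example. This is a generic greedy NMS sketch, not the paper's code; the boxes and scores are made up, and the 0.5 threshold matches the value mentioned in the note.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2, ...) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(dets, iou_thr=0.5):
    """Greedy NMS: keep boxes in score order, drop those overlapping a kept box too much."""
    kept = []
    for det in sorted(dets, key=lambda d: d[4], reverse=True):
        if all(iou(det, k) <= iou_thr for k in kept):
            kept.append(det)
    return kept

true_box = (0.0, 0.0, 40.0, 100.0, 0.9)      # correct detection
duplicate = (2.0, 0.0, 42.0, 100.0, 0.8)     # near-duplicate, IoU about 0.90
shifted_fp = (20.0, 0.0, 60.0, 100.0, 0.6)   # false positive, IoU about 0.33
print(nms([true_box, duplicate, shifted_fp]))
```

The near-duplicate is removed, but the shifted false positive survives because its IoU with the true box stays under the threshold. That leftover case is the one the extra suppression step targets.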

Detached setting → no gradient flows back to the detection head
Loss = Binary cross entropy

4) Pedestrian detection (Detection model)

score = S(Focal Detection Network) + S(Fast Suppression Head)
*Removes the thresholding hyperparameter
Goal: pedestrians should be detected and not suppressed.
Detection model
- Uses the joint probability distribution $P(s, d, c, h)$: $P(\neg s, d \mid c, h)$
  - d : detected
  - c : center of pedestrian
  - h : height of pedestrian
  - s : suppressed

Experiment

Benchmark dataset
- CityPersons, EuroCity Persons (val sets), Caltech Pedestrian dataset (test set)
Evaluation Criteria
- $MR^{-2}$: log-average miss rate (lower is better)
  - The miss rate (MR) averaged over FPPI rates evenly spaced in log-space in the range $10^{-2}$ to $10^{0}$
  - FPPI = FP / #tested images
  - MR = FN / #GT boxes
  - → compute the IoU between detection bboxes and GT bboxes → TP (matched) / FP (unmatched)
- Inference time https://events.afcea.org/FedID22/Custom/Handout/Speaker110137_Session9656_1.pdf
- Weight averaging
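
The $MR^{-2}$ definition above can be sketched as a short function. It assumes the miss-rate curve has already been computed as parallel lists sorted by increasing FPPI. Sampling at nine reference points is the convention I have seen used with the Caltech benchmark, and falling back to a miss rate of 1.0 when no operating point lies below a reference FPPI is my own assumption; `log_average_miss_rate` is a hypothetical name.

```python
import math

def log_average_miss_rate(fppi, miss_rate, num_points=9):
    """Average the miss rate in log space over FPPI references in [1e-2, 1e0].

    fppi must be sorted in increasing order; miss_rate[i] is the miss rate
    at the operating point with false positives per image fppi[i].
    """
    refs = [10 ** (-2 + 2 * k / (num_points - 1)) for k in range(num_points)]
    sampled = []
    for ref in refs:
        # Take the operating point with the largest FPPI not exceeding the reference.
        candidates = [mr for f, mr in zip(fppi, miss_rate) if f <= ref]
        sampled.append(candidates[-1] if candidates else 1.0)
    # Geometric mean, with a floor so that log(0) cannot occur.
    return math.exp(sum(math.log(max(mr, 1e-10)) for mr in sampled) / num_points)

# Made-up curve with two operating points.
print(log_average_miss_rate([0.05, 0.5], [0.5, 0.25]))
```

Because the average is taken in log space, operating points at low FPPI weigh as much as those at high FPPI.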

Results

Progressive fine tuning
- Train on dataset A + fine-tune on dataset B ⇒ consistently low $MR^{-2}$ in heavy occlusion settings

References

CSP → the head is borrowed from it (more efficient and powerful than an RPN). https://openaccess.thecvf.com/content_CVPR_2019/papers/Liu_High-Level_Semantic_Feature_Detection_A_New_Perspective_for_Pedestrian_Detection_CVPR_2019_paper.pdf

HRNet = used as the backbone model https://arxiv.org/pdf/1908.07919v2
- Extracts high-resolution features from images
- A CNN that keeps high-resolution representations throughout the whole process
Evaluation metrics (FPPI, LAMR, MR-2)

occlusion

https://www.baeldung.com/cs/image-processing-occlusions#:~:text=Put simply%2C occlusion in an,the building (background surface)%3A
One object hiding part of another object.
(-) Reduces the available visual information
ex. autonomous driving

With a color-based tracker, a single blue object ends up being recognized as two objects.

⇒ should be “robust”

