[paper review] Fast R-CNN (ICCV 2015)

Stand on the shoulders of giants

finallyupper 2024. 5. 6. 09:28

Main Ideas

Fast R-CNN (Fast Region-based Convolutional Network method for object detection)
Achieves state-of-the-art mAP (outperforming R-CNN and SPPnet).
Uses truncated SVD to speed up detection (test time) while nearly preserving mAP.

RoI (max) pooling layer

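An h × w RoI window is divided into an H × W grid of sub-windows, each of approximate size h/H × w/W, and the values in each sub-window are max-pooled into the corresponding output cell.
Pooling is applied independently to each feature-map channel.
This is a special case of the spatial pyramid pooling layer used in SPPnet, with only one pyramid level.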
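A minimal single-channel sketch of RoI max pooling, assuming the RoI is given as (r, c, h, w) in feature-map cells (top-left corner plus height/width) and that each sub-window spans floor(start) to ceil(end). That rounding rule is an assumption of this sketch; a real layer repeats this for every channel and every RoI.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def roi_max_pool(feature: Matrix, roi: Tuple[int, int, int, int], H: int, W: int) -> Matrix:
    """Max-pool one RoI of a single-channel feature map into a fixed H x W grid.

    roi = (r, c, h, w): top-left row/column plus height/width, in feature-map cells.
    """
    r0, c0, h, w = roi
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        # Row range of sub-window i: roughly h/H cells (floor start, ceil end).
        rs, re = r0 + math.floor(i * h / H), r0 + math.ceil((i + 1) * h / H)
        for j in range(W):
            # Column range of sub-window j: roughly w/W cells.
            cs, ce = c0 + math.floor(j * w / W), c0 + math.ceil((j + 1) * w / W)
            # The max over all cells of the sub-window becomes one output cell.
            out[i][j] = max(feature[r][c] for r in range(rs, re) for c in range(cs, ce))
    return out


# Example: 4 x 4 feature map with values 0..15 (row-major), whole-map RoI, 2 x 2 output.
feature = [[4 * r + c for c in range(4)] for r in range(4)]
print(roi_max_pool(feature, (0, 0, 4, 4), 2, 2))  # [[5, 7], [13, 15]]
```

Whatever the RoI size, the output is always H × W, which is what lets RoIs of different sizes feed the same FC layers.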

Initializing from pre-trained networks

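Fast R-CNN is initialized from networks pre-trained on ImageNet.
Each pre-trained network undergoes three transformations:
1. The last max pooling layer → RoI pooling layer
  - produces a fixed H × W output (e.g., 7 × 7 for VGG16)
2. The last FC layer and its softmax → two sibling layers:
  - 1) FC layer + softmax over K + 1 categories
  - 2) category-specific bounding-box regressors
3. The network input is changed to two inputs:
  - 1) a list of images
  - 2) a list of RoIs in those images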
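To make the modified architecture concrete, the sketch below only tracks tensor shapes for a mini-batch of R RoIs. It assumes VGG16's 512 conv5 channels and a 7 × 7 RoI pooling output as defaults, and follows the paper's description of the box branch (4 offsets per object class, so 4K); actual implementations may lay out the box output differently. The names `FastRCNNHeadShapes` and `head_shapes` are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class FastRCNNHeadShapes:
    roi_pool: Tuple[int, int, int, int]  # (R, C, H, W) after the RoI pooling layer
    cls_prob: Tuple[int, int]            # (R, K + 1): softmax over K classes + background
    bbox_deltas: Tuple[int, int]         # (R, 4K): (tx, ty, tw, th) per object class


def head_shapes(num_rois: int, num_classes: int,
                channels: int = 512, pooled: int = 7) -> FastRCNNHeadShapes:
    """Output shapes of the two sibling heads for R RoIs (illustrative bookkeeping only)."""
    R, K = num_rois, num_classes
    return FastRCNNHeadShapes(
        roi_pool=(R, channels, pooled, pooled),
        cls_prob=(R, K + 1),
        bbox_deltas=(R, 4 * K),
    )


# PASCAL VOC: K = 20 classes, R = 128 RoIs per mini-batch.
# Prints FastRCNNHeadShapes(roi_pool=(128, 512, 7, 7), cls_prob=(128, 21), bbox_deltas=(128, 80))
print(head_shapes(128, 20))
```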

Finetuning for detection

Streamlined training process

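For each RoI, the classification branch outputs a discrete probability distribution over K + 1 categories (K object classes plus background), $p = (p_0, \ldots, p_K)$, computed by a softmax.
The regression branch outputs bounding-box offsets $t^k = (t^k_x, t^k_y, t^k_w, t^k_h)$ for each of the K object classes, indexed by k.
- $t^k$ specifies a scale-invariant translation and a log-space height/width shift relative to an object proposal.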
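The parameterization of $t^k$ is the one Fast R-CNN takes over from R-CNN. A minimal sketch, assuming boxes are written as (center x, center y, width, height): `encode` turns a ground-truth box into the regression target for a proposal, and `decode` applies predicted offsets to a proposal. Both function names are illustrative.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (center x, center y, width, height)


def encode(proposal: Box, gt: Box) -> Box:
    """Regression target t = (tx, ty, tw, th) of ground-truth box G relative to proposal P."""
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    return (
        (gx - px) / pw,     # x translation, normalized by proposal width (scale-invariant)
        (gy - py) / ph,     # y translation, normalized by proposal height
        math.log(gw / pw),  # log-space width shift
        math.log(gh / ph),  # log-space height shift
    )


def decode(proposal: Box, t: Box) -> Box:
    """Apply offsets t to a proposal to get the refined box (inverse of encode)."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = t
    return (px + tx * pw, py + ty * ph, pw * math.exp(tw), ph * math.exp(th))


P, G = (50.0, 50.0, 20.0, 40.0), (60.0, 40.0, 40.0, 40.0)
t = encode(P, G)
print(t)             # tx = 0.5, ty = -0.25, tw = log 2, th = 0
print(decode(P, t))  # approximately (60.0, 40.0, 40.0, 40.0)
```

Dividing by the proposal size makes the same t mean the same relative shift for small and large proposals.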

1) Multi-task loss

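Each labeled RoI is trained with a joint loss $L(p, u, t^u, v) = L_{cls}(p, u) + \lambda [u \ge 1] L_{loc}(t^u, v)$, where u is the ground-truth class, v is the ground-truth regression target, $L_{cls}(p, u) = -\log p_u$, and the Iverson bracket $[u \ge 1]$ turns off the box loss for background RoIs (u = 0); the paper uses λ = 1. $L_{loc}$ is a smooth L1 loss, which is less sensitive to outliers than the L2 loss used in R-CNN and SPPnet.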
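A scalar sketch of this loss for a single RoI, assuming λ = 1: `p` is the softmax output, `u` the true class (0 = background), `t_u` the predicted offsets for class u, and `v` the regression target. Batching and averaging over RoIs are left out.

```python
import math
from typing import Sequence


def smooth_l1(x: float) -> float:
    """0.5 x^2 if |x| < 1, otherwise |x| - 0.5 (linear tails: less outlier-sensitive than L2)."""
    return 0.5 * x * x if abs(x) < 1 else abs(x) - 0.5


def multitask_loss(p: Sequence[float], u: int, t_u: Sequence[float],
                   v: Sequence[float], lam: float = 1.0) -> float:
    l_cls = -math.log(p[u])  # log loss for the true class u
    # Iverson bracket [u >= 1]: background RoIs get no localization loss.
    l_loc = sum(smooth_l1(ti - vi) for ti, vi in zip(t_u, v)) if u >= 1 else 0.0
    return l_cls + lam * l_loc


# Foreground RoI: -log(0.8) + smooth_l1(-0.5) + smooth_l1(-2) = 0.223 + 0.125 + 1.5 ≈ 1.848
print(multitask_loss([0.2, 0.8], 1, [0.0, 0.0, 0.0, 0.0], [0.5, 0.0, 0.0, 2.0]))
```

Note how the error of 2 on the last coordinate costs 1.5 instead of the 2.0 that 0.5 x² would give; for larger errors the gap grows quickly, which is the outlier robustness the text refers to.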

2) Mini-batch sampling

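Training uses hierarchically sampled SGD mini-batches: first N images, then R/N RoIs from each image.
- N = 2 images, sampled uniformly at random
- mini-batch size R = 128
- ⇒ i.e., 64 RoIs are sampled from each image
Among the object proposals:
- 25% of the RoIs are taken from proposals with IoU ≥ 0.5 against a ground-truth box; these are labeled with a foreground class (u ≥ 1).
- The remaining RoIs are taken from proposals whose maximum IoU with ground truth lies in [0.1, 0.5); these are labeled background (u = 0).
The only data augmentation during training is horizontal flipping with probability 0.5.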
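A sketch of the per-image RoI sampling described above, assuming each proposal's maximum IoU with ground truth has already been computed. Filling every slot not taken by foreground RoIs with background RoIs (when enough exist) is an assumption of this sketch; `sample_rois` is a hypothetical helper.

```python
import random
from typing import List, Sequence, Tuple


def sample_rois(max_ious: Sequence[float], rois_per_image: int = 64,
                fg_fraction: float = 0.25, rng=random) -> Tuple[List[int], List[int]]:
    """Pick foreground/background proposal indices for ONE image.

    max_ious[i] is proposal i's maximum IoU with any ground-truth box.
    """
    fg_pool = [i for i, iou in enumerate(max_ious) if iou >= 0.5]        # labeled u >= 1
    bg_pool = [i for i, iou in enumerate(max_ious) if 0.1 <= iou < 0.5]  # labeled u = 0
    n_fg = min(round(fg_fraction * rois_per_image), len(fg_pool))       # up to 16 of 64
    n_bg = min(rois_per_image - n_fg, len(bg_pool))                      # remaining slots
    return rng.sample(fg_pool, n_fg), rng.sample(bg_pool, n_bg)


# One mini-batch: N = 2 images x 64 RoIs = R = 128 (toy IoU values).
rng = random.Random(0)
images = [[0.9] * 30 + [0.3] * 200 + [0.05] * 50, [0.7] * 5 + [0.2] * 300]
for ious in images:
    fg, bg = sample_rois(ious, rng=rng)
    print(len(fg), len(bg))  # 16 48, then 5 59
```

Proposals with IoU below 0.1 are never sampled; the paper describes this lower threshold as a heuristic for hard example mining.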

3) Back-propagation through RoI pooling layers
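Because the RoI layer is a max pool, its backward pass routes each output gradient only to the input cell that won the max. With $i^*(r, j)$ the argmax input of bin j in RoI r, the rule is $\partial L/\partial x_i = \sum_r \sum_j [i = i^*(r, j)] \, \partial L/\partial y_{rj}$; an input cell shared by several RoIs accumulates gradient from all of them. The single-channel sketch below reuses the floor/ceil bin convention (an assumption) and hypothetical helper names.

```python
import math
from typing import List, Tuple

Cell = Tuple[int, int]


def roi_pool_forward(feature, roi, H, W) -> Tuple[List[float], List[Cell]]:
    """RoI max pooling that also records i*(r, j), the argmax input cell of each output bin."""
    r0, c0, h, w = roi
    outputs, argmax = [], []
    for i in range(H):
        rs, re = r0 + math.floor(i * h / H), r0 + math.ceil((i + 1) * h / H)
        for j in range(W):
            cs, ce = c0 + math.floor(j * w / W), c0 + math.ceil((j + 1) * w / W)
            cells = [(r, c) for r in range(rs, re) for c in range(cs, ce)]
            best = max(cells, key=lambda rc: feature[rc[0]][rc[1]])
            outputs.append(feature[best[0]][best[1]])
            argmax.append(best)
    return outputs, argmax


def roi_pool_backward(shape, argmaxes, grads_out):
    """dL/dx_i = sum over RoIs r and bins j with i == i*(r, j) of dL/dy_rj."""
    rows, cols = shape
    grad = [[0.0] * cols for _ in range(rows)]
    for argmax, grads in zip(argmaxes, grads_out):  # one entry per RoI r
        for (r, c), g in zip(argmax, grads):        # one entry per output bin j
            grad[r][c] += g                          # only the max cell receives gradient
    return grad


feature = [[4 * r + c for c in range(4)] for r in range(4)]
rois = [(0, 0, 4, 4), (1, 1, 3, 3)]  # two overlapping RoIs on the same feature map
argmaxes = [roi_pool_forward(feature, roi, 2, 2)[1] for roi in rois]
grad = roi_pool_backward((4, 4), argmaxes, [[1.0] * 4, [1.0] * 4])
print(grad[3][3])  # 2.0: cell (3, 3) wins a bin in both RoIs
```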

4) SGD hyper-parameters

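Softmax classification FC layer: weights initialized from a zero-mean Gaussian, std = 0.01
Bounding-box regression FC layer: weights initialized from a zero-mean Gaussian, std = 0.001
Biases initialized to 0
Per-layer learning-rate multipliers: 1 for weights, 2 for biases; global lr = 0.001
SGD
- 30k mini-batch iterations at global lr = 0.001
- then lr is lowered to 0.0001 and training continues for another 10k iterations
- more iterations are run when training on larger datasets
- momentum = 0.9, parameter decay = 0.0005 (on weights and biases)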
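A sketch of the schedule and update above on plain Python floats. The step-decay form and the Caffe-style momentum update (velocity accumulates lr × (gradient + decay × weight)) are assumptions about how these hyper-parameters are combined; `learning_rate` and `sgd_step` are hypothetical names.

```python
from typing import List


def learning_rate(iteration: int, is_bias: bool = False, base_lr: float = 1e-3,
                  step: int = 30_000, gamma: float = 0.1) -> float:
    """Global lr 0.001 for the first 30k iterations, then 0.0001; biases get a 2x multiplier."""
    lr = base_lr * (gamma if iteration >= step else 1.0)
    return lr * (2.0 if is_bias else 1.0)


def sgd_step(params: List[float], grads: List[float], velocity: List[float], lr: float,
             momentum: float = 0.9, decay: float = 5e-4) -> None:
    """In-place SGD with momentum and L2 parameter decay."""
    for i, (w, g) in enumerate(zip(params, grads)):
        velocity[i] = momentum * velocity[i] + lr * (g + decay * w)
        params[i] = w - velocity[i]


for it in (0, 29_999, 30_000, 39_999):  # 40k iterations in total
    print(it, learning_rate(it), learning_rate(it, is_bias=True))
```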

5) Scale Invariance

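Brute-force approach
- each image is resized to a pre-defined pixel size during both training and testing, so the network must learn scale invariance directly from the training data
Multi-scale approach
- an image pyramid gives the network approximate scale invariance: at test time the pyramid approximately scale-normalizes each proposal, and during training a pyramid scale is sampled at random each time an image is sampled (a form of data augmentation)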
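A sketch of how the resize factor for each approach could be computed. The single-scale setting (shorter side s = 600 pixels, longer side capped at 1000) and the five pyramid scales are the ones reported in the results below; the 2000-pixel cap used here for the pyramid is an assumption. `image_scale` and `pyramid_scales` are hypothetical helpers.

```python
from typing import List


def image_scale(height: int, width: int, target: int = 600, max_size: int = 1000) -> float:
    """Resize factor making the shorter side `target`, unless the longer side would exceed `max_size`."""
    short_side, long_side = min(height, width), max(height, width)
    scale = target / short_side
    if round(scale * long_side) > max_size:
        scale = max_size / long_side  # cap the longest side; aspect ratio is preserved
    return scale


def pyramid_scales(height: int, width: int,
                   targets=(480, 576, 688, 864, 1200), max_size: int = 2000) -> List[float]:
    """Multi-scale: one resize factor per pyramid level (max_size is an assumed cap)."""
    return [image_scale(height, width, s, max_size) for s in targets]


print(image_scale(375, 500))   # 1.6: shorter side 375 -> 600, longer side 800 <= 1000
print(image_scale(300, 1000))  # 1.0: 2.0 would make the longer side 2000 > 1000
print(pyramid_scales(375, 500))
```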

6) Fast R-CNN detection

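Inputs
- an image (or an image pyramid, encoded as a list of images)
- a list of R object proposals to score
At test time R is typically around 2000, although the paper also considers much larger values (≈ 45k).
When an image pyramid is used, each RoI is assigned to the pyramid scale at which the scaled RoI is closest to 224² pixels in area.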
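A sketch of the level-assignment rule just described: scale the RoI by each pyramid factor and pick the level whose scaled area is closest to 224² pixels. The three resize factors in the example are hypothetical.

```python
from typing import Sequence


def assign_pyramid_level(roi_w: float, roi_h: float, scales: Sequence[float],
                         target_area: float = 224 * 224) -> int:
    """Index of the pyramid scale at which the scaled RoI area is closest to 224^2 pixels."""
    areas = [(roi_w * s) * (roi_h * s) for s in scales]
    return min(range(len(scales)), key=lambda i: abs(areas[i] - target_area))


scales = [0.8, 1.6, 2.4]  # hypothetical resize factors of a 3-level pyramid
print(assign_pyramid_level(100, 100, scales))  # 2: 240 x 240 is closest to 224 x 224
print(assign_pyramid_level(200, 200, scales))  # 0: 160 x 160 beats 320 x 320
```

Small proposals are thus read from upscaled images and large ones from downscaled images, so every RoI reaches the network at roughly the same size.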
For each object class k, each RoI r receives a detection confidence equal to the estimated probability that r belongs to class k, i.e. $p_k$.
Non-maximum suppression is then performed independently for each class.
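A sketch of greedy per-class NMS. The paper reuses R-CNN's NMS algorithm and settings; the 0.3 IoU threshold and the continuous (x1, y1, x2, y2) box convention below are assumptions of this sketch, and the function names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), continuous coordinates


def iou(a: Box, b: Box) -> float:
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], thresh: float = 0.3) -> List[int]:
    """Greedy NMS: keep boxes in score order, skipping any that overlap a kept box by more than thresh."""
    keep: List[int] = []
    for i in sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True):
        if all(iou(boxes[i], boxes[j]) <= thresh for j in keep):
            keep.append(i)
    return keep


def per_class_nms(dets: Dict[str, Tuple[List[Box], List[float]]],
                  thresh: float = 0.3) -> Dict[str, List[int]]:
    """Run NMS independently for each class k on its class-specific boxes and scores p_k."""
    return {k: nms(boxes, scores, thresh) for k, (boxes, scores) in dets.items()}


dets = {
    "dog": ([(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)], [0.9, 0.8, 0.7]),
    "cat": ([(0, 0, 10, 10)], [0.6]),  # overlaps a dog box, but classes never suppress each other
}
print(per_class_nms(dets))  # {'dog': [0, 2], 'cat': [0]}
```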

Truncated SVD for faster detection

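= a simple compression method for the FC layers. In detection the number of RoIs is large, and nearly half of the forward-pass time is spent in the FC layers. An FC layer with a u × v weight matrix W is compressed by approximating $W \approx U \Sigma_t V^T$, where U (u × t) and V (v × t) hold the first t left/right singular vectors and $\Sigma_t$ the top t singular values. This replaces the layer with two smaller FC layers (the first with weights $\Sigma_t V^T$ and no biases, the second with weights U and the original biases), cutting the parameter count from uv to t(u + v).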
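A NumPy sketch of the factorization, using random toy matrices (u = 64, v = 128, t = 16 are arbitrary choices, not values from the paper). It only shows the linear algebra; in a real network the two factors would become two consecutive FC layers.

```python
import numpy as np


def truncated_svd_fc(W: np.ndarray, t: int):
    """Factor an FC weight matrix W (u x v) as W ~= U_t @ (Sigma_t V_t^T).

    Returns (W1, W2): layer 1 (t x v, no bias) and layer 2 (u x t, keeps the original bias).
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)  # S is sorted in descending order
    W1 = np.diag(S[:t]) @ Vt[:t, :]  # Sigma_t V^T
    W2 = U[:, :t]                    # first t left singular vectors
    return W1, W2


rng = np.random.default_rng(0)
u, v, t = 64, 128, 16
W, b, x = rng.standard_normal((u, v)), rng.standard_normal(u), rng.standard_normal(v)
W1, W2 = truncated_svd_fc(W, t)
y_full = W @ x + b            # original layer
y_approx = W2 @ (W1 @ x) + b  # two smaller layers, original bias on the second
print(W.size, W1.size + W2.size)  # 8192 3072: uv vs t(u + v) parameters
```

Trained FC weights tend to have faster-decaying singular values than this random matrix, which is why a small t can preserve accuracy in practice.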

Main results

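Three pre-trained ImageNet models are used:
- CaffeNet (essentially AlexNet) from R-CNN (S, small)
- VGG_CNN_M_1024: same depth as S but wider (M, medium)
- the very deep VGG16 model (L, large)
⇒ experiments use single-scale training and testing unless noted otherwise
On VOC12, Fast R-CNN achieves the top result, with an mAP of 65.7%.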

On VOC10, SegDeepM achieves a higher mAP than Fast R-CNN.
- SegDeepM is trained on VOC12 trainval plus segmentation annotations and boosts R-CNN accuracy using a Markov random field.
- ⇒ Swapping Fast R-CNN in for R-CNN inside SegDeepM may lead to even better results.
VOC07

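This experiment compares FRCN (Fast R-CNN) with SPPnet and R-CNN.

All methods use the same pre-trained VGG16 network and bounding-box regression.

SPPnet reaches 63.1% mAP, while FRCN, even with single-scale training and testing, reaches 66.9%; fine-tuning the conv layers accounts for this large improvement.

Removing the training examples marked "difficult" raises FRCN's mAP further.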

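Training and test time (VOC07)
- FRCN detection is 146× faster than R-CNN without truncated SVD and 213× faster with it.
- FRCN training is about 9× (8.8×) faster than R-CNN (84 hours → 9.5 hours).
- Compared with SPPnet, FRCN trains 2.7× faster
  - and tests 7× faster without truncated SVD and 10× faster with it.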

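Truncated SVD reduces detection time with only a marginal drop in mAP (the paper reports more than 30% less detection time for a 0.3-point mAP drop).
Importance of fine-tuning the conv layers
- When all layers below the RoI pooling layer were frozen (only the FC layers fine-tuned, as in SPPnet), mAP dropped; training through the RoI pooling layer into the conv layers performed better.

Multi-task training
- mAP was highest with multi-task training plus bounding-box regression at test time, compared with (a) training with the classification loss only, (b) multi-task training without applying the box regressor at test time, and (c) stage-wise training, i.e. classification loss first, then a box regressor trained with all other parameters frozen.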
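Scale invariance
- The brute-force (single-scale) and multi-scale approaches described above were compared.
- The brute-force approach uses one scale, s = 600 pixels (shorter side, with the longer side capped at 1000 pixels), and the multi-scale approach uses five scales, s ∈ {480, 576, 688, 864, 1200}. For S and M, multi-scale gave only a slightly higher mAP at a large cost in compute time; for L, only single-scale training and testing was feasible (GPU memory limits), and it still performed strongly. The conclusion is that deep ConvNets learn scale invariance directly, so single-scale processing offers the best speed/accuracy trade-off.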

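Section 5.3 shows that more training data raises mAP.

Section 5.4 shows that softmax slightly outperforms post-hoc SVMs for all three networks (S, M, and L).

Section 5.5 shows that more object proposals do not necessarily increase mAP.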

References

https://noru-jumping-in-the-mountains.tistory.com/14 (results section)

https://github.com/zjZSTU/Fast-R-CNN/tree/master?tab=readme-ov-file#文档浏览

https://github.com/rbgirshick/fast-rcnn?tab=readme-ov-file

Presentation slides attached: 240317_paperreview_fast_r-cnn (1).pdf (2.05MB)
