๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ
Deep Learning/[๋…ผ๋ฌธ] Paper Review

Fast R-CNN

by ์ œ๋ฃฝ 2023. 7. 6.
728x90
๋ฐ˜์‘ํ˜•

 

 

0. Fast R-CNN
  • ๊ทธ๋ž˜์„œ ๋‚˜์˜จ ์นœ๊ตฌ๊ฐ€ fast R-CNN

  1. Selective Search
    1. input image๋ฅผ ๊ฐ€์ง€๊ณ  selective search ์ง„ํ–‰
    1. image ์•ˆ์— ๊ฐ์ฒด๊ฐ€ ์žˆ์„๋ฒ•ํ•œ ํ›„๋ณด๊ตฐ๋“ค์„ ์ตœ๋Œ€ ex) 2000๊ฐœ ์„ ์ •ํ•จ
    1. ROI ์˜์—ญ ์ถ”์ถœ⇒ ์ด ๋•Œ 2000๊ฐœ์˜ ์˜์—ญ์„ ๋‹ค ์‚ฌ์šฉํ•˜์ง€ ์•Š์Œ(Hierarohical sampling)์ด๋ผ๊ณ  ํ•จex) input image๊ฐ€ 2๊ฐœ๊ณ , region์ด 128๋กœ ์žก์•˜๋‹ค๋ฉด 64๊ฐœ์˜ ์˜์—ญ๋งŒ ํ›„๋ณด ์˜์—ญ์œผ๋กœ ๊ฐ€์ ธ๊ฐ
    2. ⇒ ํ•œ ๋ฏธ๋‹ˆ ๋ฐฐ์น˜ ๋‹น์˜ ์ด๋ฏธ์ง€๋งŒํผ ๋‚˜๋ˆ ์ค€ ์• ๋“ค๋งŒ ์‚ฌ์šฉํ•œ๋‹ค
  1. CNN
    1. input image ํ•œ ์žฅ์„ ๊ทธ๋ƒฅ CNN ๊ตฌ์กฐ์— ๋„ฃ์–ด๋ฒ„๋ฆผ (conv+pooling์˜ ๋ฐ˜๋ณต ๊ตฌ๊ฐ„)
    1. CNN ๊ณ„์ธต ๋ฐ˜๋ณตํ•˜๋‹ค๊ฐ€ ๋งˆ์ง€๋ง‰ ๋ถ€๋ถ„์—์„œ์˜ ํ’€๋ง์„ ROI pooling์œผ๋กœ ์ง„ํ–‰ํ•จ
  1. ROI pooling⇒ SPP pooling์„ ๋‹จ์ˆœํ™” ์‹œํ‚จ ๋ฐฉ๋ฒ•⇒ max pooling์„ ์‚ฌ์šฉํ•ด 7x7 feature map ์ถ”์ถœํ•ด์„œ ๊ณ ์ • ํฌ๊ธฐ ๋ฒกํ„ฐ๋กœ ๋งŒ๋“ฌ
    ๊ทธ๊ฑธ ๊ฐ์•ˆํ•ด์„œ๋ผ๋„ ์ง„ํ–‰ (์‹œ๊ฐ„ ๋‹จ์ถ•)
  2. ** add) ROI์˜ ํ›„๋ณด ์˜์—ญ ํฌ๊ธฐ๋“ค์ด ๋‹ค ๋‹ค์–‘ํ•˜๊ธฐ ๋•Œ๋ฌธ์—, ์˜ˆ๋ฅผ ๋“ค์–ด nxm ์™€ ๊ฐ™์ด ์ •์‚ฌ๊ฐํ˜•์ด ์•„๋‹Œ์• ๋“ค์˜ ๊ฒฝ์šฐ 7x7์˜ ์˜์—ญ์œผ๋กœ ์ชผ๊ฐœ๋Š” ๊ณผ์ •์—์„œ ํฌ๊ธฐ๊ฐ€ ๋‹ค๋ฅผ ์ˆ˜๋„ ์žˆ์Œ
  3. ⇒ SPP์˜ ๊ฒฝ์šฐ, 4x4 2x2 1x1๊ณผ ๊ฐ™์ด 3๊ฐœ์˜ pooling ๋ฐฉ๋ฒ•์„ ์‚ฌ์šฉํ–ˆ๋‹ค๋ฉด, ์ด ์นœ๊ตฌ๋Š” ํ•œ ๋ฒˆ๋งŒ ์ง„ํ–‰ํ–ˆ์Œ
  1. FC Layers
    1. ์ดํ›„, FC layer์„ ํ•œ ๋ฒˆ ๊ฑฐ์น˜๊ณ , ๋‘ ๊ฐœ์˜ ๊ฐˆ๋ž˜๋กœ ๋‚˜๋ˆ”
    1. ๊ฐ๊ฐ FC layer์„ ๋‘ ๋ฒˆ ๋” ๊ฑฐ์น˜๊ณ  1) classification 2) Boundary boxes regression ์ง„ํ–‰
  1. Loss Function
    1. ์ด ์นœ๊ตฌ์˜ ๊ฒฝ์šฐ, classification loss์™€ bbox regressor loss๋ฅผ ์„ž์–ด์„œ ์ข…ํ•ฉ์ ์ธ loss๋ฅผ ๊ตฌํ•จ
    1. ์ด loss๊ฐ’์„ ์ด์šฉํ•ด ์—ญ์ „ํŒŒ๋ฅผ ์ง„ํ–‰ํ•จ** classification: softmax๋กœ ์–ป์–ด๋‚ธ ํ™•๋ฅ ๊ฐ’๊ณผ ์ •๋‹ต๊ฐ’์— ๋Œ€ํ•œ loss

      ⇒ smooth L1์€ ๋น ๋ฅธ ์†๋„๋กœ loss๋ฅผ 0์œผ๋กœ ์ˆ˜๋ ดํ•œ๋‹ค๋Š” ํŠน์ง•์„ ์ง€๋‹˜ (์†๋„ ๋น ๋ฅด๊ฒŒ)

    2. ** localization loss๋Š” x,y,w,h์— ๋Œ€ํ•œ ์˜ˆ์ธก๊ฐ’๊ณผ groundtruth(์‹ค์ œ ์ •๋‹ต๊ฐ’)์˜ ์กฐ์ ˆ์„ ํ†ตํ•ด ๊ณ„์‚ฐํ•ด์„œ smooth L1 ํ•จ์ˆ˜๋ฅผ ํ†ต๊ณผํ•œ ๊ฐ’์ด๋ผ๊ณ .

→ ์ด ์นœ๊ตฌ์˜ ๋ฌธ์ œ์ :

R-CNN๊ณผ SPPNet๋ณด๋‹ค๋Š” ์„ฑ๋Šฅ์ด ์ข‹์œผ๋‚˜, test ๊ฒฐ๊ณผ๋ฅผ ๋ดค์„ ๋•Œ, region proposal ๊ณผ selective search๋ฅผ ์ง„ํ–‰ํ•  ๋•Œ ์‹œ๊ฐ„ ๋น„์ค‘์ด ํผ (์‹œ๊ฐ„ ์˜ค๋ž˜ ๊ฑธ๋ฆผ)

 

728x90
๋ฐ˜์‘ํ˜•

'Deep Learning > [๋…ผ๋ฌธ] Paper Review' ์นดํ…Œ๊ณ ๋ฆฌ์˜ ๋‹ค๋ฅธ ๊ธ€

Faster R-CNN  (0) 2023.07.06
YOLO: You Only Look Once: Unified, Real-Time Object Detection  (1) 2023.07.06
Transformer  (0) 2023.07.06
Inception V2/3  (0) 2023.07.06
ELMO  (0) 2023.07.06