Deep Learning/[๋…ผ๋ฌธ] Paper Review

RetinaNet

by 제룽 2023. 7. 5.
1. Intro
  • Class imbalance → a classification setting where the number of samples per class is highly uneven; in binary classification, it arises when one class has far fewer samples than the other.
  • → e.g., in a disease-classification problem, if most people are healthy and very few have the disease (class imbalance), this problem occurs.
  • → i.e., one class has far more or far fewer samples than the other classes.
  • The paper argues that because background regions (easy negatives) make up the vast majority of examples, their influence on training dominates and model performance drops

※ Additional note

: An object detection model must classify the parts of an image that contain an object (= foreground) and the parts that do not (= background).

If the foreground/background ratio is highly imbalanced (class imbalance), the model develops a strong tendency to predict background and may fail to detect foreground objects accurately.


  1. Two-stage detectors attack this from two directions. First, region proposals filter out most background samples (e.g., selective search, EdgeBoxes, DeepMask, RPN). Second, sampling heuristics keep the positive/negative sample counts at an appropriate ratio (e.g., hard negative mining, OHEM).
  2. One-stage detectors drop the region-proposal step and instead densely sample candidate locations over the entire image (dense sampling). Far more candidate regions are produced, so the class-imbalance problem is much more severe.
  • Hence the goal: solve the class-imbalance problem in one-stage detectors too! The method proposed for this is Focal Loss, and RetinaNet is the detector built around it
2. Idea
  • To address class imbalance, the paper proposes reshaping the standard cross-entropy loss.
  • That is, well-classified examples (easy samples) are down-weighted → e.g., background is classified easily, so it receives a smaller weight.

⇒ This is the Focal Loss

  • ๋งŽ์€ easy negative์— ์˜ํ–ฅ ๊ฐ€๋Š” ๊ฒƒ์„ ๋ง‰์Œ ( ๋ฐฐ๊ฒฝ๋“ค์—๊ฒŒ๋Š” ๊ฐ€์ค‘์น˜ ๋ถ€์—ฌ ์ ๊ฒŒ )
  • ์ž‘๊ฒŒ ๋ถ„ํฌ๋˜์–ด ์žˆ๋Š” hard example์— ์ง‘์ค‘ ( ์‹ค์ œ ๊ฐ์ฒด ์žˆ๋Š” ์นœ๊ตฌ๋“ค์—๊ฒŒ ๋” )
  • ์ตœ๊ณ ๋‹น!
3. Focal Loss
  • Starts from cross entropy (CE)
  • The ground-truth class is labeled 0 or 1; it is 1 when it matches the label
  • Define p_t = p when y = 1, and p_t = 1 − p otherwise.
  • Rewriting CE(p, y) with this notation gives CE(p_t) = −log(p_t)
3.1 Balanced Cross Entropy
  • Regardless of y, once p_t > 0.5 the confidence is high and the loss shrinks sharply. The problem: backgrounds and other easy classes, which easily pass 0.5, are so numerous that they shrink the total loss far too much.

⇒ As a result, the rare class's contribution to the loss gets overwhelmed

  • Here, α adjusts the weight given to each class.
  • If the loss function treats every class equally, the model fails to learn the rare class properly.
  • So the loss weights are adjusted to counter this class imbalance.
  • α assigns a per-class weight: the class that appears rarely (the positive, foreground class) gets a higher weight so the model learns it better.
  • E.g., 1000 samples of the negative (no-object) class vs. 100 samples of the positive (object) class is a class imbalance.
  • So the method gives more weight to the 100 samples to resolve the imbalance.

⇒ That is, α only adjusts how much each class contributes to the loss; it cannot reflect easy vs. hard samples in the loss

⇒ Enter the scaling (modulating) factor
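The α-balanced CE described above can be sketched as follows (a minimal illustration; here `alpha` is the weight placed on the positive/rare class, so in this balancing-only scheme it would be set high):

```python
import math

def balanced_ce(p: float, y: int, alpha: float) -> float:
    """alpha-balanced cross entropy: -alpha_t * log(p_t).

    alpha_t is alpha for the positive class and (1 - alpha) for the
    negative class. Note that alpha scales every sample of a class by
    the same amount -- it cannot tell easy samples from hard ones.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * math.log(p_t)

# With alpha = 0.75, a positive sample counts 3x as much as a negative
# sample that is equally (in)confident.
print(balanced_ce(0.6, 1, 0.75) / balanced_ce(0.4, 0, 0.75))  # 3.0
```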

3.2 Focal Loss Definition
  • Easily classified negatives (background) make up most of the loss and dominate the gradient
  • α-balancing cannot distinguish easy examples from hard examples.
  • Easy/hard applies to the positive class as well:
  • among the samples that do contain an object, some are easy to detect and some are hard.
  • The loss gap between easy and hard examples is small, so the per-class weighting alone cannot fix this ⇒ hence the proposal to redefine the loss so that easy samples are down-weighted and the model focuses on hard examples
  • Multiply CE by (1 − p_t)^γ

(1) Relationship between p_t and the modulating factor

  1. When p_t (the model's predicted probability) is small, the modulating factor (1 − p_t)^γ is close to 1, so the loss is barely affected
  2. When p_t approaches 1, FL converges to 0 (that example is already handled well) ⇒ the closer the factor gets to 0, the more easy examples are down-weighted relative to hard ones, steering the model to concentrate its training on hard examples.
  3. ⇒ Focal Loss thus complements standard cross-entropy loss, which applies the same weight to easy and hard examples alike.
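Combining the modulating factor with α-balancing gives the full focal loss, FL(p_t) = −α_t (1 − p_t)^γ log(p_t). A minimal sketch (the paper's final settings α = 0.25, γ = 2 used as defaults):

```python
import math

def focal_loss(p: float, y: int, alpha: float = 0.25, gamma: float = 2.0) -> float:
    """FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# Easy example (p_t = 0.9): the factor (1 - 0.9)^2 = 0.01 crushes the loss.
# Hard example (p_t = 0.1): the factor (1 - 0.1)^2 = 0.81 barely changes it.
print(focal_loss(0.9, 1))  # tiny
print(focal_loss(0.1, 1))  # much larger
```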

(2) Role of the focusing parameter γ

  • ํŒŒ๋ผ๋ฏธํ„ฐ γ์€ easy example์˜ ๊ฐ€์ค‘์น˜๊ฐ€ ์ž‘์•„์ง€๋Š” ๋น„์œจ์„ ๋” ๋ถ€๋“œ๋Ÿฝ๊ฒŒ ์กฐ์ •ํ•ด์คŒ.
  • γ๊ฐ€ 0์ผ ๋•Œ, FL์€ CE์™€ ๋™์ผํ•˜๋ฉด์„œ γ์ด ์ฆ๊ฐ€ํ•จ์— ๋”ฐ๋ผ Scaling factor์˜ ์˜ํ–ฅ์ด ์ปค์ง.
  • FL๋Š” easy example์˜ ๊ฐ€์ค‘์น˜(๊ธฐ์—ฌ๋„)๋ฅผ ์ค„์ด๊ณ , example์ด ์ž‘์€ loss๋ฅผ ๋ฐ›๋Š” ๋ฒ”์œ„๋ฅผ ํ™•์žฅ์‹œํ‚ค๋Š” ๊ธฐ๋Šฅ์„ ํ•จ. ( ์ฆ‰, pt๊ฐ€ ์ปค์งˆ์ˆ˜๋ก ๋” ์ž‘์•„์ง€๋Š” loss๋ฅผ ๊ฐ–๋Š”๋‹ค๊ณ  ์ƒ๊ฐํ•˜๋ฉด ๋จ ex) 100๋ฐฐ ⇒ 1000๋ฐฐ ์™€ ๊ฐ™์ด loss๋Š” ์ž‘์•„์ง€๋˜ ๋ฒ”์œ„๋Š” ํ™•์žฅ ์‹œํ‚ด์˜ ์˜๋ฏธ๋กœ ํŒŒ์•…)
  • ์˜ˆ๋ฅผ ๋“ค์–ด γ=2, pt=0.9์ผ ๋•Œ, CE์— ๋น„ํ•ด 100๋ฐฐ ์ ์€ loss๋ฅผ ๊ฐ€์ง€๋ฉฐ pt=0.968์ผ ๋•Œ๋Š” 1000๋ฐฐ ์ ์€ loss๋ฅผ ๊ฐ€์ง
  • ์ด๋Š” ์ž˜๋ชป ๋ถ„๋ฅ˜๋œ example์„ ์ˆ˜์ •ํ•˜๋Š” ์ž‘์—…์˜ ์ค‘์š”๋„๋ฅผ ์ƒ์Šน์‹œํ‚ด์„ ์˜๋ฏธ. (hard example์— ๊ฐ€์ค‘์น˜๋ฅผ ๋” ์ฃผ๊ฒ ๋‹ค)
  • γ=2์ผ ๋•Œ, ๊ฐ€์žฅ ํšจ๊ณผ์ .
  • loss layer์˜ ๊ตฌํ˜„์€ p ๋ฅผ ๊ณ„์‚ฐํ•˜๊ธฐ ์œ„ํ•œ sigmoid ์—ฐ์‚ฐ๊ณผ loss ๊ณ„์‚ฐ์„ ๊ฒฐํ•ฉํ•˜์—ฌ ๋” ์ข‹์€ ์„ฑ๋Šฅ์„ ๊ฐ€์ ธ์˜จ๋‹ค๋Š” ์ ์— ์ฃผ๋ชฉ
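The 100×/1000× numbers can be checked directly: ignoring α, the ratio FL/CE is just the modulating factor (1 − p_t)^γ:

```python
def modulating_factor(p_t: float, gamma: float = 2.0) -> float:
    """Ratio of focal loss to cross entropy (ignoring alpha)."""
    return (1.0 - p_t) ** gamma

# With gamma = 2, how many times smaller than CE is the focal loss?
print(1 / modulating_factor(0.9))    # ~100x
print(1 / modulating_factor(0.968))  # ~1000x
```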
3.3. Class Imbalance and Model Initialization
  • Binary classification models are conventionally initialized so that labels 0 and 1 are predicted with equal probability
  • With that initialization, under class imbalance the majority class takes up most of the total loss, so training is unstable from the start (far more background-related boxes are generated to begin with, hence the imbalance)
  • To fix this, a "prior" term π for p is introduced
  • Concretely, π is the probability the model should assign to the rare (foreground) class at the start of training; it is used to initialize the final classification layer so that the initial foreground estimate is low (e.g., π = 0.01)

  • pํ•ญ์ด๋ผ๋Š” ๊ฒƒ์€ penalty์˜ ๊ฐœ๋…๊ณผ ๊ฐ™์€ ๊ฒƒ์„ ๋งํ•จ ๋‹ค์‹œ ๋งํ•ด, ๋ฌผ์ฒด๊ฐ€ ์—†๋Š” ๋ฐฐ๊ฒฝ anchor box์— ๋” ๋งŽ์€ ํŒจ๋„ํ‹ฐ๋ฅผ ๋ถ€๊ณผํ•œ๋‹ค๊ณ  ์ƒ๊ฐํ•˜๋ฉด ๋จ. → ๋งค์นญ๋˜์ง€ ์•Š๋Š” anchor box์˜ ๋น„์œจ์„ ๋‚ฎ์ถค
  • pํ•ญ์„ ๋‚ฎ๊ฒŒ ์„ค์ •๋ ์ˆ˜๋ก ๊ฐ์ฒด์™€ ๋งค์นญ๋˜์ง€ ์•Š๋Š” anchor box๋“ค์˜ ๋น„์œจ์„ ๋‚ฎ์ถ”๋Š” ์—ญํ• ์„ ํ•จ.
  • ์ด๋ฅผ ํ†ตํ•ด, ๋ชจ๋ธ์ด ๋ฐฐ๊ฒฝ๊ณผ ๊ฐ™์€ ํด๋ž˜์Šค์— ๋Œ€ํ•ด์„œ๋Š” ๋œ ๋ฏผ๊ฐํ•˜๊ฒŒ ์˜ˆ์ธกํ•˜๊ณ , ๊ฐ์ฒด๊ฐ€ ์žˆ๋Š” ํด๋ž˜์Šค์— ๋Œ€ํ•ด์„œ๋Š” ๋” ๋ฏผ๊ฐํ•˜๊ฒŒ ์˜ˆ์ธกํ•˜๋„๋ก ์œ ๋„
3.4. Class Imbalance and Two-stage Detectors
  • Two-stage detectors usually use plain cross entropy loss and no α-balancing.
  • Instead, they address the class-imbalance problem with two mechanisms.

(1) two-stage cascade

The first cascade stage is an object proposal mechanism that reduces the nearly infinite set of possible object locations down to about 1-2k.

The selected proposals are of course not random; they tend to lie near true object locations,

which filters out the easy negatives that make up the vast majority.

(2) biased minibatch sampling

In second-stage training, minibatches are built with biased sampling.

The positive/negative ratio is sampled in a way that acts much like an α-balancing factor.

ex)

Given a dataset with classes A and B, if the sampling ratio for class A is set to 0.5 and for class B to 0.1, class A samples will be selected far more often than class B samples.

Training on minibatches built this way lets the model learn the rare class well even on an imbalanced dataset.

4. RetinaNet Detector
  • One backbone network + two subnetworks (class, box)
    ※ What is the backbone here?
    → It computes a convolutional feature map over the entire input image; think of it as just a convolutional network.
  • The first subnet classifies objects over the backbone's output (per anchor box)
  • The second subnet performs regression, comparing anchor boxes against the GT (ground truth) boxes
4.1 FPN (Feature Pyramid Network) Backbone:
  • See the separate FPN post for details

  • FPN on top of a ResNet is used as the backbone
  • FPN builds a multi-scale feature pyramid using a top-down pathway and lateral connections
  • The number of pyramid channels is set to 256
4.2 Anchors:
  • three aspect ratios: 1:2, 1:1, 2:1
  • anchors are assigned to a ground-truth box at an IoU threshold of 0.5
    • anchors with IoU in [0, 0.4) are treated as background
    • anchors with IoU in [0.4, 0.5) are ignored during training
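The assignment rule above as a tiny helper (a hypothetical function for illustration; returns 1 = foreground, 0 = background, −1 = ignored):

```python
def assign_anchor_label(max_iou: float) -> int:
    """Label an anchor by its highest IoU with any ground-truth box."""
    if max_iou >= 0.5:
        return 1   # foreground: assigned to a ground-truth box
    if max_iou < 0.4:
        return 0   # background
    return -1      # IoU in [0.4, 0.5): ignored during training

print(assign_anchor_label(0.7))   # 1
print(assign_anchor_label(0.1))   # 0
print(assign_anchor_label(0.45))  # -1
```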
4.3 Classification Subnet:
  • Predicts the probability that an object is present in each anchor box
  • The subnet is a small FCN (fully convolutional network) attached to each FPN level.
  • The subnet's parameters are shared across all pyramid levels
  • Four repetitions (×4) of a 3x3 convolution with 256 filters followed by a ReLU activation
  • Finally a conv with KA (K × A) filters is applied
  • Since the prediction is binary per class, sigmoid activations are applied at the end
  • C = 256 and A = 9 are used
  • Parameters are not shared with the box regression subnet
4.4 Box Regression Subnet:
  • Like the classification subnet, a small FCN is attached to each FPN level
  • Regresses the 4 anchor-box offsets (center x, center y, width, height) toward the nearby GT box
  • A class-agnostic bounding box regressor is used
    • i.e., it predicts one set of 4 offsets per anchor regardless of the object's class (4A outputs), rather than a separate box per class (4KA outputs)
    • ex) with a car and a person in the image, a class-specific regressor would learn a separate box output for each class, whereas this one shares a single box prediction head across all classes ⇒ this uses fewer parameters yet was found to be equally effective as the regressors used before!
  • Likewise, parameters are not shared with the class subnet
5. Inference and Training

• Only the top 1000 highest-scoring box predictions per FPN level are used in the result, and NMS is applied to the final detections, improving speed
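A minimal greedy NMS sketch (boxes as (x1, y1, x2, y2) tuples; the 0.5 IoU threshold is an assumption, not stated in this post):

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    """Greedily keep the highest-scoring box, drop boxes overlapping it."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [j for j in order if iou(boxes[best], boxes[j]) <= iou_thresh]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 too much
```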

5.1 Focal Loss:
  • Focal loss is used on the classification subnet's output
  • gamma = 2, alpha = 0.25 gave the best results
5.2 Initialization:
  • FPN is initialized with the same values as in the FPN paper
  • In the RetinaNet subnets, every conv layer except the last is initialized with bias = 0 and a Gaussian weight fill with σ = 0.01
  • The last conv layer of the classification subnet has its bias initialized to b = −log((1 − π)/π)
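A quick numeric check of the bias formula with the paper's prior π = 0.01: the sigmoid of the bias recovers π, so the model starts out assigning foreground probability ≈ 0.01 everywhere:

```python
import math

pi = 0.01  # prior probability for the rare (foreground) class
b = -math.log((1 - pi) / pi)
print(b)                       # ~-4.595
print(1 / (1 + math.exp(-b)))  # sigmoid(b) recovers pi = 0.01
```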
5.3 Optimization:
  • SGD is used
  • LR = 0.01, trained for 90k iterations.
  • LR is divided by 10 at 60k iterations and again at 80k iterations
  • weight decay = 0.0001, momentum = 0.9
  • focal loss for class prediction, smooth L1 loss for box regression
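The stated schedule as a small step function (a sketch of the hyperparameters listed above, nothing more):

```python
def learning_rate(iteration: int, base_lr: float = 0.01) -> float:
    """LR 0.01 over 90k iterations, divided by 10 at 60k and again at 80k."""
    if iteration >= 80_000:
        return base_lr / 100
    if iteration >= 60_000:
        return base_lr / 10
    return base_lr

print(learning_rate(0))       # 0.01
print(learning_rate(60_000))  # 0.001
print(learning_rate(80_000))  # 0.0001
```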
6. Experiments
7. Conclusion
  • Focal Loss was proposed to resolve class imbalance, the biggest obstacle for one-stage detectors, and its effectiveness was demonstrated.
  • Achieved SOTA
8. Reference

 
