
Inception V2/3

by 제룽 2023. 7. 6.
1. Intro
  • As CNNs have advanced, model size and computational efficiency have become limiting factors.
  • To address this, the paper introduces the methods below.
  • With these methods it reaches 17.2% top-1 error and 3.58% top-5 error on the ILSVRC 2012 dataset (ensemble, multi-crop evaluation).

 

  • VGGNet performs well, but its large parameter count makes it expensive to run.
  • Inception was found to perform well with far fewer parameters.
  • However, Inception's complex structure can get in the way of optimization and end up hurting efficiency ⇒ the architecture is hard to modify because of that complexity, and naively scaling it up just makes computation slower.
  • ⇒ It is not clear exactly which design choices make it efficient, so it is hard to adapt to new settings.
  • This paper was written to address these shortcomings.
2. Characteristics
2-1) Factorization into smaller convolutions
  • Factorizing 5x5 and 7x7 convs into stacked 3x3 convs reduces both computation and parameters.
  • A 5x5 conv performs 25 multiplications per output position (per input/output channel pair).
  • Two stacked 3x3 convs perform 18 in total (9 + 9) while covering the same 5x5 receptive field.
  • So the computation drops noticeably (by 28%). ※ Sharing weights between adjacent units is what reduces the parameter count.

⇒ This approach was introduced in VGG.
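The 25 vs. 18 count can be checked with a few lines of arithmetic. This is only a sketch: `conv_weights` and `receptive_field` are hypothetical helper names, and the count covers a single input/output channel pair with no bias.

```python
def conv_weights(kernel_h, kernel_w):
    """Weights (= multiplications per output position) for one
    input-channel/output-channel pair, ignoring bias."""
    return kernel_h * kernel_w


def receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1 convs with the given kernel sizes."""
    field = 1
    for k in kernel_sizes:
        field += k - 1
    return field


single_5x5 = conv_weights(5, 5)        # one 5x5 conv
stacked_3x3 = 2 * conv_weights(3, 3)   # two 3x3 convs in sequence
saving = 1 - stacked_3x3 / single_5x5

print(single_5x5, stacked_3x3, f"{saving:.0%}")        # 25 18 28%
print(receptive_field([5]), receptive_field([3, 3]))   # 5 5
```

The same helper shows that three stacked 3x3 convs cover a 7x7 receptive field with 27 weights instead of 49.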

2-2) Asymmetric Convolutions
  • This answers the question: can a 3x3 conv be factorized further?
  • One option is to factorize it into 2x2 convs.
  • In the experiments, however, factorizing into asymmetric nx1 convs was more effective than using 2x2 convs.
  • 3x3 conv → factorized into a 1x3 conv and a 3x1 conv
  • Figure: Inception module with 7x7 factorized into 1x7 and 7x1
  • Inception v2 uses Inception modules of this factorized form.
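To make the 1xn / nx1 idea concrete, here is a small pure-Python sketch. It assumes a separable (rank-1) 3x3 kernel, for which applying a 1x3 filter and then a 3x1 filter gives exactly the same result as the full 3x3 filter. In the actual network the two layers are learned separately, so this only illustrates why the factorized form can stand in for the square one; `correlate2d` and the sample values are made up for the example.

```python
def correlate2d(image, kernel):
    """'Valid' 2D cross-correlation on nested lists (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


row = [[1, 0, -1]]        # 1x3 kernel
col = [[1], [2], [1]]     # 3x1 kernel
# Outer product of col and row: the equivalent full 3x3 kernel
full = [[c[0] * r for r in row[0]] for c in col]

# Arbitrary 6x6 integer "image" used only for the comparison
image = [[(3 * i + 5 * j) % 7 for j in range(6)] for i in range(6)]

direct = correlate2d(image, full)
factorized = correlate2d(correlate2d(image, row), col)
print(direct == factorized)

# Weights per channel pair: n*n for the square kernel, 2*n when factorized
for n in (3, 7):
    print(n, n * n, 2 * n)
```

For n = 3 that is 9 weights against 6, and for n = 7 it is 49 against 14, so the saving grows with n.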
2-3) Utility of Auxiliary Classifiers
  • The original GoogLeNet claimed that auxiliary classifiers help the network converge (the model is deep, so gradients vanish), but this paper reports that they do not actually have that effect.
  • Results improved more when the auxiliary classifier included dropout or batch norm ⇒ so the authors argue that its effect is closer to regularization.
2-4) Efficient Grid Size Reduction
  • Problems with pooling
    • A typical CNN uses pooling to shrink the feature map.
    • To avoid a representational bottleneck (the loss of information that comes with the smaller size after pooling), the number of filters is increased.
    • Pooling first cuts the computation (halving each side of the grid leaves a quarter of the positions), but it also loses information.
    • Left figure: pooling is applied first (the grid size is halved)
    • Right figure: the convolution runs without pooling first ⇒ requires more computation
  • Fix: stride=2, with conv and pooling layers used in parallel
  • Representational power is not reduced; computation is reduced
  • Problem solved!
  • On the left: stride=2
  • On the right: the parallel example
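A rough cost comparison of the three options, as a sketch only. It assumes a d x d grid with k channels that should become a d/2 x d/2 grid with 2k channels, and counts multiplications for a single conv per option. The values `d = 32` and `k = 64` and the helper `conv_mults` are made up for illustration, and the parallel branch is simplified to one stride-2 conv next to a pooling branch.

```python
def conv_mults(out_size, in_ch, out_ch, kernel=1):
    """Multiplications for one conv layer producing an out_size x out_size grid."""
    return out_size * out_size * in_ch * out_ch * kernel * kernel


d, k = 32, 64  # assumed grid size and channel count

# Option A: conv to 2k channels on the full grid, then pool (no bottleneck, expensive)
conv_then_pool = conv_mults(d, k, 2 * k)

# Option B: pool first, then conv to 2k channels (cheap, but bottlenecked)
pool_then_conv = conv_mults(d // 2, k, 2 * k)

# Option C: stride-2 conv producing k channels, in parallel with stride-2 pooling
# that passes the k input channels through; concatenation gives 2k channels.
# Pooling needs no multiplications, so only the conv branch is counted.
parallel = conv_mults(d // 2, k, k)

print(pool_then_conv / conv_then_pool)  # 0.25
print(parallel / conv_then_pool)        # 0.125
```

Under these assumptions the parallel layout is no more expensive than pooling first, yet the conv branch still sees the full-resolution input.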
2-5) Model Regularization via Label Smoothing
  • Smooths hard labels into soft labels
    • hard label: one-hot-encoded vector (1 for the correct class, 0 for the rest)
    • soft label: label values lie between 0 and 1
  • Why?: softening the labels is meant to improve generalization (= it lowers the model's confidence in the correct class so it generalizes better)

ex) original label: [0,1,0,0] ⇒ changed to [0.025, 0.925, 0.025, 0.025]
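The example above can be reproduced with a minimal sketch of label smoothing against a uniform distribution. The function name `smooth_labels` is hypothetical, and `epsilon = 0.1` is assumed here because it is the value that yields the numbers in the example.

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Mix a one-hot label with a uniform distribution over the K classes:
    q'(k) = (1 - epsilon) * q(k) + epsilon / K
    """
    num_classes = len(one_hot)
    return [(1 - epsilon) * y + epsilon / num_classes for y in one_hot]


soft = smooth_labels([0, 1, 0, 0], epsilon=0.1)
print([round(p, 3) for p in soft])  # [0.025, 0.925, 0.025, 0.025]
```

The smoothed label still sums to 1, so it can be used as a target distribution in the usual cross-entropy loss.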

3. Inception-v2/v3
0. Inception v1 (GoogLeNet)
1. Inception v2
  • 42-layer network
  • About 2.5x the computation of GoogLeNet, but still more efficient than VGG
  • Conv operations inside each Inception module use zero-padding.
  • Zero-padding is not applied elsewhere.
  • Uses 3 types of Inception modules
    1. Uses 3x3 conv filters => less computation
    2. Factorizes n x n convs into 1xn and nx1 convolutions => less computation
    3. Removes the auxiliary classifier from the early part of Inception v1
    4. Uses stride 2 => smaller image size, less computation
2. Inception v3
  • Inception-v2 with BN-auxiliary + RMSProp + Label Smoothing + Factorized 7x7 all applied
4. Outro
  • Substantially improves on the best error rates of the time, cutting the error nearly in half relative to the 2014 ILSVRC GoogLeNet ensemble.
  • Also achieves high performance at lower resolutions, such as 79x79.
  • The network is designed to keep high performance at low cost.

 

 
