
EfficientNet

by 제룽 2023. 7. 7.

1. Intro
  • Earlier work mostly scaled only one of depth, width, or input size (resolution) ⇒ this paper grew out of the question: is there a way to scale up a ConvNet that yields better accuracy or efficiency?
  • Keeping depth, width, and size well balanced is what matters
  • The finding: it is enough to scale each of the three by a constant ratio
2. Model Scaling
  • To raise ConvNet accuracy one can search for a well-designed model, but increasing the complexity of an existing model is also very common
  • depth scaling: increase the number of layers, e.g. ResNet
  • width scaling: increase the number of channels (filters), e.g. MobileNet, ShuffleNet
  • resolution scaling: increase the resolution of the input image
  • For each of the three scaling methods, the other two are held fixed and only that one is increased while the change in accuracy is measured ⇒ for width and depth the accuracy curve flattens out relatively early (this is what "saturation" means here: further scaling brings smaller and smaller gains)
  • ⇒ for resolution, accuracy keeps improving comparatively well as it is increased
  • Even at the same FLOPS, accuracy differs by up to 1.5%
  • Comparing the green and yellow curves in the paper's figure, increasing resolution worked better than increasing depth, and the red curve shows that increasing all three dimensions together worked better still
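To make the three scaling knobs concrete, the sketch below estimates the compute of a toy stack of identical convolution layers and shows how it reacts when only depth, only width, or only resolution is doubled. The function name `conv_stack_flops` and the toy configuration (10 layers, 64 channels, 224x224 input, 3x3 kernels) are assumptions made for illustration, not values taken from the paper.

```python
def conv_stack_flops(depth, width, resolution, kernel=3):
    """Rough multiply-add count for `depth` identical k x k conv layers.

    Assumes every layer maps `width` input channels to `width` output
    channels on a `resolution` x `resolution` feature map (stride 1,
    same padding). Real networks are less uniform than this.
    """
    per_layer = kernel * kernel * width * width * resolution * resolution
    return depth * per_layer


# Hypothetical baseline and three single-dimension scalings of it.
base = conv_stack_flops(depth=10, width=64, resolution=224)
deeper = conv_stack_flops(depth=20, width=64, resolution=224)
wider = conv_stack_flops(depth=10, width=128, resolution=224)
sharper = conv_stack_flops(depth=10, width=64, resolution=448)

print(deeper / base)   # 2.0: cost is linear in depth
print(wider / base)    # 4.0: cost is quadratic in width
print(sharper / base)  # 4.0: cost is quadratic in resolution
```

Under this simplified cost model, doubling width or resolution is twice as expensive as doubling depth, which is why comparisons between scaling strategies are made at equal FLOPS rather than at equal scaling factor.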
3. Compound Scaling
  • Tuning the three scaling dimensions matters, but the baseline model itself also has to be good
  • The paper searches for the baseline with an AutoML procedure almost identical to the one used for MnasNet ⇒ the model found this way is EfficientNet-B0
  • Compound scaling (all three dimensions) is then applied to it
  • depth, width, and resolution are controlled by alpha, beta, and gamma respectively (d = alpha^phi, w = beta^phi, r = gamma^phi), and the ratios must satisfy the constraint alpha x beta^2 x gamma^2 ≈ 2 with alpha, beta, gamma >= 1
  • Why the squares appear: ⇒ doubling depth doubles FLOPS proportionally, but width and resolution each enter the cost as a product of two factors (input and output channels for width, image height and width for resolution), so FLOPS grows with their square
  • alpha x beta^2 x gamma^2 ≈ 2 ⇒ the product should be close to 2, so that total FLOPS grows by roughly 2^phi
  • EfficientNet proposes finding alpha, beta, and gamma with a simple grid search: in the first step phi is fixed to 1 and the alpha, beta, gamma values that perform well on the target dataset are selected
  • The paper uses alpha = 1.2, beta = 1.1, gamma = 1.15; these three scaling factors are then kept fixed and phi is increased to grow the model
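The compound scaling rule above can be sketched in a few lines of Python. The constants 1.2, 1.1, and 1.15 are the values quoted in this post; the function names `compound_scale` and `candidate_grid`, and the grid settings (`step`, `upper`, `tol`), are hypothetical choices for illustration. The second function only enumerates coefficient triples that satisfy the constraint; actually choosing among them would require training and evaluating a model per candidate, which is not shown here.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # depth, width, resolution bases quoted above


def compound_scale(phi, alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Return (depth, width, resolution, FLOPS) multipliers for a given phi."""
    depth_mult = alpha ** phi
    width_mult = beta ** phi
    res_mult = gamma ** phi
    # FLOPS is linear in depth and quadratic in width and resolution.
    flops_mult = depth_mult * width_mult ** 2 * res_mult ** 2
    return depth_mult, width_mult, res_mult, flops_mult


def candidate_grid(step=0.05, upper=1.5, target=2.0, tol=0.1):
    """List (alpha, beta, gamma) triples >= 1 with alpha*beta^2*gamma^2 near target.

    The grid spacing, upper bound, and tolerance are illustrative assumptions.
    """
    n = int(round((upper - 1.0) / step)) + 1
    values = [round(1.0 + i * step, 2) for i in range(n)]
    return [
        (a, b, g)
        for a in values
        for b in values
        for g in values
        if abs(a * b ** 2 * g ** 2 - target) <= tol
    ]


if __name__ == "__main__":
    for phi in range(4):
        d, w, r, f = compound_scale(phi)
        print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, "
              f"resolution x{r:.2f}, FLOPS x{f:.2f}")
    print(len(candidate_grid()), "candidate triples satisfy the constraint")
```

With the quoted coefficients the product alpha x beta^2 x gamma^2 comes out slightly below 2 (about 1.92), so each increment of phi a little less than doubles the estimated FLOPS.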
4. Experiments
  • The compound scaling method outperformed scaling methods that change a single dimension
  • It achieved better performance with fewer parameters and fewer FLOPS
5. Outro
  1. Depth(d)
  • The deeper a ConvNet is, the richer and more complex the features it learns, and the better it generalizes to other tasks
  • However, the vanishing gradient problem makes deep networks hard to train
  • Remedies such as skip connections and batch normalization exist, but they stop helping for very deep networks
  • e.g. ResNet-1000 and ResNet-101 reach similar accuracy
  2. Width(w)
  • Mostly used for small-size models
  • A wider network can capture fine-grained features and is easier to train
  • However, a very wide and shallow network has trouble capturing higher-level features
  3. Resolution(r)
  • With high-resolution images it is easier to capture fine-grained features
  • Likewise, the accuracy gain slows down when the resolution becomes too high

 

※ Conclusion

  • Shows that the balance among width, depth, and resolution is an important factor when scaling up a model
  • Proposes a method that is more accurate and more efficient while making it easy to scale up a model
  • Shows that it also works well in transfer learning with fewer parameters and FLOPS