
Noisy Student: Self-training with Noisy Student improves ImageNet classification (2019)

by 제룽 2023. 7. 14.

 

The review is further down!

Translated version
0. Abstract

We propose Noisy Student Training, a semi-supervised learning method that works well even when labeled data is abundant. Noisy Student Training achieves 88.4% top-1 accuracy on ImageNet, 2.0% higher than the state-of-the-art model that requires 3.5 billion weakly labeled Instagram images. On robustness test sets, it raises ImageNet-A top-1 accuracy from 61.0% to 83.7%, lowers ImageNet-C mean corruption error from 45.7 to 28.3, and lowers ImageNet-P mean flip rate from 27.8 to 12.2. Noisy Student Training extends the ideas of self-training and distillation by using a student model that is equal to or larger than the teacher and by adding noise to the student during training. On ImageNet, we first train an EfficientNet model on labeled images and use it as a teacher to generate pseudo labels for 300 million unlabeled images. We then train a larger EfficientNet as the student on the combination of labeled and pseudo-labeled images. We repeat this process by putting the student back as the teacher. While training the student, we inject noise such as dropout, stochastic depth, and data augmentation via RandAugment so that the student generalizes better than the teacher.

※ Semi-supervised learning: training first (with supervision) on a small dataset that has ground-truth labels, then training a second time on a large dataset that has no labels.

※ Knowledge distillation: a technique that transfers the knowledge of a large network to a small network during training, with the goal of raising the small network's performance so that it performs about as well as the large one.

1. Introduction

In recent years, deep learning has achieved remarkable results in image recognition [45, 80, 75, 30, 83]. However, state-of-the-art vision models are still mostly trained with supervised learning, which requires a large collection of labeled images to work well. If we train models only on labeled images, we limit our ability to exploit the much larger supply of unlabeled images to improve the accuracy and robustness of state-of-the-art models.

Here we use unlabeled images to improve state-of-the-art ImageNet accuracy, and we show that this accuracy gain has a large impact on robustness (out-of-distribution generalization). To do so, we use a much larger dataset of unlabeled images, a large fraction of which do not belong to the ImageNet training distribution (that is, they do not belong to any ImageNet category). We train the model with Noisy Student Training, a semi-supervised learning approach with three main steps:

(1) Train a teacher model on labeled images.

(2) Use the teacher model to generate pseudo labels for unlabeled images.

(3) Train a student model on the combination of labeled images and pseudo-labeled images.

We repeat this algorithm a few times, treating the student as the teacher to relabel the unlabeled data and train a new student.

Noisy Student Training improves on self-training and distillation in two ways.

  • First, the student is made larger than, or at least the same size as, the teacher so that it can learn better from a larger dataset.
  • Second, noise is added to the student so that the noised student has to learn harder from the pseudo labels. To noise the student, we use input noise such as RandAugment data augmentation [18] and model noise such as dropout [76] and stochastic depth [37]. As a result, the student learns more robustly from more varied data and can learn stronger features from the pseudo labels.

Using Noisy Student Training together with 300 million unlabeled images, we improve EfficientNet's [83] ImageNet top-1 accuracy to 88.4%. This is 2.0% better than the previous state-of-the-art result, which requires about 3.5 billion weakly labeled Instagram images. Our method substantially improves not only standard ImageNet accuracy but also classification robustness on much harder test sets. For example, ImageNet-A top-1 accuracy improves from 61.0% to 83.7%, ImageNet-C mean corruption error drops from 45.7 to 28.3, and ImageNet-P mean flip rate drops from 27.8 to 12.2. The main results are shown in Table 1.

 

2. Noisy Student Training

Algorithm 1 gives an overview of Noisy Student Training. The inputs to the algorithm are labeled images and unlabeled images. We train a teacher model on the labeled images with the standard cross-entropy loss. We then use the teacher to generate pseudo labels for the unlabeled images. The pseudo labels can be soft (a continuous distribution) or hard (a one-hot distribution). Next, we train a student model that minimizes the combined cross-entropy loss on the labeled and unlabeled images. Finally, we put the student back as the teacher to generate new pseudo labels and train a new student. The algorithm is also illustrated in Figure 1.
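
The loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the paper's implementation: a 1-D nearest-centroid classifier stands in for EfficientNet, Gaussian jitter on the inputs stands in for RandAugment, dropout, and stochastic depth, and hard pseudo labels are used for brevity. The names `train_centroids`, `predict`, and `noisy_student` are made up for this sketch.

```python
import random


def train_centroids(points, labels, noise_std=0.0, rng=None):
    """Fit one mean per class; optional Gaussian jitter stands in for student noise."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        if noise_std > 0.0:
            x = x + rng.gauss(0.0, noise_std)  # input noise, student only
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def predict(centroids, x):
    """Hard pseudo label: the class whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(centroids[y] - x))


def noisy_student(labeled_x, labeled_y, unlabeled_x,
                  iterations=3, noise_std=0.1, seed=0):
    rng = random.Random(seed)
    # Step 1: train the teacher on labeled data only (no noise in this toy).
    teacher = train_centroids(labeled_x, labeled_y)
    for _ in range(iterations):
        # Step 2: the un-noised teacher labels the clean unlabeled data.
        pseudo_y = [predict(teacher, x) for x in unlabeled_x]
        # Step 3: train a noised student on labeled + pseudo-labeled data.
        student = train_centroids(labeled_x + unlabeled_x,
                                  labeled_y + pseudo_y,
                                  noise_std=noise_std, rng=rng)
        # Step 4: the student becomes the next teacher.
        teacher = student
    return teacher
```

For example, `noisy_student([0.0, 1.0], [0, 1], [-0.2, 0.1, 0.9, 1.2])` returns a dict of class centroids that can be passed to `predict`.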

This algorithm is an improved version of self-training, a semi-supervised learning method ([71, 96]), and of distillation [33]. Section 5 discusses in more detail how our method relates to prior work.

Our key improvements are adding noise to the student and using a student model that is no smaller than the teacher. This is what distinguishes our method from knowledge distillation [33], in which

1) noise is often not used, and

2) a student smaller than the teacher is often used so that it runs faster.

Our method can be thought of as knowledge expansion: we want the student to outperform the teacher by giving it enough capacity and a difficult learning environment in the form of noise.

 

Noising Student

When the student is deliberately noised, it is trained to stay consistent with the teacher, which is not noised when it generates pseudo labels. Our experiments use two types of noise: input noise and model noise. For input noise we use data augmentation with RandAugment [18]. For model noise we use dropout [76] and stochastic depth [37].

When applied to unlabeled data, noise has the important benefit of enforcing invariances in the decision function on both labeled and unlabeled data.

First, data augmentation is an important noising method in Noisy Student Training because it forces the student to make consistent predictions across augmented versions of an image (similar to UDA [91]). Specifically, in our method the teacher reads in clean images to produce high-quality pseudo labels, whereas the student must reproduce those labels from augmented images. For example, the student must ensure that a translated (spatially shifted) version of an image has the same category as the original image.
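
A minimal sketch of how one student training example could be assembled under this scheme: the pseudo label comes from the clean image, while the student's input is the augmented image. A simple horizontal shift stands in for RandAugment here, and `translate_right`, `make_student_example`, and the teacher function passed in are hypothetical names used only for illustration.

```python
def translate_right(image, dx):
    """Shift each row of a 2-D list right by dx pixels, zero-padding on the left."""
    width = len(image[0])
    dx = max(0, min(dx, width))
    return [[0] * dx + row[:width - dx] for row in image]


def make_student_example(image, teacher_predict, augment):
    """Return (student_input, target) for one unlabeled image."""
    target = teacher_predict(image)  # pseudo label from the clean image
    return augment(image), target    # the student sees the augmented image
```

The point of the pairing is that the target does not change when the input is shifted, so the student is pushed toward predictions that are invariant to the augmentation.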

Second, when dropout and stochastic depth are used as noise, the teacher behaves like an ensemble at inference time (when it generates pseudo labels), whereas the student behaves like a single model. In other words, the student is forced to mimic a more powerful ensemble model. Experimental results on the effect of noise are presented in Section 4.1.
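
The ensemble intuition can be illustrated with a small dropout sketch. It assumes the common "inverted dropout" formulation (survivors are rescaled by 1 / keep probability), which the source does not spell out; the `dropout` helper and the feature values are made up for the example.

```python
import random


def dropout(values, rate, training, rng):
    """Inverted dropout: in training, zero each unit with probability `rate`
    and rescale the survivors; at inference, return the values unchanged."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)
features = [1.0, 2.0, 3.0]

# Teacher path (generating pseudo labels): noise disabled, deterministic output.
teacher_view = dropout(features, rate=0.5, training=False, rng=rng)

# Student path: every forward pass sees a different random sub-network.
student_views = [dropout(features, rate=0.5, training=True, rng=rng)
                 for _ in range(10_000)]

# Averaging many noised passes approaches the un-noised output, which is the
# sense in which the un-noised teacher acts like an ensemble.
mean_first = sum(view[0] for view in student_views) / len(student_views)
```

Each individual student pass sees either 0.0 or 2.0 for the first feature, while the teacher always sees 1.0, the average over all those sub-networks.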

 

Other Techniques

Noisy Student Training works even better with an additional trick, data filtering and balancing, similar to [91, 93]. Specifically, we filter out images on which the teacher model has low confidence, since these are usually out-of-domain images. To make the distribution of the unlabeled images match that of the training set, we also need to balance the number of unlabeled images per class, because all ImageNet classes have a similar number of labeled images. For classes that do not have enough images, we duplicate images. For classes that have too many images, we keep the images with the highest confidence.

Finally, we emphasize that our method can be used with either soft or hard pseudo labels, both of which work well in our experiments. Soft pseudo labels, in particular, work slightly better on out-of-domain unlabeled data. For consistency, we therefore report results with soft pseudo labels below unless stated otherwise.

 

Comparisons with Existing SSL Methods

Besides self-training, another important line of work in semi-supervised learning is based on consistency training [5, 64, 47, 84, 56, 91, 8] and pseudo labeling [48, 39, 73, 1]. These methods have produced promising results, but in our preliminary experiments, methods based on consistency regularization and pseudo labeling worked less well on ImageNet. Instead of generating pseudo labels with a teacher model trained on labeled data, these methods have no separate teacher and use the model being trained to generate the pseudo labels. Early in training, the model being trained has low accuracy and high entropy, so consistency training regularizes the model toward high-entropy predictions and prevents it from reaching good accuracy. Common workarounds are to use entropy minimization, to filter out low-confidence examples, or to ramp up the consistency loss gradually. However, the extra hyperparameters introduced by the ramp-up schedule, confidence-based filtering, and entropy minimization make these methods harder to use at scale. By contrast, the self-training / teacher-student framework is better suited to ImageNet because a good teacher can be trained on ImageNet with labeled data.

3. Experiments

In this section we first describe the experiment details. We then present our ImageNet results in comparison with state-of-the-art models. Finally, we show the surprising improvements of our models on robustness datasets such as ImageNet-A, C, and P, as well as under adversarial attack.

3.1. Experiment Details

Labeled dataset. We run our experiments on the ImageNet 2012 ILSVRC challenge prediction task. This dataset is considered one of the most heavily benchmarked datasets in computer vision, and improvements on ImageNet are known to transfer to other datasets [44, 66].

Unlabeled dataset. We obtain unlabeled images from the JFT dataset [33, 15], which contains about 300 million images. Although the images in the dataset have labels, we ignore the labels and treat the images as unlabeled data. We filter the ImageNet validation set images out of the dataset (see [58]).

We perform data filtering and balancing on this dataset.

  1. First, we run an EfficientNet-B0 trained on ImageNet [83] over the JFT dataset [33, 15] to predict a label for each image.
  2. We then select the images whose label confidence is higher than 0.3. For each class, we keep at most 130K images with the highest confidence.
  3. Finally, for classes with fewer than 130K images, we duplicate images at random so that every class has 130K images. The total number of images used to train the student model is therefore 130M (some of them duplicates), of which 81M are unique. Our method is highly robust to these hyperparameters, so we do not tune them extensively.
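
The three steps above can be sketched as follows. This is an illustrative reading, not the authors' code: `filter_and_balance` and its input format (an iterable of `(image_id, class_id, confidence)` tuples produced by the teacher) are assumptions, and classes with no image above the threshold are simply absent from the result.

```python
import random
from collections import defaultdict


def filter_and_balance(predictions, threshold=0.3, per_class=130_000, seed=0):
    """Keep confident images, cap each class, and pad short classes by duplication."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for image_id, class_id, confidence in predictions:
        if confidence > threshold:            # step 2a: drop low-confidence images
            by_class[class_id].append((confidence, image_id))

    balanced = {}
    for class_id, items in by_class.items():
        items.sort(reverse=True)              # highest confidence first
        originals = [image_id for _, image_id in items[:per_class]]  # step 2b: cap
        # Step 3: duplicate random images until the class has `per_class` entries.
        padding = rng.choices(originals, k=per_class - len(originals))
        balanced[class_id] = originals + padding
    return balanced
```

With 1,000 ImageNet classes and `per_class=130_000`, a fully populated result would hold 130M entries, which matches the total quoted in the text.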

To enable a fair comparison with our results, we also run experiments on the public YFCC100M dataset [85] and report the results in Appendix A.4.

Architecture. We use EfficientNets [83] as our baseline models because they provide better capacity for more data. In our experiments we scale EfficientNet-B7 up further to obtain EfficientNet-L2. EfficientNet-L2 is wider and deeper than EfficientNet-B7 but uses a lower resolution, which gives it more parameters to fit a large number of unlabeled images. Because of the large model size, training EfficientNet-L2 takes about five times as long as training EfficientNet-B7. For more details on EfficientNet-L2, see Table 8 in Appendix A.1.

Training details. For labeled images we use a batch size of 2048 by default and reduce it when the model does not fit in memory. We found that batch sizes of 512, 1024, and 2048 lead to the same performance. We determine the number of training steps and the learning rate schedule from the labeled batch size. Specifically, we train student models larger than EfficientNet-B4, including EfficientNet-L2, for 350 epochs and train smaller student models for 700 epochs. The learning rate starts at 0.128 for a labeled batch size of 2048 and decays by a factor of 0.97 every 2.4 epochs when training for 350 epochs, or every 4.8 epochs when training for 700 epochs.
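
As a worked example, the schedule above can be written as a small function. The sketch assumes a staircase decay (the rate changes only at multiples of the decay interval) and covers only the labeled batch size of 2048; the text does not say how the starting rate is adjusted for other batch sizes, so that is left out. The name `learning_rate` is made up here.

```python
import math


def learning_rate(epoch, decay_every=2.4, base_lr=0.128, decay=0.97):
    """Exponential staircase decay: multiply by `decay` every `decay_every` epochs.

    Use decay_every=2.4 for the 350-epoch setting and 4.8 for the 700-epoch one.
    """
    return base_lr * decay ** math.floor(epoch / decay_every)
```

Both settings apply roughly the same number of decay steps over the full run (350 / 2.4 and 700 / 4.8 are both about 146), so the longer schedule is a stretched copy of the shorter one.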

We use a large batch size for unlabeled images, especially for large models, to make full use of the large quantity of unlabeled images. Labeled and unlabeled images are concatenated to compute the average cross-entropy loss. For EfficientNet-L2 we apply the recently proposed technique for fixing the train-test resolution discrepancy [86]. We first train normally at a smaller resolution for 350 epochs. We then fine-tune the model at a larger resolution for 1.5 epochs on unaugmented labeled images. As in [86], we freeze the shallow layers during fine-tuning.
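
A minimal sketch of the concatenated loss, assuming each example is given as a pair of probability vectors (target, prediction). A one-hot target plays the role of a ground-truth or hard pseudo label and a non-one-hot target plays the role of a soft pseudo label; `cross_entropy` and `combined_loss` are illustrative names.

```python
import math


def cross_entropy(target, probs, eps=1e-12):
    """Cross entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, probs))


def combined_loss(labeled, unlabeled):
    """Each argument is a list of (target_distribution, predicted_distribution)."""
    batch = labeled + unlabeled  # concatenate, then average over all examples
    return sum(cross_entropy(t, p) for t, p in batch) / len(batch)
```

Because the average runs over the concatenated batch, an unlabeled batch that is 14 times the labeled batch contributes 14 of every 15 terms in the loss.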

Our largest model, EfficientNet-L2, needs 6 days of training on a Cloud TPU v3 Pod with 2048 cores when the unlabeled batch size is 14 times the labeled batch size.

Noise. We noise the student with stochastic depth [37], dropout [76], and RandAugment [18]. The hyperparameters of these noise functions are the same for EfficientNet-B7 and L2. In particular, for stochastic depth we set the survival probability of the final layer to 0.8 and follow the linear decay rule for the other layers. We apply dropout to the final layer with a dropout rate of 0.5. For RandAugment we apply two random operations with the magnitude set to 27.
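
The linear decay rule for stochastic depth can be sketched as below. The formula p_l = 1 - (l / L)(1 - p_L) with layers indexed 1..L is the rule as I understand it from the stochastic depth paper; actual implementations may index blocks differently, so treat the exact per-layer values as an assumption. `survival_probabilities` is a made-up helper name.

```python
def survival_probabilities(num_layers, final_survival=0.8):
    """Survival probability per layer, decaying linearly from near 1.0 at the
    first layer to `final_survival` at the last layer."""
    return [1.0 - (layer / num_layers) * (1.0 - final_survival)
            for layer in range(1, num_layers + 1)]
```

For a hypothetical 4-layer network this gives approximately 0.95, 0.90, 0.85, and 0.80, so early layers are almost always kept and later layers are dropped more often.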

Iterative training. The best model in our experiments is the result of three iterations of putting the student back as the new teacher. We first trained an EfficientNet-B7 on ImageNet as the teacher model. Using the B7 model as the teacher, we then trained an EfficientNet-L2 model with the unlabeled batch size set to 14 times the labeled batch size. Next, we trained a new EfficientNet-L2 model with that EfficientNet-L2 model as the teacher. Finally, we iterated once more with the unlabeled batch size set to 28 times the labeled batch size. Detailed results for the three iterations are given in Section 4.2.
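
The three iterations can be written down as a small schedule table. This is only a restatement of the text as data: the ratio of 14 for the second iteration is inferred from Section 4.2 (same hyperparameters as the first iteration), the labeled batch size of 2048 is the stated default and may have been reduced for EfficientNet-L2, and `SCHEDULE`, `unlabeled_batch_size`, and `describe` are hypothetical names.

```python
# (teacher, student, unlabeled:labeled batch ratio) for each iteration.
SCHEDULE = [
    ("EfficientNet-B7", "EfficientNet-L2", 14),
    ("EfficientNet-L2", "EfficientNet-L2", 14),
    ("EfficientNet-L2", "EfficientNet-L2", 28),
]


def unlabeled_batch_size(labeled_batch_size, ratio):
    return labeled_batch_size * ratio


def describe(schedule, labeled_batch_size=2048):
    """Return one human-readable line per iteration."""
    lines = []
    for step, (teacher, student, ratio) in enumerate(schedule, start=1):
        size = unlabeled_batch_size(labeled_batch_size, ratio)
        lines.append(
            f"iteration {step}: {teacher} -> {student}, unlabeled batch {size}"
        )
    return lines
```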

3.2. ImageNet Results

We first report validation set accuracy on the ImageNet 2012 ILSVRC challenge prediction task, as is commonly done in the literature [45, 80, 30, 83] (see also [66]). As Table 2 shows, EfficientNet-L2 with Noisy Student Training achieves 88.4% top-1 accuracy, a large improvement over the best reported EfficientNet accuracy of 85.0%. The total gain of 3.4% comes from two sources: making the model larger (+0.5%) and applying Noisy Student Training (+2.9%). In other words, Noisy Student Training has a much larger effect on accuracy than changing the architecture.

Noisy Student Training also surpasses the best accuracy of FixRes ResNeXt-101 WSL [55, 86], which is 86.4%. FixRes ResNeXt-101 WSL requires 3.5 billion Instagram images labeled with tags, whereas our method needs only 300 million unlabeled images, which are comparatively easy to collect. In addition, our model has roughly half as many parameters as FixRes ResNeXt-101 WSL.

Model size study: Noisy Student Training for EfficientNet B0-B7 without Iterative Training. We run additional experiments to check whether Noisy Student Training also benefits other EfficientNet models. The previous experiment used iterative training to optimize the accuracy of EfficientNet-L2, but iterative training is hard to use across many experiments, so we skip it here. We vary the model size from EfficientNet-B0 to EfficientNet-B7 [83] and use the same model as both teacher and student. We apply RandAugment to all EfficientNet baselines, which makes the baselines more competitive. We set the unlabeled batch size to three times the labeled batch size for every model size except EfficientNet-B0, for which the unlabeled batch size equals the labeled batch size. As Figure 2 shows, Noisy Student Training yields a consistent improvement of about 0.8% for all model sizes. Overall, EfficientNets with Noisy Student Training offer a much better trade-off between model size and accuracy than prior work. The results also confirm that vision models can benefit from Noisy Student Training even without iterative training.

3.3. Robustness Results on ImageNet-A, ImageNet-C and ImageNet-P

We evaluate our best model, which achieves 88.4% top-1 accuracy, on three robustness test sets: ImageNet-A, ImageNet-C, and ImageNet-P. The ImageNet-C and P test sets [31] contain images with common corruptions and perturbations such as blurring, fogging, rotation, and scaling. The ImageNet-A test set [32] consists of difficult images that cause large drops in accuracy for state-of-the-art models. These test sets are considered "robustness" benchmarks because the test images are either much harder (ImageNet-A) or different from the training images (ImageNet-C and P).

For ImageNet-C and ImageNet-P, we evaluate our models on the two released versions of the test sets, at resolutions 224x224 and 299x299, and resize the images to the resolution EfficientNet was trained on. As Tables 3, 4, and 5 show, Noisy Student Training brings substantial gains on the robustness datasets compared with the previous state-of-the-art model, ResNeXt-101 WSL [55, 59], which was trained on 3.5 billion weakly labeled images. On ImageNet-A, it improves top-1 accuracy from 61.0% to 83.7%. On ImageNet-C, it reduces mean corruption error (mCE) from 45.7 to 28.3. On ImageNet-P, the mean flip rate (mFR) is 14.2 at resolution 224x224 (the direct comparison) and 12.2 at resolution 299x299. These large robustness gains on ImageNet-C and ImageNet-P are surprising because our method was not explicitly optimized for robustness.

Qualitative Analysis. To build intuition for the large improvements on the robustness benchmarks, Figure 3 shows several images for which the standard model's prediction is wrong while the prediction of the model trained with Noisy Student Training is correct.

Figure 3a shows example images from ImageNet-A together with our model's predictions. The model trained with Noisy Student Training successfully predicts the correct labels for these very difficult images. For example, without Noisy Student Training the model predicts bullfrog for the image on the left of the second row, possibly because of the black lotus leaf on the water, whereas the model with Noisy Student Training correctly predicts dragonfly. In the top-left image, the model without Noisy Student Training ignores the sea lions and mistakes a buoy for a lighthouse, while the model with Noisy Student Training recognizes the sea lions.

Figure 3b shows images from ImageNet-C and the corresponding predictions. As the figure shows, the model with Noisy Student Training makes correct predictions on images with severe corruptions and perturbations such as snow, motion blur, and fog, while the model without Noisy Student Training suffers badly under these conditions. The most interesting image is on the right of the first row: the swing in the picture is barely recognizable, yet the model with Noisy Student Training still predicts it correctly.

Figure 3c shows images from ImageNet-P and the corresponding predictions. The model with Noisy Student Training makes correct and consistent predictions as the images undergo different perturbations, while the model without Noisy Student Training flips its predictions frequently.

3.4. Adversarial Robustness Results

Having tested the model's robustness to common corruptions and perturbations, we also study its performance under adversarial perturbations. We evaluate EfficientNet-L2 models trained with and without Noisy Student Training against an FGSM attack. This attack takes a single gradient step on the input image, with the update on each pixel set to ε. As Figure 4 shows, Noisy Student Training leads to very large improvements in accuracy even though it does not explicitly optimize for adversarial robustness. Under the stronger PGD attack with 10 iterations (ε = 16), Noisy Student Training improves EfficientNet-L2's accuracy from 1.1% to 4.4%.
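
To make the FGSM step concrete, here is a toy sketch against a logistic model rather than EfficientNet. It assumes labels in {-1, +1} and the loss log(1 + exp(-y · w·x)), for which the input gradient has a closed form; `fgsm_linear` and the example numbers are made up, and no claim is made about the pixel scale used for ε in the paper.

```python
import math


def _sign(value):
    """Return -1, 0, or +1 according to the sign of value."""
    return (value > 0) - (value < 0)


def fgsm_linear(x, w, y, epsilon):
    """One FGSM step against p(y|x) = sigmoid(y * w.x), with y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    # For loss = log(1 + exp(-margin)): d(loss)/dx_i = -y * w_i * sigmoid(-margin).
    scale = 1.0 / (1.0 + math.exp(margin))
    grad = [-y * wi * scale for wi in w]
    # Move every coordinate by exactly epsilon in the direction that raises the loss.
    return [xi + epsilon * _sign(g) for xi, g in zip(x, grad)]
```

With `x = [1.0, -2.0]`, `w = [0.5, -1.0]`, `y = 1`, and `epsilon = 0.1`, each coordinate moves by 0.1 against the weight's sign, which lowers the margin y · w·x and therefore raises the loss.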

Note that these adversarial robustness results are not directly comparable to prior work, because we use a large input resolution of 800x800 and adversarial vulnerability can scale with the input dimension [22, 25, 24, 74].

4. Ablation Study

In this section we study the importance of noise and of iterative training, and we summarize ablation results for the other components of our method. Running these experiments and analyzing the results gives insight into the main factors that determine how well the method works.

4.1. The Importance of Noise in Self-training

Because we use soft pseudo labels generated by the teacher model, if the student is trained to be exactly the same as the teacher, the cross-entropy loss on unlabeled data would be zero and the training signal would vanish. A natural question, then, is why the student can outperform the teacher with soft pseudo labels. As stated earlier, we hypothesize that the student must be noised so that it does not merely learn the teacher's knowledge. We investigate the importance of noise in two scenarios with different amounts of unlabeled data and different teacher accuracies. In both cases, we gradually remove augmentation, stochastic depth, and dropout for unlabeled images when training the student, while keeping them for labeled images. This lets us separate the effect of noise on unlabeled images from the effect of preventing overfitting on labeled images. In addition, we compare a noised teacher with an un-noised teacher to study whether noise needs to be disabled when generating pseudo labels.
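
The vanishing training signal can be made precise with a short derivation. The symbols are introduced here for illustration and do not come from the source: q is the teacher's soft pseudo label, z the student's logits, and p the student's softmax output.

```latex
\mathcal{L}(z) = -\sum_{k} q_k \log p_k,
\qquad
p_k = \frac{e^{z_k}}{\sum_{j} e^{z_j}},
\qquad
\frac{\partial \mathcal{L}}{\partial z_k}
  = -\sum_{i} q_i \left(\delta_{ik} - p_k\right)
  = p_k - q_k .
```

When the student reproduces the teacher exactly (p = q), every component of the gradient is zero, so nothing pushes the student beyond the teacher. Strictly speaking, the loss value at that point equals the entropy of q, which is zero only for one-hot labels, so the claim is perhaps best read as a statement about the gradient.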

As Table 6 shows, noise such as stochastic depth, dropout, and data augmentation plays an important role in enabling the student to outperform the teacher. Performance drops consistently when the noise functions are removed. Even so, with 130M unlabeled images and the noise functions removed, performance still improves from 84.0% for the supervised baseline to 84.3%. We hypothesize that this improvement can be attributed to SGD (stochastic gradient descent), which introduces stochasticity into the training process.

One might argue that the improvements from noise result from preventing overfitting to the pseudo labels on the unlabeled images. We verified that this is not the case with 130M unlabeled images: judging by the training loss, the model does not overfit the unlabeled set. Removing noise leads to a much lower training loss on labeled images, but on unlabeled images removing noise produces a smaller drop in training loss. This is probably because a large unlabeled dataset is harder to overfit.

Lastly, adding noise to the teacher model that generates the pseudo labels lowers accuracy, which shows the importance of having a powerful, un-noised teacher model.

4.2. A Study of Iterative Training

This section describes the detailed effects of iterative training. As mentioned in Section 3.1, we first train an EfficientNet-B7 model on labeled data and then use it as the teacher to train an EfficientNet-L2 student. We then repeat the process, using the new student model as the teacher.

As Table 7 shows, model performance improves to 87.6% in the first iteration and to 88.1% in the second iteration (with the same hyperparameters but a better-performing teacher). These results indicate that iterative training is effective at producing increasingly better models. In the last iteration, we use a larger ratio between the unlabeled and labeled batch sizes to raise the final performance to 88.4%.

4.3. Additional Ablation Study Summarization

We also studied the importance of various design choices in Noisy Student Training, in the hope of offering readers a practical guide. To that end, we conducted eight ablation studies in Appendix A.2. The findings are summarized below:

  • Finding #1: Using a large teacher model with better performance leads to better results.
  • Finding #2: A large amount of unlabeled data is necessary for better performance.
  • Finding #3: In some cases, soft pseudo labels work better than hard pseudo labels on out-of-domain data.
  • Finding #4: A large student model is important for enabling the student to learn a more powerful model.
  • Finding #5: Data balancing is useful for small models.
  • Finding #6: Joint training on labeled and unlabeled data outperforms the pipeline that first pretrains on unlabeled data and then fine-tunes on labeled data.
  • Finding #7: Using a large ratio between the unlabeled and labeled batch sizes lets the model train longer on unlabeled data and reach higher accuracy.
  • Finding #8: Training the student from scratch is sometimes better than initializing the student with the teacher, and a student initialized with the teacher still needs a large number of training epochs to perform well.

5. Related Works

Self-training

์ €ํฌ ์ž‘์—…์€ self-training์„ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•ฉ๋‹ˆ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, [71, 96, 68, 67]์™€ ๊ฐ™์€ ์ด์ „์˜ ์ž‘์—…๋“ค์—์„œ ์‚ฌ์šฉ๋˜์—ˆ์Šต๋‹ˆ๋‹ค. Self-training์€ ๋จผ์ € ๋ ˆ์ด๋ธ”๋œ ๋ฐ์ดํ„ฐ๋กœ ์ข‹์€ ์„ ์ƒ๋‹˜ ๋ชจ๋ธ์„ ํ›ˆ๋ จ์‹œํ‚จ ๋‹ค์Œ, ์„ ์ƒ๋‹˜ ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ ˆ์ด๋ธ”์ด ์—†๋Š” ๋ฐ์ดํ„ฐ์— ๋ ˆ์ด๋ธ”์„ ์ง€์ •ํ•˜๊ณ , ๋งˆ์ง€๋ง‰์œผ๋กœ ๋ ˆ์ด๋ธ”๋œ ๋ฐ์ดํ„ฐ์™€ ๋ ˆ์ด๋ธ”์ด ์ง€์ •๋œ ๋ฐ์ดํ„ฐ๋ฅผ ํ•จ๊ป˜ ์‚ฌ์šฉํ•˜์—ฌ ํ•™์ƒ ๋ชจ๋ธ์„ ํ›ˆ๋ จ์‹œํ‚ต๋‹ˆ๋‹ค. ํ•˜์ง€๋งŒ ๊ธฐ์กด์˜ self-training ๋ฐฉ์‹์—์„œ๋Š” ์ผ๋ฐ˜์ ์œผ๋กœ ํ•™์ƒ ๋ชจ๋ธ์— ๋Œ€ํ•œ ๋…ธ์ด์ฆˆ ์ฃผ์ž…์€ ๊ธฐ๋ณธ์ ์œผ๋กœ ์‚ฌ์šฉ๋˜์ง€ ์•Š๊ฑฐ๋‚˜, ๋…ธ์ด์ฆˆ์˜ ์—ญํ• ์ด ์™„์ „ํžˆ ์ดํ•ด๋˜๊ฑฐ๋‚˜ ์ •๋‹นํ™”๋˜์ง€ ์•Š๋Š” ๊ฒฝ์šฐ๊ฐ€ ๋งŽ์Šต๋‹ˆ๋‹ค. ์šฐ๋ฆฌ์˜ ์ž‘์—…๊ณผ ๊ธฐ์กด ์ž‘์—… ๊ฐ„์˜ ์ฃผ์š”ํ•œ ์ฐจ์ด์ ์€ ๋…ธ์ด์ฆˆ์˜ ์ค‘์š”์„ฑ์„ ์ธ์‹ํ•˜๊ณ , ํ•™์ƒ ๋ชจ๋ธ์„ ๋”์šฑ ํ–ฅ์ƒ์‹œํ‚ค๊ธฐ ์œ„ํ•ด ์ ๊ทน์ ์œผ๋กœ ๋…ธ์ด์ฆˆ๋ฅผ ์ฃผ์ž…ํ•œ๋‹ค๋Š” ์ ์ž…๋‹ˆ๋‹ค.

Self-training was previously used to improve the top-1 accuracy of ResNet-50 from 76.4% to 81.2% [93], which is still far from state-of-the-art accuracy. Yalniz et al. [93] also did not show significant robustness improvements on ImageNet-A, C and P as we did. In terms of methodology, they proposed first training only on unlabeled images and then fine-tuning the model on labeled images as the final stage. Noisy Student Training instead combines these two steps into one, which simplifies the algorithm and gave better performance in our experiments.

Data Distillation [63] ensembles predictions for an image under different transformations to build a stronger teacher, which is the opposite of our approach of weakening the student. Parthasarathi et al. [61] found a small, fast speech recognition model for deployment through knowledge distillation on unlabeled data. Because no noise is applied to the student and the student is also small, it is hard for the student to outperform the teacher. The domain adaptation framework of [69] is related but highly optimized for video, e.g. it predicts which frames of a video to use. The method of [101] ensembles the predictions of multiple teacher models, which is more expensive than our method.

Co-training [9] splits the features into two disjoint partitions and uses labeled data to train two models, one on each feature set. Their source of "noise" is the feature partitioning itself, which ensures the two models do not always agree on unlabeled data. Our method injects noise into the student model so that the teacher and the student can make different predictions, and it is better suited to ImageNet than partitioning features.

Self-training and co-training have also been shown to work well on a variety of other tasks, including learning from noisy data, semantic segmentation, and text classification. In machine translation, back-translation and self-training have led to meaningful improvements.

Semi-supervised Learning

Besides self-training, another important line of work in semi-supervised learning is based on consistency training [12, 103]. These methods constrain the model's predictions to be invariant to noise injected into the input, the hidden states, or the model parameters. Consistency regularization is known to work less well on ImageNet because it uses the model that is still being trained to generate the pseudo labels: in the early phase of training it regularizes the model toward high-entropy predictions, which holds back accuracy.

Work based on pseudo labels [48, 39, 73, 1] is similar to self-training but suffers from the same problem as consistency training, because it relies on a model that is still being trained, rather than a converged high-accuracy model, to generate the pseudo labels. Other semi-supervised learning frameworks include graph-based methods [102, 89, 94, 42], methods that use latent variables as target variables [41, 53, 95], and methods based on low-density separation [26, 70, 19]. These methods may offer benefits complementary to ours.

Knowledge Distillation

Our work is also related to Knowledge Distillation methods [10, 3, 33, 21, 6] through its use of soft targets. The main use of knowledge distillation is model compression, achieved by making the student model smaller. The main difference between our method and knowledge distillation is that knowledge distillation does not consider unlabeled data and does not aim to improve the student model.
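
To make "soft targets" concrete, here is a minimal sketch that compares a soft pseudo label (the teacher's full softmax output) with a hard one (a one-hot vector at the teacher's top class), and computes the student's cross-entropy against each. The logits and the three class names are made-up values for illustration only; none of this comes from the paper's code.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def to_hard_label(probs):
    """One-hot vector at the most likely class."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return [1.0 if i == best else 0.0 for i in range(len(probs))]

def cross_entropy(target, predicted, eps=1e-12):
    """Cross-entropy between a target distribution and a prediction."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))

# Hypothetical logits for three classes: (lion, cat, car).
teacher_probs = softmax([2.0, 1.5, -1.0])   # soft pseudo label
hard_label = to_hard_label(teacher_probs)   # hard pseudo label
student_probs = softmax([1.0, 1.2, -0.5])

soft_loss = cross_entropy(teacher_probs, student_probs)
hard_loss = cross_entropy(hard_label, student_probs)
```

The soft label keeps the teacher's "lion, but a bit cat-like" information, while the hard label throws it away and keeps only "lion".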

Robustness

Earlier studies (e.g. [82, 31, 66, 27]) have shown that vision models lack robustness. Addressing this lack of robustness has recently become an important research direction in machine learning and computer vision. Our study shows that using unlabeled data improves both accuracy and general robustness. This result is consistent with the argument that unlabeled data can improve adversarial robustness [11, 77, 57, 97]. The main difference between our work and those studies is that they directly optimize adversarial robustness on unlabeled data, whereas we show that Noisy Student Training greatly improves robustness even without optimizing for it directly.

 

6. Conclusion

Earlier work on weakly supervised learning needed billions of weakly labeled examples to improve state-of-the-art ImageNet models. In this work we showed that unlabeled images can be used to significantly improve both the accuracy and the robustness of state-of-the-art ImageNet models. To make use of unlabeled data at scale, we used self-training, a simple and effective algorithm. We improved it by adding noise to the student so that the student learns beyond the teacher's knowledge, hence the name Noisy Student Training.

In our experiments, Noisy Student Training with EfficientNet reached 88.4% accuracy, which is 2.9% higher than without Noisy Student Training. This result is also a new state of the art, 2.0% better than the previous best method, which used roughly 10 times more weakly labeled data [55, 86].

An important contribution of our work is showing that Noisy Student Training improves the robustness of computer vision models. Our experiments showed that our model significantly improves performance on ImageNet-A, C and P.

 

💡
<Review>

1. Introduction

  • Deep learning has shown remarkable results in image classification
  • However, these models are still mostly trained with supervised learning and require labeled images
  • This limits the opportunity to take advantage of unlabeled data
  • ➡️ Conclusion: the paper aims to improve ImageNet accuracy and generalization by using unlabeled data

 

  • It builds on the existing self-training framework, and the paper calls its technique Noisy Student Training
  • ※ Self-training framework (semi-supervised learning)
    • A learning approach that sits between supervised and unsupervised learning
    • Uses both labeled and unlabeled data for training
    • Adding a small amount of labeled data to a large amount of unlabeled data is said to tend to improve accuracy

    Reference: https://gooopy.tistory.com/122

  • Train the teacher model on labeled images
  • Use the teacher model to generate pseudo labels for unlabeled images
  • Train the student model on the labeled images and the pseudo-labeled images
Paper summary: the teacher makes students that are smarter than itself (equal-or-larger student model), working under noise, study both hard material the teacher itself is not sure about (pseudo-labeled images) and reliable material (labeled images). The students put their heads together (an ensemble-like effect), surpass the teacher, then become the teacher and teach new students the same way.

2. Noisy Student Training

2-1) Method

  1. Labeled images and unlabeled images are required.
  2. Train the teacher model on the labeled images.
    • The loss function minimizes the cross-entropy loss on the labeled images.
    • The paper uses EfficientNet.
  3. Use the teacher model to generate pseudo labels for the unlabeled images (no noise added).
    • Pseudo labels can be soft or hard.
    • Soft label: a continuous distribution, usable as knowledge such as "this picture is most likely a lion, but it also looks a bit like a cat."
    • It is the output after softmax.
    • Hard label: e.g. a one-hot vector.
  4. Train a noised student model on the pseudo-labeled images and the labeled images.
    • The student model is equal to or larger than the teacher model.
    • ※ Key improvement, and the difference from Knowledge Distillation
      • In Knowledge Distillation, noise was not often used
      • A student smaller than the teacher was often used, to get faster speed

    ➡️ Conclusion: the paper calls this Knowledge Expansion. By giving the student more capacity and training it in a harder environment (noise), the student can end up outperforming the teacher.
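
The four steps above can be sketched as a loop. This is only a toy under loud assumptions: the "model" is a 1-D nearest-centroid classifier rather than EfficientNet, the only noise is Gaussian jitter on the student's inputs (a stand-in for RandAugment, with no dropout or stochastic depth), and the student has the same size as the teacher. All function names are hypothetical.

```python
import math
import random

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fit_centroids(xs, soft_labels, num_classes):
    """Toy 'training': one label-weighted mean per class.
    Assumes every class has at least one labeled example."""
    centroids = []
    for c in range(num_classes):
        weight = sum(label[c] for label in soft_labels)
        weighted_sum = sum(x * label[c] for x, label in zip(xs, soft_labels))
        centroids.append(weighted_sum / weight)
    return centroids

def predict(centroids, x):
    """Soft prediction: softmax over negative squared distances."""
    return softmax([-(x - c) ** 2 for c in centroids])

def noisy_student_training(labeled, unlabeled, num_classes,
                           iterations=3, noise_std=0.3, seed=0):
    rng = random.Random(seed)
    xs = [x for x, _ in labeled]
    one_hot = [[1.0 if c == y else 0.0 for c in range(num_classes)]
               for _, y in labeled]
    # Step 2: train the teacher on labeled data only.
    teacher = fit_centroids(xs, one_hot, num_classes)
    for _ in range(iterations):
        # Step 3: the teacher labels clean (un-noised) unlabeled inputs.
        pseudo = [predict(teacher, x) for x in unlabeled]
        # Step 4: the student sees noised inputs for labeled + pseudo-labeled data.
        noisy_x = [x + rng.gauss(0.0, noise_std) for x in xs + list(unlabeled)]
        student = fit_centroids(noisy_x, one_hot + pseudo, num_classes)
        # Iterate: the student becomes the next teacher.
        teacher = student
    return teacher

labeled = [(-2.0, 0), (-1.5, 0), (1.5, 1), (2.0, 1)]
unlabeled = [-2.5, -1.8, -1.0, 1.0, 1.7, 2.4]
model = noisy_student_training(labeled, unlabeled, num_classes=2)
```

The point to notice is the asymmetry: pseudo labels are produced from clean inputs, while the student is fit on noised ones.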

     


 

2-2) Noising Student

 
Noise methods: 3 in total - input noise: RandAugment - model noise: dropout, stochastic depth
  • input noise
    • RandAugment (data augmentation): automated data augmentation
    • The teacher reads clean images and produces high-quality pseudo labels, while the student has to reproduce those labels from augmented (noisy) images. The paper says this is done because it makes the student learn better (it forces the student to learn the pseudo-labeled data the hard way)
    • Through RandAugment, the student learns that a transformed image has the same label as the original image.
    • As a result it can predict well even on harder images.
    • Note) RandAugment
      • A data augmentation method based on probability and randomness

 

  • model noise
    • dropout: randomly excludes some neurons from training
    • stochastic depth: randomly skips layers during training (using skip connections) -> trains with a shorter network
    ➡️ This is said to give a strong ensemble effect
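
As a rough illustration of the two kinds of model noise, here is a minimal sketch on plain Python lists. It assumes inverted dropout (rescaling the kept units) and the common stochastic depth convention of scaling the block by its survival probability at inference time; `block_fn` is a hypothetical stand-in for a layer's computation, not anything from the paper.

```python
import random

def dropout(activations, drop_rate, rng, training=True):
    """Inverted dropout: zero each unit with probability drop_rate, rescale the rest."""
    if not training or drop_rate == 0.0:
        return list(activations)
    keep = 1.0 - drop_rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

def residual_block(x, block_fn, survival_prob, rng, training=True):
    """Residual block whose body is randomly skipped during training."""
    if training:
        if rng.random() < survival_prob:
            return [xi + fi for xi, fi in zip(x, block_fn(x))]
        return list(x)  # body dropped: only the skip connection remains
    # At inference every block is kept, scaled by its survival probability.
    return [xi + survival_prob * fi for xi, fi in zip(x, block_fn(x))]

def double(values):
    """Hypothetical block body used only for this example."""
    return [2.0 * v for v in values]

rng = random.Random(0)
x = [1.0, 2.0, 3.0]
dropped = dropout(x, 0.5, rng)
train_out = residual_block(x, double, 0.8, rng)
eval_out = residual_block(x, double, 0.8, rng, training=False)
```

Each training step effectively samples a different sub-network, which is where the ensemble-like intuition comes from.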

 

2-3) Other Techniques

Additional techniques: about 2.5 of them? - data filtering - balancing - pseudo labels
  • Data filtering
    • Images that the teacher model predicts with low confidence are mostly out-of-domain images, so they are filtered out
  • Balancing
    • In ImageNet every class has a similar number of labeled images, so the number of unlabeled images per class has to be balanced as well.
    • For classes with few images, the images are duplicated; for classes with very many images, only the highest-confidence images are used.
  • pseudo labels
    • Between soft and hard labels, soft pseudo labels are reported to perform better on out-of-domain unlabeled data, so the experiments use soft pseudo labels
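
A minimal sketch of filtering and balancing, assuming the teacher's predictions are available as `(image_id, probs)` pairs. The function name and data layout are hypothetical; the default threshold and per-class count are the values this post lists in the experiment details.

```python
import random
from collections import defaultdict

def filter_and_balance(predictions, threshold=0.3, per_class=130_000, seed=0):
    """predictions: list of (image_id, probs) with probs from the teacher.

    Returns {class_index: [image_id, ...]} holding exactly per_class ids
    for every class that has at least one confident image.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for image_id, probs in predictions:
        confidence = max(probs)
        if confidence > threshold:  # data filtering
            by_class[probs.index(confidence)].append((confidence, image_id))

    balanced = {}
    for cls, items in by_class.items():
        items.sort(key=lambda pair: pair[0], reverse=True)
        # Too many images: keep only the most confident ones.
        originals = [image_id for _, image_id in items[:per_class]]
        chosen = list(originals)
        # Too few images: duplicate randomly until the class is full.
        while len(chosen) < per_class:
            chosen.append(rng.choice(originals))
        balanced[cls] = chosen
    return balanced

preds = [
    ("a", [0.90, 0.05, 0.03, 0.02]),
    ("b", [0.60, 0.20, 0.10, 0.10]),
    ("c", [0.70, 0.10, 0.10, 0.10]),
    ("d", [0.10, 0.80, 0.05, 0.05]),
    ("e", [0.25, 0.25, 0.25, 0.25]),  # low confidence, filtered out
]
result = filter_and_balance(preds, threshold=0.3, per_class=2)
```

In this tiny example class 0 is trimmed to its two most confident images and class 1 is padded by duplication.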

 

3. Experiments

3.1. Experiment Details

 

※ Unlabeled data: the images in this dataset do have labels, but the labels are ignored and the images are treated as unlabeled data

  • dataset
    • labeled dataset: ImageNet 2012 ILSVRC challenge
    • unlabeled dataset: JFT 300M (with data filtering and balancing applied)
      • uses an EfficientNet-B0 trained on ImageNet
      • confidence score > 0.3
      • keep 130K images per class
      • if a class has fewer than 130K images, duplicate images at random
  • Architecture
    • EfficientNets
      • EfficientNet-B7
      • EfficientNet-L2
  • Training
    • use a large batch size
    • fixing the train-test resolution discrepancy
      • train at a small resolution for 350 epochs
      • then fine-tune at a larger resolution for 1.5 epochs on unaugmented labeled images
      • freeze the shallow layers during fine-tuning
  • Noise
    • stochastic depth: survival probability of 0.8 at the final layer, with the other layers following the linear decay rule
    • dropout: 0.5 at the final classification layer
    • RandAugment: magnitude=27
  • Iterative training
    • 3 iterations
    • 1st teacher model: EfficientNet-B7
    • 1st student model: EfficientNet-L2
    • 2nd student model: EfficientNet-L2
    • 3rd student model: EfficientNet-L2
    • a large batch size ratio (unlabeled batch size : labeled batch size) is used
      • 1st student model -> 14:1
      • 2nd student model -> 14:1
      • 3rd student model -> 28:1
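
The "linear decay rule" for stochastic depth can be written out in a few lines. This sketch assumes the usual formulation, where layer l of L gets survival probability p_l = 1 - (l / L) * (1 - p_L), and uses 0.8 for the final layer as listed above. The 4-layer network is a hypothetical example, far smaller than any EfficientNet.

```python
def survival_probabilities(num_layers, final_survival_prob=0.8):
    """Linear decay rule: p_l = 1 - (l / L) * (1 - p_L) for l = 1..L."""
    return [
        1.0 - (layer / num_layers) * (1.0 - final_survival_prob)
        for layer in range(1, num_layers + 1)
    ]

# Hypothetical 4-layer network: early layers are almost always kept,
# and the last layer is kept with probability 0.8.
probs = survival_probabilities(num_layers=4)
```

Shallow layers are dropped rarely and deep layers more often, so the noise grows with depth.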

 

3.2. ImageNet Results

  • Noisy Student (EfficientNet-L2) sets a new SOTA with 88.4% top-1 accuracy.
  • This is a 3.4% improvement over the existing EfficientNet-B7
  • On EfficientNet-L2, adding the Noisy Student method gives a 2.9% improvement
  • To check whether Noisy Student helps over plain EfficientNet, there is a graph showing performance when the student is trained just once, without iterative training
  • The teacher and the student use the same model
  • The model size is varied from EfficientNet-B0 to EfficientNet-B7
  • For fairness, RandAugment is also applied to the baseline EfficientNet
  • Performance improves by about 0.8% across all model sizes
  • So there is a measurable gain even without iterative training

 

3.3. Robustness Results on ImageNet-A, ImageNet-C and ImageNet-P

※ What "robust" means

: that a deep learning model is not sensitive to small changes in its input

 

  • ImageNet-A, ImageNet-C and ImageNet-P are used to measure robustness
  • ImageNet-A is a dataset of images that SOTA models commonly find difficult
  • ImageNet-C and ImageNet-P are datasets with common image corruptions and perturbations applied, such as blurring, fogging, rotation, and scaling

➡️ These images are either a hard task (ImageNet-A) or differ from the training data (ImageNet-C, ImageNet-P), which is why they are used to measure robustness

  • On ImageNet-P, the mFR (mean flip rate) is lowered to 14.2 or 12.2 depending on the resolution. The authors say the result is surprising, since the paper was not designed to improve robustness
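
To give a feel for what a "flip rate" measures, here is a simplified sketch: the fraction of consecutive frames in a perturbation sequence where the predicted class changes. Note the assumption: the mFR reported on the benchmark is additionally normalized against a baseline model and averaged over perturbation types, which this sketch leaves out. The predictions below are made up.

```python
def flip_rate(predictions_per_sequence):
    """Fraction of consecutive frame pairs whose predicted class differs."""
    flips = 0
    pairs = 0
    for preds in predictions_per_sequence:
        for previous, current in zip(preds, preds[1:]):
            flips += previous != current
            pairs += 1
    return flips / pairs if pairs else 0.0

# Hypothetical predicted class ids for two perturbation sequences.
sequences = [[3, 3, 3, 7, 3], [5, 5, 5, 5, 5]]
rate = flip_rate(sequences)
```

A lower value means the model's prediction stays stable as the image is gradually perturbed.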

 


 

3.4. Adversarial Robustness Results

※ Adversarial robustness: how well a model holds up against adversarial attacks. An adversarial attack is a technique for attacking a neural network that uses the gradient of the model's loss function to make tiny adjustments to the input data so that the model misbehaves

  • FGSM (Fast Gradient Sign Method), a type of adversarial attack, is applied.
  • Here too the authors are surprised that the model performs well even though it was not built with adversarial robustness in mind.
  • The graph shows that the improvement gets larger as epsilon increases.
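
FGSM itself is a one-line update: x_adv = x + epsilon * sign(gradient of the loss with respect to x). Below is a minimal sketch against a tiny logistic-regression "model", chosen only because its input gradient has a closed form, (p - y) * w. The weights, input, and epsilon are hypothetical values and have nothing to do with the paper's experiment.

```python
import math

def predict_proba(x, weights, bias):
    """Probability of class 1 under a logistic-regression model."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(x, y, weights, bias, epsilon):
    """One-step FGSM: move each input dimension by epsilon along the
    sign of the loss gradient. For binary cross-entropy with a logistic
    model, that gradient is (p - y) * w."""
    p = predict_proba(x, weights, bias)
    grad = [(p - y) * w for w in weights]
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

weights, bias = [2.0, -1.0], 0.0   # hypothetical model parameters
x, y = [0.5, 0.2], 1               # clean input with true label 1
x_adv = fgsm(x, y, weights, bias, epsilon=0.5)

clean_confidence = predict_proba(x, weights, bias)
adversarial_confidence = predict_proba(x_adv, weights, bias)
```

A larger epsilon means a stronger perturbation, which is why the accuracy gap between models tends to show up more clearly as epsilon grows.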

 

4. Ablation Study

 

4.1. The Importance of Noise in Self-training

  • The more unlabeled data, the better
  • Training the student with noise is better than without
  • It is better not to add noise when the teacher model runs inference on the unlabeled data

 

4.2. A Study of Iterative Training

※ 14:1 is the ratio of unlabeled batch size to labeled batch size, i.e. the unlabeled batch is 14 times the size of the labeled batch.  ex) if a training step uses 100 labeled images, it also uses 1,400 unlabeled images

  • The more iterations, the better the performance
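
The iteration schedule from 3.1 can be written down as plain data, which also makes the batch arithmetic explicit. In this small sketch the ratio is read as unlabeled batch size : labeled batch size, and `LABELED_BATCH = 100` is a hypothetical value picked only to keep the numbers easy.

```python
# (teacher, student, unlabeled:labeled batch ratio) for each iteration.
SCHEDULE = [
    ("EfficientNet-B7", "EfficientNet-L2", 14),
    ("EfficientNet-L2", "EfficientNet-L2", 14),
    ("EfficientNet-L2", "EfficientNet-L2", 28),
]

def batch_sizes(labeled_batch_size, ratio):
    """Images per training step as (labeled, unlabeled)."""
    return labeled_batch_size, labeled_batch_size * ratio

LABELED_BATCH = 100  # hypothetical, not the paper's setting
for teacher, student, ratio in SCHEDULE:
    labeled, unlabeled = batch_sizes(LABELED_BATCH, ratio)
    print(f"{teacher} -> {student}: {labeled} labeled + {unlabeled} unlabeled per step")
```

From the second iteration on, the teacher is simply the student produced by the previous iteration.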

 

4.3. Additional Ablation Study Summarization

  • Finding #1: Using a large teacher model with better performance leads to better results.
  • Finding #2: A large amount of unlabeled data is necessary for better performance.
  • Finding #3: In some cases, soft pseudo labels work better than hard pseudo labels on out-of-domain data.
  • Finding #4: A large student model is important for letting the student learn a more powerful model.
  • Finding #5: Data balancing is useful for small models.
  • Finding #6: Joint training on labeled and unlabeled data can outperform a pipeline that first pretrains on unlabeled data and then fine-tunes on labeled data.
  • Finding #7: Using a large ratio between the unlabeled batch size and the labeled batch size lets the model train longer on unlabeled data and reach higher accuracy.
  • Finding #8: Training the student from scratch is sometimes better than initializing it from the teacher, and a student initialized from the teacher still needs a large number of training epochs to perform well.

5. Related Works

※ What is self-training (one kind of semi-supervised learning)

: a way of using unlabeled data by treating the (confident) predictions of a model trained on labeled data as pseudo labels for the unlabeled data

🍀
So what is the difference between self-training and semi-supervised learning?
  • Self-training: having the model label the unlabeled data by itself
  • Semi-supervised learning: when there is not enough labeled data, training the model with unlabeled data as well

➡️ In the end, both were devised to deal with a shortage of labeled data

※ What is Knowledge Distillation

: transferring the knowledge of a large network (teacher network) to the small network (student network) that will actually be deployed

: the aim is to raise the small network's performance by passing the large network's knowledge to it during training, so that the small network can perform about as well as the large one

6. Conclusion

  • The paper says its important contribution is showing that Noisy Student Training improves the robustness of computer vision models…
  • The end!

 

 

 

7. Reference

https://light-tree.tistory.com/196

https://byeongjokim.github.io/posts/Self-training-with-Noisy-Student-improves-ImageNet-classification/

https://www.youtube.com/watch?v=l0jdNn5AGmo&t=900s

https://www.youtube.com/watch?v=q7PjrmGNx5A&t=593s

https://velog.io/@dust_potato/SSL-Paper-ReviewSelf-training-1.-Semi-supervised-Learningby-Entropy-Minimization-NIPS-2004

 
