๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ
๋Œ€์™ธํ™œ๋™/2023 LG Aimers 3๊ธฐ

Module 7. Deep Learning (Prof. 주재걸, KAIST)

by ์ œ๋ฃฝ 2023. 7. 15.

๋‚ ์งœ: 2023๋…„ 7์›” 15์ผ

Part 1. Introduction to Deep Neural Networks

1. Deep Learning

: ์‹ ๊ฒฝ์„ธํฌ๋“ค์ด ๋ง์„ ์ด๋ฃจ์–ด์„œ ์ •๋ณด๋ฅผ ๊ตํ™˜ํ•˜๊ณ  ์ฒ˜๋ฆฌํ•˜๋Š” ๊ณผ์ •์„ ๋ณธ๋”ฐ์„œ ๋งŒ๋“  ๋ฐฉ์‹์„ ์˜๋ฏธํ•จ

2. ์‹ฌ์ธต ์‹ ๊ฒฝ๋ง์˜ ๊ธฐ๋ณธ ๋™์ž‘ ๊ณผ์ •

- Big Data์˜ ํ•„์š”

- GPU Acceleration

- Algorithm Improvements

 

3. Perceptron

- ํผ์…‰ํŠธ๋ก ์€ ์ƒ๋ฌผํ•™์ ์ธ ์‹ ๊ฒฝ๊ณ„(Neual Network)์˜ ๊ธฐ๋ณธ ๋‹จ์œ„์ธ ์‹ ๊ฒฝ์„ธํฌ(=๋‰ด๋Ÿฐ)์˜ ๋™์ž‘ ๊ณผ์ •์„ ํ†ต๊ณ„ํ•™์ ์œผ๋กœ ๋ชจ๋ธ๋งํ•œ ์•Œ๊ณ ๋ฆฌ์ฆ˜

4. Forward Propagation

- ํ–‰๋ ฌ ๊ณฑ์„ ํ†ตํ•ด sigmoid function๊ณผ ๊ฐ™์€ actiavtion function์„ ์ง€๋‚˜๋ฉด ๊ฒฐ๊ณผ ๊ฐ’์ด ๋‚˜์˜ด

 

 

5. MSE

- ์—๋Ÿฌ ์ œ๊ณฑ์˜ ํ‰๊ท 

- ์ด์ƒ์น˜์— ๋ฏผ๊ฐ

 

6. Softmax function
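For reference, a minimal, numerically stable softmax sketch; subtracting the maximum before exponentiating is a standard implementation detail, not something stated in the lecture notes:

```python
import numpy as np

def softmax(z):
    # Shift by the max for numerical stability, then normalize
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])   # hypothetical class scores (logits)
print(softmax(scores))               # probabilities that sum to 1
```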

 

Part 2. Training Neural Networks

1. Gradient descent

- A neural network has parameters that we want to optimize.

- Given those parameters, we feed the training data into the network built from them, compare the outputs with the ground-truth values, and look for the parameters that minimize the difference.

- We compute the derivative (gradient) of the loss function with respect to each parameter.

- Using those gradient values, each parameter is updated by moving from its current value in the negative gradient direction, scaled by a step size, also called the learning rate (a minimal sketch of this update rule follows after this list).

- The parameter w we want to optimize is fed into the function, and the updates move it toward where the loss becomes smaller.

- The loss over two parameters can also be visualized as a contour plot.

- The zigzag phenomenon refers to cases where, for some loss functions, gradient descent does not converge smoothly and the parameter value bounces back and forth at every update.

- θ1 and θ2 are weight values of the neural network.

- When the loss is drawn as a function of θ1 it decreases gently, but as a function of θ2 it drops steeply.

- That imbalance is why the updates move in a zigzag.

- If there is little zigzagging, it means the gradients with respect to the two parameters change relatively evenly or smoothly; in that case the two curves have similar shapes, and the loss changes by similar amounts as either parameter changes.

 

 

2. ์—ญ์ „ํŒŒ

- ์ƒ๋žต

 

3. Sigmoid Activation

- Sigmoid activation

: Because the gradient values for the parameters in the earlier layers are small, those parameters are barely updated -> training slows down

 

- Tanh activation

: Outputs lie between -1 and 1

: The gradient is still bounded (at most 1, and close to 0 away from the origin) -> vanishing gradients still occur

 

- ReLU activation

: max(0, x)

: Inputs below 0 become 0; inputs above 0 pass through unchanged (a comparison sketch of all three activations follows below)

 

4. Batch Normalization

Why batch normalization is used:

 

Batch Normalization์€ ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ์—์„œ ์‚ฌ์šฉ๋˜๋Š” ๊ธฐ๋ฒ• ์ค‘ ํ•˜๋‚˜๋กœ, ๊ฐ ๋ฏธ๋‹ˆ๋ฐฐ์น˜ ๋ฐ์ดํ„ฐ์˜ ๋ถ„ํฌ๋ฅผ ์ •๊ทœํ™”(normalize)ํ•˜์—ฌ ํ•™์Šต์„ ์•ˆ์ •ํ™”์‹œํ‚ค๋Š” ์—ญํ• ์„ ํ•ฉ๋‹ˆ๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ํ•™์Šต ๊ณผ์ •์„ ๋” ๋น ๋ฅด๊ณ  ์•ˆ์ •์ ์œผ๋กœ ๋งŒ๋“ค๊ณ , ๋ชจ๋ธ์˜ ์ผ๋ฐ˜ํ™” ๋Šฅ๋ ฅ์„ ํ–ฅ์ƒ์‹œํ‚ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
Batch Normalization์„ ์‚ฌ์šฉํ•˜๋Š” ์ฃผ์š”ํ•œ ์ด์œ ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™์Šต๋‹ˆ๋‹ค:

1. ๊ทธ๋ž˜๋””์–ธํŠธ ์†Œ์‹ค ๋˜๋Š” ํญ์ฃผ ๋ฌธ์ œ ์™„ํ™”: ๋”ฅ๋Ÿฌ๋‹ ๋ชจ๋ธ์—์„œ๋Š” ์—ฌ๋Ÿฌ ์ธต์„ ๊ฑฐ์น˜๋ฉด์„œ ๊ทธ๋ž˜๋””์–ธํŠธ ๊ฐ’์ด ์ง€๋‚˜์น˜๊ฒŒ ์ž‘์•„์ง€๊ฑฐ๋‚˜ ํฌ๊ฒŒ ์ฆ๊ฐ€ํ•˜๋Š” ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. Batch Normalization์€ ๊ฐ ๋ฏธ๋‹ˆ๋ฐฐ์น˜ ๋ฐ์ดํ„ฐ์˜ ํ‰๊ท ๊ณผ ๋ถ„์‚ฐ์„ ์ •๊ทœํ™”ํ•˜์—ฌ ๊ทธ๋ž˜๋””์–ธํŠธ์˜ ํฌ๊ธฐ๋ฅผ ์กฐ์ ˆํ•˜๊ณ , ์ด๋ฅผ ํ†ตํ•ด ๊ทธ๋ž˜๋””์–ธํŠธ ์†Œ์‹ค๊ณผ ํญ์ฃผ ๋ฌธ์ œ๋ฅผ ์™„ํ™”ํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

2. ํ•™์Šต ์†๋„ ํ–ฅ์ƒ: Batch Normalization์€ ๊ฐ ๋ฏธ๋‹ˆ๋ฐฐ์น˜ ๋ฐ์ดํ„ฐ์˜ ๋ถ„ํฌ๋ฅผ ์ •๊ทœํ™”ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ํ•™์Šต ๊ณผ์ •์—์„œ ๋” ์•ˆ์ •์ ์ธ ๊ทธ๋ž˜๋””์–ธํŠธ๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค. ์ด๋กœ ์ธํ•ด ํ•™์Šต ์†๋„๊ฐ€ ํ–ฅ์ƒ๋˜๊ณ , ์ˆ˜๋ ดํ•˜๋Š”๋ฐ ๊ฑธ๋ฆฌ๋Š” ์‹œ๊ฐ„์ด ์ค„์–ด๋“ค ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

3. ์ดˆ๊ธฐํ™”์— ๋œ ๋ฏผ๊ฐ: Batch Normalization์€ ์ž…๋ ฅ ๋ฐ์ดํ„ฐ์˜ ๋ถ„ํฌ๋ฅผ ์ •๊ทœํ™”ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์ดˆ๊ธฐ ๊ฐ€์ค‘์น˜ ์„ค์ •์— ๋Œ€ํ•œ ์˜ํ–ฅ์„ ์ค„์ผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋Š” ๋ชจ๋ธ์˜ ํ•™์Šต์„ ์•ˆ์ •ํ™”์‹œํ‚ค๊ณ  ์ดˆ๊ธฐํ™”์— ๋Œ€ํ•œ ์˜์กด์„ฑ์„ ๊ฐ์†Œ์‹œํ‚ต๋‹ˆ๋‹ค.

4. ์ผ๋ฐ˜ํ™” ๋Šฅ๋ ฅ ํ–ฅ์ƒ: Batch Normalization์€ ๊ฐ ์ธต๋งˆ๋‹ค ์ž…๋ ฅ ๋ฐ์ดํ„ฐ์˜ ๋ถ„ํฌ๋ฅผ ์ •๊ทœํ™”ํ•˜๋ฏ€๋กœ, ๋ชจ๋ธ์ด ๋” ์ผ๋ฐ˜ํ™”๋œ ๊ฒฐ๊ณผ๋ฅผ ์–ป์„ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ด๋Š” ๊ณผ์ ํ•ฉ์„ ๋ฐฉ์ง€ํ•˜๊ณ , ์ƒˆ๋กœ์šด ๋ฐ์ดํ„ฐ์— ๋Œ€ํ•œ ์ผ๋ฐ˜ํ™” ๋Šฅ๋ ฅ์„ ํ–ฅ์ƒ์‹œํ‚ฌ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.

Batch Normalization์€ ์ผ๋ฐ˜์ ์œผ๋กœ ์‹ฌ์ธต ์‹ ๊ฒฝ๋ง์—์„œ ์‚ฌ์šฉ๋˜๋ฉฐ, ํ•™์Šต ๊ณผ์ •์„ ์•ˆ์ •ํ™”์‹œํ‚ค๊ณ  ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์„ ํ–ฅ์ƒ์‹œํ‚ค๋Š” ๋ฐ์— ํฐ ๋„์›€์„ ์ค๋‹ˆ๋‹ค.

 

 

- With a mini-batch of size 10, there is one pre-activation value (the input produced just before the tanh node) for every data item in the mini-batch, i.e., 10 values.

- We collect these 10 values that will be fed into tanh and compute their mean and variance.

-> Normalizing them to mean 0 and variance 1 puts the tanh inputs into a distribution roughly centered at 0.

- y = ax + b => the layer is also given the ability to restore a mean and variance that are specific to the data

: the mean of the ten values is shifted to the learned value b

: the variance becomes a² times the normalized variance

- By introducing these extra learnable parameters, the network can then decide the optimal mean and variance on its own (see the sketch below).

Part 3. Convolutional Neural Networks and Image Classification

- The field advanced rapidly after ConvNets were introduced

 

- ํŠน์ • class์— ์กด์žฌํ•  ์ˆ˜ ์žˆ๋Š” ์ž‘์€ ํŠน์ • ํŒจํ„ด๋“ค์„ ์ •์˜ํ•˜๊ณ , ํŒจํ„ด๋“ค์ด ์ฃผ์–ด์ง„ ์ด๋ฏธ์ง€ ์ƒ์— ์žˆ๋Š”์ง€๋ฅผ ํŒ๋‹จ

 

 - ๊ฐ layer์—์„œ ์ž…๋ ฅ ๋…ธ๋“œ๊ฐ€ ์ถœ๋ ฅ ๋…ธ๋“œ ๋ชจ๋‘์™€ ํŠน์ •ํ•œ ๊ฐ€์ค‘์น˜์˜ ํ˜•ํƒœ๋กœ ์—ฐ๊ฒฐ์ด ๋œ network๋ฅผ fully-connected layer ํ˜น์€ fully connected neural network๋ผ๊ณ  ๋ถ€๋ฆ„

 

- CNN: used in computer vision (images, etc.)

 

- RNN: a model suited to time-series (sequential) data

1. Basic Idea of ConvNets

- ๊ณ ์–‘์ด๊ฐ€ ์ด๋Ÿฐ ์ž์„ธ๋ฅผ ์ทจํ•˜๊ณ  ์ด๋Ÿฐ ์ƒ‰๊น”์„ ๊ฐ€์ง€๊ณ , ์ด๋Ÿฐ ๋ฐฐ๊ฒฝ์ด ์žˆ์—ˆ์„ ๋•Œ์— ๋ชจ๋“  ์š”์†Œ๋“ค์„ ์ผ์ผ์ด ๊ทœ์ •ํ•ด์„œ ๊ณ ์–‘์ด๋ผ๋Š” ํŠน์„ฑ์„ ์ •์˜ํ•˜๋ ค๊ณ  ํ•˜๊ธฐ ๋ณด๋‹ค๋Š” ๊ณ ์–‘์ด์—์„œ ์ผ๊ด€๋˜๊ฒŒ ๋‚˜ํƒ€๋‚˜๋Š” ๋ถ€๋ถ„, ๋ถ€๋ถ„ ๋ณ„๋กœ์˜ ํŠน์ง•๋“ค์„ bottom up ๋ฐฉ์‹์œผ๋กœ ์ž˜ ๊ฒ€์ถœํ•ด๋‚ด๊ณ , ์ •๋ณด๋“ค์„ ์กฐํ•ฉํ•ด์„œ ์ตœ์ข…์ ์œผ๋กœ ๊ณ ์–‘์ด๋‹ค ๋ผ๊ณ  ์ผ๊ด€๋˜๊ฒŒ ์ธ์‹ํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•˜๋Š” ๋ฐฉ์‹์œผ๋กœ Convnet์ด ๋™์ž‘ํ•จ

 

2. Conv layer

- ๋‹ค์Œ layer๋กœ ์ ์šฉ๋˜๋Š” conv filter๋Š” ์—ฌ๊ธฐ์— ์ฃผ์–ด์ง„ ์ž…๋ ฅ ์ด๋ฏธ์ง€์˜ ์ฑ„๋„ ์ˆ˜์™€ ๊ฐ™์€ ์ฑ„๋„ ์ˆ˜๋ฅผ ๊ฐ€์ ธ์•ผ ํ•จ

- ์ตœ์ข…์ ์œผ๋กœ ํ•œ ์žฅ์˜ output activation map์„ ์–ป๊ฒŒ ๋จ 

 

3. Pooling

- ์ •ํ•ด์ง„ ํฌ๊ธฐ ์•ˆ์—์„œ ๊ฐ€์žฅ ํฐ ๊ฐ’๋งŒ ๋ฝ‘์•„๋ƒ„

4. ReLU

- ์„ ํ˜•๊ฒฐํ•ฉ ํ•œ ํ›„, activation function์„ ์ ์šฉํ•ด์คŒ 

-  ์ด์ „ conv layer์—์„œ ๋‚˜์˜จ ๊ฐ’์— ์ ์šฉํ•˜๋ฉด ์Œ์ˆ˜๋Š” 0, ์–‘์ˆ˜๋Š” ์–‘์ˆ˜ ๊ทธ๋ž˜๋„ ์ ์šฉํ•จ

5. Various CNN Architectures

- AlexNet

- VGGNet

- GoogLeNet

- ResNet

 

Part 4. Seq2Seq with Attention for Natural Language Understanding and Generation

1. RNN

- Useful for sequence data

- At each time step, the RNN module takes the previous hidden state (together with the current input) and produces h_t, the module's output at that step, i.e., the current hidden state vector
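A minimal NumPy sketch of one RNN step, where the previous hidden state and the current input produce h_t; the dimensions and weight matrices below are random placeholders:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # Current hidden state from the previous hidden state + current input
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

input_dim, hidden_dim = 4, 3
rng = np.random.default_rng(0)
W_xh = rng.normal(size=(input_dim, hidden_dim))
W_hh = rng.normal(size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

h = np.zeros(hidden_dim)                      # initial hidden state
for x_t in rng.normal(size=(5, input_dim)):   # a length-5 input sequence
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)     # h_t becomes h_{t-1} for the next step
print(h)
```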

 

2. RNN์˜ ๋‹ค์–‘ํ•œ ํ˜•ํƒœ

- one to one

- one to many: image captioning(์ด๋ฏธ์ง€ ๋„ฃ์œผ๋ฉด ๋‹จ์–ด๋กœ ์ถœ๋ ฅ)

- many to one (๋ฌธ์žฅ ๋ถ„๋ฅ˜)

- many to many(๊ธฐ๊ณ„ ๋ฒˆ์—ญ)

3. LSTM

- A cell state is newly introduced (alongside the hidden state)

4. seq2seq

- Seq2Seq is short for "Sequence-to-Sequence", a deep learning model architecture that handles both an input sequence and an output sequence.

It is mainly used in natural language processing (NLP) and applied to a variety of tasks such as translation, summarization, and chatbots.

- The Seq2Seq model consists of two main components.

The first is the encoder, which compresses the input sequence into a fixed-length vector representation. This vector carries the meaning of the input sequence.

The second is the decoder, which generates the output sequence based on the encoder's vector representation. Starting from a start token, the decoder predicts the next word step by step to produce the output sequence.

- The input and output sequences of a Seq2Seq model can have different lengths, so it can carry out tasks flexibly regardless of sequence length. This lets it cope with varying sentence lengths in tasks such as translation or chatbots.

- Seq2Seq models often use recurrent neural network (RNN) architectures such as LSTM (Long Short-Term Memory) to handle long-term dependencies. LSTMs capture long-range dependencies in the input sequence and perform well in translation and other sequence-generation tasks.

- In short, Seq2Seq is a deep learning model architecture composed of an encoder that compresses the input sequence and a decoder that generates the output sequence from that compressed representation (a minimal sketch follows below).

5. Attention

- ์ƒ๋žต 

Part 5. Transformer

- Transformer

- ์ƒ๋žต

 

Part 6. Self-Supervised Learning and Large-Scale Pre-Trained Models

- BERT

- GPT

-> ์ƒ๋žต

• One-shot and few-shot learning describe how much data is used to teach the model -> used in areas such as image recognition, natural language processing, and games

 

2. One Shot

- ํ•˜๋‚˜์˜ ์˜ˆ์‹œ๋งŒ์œผ๋กœ์œผ๋กœ ์ƒˆ๋กœ์šด ํด๋ž˜์Šค๋ฅผ ์ธ์‹ํ•˜๋Š” ๊ฒƒ์„ ์˜๋ฏธํ•จ

ex) ์›์ˆญ์ด์˜ ์‚ฌ์ง„์„ ๊ฐ€๋ฅด์ณ์ฃผ๋ฉด ๋‹ค๋ฅธ ๋ชจ์–‘์˜ ์›์ˆญ์ด์‚ฌ์ง„์„ ๋ณด์—ฌ์ฃผ์–ด๋„ ์›์ˆญ์ด๋ผ๊ณ  ๋งž์ถœ์ˆ˜๊ฐ€ ์žˆ๋‹ค

"The animal was a zebra. It had black and white stripes. What is the name of this animal?"๋ผ๋Š” ๋ฌธ์žฅ์—์„œ "zebra"๋ผ๋Š” ๋‹จ์–ด๋ฅผ ๋ชจ๋ธ์ด ํ•œ ๋ฒˆ์˜ ์˜ˆ์‹œ๋กœ ์ถ”๋ก ํ•˜๋Š” ๊ฒƒ์„ ์˜๋ฏธ

 

 

3. Few Shot

- ํ•œ ํด๋ž˜์Šค ๋‹น ์ผ๋ถ€ ์ƒ˜ํ”Œ ์ด๋ฏธ์ง€๋งŒ ์‚ฌ์šฉํ•ด์„œ ์ƒˆ๋กœ์šด ํด๋ž˜์Šค๋ฅผ ์ธ์‹

- ๋ช‡ ๊ฐœ์˜ ์˜ˆ์‹œ๋ฅผ ํ†ตํ•ด ์ƒˆ๋กœ์šด ๋‹จ์–ด๋ฅผ ์˜ˆ์ธกํ•˜๋Š” ๋ฐฉ๋ฒ•

- ๋ชจ๋ธ์€ ์ด์ „์— ๋ณธ ์ ์ด ์—†๋Š” ๋‹จ์–ด์— ๋Œ€ํ•ด ํ•™์Šต๋˜์–ด ์žˆ์ง€ ์•Š์ง€๋งŒ, ์ œํ•œ๋œ ์ˆ˜์˜ ์˜ˆ์‹œ๋ฅผ ํ†ตํ•ด ํ•ด๋‹น ๋‹จ์–ด์˜ ์˜๋ฏธ๋ฅผ ํŒŒ์•…ํ•˜๊ณ  ์˜ˆ์ธกํ•  ์ˆ˜ ์žˆ์Œ

 

ex) ์—ฌ๋Ÿฌ ์ข…๋ฅ˜์˜ ๊ฐœ๋ฅผ ๋ณด์—ฌ์ฃผ๋ฉด์„œ ๊ฐ€๋ฅด์ณ์ฃผ๊ณ , ์ƒˆ๋กœ์šด ์ข…์˜ ๊ฐœ๋ฅผ ๋ณด์—ฌ์ฃผ๋ฉด ๊ฐœ๋ผ๋Š” ๊ฒƒ์„ ๋งž์ถœ ์ˆ˜ ์žˆ๋Š” ๊ฒƒ

-> ์ „์ฒด ๋ฐ์ดํ„ฐ ์…‹ ํ•™์Šตํ•˜๋Š” ๊ฒƒ๋ณด๋‹ค๋Š” ํšจ์œจ์ ์ž„.

 

ex) "The animal was a zebra. It had black and white stripes. It was similar to a horse. What is the name of this animal?"๋ผ๋Š” ๋ฌธ์žฅ์—์„œ "zebra"๋ผ๋Š” ๋‹จ์–ด๋ฅผ ๋ชจ๋ธ์ด ๋ช‡ ๊ฐœ์˜ ์˜ˆ์‹œ๋กœ ์ถ”๋ก ํ•˜๋Š” ๊ฒƒ์„ ์˜๋ฏธ

 

4. Zero Shot

- ๋ผ๋ฒจ๋ง ๋˜์ง€ ์•Š์€ ์ƒˆ๋กœ์šด ํด๋ž˜์Šค์— ๋Œ€ํ•œ ๋ถ„๋ฅ˜ ์ž‘์—…์„ ์‹œํ–‰ํ•  ๋•Œ, ์ด์ „์— ํ•™์Šต๋œ ๋ชจ๋ธ์„ ์‚ฌ์šฉํ•ด์„œ ๋ถ„๋ฅ˜ํ•˜๋Š” ๊ธฐ์ˆ 

- ๋ชจ๋ธ์ด ์ด์ „์— ๋ณธ ์ ์ด ์—†๋Š” ๋‹จ์–ด๋ฅผ ์˜ˆ์ธกํ•˜๋Š” ๋ฐฉ๋ฒ•์„ ๋งํ•จ.

- ์ด ๋ฐฉ๋ฒ•์€ ๋ชจ๋ธ์ด ํ›ˆ๋ จ ๊ณผ์ •์—์„œ ํ•ด๋‹น ๋‹จ์–ด์— ๋Œ€ํ•œ ์–ด๋– ํ•œ ์ •๋ณด๋„ ๋ฐ›์ง€ ์•Š์•˜์Œ์—๋„ ๋ถˆ๊ตฌํ•˜๊ณ , ๋ฌธ๋งฅ๊ณผ ์ง€์‹์„ ํ™œ์šฉํ•˜์—ฌ ๋‹จ์–ด๋ฅผ ์ถ”๋ก ํ•˜๋ ค๋Š” ๊ฒƒ์„ ์˜๋ฏธํ•จ

 

ex) ์•„์ด์—๊ฒŒ ์†Œ์™€ ๋ง์„ ๊ฐ€๋ฅด์ณ์ฃผ๊ณ , ์–ผ๋ฃฉ๋ง์€ ๊ฐ€๋ฅด์ณ์ค€ ๋™๋ฌผ๋“ค์˜ ํŠน์ง•์„ ํ•ฉ์ณ๋‘” ๊ฒƒ์ด๋ผ๊ณ  ์„ค๋ช…์„ ํ•ด์ค€๋‹ค. ๊ทธ๋Ÿฌ๋ฉด ์•„์ด๋Š” ์–ผ๋ฃฉ๋ง์„ ๋ณธ์ ์€ ์—†์ง€๋งŒ ์–ผ๋ฃฉ๋ง์ด๋ผ๊ณ  ๋งž์ถœ ์ˆ˜๊ฐ€ ์žˆ๋Š” ๊ฒƒ์ด๋‹ค

 

In the prompt "The animal was a zebra. It had black and white stripes. It was similar to a horse. What is another animal that has spots?", the model infers the word "spots" without having been given any example

ex) ์ฒ˜์Œ ๋ณด๋Š” ๋‹จ์–ด๊ฐ€ ๋‚˜์™”์„ ๋•Œ, ์ด ๋‹จ์–ด์˜ ๋ฒˆ์—ญ ver์„ ์•Œ๋ ค์คŒ

 

https://ds-jungsoo.tistory.com/20

https://velog.io/@nomaday/n-shot-learning
