๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ
๋Œ€์™ธํ™œ๋™/2023 LG Aimers 3๊ธฐ

Module 5. ์ง€๋„ํ•™์Šต (๋ถ„๋ฅ˜/ํšŒ๊ท€) (์ดํ™”์—ฌ์ž๋Œ€ํ•™๊ต ๊ฐ•์ œ์› ๊ต์ˆ˜)

by ์ œ๋ฃฝ 2023. 7. 8.

๋‚ ์งœ: 2023๋…„ 7์›” 8์ผ

Part 1. SL Foundation

1. Supervised Learning

- Refers to learning where label values are available

- Consists of separate training and test phases

- Designing features requires some degree of domain knowledge

- In deep learning, the features themselves can also be learned

- In SL, the training, validation, and test errors are monitored in an effort to minimize the generalization error

- loss function = cost function

2. Bias-variance trade-off

- bias์™€ variance์˜ trade off๋ฅผ ์ž˜ ์กฐ์ •ํ•ด์„œ ์ตœ์ ์˜ generalization error๋ฅผ ๋งŒ๋“œ๋Š” ๊ฒƒ์ด ์ค‘์š”ํ•จ

 

- ๋”ฅ๋Ÿฌ๋‹๊ณผ ๊ฐ™์€ model์€ ๊ณ ์ฐจ์›์˜ data๋ฅผ ์‚ฌ์šฉํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๋ณต์žก๋„๊ฐ€ ์ฆ๊ฐ€ํ•จ

- ๋ฐ์ดํ„ฐ ์ˆ˜์— ๋น„ํ•ด ๋ณต์žก๋„๊ฐ€ ์ฆ๊ฐ€ => ์˜ค๋ฒ„ํ”ผํŒ… ๋ฐœ์ƒ์ด ์ฆ๊ฐ€ => ์ฐจ์›์˜ ์ €์ฃผ๋ผ๊ณ  ์นญํ•จ

=> data augmentation, regularization, ensemble์„ ํ†ตํ•ด ํ•ด๊ฒฐ ๊ฐ€๋Šฅ

 

3. k-fold cross validation

- k๊ฐœ์˜ fold๋กœ ๋‚˜๋ˆ , 1๊ฐœ์˜ ๊ทธ๋ฃน์€ val๋กœ, ๋‚˜๋จธ์ง€๋Š” train์œผ๋กœ ์‚ฌ์šฉ

Part 2. Linear Regression

1. Linear model

- ์ฃผ์–ด์ง„ ์ž…๋ ฅ์— ๋Œ€ํ•ด ์ถœ๋ ฅ๊ณผ์˜ ์„ ํ˜•์ ์ธ ๊ด€๊ณ„๋ฅผ ์ถ”๋ก ํ•˜๋Š” ๋ชจ๋ธ

- ์„ ํ˜•ํ•ฉ์œผ๋กœ ๊ตฌ์„ฑ

- ์„ ํ˜• model์ด์ง€๋งŒ ๋ฐ˜๋“œ์‹œ ์ž…๋ ฅ ๋ณ€์ˆ˜์— ์„ ํ˜•์ผ ํ•„์š”๋Š” ์—†์Œ 

 

2. Optimization

- ์ตœ์ ์˜ ํŒŒ๋ผ๋ฏธํ„ฐ ๊ตฌํ•˜๊ธฐ

Part 3. Gradient Descent

- The convergence behavior changes with the learning rate α

- If α is too large, it is hard to reach the minimum (the updates overshoot or diverge)

1. Batch gradient descent

- m์„ ๊ณ ๋ ค

- data๊ฐ€ ์ปค์งˆ์ˆ˜๋ก ๋ณต์žก๋„๊ฐ€ ์ฆ๊ฐ€

2. SGD

- Easily affected by noise

- Takes m down to the extreme of 1, i.e., updates on a single sample

- Each individual sample has to be processed one at a time (see the batch-vs-SGD sketch below)

โ€ป  Local Optimum

3. Momentum

- ๊ด€์„ฑ์˜ ๋ฒ•์น™

- ์ค‘๊ฐ„์— 0์— ๋จธ๋ฌผ๋Ÿฌ๋„ ์ „์— ์žˆ๋˜ ์Šต์„ฑ์„ ํ™œ์šฉํ•ด ๊ณ„์† ์ง„ํ–‰ํ•˜๊ฒŒ ๋” ํ•ด์ฃผ๋Š” ๊ฒƒ

4. Nesterov momentum

- Check the gradient first (at the lookahead point), then perform the update

- Compute the lookahead gradient at the point reached by the momentum step, and take the actual step as the sum of the two vectors
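A minimal sketch of the Nesterov lookahead update on a toy objective f(x) = x²; the function name and hyperparameters are illustrative assumptions.

```python
import numpy as np

def nesterov_step(theta, velocity, grad_fn, alpha=0.1, beta=0.9):
    """Evaluate the gradient at the lookahead point (after the momentum step),
    then combine the momentum step and the lookahead gradient step."""
    lookahead = theta + beta * velocity               # momentum step first
    velocity = beta * velocity - alpha * grad_fn(lookahead)
    return theta + velocity, velocity

grad_fn = lambda x: 2 * x        # gradient of f(x) = x^2
theta, v = np.array([3.0]), np.array([0.0])
for _ in range(5):
    theta, v = nesterov_step(theta, v, grad_fn)
print(theta)
```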

5. AdaGrad

- ๊ฐ ๋ฐฉํ–ฅ์œผ๋กœ์˜ learning rate๋ฅผ ์ ์‘์ ์œผ๋กœ ์กฐ์ ˆํ•ด ํ•™์Šต ํšจ์œจ์„ ๋†’์ž„

- learning rate๊ฐ€ ์ž‘์•„์ง€๋ฉด์„œ ํ•™์Šต์ด ์•ˆ๋  ์ˆ˜ ์žˆ์Œ

 

6. RMSProp

- AdaGrad๋ฅผ ๋ณด์™„ํ•œ ๋ฐฉ์‹

 

7. Adam

- Combines RMSProp and Momentum (see the combined sketch of the adaptive methods below)
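A minimal sketch of the three adaptive update rules above (AdaGrad, RMSProp, Adam) on a toy objective; the helper names and hyperparameters are illustrative assumptions, not the lecture's code.

```python
import numpy as np

eps = 1e-8

def adagrad_step(theta, G, grad, alpha=0.1):
    # Accumulated squared gradients only grow, so the effective
    # learning rate keeps shrinking (learning can eventually stall)
    G = G + grad**2
    return theta - alpha * grad / (np.sqrt(G) + eps), G

def rmsprop_step(theta, G, grad, alpha=0.1, rho=0.9):
    # Exponential moving average instead of a full sum, so the
    # learning rate no longer decays to zero (compensates for AdaGrad)
    G = rho * G + (1 - rho) * grad**2
    return theta - alpha * grad / (np.sqrt(G) + eps), G

def adam_step(theta, m, v, grad, t, alpha=0.1, b1=0.9, b2=0.999):
    # Momentum-style first moment + RMSProp-style second moment,
    # with bias correction for the first few steps
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad**2
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    return theta - alpha * m_hat / (np.sqrt(v_hat) + eps), m, v

grad = lambda x: 2 * x           # toy objective f(x) = x^2
theta_a, G_a = np.array([3.0]), np.array([0.0])
theta_r, G_r = np.array([3.0]), np.array([0.0])
theta_m, m1, m2 = np.array([3.0]), np.array([0.0]), np.array([0.0])
for t in range(1, 101):
    theta_a, G_a = adagrad_step(theta_a, G_a, grad(theta_a))
    theta_r, G_r = rmsprop_step(theta_r, G_r, grad(theta_r))
    theta_m, m1, m2 = adam_step(theta_m, m1, m2, grad(theta_m), t)
print(theta_a, theta_r, theta_m)
```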

 

8. ๊ณผ์ ํ•ฉ

9. Regularization
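The notes leave overfitting and regularization as headings only; as one common illustration (an assumption on my part, not from the lecture), here is a minimal L2/ridge regularization sketch showing how the penalty shrinks the weights of an overly flexible polynomial fit.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=15)
y = np.sin(3 * x) + rng.normal(scale=0.2, size=15)

# High-degree polynomial features: prone to overfitting on only 15 points
Phi = np.vander(x, N=10, increasing=True)

lam = 1e-2  # regularization strength
# Ridge (L2-regularized) solution: (Phi^T Phi + lam I)^-1 Phi^T y
theta_ridge = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ y)
theta_plain = np.linalg.lstsq(Phi, y, rcond=None)[0]
print("unregularized norm:", np.linalg.norm(theta_plain))
print("ridge norm        :", np.linalg.norm(theta_ridge))  # typically much smaller weights
```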

 

Part 4. Linear Classification

 

1. Zero-One Loss 

- ๋‚ด๋ถ€์˜ logic์„ ํŒ๋ณ„ํ•ด์„œ ๋งž์œผ๋ฉด 0 ํ‹€๋ฆฌ๋ฉด 1 ์ถœ๋ ฅํ•˜๋Š” ํ•จ์ˆ˜

- ๋ฏธ๋ถ„ํ•œ ๊ฒฐ๊ณผ, gradient๊ฐ€ 0์ด ๋˜์–ด๋ฒ„๋ฆผ => ํ•™์Šต์ด ๋ถˆ๊ฐ€๋Šฅํ•จ

 

2. Hinge Loss

- ์œ„๋ฅผ ๋ณด์™„ํ•œ loss

 

3. Cross-entropy Loss

- ํ™•๋ฅ  ๊ฐ’์„ ์„œ๋กœ ๋น„๊ต

- score์€ ์‹ค์ˆ˜๊ฐ’์ด๊ธฐ์— sigmoid ํ•จ์ˆ˜์™€ ๊ฐ™์€ ํ™•๋ฅ ํ•จ์ˆ˜๋กœ mapping

-> logistic model์ด๋ผ๊ณ  ํ•จ
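A minimal sketch of the logistic model idea: a real-valued score is mapped to a probability with the sigmoid and compared to the label with the cross-entropy loss (toy numbers, illustrative function names).

```python
import numpy as np

def sigmoid(score):
    # Maps a real-valued score to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-score))

def cross_entropy(p, y):
    # Compares the predicted probability p with the true label y in {0, 1}
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

score = 1.5                    # real-valued score from a linear model
p = sigmoid(score)             # mapped to a probability (logistic model)
print(p, cross_entropy(p, y=1), cross_entropy(p, y=0))
```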

 

4. Multiclass Classification

 

Part 5. Advanced Classification

1. SVM

2. Optimization

- Hard margin SVM

- Nonlinear transform & kernel trick

3. Kernel functions

- When the data samples are not linearly separable, the idea is to lift them into a higher-dimensional space so that they become linearly separable (see the sketch after the kernel list below)

- Types of kernels:

- polynomial kernel

- Gaussian radial basis function

- Hyperbolic tangent kernel 
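A minimal scikit-learn sketch of the kernel trick: on concentric circles, which are not linearly separable, an RBF (Gaussian) kernel SVM separates the classes while a linear SVM cannot. The dataset and hyperparameters are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_circles

# Concentric circles: not linearly separable in the original 2-D space
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# The RBF (Gaussian) kernel implicitly maps the data to a higher-dimensional
# space where a linear separator exists (the kernel trick)
clf_rbf = SVC(kernel="rbf", gamma=2.0).fit(X, y)
clf_lin = SVC(kernel="linear").fit(X, y)

print("linear kernel accuracy:", clf_lin.score(X, y))   # poor on the circles
print("RBF kernel accuracy   :", clf_rbf.score(X, y))   # close to 1.0
```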

4. ANN

- ReLU๋ฅผ ๋งŽ์ด ์‚ฌ์šฉํ•จ

- ANN์„ ๋งŽ์ด ์Œ“์œผ๋ฉด DNN

- linear activation function์„ ์Œ“์œผ๋ฉด ๋ณด๋‹ค ๋ณต์žกํ•œ ํ˜•ํƒœ์˜ data๋ฅผ ๋ถ„๋ฅ˜ํ•  ์ˆ˜ ์žˆ์Œ

- XOR

- MLP(multilayer perceptron)
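A minimal sketch of the XOR point: a single linear model cannot fit XOR, but an MLP with one hidden layer of non-linear (ReLU) units can. The hyperparameters are illustrative; on a problem this small the solver usually, though not always, recovers the exact mapping.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# XOR is not linearly separable, so a single linear layer cannot solve it
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# One hidden layer with a non-linear (ReLU) activation is enough
mlp = MLPClassifier(hidden_layer_sizes=(8,), activation="relu",
                    solver="lbfgs", max_iter=5000, random_state=0).fit(X, y)
print(mlp.predict(X))  # typically prints [0 1 1 0]
```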

 

Part 6. Ensemble

1. Performance Evaluation in supervised learning

- Accuracy

- Precision

- Recall

- F1

์ด ์กด์žฌ

2. ROC Curve

3. Bagging

- ํ•™์Šต๊ณผ์ •์—์„œ training sample์„ ๋žœ๋คํ•˜๊ฒŒ ๋‚˜๋ˆ ์„œ ํ•™์Šต

- n๊ฐœ๋กœ ๊ตฌ๋ถ„

- low variance์˜ ์•ˆ์ •์ ์ธ ์„ฑ๋Šฅ์„ ์ œ๊ณตํ•˜๋Š”๋ฐ ์œ ์šฉํ•œ ๋ฐฉ๋ฒ•

- overfitting์˜ ๋ฌธ์ œ์—์„œ sample์„ randomํ•˜๊ฒŒ ์„ ํƒํ•˜๋Š” ๊ณผ์ •์—์„œ data augmentation ํšจ๊ณผ๋ฅผ ์ง€๋‹ ์ˆ˜ ์žˆ์Œ

- ๊ฐ„๋‹จํ•œ model์„ ์ง‘ํ•ฉ์ ์œผ๋กœ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์Œ

- bootstrapping: ๋‹ค์ˆ˜์˜ sample data set์„ ์ƒ์„ฑํ•ด์„œ ํ•™์Šตํ•˜๋Š” ๋ฐฉ์‹์„ ์˜๋ฏธํ•จ
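A minimal bagging sketch, assuming a recent scikit-learn where the base model is passed as `estimator` (older versions use `base_estimator`); the data and hyperparameters are toy choices.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# n bootstrap samples (drawn with replacement), one simple base model per sample;
# averaging their votes lowers the variance of a single deep tree
bag = BaggingClassifier(estimator=DecisionTreeClassifier(),
                        n_estimators=30, bootstrap=True, random_state=0).fit(X_tr, y_tr)
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

print("single tree:", tree.score(X_te, y_te))
print("bagging    :", bag.score(X_te, y_te))   # usually at least as good, more stable
```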

 

4. Boosting

- Weak classifier: a classifier with high bias

=> Cascading weak classifiers lets the sequential process raise performance (see the sketch below)
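A minimal boosting sketch using AdaBoost with decision stumps as the weak (high-bias) classifiers, again assuming a recent scikit-learn (`estimator` parameter); the data and hyperparameters are toy choices.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Weak classifier: a depth-1 decision stump (high bias on its own)
stump = DecisionTreeClassifier(max_depth=1)
boost = AdaBoostClassifier(estimator=stump, n_estimators=100, random_state=0).fit(X_tr, y_tr)

print("single stump:", stump.fit(X_tr, y_tr).score(X_te, y_te))
print("boosting    :", boost.score(X_te, y_te))   # the cascade usually scores much higher
```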

 
