๋Œ€์™ธํ™œ๋™/2023 LG Aimers 3๊ธฐ

Module 2. Mathematics for ML (Prof. Jinwoo Shin, KAIST)

by 제룽, July 4, 2023

Date: July 2, 2023

 

Part 1. Matrix Decomposition

1. Determinant(ํ–‰๋ ฌ์‹)

- 3x3 matrix์˜ Determinant๋ฅผ 2x2 matrix์˜ Determinant๋กœ ๋‹ค์‹œ ์ •์˜ํ•  ์ˆ˜ ์žˆ์Œ → Laplace expansion์ด๋ผ๊ณ  ์นญํ•จ

- Determinant์˜ ์„ฑ์งˆ

2. Trace

- Similar in meaning to the determinant

- Matrix์˜ ์–ด๋–ค Diagonal Entry๋ฅผ ๋‹ค ๋”ํ•œ ํ˜•ํƒœ๋ฅผ Trace๋ผ๊ณ  ํ•จ

- It decomposes over addition: tr(A + B) = tr(A) + tr(B)

3. Eigenvalue and Eigenvector

- When Ax = λx, the scalar value λ is called an eigenvalue and such a vector x an eigenvector

- Eigenvector๋“ค์ด unique ํ•˜์ง€๋Š” ์•Š๋Š”๋‹ค๋Š” ํŠน์ง•์„ ์ง€๋‹˜

1. The determinant of A equals the product of its eigenvalues

2. The trace of A equals the sum of its eigenvalues
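The two properties above can be checked numerically — a minimal sketch of my own (a symmetric matrix is used so the eigenvalues come out real):

```python
import numpy as np

# Symmetric matrix, so the eigenvalues are real and easy to compare.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])

eigvals, eigvecs = np.linalg.eig(A)

# det(A) equals the product of the eigenvalues,
# trace(A) equals their sum.
print(np.prod(eigvals), np.linalg.det(A))  # both = 11
print(np.sum(eigvals), np.trace(A))        # both = 7

# Eigenvectors are not unique: any nonzero scalar multiple also works.
v = eigvecs[:, 0]
assert np.allclose(A @ (2 * v), eigvals[0] * (2 * v))
```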

4. Cholesky Decomposition
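The slide details for this part aren't captured in the notes, so as a minimal sketch: for a symmetric positive-definite matrix A, the Cholesky decomposition factors A = L Lᵀ with L lower triangular.

```python
import numpy as np

# Cholesky applies to symmetric positive-definite matrices: A = L @ L.T,
# with L lower triangular.
A = np.array([[4.0, 2.0],
              [2.0, 3.0]])

L = np.linalg.cholesky(A)
print(L)
assert np.allclose(L @ L.T, A)
```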

5. Diagonal Matrix

- A matrix in which only the diagonal entries exist and every other entry is 0 is called a diagonal matrix

- ๋‹ค์–‘ํ•œ ์—ฐ์‚ฐ๋“ค์ด ๋งค์šฐ ์‰ฝ๊ฒŒ ๋˜๋Š” ์žฅ์ ์„ ์ง€๋‹˜

- Matrices can be brought into this simple form (diagonalization)

6. Singular Value Decomposition

- Given a matrix A, the factorization A = UΣVᵀ is called the singular value decomposition

- ํ•ญ์ƒ ์กด์žฌ๋” ์œ ์šฉํ•˜๊ฒŒ ์“ฐ์ž„


Part 2. Convex Optimization

1. Unconstrained Optimization and Gradient Algorithms

- ๋‚ด์ ํ•ด์„œ 0์ด ๋˜๋Š” ๋ฐฉํ–ฅ ์ค‘์—  ๋ฐ˜๋Œ€๋ฐฉํ–ฅ์œผ๋กœ d๋ฅผ ์„ ํƒํ•˜๋Š” ๊ฒƒ์„ Steepest Gradient Descent๋ผ๊ณ  ๋ถ€๋ฆ„ = Gradient Descent

 

2. Batch gradient

- An update computed by considering all data points is called the batch gradient

 

3. Mini-batch gradient

- Given n data points, take some particular subset and compute the gradient only on that subset for the update

 

4. Stochastic Gradient Descent(SGD)

- mini-batch gradient์˜ ์–ด๋–ค gradient๊ฐ€ ์–ด๋–ค original batch gradient๋ฅผ ์ž˜ ๊ทผ์‚ฌํ•  ์ˆ˜ ์žˆ๊ฒŒ ์ด๋Ÿฐ์‹์œผ๋กœ ๋””์ž์ธํ•ด์„œ ์—…๋ฐ์ดํŠธํ•˜๋Š” ๋ฐฉ์‹์„ stochastic gradient๋ผ๊ณ  ํ•จ

⇒ ๋ฐ์ดํ„ฐ๊ฐ€ ๋„ˆ๋ฌด ๋งŽ์„ ๋•Œ ํ™œ์šฉ

 

5. Adaptivity for Better Convergence: Momentum

- ์ด์ „์— ์—…๋ฐ์ดํŠธ ํ–ˆ๋˜ ๋ฐฉํ–ฅ์„ ์ถ”๊ฐ€์ ์œผ๋กœ ๋”ํ•ด์คŒ (๊ด€์„ฑ์˜ ๋ฒ•์น™)

6. Convex Optimization

- Set์ด ์žˆ์„ ๋•Œ, point๋ฅผ ๋‘๊ฐœ๋ฅผ ์žก๊ณ  x1๊ณผ x2๋ฅผ ๊ฐ€๋ฅด๋Š” ์„ ๋ถ„์„ ๊ทธ์Œ. ์ด ์„ ๋ถ„์ด ํ•ญ์ƒ Set ์•ˆ์— ์žˆ์„ ๋•Œ๋ฅผ convex set์ด๋ผ๊ณ  ์นญํ•จ

  ์ฒซ๋ฒˆ์งธ ๋„ํ˜•๋งŒ convex set

 

- f๋ผ๋Š” ํ•จ์ˆ˜๋ฅผ ์ตœ์†Œํ™”ํ•˜๊ณ , f์— ๋Œ€ํ•œ ์กฐ๊ฑด์„ ๋‹ค๋ฃจ๋Š” ์–ด๋–ค f๊ฐ€ ์–ด๋–ค x๋ผ๋Š” ์–ด๋–ค Set ์•ˆ์— ์†ํ•ด ์žˆ๋‹ค๊ณ  ๊ฐ€์ •์„ ํ•˜๋ฉด, f๊ฐ€ convex ํ•จ์ˆ˜์ด๊ณ , ์ด๋Ÿฐ subset์„ ์ด๋ฃจ๋Š” x๊ฐ€ convex set์ด ๋  ๋•Œ, convex optimization์ด๋ผ๊ณ  ์–˜๊ธฐํ•จ

- Examples of Convex or Concave Functions


Part 3. PCA

1. PCA

ex) ์ง‘์„ ์‚ด ๋•Œ ๊ณ ๋ คํ•ด์•ผํ•˜๋Š” 5๊ฐ€์ง€ ๊ฒฝ์šฐ๊ฐ€ ์žˆ๋‹ค๊ณ  ๊ฐ€์ •

- Rather than weighing all 5, suppose there is a method that reduces them to just 2: one factor for size and one for location. The buy-or-not decision then becomes much easier, and this reduction process is exactly what PCA does

 

2. PCA Algorithm

1. Centering: compute the mean of each dimension of the data (x1, x2, x3) and subtract it (this re-centers the data at the origin)

2. Standardization: compute the variance and normalize by the standard deviation

A linear transformation that gives each dimension mean 0 and variance 1

3. Eigenvalue/eigenvector: compute the top M eigenvectors, where M is the target reduced dimension (e.g., 5 → 2 gives M = 2)

4. Projection: project the data points onto the reduced subspace

5. Undo standardization and centering: reverse steps 1 and 2 to move the data back to the original distribution

 

1. ์›์ ์œผ๋กœ ์ด๋™ํ•˜๋Š” ๊ฒƒ์„ ๋ณผ ์ˆ˜ ์žˆ์Œ

2. ๋‚˜๋ˆ ์ฃผ๋ฉด ๋ถ„์‚ฐ์ด 1์ด ๋˜๋Š” ํšจ๊ณผ๊ฐ€ ๋จ

3. step2์˜ ๋ฐ์ดํ„ฐ๋ฅผ ๋ฐ์ดํ„ฐ๋ผ๊ณ  ๊ฐ€์ •ํ•˜๊ณ , ์ด ๋ฐ์ดํ„ฐ์˜ data covariance Matrix๋ฅผ ๊ตฌํ•˜๊ฒŒ ๋จ ⇒ Eigenvector์„ ๊ตฌํ•˜๊ฒŒ ๋˜๋ฉด ๋Œ€๊ฐ์„  ๋ฐฉํ–ฅ์˜ eigenvector์ด ๋” ํฌ๊ธฐ ๋•Œ๋ฌธ์— ๋Œ€๊ฐ์„  ๋ฐฉํ–ฅ์œผ๋กœ projection ์‹œํ‚ด

 โ€ป ์ด ํ‰๋ฉด์œผ๋กœ ๊ฐ€์žฅ ๊ฐ€๊นŒ์šด point๋ฅผ ์ฐพ์•„์„œ ๋ฐ์ดํ„ฐ๋ฅผ squeezing ํ•จ (step 4)


4. ๋งˆ์ง€๋ง‰์€ ์ฒซ๋ฒˆ์งธ ๊ณผ์ •์œผ๋กœ ๊ฑฐ๊พธ๋กœ ๋‹ค์‹œ ๊ณ„์‚ฐํ•˜๋Š” ๊ณผ์ •

⇒ The data was originally 2-dimensional, but becomes 1-dimensional data lying on a line
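The five steps above can be sketched end to end in NumPy (my own example with made-up correlated 2-D data, reducing to M = 1):

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D data, to be reduced to 1 dimension (M = 1).
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 1.0], [1.0, 1.0]])

# 1. Centering: subtract the per-dimension mean.
mean = X.mean(axis=0)
# 2. Standardization: divide by the per-dimension standard deviation.
std = X.std(axis=0)
Z = (X - mean) / std

# 3. Eigenvalue/eigenvector of the data covariance matrix;
#    keep the M eigenvectors with the largest eigenvalues.
cov = Z.T @ Z / len(Z)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
B = eigvecs[:, -1:]  # top-1 eigenvector (M = 1)

# 4. Projection: squeeze each point onto the principal subspace.
Z_proj = Z @ B @ B.T

# 5. Undo standardization and centering: map back to the original scale.
X_reduced = Z_proj * std + mean

print(X_reduced.shape)  # still (200, 2), but the points now lie on a line
```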

 

3. Idea

- ๋ถ„์‚ฐ์ด ํฐ ๋ฐฉํ–ฅ์€ ๋ฐ์ดํ„ฐ๊ฐ€ ๊ฐ€์žฅ ๋งŽ์ด ํผ์ ธ์žˆ๋Š” ๋ฐฉํ–ฅ์ด๋ฏ€๋กœ, ํ•ด๋‹น ์ถ•์œผ๋กœ ๋ฐ์ดํ„ฐ๋ฅผ ํˆฌ์˜ํ•˜๋ฉด ์ตœ๋Œ€ํ•œ ๋งŽ์€ ์ •๋ณด๋ฅผ ์œ ์ง€ํ•  ์ˆ˜ ์žˆ์Œ.

- ๋ถ„์‚ฐ์ด ์ž‘์€ ๋ฐฉํ–ฅ์€ ๋ฐ์ดํ„ฐ์˜ ๋ณ€๋™์„ฑ์ด ์ž‘๊ธฐ ๋•Œ๋ฌธ์— ํ•ด๋‹น ์ถ•์œผ๋กœ ํˆฌ์˜ํ•˜๋ฉด ๋ฐ์ดํ„ฐ์˜ ์ •๋ณด ์†์‹ค์ด ํฌ๊ฒŒ ์ผ์–ด๋‚  ์ˆ˜ ์žˆ์Œ


4. Formulas
