[Deep Learning from Scratch 2] Chapter 8 (Attention)

by 제룽 2023. 7. 8.

  • seq2seq ⇒ connects two RNNs to convert one time-series sequence into another.
  • Role of attention ⇒ lets the decoder focus on the relevant parts of the input, which makes seq2seq considerably more powerful.
8.1 The Structure of Attention
8.1.1 The Problem with seq2seq
  • Fixed-length vector: the encoder compresses the entire input sentence into one vector of fixed length.
  • ex) No matter how long the sentence gets, it is squeezed into a vector of the same length ⇒ with so much information compressed, the model will inevitably hit a limit.
  • ex) It is like forcing clothes into a wardrobe: sooner or later they spill out.
8.1.2 Improving the Encoder
  • Let the length of the encoder output vary with the length of the input sentence.
  • To do this, use the LSTM layer's hidden state vectors at every time step (i.e., for every word), not just the last one.
  • ex) If 5 words are input, the encoder outputs 5 vectors.
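The contrast between the two encoders can be sketched in NumPy. This is a minimal illustration under stated assumptions: a plain tanh RNN with random weights stands in for the book's LSTM, and `encode`, `Wx` and `Wh` are hypothetical names. The last hidden state keeps the same shape however long the input is, while `hs` (all hidden states) grows with the input.

```python
import numpy as np

rng = np.random.default_rng(0)
D, H = 4, 3                              # word-vector size and hidden size (illustrative values)
Wx = rng.standard_normal((D, H)) * 0.1   # hypothetical input-to-hidden weights
Wh = rng.standard_normal((H, H)) * 0.1   # hypothetical hidden-to-hidden weights

def encode(xs):
    """Run a simple tanh RNN (stand-in for an LSTM) over xs of shape (T, D).

    Returns all hidden states stacked into hs, shape (T, H).
    """
    h = np.zeros(H)
    hs = []
    for x in xs:
        h = np.tanh(x @ Wx + h @ Wh)
        hs.append(h)
    return np.stack(hs)

for T in (5, 50):
    hs = encode(rng.standard_normal((T, D)))
    # Original seq2seq passes on only hs[-1]: always shape (H,)
    # Improved encoder passes on all of hs: shape (T, H)
    print(T, hs[-1].shape, hs.shape)
```

Running it prints `5 (3,) (5, 3)` and then `50 (3,) (50, 3)`: the original seq2seq would hand the decoder a 3-dimensional vector in both cases, while the improved encoder hands over one vector per word.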
8.1.3 Decoder Improvement 1 (Selection Layer)
  • When humans translate, they use knowledge of correspondences such as '나 = I' and '고양이 = cat' ⇒ this correspondence is called alignment.
  • The same idea is applied to seq2seq.
  • The original decoder received only the encoder's final hidden state.
  • A new computation layer is added:
  1. Selection layer
    • Takes two inputs: 1) hs, the hidden states from the encoder, and 2) a, the weights from the weight-computation layer.
    • It combines them and passes the result on as c.
    • Picking out a single vector is not differentiable, so the layer instead takes the weighted sum of hs with a to form the context vector. ex) The word '나' ("I") has the highest weight, 0.8, so the context vector c will contain mostly information about '나'.
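The weighted sum can be sketched in NumPy as follows. This is modeled on the idea of the book's weighted-sum layer but is not its code: it assumes a single sentence (no batch axis), uses random `hs`, and the function name `weight_sum` and the weights below (0.8 on the first word, '나') are illustrative.

```python
import numpy as np

def weight_sum(hs, a):
    """Context vector c = sum over t of a[t] * hs[t].

    hs: encoder hidden states, shape (T, H)
    a:  attention weights, shape (T,), non-negative and summing to 1
    """
    # Broadcast a over the hidden axis, then sum over time.
    return (hs * a[:, None]).sum(axis=0)

rng = np.random.default_rng(0)
hs = rng.standard_normal((5, 4))              # 5 words, hidden size 4
a = np.array([0.8, 0.1, 0.03, 0.05, 0.02])    # most weight on word 0 ('나')
c = weight_sum(hs, a)
print(c.shape)  # (4,)
```

Because 0.8 of the weight sits on `hs[0]`, `c` stays close to that vector. The same result can be written as a single matrix product, `a @ hs`.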
8.1.4 Decoder Improvement 2 (Weight (a) Computation Layer)
  1. Weight (a) computation layer
    • Takes two inputs: hs, the collection of encoder hidden states, and h, the output of the decoder's LSTM.
    • From these two inputs it must produce the weights a.
    • Concretely, we need a single number expressing how related h is to each hidden state in hs ⇒ computed with the vector dot product.
    • ⇒ Relation to cosine similarity: cosine similarity ignores the vectors' magnitudes and measures only the angle between their directions. The dot product equals the cosine similarity multiplied by both vectors' lengths, so it also grows when two vectors point in a similar direction, but it is not normalized.
    • The resulting scores are normalized with softmax, which turns them into the weights a (each non-negative, summing to 1).
    • (Figure: computation graph of the weight-computation layer)
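The weight computation described above (dot-product scores, then softmax) can be sketched as follows. Assumptions: a single sentence without a batch axis, and hand-picked 2-dimensional vectors so the result is easy to inspect; `attention_weight` is a hypothetical name.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()

def attention_weight(hs, h):
    """hs: encoder states (T, H); h: decoder LSTM output (H,). Returns a: (T,)."""
    s = hs @ h          # dot product of h with every encoder state -> scores (T,)
    return softmax(s)   # normalize the scores into weights that sum to 1

hs = np.array([[1.0, 0.0],    # points the same way as h
               [0.0, 1.0],    # orthogonal to h
               [-1.0, 0.0]])  # points the opposite way
h = np.array([2.0, 0.0])
a = attention_weight(hs, h)
print(a.round(3))
```

The scores are 2, 0 and -2, so the first encoder state, which points the same way as `h`, receives most of the weight (about 0.87) and the opposite one almost none.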
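8.1.5 Decoder Improvement 3 (Combining 1 + 2)
  • Combining the two layers gives the Attention layer. Its output, the context vector c, is concatenated with the LSTM output h and fed into the Affine layer.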
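One decoder time step with attention might look like the sketch below: scores and softmax, weighted sum, then concatenating `[c; h]` and applying an Affine layer. Assumptions: a single sentence at a single time step, random weights, and the hypothetical names `attention`, `W`, `b`; the book's implementation works on whole batches of sequences instead.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention(hs, h):
    """Weight computation (dot-product scores + softmax) followed by the weighted sum."""
    a = softmax(hs @ h)   # (T,) attention weights
    c = a @ hs            # (H,) context vector
    return c, a

rng = np.random.default_rng(0)
T, H, V = 5, 4, 10                      # sentence length, hidden size, vocabulary size
hs = rng.standard_normal((T, H))        # encoder hidden states
h = rng.standard_normal(H)              # decoder LSTM output at one time step
W = rng.standard_normal((2 * H, V))     # hypothetical Affine weights
b = np.zeros(V)                         # hypothetical Affine bias

c, a = attention(hs, h)
score = np.concatenate([c, h]) @ W + b  # concatenate [c; h], then Affine -> one score per word
print(score.shape)  # (10,)
```

The Affine weights have `2 * H` rows because the concatenated input is twice as wide as `h` alone.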
8.2 Implementing seq2seq with Attention

 

8.3 Evaluating Attention

 

8.4 Remaining Topics on Attention
8.4.1 Bidirectional RNN
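```python
import numpy as np
from common.time_layers import TimeLSTM  # TimeLSTM from the book's companion code


class TimeBiLSTM:
    """Bidirectional LSTM: one TimeLSTM reads left-to-right, another right-to-left."""

    def __init__(self, Wx1, Wh1, b1,
                 Wx2, Wh2, b2, stateful=False):
        self.forward_lstm = TimeLSTM(Wx1, Wh1, b1, stateful)
        self.backward_lstm = TimeLSTM(Wx2, Wh2, b2, stateful)
        self.params = self.forward_lstm.params + self.backward_lstm.params
        self.grads = self.forward_lstm.grads + self.backward_lstm.grads

    def forward(self, xs):
        # xs: (N, T, D)
        o1 = self.forward_lstm.forward(xs)
        o2 = self.backward_lstm.forward(xs[:, ::-1])  # feed the sequence in reverse order
        # Why reverse again? Right now o2[:, 0] belongs to the LAST word. Flipping it back
        # aligns o2[:, t] with word t, so each concatenated vector describes the same word.
        o2 = o2[:, ::-1]
        out = np.concatenate((o1, o2), axis=2)  # (N, T, 2H)
        return out

    def backward(self, dhs):
        H = dhs.shape[2] // 2
        do1 = dhs[:, :, :H]   # gradient for the left-to-right half
        do2 = dhs[:, :, H:]   # gradient for the right-to-left half

        dxs1 = self.forward_lstm.backward(do1)
        do2 = do2[:, ::-1]                        # undo the flip applied to the outputs
        dxs2 = self.backward_lstm.backward(do2)
        dxs2 = dxs2[:, ::-1]                      # undo the flip applied to the inputs
        dxs = dxs1 + dxs2     # xs fed both LSTMs, so their gradients add up
        return dxs
```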
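Why the second flip matters can be seen with a toy stand-in for the LSTM. Assumption: `toy_rnn` is a hypothetical "RNN" whose output at step t is simply the running sum of the inputs up to t, so each output obviously summarizes what the layer has read so far. Reversing the input, running the layer, and reversing the output again gives, at position t, a summary of everything from word t to the end.

```python
import numpy as np

def toy_rnn(xs):
    """Stand-in for an RNN: output at step t summarizes xs[:, :t+1] (here, a running sum)."""
    return np.cumsum(xs, axis=1)

xs = np.arange(1, 6, dtype=float).reshape(1, 5, 1)  # batch 1, T = 5, D = 1: values 1..5

o1 = toy_rnn(xs)                    # o1[:, t] summarizes xs[:, :t+1]  (left context)
o2 = toy_rnn(xs[:, ::-1])[:, ::-1]  # reverse in, reverse out: o2[:, t] summarizes xs[:, t:]  (right context)
out = np.concatenate((o1, o2), axis=2)

print(out[0])  # each row: (sum of words up to t, sum of words from t onward)
```

Here `out[0, :, 0]` holds 1, 3, 6, 10, 15 and `out[0, :, 1]` holds 15, 14, 12, 9, 5. Without the second flip, row t would pair the left context of word t with the right context of a different word.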
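8.4.2 How to Use the Attention Layer
  • Attention does not have to sit between the LSTM and Affine layers (the vertical direction); for example, its output can instead be fed into the LSTM at the next time step.
  • Models can be put together in many different ways.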
8.4.3 Deeper seq2seq & Skip Connections
  • When stacking more layers, it is important not to hurt the model's generalization performance (i.e., to avoid overfitting) ⇒ skip connections are one technique that helps when making models deeper.
    • A skip connection adds a layer's output to the output of the next LSTM layer in the depth direction, so the signal can bypass that layer.
    • Backpropagation through an addition node passes the gradient on unchanged, so the gradient along the skip path does not vanish.
    • Vanishing gradients in the depth direction ⇒ skip connections
    • Vanishing gradients in the time direction ⇒ gated RNNs such as LSTM and GRU
    • Exploding gradients in the time direction ⇒ gradient clipping (rescaling the gradients when their combined L2 norm exceeds a threshold; this is not L2 regularization)
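Both remedies from the list above fit in a short NumPy sketch. Assumptions: the skipped layer is a hypothetical `tanh(x W)` rather than an LSTM, and `clip_grads` follows the common global-L2-norm recipe (the book uses a function in this style, but this is not copied from it).

```python
import numpy as np

# --- Skip connection: y = x + f(x) ---
# The addition node copies the upstream gradient to both branches,
# so dL/dx = dy + (gradient through f): the identity path is never attenuated.
def skip_forward(x, W):
    return x + np.tanh(x @ W)  # hypothetical layer f(x) = tanh(x W)

def skip_backward(x, W, dy):
    t = np.tanh(x @ W)
    df = (dy * (1 - t ** 2)) @ W.T   # gradient through f
    return dy + df                   # identity path + layer path

# --- Gradient clipping by global L2 norm ---
def clip_grads(grads, max_norm):
    """Rescale all gradients in place if their combined L2 norm exceeds max_norm."""
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    rate = max_norm / (total_norm + 1e-6)
    if rate < 1:
        for g in grads:
            g *= rate
    return total_norm

rng = np.random.default_rng(0)
x = rng.standard_normal(3)
dy = np.ones(3)
# Even if the layer contributes no gradient (W = 0), dx equals dy via the identity path.
print(skip_backward(x, np.zeros((3, 3)), dy))  # [1. 1. 1.]

grads = [np.array([3.0, 4.0])]   # combined L2 norm 5
clip_grads(grads, max_norm=1.0)
print(grads[0])                  # approximately 0.6 and 0.8: same direction, norm 1
```

Clipping keeps the gradient's direction and only shrinks its length, which is why it targets exploding gradients without changing which way the parameters move.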
8.5 Applications of Attention
8.5.1 Google Neural Machine Translation (GNMT)
  • History of machine translation: rule-based translation ⇒ example-based translation ⇒ statistical translation ⇒ neural machine translation

• Like the seq2seq with attention we studied earlier, GNMT consists of an Encoder, a Decoder, and Attention. To improve translation accuracy, however, it adds multi-layer LSTMs, a bidirectional LSTM, skip connections, and more. To shorten training time, it is trained in a distributed fashion on multiple GPUs. Research has also gone into handling low-frequency words and into quantization to speed up inference. As a result, its accuracy is steadily approaching that of human translators.

8.5.2 Transformer
  • Limitations of RNNs:
    1. No parallel processing: an RNN computes step by step using the result from the previous time step, so it cannot be parallelized across time.
    2. When two positions are far apart, an RNN struggles to learn the correspondence between them.
    ⇒ This led to research on removing the RNN from seq2seq. One approach uses convolutional layers instead of an RNN; the Transformer, proposed to solve these problems, relies on attention alone.
  • Self-attention
  • Self-attention learns the correspondences within the input sequence and, likewise, the correspondences within the output sequence.
  • This reduces computation and lets the model benefit from parallel computation on GPUs.
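A stripped-down self-attention over one sequence shows where the parallelism comes from: there is no loop over time steps, only matrix products in which every position is compared with every other position at once. Assumptions: queries, keys and values are all the input `X` itself, with no learned projections and a single head, so this is a simplification rather than a full Transformer layer; the division by the square root of the vector size follows the Transformer's scaled dot-product attention.

```python
import numpy as np

def softmax_rows(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))  # row-wise, numerically stable
    return e / e.sum(axis=1, keepdims=True)

def self_attention(X):
    """Simplified self-attention over one sequence X of shape (T, d).

    Assumption: queries = keys = values = X. A real Transformer first applies
    learned projection matrices and uses several heads.
    """
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)   # (T, T): every position scored against every other in one matmul
    A = softmax_rows(scores)        # attention weights; each row sums to 1
    return A @ X, A                 # (T, d): all output positions computed at once

X = np.random.default_rng(0).standard_normal((6, 8))  # 6 tokens, 8-dim vectors
out, A = self_attention(X)
print(out.shape, A.shape)  # (6, 8) (6, 6)
```

Row t of `A` says how much token t attends to each token in the same sequence, which is the "correspondence within a sequence" that self-attention learns.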
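8.5.3 Neural Turing Machine (NTM)
  • Extending the model with external memory
  • When the amount of information grows, memory is placed outside the RNN, and the needed information is written to and read from it as appropriate.
  • Structure: a controller (e.g., an LSTM) plus external memory; attention decides where in memory to read and write.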
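Reading from external memory with attention can be sketched with NTM-style content-based addressing: compare a key with every memory slot by cosine similarity, sharpen with a key strength, apply softmax, and read the weighted sum of slots. Assumptions: only the read side is shown (NTM also writes using erase/add vectors and adds location-based shifting), the controller is omitted, and the key `k` and strength `beta` are supplied by hand; `content_read` is a hypothetical name.

```python
import numpy as np

def content_read(M, k, beta):
    """Content-based addressing plus a read, in the spirit of the NTM.

    M:    memory, shape (N, W): N slots, each a W-dim vector
    k:    key emitted by the controller, shape (W,)
    beta: key strength (> 0); larger values make the focus sharper
    """
    # Cosine similarity between the key and every memory slot
    sim = (M @ k) / (np.linalg.norm(M, axis=1) * np.linalg.norm(k) + 1e-8)
    z = beta * sim
    e = np.exp(z - z.max())   # numerically stable softmax
    w = e / e.sum()           # attention weights over memory slots
    r = w @ M                 # read vector: weighted sum of slots
    return r, w

M = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
r, w = content_read(M, k=np.array([0.9, 0.1, 0.0]), beta=10.0)
print(w.argmax())  # 0: the slot most similar to the key gets the most weight
```

This is the same weighted-sum pattern as the seq2seq attention above, except the vectors being attended over are memory slots instead of encoder hidden states.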