
ELMo

by ์ œ๋ฃฝ 2023. 7. 6.

1. Intro
  • ๊ฐ™์€ read๋ผ๊ณ  ํ•ด๋„ ํ˜„์žฌํ˜•๊ณผ ๊ณผ๊ฑฐํ˜•์ด ์žˆ์Œ -> ์•ž์—์„œ๋งŒ ์˜ˆ์ธก์„ ํ•ด์„œ ์ถœ๋ ฅํ•˜๋ฉด ์ •ํ™•ํžˆ ๋ชจ๋ฅด๊ธฐ ๋•Œ๋ฌธ์—, ๋’ค์—์„œ๋ถ€ํ„ฐ ์˜ค๋Š” ์• ๋“ค์„ ๊ฐ€์ง€๊ณ  ์˜ˆ์ธก์„ ํ•ด์„œ read๊ฐ€ ๊ณผ๊ฑฐํ˜•์œผ๋กœ ์“ฐ์ธ๋‹ค! ๋ผ๊ณ  ์•Œ๋ ค์ฃผ๋Š”๊ฒŒ ์—˜๋ชจ์˜ ์—ญํ• 
2. Overall architecture
  1. Pick out the token corresponding to 'read'.
  2. Train the forward part and the backward part together.
  3. At each level — the word-embedding layer, LSTM layer 1, LSTM layer 2, and so on — concatenate the matching representations: embedding with embedding, LSTM layer with LSTM layer.
  4. Then multiply each layer by an appropriate weight (the lower the layer, the more its vector is said to reflect syntax; the higher the layer, the more it reflects the surrounding context).
  5. Taking the weighted sum then produces a single vector → this ELMo value is concatenated to the embedding layer for 'read', and also concatenated to the LSTM layer just before the output, and training proceeds (see the sketch after this list).
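As a rough illustration of steps 2–5, here is a minimal PyTorch-style sketch. The class name, the dimensions, and the trick of duplicating the token embedding so that every layer has the same width are assumptions made for this toy example, not the paper's exact character-CNN-based implementation.

```python
import torch
import torch.nn as nn

class ELMoSketch(nn.Module):
    """Toy 2-layer biLM that returns a weighted sum of all layer representations."""
    def __init__(self, vocab_size=10000, dim=256, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        # each layer is bidirectional, so forward/backward hidden states get concatenated
        self.lstms = nn.ModuleList([
            nn.LSTM(dim if i == 0 else 2 * dim, dim,
                    batch_first=True, bidirectional=True)
            for i in range(num_layers)
        ])
        # softmax-normalized weights s_j over (embedding + each LSTM layer) and a scale gamma
        self.s = nn.Parameter(torch.zeros(num_layers + 1))
        self.gamma = nn.Parameter(torch.ones(1))

    def forward(self, token_ids):
        x = self.embed(token_ids)                # (batch, seq, dim)
        layers = [torch.cat([x, x], dim=-1)]     # duplicate embedding so it matches 2*dim
        h = x
        for lstm in self.lstms:
            h, _ = lstm(h)                       # (batch, seq, 2*dim): forward ++ backward
            layers.append(h)
        w = torch.softmax(self.s, dim=0)         # weights over the 2L+1-style layer stack
        elmo = self.gamma * sum(w_j * layer for w_j, layer in zip(w, layers))
        return elmo                              # one contextual ELMo vector per token
```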
3. Bidirectional language models (biLM)
  • t_k denotes the k-th token of the sequence.
  • The forward LM computes the probability of the token sequence by modeling the probability of token t_k with the previous tokens (t_1, ..., t_{k−1}).
  • Conversely, the backward LM computes the probability of the token sequence by modeling the probability of token t_k with the tokens that come after position k (t_{k+1}, ..., t_N).
  • The forward and backward language models above are combined and trained by jointly maximizing their likelihood (see the objective below).
4. ELMo
  • ์ด 2L +1 ๊ฐœ์˜ representation์„ ๊ณ„์‚ฐ (์ˆœ์ „ํŒŒ LSTM + ์—ญ์ „ํŒŒ LSTM + input embedding)
  • ์ฆ‰, LSTM์ด ๋‘๊ฐœ๋ฉด, ์ด 5๊ฐœ์˜ representation์„ ๊ณ„์‚ฐํ•œ๋‹ค๊ณ  ์ƒ๊ฐํ•˜๋ฉด ๋จ
  • ์œ„์˜ ์‹๊ณผ ๊ฐ™์€ ์„ค๋ช…
  • (๊ฐ€์ค‘์น˜(s0,s1,s2) x ๊ฐ LSTM ์ธต์˜ ๊ฐ’์˜ ํ•ฉ) x scaling๊ฐ’
  • ๊ฐ๋งˆ task: ELMo vector ํฌ๊ธฐ๋ฅผ scaling ํ•ด์คŒ
5. Where to include ELMo?
  1. Just before both the input and the output
  2. Only at the input
  3. Just before the output
  ⇒ Inserting ELMo at both the input and the output gave the best performance (a small sketch follows this list).
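A minimal sketch of option 1, concatenating the (frozen) ELMo vector both to the task model's input embeddings and to the hidden states just before the output layer; the tensor names and sizes here are illustrative assumptions, not values from the paper.

```python
import torch

# illustrative shapes only: batch = 8, sequence length = 20
task_embedding = torch.randn(8, 20, 100)   # the task model's own word embeddings
elmo_vector    = torch.randn(8, 20, 512)   # ELMo vector taken from the frozen biLM
task_hidden    = torch.randn(8, 20, 200)   # task RNN hidden states just before the output layer

# option 1: inject ELMo at the input AND just before the output (best setting above)
rnn_input       = torch.cat([task_embedding, elmo_vector], dim=-1)   # (8, 20, 612)
output_features = torch.cat([task_hidden, elmo_vector], dim=-1)      # (8, 20, 712)
```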
6. Evaluation

Simply adding ELMo already improved performance over the baseline models, and this was enough to achieve SOTA.

7. Outro
  • Proposes ELMo, a model that uses a biLM to learn high-level, context-dependent representations.
  • Using ELMo improves performance on most NLP tasks.
  • They also found that the higher the layer, the more it captures semantic rather than syntactic information.
  • Consequently, combining the representations of all layers, rather than using any single layer, helps overall performance.

 
