
[Deep Learning from Scratch 2] Chapter 3 (word2vec)

by 제룽 2023. 7. 9.

 

  • Inference-based method ⇒ converts words into vectors and learns from them
  • This, too, rests on the distributional hypothesis (a word's meaning is formed by the words around it)
Problems with the statistics-based approach
  • The statistics-based approach processes the entire dataset in a single batch.
  • As a result, it takes an extremely long time on large corpora.
  • word2vec (the inference-based approach) instead learns from mini-batches.
Inference-based methods
  • Learn a word's occurrence patterns by repeatedly solving small inference problems.
  • Use a neural network to process the words.
  • context → model → probability distribution
  • A neural network cannot process raw words as-is.
  • Each word must therefore be converted into a 'fixed-length vector' → a one-hot vector.
  • In the end, multiplying a one-hot context vector c by the weight matrix only extracts the weight rows where c is 1 (everything else is zero).
  • Carrying out the full matrix multiplication is therefore a waste of time.
  • This is why an Embedding layer is used instead.
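Why the Embedding layer helps can be seen directly: multiplying a one-hot vector by W_in returns exactly one row, so a simple row lookup gives the same result without any multiplication (sizes and the word id below are illustrative):

```python
import numpy as np

# A one-hot vector times W_in simply selects one row of W_in,
# so the full matrix product is wasted work.
V, H = 7, 3                                  # vocabulary size, hidden size
W_in = np.arange(V * H, dtype=np.float64).reshape(V, H)

c = np.zeros(V)                              # one-hot vector for word id 2
c[2] = 1.0

by_matmul = c @ W_in                         # full (7,) x (7, 3) product
by_lookup = W_in[2]                          # what an Embedding layer does

print(np.array_equal(by_matmul, by_lookup))  # True
```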
word2vec
  1. CBOW model
  2. skip-gram model
  • An inference-based method
  • A simple two-layer neural network
  • The weights can be retrained, so newly added words can be learned efficiently
CBOW model
  • A neural network that predicts a target word from its context
  • e.g. context: you, goodbye; target: say
  • All input layers share a single weight matrix W_in
  • W_in: each row is the distributed representation of one word
  • W_out: each column stores a vector that encodes a word's meaning
  • Since there are two context words (you, goodbye), two MatMul (input) layers are created
  • 0.5: the hidden vector is the average of the two input projections
  • The key point is to make the number of hidden-layer neurons smaller than the number of input-layer neurons!
  • Only then is the network forced to pack the information needed for word prediction into a compact representation.
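The averaging and the dimensionality squeeze described above can be sketched in a few lines of NumPy (the sizes V=7, H=3 and the word ids are illustrative):

```python
import numpy as np

# Hypothetical sizes: vocabulary of 7 words, hidden layer of 3 neurons.
V, H = 7, 3
W_in = 0.01 * np.random.randn(V, H)

# One-hot rows for the two context words "you" (id 0) and "goodbye" (id 2).
c0 = np.zeros((1, V)); c0[0, 0] = 1
c1 = np.zeros((1, V)); c1[0, 2] = 1

# Each context is projected into the smaller hidden space and averaged.
h = 0.5 * (c0 @ W_in + c1 @ W_in)
print(h.shape)  # (1, 3): denser than the 7-dimensional one-hot input
```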
๋ชจ๋ธ์˜ ํ•™์Šต
```python
import numpy as np

def preprocess(text):
    text = text.lower()              # lowercase
    text = text.replace('.', ' .')   # separate the period from the last word
    words = text.split(' ')          # split on spaces

    # assign an id to each unique word
    word_to_id = {}
    id_to_word = {}
    for word in words:
        if word not in word_to_id:
            new_id = len(word_to_id)
            word_to_id[word] = new_id
            id_to_word[new_id] = word

    corpus = np.array([word_to_id[w] for w in words])

    return corpus, word_to_id, id_to_word
```
```python
text = 'You say goodbye and I say hello.'
corpus, word_to_id, id_to_word = preprocess(text)
print(corpus)      # [0 1 2 3 4 1 5 6]
print(id_to_word)  # {0: 'you', 1: 'say', 2: 'goodbye', 3: 'and', 4: 'i', 5: 'hello', 6: '.'}
```
```python
def create_contexts_target(corpus, window_size=1):
    target = corpus[window_size:-window_size]
    contexts = []

    for idx in range(window_size, len(corpus) - window_size):
        cs = []
        for t in range(-window_size, window_size + 1):
            if t == 0:
                continue
            cs.append(corpus[idx + t])
        contexts.append(cs)

    return np.array(contexts), np.array(target)
```
```python
contexts, target = create_contexts_target(corpus, window_size=1)
print(contexts)  # one (left, right) context pair per target
print(target)
```
```python
def convert_one_hot(corpus, vocab_size):
    N = corpus.shape[0]

    if corpus.ndim == 1:
        one_hot = np.zeros((N, vocab_size), dtype=np.int32)
        for idx, word_id in enumerate(corpus):
            one_hot[idx, word_id] = 1

    elif corpus.ndim == 2:
        C = corpus.shape[1]
        one_hot = np.zeros((N, C, vocab_size), dtype=np.int32)
        for idx_0, word_ids in enumerate(corpus):
            for idx_1, word_id in enumerate(word_ids):
                one_hot[idx_0, idx_1, word_id] = 1

    return one_hot
```
```python
text = 'You say goodbye and I say hello.'
corpus, word_to_id, id_to_word = preprocess(text)

contexts, target = create_contexts_target(corpus, window_size=1)

vocab_size = len(word_to_id)
target = convert_one_hot(target, vocab_size)      # shape (6, 7)
contexts = convert_one_hot(contexts, vocab_size)  # shape (6, 2, 7)
```
```python
class SimpleCBOW:
    def __init__(self, vocab_size, hidden_size):
        V, H = vocab_size, hidden_size

        # initialize the weights
        W_in = 0.01 * np.random.randn(V, H).astype('f')
        W_out = 0.01 * np.random.randn(H, V).astype('f')

        # create the layers
        self.in_layer0 = MatMul(W_in)
        self.in_layer1 = MatMul(W_in)
        self.out_layer = MatMul(W_out)
        self.loss_layer = SoftmaxWithLoss()

        # collect all weights and gradients into lists
        layers = [self.in_layer0, self.in_layer1, self.out_layer]
        self.params, self.grads = [], []
        for layer in layers:
            self.params += layer.params
            self.grads += layer.grads

        # store the distributed representations of the words
        self.word_vecs = W_in
```
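SimpleCBOW relies on MatMul and SoftmaxWithLoss layers that the book ships in common/layers.py. A minimal sketch of compatible layers (assuming one-hot targets, which matches how the data is prepared above), so the class can run standalone:

```python
import numpy as np

class MatMul:
    """Fully-connected layer without bias: y = x @ W."""
    def __init__(self, W):
        self.params = [W]
        self.grads = [np.zeros_like(W)]
        self.x = None

    def forward(self, x):
        W, = self.params
        self.x = x
        return x @ W

    def backward(self, dout):
        W, = self.params
        dx = dout @ W.T
        self.grads[0][...] = self.x.T @ dout  # write gradient in place
        return dx

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))  # numerical stability
    return e / e.sum(axis=1, keepdims=True)

class SoftmaxWithLoss:
    """Softmax followed by cross-entropy loss; expects one-hot targets."""
    def __init__(self):
        self.params, self.grads = [], []
        self.y = None
        self.t = None

    def forward(self, score, target):
        self.y = softmax(score)
        self.t = target
        batch_size = score.shape[0]
        return -np.sum(self.t * np.log(self.y + 1e-7)) / batch_size

    def backward(self, dout=1):
        batch_size = self.t.shape[0]
        return dout * (self.y - self.t) / batch_size
```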

The constructor takes the vocabulary size (vocab_size) and the number of hidden-layer neurons (hidden_size) as arguments.

It creates two weight matrices (W_in, W_out) and initializes them with small random values.

It then creates the necessary layers (two on the input side, one on the output side).

It also creates a Softmax with Loss layer.

As many input-side MatMul layers are created as there are words in the context, and they all use the same shared weight matrix.

The parameters and gradients used in the network are collected into the params and grads lists, respectively.

Next, the network's forward pass, the forward() method, is implemented. It takes the contexts and targets as arguments and returns the loss.

```python
def forward(self, contexts, target):
    h0 = self.in_layer0.forward(contexts[:, 0])
    h1 = self.in_layer1.forward(contexts[:, 1])
    h = (h0 + h1) * 0.5
    score = self.out_layer.forward(h)
    loss = self.loss_layer.forward(score, target)
    return loss
```

In the example above, contexts has shape (6, 2, 7):

the 0th dimension is the number of mini-batch samples,

the 1st dimension is the number of context words (2 × window_size),

and the 2nd dimension is the one-hot vector.

target has shape (6, 7) in the same example.

```python
def backward(self, dout=1):
    ds = self.loss_layer.backward(dout)
    da = self.out_layer.backward(ds)
    da *= 0.5
    self.in_layer1.backward(da)
    self.in_layer0.backward(da)
    return None
```
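Putting the pieces together: the book trains SimpleCBOW with its Trainer class and the Adam optimizer; the following is a condensed, self-contained sketch of the same training idea using plain full-batch SGD and direct row lookups in place of the one-hot MatMul layers (the learning rate, epoch count, and hidden size are illustrative, not the book's settings):

```python
import numpy as np

np.random.seed(42)

# build the toy corpus by hand (same sentence as above)
words = 'you say goodbye and i say hello .'.split(' ')
word_to_id = {}
for w in words:
    word_to_id.setdefault(w, len(word_to_id))
corpus = np.array([word_to_id[w] for w in words])

V, H = len(word_to_id), 5
W_in = 0.1 * np.random.randn(V, H)
W_out = 0.1 * np.random.randn(H, V)

# (left, right) context ids and target ids, window_size = 1
contexts = np.array([[corpus[i - 1], corpus[i + 1]]
                     for i in range(1, len(corpus) - 1)])
target = corpus[1:-1]
rows = np.arange(len(target))

def softmax(x):
    e = np.exp(x - x.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

lr = 1.0
for epoch in range(1000):
    # forward: average the two context rows of W_in (Embedding-style lookup)
    h = 0.5 * (W_in[contexts[:, 0]] + W_in[contexts[:, 1]])
    y = softmax(h @ W_out)
    loss = -np.mean(np.log(y[rows, target] + 1e-7))

    # backward: softmax-with-loss gradient, then the two MatMul gradients
    ds = y.copy()
    ds[rows, target] -= 1
    ds /= len(target)
    dW_out = h.T @ ds
    dh = ds @ W_out.T * 0.5
    dW_in = np.zeros_like(W_in)
    np.add.at(dW_in, contexts[:, 0], dh)
    np.add.at(dW_in, contexts[:, 1], dh)

    # plain SGD update
    W_in -= lr * dW_in
    W_out -= lr * dW_out

print(round(loss, 3))  # started near log(7) ≈ 1.95
```

After training, the rows of W_in are the learned distributed representations of the words.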
CBOW and probability
  • Expressed as a formula, CBOW models P(w_t | w_{t-1}, w_{t+1}): the probability of the target w_t given the contexts w_{t-1} and w_{t+1}.
  • Plugging this into cross entropy gives the loss L = -log P(w_t | w_{t-1}, w_{t+1}); averaged over the whole corpus, L = -(1/T) Σ_t log P(w_t | w_{t-1}, w_{t+1}).
skip-gram model
  • Predicts the surrounding words (the context) from the target word.
  • One input layer
  • Two output layers (because there are two context words)
  • Comparing the two: skip-gram yields better word vectors, while CBOW trains faster.
  • skip-gram has to compute a loss for every context word, so it is slower than CBOW.
  • But predicting the context from a single target word is the harder problem ⇒ solving it is exactly why the skip-gram model produces better representations.

→ Choose between them depending on the situation.

  • ์ถ”๋ก ๊ธฐ๋ฐ˜๊ธฐ๋ฒ•+ ํ†ต๊ณ„๊ธฐ๋ฐ˜๊ธฐ๋ฒ• : glove ๊ธฐ๋ฒ•๋„ ์กด์žฌํ•จ.
  • ๋ง๋ญ‰์น˜ ์ „์ฒด์˜ ํ†ต๊ณ„ ์ •๋ณด๋ฅผ ์†์‹ค ํ•จ์ˆ˜์— ๋„์ž…ํ•ด ๋ฏธ๋‹ˆ๋ฐฐ์น˜ ํ•™์Šต์„ ํ•˜๋Š” ๊ฒƒ.
