nlp12

[Transformer] Implementing train.py, dataset.py, config.py, and the Masks - 2 (PyTorch) Continuing from last time, today I implemented the remaining train.py, config.py, and dataset.py files. https://www.youtube.com/watch?v=ISNdQcPhsts The implementation is based on this video's code. 1. Implementing dataset.py 1-1. Bilingual Dataset The dataset used is the opus_books dataset provided by Hugging Face. https://huggingface.co/datasets/opus_books/viewer/en-it opus_books · Datasets at Hugging Face { "en": "Nor could I pass unnoticed the suggestion of the bleak shores of Laplan.. 2024. 2. 21.
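The masks mentioned in the title are typically built inside the dataset class. A minimal sketch of the decoder's causal mask in PyTorch (the name `causal_mask` and the shapes are my assumptions, not necessarily the video's exact code):

```python
import torch

def causal_mask(size: int) -> torch.Tensor:
    # Boolean mask of shape (1, size, size): True where attention is allowed,
    # i.e. each position may attend only to itself and earlier positions.
    upper = torch.triu(torch.ones(1, size, size), diagonal=1).int()
    return upper == 0

# Example: a length-4 decoder input
print(causal_mask(4))
```

In the dataset's `__getitem__`, this causal mask would be combined (logical AND) with the padding mask so the decoder also ignores pad tokens.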
DETR: End-to-End Object Detection with Transformers 📝 This paper formulates object detection as direct set prediction (one-to-one matching) and proposes DETR (DEtection TRansformer), which combines a transformer with a bipartite matching loss. DETR achieves performance on par with Faster R-CNN on the COCO dataset. In addition, by exploiting global information through self-attention, it detects large objects far better than Faster R-CNN. 📝 1. Feed the image through the backbone (ResNet) to extract a feature map. 2. Pass the feature map through a 1x1 conv, flatten it, and add a positional encoding. ※ spatial.. 2023. 7. 23.
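Steps 1-2 above can be sketched in PyTorch. The spatial size is a made-up placeholder, the backbone output is random noise instead of a real ResNet, and the positional encoding is a random stand-in (DETR actually uses a fixed sine or learned encoding):

```python
import torch
import torch.nn as nn

batch, c_backbone, h, w = 1, 2048, 16, 16   # ResNet-50 outputs 2048 channels
hidden_dim = 256                            # DETR's transformer width

feature_map = torch.randn(batch, c_backbone, h, w)   # stand-in backbone output

proj = nn.Conv2d(c_backbone, hidden_dim, kernel_size=1)  # step 2: 1x1 conv
x = proj(feature_map)                        # (1, 256, 16, 16)
x = x.flatten(2).permute(2, 0, 1)            # flatten -> (H*W, batch, hidden_dim)

pos = torch.randn(h * w, 1, hidden_dim)      # stand-in positional encoding
tokens = x + pos                             # transformer encoder input
print(tokens.shape)                          # torch.Size([256, 1, 256])
```

The resulting sequence of H·W tokens is what the transformer encoder consumes.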
[Week 2] SRNet: Editing Text in the Wild Review 0. Abstract This paper is concerned with text editing in natural images: replacing or modifying a word in the source image so that the edited image remains visually indistinguishable from the original. It proposes SRNet, an end-to-end trainable style retention network composed of three modules. Text conversion module: changes the text content of the source image to the target text while preserving the original text style. Background inpainting module: erases the original text and fills the text region with an appropriate texture. Fusion module: combines the information from the two modules to generate the edited text image. 💡 1. Text Editing 2. Text Synthesis 3. Text Erasure.. 2023. 7. 17.
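The three-module flow described above can be sketched with toy stand-ins. Each "module" here is a single conv purely to show how the branches connect; the real SRNet modules are full encoder-decoder networks, and the channel layout is my assumption:

```python
import torch
import torch.nn as nn

class ToySRNet(nn.Module):
    """Toy three-branch flow in the spirit of SRNet (not the real architecture)."""
    def __init__(self):
        super().__init__()
        # takes source image + rendered target text, keeps the source style
        self.text_conversion = nn.Conv2d(6, 3, 3, padding=1)
        # erases the original text, fills the region with background texture
        self.background_inpaint = nn.Conv2d(3, 3, 3, padding=1)
        # merges styled text and inpainted background into the edited image
        self.fusion = nn.Conv2d(6, 3, 3, padding=1)

    def forward(self, source_img, target_text_img):
        styled = self.text_conversion(torch.cat([source_img, target_text_img], dim=1))
        background = self.background_inpaint(source_img)
        return self.fusion(torch.cat([styled, background], dim=1))

out = ToySRNet()(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))
print(out.shape)  # torch.Size([1, 3, 64, 64])
```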
XLM: Cross-lingual Language Model Pretraining 💡 0. Abstract Recent studies have demonstrated the efficiency of generative pretraining for English natural language understanding. This work extends the approach to multiple languages and shows the effectiveness of cross-lingual pretraining. We propose two methods to learn cross-lingual language models (XLMs): an unsupervised one that relies only on monolingual data, and a supervised one that leverages parallel data. We obtain state-of-the-art results on cross-lingual classification, and on unsupervised and supervised machine translation. On XNLI, our approach yields an absolute gain of 4.9% accuracy. On unsupervised machine translation, we obtain 34.3 BLEU on WMT'16 German-English, improving the previous state of the art by more than 9 BLEU. On supervised machine trans.. 2023. 7. 9.
Transformer 1. Overall architecture 2. Overall procedure In the encoder, the input sentence is first converted into embedding vectors. A positional encoding is added to give each word information about its position in the sequence. Multi-head attention is then performed: the same embedding is fed in as Q, K, and V, so (Q, K, V) start out as identical values. e.g., with 3 heads there are 3 weight matrices (Linear) for each of Q, K, and V ⇒ 9 distinct projections in total. Here V means the encoder embedding multiplied by its weight matrix. Per head, Q is multiplied by K, passed through softmax, and then multiplied by V. The resulting 3 head outputs.. 2023. 7. 6.
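The per-head computation described above (project the same embedding into Q, K, V, apply softmax(QKᵀ/√d)·V per head, then concatenate the heads) can be sketched as follows; the dimensions and weight initialization are illustrative only:

```python
import torch

def attention_head(x, w_q, w_k, w_v):
    # One head: project the same input x into Q, K, V with separate weights,
    # then compute softmax(Q K^T / sqrt(d)) V.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d = q.size(-1)
    scores = torch.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
    return scores @ v

seq_len, d_model, d_head = 5, 16, 8
x = torch.randn(seq_len, d_model)   # one embedded input sentence
# 3 heads => 3 (W_q, W_k, W_v) triples, i.e. 9 weight matrices in total
heads = [attention_head(x, *[torch.randn(d_model, d_head) for _ in range(3)])
         for _ in range(3)]
out = torch.cat(heads, dim=-1)      # concatenate the 3 head outputs
print(out.shape)                    # torch.Size([5, 24])
```

In the real model the concatenated heads are passed through one more linear layer to return to d_model dimensions.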