๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ
728x90
๋ฐ˜์‘ํ˜•
AE AutoEncoder ์ž…๋ ฅ์ด ๋“ค์–ด์™”์„ ๋•Œ, ํ•ด๋‹น ์ž…๋ ฅ ๋ฐ์ดํ„ฐ๋ฅผ ์ตœ๋Œ€ํ•œ ์••์ถ• ์‹œํ‚จ ํ›„, ์••์ถ• ์‹œํ‚จ ๋ฐ์ดํ„ฐ๋ฅผ ๋ณธ๋ž˜์˜ ์ž…๋ ฅ ํ˜•ํƒœ๋กœ ๋ณต์›์‹œํ‚ค๋Š” ์‹ ๊ฒฝ๋ง ์••์ถ•ํ•˜๋Š” ๋ถ€๋ถ„์„ encoder ๋ณต์›ํ•˜๋Š” ๋ถ€๋ถ„์„ decoder ์••์ถ•๊ณผ์ •์—์„œ ์ถ”์ถœํ•œ ์˜๋ฏธ์žˆ๋Š” ๋ฐ์ดํ„ฐ๋ฅผ latent vector AutoEncoder์˜ ์ˆ˜์‹๊ณผ ํ•™์Šต ๋ฐฉ๋ฒ• ์ˆ˜์‹ Input Data๋ฅผ Encoder Network์— ํ†ต๊ณผ์‹œ์ผœ ์••์ถ•๋œ z๊ฐ’์„ ์–ป์Œ ์••์ถ•๋œ z vector๋กœ๋ถ€ํ„ฐ Input Data์™€ ๊ฐ™์€ ํฌ๊ธฐ์˜ ์ถœ๋ ฅ ๊ฐ’์„ ์ƒ์„ฑ ์ด๋•Œ Loss๊ฐ’์€ ์ž…๋ ฅ๊ฐ’ x์™€ Decoder๋ฅผ ํ†ต๊ณผํ•œ y๊ฐ’์˜ ์ฐจ์ด ํ•™์Šต ๋ฐฉ๋ฒ• Decoder Network๋ฅผ ํ†ต๊ณผํ•œ Output layer์˜ ์ถœ๋ ฅ ๊ฐ’์€ Input๊ฐ’์˜ ํฌ๊ธฐ์™€ ๊ฐ™์•„์•ผ ํ•จ(๊ฐ™์€ ์ด๋ฏธ์ง€๋ฅผ ๋ณต์›ํ•œ๋‹ค๊ณ  ์ƒ๊ฐํ•˜๋ฉด ๋จ) ์ด๋•Œ ํ•™์Šต์„ ์œ„ํ•ด์„œ.. 2023. 7. 6.
SPPNet 1. Intro ๊ธฐ์กด์—์„œ๋Š” ๊ณ ์ •๋œ ํฌ๊ธฐ์˜ ์ด๋ฏธ์ง€๋ฅผ input์œผ๋กœ ๋ฐ›์•˜์Œ ์™œ? : FC layer์—์„œ ๊ณ ์ •๊ธธ์ด ๋ฒกํ„ฐ๋งŒ ๋ฐ›์„ ์ˆ˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ ๋ฌธ์ œ์ ? : ํฌ๊ธฐ๊ฐ€ ๋‹ค ๋‹ค๋ฅธ ์ด๋ฏธ์ง€๋ฅผ ํ•œ ์‚ฌ์ด์ฆˆ๋กœ ํ†ต์ผํ•ด๋ฒ„๋ฆฌ๊ธฐ ๋•Œ๋ฌธ์— ์ด๋ฏธ์ง€์˜ ์™œ๊ณก์ด๋‚˜, ์•„๋ž˜ ์‚ฌ์ง„๊ณผ ๊ฐ™์ด ์ž˜๋ฆฌ๊ฑฐ๋‚˜ ์ด๋ฏธ์ง€๊ฐ€ ๊ณ ์žฅ๋‚จ. ํ•˜์ง€๋งŒ? : ์‚ฌ์‹ค FC layer์— ๋“ค์–ด๊ฐ€๊ธฐ ์ „๊นŒ์ง€๋Š” ์‚ฌ์ด์ฆˆ๊ฐ€ ์ œ๊ฐ๊ฐ ์ด์–ด๋„ ๊ดœ์ฐฎ์Œ ๊ทธ๋ž˜์„œ? : ์ด๋ฒˆ ๋…ผ๋ฌธ์—์„œ๋Š” ์ด๋Ÿฌํ•œ ๋ฌธ์ œ์ ์„ ๋ณด์™„ํ•œ “Saptial Pyramid Pooling layer”๋ฅผ ์„ค๋ช…. โ€ป CNN์ด ๊ณ ์ •๋œ ์ž…๋ ฅ ํฌ๊ธฐ๋ฅผ ํ•„์š”๋กœ ํ•˜๋Š” ์ด์œ  CNN์€ Convolutional layer + fc layer๋กœ ์ด๋ฃจ์–ด์ ธ ์žˆ์Œ ์ด๋•Œ conv์˜ ๊ฒฝ์šฐ, sliding window ๋ฐฉ์‹์œผ๋กœ ์ด๋™ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์ด๋ฏธ์ง€ ํฌ๊ธฐ๋ฅผ ์‹ ๊ฒฝ์“ฐ์ง€ ์•Š์•„๋„ ๋ชจ๋“ .. 2023. 7. 6.
Faster R-CNN 0. R-CNN ์ž…๋ ฅ ์ด๋ฏธ์ง€์— Selective Search ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ ์šฉํ•˜์—ฌ bounding box(region proposal) 2000๊ฐœ๋ฅผ ์ถ”์ถœ. ์ถ”์ถœ๋œ bounding box๋ฅผ warp(resize)ํ•˜์—ฌ CNN์— ์ž…๋ ฅ. fine tunning ๋˜์–ด ์žˆ๋Š” pre-trained CNN์„ ์‚ฌ์šฉํ•˜์—ฌ bounding box์˜ 4096์ฐจ์›์˜ ํŠน์ง• ๋ฒกํ„ฐ๋ฅผ ์ถ”์ถœ. ์ถ”์ถœ๋œ ํŠน์ง• ๋ฒกํ„ฐ๋ฅผ SVM์„ ์ด์šฉํ•˜์—ฌ class๋ฅผ ๋ถ„๋ฅ˜. bounding box regression์„ ์ ์šฉํ•˜์—ฌ bounding box์˜ ์œ„์น˜๋ฅผ ์กฐ์ •. non maximum supression์„ ์ง„ํ–‰ ⇒ ์ด ์นœ๊ตฌ์˜ ๋ฌธ์ œ์ : 1) ๊ฐœ๋Š๋ฆผ 2) ๋“ค์–ด๊ฐˆ ๋•Œ ์ด๋ฏธ์ง€ ํฌ๊ธฐ๋ฅผ ๊ณ ์ •์‹œํ‚ค๊ธฐ ๋•Œ๋ฌธ์— ์ด๋ฏธ์ง€ ์™œ๊ณก๋จ 0. SPPNet R-CNN์˜ ๋‹จ์ ์„ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ๋‚˜.. 2023. 7. 6.
YOLO: You Only Look Once: Unified, Real-Time Object Detection 1. Intro What is objection Detection? object classification: ํ•˜๋‚˜์˜ ์ด๋ฏธ์ง€๋ฅผ ๋ณด๊ณ  ๊ทธ๊ฒƒ์ด ๊ฐœ์ธ์ง€ ๊ณ ์–‘์ธ์ง€๋ฅผ ํŒ๋‹จ object localization: ํ•˜๋‚˜์˜ ์ด๋ฏธ์ง€ ๋‚ด์—์„œ ๊ฐœ๋Š” ์–ด๋””์— ์œ„์น˜ํ•˜๋Š”์ง€ ํŒ๋‹จ → output: x,y,w,h object detection: ํ•˜๋‚˜์˜ ์ด๋ฏธ์ง€ ๋‚ด์—์„œ ์„œ๋กœ ๋‹ค๋ฅธ object๋ฅผ ๊ฐ๊ฐ ์ฐพ์•„๋‚ด๋Š” ๊ฒƒ ex) DPM, R-CNN one-stage vs two-stage detector one stage: localization+classification์„ ๋™์‹œ์— ์ˆ˜ํ–‰ex) conv๋ฅผ ํ†ต๊ณผํ•œ ํ›„, ๊ฐ grid cell ๋งˆ๋‹ค classification๊ฒฐ๊ณผ์™€ bounding box regression์„ ํ†ตํ•ด ๊ฒฐ๊ณผ ๋„์ถœ two stage:.. 2023. 7. 6.
Fast R-CNN 0. Fast R-CNN ๊ทธ๋ž˜์„œ ๋‚˜์˜จ ์นœ๊ตฌ๊ฐ€ fast R-CNN Selective Search input image๋ฅผ ๊ฐ€์ง€๊ณ  selective search ์ง„ํ–‰ image ์•ˆ์— ๊ฐ์ฒด๊ฐ€ ์žˆ์„๋ฒ•ํ•œ ํ›„๋ณด๊ตฐ๋“ค์„ ์ตœ๋Œ€ ex) 2000๊ฐœ ์„ ์ •ํ•จ ROI ์˜์—ญ ์ถ”์ถœ⇒ ์ด ๋•Œ 2000๊ฐœ์˜ ์˜์—ญ์„ ๋‹ค ์‚ฌ์šฉํ•˜์ง€ ์•Š์Œ(Hierarohical sampling)์ด๋ผ๊ณ  ํ•จex) input image๊ฐ€ 2๊ฐœ๊ณ , region์ด 128๋กœ ์žก์•˜๋‹ค๋ฉด 64๊ฐœ์˜ ์˜์—ญ๋งŒ ํ›„๋ณด ์˜์—ญ์œผ๋กœ ๊ฐ€์ ธ๊ฐ ⇒ ํ•œ ๋ฏธ๋‹ˆ ๋ฐฐ์น˜ ๋‹น์˜ ์ด๋ฏธ์ง€๋งŒํผ ๋‚˜๋ˆ ์ค€ ์• ๋“ค๋งŒ ์‚ฌ์šฉํ•œ๋‹ค CNN input image ํ•œ ์žฅ์„ ๊ทธ๋ƒฅ CNN ๊ตฌ์กฐ์— ๋„ฃ์–ด๋ฒ„๋ฆผ (conv+pooling์˜ ๋ฐ˜๋ณต ๊ตฌ๊ฐ„) CNN ๊ณ„์ธต ๋ฐ˜๋ณตํ•˜๋‹ค๊ฐ€ ๋งˆ์ง€๋ง‰ ๋ถ€๋ถ„์—์„œ์˜ ํ’€๋ง์„ ROI pooling์œผ๋กœ ์ง„ํ–‰ํ•จ ROI poo.. 2023. 7. 6.
Transformer 1. overall architecture 2. overall procedure encoder์˜ ๊ฒฝ์šฐ input ๋ฌธ์žฅ์„ ๋„ฃ๊ณ  embedding ๋ฒกํ„ฐ๋กœ ๋ฐ”๊ฟ”์คŒ positional encoding์„ ๋”ํ•ด์ฃผ์–ด ๊ฐ ๋‹จ์–ด์˜ ์ˆœ์„œ์— ๋Œ€ํ•œ ์ •๋ณด๋ฅผ ๋ถ€์—ฌํ•จ. ๋”ํ•ด์„œ multi-head attention์„ ์ˆ˜ํ–‰ ์ด ๋•Œ, ๊ฐ™์€ embedding์˜ ๊ฐ’์„ Q,K,V๋กœ ๋ถ„๋ฐฐ. (Q,K,V)๋Š” ์„œ๋กœ ๊ฐ™์€ ๊ฐ’. ex) head๊ฐ€ 3๊ฐœ๋ฉด, ๊ฐ Q,K,V์— ํ•ด๋‹นํ•˜๋Š” ๊ฐ€์ค‘์น˜ 3๊ฐœ์”ฉ ์กด์žฌํ•จ (Linear) ⇒ ์ด 9๊ฐœ์˜ ๋‹ค๋ฅธ ๊ฐ’์ด ์ƒ๊ธฐ๊ฒŒ ๋จ ์ด๋•Œ, V๋Š” encoding์˜ embedding์—์„œ ๋‚˜์˜จ ๊ฐ’์— ๊ฐ€์ค‘์น˜ ๊ณฑํ•œ ๊ฒƒ์„ ์˜๋ฏธ. ํ•˜๋‚˜์˜ head๋‹น Q์™€ K๋ฅผ ๊ณฑํ•ด์„œ softmax ํ•จ์ˆ˜๋ฅผ ๊ฑฐ์นœ ํ›„, V๊ฐ’๊ณผ ๊ณฑํ•จ ์ด ๊ฐ๊ฐ ๊ณฑํ•œ 3๊ฐœ์˜ head ๊ฐ’.. 2023. 7. 6.
728x90
๋ฐ˜์‘ํ˜•