๋ณธ๋ฌธ ๋ฐ”๋กœ๊ฐ€๊ธฐ
728x90
๋ฐ˜์‘ํ˜•

CV15

YOLOv4: Optimal Speed and Accuracy of Object Detection ๐Ÿ’ก 0. Abstract CNN์˜ ์ •ํ™•๋„๋ฅผ ํ–ฅ์ƒ์‹œํ‚ค๋Š” ๋‹ค์–‘ํ•œ ๊ธฐ๋Šฅ์ด ๋งŽ์ด ์กด์žฌํ•ฉ๋‹ˆ๋‹ค. ์ด๋Ÿฌํ•œ ๊ธฐ๋Šฅ๋“ค์˜ ์กฐํ•ฉ์„ ๋Œ€๊ทœ๋ชจ ๋ฐ์ดํ„ฐ์…‹์—์„œ ์‹ค์ œ๋กœ ํ…Œ์ŠคํŠธํ•˜๊ณ  ๊ฒฐ๊ณผ๋ฅผ ์ด๋ก ์ ์œผ๋กœ ์ •๋‹นํ™”ํ•˜๋Š” ๊ฒƒ์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค. ์ผ๋ถ€ ๊ธฐ๋Šฅ์€ ํŠน์ • ๋ชจ๋ธ์ด๋‚˜ ๋ฌธ์ œ์—๋งŒ ์ ์šฉ๋˜๊ฑฐ๋‚˜ ์†Œ๊ทœ๋ชจ ๋ฐ์ดํ„ฐ์…‹์—๋งŒ ์ ์šฉ๋  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ๊ทธ๋Ÿฌ๋‚˜ ๋ฐฐ์น˜ ์ •๊ทœํ™”(batch-normalization)์™€ ์ž”์ฐจ ์—ฐ๊ฒฐ(residual-connections)๊ณผ ๊ฐ™์€ ๊ธฐ๋Šฅ์€ ๋Œ€๋ถ€๋ถ„์˜ ๋ชจ๋ธ, ์ž‘์—… ๋ฐ ๋ฐ์ดํ„ฐ์…‹์— ์ ์šฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. ์ €ํฌ๋Š” ๊ฐ€์ค‘ ์ž”์ฐจ ์—ฐ๊ฒฐ(Weighted-Residual-Connections, WRC), ํฌ๋กœ์Šค ์Šคํ…Œ์ด์ง€ ๋ถ€๋ถ„ ์—ฐ๊ฒฐ(Cross-Stage-Partial-connections, CSP), ํฌ๋กœ์Šค ๋ฏธ๋‹ˆ ๋ฐฐ์น˜ ์ •๊ทœํ™”(Cross mini-Batch Norma.. 2023. 7. 9.
R-CNN 1. Intro R-CNN 'Rich feature hierarchies for accurate object detection and semantic segmentation'. R-CNN์€ region proposals์™€ CNN์ด ๊ฒฐํ•ฉ๋œ Regions with CNN์˜ ์•ฝ์ž๋กœ ์ง€์นญ (1) region proposals๋กœ object ์œ„์น˜๋ฅผ ์•Œ์•„๋‚ด๊ณ , ์ด๋ฅผ CNN์— ์ž…๋ ฅํ•˜์—ฌ class๋ฅผ ๋ถ„๋ฅ˜. (2) Larger data set์œผ๋กœ ํ•™์Šต๋œ pre-trained CNN์„ fine-tunning. 2. Overall architecture ์ž…๋ ฅ ์ด๋ฏธ์ง€์— Selective Search ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ ์šฉํ•˜์—ฌ bounding box(region proposal) 2000๊ฐœ๋ฅผ ์ถ”์ถœ. ์ถ”์ถœ๋œ bounding box๋ฅผ w.. 2023. 7. 6.
SPPNet 1. Intro ๊ธฐ์กด์—์„œ๋Š” ๊ณ ์ •๋œ ํฌ๊ธฐ์˜ ์ด๋ฏธ์ง€๋ฅผ input์œผ๋กœ ๋ฐ›์•˜์Œ ์™œ? : FC layer์—์„œ ๊ณ ์ •๊ธธ์ด ๋ฒกํ„ฐ๋งŒ ๋ฐ›์„ ์ˆ˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ ๋ฌธ์ œ์ ? : ํฌ๊ธฐ๊ฐ€ ๋‹ค ๋‹ค๋ฅธ ์ด๋ฏธ์ง€๋ฅผ ํ•œ ์‚ฌ์ด์ฆˆ๋กœ ํ†ต์ผํ•ด๋ฒ„๋ฆฌ๊ธฐ ๋•Œ๋ฌธ์— ์ด๋ฏธ์ง€์˜ ์™œ๊ณก์ด๋‚˜, ์•„๋ž˜ ์‚ฌ์ง„๊ณผ ๊ฐ™์ด ์ž˜๋ฆฌ๊ฑฐ๋‚˜ ์ด๋ฏธ์ง€๊ฐ€ ๊ณ ์žฅ๋‚จ. ํ•˜์ง€๋งŒ? : ์‚ฌ์‹ค FC layer์— ๋“ค์–ด๊ฐ€๊ธฐ ์ „๊นŒ์ง€๋Š” ์‚ฌ์ด์ฆˆ๊ฐ€ ์ œ๊ฐ๊ฐ ์ด์–ด๋„ ๊ดœ์ฐฎ์Œ ๊ทธ๋ž˜์„œ? : ์ด๋ฒˆ ๋…ผ๋ฌธ์—์„œ๋Š” ์ด๋Ÿฌํ•œ ๋ฌธ์ œ์ ์„ ๋ณด์™„ํ•œ “Saptial Pyramid Pooling layer”๋ฅผ ์„ค๋ช…. โ€ป CNN์ด ๊ณ ์ •๋œ ์ž…๋ ฅ ํฌ๊ธฐ๋ฅผ ํ•„์š”๋กœ ํ•˜๋Š” ์ด์œ  CNN์€ Convolutional layer + fc layer๋กœ ์ด๋ฃจ์–ด์ ธ ์žˆ์Œ ์ด๋•Œ conv์˜ ๊ฒฝ์šฐ, sliding window ๋ฐฉ์‹์œผ๋กœ ์ด๋™ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์ด๋ฏธ์ง€ ํฌ๊ธฐ๋ฅผ ์‹ ๊ฒฝ์“ฐ์ง€ ์•Š์•„๋„ ๋ชจ๋“ .. 2023. 7. 6.
Faster R-CNN 0. R-CNN ์ž…๋ ฅ ์ด๋ฏธ์ง€์— Selective Search ์•Œ๊ณ ๋ฆฌ์ฆ˜์„ ์ ์šฉํ•˜์—ฌ bounding box(region proposal) 2000๊ฐœ๋ฅผ ์ถ”์ถœ. ์ถ”์ถœ๋œ bounding box๋ฅผ warp(resize)ํ•˜์—ฌ CNN์— ์ž…๋ ฅ. fine tunning ๋˜์–ด ์žˆ๋Š” pre-trained CNN์„ ์‚ฌ์šฉํ•˜์—ฌ bounding box์˜ 4096์ฐจ์›์˜ ํŠน์ง• ๋ฒกํ„ฐ๋ฅผ ์ถ”์ถœ. ์ถ”์ถœ๋œ ํŠน์ง• ๋ฒกํ„ฐ๋ฅผ SVM์„ ์ด์šฉํ•˜์—ฌ class๋ฅผ ๋ถ„๋ฅ˜. bounding box regression์„ ์ ์šฉํ•˜์—ฌ bounding box์˜ ์œ„์น˜๋ฅผ ์กฐ์ •. non maximum supression์„ ์ง„ํ–‰ ⇒ ์ด ์นœ๊ตฌ์˜ ๋ฌธ์ œ์ : 1) ๊ฐœ๋Š๋ฆผ 2) ๋“ค์–ด๊ฐˆ ๋•Œ ์ด๋ฏธ์ง€ ํฌ๊ธฐ๋ฅผ ๊ณ ์ •์‹œํ‚ค๊ธฐ ๋•Œ๋ฌธ์— ์ด๋ฏธ์ง€ ์™œ๊ณก๋จ 0. SPPNet R-CNN์˜ ๋‹จ์ ์„ ํ•ด๊ฒฐํ•˜๊ธฐ ์œ„ํ•ด ๋‚˜.. 2023. 7. 6.
YOLO: You Only Look Once: Unified, Real-Time Object Detection 1. Intro What is objection Detection? object classification: ํ•˜๋‚˜์˜ ์ด๋ฏธ์ง€๋ฅผ ๋ณด๊ณ  ๊ทธ๊ฒƒ์ด ๊ฐœ์ธ์ง€ ๊ณ ์–‘์ธ์ง€๋ฅผ ํŒ๋‹จ object localization: ํ•˜๋‚˜์˜ ์ด๋ฏธ์ง€ ๋‚ด์—์„œ ๊ฐœ๋Š” ์–ด๋””์— ์œ„์น˜ํ•˜๋Š”์ง€ ํŒ๋‹จ → output: x,y,w,h object detection: ํ•˜๋‚˜์˜ ์ด๋ฏธ์ง€ ๋‚ด์—์„œ ์„œ๋กœ ๋‹ค๋ฅธ object๋ฅผ ๊ฐ๊ฐ ์ฐพ์•„๋‚ด๋Š” ๊ฒƒ ex) DPM, R-CNN one-stage vs two-stage detector one stage: localization+classification์„ ๋™์‹œ์— ์ˆ˜ํ–‰ex) conv๋ฅผ ํ†ต๊ณผํ•œ ํ›„, ๊ฐ grid cell ๋งˆ๋‹ค classification๊ฒฐ๊ณผ์™€ bounding box regression์„ ํ†ตํ•ด ๊ฒฐ๊ณผ ๋„์ถœ two stage:.. 2023. 7. 6.
Fast R-CNN 0. Fast R-CNN ๊ทธ๋ž˜์„œ ๋‚˜์˜จ ์นœ๊ตฌ๊ฐ€ fast R-CNN Selective Search input image๋ฅผ ๊ฐ€์ง€๊ณ  selective search ์ง„ํ–‰ image ์•ˆ์— ๊ฐ์ฒด๊ฐ€ ์žˆ์„๋ฒ•ํ•œ ํ›„๋ณด๊ตฐ๋“ค์„ ์ตœ๋Œ€ ex) 2000๊ฐœ ์„ ์ •ํ•จ ROI ์˜์—ญ ์ถ”์ถœ⇒ ์ด ๋•Œ 2000๊ฐœ์˜ ์˜์—ญ์„ ๋‹ค ์‚ฌ์šฉํ•˜์ง€ ์•Š์Œ(Hierarohical sampling)์ด๋ผ๊ณ  ํ•จex) input image๊ฐ€ 2๊ฐœ๊ณ , region์ด 128๋กœ ์žก์•˜๋‹ค๋ฉด 64๊ฐœ์˜ ์˜์—ญ๋งŒ ํ›„๋ณด ์˜์—ญ์œผ๋กœ ๊ฐ€์ ธ๊ฐ ⇒ ํ•œ ๋ฏธ๋‹ˆ ๋ฐฐ์น˜ ๋‹น์˜ ์ด๋ฏธ์ง€๋งŒํผ ๋‚˜๋ˆ ์ค€ ์• ๋“ค๋งŒ ์‚ฌ์šฉํ•œ๋‹ค CNN input image ํ•œ ์žฅ์„ ๊ทธ๋ƒฅ CNN ๊ตฌ์กฐ์— ๋„ฃ์–ด๋ฒ„๋ฆผ (conv+pooling์˜ ๋ฐ˜๋ณต ๊ตฌ๊ฐ„) CNN ๊ณ„์ธต ๋ฐ˜๋ณตํ•˜๋‹ค๊ฐ€ ๋งˆ์ง€๋ง‰ ๋ถ€๋ถ„์—์„œ์˜ ํ’€๋ง์„ ROI pooling์œผ๋กœ ์ง„ํ–‰ํ•จ ROI poo.. 2023. 7. 6.
728x90
๋ฐ˜์‘ํ˜•