
BodyNet: Volumetric Inference of 3D Human Body Shapes

by 제룽 2023. 8. 3.

 

 

What is BodyNet?

: A network that extracts 2D pose and 2D body part segmentation from a single image, uses those two predictions to learn 3D pose, and then combines all three with the RGB input to build a volumetric 3D body shape

: Trained end to end

<Training procedure>
1. The input RGB image first passes through subnetworks for 2D pose estimation and 2D body part segmentation
2. Train the 2D pose and segmentation networks
3. Freeze the trained 2D pose and segmentation weights and train the 3D pose network
4. Freeze all previous network weights and train the 3D shape network
5. Continue training the shape network with the additional re-projection loss to refine the volumetric shape estimate
6. Fine-tune all network weights end to end with the combined loss
7. For evaluation, fit the SMPL model to the volumetric prediction
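
The staged schedule above can be written down as a small table of which subnetworks receive weight updates at each step. This is only an illustrative sketch: the names `pose2d`, `segm2d`, `pose3d`, `shape3d`, the `STAGES` table, and both helper functions are made up for this note and are not taken from the BodyNet code.

```python
# Hypothetical subnetwork names, in pipeline order.
SUBNETS = ("pose2d", "segm2d", "pose3d", "shape3d")

# One entry per training step (steps 2-6 of the list above).
STAGES = [
    {"step": 2, "train": {"pose2d", "segm2d"}, "loss": "2D pose / segmentation"},
    {"step": 3, "train": {"pose3d"}, "loss": "3D pose"},
    {"step": 4, "train": {"shape3d"}, "loss": "voxel"},
    {"step": 5, "train": {"shape3d"}, "loss": "voxel + re-projection"},
    {"step": 6, "train": set(SUBNETS), "loss": "combined"},
]

def trainable(stage):
    """Subnetworks whose weights are updated in this stage."""
    return [name for name in SUBNETS if name in stage["train"]]

def not_updated(stage):
    """Subnetworks that are frozen (or not yet in use) in this stage."""
    return [name for name in SUBNETS if name not in stage["train"]]

for stage in STAGES:
    print(stage["step"], stage["loss"], "| train:", trainable(stage),
          "| fixed:", not_updated(stage))
```

Only the last stage updates every subnetwork at once, which is what "end to end" refers to in step 6.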

 


 

0. ABSTRACT

: Human shape estimation is an important task for video, animation, and the fashion industry

: However, predicting 3D body shape from an image is very difficult because of factors such as viewpoint, body shape, and clothing

: In addition, existing methods try to fit a body model and rely on specific priors on pose and shape

➡️ The paper proposes BodyNet, which infers 3D shape directly from a single image

: Trained end to end with

(i) a 3D volumetric loss,

(ii) a multi-view re-projection loss, and

(iii) intermediate supervision of 2D pose, 2D body part segmentation, and 3D pose

: For evaluation, the SMPL model is fitted to the BodyNet output, and single-view 3D human shape estimation performance is measured on the recent SURREAL [33] and Unite the People [34] datasets

 


 

1. Introduction

: 3D shape estimation in the single-view setting had not been actively studied

: It calls for large-scale datasets and has to deal with a high-dimensional output

➡️ The paper proposes a volumetric representation based on a 3D voxel grid, adds a re-projection loss, and makes use of segmentation

 

<Contribution>

  1. Addresses single-view 3D human shape estimation and proposes a volumetric representation for this task
  2. Investigates several network architectures and proposes BodyNet, an end-to-end trainable network that combines a multi-view re-projection loss with intermediate network supervision of 2D pose, 2D body part segmentation, and 3D pose
  3. The network is differentiable and also provides volumetric body part segmentation

 


 

3. BodyNet

: Predicts 3D human body shape from a single image and consists of four subnetworks, first trained independently, that predict 2D pose, 2D body part segmentation, 3D pose, and 3D shape

3.1 Volumetric inference for 3D human shape

: A 3D voxel grid is defined

  • ※ voxel

    : A value on a regular grid in three-dimensional space.

    : A portmanteau of "volume" and "pixel"

    ➡️ It can be thought of as the 3D counterpart of a pixel

: The body is converted (voxelized) into a fixed-resolution grid

: Assuming orthographic projection, the volume is rescaled so that its xy-plane is spatially aligned with the 2D segmentation mask of the input image (that is, the 2D segmentation is placed at the matching position in 3D)

: After rescaling, the body is centered along the z-axis (this is what makes it 3D)

: The remaining space is padded with zeros
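
As a rough illustration of the alignment and padding described above, the sketch below fills a fixed-resolution occupancy grid from a handful of 3D points. It is a simplification under stated assumptions: it uses points instead of a mesh, it assumes x and y are already normalized to [0, 1) image coordinates with z in the same units (the orthographic assumption), and the function name `voxelize` is hypothetical.

```python
def voxelize(points, res):
    """Fill a res x res x res occupancy grid, indexed [z][y][x].

    Cells start at 0 (the zero padding) and are set to 1 where a point falls.
    """
    grid = [[[0] * res for _ in range(res)] for _ in range(res)]
    zs = [p[2] for p in points]
    z_mid = (min(zs) + max(zs)) / 2.0  # middle of the body's depth extent
    for x, y, z in points:
        ix = min(int(x * res), res - 1)  # xy follow the image coordinates
        iy = min(int(y * res), res - 1)
        iz = int(round((z - z_mid) * res)) + res // 2  # center the body along z
        if 0 <= iz < res:
            grid[iz][iy][ix] = 1
    return grid

# Two points on the same image pixel but at different depths.
grid = voxelize([(0.5, 0.5, 2.0), (0.5, 0.5, 2.25)], res=8)
```

Because only the depth relative to the body's own center is kept, the absolute distance to the camera (2.0 here) does not affect the grid.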

 

Binary cross-entropy loss

: This loss is used to separate the body from the background

: In addition, a multi-class cross-entropy loss is used for 3D body part segmentation: six parts are defined (head, torso, left/right leg, left/right arm), and together with the background this becomes a 7-class classification problem

➡️ This makes it possible to infer 3D body parts directly, without going through costly SMPL model fitting
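
The binary cross-entropy over voxels can be sketched in a few lines. This is a minimal dependency-free version, assuming the ground-truth occupancy and the predicted probabilities are given as flat lists of equal length; the function name `voxel_bce` and the `eps` clamp are choices made for this sketch, and the sign follows the usual negative log-likelihood convention.

```python
import math

def voxel_bce(target, pred, eps=1e-7):
    """Binary cross-entropy summed over all voxels.

    target: 0/1 occupancy per voxel, pred: predicted probability per voxel.
    """
    total = 0.0
    for v, p in zip(target, pred):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += v * math.log(p) + (1 - v) * math.log(1.0 - p)
    return -total

# A confident, correct prediction costs less than an uncertain one.
good = voxel_bce([1, 0], [0.9, 0.1])
unsure = voxel_bce([1, 0], [0.5, 0.5])
```

The 7-class part segmentation replaces the two-way term with a softmax cross-entropy per voxel, but the summation over the grid is the same.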


3.2 Multi-view re-projection loss on the silhouette

: When reconstructing in 3D, the confidence for the arms and legs, which are far from the body center, tends to be lower

➡️ An additional 2D re-projection loss is therefore used to increase the importance of boundary voxels (a multi-view re-projection term is needed, in the sense of re-learning the body from several viewing angles)

: Orthographic projection is assumed

See the gray part of the figure (re-projection loss)

1) The front-view projection $\hat{S}^{FV}$ is obtained by projecting the volumetric grid onto the image with a max operator along the z-axis.

2) The side-view projection $\hat{S}^{SV}$ is obtained with a max operator along the x-axis

Formula for the final re-projection term
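
The two projections are plain max reductions over one axis of the voxel grid. The sketch below assumes the volume is a nested list indexed `[z][y][x]`; the function names are hypothetical and the code is written without an array library only to keep it self-contained.

```python
def project_front(vol):
    """Front view: max along z. Result is indexed [y][x]."""
    depth, height, width = len(vol), len(vol[0]), len(vol[0][0])
    return [[max(vol[z][y][x] for z in range(depth)) for x in range(width)]
            for y in range(height)]

def project_side(vol):
    """Side view: max along x. Result is indexed [y][z]."""
    depth, height, width = len(vol), len(vol[0]), len(vol[0][0])
    return [[max(vol[z][y][x] for x in range(width)) for z in range(depth)]
            for y in range(height)]

# 3 x 2 x 2 volume (z, y, x) with one confident voxel at z=2, y=0, x=0.
vol = [[[0.1] * 2 for _ in range(2)] for _ in range(3)]
vol[2][0][0] = 0.9
front = project_front(vol)  # 2 x 2 image
side = project_side(vol)    # 2 x 3 image
```

In the paper, each projected silhouette is then compared with the corresponding ground-truth silhouette using a binary cross-entropy, so a single confident voxel along a ray is enough to mark that pixel as foreground.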

 


3.3 Multi-task learning with intermediate supervision

: The input to the 3D shape subnetwork is formed by combining the RGB image with the 2D pose, segmentation, and 3D pose predictions

: The architecture of each subnetwork is based on the stacked hourglass network [1]

  • ※ stacked hourglass network

    : Roughly, residual blocks plus a top-down pathway

    : Each block is a residual unit

    : The input is downsampled through residual units until it reaches the lowest resolution

    : After reaching the lowest resolution, it is upsampled back to the original input size (the original hourglass paper uses nearest-neighbor upsampling)

    : Features of the same resolution are also combined by element-wise addition

    : As a result, local information such as faces and hands and global information such as the whole body, the person's orientation, and the arrangement of the arms, which live at different resolutions, can reportedly be used together.
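
The data flow of a single hourglass (downsample, recurse, upsample, add the skip branch) can be shown with a toy 1D version. This is not the real architecture: there are no learned residual units, pooling is a plain max over pairs, upsampling is nearest-neighbor, and the function name `hourglass_1d` is made up for this sketch.

```python
def hourglass_1d(x, depth):
    """Toy hourglass on a list whose length is divisible by 2**depth."""
    if depth == 0:
        return list(x)  # lowest resolution reached
    skip = x  # branch kept at the current resolution
    down = [max(x[i], x[i + 1]) for i in range(0, len(x), 2)]  # pool by 2
    inner = hourglass_1d(down, depth - 1)  # process at lower resolution
    up = [v for v in inner for _ in range(2)]  # nearest-neighbor upsampling
    return [s + u for s, u in zip(skip, up)]  # element-wise addition

out = hourglass_1d([1, 2, 3, 4], depth=2)
print(out)
```

Each output element mixes its own fine-scale value with values pooled over progressively larger neighborhoods, which is the multi-scale combination the notes above describe.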

 

  1. 2D Pose

: A heatmap representation of the 2D pose is used

: For each body joint, the network predicts a fixed-variance Gaussian centered at the joint's image location

: The final joint location is the pixel index with the maximum value in each output channel

: The first two stacks of the hourglass network are used

: 16 body joints are predicted
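
A short sketch of the heatmap encoding and the argmax decoding for one joint. The heatmap size, `sigma`, and both function names are assumptions made for illustration; a real network would output one such channel per joint (16 here).

```python
import math

def gaussian_heatmap(height, width, cx, cy, sigma=1.0):
    """Target heatmap: a fixed-variance Gaussian centered at pixel (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]

def decode_joint(heatmap):
    """Joint location = (x, y) index of the maximum heatmap value."""
    _, y, x = max((v, y, x)
                  for y, row in enumerate(heatmap)
                  for x, v in enumerate(row))
    return x, y

heatmap = gaussian_heatmap(8, 8, cx=5, cy=2)
joint_xy = decode_joint(heatmap)
```

Decoding is applied independently to every output channel, so the predicted pose is simply the list of per-channel maxima.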

 

  2. 2D Part Segmentation

: The architecture is similar to the 2D pose network and again uses the first two stacks

: For the input RGB image, the network predicts one heatmap per body part

 

  3. 3D Pose

: Estimating 3D joint locations from a single image is an inherently ambiguous problem

: The camera intrinsics are assumed to be known, and the 3D pose is predicted in the camera coordinate system

: The 2D heatmaps are extended to 3D heatmaps, so that each joint's 3D location is represented by a 3D Gaussian

: For each joint, the network predicts a fixed-resolution volume containing a single 3D Gaussian centered at the joint location

➡️ The xy dimensions of the 3D grid line up with the image coordinates and therefore give the 2D joint location, while the z dimension represents depth

: The voxel grid is assumed to be aligned with the 3D body so that the root joint corresponds to the center of the 3D volume
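
Decoding a joint from a 3D heatmap works like the 2D case with one extra axis. The sketch below assumes the volume is indexed `[z][y][x]`, that the root joint sits at the middle z slice, and that each z bin spans a fixed metric depth `depth_bin`; that parameter and the function name are hypothetical.

```python
def decode_joint_3d(volume, depth_bin=1.0):
    """Return (x, y, depth relative to the root) of the strongest voxel."""
    _, iz, iy, ix = max((v, z, y, x)
                        for z, plane in enumerate(volume)
                        for y, row in enumerate(plane)
                        for x, v in enumerate(row))
    z_center = len(volume) // 2  # slice assumed to hold the root joint
    return ix, iy, (iz - z_center) * depth_bin

# 4 x 4 x 4 volume with the peak one slice behind the center.
volume = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
volume[3][1][2] = 1.0
joint = decode_joint_3d(volume, depth_bin=0.5)
```

The x and y indices can be read directly as image coordinates, while z only gives depth relative to the root joint.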

 

<Final training procedure>

Final loss

(i) Train the 2D pose and segmentation networks.

(ii) Train the 3D pose network with the 2D pose and segmentation network weights fixed.

(iii) Fix all previous network weights and train the 3D shape network.

(iv) Then continue training the shape network with the additional re-projection loss.

(v) Finally, fine-tune all network weights end to end with the combined loss.
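
For the last step, a combined multi-task loss is typically a weighted sum of the per-task losses. The sketch below shows only that arithmetic; the task names, the loss values, and the weights are placeholders invented for this example and are not the values used in the paper.

```python
def combined_loss(losses, weights):
    """Weighted sum of per-task losses (both dicts share the same keys)."""
    return sum(weights[name] * value for name, value in losses.items())

# Placeholder per-task loss values and balancing weights.
losses = {"pose2d": 2.0, "segm2d": 1.0, "pose3d": 4.0, "shape3d": 0.5}
weights = {"pose2d": 0.5, "segm2d": 1.0, "pose3d": 0.25, "shape3d": 2.0}
total = combined_loss(losses, weights)
```

In this made-up example the weights are chosen so that every task contributes equally to the total, which is one common way to keep a single task from dominating the gradients during joint fine-tuning.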

 


3.4 Fitting a parametric body model

: The SMPL model is used for evaluation

(This appears to be the formula used to measure the distance to the SMPL model)
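
One common way to measure how far a fitted model surface is from a predicted surface is a symmetric nearest-neighbor (Chamfer-style) distance between two point sets. The sketch below shows that idea only. It is an assumption that this matches the spirit of the fitting objective here; it leaves out any weights, joint terms, and the optimization over SMPL pose and shape parameters, and the function name is hypothetical.

```python
def chamfer_distance(points_a, points_b):
    """Sum of squared nearest-neighbor distances in both directions."""
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    a_to_b = sum(min(sq_dist(p, q) for q in points_b) for p in points_a)
    b_to_a = sum(min(sq_dist(q, p) for p in points_a) for q in points_b)
    return a_to_b + b_to_a

# Hypothetical vertices: two from the model surface, one from the prediction.
model_vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
predicted_vertices = [(0.0, 0.0, 0.0)]
distance = chamfer_distance(model_vertices, predicted_vertices)
```

A fitting procedure would adjust the model parameters to make this distance small; the brute-force nearest-neighbor search here is quadratic and only suitable for small point sets.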

 


 

4. Experiments

4.1 Datasets and evaluation measures

: The SURREAL and Unite the People datasets are used

: Evaluation metric: IoU
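
Intersection over union on voxels can be computed as below. This is a minimal sketch that assumes flat lists of predicted probabilities and ground-truth occupancies; the 0.5 threshold and the function name `voxel_iou` are choices made for this example.

```python
def voxel_iou(pred, target, threshold=0.5):
    """IoU between thresholded prediction and ground-truth occupancy."""
    inter = 0
    union = 0
    for p, t in zip(pred, target):
        p_on, t_on = p > threshold, t > threshold
        inter += int(p_on and t_on)
        union += int(p_on or t_on)
    return inter / union if union else 1.0  # two empty grids count as a match

iou = voxel_iou([0.9, 0.8, 0.2, 0.1], [1, 0, 1, 0])
```

Here one voxel is occupied in both grids and three are occupied in at least one, so the score is one third.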

SURREAL
Unite the People

4.2 Alternative methods

: Fitting to the BodyNet inputs (SMPLify++) produces shapes close to the average shape.

: BodyNet learns how the actual shape observed in the image deviates from the average deformable shape model

: "Intermediate representations" here means 2D pose, 2D segmentation, and 3D pose

: Even when a 2D prediction fails, the 3D body shape is inferred from the other cues, which complement one another, so the 3D shape can still be recovered.

 

4.3 Effect of additional inputs

: The best results come from using the 3D pose network that was already trained with the additional 2D pose and segmentation inputs

: Using 3D pose and 2D segmentation as intermediate representations gives better results than RGB alone

 

4.4 Effect of re-projection error and end-to-end multi-task training

: Performance was best when both the front-view and the side-view re-projection losses were used

 

4.5 Comparison to the state of the art on Unite the People

<Omitted>

4.6 3D body part segmentation

: On a modern GPU, BodyNet successfully produces foreground voxels and per-limb voxels in 0.28 s and 0.58 s per image, respectively

: It is reported to be the first end-to-end approach to 3D body part labeling from a single image

: 3D body parts are inferred directly by the network, without iterative fitting of a deformable model, and the results are successful

 

 

<References>

https://deep-learning-study.tistory.com/617

https://ko.wikipedia.org/wiki/복셀

