
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization

by 제룽 2023. 8. 13.

What is CAM (Class Activation Maps)?

Global Max Pooling (GMP) vs Global Average Pooling (GAP)

: Global Max Pooling (GMP) keeps only the largest value in the whole feature map.

: Global Average Pooling (GAP), in contrast, takes every value into account and uses their mean.
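
The difference between the two pooling operations can be seen on a tiny example. This is only a sketch: the two 2x2 feature maps below are made-up values, not taken from the paper.

```python
import numpy as np

# Hypothetical stack of 2 feature maps, each 2x2 (values chosen for illustration).
feature_maps = np.array([
    [[1.0, 2.0],
     [3.0, 6.0]],
    [[0.0, 4.0],
     [0.0, 0.0]],
])

# Global Max Pooling: keep only the largest value of each map.
gmp = feature_maps.max(axis=(1, 2))

# Global Average Pooling: average every value of each map.
gap = feature_maps.mean(axis=(1, 2))

print(gmp)  # [6. 4.]
print(gap)  # [3. 1.]
```

Either way, each feature map is reduced to a single number; GAP just lets every location contribute to it.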

 

: A typical CNN flattens the last feature map into a 1-D vector, passes it through a fully connected network, and classifies with softmax.

: The FC layer greatly increases the number of parameters, which raises the risk of overfitting, and flattening discards the location information of the objects present in the (pre-pooling) feature map.

 

: CAM does not flatten; it replaces that step with Global Average Pooling.

: GAP acts as a regularizer that helps prevent overfitting, and it lets the network keep location information.

: In the figure above there are 4 feature maps, so GAP produces 4 feature values.

: <Architecture required for CAM> GAP connected to a softmax layer ⇒ class probabilities
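
To get a feel for the parameter-count argument, here is a rough back-of-the-envelope calculation. The sizes (a 7x7x512 final feature map and 1000 classes) are assumptions picked for illustration, not numbers from the post.

```python
# Assumed sizes for illustration only: a 7x7x512 final feature map and 1000 classes.
height, width, channels, num_classes = 7, 7, 512, 1000

# Flatten -> FC: every spatial position of every channel gets its own weight per class.
flatten_fc_params = height * width * channels * num_classes + num_classes

# GAP -> FC: each channel is reduced to one number first, so one weight per channel per class.
gap_fc_params = channels * num_classes + num_classes

print(flatten_fc_params)  # 25089000
print(gap_fc_params)      # 513000
```

Under these assumptions the GAP head has roughly 49 times fewer parameters than the flatten head, which is the sense in which GAP works as a regularizer.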

 

Class Activation Mapping

: Multiplying each feature map by the weight (w) that links it to a particular class and summing the results gives, for every coordinate (x, y), its influence on that class ⇒ this map is the CAM

: Applying CAM for each class extracts the image coordinates that influence that class

: The CAM is computed at the last convolution layer, so upsampling the final CAM to the size of the input image shows which region of the input is related to class c (dog)
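
The weighted sum described above can be sketched in a few lines of NumPy. The shapes and the random values are assumptions; `A` and `W` are hypothetical stand-ins for the last conv feature maps and the weights of the layer after GAP.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: K=4 feature maps of size 3x3 from the last conv layer, 2 classes.
A = rng.random((4, 3, 3))          # feature maps A^k
W = rng.standard_normal((2, 4))    # W[c, k]: weight linking GAP(A^k) to class c

# Class score as the GAP + linear layer computes it (before softmax).
gap = A.mean(axis=(1, 2))          # shape (4,)
scores = W @ gap                   # shape (2,)

# CAM for class c: weighted sum of feature maps, one value per location (x, y).
c = 1
cam = np.tensordot(W[c], A, axes=1)   # shape (3, 3)

# Averaging the CAM over all locations recovers the class score.
print(np.isclose(cam.mean(), scores[c]))  # True
```

The last line is the reason the map is meaningful: the class score is just the spatial average of the CAM, so each location's value is its share of the score.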

 

Learning Deep Features for Discriminative Localization

<Key features of CAM>

  • Weakly-supervised object localization: a CNN trained only for object classification can both classify the image and localize the object.
  • Visualizing CNNs
    : (Figure) CAMs visualized using Global Average Pooling
    : The images show that the network finds the regions where objects are located while classifying each image

 

 

1 INTRODUCTION

: There is a tradeoff between a model's accuracy and its simplicity (interpretability).

: The simpler the model, the easier it is to interpret; the more complex, the harder. To interpret a model without giving up accuracy, it is important to find the right balance between the two

ex) When training fails or the model produces a wrong output, we should be able to explain why the model gave that result or what went wrong, but we cannot ⇒ “Black box”

: The paper argues that a model should be able to explain why it predicted what it predicted

: Only a transparent model is one we can trust and understand!

➡️ Let's look inside the “Black Box”!

What makes a good visual explanation?

  1. Class-discriminative: it should be able to tell different classes apart
  2. High-resolution: it should show enough detail to identify the object and reveal its features

: Left) Shows features well but does not distinguish between classes

: Right) Also distinguishes between classes

➡️ For the cat class, the cat region is highlighted and the dog region is not

➡️ The cat's stripes are highlighted too, which tells us what the model used to predict the specific cat breed

Contribution

  1. Grad-CAM: a class-discriminative localization technique; its visualizations also make it possible to diagnose failures (that is, to explain why the model failed)
  2. Applicable to a variety of downstream tasks such as top-performing classification, captioning, and VQA models
  • ※ Top-performing classification, captioning, VQA?

    : Top-performing classification: models that take an image or other data as input and predict a class label.

    : Captioning: models that generate a description of an image or video.

    : VQA (Visual Question Answering): models that generate an answer to a question about an image.

  3. Human studies show that Grad-CAM is class-discriminative and helps establish trust. ⇒ Even untrained users (people unfamiliar with the field) could tell a ‘stronger’ model from a ‘weaker’ one by looking at the visualizations (“this model discriminates well, it must be well trained” vs. “I can't say why, but this model doesn't perform well”)

2 RELATED WORK

<Introduces the methods and limitations of prior work ⇒ the paper builds on them>

1. Visualizing CNNs.

  1. Many studies visualize influence (importance) at the ‘pixel’ level, but these are not class-discriminative

     ➡️ Even though cat and dog are predicted separately, the two classes look nearly the same in the visualization

  2. Other work tries to find what input image would be synthesized when a particular unit (neuron) is maximally activated

    ➡️ High-resolution and class-discriminative, but limited in that it visualizes the model as a whole rather than a prediction for a single image

2. Assessing Model Trust.

: Inspired by the paper [Why Should I Trust You?], which studies trust in models, this paper evaluates Grad-CAM visualizations through human studies

➡️ Grad-CAM can play an important role as a tool for evaluating and placing trust in automated systems

3. Aligning Gradient-based Importances.

: The paper [Choose your neuron] connects gradient-based neuron importance to human class-specific domain knowledge and uses it to learn classifiers for novel classes

➡️ Grad-CAM applies this gradient-based notion of importance

4. Weakly-Supervised localization.

: The most relevant approach for localizing objects in an image using only class labels is CAM (Class Activation Map)

: CAM applies GAP to the last feature maps of the CNN and multiplies each pooled value by a weight (w) to compute the class score

: Linearly combining the feature maps with the same weights w gives the class activation map, which visualizes how important (influential) each feature map is for the class score

Biggest drawback of CAM: it can only be used with the architecture conv feature map → GAP → softmax. In other words, the model has to be restructured (and retrained).

➡️ This paper therefore introduces a “gradient-based approach” that works without modifying or restructuring the architecture

 


3 Grad-CAM

3-1) Grad-CAM

: In a CNN, shallow layers capture low-level features, and deeper layers capture increasingly semantic, class-specific information

: Grad-CAM uses the gradients flowing into the last convolutional layer of the CNN to measure how much each neuron influences (= how important it is for) the model's prediction

: Why the last one? The last convolutional layer holds the most high-level semantic information while still retaining spatial information.

: Influence can of course be computed for any layer, since only gradients are needed (but it is less meaningful for early layers, which carry little class-specific information)

3-1-0) Overall Architecture

3-1-1) Importance of each feature map

: Differentiate y^c (the class score before softmax) with respect to the feature map

: i, j are the pixel coordinates within the feature map

: k is the index of the feature map

➡️ That is, differentiating y^c with respect to each pixel of feature map A^k and then applying GAP to those gradients gives the importance weight α_k^c

= the influence of the k-th feature map on the class score y^c

= how much, on average, feature map k contributed to the model's prediction y

※ y^c does not have to be a class score; any differentiable output of a downstream task will do
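
The equation for this step is not reproduced in the text here. As defined in the Grad-CAM paper, with Z the number of pixels in feature map A^k, the importance weight can be written as:

```latex
\alpha_k^c = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^c}{\partial A_{ij}^k}
```

The inner derivative is what backpropagation returns for each pixel, and the leading 1/Z with the double sum is the global average pooling of those gradients.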

 

3-1-2) Weighted Combination

: Multiply each k-th feature map by the weight (influence) obtained above, sum over k, and apply ReLU

: If the weighted sum is negative at some location, that value is dropped and only the positive part is visualized

: For example, if we are explaining the dog class, negative values likely belong to other things such as a person or the background.
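
The two steps (pool the gradients into weights, then take a ReLU of the weighted sum of feature maps) can be sketched as below. This is a minimal NumPy sketch, not the authors' implementation: `grad_cam` is a hypothetical helper, and it assumes the gradients have already been obtained from some autograd framework.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Grad-CAM map from feature maps A^k and gradients dy^c/dA^k, both shaped (K, H, W)."""
    # Step 1: global-average-pool the gradients to get one weight alpha_k per feature map.
    alphas = gradients.mean(axis=(1, 2))                        # shape (K,)
    # Step 2: weighted combination of the feature maps, then ReLU to drop negative evidence.
    weighted_sum = np.tensordot(alphas, activations, axes=1)    # shape (H, W)
    return np.maximum(weighted_sum, 0.0)

# Toy inputs (made up): two 2x2 feature maps with constant gradients +1 and -1.
A = np.array([[[1.0, 0.0], [0.0, 2.0]],
              [[0.0, 3.0], [1.0, 0.0]]])
dY_dA = np.array([[[1.0, 1.0], [1.0, 1.0]],
                  [[-1.0, -1.0], [-1.0, -1.0]]])

print(grad_cam(A, dY_dA))
# [[1. 0.]
#  [0. 2.]]
```

In the toy example the second feature map has a negative weight, so the locations where it dominates end up below zero and are removed by the ReLU.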

 

3-1-3) Grad-CAM generalizes CAM

①

: Left) The original CAM formula: weight × GAP of each feature map

: Middle) F^k, the mean of all pixels of the k-th feature map (its GAP output)

: Right) The result of substituting F^k into the CAM formula

②

: Left) The derivative of the class score Y^c with respect to F^k, the mean of the feature map

: Middle) Differentiating F^k with respect to A^k_ij leaves only 1/Z. Substituting this into the left gives the expression on the right

: Right) Since Y^c is linear in F^k, the derivative of Y^c with respect to F^k is w_k^c, which gives the final expression

③

: Left) Sum both sides over every pixel (i, j)

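: Z is the number of pixels in the feature map and w_k^c does not depend on (i, j), so summing the constant w_k^c over all pixels just gives Z · w_k^c. That is where the Z on the left-hand side comes from.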

➡️ The upshot is that the result is the same as CAM
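
For reference, the three steps ①, ②, ③ can be written out as follows. This reconstruction follows the derivation in the Grad-CAM paper, using the symbols from the notes above (F^k for the GAP output, Z for the number of pixels).

```latex
\begin{align}
% (1) CAM score, with F^k the GAP output of feature map k
Y^c &= \sum_k w_k^c \cdot \frac{1}{Z}\sum_i\sum_j A_{ij}^k
     = \sum_k w_k^c F^k,
     \qquad F^k = \frac{1}{Z}\sum_i\sum_j A_{ij}^k \\
% (2) chain rule, using dF^k/dA^k_ij = 1/Z and dY^c/dF^k = w_k^c
\frac{\partial Y^c}{\partial F^k}
    &= \frac{\partial Y^c / \partial A_{ij}^k}{\partial F^k / \partial A_{ij}^k}
     = \frac{\partial Y^c}{\partial A_{ij}^k}\cdot Z
     \quad\Rightarrow\quad
     w_k^c = Z\cdot\frac{\partial Y^c}{\partial A_{ij}^k} \\
% (3) sum both sides over all Z pixels; w_k^c is constant in (i, j)
\sum_i\sum_j w_k^c &= \sum_i\sum_j Z\cdot\frac{\partial Y^c}{\partial A_{ij}^k}
     \quad\Rightarrow\quad
     Z\, w_k^c = Z \sum_i\sum_j \frac{\partial Y^c}{\partial A_{ij}^k}
     \quad\Rightarrow\quad
     w_k^c = \sum_i\sum_j \frac{\partial Y^c}{\partial A_{ij}^k}
\end{align}
```

Up to the constant factor 1/Z, which is normalized away when the map is visualized, the CAM weight w_k^c is therefore the same as the Grad-CAM weight α_k^c.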

 

: If the CNN ends with GAP, Grad-CAM gives the same result as CAM (= the paper calls Grad-CAM a generalization of CAM)

: Because Grad-CAM obtains its weights from gradients, it can visualize architectures that have no GAP layer

(CAM takes its weights from the layer that follows GAP, whereas Grad-CAM obtains them from gradients via backpropagation)
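
One way to sanity-check this equivalence is numerically. The sketch below assumes a tiny GAP + linear head with made-up sizes and random values, and uses central finite differences as a stand-in for backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)
K, H, W = 3, 4, 4                  # hypothetical sizes
Z = H * W                          # number of pixels per feature map
A = rng.random((K, H, W))          # feature maps A^k
w_c = rng.standard_normal(K)       # weights of the linear layer for class c

def score(feature_maps):
    """Class score Y^c of a GAP + linear head (the CAM architecture)."""
    return float(w_c @ feature_maps.mean(axis=(1, 2)))

# Approximate dY^c/dA^k_ij with central finite differences (stand-in for backprop).
eps = 1e-6
grads = np.zeros_like(A)
for idx in np.ndindex(*A.shape):
    step = np.zeros_like(A)
    step[idx] = eps
    grads[idx] = (score(A + step) - score(A - step)) / (2 * eps)

alphas = grads.mean(axis=(1, 2))                 # Grad-CAM weights
cam = np.tensordot(w_c, A, axes=1)               # CAM, before any ReLU
grad_cam_raw = np.tensordot(alphas, A, axes=1)   # Grad-CAM, before ReLU

print(np.allclose(alphas * Z, w_c, atol=1e-6))        # True
print(np.allclose(grad_cam_raw * Z, cam, atol=1e-6))  # True
```

Both checks pass because, for this architecture, every pixel's gradient is w_k^c / Z, so the two maps differ only by the constant factor Z.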

 


3-2) Guided Grad-CAM

: Grad-CAM is class-discriminative and localizes the image regions relevant to the prediction, but it cannot show fine-grained detail the way pixel-space gradient visualizations do

: That is, it cannot show which details made the network predict tiger cat.

  • Guided Backpropagation

: During backpropagation through each ReLU, positions where the forward feature map was 0 or below are zeroed, so only positive values contribute to the backpropagated signal

: Negative gradients are also discarded, which produces a much cleaner image

: Not class-discriminative, but it shows features (the cat's stripes, ears, eyes) in fine detail
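
The rule for a single ReLU can be sketched as follows. `guided_relu_backward` is a hypothetical helper written for illustration; real implementations usually attach this rule to every ReLU through a framework's backward hooks.

```python
import numpy as np

def guided_relu_backward(forward_input, grad_output):
    """Guided Backpropagation rule for a single ReLU."""
    # Ordinary ReLU backward: pass gradient only where the forward input was positive.
    positive_forward = forward_input > 0
    # Guided addition: also drop gradients that are themselves negative.
    positive_grad = grad_output > 0
    return np.where(positive_forward & positive_grad, grad_output, 0.0)

x = np.array([-1.0, 2.0, 3.0, -4.0])    # values entering the ReLU in the forward pass
g = np.array([5.0, -6.0, 7.0, -8.0])    # gradient arriving from the layer above
print(guided_relu_backward(x, g))  # [0. 0. 7. 0.]
```

Only the third position survives: it is the only one where both the forward input and the incoming gradient are positive.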

 

➡️ Grad-CAM + Guided Backprop = Guided Grad-CAM

The left part of the figure shows Guided Grad-CAM

: The two are fused by an element-wise product of Guided Backprop and Grad-CAM

: Guided Backprop is pixel-level and has the same size as the input, whereas Grad-CAM has the size of the feature map, so Grad-CAM is first up-sampled with bilinear interpolation to match

: Multiplying the two gives Guided Grad-CAM
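
The fusion step can be sketched as below. Note the simplification: to stay dependency-free, this sketch up-samples by nearest-neighbour repetition (`np.kron`) instead of the bilinear interpolation described above, and it assumes the input size is an integer multiple of the feature-map size. The inputs are made-up values.

```python
import numpy as np

def guided_grad_cam(guided_backprop, grad_cam_map):
    """Fuse a pixel-level map (H, W) with a coarse Grad-CAM map (h, w) by element-wise product."""
    H, W = guided_backprop.shape
    h, w = grad_cam_map.shape
    # Assumption: H and W are integer multiples of h and w.
    # Nearest-neighbour repetition stands in for bilinear interpolation here.
    upsampled = np.kron(grad_cam_map, np.ones((H // h, W // w)))
    return guided_backprop * upsampled

gb = np.ones((4, 4))                 # hypothetical Guided Backprop output
cam = np.array([[0.0, 1.0],
                [0.5, 0.0]])         # hypothetical 2x2 Grad-CAM map
fused = guided_grad_cam(gb, cam)
print(fused.shape)  # (4, 4)
```

The product keeps the fine detail of Guided Backprop only where Grad-CAM says the region matters for the class, which is what makes the result both high-resolution and class-discriminative.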


3-3) Counterfactual Explanations

: With a slight modification, Grad-CAM yields explanations that highlight the regions that would make the network change its prediction

: As above, the focus is on explanation: this variant is used to find which feature maps influence the prediction negatively

: When predicting cat, it looks for whatever hurts that prediction the most

: The modification is to negate the gradients. As a toy example, given feature map weights -1, -1, 1, multiplying by -1 and applying ReLU gives 1, 1, 0

: In the end, only the parts that do not support the cat prediction remain
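
The toy numbers above can be reproduced directly. The helper `counterfactual_grad_cam` is a hypothetical sketch of the variant with negated gradients; in it the ReLU is applied to the weighted sum of feature maps, whereas the toy example applies it to the weights themselves for simplicity.

```python
import numpy as np

def counterfactual_grad_cam(activations, gradients):
    """Grad-CAM with negated gradients; inputs shaped (K, H, W)."""
    # Negate the gradients before pooling, so maps that lower the score get positive weight.
    alphas = (-gradients).mean(axis=(1, 2))
    return np.maximum(np.tensordot(alphas, activations, axes=1), 0.0)

# The toy numbers from the text, applied directly to the weights.
weights = np.array([-1.0, -1.0, 1.0])
flipped = np.maximum(-weights, 0.0)
print(flipped)  # [1. 1. 0.]
```

The feature map that supported the class (weight 1) is zeroed out, and the two that worked against it are the ones that remain.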

 

4 Evaluating Localization Ability

Weakly-Supervised Segmentation

 

Diagnosing Image Classification CNNs with Grad-CAM

: Guided Grad-CAM drawn for both the ground-truth label and the predicted label on cases that VGG-16 misclassified

: This helps in understanding why the prediction went wrong

 

Image Captioning

: Visualizations focused on the caption describing the image

 

Visual Question Answering

: A VQA pipeline consists of a CNN that processes the image and an RNN language model that processes the question

: The image and question encodings are usually fused to predict the answer

 

 

 

<References>

https://velog.io/@tobigs_xai/CAM-Grad-CAM-Grad-CAMpp

https://hellopotatoworld.tistory.com/18

https://joungheekim.github.io/2020/09/29/paper-review/

https://jays0606.tistory.com/4

https://minimin2.tistory.com/39

https://www.youtube.com/watch?v=uA5rIr79I0o&t=1514s

https://esinfam99.tistory.com/15

https://sotudy.tistory.com/19
