Caffe:ImageNetTutorial

This page summarizes how to train and test a model with Caffe.

The terms

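Training images: the list of images used for training. It is stored in train.txt together with the label information.
Validation images: the list of images and labels used for validation. It is stored in val.txt together with the label information.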
deploy:
solver:

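Data preparation

Obtain the image data from ImageNet. Once you have the text file that lists the image URLs (fall11_urls.txt in the script below), the images can be downloaded as follows.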

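#!/bin/sh
# Download every URL listed in fall11_urls.txt into ./train.

WORKING=$PWD
URLS_FILE=$WORKING/fall11_urls.txt
DEST=$WORKING/train

# Create the destination directory if it does not exist yet.
mkdir -p "$DEST"
cd "$DEST" || exit 1

# Number of lines in the URL file, used only for progress output.
COUNT=$(wc -l < "$URLS_FILE")
INDEX=0

echo "Total count: $COUNT"
echo "Download start!"

# One download attempt per whitespace-separated entry in the URL file.
for path in $(cat "$URLS_FILE"); do
    INDEX=$((INDEX + 1))
    echo "Download: $INDEX/$COUNT"
    wget --timeout=10 --tries=2 "$path"
done

echo "Download end."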

Lay out the training images (train) and validation images (val) on disk as follows.

/path/to/imagenet/train/n01440764/n01440764_10026.JPEG
/path/to/imagenet/val/ILSVRC2012_val_00000001.JPEG
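
A quick way to sanity-check this layout before going further is to count the class folders and the validation images. The sketch below assumes the layout shown above; the function names are made up for this page and are not part of Caffe.

```shell
count_train_classes() {
    # $1: ImageNet root. Prints the number of sub-directories of $1/train.
    find "$1/train" -mindepth 1 -maxdepth 1 -type d | wc -l | tr -d ' '
}

count_val_images() {
    # $1: ImageNet root. Prints the number of *.JPEG files directly in $1/val.
    find "$1/val" -maxdepth 1 -type f -name '*.JPEG' | wc -l | tr -d ' '
}
```

For the full ILSVRC2012 set, `count_train_classes /path/to/imagenet` should print 1000 and `count_val_images /path/to/imagenet` should print 50000.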

Also download the auxiliary data needed for training with the script below. (For reference, it is about 17 MB.)

./data/ilsvrc12/get_ilsvrc_aux.sh

When the script finishes, the following files are created in the data/ilsvrc12/ directory. ¹

det_synset_words.txt
imagenet_mean.binaryproto
synset_words.txt
synsets.txt
test.txt
train.txt
val.txt
imagenet.bet.pickle

The training list (train.txt) and the validation list (val.txt) contain file names together with the label of each file. For the meaning of the labels, see synset_words.txt.
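
To see which class a label refers to, the lookup can be sketched as below. It assumes that labels are zero-based and that label N corresponds to line N+1 of synset_words.txt; `label_name` is a hypothetical helper, not part of Caffe.

```shell
label_name() {
    # $1: zero-based label taken from train.txt or val.txt
    # $2: path to synset_words.txt
    # Assumption: label N is described on line N+1 of the file.
    sed -n "$(( $1 + 1 ))p" "$2"
}
```

For example, `label_name 0 data/ilsvrc12/synset_words.txt` prints the entry for label 0.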

Optionally, the images can be resized in advance.²

for name in /path/to/imagenet/val/*.JPEG; do
    convert -resize 256x256\! "$name" "$name"
done

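The tutorial does not perform this step explicitly because, in a cluster environment, the resizing can be done in parallel with MapReduce.³

Open examples/imagenet/create_imagenet.sh and set the paths of the training and validation folders (TRAIN_DATA_ROOT and VAL_DATA_ROOT, respectively). If you did not resize the images in advance, set RESIZE=true.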

Note on editing the `create_imagenet.sh` script
The training and validation folder paths (`TRAIN_DATA_ROOT` and `VAL_DATA_ROOT`, respectively) must end with a path separator (`/`).
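
The trailing separator presumably matters because the root folder and the relative path from the list file are joined without a `/` being added in between (an assumption based on this note, not checked against the tool source here). A small hypothetical helper can normalize a path before it is pasted into the script:

```shell
with_trailing_slash() {
    # Strip one trailing slash if present, then add exactly one back.
    printf '%s/\n' "${1%/}"
}

with_trailing_slash /path/to/imagenet/train    # prints /path/to/imagenet/train/
with_trailing_slash /path/to/imagenet/val/     # prints /path/to/imagenet/val/
```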

Running examples/imagenet/create_imagenet.sh creates the LevelDB databases. Note that the locations of train.txt and val.txt assumed by the script may differ from yours, so check the script once more before running it.

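Before running

Before running examples/imagenet/create_imagenet.sh, check the following.

Run the script from the directory where Caffe is installed.
Remove examples/imagenet/ilsvrc12_train_leveldb and examples/imagenet/ilsvrc12_val_leveldb beforehand if they already exist.
In recent versions of Caffe, leveldb may have been replaced by lmdb, so the names may be ilsvrc12_train_lmdb and so on.
TOOLS is set to build/tools, but depending on your build the executables may be in tools instead. After compiling, check where the caffe executables actually are.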
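
Some of these checks can be scripted. The sketch below is only an illustration: `find_tools_dir` and `stale_dbs` are hypothetical helpers, and the database names are the ones mentioned on this page.

```shell
find_tools_dir() {
    # $1: Caffe root. Prints the first candidate directory that holds an
    # executable convert_imageset; returns 1 if neither candidate does.
    for d in "$1/build/tools" "$1/tools"; do
        if [ -x "$d/convert_imageset" ]; then
            printf '%s\n' "$d"
            return 0
        fi
    done
    return 1
}

stale_dbs() {
    # $1: Caffe root. Prints every existing database directory that would
    # have to be removed before create_imagenet.sh is run again.
    for name in ilsvrc12_train_leveldb ilsvrc12_val_leveldb \
                ilsvrc12_train_lmdb ilsvrc12_val_lmdb; do
        if [ -e "$1/examples/imagenet/$name" ]; then
            printf '%s\n' "$1/examples/imagenet/$name"
        fi
    done
}
```

Run from the Caffe directory, `find_tools_dir .` gives the value to use for TOOLS, and `stale_dbs .` should print nothing before the databases are created.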

Computing the image mean

The model requires the image mean to be subtracted from each image.⁴ (Why? This still needs to be checked.) ⁵ The mean can be computed as follows.

$ ./examples/imagenet/make_imagenet_mean.sh

As a result, the file data/ilsvrc12/imagenet_mean.binaryproto is created.

What does it do?

The following answer was posted:

It computes the mean pixel by pixel across the whole data-set (look at the compute_image_mean script). So say you had a constant background in the top half of all your images, that would all become zero'd out from mean subtraction.

Mean subtraction still needs to be looked into.
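
As a toy illustration of the quoted answer, the sketch below treats each line of a text file as one tiny "image" of pixel values, computes the mean at each pixel position across all lines, and subtracts it. This is not how compute_image_mean is implemented; `mean_subtract` and the text format are made up for this example.

```shell
mean_subtract() {
    # $1: file with one image per line, pixel values separated by spaces.
    # First pass accumulates per-position sums; second pass prints each
    # image with the per-position mean removed.
    awk '
        NR == FNR { for (i = 1; i <= NF; i++) sum[i] += $i; n++; next }
        {
            for (i = 1; i <= NF; i++)
                printf "%s%g", (i > 1 ? " " : ""), $i - sum[i] / n
            printf "\n"
        }
    ' "$1" "$1"
}
```

With the three images `10 10 1 2`, `10 10 3 4` and `10 10 5 6`, the first two positions are constant across the data set, so they become 0 in every output line, which is the "constant background" effect the answer describes.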

Model definition

First, the reference implementation needs some explanation. ⁶

The network definition is in models/bvlc_reference_caffenet/train_val.prototxt. Open that file if you need to modify it. In it you will find include sections such as the following.

include { phase: TRAIN }
include { phase: TEST }

These sections mark layers that belong to only one phase. The TRAIN and TEST networks differ in their input and output layers.

Input layer differences

The training (TRAIN) network's data layer draws its data from examples/imagenet/ilsvrc12_train_leveldb and randomly mirrors the input images.

The test (TEST) network's data layer takes its data from examples/imagenet/ilsvrc12_val_leveldb and does not perform random mirroring.

Output layer differences

Both networks output the softmax_loss layer.

In training, this layer is used to compute the loss function and to initialize backpropagation.⁷

The test network also has a second output layer, which reports the accuracy on the test set.

In the process of training, the test network will occasionally be instantiated and tested on the test set, producing lines like Test score #0: xxx and Test score #1: xxx. In this case score 0 is the accuracy (which will start around 1/1000 = 0.001 for an untrained network) and score 1 is the loss (which will start around 7 for an untrained network).
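
The starting values quoted above follow from uniform guessing over 1000 classes: the accuracy is 1/1000 and the softmax loss is -ln(1/1000). A minimal check of the arithmetic, with `initial_loss` as a hypothetical helper:

```shell
initial_loss() {
    # $1: number of classes. Prints -ln(1/$1) rounded to two decimals.
    awk -v k="$1" 'BEGIN { printf "%.2f\n", -log(1 / k) }'
}

initial_loss 1000   # prints 6.91
```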

Plans

A protocol buffer is also laid out for running the solver. The plan is as follows.

We will run in batches of 256, and run a total of 450,000 iterations (about 90 epochs).
For every 1,000 iterations, we test the learned net on the validation data.
We set the initial learning rate to 0.01, and decrease it every 100,000 iterations (about 20 epochs).
Information will be displayed every 20 iterations.
The network will be trained with momentum 0.9 and a weight decay of 0.0005.
For every 10,000 iterations, we will take a snapshot of the current status.

The plan above is implemented in models/bvlc_reference_caffenet/solver.prototxt.
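
The epoch counts in the plan can be reproduced from the iteration count and the batch size. The sketch below assumes the ILSVRC2012 training set of 1,281,167 images, a figure that does not appear on this page; `epochs` is a hypothetical helper.

```shell
epochs() {
    # $1: iterations, $2: batch size, $3: number of training images.
    # Prints how many passes over the training set that amounts to.
    awk -v it="$1" -v bs="$2" -v n="$3" 'BEGIN { printf "%.1f\n", it * bs / n }'
}

# 1281167 is the assumed training-set size (not taken from this page).
epochs 450000 256 1281167   # prints 89.9
epochs 100000 256 1281167   # prints 20.0
```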

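That was a long explanation ~~(honestly, I have no idea what it is saying)~~, so here is a summary.
The network definition is in `models/bvlc_reference_caffenet/train_val.prototxt`; if your file paths differ, update the paths in that file. The training plan is defined in `models/bvlc_reference_caffenet/solver.prototxt`; leave it alone unless you have a specific reason to change it.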

Starting ImageNet training

Start training with the following command.

$ ./build/tools/caffe train --solver=models/bvlc_reference_caffenet/solver.prototxt

To change the number of forward and backward iterations, edit the max_iter property in models/bvlc_reference_caffenet/solver.prototxt.
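
To check the current value without opening the file, a small lookup can be sketched as below. It assumes the solver file uses one `name: value` pair per line with no leading whitespace; `solver_value` is a hypothetical helper.

```shell
solver_value() {
    # $1: solver file, $2: field name.
    # Prints the value of the first "name: value" line whose name matches.
    awk -v key="$2" -F': *' '$1 == key { print $2; exit }' "$1"
}
```

For example, `solver_value models/bvlc_reference_caffenet/solver.prototxt max_iter` prints the configured iteration count.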

Tip

To check the computation time, run the following.

$ ./build/tools/caffe time --model=models/bvlc_reference_caffenet/train_val.prototxt

Resume Training?

If the power goes out ~~because of a war or an earthquake~~ or training is cancelled, training can be resumed.⁸ To do so, run the following.

$ ./build/tools/caffe train --solver=models/bvlc_reference_caffenet/solver.prototxt --snapshot=models/bvlc_reference_caffenet/caffenet_train_iter_10000.solverstate
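
Rather than typing the iteration number by hand, the most recent snapshot can be picked automatically. The sketch below assumes snapshot files are named like the one above, ending in `_iter_N.solverstate`; `latest_solverstate` is a hypothetical helper.

```shell
latest_solverstate() {
    # $1: directory containing *_iter_N.solverstate files.
    # Prints the path of the file with the largest iteration number N.
    for f in "$1"/*_iter_*.solverstate; do
        [ -e "$f" ] || continue
        n=${f##*_iter_}        # drop everything up to the last "_iter_"
        n=${n%.solverstate}    # drop the extension, leaving N
        printf '%s %s\n' "$n" "$f"
    done | sort -n | tail -n 1 | cut -d' ' -f2-
}
```

The result can then be passed to the command above as `--snapshot="$(latest_solverstate models/bvlc_reference_caffenet)"`.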

Sample

See the following files.

./examples/imagenet/create_imagenet.sh
./examples/imagenet/make_imagenet_mean.sh
./examples/imagenet/train_imagenet.sh

All in one script

The following script runs all of the steps in one go.

#!/bin/sh
# Run the three tutorial steps in order and stop at the first one that fails.
# Each step writes its stdout and stderr to its own log file.

DATE_FORMAT=$(date +%Y%m%d_%H-%M-%S)
LOG_FILE=brewing_imagenet-$DATE_FORMAT.log

echo "LOGFILE: $LOG_FILE"

echo "Create ImageNet."
./examples/imagenet/create_imagenet.sh > "${LOG_FILE}.1" 2>&1 || exit 1

echo "Make ImageNet mean."
./examples/imagenet/make_imagenet_mean.sh > "${LOG_FILE}.2" 2>&1 || exit 1

echo "Train CaffeNet."
./examples/imagenet/train_caffenet.sh > "${LOG_FILE}.3" 2>&1 || exit 1

echo "Done."
exit 0
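
In the logs shown under Troubleshooting on this page, the helper scripts still print "Done." after an error, so the exit status alone may not catch every failure. A hedged addition is to scan each log file for the messages seen there; `count_log_errors` is a hypothetical helper and the two patterns are simply the ones that appear in those logs.

```shell
count_log_errors() {
    # $1: log file. Prints how many lines mention an unreadable image
    # or an aborted tool.
    grep -c -e 'Could not open or find file' -e 'Aborted' "$1"
}
```

For example, with the log-file naming used by the all-in-one script, `[ "$(count_log_errors "${LOG_FILE}.1")" = "0" ] || exit 1` stops the run when image files could not be read.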

Use files

The files needed to follow this tutorial are listed below. If possible, copy them to a separate directory before proceeding.

data/ilsvrc12/ directory.
- Text files such as train.txt and val.txt.
- imagenet_mean.binaryproto: a file generated along the way.
examples/imagenet/ directory.
- Script files such as create_imagenet.sh.
- ilsvrc12_train_lmdb/ and ilsvrc12_val_lmdb/ directories (the DB files that store the image data).
models/bvlc_reference_caffenet/ directory.
- *.solverstate and *.caffemodel files (snapshots and model files).
- Text files such as solver.prototxt and train_val.prototxt.
tools/ directory (contains the executables).

Custom all in one script

A sample archive containing the files above is provided below. (Compiled on CentOS 6.)

~~Caffe_copy.tar.gz~~ (OLD VERSION)

An archive with a run.sh script that does everything in one step is uploaded below. (Compiled on Ubuntu.)

Caffe_train.tar.gz

Troubleshooting

std::length_error

Error in make_imagenet_mean.sh : 'std::length_error'

Running make_imagenet_mean.sh may fail with the following error.

[your@server caffe]$ ./examples/imagenet/make_imagenet_mean.sh 
terminate called after throwing an instance of 'std::length_error'
  what():  basic_string::_S_create
./examples/imagenet/make_imagenet_mean.sh: line 10: 17802 Aborted                 (core dumped) $TOOLS/compute_image_mean $EXAMPLE/ilsvrc12_train_lmdb $DATA/imagenet_mean.binaryproto
Done.

In this case, the images were probably not read correctly in the preceding step, when create_imagenet.sh was run.

[your@server caffe]$ ./examples/imagenet/create_imagenet.sh 
Creating train lmdb...
I1020 13:51:01.576401 17461 convert_imageset.cpp:82] Shuffling data
I1020 13:51:01.744885 17461 convert_imageset.cpp:85] A total of 2 images.
I1020 13:51:01.745170 17461 db_lmdb.cpp:23] Opened lmdb examples/imagenet/ilsvrc12_train_lmdb
E1020 13:51:01.745205 17461 io.cpp:80] Could not open or find file /home/your/Project/caffe/examples/imagenet/testnofire.jpeg
E1020 13:51:01.745221 17461 io.cpp:80] Could not open or find file /home/your/Project/caffe/examples/imagenet/testfire.jpg
Creating val lmdb...
I1020 13:51:01.811265 17463 convert_imageset.cpp:82] Shuffling data
I1020 13:51:01.954166 17463 convert_imageset.cpp:85] A total of 2 images.
I1020 13:51:01.954432 17463 db_lmdb.cpp:23] Opened lmdb examples/imagenet/ilsvrc12_val_lmdb
E1020 13:51:01.954473 17463 io.cpp:80] Could not open or find file /home/your/Project/caffe/examples/imagenet/valfire.jpg
E1020 13:51:01.954481 17463 io.cpp:80] Could not open or find file /home/your/Project/caffe/examples/imagenet/valnofire.jpeg
Done.

Recheck the output of create_imagenet.sh in this way.

In the case above, the file paths were wrong.
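
Wrong paths like these can be caught before the database is built by checking every entry of the list file against the data root. The sketch below assumes each line has the form `relative/path label`, that file names contain no spaces, and that the root ends with `/`; `list_missing_files` is a hypothetical helper.

```shell
list_missing_files() {
    # $1: data root ending in "/", $2: list file with "relative/path label" lines.
    # Prints the full path of every listed image that does not exist.
    while read -r rel label; do
        [ -n "$rel" ] || continue
        [ -f "$1$rel" ] || printf '%s\n' "$1$rel"
    done < "$2"
}
```

If `list_missing_files /path/to/imagenet/train/ data/ilsvrc12/train.txt` prints nothing, every listed training image was found.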

Favorite site

Guide

Any simple example? #550
CS231n Caffe Tutorial ⁹
Train and Test LeNet on your own dataset
Caffe tutorial ¹⁰
Stackoverflow: How to train a caffe model?
Caffe: Decaf, but Better (an explanation of Caffe layers)

References

¹ Hidden files whose names start with . are excluded from the list.
² The convert executable is available by installing ImageMagick.
³ See Brewing ImageNet/Data Preparation.
⁴ Original: The model requires us to subtract the image mean from each image
⁵ Implemented in tools/compute_image_mean.cpp.
⁶ See Advances in Neural Information Processing Systems 25 (NIPS 2012).
⁷ In validation, this loss is simply reported.
⁸ Snapshots are created periodically during training.
⁹ CS231n_Caffe_Tutorial.pdf
¹⁰ Caffe_tutorial.pdf