Cross validation
Cross validation is a method in which part of the dataset is used as the training set, and the model is trained and tested repeatedly while changing which part that is.
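- An algorithm built using only the training set works well on the training set, but that alone does not show that it is a good hypothesis function.
- It may have a low error rate on the training set but a high error rate on other data.
- To address this, part of the available training data is held out as a cross validation set.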
- Training set (60%)
- Cross Validation Set (20%)
- Test set (20%)
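A minimal sketch of the 60/20/20 split above, in Python with only the standard library. The function name `split_train_cv_test`, its parameters, and the choice to shuffle with a fixed `seed` before splitting are illustrative assumptions, not part of the original notes.

```python
import random

def split_train_cv_test(samples, train_frac=0.6, cv_frac=0.2, seed=0):
    """Shuffle the samples and split them into training, cross validation and
    test sets; whatever is left after the first two becomes the test set."""
    items = list(samples)
    random.Random(seed).shuffle(items)  # shuffle a copy; the input is untouched
    n_train = round(len(items) * train_frac)
    n_cv = round(len(items) * cv_frac)
    train = items[:n_train]
    cv = items[n_train:n_train + n_cv]
    test = items[n_train + n_cv:]
    return train, cv, test

if __name__ == "__main__":
    train, cv, test = split_train_cv_test(range(100))
    print(len(train), len(cv), len(test))  # 60 20 20
```

In this setup, candidate models are fit on the training set, one is chosen by its error on the cross validation set, and the test set is used only once at the end to estimate how the chosen model performs on unseen data.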
K-fold cross-validation
One iteration of K-fold cross-validation is performed as follows. First, a random permutation of the sample set is generated and partitioned into K subsets ("folds") of about equal size. Of the K subsets, a single subset is retained as the validation data for testing the model (this subset is called the "testset"), and the remaining K - 1 subsets together are used as training data (the "trainset"). A model is then trained on the trainset and its accuracy is evaluated on the testset. Model training and evaluation are repeated K times, with each of the K subsets used exactly once as the testset.
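The procedure above can be sketched in plain Python as follows. The names `k_fold_splits` and `k_fold_accuracy`, the rule of giving the first folds one extra sample when N is not divisible by K, and the majority-class toy model are all hypothetical choices made for illustration; the reported score is the mean of the per-fold accuracies.

```python
import random

def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for one iteration of K-fold CV.

    A random permutation of the sample indices is partitioned into k folds
    whose sizes differ by at most one. Each fold is the testset exactly once;
    the remaining k - 1 folds together form the trainset.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random permutation of the sample set

    base, extra = divmod(n_samples, k)  # the first `extra` folds get one more sample
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size

    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def k_fold_accuracy(xs, ys, k, train_fn, accuracy_fn, seed=0):
    """Train on each trainset, score on its testset, return the mean accuracy."""
    scores = []
    for train_idx, test_idx in k_fold_splits(len(xs), k, seed):
        model = train_fn([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(accuracy_fn(model, [xs[i] for i in test_idx],
                                  [ys[i] for i in test_idx]))
    return sum(scores) / len(scores)


# Hypothetical toy model: always predicts the most frequent training label.
def train_majority(xs, ys):
    return max(set(ys), key=ys.count)

def accuracy_majority(label, xs, ys):
    return sum(y == label for y in ys) / len(ys)


if __name__ == "__main__":
    # 30 samples, 5 folds: every testset has 6 samples, every trainset 24.
    for train, test in k_fold_splits(30, 5):
        print(len(train), len(test))  # prints "24 6" five times
```

Because the folds come from a random permutation, a different `seed` can give a different accuracy estimate for the same data.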
The case of 5-fold cross-validation with 30 samples is illustrated in the figure below:
[Figure: Xv_folds.gif, 5-fold cross-validation with 30 samples]
Leave-one-out cross-validation
As the name suggests, leave-one-out cross-validation involves using a single sample from the original sample set as the validation data, and the remaining samples as the training data. This is repeated such that each sample in the sample set is used exactly once as the validation data. This is the same as K-fold cross-validation where K is equal to the number of samples in the sample set.
There is no need to generate random permutations for leave-one-out cross-validation or to repeat it, because the training and validation sets for each fold are always the same; the resulting accuracy estimate is therefore fully determined by the data.
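A minimal Python sketch of leave-one-out cross-validation, i.e. the K = N special case described above. The helper names and the 1-nearest-neighbor toy model are hypothetical; the claim that the estimate never changes between runs assumes the training procedure itself is deterministic, as this toy model is.

```python
def leave_one_out_splits(n_samples):
    """Yield (train_indices, validation_index) for leave-one-out CV.

    Every sample is the validation data exactly once and the other
    n_samples - 1 samples form the training data. No random permutation is
    involved, so the same splits are produced on every call.
    """
    for i in range(n_samples):
        yield [j for j in range(n_samples) if j != i], i


def loocv_error(xs, ys, train_fn, predict_fn):
    """Fraction of samples predicted wrongly when each one is held out in turn."""
    errors = 0
    for train_idx, i in leave_one_out_splits(len(xs)):
        model = train_fn([xs[j] for j in train_idx], [ys[j] for j in train_idx])
        errors += predict_fn(model, xs[i]) != ys[i]  # True counts as 1
    return errors / len(xs)


# Hypothetical toy model: 1-nearest-neighbor on one-dimensional inputs.
def train_1nn(xs, ys):
    return list(zip(xs, ys))  # "training" only stores the labelled points

def predict_1nn(model, x):
    _, label = min(model, key=lambda pair: abs(pair[0] - x))
    return label


if __name__ == "__main__":
    xs = [0, 1, 2, 10, 11, 12]
    ys = [0, 0, 0, 1, 1, 1]
    print(loocv_error(xs, ys, train_1nn, predict_1nn))  # 0.0
```

The price of this determinism is cost: the model is trained N times, once per sample, rather than K times.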