Introduction of Repeated Holdout Method
Prerequisite: Introduction of Holdout Method
- The Repeated Holdout Method is the repeated execution of the holdout method, each time with a new random split of the dataset.
- The holdout procedure is repeated ‘K’ times (iterations).
- In this method, we employ random sampling of the dataset. The dataset is partitioned randomly and not on the basis of any formula.
[Note: Random sampling refers to the selection of ‘n’ individuals from the population, chosen in such a way that every set of ‘n’ individuals has the same chance to be selected. ]
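A random partition of this kind can be sketched in a few lines of Python. This is only a minimal illustration: the function name `holdout_split`, the `test_fraction` parameter and the 70/30 proportion are assumptions made for the example, not something prescribed by the method.

```python
import random

def holdout_split(data, test_fraction=0.3, seed=None):
    """Randomly partition data into (train, test) lists.

    test_fraction and seed are illustrative parameters (assumptions).
    """
    rng = random.Random(seed)
    shuffled = list(data)          # copy, so the caller's list is not modified
    rng.shuffle(shuffled)          # random order, no fixed rule or formula
    n_test = int(round(len(shuffled) * test_fraction))
    test = shuffled[:n_test]       # first n_test shuffled items form the test set
    train = shuffled[n_test:]      # the remaining items form the training set
    return train, test

data = list(range(10))
train, test = holdout_split(data, test_fraction=0.3, seed=0)
print("train:", train)
print("test: ", test)
```

Every item ends up in exactly one of the two sets, and changing the seed gives a different partition.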
Example: Consider a dataset that is randomly partitioned into a training set and a test set. We repeat the holdout method for ‘K’ iterations. Let us assume K=3.
- In each iteration, the shaded portion is the test set and the unshaded portion is the training set, both obtained by randomly partitioning the dataset.
- In the first iteration ‘ITERATION – 01’, a classifier is constructed from the data items/examples that belong to the training set. The constructed classifier is then applied to the test set. The result obtained is an error estimate, say ‘E1’.
- In the second iteration ‘ITERATION – 02’, the dataset is randomly partitioned again into a new training set and test set. A new classifier is constructed from the training set data items/examples and then applied to the test set. The result obtained is an error estimate, say ‘E2’.
- In the third iteration ‘ITERATION – 03’, the dataset is randomly partitioned once more. A new classifier is constructed from the training set data items/examples and then applied to the test set. The result obtained is an error estimate, say ‘E3’.
- The iterations are thus repeated ‘K=3’ times.
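- To find the overall error estimate, we average the error estimates of the individual iterations: E = (E1 + E2 + E3) / 3, or in general E = (E1 + E2 + ... + EK) / K.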
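The iterations described above can be sketched in Python as follows. This is an illustrative sketch under several assumptions: the toy one-dimensional dataset, the nearest-class-mean classifier and all function names are hypothetical stand-ins chosen to keep the example self-contained, and the overall estimate is computed as the mean of the per-iteration error estimates.

```python
import random

def make_toy_data(n=200, seed=0):
    """Hypothetical dataset: (x, label) pairs, class 1 is shifted to the right."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        data.append((rng.gauss(2.0 * label, 1.0), label))
    return data

def train_classifier(train):
    """Stand-in classifier: store the mean x value of each class."""
    means = {}
    for label in (0, 1):
        xs = [x for x, y in train if y == label]
        means[label] = sum(xs) / len(xs)
    return means

def predict(means, x):
    """Predict the class whose mean is closest to x."""
    return min(means, key=lambda label: abs(x - means[label]))

def error_rate(means, test):
    """Fraction of test examples that are misclassified."""
    wrong = sum(1 for x, y in test if predict(means, x) != y)
    return wrong / len(test)

def repeated_holdout(data, k=3, test_fraction=0.3, seed=0):
    """Run the holdout method k times and average the error estimates."""
    rng = random.Random(seed)
    errors = []
    for _ in range(k):
        shuffled = list(data)
        rng.shuffle(shuffled)                  # new random partition each iteration
        n_test = int(round(len(shuffled) * test_fraction))
        test, train = shuffled[:n_test], shuffled[n_test:]
        model = train_classifier(train)        # built from the training set only
        errors.append(error_rate(model, test)) # E1, E2, ..., EK
    overall = sum(errors) / k                  # mean of the K error estimates
    return errors, overall

errors, overall = repeated_holdout(make_toy_data(), k=3)
print("per-iteration errors:", errors)
print("overall error estimate:", overall)
```

Any classifier could replace `train_classifier` and `predict`; the repeated-holdout logic itself is only the loop in `repeated_holdout`.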
Problem: Overlapping test sets.
- Since we partition the dataset randomly in every iteration, the test sets of different iterations can overlap. Some data items/examples may appear in several test sets, while others may never be placed in the training set (or in a test set) at all.
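The overlap can be made visible by counting how often each item lands in a test set. The sketch below is only an illustration under assumed values (10 items, K=3, a 30% test set); the function name `count_test_appearances` is hypothetical.

```python
import random
from collections import Counter

def count_test_appearances(n_items=10, k=3, test_fraction=0.3, seed=0):
    """Count how many of the k random test sets each item index falls into."""
    rng = random.Random(seed)
    counts = Counter({i: 0 for i in range(n_items)})  # start every item at 0
    n_test = int(round(n_items * test_fraction))
    for _ in range(k):
        # Each iteration draws its test set independently of the earlier ones.
        test_indices = rng.sample(range(n_items), n_test)
        counts.update(test_indices)
    return counts

counts = count_test_appearances()
for item in sorted(counts):
    print(f"item {item}: in {counts[item]} of 3 test sets")
```

With these values the three test sets hold 9 slots in total for 10 items, so at least one item is never tested, and any item with a count above 1 was tested more than once.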
Example –