Apart from being computationally intensive and requiring that you build one model per example (n models in total) to test your hypothesis, the problem with LOOCV is that it tends to be pessimistic (pushing your error estimate higher). It's also unstable when n is small, and the variance of the error estimate is much higher. All these drawbacks make comparing models difficult.
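To make the n-model cost concrete, here is a minimal LOOCV sketch (assuming scikit-learn is available; the make_regression data and LinearRegression model are illustrative placeholders, not part of this chapter's setup):
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score
# toy data: with 100 examples, LOOCV fits 100 separate models
X, y = make_regression(n_samples=100, n_features=5,
                       noise=1.0, random_state=0)
scores = cross_val_score(LinearRegression(), X, y,
                         cv=LeaveOneOut(),
                         scoring='neg_mean_squared_error')
print("LOOCV MSE: %0.3f" % -scores.mean())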
Another alternative from statistics is bootstrapping, a method long used to estimate the sampling distribution of statistics that can't be presumed to follow a known distribution. Bootstrapping works by building a number (the more the better) of samples of size n (the original in-sample size) drawn with replacement. Drawing with replacement means that the process can draw the same example multiple times within a single resample. Bootstrapping has the advantage of offering a simple and effective way to estimate the true error measure.
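As an illustration of that first, statistical use of bootstrapping, here is a short sketch (assuming only NumPy; the skewed exponential sample is a made-up example) that approximates the sampling distribution of a mean:
import numpy as np
rng = np.random.default_rng(0)
sample = rng.exponential(scale=2.0, size=200)  # skewed, non-normal data
# each resample has the original size n and is drawn with replacement
boot_means = [rng.choice(sample, size=sample.size, replace=True).mean()
              for _ in range(5000)]
print("sample mean: %0.3f" % sample.mean())
print("bootstrap standard error: %0.3f" % np.std(boot_means))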
In fact, bootstrapped error measurements usually have much less variance than cross-validated ones. On the other hand, validation becomes more complicated because of the sampling with replacement: your validation sample has to come from the out-of-bootstrap examples, the ones the resampling never drew. Moreover, because some training examples appear repeatedly, the models built on bootstrapped samples can carry a certain bias.
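Putting these two ideas together, a sketch of out-of-bootstrap validation could look like the following (again assuming scikit-learn; the dataset, model, and number of bootstraps are placeholders chosen for illustration):
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
X, y = make_regression(n_samples=200, n_features=5,
                       noise=1.0, random_state=0)
rng = np.random.default_rng(0)
errors = list()
for j in range(100):
    boot = rng.integers(0, len(X), size=len(X))  # indices drawn with replacement
    oob = np.setdiff1d(np.arange(len(X)), boot)  # examples never drawn
    model = LinearRegression().fit(X[boot], y[boot])
    errors.append(mean_squared_error(y[oob], model.predict(X[oob])))
print("Out-of-bootstrap MSE: %0.3f" % np.mean(errors))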
If you are using out-of-bootstrap examples for your test, you'll notice that the test sample varies in size, depending on the number of unique examples in the in-sample, and typically accounts for about a third of your original in-sample size. This simple Python code snippet simulates a large number of bootstrap draws and measures the average out-of-bootstrap fraction:
from random import randint
import numpy as np
n = 1000 # number of examples
# your original set of examples
examples = set(range(n))
results = list()
for j in range(10000):
    # your bootstrapped sample, drawn with replacement
    chosen = [randint(0, n-1) for k in range(n)]
    # fraction of examples never drawn (out-of-bootstrap)
    results.append((n - len(set(chosen) & examples)) / n)
print("Out-of-bootstrap: %0.1f %%" % (np.mean(results) * 100))
Out-of-bootstrap: 36.8 %
Running the experiment may require some time, and your results may differ because of the random nature of the experiment. However, you should see an output of around 36.8 percent. This figure is no accident: the probability that any single example is never drawn in n draws with replacement is (1 - 1/n)^n, which approaches 1/e ≈ 0.368 as n grows.
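As a quick sanity check (a minimal sketch using only the standard library), you can compare the simulated figure against the theoretical value:
import math
n = 1000
# probability that an example is never drawn in n draws with replacement
print("(1-1/n)^n = %0.4f, 1/e = %0.4f"
      % ((1 - 1/n)**n, math.exp(-1)))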