# Sequential probability ratio test

The sequential probability ratio test (SPRT) is a specific sequential hypothesis test, developed by Abraham Wald.[1] Neyman and Pearson's 1933 result inspired Wald to reformulate it as a sequential analysis problem. The Neyman-Pearson lemma, by contrast, offers a rule of thumb for when all the data is collected (and its likelihood ratio known).

While originally developed for use in quality control studies in the realm of manufacturing, SPRT has been formulated for use in the computerized testing of human examinees as a termination criterion.[2][3][4]

## Theory

As in classical hypothesis testing, SPRT starts with a pair of hypotheses, say $H_0$ and $H_1$ for the null hypothesis and alternative hypothesis respectively. They must be specified as follows:

$H_0: p=p_0$
$H_1: p=p_1$

The next step is to calculate the cumulative sum of the log-likelihood ratio, $\log \Lambda_i$, as new data arrive:

$S_i=S_{i-1}+ \log \Lambda_i$

The stopping rule is a simple thresholding scheme:

• $a < S_i < b$: continue monitoring (critical inequality)
• $S_i \geq b$: Accept $H_1$
• $S_i \leq a$: Accept $H_0$

where $a$ and $b$ ($a<0<b<\infty$) depend on the desired type I and type II error rates, $\alpha$ and $\beta$. They may be chosen as follows:

$a \approx \log \frac{ \beta }{1-\alpha}$ and $b \approx \log \frac{1-\beta}{\alpha}$

In other words, $\alpha$ and $\beta$ must be decided beforehand in order to set the thresholds appropriately. The numerical value will depend on the application. The reason for using approximation signs is that, in the discrete case, the signal may cross the threshold between samples. Thus, depending on the penalty of making an error and the sampling frequency, one might set the thresholds more aggressively. Of course, the exact bounds may be used in the continuous case.
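The thresholding scheme above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names `wald_thresholds` and `sprt` are hypothetical, and the caller is assumed to supply the per-observation log-likelihood ratios $\log \Lambda_i$.

```python
import math

def wald_thresholds(alpha, beta):
    """Approximate log-scale thresholds (a, b) for error rates alpha and beta."""
    if not (0 < alpha < 1 and 0 < beta < 1 and alpha + beta < 1):
        raise ValueError("need 0 < alpha, beta and alpha + beta < 1")
    a = math.log(beta / (1 - alpha))   # lower threshold, negative
    b = math.log((1 - beta) / alpha)   # upper threshold, positive
    return a, b

def sprt(log_lr_increments, alpha, beta):
    """Run the SPRT on an iterable of per-observation log-likelihood ratios.

    Returns (decision, number of observations used, final cumulative sum).
    """
    a, b = wald_thresholds(alpha, beta)
    s = 0.0
    n = 0
    for n, increment in enumerate(log_lr_increments, start=1):
        s += increment                 # S_i = S_{i-1} + log Lambda_i
        if s >= b:
            return "accept H1", n, s
        if s <= a:
            return "accept H0", n, s
    # Data ran out while a < S_i < b: no decision yet.
    return "continue", n, s
```

With $\alpha=\beta=0.05$ the thresholds are symmetric about zero, roughly $\pm 2.94$, so a stream of increments equal to 1 crosses the upper threshold on the third observation.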

## Example

A textbook example is testing the value of a parameter of a probability distribution. Let us consider the exponential distribution:

$f_\theta(x)=\theta^{-1}\exp\left(-x/\theta\right), x,\theta>0$

The hypotheses are simply $H_0: \theta=\theta_0$ and $H_1: \theta=\theta_1$, with $\theta_1>\theta_0$. Then the log-likelihood ratio (LLR) for one sample is

\begin{align} \log \Lambda(x)&=\log \left[ \frac{\theta_1^{-1}\exp\left(-x/\theta_1\right)}{\theta_0^{-1}\exp\left(-x/\theta_0\right)} \right] \\ &=\log \left[ \frac{\theta_0}{\theta_1} \exp \left(x/\theta_0 - x/\theta_1 \right) \right] \\ &=\frac{\theta_1-\theta_0}{\theta_0 \theta_1} x - \log \frac{\theta_1}{\theta_0} \end{align}

The cumulative sum of the LLRs over all $n$ samples is

$S_n=\sum_{i=1}^n \log \Lambda(x_i)=\frac{\theta_1-\theta_0}{\theta_0 \theta_1} \sum_{i=1}^n x_i - n \log \frac{\theta_1}{\theta_0}$

Accordingly, sampling continues as long as

$a<\frac{\theta_1-\theta_0}{\theta_0 \theta_1} \sum_{i=1}^n x_i - n \log \frac{\theta_1}{\theta_0}<b$

After rearranging, we finally find

$a+n \log \frac{\theta_1}{\theta_0} < \frac{\theta_1-\theta_0}{\theta_0 \theta_1} \sum_{i=1}^n x_i < b+n \log \frac{\theta_1}{\theta_0}$

Plotted against the sample count $n$, the thresholds are simply two parallel lines with slope $\log ( \theta_1/\theta_0 )$. Sampling should stop when the scaled sum of the samples makes an excursion outside the continue-sampling region.
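The exponential example can be sketched in Python as follows. The function name `exponential_sprt` and the default error rates of 0.05 are assumptions made for illustration; each observation contributes the per-sample log-likelihood ratio derived above.

```python
import math
import random

def exponential_sprt(samples, theta0, theta1, alpha=0.05, beta=0.05):
    """SPRT for H0: theta = theta0 against H1: theta = theta1 (theta1 > theta0).

    Returns (decision, number of samples used, final cumulative sum).
    """
    a = math.log(beta / (1 - alpha))
    b = math.log((1 - beta) / alpha)
    slope = (theta1 - theta0) / (theta0 * theta1)
    offset = math.log(theta1 / theta0)
    s = 0.0
    n = 0
    for n, x in enumerate(samples, start=1):
        s += slope * x - offset        # log-likelihood ratio of one sample
        if s >= b:
            return "accept H1", n, s
        if s <= a:
            return "accept H0", n, s
    return "continue", n, s

if __name__ == "__main__":
    # Illustrative run on simulated data with true mean theta1 = 2.0.
    # random.expovariate takes the rate, i.e. 1 / mean.
    rng = random.Random(0)
    data = (rng.expovariate(1.0 / 2.0) for _ in range(1000))
    print(exponential_sprt(data, theta0=1.0, theta1=2.0))
```

Because the data are consumed lazily from a generator, the test stops drawing samples as soon as a threshold is crossed, which is the practical appeal of a sequential test.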

## Applications

### Testing of human examinees

The SPRT is currently the predominant method of classifying examinees in a variable-length computerized classification test (CCT). The two parameters, p1 and p2, are specified by determining a cutscore (threshold) for examinees on the proportion-correct metric, and selecting one point below and one point above that cutscore. For instance, suppose the cutscore is set at 70% for a test. We could select p1 = 0.65 and p2 = 0.75. The test then evaluates the likelihood that an examinee's true score on that metric is equal to one of those two points. If the examinee is determined to be at 75%, they pass, and they fail if they are determined to be at 65%.

These points are not specified completely arbitrarily. A cutscore should always be set with a legally defensible method, such as a modified Angoff procedure. The interval between p1 and p2, known as the indifference region, represents the range of scores for which the test designer is willing to accept either outcome (pass or fail). The upper parameter p2 is conceptually the highest level that the test designer is willing to accept for a fail (because everyone below it has a good chance of failing), and the lower parameter p1 is the lowest level that the test designer is willing to accept for a pass (because everyone above it has a decent chance of passing). While this definition may seem to be a relatively small burden, consider the high-stakes case of a licensing test for medical doctors: at just what point should we consider somebody to be at one of these two levels?
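The proportion-correct classification described above can be sketched as a Bernoulli SPRT in Python. This is an illustration under stated assumptions: the function name `classify_examinee` is hypothetical, responses are scored 1 (correct) or 0 (incorrect), every item is treated as having the same success probability, and the error rates default to 0.05.

```python
import math

def classify_examinee(responses, p1=0.65, p2=0.75, alpha=0.05, beta=0.05):
    """Classify an examinee as pass or fail from scored responses (1 or 0).

    H0: true proportion correct is p1 (fail); H1: it is p2 (pass).
    Returns (decision, number of items administered, final cumulative sum).
    """
    lower = math.log(beta / (1 - alpha))
    upper = math.log((1 - beta) / alpha)
    gain_correct = math.log(p2 / p1)                 # positive when p2 > p1
    gain_incorrect = math.log((1 - p2) / (1 - p1))   # negative when p2 > p1
    s = 0.0
    n = 0
    for n, u in enumerate(responses, start=1):
        s += gain_correct if u else gain_incorrect
        if s >= upper:
            return "pass", n, s
        if s <= lower:
            return "fail", n, s
    return "undecided", n, s
```

With p1 = 0.65 and p2 = 0.75, each correct answer adds about 0.143 to the sum and each incorrect answer subtracts about 0.336, so a narrow indifference region requires many items before either threshold is reached.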

While the SPRT was first applied to testing in the days of classical test theory, as described in the previous paragraphs, Reckase (1983) suggested that item response theory (IRT) be used to determine the p1 and p2 parameters. The cutscore and indifference region are defined on the latent ability (theta) metric, and translated onto the proportion metric for computation. Research on CCT since then has applied this methodology for several reasons:

1. Large item banks tend to be calibrated with IRT
2. This allows more accurate specification of the parameters
3. By using the item response function for each item, the parameters are easily allowed to vary between items.
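The item-level variation mentioned in the last point can be sketched as follows. The choice of a two-parameter logistic item response function, the names `irf_2pl` and `irt_sprt`, and the half-width `delta` of the indifference region on the theta metric are all assumptions made for illustration; the source does not specify a particular IRT model.

```python
import math

def irf_2pl(theta, discrimination, difficulty):
    """Two-parameter logistic probability of a correct response (assumed model)."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

def irt_sprt(responses, items, theta_cut, delta, alpha=0.05, beta=0.05):
    """SPRT with p1 and p2 recomputed per item from its response function.

    responses: scored answers (1 or 0); items: (discrimination, difficulty) pairs.
    Returns (decision, number of items administered, final cumulative sum).
    """
    lower = math.log(beta / (1 - alpha))
    upper = math.log((1 - beta) / alpha)
    theta1, theta2 = theta_cut - delta, theta_cut + delta
    s = 0.0
    n = 0
    for n, (u, (disc, diff)) in enumerate(zip(responses, items), start=1):
        p1 = irf_2pl(theta1, disc, diff)   # item-specific lower point
        p2 = irf_2pl(theta2, disc, diff)   # item-specific upper point
        s += math.log(p2 / p1) if u else math.log((1 - p2) / (1 - p1))
        if s >= upper:
            return "pass", n, s
        if s <= lower:
            return "fail", n, s
    return "undecided", n, s
```

In this sketch, items whose difficulty lies near the cutscore and whose discrimination is high yield the largest gap between p1 and p2, so they move the cumulative sum toward a threshold fastest.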