# Dynamic treatment regimes

In medical research, a dynamic treatment regime is a set of sequential decision rules specifying which actions should be taken to treat a patient, based on the information observed up to each decision point. Also referred to as adaptive treatment strategies, dynamic treatment regimes attempt to individualize treatment while operationalizing clinical practice for chronic illnesses. The goal of a dynamic treatment regime is to define the sequence of treatment actions that results in the most favorable clinical outcome possible.

## History

Historically, medical research and the practice of medicine tended to rely on an acute care model for the treatment of all medical problems, including chronic illness (Wagner et al. 2001). More recently, the medical field has begun to look at long-term care plans for patients with a chronic illness. This shift in ideology, coupled with increased demand for evidence-based medicine and individualized care, led to the application of sequential decision-making research to medical problems and to the formulation of dynamic treatment regimes.

## Notation and Formulation

For a series of decision time points $t = 1, \ldots, T$, define $A_t$ to be the treatment action taken at time point $t$, and define $O_t$ to be all clinical observations made at time $t$, prior to treatment action $A_t$. A dynamic treatment regime $\pi = (\pi_1, \ldots, \pi_T)$ then consists of one rule for each time point $t$ for choosing the treatment $A_t$ based on the clinical observations available so far. Thus $\pi_t(o_1, a_1, \ldots, a_{t-1}, o_t)$ is a function of the past and current observations $(o_1, \ldots, o_t)$ and the past actions $(a_1, \ldots, a_{t-1})$, which returns a choice of the current action $a_t$. The goal of the dynamic treatment regime is to make decisions that result in the best possible clinical outcome or response, $R$.
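The notation can be made concrete with a short Python sketch. It is illustrative only: the `run_regime` helper, the rule names `pi_1` and `pi_2`, the action labels and the response scores are all hypothetical. Each rule $\pi_t$ sees only $(o_1, \ldots, o_t)$ and $(a_1, \ldots, a_{t-1})$ when it returns $a_t$, and the `observe` callback is a stand-in for whatever clinical measurement produces $O_t$.

```python
from typing import Callable, List, Sequence, Tuple

Observation = float
Action = str
# A decision rule pi_t maps the history (o_1..o_t, a_1..a_{t-1}) to an action a_t.
Rule = Callable[[Sequence[Observation], Sequence[Action]], Action]
# observe(t, past observations, past actions) returns o_t (a stand-in for the clinic).
Observer = Callable[[int, Tuple[Observation, ...], Tuple[Action, ...]], Observation]


def run_regime(rules: List[Rule], observe: Observer) -> Tuple[List[Observation], List[Action]]:
    """Follow pi = (pi_1, ..., pi_T) for one patient, observing o_t before choosing a_t."""
    obs: List[Observation] = []
    actions: List[Action] = []
    for t, rule in enumerate(rules, start=1):
        obs.append(observe(t, tuple(obs), tuple(actions)))  # o_t
        actions.append(rule(tuple(obs), tuple(actions)))     # a_t = pi_t(o_1..o_t, a_1..a_{t-1})
    return obs, actions


# Hypothetical two-stage regime: start on drug_A; at t = 2 keep it only if
# the response score o_2 is at least 0.5.
def pi_1(obs: Sequence[Observation], past: Sequence[Action]) -> Action:
    return "drug_A"


def pi_2(obs: Sequence[Observation], past: Sequence[Action]) -> Action:
    return "maintain" if obs[-1] >= 0.5 else "switch"


scores = {1: 0.0, 2: 0.3}  # hypothetical observations o_1, o_2
obs, actions = run_regime([pi_1, pi_2], lambda t, o, a: scores[t])
print(actions)  # ['drug_A', 'switch']
```

Passing the rules as plain functions keeps the regime separate from the data source, so the same `pi_1`, `pi_2` could be applied to simulated or recorded patients.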

For example, the goal of cancer treatment is to achieve and maintain remission if possible, or to prolong and improve quality of life when remission is not possible. Cancer treatment is often complex and may require many different decisions throughout the process. Some of these decisions may include:

• Should the patient have surgery to remove the cancer?
• Would the patient benefit from chemo/radiation therapy?
• How long should the patient be on chemo/radiation therapy?
• How will side effects of treatment be managed?
• What type of maintenance care should the patient have following chemo/radiation therapy?

A particular dynamic treatment regime might answer some of these questions as follows:

Here treatment actions, $A_t$, are listed in the boxes branching out from the box labeled 'CANCER' and clinical observations, $O_t$, are listed next to the arrows between boxes.

An example of a dynamic treatment regime that has been suggested for use in clinical practice is ...

### Optimization

For a dynamic treatment regime $\pi$ to achieve the most favorable clinical outcome possible, it must maximize the expected outcome. In other words, the goal is to find a dynamic treatment regime $\pi^*$ for which:

$\pi^* = \arg\max_{\pi} E[R \mid \text{actions are chosen according to } \pi]$

The quantity $E[R | \text{ actions are chosen according to }\pi]$ is often referred to as the Value of $\pi$.
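When a simulator of patient outcomes is available, the Value of each candidate regime can be approximated by averaging $R$ over many simulated patients, and $\pi^*$ can be approximated by taking the argmax over a finite set of candidate regimes. The sketch below assumes a hypothetical one-decision model, $O \sim \text{Bernoulli}(0.5)$ and $R = 1 + (2O - 1)A + \varepsilon$ with standard normal $\varepsilon$, under which the true Values of the three candidate rules are 1, 1 and 1.5. The model, the function names and the candidate rules are invented for illustration; in practice the generative model is unknown and the Value must be estimated from data.

```python
import random
from statistics import mean
from typing import Callable, Dict

Rule = Callable[[int], int]


def simulate_outcome(rule: Rule, rng: random.Random) -> float:
    """Hypothetical model: O ~ Bernoulli(0.5), A = rule(O), R = 1 + (2O - 1)A + N(0, 1)."""
    o = 1 if rng.random() < 0.5 else 0
    a = rule(o)
    return 1.0 + (2 * o - 1) * a + rng.gauss(0.0, 1.0)


def monte_carlo_value(rule: Rule, n: int = 20_000, seed: int = 0) -> float:
    """Approximate E[R | actions chosen by rule] by averaging n simulated outcomes."""
    rng = random.Random(seed)
    return mean(simulate_outcome(rule, rng) for _ in range(n))


candidates: Dict[str, Rule] = {
    "never treat": lambda o: 0,
    "always treat": lambda o: 1,
    "treat if o == 1": lambda o: o,
}
values = {name: monte_carlo_value(rule) for name, rule in candidates.items()}
best = max(values, key=values.get)  # approximate argmax over the candidate set
print(best)  # treat if o == 1
```

Only the tailored rule beats the two one-size-fits-all rules, which is the point of individualizing treatment.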

## Methods for developing dynamic treatment regimes

Developing dynamic treatment regimes that result in optimal clinical outcomes involves many different considerations. Some of them are briefly reviewed below.

### Delayed Effects

To find an optimal dynamic treatment regime, it might seem reasonable simply to collect data from separate trials at different time points, find the optimal treatment at each time point, and then combine these treatment steps into a dynamic treatment regime. However, this type of data collection and analysis is shortsighted and will often result in an inferior dynamic treatment regime. Many treatments have effects that do not appear until after the immediate treatment outcome has been measured, such as improving the effect of a future treatment, or long-term side effects that prevent a patient from using an otherwise useful alternative treatment in the future. For example, cognitive behavioral therapy may not produce the best short-term outcome among first-line monotherapies for drug addiction. However, it may lead to dramatic improvements in future outcomes when followed by a combined drug and therapy treatment. This could happen if cognitive behavioral therapy helps patients learn how to make the best use of future behavioral therapies, even though it is not a strong first-line monotherapy. This issue is often referred to as 'delayed effects' and is an important concept to consider when trying to find optimal dynamic treatment regimes. Data collection and analysis should always encompass the whole treatment regime, not just the individual treatments within it.
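The pitfall of choosing each stage in isolation shows up in a toy two-stage calculation. All numbers below are hypothetical expected outcomes loosely mirroring the cognitive behavioral therapy example: `"CBT"` has the worse short-term outcome but the best final outcome when followed by a combined treatment (`"combo"`). The myopic strategy picks the first stage by its short-term outcome; backward induction instead scores each first-stage action by the final outcome reachable with the best follow-up.

```python
# Hypothetical expected outcomes (higher is better).
short_term = {"CBT": 0.4, "med": 0.6}  # outcome measured right after stage 1
final = {                              # outcome after the full two-stage sequence
    ("CBT", "combo"): 0.9, ("CBT", "med"): 0.5,
    ("med", "combo"): 0.7, ("med", "med"): 0.65,
}
stage2_options = ["combo", "med"]


def best_followup(a1: str) -> str:
    """Best second-stage action given the first-stage action."""
    return max(stage2_options, key=lambda a2: final[(a1, a2)])


def myopic():
    """Pick stage 1 by its short-term outcome, then the best follow-up."""
    a1 = max(short_term, key=short_term.get)
    a2 = best_followup(a1)
    return (a1, a2), final[(a1, a2)]


def backward_induction():
    """Pick stage 1 by the final outcome achievable with the best follow-up."""
    a1 = max(short_term, key=lambda a: final[(a, best_followup(a))])
    a2 = best_followup(a1)
    return (a1, a2), final[(a1, a2)]


print(myopic())              # (('med', 'combo'), 0.7)
print(backward_induction())  # (('CBT', 'combo'), 0.9)
```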

### Data

Data analysis is needed to find optimal dynamic treatment regimes. The data used for this purpose should consist of the sequence of observations and actions $(o_1, a_1, o_2, a_2, \ldots, o_T, a_T)_i$ for multiple patients $i = 1, \ldots, n$, along with each patient's outcome $R_i$.
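One possible in-memory layout for such data is sketched below. The `Trajectory` class and its `history` method are hypothetical names, and the two records are made-up values. `history(t)` returns exactly the information $(o_1, a_1, \ldots, a_{t-1}, o_t)$ on which a decision rule $\pi_t$ is allowed to depend.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Trajectory:
    """One patient's record (o_1, a_1, ..., o_T, a_T) together with the outcome R_i."""
    observations: Tuple[float, ...]
    actions: Tuple[str, ...]
    outcome: float

    def __post_init__(self) -> None:
        # Each decision point t has one observation o_t and one action a_t.
        if len(self.observations) != len(self.actions):
            raise ValueError("expected one action per observation time")

    def history(self, t: int) -> Tuple[object, ...]:
        """Return (o_1, a_1, ..., a_{t-1}, o_t), the information available before a_t (t is 1-indexed)."""
        if not 1 <= t <= len(self.observations):
            raise IndexError(f"t must be in 1..{len(self.observations)}")
        h: List[object] = []
        for s in range(t):
            h.append(self.observations[s])
            if s < t - 1:  # a_t itself is not yet known
                h.append(self.actions[s])
        return tuple(h)


# Made-up records for n = 2 patients and T = 2 decision points.
data: List[Trajectory] = [
    Trajectory((0.2, 0.7), ("A", "maintain"), outcome=1.0),
    Trajectory((0.9, 0.1), ("B", "switch"), outcome=0.0),
]
print(data[0].history(2))  # (0.2, 'A', 0.7)
```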

While this type of data can be obtained through careful observation, it is often preferable to collect it through experimentation when possible. Experimental data, in which treatments have been randomly assigned, are preferred because we can typically observe a patient's outcome only under the treatment sequence actually given, not under alternative treatment sequences. Randomizing treatments helps eliminate bias caused by confounding variables that affect both the choice of treatment and the clinical outcome. This is especially important for sequential treatments, since these biases can compound over time.
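A small simulation illustrates the confounding problem. In this hypothetical model, severely ill patients are more likely to be treated in the observational setting, treatment truly raises the outcome by 1, and severity lowers it by 2. The naive difference in mean outcomes between treated and untreated patients is then about $-0.2$ (so treatment wrongly looks harmful), while under randomization it is close to the true effect of 1. The model, its parameters and the `naive_difference` helper are assumptions chosen only to make the bias visible.

```python
import random
from statistics import mean

TRUE_EFFECT = 1.0  # hypothetical causal effect of treatment on R


def naive_difference(n: int, randomized: bool, seed: int = 1) -> float:
    """Mean R among treated minus mean R among untreated, under a hypothetical model."""
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        severe = rng.random() < 0.5
        if randomized:
            p_treat = 0.5                     # coin-flip assignment
        else:
            p_treat = 0.8 if severe else 0.2  # sicker patients treated more often
        a = rng.random() < p_treat
        # Severity is a confounder: it affects both treatment choice and outcome.
        r = TRUE_EFFECT * a - 2.0 * severe + rng.gauss(0.0, 1.0)
        (treated if a else untreated).append(r)
    return mean(treated) - mean(untreated)


observational = naive_difference(20_000, randomized=False)  # about -0.2
experimental = naive_difference(20_000, randomized=True)    # about 1.0
```

The $-0.2$ follows from the model: among treated patients 80% are severe, giving a mean of $1 - 2(0.8) = -0.6$, while among untreated patients 20% are severe, giving $-2(0.2) = -0.4$.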

#### Experimental design

Experimental design for dynamic treatment regimes involves an initial randomization of patients to treatments, followed at each subsequent time point by re-randomization of some or all of the patients to another treatment. The re-randomization at each subsequent time point may depend on information collected after previous treatments but before the new treatment is assigned, such as how successful the prior treatment was. These types of trials were introduced and developed in Lavori & Dawson (2000), Lavori (2003) and Murphy (2005) and are often referred to as SMART trials (Sequential Multiple Assignment Randomized Trial). Examples of SMART trials include the CATIE trial for treatment of Alzheimer's disease (Schneider et al. 2001) and the STAR*D trial for treatment of depression (Lavori et al. 2001; Rush, Trivedi & Fava 2003).
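The sketch below simulates treatment assignment in one hypothetical two-stage SMART layout: patients are first randomized between two treatments, `A` and `B`; responders continue their treatment, and only nonresponders are re-randomized between switching and augmenting. The design, the treatment labels, the response probabilities and the `run_smart` helper are illustrative assumptions, not those of CATIE or STAR*D.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Record = Tuple[int, str, bool, str]  # (patient id, a_1, responded?, a_2)


def run_smart(n: int, p_response: Dict[str, float], seed: int = 2) -> List[Record]:
    """Hypothetical two-stage SMART: randomize to A/B; re-randomize nonresponders only."""
    rng = random.Random(seed)
    records: List[Record] = []
    for i in range(n):
        a1 = rng.choice(["A", "B"])                 # first randomization
        responded = rng.random() < p_response[a1]   # tailoring variable observed after stage 1
        if responded:
            a2 = "continue"
        else:
            a2 = rng.choice(["switch", "augment"])  # second randomization
        records.append((i, a1, responded, a2))
    return records


records = run_smart(1000, p_response={"A": 0.4, "B": 0.3})
cells = Counter((a1, a2) for _, a1, _, a2 in records)
print(sorted(cells.items()))
```

Because the second randomization depends on the observed response, each patient's treatment sequence is itself an instance of a simple adaptive rule.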

SMART trials attempt to conform more closely to the way clinical practice actually occurs while retaining the advantages of experimentation over observation. They are clearly more complicated and higher-dimensional than testing a set of treatments individually at a single time point. However, they are necessary for dealing with the problem of delayed effects. Several suggestions have been made to reduce the complexity and resources needed, such as combining data over the same treatment sequences within different treatment regimes, splitting the work into screening, refining and confirmatory trials rather than attacking the problem in a single trial (Collins et al. 2005), using fractional factorial designs rather than a full factorial design (Nair et al. 2008), and targeting the primary analysis to simple regime comparisons (Murphy 2005).
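Combining data over shared treatment sequences can be illustrated with a hypothetical two-stage design in which patients start on `A` or `B`, responders continue, and nonresponders either `switch` or `augment`. This design embeds four regimes. A responder to `A` is consistent with both regimes that start with `A`, so that patient's data can contribute to the analysis of each. The helper names and records below are hypothetical.

```python
from itertools import product
from typing import Dict, List, Tuple

Record = Tuple[str, bool, str]  # (first-line treatment, responded?, second-stage action)
Regime = Tuple[str, str]        # (first-line treatment, action if no response)


def embedded_regimes() -> List[Regime]:
    """The four regimes embedded in the hypothetical design."""
    return list(product(["A", "B"], ["switch", "augment"]))


def consistent(record: Record, regime: Regime) -> bool:
    """True if the patient's observed treatment sequence agrees with the regime."""
    a1, responded, a2 = record
    first_line, if_no_response = regime
    if a1 != first_line:
        return False
    expected_a2 = "continue" if responded else if_no_response
    return a2 == expected_a2


records: List[Record] = [("A", True, "continue"), ("A", False, "switch"), ("B", False, "augment")]
membership: Dict[Regime, List[Record]] = {
    reg: [r for r in records if consistent(r, reg)] for reg in embedded_regimes()
}
for reg, members in membership.items():
    print(reg, len(members))
```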

#### Reward construction

A critical part of finding the best dynamic treatment regime is the construction of a meaningful and comprehensive outcome variable, $R$. To construct a useful outcome variable, the goals of the treatment need to be well defined and quantifiable. The goals of the treatment can include multiple aspects of a patient's health and welfare, such as degree of symptoms, treatment side effects, time until treatment response, quality of life and cost. However, quantifying the various aspects of a successful treatment with a single function can be difficult. The outcome variable should reflect how successful the treatment regime was in achieving the overall goals for each patient.
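One simple way to collapse several goals into a single $R$ is a weighted sum of components scaled to $[0, 1]$. The sketch below assumes that form; the component names, the `composite_outcome` helper and the weights are hypothetical and would in practice need to be elicited from clinicians and patients. Negative weights penalize side effects and cost.

```python
from typing import Dict, Mapping

# Hypothetical weights; negative weights penalize undesirable components.
WEIGHTS: Dict[str, float] = {
    "symptom_reduction": 0.5,
    "quality_of_life": 0.3,
    "side_effects": -0.15,
    "cost": -0.05,
}


def composite_outcome(components: Mapping[str, float],
                      weights: Mapping[str, float] = WEIGHTS) -> float:
    """Collapse several [0, 1]-scaled treatment goals into a single outcome R."""
    missing = set(weights) - set(components)
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    for name in weights:
        if not 0.0 <= components[name] <= 1.0:
            raise ValueError(f"{name} must be scaled to [0, 1]")
    return sum(w * components[name] for name, w in weights.items())


example = {"symptom_reduction": 0.8, "quality_of_life": 0.6, "side_effects": 0.4, "cost": 0.2}
print(round(composite_outcome(example), 4))  # 0.51
```

A linear form makes the trade-offs explicit (here one unit of symptom reduction is worth ten units of cost), but it cannot express interactions between goals.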

#### Variable selection and feature construction

Analysis is often improved by collecting any variables that might be related to the illness or the treatment. This is especially important when data are collected by observation, to avoid bias in the analysis due to unmeasured confounders. Consequently, more observation variables are collected than are actually needed to estimate optimal dynamic treatment regimes, so variable selection is often required as a preprocessing step before the algorithms used to find the best dynamic treatment regime are applied.
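As one illustrative preprocessing heuristic (not a method taken from the references listed here), candidate covariates can be ranked by how strongly they modify the treatment effect, since only such variables change which action is best. The sketch assumes binary covariates, a randomized binary action, and a simulated outcome in which $x_1$ affects $R$ directly but only $x_0$ interacts with treatment; the helper names and parameters are hypothetical.

```python
import random
from statistics import mean
from typing import List, Sequence, Tuple

Row = Tuple[Tuple[int, ...], int, float]  # (binary covariates x, randomized action a, outcome r)


def interaction_contrast(rows: Sequence[Row], j: int) -> float:
    """Treatment effect when x_j = 1 minus treatment effect when x_j = 0."""
    def cell_mean(xv: int, av: int) -> float:
        return mean(r for x, a, r in rows if x[j] == xv and a == av)
    return (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))


def screen(rows: Sequence[Row], n_features: int, k: int) -> List[int]:
    """Keep the k covariates with the largest absolute interaction contrast."""
    scores = {j: abs(interaction_contrast(rows, j)) for j in range(n_features)}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Simulated randomized data: x_1 affects R directly, but only x_0 modifies the treatment effect.
rng = random.Random(3)
rows: List[Row] = []
for _ in range(4000):
    x = tuple(int(rng.random() < 0.5) for _ in range(3))
    a = int(rng.random() < 0.5)
    r = 1.0 * x[1] + 2.0 * a * x[0] + rng.gauss(0.0, 1.0)
    rows.append((x, a, r))

selected = screen(rows, n_features=3, k=1)
print(selected)  # [0]
```

Note that $x_1$ is not selected even though it predicts $R$: a purely prognostic variable does not change which treatment is optimal.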

### Algorithms and Inference

Given an experimental data set, the optimal dynamic treatment regime can be estimated from the data using a number of different algorithms. Inference can also be done to determine whether the estimated optimal dynamic treatment regime results in a significant improvement in the expected outcome over an alternative treatment regime.
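For a fixed, prespecified regime, the Value can be estimated from randomized data by inverse probability weighting: patients whose observed actions agree with the regime at every stage are kept and up-weighted by the inverse of the probability of their assigned actions. The sketch assumes every action was randomized with probability 0.5 and uses an invented two-stage data-generating model in which a tailored regime has true Value 1 and a never-treat regime has true Value 0; it reports a normal-approximation 95% confidence interval for the difference. The helper names are hypothetical, and this simple interval does not account for the extra variability that arises when the regime itself is estimated from the same data.

```python
import random
from math import sqrt
from statistics import mean, stdev


def ipw_contributions(data, regime, p=0.5):
    """Per-patient IPW terms for a fixed regime; assumes each action was randomized with prob p."""
    out = []
    for obs, acts, r in data:
        # Rule t sees o_1..o_t and a_1..a_{t-1}; keep the patient only if all rules agree.
        follows = all(rule(obs[: t + 1], acts[:t]) == acts[t] for t, rule in enumerate(regime))
        out.append(r / p ** len(regime) if follows else 0.0)
    return out


def compare(data, regime_1, regime_2):
    """Estimated Value difference and a normal-approximation 95% CI."""
    d = [x - y for x, y in zip(ipw_contributions(data, regime_1), ipw_contributions(data, regime_2))]
    est = mean(d)
    se = stdev(d) / sqrt(len(d))
    return est, (est - 1.96 * se, est + 1.96 * se)


# Hypothetical two-stage randomized trial with binary observations and actions.
rng = random.Random(4)
data = []
for _ in range(40_000):
    o1 = int(rng.random() < 0.5)
    a1 = int(rng.random() < 0.5)
    o2 = int(rng.random() < 0.5)
    a2 = int(rng.random() < 0.5)
    r = a1 * (2 * o1 - 1) + a2 * (2 * o2 - 1) + rng.gauss(0.0, 1.0)
    data.append(((o1, o2), (a1, a2), r))

tailored = [lambda o, a: o[0], lambda o, a: o[1]]  # treat at stage t iff o_t = 1
never = [lambda o, a: 0, lambda o, a: 0]
est, ci = compare(data, tailored, never)
print(round(est, 2), ci)
```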

Several algorithms exist for estimating optimal dynamic treatment regimes from data. Many of them were developed in computer science to help robots and computers make optimal decisions in interactive environments; these are often referred to as reinforcement learning methods (Sutton & Barto 1998). The most popular of these methods for estimating dynamic treatment regimes is Q-learning (Watkins 1989). In Q-learning, models are fit sequentially, backward from the last decision point, to estimate the Value of the treatment regime used to collect the data, and the models are then optimized with respect to the treatment actions to find the best dynamic treatment regime. Many variations of this algorithm exist, including ones that model only portions of the Value of the treatment regime (Murphy 2003; Robins 2004).
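A minimal two-stage Q-learning sketch follows. To stay dependency-free it replaces the regression models usually used for the Q-functions with saturated cell-mean models, which is only feasible when all observations and actions are binary and every cell is observed. The data-generating model is hypothetical: first-line treatment $a_1 = 1$ carries a cost for patients with $o_1 = 0$ but makes second-stage treatment beneficial, a delayed effect. Under this model the optimal rules are $\pi_1(o_1) = o_1$ and $\pi_2(o_1, a_1, o_2) = a_1$.

```python
import random
from collections import defaultdict
from statistics import mean


def fit_cell_means(keys, ys):
    """Saturated 'regression': the fitted value for each covariate/action cell is its sample mean."""
    groups = defaultdict(list)
    for k, y in zip(keys, ys):
        groups[k].append(y)
    return {k: mean(v) for k, v in groups.items()}


def q_learning(data):
    """data: list of (o1, a1, o2, a2, r) with binary o's and a's; returns (pi1, pi2)."""
    # Stage 2: Q2(h2, a2) with history h2 = (o1, a1, o2).
    q2 = fit_cell_means([(o1, a1, o2, a2) for o1, a1, o2, a2, _ in data], [r for *_, r in data])

    def pi2(o1, a1, o2):
        return max((0, 1), key=lambda a: q2[(o1, a1, o2, a)])

    # Pseudo-outcome: predicted outcome if stage 2 is then played optimally.
    pseudo = [q2[(o1, a1, o2, pi2(o1, a1, o2))] for o1, a1, o2, _, _ in data]
    # Stage 1: Q1(h1, a1) with history h1 = o1, fitted to the pseudo-outcome.
    q1 = fit_cell_means([(o1, a1) for o1, a1, *_ in data], pseudo)

    def pi1(o1):
        return max((0, 1), key=lambda a: q1[(o1, a)])

    return pi1, pi2


# Hypothetical SMART-like data with randomized binary actions.
rng = random.Random(5)
data = []
for _ in range(20_000):
    o1 = int(rng.random() < 0.5)
    a1 = int(rng.random() < 0.5)
    o2 = int(rng.random() < 0.5)
    a2 = int(rng.random() < 0.5)
    r = 0.5 * o2 + a2 * (1.5 * a1 - 0.5) - 1.5 * a1 * (1 - o1) + rng.gauss(0.0, 1.0)
    data.append((o1, a1, o2, a2, r))

pi1, pi2 = q_learning(data)
print([pi1(o1) for o1 in (0, 1)])  # [0, 1]
```

The stage-1 model is fit to the pseudo-outcome rather than to $R$ itself, which is what lets Q-learning credit $a_1 = 1$ for the benefit it enables at stage 2.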

## References

• Collins, L.M.; Murphy, S.A.; Nair, V.; Strecher, V. (2005), "A strategy for optimizing and evaluating behavioral interventions", Annals of Behavioral Medicine 30: 65–73
• Lavori, P. W.; Dawson, R. (2000), "A design for testing clinical strategies: biased adaptive within-subject randomization", Journal of the Royal Statistical Society, Series A 163: 29–38
• Lavori, P.W.; Rush, A.J.; Wisniewski, S.R.; Alpert, J.; Fava, M.; Kupfer, D.J.; Nierenberg, A.; Quitkin, F.M.; et al. (2001), "Strengthening clinical effectiveness trials: Equipoise-stratified randomization", Biological Psychiatry 50: 792–801
• Lavori, P. W. (2003), "Dynamic treatment regimes: practical design considerations", Clinical Trials 1: 9–20
• Murphy, Susan A.; van der Laan, M. J.; Robins, J. M.; CPPRG (2001), "Marginal Mean Models for Dynamic Regimes", Journal of the American Statistical Association 96: 1410–1423
• Murphy, Susan A. (2003), "Optimal Dynamic Treatment Regimes", Journal of the Royal Statistical Society, Series B 65 (2): 331–366
• Murphy, Susan A. (2005), "An Experimental Design for the Development of Adaptive Treatment Strategies", Statistics in Medicine 24: 1455–1481
• Murphy, Susan A.; Almirall, Daniel (2008), "Dynamic Treatment Regimes", Encyclopedia of Medical Decision Making: #–#
• Nair, V.; Strecher, V.; Fagerlin, A.; Ubel, P.; Resnicow, K.; Murphy, S.; Little, R.; Chakraborty, B.; et al. (2008), "Screening Experiments and Fractional Factorial Designs in Behavioral Intervention Research", The American Journal of Public Health 98 (8): 1534–1539
• Robins, James M. (1986), "A new approach to causal inference in mortality studies with a sustained exposure period: application to control of the healthy worker survivor effect", Mathematical Modelling 7: 1393–1512
• Robins, James M. (1987), "Addendum to 'A new approach to causal inference in mortality studies with a sustained exposure period: application to control of the healthy worker survivor effect'", Computers and Mathematics with Applications 14: 923–945
• Robins, James M. (2004), "Optimal structural nested models for optimal sequential decisions", in Lin, D. Y.; Heagerty, P. J., Proceedings of the Second Seattle Symposium on Biostatistics, Springer, New York, pp. 189–326
• Rush, A.J.; Trivedi, M.; Fava, M. (2003), "Depression IV: STAR*D treatment trial for depression", American Journal of Psychiatry 160 (2): 237
• Schneider, L.S.; Tariot, P.N.; Lyketsos, C.G.; Dagerman, K.S.; Davis, K.L.; Davis, S.; Hsiao, J.K.; Jeste, D.V.; et al. (2001), "National Institute of Mental Health clinical antipsychotic trials of intervention effectiveness (CATIE) Alzheimer disease trial methodology", American Journal of Geriatric Psychiatry 9 (4): 346–360
• Sutton, R. S.; Barto, A. G. (1998), Reinforcement Learning: An Introduction, MIT Press, Cambridge, MA
• Wagner, E. H.; Austin, B. T.; Davis, C.; Hindmarsh, M.; Schaefer, J.; Bonomi, A. (2001), "Improving Chronic Illness Care: Translating Evidence Into Action", Health Affairs 20 (6): 64–78
• Watkins, C. J. C. H. (1989), "Learning from Delayed Rewards", PhD thesis, Cambridge University, Cambridge, England