Restoring Study 329: efficacy and harms of paroxetine and imipramine in treatment of major depression in adolescence

Joanna Le Noury,1 John M Nardo,2 David Healy,1 Jon Jureidini,3 Melissa Raven,3 Catalin Tufanaru,4 Elia Abi-Jaoude5
1 School of Medical Sciences, Bangor University, Bangor
2 Emory University, Atlanta
3 Critical and Ethical Mental Health Research Group, Robinson Research Institute, University of Adelaide, Adelaide, South Australia
4 Joanna Briggs Institute, Faculty of Health Sciences, University of Adelaide, Adelaide, South Australia, Australia
5 Department of Psychiatry, The Hospital for Sick Children, University of Toronto, Toronto

Correspondence to: J Jureidini
Additional material is published online only. To view please visit the journal online (http://dx.doi.
Cite this as: BMJ 2015;351:h4320
doi: 10.1136/bmj.h4320
Accepted: 03 August 2015

Abstract

Objective
To reanalyse SmithKline Beecham's Study 329 (published by Keller and colleagues in 2001), the primary objective of which was to compare the efficacy and safety of paroxetine and imipramine with placebo in the treatment of adolescents with unipolar major depression. The reanalysis under the restoring invisible and abandoned trials (RIAT) initiative was done to see whether access to and reanalysis of a full dataset from a randomised controlled trial would have clinically relevant implications for evidence based medicine.

Design
Double blind randomised placebo controlled trial.

Setting
12 North American academic psychiatry centres, from 20 April 1994 to 15 February 1998.

Participants
275 adolescents with major depression of at least eight weeks in duration. Exclusion criteria included a range of comorbid psychiatric and medical disorders and suicidality.

Interventions
Participants were randomised to eight weeks double blind treatment with paroxetine (20-40 mg), imipramine (200-300 mg), or placebo.

Main outcome measures
The prespecified primary efficacy variables were change from baseline to the end of the eight week acute treatment phase in total Hamilton depression scale (HAM-D) score and the proportion of responders (HAM-D score ≤8 or ≥50% reduction in baseline HAM-D) at acute endpoint. Prespecified secondary outcomes were changes from baseline to endpoint in depression items in K-SADS-L, clinical global impression, autonomous functioning checklist, self-perception profile, and sickness impact scale; predictors of response; and number of patients who relapse during the maintenance phase. Adverse experiences were to be compared primarily by using descriptive statistics. No coding dictionary was prespecified.

Results
The efficacy of paroxetine and imipramine was not statistically or clinically significantly different from placebo for any prespecified primary or secondary efficacy outcome. HAM-D scores decreased by 10.7 (least squares mean) (95% confidence interval 9.1 to 12.3), 9.0 (7.4 to 10.5), and 9.1 (7.5 to 10.7) points, respectively, for the paroxetine, imipramine and placebo groups (P=0.20). There were clinically significant increases in harms, including suicidal ideation and behaviour and other serious adverse events in the paroxetine group and cardiovascular problems in the imipramine group.

Conclusions
Neither paroxetine nor high dose imipramine showed efficacy for major depression in adolescents, and there was an increase in harms with both drugs. Access to primary data from trials has important implications for both clinical practice and research, including that published conclusions about efficacy and safety should not be read as authoritative. The reanalysis of Study 329 illustrates the necessity of making primary trial data and protocols available to increase the rigour of the evidence base.
the bmj | BMJ 2015;351:h4320 | doi: 10.1136/bmj.h4320

What is already known on this topic
There is a lack of access to data from most clinical randomised controlled trials, making it difficult to detect biased reporting
In the absence of access to primary data, misleading conclusions in publications of those trials can seem definitive
SmithKline Beecham's Study 329, an influential trial that reported that paroxetine was safe and effective for adolescents, is one such study

What this study adds
On the basis of access to the original data from Study 329, we report a reanalysis that concludes that paroxetine was ineffective and unsafe in this study
Access to primary data makes clear the many ways in which data can be analysed and represented, showing the importance of access to data and the value of reanalysis of trials
There are important implications for clinical practice, research, regulation of trials, licensing of drugs, and the sociology and philosophy of science
Our reanalysis required development of methods that could be adapted for future reanalyses of randomised controlled trials

Introduction
In 2013, in the face of the selective reporting of outcomes of randomised controlled trials, an international group of researchers called on funders and investigators of abandoned (unpublished) or misreported trials to publish undisclosed outcomes or correct misleading publications. This initiative was called "restoring invisible and abandoned trials" (RIAT). The researchers identified many trials requiring restoration and emailed the funders, asking them to signal their intention to publish the unpublished trials or publish corrected versions of misreported trials. If funders and investigators failed to undertake to correct a trial that had been identified as unpublished or misreported, independent groups were encouraged to publish an accurate representation of the clinical trial based on the relevant regulatory information.

The current article represents a RIAT publication of Study 329. The original study was funded by SmithKline Beecham (SKB; subsequently GlaxoSmithKline, GSK). We acknowledge the work of the original investigators. This double blinded randomised controlled trial to evaluate the efficacy and safety of paroxetine and imipramine compared with placebo for adolescents diagnosed with major depression was reported in the Journal of the American Academy of Child and Adolescent Psychiatry (JAACAP) in 2001, with Martin Keller as the primary author. The RIAT researchers identified Study 329 as an example of a misreported trial in need of restoration. The article by Keller and colleagues, which was largely ghostwritten, claimed efficacy and safety for paroxetine that was at odds with the data. This is problematic because the article has been influential in the literature supporting the use of antidepressants in adolescents.

On 14 June 2013, the RIAT researchers asked GSK whether it had any intention to restore any of the trials it sponsored, including Study 329. GSK did not signal any intent to publish a corrected version of any of its trials. In later correspondence, GSK stated that the study by Keller and colleagues "accurately reflects the honestly-held views of the clinical investigator authors" and that GSK did "not agree that the article is false, fraudulent or misleading."

Study 329 was a multicentre eight week double blind randomised controlled trial (acute phase), followed by a six month continuation phase. SKB's stated primary objective was to examine the efficacy and safety of imipramine and paroxetine compared with placebo in the treatment of adolescents with unipolar major depression. Secondary objectives were to identify predictors of treatment outcomes across clinical subtypes; to provide information on the safety profile of paroxetine and imipramine when these drugs were given for "an extended period of time"; and to estimate the rate of relapse among patients who responded to imipramine, paroxetine, and placebo and were maintained on treatment. Study enrolment took place between April 1994 and March 1997.

The first RIAT trial publication was a surgery trial that had been only partly published before. Few previously published randomised controlled trials have ever been subsequently reported in published papers by different teams of authors.

Methods
We reanalysed the data from Study 329 according to the RIAT recommendations. To this end, we used the clinical study report (SKB's "final clinical report"), including appendices A-G, which are publicly available on the GSK website; other publicly available documents; and the individual participant data accessed through the SAS Solutions OnDemand website, on which GSK subsequently also posted some Study 329 documents (available only to users approved by GSK). After negotiation, GSK posted about 77 000 pages of de-identified individual case report forms (appendix H) on that website. We used a tool for documenting the transformation from regulatory documents to journal publication, based on the CONSORT 2010 checklist of information to include when reporting a randomised trial. The audit record, including a table of sources of data consulted in preparing each part of this paper, is available in appendix 1.

Except where indicated, in accordance with RIAT recommendations, our methods are those set out in the 1994-96 protocol for Study 329. When the methods used and published by Keller and colleagues diverged from the protocol, we followed the original protocol.
Because the method of correcting for missing values specified in the protocol (last observation carried forward) has been questioned in the intervening years, we also included a more modern method, multiple imputation, at the request of the BMJ peer reviewers. This is a post hoc method added for comparison only and is not part of our formal reanalysis. When the protocol was not specific, we chose by consensus the standard methods that best presented the data. The original 1993 protocol had minor amendments in 1994 and 1996 (replacement of the Schedule for Affective Disorders and Schizophrenia for Adolescents-Present Version with the Lifetime Version (K-SADS-L), and reduction in required sample size). Furthermore, the clinical study report (CSR) reported some procedures that varied from those specified in the protocol. We have noted the variations that we considered relevant.

The original study recruited 275 adolescents aged 12-18 who met DSM-IV criteria for a current episode of major depression of at least eight weeks' duration (the protocol specified DSM-III-R criteria, which are similar). Box 1 lists the eligibility criteria.

Box 1: Study eligibility criteria
Inclusion criteria
• Adolescents aged 12-18 who met DSM-III-R criteria for major depression for at least eight weeks
• Severity score <60 on the children's global assessment scale (CGAS)
• Score ≥12 on the Hamilton depression scale (17 item) (HAM-D)
• Medically healthy
• IQ ≥80 (based on Peabody picture vocabulary test)
Exclusion criteria
• Current or past DSM-III-R diagnosis of bipolar disorder, schizoaffective disorder, anorexia nervosa, bulimia, alcohol or drug abuse/dependence, obsessive-compulsive disorder, autism/pervasive mental disorder, or organic psychiatric disorder
• Current (within 12 months) DSM-III-R diagnosis of post-traumatic stress disorder
• Adequate trial of an antidepressant within six months (at least four weeks' treatment with an adequate dose of antidepressant)
• Suicidal ideation with a definite plan, suicide attempt during current depressive episode, or history of suicide attempt by drug overdose
• Medical illness that contraindicates the use of heterocyclic antidepressants
• Current use of psychotropic drugs (including anxiolytics, antipsychotics, mood stabilisers), or illicit drugs
• Organic brain disease, epilepsy, or "mental retardation"
• Pregnancy or lactation
• Sexually active females not using reliable contraception
• Use of an investigational drug within the previous 30 days or five half lives of the investigational drug

An unknown number of patients (not disclosed in the available documents) identified by telephone screening as potential participants were subsequently evaluated
at the study site by a senior clinician (psychiatrist or psychologist). Multiple meetings and teleconferences were held by the sponsoring company with site study investigators to ensure standardisation across sites. Patients and parents were interviewed separately with the K-SADS-L. After this initial assessment, the patient and parent both signed the study informed consent form; there was no mention of a separate assent form in the protocol or in the CSR. A screening period of seven to ten days was used to obtain past clinical records and to document that the depressive symptoms were stable. At the end of the screening period, only patients continuing to meet the inclusion criteria (DSM-III-R major depression and a HAM-D total score ≥12) were randomised. There was no placebo lead-in phase.

There were originally six study sites, but this was increased to 12 (10 in the United States and two in Canada). The centres were affiliated with either a university or a hospital psychiatry department and had experience with adolescent patients. The investigators were selected for their interest in the study and their ability to recruit study patients.

The recruitment period ran from 20 April 1994 until 15 March 1997, and the acute phase was completed on 7 May 1997. In a small number of patients, 30 day follow-up data for cases that went into the continuation phase were collected into February 1998.

So far as we can ascertain, there was no patient involvement in SKB's study design.

The study drug was provided to patients in weekly blister packs. Patients were instructed to take the drug twice daily. There were six dosing levels. Over the first four weeks, all patients were titrated to level four, corresponding to 20 mg paroxetine or 200 mg imipramine, regardless of response. Non-responders (those failing to reach responder criteria) could be titrated up to level five or six over the next four weeks. This corresponds to maximum doses of 40 mg paroxetine and 300 mg imipramine.

Compliance with treatment was evaluated from the number of capsules dispensed, taken, and returned. Non-compliance was defined as taking less than 80% or more than 120% of the number of capsules, assessed from the numbers expected to be returned at two consecutive visits, and resulted in withdrawal. Any patient missing two consecutive visits was also withdrawn from the study.

Patients were provided with 45 minute weekly sessions of supportive psychotherapy, primarily for the purpose of assessing the effects of treatment.

Sample size
The sample size for the acute phase of the trial was initially based on a power analysis, which indicated that 100 patients per treatment group were required to give a statistical power of 80% for a two tailed α of 0.05 and an effect size of 0.40. This effect size entailed a difference of 4 in the HAM-D total score from baseline to endpoint, which the protocol specified as large enough to be clinically meaningful, assuming a standard deviation of 10 (4/10 = 0.40). No allowance was made in the power calculation for attrition (anticipated dropout rate) or non-compliance during the study.

Recruitment was slower than expected, and reportedly supplies of treatment (mainly placebo) ran short because they had passed their expiry date. The researchers carried out a midcourse evaluation of 189 patients, without breaking the blinding, which showed less variability in HAM-D scores (SD 8) than expected; with SD 8, the same 4 point difference corresponds to an effect size of 0.50. Therefore the recruitment target was reduced to 275 on the grounds that this would have no negative impact on the estimated 80% power required to detect a 4 point difference between placebo and active drug groups.

Randomisation
A computer generated randomisation list of 360 numbers for the acute phase was held by SKB. According to the CSR, treatments were balanced in blocks of six consecutive patients; however, there is an inconsistency in that the appendix A randomisation code details block sizes of both six and eight. Each investigator was allocated a block of consecutively numbered treatment packs, and patients were assigned treatment numbers in strict sequential order. Patients were randomised in a 1:1:1 ratio to treatment with paroxetine, imipramine, or placebo.

Paroxetine was supplied as film coated, capsule shaped yellow (10 mg) and pink (20 mg) tablets. Imipramine was bought commercially and supplied as green, film coated, round 50 mg tablets. "Paroxetine placebos" matched the paroxetine 20 mg tablets, and "imipramine placebos" matched the imipramine tablets. All tablets were over-encapsulated in bluish-green capsules to preserve blinding.

The blinding was to be broken only in the event of a serious adverse event that the investigator thought could not be adequately treated without knowing the identity of the allocated study treatment. The identity of the study treatment was not otherwise to be disclosed to the investigator or to SKB staff associated with the study.

Outcomes
Patients were evaluated weekly for the following outcome variables during the eight week acute treatment phase.

Primary efficacy variables
The prespecified primary efficacy variables were the change in HAM-D total score from the beginning of the treatment phase to the endpoint of the acute phase, and the proportion of responders at the end of the eight week acute treatment phase (longer than many antidepressant trials). Responders were defined as patients who had a ≥50% reduction in HAM-D score or a HAM-D score of ≤8. (Scores on the HAM-D can vary from 0 to 52.)
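The design arithmetic and decision rules in this section can be made concrete with a short Python sketch. It is an illustration under stated assumptions, not the protocol's or SKB's actual software: the function names (`n_per_group`, `approx_power`, `block_randomisation`, `is_responder`, `is_compliant`) are hypothetical; the sample size uses the normal approximation for a two-sided, two-sample comparison of means (the protocol does not say which formula was used); the arm labels and random seed are invented; and compliance is judged on a single pooled capsule count.

```python
import math
import random
from statistics import NormalDist
from typing import List, Sequence


def n_per_group(diff: float, sd: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Patients per arm for a two-sided two-sample comparison of means (normal approximation)."""
    effect_size = diff / sd                        # standardised difference, e.g. 4 / 10 = 0.40
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # about 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)


def approx_power(diff: float, sd: float, n: int, alpha: float = 0.05) -> float:
    """Approximate power with n patients per arm (ignores the negligible opposite tail)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf((diff / sd) * math.sqrt(n / 2) - z_alpha)


def block_randomisation(n: int,
                        arms: Sequence[str] = ("paroxetine", "imipramine", "placebo"),
                        block_size: int = 6,
                        seed: int = 329) -> List[str]:
    """Permuted-block list: every block contains each arm equally often (1:1:1 for size 6)."""
    if block_size % len(arms) != 0:
        raise ValueError("block size must be a multiple of the number of arms")
    rng = random.Random(seed)                      # hypothetical fixed seed, for reproducibility
    allocations: List[str] = []
    while len(allocations) < n:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)                         # random order within each block
        allocations.extend(block)
    return allocations[:n]


def is_responder(baseline_hamd: float, endpoint_hamd: float) -> bool:
    """Responder: HAM-D <= 8 at endpoint, or a reduction of at least 50% from baseline."""
    return endpoint_hamd <= 8 or endpoint_hamd <= 0.5 * baseline_hamd


def is_compliant(capsules_taken: int, capsules_expected: int) -> bool:
    """Compliant if 80% to 120% of the expected capsules were taken; otherwise non-compliant."""
    ratio = capsules_taken / capsules_expected
    return 0.8 <= ratio <= 1.2
```

Under this approximation a 4 point difference with SD 10 needs about 99 patients per arm, in line with the protocol's figure of 100; with SD 8 about 63 per arm suffice, which is how 275 patients (roughly 92 per arm) could be presented as preserving 80% power, although no allowance was made for attrition. A fixed block size of six keeps the 1:1:1 allocation exactly balanced after every block, at the cost of making the last allocations in a block predictable if the block size becomes known.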
Secondary efficacy variables
The prespecified secondary efficacy variables were:
• Changes from baseline to endpoint in:
  – Depression items in K-SADS-L
  – Clinical global impression (CGI)
  – Autonomous functioning checklist
  – Self perception profile
  – Sickness impact scale
• Predictors of response (endogenous subtypes, age, previous episodes, duration and severity of present episode, comorbidity with separation anxiety, attention deficit, and conduct disorder)
• The number of patients who relapsed during the maintenance phase (referred to in the CSR and in this paper as "continuation phase").

Both before and after breaking the blind, however, the sponsors made changes to the secondary outcomes, as previously reported. We could not find any document that provided any scientific rationale for these post hoc changes, and these outcomes are therefore not reported in this paper.

Challenges in carrying out RIAT
To our knowledge this is the first RIAT analysis of a misreported trial by an external team of authors, so there are no clear precedents or guides. Challenges we have encountered included:

Potential or perceived bias
A RIAT report is not intended to be a critique of a previous publication. The point is rather to produce a thorough independent analysis of a trial that has remained unpublished or called into question. We acknowledge, however, that any RIAT team might be seen as having an intrinsic bias in that questioning the earlier published conclusions is what brought some members of the team together. Consequently, we took all appropriate procedural steps to avoid such putative bias. In addition, we have made the data available for others to analyse.

Correction for testing multiple variables
We had multiple sources of information: the protocol; the published paper; the documents posted on the GSK website, including the CSR and individual patient data; and the raw primary data in the case report forms provided by GSK on a remote desktop for this project. The protocol declared two primary and six secondary variables for the three treatment groups in two differing datasets (observed case and last observation carried forward). The CSR contained statistical comparisons on 28 discrete variables using two comparisons (paroxetine v placebo and imipramine v placebo) in the two datasets (observed case and last observation carried forward). The published paper listed eight variables with two statistical comparisons each in one dataset (last observation carried forward). The authors of the original paper, however, did not deal with the need for corrections for multiple variables, a standard requirement when there are multiple outcome measures. In the final analysis, there were no statistically or clinically significant findings for any outcome variable, so corrections were not needed for this analysis.

Statistical testing
The protocol called for ANOVA testing (generalised linear model) for continuous variables using a model that included the effects of site, treatment, and site × treatment interaction, with the latter dropped if P≥0.10. Logistic regression (2×3 χ²) was prescribed for categorical variables under the same model. Both methods begin with an omnibus statistic for the overall significance of the dataset, then progress to pairwise testing if, and only if, the omnibus statistic meets α=0.05. Yet all statistical outcomes in the CSR and published paper were reported only as pairwise values for two of the three possible comparisons (paroxetine v placebo and imipramine v placebo), with no mention of the omnibus statistic. Therefore, we conducted the required omnibus analyses, with negative results as shown. The pairwise values are available in table A in appendix 2.

Missing values
The protocol called for evaluation of the observed case and last observation carried forward datasets, with the latter being definitive. The last observation carried forward method for correcting missing values was the standard at the time the study was conducted. It continues to be widely used, although newer models such as multiple imputation or mixed models are superior. We chose to adhere to the protocol and use the last observation carried forward method, including multiple imputation for comparison only.

Outcome variables not specified in protocol
There were four outcome variables in the CSR and in the published paper that were not specified in the protocol. These were the only outcome measures reported as significant. They were not included in any version of the protocol as amendments (despite other amendments), nor were they submitted to the institutional review board. The CSR (section 3.9.1) states that they were part of an "analysis plan" developed some two months before the blinding was broken. No such plan appears in the CSR, and we have no contemporaneous documentation of that claim, despite having repeatedly requested it from GSK.

Conclusions
We decided that the best and most unbiased course of action was to analyse the efficacy data in the individual patient data based on the last guaranteed a priori version of SKB's own protocol (1994, amended in 1996 to accept a reduced sample size). Although the protocol omitted a discussion of corrections that we would have thought necessary, correction for multiple variables is designed to prevent false positives, and there were no positives. We agreed with the statistical mandates of the protocol. Although we regarded pairwise comparisons in the absence of overall significance as inappropriate, we recognise that this is not a universal opinion, so we included these data in table A in appendix 2.
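The protocol's missing-value rule and its omnibus-then-pairwise testing sequence can be sketched in Python. This is a simplified illustration under stated assumptions: `locf`, `one_way_f`, and `gatekept_pairwise` are hypothetical helper names; `one_way_f` computes a one-way ANOVA F statistic for treatment only, omitting the site and site × treatment terms of the protocol's generalised linear model; and converting F to a P value needs the F distribution, which is left to a statistics library so that the sketch stays dependency free.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def locf(weekly_scores: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Last observation carried forward: fill each missed visit with the last observed score."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for score in weekly_scores:
        if score is not None:
            last = score
        filled.append(last)  # remains None until the first observed visit
    return filled


def one_way_f(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA F statistic for treatment only (no site or site x treatment terms)."""
    values = [x for g in groups for x in g]
    grand_mean = sum(values) / len(values)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


def gatekept_pairwise(omnibus_p: float,
                      pairwise_p: Dict[Tuple[str, str], float],
                      alpha: float = 0.05) -> Dict[Tuple[str, str], float]:
    """Return pairwise results only when the omnibus test is significant at alpha."""
    if omnibus_p >= alpha:
        return {}  # omnibus test not significant: stop, no pairwise testing
    return dict(pairwise_p)
```

With the omnibus P=0.20 reported in the abstract for change in HAM-D score, `gatekept_pairwise` returns no pairwise comparisons at all, whatever pairwise P values are supplied; that is the protocol's rule, whereas the CSR and the published paper reported pairwise values without the omnibus statistic.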
Finally, although investigators can explore the data however they want, additional outcome variables outside those in the protocol cannot legitimately be declared once the study is underway, except as "exploratory variables" (appropriate for the discussion or as material for further study, but not for the main analysis). The a priori protocol and blinding are the bedrock of a randomised controlled trial, guaranteeing that there is not even the possibility of the HARK phenomenon ("hypothesis after results known"). Though we can readily show that none of the four reportedly "positive" non-protocol outcome variables stands up to scrutiny, the primary mandate of the RIAT enterprise is to reaffirm essential practices in randomised controlled trials, so we did not include these variables in our efficacy analysis.

An adverse experience/event was defined in the protocol (page 18) as "any noxious, pathologic or unintended change in anatomical, physiologic or metabolic functions as indicated by physical signs, symptoms and/or laboratory changes occurring in any phase of the clinical trial whether associated with drug or placebo and whether or not considered drug related. This includes an exacerbation of pre-existing conditions or events, intercurrent illnesses, drug interaction or the significant worsening of the disease under investigation that is not recorded elsewhere in the case report form under specific efficacy assessments."

Adverse events were to be elicited by the investigator asking a non-leading question such as: "Do you feel different in any way since starting the new treatment/the last assessment?" Details of adverse events that emerged with treatment were documented, including their severity, any change in study drug administration, the investigator's attribution to study drug, any corrective therapy given, and outcome status. Attribution or relation to study drug was judged by the investigator to be "unrelated," "probably unrelated," "possibly related," "probably related," or "related."

Vital signs and electrocardiograms were obtained at weekly visits. Patients with potentially concerning cardiovascular measures either had their drug dose reduced or were withdrawn from the study. In addition, if the combined serum concentrations (obtained at weeks four and eight) of imipramine and desipramine exceeded 500 µg/mL, the patient was to be withdrawn from the study.

Clinical laboratory tests, including clinical chemistry, haematology, and urinalysis, were carried out at the screening visit and at the end of week eight. Clinically relevant laboratory abnormalities were to be included as adverse events.

Source of harms data
The harms data in this paper cover the acute phase, a taper period, and a follow-up phase of up to 30 days for those who discontinued treatment because of adverse events. To ensure comparability with the report by Keller and colleagues, none of the tables contains data from the continuation phase.

Data on adverse events come from the CSR lodged on GSK's website, primarily in appendix D. Appendix B provides details of concomitant drugs. Additional information was available from the summary narratives in the body of the CSR for patients who had adverse events that were designated as serious or led to withdrawal. (Of the 11 patients taking paroxetine who experienced adverse events designated as serious, nine discontinued treatment because of these events.) However, the large number of other patients who discontinued because of adverse events not regarded as serious, or for lack of efficacy or protocol violations, did not generate patient narratives. The tables in appendix D of the CSR provide the verbatim terms used by the blinded investigators, along with preferred terms as coded by SKB using the adverse drug events coding system (ADECS) dictionary. Appendix D also includes ratings of severity and ratings of relatedness. We used the Medical Dictionary for Regulatory Activities (MedDRA) to code the verbatim terms provided in appendix D of the CSR. MedDRA terminology is the international medical terminology developed under the auspices of the International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use; it is used by the FDA and now by GSK.

Several limitations of the ADECS coded preferred terms provided in appendix D of the CSR became clear when we examined the ADECS preferred terms assigned to the verbatim terms. Firstly, several verbatim terms had been left uncoded in ADECS. Secondly, several adverse events found in the patient narratives of serious adverse events that led to discontinuation from the trial were not transcribed into appendix D.

We therefore approached GSK for access to the case report forms (appendix H of the CSR), which are not publicly available. GSK made available all 275 case report forms for patients entered into Study 329. These forms, however, which totalled about 77 000 pages, were available only through a remote desktop facility (SAS Solutions OnDemand Secure Portal), which made it difficult and extremely time consuming to inspect the records properly. Effectively only one person could undertake the task, with backup for ambiguous cases. Accordingly we could not examine all case report forms. Instead we decided to focus on the 85 participants identified in appendices D and G of the CSR who were withdrawn from the study, along with eight further participants who were known from our inspection of the CSRs to have become suicidal. Of the 93 case report forms that were checked, 31 were from the paroxetine group, 40 from the imipramine group, and 22 from the placebo group.

All these case report forms were reviewed by JLN, who was trained in the use of MedDRA. The second reviewer (JMN), a clinician, was not trained in the MedDRA system, but training is not necessary for coding of dropouts. These two reviewers agreed about reasons for discontinuation and coding of side effects (we did not use a quantitative indicator of agreement between raters). We scrutinised these 93 case report forms for all
adverse events occurring during the acute, taper, and follow-up phases, and compared our totals for adverse events with the totals reported in appendix D of the CSR. This review process identified additional adverse events that had not been recorded as verbatim terms in appendix D of the CSR. It also led to recoding of several of the reasons for discontinuation. Tables B, C, and H in appendix 2 show the new adverse events and the reasons for changing the discontinuation category.

At least 1000 pages were missing from the case report forms we reviewed, with no discernible pattern to the missing information; for example, one form came with a page inserted stating that pages 114 to 223 were missing, without indicating reasons.

Coding of adverse events

Choice of coding dictionary for harms
The protocol (page 25) indicates that adverse events were to be coded and compared by preferred term and body system by using descriptive statistics, but it does not prespecify a coding dictionary for generating preferred terms from verbatim terms. The CSR (written after the study ended) specifies that the adverse events noted by clinical investigators in this trial were coded with ADECS, which was being used by SKB at the time. This system was derived from a coding system developed by the US Food and Drug Administration (FDA), Coding Symbols for a Thesaurus of Adverse Reaction Terms (COSTART), but ADECS is not itself a recognised system and is no longer available.

We coded adverse events using MedDRA, which has replaced COSTART for the FDA, because it is by far the most commonly used coding system today. For coding purposes, we have taken the original terms used by the clinical investigators, as transcribed into appendix D of the CSR, and applied MedDRA codes to these descriptions. Information from appendix D was transcribed into spreadsheets (available …). The verbatim terms and the ADECS coding terms were transcribed first into these sheets, allowing all coding to be done before the drug names were added in. The transcription was carried out by a research assistant who was a MedDRA trained coder but took no part in the actual coding. All coding was carried out by JLN and checked by DH, or vice versa. All of our coding from the verbatim terms in appendix D of the CSR was done blind, as was coding from the case report forms.

We present results as SKB presented them in the CSR … considerable difference to the apparent adverse event profile of a drug. In staying closer to the original description of events, MedDRA codes suicidal events as "suicidal ideation" or "self harm/attempted suicide" rather than the ADECS option of "emotional lability"; similarly, aggression is more clearly flagged as "aggressive events" rather than "hostility."

Most coding was straightforward. Nearly all the verbatim terms simply mapped onto coding terms in MedDRA. Coding challenges usually related to cases where there were significant adverse events but the patients were designated by SKB to have discontinued for lack of efficacy. There was no patient narrative for such patients, in contrast to patients deemed to have discontinued because of the adverse event occurring at discontinuation. There were few challenging coding decisions. Appendix 3 shows our coding of cases in which suicidal and self injurious behaviours were considered.

Analysis of harms data
In analysing the harms data for the safety population, we firstly explored the discrepancies in the number of events between case report forms and the CSR. Secondly, we presented all adverse events rather than only those occurring at or above a particular rate (as done by Keller and colleagues). Thirdly, we grouped events into broader system organ class (SOC) groups: psychiatric, cardiovascular, gastrointestinal, respiratory, and other. Table D in appendix 2 summarises all adverse events by all MedDRA SOC groupings. Fourthly, we broke down events by severity, selecting adverse events coded as severe and using the listing in appendix G of the CSR of patients who discontinued for any reason. Fifthly, we included an analysis of the effects of previous treatment, presenting the run-in phase profiles of drugs taken by patients entering each of the three arms of the study and comparing the adverse events experienced by patients on concomitant drugs (from appendix B) with those of patients not on other drugs. Finally, we extracted the events occurring during the taper and follow-up phase.

We did not undertake statistical tests of harms data, as discussed below.

Patient withdrawal
A study patient could withdraw or be withdrawn prematurely for "adverse experiences including intercurrent illness," "insufficient therapeutic effect," "deviation from protocol including non-compliance," "loss to follow-up," "termination by SB [SKB]," or "other (specify)."
using the ADECS dictionary (table 14.2.1), and as coded
The CSR states that the primary reason for with-
by us using MedDRA. In general, MedDRA coding stays drawal was determined by the investigator. We reviewed
closer than ADECS to the original clinician description the codes given for discontinuation from the study,
of the event. For instance, MedDRA codes "sore throat" which are found in appendix G of the CSR, and we made
as "sore throat" but SKB, using ADECS, coded it as changes in a proportion of cases.
"pharyngitis" (inflammation of the throat). Sore throats
can arise because of pharyngitis, but when someone is statistical methods
taking selective serotonin reuptake inhibitors they can The primary population of interest was the intention to
indicate a dystonic reaction in the oropharyngeal artreat population that included all patients who received
Classification of a problem as a "respiratory system at least one dose of study drug and had at least one
disorder" (inflammation) rather than as a "dystonia" assessment of efficacy after baseline. The demographic
(a central nervous system disorder) can make a consid- characteristics, description of the baseline depressive
doi1 02.00;6/bmj.hh; 2 BMJ 2015;101hh; 2 the bmj
episode, additional psychiatric diagnoses, and personal history variables of the patients were summarised descriptively by treatment group.

The acute phase eight week endpoint was our primary interest. Statistical conclusions concerning the efficacy of paroxetine and imipramine were drawn from the last observation carried forward dataset (that is, the last assessment "on therapy" during the acute phase) and the observed case dataset. Paroxetine and imipramine were each to be compared with placebo; there was to be no comparison of paroxetine with imipramine.

We followed the methods of the a priori 1994 study protocol (amended in 1996 to accept a reduced sample size). It did not provide explicit statistical hypotheses (null hypotheses and alternative hypotheses), nor did it justify the proposed statistical approaches or the statistical assumptions underlying them.

One of the two primary efficacy variables, proportion of responders (response), and one secondary efficacy variable, proportion of patients relapsing, were treated as categorical variables. The second primary efficacy variable, change in total HAM-D score over the acute phase, and the remaining secondary efficacy variables were treated as continuous variables.

In accordance with the protocol, the continuous variables were analysed with parametric analysis of variance (ANOVA), with effects in the model including treatment, investigator, and treatment by investigator interaction. Pairwise comparisons were not done if the omnibus (overall) ANOVA was not significant (two sided P<0.05), as specified by the protocol (we acknowledge differing opinions about this issue in the statistical literature, so we included them in table A in appendix 2 for completeness). The categorical variables were analysed with logistic regression, with the same effects included. In either case, if the treatment by investigator interaction resulted in a two sided P>0.10, the interaction term was dropped from the model. Statistical testing was done with the linear model (LM) and generalized linear model (GLM) procedures of the R statistical package (version 2.15.2) as provided by GSK. Imputation was performed with the multiple imputation by chained equations (MICE) package, also in R.

For the analyses of relapse rates, we included all responders (HAM-D ≤8 or ≥50% reduction in symptoms) who met the original criteria for entry to the continuation phase of the study. Patients were considered to have relapsed if they no longer met the responder criteria or if they were withdrawn for "intentional overdose."

Results
Table 1 shows the demographics of the groups, along with depression parameters, comorbidities, and baseline scores for the efficacy variables.

Table 1 Baseline characteristics of groups in Study 329
Paroxetine (n=93) | Imipramine (n=95) | Placebo (n=87)
Mean (SD) age (years)
Sex (male/female)
African American
Mean (SD) duration of episode (months) 14 (18)
Mean (SD) age at first episode (years)
No (%) of previous episodes:
No (%) with comorbidity:
  Any comorbid disorder*
  Current anxiety disorder*
  ODD, CD, or ADHD*
Least squares mean baseline scores (SEM):
  Autonomous function
  Self perception profile
  Sickness impact profile
ODD=oppositional defiant disorder, CD=conduct disorder, ADHD=attention-deficit/hyperactivity disorder, HAM-D=Hamilton depression scale, K-SADS-L=affective disorders and schizophrenia for adolescents-lifetime version, SD=standard deviation, SEM=standard error of mean.
*From K-SADS-L structured interview at screening.

Figure 1 summarises the allocations and discontinuations among the three treatment groups during the acute study. The flow chart covers the intention to treat population for the acute phase and the efficacy analysis. The paroxetine group was titrated to a dose of 20 mg/day by week four, with 55% (51/93) of participants moving to a higher dose (mean 28.0 mg/day, SD 8.4 mg) by week eight. The imipramine group was titrated to 200 mg/day by week four, with 40% (38/95) moving to a higher dose (mean 205.8 mg/day, SD 63.9 mg) by week eight. Twenty eight patients reached the highest permissible dose of 40 mg of paroxetine, and 20 patients were titrated to the maximum 300 mg of imipramine.

Fig 1 Group allocations and discontinuations in trial of paroxetine and imipramine in treatment of major depression in adolescence
Randomised (n=275)
Stage | Paroxetine | Imipramine | Placebo
Randomised | 93 | 95 | 87
Acute phase discontinuation | 26 | 39 | 21
  Adverse events | 13 | 31 | 6
  Lack of efficacy | 3 | 0 | 4
  Protocol violations | 1 | 6 | 9
  Lost to follow-up | 4 | 1 | 1
  Withdrawn consent | 5 | 1 | 1
8 weeks | 67 (72%) | 56 (59%) | 66 (76%)
Post-acute discontinuation | 16 | 17 | 32
  Adverse events | 5 | 4 | 0
  Lack of efficacy | 5 | 8 | 17
  Protocol violations | 2 | 4 | 4
  Lost to follow-up | 1 | 0 | 0
  Withdrawn consent | 2 | 0 | 5
  HAM-D responder | 1 | 1 | 6
32 weeks | 51 (55%) | 39 (41%) | 34 (39%)

There were no discrepancies between any of our analyses and those contained in the CSR. Figures 2 and 3 illustrate the longitudinal values for the two primary efficacy variables: mean change from baseline in the HAM-D score and the percentage responding, defined as a decrease in HAM-D score by 50% or more from baseline or a final HAM-D score of ≤8. The difference between paroxetine and placebo fell short of the prespecified level of clinical significance (4 points), and neither primary outcome achieved significance at any measured interval for any dataset during the acute phase.

The formal reanalysis included both observed case and last observation carried forward datasets. As mentioned above, the multiple imputation dataset is included for comparison. There was no statistical significance (considered at P<0.05) or clinical significance for any of the prespecified primary or secondary efficacy variables in either the observed case or last observation carried forward datasets, so pairwise analysis was considered unjustified. Table 2 shows the results at week eight for reduction in HAM-D score and for the proportion of patients who met criteria for response. HAM-D scores decreased by 10.7 (95% confidence interval 9.1 to 12.3), 9.0 (7.4 to 10.5), and 9.1 (7.5 to 10.7) points (least squares mean) for the paroxetine, imipramine, and placebo groups, respectively.
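The 95% confidence intervals quoted above can be recovered approximately from the least squares means and standard errors listed for the LOCF dataset in table 2. The reanalysis does not say how its intervals were computed, so this sketch simply assumes a normal approximation (mean ± 1.96 × SEM); under that assumption it reproduces the reported bounds to within 0.1 point (the imipramine lower bound comes out at −10.6 rather than −10.5).

```python
# Sketch: rebuild a two sided 95% confidence interval from a least squares
# mean and its standard error. Assumption: normal approximation (z = 1.96);
# the method actually used in the reanalysis is not stated in the text.

Z_95 = 1.96

def normal_ci(mean: float, sem: float, z: float = Z_95) -> tuple:
    """Return (lower, upper) bounds of mean +/- z * sem."""
    half_width = z * sem
    return mean - half_width, mean + half_width

# LOCF least squares mean change in HAM-D (points) and SEM, as listed in table 2
locf = {
    "paroxetine": (-10.7, 0.81),
    "imipramine": (-9.0, 0.81),
    "placebo": (-9.1, 0.83),
}

for arm, (mean, sem) in locf.items():
    lower, upper = normal_ci(mean, sem)
    print(f"{arm}: {mean} ({lower:.1f} to {upper:.1f})")
```

A t-based interval would be very slightly wider; with residual degrees of freedom in the hundreds the difference is negligible at one decimal place.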
HAM-D difference (observed cases)
Fig 2 Differences in HAM-D scores in study of efficacy and harms of paroxetine and imipramine in treatment of major depression in adolescence (table 2 shows numerical values). Points are least squares means (95% CI). LOCF=last observation carried forward, MI=multiple imputation.

HAM-D % responders (observed cases)
Fig 3 Differences in HAM-D % responders in study of efficacy and harms of paroxetine and imipramine in treatment of major depression in adolescence (table 2 shows numerical values). LOCF=last observation carried forward.

Table 3 shows the results at eight weeks for the secondary efficacy variables.

Although the protocol listed "predictors of response" among the secondary efficacy variables, the absence of statistically or clinically significant differences among the three arms rendered this analysis void.

The protocol also listed the relapse rate in the continuation phase for responders as a secondary outcome variable. Our calculation differed from that in the CSR because we included those whose HAM-D scores rose above the "response" range and those who intentionally overdosed. In the continuation phase, the dropout rates were too high in all groups for any precise interpretation: 33/51 (65%) in the paroxetine group, 25/39 (64%) in the imipramine group, and 21/34 (62%) in the placebo group. The recorded relapses were 25/51 (49%), 16/39 (41%), and 12/34 (35%), respectively. Although the relapse rate was lower in the placebo group, the differences were not significant (2×3 χ2 P=0.44).

Review of case report forms
We reviewed case report forms in appendix H for 93 (34%) of 275 patients. We discovered adverse events recorded on case report forms but not transcribed into the patient level listings of adverse events in appendix D of the CSR. Table 4 shows these discrepancies. The most common categories of additional adverse events found in case report forms were psychiatric for paroxetine (12/23) and placebo (4/10) and cardiovascular for imipramine (5/17) (table B in appendix 2).
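The 2×3 χ2 comparison of continuation phase relapses quoted above (25/51, 16/39, and 12/34) can be re-derived in a few lines. This sketch assumes an uncorrected Pearson χ2 on relapsed versus not relapsed counts, which is our reading rather than something the text states. With two degrees of freedom the upper tail probability has the closed form exp(−χ2/2), so no statistics library is needed.

```python
import math

def chi_square_2xk(row_a, row_b):
    """Pearson chi-square statistic (no continuity correction) for a 2 x k
    table given as two rows of counts. Returns (statistic, degrees of freedom)."""
    col_totals = [a + b for a, b in zip(row_a, row_b)]
    row_totals = [sum(row_a), sum(row_b)]
    grand_total = sum(row_totals)
    stat = 0.0
    for row, row_total in zip((row_a, row_b), row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat, len(col_totals) - 1

def chi2_upper_tail_df2(stat):
    """P(X > stat) for a chi-square variable with 2 degrees of freedom,
    which equals exp(-stat / 2)."""
    return math.exp(-stat / 2)

# Order: paroxetine, imipramine, placebo (continuation phase responders)
relapsed = [25, 16, 12]
entered = [51, 39, 34]
not_relapsed = [n - r for n, r in zip(entered, relapsed)]

stat, df = chi_square_2xk(relapsed, not_relapsed)
p_value = chi2_upper_tail_df2(stat)
print(f"chi2 = {stat:.2f}, df = {df}, P = {p_value:.2f}")
```

Run as written, this gives χ2 ≈ 1.64 on 2 degrees of freedom and P ≈ 0.44, in line with the reported value.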
Table 2 Datasets for primary efficacy variables at eight weeks and proportion of patients who met criteria for HAM-D response (>50% drop or <8) in Study 329 for observed cases (OC), last observation carried forward (LOCF), and multiple imputation (MI)
HAM-D change, least squares mean (95% CI), SEM:
Data | Paroxetine | Imipramine | Placebo
OC | −12.2 (−13.1 to −10.5), 0.88 | −10.6 (−12.5 to −8.7), 0.97 (n=56) | −10.5 (−12.3 to −8.8), 0.88
LOCF | −10.7 (−12.3 to −9.1), 0.81 | −9.0 (−10.5 to −7.4), 0.81 | −9.1 (−10.7 to −7.5), 0.83
MI | −12.5 (−14.2 to −10.9), 0.83 | −11.1 (−12.9 to −9.4), 0.89 | −10.7 (−12.4 to −9.1), 0.83
HAM-D response (>50% reduction or <8) | Criteria met | Criteria met | Criteria met
HAM-D=Hamilton depression scale.
*All P values uncorrected for multiple variable sampling.
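The responder criterion underlies both the response outcome and the relapse outcome, so it can be written once as a small predicate. This sketch follows the wording of the methods (HAM-D total ≤8, or a reduction of at least 50% from baseline); the table 2 caption phrases the criterion with strict inequalities, so boundary cases would need checking against the protocol. The relapse helper is hypothetical: it assumes every continuation phase rating is checked, which the text does not specify.

```python
from typing import Iterable

def is_responder(baseline_hamd: float, current_hamd: float) -> bool:
    """HAM-D responder as worded in the methods: total score <= 8, or a
    reduction of at least 50% from the baseline total."""
    if current_hamd <= 8:
        return True
    reduction = (baseline_hamd - current_hamd) / baseline_hamd
    return reduction >= 0.5

def relapsed(baseline_hamd: float,
             continuation_scores: Iterable[float],
             withdrawn_for_overdose: bool) -> bool:
    """Hypothetical relapse rule for an acute phase responder: relapse if
    withdrawn for "intentional overdose", or if any later rating (an
    assumption) no longer meets the responder criteria."""
    if withdrawn_for_overdose:
        return True
    return any(not is_responder(baseline_hamd, s) for s in continuation_scores)

print(is_responder(24, 12))   # exactly 50% reduction: responder
print(is_responder(24, 13))   # about 46% reduction and score above 8: not a responder
print(relapsed(24, [10, 14], withdrawn_for_overdose=False))  # second rating fails
```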
Table 3 Datasets for secondary efficacy variables at eight weeks in Study 329 for observed cases (OC), last observation carried forward (LOCF), and multiple imputation (MI)
Least squares mean (95% CI):
Variable, data | Paroxetine | Imipramine | Placebo
K-SADS-L change, OC | −12.1 (−13.8 to −10.3) | −10.7 (−12.7 to −8.7) | −10.7 (−12.5 to −8.9)
K-SADS-L change, LOCF | −11.4 (−13.1 to −9.8) | −9.5 (−11.1 to −7.9) | −9.4 (−11.0 to −7.8)
K-SADS-L change, MI | −12.3 (−13.9 to −10.6) | −11.5 (−13.3 to −9.7) | −10.9 (−12.6 to −9.2)
Clinical global impression mean score, OC | | |
Autonomous function checklist change, OC | 14.4 (8.8 to 19.9) | 13.3 (7.3 to 19.4) | 9.3 (3.8 to 14.8)
Autonomous function checklist change, LOCF | 14.7 (9.2 to 20.2) | 11.6 (5.8 to 17.3) | 9.3 (8.1 to 17.2)
Autonomous function checklist change, MI | 14.0 (8.7 to 19.3) | 14.5 (9.4 to 19.6) | 9.1 (4.2 to 14.1)
Self perception profile change, OC | 12.9 (8.3 to 17.5) | 13.2 (8.4 to 18.1) | 12.7 (6.9 to 15.9)
Self perception profile change, LOCF | 13.2 (8.6 to 17.8) | 13.1 (8.3 to 17.8) | 11.4 (6.9 to 15.9)
Self perception profile change, MI | 15.4 (10.7 to 20.0) | 14.7 (10.0 to 19.4) |
Sickness impact profile change, OC | −11.2 (−14.3 to −8.1) | −13.5 (−16.9 to −10.2) | −10.6 (−13.7 to −7.5)
Sickness impact profile change, LOCF | −11.4 (−14.4 to −8.3) | −13.0 (−16.2 to −9.8) | −9.9 (−12.9 to −6.9)
Sickness impact profile change, MI | −11.5 (−14.2 to −8.7) | −13.9 (−16.8 to −10.9) | −10.1 (−13.0 to −7.1)
K-SADS-L=affective disorders and schizophrenia for adolescents-lifetime version.
*ANCOVA. All P values uncorrected for multiple variable sampling.
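Tables 2 and 3 report each variable under observed cases, LOCF, and multiple imputation. The first two differ only in which rating stands in for week eight. This sketch shows that for one hypothetical patient who withdrew after the fourth post-baseline rating; the visit list and scores are invented for illustration. Multiple imputation by chained equations, used for the third dataset, is not reproduced here.

```python
from typing import Optional, Sequence

Rating = Optional[float]  # None marks a visit with no rating (e.g. after withdrawal)

def observed_case_endpoint(visits: Sequence[Rating]) -> Rating:
    """Observed case: only a rating actually made at the final (week 8)
    visit counts; patients without one drop out of this dataset."""
    return visits[-1]

def locf_endpoint(visits: Sequence[Rating]) -> Rating:
    """Last observation carried forward: the last non-missing on-therapy
    rating stands in for the week 8 value."""
    for rating in reversed(visits):
        if rating is not None:
            return rating
    return None

def change_from_baseline(baseline: float, endpoint: Rating) -> Rating:
    """Endpoint minus baseline (negative means improvement), or None if missing."""
    return None if endpoint is None else endpoint - baseline

# Hypothetical patient: baseline HAM-D 26; post-baseline ratings in visit
# order, last entry = week 8; withdrew after the fourth rating.
baseline = 26.0
visits = [24.0, 21.0, 19.0, 17.0, None, None]

print("OC change:", change_from_baseline(baseline, observed_case_endpoint(visits)))
print("LOCF change:", change_from_baseline(baseline, locf_endpoint(visits)))
```

Here the observed case analysis excludes the patient (`None`), whereas LOCF records a 9 point improvement carried forward from the last rating made before withdrawal.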
Table 4 Adverse events found in case report forms (CRFs) compared with adverse events listed in appendix D of clinical study report of Study 329
Paroxetine | Imipramine*
Adverse events found in CRFs (appendix H)
Adverse events found in appendix D
% underestimate in relying only on appendix D
*In considering adverse effects from imipramine, it should be noted that doses (mean 205.8 mg) were high for adolescents. In six comparator studies submitted by SKB as part of their 1991 NDA for approval of paroxetine in adults, the mean imipramine dose overall was 140 mg, with a mean endpoint dose of 170 mg.²⁵

Coding and representation of adverse event data
Table 5 presents the number of adverse events found in this study summarised by system organ class (SOC), firstly as coded by SKB using ADECS, secondly as reported by Keller and colleagues (who reported only adverse events that occurred at a frequency of more than 5%), and thirdly as coded by us using MedDRA. Some adverse events always fall within a particular system organ class; others require that the coder choose between system organ classes. A full listing of adverse events can be found in table E in appendix 2.

We included events occurring during the taper phase that SKB allocated to the continuation phase as acute phase adverse events. In a study that has a continuation phase, the assessment of adverse events raises a methodological difficulty not yet addressed by groups such as CONSORT. If a study has only an acute phase, then adverse events are counted for all patients receiving treatment, as well as in any taper phase and often for a 30 day follow-up period. When a study has a continuation phase, the taper and 30 day follow-up periods are displaced. To ensure comparable analysis of all participants, we tallied the adverse events across the acute phase and both taper and follow-up phases, whether displaced or not. SKB do not seem to have done this, leading to some differences in numbers.

Table 5 Adverse events in SKB clinical study report (CSR) (ADECS coded), Keller and colleagues (ADECS coded), and RIAT reanalysis (MedDRA coded) in Study 329
Adverse event (system organ class)
*Coded with ADECS (adverse drug events coding system). In the CSR (table 14.2.1; it is not clear whether this includes the taper phase), headaches were included in "body as a whole"; in the paper by Keller and colleagues, the adverse events "headache" and "dizziness" were grouped with psychiatric adverse events under the heading "nervous system."
†Coded with MedDRA. MedDRA allows dizziness to be coded under "cardiovascular" or "neurological" SOCs and puts headaches under the "neurological" SOC. See also tables D and E in appendix 2.

Figure 4 shows when suicidal and self injurious events occurred.

Fig 4 Timing of suicidal and self injurious events in Study 329, Keller and colleagues, and RIAT analysis

Table 6 shows the numbers of suicidal and self injurious behaviours that we identified in our RIAT analysis, compared with what was reported by Keller and colleagues and documented in the CSR. The full details for patients included in this table can be found in appendix 3, along with working notes and directions to where in the CSR the key details can be found. It is possible to take different approaches to moving taper phase events into the continuation phase and to reviewing the coding for all cases, especially cases 039, 089, and 106, which were designated suicidal and self injurious behaviours in the RIAT recoding. This would result in different figures.

Table 6 Numbers of patients with suicidal and self injurious behaviours in Study 329 with different safety methods
Paroxetine (n=93) | Imipramine (n=95)
Keller and colleagues*
SKB acute from CSR*
RIAT acute and taper from CSR 11
4 (3 definite, 1 possible) 2 (1 definite, 1 possible)
*Keller and colleagues and the CSR mostly reported suicide related events as "emotional lability."

There were no noteworthy changes in physiological data, which are detailed in appendix F (patient data listings of laboratory tests) in the CSR.

Table 7 Adverse events (ADECS coded) deemed serious by investigator in Study 329 and reorganised by RIAT analysis to MedDRA system organ class (SOC)
Adverse event (system organ class) | Paroxetine | Imipramine

In the CSR, serious adverse events (defined as an event that "resulted in hospitalization, was associated with suicidal gestures, or was described by the treating physician as serious") were reported in 11 patients in the paroxetine group, five in the imipramine group, and two in the placebo group. Designating an adverse event as serious hinged on the judgment of the clinical investigator. We were therefore unable to make comparable judgments of seriousness, but there are two other methods of approaching the issue of severity of adverse events. One is to look at those rated as severe rather than moderate or mild at the time of the event (table 7). A high number and proportion of severe psychiatric events occurred in the paroxetine group. In contrast, few of the
many cardiovascular events in the imipramine group were rated as severe.

A second method of approaching the issue of severity of adverse events is to look at rates of discontinuation because of such events. Table 8 shows reasons for withdrawal during the acute phase and taper because of adverse events and other causes. Note that we examined the case report forms from appendix H for all discontinuations reported in appendix G of the CSR. All changes of coding for discontinuation are laid out in table H in appendix 2.

Table 8 Reasons for withdrawal (86 patients) during acute phase and taper* in Study 329
Reason for withdrawal
Depression worsening
Accidental injury
Intercurrent illness‡
Total adverse events (%)
Non-compliance with study treatment
Recreational drug use
Total protocol violations (%)
Other (%)
Lost to follow-up
Withdrawn consent
Total dropout rate (%)
*Reported in appendix G (tabulations by patient) from CSR and from appendix H CRFs.
†Patient 329.002.00058 was found to have stopped drug three days before attempting suicide. Originally this had been classed as a "continuation phase" dropout, but we moved it to the "30 day discontinuation" period. The reason for withdrawal was originally "adverse event including intercurrent illness," but we changed it to "suicide attempt." Consequently, the RIAT analysis found a total of 86 withdrawals rather than the 85 reported in the CSR.
‡We replaced the term "adverse event: intercurrent illness" with more specific adverse event terms.
§Four patients enrolled in the study violated an inclusion criterion. Two had cardiovascular problems, one had a C-GAS score >60, and one was "extremely" suicidal at screening. All four were randomised to placebo. It was unclear how to categorise their reasons for discontinuation; we chose "protocol violations."

Consideration of the displaced taper in Study 329 revealed a conundrum. In addition to the 86 dropouts from the acute phase noted by SKB, there were 65 dropouts after ratings were completed at week eight. SKB regarded these patients as participants in the continuation phase, although none of them took a continuation phase drug or had a continuation phase rating. The coding for discontinuation was particularly ambiguous for this group.

Most patients stopped at this point were designated by SKB as "lack of efficacy" (table 9). Investigators in four centres reported lack of efficacy as a reason for stopping six patients allocated to placebo even though their HAM-D score was in the responder range, and as low as 2 or 3 points in some instances.

In some cases there were clear protocol violations or factors such as the unavailability of further treatments (placebo in particular). We recategorised the lack of efficacy dropouts based on factors such as adverse events and HAM-D scores. Table 9 shows our analysis of reasons for withdrawal at the end of the acute phase.

The protocol for Study 329 called for a taper phase for all participants and, in addition, a 30 day follow-up period for all those who discontinued because of adverse events. The data in appendix D of the CSR make it possible to identify adverse events happening in the taper and follow-up periods. These data are presented in table 10.

Effects of other drugs
Table 11 shows data on the effects of other drugs on the adverse events recorded. Patients taking other drugs had more adverse events than those who were not. This effect was slightly more marked in the placebo group and as such works to the apparent benefit of the active drug treatments, minimising any excess of adverse events over placebo.

Principal findings and comparison with original
Our RIAT analysis of Study 329 showed that neither paroxetine nor high dose imipramine was effective in the treatment of major depression in adolescents, and there was a clinically significant increase in harms with both drugs. This analysis contrasts with both the published conclusions of Keller and colleagues and the way that the outcomes were reported and interpreted in the CSR.

We analysed and reported Study 329 according to the original protocol (with approved amendments). Appendix 1 shows the sources of information we used in preparing this paper, which should help other researchers who want to access the data to check our analysis or to interrogate it in other ways. We draw minimal conclusions regarding efficacy and harms, inviting others to offer their own analysis.

Our re-examination of the data, including a review of 34% of the cases, showed no significant discrepancies in the primary efficacy data. The marked difference between the efficacy outcomes as reported by us and those reported by SKB results from the fact that our analysis kept faith with the protocol's methods and its designation of primary and secondary outcome variables.

The authors/sponsors departed from their study protocol in the CSR itself by performing pairwise
Table 9 Reasons for withdrawal (65 patients) at end of acute phase according to SKB and RIAT reanalysis in Study 329*
Reason for withdrawal | Paroxetine group (acute completers n=67): CSR, appendix G | Imipramine group (acute completers n=56): CSR, appendix G | Placebo group (acute completers n=66): CSR, appendix G
Depression worsening
Homicidal ideation
Total adverse events
Protocol violation
Non-compliance with study treatment
Recreational drug use
PV by investigator
Total protocol violations
Lost to follow-up
Total
Lack of efficacy
Total
No study drugs available
Moved out of state
Total "other" dropouts
HAM-D=Hamilton depression scale, ADHD=attention-deficit/hyperactivity disorder.
*Total discontinued at week 8 (end of acute phase). The CSR and the paper by Keller and colleagues report 86 patients who withdrew in the acute phase but are silent about these 65 patients who dropped out at the end of the acute phase.
†After review of reasons for withdrawal from the study in the CSR (appendix G), along with review of patient narratives and CRFs where applicable, we proposed changes to these reasons for withdrawal in a proportion of those discontinued.
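The observation that some placebo patients were withdrawn for "lack of efficacy" while their HAM-D scores were in the responder range suggests a simple first-pass screen before case-by-case review. This sketch uses invented records and field names (all hypothetical). The RIAT recategorisation also weighed adverse events, protocol violations, and treatment availability, which no rule this simple captures, so flagged cases are candidates for review rather than automatic recoding.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Withdrawal:
    patient_id: str        # hypothetical identifier
    arm: str
    reason: str            # reason for withdrawal as coded in the listing
    baseline_hamd: float
    last_hamd: float       # last HAM-D total before withdrawal

def meets_responder_criteria(baseline: float, current: float) -> bool:
    """HAM-D total <= 8, or at least a 50% reduction from baseline."""
    return current <= 8 or (baseline - current) / baseline >= 0.5

def flag_for_review(withdrawals: List[Withdrawal]) -> List[Withdrawal]:
    """Withdrawals coded "lack of efficacy" whose last HAM-D score still met
    the responder criteria."""
    return [w for w in withdrawals
            if w.reason == "lack of efficacy"
            and meets_responder_criteria(w.baseline_hamd, w.last_hamd)]

# Invented records for illustration only
records = [
    Withdrawal("A01", "placebo", "lack of efficacy", 24, 3),
    Withdrawal("A02", "placebo", "lack of efficacy", 22, 20),
    Withdrawal("A03", "paroxetine", "adverse event", 25, 6),
]
print([w.patient_id for w in flag_for_review(records)])
```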
Table 10 Adverse events from taper phase of Study 329 according to RIAT (reanalysis study)*
System organ class (MedDRA) | RIAT MedDRA | Reported as severe | RIAT MedDRA | Reported as severe | RIAT MedDRA | Reported as severe
Cardiovascular disorders
Psychiatric disorders
Total adverse events
*SKB did not present ADECS analysis for taper phase in clinical study report.
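The RIAT counting rule, which tallies adverse events from the acute phase together with the taper and 30 day follow-up periods whether or not they were displaced by the continuation phase, can be contrasted with an acute-only count. The event records below are invented for illustration, and the phase labels are assumed to have been assigned already from the patient level listings.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

Event = Tuple[str, str]  # (treatment arm, phase in which the event occurred)

# Invented example records; real phase labels would come from the CSR listings.
events = [
    ("paroxetine", "acute"), ("paroxetine", "acute"),
    ("paroxetine", "taper"), ("paroxetine", "follow-up"),
    ("placebo", "acute"), ("placebo", "taper"),
]

RIAT_PHASES = {"acute", "taper", "follow-up"}  # counted together, displaced or not
ACUTE_ONLY = {"acute"}

def tally(events: Iterable[Event], phases: Set[str]) -> Counter:
    """Count events per arm, keeping only events from the given phases."""
    return Counter(arm for arm, phase in events if phase in phases)

print("acute only:", dict(tally(events, ACUTE_ONLY)))
print("acute + taper + follow-up:", dict(tally(events, RIAT_PHASES)))
```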
Table 11 Use of other drugs in month before enrolment, and incidence of adverse events in Study 329
No (%) of patients
Psychiatric adverse events subgroup* (acute+taper)
Total adverse events (acute+taper)
*Psychiatric adverse events included in this subgroup are: abnormal dreams, aggravated depression, agitation, akathisia, anxiety, depersonalisation, disinhibition, hallucinations, paranoia, psychosis, and suicidal ideation/gesture/attempt.
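The psychiatric subgroup in the table 11 footnote is a fixed list of preferred terms, so counting it amounts to a set lookup. In this sketch the footnote's "suicidal ideation/gesture/attempt" is split into three separate terms and the lower case spellings are our choice; the event records and the other drugs flag are invented, not study data.

```python
from collections import Counter

# Terms listed in the table 11 footnote ("suicidal ideation/gesture/attempt"
# split into three entries as an assumption).
PSYCHIATRIC_SUBGROUP = {
    "abnormal dreams", "aggravated depression", "agitation", "akathisia",
    "anxiety", "depersonalisation", "disinhibition", "hallucinations",
    "paranoia", "psychosis", "suicidal ideation", "suicidal gesture",
    "suicide attempt",
}

def summarise(events):
    """events: iterable of (took_other_drugs, preferred_term).
    Returns (all events, psychiatric subgroup events), each counted by group."""
    total, psychiatric = Counter(), Counter()
    for took_other_drugs, term in events:
        group = "other drugs" if took_other_drugs else "no other drugs"
        total[group] += 1
        if term.lower() in PSYCHIATRIC_SUBGROUP:
            psychiatric[group] += 1
    return total, psychiatric

# Invented records for illustration only
events = [
    (True, "Agitation"), (True, "Headache"), (True, "Suicidal ideation"),
    (False, "Nausea"), (False, "Anxiety"),
]
total, psychiatric = summarise(events)
print(dict(total), dict(psychiatric))
```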
comparisons of two of the three groups when the omnibus ANOVA showed no significance in either the continuous or dichotomous variables. They also reported four other variables as significant that had not been mentioned in the protocol or its amendments, without any acknowledgment that these measures were introduced post hoc. This contravened provision II of appendix B of the Study 329 protocol ("Administrative Matters"), according to which any change to the study protocol was required to be filed as an amendment/modification.

With regard to adverse events, there were large and clinically meaningful differences between the data as analysed by us, those summarised in the CSR using the ADECS methods, and those reported by Keller and colleagues. These differences arise from inadequate and incomplete entry of data from case report forms to summary data sheets in the CSR, the ADECS coding system used by SKB, and the reporting of these data sheets by Keller and colleagues. SKB reported 338 adverse events with paroxetine and Keller and colleagues reported 265, whereas we identified 481 from our analysis of the CSR, and we found a further 23 that had been missed from the 93 case report forms that we reviewed.

Another reason why the figures of Keller and colleagues are lower than ours is that they presented data only for adverse events reported by 5% of patients or more. For all adverse events combined, their table 3 reported a burden of adverse events with paroxetine 1.2 times that with placebo. This compares with a figure of 1.4 from our RIAT MedDRA coding of data from the CSR. The figures from the CSR and case report forms also differ substantially from other figures quoted by Keller and colleagues because they did not report a category of psychiatric adverse events, but instead grouped such events together with "dizziness" and "headache" under the class "nervous system."

MedDRA distinguishes between neurological and psychiatric system organ classes. We placed headaches in the neurological rather than the psychiatric class. MedDRA allows dizziness to be coded under cardiovascular or neurological classes. Given the dose of imipramine being used, most cases of dizziness seem likely to be cardiovascular, with Keller and colleagues also reporting a high rate of postural hypotension on imipramine. We have thus coded all dizziness under cardiovascular rather than neurological. There is scope for others accessing the data to parse out whether there is sufficient information to code certain instances of dizziness, such as dizziness during paroxetine taper, as neurological, but we have not carried out that more complex analysis.

As reported by Keller and colleagues, dizziness and headache comprised 54 of 115 nervous system events in those taking paroxetine (47%), 83 of 135 events in those taking imipramine (62%), and 50 of 65 events in those taking placebo (77%). Disentangling these two symptoms from psychiatric adverse events unmasks a clinically important difference in psychiatric adverse event profiles between paroxetine and placebo.

There was a major difference between the frequency of suicidal thinking and events reported by Keller and colleagues and the frequency documented in the CSR, as shown in table 6.

With regard to dropouts, Keller and colleagues stated that 69% of patients completed the acute phase. Only 45%, however, went on to the continuation phase, which has not yet been subject to RIAT analysis.

Comparison with other studies
Our findings are consistent with those of other studies, including a recent examination of 142 studies of six psychotropic drugs for which journal articles and clinical trial summaries were both available. Most deaths (94/151, 62%) and suicides (8/15, 53%) reported in trial summaries were not reported in journal articles. Only one of nine suicides in olanzapine trials was reported in published papers.

Reporting of adverse events
Our reanalysis of Study 329 showed considerable variations in the way adverse events can be reported, demonstrating several ways in which the analysis and presentation of safety data can influence the apparent safety of a drug. We identified the following potential barriers to accurate reporting of harms (summarised in box 2).

Use of an idiosyncratic coding system
The term "emotional lability," as used in SKB's adverse drug events coding system, masks differences in suicidal behaviour between paroxetine and placebo.

Failure to transcribe all adverse events from clinical record to adverse event database
Our review of case report forms disclosed significant under-recording of adverse events.
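The dropout and completion figures quoted in the discussion (86 acute phase dropouts, 65 dropouts at the end of the acute phase, 69% completing the acute phase, and 45% going on to the continuation phase) can be checked against the arm-level counts in the Figure 1 flow chart. The sketch assumes that the flow chart's "32 weeks" figures are the numbers going on to the continuation phase, which matches the continuation phase denominators (51, 39, 34) used for relapse.

```python
# Arm-level counts from the Figure 1 flow chart
randomised = {"paroxetine": 93, "imipramine": 95, "placebo": 87}
acute_discontinued = {"paroxetine": 26, "imipramine": 39, "placebo": 21}
post_acute_discontinued = {"paroxetine": 16, "imipramine": 17, "placebo": 32}

total_randomised = sum(randomised.values())
completed_acute = {arm: randomised[arm] - acute_discontinued[arm]
                   for arm in randomised}
entered_continuation = {arm: completed_acute[arm] - post_acute_discontinued[arm]
                        for arm in randomised}

acute_completion_rate = sum(completed_acute.values()) / total_randomised
continuation_rate = sum(entered_continuation.values()) / total_randomised

print("acute phase dropouts:", sum(acute_discontinued.values()))
print("dropouts at end of acute phase:", sum(post_acute_discontinued.values()))
print(f"completed acute phase: {acute_completion_rate:.0%}")
print(f"went on to continuation phase: {continuation_rate:.0%}")
```

The per-arm values reproduce the flow chart (67, 56, and 66 acute completers; 51, 39, and 34 continuing), and the pooled rates round to 69% and 45%.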
Filtering data on adverse events through statistical
box 2 Potential barriers to accurate reporting of harms
• Use of an idiosyncratic coding system
Keller and colleagues (and GSK in subsequent corre-
• Failure to transcribe all adverse events from clinical record to adverse event database
spondence) ignored unfavourable harms data on the
• Filtering data on adverse events through statistical techniques
grounds that the difference between paroxetine and
• Restriction of reporting to events that occurred above a given frequency in any one group
placebo was not statistically significant, at odds with
• Coding event under different headings for different patients (dilution)
the SKB protocol that called for primary comparisons to
• Grouping of adverse events
be made using descriptive statistics. In our opinion, sta-
• Insufficient consideration of severity
tistically significant or not, all relevant primary and
• Coding of relatedness to study medication
secondary outcomes and harms outcomes should be
• Masking effects of concomitant drugs
explicitly reported. Testing for statistical significance
• Ignoring effects of drug withdrawal
is most appropriately undertaken for the primary
the bmj BMJ 2015;101hh; 2 doi1 02.00;6/bmj.hh; 2
outcome measures, as study power is based on these. We have not undertaken statistical tests for harms as we know of no valid way of interpreting them. To get away from a dichotomous (significant/non-significant) presentation of evidence, we opted to present all original and recoded evidence to allow readers their own interpretation. The data presented in appendix 2 and related worksheets lodged … will, however, readily permit other approaches to data analysis for those interested, and we welcome other …
Restriction of reporting to events that occurred above a given frequency in any one group
In the paper by Keller and colleagues, reporting only adverse events that occurred in more than 5% of patients obscured the harms burden. In contrast, we report all adverse events that have been recorded. These are available in table E in appendix 2.
Coding event under different headings for different patients (dilution)
The effect of reporting only adverse events that have a frequency of more than 5% is compounded when, for instance, agitation might be coded under agitation, anxiety, nervousness, hyperkinesis, and emotional lability. A problem occurring at a rate of >10% could thus vanish by being coded under different subheadings such that none of these reaches the 5% threshold.
Aside from making all the data available so that others can scrutinise them, one way to compensate for this possibility is to present all the data in broader system organ class groups. MedDRA offers the following higher levels: psychiatric, cardiovascular, gastrointestinal, respiratory, and other. In table E in appendix 2, the adverse events coded here under "other" are broken down under the additional MedDRA SOC headings, including general, nervous system, metabolic, and …
Grouping of adverse events
Even when adverse events are presented in broader system groups, grouping common and benign symptoms with more important ones can mask safety issues. For example, in the paper by Keller and colleagues, common adverse events such as dizziness and headaches are grouped with psychiatric adverse events under the "nervous system" SOC heading. As these adverse events are common across treatment arms, this grouping has the effect of diluting the difference in psychiatric side effects between paroxetine, imipramine, and placebo.
We followed MedDRA in reporting dizziness under "cardiovascular" events and headache under "nervous system." There might be better categorisations; our grouping is provisional rather than strategic. In table E in appendix 2, we have listed all events coded under each system organ class heading, and we invite others to explore these issues further, including alternative higher level categorisation of these adverse events.
Insufficient consideration of severity
In addition to coding adverse events, investigators rate them for severity. If no attempt is made to take severity into account and include it in reporting, readers could get the impression that there was an equal burden of adverse events in each arm, when in fact all events in one arm might be severe and enduring while those in the other might be mild and transient.
One way to manage this is to look specifically at those patients who drop out of the study because of adverse events. Another method is to report those adverse events coded as severe for each drug group separately from those coded as mild or moderate. We used both approaches (see tables 7 and 8).
Coding of relatedness to study medication
Judgments by investigators as to whether an adverse event is related to the drug can lead to discounting the importance of an effect. We have included these judgments in the worksheets lodged …, but we have not analysed them. It became clear that the blinding had been broken in several cases before relatedness was adjudicated by the original investigators, and some judgments were implausible. For instance, it is documented on page 279 in the CSR that an investigator, knowing the patient was on placebo, declared that a suicidal event was "definitely related to treatment" on the grounds that "the worsening of depression and suicidal thought were life threatening and definitely related to study medication [known to be placebo] in that there was a lack of effect." Notably, of the 11 patients with serious adverse events on paroxetine (compared with two on placebo) reported in the paper by Keller and colleagues, only one "was considered by the treating investigator to be related to paroxetine treatment." This dismissed the clinically important difference between the paroxetine and placebo groups for serious adverse events.
Masking effects of concomitant drugs
In almost all trials, patients will be taking concomitant drugs. The adverse events from these other drugs will tend to obscure differences between active drug treatment and placebo. This might be an important factor in trials of treatments such as statins, where patients are often taking multiple drugs.
Accordingly, we also compared the incidence of adverse events in patients taking concomitant drugs with the incidence in those not taking other drugs. Other drugs were started during the study. We have not analysed these, but the data are available in tables K and L in appendix 2 and in worksheets lodged … appendix B from the CSR.
There are several other angles in the data available at … that could be further explored. One example is the effect of withdrawal of concomitant drugs on adverse event profiles: the spreadsheets document the day of onset of adverse events and the dates of starting or stopping any concomitant drugs.
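The comparison described under "Masking effects of concomitant drugs" sets the incidence of adverse events in patients taking concomitant drugs against the incidence in those not taking them. A minimal Python sketch of that stratified tabulation follows. It assumes hypothetical per-patient records with three illustrative fields (`arm`, `on_concomitant`, `had_event`); these are our own names, not fields in the CSR or the lodged worksheets. The counts are made up and are not Study 329 data. The sketch reports the proportion of patients with at least one adverse event per arm, first pooled and then split by concomitant drug use.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    """Hypothetical per-patient record; field names are illustrative only."""
    arm: str              # treatment arm label, e.g. "drug" or "placebo"
    on_concomitant: bool  # took any concomitant drug during the study
    had_event: bool       # at least one adverse event of interest recorded


def pooled_incidence(patients):
    """Proportion of patients with >=1 event per arm, ignoring concomitant drugs."""
    counts = defaultdict(lambda: [0, 0])  # arm -> [patients with event, total]
    for p in patients:
        counts[p.arm][0] += p.had_event   # bool adds as 0 or 1
        counts[p.arm][1] += 1
    return {arm: e / t for arm, (e, t) in sorted(counts.items())}


def stratified_incidence(patients):
    """Per (arm, on_concomitant) cell: (patients with event, total, proportion)."""
    counts = defaultdict(lambda: [0, 0])
    for p in patients:
        cell = counts[(p.arm, p.on_concomitant)]
        cell[0] += p.had_event
        cell[1] += 1
    return {key: (e, t, e / t) for key, (e, t) in sorted(counts.items())}


# Toy counts for illustration only; these are NOT Study 329 results.
DEMO = (
    [Patient("drug", True, True)] * 6 + [Patient("drug", True, False)] * 4
    + [Patient("drug", False, True)] * 3 + [Patient("drug", False, False)] * 7
    + [Patient("placebo", True, True)] * 5 + [Patient("placebo", True, False)] * 5
    + [Patient("placebo", False, True)] * 1 + [Patient("placebo", False, False)] * 9
)

if __name__ == "__main__":
    print("pooled:", pooled_incidence(DEMO))
    for (arm, conc), (e, t, prop) in stratified_incidence(DEMO).items():
        label = "with" if conc else "without"
        print(f"{arm:8s} {label:7s} concomitant drugs: {e}/{t} ({prop:.0%})")
```

With these toy counts, the pooled difference between arms is 45% v 30%. Among patients not taking concomitant drugs, the difference is wider, at 30% v 10%. Events attributable to other drugs, which occur in both arms, narrow the apparent gap; this is the masking the section describes. A real analysis would also need the dates on which concomitant drugs were started and stopped relative to each adverse event.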
Another option to explore is the possibility of any prescribing cascades triggered by adverse events related to study drugs.
Ignoring effects of drug withdrawal
The protocol included a taper phase lasting 7-17 days, which investigators were encouraged to adhere to even in patients who discontinued because of adverse events. The original paper did not analyse these data separately. The increased rates of psychiatric adverse events that emerged during the discontinuation phase in our analysis are consistent with dependence on and withdrawal from paroxetine, as reported by …
RIAT process
This RIAT exercise proved to be extremely demanding of resources. We have logged over 250 000 words of email correspondence among the team over two years. The single screen remote desktop interface (which we called the "periscope") proved to be an enormous challenge. The efficacy analysis required multiple spreadsheet tables to be open simultaneously, with much copying, pasting, and cross checking, and the space was highly restrictive. Gaining access to the case report forms required extensive correspondence with GSK. Although GSK ultimately provided case report forms, they were even harder to manage, given that we could see only one page at a time. It required about a thousand hours to examine only a third of the case report forms. Being unable to print them was a considerable handicap. There were no means to prepare packets for multiple independent coders, to decrease bias; to make annotations or use margin comments; or to sort and collate the adverse event reports. Our experience highlights that hard copies as well as electronic copies are crucial for an enterprise like this.
Our analysis indicates that although CSRs are useful, and in this case all that was needed to reanalyse efficacy, analysis of adverse events requires access to individual patient level data in case report forms.
Because we have been breaking new ground, we have not had precedents to call on in analysis and reporting. We await with interest other efforts to do something …
Strengths and limitations of this study
Study 329 was a randomised controlled trial with a reasonable sample size. There was, however, evidence of protocol violations, including some cases of breaking of blinding. The coding of adverse events by the original investigators raised the possibility that some other data might be unreliable.
The trial lasted for only eight weeks. Participants had relatively chronic depression (mean duration more than one year). This would limit the generalisability of the results, particularly in primary care, because many cases of adolescent depression have shorter durations. Generalisability to primary care would also be limited by the fact that participants were recruited through tertiary settings.
The RIAT analysis broke new ground but was limited in that we could check only 34% (93/275) of case report forms. Time and resources prevented access to all forms because of the difficulties in using the portal for accessing the study data and because considerable amounts of data were missing.
The analysis generated a useful taxonomy of potential barriers to accurate reporting of adverse events. Even allowing for the above limitations, it showed the value of permitting access to data.
Conclusion and implications for research and policy
Contrary to the original report by Keller and colleagues, our reanalysis of Study 329 showed no advantage of paroxetine or imipramine over placebo in adolescents with symptoms of depression on any of the prespecified variables. The extent of the clinically significant increases in adverse events in the paroxetine and imipramine arms, including serious, severe, and suicide related adverse events, became apparent only when the data were made available for reanalysis. Researchers and clinicians should recognise the potential biases in published research, including the potential barriers to accurate reporting of harms that we have identified. Regulatory authorities should mandate accessibility of data and protocols.
As with most scientific papers, Keller and colleagues convey an impression that "the data have spoken." This authoritative stance is possible only in the absence of access to the data. When the data become accessible to others, it becomes clear that scientific authorship is provisional rather than authoritative.
We thank Carys Hogan for database work and Tom Jefferson and Leemon McHenry for comments on earlier drafts.
The SmithKline Beecham study was registered as No 29060/329. The protocol was SmithKline Beecham study 29060/329, final clinical report (acute phase), appendix a, Protocol, fr… The study was funded by SmithKline Beecham. The data analysis protocol for RIAT reanalysis was submitted to GSK on 28 October 2013 and approved by GSK on 4 December 2013.
Contributors: Conception/design of the work: DH, JJ, JMN. Acquisition of data: JJ (negotiation with GSK); CT and EA-J (RIATAR); JMN (efficacy data using GSK online remote system); JLN (harms data using GSK online remote system). Data analysis: JMN (efficacy); JLN and DH (harms). Data interpretation: all authors. Drafting the work and revising it critically for important intellectual content, final approval of the version to be published: all authors. All authors agree to be accountable for all aspects of the work. JJ is guarantor. The first four authors made equal contribution to the paper.
Funding: This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Competing interests: All authors have completed the ICMJE uniform disclosure form at … and declare: DH has been and is an expert witness for plaintiffs in legal cases involving GlaxoSmithKline's drug paroxetine. He is also a witness for plaintiffs in actions involving other antidepressants with the same mechanism of action as paroxetine. JJ has been paid by Baum, Hedlund, Aristei and Goldman, Los Angeles, CA, to provide expert analysis and opinion about documents obtained from GlaxoSmithKline in a class action over Study 329, and from Forest in relation to paediatric citalopram randomised controlled trials.
Ethical approval: The protocol and statement of informed consent were approved by an institutional review board before each centre's initiation, in compliance with 21 United States Code of Federal Regulations (CFR) Part 56. Written informed consent was obtained from each patient before entry into the study, in compliance with 21 CFR Part 50. Case report forms were provided for each patient's data to be recorded (Final Clinical Report page 000030). The sample informed
consent is provided in the appendix to the protocol, appendix C, pp 000590-4. No further information is available regarding the particular institutional review board that approved the study.
Transparency: JJ affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.
Data sharing: Clinical study reports, detailed data tables, and programming code are available on the Dryad Digital Repository (http://dx.doi.org/10.5061/dryad.bv8j6) and at www.Study329.org/
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial.
1 Doshi P, Dickersin K, Healy D, Vedula SS, Jefferson T. Restoring invisible and abandoned trials: a call for people to publish the findings. BMJ 2013;346:f2865.
2 Keller MB, Ryan ND, Strober M, et al. Efficacy of paroxetine in the treatment of adolescent major depression: a randomized, controlled trial. J Am Acad Child Adolesc Psychiatry 2001;40:762-72.
3 McHenry L, Jureidini J. Industry-sponsored ghostwriting in clinical trial reporting: a case study. Account Res 2008;15:152-67.
4 Jureidini J, McHenry L, Mansfield P. Clinical trials and drug promotion: selective reporting of study 329. Int J Risk Saf Med 2008;20:73-81.
5 Jureidini J, McHenry L. Conflicted medical journals and the failure of trust. Account Res 2011;18:45-54.
6 Kraus JE, letter to Jon Jureidini …
7 Treasure T, Monson K, Fiorentino F, Russell C. The CEA Second-Look Trial: a randomised controlled trial of carcinoembryonic antigen prompted reoperation for recurrent colorectal cancer. BMJ Open 2014 May 13;4:e004385.
8 Ebrahim S, Sohani ZN, Montoya L, et al. Reanalyses of randomized clinical trial data. JAMA 2014;312:1024-32.
9 SmithKline Beecham. A multi-center, double-blind, placebo controlled study of paroxetine and imipramine in adolescents with unipolar major depression – acute phase, Final clinical report …
10 Healthy Skepticism International News. Paxil Study 329: paroxetine vs imipramine vs placebo in adolescents …
11 SAS Solutions OnDemand …
12 Correspondence between Jureidini and GSK. Rapid responses to putting GlaxoSmithKline to the test over paroxetine. BMJ 2013;347 …
13 SmithKline Beecham. A multi-center, double-blind, placebo controlled study of paroxetine and imipramine in adolescents with unipolar major depression 1993/amended …
14 Diagnostic and statistical manual of mental disorders, third edition, revised (DSM-III-R). American Psychiatric Association, 1987.
15 Fawcett J, Epstein P, Fiester SJ, Elkin I, Autry JH. Clinical management—imipramine/placebo administration manual. NIMH Treatment of Depression Collaborative Research Program. Psychopharmacol Bull …
16 Hamilton M. Development of a rating scale for primary depressive illness. Br J Soc Clin Psychol 1967;6:278-96.
17 Sigafoos AD, Feinstein CB, Damond M, Reiss D. The measurement of behavioral autonomy in adolescence: the Autonomous Functioning Checklist. Adolesc Psychiatry 1988;15:432-62.
18 SKB. Draft Minutes: 4/22/97 Teleconference. Paroxetine Study 329 …
19 GlaxoSmithKline, Paroxetine—paediatric and adolescent patients. …
20 Winter C. MedDRA in clinical trials—industry perspective. SFDA-ICH MedDRA Workshop, Beijing, 13-14 May …
21 Jureidini JN, Nardo JM. Inadequacy of remote desktop interface for independent reanalysis of data from drug trials. BMJ …
22 Fitzgerald K, Healy D. Dystonias and dyskinesias of the jaw associated with the use of SSRIs. Human Psychopharmacol 1995;10:215-20.
23 Kline RB. Beyond significance testing. Statistics reform in the behavioral sciences. 2nd ed. American Psychological Association, 2013.
24 R Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing …
25 Brecher M. Review and evaluation of clinical data. Original NDA 20-031. Paroxetine (Aropax). Efficacy review. SmithKline Beecham …
26 Hughes S, Cohen D, Jaggi R. Differences in reporting serious adverse events in industry sponsored clinical trial registries and journal articles on antidepressant and antipsychotic drugs: a cross-sectional study. BMJ Open 2014;4:e005535.
27 Maund E, Tendal B, Hróbjartsson A, et al. Benefits and harms in clinical trials of duloxetine for treatment of major depressive disorder: comparison of clinical study reports, trial registries, and publications. …
28 Lewinsohn PM, Clarke GN, Seeley JR, Rohde P. Major depression in community adolescents: age at onset, episode duration, and time to recurrence. J Am Acad Child Adolesc Psychiatry 1994;33:809-18.
29 Fava M. Prospective studies of adverse events related to antidepressant discontinuation. J Clin Psychiatry 2006;67(suppl 4):14-21.
Appendix 1: RIAT audit record
Appendix 2: Supplementary tables A-M
Appendix 3: Supplementary information on suicidal and self-injurious behaviours in Study 329
BMJ Publishing Group Ltd 2015
No commercial reuse: See rights and reprints http://www.bmj.com/permissions
Source: http://www.klinikfarmakoloji.com/files/Study-329-Final.pdf