The Open Polytechnic Working Papers are a series of peer-reviewed academic and professional papers published in order to stimulate discussion and comment. Many papers are works in progress and feedback is therefore welcomed. This work may be cited as: Isakovic-Cocker, M. Integrative metalearning approach could facilitate improved rehabilitation outcomes for people with severe spinal cord injury, The Open Polytechnic of New Zealand, Working Paper, May 2007.
Restoring Study 329: efficacy and harms of paroxetine and imipramine in treatment of major depression in adolescence

Joanna Le Noury,1 John M Nardo,2 David Healy,1 Jon Jureidini,3 Melissa Raven,3 Catalin Tufanaru,4 Elia Abi-Jaoude5

1School of Medical Sciences, Bangor University, Bangor
2Emory University, Atlanta
3Critical and Ethical Mental Health Research Group, Robinson Research Institute, University of Adelaide, Adelaide, South Australia, Australia
4Joanna Briggs Institute, Faculty of Health Sciences, University of Adelaide, Adelaide, South Australia, Australia
5Department of Psychiatry, The Hospital for Sick Children, University of Toronto, Toronto
Correspondence to: J Jureidini
Additional material is published online only. To view please visit the journal online (http://dx.doi.
Cite this as: BMJ 2015;351:h4320
doi: 10.1136/bmj.h4320
Accepted: 03 August 2015

Objectives
To reanalyse SmithKline Beecham's Study 329 (published by Keller and colleagues in 2001), the primary objective of which was to compare the efficacy and safety of paroxetine and imipramine with placebo in the treatment of adolescents with unipolar major depression. The reanalysis under the restoring invisible and abandoned trials (RIAT) initiative was done to see whether access to and reanalysis of a full dataset from a randomised controlled trial would have clinically relevant implications for evidence based medicine.

Design
Double blind randomised placebo controlled trial.

Setting
12 North American academic psychiatry centres, from 20 April 1994 to 15 February 1998.

Participants
275 adolescents with major depression of at least eight weeks in duration. Exclusion criteria included a range of comorbid psychiatric and medical disorders and suicidality.

Interventions
Participants were randomised to eight weeks double blind treatment with paroxetine (20-40 mg), imipramine (200-300 mg), or placebo.

Main outcome measures
The prespecified primary efficacy variables were change from baseline to the end of the eight week acute treatment phase in total Hamilton depression scale (HAM-D) score and the proportion of responders (HAM-D score ≤8 or ≥50% reduction in baseline HAM-D) at acute endpoint. Prespecified secondary outcomes were changes from baseline to endpoint in depression items in K-SADS-L, clinical global impression, autonomous functioning checklist, self-perception profile, and sickness impact scale; predictors of response; and number of patients who relapse during the maintenance phase. Adverse experiences were to be compared primarily by using descriptive statistics. No coding dictionary was prespecified.

Results
The efficacy of paroxetine and imipramine was not statistically or clinically significantly different from placebo for any prespecified primary or secondary efficacy outcome. HAM-D scores decreased by 10.7 (least squares mean) (95% confidence interval 9.1 to 12.3), 9.0 (7.4 to 10.5), and 9.1 (7.5 to 10.7) points, respectively, for the paroxetine, imipramine and placebo groups (P=0.20). There were clinically significant increases in harms, including suicidal ideation and behaviour and other serious adverse events in the paroxetine group and cardiovascular problems in the imipramine group.

Conclusions
Neither paroxetine nor high dose imipramine showed efficacy for major depression in adolescents, and there was an increase in harms with both drugs. Access to primary data from trials has important implications for both clinical practice and research, including that published conclusions about efficacy and safety should not be read as authoritative. The reanalysis of Study 329 illustrates the necessity of making primary trial data and protocols available to increase the rigour of the evidence base.
What is already known on this topic
• There is a lack of access to data from most clinical randomised controlled trials, making it difficult to detect biased reporting
• In the absence of access to primary data, misleading conclusions in publications of those trials can seem definitive
• SmithKline Beecham's Study 329, an influential trial that reported that paroxetine was safe and effective for adolescents, is one such study

What this study adds
• On the basis of access to the original data from Study 329, we report a reanalysis that concludes that paroxetine was ineffective and unsafe in this study
• Access to primary data makes clear the many ways in which data can be analysed and represented, showing the importance of access to data and the value of reanalysis of trials
• There are important implications for clinical practice, research, regulation of trials, licensing of drugs, and the sociology and philosophy of science
• Our reanalysis required development of methods that could be adapted for future reanalyses of randomised controlled trials

In 2013, in the face of the selective reporting of outcomes of randomised controlled trials, an international group of researchers called on funders and investigators of abandoned (unpublished) or misreported trials to publish undisclosed outcomes or correct misleading publications. This initiative was called "restoring invisible and abandoned trials" (RIAT). The researchers identified many trials requiring restoration and emailed the funders, asking them to signal their intention to publish the unpublished trials or publish corrected versions of misreported trials. If funders and investigators failed to undertake to correct a trial that had been identified as unpublished or misreported, independent groups were encouraged to publish an accurate representation of the clinical trial based on the relevant regulatory information.

The current article represents a RIAT publication of Study 329. The original study was funded by
SmithKline Beecham (SKB; subsequently GlaxoSmithKline, GSK). We acknowledge the work of the original investigators. This double blinded randomised controlled trial to evaluate the efficacy and safety of paroxetine and imipramine compared with placebo for adolescents diagnosed with major depression was reported in the Journal of the American Academy of Child and Adolescent Psychiatry (JAACAP) in 2001, with Martin Keller as the primary author. The RIAT researchers identified Study 329 as an example of a misreported trial in need of restoration. The article by Keller and colleagues, which was largely ghostwritten, claimed efficacy and safety for paroxetine that was at odds with the data. This is problematic because the article has been influential in the literature supporting the use of antidepressants in adolescents.

On 14 June 2013, the RIAT researchers asked GSK whether it had any intention to restore any of the trials it sponsored, including Study 329. GSK did not signal any intent to publish a corrected version of any of its trials. In later correspondence, GSK stated that the study by Keller and colleagues "accurately reflects the honestly-held views of the clinical investigator authors" and that GSK did "not agree that the article is false, fraudulent or misleading."

Study 329 was a multicentre eight week double blind randomised controlled trial (acute phase), followed by a six month continuation phase. SKB's stated primary objective was to examine the efficacy and safety of imipramine and paroxetine compared with placebo in the treatment of adolescents with unipolar major depression. Secondary objectives were to identify predictors of treatment outcomes across clinical subtypes; to provide information on the safety profile of paroxetine and imipramine when these drugs were given for "an extended period of time"; and to estimate the rate of relapse among patients who responded to imipramine, paroxetine, and placebo and were maintained on treatment. Study enrolment took place between April 1994 and March 1997.

The first RIAT trial publication was a surgery trial that had been only partly published before. Few previously published randomised controlled trials have ever been subsequently reported in published papers by different teams of authors.

Methods
We reanalysed the data from Study 329 according to the RIAT recommendations. To this end, we used the clinical study report (SKB's "final clinical report"), including appendices A-G, which are publicly available on the GSK website, other publicly available documents, and the individual participant data accessed through the SAS Solutions OnDemand website, on which GSK subsequently also posted some Study 329 documents (available only to users approved by GSK). After negotiation, GSK posted about 77 000 pages of de-identified individual case report forms (appendix H) on that website. We used a tool for documenting the transformation from regulatory documents to journal publication, based on the CONSORT 2010 checklist of information to include when reporting a randomised trial. The audit record, including a table of sources of data consulted in preparing each part of this paper, is available in appendix 1.

Except where indicated, in accordance with RIAT recommendations, our methods are those set out in the 1994-96 protocol for Study 329. When the methods used and published by Keller and colleagues diverged from the protocol, we followed the original protocol. Because the protocol specified method of correction for missing values (last observation carried forward) has been questioned in the intervening years, we also included a more modern method (multiple imputation) at the request of the BMJ peer reviewers. This is a post hoc method added for comparison only and is not part of our formal reanalysis. When the protocol was not specific, we chose by consensus standard methods that best presented the data. The original 1993 protocol had minor amendments in 1994 and 1996 (replacement of the Schedule for Affective Disorders and Schizophrenia for Adolescents-Present Version with the Lifetime Version (K-SADS-L) and reduction in required sample size). Furthermore, the clinical study report (CSR) reported some procedures that varied from those specified in the protocol. We have noted variations that we considered relevant.

The original study recruited 275 adolescents aged 12-18 who met DSM-IV criteria for a current episode of major depression of at least eight weeks' duration (the protocol specified DSM-III-R criteria, which are similar). Box 1 lists the eligibility criteria.

Box 1 Study eligibility criteria
Inclusion criteria
• Adolescents aged 12-18 who met DSM-III-R criteria for major depression for at least eight weeks
• Severity score <60 on the children's global assessment scale (CGAS)
• Score ≥12 on the Hamilton depression scale (17 item) (HAM-D)
• Medically healthy
• IQ ≥80 (based on Peabody picture vocabulary test)
Exclusion criteria
• Current or past DSM-III-R diagnosis of bipolar disorder, schizoaffective disorder, anorexia nervosa, bulimia, alcohol or drug abuse/dependence, obsessive-compulsive disorder, autism/pervasive mental disorder, or organic psychiatric disorder
• Current (within 12 months) DSM-III-R diagnosis of post-traumatic stress disorder
• Adequate trial of an antidepressant within six months (at least four weeks' treatment with an adequate dose of antidepressant)
• Suicidal ideation with a definite plan, suicide attempt during current depressive episode, or history of suicide attempt by drug overdose
• Medical illness that contraindicates the use of heterocyclic antidepressants
• Current use of psychotropic drugs (including anxiolytics, antipsychotics, mood stabilisers), or illicit drugs
• Organic brain disease, epilepsy, or "mental retardation"
• Pregnancy or lactation
• Sexually active females not using reliable contraception
• Use of an investigational drug within previous 30 days or five half lives of the investigation drug

An unknown number of patients (not disclosed in the available documents) identified by telephone screening as potential participants were subsequently evaluated
at the study site by a senior clinician (psychiatrist or psychologist). Multiple meetings and teleconferences were held by the sponsoring company with site study investigators to ensure standardisation across sites. Patients and parents were interviewed separately with the K-SADS-L. After this initial assessment, the patient and parent both signed the study informed consent form; there was no mention of a separate assent form in the protocol or in the CSR. A screening period of seven to ten days was used to obtain past clinical records and to document that the depressive symptoms were stable. At the end of the screening period, only patients continuing to meet the inclusion criteria (DSM-III-R major depression and the HAM-D total score ≥12) were randomised. There was no placebo lead-in phase.

There were originally six study sites, but this was increased to 12 (10 in the United States and two in Canada). The centres were affiliated with either a university or a hospital psychiatry department and had experience with adolescent patients. The investigators were selected for their interest in the study and their ability to recruit study patients.

The recruitment period ran from 20 April 1994 until 15 March 1997, and the acute phase was completed on 7 May 1997. In a small number of patients, 30 day follow-up data for cases that went into the continuation phase were collected into February 1998.

So far as we can ascertain, there was no patient involvement in SKB's study design.

The study drug was provided to patients in weekly blister packs. Patients were instructed to take the drug twice daily. There were six dosing levels. Over the first four weeks, all patients were titrated to level four, corresponding to 20 mg paroxetine or 200 mg imipramine, regardless of response. Non-responders (those failing to reach responder criteria) could be titrated up to level five or six over the next four weeks. This corresponds to maximum doses of 60 mg paroxetine and 300 mg imipramine.

Compliance with treatment was evaluated from the number of capsules dispensed, taken, and returned. Non-compliance was defined as taking less than 80% or more than 120% of the number of capsules, assessed from the numbers expected to be returned at two consecutive visits, and resulted in withdrawal. Any patient missing two consecutive visits was also withdrawn from the study.

Patients were provided with 45 minute weekly sessions of supportive psychotherapy, primarily for the purpose of assessing the effects of treatment.

Sample size
The acute phase of the trial was initially based on a power analysis that indicated that a sample size of 100 patients per treatment group was required to have a statistical power of 80% for a two tailed α of 0.05 and an effect size of 0.40. This effect size entailed a difference of 4 in the HAM-D total score from baseline to endpoint, specified in the protocol to be large enough to be clinically meaningful, considering a standard deviation of 10. No allowance was made in the power calculation for attrition (anticipated dropout rate) or non-compliance during the study.

Recruitment was slower than expected, and reportedly supplies of treatment (mainly placebo) ran short due to exceeding the expiry date. The researchers carried out a midcourse evaluation of 189 patients, without breaking the blinding, which showed less variability in HAM-D scores (SD 8) than expected. Therefore the recruitment target was reduced to 275 on the grounds that it would have no negative impact on the estimated 80% power required to detect a 4 point difference between placebo and active drug groups.

Randomisation
A computer generated randomisation list of 360 numbers for the acute phase was generated and held by SKB. According to the CSR, treatments were balanced in blocks of six consecutive patients; however, there is an inconsistency in that appendix A randomisation code details block sizes of both six and eight. Each investigator was allocated a block of consecutively numbered treatment packs, and patients were assigned treatment numbers in strict sequential order. Patients were randomised in a 1:1:1 ratio to treatment with paroxetine, imipramine, or placebo.

Paroxetine was supplied as film coated, capsule shaped yellow (10 mg) and pink (20 mg) tablets. Imipramine (50 mg) was bought commercially and supplied as green film coated round 50 mg tablets. "Paroxetine placebos" matched the paroxetine 20 mg tablets, and "imipramine placebos" matched the imipramine tablets. All tablets were over-encapsulated in bluish-green capsules to preserve blinding.

The blinding was to be broken only in the event of a serious adverse event that the investigator thought could not be adequately treated without knowing the identity of the allocated study treatment. The identity of the study treatment was not otherwise to be disclosed to the investigator or SKB staff associated with the study.

Outcomes
Patients were evaluated weekly for the following outcome variables during the eight week duration of the acute treatment phase.

Primary efficacy variables
The prespecified primary efficacy variables were change in total HAM-D score from the beginning of the treatment phase to the endpoint of the acute phase and the proportion of responders at the end of the eight week acute treatment phase (longer than many antidepressant trials). Responders were defined as patients who had ≥50% reduction in the HAM-D or a HAM-D score of ≤8. (Scores on the HAM-D can vary from 0 to 52.)
Secondary efficacy variables
The prespecified secondary efficacy variables were:
• Changes from baseline to endpoint in:
  Depression items in K-SADS-L
  Clinical global impression (CGI)
  Autonomous functioning checklist
  Self perception profile
  Sickness impact scale.
• Predictors of response (endogenous subtypes, age, previous episodes, duration and severity of present episode, comorbidity with separate anxiety, attention deficit, and conduct disorder)
• The number of patients who relapsed during the maintenance phase (referred to in the CSR and in this paper as "continuation phase").

Both before and after breaking the blind, however, the sponsors made changes to the secondary outcomes as previously detailed. We could not find any document that provided any scientific rationale for these post hoc changes, and the outcomes are therefore not reported in this paper.

Challenges in carrying out RIAT
To our knowledge this is the first RIAT analysis of a misreported trial by an external team of authors, so there are no clear precedents or guides. Challenges we have encountered included:

Potential or perceived bias
A RIAT report is not intended to be a critique of a previous publication. The point is rather to produce a thorough independent analysis of a trial that has remained unpublished or called into question. We acknowledge, however, that any RIAT team might be seen as having an intrinsic bias in that questioning the earlier published conclusions is what brought some members of the team together. Consequently, we took all appropriate procedural steps to avoid such putative bias. In addition, we have made the data available for others to analyse.

Correction for testing multiple variables
We had multiple sources of information: the protocol; the published paper; the documents posted on the GSK website including the CSR and individual patient data; and the raw primary data in the case report forms provided by GSK on a remote desktop for this project. The protocol declared two primary and six secondary variables for the three treatment groups in two differing datasets (observed case and last observation carried forward). The CSR contained statistical comparisons on 28 discrete variables using two comparisons (paroxetine v placebo and imipramine v placebo) in the two datasets (observed case and last observation carried forward). The published paper listed eight variables with two statistical comparisons each in one dataset (last observation carried forward). The authors of the original paper, however, did not deal with the need for corrections for multiple variables, a standard requirement when there are multiple outcome measures. In the final analysis, there were no statistically or clinically significant findings for any outcome variable, so corrections were not needed for this analysis.

Statistical testing
The protocol called for ANOVA testing (generalised linear model) for continuous variables using a model that included the effects of site, treatment, and site × treatment interaction, with the latter dropped if P≥0.10. Logistic regression (2×3 χ²) was prescribed for categorical variables under the same model. Both methods begin with an omnibus statistic for the overall significance of the dataset, then progress to pairwise testing if, and only if, the omnibus statistic meets α=0.05. Yet all statistical outcomes in the CSR and published paper were reported only as the pairwise values for only two of the three possible comparisons (paroxetine v placebo and imipramine v placebo), with no mention of the omnibus statistic. Therefore, we conducted the required omnibus analyses, with negative results as shown. The pairwise values are available in table A in appendix 2.

Missing values
The protocol called for evaluation of the observed case and last observation carried forward datasets, with the latter being definitive. The last observation carried forward method for correcting missing values was the standard at the time the study was conducted. It continues to be widely used, although newer models such as multiple imputation or mixed models are superior. We chose to adhere to the protocol and use the last observation carried forward method, including multiple imputation for comparison only.

Outcome variables not specified in protocol
There were four outcome variables in the CSR and in the published paper that were not specified in the protocol. These were the only outcome measures reported as significant. They were not included in any version of the protocol as amendments (despite other amendments), nor were they submitted to the institutional review board. The CSR (section 3.9.1) states they were part of an "analysis plan" developed some two months before the blinding was broken. No such plan appears in the CSR, and we have no contemporaneous documentation of that claim, despite having repeatedly requested it from GSK.

Conclusions
We decided that the best and most unbiased course of action was to analyse the efficacy data in the individual patient data based on the last guaranteed a priori version of SKB's own protocol (1994, amended in 1996 to accept a reduced sample size). Although the protocol omitted a discussion of corrections that we would have thought necessary, correction for multiple variables is designed to prevent false positives and there were no positives. We agreed with the statistical mandates of the protocol, but though we regarded pairwise comparisons in the absence of overall significance as inappropriate, we recognise that this is not a universal opinion, so we included the data in table A in appendix 2.
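The testing sequence that the protocol prescribes, an omnibus test followed by pairwise comparisons only when the omnibus result is significant, can be written as a simple gate. This is an illustrative sketch only. The function names are ours, "meets α=0.05" is read here as P<0.05 (an assumption), the omnibus P value of 0.20 is the one reported for the change in HAM-D score, and the pairwise P values are hypothetical placeholders, not results from Study 329.

```python
def keep_interaction(p_interaction, threshold=0.10):
    """Site x treatment interaction stays in the model only if P < 0.10."""
    return p_interaction < threshold


def pairwise_to_report(omnibus_p, pairwise_p, alpha=0.05):
    """Return pairwise comparisons only if the omnibus test is significant."""
    if omnibus_p >= alpha:
        return {}  # gate closed: pairwise tests are not interpreted
    return dict(pairwise_p)


if __name__ == "__main__":
    hypothetical_pairwise = {  # placeholder values, not trial results
        "paroxetine v placebo": 0.13,
        "imipramine v placebo": 0.87,
    }
    print(pairwise_to_report(0.20, hypothetical_pairwise))  # gate closed
    print(keep_interaction(0.35))  # interaction term would be dropped
```

With an omnibus P value of 0.20 the gate stays closed, which is why reporting pairwise values alone, without the omnibus statistic, departs from the protocol's own sequence.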
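Last observation carried forward, the protocol's method for missing values, can be illustrated on one patient's weekly HAM-D scores. This is a minimal sketch with made-up scores. `locf` and `change_at_endpoint` are hypothetical helper names, and treating the first entry as the baseline is our assumption rather than something taken from the protocol.

```python
def locf(scores):
    """Fill missing visits (None) with the most recent observed score."""
    filled, last = [], None
    for score in scores:
        if score is not None:
            last = score
        filled.append(last)
    return filled


def change_at_endpoint(scores):
    """Baseline minus endpoint score after carrying observations forward."""
    filled = locf(scores)
    return filled[0] - filled[-1]


if __name__ == "__main__":
    # Hypothetical patient: baseline plus eight weekly visits, dropout after week 4.
    weekly = [24, 22, 19, 18, 17, None, None, None, None]
    print(locf(weekly))
    print(change_at_endpoint(weekly))
```

The patient who drops out after week four is analysed as if the week four score had persisted to week eight, which is the assumption that later methods such as multiple imputation try to avoid.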
Finally, although investigators can explore the data however they want, additional outcome variables outside those in the protocol cannot be legitimately declared once the study is underway, except as "exploratory variables": appropriate for the discussion or as material for further study but not for the main analysis. The a priori protocol and blinding are the bedrock of a randomised controlled trial, guaranteeing that there is not even the possibility of the HARK phenomenon ("hypothesis after results known"). Though we can readily show that none of the reportedly "positive" four non-protocol outcome variables stands up to scrutiny, the primary mandate of the RIAT enterprise is to reaffirm essential practices in randomised controlled trials, so we did not include these variables in our efficacy analysis.

An adverse experience/event was defined in the protocol (page 18) as "any noxious, pathologic or unintended change in anatomical, physiologic or metabolic functions as indicated by physical signs, symptoms and/or laboratory changes occurring in any phase of the clinical trial whether associated with drug or placebo and whether or not considered drug related. This includes an exacerbation of pre-existing conditions or events, intercurrent illnesses, drug interaction or the significant worsening of the disease under investigation that is not recorded elsewhere in the case report form under specific efficacy assessments."

Adverse events were to be elicited by the investigator asking a non-leading question such as: "Do you feel different in any way since starting the new treatment/the last assessment?" Details of adverse events that emerged with treatment, their severity, including any change in study drug administration, investigator attribution to study drug, any corrective therapy given, and outcome status were documented. Attribution or relation to study drug was judged by the investigator to be "unrelated," "probably unrelated," "possibly related," "probably related," or "related."

Vital signs and electrocardiograms were obtained at weekly visits. Patients with potentially concerning cardiovascular measures either had their drug dose reduced or were withdrawn from the study. In addition, if the combined serum concentrations (obtained at weeks four and eight) of imipramine and desipramine exceeded 500 µg/mL the patient was to be withdrawn from the study.

Clinical laboratory tests, including clinical chemistry, haematology, and urinalysis, were carried out at the screening visit and at the end of week eight. Clinically relevant laboratory abnormalities were to be included as adverse events.

Source of harms data
The harms data in this paper cover the acute phase, a taper period, and a follow-up phase of up to 30 days for those who discontinued treatment because of adverse events. To ensure comparability with the report by Keller and colleagues, none of the tables contains data from the continuation phase.

Data on adverse events come from the CSR lodged on GSK's website, primarily in appendix D. Appendix B provides details of concomitant drugs. Additional information was available from the summary narratives in the body of the CSR for patients who had adverse events that were designated as serious or led to withdrawal. (Of the 11 patients taking paroxetine who experienced adverse events designated as serious, nine discontinued treatment because of these events.) The large number of other patients discontinued because of adverse events that were not regarded as serious, or discontinued for lack of efficacy or protocol violations, however, did not generate patient narratives. The tables in appendix D of the CSR provide the verbatim terms used by the blinded investigators, along with preferred terms as coded by SKB using the adverse drug events coding system (ADECS) dictionary. Appendix D also includes ratings of severity and ratings of relatedness. We used the Medical Dictionary for Regulatory Activities (MedDRA) to code the verbatim terms provided in appendix D in the CSR. MedDRA terminology is the international medical terminology developed under the auspices of the International Conference on Harmonisation of Technical Requirements for Registration of Pharmaceuticals for Human Use, endorsed by the FDA, and now used by GSK.

Several limitations of the ADECS coded preferred terms provided in appendix D of the CSR became clear when we examined the ADECS preferred terms assigned to the verbatim terms. Firstly, several verbatim terms had been left uncoded into ADECS. Secondly, several adverse events found in the patient narratives of serious adverse events that led to discontinuation from the trial were not transcribed into appendix D.

We therefore approached GSK for access to case report forms (appendix H of the CSR), which are not publicly available. GSK made available all 275 case report forms for patients entered into Study 329. These forms, however, which totalled about 77 000 pages, were available only through a remote desktop facility (SAS Solutions OnDemand Secure Portal), which made it difficult and extremely time consuming to inspect the records properly. Effectively only one person could undertake the task, with backup for ambiguous cases. Accordingly we could not examine all case report forms. Instead we decided to focus on those 85 participants identified in appendices D and G of the CSR who were withdrawn from the study, along with eight further participants who were known from our inspection of the CSRs to have become suicidal. Of the case report forms that were checked, 31 were from the paroxetine group, 40 from the imipramine group, and 22 from the placebo group.

All case report forms were reviewed by JLN, who was trained in the use of MedDRA. The second reviewer (JMN), a clinician, was not trained in the MedDRA system, but training is not necessary for coding of dropouts. These two reviewers agreed about reasons for discontinuation and coding of side effects (we did not use a quantitative indicator of agreement between raters). We scrutinised these 93 case report forms for all
adverse events occurring during the acute, taper, and follow-up phases, and compared our totals for adverse events with the totals reported in appendix D of the CSR. This review process identified additional adverse events that had not been recorded as verbatim terms in appendix D of the CSR. It also led to recoding of several of the reasons for discontinuation. Tables B, C, and H in appendix 2 show the new adverse events and the reasons for changing the discontinuation category.

At least 1000 pages were missing from the case report forms we reviewed, with no discernible pattern to missing information. For example, one form came with a page inserted stating that pages 114 to 223 were missing, without indicating reasons.

Coding of adverse events
Choice of coding dictionary for harms
The protocol (page 25) indicates that adverse events were to be coded and compared by preferred term and body system by using descriptive statistics but does not prespecify a choice of coding dictionary for generating preferred terms from verbatim terms. The CSR (written after the study ended) specifies that the adverse events noted by clinical investigators in this trial were coded with ADECS, which was being used by SKB at the time. This system was derived from a coding system developed by the US Food and Drug Administration (FDA), Coding Symbols for a Thesaurus of Adverse Reaction Terms (COSTART), but ADECS is not itself a recognised system and is no longer available.

We coded adverse events using MedDRA, which has replaced COSTART for the FDA because it is by far the most commonly used coding system today. For coding purposes, we have taken the original terms used by the clinical investigators, as transcribed into appendix D of the CSR, and applied MedDRA codes to these descriptions. Information from appendix D was transcribed into spreadsheets (available …). Verbatim terms and the ADECS coding terms were transcribed first into these sheets, allowing all coding to be done before the drug names were added in. The transcription was carried out by a research assistant who was a MedDRA trained coder but took no part in the actual coding. All coding was carried out by JLN, and checked by DH, or vice versa. All of our coding from the verbatim terms in the appendix D of the CSR was done blind, as was coding from the case report forms.

We present results as SKB presented them in the CSR using the ADECS dictionary (table 14.2.1), and as coded by us using MedDRA. In general, MedDRA coding stays closer than ADECS to the original clinician description of the event. For instance, MedDRA codes "sore throat" as "sore throat" but SKB, using ADECS, coded it as …

… considerable difference to the apparent adverse event profile of a drug. In staying closer to the original description of events, MedDRA codes suicidal events as "suicidal ideation" or "self harm/attempted suicide" rather than the ADECS option of "emotional lability"; similarly, aggression is more clearly flagged as "aggressive events" rather than "hostility."

Most coding was straightforward. Nearly all the verbatim terms simply mapped onto coding terms in MedDRA. Coding challenges usually related to cases where there were significant adverse events but the patients were designated by SKB to have discontinued for lack of efficacy. There was no patient narrative for such patients, in contrast to patients deemed to have discontinued because of the adverse event occurring at discontinuation. There were few challenging coding decisions. Appendix 3 shows our coding of cases in which suicidal and self injurious behaviours were considered.

Analysis of harms data
In analysing the harms data for the safety population, we firstly explored the discrepancies in the number of events between case report forms and the CSR. Secondly, we presented all adverse events rather than those happening only at a particular rate (as done by Keller and colleagues). Thirdly, we grouped events into broader system organ class (SOC) groups: psychiatric, cardiovascular, gastrointestinal, respiratory, and other. Table D in appendix 2 summarises all adverse events by all MedDRA SOC groupings. Fourthly, we broke down events by severity, selecting adverse events coded as severe and using the listing in appendix G of the CSR of patients who discontinued for any reason. Fifthly, we included an analysis of the effects of previous treatment, presenting the run-in phase profiles of drugs taken by patients entering each of the three arms of the study and comparing the list of adverse events experienced by patients on concomitant drugs (from appendix B) versus those not on other drugs. Finally, we extracted the events occurring during the taper and follow-up phase.

We did not undertake statistical tests of harms data, as discussed below.

Patient withdrawal
A study patient could withdraw or be withdrawn prematurely for "adverse experiences including intercurrent illness," "insufficient therapeutic effect," "deviation from protocol including non-compliance," "loss to follow-up," "termination by SB [SKB]," and "other (specify)."

The CSR states that the primary reason for withdrawal was determined by the investigator. We reviewed the codes given for discontinuation from the study, which are found in appendix G of the CSR, and we made changes in a proportion of cases.
"pharyngitis" (inflammation of the throat). Sore throats can arise because of pharyngitis, but when someone is statistical methods
taking selective serotonin reuptake inhibitors they can The primary population of interest was the intention to indicate a dystonic reaction in the oropharyngeal artreat population that included all patients who received Classification of a problem as a "respiratory system at least one dose of study drug and had at least one disorder" (inflammation) rather than as a "dystonia" assessment of efficacy after baseline. The demographic (a central nervous system disorder) can make a consid- characteristics, description of the baseline depressive doi1 02.00;6/bmj.hh; 2 BMJ 2015;101hh; 2 the bmj
episode, additional psychiatric diagnoses, and personal history variables of the patients were summarised descriptively by treatment group.

The acute phase eight week endpoint was our primary interest. Statistical conclusions concerning the efficacy of paroxetine and imipramine were made by using data obtained from the last observation carried forward (that is, the last assessment "on therapy" during the acute phase) and observed case datasets. Paroxetine and imipramine were each to be compared with placebo; there was to be no comparison of paroxetine with imipramine.

We followed the methods of the a priori 1994 study protocol (amended in 1996 to accept a reduced sample size). It did not provide explicit statistical hypotheses (null hypotheses and alternative hypotheses); nor were there justifications for the proposed statistical approaches or statistical assumptions underlying them. One of the two primary efficacy variables, proportion of responders (response), and one secondary efficacy variable, proportion of patients relapsing, were treated as categorical variables. The second primary efficacy variable, change in total HAM-D score over the acute phase, and the remaining secondary efficacy variables were treated as continuous variables.

In accordance with the protocol, the continuous variables were analysed with parametric analysis of variance (ANOVA) with effects in the model including treatment, investigator, and treatment by investigator interaction. Pairwise comparisons were not done if the omnibus (overall) ANOVA was not significant (two sided P<0.05), as specified by the protocol (we acknowledge differing opinions about this issue in the statistical literature so we included them in table A in appendix 2, for completeness). The categorical variables were analysed with logistic regression, with the same effects included. In either case, if the treatment by investigator interaction resulted in a two sided P>0.10, the interaction term was dropped from the model. Statistical testing was done with the linear model (LM) and general linear models (GLM) procedures of the R statistical package (version 2.15.2) as provided by GSK. Imputation was performed with the multiple imputation by chained equations (MICE) package also in R.

For the analyses of relapse rates, we included all responders (HAM-D ≤8 or ≥50% reduction in symptoms) who met the original criteria for entry to the continuation phase of the study. Patients were considered to have relapsed if they no longer met the responder criteria or if they were withdrawn for "intentional overdose."

Results

Table 1 shows the demographics of the groups, along with depression parameters, comorbidities, and baseline scores for the efficacy variables.

Table 1 Baseline characteristics of groups in study 329
Paroxetine (n=93) Imipramine (n=95) Placebo (n=87)
Mean (SD) age (years)
Sex (male/female)
African American
Mean (SD) duration of episode (months) 14 (18)
Mean (SD) age at first episode (years)
No (%) of previous episodes:
No (%) with comorbidity:
Any comorbid disorder*
Current anxiety disorder*
ODD, CD, or ADHD*
Least squares mean baseline scores (SEM):
Autonomous function
Self perception profile
Sickness impact profile
ODD=oppositional defiant disorder, CD=conduct disorder, ADHD=attention-deficit/hyperactivity disorder, HAM-D=Hamilton depression scale, K-SADS-L=affective disorders and schizophrenia for adolescents-lifetime version, SD=standard deviation, SEM=standard error of mean.
*From K-SADS-L structured interview at screening.

Figure 1 summarises the allocations and discontinuations among the three treatment groups during the acute study. The flow chart covers the intention to treat population for the acute phase and the efficacy analysis. The paroxetine group was titrated to a dose of 20 mg/day by week four, with 55% (51/93) of participants moving to a higher dose (mean 28.0 mg/day, SD 8.4 mg) by week eight. The imipramine group was titrated to 200 mg/day by week four, with 40% (38/95) moving to a higher dose (mean 205.8 mg/day, SD 63.9 mg) by week eight. Twenty eight patients reached the highest permissible dose of 40 mg of paroxetine, and 20 patients were titrated to the maximum 300 mg of imipramine.

There were no discrepancies between any of our analyses and those contained in the CSR. Figures 2 and 3 illustrate the longitudinal values for the two primary efficacy variables: mean change from baseline in the HAM-D score and the percentage responding, defined as a decrease in HAM-D score by 50% or more from baseline or a final HAM-D score of ≤8. The difference between paroxetine and placebo fell short of the prespecified level of clinical significance (4 points) and neither primary outcome achieved significance at any measured interval for any dataset during the acute phase.

The formal reanalysis included both observed case and last observation carried forward datasets. As mentioned above, the multiple imputation dataset is included for comparison. There was no statistical significance (considered at P<0.05) or clinical significance shown for any of the prespecified primary or secondary efficacy variables in either the observed case or last observation carried forward datasets, so pairwise analysis was considered unjustified. Table 2 shows the results at week eight for reduction in HAM-D score and for the proportion of patients who met criteria for
the bmj BMJ 2015;101hh; 2 doi1 02.00;6/bmj.hh; 2
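The responder definition and the last observation carried forward (LOCF) rule described in the statistical methods can be illustrated with a minimal Python sketch. The function names and the example patient are hypothetical and are not taken from the protocol or the CSR; the sketch only assumes that a missing visit is recorded as None.

```python
from typing import Optional, Sequence


def locf(scores: Sequence[Optional[float]]) -> Optional[float]:
    """Return the last non-missing post-baseline score, or None if none exists."""
    for value in reversed(scores):
        if value is not None:
            return value
    return None


def is_responder(baseline: float, endpoint: float) -> bool:
    """Responder: endpoint HAM-D of 8 or less, or a reduction of at least 50% from baseline."""
    return endpoint <= 8 or (baseline - endpoint) >= 0.5 * baseline


# Hypothetical patient who stopped attending after week 5 (illustration only).
baseline_score = 20.0
weekly_scores = [18.0, 15.0, 13.0, 11.0, 9.0, None, None, None]

locf_endpoint = locf(weekly_scores)      # last score recorded on therapy
observed_week8 = weekly_scores[-1]       # None: excluded from the observed case dataset
print(locf_endpoint, is_responder(baseline_score, locf_endpoint))
```

Under LOCF this patient contributes the week 5 score to the week 8 analysis, whereas the observed case dataset simply omits the patient at week 8, which is why the two datasets can give different endpoint estimates.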
Fig 1 Group allocations and discontinuations in trial of paroxetine and imipramine in treatment of major depression in adolescence

Randomised (n=275)

Paroxetine: randomised (n=93)
Acute phase discontinuation (n=26): adverse events (n=13), lack of efficacy (n=3), protocol violations (n=1), lost to follow-up (n=4), withdrawn consent (n=5)
8 weeks (n=67; 72%)
Post-acute discontinuation (n=16): adverse events (n=5), lack of efficacy (n=5), protocol violations (n=2), lost to follow-up (n=1), withdrawn consent (n=2), HAM-D responder (n=1)
32 weeks (n=51; 55%)

Imipramine: randomised (n=95)
Acute phase discontinuation (n=39): adverse events (n=31), lack of efficacy (n=0), protocol violations (n=6), lost to follow-up (n=1), withdrawn consent (n=1)
8 weeks (n=56; 59%)
Post-acute discontinuation (n=17): adverse events (n=4), lack of efficacy (n=8), protocol violations (n=4), lost to follow-up (n=0), withdrawn consent (n=0), HAM-D responder (n=1)
32 weeks (n=39; 41%)

Placebo: randomised (n=87)
Acute phase discontinuation (n=21): adverse events (n=6), lack of efficacy (n=4), protocol violations (n=9), lost to follow-up (n=1), withdrawn consent (n=1)
8 weeks (n=66; 76%)
Post-acute discontinuation (n=32): adverse events (n=0), lack of efficacy (n=17), protocol violations (n=4), lost to follow-up (n=0), withdrawn consent (n=5), HAM-D responder (n=6)
32 weeks (n=34; 39%)
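As a reading aid, the counts in the flow chart can be checked for internal consistency with a short Python sketch. The numbers are transcribed from fig 1; assigning them to paroxetine, imipramine, and placebo is an assumption based on the group sizes given in table 1 (93, 95, and 87), and the function name is illustrative.

```python
# Counts transcribed from fig 1. Arm labels are assumed from the table 1 group sizes.
ARMS = {
    "paroxetine": {
        "randomised": 93,
        "acute_discontinued": [13, 3, 1, 4, 5],
        "week8": 67,
        "post_acute_discontinued": [5, 5, 2, 1, 2, 1],
        "week32": 51,
    },
    "imipramine": {
        "randomised": 95,
        "acute_discontinued": [31, 0, 6, 1, 1],
        "week8": 56,
        "post_acute_discontinued": [4, 8, 4, 0, 0, 1],
        "week32": 39,
    },
    "placebo": {
        "randomised": 87,
        "acute_discontinued": [6, 4, 9, 1, 1],
        "week8": 66,
        "post_acute_discontinued": [0, 17, 4, 0, 5, 6],
        "week32": 34,
    },
}


def check_arm(arm):
    """Confirm the counts reconcile, then return (week 8 %, week 32 %) of those randomised."""
    week8 = arm["randomised"] - sum(arm["acute_discontinued"])
    week32 = week8 - sum(arm["post_acute_discontinued"])
    if week8 != arm["week8"] or week32 != arm["week32"]:
        raise ValueError("flow chart counts do not reconcile")
    return (round(100 * week8 / arm["randomised"]),
            round(100 * week32 / arm["randomised"]))


total_randomised = sum(arm["randomised"] for arm in ARMS.values())
percentages = {name: check_arm(arm) for name, arm in ARMS.items()}
print(total_randomised, percentages)
```

Each arm's discontinuation reasons sum to the stated totals, and the retention percentages are expressed relative to the number randomised in that arm.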
HAM-D scores decreased by 10.7 (95% confidence interval 9.1 to 12.3), 9.0 (7.4 to 10.5), and 9.1 (7.5 to 10.7) points (least squares mean) for the paroxetine, imipramine, and placebo groups, respectively.

Table 3 shows the results at eight weeks for the secondary efficacy variables. Although the protocol listed "predictors of response" among the secondary efficacy variables, the absence of statistically or clinically significant differences among the three arms rendered this analysis void.

The protocol also listed the relapse rate in the continuation phase for responders as a secondary outcome variable. Our calculation differed from that in the CSR because we included those whose HAM-D scores rose above the "response" range and those who intentionally overdosed. In the continuation phase, the dropout rates were too high in all groups for any precise interpretation: 33/51 (65%) in the paroxetine group; 25/39 (64%) in the imipramine group; and 21/34 (62%) in the placebo group. The recorded relapses were 25/51 (49%), 16/39 (41%), and 12/34 (35%), respectively. Although the relapse rate was lower in the placebo group, the differences were not significant (2×3 χ2 P=0.44).

Fig 2 Differences in HAM-D scores in study of efficacy and harms of paroxetine and imipramine in treatment of major depression in adolescence (table 2 shows numerical values). Points are least squares means (95% CI). LOCF=last observation carried forward, MI=multiple imputation. Axis: HAM-D difference (observed cases).

Fig 3 Differences in HAM-D % responders in study of efficacy and harms of paroxetine and imipramine in treatment of major depression in adolescence (table 2 shows numerical values). LOCF=last observation carried forward. Axis: HAM-D % responders (observed cases).

Review of case report forms

We reviewed case report forms in appendix H for 93 (34%) of 275 patients. We discovered adverse events recorded onto case report forms but not transcribed into the patient level listings of adverse events in appendix D of the CSR. Table 4 shows these discrepancies. The most common categories of additional adverse events found in case report forms were psychiatric for paroxetine (12/23) and placebo (4/10) and cardiovascular for imipramine (5/17) (table B in appendix 2).
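The 2×3 χ2 comparison of relapse rates reported above (25/51, 16/39, and 12/34) can be reproduced with a short Python sketch. This assumes a Pearson χ2 test without continuity correction, which the text does not state explicitly; the function names are illustrative. With two degrees of freedom the upper tail probability has the closed form exp(−x/2), so no statistics library is needed.

```python
import math


def chi_square_2xk(events, totals):
    """Pearson chi-square statistic for a 2 x k table of events versus non-events."""
    n = sum(totals)
    total_events = sum(events)
    total_non_events = n - total_events
    statistic = 0.0
    for observed_events, group_total in zip(events, totals):
        observed_non = group_total - observed_events
        expected_events = group_total * total_events / n
        expected_non = group_total * total_non_events / n
        statistic += (observed_events - expected_events) ** 2 / expected_events
        statistic += (observed_non - expected_non) ** 2 / expected_non
    return statistic


def p_value_two_df(statistic):
    """Upper tail probability of a chi-square distribution with 2 degrees of freedom."""
    return math.exp(-statistic / 2)


# Relapses among continuation phase patients: paroxetine, imipramine, placebo.
relapses = [25, 16, 12]
group_sizes = [51, 39, 34]

chi2 = chi_square_2xk(relapses, group_sizes)
p_value = p_value_two_df(chi2)
print(round(chi2, 2), round(p_value, 2))  # 1.64 0.44
```

Under these assumptions the sketch returns P=0.44, in line with the reported value.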
Table 2 Datasets for primary efficacy variables at eight weeks and proportion of patients who met criteria for HAM-D response (>50% drop or <8) in study 329 for observed cases (OC), last observation carried forward (LOCF), and multiple imputation

Least squares mean (95% CI), SEM
OC: paroxetine −12.2 (−13.1 to −10.5), 0.88; imipramine −10.6 (−12.5 to −8.7), 0.97 (patients: 56); placebo −10.5 (−12.3 to −8.8), 0.88
LOCF: paroxetine −10.7 (−12.3 to −9.1), 0.81; imipramine −9.0 (−10.5 to −7.4), 0.81; placebo −9.1 (−10.7 to −7.5), 0.83
Multiple imputation: paroxetine −12.5 (−14.2 to −10.9), 0.83; imipramine −11.1 (−12.9 to −9.4), 0.89; placebo −10.7 (−12.4 to −9.1), 0.83
HAM-D response (>50% reduction or <8)
HAM-D=Hamilton depression scale.
*All P values uncorrected for multiple variable sampling.

Table 3 Datasets for secondary efficacy variables at eight weeks in study 329 for observed cases (OC), last observation carried forward (LOCF), and multiple imputation

Least squares mean (95% CI)
K-SADS-L change
OC: paroxetine −12.1 (−13.8 to −10.3); imipramine −10.7 (−12.7 to −8.7); placebo −10.7 (−12.5 to −8.9)
LOCF: paroxetine −11.4 (−13.1 to −9.8); imipramine −9.5 (−11.1 to −7.9); placebo −9.4 (−11.0 to −7.8)
Multiple imputation: paroxetine −12.3 (−13.9 to −10.6); imipramine −11.5 (−13.3 to −9.7); placebo −10.9 (−12.6 to −9.2)
Clinical global impression mean score
OC
Autonomous function check list change
OC: paroxetine 14.4 (8.8 to 19.9); imipramine 13.3 (7.3 to 19.4); placebo 9.3 (3.8 to 14.8)
LOCF: paroxetine 14.7 (9.2 to 20.2); imipramine 11.6 (5.8 to 17.3); placebo 9.3 (8.1 to 17.2)
Multiple imputation: paroxetine 14.0 (8.7 to 19.3); imipramine 14.5 (9.4 to 19.6); placebo 9.1 (4.2 to 14.1)
Self perception profile change
OC: paroxetine 12.9 (8.3 to 17.5); imipramine 13.2 (8.4 to 18.1); placebo 12.7 (6.9 to 15.9)
LOCF: paroxetine 13.2 (8.6 to 17.8); imipramine 13.1 (8.3 to 17.8); placebo 11.4 (6.9 to 15.9)
Multiple imputation: paroxetine 15.4 (10.7 to 20.0); imipramine 14.7 (10.0 to 19.4); placebo [...]
Sickness impact profile change
OC: paroxetine −11.2 (−14.3 to −8.1); imipramine −13.5 (−16.9 to −10.2); placebo −10.6 (−13.7 to −7.5)
LOCF: paroxetine −11.4 (−14.4 to −8.3); imipramine −13.0 (−16.2 to −9.8); placebo −9.9 (−12.9 to −6.9)
Multiple imputation: paroxetine −11.5 (−14.2 to −8.7); imipramine −13.9 (−16.8 to −10.9); placebo −10.1 (−13.0 to −7.1)
K-SADS-L=affective disorders and schizophrenia for adolescents-lifetime version.
*ANCOVA. All P values uncorrected for multiple variable sampling.
Table 4 Adverse events found in case report forms (CRFs) compared with adverse events listed in appendix D of clinical study report of study 329
Paroxetine
Adverse events found in CRFs (appendix H)
Adverse events found in appendix D
% underestimate in relying only on appendix D
*In considering adverse effects from imipramine, it should be noted that doses (mean 205.8 mg) were high for adolescents. In six comparator studies submitted by SKB as part of their 1991 approval NDA for paroxetine in adults, mean imipramine dose overall was 140 mg, with mean endpoint dose of 170 mg.25

Coding and representation of adverse event data

Table 5 presents the number of adverse events found in this study summarised by system organ class (SOC), firstly as coded by SKB using ADECS, secondly as reported by Keller and colleagues (who reported only adverse events that occurred at frequency of more than 5%), and thirdly as coded by us using MedDRA. Some adverse events always fall within a particular system organ class; others require that the coder choose between system organ classes. A full listing of adverse events can be found in table E in appendix 2.

Table 5 Adverse events in SKB clinical study report (CSR) (ADECS coded), Keller and colleagues (ADECS coded), and RIAT reanalysis (MedDRA coded) in study 329
Adverse event (system organ class)
*Coded with ADECS (adverse drug events coding system). While in CSR (table 14.2.1; it is not clear whether this includes taper phase), headaches were included in "body as whole"; in paper by Keller and colleagues, adverse events "headache" and "dizziness" were grouped with psychiatric adverse events under heading "nervous system."
†Coded with MedDRA. MedDRA allows dizziness to be coded under "cardiovascular" or "neurological" SOCs and puts headaches under "neurological" SOC. See also tables D and E in appendix 2.

We included events occurring during the taper phase that SKB allocated to the continuation phase as acute phase adverse events. In a study that has a continuation phase, the assessment of adverse events throws up a methodological difficulty not yet addressed by groups such as CONSORT. If a study has only an acute phase, then all adverse events are counted for all patients receiving treatment as well as in any taper phase, and often for a 30 day follow-up period. When a study has a continuation phase, the taper and 30 day follow-up periods are displaced. To ensure comparable analysis of all participants, we tallied the adverse events across the acute phase and both taper and follow-up phases, whether displaced or not. SKB do not seem to have done this, leading to some differences in numbers.

Figure 4 shows when suicidal and self injurious [...]

Table 6 shows the numbers of suicidal and self-injurious behaviours that we identified in our RIAT analysis and compared with what was reported by Keller and colleagues and documented in the CSR (table 6).

The full details for patients included in this table can be found in appendix 3, along with working notes and directions to where in the CSR the key details can be found. It is possible to take different approaches to moving taper phase events into the continuation phase and reviewing the coding for all cases, especially cases 039, 089, and 106, that were designated suicidal and self injurious behaviours in the RIAT recoding. This would result in different figures.
There were no noteworthy changes in physiological data, which are detailed in appendix F (patient data listings of laboratory tests) in the CSR.
Fig 4 Timing of suicidal and self injurious events in study 329, Keller and colleagues, and RIAT analysis

Table 6 Numbers of patients with suicidal and self injurious behaviours in study 329 with different safety methods
Paroxetine (n=93) Imipramine (n=95)
Keller and colleagues*
SKB acute from CSR*
RIAT acute and taper from CSR: 11; 4 (3 definite, 1 possible); 2 (1 definite, 1 possible)
*Keller and colleagues and CSR mostly reported suicide related events as "emotional lability."

Table 7 Adverse events (ADECS coded) deemed serious by investigator in study 329 and reorganised by RIAT analysis to MedDRA system organ class (SOC)
(system organ class)

In the CSR, serious adverse events (defined as an event that "resulted in hospitalization, was associated with suicidal gestures, or was described by the treating physician as serious") were reported in 11 patients in the paroxetine group, five in the imipramine group, and two in the placebo group. Designating an adverse event as serious hinged on the judgment of the clinical investigator. We were therefore unable to make comparable judgments of seriousness, but there are two other methods to approach the issue of severity of adverse events. One is to look at those rated as severe rather than moderate or mild at the time of the event (table 7). A high number and proportion of severe psychiatric events occurred in the paroxetine group. In contrast, few of the
many cardiovascular events in the imipramine group were rated as severe.

A second method of approaching the issue of severity of adverse events is to look at rates of discontinuation because of such events. Table 8 shows reasons for withdrawal during the acute phase and taper because of adverse events and other causes. Note that we examined the case report forms from appendix H for all discontinuations reported in appendix G of the CSR. All changes of coding for discontinuation are laid out in table H in appendix 2.

Table 8 Reasons for withdrawal (86 patients) during acute phase and taper* in study 329
Reason for withdrawal
Depression worsening
Accidental injury
Intercurrent illness‡
Total adverse events (%)
Non-compliance with [...]
Recreational drug use
Total protocol violations (%)
Other (%)
Lost to follow-up
Withdrawn consent
Total dropout rate (%)
*Reported in appendix G (tabulations by patient) from CSR and from appendix H CRFs.
†Patient 329.002.00058 was found to have stopped drug three days before attempting suicide. Originally this had been classed as "continuation phase" drop out, but we moved it to "30 day discontinuation" period. Reason for withdrawal was originally "adverse event including intercurrent illness" but we changed it to "suicide attempt." Consequently RIAT analysis found total of 86 withdrawals rather than 85 reported in CSR.
‡We replaced term "adverse event: intercurrent illness" with more specific adverse event terms.
§Four patients enrolled in study violated inclusion criterion. Two had cardiovascular problems, one had C-GAS score >60, and one was "extremely" suicidal at screening. All four were randomised to placebo. It was unclear how to categorise their reasons for discontinuation; we chose "protocol violations."

Consideration of the displaced taper in Study 329 revealed a conundrum. In addition to the 86 dropouts from the acute phase noted by SKB, there were 65 dropouts after ratings were completed at week eight. SKB regarded these patients as participants in the continuation phase, although none of them took a continuation phase drug or had a continuation phase rating. The coding for discontinuation was particularly ambiguous for this group.

Most patients stopped at this point were designated by SKB as "lack of efficacy" (table 9). Investigators in four centres reported lack of efficacy as a reason for stopping six patients allocated to placebo even though the HAM-D score was in the responder range and was as low as 2 or 3 points in some instances.

In some cases there were clear protocol violations or factors such as the unavailability of further treatments (placebo in particular). We recategorised the lack of efficacy dropouts based on factors such as adverse events and HAM-D scores. Table 9 shows our analysis of reasons for withdrawal at the end of the acute phase.

The protocol for Study 329 called for a taper phase for all participants and, in addition, a 30 day follow-up period for all those who discontinued because of adverse events. The data in the appendix D of the CSR make it possible to identify adverse events happening in the taper and follow-up periods. These data are presented in table 10.

Effects of other drugs

Table 11 shows data on the effects of other drugs on the adverse events recorded. Patients taking other drugs had more adverse events than those who were not. This effect was slightly more marked in the placebo group, and as such works to the apparent benefit of the active drug treatments in minimising any excess of adverse events over placebo.

Principal findings and comparison with original [...]

Our RIAT analysis of Study 329 showed that neither paroxetine nor high dose imipramine was effective in the treatment of major depression in adolescents, and there was a clinically significant increase in harms with both drugs. This analysis contrasts with both the published conclusions of Keller and colleagues and the way that the outcomes were reported and interpreted in the CSR.

We analysed and reported Study 329 according to the original protocol (with approved amendments). Appendix 1 shows the sources of information we used in preparing this paper, which should help other researchers who want to access the data to check our analysis or to interrogate it in other ways. We draw minimal conclusions regarding efficacy and harms, inviting others to offer their own analysis.

Our re-examination of the data, including a review of 34% of the cases, showed no significant discrepancies in the primary efficacy data. The marked difference between the efficacy outcomes as reported by us and those reported by SKB results from the fact that our analysis kept faith with the protocol's methods and its designation of primary and secondary outcome variables.

The authors/sponsors departed from their study protocol in the CSR itself by performing pairwise
Table 9 Reasons for withdrawal (65 patients) at end of acute phase according to SKB and RIAT reanalysis in study 329*
(acute completers n=67) (acute completers n=56) (acute completers n=66)
Reason for withdrawal
CSR, appendix G
Depression worsening
Homicidal ideation
Total adverse events
Protocol violation
Non-compliance with study treatment
Recreational drug use
PV by investigator
Total protocol violations
Lost to follow-up
Lack of efficacy
No study drugs available
Moved out of state
Total "other" dropouts
HAM-D=Hamilton depression scale, ADHD=attention-deficit/hyperactivity disorder.
*Total discontinued at week 8 (end of acute phase). CSR and paper by Keller and colleagues report 86 patients who withdrew in acute phase, but are silent about these 65 patients who dropped out at end of acute phase.
†After review of reasons for withdrawal from study in the CSR (appendix G), along with review of patient narratives and CRFs where applicable, we proposed changes to these reasons for withdrawal in a proportion of those discontinued.
Table 10 Adverse events from taper phase of study 329 according to RIAT (reanalysis study)*
System organ class (MedDRA)
Cardiovascular disorders
Psychiatric disorders
Total adverse events
*SKB did not present ADECS analysis for taper phase in clinical study report.
Table 11 Use of other drugs in month before enrolment, and incidence of adverse events in study 329
No (%) of patients
Psychiatric adverse events subgroup* (acute+taper)
Total adverse events (acute+taper)
*Psychiatric adverse events included in this subgroup include: abnormal dreams, aggravated depression, agitation, akathisia, anxiety, depersonalisation, disinhibition, hallucinations, paranoia, psychosis, suicidal ideation/gesture/attempt.
comparisons of two of the three groups when the omnibus ANOVA showed no significance in either the continuous or dichotomous variables. They also reported four other variables as significant that had not been mentioned in the protocol or its amendments, without any acknowledgment that these measures were introduced post hoc. This contravened provision II of appendix B of the Study 329 protocol ("Administrative Matters"), according to which any change to the study protocol was required to be filed as an amendment/modification.

With regard to adverse events, there were large and clinically meaningful differences between the data as analysed by us, those summarised in the CSR using the ADECS methods, and those reported by Keller and colleagues. These differences arise from inadequate and incomplete entry of data from case report forms to summary data sheets in the CSR, the ADECS coding system used by SKB, and the reporting of these data sheets in Keller and colleagues. SKB reported 338 adverse events with paroxetine and Keller and colleagues reported 265, whereas we identified 481 from our analysis of the CSR, and we found a further 23 that had been missed from the 93 case report forms that we reviewed.

Another reason why the figures of Keller and colleagues are lower than ours is because they presented data only for adverse events reported for 5% of patients or more. For all adverse events combined, their table 3 reported a burden of adverse events with paroxetine 1.2 times that of the burden with placebo. This compares with the figure of 1.4 from our RIAT MedDRA coding of data from the CSR. The figures from CSR and case report forms also differ substantially from other figures quoted by Keller and colleagues because they did not report a category of psychiatric adverse events, but instead grouped such events together with "dizziness" and "headache" under the class "nervous system."

MedDRA distinguishes between neurological and psychiatric system organ classes. We placed headaches in the neurological rather than the psychiatric class. MedDRA allows dizziness to be coded under cardiovascular or neurological classes. Given the dose of imipramine being used, most cases of dizziness seem likely to be cardiovascular, with Keller and colleagues also reporting a high rate of postural hypotension on imipramine. We have thus coded all dizziness under cardiovascular rather than neurological. There is scope for others accessing the data to parse out whether there is sufficient information to code certain instances of dizziness, such as dizziness during paroxetine taper, as neurological, but we have not carried out that more complex analysis.

As reported by Keller and colleagues, dizziness and headache comprised 54 of 115 nervous system events in those taking paroxetine (47%), 83 of 135 events in those taking imipramine (62%), and 50 of 65 events in those taking placebo (77%). The effect of disentangling these two symptoms from psychiatric adverse events unmasks a clinically important difference in psychiatric adverse event profiles between paroxetine and placebo.

There was a major difference between the frequency of suicidal thinking and events reported by Keller and colleagues and the frequency documented in the CSR, as shown in table 6.

With regard to dropouts, Keller and colleagues stated that 69% of patients completed the acute phase. Only 45%, however, went on to the continuation phase, which has not yet been subject to RIAT analysis.

Comparison with other studies

Our findings are consistent with those of other studies, including a recent examination of 142 studies of six psychotropic drugs for which journal articles and clinical trial summaries were both available. Most deaths (94/151, 62%) and suicides (8/15, 53%) reported in trial summaries were not reported in journal articles. Only one of nine suicides in olanzapine trials was reported in published papers.

Reporting of adverse events

Our reanalysis of Study 329 showed considerable variations in the way adverse events can be reported, demonstrating several ways in which the analysis and presentation of safety data can influence the apparent safety of a drug. We identified the following potential barriers to accurate reporting of harms (summarised in box 2).

Box 2 Potential barriers to accurate reporting of harms
• Use of an idiosyncratic coding system
• Failure to transcribe all adverse events from clinical record to adverse event database
• Filtering data on adverse events through statistical techniques
• Restriction of reporting to events that occurred above a given frequency in any one group
• Coding event under different headings for different patients (dilution)
• Grouping of adverse events
• Insufficient consideration of severity
• Coding of relatedness to study medication
• Masking effects of concomitant drugs
• Ignoring effects of drug withdrawal

Use of an idiosyncratic coding system

The term "emotional lability," as used in SKB's adverse drug events coding system, masks differences in suicidal behaviour between paroxetine and placebo.

Failure to transcribe all adverse events from clinical record to adverse event database

Our review of case report forms disclosed significant under-recording of adverse events.

Filtering data on adverse events through statistical techniques

Keller and colleagues (and GSK in subsequent correspondence) ignored unfavourable harms data on the grounds that the difference between paroxetine and placebo was not statistically significant, at odds with the SKB protocol that called for primary comparisons to be made using descriptive statistics. In our opinion, statistically significant or not, all relevant primary and secondary outcomes and harms outcomes should be explicitly reported. Testing for statistical significance is most appropriately undertaken for the primary
outcome measures as study power is based on these. We have not undertaken statistical tests for harms as we know of no valid way of interpreting them. To get away from a dichotomous (significant/non-significant) presentation of evidence, we opted to present all original and recoded evidence to allow readers their own interpretation. The data presented in appendix 2 and related worksheets lodged [...] will, however, readily permit other approaches to data analysis for those interested, and we welcome other [...]

Restriction of reporting to events that occurred above a given frequency in any one group

In the paper by Keller and colleagues, reporting only adverse events that occurred in more than 5% of patients obscured the harms burden. In contrast, we report all adverse events that have been recorded. These are available in table E in appendix 2.

Coding event under different headings for different patients (dilution)

The effect of reporting only adverse events that have a frequency of more than 5% is compounded when, for instance, agitation might be coded under agitation, anxiety, nervousness, hyperkinesis, and emotional lability; thus, a problem occurring at a rate of >10% could vanish by being coded under different subheadings such that none of these reach a threshold rate of 5%.

Aside from making all the data available so that others can scrutinise it, one way to compensate for this possibility is to present all the data in broader system organ class groups. MedDRA offers the following higher levels: psychiatric, cardiovascular, gastrointestinal, respiratory, and other. In table E in appendix 2, the adverse events coded here under "other" are broken down under the additional MedDRA SOC headings, including general, nervous system, metabolic, and [...]

Insufficient consideration of severity

In addition to coding adverse events, investigators rate them for severity. If no attempt is made to take severity into account and include it in reporting, readers could get the impression that there was an equal burden of adverse events in each arm, when in fact all events in one arm might be severe and enduring while those in the other might be mild and transient.

One way to manage this is to look specifically at those patients who drop out of the study because of adverse events. Another method is to report those adverse events coded as severe for each drug group separately from those coded as mild or moderate. We used both approaches (see tables 7 and 8).

Coding of relatedness to study medication

Judgments by investigators as to whether an adverse event is related to the drug can lead to discounting the importance of an effect. We have included these judgments in the worksheets lodged [...], but we have not analysed them because it became clear that the blinding had been broken in several cases before relatedness was adjudicated by the original investigators and because some judgments were implausible. For instance, it is documented on page 279 in the CSR that an investigator, knowing the patient was on placebo, declared that a suicidal event was "definitely related to treatment" on the grounds that "the worsening of depression and suicidal thought were life threatening and definitely related to study medication [known to be placebo] in that there was a lack of effect." Notably, of the 11 patients with serious adverse events on paroxetine (compared with two on placebo) reported in the paper by Keller and colleagues, only one "was considered by the treating investigator to be related to paroxetine treatment," thus dismissing the clinically important difference between the paroxetine and placebo groups for serious adverse events.
Grouping of adverse events Masking effects of concomitant drugs Even when they are presented in broader system In almost all trials, patients will be taking concomitant groups, grouping common and benign symptoms with drugs. The adverse events from these other drugs will more important ones can mask safety issues. For tend to obscure differences between active drug treat- example, in the paper by Keller and colleagues, com- ment and placebo. This might be an important factor in mon adverse events such as dizziness and headaches trials of treatments such as statins, where patients are are grouped with psychiatric adverse events in the often taking multiple drugs.
"nervous system" SOC heading. As these adverse Accordingly, we also compared the incidence of events are common across treatment arms, this group- adverse events in patients taking concomitant drugs ing has the effect of diluting the difference in psychiat- with the incidence in those not taking other drugs. ric side effects between paroxetine, imipramine, and Other drugs were instituted in the course of the study that we have not analysed, but the data are available We followed MedDRA in reporting dizziness under in tables K and L in appendix 2 and worksheets lodged "cardiovascular" events and headache under "ner- ppendix B from the CSR. vous system." There might be better categorisations; There are several other angles in the data available at our grouping is provisional rather than strategic. In ould be further explored, table E in appendix 2, we have listed all events coded such as the effects of withdrawal of concomitant drugs under each system organ class heading, and we invite on adverse event profiles, as the spreadsheets others to further explore these issues, including alter- document the day of onset of adverse events and the native higher level categorisation of these adverse dates of starting or stopping any concomitant drugs. Another option to explore is the possibility of any doi1 02.00;6/bmj.hh; 2 BMJ 2015;101hh; 2 the bmj
prescribing cascades triggered by adverse events related to study drugs.
Ignoring effects of drug withdrawal
The protocol included a taper phase lasting 7-17 days that investigators were encouraged to adhere to, even in patients who discontinued because of adverse events. The original paper did not analyse these data separately. The increased rates of psychiatric adverse events that emerged during the discontinuation phase in our analysis are consistent with dependence on and withdrawal from paroxetine, as reported by […].
RIAT process
This RIAT exercise proved to be extremely demanding of resources. We have logged over 250 000 words of email correspondence among the team over two years. The single screen remote desktop interface (that we called the "periscope") proved to be an enormous challenge. The efficacy analysis required that multiple spreadsheet tables were open simultaneously, with much copying, pasting, and cross checking, and the space was highly restrictive. Gaining access to the case report forms required extensive correspondence with GSK. Although GSK ultimately provided case report forms, they were even harder to manage, given that we could see only one page at a time. It required about a thousand hours to examine only a third of the case report forms. Being unable to print them was a considerable handicap. There were no means to prepare packets for multiple independent coders, to decrease bias; to make annotations or use margin comments; or to sort and collate the adverse event reports. Our experience highlights that hard copies as well as electronic copies are crucial for an enterprise like this.
Our analysis indicates that although CSRs are useful, and in this case all that was needed to reanalyse efficacy, analysis of adverse events requires access to individual patient level data in case report forms.
Because we have been breaking new ground, we have not had precedents to call on in analysis and reporting. We await with interest other efforts to do something […].
Strengths and limitations of this study
Study 329 was a randomised controlled trial with a reasonable sample size. There was, however, evidence of protocol violations, including some cases of breaking of blinding. The coding of adverse events by the original investigators raised the possibility that some other data might be unreliable.
The trial lasted for only eight weeks. Participants had relatively chronic depression (mean duration more than one year), which would limit the generalisability of the results, particularly in primary care, because many cases of adolescent depression have shorter durations. Generalisability to primary care would also be limited by the fact that participants were recruited through tertiary settings.
The RIAT analysis broke new ground but was limited in that we could check only 34% (93/275) of case report forms. Time and resources prevented access to all forms because of the difficulties in using the portal for accessing the study data and because considerable amounts of data were missing.
The analysis generated a useful taxonomy of potential barriers to accurate reporting of adverse events and, even allowing for the above limitations, showed the value of permitting access to data.
Conclusion and implications for research and policy
Contrary to the original report by Keller and colleagues, our reanalysis of Study 329 showed no advantage of paroxetine or imipramine over placebo in adolescents with symptoms of depression on any of the prespecified variables. The extent of the clinically significant increases in adverse events in the paroxetine and imipramine arms, including serious, severe, and suicide related adverse events, became apparent only when the data were made available for reanalysis. Researchers and clinicians should recognise the potential biases in published research, including the potential barriers to accurate reporting of harms that we have identified. Regulatory authorities should mandate accessibility of data and protocols.
As with most scientific papers, Keller and colleagues convey an impression that "the data have spoken." This authoritative stance is possible only in the absence of access to the data. When the data become accessible to others, it becomes clear that scientific authorship is provisional rather than authoritative.
We thank Carys Hogan for database work and Tom Jefferson and Leemon McHenry for comments on earlier drafts.
The SmithKline Beecham study was registered as No 29060/329. The protocol was SmithKline Beecham study 29060/329, final clinical report (acute phase), appendix A, Protocol, from […]. The study was funded by SmithKline Beecham. The data analysis protocol for RIAT reanalysis was submitted to GSK on 28 October 2013 and approved by GSK on 4 December 2013.
Contributors: Conception/design of the work: DH, JJ, JMN. Acquisition of data: JJ (negotiation with GSK); CT and EA-J (RIATAR); JMN (efficacy data using GSK online remote system); JLN (harms data using GSK online remote system). Data analysis: JMN (efficacy); JLN and DH (harms). Data interpretation: all authors. Drafting the work and revising it critically for important intellectual content, final approval of the version to be published: all authors. All authors agree to be accountable for all aspects of the work. JJ is guarantor. The first four authors made equal contribution to the paper.
Funding: This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Competing interests: All authors have completed the ICMJE uniform disclosure form at […] and declare: DH has been and is an expert witness for plaintiffs in legal cases involving GlaxoSmithKline's drug paroxetine. He is also a witness for plaintiffs in actions involving other antidepressants with the same mechanism of action as paroxetine. JJ has been paid by Baum, Hedlund, Aristei and Goldman, Los Angeles, CA, to provide expert analysis and opinion about documents obtained from GlaxoSmithKline in a class action over Study 329, and from Forest in relation to paediatric citalopram randomised controlled trials.
Ethical approval: The protocol and statement of informed consent were approved by an institutional review board before each centre's initiation, in compliance with 21 United States Code of Federal Regulations (CFR) Part 56. Written informed consent was obtained from each patient before entry into the study, in compliance with 21 CFR Part 50. Case report forms were provided for each patient's data to be recorded (Final Clinical Report page 000030). The sample informed
consent is provided in the appendix to the protocol, appendix C, pp 000590-4. No further information is available regarding the particular institutional review board that approved the study.
Transparency: JJ affirms that the manuscript is an honest, accurate, and transparent account of the study being reported; that no important aspects of the study have been omitted; and that any discrepancies from the study as planned (and, if relevant, registered) have been explained.
Data sharing: Clinical study reports, detailed data tables, and programming code are available on the Dryad Digital Repository (http://dx.doi.org/10.5061/dryad.bv8j6) and at www.Study329.org/
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial.
1 Doshi P, Dickersin K, Healy D, Vedula SS, Jefferson T. Restoring invisible and abandoned trials: a call for people to publish the findings. BMJ 2013;346:f2865.
2 Keller MB, Ryan ND, Strober M, et al. Efficacy of paroxetine in the treatment of adolescent major depression: a randomized, controlled trial. J Am Acad Child Adolesc Psychiatry 2001;40:762-72.
3 McHenry L, Jureidini J. Industry-sponsored ghostwriting in clinical trial reporting: a case study. Account Res 2008;15:152-67.
4 Jureidini J, McHenry L, Mansfield P. Clinical trials and drug promotion: selective reporting of study 329. Int J Risk Saf Med 2008;20:73-81.
5 Jureidini J, McHenry L. Conflicted medical journals and the failure of trust. Account Res 2011;18:45-54.
6 Kraus JE, letter to Jon Ju[…]
7 Treasure T, Monson K, Fiorentino F, Russell C. The CEA Second-Look Trial: a randomised controlled trial of carcinoembryonic antigen prompted reoperation for recurrent colorectal cancer. BMJ Open 2014 May 13;4:e004385.
8 Ebrahim S, Sohani ZN, Montoya L, et al. Reanalyses of randomized clinical trial data. JAMA 2014;312:1024-32.
9 SmithKline Beecham. A multi-center, double-blind, placebo controlled study of paroxetine and imipramine in adolescents with unipolar major depression - acute phase, Final clinical report […]
10 Healthy Skepticism International News. Paxil Study 329: paroxetine vs imipramine vs placebo in adolesc[…]
11 SAS Solutions OnDemand […]
12 Correspondence between Jureidini and GSK. Rapid responses to putting GlaxoSmithKline to the test over paroxetine. BMJ 2013;347 […]
13 SmithKline Beecham. A multi-center, double-blind, placebo controlled study of paroxetine and imipramine in adolescents with unipolar major depression 1993/amended […]
14 Diagnostic and statistical manual of mental disorders, third edition, revised (DSM-III-R). American Psychiatric Association, 1987.
15 Fawcett J, Epstein P, Fiester SJ, Elkin I, Autry JH. Clinical management: imipramine/placebo administration manual. NIMH Treatment of Depression Collaborative Research Program. Psychopharmacol Bull […]
16 Hamilton M. Development of a rating scale for primary depressive illness. Br J Soc Clin Psychol 1967;6:278-96.
17 Sigafoos AD, Feinstein CB, Damond M, Reiss D. The measurement of behavioral autonomy in adolescence: the Autonomous Functioning Checklist. Adolesc Psychiatry 1988;15:432-62.
18 SKB. Draft Minutes: 4/22/97 Teleconference. Paroxetine Study 329 […]
19 GlaxoSmithKline. Paroxetine: paediatric and adolescent patients. […]
20 Winter C. MedDRA in clinical trials: industry perspective. SFDA-ICH MedDRA Workshop, Beijing, 13-14 May […]
21 Jureidini JN, Nardo JM. Inadequacy of remote desktop interface for independent reanalysis of data from drug trials. BMJ […]
22 Fitzgerald K, Healy D. Dystonias and dyskinesias of the jaw associated with the use of SSRIs. Human Psychopharmacol 1995;10:215-20.
23 Kline RB. Beyond significance testing. Statistics reform in the behavioral sciences. 2nd ed. American Psychological Association, 2013.
24 R Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing […]
25 Brecher M. Review and evaluation of clinical data. Original NDA 20-031. Paroxetine (Aropax). Efficacy review. SmithKline Beecham […]
26 Hughes S, Cohen D, Jaggi R. Differences in reporting serious adverse events in industry sponsored clinical trial registries and journal articles on antidepressant and antipsychotic drugs: a cross-sectional study. BMJ Open 2014;4:e005535.
27 Maund E, Tendal B, Hróbjartsson A, et al. Benefits and harms in clinical trials of duloxetine for treatment of major depressive disorder: comparison of clinical study reports, trial registries, and publications. […]
28 Lewinsohn PM, Clarke GN, Seeley JR, Rohde P. Major depression in community adolescents: age at onset, episode duration, and time to recurrence. J Am Acad Child Adolesc Psychiatry 1994;33:809-18.
29 Fava M. Prospective studies of adverse events related to antidepressant discontinuation. J Clin Psychiatry 2006;67(suppl 4):14-21.
© BMJ Publishing Group Ltd 2015
Appendix 1: RIAT audit record
Appendix 2: Supplementary tables A-M
Appendix 3: Supplementary information on suicidal and self-injurious behaviours in Study 329
No commercial reuse: See rights and reprints http://www.bmj.com/permissions
A Housing Overview
Neil Fox and Will Sheppard
With cattle coming inside over the winter, their management becomes more involved than when they are at grass, and if you get it wrong you could potentially lose a lot of money. We thought it would be useful to give you a short guide on the type of cases we are commonly presented with affecting housed animals and how to deal with them should they arise on your farm.