ORIGINAL RESEARCH

A structured judgement method to enhance mortality case note review: development and evaluation

Allen Hutchinson,1 Joanne E Coster,1 Katy L Cooper,1 Michael Pearson,2 Aileen McIntosh,1 Peter A Bath3

▸ Additional material is published online only. To view please visit the journal online

1Section of Public Health, School of Health and Related Research (ScHARR), University of Sheffield, Sheffield, UK
2Department of Clinical Evaluation, University of Liverpool, Liverpool, UK
3Information School, University of Sheffield, Sheffield, UK

Correspondence to Professor Allen Hutchinson, Section of Public Health, School of Health and Related Research (ScHARR), University of Sheffield, Regent Court, 30 Regent St., Sheffield S1 4DA, UK;

Received 20 January 2013
Revised 19 June 2013
Accepted 22 June 2013
Published Online First

To cite: Hutchinson A, Coster JE, Cooper KL, et al. BMJ Qual Saf 2013;22:

Hutchinson A, et al. BMJ Qual Saf 2013;22:1032–1040. doi:10.1136/bmjqs-2013-001839

Background Case note review remains a prime means of retrospectively assessing quality of care. This study examines a new implicit judgement method, combining structured reviewer comments with quality of care scores, to assess care of people who die in hospital.

Methods Using 1566 case notes from 20 English hospitals, 40 physicians each reviewed 30–40 case notes, writing structured judgement-based comments on care provided within three phases of care, and on care overall, and scoring quality of care from 1 (unsatisfactory) to 6 (very best care). Quality of care comments on 119 people who died (7.6% of the cohort) were analysed independently by two researchers to investigate how well reviewers provided structured short judgement notes on quality of care, together with appropriate care scores. Consistency between explanatory textual data and related scores was explored, using overall care score to group cases.

Results Physician reviewers made informative, clinical judgement-based comments across all phases of care and usually provided a coherent quality of care score relating to each phase. The majority of comments (83%) were explicit judgements. About a fifth of patients were considered to have received less than satisfactory care, often experiencing a series of adverse events.

Conclusions A combination of implicit judgement, explicit explanatory comment and related quality of care scores can be used effectively to review the spectrum of care provided for people who die in hospital. The method can be used to quickly evaluate deaths so that lessons can be learned about both poor and high quality care.

Hospital death rates are a matter of public concern in the UK and have been the subject of both country-wide data analysis and local intensive reviews, one of which has resulted in a major public debate.1 2 Concerns about hospital deaths in well-developed health systems, especially when linked to the occurrence of adverse events,3 have also been expressed internationally. This has resulted in a number of rigorous epidemiological studies of adverse event frequency, for example in Australia, Canada and Sweden.4–6 More recently, there have been large studies of hospital deaths, together with associated events, which have examined whether some hospital deaths might have been preventable.7 8 On a day-to-day level, however, there remains a need for rigorous methods to enable clinical teams to retrospectively assess quality of care in a timely manner and, thus, to identify when deaths were inevitable or whether they might have been prevented with better care. This could assist, for example, in the discussions on care that currently take place in hospital 'morbidity and mortality' meetings.

Case note review remains a prime means of retrospectively assessing quality of care,3–8 despite the known methodological and practical challenges of this review method.9–11 Two principal review methods are used: explicit criterion-based methods and implicit (sometimes called holistic) methods which are based on clinical judgement.

Criterion-based methods, usually using frameworks of pre-determined criteria to identify elements of care which are either met or not met, are useful for large-scale audits of care or for screening case notes using criterion-based trigger tools.9 Implicit review methods are based on clinical judgement, and are probably more effective for identifying and recording the detail and nuance of care (both unsatisfactory and good).12 Thus, implicit review methods are probably more appropriate for detailed exploration of the care for people who die in hospital. Implicit review formats have been criticised for low inter-rater reliability (high variability) and for potential reviewer bias,9–11 13 whereas structured implicit review limits the variability and creates specific frameworks so that reviewers are able to make, justify and organise statements.

Initial models of structured implicit review methods were actually a fusion of implicit judgements of quality of care which were required of the reviewer in order for them to check a set of explicit review criteria (eg, a criterion such as 'no appropriate nursing interventions carried out').14 A framework such as this was used by Pearson et al15 to monitor nursing care quality. More recently, Hogan et al8 used this approach in a study of the frequency of adverse events and preventable deaths in English hospitals, where a judgement-based structured explicit 1–5 scale was used by reviewers to rate quality of care from very poor to excellent. In a study of adverse event frequency and preventability on 8400 patient records in the Netherlands, Zegers and colleagues used two 6-point scales which reviewers employed to record their judgement as to whether injury was caused by healthcare management or the disease process and to assess the degree of preventability.7 16

However, this form of judgement-based structured implicit review only provides a scale-based quantitative result and there is no way to determine how or why the reviewer judgement was made. Thus the method is useful for large scale monitoring or epidemiological studies of adverse events, but has rather less value for more detailed review at the ward or hospital level of why an event occurred.

To increase the value of structured implicit review in the context of reviewing the whole spectrum of care quality, rather than focussing only on adverse event rates, we designed and tested a structured care review method, drawing on the initial work of Kahn and colleagues.14 This required reviewers to make implicit clinical judgements and to write explicit comments to support judgement-based quality of care scores.9 In the developmental stage of the study, multi-professional groups of reviewers independently reviewed the same records, first using a quantitative and then a qualitative review process. For each case, the review process was undertaken for three phases of care (admission, initial management and later management), followed by an overall judgement of the care provided for the patient. For each phase of care, and for care overall, reviewers, both physicians and nurses, were asked to rate quality of care on a 1 (unsatisfactory) to 6 (excellent) scale. This was similar to a four-stage phase of care approach, together with overall care quality, subsequently used by Hogan et al8 to provide a framework on which to rate quality of care.

There was moderate inter-rater reliability of these judgement-based scores when two or three physicians, working separately, used structured implicit review on the same set of case notes (intraclass correlation coefficient (ICC) 0.52). Physician reviewers tended to make more explicit written judgements on the quality of care provided than did nurse reviewers, who more often made commentaries about the process/pathway.

Subsequently, we asked 40 physician reviewers to undertake this enhanced form of structured implicit review to examine the quality of care provided for 1566 people with either chronic obstructive pulmonary disease (COPD) or heart failure as their main diagnosis. There was no oversampling of deaths and each set of case notes was reviewed only once. There were two reviewers (one for COPD cases and one for heart failure cases) for each of 20 randomly selected large hospitals in England and each reviewer judged between 30 and 40 consecutively selected sets of case notes and associated clinical records in their own hospital. Reviewers were either senior respiratory or cardiology physicians in training. Our initial quantitative analysis, reported elsewhere, examined the range of phase of care scores and overall care scores for each of the 20 hospitals and the relationship of the care scores to broader quality of care markers.9

Here we report a new qualitative and quantitative analysis of the commentaries written by the reviewers to support their judgement scores of care provided for the 119 cases who died in hospital (7.6% of the cohort of 1566 cases). The purpose of the analysis was to explore whether physician reviewers can consistently provide short, structured, judgement-based comments on quality of care that they can also justify with an appropriate care score. The consistency between the explanatory textual data and the related scores is explored with a view to considering whether this structured method, combining implicit judgements supported by explanatory comments, together with quality of care scores, can be used for routine mortality case note review.

Hospital and reviewer selection
Acute care hospitals in England were first grouped into quartiles using mortality data. Equal numbers of hospitals from the top and bottom quartiles were then randomly selected (20 in total). Each randomly selected hospital had to provide two reviewers, who were all volunteers and specialists in training. Each was initially approached by specialists in their own hospitals and initial research team contact with the specialists was made through the Royal College of Physicians.

Reviewer training
All reviewers received training in the review methods and in data recording prior to data collection.
A full-day training session comprised a description of the methods, discussion about the need to be as explicit as possible about the judgement commentaries and a session reviewing a set of case notes in pairs with tutors. Finally, all of the reviewers judged the care from the same set of anonymised case notes and then commented on their findings in a managed small group discussion, which again emphasised the need to be explicit in their judgements. Data were collected via an electronic form which enabled direct entry by reviewers of both comments and scores for all relevant care phases and care overall. This enabled reviewers to structure their commentaries. The data collection programme was also demonstrated during the training day.

Finally, reviewers were provided with a set of national clinical practice guidelines relevant to their clinical specialty. Regular contact was maintained between the study team and the reviewers, who could ask for advice during the review period using a telephone helpline.

Each set of case notes was reviewed by a single physician reviewer. Quality of care was assessed in three phases: admission, initial management and later management, and also for care overall. For each phase of care and for care overall, reviewers wrote short textual comments on the quality of care provided and were encouraged to be explicit in their comments on care. They also gave the care a score from 1 to 6 for each phase and for overall care, based on the criteria in table 1.

Analysis methods
Of the 1566 cases reviewed, 119 had died during their hospital admission. To explore the type and content of written comments by the reviewers on each of the 119 cases, a textual analysis framework, developed during the study prior to this analysis and previously reported,9 was applied to all of the phase and overall care comments. Two authors (AH, JEC) reviewed and categorised the comments independently and any differences in categorisation were resolved through discussion.

Comments were categorised into three groups (see box 1). All comments in categories B (implicit judgement comments) and C (explicit judgement comments) were subsequently classified by the two study analysts as indicating good quality of care (positive comments) or as indicating poor quality of care (negative comments). These two categories of comment for each case were then grouped by their related overall quality of care scores, which were then used to classify each case into one of six groups, from unsatisfactory care (score 1) to very best care (score 6). Examples of the detailed textual analysis are presented in the results in tables 2 and 3.

The quality scores for care overall for the group of 119 people who died were compared with the distribution of scores for the 1447 patients who survived, using the χ2 test. The association between the comment category and type and their relationship to each other were explored across overall care scores using the χ2 test. The χ2 tests were undertaken using Microsoft Excel and p values were calculated using GraphPad software.

Box 1 Reviewer comment categories
Category A: Little or no comment about care and/or little or no judgement, including, for example, a description of what was in the case note or a description of what happened to the patient (not the care they received).
Note: Category A did not contribute to the analysis presented here, since this analysis was concerned with judgements rather than descriptive reports.
Category B: Limited comment about quality of care and/or implied judgement. This category included an implied judgement and/or a description of the care delivered (not just a description of a patient pathway) and/or a description of an omission of care.
Category C: Comments about care with explicit judgements and views. This category included explicit judgements of care delivered, questioning or queries about the care delivered, explanations or justification of care delivered, alternative options or justification of care that should have been delivered or concerns about care.

Table 1 Care score criteria
Score 1: Unsatisfactory: care fell short of current best practice in one or more significant areas resulting in the potential for, or actual, adverse impact on the patient
Score 2: Care fell short of current best practice in more than one significant area, but is not considered to have the potential for adverse impact on the patient
Score 3: Care fell short of current best practice in only one significant area, but is not considered to have the potential for adverse impact on the patient
Score 4: This was satisfactory care, only falling short of current best practice in more than two minor areas
Score 5: This was good care, which only fell short of current best practice in one or two minor areas
Score 6: Very best care: this was excellent care and met current best practice

Table 2 Reviewer commentary on care judged unsatisfactory overall
Overall care score 1. Column headings: Reviewer comments; (Pos or Neg)/category.

Admission phase score 1:
Poor history documentation
Poor examination documentation
Initial investigations requested CXR, ECG, bloods but no comment made re these
No ABGs and patient was tachypnoeic and hypoxic
No O2 (not documented)
Pitiful dose of frusemide (furosemide) (20 mg IV)
Extremely poor management

Initial management phase:
Medical team made no attempt to adequately treat the heart failure
No comment on the CXR
CPAP started without ABGs
Did record a resuscitation status
Documentation very poor, for example, no reference to the fact that she was so unwell or whether they thought it likely that she would die
No discussion with the family or relatives

Overall care score 1:
All aspects of this case were very poor. History, examination, medical management, documentation
If this lady was clearly dying and had multiple co-morbidities, they should have documented this, made the lady comfortable and called the family in

ABGs, arterial blood gases; CPAP, continuous positive airway pressure; CXR, chest radiograph; ECG, electrocardiograph; GTN, glyceryl trinitrate; IV, intravenous; Neg, negative; Pos, positive.

The overall quality of care scores for the patients who died are compared in table 4 with the scores for all patients who survived. The proportions of cases in which care fell short of good practice are relatively
similar across the two groups of cases, although there is a higher proportion of 'satisfactory' cases and a somewhat lower proportion of 'good' cases among people who died than in the survivor group. There were no statistically significant differences between the two groups (χ2=9.800; df=5; p=0.0811).

Relationship of positive and negative comments to overall care scores
Table 5 summarises the relationship between the overall care score for each case and the types of comment (whether positive or negative judgements) provided by the reviewers for each of the phases and for overall care. There was a significant association between the total number of positive and negative comments and the overall scores (χ2=205.50; df=5; p<0.0001).

In the care score range unsatisfactory (1) to falling short of best practice (3), the proportion of negative comments outweighs the positive comments. When the care is rated from satisfactory (4) to very best care (6), the positive comments increasingly outweigh the negative. Generally, the positive to negative ratio of comments for each phase remains stable across each overall group score band. So where the overall score is 3 or less, across each of the phases there are more negative comments than there are positive comments, and the reverse is true for the summary of the higher scores, indicating that the reviewer judgements are generally consistent with the overall score that was given. The ratios of positive to negative comments range between 0.28 for overall care score 1, to 21.17 for those cases grouped by overall care score 6.

There are fewer comments in total in the later phases of care because some patients died early in the course of the admission. There is also some indication in the textual commentaries that a number of reviewers felt most of what needed to be said had already been said in the earlier phase of care comments for a particular case, and so did not need to be repeated.

In general, the phase of care comments were more detailed than the overall care comments. Occasionally, however, reviewers gave an unexpectedly high score related to a qualitative judgement that suggested a lower quality of care had occurred (see, for example, the case in table 3).

Categorisation of comments: implicit and explicit judgements about care quality
Table 6 summarises the numbers of comments grouped by category (category B: implicit judgements of care; category C: explicit judgements of care) and comment type (positive or negative) for each overall score.

Results in table 7 show that, overall, there were more than four times as many explicit comments (judgements) as there were implicit comments. For the lower overall care scores (1–3), there tended to be a rather higher ratio of implicit (B) judgements than there were for the higher care scores, although the implicit judgements were always in the minority. This trend is confirmed by a significant statistical association between the total number of implicit/explicit judgements of care and the overall care score (χ2=48.37; df=5; p<0.0001). Thus, the pattern of more explicit comments than implicit comments was seen for all quality of care scores, from 1 (poor care) to 6 (best care), indicating that reviewers were on the whole prepared to make explicit judgements where care was poor as well as where care was good.

These results suggest that the reviewers were on the whole prepared to make the type of judgements and explicit comments asked of them during training and which would be valuable in a quality of care review.

Table 3 Reviewer commentary on care judged short of best practice
Overall care score 3. Column headings: Reviewer comments; (Pos or Neg)/category.

Admission phase score 4:
pH 7.436
Good history taken of COPD symptoms and normal functional status, alternative diagnosis of PE and CCF not excluded in a patient with risk factors for both
Clinical cardiovascular exam not thorough (no mention of JVP, pedal oedema, chest expansion, sputum characteristics)

Initial management phase:
Patient received appropriate treatment for COPD (ie, steroids, antibiotics and nebulizers), however the CXR result was never recorded ?looked at

Later management phase:
Although the patient was recorded to be clinically improving 2/7 post admission and team were considering early discharge, his ABG was not improving and patient's SOB+tachypnoea attributed to anxiety, pt (patient) gradually deteriorated
Patient changed to inhalers too soon
Seen appropriately by respiratory team, frusemide (furosemide) and aminophylline infusion appropriately suggested
Nursing staff inappropriately withheld oral medications as they thought he was nil by mouth
Developed severe type 2 respiratory failure but no decision on resus status made until patient very unwell. This needed to be made by on call team
Earlier referral to ITU and I.v aminophylline may have changed outcome
Good chest physio(therapy) input

Overall care score 3:
Patient appropriately treated initially with nebs, antibiotics and steroids however patient's treatment plan not escalated until he was in severe type 2 respiratory failure
NIV/ITU not considered in this patient ?why-he had no other co-morbidities and no previous hospital admissions
Resus decision made inappropriately by on call team when patient very unwell

ABG, arterial blood gases; CCF, congestive cardiac failure; COPD, chronic obstructive pulmonary disease; CXR, chest radiograph; ITU, intensive therapy unit; I.v, intravenous; JVP, jugular venous pressure; nebs, nebuliser; NIV, non-invasive ventilation; PE, pulmonary embolus; pt, patient; resus, resuscitation; SOB, shortness of breath.

Table 4 Quality of care overall: score comparisons between people who died and those who survived
[Columns: quality of care scores from 1 (unsatisfactory) to 6 (very best care), grouped as care fell short of good practice, satisfactory care, and good or better care, with numbers of reviews. Rows: people who died (%); people who survived (%).]
*Two cases from the group of 119 people are not included in this analysis due to incomplete data. Both had phase scores of 5 or 6 with no negative comments, but for each the overall care score was missing, so they could not be grouped by overall care score.
χ2=9.800; df=5; p=0.0811.

Table 5 Numbers of positive and negative comments per overall score
χ2=205.50; df=5; p<0.0001.
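The p values quoted alongside the χ2 statistics above can be cross-checked from the statistic and its degrees of freedom alone. The following is a minimal Python sketch of that check, not the authors' procedure (the study used Microsoft Excel and GraphPad). The function names `chi2_sf` and `pearson_chi2` are names chosen here for illustration, and the small 2 × 3 table of counts is a made-up toy example, not study data.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over j < df/2 of (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite correction sum.
    total = 0.0
    term = math.sqrt(x)
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)
        total += term
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


def pearson_chi2(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Toy counts (hypothetical, NOT study data): two groups by three categories.
toy_table = [[10, 20, 30], [20, 20, 20]]
toy_stat, toy_df = pearson_chi2(toy_table)
toy_p = chi2_sf(toy_stat, toy_df)

# Cross-check of reported values (statistic and df are taken from the text).
p_died_vs_survived = chi2_sf(9.800, 5)       # reported as p=0.0811
p_pos_vs_neg = chi2_sf(205.50, 5)            # reported as p<0.0001
p_implicit_vs_explicit = chi2_sf(48.37, 5)   # reported as p<0.0001

print(f"toy table: chi2={toy_stat:.3f}, df={toy_df}, p={toy_p:.4f}")
print(f"chi2=9.800, df=5 -> p={p_died_vs_survived:.4f}")
```

The reported df=5 is consistent with a table of two groups by six overall score bands, since (2 − 1) × (6 − 1) = 5. The closed-form tail probability is used here only to keep the sketch free of third-party dependencies; a statistics library would normally be preferred.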
Table 6 Comments by type and category and overall score
[Columns: comment type, category and number for the admission phase, the early management phase, the later management phase and overall care.]

Table 7 Comparison between implicit/explicit and positive/negative comments
[Entries: ratio of explicit (C) to implicit (B) comments; total positive comments; total negative comments.]
χ2=48.37; df=5; p<0.0001.

Content and nature of comments
Study of the individual comments showed that a number of B category comments contained concise technical summaries in addition to implicit judgements on the quality of care. Many of the C category comments were incisive clinical observations with a strong view of the quality of care, especially when the [...]

Comments across the range of overall scores often included consideration of the broader, non-technical processes of care (eg, communication with relatives), as well as technical aspects of care.

Of the 21 case reviews with low overall scores (scores of 1, 2 or 3), 15 were accompanied by an explicit clinically relevant judgement that justified the low score. Some related to cases where care was generally poor throughout the inpatient episode, while others related to cases where a specific aspect of care was of concern. In two of the cases, incorrect diagnosis was the main problem, while in 12 cases there was concern about suboptimal management. There were usually multiple smaller events that were additive, rather than one main adverse event, which only occurred in one of the 12 cases. Two of the 15 cases were considered to have such poor record keeping as to be a threat to the care of the patient.

Tables 2 and 3 provide examples demonstrating the range, type and category of comments made by reviewers in two cases. All of the comments are as written by the reviewers and the scores given for each phase of care are included. Reviewers were able both to comment on the technical aspects of care and to take a holistic view of the overall care plan.

Table 2 is also used to demonstrate how the categorisation of the comments was applied in the analysis.
▸ Although the reviewer explicitly grades the documentation as poor in the admission, this is only a description of the documentation without any explanation and therefore is categorised as a B level comment. In the initial management phase, however, there is a judgement (very poor documentation) together with an explanation, which rates a C category.
▸ When the reviewer implies in the initial management phase that it was poor practice not to take an arterial blood gas sample ('No ABGs and patient was tachypnoeic and hypoxic'), there is no explicit statement that this was unsatisfactory (and it is thus a B category comment).
▸ A judgement on the therapy ('pitiful dose of frusemide (furosemide) (20 mg IV)') is a C category comment.
▸ When commenting on the technical aspects of care, the reviewer could also be explicit about how the care
should have been managed overall, in the context of the patient's illness. This is an explicit, category C, comment.

The case in table 2 also illustrates a pattern where there is a group or 'constellation' of events which of themselves may not cause severe harm but which, taken together, can lead to harm to the patient. This pattern was also found in the main study among some of the patients who survived.17

Although there are usually more negative comments than there are positive comments when overall care scores are low, as shown in table 5, the case in table 3 shows examples of how positive and negative comments can be juxtaposed in each phase. In retrospect, this case also raises the question of whether the overall score of 3 was the most appropriate; it might be argued from the level of the comment that the case could have been given a lower overall care score of 2.

Comments on good care tended to be more global than those for unsatisfactory care but may also be quite explicit. Cases which demonstrate this and also how a single adverse event may change the reviewer's overall consideration of the case are included as additional material (see online supplementary tables S7 [...]).

Some of the reviewers in this study were more 'explanatory' than others, so that, in some cases, the number of comments may reflect individual style rather than the strength of the comment. For example, comments such as 'good care' or 'unclear treatment' are short explicit judgements without further detail, while other reviewers are more extensively explicit.

Of the 63 case reviews (54% of the total number of mortality reviews) that scored most highly (5 or 6), 52 were accompanied by a short explicit comment in the overall care section indicating that all key aspects of care had been good or excellent (eg, 'well looked after') and in 16 of the 63 reviews there were comments about the inevitable outcome of the case despite the good care received.

In this study we have shown that physician reviewers are able to use structured review to make implicit quality and safety judgements, write explicit short care commentaries and give coherent matching quality of care scores. Quantitative scores and qualitative comments corresponded well, indicating that physician reviewers can appropriately score the quality of care on a rating scale.

These physician reviewers could identify and explain both technical and non-technical aspects of care, and could rank these aspects of care using a set of 'benchmark' scores, ranging from very good care to very unsatisfactory care. For people with complex [...] However, structured explicit judgements can show how high quality care was provided, even if the patient has not survived. For example, there were a number of instances where explicit comments were made about the quality of non-technical care such as the way information was provided to patients and their relatives. Conversely, when poor care occurs, the method can identify the points at which care fails to meet expected standards, and when the situation can be, or is, rescued. It is interesting to note that in table 4 the proportions of those who died and had less than satisfactory care (about 20% of the cases) were similar to those who survived and had poor care.

During the training session, reviewers were encouraged to be as direct as possible in their commentaries, and in the results overall (tables 6 and 7) there were many more explicit comments than there were implicit comments. Nevertheless, when poor care was being described, while explicit comments predominated, there was a noteworthy proportion of implicit, B level, comments. Sometimes these B level comments were about documentation (which was not in the C category) or concerned missed tests which the reviewer listed and did not specifically make a judgement upon (eg, 'No ABGs'; see table 2). It may be that in this case the reviewer felt that the result said it all and that an explicit comment was superfluous. On the other hand, it could also be that some reviewers might have felt uncomfortable about making direct comments about very poor care.

With the hindsight of these results, and when undertaking reviews such as this in health service settings, training should include discussion of an initial sample of commentaries and scores with each reviewer to assist in maximising the number of explicit comments. Of course, training might identify some reviewers who do not feel able to make explicit comments and so would not be suitable for this type of review.

The phase of care structure also contributes to an understanding of how care may vary, and at what point. Interestingly, a phase of care approach has also been used by Shannon and colleagues in a review of cardiac surgical care,18 albeit in a rather more structured system with distinct changes in physical settings. In the context of assessing whether death was a preventable outcome, Hogan et al8 used a four-phase model to identify adverse incidents: initial assessment, treatment plan, ongoing monitoring and preparation for discharge. Under the conditions of a service review, a three-phase model might be easier to manage, but either a three or four-phase approach would be appropriate.

Qualitative comments from the reviewers were useful in that they could succinctly identify what was done badly in poor cases. Such short explicit judgements could support a wider, more detailed service review to assess what could be improved in a particular setting or condition. Furthermore, since this structured review method assesses both process and outcome of care, this mixed type of review, using qualitative comments with scores, might be a useful addition to review measures which only assess outcomes or are criterion based. This mixed qualitative and criterion-based method is published in detail elsewhere.

In this study, assessments of the quality and safety of the care provided showed that, for over 80% of the patients who died, care was rated at least satisfactory and, for approximately half of the cases, care was judged to be of high quality. The processes of care described enable a qualitative judgement to be associated with an objective score that is explicable to, and understandable by, a wide range of people and would also be understood by the public. However, having graded a case as poor or not, there is the added advantage that the structured comments also provide the reasoning behind the judgement in a format to which clinical teams and individuals should be able to respond in a review process.

In this study, the 40 reviewers were all volunteers who undertook the work in their own hospitals. Although there might be concerns about the impartiality of using internal review teams, results have shown that reviewers can make incisive short notes (commentaries) about quality of care, and can critically review care provided in their own hospitals.

Internal review teams have also been used in other settings. Sharek et al19 commented on the strong performance of hospital-based internal review teams, albeit when using more structured, criterion-based trigger tools to identify adverse events.

This method is a refinement on both global implicit judgement and structured implicit judgement used upon a set of case notes, because it is able to provide information on aspects of each phase of care, enabling more detailed, yet still brief, comments to show explicitly how care may vary or be consistent with expected standards. For example, this method could be used to identify whether care has led to a preventable death, or to identify good quality of care even though the overall outcome is failure to survive. Thus, although the study did not explicitly seek to judge a death as preventable, as did Hogan et al,8 review training could straightforwardly include an explicit judgement commentary about whether a death was preventable or was not preventable (which some of the study reviewers actually provided).

Results also show how explicit written judgements and quality of care scoring can be used together and thus may offer a range of case note review methods for use under differing circumstances, together with opportunities for providing training and assessment of 'reviewer quality'.

Structured judgement review provides the framework for a quality of care review that can be used by clinical leaders and quality managers to identify potential priority areas for evaluation. For example, scoring allows for a screening of the overall care quality for a case overall, or can identify issues in a particular phase of care, say at admission or initial management. Explicit comments allow exploration of particular aspects of care, for instance where good treatment plans might be inadequately implemented. For these purposes it is not necessary to analyse whether comments are implicit or explicit. The data
collection framework is straightforward, has been pre- Although it could be argued that two reviewers per viously published and is easily available.9 case might enhance the quality and depth of a case Who should act as the reviewers? Because of the note review, there is some evidence to suggest that complexity of illness often presented in hospital set- this use of a more intensive resource does not neces- tings, studies of adverse events have used experienced sarily improve the review process. While we were able generalists with some specialist support.8 This struc- to show in our development study that there was rea- tured implicit review method could be used in a sonable coherence of quantitative care scores and similar way either with in-hospital teams or by visiting criterion-based scores between physician reviewers,9 13 teams from other hospitals. We do not know whether other work by Hofer and colleagues found that mul- the review results would be better when undertaken tiple reviewing of the same set of case notes did not by experienced specialists rather than by the reviewers enhance the results.20 in our study. However, our results have shown that Finally, it is important to recognise that there are this form of review can be undertaken by specialists at limits to the extent to which the quantitative analysis a senior level in a training programme—so increasing of the reviews can be used. For example, averaging the pool of trained senior reviewers in a hospital— phase scores across each case, to determine whether and thus the method offers the opportunity for early phase score averages are similar to the overall care review of the care of people who die in hospital so score, is not appropriate. An example of this can be that, where necessary, timely quality improvement found in online supplementary box S6 where care was lessons can be learnt.
judged excellent until moments before the patientdied. The value of this current study is that the context Contributors AH: lead on the conception and design of thestudy, lead on the analysis of the qualitative mortality review and the basis for any quantitative score can be found in data and principal author of all drafts of the paper; JEC, MP, the phase of care comments associated with each score.
AM, PAB: study conception; JEC, MP, AM, PAB, KLC: study design; JEC, KLC: data collection and analysis of mortality review data; MP: interpretation of mortality review data; AM: lead on the qualitative analysis framework; PAB: qualitative analysis framework and statistical analysis for the quantitative analysis; JEC, KLC, MP, AM, PAB: contributed to all drafts of the paper. All authors have given approval for this version of the paper to be published. AH acts as guarantor.
Funding This project was funded by the National Institute for Health Research Health Technology Assessment (NIHR HTA) Programme (project number RM03/JH08/AH) and was published in full in Health Technology Assessment 2010;14(10):1–170. The views and opinions expressed herein are those of the authors and do not necessarily reflect those of the HTA programme, NIHR, NHS or the Department of Health.
Competing interests None.
Provenance and peer review Not commissioned; externally peer reviewed.
1 Dr Foster Ltd. The Dr Foster Hospital Guide 2009: how safe is your hospital? London: Dr Foster Ltd, 2009.
2 The Stationery Office. Report of the Mid Staffordshire NHS Foundation Trust public inquiry. London: The Stationery Office, 2013. ISBN 9780102981476.
3 Thomas EJ, Brennan TA. Incidence and types of preventable adverse events in elderly patients: population based review of medical records. BMJ 2000;320:741–4.
4 Wilson RM, Runciman WB, Gibberd RW, et al. The Quality in Australian Health Care Study. Med J Aust 1995;163:458–71.
5 Baker GR, Norton PG, Flintoft V, et al. The Canadian Adverse Events Study: the incidence of adverse events among hospital patients in Canada. CMAJ 2004;170:1678–86.
6 Soop M, Fryksmark U, Koster M, et al. The incidence of adverse events in Swedish hospitals: a retrospective medical record review study. Int J Qual Health Care 2009;21:285–91.
7 Zegers M, de Bruijne MC, Wagner C, et al. Adverse events and potentially preventable deaths in Dutch hospitals: results of a retrospective patient record review study. Qual Saf Health Care
8 Hogan H, Healey F, Neale G, et al. Preventable deaths in care in English acute hospitals: a retrospective case record study. BMJ Qual Saf 2012;21:737–45.
9 Hutchinson A, Coster JE, Cooper KL, et al. Comparison of case note review methods for evaluating quality and safety in health care. Health Technol Assess 2010;14:1–170.
10 Lilford R, Edwards A, Girling A, et al. Inter-rater reliability of case-note audit: a systematic review. J Health Serv Res Policy
11 Hayward RA, Hofer TP. Estimating hospital deaths due to medical errors: preventability is in the eye of the reviewer.
12 Mohammed MA, Mant J, Bentham L, et al. Process and mortality of stroke patients with and without do not resuscitate order in the West Midlands, UK. Int J Qual Health Care
13 Hutchinson A, Coster JE, Cooper KL, et al. Assessing quality of care from hospital case notes: comparison of reliability of two methods. Qual Saf Health Care 2010;19:e2.
14 Kahn KL, Rubenstein LV, Sherwood MJ, et al. Structured implicit review for physician implicit measurement of quality of care: development of the form and guidelines for its use. California: Rand Corporation, 1989.
15 Pearson ML, Lee JL, Chang BL, et al. Structured implicit review: a new method for monitoring nursing care quality. Med Care 2000;38:1074–91.
16 Zegers M, de Bruijne MC, Wagner C, et al. Design of a retrospective patient record study on the occurrence of adverse events among patients in Dutch hospitals. BMC Health Serv Res 2007;7:27.
17 Hutchinson A, McIntosh A, Coster JE, et al. When is an event an event? Data from the quality and safety continuum. In: Hignett S, Norris B, Catchpole K, et al. eds. From safe design to safe practice. Cambridge: The Ergonomics Society, 2008. ISBN 978-0-9554225-2-2, 179–183.
18 Shannon FL, Fazzalari FL, Theurer PF, et al. A method to evaluate cardiac surgery mortality: phase of care mortality analysis. Ann Thoracic Surg 2012;93:36–43.
19 Sharek PJ, Parry G, Goldmann D, et al. Performance characteristics of a methodology to quantify adverse events over time in hospitalised patients. Health Serv Res
20 Hofer TP, Bernstein SJ, DeMonner S, et al. Discussion between reviewers does not improve reliability of peer review of hospital quality. Med Care 2000;38:152–61.
Hutchinson A, et al. BMJ Qual Saf 2013;22:1032–1040. Originally published online July 18, 2013. doi:10.1136/bmjqs-2013-001839


