Lumpers, Splitters and STarTers

In recent years there have been many debates about the disappointing results from clinical trials of treatments for non-specific low back pain. One argument has been about the targeting of treatment for back pain. Many folk have argued that trials which apply a one-size-fits-all treatment fail to show a reasonable effect because amongst those who should be given that treatment there are others for whom the treatment is inappropriate and so the true effect is washed out. The proposed solution is to more effectively target treatments to specific groups of patients in trials, or only include the patients who are likely to be responders. Let’s call the people who hold this position “the splitters”.

Then there are others (myself included) who wonder if the existing evidence has ever really pointed towards sub-grouping as an explanation for the poor performance of treatments. We tend to believe the message from the existing trial evidence, that current treatments are a bit rubbish. We wonder whether subgrouping might be a case of “searching for the pony”*. We can’t really see a valid way of choosing which treatment is likely to work for whom and we are often critical of some of the attempts that have been made to identify specific responders, wondering if they might be a bit statistically “dodgy”. Certainly we don’t really feel that we are close to a diagnostic model in back pain that could guide this process. Let’s call folk like me the “lumpers”.

There are a number of ways to go about answering this question and ways in which one might try to subgroup patients. You might try to establish valid diagnostic subgroups in back pain (good luck with that), you might mine the data from existing trials to see what factors seem to predict a good outcome (a minefield of potential statistical wrongness) or you might look at the best data available on the factors which predict poor outcome and then group folk by their risk of not getting better, tailoring the amount and type of treatment to that risk. Who else thinks that’s nifty and a question worth asking?

A research team from Keele University in the UK, led by Jonathan Hill, has tested just such an approach in a massive (851 participants!) and impressive RCT. This is big ambitious research: precisely the kind we need to draw reliable conclusions. The fact that Dr Hill is a physiotherapist should be pleasing to many of us. The STarT Back Trial used a tool based on established risk factors to group patients with back pain into low, medium and high risk of poor outcome and then randomised participants to care that was tailored to this risk or to standard physiotherapy care. In the tailored care group, patients assessed as low risk had just one session of advice focusing on being active and exercising, delivered through a therapist, a video and a pamphlet, and were only referred for more treatment at the discretion of the referring physiotherapist (kind of usual physiotherapy care). Medium and high risk patients were always referred for further physiotherapy-led treatment, with high risk patients specifically receiving psychologically informed treatment.

So was this the good news trial for back pain that we’ve been looking for? The results showed significant improvements in disability and pain in the intervention group overall. Those in the medium and high risk groups who received stratified care did better and those in the low risk group did no worse than those receiving standard care. An important message that arises is that for low risk patients one session of advice is not inferior to the same advice plus usual physiotherapy care. More is not always better. On top of this the new approach seemed to cost a little less.

But there are some buts. The effects are small at the end of treatment (at 4 months around a 2 point change in disability from a baseline score of about 10 overall), and are smaller still at 1 year (around a 1 point change compared with usual physiotherapy care that is only significant in the medium risk group). The changes within the groups look more impressive but these cannot be confidently attributed to the treatment. In fact you would need to give about 10 patients targeted care rather than usual physiotherapy for one more patient to achieve a “good outcome” (a 30% improvement in their disability score) at one year. In the group at highest risk of not getting better we can estimate that they would still have an average disability score of around 10/24 a year after treatment.
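
For anyone who wants the arithmetic behind that “about 10” figure, here is a minimal Python sketch of the number needed to treat (NNT): one divided by the difference between the two groups in the proportion of patients reaching a good outcome. The two proportions are hypothetical round numbers, picked only so that the gap is 10 percentage points. They are not the figures reported in the trial, and the function name is purely illustrative.

```python
def number_needed_to_treat(p_targeted: float, p_usual: float) -> float:
    """Return the NNT: 1 / (absolute difference in good-outcome proportions)."""
    benefit = p_targeted - p_usual
    if benefit <= 0:
        # With no added benefit the NNT has no meaningful value.
        raise ValueError("targeted care shows no added benefit; NNT is undefined")
    return 1.0 / benefit


# Hypothetical proportions for illustration only, NOT taken from the STarT Back paper.
p_targeted = 0.60  # assumed share reaching a good outcome with targeted care
p_usual = 0.50     # assumed share reaching a good outcome with usual physiotherapy

nnt = number_needed_to_treat(p_targeted, p_usual)
print(f"NNT is roughly {nnt:.0f}")
```

Read the other way round, an NNT of about 10 means roughly 10 extra good outcomes for every 100 patients given targeted care.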

To be clear, these are not reservations about the trial, which is a real achievement. Here is evidence that sub-grouping can influence outcome, a little bit. But the benefits are modest and suggest to a grumpy lumper (or “grumper”) like myself that subgrouping, at least in this way, is not likely to make a big dent in the problem of back pain just yet.

About Neil

As well as writing for Body in Mind, Neil O’Connell is a researcher in the Centre for Research in Rehabilitation, Brunel University, West London, UK. He divides his time between research and training new physiotherapists and previously worked extensively as a musculoskeletal physiotherapist. He also tweets! @NeilOConnell

He is currently fighting his way through a PhD investigating chronic low back pain and cortically directed treatment approaches. He is particularly interested in low back pain, pain generally and the rigorous testing of treatments.

*Searching for the pony

There was once a man with twin daughters, one irrepressibly optimistic, the other incurably pessimistic. He felt that neither was a good approach to life, so on their birthday he presented the pessimist with a roomful of fantastic toys and games, and the optimist with a roomful of manure, and left them to it.

When he came back a while later, he found the pessimist holding a broken toy, crying that sooner or later, all the rest would also break. The optimist, however, had found a shovel and was tackling the manure with gusto. When asked why she was bothering, she replied that: “With all that shit, there’s got to be a pony in there somewhere.”

(This version of the apocryphal story is copied from a comment on the Bad Science forums by a poster, “DrJG”.)


Hill JC, Whitehurst DG, Lewis M, Bryan S, Dunn KM, Foster NE, Konstantinou K, Main CJ, Mason E, Somerville S, Sowden G, Vohora K, & Hay EM (2011). Comparison of stratified primary care management for low back pain with current best practice (STarT Back): a randomised controlled trial. Lancet PMID: 21963002

Hancock M, Herbert RD, & Maher CG (2009). A guide to interpretation of studies investigating subgroups of responders to physical therapy interventions. Physical therapy, 89 (7), 698-704 PMID: 19465372

Wand BM, & O’Connell NE (2008). Chronic non-specific low back pain – sub-groups or a single mechanism? BMC musculoskeletal disorders, 9 PMID: 18221521



  1. Neil,
    You mean human beings are more likely to remember their successes and forget their failures? Say it ain’t so!

  2. Neil O'Connell says

    Hi Arco,

    It is worth considering that the idea that most RCTs in back pain apply a “one size fits all” treatment to a heterogeneous problem is a popular one but actually does not reflect the current state of the data. Many large trials, and many more in recent years, take a more pragmatic approach wherein therapists are allowed to assess and treat as they wish. This is the case in large part for the STarT trial. So within this process one might imagine that the physios have gone through the process you describe, tried to fit the treatment to the patient, delivered individualised care and effectively (say it quietly) subgrouped within the paradigm in which they conceptualise back pain.

    Nevertheless what we don’t see yet from these more pragmatic trials is a big leap in effect size. I am sure that many expert practitioners feel that they are able to distinguish types of back pain, deliver targeted treatment and get better outcomes than those reflected in the trial data. So therefore the problem must be with the trials? Or perhaps we fall into the numerous perceptual traps when we reflect on our own successes that create an illusion of efficacy?

    arco Reply:

    Hi Neil,

    Don’t get me wrong, I agree, but as time passes, what I see more and more is new “subgrouping” approaches, where the important thing is matching criteria to treatments. Here there is a lot of failure; just take a look at the headache classification of the International Headache Society, and how useless it is!

    I’m sure there are going to be a lot of different results, as many as there are different clinical presentations. I’m not sure how thorough the assessment has been. Not long ago we had a discussion regarding the use of centralization to match patients into the exercise category, based on just 10 reps, according to Fritz’s suggestions. She may be making a new centralization system and the kiwi guy was wrong, I don’t know, but this is not doing things properly, because you’re basing your hypothesis on the wrong foundations.

    So RCTs may show proper results if we apply proper hypotheses and respect and define the foundations of what we say, instead of trying to show what we wish would happen.

    Currently, I don’t find that research into NSLBP, and many other non-specific disorders in the body, is being performed very well.

    I’m a clinician not a researcher, and I don’t know how researchers can assess my job, and vice versa. Are we looking for the same ponies or elephants?

    Your post title is controversial: you’re saying there are subgroups of people trying to find their own truth regarding NSLBP, but this is again a barrier, and barriers mean yellow flags.

    We are the big yellow flag holders/wavers, and because we act like that, we’re not able to do the proper things, and the discussions will go on for ever.

    We should use treatments as tools, based on assessments. And assessing doesn’t mean giving a treatment to a patient; treatment is a consequence. So we may be looking for and researching the best way of assessing instead of the best way of treating. Otherwise we’ll be only looking for ponies. The elephant will die!


  3. arco,

    I believe the “pathology” we’re talking about here and what the Hill article addresses is non-pathological pain (i.e. no discernable tissue damage), which in this case is located in the low back region.

    So, when you suggest (with your above recitation of how to perform a clinical examination) that many of us are insufficiently skilled to identify all the “pathologies” present in a patient with non-pathological pain, I’m not sure what exactly you’re talking about.

    But, I have to say, I do find your lists rather condescending.

    arco Reply:

    Dear John,

    Yes the pathology I meant is the one related to the NSLBP.

    When I said about the skills, I meant that assessing/treating patients involves a lot of features related to the therapist, and we need to be skilled to get the picture from every patient. We’re dealing with people not with flesh and bones.
    I’m not saying I’m skilled, or the others aren’t, just trying to show the difficulty of putting all this in a research scenario.

    Hope that’s clarifying for your quest


  4. Subgrouping, as far as I know means targeting a specific treatment to a specific patient suffering a disorder (pain, disability,…).

    Subgrouping is not trying to put every patient in MY favourite category.

    Subgrouping is not to put a patient not matching MY favourite category into a “psychosocial” box.

    Treating means to apply a specific therapeutic input to a specific diagnosis.

    Each and every diagnosis comes after an assessment.

    The assessment must be appropriate to that problem. We cannot diagnose the back by assessing the heart, although a vascular problem can give you back pain.

    Assessment means taking a good history (HIS-STORY) to collect as much information as possible from the patient. Taking a history means the clinician should be skilled enough to get the information, and the information must be relevant to the patient’s complaint. The questions must be open, and not leading to MY favourite questions.

    The aim of a history taking is to try to get some relevant information:

    1- Red flags/contraindications to our approach
    2- Yellow flags/precautions
    3- State of the disorder
    4- Stage of healing
    5- Baselines
    6- Hypothesis/possibilities (differential diagnosis) – Examination strategy/ies

    After that we can carry out our examination. That means we have to tick off (rule out/in) our differential diagnosis list:

    1- Is the examination appropriate to the patient’s complaint?
    2- Do I need to test neurological?
    3- Do I have all my baselines?
    4- Apply my strategy
    5- Re-check my baselines
    6- Do I have a provisional diagnosis?
    7- Do I need to further test or do I have a definite answer
    8- Has the patient understood what we’re doing and got the right thing to do?
    9- Has he been compliant?
    10- Has the treatment affected his problem? Follow-up
    11- Can we confirm the provisional diagnosis?
    12- Do I need to deal with the yellow flags, or have they just disappeared as the patient improved?
    13- The patient doesn’t improve as expected. Is he doing what we asked him to do? No? Why? Education? Not able to do it/too demanding/difficult/other setbacks?
    Yes? Wrong diagnosis? Yellow flags that need to be addressed? Chronic pain status?

    Do we really believe that, just by matching clinical presentations/criteria, we are going to be able to subgroup patients into a very specific treatment?

    Do we really believe, we are going to be able to find just one problem bothering the patient?

    Should we be treating patients, pathologies, or should we instead shift to treat patients with a pathology?

    Is the pathology just a pathology, or is it also many other things around the patient? (Yellow flags)

    It will be a waste of time to keep trying to demonstrate specific problems/treatments. Only if we are able to treat patients with pathologies, we will be changing something.

    And now, how many of us are so good/skilled at asking questions and putting the answers into our clinical reasoning?

    How many of us are so good/skilled to perform a good examination?

    How many examinations are looking for the right thing?

    Researchers are trying to find something that is impossible to achieve from a research point of view.

    A lot of the results with our patients rely on the clinician’s skills. People with the same training can show very different skills.

    That’s why I think we have a lot of very different diagnoses, beliefs, slumpers, pony searchers and an ignored elephant in the room.

    How can we change this?

    I’ll buy 2 chocolate bars!!!


  5. I read this blog entry with interest last week because it reminded me of some research I did for a paper on classifying cervicogenic headache (CGH) a couple of years ago. For the reasons Neil summarizes above, I’ve never been satisfied with the currently popular methods of classifying pain problems, and I wanted to address those pitfalls in the paper I wrote describing an alternative classification scheme for patients with CGH.

    During my research I came across some work by Denison et al describing a tool they developed, the Pain Beliefs Questionnaire, which was designed to identify patients’ “risk profiles” based on several domains that have been linked to prolonged disability. (Denison E, Asenlöf P, Sandborgh M, Lindberg P. Musculoskeletal pain in primary health care: subgroups based on pain intensity, disability, self-efficacy, and fear-avoidance variables. J Pain. 2007 Jan;8(1):67-74.)

    It’s becoming increasingly clear to me that disability risk profiles are what we should be looking at to help guide our treatment interventions, certainly not physical characteristics, but also not even the vaunted predictors based on treatment response aka CPRs. It’s heresy to say, but I think the Treatment-Based Classification System initially described by Delitto et al should be scrapped altogether.

    Anyway, I wanted to bring attention to this work being done in Sweden by some other PT researchers using a classification approach to persistent pain problems that is much more consistent with current pain theory than what we’ve seen so much of in the PT/rehabilitation literature.

    Neil O'Connell Reply:

    Thanks John,

    I think you are right – if you wish to classify, it should be on important prognostic indicators based on sound evidence. That’s what STarT has done but I am not sure the findings suggest it achieves much added value. But it was definitely a question worth asking.

    Often subgrouping is used by people to defend their cherished treatment/belief system. I guarantee that at some stage, at some physio conference any day now, someone will stand up and use the STarT trial as evidence of the need for subgrouping, in an effort to defend a manual therapy approach, or a muscle imbalance approach etc. from the threat of negative trials, all the while ignoring the fact that STarT didn’t try to subgroup by “structure” or “movement dysfunction” or diagnosis, or the fact that the effects are really very small.

    Let me know if anyone hears it and I’ll buy you a chocolate bar!

  6. Any elephant in the room? It may not be all about searching ponies. It may be just looking for the big guy!

  7. Hi Neil,
    Very nice review, and it looks like a promising research methodology there – it has reopened the wounds of Research Methodology & Statistics modules from too long ago – but rekindled an interest nonetheless!

    I think the idea of matching the care to the individual is a marvelous idea (although not entirely novel) and I guess the question then becomes what care and does it matter?

    I’m often impressed how some LBP patients respond to quite simple measures (without debating mechanism) having failed trial courses of NSAIDs for weeks, as current best practice guidelines suggest. More often they self-refer to physiotherapy, suggesting disillusionment with the healthcare system.
    I don’t profile these cases other than to manage as normal but I guess they are in a group that need some facilitated recovery?

    Conversely, there are patients who don’t want to be discharged and insist on periodic monitoring. I have gone well past the ethical/moral debate about this management strategy and have concluded that this group like this type of arrangement and they feel better for it – even if I don’t feel I’m doing anything particularly special. I guess they are another type of group?

    What do you think of the practicalities for care delivery and patient selection in primary care settings? I’m still not comfortable with questionnaire profiling on initial contact but always vigilant for behavioral clues and subliminal signs that dictate the communication.


    Neil O'Connell Reply:

    Thanks David,

    In terms of the practicalities of delivering primary care I think a commissioner of care would have to look at the effect sizes from this method and take a view on cost-efficacy. The STarT team have done quite a lot in validating their tool, and I see it is now being validated in countries other than the UK. No-one likes the idea that a standardised form might do better than their expertise but these results suggest that it might, a little bit.

    My personal feeling is that while savings were made there isn’t a big change in the offing for the patients. That’s pretty important in terms of trying to keep costs down (and that’s probably never been as important as it is right now) but in terms of the bottom line: the probability of getting significantly better (and by that I mean clinically significantly better) the results are not so compelling (in my view).

  8. Hi Neil,

    As an Australian physiotherapist who works with people injured at work quite a lot I find this study really interesting.
    We use a modified OMPQ test to assess for the risk of long term pain changes in all clients if they are 4 weeks post injury and not pain free. The test result then gives us a guide to preferred future treatment options depending on the score.
    A high score leads to physio plus psychology, a medium score leads to just physiotherapy but with an eye on changes that may precipitate the need for psychological intervention and a low score means carry on as normal.
    We are also encouraged to decrease hands on treatment either prior to the 4 week point or at that time and start to look at more exercise and self managed treatments for all patients (to hopefully decrease dependence on “treatment” and stop people looking to be “cured” while learning to help themselves).
    The system doesn’t always work out the way we would expect but it does tend to pick up those more difficult cases earlier and tends to lead to better results long term.
    In the case of injured workers we have found that a major contributor to long term risk of chronic pain is job satisfaction. Someone who hates their job and does not want to go back to work has a higher risk of not getting back to work than someone who loves their job and wants to get back ASAP.


    Michael Shilson-Josling

    Neil O'Connell Reply:

    Hi Michael, thanks for your comments,

    Here you have one of those nice examples where a big trial actually applies (to an extent – your population is likely a bit different) to your practice. What the data tells us is that your approach probably does increase the chance of a better outcome, a little bit.

    What is disappointing about the trial results though is the size of the difference. A no-treatment group would have been nice for us to get an idea of how much any treatment really affected outcome but in a trial of this size one can’t have everything!

  9. Hi Neil,

    Great post! A question for you:

    These cortical changes that appear to occur in CLBP might suggest that chronic pain is a disease unto itself (a common mechanism that might be a view shared by grumpers). However, are these changes a consequence of persistent nociceptive barrage or a potential mechanism that ‘causes’ chronic pain? The latter would suggest a ‘disease’ and that addressing these cortical changes would lead to resolving or at least better attenuating pain. However, if the former were the case then the mechanism would still be unknown and the cortical changes would simply represent indicators of CLBP. I gather much of the changed brain research is cross-sectional so suggestions of causation are difficult to make, but I’m interested to read your opinion on whether these cortical changes precipitate the transition to chronic pain or if they are simply indicators. Alternately, since physios often treat impairments instead of causes, does the distinction matter?


    Neil O'Connell Reply:

    Hi Geoff,

    GREAT question – really. You have put your finger on the crux of the problem. All of these central changes – what do they really mean? The data is exclusively cross-sectional and for much of it we don’t actually have a clear handle on what the observed changes represent, structurally or functionally. So we might be looking at causal factors, effects of chronicity or epiphenomena. Given that chronic pain is essentially an experience and the brain is the seat of experience it would be odd if that experience were not reflected in measures of brain function, whatever the driving cause.

    The model we presented in 2008 was (as we admit) speculative. I think given the failure to date of a spinal structural model (which itself has many possible interpretations) it is worth testing this kind of model. But we shouldn’t begin to assume that it is correct at this time. We simply don’t know enough.

    Geoff Reply:

    Hi Neil,

    Thanks for the reply. I applaud your group for speculating, theorizing and working toward models that can be tested. I feel we are often quick to the RCT without developing enough theory to guide results interpretation. I enjoyed that 2008 paper and wish there was more room for discussion/debate papers in peer-review journals.