Sunday, October 15, 2006

Flawed methodology, flawed results. Worst of the Lancet.

I feel quite confident that anyone questioning the number of violent deaths reported in the recent Lancet article Mortality after the 2003 invasion of Iraq: a cross-sectional cluster sample survey will be dismissed by some (those for whom a bigger number is better, perhaps?) as merely partisan. Pointing out to these people that Les Roberts, who "instigated the study and assisted with the analysis and interpretation of the data and the writing of the manuscript", is blatantly partisan himself - he publicly says that he is against the war, and campaigned as a Democrat for a seat in the House of Representatives before pulling out in May of this year - is unlikely to get past that.

Pointing out that 600,000 violent deaths is absurd on its face, being roughly twice the combined death tolls at Hiroshima, Nagasaki, and Tokyo, is unlikely to get you anywhere. Noting that attributing any increase in deaths to the invasion of Iraq, as opposed to the actions of fanatics and insurgents, is suspect is unlikely to be well-received. In fact, you might as well resign yourself to the fact that you're talking to a true believer in the boundless evil of the American military, and accept that you will never convince them that the true death toll is anything less than the highest number ever propounded by "authority", or that blame for the deaths that did occur accrues anywhere other than to their preferred targets.

That said, let's take a closer look at the methodology used and the results obtained by this survey, and see if we can't come to some conclusions regarding the validity of each.

First, let's look at the methodology used. This was a face-to-face survey, which, of course, required that the researchers identify the people they would actually visit. They took a list of the 18 governorates within Iraq, and allocated 50 "clusters" across these areas on the basis of population - the governorate with the largest population (Baghdad) was allocated 12 clusters, less populous ones got proportionately fewer (the two smallest were apparently not sampled at all due to "miscommunications"). This accomplishes geographic and administrative-unit stratification, and is accepted practice.
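
For the curious, here is a rough sketch (in Python) of how such a proportional allocation can be computed. The governorate names and population figures below are made-up placeholders, not the study's actual data - the point is just the mechanics.

```python
# A minimal sketch of proportional cluster allocation using the
# largest-remainder method. Governorate populations here are
# hypothetical placeholders, not the study's actual figures.
populations = {"Baghdad": 6_500_000, "Ninewa": 2_800_000, "Basrah": 1_900_000}
total_clusters = 50

total_pop = sum(populations.values())
shares = {g: total_clusters * p / total_pop for g, p in populations.items()}

# Give each governorate the integer part of its share, then hand any
# leftover clusters to the largest fractional remainders.
alloc = {g: int(s) for g, s in shares.items()}
leftover = total_clusters - sum(alloc.values())
for g in sorted(shares, key=lambda k: shares[k] - alloc[k], reverse=True)[:leftover]:
    alloc[g] += 1

print(alloc)  # {'Baghdad': 29, 'Ninewa': 13, 'Basrah': 8}
```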

Within each governorate there are smaller administrative units. Dividing the size of each smaller unit by the total population of the governorate gives the proportion of the population of that governorate in each unit. Units were selected randomly, but with more populous units having a greater probability of being selected. This causes heavily populated areas to be sampled more often than rural ones; it's a trade-off between representativeness of the sample across regions of different population density and representativeness for the population as a whole. They could have stratified again based on population density, which would have given them more insight into death rates in the less densely populated areas, but at the cost of less accurate estimates for the densely populated ones. They chose door "B". That's fine - as long as they accounted for that in their statistics. As far as I can see, though, they did not. In fact, the words "urban" and "rural" don't occur at all, except in the title of a referenced article (Mortality and use of health services surveys in rural Zaire) which is cited in support of the idea that they have actually undercounted deaths.
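
For reference, this unit-selection step is what samplers call probability-proportional-to-size (PPS) selection. A minimal sketch, again with made-up unit populations:

```python
import random

# Probability-proportional-to-size (PPS) selection of one smaller
# administrative unit within a governorate. Unit populations are
# hypothetical placeholders.
units = {"Unit A": 120_000, "Unit B": 45_000, "Unit C": 15_000}

# random.choices weights the draw by population, so Unit A
# (120k of 180k people) is selected two-thirds of the time.
selected = random.choices(list(units), weights=list(units.values()), k=1)[0]
print(selected)
```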

Next, they randomly chose a "main street" within each selected administrative unit from a list of all main streets. It isn't clear what the definition of a "main street" is, but one criterion appears to be that it had to be crossed by residential streets. What impact this had on selection isn't entirely clear, but certainly it rules out any areas in which there are no crossing residential streets. By doing so it further biases selection towards urban areas, but to a degree dependent on the relative proportion of crossing side streets to main streets in an area. Small towns and rural areas might completely lack main streets by this definition, for all I know.

The next step in the process was to randomly choose a crossing residential street from among all those crossing the selected main street. Once more, it isn't clear exactly what impact this has, but there appears to be another implicit condition - that the crossing residential street have at least 40 residences on it. Because smaller towns and rural areas are less likely to have any qualifying crossing streets by this criterion, this step, too, appears to bias selection toward heavily populated areas.

Because between 30 and 50 percent of Iraqis live outside of urban areas, this cumulative bias could have a significant impact on the results of the survey if mortality rates differed significantly as a function of the urban/rural divide.

The final step was to number the residences on the selected residential street and randomly select a starting home, then proceed to the next available house until 40 homes within each "cluster" had been sampled. All studies make compromises between collecting the best possible data and reality, but this one is a heck of a compromise. If you are looking at deaths due to car bombs, for example, people living near one another have a much higher chance of having been similarly impacted - half the people in the neighborhood could have lost a family member to a single bomb. Select one unlucky cluster, and you might wind up with an estimate that is far too high. Miss the unlucky clusters altogether, and you'll wind up with an estimate that is too low. But that's just adding error. As long as you account for this appropriately when you do your statistics, expanding the reported confidence interval, you're fine.
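
For what it's worth, the standard fix is to compute a cluster-robust standard error - equivalently, a design effect - rather than treating every household as independent. Here's a toy illustration with hypothetical per-cluster death counts, including one "unlucky" cluster:

```python
import statistics

# Toy design-effect calculation. Per-cluster death counts are
# hypothetical: most clusters saw little, one "unlucky" cluster a lot.
cluster_deaths = [0, 1, 1, 2, 0, 1, 14, 0, 2, 1]
households = 40                       # households sampled per cluster
n = len(cluster_deaths) * households  # total households

rate = sum(cluster_deaths) / n        # deaths per household

# Naive SE pretends every household is an independent observation
# (treating the count as roughly Poisson):
se_naive = (rate / n) ** 0.5

# Cluster-robust SE treats each cluster's rate as one observation:
cluster_rates = [d / households for d in cluster_deaths]
se_cluster = statistics.stdev(cluster_rates) / len(cluster_deaths) ** 0.5

print(f"design effect ~ {(se_cluster / se_naive) ** 2:.1f}")  # ~8 here
```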

Note here that the researchers don't say whether they have done this. They never talk about deaths within a cluster being correlated, except across their "14-month" (really, one 14-month and two 13-month) time periods used to look at change in mortality rates over time:
The SE for mortality rates were calculated with robust variance estimation that took into account the correlation between rates of death within the same cluster over time.[14]
But that's a different issue. We're talking about accounting for correlated deaths that occurred at the same time. Did they account for that?

Now here's where we get to the most serious problem with the study. Two survey teams went house to house, collecting data, under the supervision of "field manager" (and co-author) Riyadh Lafta. The authors report that Lafta made "decisions on sampling sites" - which means that the choice of homes for the survey wasn't as random as it would have been under ideal circumstances. It isn't clear how broad this power was, but it at least extends to the "responsibility and authority to change to an alternate location if they perceived the level of insecurity or risk to be unacceptable". Certainly there should be no expectation that the interviewers would expose themselves to gunfire, but the vagueness of this provision causes concern. No general explanation of how alternative sites were chosen is given; only a single example appears:
In Wassit, insecurity caused the team to choose the next nearest population area, in accordance with the study protocol.
but "in accordance with" may mean simply that choosing the next nearest population area was allowable under the protocol, not that it was mandated by it, and it doesn't tell us how the residences within that area were chosen at all.

How often was Lafta forced to use this discretion? We don't know, because the authors don't say. They should, because it is a crucial factor in verifying the validity of the study. If the interview teams stuck to surveying randomly selected households in almost all cases, then the sample of households, at least, was nearly as random as the methodology for selecting them allowed. On the other hand, if the interview teams were forced to find new areas to survey with any degree of regularity, it is possible that they selected areas (intentionally or unintentionally) that looked like promising candidates for previous violence.

We don't know how often this discretion was used, but there is a troubling reference to "extreme insecurity during this survey" - does this mean that Lafta was forced to use discretion in choosing survey areas with great frequency? If so, this raises grave doubts about the survey.

A few other relevant notes on the methodology used. Presumably, the interview teams were previously known to and selected by Riyadh Lafta. No attempt to separate combatant deaths from non-combatant deaths was made (out of fear, apparently). Probing to establish the details of deaths (e.g. who was responsible) took place at the end of the interview. Death certificates were requested and provided in 92% of cases.

1,849 households completed the survey, accounting for 12,801 people. Less than 1% of households refused to participate, a surprisingly low number. Whether this is due to cultural norms or some other factor, we don't know.

All right, at this point, what do we know? The sample was almost certainly skewed heavily toward urban areas, though no breakdown is provided along these lines. The "field manager", Riyadh Lafta, had discretion to deviate from the randomly designated sample at any time; we have no idea how often he did so, though there is a hint that he did it frequently. Moreover, we know that the integrity of the survey depended entirely on the integrity of this man and his interview teams. It also depended entirely on the integrity of those who compiled, cleaned, and analyzed the data. Their impartiality is not beyond question. For all we know, Lafta is a Ba'ath Party member.

And what did this survey produce? We know it produced an extremely (unreasonably, in my opinion) large estimate of deaths due to violence since March, 2003. It also shows the number of deaths accelerating - each successive 13-14 month period is more violent than the last.

But most of all, it produced some really crazy numbers. Like this:

From June of 2005 to June of 2006, the survey recorded 30 deaths due to car bombs among the 12,801 people covered. Simply scaling that up to Iraq's population of slightly over 27 million equates to about 64,000 deaths from car bombs nationwide during this period.
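
The scaling is simple arithmetic (using the figures above; the study's own estimator weights by person-time, so treat this as an approximation):

```python
# Back-of-envelope scaling of the sample's car-bomb deaths to all of
# Iraq (figures from the text above).
sample_deaths = 30        # car-bomb deaths recorded, June 2005 - June 2006
sample_people = 12_801    # people in the surveyed households
population = 27_000_000   # approximate population of Iraq

print(round(sample_deaths * population / sample_people))  # 63,276, i.e. ~64,000
```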

Now, from Iraq Body Count, we know that the average number of people killed in a car bombing (assuming that anyone was killed at all, which is not always the case, and using their "reported maximum killed") is about 5.38 - we can estimate the 95% confidence interval using bootstrapping techniques at 4.57 to 6.27. So we know we need to have about 11,800 deadly car bombs nationwide to get to 64,000 deaths. At least 11,800 total car bombs, because not every car bomb can be assumed to kill people, and some would be targeted against non-Iraqis, and so would not count.
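
For those who want to check the bootstrap, here's a minimal version. The per-incident death counts below are made-up stand-ins for IBC's "reported maximum killed" records, chosen only so that they average near the 5.38 quoted above:

```python
import random
import statistics

# Percentile bootstrap for the mean deaths per deadly car bomb. The
# per-incident counts are hypothetical stand-ins for IBC's records.
deaths_per_bomb = [1, 2, 2, 3, 3, 4, 5, 5, 6, 8, 10, 12, 1, 2, 7, 9, 4, 3, 6, 15]

def bootstrap_mean_ci(data, n_resamples=10_000, alpha=0.05):
    # Resample with replacement, take the mean of each resample, and
    # read the CI off the middle (1 - alpha) of the sorted means.
    means = sorted(
        statistics.mean(random.choices(data, k=len(data)))
        for _ in range(n_resamples)
    )
    return means[int(n_resamples * alpha / 2)], means[int(n_resamples * (1 - alpha / 2))]

print(statistics.mean(deaths_per_bomb))    # 5.4 with these stand-in data
print(bootstrap_mean_ci(deaths_per_bomb))  # e.g. roughly (3.9, 7.1); varies run to run

# Deadly car bombs implied by the survey's national estimate:
print(64_000 / 5.38)                       # ~11,897 - the "about 11,800" above
```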

Were there that many car bombs during that time frame? Well, no. Not even close.
In the first four months of this year, insurgents detonated 284 car bombs against civilian, police or military targets, according to U.S. military statistics.
That's 71 per month, which gets you 923 over the 13-month period. So we're off by a factor of at least 12 to 13.
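
The arithmetic, spelled out:

```python
# Extrapolating the military's four-month car-bomb count to the
# survey's thirteen-month window (figures from the quoted report).
bombs_per_month = 284 / 4                 # 71 per month
bombs_13_months = bombs_per_month * 13    # 923

implied_deadly_bombs = 64_000 / 5.38      # ~11,900 implied by the survey
print(implied_deadly_bombs / bombs_13_months)  # ~12.9 - off by a factor of ~13
```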

Now, it's not like these are kidnappings, or even shootings, which might go unnoticed or unreported. These are cars blowing up in populated areas. To even suggest that 92% of them go unnoticed is frankly ridiculous. So we have one data point that suggests that the Lancet study is flat wrong.

Of course, it could be that they just happened to select a cluster in which a huge car bomb went off - that was a possibility I pointed to earlier. The researchers should have noted that episodes of violence within clusters were strongly associated, and expanded their confidence interval accordingly if that's what happened, but it's a possible explanation that requires no more than incompetence on their parts - as opposed to mendacity.

So what about air strikes? The researchers report that 78,000 people have been killed as a result of air strikes alone. Once your eyes stop rolling around your head, note two things. First, airstrikes occurring after the hottest portion of the war (after March, 2003) killed about 18.39 (9.7-26.17) people each, provided that they killed anyone at all. Second, by my count, and using a somewhat liberal definition of what constitutes an "air strike", Iraq Body Count records 137 of them since March 20, 2003. But to account for the number killed (according to these researchers), we need more than 4,000 of them. Now sure, the argument can be made that Iraq Body Count misses some deadly air strikes. But more than 95% of them? This doesn't just strain credibility, it forcibly rips its limbs from their sockets.
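
Again, spelled out:

```python
# How many deadly air strikes would 78,000 deaths require, and what
# share of them would IBC have had to miss? (Figures from the text.)
strikes_needed = 78_000 / 18.39     # ~4,241 deadly strikes
share_missed = 1 - 137 / strikes_needed
print(round(strikes_needed), share_missed)  # ~4,241 and ~0.97 (97% missed)
```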

So the Lancet study is deeply flawed, overcounting violent deaths from air strikes and car bombs by a factor of maybe 15-20. But the largest portion of deaths comes from gunshots. 56% of all violent deaths - about 350,000 people - are estimated to have died from gunfire. IBC counts 2,330 incidents of death in which "gunfire" or "gunshots" are listed as the first cause - killing an average of 2.96 people each. That's 6,901 deaths from gunshots.
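
And once more for gunfire:

```python
# Gunshot deaths implied by IBC's records versus the survey's estimate
# (figures from the text).
ibc_gunshot_deaths = 2_330 * 2.96            # ~6,897 (the text's 6,901)
share_unrecorded = 1 - ibc_gunshot_deaths / 350_000
print(share_unrecorded)                      # ~0.98 - the "98.1%" below
```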

Now, it's true that, compared to airstrikes or car bombs, deaths by gunshot are relatively likely to go unreported (though, as the researchers tell us, they were shown death certificates for 92% of deaths, so the death was presumably reported to someone). But 98.1% of them? Does anyone really believe that?

The authors go to great pains to point out that population-based methods like theirs routinely count more deaths than passive surveillance does:
Data from passive surveillance are rarely complete, even in stable circumstances, and are even less complete during conflict, when access is restricted and fatal events could be intentionally hidden. Aside from Bosnia,[21] we can find no conflict situation where passive surveillance recorded more than 20% of the deaths measured by population-based methods. In several outbreaks, disease and death recorded by facility-based methods underestimated events by a factor of ten or more when compared with population based estimates.[11,22–25] Between 1960 and 1990, newspaper accounts of political deaths in Guatemala correctly reported over 50% of deaths in years of low violence but less than 5% in years of highest violence.[26]
But Iraq is not Guatemala, or even Bosnia. Car bombs and airstrikes are not "intentionally hidden", nor are they the result of pandemic disease.

No, undercounting by passive measures is unlikely to account for these discrepancies - yes, they undercount, but no, not by a factor of 15:1, not for car bombs and airstrikes, and not by 50:1 for gunshots. The most reasonable way to account for the extreme discrepancies between the numbers reported in the Lancet and those arrived at by common sense and alternative calculations is this - the Lancet researchers' results were skewed by the choice of "alternative" sites by Riyadh Lafta, by methodological bias toward urban areas that were heavily impacted by violence, or both.

You may disagree. Maybe there is something missing from my thinking that makes these events far more likely to go unreported in Iraq than in Bosnia. But there is a relatively simple way to determine who is correct. The Lancet researchers report that subjects were able to produce death certificates for 92% of deaths. That means there is no need for population-based surveys. Instead, we can simply ask those issuing death certificates for a simple count.

Care to fund that study, MIT?

17 comments:

Anonymous said...

Morgan:

Thank you for taking the time to explain all that. I am saving this for the next time I have an argument with some moron who really believes this survey.

Luther said...

Add my thanks as well Morgan.

There is no doubt in my mind that these folks set out with an agenda: to manipulate a study and its numbers so as to sway world opinion, and more particularly the opinions of voters in the upcoming US election.

They succeeded in that goal. My local fish-wrap published (under an AP byline, of course) the 'news' of the study, with no independent analysis or contrary viewpoints included, and that has been it. No follow-up stories since on the controversy surrounding the methods or the numbers derived from them.

Science is being cheapened and damaged by work such as this. All involved should be looking for other jobs as far as I'm concerned.

Rick Ballard said...

Bravo, Morgan. I sent this over to Thom at AT and mentioned it in comments at JOM. It's really excellent, a great analysis of peer-reviewed GIGO.

Syl said...

Yes, I thank you too. The main impact is in the coherence you've achieved.

And I think the same argument you made for the correlation of deaths from car bombs could be made for airstrikes too. Certain neighborhoods of Iraq held more militants than other areas. Just as a car bomb that killed several children near a school raises the possibility that many lived on the same block, militants holding a session in a safe house that is then demolished in an airstrike raises the possibility that many of the individuals came from families living in the same 40-house row.

What has bothered me so much about this study is the fact, as you noted, that the researchers were able to see death certificates for such a high percentage of the deaths. Yet the UN-reported figures, which used the passive method of counting death certificates via morgues and hospitals, came in with an overall death rate of 97/day - about 20% of the rate reported by this study.

Rick Ballard said...

Wrt death certificates, another argument that I've seen is that the Iraqis were able to conduct a successful election - if they can get ballots from all of Iraq, why in the world are we supposed to think that death certificates aren't being collated and the totals forwarded?

I was very disappointed in the response that ITM provided to Pajamas. While it may have been a satisfactory emotional vent, it was as hollow as their whole concept is proving to be.

truepeers said...

Good work. Frankly, I find it most believable that conscious fraud, bending of rules, was involved. The left believes that the ends justify the means and they believe in no truth higher than a will to power in the name of victims.

But I'll feign sincerity and ask a methodological question anyway. What exactly was the question asked - did they make clear that they were inquiring only into deaths of people living in a specific residence, or did they allow the reporting of any death in one's extended family? And how long did they allow for someone reporting a death to produce a death certificate? And how did they explain the purpose of their questioning to their interviewees - did they possibly involve them in politicizing the project?

Luther said...

Truepeers

Do you mean something like this:

Cox and Forkum

Rick Ballard said...

Sticking with the epidemiological criticism, I would note two other errors of design.

The first concerns "best available data". A study which purports to present a 'scientific' proof of a hypothesis on degree of magnitude regarding issues of mortality, yet makes absolutely no reference to any cross-checks with local health officials concerning available data (someone issued those death certificates), is laughably flawed.

Health officials in Baghdad should have been questioned concerning their level of confidence in the data for which they are responsible and a review of that data should have been included as a control.

The second issue regards an analysis of the availability of other health data in Iraq. The Lancet survey ignores the fact that there are other Iraqi governmental bodies dealing with claims pertaining to death and injury resulting from violent causes.

There is a reason that they are ignoring the existence of data which would support or contradict their assertions and it has nothing whatsoever to do with science.

Anonymous said...

It is the glogal warming syndrome. Listen to us, we are the experts.

But I don't think most people really believe this study. Why? Well, because they can turn on the TV and know it was a bad day in Iraq because 63 people died. To believe this study, that would be a good day. It is as if they are telling us not to believe our lying eyes.

I just think the knee-jerk reaction is that the number is ridiculous.

Anonymous said...

global warming with a b, not a g. I should preview but I am too tired.

truepeers said...

Luther, nice cartoon. I used it in my link to this post at IBA.

Morgan said...

Hello all. Sorry for being off line all day. Thanks for the kind comments. Rick, thanks for the flog over at AT.

Syl, you're absolutely correct that the same could be said for airstrikes - gunshots, too, though to a lesser extent. And I didn't touch on the "other explosions" (or whatever they called it) category. Same there as well.

PeterUK: Richard Horton can be as socialist or any other "ist" as he wants, but if that crosses over into the corruption of a respected scientific/medical journal into a political one, he's clearly waived his right to be treated with respect.

Rick: I agree with the observation that if the Iraqis can hold (several) successful elections and run hundreds (if not thousands) of newspapers, they can count deaths rather successfully. That the researchers confirmed this is the only reasonable thing to come out of their survey.

Truepeers: There were a number of questions asked, apparently, but reported deaths were supposed to be of those within the household, with the further proviso that the deceased had to have "lived in the household continuously for at least three months prior to death". The interviewees apparently had mere minutes to produce a death certificate; they were asked to produce the certificates "at the conclusion" of surveys during which deaths were reported.

(A related tidbit that got by me on the first pass: "Where differences between the household account and the cause mentioned on the certificate existed, further discussions were sometimes needed to establish the primary cause of death.")

Since the two survey teams were typically able to complete a cluster of 40 houses in a single day, that's 20 for each team, or about 30 minutes for each household, including time spent moving from one household to the next (and trying to raise a response from empty ones).

Beyond that, we don't know exactly what the interviewers said. As far as we know, there was no "script".

Fresh Air: They report being shown 501 death certificates, presumed valid. The other 600,000-plus deaths are presumed to have occurred.

Rick, again: Yes, there is a reason that they ignore other sources of data, and also that their results tend to confirm the reliability of those other sources.

PeterUK, again: Interesting numbers.

Syl said...

Morgan

Rick: I agree with the observation that if the Iraqis can hold (several) successful elections and run hundreds (if not thousands) of newspapers, they can count deaths rather successfully. That the researchers confirmed this is the only reasonable thing to come out of their survey.

Just a general comment regarding those who tout this survey as the true word about Iraqi deaths: the people who do so surely don't think much of the Iraqi people, do they?

One commenter at JOM gave as a hypothetical an airstrike that killed several Iraqis who were buried in the rubble. Well, they'd just leave the bodies there and forget about them. No funerals, no reporting someone dead or missing, they'd just walk away. Therefore those deaths obviously prove this study is correct.

Which shows this person believes Iraqis are uncaring, stupid, and totally incompetent.

Words fail me.

Ron said...

1) Your criticisms have themselves been strongly critiqued by statistical scientists, most of whom have supported the study's methodology as being standard for disaster and war scenarios where data collection is extremely difficult (see, for example, Rebecca Goldin's excellent article, "The Science of Counting the Dead", on Stats.org).

2) The paper went through a rigorous peer review process by one of the UK's most prestigious academic journals, as defended by Lancet editor Richard Horton (his comments may be found in the same issue as the study).

3) The study is properly scientific in Karl Popper's sense, i.e., it makes risky and falsifiable predictions which others can now test by engaging in actual field research. Conservative think-tanks should, one would think, be eager to sponsor scientists to replicate the study and see what they find for themselves, if they are seriously concerned with establishing the truth - although no such attempts were made after the first study and it is doubtful they will be made after the second.

4) Whatever the political views of the authors (one of the more spurious lines of criticism is that they hold "anti-war" views - that classic logical fallacy, the ad hominem attack), it is certain that their methods and findings would not be so vigorously contested were they simply estimating deaths from, say, the 2004 tsunami in Southeast Asia or the ongoing civil war in the Congo.

Morgan said...

Ron,

With regard to 1: my critique stands unanswered. There are at least two glaring flaws in the study - a near certainty of a strong bias toward urban areas and an obvious possibility of bias due to nonrandom selection by the interview teams. Whether these are inherent flaws in the "standard methodology" or specific to this study is irrelevant.

With regard to 2: your reference to peer review is a simple appeal to authority. If the flaws remain unnoted after peer review, they are flaws nonetheless. That I see them and the reviewers did not speaks poorly of their review, not my critique.

With regard to 3: certainly, it is falsifiable. In fact, numerous other studies have been done with results that are incompatible with the results of this one, and I have presented analysis showing that the numbers produced by this study are extremely unlikely to be correct. In that sense this study has already been falsified.

With regard to 4: of course there would be less vigorous criticism of the methodology if the study pertained to deaths due to a tsunami, and you seem to understand the reason - no one would be motivated to introduce bias into the results of the "tsunami" study, but it is plausible that these researchers were motivated to do so in this study. It is not simple ad hominem to point to their political leanings (as you suggest). Rather, it provides valuable context - their political leanings go to motivation, the methodological flaws to opportunity.

The simple facts are these:

1) The results of this study are incompatible with other studies and a rational analysis of available data.

2) There are methodological flaws in this study that provide a possible explanation for the discrepancy.

3) The researchers hold views that may have provided motivation to exploit these flaws to inflate the death toll.

I welcome further comments.

Ron said...

Not being a statistician I'm afraid I must rely to some extent on the authority of others to interpret the study and explain its meaning. It is clear from your readers' comments that many of them are doing the same, only placing greater confidence in your authority than in the authority of the many scientists who find the Hopkins team's research credible and methodologically serious. Relying on the authority of others need not, however, be blind reliance. I refer to the Lancet's defense of its peer review process and to Goldin's article because they seem to me, as a non-expert, to respond to your critique with logical counterarguments. Your readers can judge for themselves by exploring various pieces at www.Stats.org.

I also don't believe that your assertion that the study has already been falsified is correct. To critique a study as you have done, while often helpful, is not to falsify it. There have been other reports that have reached different conclusions (such as the IBC's reports and the UNDP's survey after the first year of the war), but these have all used very different and, it seems, also seriously flawed methodologies. The UNDP's study, for example, made no effort to identify increased mortality in Iraq resulting from increased disease, crime, accidents, and other indirect factors.

There have in fact been no other attempts, to my knowledge, to identify total deaths in Iraq using the kinds of epidemiological methods used by the Hopkins team. To falsify the study, then, one would need to identify possible sources of bias/error, correct these problems, and repeat the study to see if the results are in fact sustained. It is revealing to me that, for all the criticism of the study, no one I have read has ever suggested that this be done. It suggests to me that, whatever the problems/biases of the study, which may be real, there are equally powerful political biases/assumptions at work behind efforts to discredit its findings. A truly objective analysis of the study will surely make every effort to also bring both kinds of bias to light. Honest scholarship will also make clear in unequivocal language that by even the most conservative estimates the war in Iraq is a human tragedy on an order of magnitude Americans have hardly begun to grasp.

EpiInfo said...

As a statistician and also an epidemiologist, I think that most of the arguments put forward against the Lancet studies are weak. The most important elements in ascertaining validity within these types of studies are:
1. Representativeness: Did the study address a broad population representing the underlying population affected by conflict?
2. Was random sampling used? Random sampling is used to reduce investigator-driven bias.
3. What is the accuracy of the reported deaths? Did the investigators use means to confirm that deaths actually happened?
4. Did investigators revisit houses to ensure responsiveness?
In this case, the Lancet studies, the authors did all but point 4. However, seeing as most deaths were confirmed with death certificates, point 4 doesn't matter that much.
Your points about main street bias are weak. There is no evidence, beyond your theoretical argument, to indicate that main streets receive more violence. Further, the authors did use random sampling, which is imperative. The issue of sample size has less to do with the number of houses in a cluster than with the number of clusters altogether. There are weaknesses to this study, and certainly the previous one had extremely wide confidence intervals, but even if one takes the lower bound of the 2006 study's confidence interval, the number dead is catastrophic.
It seems to me that it is not the Lancet authors who are manipulating the findings. They have been decent enough to lay their methods out in the open for criticism.
Remember, these are precisely the methods used in almost all conflict settings.