Thursday, May 2, 2013

AidData: Why it is not Wikipedia

The debate about AidData's "crowd-sourcing" methodology is heating up (see comments on my post from April 30). A recent note on Development Gateway in support of AidData compared AidData's methodology to Wikipedia, saying that this kind of crowd-sourcing of information can be productive.

I'm actually a big fan of Wikipedia. It is open, transparent, and often very clear and helpful. But I am not using it as a database for cross-country regressions. Many Wikipedia articles are "stubs" or have notes that controversy exists. We can take that into consideration when we read the article. That's also possible if someone is "reading" AidData's dataset. But let's face it, the purpose of having this data is to test relationships. I would be very surprised if users are going to "read" the data and pick and choose which numbers to include.

As Philippa Brant pointed out at the Lowy Institute, nearly half of the cases in the dataset -- 47 percent -- have only one source. That's almost 800 "projects" (1700 * .47). That level of uncertainty is not good enough for publication. AidData should have done more cleaning themselves before publishing this data.

AidData has responded to my critique in part by pointing out that they realize that many deals are not confirmed. That's not how it looks to me. The table we are discussing has a column on status. Of the nearly 1700 projects, 687 have a status of "Pipeline: Commitment" or "Pipeline: Pledge". None were coded "Pipeline: Vague" or "Suspended". As many others have noted, an African ministry official's press conference about a popular public works project does not mean a commitment or pledge has occurred. Those of us who work on China and Africa have seen this multiple, multiple times. (Note: it is very rarely a Chinese announcement of a project commitment: when the Chinese government announces that it is financing a deal, the project is much more likely, although not certain, to go ahead).

Finally, although AidData authors stress in their comments that their results are tentative and that they can't vouch for much of the data, they didn't write like that when discussing the numbers. We read about the "top recipients" of Chinese finance as though each of the numbers aggregated on their list is equally firm. That's simply not the case, as their own database points out.


  1. Dr. Brautigam, don't you also use media reports to verify your own data? You've said that you compile your data by seeing the projects for yourself, but seeing a budget or a check for a cash transfer or debt cancellation isn't inherently more reliable than seeing it in a newspaper. I'd hope that newspapers have editors that fact check their raw data, whereas I am not certain that you do.

  2. The ~1700 official finance projects that you reference exclude all known project cancellations/suspensions. Have a look at the static excel spreadsheet file at

    It looks like AidData separated the suspended and cancelled projects from the ~1700 official finance projects. I found the suspended and cancelled projects in a different tab within the static spreadsheet.

  3. Thanks @Anonymous for the clarification on the suspended/cancelled projects.
    @Anonymous, one can use media reports (among other sources) to verify, but it takes some knowledge to sort out credible reports from others. And for some projects (or alleged projects), it takes a lot of digging and interpretation to put the story together. I refer you to the article Sigrid Ekman and I wrote on Chinese agricultural engagement in Mozambique, published in African Affairs last January.

  4. My quick conclusion:

    As always in history, there are reasons why certain information is not released.

    China can easily ensure transparency but does not want this ...

    On the other hand, the creators of AidData beg for everyone's cooperation,
    including from specialists like you and, by extension, many others who have first-hand information on one or more projects.
    The possibility to make additions or corrections is explicitly available.
    What are you waiting for?

    I myself use datasets only to disentangle trends, sometimes over decades...

    “China” was till recently a small economic player in Africa and is still small in the area of "land grabs".
    But to think that there is a linear increase is a mistake, it comes with ups and downs depending, in the past, on the changing views in Beijing and nowadays it depends more on the state of the world economy...

    I also follow the Chinese news, company announcements included, and with these contracts "China" is usually in the driver's seat. Hence, I foresee a great increase in such contracts. Because many large and medium sized Chinese companies are confronted with overproduction and beg the Chinese government for foreign contracts …

    The ball is in your court: continue to argue or collaborate ....
    Of course you prefer to have it in more academic language, hence again Philippa Brant in her conclusion, as I could read it in Lowy yesterday evening:
    Despite my reservations, I think this database can be a good resource if used wisely. Now that it is published, I hope other researchers will take up the authors' call to arms to improve and clean the data. I only wish they had waited for more accurate data before publishing. Like the ill-fated Congressional Research Service data before it, the CGD/AidData US$75 billion Chinese 'aid' figure will unfortunately be circulated for years to come.

  5. The initial April jobs report for the U.S. came out today. These first reports are often revised substantially in the months after their release. The press, however, nearly always treats them as gospel. To suggest that the BLS should not release these initial estimates because the press might abuse them or overstate their accuracy is a bit laughable.

    The right solution is to educate reporters about the nature of these initial reports. The wrong solution is to ask the BLS to release less data. Mutatis mutandis with respect to AidData's China data.

  6. BLS isn't using crowd sourcing to revise their estimates.

  7. This comment has been removed by the author.

  8. no, it's not crowd-sourcing, it's a survey, with a sampling methodology, a standardised questionnaire. Surveys are not crowd-sourcing. Check your research methods undergrad textbook.

  9. The jobs data is not based on media reports of the jobs data. The issue with AidData would be as if the jobs data were an aggregate of media reports on the jobs data. And updates would be updates of media reports on the jobs data.

  10. There are limits to being Anonymous,
    It should never be Synonymous,
    With using it as a perk,
    To write and act like a jerk,
    Unless, as any undergrad literary text will tell you, it's Eponymous (he, he)

  11. From someone who works with real aid "data" in another developing country, the attempt by AidData to use their large grant from USAID to put manpower on their effort of combing through media reports on Chinese aid and then marketing their work as "data" is ludicrous! It points to a total lack of understanding by the management of AidData on how foreign assistance really works. Do any of the leaders of AidData have any significant field experience working in developing countries and on foreign aid? Governments often have their own reasons for putting out "stories" in the media on what they are giving to whom. But such a single "story" does not a data point make (as well put by the Lowy Institute). Moreover, for the academics who manage AidData to market their work as "data" calls into question the broader work done by AidData. But the genie is now out of the bottle. The largely fictitious $75 billion figure is being quoted by media sources and even such well-known figures as the former head of ODI at a recent AusAid conference in Australia. What a pity that AidData has conducted such work and marketed it as "data." I hope more academics and people who actually work on the ground on aid issues help clarify the many misconceptions that AidData has now put out.

  12. Let's not call the AidData estimates of Chinese aid "data". They are guesstimates at best, and very poor ones at that!