
The current data situation on the Web. Not pictured: landing net R. Image from

This is a guest post by Simon Munzert, PhD student at the University of Konstanz, who is currently visiting the Lab.

It’s not that the people here at Duke’s Department of Political Science, and the WardLab members in particular, risk running out of hot data in the near future. As somebody who is primarily concerned with research on public opinion and election forecasting, I was stunned by the mass of high-quality event data and its potential for so many applications. Still, during my short stay at the Lab as a visiting scholar I had the opportunity to give a short introduction to various web scraping techniques using R.

Why web scraping? The rapid growth of the World Wide Web over the past two decades has tremendously changed the way we share, collect and publish data. Firms, public institutions and private users provide every imaginable type of information, and new channels of communication generate vast amounts of data on human behavior. As many data on the Web are products of social interaction, they are of immediate interest to us as social scientists. In recent years, research on computer-based methods for classifying and analyzing large amounts of existing data has been booming across all disciplines, and political scientists contribute heavily to this process.



Journal Article, 2013


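In conflict research, the importance of predictive analysis is self-evident, yet it has never received sufficient attention. We argue that prediction not only can inform substantive public policy but can also be used to test existing theoretical models, avoid statistical overfitting, and reduce confirmation bias, and thereby build more reliable conflict forecasts. In this article, we review the progress scholars have made in conflict prediction and find that, thanks to fifty years of advances in data collection and computing power, researchers can now undertake predictive work that was previously out of reach. In particular, automated coding procedures have made it possible to collect digitized news reports rapidly, so that conflict research can use disaggregated event data, measured by the day, week, or month, to make timely conflict predictions below the national level from data on the individual activities of governments and opposition groups.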

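To show the major progress conflict research has made in recent years, this article re-examines Fearon and Laitin (2003), a foundational work in conflict research, in order to compare and highlight recent advances in predictive analysis. We find that although many of the explanatory variables in Fearon and Laitin's study are statistically significant, the model's accuracy in predicting out-of-sample events is low. This is because a model with statistically significant variables built from observational data cannot answer the forecasting questions that policymakers care about, such as when and where a civil war will occur.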
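The gap between statistical significance and out-of-sample accuracy can be illustrated with a toy calculation. The sketch below is not the analysis from the article: every count is invented, the "risk factor" is hypothetical, and the helper functions `two_proportion_z` and `recall_at_threshold` are made up for this illustration. It assumes a rare event whose rate doubles when the risk factor is present, then checks how many held-out events a 0.5 probability cutoff would flag.

```python
import math


def two_proportion_z(events_a, n_a, events_b, n_b):
    """z statistic for the difference between two event rates (pooled standard error)."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se


def recall_at_threshold(predicted, observed, threshold=0.5):
    """Share of observed events whose predicted probability exceeds the threshold."""
    event_probs = [p for p, y in zip(predicted, observed) if y == 1]
    if not event_probs:
        return 0.0
    return sum(p > threshold for p in event_probs) / len(event_probs)


# Hypothetical training sample: (onsets, country-years) with and without the risk factor.
train_onsets_risk, train_n_risk = 100, 5000
train_onsets_safe, train_n_safe = 50, 5000

# In-sample: is the difference in onset rates statistically significant?
z = two_proportion_z(train_onsets_risk, train_n_risk, train_onsets_safe, train_n_safe)

# The "model" simply predicts each group's training-sample onset rate.
rate_risk = train_onsets_risk / train_n_risk
rate_safe = train_onsets_safe / train_n_safe

# Hypothetical held-out sample: 2000 country-years per group, with similar onset rates.
test_predicted = [rate_risk] * 2000 + [rate_safe] * 2000
test_observed = [1] * 40 + [0] * 1960 + [1] * 20 + [0] * 1980

# Out-of-sample: how many actual onsets does a 0.5 cutoff flag?
recall = recall_at_threshold(test_predicted, test_observed)

print(f"z = {z:.2f}, out-of-sample recall at 0.5 = {recall:.2f}")
```

In this made-up setting the risk factor clears the conventional 1.96 significance cutoff comfortably, yet no predicted probability comes anywhere near 0.5, so the model flags none of the held-out onsets. A lower cutoff would flag some, at the price of many false alarms.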


Mining Texts to Generate Fuzzy Measures of Political Regime Type at Low Cost.  Reposted from Dart Throwing Chimp, by Jay Ulfelder.

Political scientists use the term “regime type” to refer to the formal and informal structure of a country’s government. Of course, “government” entails a lot of things, so discussions of regime type focus more specifically on how rulers are selected and how their authority is organized and exercised. The chief distinction in contemporary work on regime type is between democracies and non-democracies, but there’s some really good work on variations of non-democracy as well (see here and here, for example).

Unfortunately, measuring regime type is hard, and conventional measures of regime type suffer from one or two crucial drawbacks.


It is the end of the year, and we’re supposed to be reflective.  But not too much. After all, this is a blog. The colleagues in this lab are terrific and it serves to pause for a moment to reflect on one tiny aspect of their accomplishments this last year: their publications.  I do think publishing is broken, but not everyone is ready or able to abandon ship just yet.  You will read no whine about publishing here. Well, at least not today.  In any case, we have been remarkably successful as you can see below. Why?

One reason is that research in 2013 is a collaborative process. It took sixteen of us to produce the dozen or so articles listed below. This means that we can do a lot collectively, but each of us has to do a lot individually to make that happen. Indeed, we can do more collectively than each of us can do individually. Partially, this is supported by good will and common purpose, but more than a sliver of Dropbox, GitHub, and Skype are involved as well. And some tolerance for the 24/7 lifestyle that everyone leads. We live in a fantastic world where anyone with a laptop and internet access can really collaborate with colleagues who might be (as “we” have been at various times) in London, India, Seattle, Pennsylvania, Korea, Mexico, Austin, Croatia, Madison, New York, Santiago, Berlin, or Boulder, Colorado.

It is also important to recognize that we have made a decision to join together and work together on projects. Most of these projects have a common theme, sure. But that theme is fairly permeable and open. And the amount of what we really do not know about political life remains enormous. As a result, opportunities abound. But “suddenly” we have a lot of new ways of thinking about and investigating the perplexing world we live in. We are not really always stuck in the corner solving things the so-called Gell-Mann way (sitting in our office and thinking real hard). That may be helpful, but so is doing proofs, writing simulation code, querying databases, and writing computer programs. These things are especially helpful after a bit of reflection, but it turns out that they work better if the ideas being investigated have been annealed by discussion and dialogue among interested colleagues, who often see weakness and nuance where, if left to our own devices, we might not perceive even the most glaring imperfection, let alone the smallest.

Collaboration with bright colleagues is terrifically fun, and I am truly grateful to have the opportunity to participate with them in this lab. Here is a list of projects that we published in the year 2013, minus a few things still snagged by reviewer number three. Stay tuned for more good things in 2014 and for a forthcoming post on current lab projects.

  1. Michael D. Ward, Nils W. Metternich, Cassy L. Dorff, Max Gallop, Florian M. Hollenbach, Anna Schultz, and Simon Weschle. “Learning from the Past and Stepping into the Future: Toward a New Generation of Conflict Prediction,” International Studies Review (2013) 15, 473–490.
  2. Michael D. Ward and Cassy L. Dorff. “Les réseaux, les dyades et le modèle des relations sociales.” Liber amicorum: Hommage en l’honneur du Professeur Jacques Fontanel. Ed. Liliane Perrin-Bensahel and Jean-Francois Guilhaudis. L’Harmattan, March, 2013: 271-288.
  3. Kristin M. Bakke, John V. O’Loughlin, Gerard O’Tuathail, and Michael D. Ward. “Convincing State-Builders? Disaggregating Internal Legitimacy in Abkhazia.” International Studies Quarterly 58.3 (2013).
  4. Cassy L. Dorff and Michael D. Ward. “Networks, Dyads, and the Social Relations Model.” Political Science Research and Methods 1.2 (December, 2013): 159-178.
  5. Nils W. Metternich, Cassy L. Dorff, Max Gallop, Simon Weschle, and Michael D. Ward. “Anti-Government Networks in Civil Conflicts: How Network Structures Affect Conflictual Behavior.” American Journal of Political Science 57.4 (October, 2013): 777-1028.
  6. Michael D. Ward, John S. Ahlquist, and Arturas Rozenas. “Gravity’s Rainbow: A Dynamic Latent Space Model for the World Trade Network.” Network Science 1.1 (March, 2013): 95-118.
  7. Xun Cao and Michael D. Ward. “Do Democracies Attract Portfolio Investment? Transnational Portfolio Investments Modeled as Dynamic Network.” International Interactions 39.1 (2013 in press): in press.
  8. Jacob M. Montgomery, Florian M. Hollenbach, and Michael D. Ward. “Aggregation and Ensembles: Principled Combinations of Data.” PS: Political Science & Politics 46.1 (January, 2013): 43-44.
  9. Kristian Skrede Gleditsch and Michael D. Ward. “Forecasting is Difficult, Especially about the Future: Using Contentious Issues to Forecast Interstate Disputes.” Journal of Peace Research 50.1 (2013): 17-31.
  10. Jan Pierskalla and Florian M. Hollenbach. “Technology and Collective Action: The Effect of Cell Phone Coverage on Political Violence in Africa.” American Political Science Review 107.2 (2013): 207-224.
  11. Matthew Dickenson. “Leadership Transition and Violence in Mexican Drug Trafficking Organizations 2006-2010.”  Journal of Quantitative Criminology (2013): tba.
  12. Simon Weschle. “Two Types of Economic Voting: How Economic Conditions Jointly Affect Vote Choice and Turnout.” Electoral Studies in press (2013).
  13. December 30 update: Jacob M. Montgomery and  Josh Cutler. “Computerized Adaptive Testing for Public Opinion Surveys.” Political Analysis 21.2 (2013): 172-192. 

Reviewer 1
This turkey is a bit overdone. I think the problem is that the authors need a better theory of turkey before they try to stick one in an oven for four hours and then serve it. A recent example was published in the Journal of Poultry, and many earlier contributions in Giblets and Drumsticks have been overlooked. Many earlier scholars have actually caught their own turkeys and fed them assumptions and corn to produce a really substantial turkey that not only reflects the theory of turkey but also glistens with the implications of a well-thought-out turkey. Until a better theory of turkey is employed to motivate this particular baked turkey, it is hard to reach a satisfactory conclusion with this effort. While I appreciate the efforts, I don’t support revising this particular turkey for resubmission, though I am tempted to suggest that a soup be created with the remains.
Reviewer 2
Have the authors never tasted chicken? Nor duck? Medieval scholars knew that a combination of these fowl with turkey was necessary to provide a substantial empirical test of the “Thanksgiving Hypothesis.” Curiously, the authors have ignored this long-standing research tradition, even though there is a Stata recipe that will undertake this effortlessly for them. Surely this could easily be done in revisions.
Reviewer 3
I appreciate the authors’ efforts to examine the “Thanksgiving Hypothesis,” but it would appear there is a serious flaw in their analysis. The turkey has been cooked, and we see the standard inclusions: sweet potatoes, mashed potatoes, gravy, freshly cooked rolls, and even cranberry sauce. I even appreciate the introduction of oysters as an instrument into the stuffing to rule out the endogeneity that the turkey was actually fed ground fishmeal. But there is no adequate control, such as a tofurkey, introduced to examine the possibility that a general tryptophan coma is responsible for outcomes in the “Thanksgiving Hypothesis.” That and the absence of soup lead me to conclude that this project is not ready. But I am encouraged enough to recommend revisions.

Editor: The reviewers see much merit in your work, but point to serious missteps as well. I have personally tasted a turkey dinner, and would like to suggest that, after considering the comments above, you revise your procedures and resubmit the results. If you choose to do so, I will send the effort to a new round of reviewers, including one of the original critics. If you decide to accept this invitation, I will need to have your submission by November 27th, 2014.

ICEWS is an early warning system designed to help US policy analysts predict a variety of international crises. This project was created at the Defense Advanced Research Projects Agency in 2007, but has since been funded (through 2013) by the Office of Naval Research. ICEWS has not been widely written about, in part because of its operational nature, and in part because articles about prediction in politics face special hurdles in the publication process. An academic article (gated) described the early phase of the project in 2010, including assessments of its accuracy, and a WIRED article in 2011 criticized ICEWS for missing the Arab Spring–at a time when the project was only focused on Asia.

In an article (here for now) forthcoming in the International Studies Review, we, as one of the original teams on the ICEWS project, highlight the basic framework used in the more recent, worldwide version of ICEWS. Specifically, we discuss our forecasting model, which is our main contribution to the larger project. We call this CRISP. We argue that forecasting not only increases the dialogue between academia and the policy community, but also provides a gold standard for evaluating the empirical content of models. Thus, this gold standard improves not only the dialogue but also the science itself. In an earlier article in Foreign Policy, with Nils Metternich, we compared Billy Beane and Lewis Fry Richardson (sort of).

