Lab Article: Toward a New Era of Conflict Prediction

Journal Article, 2013

Article Summary

In the field of conflict research, the importance of predictive analysis is self-evident, yet it has never received the attention it deserves. We argue that prediction not only can substantively inform public policy; it can also be used to test existing theoretical models, avoid statistical overfitting, and reduce confirmation bias, thereby producing more reliable conflict forecasts. In this article we review the progress the discipline has made in conflict forecasting. We find that fifty years of advances in data collection and computing power now allow researchers to undertake predictive work that was previously out of reach. In particular, automated coding procedures make it possible to harvest digitized news reports quickly, so conflict researchers can use disaggregated event data at daily, weekly, or monthly resolution, covering the individual activities of governments and rebel groups below the country level, to produce timely conflict forecasts.
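To make "disaggregated event data" concrete, here is a minimal sketch in Python of rolling raw event records up to monthly counts per country. The file name and column names (events.csv, date, country, event_type) are illustrative stand-ins, not the paper's actual data:

```python
import pandas as pd

# Hypothetical file: one row per machine-coded event, with a date,
# a country, and a coded event type (all names are illustrative).
events = pd.read_csv("events.csv", parse_dates=["date"])

# Roll daily records up into monthly conflict counts per country;
# this is the kind of sub-annual disaggregation described above.
monthly = (
    events[events["event_type"] == "material_conflict"]
    .groupby(["country", pd.Grouper(key="date", freq="MS")])
    .size()
    .rename("conflict_events")
    .reset_index()
)
print(monthly.head())
```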

To illustrate the major advances of the past few years, this article revisits Fearon and Laitin (2003), a study that helped lay the foundations of modern conflict research, in order to compare and highlight recent progress in predictive analysis. We find that although many of the explanatory variables in Fearon and Laitin's models are statistically significant, the models' out-of-sample predictive accuracy is low. The reason is that a model built from observational data around statistically significant variables cannot answer the predictive questions decision-makers actually care about, such as when and where a civil war will break out.

Building on this critique of Fearon and Laitin, we construct a conflict forecasting model from temporally disaggregated, sub-annual event data, and use a hierarchical model to track how the estimated effects vary across clusters of countries with different attributes. Specifically, we use the CRISP event database to build a monthly model for 1997 to 2011 that predicts civil war onsets as recorded in the UCDP database. Two exhibits from the paper summarize the model's performance. Figure 1 uses separation plots to show in-sample and out-of-sample fit: each plot shows how dispersed the predicted probabilities are and how the actual events are distributed across those probabilities. Country-years are sorted from the lowest predicted probability of civil war on the left to the highest on the right, with the black line tracing that probability; cases that actually experienced civil war are drawn in red, and those that did not in white. Red bars on the left are false negatives, and white space on the right marks false positives, so a well-fitting model should concentrate most of the red (actual events) on the right side of the plot. The figure shows that (1) actual civil wars occur where the predicted probabilities are relatively high, and (2) as expected, out-of-sample fit is slightly worse than in-sample fit, although the model still fits the data quite well out of sample.
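A separation plot is straightforward to reproduce. The sketch below uses matplotlib with simulated predictions rather than the paper's CRISP/UCDP data: cases are sorted by predicted probability, actual events are drawn as red bars, non-events as white, and the black line traces the probability:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
p = rng.uniform(size=200)      # simulated predicted probabilities
y = rng.binomial(1, p)         # simulated outcomes that track them

# Sort cases from lowest to highest predicted probability.
order = np.argsort(p)
p, y = p[order], y[order]

fig, ax = plt.subplots(figsize=(10, 2))
for i, outcome in enumerate(y):
    # One vertical strip per case: red if a war occurred, white if not.
    ax.axvspan(i, i + 1, color="red" if outcome else "white")
# Black line tracing the predicted probability across sorted cases.
ax.plot(np.arange(len(p)) + 0.5, p, color="black")
ax.set_xlim(0, len(p)); ax.set_ylim(0, 1)
ax.set_xticks([]); ax.set_yticks([])
plt.show()
```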

Table 5 then reports standard performance statistics for the binary model's fit, using 0.5 as the cutoff. Out-of-sample fit is again slightly worse than in-sample fit on every measure, but aside from underestimating the number of actual civil wars, the estimates are quite good. In-sample performance is very accurate: every case with one of the highest predicted probabilities actually experienced a civil war, and the countries with the lowest predicted probabilities experienced none.
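The standard statistics behind a table like Table 5 can be computed directly from the predicted probabilities once they are dichotomized at the cutoff. A sketch with scikit-learn on made-up values:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

y_true = [0, 0, 1, 1, 0, 1, 0, 0]                    # actual onsets
y_prob = [0.1, 0.4, 0.8, 0.35, 0.2, 0.9, 0.6, 0.05]  # model output

# Dichotomize the predicted probabilities at the 0.5 cutoff.
y_pred = [int(p >= 0.5) for p in y_prob]

print("accuracy :", accuracy_score(y_true, y_pred))
print("precision:", precision_score(y_true, y_pred))
print("recall   :", recall_score(y_true, y_pred))
print("f1       :", f1_score(y_true, y_pred))
```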

A common criticism of social science holds that complex social phenomena like international conflict cannot be predicted by any method. But precisely because the internal logic of political conflict is so complex, we should be searching for the underlying explanatory mechanisms in order to explain and predict it. In this article we highlight what conflict models contribute to understanding political conflict across different national contexts: statistical models of civil war can be highly accurate both in and out of sample, and in a world where new data arrive continuously, we can use data that played no role in model construction to check a model's reliability. This gold standard of model evaluation is also a lasting methodological contribution to the discipline.

Mining Texts to Generate Fuzzy Measures of Political Regime Type at Low Cost.  Reposted from Dart Throwing Chimp, by Jay Ulfelder.

Political scientists use the term “regime type” to refer to the formal and informal structure of a country’s government. Of course, “government” entails a lot of things, so discussions of regime type focus more specifically on how rulers are selected and how their authority is organized and exercised. The chief distinction in contemporary work on regime type is between democracies and non-democracies, but there’s some really good work on variations of non-democracy as well (see here and here, for example).

Unfortunately, measuring regime type is hard, and conventional measures of regime type suffer from two crucial drawbacks.

First, many of the data sets we have now represent regime types or their components with bivalent categorical measures that sweep meaningful uncertainty under the rug. Specific countries at specific times are identified as fitting into one and only one category, even when researchers knowledgeable about those cases might be unsure or disagree about where they belong. For example, all of the data sets that distinguish categorically between democracies and non-democracies—like this one, this one, and this one—agree that Norway is the former and Saudi Arabia the latter, but they sometimes diverge on the classification of countries like Russia, Venezuela, and Pakistan, and rightly so.

Importantly, the degree of our uncertainty about where a case belongs may itself be correlated with many of the things that researchers use data on regime type to study. As a result, findings and forecasts derived from those data are likely to be sensitive to those bivalent calls in ways that are hard to understand when that uncertainty is ignored. In principle, it should be possible to make that uncertainty explicit by reporting the probability that a case belongs in a specific set instead of making a crisp yes/no decision, but that’s not what most of the data sets we have now do.

Second, virtually all of the existing measures are expensive to produce. These data sets are coded either by hand or through expert surveys, and routinely covering the world this way takes a lot of time and resources. (I say this from knowledge of the budgets for the production of some of these data sets, and from personal experience.) Partly because these data are so costly to make, many of these measures aren’t regularly updated. And, if the data aren’t regularly updated, we can’t use them to generate the real-time forecasts that offer the toughest test of our theories and are of practical value to some audiences.

As part of the NSF-funded MADCOW project*, Michael D. (Mike) Ward, Philip Schrodt, and I are exploring ways to use text mining and machine learning to generate measures of regime type that are fuzzier in a good way from a process that is mostly automated. These measures would explicitly represent uncertainty about where specific cases belong by reporting the probability that a certain case fits a certain regime type instead of forcing an either/or decision. Because the process of generating these measures would be mostly automated, they would be much cheaper to produce than the hand-coded or survey-based data sets we use now, and they could be updated in near-real time as relevant texts become available.

At this week's annual meeting of the American Political Science Association, I'll be presenting a paper—co-authored with Mike and Shahryar Minhas of Duke University's WardLab—that describes preliminary results from this endeavor. Shahryar, Mike, and I started by selecting a corpus of familiar and well-structured texts describing politics and human-rights practices each year in all countries worldwide: the U.S. State Department's Country Reports on Human Rights Practices, and Freedom House's Freedom in the World. After pre-processing those texts in a few conventional ways, we dumped the two reports for each country-year into a single bag of words and used text mining to extract features from those bags in the form of vectorized tokens that may be grossly described as word counts. (See this recent post for some things I learned from that process.) Next, we used those vectorized tokens as inputs to a series of binary classification models representing a few different ideal-typical regime types as observed in a few widely used, human-coded data sets. Finally, we applied those classification models to a test set of country-years held out at the start to assess the models' ability to classify regime types in cases they had not previously "seen." The picture below illustrates the process and shows how we hope eventually to develop models that can be applied to recent documents to generate new regime data in near-real time.

Overview of MADCOW Regime Classification Process
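The pipeline sketched in that figure maps naturally onto standard tools. Here is a minimal sketch with scikit-learn, using placeholder snippets and labels in place of the real corpus and the human-coded regime data; none of the specifics below come from the paper:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import LinearSVC

# One blended bag of words per country-year (placeholder snippets
# standing in for the State Department and Freedom House reports).
docs = [
    "elections parliament opposition free press independent judiciary",
    "king royal decree succession council appointed by the palace",
    "ruling party congress politburo central committee single list",
    "junta curfew martial law decree suspended the constitution",
]
labels = [1, 0, 0, 0]  # 1 = democracy, per some human-coded source

# Vectorize tokens into (roughly) word counts.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(docs)

# Train one binary classifier per ideal-typical regime type.
clf = LinearSVC().fit(X, labels)

# Apply it to a held-out country-year the model has not "seen".
new_doc = ["multiparty elections held and opposition seated in parliament"]
print(clf.predict(vectorizer.transform(new_doc)))
```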

Our initial results demonstrate that this strategy can work. Our classifiers perform well out of sample, achieving high or very high precision and recall scores in cross-validation on all four of the regime types we have tried to measure so far: democracy, monarchy, military rule, and one-party rule. The separation plots below are based on out-of-sample results from support vector machines trained on data from the 1990s and most of the 2000s and then applied to new data from the most recent few years available. When a classifier works perfectly, all of the red bars in the separation plot will appear to the right of all of the pink bars, and the black line denoting the probability of a "yes" case will jump from 0 to 1 at the point of separation. These classifiers aren't perfect, but they seem to be working very well.

[Separation plots for the four regime types: democracy, monarchy, military rule, and one-party rule.]

Of course, what most of us want to do when we find a new data set is to see how it characterizes cases we know. We can do that here with heat maps of the confidence scores from the support vector machines. The maps below show the values from the most recent year available for two of the four regime types: 2012 for democracy and 2010 for military rule. These SVM confidence scores indicate the distance and direction of each case from the hyperplane used to classify the set of observations into 0s and 1s. The probabilities used in the separation plots are derived from them, but we choose to map the raw confidence scores because they exhibit more variance than the probabilities and are therefore easier to visualize in this form.
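In scikit-learn terms (my sketch's vocabulary, not necessarily the authors' actual stack), the confidence score is the signed distance returned by decision_function, and the probabilities are derived from it by Platt scaling:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=100) > 0).astype(int)

# probability=True fits a logistic (Platt) mapping on top of the
# raw SVM scores via internal cross-validation.
clf = SVC(kernel="linear", probability=True).fit(X, y)

scores = clf.decision_function(X[:5])    # signed distance from hyperplane
probs = clf.predict_proba(X[:5])[:, 1]   # Platt-scaled probabilities
print(np.round(scores, 2), np.round(probs, 2))
```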

[Heat maps of SVM confidence scores: democracy (2012) and military rule (2010).]

On the whole, cases fall out as we would expect them to. The democracy classifier confidently identifies Western Europe, Canada, Australia, and New Zealand as democracies; shows interesting variations in Eastern Europe and Latin America; and confidently identifies nearly all of the rest of the world as non-democracies (defined for this task as a Polity score of 10). Meanwhile, the military rule classifier sees Myanmar, Pakistan, and (more surprisingly) Algeria as likely examples in 2010, and is less certain about the absence of military rule in several West African and Middle Eastern countries than in the rest of the world.

These preliminary results demonstrate that it is possible to generate probabilistic measures of regime type from publicly available texts at relatively low cost. That does not mean we’re fully satisfied with the output and ready to move to routine data production, however. For now, we’re looking at a couple of ways to improve the process.

First, the texts included in the relatively small corpus we have assembled so far only cover a narrow set of human-rights practices and political procedures. In future iterations, we plan to expand the corpus to include annual or occasional reports that discuss a broader range of features in each country’s national politics. Eventually, we hope to add news stories to the mix. If we can develop models that perform well on an amalgamation of occasional reports and news stories, we will be able to implement this process in near-real time, constantly updating probabilistic measures of regime type for all countries of the world at very low cost.

Second, the stringent criteria we used to observe each regime type in constructing the binary indicators on which the classifiers are trained also appear to be shaping the results in undesirable ways. We started this project with a belief that membership in these regime categories is inherently fuzzy, and we are trying to build a process that uses text mining to estimate degrees of membership in those fuzzy sets. If set membership is inherently ambiguous in a fair number of cases, then our approximation of a membership function should be bimodal, but not too neatly so. Most cases most of the time can be placed confidently at one end of the range of degrees of membership or the other, but there is considerable uncertainty at any moment in time about a non-trivial number of cases, and our estimates should reflect that fact.

If that’s right, then our initial estimates are probably too tidy, and we suspect that the stringent operationalization of each regime type in the training data is partly to blame. In future iterations, we plan to experiment with less stringent criteria—for example, by identifying a case as military rule if any of our sources tags it as such. With help from Sean J. Taylor, we’re also looking at ways we might use Bayesian measurement error models to derive fuzzy measures of regime type from multiple categorical data sets, and then use that fuzzy measure as the target in our machine-learning process.
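As a crude illustration of the averaging idea (my simplification; a Bayesian measurement error model would instead weight each source by its estimated reliability), several binary codings of the same case can be combined into a rough degree-of-membership score:

```python
import numpy as np

# Rows: country-years. Columns: binary "military rule" codings from
# three hypothetical categorical data sets that sometimes disagree.
codings = np.array([
    [1, 1, 1],   # all sources agree: clearly military rule
    [1, 0, 1],   # sources disagree: an ambiguous case
    [0, 0, 0],   # all sources agree: clearly not
])

# A naive fuzzy membership score in [0, 1] for each case.
membership = codings.mean(axis=1)
print(membership)  # [1.0, 0.667, 0.0] (approximately)
```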

So, stay tuned for more, and if you’ll be at APSA this week, please come to our Friday-morning panel and let us know what you think.

* NSF Award 1259190, Collaborative Research: Automated Real-time Production of Political Indicators


This post was written by Jay Ulfelder and originally appeared on Dart-Throwing Chimp. The work it describes is part of the NSF-funded MADCOW project to automate the coding of common political science datasets.

Guess what? Text mining isn’t push-button, data-making magic, either. As Phil Schrodt likes to say, there is no Data Fairy.

I’m quickly learning this point from my first real foray into text mining. Under a grant from the National Science Foundation, I’m working with Phil Schrodt and Mike Ward to use these techniques to develop new measures of several things, including national political regime type.

I wish I could say that I’m doing the programming for this task, but I’m not there yet. For the regime-data project, the heavy lifting is being done by Shahryar Minhas, a sharp and able Ph.D. student in political science at Duke University, where Mike leads the WardLab. Shahryar and I are scheduled to present preliminary results from this project at the upcoming Annual Meeting of the American Political Science Association in Washington, DC (see here for details).

When we started work on the project, I imagined a relatively simple and mostly automatic process running from location and ingestion of the relevant texts to data extraction, model training, and, finally, data production. Now that we’re actually doing it, though, I’m finding that, as always, the devil is in the details. Here are just a few of the difficulties and decision points we’ve had to confront so far.

First, the structure of the documents available online often makes it difficult to scrape and organize them. We initially hoped to include annual reports on politics and human-rights practices from four or five different organizations, but some of the ones we wanted weren’t posted online in a format we could readily scrape. At least one was scrapable but not organized by country, so we couldn’t properly group the text for analysis. In the end, we wound up with just two sets of documents in our initial corpus: the U.S. State Department’s Country Reports on Human Rights Practices, and Freedom House’s annual Freedom in the World documents.

Differences in naming conventions almost tripped us up, too. For our first pass at the problem, we are trying to create country-year data, so we want to treat all of the documents describing a particular country in a particular year as a single bag of words. As it happens, the State Department labels its human rights reports for the year on which they report, whereas Freedom House labels its Freedom in the World report for the year in which it’s released. So, for example, both organizations have already issued their reports on conditions in 2013, but Freedom House dates that report to 2014 while State dates its version to 2013. Fortunately, we knew this and made a simple adjustment before blending the texts. If we hadn’t known about this difference in naming conventions, however, we would have ended up combining reports for different years from the two sources and made a mess of the analysis.
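The adjustment itself is one line. A sketch, assuming hypothetical metadata tables with country, report_year, and text columns:

```python
import pandas as pd

state = pd.DataFrame({"country": ["Ghana"], "report_year": [2013],
                      "text": ["state dept report text ..."]})
fh = pd.DataFrame({"country": ["Ghana"], "report_year": [2014],
                   "text": ["freedom house report text ..."]})

# Freedom House labels reports by release year; shift back one year
# so both sources describe the same calendar year of conditions.
fh["report_year"] -= 1

# Now the merge blends each country-year's texts correctly.
merged = state.merge(fh, on=["country", "report_year"],
                     suffixes=("_state", "_fh"))
print(merged)
```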

Once ingested, those documents include some text that isn’t relevant to our task, or that is relevant but the meaning of which is tacit. Common stop words like “the”, “a”, and “an” are obvious and easy to remove. More challenging are the names of people, places, and organizations. For our regime-data task, we’re interested in the abstract roles behind some of those proper names—president, prime minister, ruling party, opposition party, and so on—rather than the names themselves, but text mining can’t automatically derive the one for the other.

For our initial analysis, we decided to omit all proper names and acronyms to focus the classification models on the most general language. In future iterations, though, it would be neat if we could borrow dictionaries developed for related tasks and use them to replace those proper names with more general markers. For example, in a report or story on Russia, Vladimir Putin might get translated into <head of government>, the FSB into <police>, and Chechen Republic of Ichkeria into <rebel group>. This approach would preserve the valuable tacit information in those names while making it explicit and uniform for the pattern-recognition stage.
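A minimal sketch of that substitution step, with a tiny hand-made dictionary standing in for the borrowed ones:

```python
# Hypothetical mapping from proper names to abstract role markers;
# real dictionaries would be borrowed from related coding projects.
role_map = {
    "Vladimir Putin": "<head of government>",
    "FSB": "<police>",
    "Chechen Republic of Ichkeria": "<rebel group>",
}

def generalize(text: str, mapping: dict) -> str:
    # Replace longer names first so a short name that is a substring
    # of a longer one does not clobber it.
    for name in sorted(mapping, key=len, reverse=True):
        text = text.replace(name, mapping[name])
    return text

print(generalize("The FSB detained critics of Vladimir Putin.", role_map))
# -> "The <police> detained critics of <head of government>."
```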

That’s not all, but it’s enough to make the point. These things are always harder than they look, and text mining is no exception. In any case, we’ve now run this gantlet once and made our way to an encouraging set of initial results. I’ll post something about those results closer to the conference when the paper describing them is ready for public consumption. In the meantime, though, I wanted to share a few of the things I’ve already learned about these techniques with others who might be thinking about applying them, or who already do and can commiserate.

Improvised explosive devices, or IEDs, were extensively used during the US wars in Iraq and Afghanistan, causing half of all US and coalition casualties despite increasingly sophisticated countermeasures. Although both of these wars have come to a close, it is unlikely that the threat of IEDs will disappear. If anything, their success implies that US and European forces are more likely to face them in similar future conflicts. As a result there is value in understanding the process by which they are employed, and being able to predict where and when they will be used. This is a goal we have been working on for some time now as part of a project funded by the Office of Naval Research, using SIGACT event data on IEDs and other forms of violence in Afghanistan.

Explosive hazards, including IEDs, in our SIGACT data.


Thailand's Army chief General Prayuth announces the coup on television on 22 May 2014. Source: SCMP

This morning (May 22nd, 2014, East Coast time), the Thai military staged a coup against the caretaker government that had been in power for the past several weeks, after months of protests and political turmoil directed at the government of Yingluck Shinawatra, who herself had been ordered to resign on 7 May by the judiciary. This follows a military coup in 2006, and more than a dozen successful or attempted coups before then.

We predicted this event last month, in a report commissioned by the CIA-funded Political Instability Task Force (which we can’t quite share yet). In the report, we forecast irregular regime changes, which include coups but also successful protest campaigns and armed rebellions, for 168 countries around the world for the 6-month period from April to September 2014. Thailand was number 4 on our list, shown below alongside our top 20 forecasts. It was number 10 on Jay Ulfelder’s 2014 coup forecasts. So much for our inability to forecast (very rare) political events, and the irrelevance of what we do.


Recently, Syrian rebels (under EU embargo until mid-2013) have relied on weapons smuggled from neighboring states including Iraq, Lebanon, and Turkey (source). Image from commons.wikimedia.org.

Why do arms embargoes fail? Despite their frequent use by international organizations like the United Nations and the European Union, arms embargoes suffer from a poor record of success. For half a century now, multilateral arms embargoes have been the primary tool used to fight the proliferation of small arms and light weapons (SALW) to conflict zones and perpetrators of mass violence. These agreements between countries prohibit the sale of weapons to a particular target country (or sometimes a target organization). However, official reviews and academic studies alike tend to conclude that small arms are still making their way to embargoed actors.

Black markets are often cited as a source of this failure. Still, no large-n studies have presented evidence of increased black market activity in the presence of embargoes. To remedy this, I look for evidence of black market activity in records of legal arms trades. The data reveal that arms embargoes are associated with a substantial increase in the value of arms imports into nearby states. Given previous research on the nature of black market arms trade, this seems likely to result from an incentive for neighboring states to import more weapons that will then be transferred illegally to the embargoed state.

Black market arms transfers are difficult to study. Most of what we know about illicit arms transfers comes from cases where somebody made a mistake and the illicit activity was uncovered. Apart from those few select cases, reliable data on actual illegal arms transfers are unavailable. Nonetheless, the illicit arms trade is big business, worth roughly one billion USD per year.

Embargoed states and their neighbors. Embargoes based on data from Erickson (2013), Journal of Peace Research.

Black markets are of particular concern in situations where the legal supply of weapons is low but the demand is high. These circumstances often apply to criminal organizations, rebel groups, and embargoed states. While these illicit trades are difficult to collect data on systematically, most of the weapons involved begin as legally-traded arms. They are traded legally and then diverted from their authorized recipients. Arms embargoes provide an interesting case for the study of illicit arms. Those countries that border embargoed states can take advantage of their shared border to traffic illegal arms to the embargoed neighbor without fear of discovery by a third party. Therefore, if embargoed countries circumvent those embargoes by purchasing arms illicitly, we should expect to see an increase in the arms imported to their neighbors.

I have used data on multilateral arms embargoes and legal arms transfers to test this proposition. Statistical models reveal that arms embargoes are indeed associated with greater levels of weapons imports in nearby countries. In fact, the predicted increase is substantial: countries that border embargoed countries are estimated to import 38% more arms than they would have had they not neighbored an embargoed country (measured in value, constant 2000 USD). This can translate into hundreds of thousands or even millions of dollars worth of additional weapons. Furthermore, this result takes into account both domestic and international conflict, as well as other predictors of arms imports like the overall level of arms imports to the region, government type, and GDP per capita. On the other hand, this analysis indicates that arms embargoes are indeed effective at stemming the flow of legal arms into embargoed countries: countries targeted by an embargo are predicted to import, on average, 63% fewer arms than they would otherwise.
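For readers wondering how such percentage effects are usually derived: if arms imports are modeled on the log scale, a coefficient beta translates into a percent change of exp(beta) - 1. The coefficients below are hypothetical values chosen to reproduce the reported magnitudes, an assumption for illustration rather than the paper's actual estimates:

```python
import math

# Hypothetical log-scale coefficients (not the paper's estimates).
beta_neighbor = 0.322   # "borders an embargoed state"
beta_target = -0.994    # "is itself under embargo"

print(f"neighbors: {math.exp(beta_neighbor) - 1:+.0%}")  # about +38%
print(f"targets:   {math.exp(beta_target) - 1:+.0%}")    # about -63%
```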

Predicted levels of arms imports for a hypothetical median state bordering an embargoed state and not bordering an embargoed state. Fixed effect uncertainty included. Based on 100,000 simulations.

Arms embargoes appear to effectively decrease the legal, or recorded, sale of arms to target states. However, this effect is accompanied by a significant increase in the level of arms imported to the surrounding region. Absent other possible explanations, it seems likely that many of these arms are destined for the embargoed country. Effective arms control measures must account for the regional conditions that may undermine nonproliferation efforts.

Large-scale event data based on worldwide media reports already help us to explain and forecast crisis events such as civil wars or insurgencies. But the millions of data points provided by ICEWS or GDELT are a treasure trove for social scientists interested in all kinds of topics, whether they involve violence or not.

For example, they can be used to look at the way politicians interact with each other. A lot of research on political competition in the past two or three decades has focused on party positions and politicians' ideological leanings, fueled by the convenient availability of suitable data (e.g., NOMINATE and the Comparative Manifestos Project). But political competition is about more than just ideology and policy positions. Recent contributions on the Monkey Cage (here and here) have pointed out that the discussion about polarization in the US is to a significant degree about the way politicians interact with each other: that they are more interested in attacking each other verbally, rather than "getting things done" for the good of the country. Arguably, this kind of behavior is responsible for at least part of the gridlock and lack of legislative productivity in Washington even in areas where there is significant bipartisan consensus about policy. However, serious empirical investigations into the way politicians interact with each other have been largely absent, the main reason being a lack of suitable data. But the availability of large-scale media event data can help to change that.

The machine-coded media stories that make up the ICEWS (or GDELT) data provide fine-grained information about how politicians publicly interact with each other, and with other societal actors. They record when one politician criticizes or denounces someone, and they also document when two actors praise each other or express a desire to work together. This allows us to analyze conflict and cooperation between political actors in a systematic manner. In a new working paper, I use the ICEWS event data to analyze the way parties interacted in the 11 Eurozone countries between 2001 and 2011.

I divide the events into two categories, cooperative (e.g. one actor praises another) and conflictual (e.g. one actor criticizes another), based on the CAMEO codebook. For each country, the data provide between 2,000 and 30,000 events, involving between 125 and almost 450 actors (parties, NGOs, military, etc.). The actors have a complex network of interactions with each other. To summarize them in a simple and intuitive manner, I estimate latent network models for each country-year. Without getting into the technical details, these models estimate the position of each actor in a hypothetical latent space. Actors that are positioned close together in the latent space have a higher probability of interacting with each other frequently in a cooperative way, while actors positioned far away from each other are likely to interact in a conflictual manner.
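The estimator itself is a latent space network model; as a rough, non-Bayesian stand-in for the idea (my simplification, not the working paper's method), one can convert dyadic event counts into dissimilarities and embed the actors with multidimensional scaling, as in the estimates pictured below:

```python
import numpy as np
from sklearn.manifold import MDS

# Hypothetical symmetric dyadic event counts among four actors.
actors = ["PASOK", "ND", "KKE", "unions"]
coop = np.array([[0, 30, 5, 20],
                 [30, 0, 4, 8],
                 [5, 4, 0, 15],
                 [20, 8, 15, 0]], dtype=float)
conf = np.array([[0, 10, 25, 5],
                 [10, 0, 26, 12],
                 [25, 26, 0, 5],
                 [5, 12, 5, 0]], dtype=float)

# Share of each dyad's interactions that were cooperative.
total = coop + conf
rate = np.divide(coop, total, out=np.full_like(total, 0.5),
                 where=total > 0)

# Frequent conflict maps to large distance; embed in two dimensions.
dissim = 1.0 - rate
np.fill_diagonal(dissim, 0.0)
pos = MDS(n_components=2, dissimilarity="precomputed",
          random_state=0).fit_transform(dissim)

for name, (x, y) in zip(actors, pos):
    print(f"{name}: ({x:+.2f}, {y:+.2f})")
```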

Posterior latent space estimates for Greece in 2002, 2006, and 2010. Parties: PASOK (green), ND (blue), KKE (red). All other actors in gray.

