Tag Archives: Natural Language Processing

GDELT (gdelt.utdallas.edu) is a global database of events coded from the vast quantities of publicly available text produced by the world’s news media. It has created a great deal of excitement in the social science community, especially within the field of international relations. But it has had wider visibility as well: in August 2013, there were 150,000 views of a map of protest activity around the world, based on the GDELT database. Event data have been around for several decades, but the GDELT project has generated new interest.

ICEWS is an early warning system designed to help US policy analysts predict a variety of international crises to which the US might have to respond. These include international and domestic crises, ethnic and religious violence, as well as rebellion and insurgency. This project was created at the Defense Advanced Research Projects Agency, but has since been funded (through 2013) by the Office of Naval Research. ICEWS also produces a rich corpus of text, which is analyzed with powerful techniques of automated event-data production. Since GDELT and ICEWS are based on similar, though not identical, methods and sources, it is interesting to compare them.

ICEWS event data, 2001-2013: stories (gray line) and events (black line)

Conceptually, the two differ most in that ICEWS follows a more traditional approach to event data, seeking to encode a chronology of events that reflects, in some sense, the putative ground truth of what occurred. The figure on the right shows the corpus of stories in ICEWS (gray) and the resulting events (black): total events are fairly stable over time even though the number of media stories increases. GDELT is more concerned with building a comprehensive catalogue of all media stories (and other text) on reported events, and the corpus of those media stories is increasing exponentially, as the figure below shows. As a result, the number of events in GDELT is also increasing over time, much more so than in ICEWS.
