See also the work of Alain Bertho
►https://berthoalain.com/documents
The source data behind this map look enormous, and precise (not just the location but also the intensity). Any idea where they come from?
Phoenix Data Project
▻http://phoenixdata.org
Political event data—nominal or ordinal codes recording the interactions between international actors as reported in the open press — are a common type of information used in quantitative political research. Technological developments over the past two decades now allow this data to be automatically coded from machine-readable sources in near-real-time, which has dramatically decreased the cost of producing such data and increased the interest in applying it in a variety of domains, including commercial, policy and academic applications.
An alternative to #GDELT
Dynamic Maps and Graphs | ACLED
▻http://www.acleddata.com/visuals/maps/dynamic-maps
The dynamic maps below have been drawn from ACLED Version 6. They illustrate key dynamics in event types, reported fatalities, and actor categories. Clicking on the maps, and selecting or de-selecting options in the legends, allows users to interactively edit and manipulate the visualisations, and export or share the finished visuals. The maps are visualised using Tableau Public.
#ACLED, #GDELT and other event databases: a summary here by “Georgine”:
▻http://www.poliscirumors.com/topic/gdelt-suspended/page/8#post-95140
ACLED is basically an incident database - each “event” codes a “happening” rather than a simple action, in the case of ACLED a day of violence in a location perpetrated by a group - it’s much more similar to Wikileaks data and/or the Global Terrorist Database. For each of their events, they extract a ton of information that is connected to the event - so basically one ACLED / GTD / UCDP-GED etc. etc. event can be 100 GDELT events. Don’t get me wrong, being curated and manually collected, it’s leaps and bounds better than GDELT in terms of data quality (the number of false positives and negatives is tons lower), but ACLED is one of the worst incident datasets out there, if not the worst (they did come first though). It’s simply not reliable across cases, the geocoding is pretty poor, and I’ve read a working paper (done by some guys in Berlin, comparing their independently collected benchmark data with some incident datasets), and ACLED was the “odd one there”.
A far, far better choice is UCDP-GED, they are fairly close to ACLED in what they capture (also only Africa), but the quality and documentation is leaps and bounds better, plus they code far more information per event, plus they are compatible with other datasets (ACLED isn’t), i.e. UCDP-GED has reliable estimates of battle deaths, while ACLED doesn’t really (they’ve added them as an after-thought, and much of them are simply extracted from nowhere). Plus UCDP has the intention to (at least one day) be global with their UCDP-GED while ACLED seems to want to be regional forever. There are issues with UCDP-GED as well - their inclusion criteria is strict, and, like all Uppsala products, they are 1-2 years “behind” real time.
You also have AID-DATA (Aid flows in Africa, by World Bank, I have no idea how far they’ve gone), SCAD (protests in Africa), MMAD (mass-movements in autocracies), GTD (terrorism), etc. etc. They all follow the “incident” pattern rather than the VRA-LEVANT-GDELT pattern.
P.S. Note that I don’t work for any of these projects, but worked quite a bit with UCDP-GED, ACLED, GTD and GDELT. Good luck!
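The difference described above between action-level coding (the GDELT pattern) and incident-level coding (the ACLED / UCDP-GED pattern) can be illustrated with a small sketch. It collapses many action records into one incident per day, place and actor. The records, field names and grouping key are hypothetical and chosen only for illustration; none of these projects necessarily builds its data this way.

```python
from collections import defaultdict

# Hypothetical machine-coded records: one row per reported action.
# Field names and values are illustrative, not taken from GDELT or ACLED.
actions = [
    {"date": "2013-05-01", "place": "Kano", "actor": "GroupA", "action": "attack"},
    {"date": "2013-05-01", "place": "Kano", "actor": "GroupA", "action": "shelling"},
    {"date": "2013-05-01", "place": "Kano", "actor": "GroupA", "action": "attack"},
    {"date": "2013-05-02", "place": "Kano", "actor": "GroupA", "action": "attack"},
    {"date": "2013-05-01", "place": "Jos", "actor": "GroupB", "action": "riot"},
]


def to_incidents(rows):
    """Group action-level rows into one incident per (date, place, actor)."""
    grouped = defaultdict(list)
    for row in rows:
        key = (row["date"], row["place"], row["actor"])
        grouped[key].append(row["action"])
    return [
        {
            "date": date,
            "place": place,
            "actor": actor,
            "n_actions": len(acts),          # how many raw rows were merged
            "actions": sorted(set(acts)),    # distinct action types seen
        }
        for (date, place, actor), acts in sorted(grouped.items())
    ]


incidents = to_incidents(actions)
for incident in incidents:
    print(incident)
```

Five action rows become three incidents here. Curated incident datasets do this kind of consolidation by hand, with far more context attached to each incident.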
Showing Refugees Some Love | geovisualist
▻http://geovisualist.com/2015/11/22/showing-refugees-some-love
The data comes from #GDELT (The Global Database of Events, Language, and Tone). GDELT’s Global Knowledge Graph monitors media in 65 languages around the world and uses algorithms to measure the emotions and tone of the texts. The map shows results on the theme of “refugees” with a tone of greater than two. Tone is the most basic GDELT parameter, and measures how positive or negative a media article is. So, for example, this article about how churches in Kansas and Nebraska are ready to help refugees is included in the dataset.
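As a rough sketch of the selection described above (theme “refugees”, tone greater than two), the filter could look like the following. The record layout, the theme label `REFUGEES` and the example URLs are assumptions made for illustration; a real Global Knowledge Graph export has a different and much richer layout.

```python
# Hypothetical, simplified article records; field names are assumptions.
articles = [
    {"url": "http://example.org/churches-help", "themes": {"REFUGEES", "RELIGION"}, "tone": 4.1},
    {"url": "http://example.org/border-closed", "themes": {"REFUGEES"}, "tone": -3.2},
    {"url": "http://example.org/football", "themes": {"SPORT"}, "tone": 5.0},
    {"url": "http://example.org/borderline", "themes": {"REFUGEES"}, "tone": 2.0},
]


def positive_on_theme(records, theme, min_tone):
    """Keep records tagged with `theme` whose tone is strictly above `min_tone`."""
    return [r for r in records if theme in r["themes"] and r["tone"] > min_tone]


selected = positive_on_theme(articles, theme="REFUGEES", min_tone=2.0)
for record in selected:
    print(record["url"], record["tone"])
```

The comparison is strict, so an article with a tone of exactly 2.0 is left out, matching the “greater than two” wording.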
History As Big Data: 500 Years Of Book Images And Mapping Millions Of Books - Forbes
▻http://www.forbes.com/sites/kalevleetaru/2015/09/16/history-as-big-data-500-years-of-book-images-and-mapping-millions-of-books
The terms “big data” and “massive data analytics” likely conjure thoughts of the modern world, of hundreds of millions of tweets or billions of Facebook posts streaming in real time into gleaming data centers filled with blinking lights. Libraries, on the other hand, filled with endless rows of dusty books, are likely not the first thing that comes to mind. Yet, what if we could use libraries to reimagine our past, creating a gallery of all the images from half a millennium of books or creating a 215-year animated map of human history as seen through millions of books?
I find this absolutely fascinating (by the author of #GDELT)
#géolocalisation #histoire #text_mining #google_bigquery
Note the impact of #copyright, which causes a drop in the number of books analyzed from 1922 onward
The extracted database is freely available
▻http://blog.gdeltproject.org/3-5-million-books-1800-2015-gdelt-processes-internet-archive-and-
with examples showing how to build, in a few seconds, a map for a particular query
▻http://blog.gdeltproject.org/google-bigquery-3-5m-books-sample-queries
Why Big Data Missed the Early Warning Signs of Ebola
▻http://www.foreignpolicy.com/articles/2014/09/26/why_big_data_missed_the_early_warning_signs_of_ebola
Thanks to @freakonometrics for flagging this article on Twitter
With the Centers for Disease Control now forecasting up to 1.4 million new infections from the current Ebola outbreak, what could “big data” do to help us identify the earliest warnings of future outbreaks and track the movements of the current outbreak in realtime? It turns out that monitoring the spread of Ebola can teach us a lot about what we missed — and how data mining, translation, and the non-Western world can help to provide better early warning tools.
Earlier this month, Harvard’s HealthMap service made world headlines for monitoring early mentions of the current Ebola outbreak on March 14, 2014, “nine days before the World Health Organization formally announced the epidemic,” and issuing its first alert on March 19. Much of the coverage of HealthMap’s success has emphasized that its early warning came from using massive computing power to sift out early indicators from millions of social media posts and other informal media.
By the time HealthMap monitored its very first report, the Guinean government had actually already announced the outbreak and notified the WHO.
cf. ▻http://seenthis.net/messages/286853#message286960 and ▻http://seenthis.net/messages/287766
and on what #GDELT missed (the author of the article, Kalev H. Leetaru, being the creator of this database):
Part of the problem is that the majority of media in Guinea is not published in English, while most monitoring systems today emphasize English-language material. The GDELT Project attempts to monitor and translate a cross-section of the world’s news media each day, yet it is not capable of translating 100 percent of global news coverage. It turns out that GDELT actually monitored the initial discussion of Dr. Keita’s press conference on March 13 and detected a surge in domestic coverage beginning on March 14, the day HealthMap flagged the first media mention. The problem is that all of this media coverage was in French — and was not among the French material that GDELT was able to translate those days.
This algorithm can predict a revolution | The Verge
▻http://www.theverge.com/2014/2/12/5404750/can-a-database-predict-a-revolution
the most exciting projects right now are fully open endeavors that publish their predictions for anyone to see. On the data side, researchers at Georgetown University are cataloging every significant political event of the past century into a single database called #GDELT, and leaving the whole thing open for public research. Already, projects have used it to map the Syrian civil war and diplomatic gestures between Japan and South Korea, looking at dynamics that had never been mapped before. And then, of course, there’s Ward Lab, releasing a new sheet of predictions every six months and tweaking its algorithms with every development.
#GDELT Suspension
▻http://blog.gdelt.org/2014/01/20/gdelt-suspension
In light of an ongoing dispute between third parties, we are suspending the GDELT [Global Database of Events, Language, and Tone] website
#censure #copyright_madness #recherche #université
(if you have any details, I would like to understand)
hmmm #merci; so we still don’t know whether this is another #Aaron_Swartz affair or something else...
The Cartography of Bullshit » AFRICA IS A COUNTRY
►http://africasacountry.com/the-cartography-of-bullshit
With the gutting of foreign coverage by most U.S. newspapers and the need to populate infinite Web space with content, a new creature has emerged: the foreign affairs blogger. Max Fisher, who hosts the Washington Post’s WorldViews page, is a leading exemplar of the species. Fisher’s newsy nuggets are often low-priority zeitgeist items that may or may not be vignettes of greater themes: examples in recent days include the tunnel-smuggled delivery of KFC chicken into Gaza, the video of the Czech president possibly drunk, a staff-passenger brawl at Beijing airport, and New Zealand’s “war on cats.”
This comment by Tommy Miles on November 11, 2013:
It is not an accident that maps are the vehicle for this particular b*llsh%t. I had a similar experience recently that might provide some context.
The Washington Post website, like the Atlantic, Salon and other US web heavyweights are in the business of attracting clicks: instant attention to an easily understood “interesting factoid”. The most magnetic of these are those that are salacious (Miley Cyrus), or gruesome (the war in Syria, perhaps), or some factoid that gives dramatic confirmation or contradiction of “conventional wisdom”. If scandalous or horrifying photographs are the best tool for the first of these, maps — or the newly invented “Infographics” — are the best tools for instantly communicating some “believe it or not” headline. Mapped data is invariably numerical data graphed to an image of the planet, subdivided by nation states. The subdivisions used will themselves reinforce certain assumptions; unchallenged nonsense that says the important divisions between people are state borders, not classes, communities, etc., and that people on one side should all be pretty much the same and pretty much different from everyone on the other side of that artificial line. Most anything unexpected can be placed here and will elicit an “I assumed this all along” reaction from the reader. People in Canada have more Dogs than we do? That makes sense because it’s cold. Or they want to be more like the Inuit. Or Canadians like strict hierarchies. Or some other b*llsh%t.
You mention this observation over coffee the next day at work and you sound erudite, you silently classify the writer as someone who helps the world make sense, and the Washington Post has a reliable producer of advertising views.
That’s why these sort of articles are written. But what’s the source of the particular subjects which end up in the “interesting factoid” bin? Who’s discovering these “data points” harvested by the popular columnist? This is where — to me — this gets interesting.
I looked closely at another of these “surprising map” articles a couple of months ago “Mapped: What every protest in the last 34 years looks like.” in Foreign Policy magazine, which was then picked up by dozens of websites. It worked really well, because it confirmed Western fears of a world of increasing danger, and it also was big among the US left, because it confirmed their hope of a world of increasing resistance. Except it was fatally flawed nonsense.
The source of the data was the more interesting bit: it came from a grad student working for professors who put together the Texas based “Global Database of Events, Language, and Tone (#GDELT)”, a database of “events” everywhere in the world since 1979, culled from newswires and then classified with “codified emotional and thematic indicators”. So an event might be tagged as involving “a Muslim student dissident” as “XXXOPPMOSEDU” who “Engaged in material cooperation, not specified below” as code “060” with “the O’odua Peoples Congress (a Yoruba rebel group)” as “NGAYRBREB” at a particular geocode in Nigeria at a particular time.
So why would people want to keep that information? Well it turns out that GDELT is an open source alternative to the pre-existing classified database from the US Department of Defense called the “ICEWS”. The US DOD Worldwide Integrated Crisis Early Warning System (W-ICEWS), designed originally by DARPA, and more recently expanded by Lockheed Martin Corporation.
The three academics behind the GDELT are not DoD staffers, but are producing much the same thing for a similar audience, writing extensively on “disorder” and “terrorism.” One developed “a groundbreaking virtual reality rapid prototyping and design environment that was used by the University of Illinois Department of Architecture continuously for two and a half years, by the United States Army”, while another has been funded by “the Defense Advanced Research Projects Agency, and the U.S. government’s multi-agency Political Instability Task Force.”
The more you look, the more you see there is a huge industry, funded by or servicing the US government military and “Homeland Security”, the financial resources of which dwarf most university Political Science departments.
So b*llsh%t maps are propelled by something deeper: there is a surge in funding for “data” used to explain the world, specifically to explain the world to the United States military and intelligence agencies. More and more academics are party to this, and so that data is used in public research, not just secretly in the Pentagon, where post-9/11 assessments of intelligence failures and huge pressure to shift to private subcontracting have moved more of the work offsite. That public research is dashed across the internet in press releases, mined for linkbait on “news sources”, that are now just websites with lots of pictures. To grab your eyeballs, what works better than maps? As a species of infographic, it is much more arresting than a column of numbers can ever be.
Prepare yourself, then, for more b*llsh%t.
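The actor codes quoted in the comment above (“XXXOPPMOSEDU”, “NGAYRBREB”) look like concatenated three-character segments. The sketch below splits them on that assumption. The glosses are guesses inferred only from the descriptions given in the comment (“a Muslim student dissident”, “a Yoruba rebel group” in Nigeria), not from an official codebook, and the function names are hypothetical.

```python
# Glosses inferred from the two examples quoted in the comment above.
# They are guesses for illustration, not an official codebook.
GUESSED_GLOSSES = {
    "OPP": "opposition / dissident",
    "MOS": "Muslim",
    "EDU": "education / student",
    "NGA": "Nigeria",
    "YRB": "Yoruba",
    "REB": "rebel",
}


def split_actor_code(code, width=3):
    """Cut a concatenated actor code into fixed-width segments."""
    if not code or len(code) % width != 0:
        raise ValueError(f"{code!r} is not a multiple of {width} characters")
    return [code[i:i + width] for i in range(0, len(code), width)]


def describe(code):
    """Pair each segment with its guessed gloss, if the comment suggests one."""
    return [(seg, GUESSED_GLOSSES.get(seg, "no gloss given"))
            for seg in split_actor_code(code)]


for actor in ("XXXOPPMOSEDU", "NGAYRBREB"):
    print(actor, describe(actor))
```

The leading “XXX” segment is left without a gloss because the comment does not explain it.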
Can Machines Learn to Predict a Violent Conflict? - Chris Perry
▻http://theglobalobservatory.org/analysis/633-can-machines-learn-to-predict-a-violent-conflict.html
Automated early warning systems can help NGOs and IOs in a number of ways. They can help organizations develop an evidence base to create the political will to do preventative work to intervene or mitigate negative effects of large-scale conflict as tensions ramp up. In the case of predicting conflict, organizations can use early warning risk assessments for better planning and try to target non-conflict interventions that have conflict-mitigating knock-on effects in high-risk areas.
Yet there are relatively few examples of systematic attempts to create open source tools to forecast violent conflict. Instead, existing efforts to use statistical forecasting are 1) classified, 2) proprietary and very expensive, or 3) rudimentary, often relying heavily on data of violent occurrences as the primary source of information about trends in violence. (...)
As part of IPI’s new Data Lab project, we have been looking at ways to leverage data science methods into our policy research on peace, security, and conflict prevention. One area of research over the last year has been to research the application of machine learning specifically to the conflict prevention and early warning problems.
Data Map Shows Protests Around the World Increase, With Caveat - Jill Stoddard
▻http://theglobalobservatory.org/analysis/576-mapping-some-of-the-worlds-unrest-.html
But, with data comes problems. Commenters on the Foreign Policy post quickly pointed out what they saw were the map’s limitations. “Maybe I’m being nit-picky, but I’m just looking at Latin America and I can tell you this is severely underestimating the number of protests, both now and prior to the third wave of democratization,” wrote one user. “Clearly, wrong!! In México we have an permanente[sic] wave of protetest [sic] since 1950 and in the map only appears since 1994,” wrote another.
#cartographie visualisation #guerre_et_paix #impressionnant Chris Perry ...
Today’s Complex Data Can Help Predict the Future’s War Zones, by Jonathan Keats - Wired
▻http://www.wired.com/dangerroom/2013/10/afghanwarmaps
#GDELT
FOR MUCH OF the past decade, Afghanistan’s remote Faizabad district remained out of the Taliban’s reach. But the northeastern region has become a target for violence—and according to a new predictive model, Faizabad could get dicey by mid-2014.
Led by PennState political scientist Philip Schrodt, a team of researchers developed the Global #Database of Events, Language, and Tone to scrape news from the Internet—the BBC, yes, but also hyperlocal sources around the world (...)
The map below (...) forecasts conflict levels in #Afghanistan for June 2014.
Comparing #GDELT and #ICEWS Event Data | mdwardlab.com
▻http://mdwardlab.com/biblio/comparing-gdelt-and-icews-event-data
GDELT and ICEWS are arguably the largest event data collections in social science at the moment. During their brief existence they have also been among the most influential data sets in terms of their impact on academic research and policy advice. Yet, we know little to date about how these two repositories of event data compare to each other. Given the nascent existence of both GDELT and ICEWS event data, it is interesting to compare these two repositories of event data. We undertake such a comparison for fighting in Syria, and for protest behavior in Egypt and Turkey, from 2011 to the present.
#GDELT: What can we learn from the last 200 million things that happened in the world? | War of Ideas
▻http://ideas.foreignpolicy.com/posts/2013/04/10/what_can_we_learn_from_the_last_200_million_things_that_happene
The excitement over Global Data on Events, Location, and Tone - to give its full name — is understandable. The singularly ambitious project could have a transformative effect on how we use data to understand and anticipate political events.
Essentially, GDELT is a massive list of important political events that have happened — more than 200 million and counting — identified by who did what to whom, when and where, drawn from news accounts and assembled entirely by software. Everything from a riot over food prices in Khartoum, to a suicide bombing in Sri Lanka, to a speech by the president of Paraguay goes into the system.
Similar event databases have been built for particular regions, and DARPA has been working along similar lines for the Pentagon with a project known as ICEWS, but for a publicly accessible program (you can download it here though you’ll need some programming skills to use it) GDELT is unprecedented in its geographic and historic scale. The database updates with new events every night following the day’s news and while it currently goes back to 1979, its developers are working on adding events going back as far as 1800 according to lead author Kalev Leetaru, a fellow at the University of Illinois Graduate School of Library and Information Science.
#histoire #politique #conflits #data via @francoisbriatte
#GDELT
►http://eventdata.psu.edu/data.dir/GDELT.html
GDELT package for #R
▻http://cran.r-project.org/web/packages/GDELTtools
Guardian datablog
▻http://www.theguardian.com/news/datablog/2013/apr/12/gdelt-global-database-events-location
▻https://willopines.wordpress.com/2013/04/11/excitement-about-gdelt-and-some-personal-intellectual-history
Quantifying memory
▻http://quantifyingmemory.blogspot.co.uk/2013/04/big-geo-data-visualisations.html
Mapping with GDELT
▻http://nbviewer.ipython.org/urls/raw.github.com/dmasad/GDELT_Intro/master/GDELT_Mapping.ipynb
Mapping Syria’s conflict
►http://syria.newscientistapps.com