Via The Guardian, an article that all web writers should read: Wikileaks Iraq: how to visualise the text. Excerpt:
In October, WikiLeaks released 391,832 reports from the Iraq War, the most comprehensive set of documents about the conflict to date. Each is a report of a specific incident in the Iraq war.
At the rate of one document per minute, it would take 272 days non-stop to read every report -- and you still might miss the big picture. This is exactly the sort of problem where visualization can help, by turning patterns in the documents into patterns in a picture.
The Guardian and others had already created visualizations by plotting the incident locations on a map of Iraq, and by graphing monthly casualties.
My Associated Press colleague Julian Burgess and I wanted to go a step further, by designing a visualization based on the richest part of the report: the summary text, a human-readable description of what actually happened. But how?
We pulled out our data mining textbooks and started experimenting, eventually settling on a technique that extracts the key words from each document. A document's key words appear frequently in that document, but rarely in all others.
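The excerpt does not name the technique, but scoring words that are frequent in one document and rare across the rest is the general idea behind TF-IDF weighting. As a rough sketch only, and not the authors' actual code, something like the following could pick key words. The function name `key_words`, the whitespace tokenizer, and the sample reports are all made up for illustration.

```python
import math
from collections import Counter


def key_words(documents, top_n=3):
    """Return the top_n highest-scoring words for each document.

    Score = (share of the document's words that are this word)
            * log(total documents / documents containing the word).
    A word that appears in every document scores zero.
    """
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in documents]

    # Count how many documents each word appears in.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    n_docs = len(documents)
    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores = {
            word: (count / len(tokens)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        results.append(ranked[:top_n])
    return results


if __name__ == "__main__":
    # Invented sample text, not taken from the released reports.
    reports = [
        "ied explosion on route near convoy",
        "small arms fire on patrol near checkpoint",
        "detainee transfer on route to checkpoint",
    ]
    for report, words in zip(reports, key_words(reports)):
        print(words, "<-", report)
```

A word such as "on", which occurs in every sample report, gets a score of zero and falls to the bottom of the ranking, while words unique to a single report rise to the top.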
The graphics in the article are worth a lot more than a thousand words each.



