Interesting stuff from Jonathan Stray of the Nieman Journalism Lab. These are the people who gave us that nifty research on "The Meme Tracker," which revealed the "shape" of the news cycle and the back-and-forth between blogs and the MSM:
Jonathan Stray did a very smart analysis for Nieman Journalism Lab, looking at a universe of 800 stories about the alleged involvement of two Chinese universities in hacking attacks on Google. His findings were striking: 800 stories = 121 non-identical stories = 13 stories with original quotes = 7 fully independent stories.
Stray coded the 121 non-identical stories that had been clustered together by Google (the clustering algorithms are good but not perfect: nine stories were unrelated to the specific case of these two universities) and looked for the appearance of novel quotes, which he considered the “bare minimum” standard for original reporting. (Interestingly, it’s the same logic that led Jure Leskovec to use quotes to track the flow of media in MemeTracker.) Only 13 of the stories contained quotes not taken from another media source’s report. The essence of Stray’s piece is the question, “What were those other 100 reporters doing?” The answer, unfortunately, is that they were rewriting everyone else’s stories.
Stray's full study is published at the Nieman Journalism Lab.