Analyzing Twitter Data on Ferguson

The grand jury's decision not to indict Officer Darren Wilson on murder charges was the first historic event I followed on twitter. I felt helpless, anxious, and inspired as I read the feeds. After a few hours it occurred to me that someone should be archiving this information, but I couldn't be sure anyone was. How many people do "digital history/humanities" work anyway? So a few hours after the decision was announced I activated an archive on the online tool Tweet Archivist to collect all of the tweets on #Ferguson and #MikeBrown. I have now made that archive public on the site Figshare. What follows are some suggestions for how scholars might interact with this twitter data.

I have some experience working with tweets as sources. After the death of Nelson Mandela I developed a qualitative method to analyze how people were speaking about and remembering Madiba on twitter. You can find a detailed description of this method, and the short products it inspired, here. My approach was simply to cull through all of the material, identify trends, and tag the tweets. Then I searched my tags to see how people were talking about Mandela after his death.
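
For readers who would rather script that tagging step than do it by hand, here is a minimal Python sketch of the idea. The tweets, keywords, and tag names are invented for illustration; the method is simply keyword search plus tagging plus filtering by tag.

```python
# A minimal sketch of the tagging approach: search tweet text for a
# keyword associated with a trend, record a tag, then filter by tag.
# The tweets and trend keywords below are hypothetical examples.
tweets = [
    {"text": "Madiba taught us that forgiveness is strength", "tags": []},
    {"text": "RIP Nelson Mandela, a true freedom fighter", "tags": []},
]

# Trends identified by reading through the material, mapped to tags.
trends = {"forgiveness": "reconciliation", "freedom": "liberation"}

for tweet in tweets:
    for keyword, tag in trends.items():
        if keyword in tweet["text"].lower():
            tweet["tags"].append(tag)

# "Searching my tags": pull out every tweet carrying a given tag.
reconciliation_tweets = [t for t in tweets if "reconciliation" in t["tags"]]
print(len(reconciliation_tweets), "tweets tagged 'reconciliation'")
```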

I used Tweet Archivist to collect my tweets. The site requires a subscription, but in cases where collection is time sensitive I think it is worth the money. The site exports tweets as pdf or xlsx files, and it has several built-in analytic tools. I used the built-in word cloud generator to identify jumping off points. I then performed in-text searches to find the tweets that mentioned those words and created a new column to tag them. I used the program OpenRefine for this, because it allows users to search texts and batch edit. For information on using OpenRefine, see Thomas Padilla's great resource, Getting Started with OpenRefine.
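
Those same jumping-off points can also be generated straight from the xlsx export with a short script. A sketch in Python with pandas; the filename and the "text" column name here are assumptions, so check the actual column names in your own export.

```python
from collections import Counter
import re

import pandas as pd

# Load a Tweet Archivist xlsx export. The filename and the "text"
# column name are assumptions; inspect df.columns for your file.
df = pd.read_excel("ferguson_archive.xlsx")

words = Counter()
for text in df["text"].dropna():
    # Lowercase, then keep words and hashtags only.
    words.update(re.findall(r"#?\w+", text.lower()))

# The most frequent terms serve the same role as the word cloud:
# jumping-off points for closer qualitative reading.
for word, count in words.most_common(25):
    print(f"{word}\t{count}")
```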

Once I uploaded my Madiba dataset to OpenRefine, I could search the tags and spend more time analyzing the data in those tweets. But that dataset is much smaller than the Ferguson dataset I collected. One approach to analyzing the Ferguson data might be to limit qualitative searches to specific time frames. Tweet Archivist packages the tweets it collects in bundles of about 500,000, and these are arranged chronologically in the dataset. Researchers interested in analyzing reactions to specific newscasts that might be biased or inflammatory could look only at the tweets posted immediately after the report aired, then use OpenRefine to tag the relevant tweets.
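
That time slicing can also be scripted before the data ever reaches OpenRefine. A hedged pandas sketch, again assuming hypothetical column names ("created_at", "text") and an invented broadcast time:

```python
import pandas as pd

df = pd.read_excel("ferguson_archive.xlsx")

# Parse the timestamp column; the name "created_at" is an assumption.
df["created_at"] = pd.to_datetime(df["created_at"])

# Hypothetical example: a report airs at 10 pm; keep the next hour
# of tweets for close qualitative reading.
aired = pd.Timestamp("2014-11-25 22:00:00")
window = df[(df["created_at"] >= aired) &
            (df["created_at"] < aired + pd.Timedelta(hours=1))]

# Export the slice for tagging in OpenRefine.
window.to_csv("tweets_after_broadcast.csv", index=False)
```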

[Image: #MikeBrown top words, Tweet Archivist word cloud, 2015-02-22]

Since the data is hosted on Figshare, those useful Tweet Archivist tools won't be available. Researchers interested in using this dataset will have to use their preferred program for creating word clouds. There are certainly many options out there, but I think AntConc is a particularly useful tool because it allows you to perform many different kinds of analysis. A big-picture concordance analysis of this dataset might yield some compelling information. A researcher could create plots that examine which words most often accompanied a word like "cop" or "fire" to closely examine how people talked about the protests that followed the Ferguson decision. Points that stand out can be examined again with a more qualitative lens. Micki Kaufman uses AntConc's concordance function to great effect in her Quantifying Kissinger project.
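
AntConc is a point-and-click tool, but a similar concordance pass can be sketched in Python with NLTK for anyone who prefers to script it. The filename, column name, and search term below are assumptions, not part of the dataset's documentation.

```python
import pandas as pd
from nltk.tokenize import word_tokenize  # requires the nltk "punkt" data
from nltk.text import Text

df = pd.read_excel("ferguson_archive.xlsx")

# Tokenize all of the tweet text into one stream of words.
tokens = []
for text in df["text"].dropna():
    tokens.extend(word_tokenize(text.lower()))

corpus = Text(tokens)

# Keyword-in-context lines for "cop", analogous to the hits behind
# an AntConc concordance plot.
corpus.concordance("cop", width=79, lines=20)

# Pairs of words that frequently appear together in the corpus.
corpus.collocations()
```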

I encourage anyone who is interested to work with this dataset. It would also be helpful if scholars and institutions that have data on Ferguson linked to one another to raise the visibility of this data on the web. Tweet a link to @bradsh41, or leave a comment if you have a similar dataset you would like me to link to on my Figshare project.


Comments

5 responses to “Analyzing Twitter Data on Ferguson”

  1. Jack Hennes

    Thanks for this, Joseph. I recently learned how to archive tweets, and it's always helpful to see the tools and methods others are using. If you're interested, TAGS offers a way to scrape twitter data directly into a Google Sheet, which makes it easy to tabulate statistics, create graphs, and so on. I believe you can scrape up to 5,000 tweets at once. There's also a feature to visualize the data, which I find useful. You can make Google Sheets public, too.

    I have a blog post on how to set up TAGS: http://www.jackhennes.com/archives/321

    You can access TAGS here: https://tags.hawksey.info

    — J.H.

    1. Joseph Bradshaw

      Jack, this looks like a great tool. Thanks for sharing the link. After I wrote this blog I found a blog post by Ed Summers at MIRTH with information on twitter's policy on scraping, archiving, and sharing data. He mentions another tool called GNIP, recently acquired by twitter, that lets you get at tweets more than 30 days old. You can read more from Ed Summers here: http://mith.umd.edu/miths-ed-summers-discusses-ferguson-twitter-archive/

      1. Ethan Watrall

        MITH (Maryland Institute for Technology in the Humanities)…not MIRTH…though, that's pretty funny

        1. Joseph Bradshaw

          They should add an R and make it MIRTH: the Maryland Institute for Research, Technology and the Humanities. U of M, if you are looking for a DH acronym…you're welcome

  2. Jack Hennes

    Thanks, Joseph! Getting those older tweets is really going to be an asset…also, 13,480,000 tweets is just…astounding.

    Anyway, I look forward to following your work!
