Into the Hashtags

Social network diagram for period 7 from Beyond the Hashtags

This is a post in a semi-regular series about project developments on the Documenting the Now project. Please feel free to add any feedback or questions here in comments or by joining us over in Slack.

It was extremely fortunate that, just as we began our work on DocNow, two members of our advisory board, Deen Freelon and Meredith Clark, along with their colleague Charlton D. McIlwain, published their landmark report Beyond the Hashtags: #Ferguson, #BlackLivesMatter and the Online Struggle for Justice. To be fair, credit for this good fortune should really go to Bergis, who put a great deal of thought and planning into who to invite to the Documenting the Now board.

One of the first things we did as a team was dive in and give the report a close read. We also invited Deen and Meredith to speak about their work at one of our meetings. While much is still up in the air about how exactly the DocNow application is going to work, one thing is certain: Beyond the Hashtags is going to deeply influence our approach. Here’s why.


First and foremost, Beyond the Hashtags is a great example for us because it is accessible. Even with its technical focus, the report uses clear, non-jargony language and a compelling presentation to reframe the debate around a perceived decade-long decline in youth civic engagement. This is not your average, run-of-the-mill scholarly research article: it has a purpose and a mission. We want the same to be true for DocNow.

At its core, Beyond the Hashtags provides insight into a dataset of 40.8 million tweets related to the BlackLivesMatter movement that were purchased from Twitter. The authors’ analysis combines quantitative and qualitative methods to explore the communities of participants during nine distinct periods between June 1, 2014 and May 31, 2015. The quantitative methods include Freelon’s own Twitter Subgraph Manipulator, a Python library that implements an algorithm known as the Louvain method, to extract community information from the extensive Twitter network graph.

We want DocNow to work with this scale of data, for multiple users, and to provide similar functionality for partitioning the graph into communities over time. We think it’s going to be extremely important to explore communities in gathered Twitter data, particularly as we begin to examine the media and URLs being shared, in order to evaluate them for archiving.
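As a rough illustration of what "partitioning the graph into communities over time" involves, here is a minimal sketch of the step that comes before community detection: bucketing tweets into periods and building one weighted retweet graph per period. It does not implement the Louvain method itself, and it is not how Twitter Subgraph Manipulator or DocNow does it. The period boundaries, the field names (`user`, `retweet_of`, `day`) and the function names are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical period boundaries (label, first day, last day), inclusive.
# These are NOT the nine periods used in the report.
PERIODS = [
    ("2014-06", date(2014, 6, 1), date(2014, 6, 30)),
    ("2014-07", date(2014, 7, 1), date(2014, 7, 31)),
]


def period_for(day, periods=PERIODS):
    """Return the label of the period containing `day`, or None."""
    for label, start, end in periods:
        if start <= day <= end:
            return label
    return None


def retweet_graphs(tweets, periods=PERIODS):
    """Build one weighted edge list per period.

    Each edge is (retweeting user, retweeted user) and its weight is
    the number of retweets between that pair within the period.
    """
    graphs = defaultdict(lambda: defaultdict(int))
    for tweet in tweets:
        label = period_for(tweet["day"], periods)
        # Skip tweets outside every period and tweets that are not retweets.
        if label is None or tweet.get("retweet_of") is None:
            continue
        graphs[label][(tweet["user"], tweet["retweet_of"])] += 1
    return {label: dict(edges) for label, edges in graphs.items()}


# Made-up sample data using the assumed field names.
sample = [
    {"user": "a", "retweet_of": "b", "day": date(2014, 6, 3)},
    {"user": "a", "retweet_of": "b", "day": date(2014, 6, 9)},
    {"user": "c", "retweet_of": "b", "day": date(2014, 7, 2)},
    {"user": "d", "retweet_of": None, "day": date(2014, 7, 4)},
]
graphs = retweet_graphs(sample)
```

Each per-period edge list could then be handed to a community detection implementation, and the resulting communities compared from one period to the next.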

It is also important for DocNow to work with purchased datasets like the one undergirding Beyond the Hashtags. Unfortunately, it is not always possible to collect data in real time from Twitter because of constraints on its public search and streaming APIs, so analysis often has to be performed on historical data instead. DocNow needs to work in this modality with both data acquired from Twitter and research datasets of tweet identifiers.


Another signpost Beyond the Hashtags provides us is its use of participant interviews to explore what motivated the different communities of participants, and why they engaged on Twitter. Without this qualitative dimension it would’ve been difficult, if not impossible, to know how to meaningfully label and interpret the discovered network graph.

This attention to the personal stories found amongst the data can also be seen in the report’s approach to ethics and privacy. Since Beyond the Hashtags is a public, open access document the authors took several steps to protect the Twitter users that they draw attention to. These techniques are discussed in Appendix A and include:

  • only linking to tweets, instead of reproducing their full text, which allows users to delete them
  • linking only to tweets that were widely shared (more than 100 retweets)
  • linking only to tweets from prominent users who are followed by more than 3000 users
  • not making their dataset available to the public, since doing so would violate Twitter’s Terms of Service
  • embargoing their dataset of tweet identifiers for a year to give people an opportunity to delete their data
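To make these rules concrete, here is a minimal sketch of how they might be expressed as a filter. It is only an illustration under stated assumptions, not the authors' procedure and not a DocNow feature: the field names (`retweets`, `followers`, `user`, `id`) and function names are hypothetical, the two thresholds are assumed to apply together, and the embargo is assumed to run 365 days from the end of the collection window.

```python
from datetime import date, timedelta

MIN_RETWEETS = 100     # link only to tweets with more than 100 retweets
MIN_FOLLOWERS = 3000   # ...from users with more than 3000 followers
EMBARGO = timedelta(days=365)  # assumption: "a year" taken as 365 days


def may_link(tweet):
    """True if the tweet clears both thresholds (assumed to be combined)."""
    return (tweet["retweets"] > MIN_RETWEETS
            and tweet["followers"] > MIN_FOLLOWERS)


def citation(tweet):
    """Return a link to the tweet, never its text, or None if it is filtered out.

    Linking rather than quoting means a deleted tweet disappears
    from the publication as well.
    """
    if not may_link(tweet):
        return None
    return f"https://twitter.com/{tweet['user']}/status/{tweet['id']}"


def ids_release_date(collection_end):
    """Earliest date the tweet identifiers could be shared (assumed rule)."""
    return collection_end + EMBARGO


# Made-up tweets using the hypothetical field names.
widely_shared = {"user": "example", "id": 1, "retweets": 250, "followers": 5000}
low_reach = {"user": "someone", "id": 2, "retweets": 12, "followers": 40}

links = [citation(t) for t in (widely_shared, low_reach)]
release = ids_release_date(date(2015, 5, 31))
```

Only the first sample tweet yields a link; the second is left out because it is neither widely shared nor from a prominent account.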

At this stage it’s not entirely clear whether these specific features will be built into DocNow. But thinking that puts the concerns of content creators first is going to be essential to us as we design DocNow. Gnip (a subsidiary of Twitter) calls this Honoring User Intent, which seems like a useful way to characterize these design considerations. Here are some of the specific affordances, or as Katie Shilton calls them, value levers, that we’ve begun to discuss in Slack:

  • visibility: is it possible to obfuscate or anonymize content to protect users?
  • transparency: can users easily find out why their tweets are being collected and who is doing it?
  • consent: can we design in functionality that will allow content creators to opt-out, opt-in or license their content that is being collected?

We’re on the lookout for examples of functionality or creative hacks in this area, such as: Storify notifications when a user’s tweets are included in a story, ArchiveSocial’s use of authentication to grant permission to archive, Lentil’s ability to post donor agreements in comments on Instagram, and Social Feed Manager’s approach of running in a local context where data access can be managed. If you know of more examples we’d love to hear from you.
