Prototyping in Public

A user story map from our meeting in St Louis on August 23, 2016

There’s an old adage in the software development world that you should plan to throw one away. You often need to be able to explore a design space and learn about the particular constraints and challenges involved before committing resources to a specific architecture and plan of action. Or as Fred Brooks said:

In most projects, the first system built is barely usable. It may be too slow, too big, awkward to use, or all three. There is no alternative but to start again, smarting but smarter, and build a redesigned version in which these problems are solved. The discard and redesign may be done in one lump, or it may be done piece-by-piece. But all large-system experience shows that it will be done. (The Mythical Man Month, p. 116)

It's in this spirit that we've spent the last five months working on a prototype that we've told you a little bit about already.

Building dnflow let us explore how to collect data from Twitter and from the Web, while performing some modest analysis on the collected data and presenting the results to the user. Our focus wasn’t on a visual design so much as it was on a user interaction pattern that would allow someone to quickly and iteratively collect and analyze data. It also gave us an opportunity to learn how to work as a distributed team in three different time zones.

Brooks goes on to say:

Delivering that throwaway to customers buys time, but it does so only at the cost of agony for the user, distraction for the builders while they do the design, and a bad reputation for the product that the best redesign will find hard to live down.

I guess you could say we parted ways with Brooks on this one by making the dnflow prototype available to you for the past two weeks. Hopefully it didn’t cause any agony or give us too much of a bad reputation.

We needed to see what feedback we could get from the social media archiving and research communities that we are trying to tap into. Helping build a community of practice around social media archiving is Goal #1 in our project. Any particular technical artifact that gets generated is in the service of that larger goal. So sharing this prototype was an extremely important communication tool for us.

Since we made it available two weeks ago we’ve seen over 2 million tweets collected, in 234 different datasets created by 44 users. We even saw a few intrepid folks use our Ansible playbook to set up a local instance of dnflow, which was unexpected, but very cool. This post is just a brief recap of what we’ve learned from you in the last two weeks on Twitter, Slack, GitHub and email.

Permissions & Trust

  • allow users to publish datasets or keep them private
  • add the ability to delete datasets
  • filter tweets from protected accounts when publishing
  • allow the view of datasets to be limited to ones you have created
  • clarify the time span that data was collected over
  • allow users to express a data retention policy for their datasets
  • provide ways for the values of the researcher to be communicated to potential users of their datasets

Data Acquisition

  • expose the ability to search in one or many languages
  • collect the replies to tweets that were part of the initial collection
  • collect from the stream in real time
  • show the number of tweets pulled in the list of datasets
  • provide a visual cue when a user is rate limited by the Twitter API
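On that last point, here is one way a rate-limit cue could be derived. This is a minimal sketch of our own, not dnflow's actual implementation; it reads Twitter's documented `x-rate-limit-remaining` and `x-rate-limit-reset` response headers, and the warning threshold and message wording are illustrative choices.

```python
import time

def rate_limit_status(headers, warn_threshold=10, now=None):
    """Return (limited, message) from Twitter API rate-limit headers.

    `limited` is True only when no calls remain; otherwise a warning
    message is produced as the remaining budget approaches zero.
    """
    remaining = int(headers.get("x-rate-limit-remaining", -1))
    reset = int(headers.get("x-rate-limit-reset", 0))
    now = now if now is not None else time.time()
    if remaining < 0:
        return False, "rate limit unknown"
    if remaining == 0:
        wait = max(0, reset - now)
        return True, "rate limited: retry in %d seconds" % wait
    if remaining <= warn_threshold:
        return False, "approaching rate limit: %d calls left" % remaining
    return False, "ok"
```

A UI could poll this after each API response and surface the message whenever `limited` is true or a warning is returned.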

Reporting and Visualization

  • remove the hashtag you enter from the popular hashtags chart to reduce skewing
  • view languages present in the collected data
  • view geographic places mentioned in the dataset
  • see the most replied to users
  • show a grid of profiles for most mentioned users
  • unshorten URLs prior to counting them
  • allow users to explore the long tail of results (e.g. next 20)
  • allow users to see all results in particular areas
  • horizontal bar graphs for long lists
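The first item above, excluding the seed hashtag from the popular-hashtags chart, could look something like the sketch below. The tweet field names mirror the Twitter API's `entities.hashtags` structure; the function itself is a hypothetical illustration, not dnflow code.

```python
from collections import Counter

def top_hashtags(tweets, seed, n=10):
    """Count hashtags across tweets, case-insensitively, while
    excluding the seed hashtag the user originally searched for."""
    seed = seed.lstrip("#").lower()
    counts = Counter(
        tag["text"].lower()
        for tweet in tweets
        for tag in tweet.get("entities", {}).get("hashtags", [])
        if tag["text"].lower() != seed
    )
    return counts.most_common(n)
```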

Curation

  • download report/data as a zip package with metadata
  • hide/remove content from a dataset (e.g. pornography)
  • see tweets that are related to extracted images for context
  • allow collected webpages and media to be downloaded as a WARC file for provenance
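The zip-package idea above could be sketched roughly as follows. The metadata fields and file names here are illustrative assumptions, not dnflow's actual schema; the package ships tweet ids rather than full tweets, which is a common practice for sharing Twitter datasets.

```python
import io
import json
import zipfile
from datetime import datetime, timezone

def package_dataset(name, search, tweet_ids):
    """Bundle a dataset's tweet ids plus a metadata file into a zip
    archive, returned as bytes suitable for a download response."""
    meta = {
        "dataset": name,
        "search": search,
        "tweet_count": len(tweet_ids),
        "created": datetime.now(timezone.utc).isoformat(),
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("metadata.json", json.dumps(meta, indent=2))
        zf.writestr("tweet-ids.txt", "\n".join(str(i) for i in tweet_ids))
    return buf.getvalue()
```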

Infrastructure

  • make the Ansible playbook as seamless as possible for local/remote deployment

This isn't meant to be an exhaustive list of the things we've learned, but it does cover some of the big issues. If you feel like we missed anything please leave your comments here, or come find us in Slack. We anticipate that addressing some of these ideas will likely require significant architectural changes — but that's why we built this prototype. We may throw away some of the code, but you can be sure we're not going to throw away the ideas we learned from you during this process over the last two weeks. Thank you very much for the contributions you've made and for reading about our progress!

We are going to be retiring the prototype tonight at 8PM EDT. We’ll keep the OS image around for later study. If you would like us to make it available for any of your work please let us know and we can bring up an instance for you.


The core development team is getting together in person at the Maryland Institute for Technology in the Humanities on September 29–30 to digest many of the design ideas that came up during our meeting in St Louis a few weeks ago. If you have any interest in joining us for all or part of this meeting please get in touch. We can accommodate more people at MITH, and will try to make the meeting available to remote participants if there is interest.
