SH-A Labels

These labels are intended for archivists and academics building social media collections. Some apply at the item (post) level, others to the dataset as a whole, and some at both levels.

Item-Level Labels

This account is likely a bot.

Account created shortly before data collection.

Content Warning

Violent imagery, hate speech, etc.

This account claims to be someone it is not.

Advertising or other irrelevant content.

Community Control

This account has many authors.

Take Care

Take extra care in using this content.


Unverifiable, false, misleading, or misinformed.

Semi-Private Space

Users did not consider this to be a public statement.

Privileged Information

This content cannot be accessed by all users.

Collection-Level Labels

Opt Out Provided

Users can opt out of the collection or remove their content from it.

Outreach Conducted

We have publicly shared information about the data collection and contact options.

Part of a Whole

Dataset is part of a larger collection of diverse materials.
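
As an illustration only, the split between item-level and collection-level labels could be recorded in a small data structure like the sketch below. It is a minimal example under stated assumptions, not part of the label set itself: the names `Level`, `LABEL_LEVELS`, `Post`, `Collection`, and `apply_label` are hypothetical, only label names that appear in the lists above are included, and each label is assigned to the level of the section it is listed under. The text does not say which labels apply at both levels, so none are marked that way here.

```python
from dataclasses import dataclass, field
from enum import Enum


class Level(Enum):
    ITEM = "item"              # a single post
    COLLECTION = "collection"  # the dataset as a whole


# Assumption: each label is allowed only at the level of the section it
# is listed under. A label usable at both levels would list both values.
LABEL_LEVELS = {
    "Content Warning": {Level.ITEM},
    "Community Control": {Level.ITEM},
    "Take Care": {Level.ITEM},
    "Semi-Private Space": {Level.ITEM},
    "Privileged Information": {Level.ITEM},
    "Opt Out Provided": {Level.COLLECTION},
    "Outreach Conducted": {Level.COLLECTION},
    "Part of a Whole": {Level.COLLECTION},
}


@dataclass
class Post:
    post_id: str
    labels: set = field(default_factory=set)


@dataclass
class Collection:
    name: str
    labels: set = field(default_factory=set)
    posts: list = field(default_factory=list)


def apply_label(target, label):
    """Attach a label to a post or collection, rejecting the wrong level."""
    if label not in LABEL_LEVELS:
        raise ValueError(f"unknown label: {label}")
    level = Level.ITEM if isinstance(target, Post) else Level.COLLECTION
    if level not in LABEL_LEVELS[label]:
        raise ValueError(f"{label!r} is not a {level.value}-level label")
    target.labels.add(label)
```

A post would then be labeled with `apply_label(Post("p1"), "Take Care")`, while applying the same label to a `Collection` raises a `ValueError`.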