Ethics of digital data collection: The debate continues

The conversation around digital data collection and the ethics behind it often defaults to the rules and laws that exist for “face-to-face” data collection: if it’s in a public arena, then the rules are the same as for observing people in physical public spaces. However, as many who work with digital data know, the idea of “public” can vary in the virtual sphere. Further, questions have been raised about whether we as researchers have the right to use posts and other digital artifacts if users posted them without expecting that researchers would discover and use them without their consent. Basically, if someone knew that their content would eventually be used in research, would they have posted it at all? It’s the digital Hawthorne Effect.

This brings up a few issues with studying online communities, particularly on Reddit, where communities have the option to go private, and the larger issue of the Discord servers, IRC channels, Slack channels, etc. where members congregate. This in turn raises the issue of gaining consent from every single member who frequents those closed-access communities; even on Facebook there are a number of closed groups that require membership to view their content. Although we can all agree that when something is closed it is no longer in the “public sphere” of the Internet, there’s one thing that I’ve grappled with when looking for things using hashtags: if I happen to come across, say, an Instagram account because they used a hashtag once, can I then look at their account and use their other posts if none of those posts have hashtags?

Basically – are visibility and “searchable”-ness (through hashtags) facets of what constitutes “public” data, or is the mere fact that the account itself is public (i.e., not locked down and private) sufficient? If it doesn’t have a hashtag, does it exist in the public sphere?

Digital research can become incredibly messy, not just because of the types of content that exist online and help to paint a picture of social and cultural life through multiple modalities, but also because of the questions of ethics that arise throughout these processes. Postill and Pink (2012) discuss “hashtag sociality” in their work on ethnographies in virtual environments, and hashtags are not merely a part of online culture but also serve an organizing function for topics, much as a web forum does (Solis, 2011). This further complicates my earlier question of whether content from an account that has hashtags on some posts but not others is still “public”.

The Internet as a public sphere has been a topic of discussion for decades now, but laws surrounding internet privacy and mobility still challenge the status of “public-ness” and “open-ness” that the Internet is typically known for. It is more complex, and because of its ever-changing nature, it is constantly being differentiated from traditional notions of public-ness.

Hashtags are now ubiquitous in online interactions, and may be complicating some of the ethical questions about what data is “open” and “public” and what is not. I wonder where these conversations about digital data ethics will go, especially now that there are so many concerns about how much of our lives has gone digital and about the privacy risks involved in using this data in research without the person’s knowledge (not necessarily consent, but maybe even consent). There are clear and defined expectations and rules for what is public and private (open versus closed group), but what if someone just posts something without giving it discoverable markers? Is it still public then? I don’t know if this adds an unnecessary complication to the ethics debate, but it is one that I’ve been curious about throughout my own research.
