Ten questions we’re asking about ethics, data, and open source research

This post is documentation of the RightsCon session titled ‘Using data ethically in humanitarian, human rights, and open source investigations,’ hosted by The Engine Room. The session was organized and moderated by Laura Guzmán (The Engine Room) and co-facilitated by Jonathan Drake (American Association for the Advancement of Science), Sophie Dyer (Amnesty International, FemOS) and Gabriela Ivens (Human Rights Watch). We drew upon research conducted by The Engine Room for the Human Rights, Big Data and Technology Project at the University of Essex. To encourage open and honest discussion, participants’ comments were documented but not attributed.

I have a video that shows someone being attacked, and I want to publish it to send a stronger message. Can I? A huge leak of data hacked from a government body has been made available. Can I use this in my research? Is entering a private Facebook group under a pseudonym okay? And, in any case, is this still considered open source research?

These are the questions with which Gabriela Ivens, head of Open Source at Human Rights Watch, opened our RightsCon session. Gabriela’s questions highlight how using data and open source research for human rights and humanitarian work raises new ethical challenges to which we don’t have straightforward answers.

What is at stake?

The use of publicly available information to expose state violence has a powerful democratizing potential, both in terms of who contributes to research, and whose stories get told. Projects like Amnesty International’s Crisis Evidence Lab Decoders platform are citizen-powered, and proof of this potential. As Christof Heyns, former UN Special Rapporteur on extrajudicial, summary or arbitrary executions, puts it, digital technologies “create opportunities for pluralism that can democratize the process of human rights fact-finding, as well as offer mechanisms of social accountability that citizens can use” to hold governments, corporations, and other powerful actors to account.

That said, the ethical stakes are high. “One person’s open source investigation could be another person’s ‘doxxing’”, write The Engine Room’s Deputy Director Zara Rahman and Gabriela in their chapter in the book Digital Witness. In other words, open source researchers often make use of surveillance-like tools, forgo conventional forms of informed consent, and exploit the increasingly fuzzy distinction between public and private data.

What’s more, in our experience, there is a lack of consensus within the open source investigative community on what is ethical. This is what Cambridge University researchers from The Whistle project, Ella McPherson, Isabel Guenette, and Matt Mahmoudi (Amnesty Tech), call a “knowledge controversy”. Put simply, new tools and practices, together with a proliferation of data and actors, have challenged established ways of working in human rights and humanitarian investigations. The resulting uncertainty among practitioners, and the diversity of opinions and approaches, were the focus of our strategy session.

The questions below are far from exhaustive (for example, the important topic of vicarious trauma is not covered). Rather, they are a sample of the conversations between facilitators and participants over the course of the session.

Amnesty International’s Digital Verification Corps members engaged in research, London, 2019

Ten questions we’re asking each other

1. Which types of data need to be handled with care?

In her opening remarks, Laura made clear that data is much more than numbers. Data can take the form of photos and videos, social media profiles and posts. What’s more, these data points are connected to other data points, people, and places, in ways we can never fully anticipate. This complex web of relations means that it is not just impossible but also risky to divide data into sensitive and not sensitive. In short, all data warrants care.

Care in the context of data can mean many things, including information and cyber security; accessible and sustainable storage; and cleaning and ongoing maintenance. As practitioners, how can we be better at recognizing and resourcing data care throughout the research process?

2. Why broaden our definition of ethics?

In breakout groups we discussed how centring ethics could be an opportunity to go further than the simple notion of “do no harm.” What if our ethics took into account social inequalities that have their origins in the long-standing expropriation of lands and goods, people and labour? In other words, what if we acknowledged existing harms? For instance, what if our definition of ethics included explicitly anti-racist policies? As Sophie noted in her opening talk, broadening our definition of ethics “might require us to change our metrics of success. However, to not ask these questions is to risk reproducing the very power asymmetries we seek to unsettle.”

3. Who is (and isn’t) in the room when we’re talking about ethics?

This year’s edition of RightsCon, held online in the face of the COVID-19 pandemic, saw people who might not otherwise have been able to attend because of travel costs, disability or other reasons, join our session virtually. At the same time, people with limited internet bandwidth, unstable electricity supplies or limited access to internet-enabled devices are unlikely to have been able to participate easily. Nor would non-English speakers or people reliant on closed captioning. These digital divisions echo throughout work that is done with data and technology—access is unevenly spread, making knowledge and opportunity unevenly distributed as well. These questions of access intersect with racist, ableist, classist, sexist and heteronormative practices to shape who, most commonly, ends up in “rooms” discussing ethics. 

4. Are our expectations about what is ethical online versus offline different?

During our session there was frustration from some at a sense of exceptionalism around open source research — in particular around the idea that “the normal rules do not apply.” It was suggested that this exceptionalism is an extension of our different expectations of privacy on and offline. At the same time, RightsCon 2020 was evidence of an increased awareness of how, too often, online harms lead to offline violence.

When exceptionalism does occur, how can we challenge it? One way is to stress the continuity between cyberspace, physical spaces, and the people that populate both. In the context of community-based activism and education, the Detroit-based Our Data Bodies project makes this argument brilliantly in the Digital Defense Playbook.

5. What informal strategies have we developed to help us make ethical decisions?

In the session we shared our informal strategies for thinking about ethics. Gabriela shared the acronym SIP: “S is for Sticky, having the time to acknowledge that a decision is sticky. I is for Icky, that icky feeling I get in my gut if I have done something that doesn’t sit well with me. And finally, P is for Picky, which is being able to say no and being able to communicate why. Half of my work boils down to working out if we can do something, the other half is working out if we should.”

6. How can we encourage each other to feel comfortable saying, “no”?

During our conversation, Jonathan shared an example of his remote sensing work for a group of civilians in Aleppo in 2016, who contacted him in the hope that newly available satellite imagery could be used to plot a safe passage out of the besieged city. After assessing the “serious ethical challenges”, Jonathan’s team determined that there wasn’t a way to identify, verify, and communicate a route, the risks, and unknowns to the group. The team made a call to not involve themselves in life-or-death decisions, even though technology presented them with the opportunity. Practitioners generally don’t approach problems with the intention of telling key stakeholders “no” as a final answer, but it can be an important position to reach if we want to work ethically and reduce the harms we cause.

7. How does speed impact decision making?

In her remarks, Gabriela noted that “speed often dominates the work of open source investigations.” She mentioned drivers such as the safety of people who might need a quick response, timing a project launch for maximum impact, and the possibility of being scooped. At the same time, she said, working at speed can constrain our ability to consider whether we should investigate, share or publish potentially risky information. Ultimately, speed plays a complex role in making ethical decisions; if it is not monitored or pushed back against, it can become a barrier to good decision making.

Tools or protocols for decision making can help. To this end, The Engine Room collaborated with Jonathan and the American Association for the Advancement of Science to develop a series of decision trees for collecting and sharing geo-located data in crisis situations.

Collecting and sharing geo-located data in crisis situations: Decision-making tools for practitioners. Source: AAAS

8. How can ethics help us to negotiate competing priorities? 

Different goals within a project can produce competing priorities. For example, the automation of the mass collection and archiving of open source information can save documentation from erasure. On the other hand, this relatively indiscriminate approach runs counter to the principle of data minimalism. Data minimalism means collecting only what you know you will use. 

Tensions exist, too, within frameworks. As the forthcoming report by The University of Essex’s Human Rights, Big Data and Technology Project points out, there are occasions when “obtaining informed consent … is not always possible and has to be balanced with other concerns and risks to human rights”, for instance when the safety of the content creator is threatened. How do ethical frameworks help us to negotiate competing commitments?

9. What new practices can we develop to support the ethical use of data and open source information?

Several breakout groups explored the idea that there are more options available to us than the binary of “yes or no”, “publish or not publish”. In other words, there are more ways of doing work than we often default to.

We explored small changes organisations could adopt, such as establishing team-level agreements around ethics before projects begin, as well as larger cultural changes, such as an expectation (and appreciation) of critique, uncertainty, and vulnerability.

We noted creative alternatives to publishing documentation that risks retraumatizing the subjects of violence or doing other harms. Using a visual language that is different to that of the source materials can have the added benefit of cutting through already saturated media environments. Additionally, in the book Data Feminism, Catherine D’Ignazio and Lauren Klein discuss the use of data memoranda of understanding as a strategy to protect the integrity of data, strengthen accountability, and guard against its unethical uses.

Adjustments need not be resource intensive. For example, when the creators of user-generated photos or videos cannot be credited for safety or other reasons, could we include a simple thank-you message? Ultimately, any cultural change in the investigative community will be the sum of these small shifts in practice.

10. What next?

In addition to the many questions that came up in our session, there were ways forward, too! We have touched upon some of these above, and others merit mentioning below. Explore these tools, readings and framings, and let us know what you think.

Beyond ethics, human rights offers a legal framework for decision making in open source research. In this vein is the upcoming International Open Source Investigations Protocol developed by The Human Rights Center at the University of California, Berkeley, in close collaboration with practitioners as well as the International Criminal Court and the United Nations. The University of Essex is also developing a Human Rights, Big Data and Technology Project report and workbook for open source research methods employing a human rights-based approach.

You can reach Laura at [email protected] and Sophie at [email protected].

Further reading

  • The Human Rights, Big Data & Technology Project
    Housed at the University of Essex Human Rights Centre with partners worldwide, the Human Rights, Big Data and Technology Project considers the challenges and opportunities presented by AI, big data and associated technology from a human rights perspective.
  • CLEAR Lab Handbook
    This document serves to guide our actions, decisions, and work, as well as to help realize our goal to be a feminist marine science laboratory! It is a living document, meaning it is something we interact with and update frequently as members come and go, and as we evolve as a lab.
  • Data Feminism (open access version and reading group)
    Data Feminism offers strategies for data scientists seeking to learn how feminism can help them work toward justice, and for feminists who want to focus their efforts on the growing field of data science. But Data Feminism is about much more than gender. It is about power, about who has it and who doesn’t, and about how those differentials of power can be challenged and changed.
  • Design Justice: Community-Led Practices to Build the Worlds We Need
    An exploration of how design might be led by marginalized communities, dismantle structural inequality, and advance collective liberation and ecological survival.
  • Digital Witness
    Modern technology—and the enhanced access it provides to information about abuse—has the potential to revolutionize both human rights reporting and documentation, as well as the pursuit of legal accountability. However, these new methods for information gathering and dissemination have also created significant challenges for investigators.
  • Location-based data in crisis situations
    In all circumstances, there are potential risks and benefits associated with collecting, aggregating, representing, using, and storing such data. In the context of crises, however, the nature and significance of the risks and benefits will differ from a non-crisis context. The following principles and guidelines aim to fill a gap in ethical guidance. 
  • The Data Storytelling Workbook
    From tracking down information to symbolizing human experiences, this book is your guide to telling more effective, empathetic and evidence-based data stories.
  • What would a feminist open source investigation look like
    Here, we set out why intersectional feminist thought should be considered when grappling with the radical possibilities and serious ethical challenges of open source investigations.

Special thank you to Laura Guzmán and The Engine Room for organising the RightsCon 2020 event!

Thanks, too, to Catherine D’Ignazio for answering Sophie’s email question about how she defines ethics.

Feature photo: Evan Dennis