Part Three: Keywords and advanced search
This third and final post in a series on conducting online inquiries explores the importance of researching digital media landscapes, constructing effective keyword searches, and documenting the online inquiry process.
In the two previous posts in this series, we have discussed sources of bias that can negatively impact online investigation during the initial stage of inquiry, as well as tools and tips for identifying relevant content while mitigating bias. In this final part of the blog series, we will explore the importance of a rigorous methodology for online inquiry to produce more comprehensive searches and results.
1. Research media landscape and subject background
Before the identification and collection of open source content for further analysis, every investigation’s first step should be to develop an understanding of the digital media landscape within the geographic area or topic of focus.
An investigation team should ask:
- How is information primarily disseminated?
- Which online sources and social media platforms are most popular?
- What biases might the sources of information themselves carry?
- What is the level of Internet and smartphone penetration? How does this vary across segments of the population?
Answering these questions can help researchers determine which online sources to dedicate resources to, as well as develop an awareness of the limitations and potential ‘blind spots’ that are likely to affect their digital investigation. For example, governments may block or restrict access to certain social media sites, which leads to information being shared through less commonly used platforms or encrypted messaging apps. Likewise, access to and use of digital technologies may differ based on various and intersecting criteria such as gender, education level, socioeconomic status, and language. This will affect not only who is able to share information about human rights abuses that affect them, but also which types of abuses will be most widely depicted in open source media. Sites like DataReportal and Media Landscapes can be useful in informing this assessment of the digital landscape.
In addition to evaluating the media landscape in which researchers will be working, it is also often useful to gather some general information on the subject of the investigation, whether it be a particular conflict, protest movement, or thematic issue, to develop an overall understanding of the context in which an investigation is taking place. This material may not be used directly within an investigation, but useful sources include international or national news articles, reports by non-governmental organizations (NGOs), and official government statements. Contextual knowledge will inform the inquiries researchers make, for example:
- the types of human rights abuses that researchers look for.
- the language(s) in which searches are conducted.
- the initial keywords or search parameters, such as regions of interest, groups or individuals involved, and dates of events.
2. Identifying keywords and hashtags
When it comes to searching for content online, whether on social media platforms or general search engines, choosing the ‘right’ keywords is critical to identifying the most relevant and comprehensive body of information available.
If the keywords selected by the researcher are too general, they will yield an overwhelming number of results, many of which are irrelevant to the investigation. Too specific, and the search may exclude large amounts of relevant information.
The best keywords are those that are both unique to the subject being researched and likely to appear on the webpages or social media posts that researchers are looking for. When developing a set of keywords to use, details gathered during the background research phase can be a helpful starting point. This can include the names of places, relevant people/organizations, types of human rights abuses that are likely to have occurred, the dates of key events, and the relevant language(s) that content may have been shared in. Google Trends can be a helpful tool for generating a list of search terms, providing information about which keywords are trending in a certain country (particularly useful when monitoring for real-time information) as well as popular search queries related to the topic or keywords that you enter.
This preliminary research should help to identify many relevant keywords. However, certain types of useful keywords will likely be missed, such as context-specific terms or slang, coded language, or hashtags in use. For example, during the 2019 protests in Chile, protestors used the slang terms ‘zorrillo’ and ‘guanaco’ to refer to different types of armoured vehicles that the security forces used. As discussed in part one of this series, coded language is also frequently used when referring to sexual and gender-based violence. Failing to account for this can result in entire subsets of information being overlooked (and thus, information bias). Working with local researchers and experts is often the best way to account for this.
It is also worthwhile to consider likely variations of those terms that may appear. This can include grammatical variation, such as verb tense, masculine vs. feminine forms, and singular vs. plural forms, as well as common abbreviations or acronyms. Considering that content shared on social media in particular is often unfiltered/unedited, it may also be helpful to identify common misspellings or alternative spellings of relevant keywords.
This is particularly relevant when searching for terms that have undergone transliteration from another script and especially when searching for the names of people or places. For example, ‘Nay Pyi Taw’, the national capital of Myanmar, may also be written as ‘Nay Pyi Daw’ or ‘Naypyidaw’. Also, as noted in part one of this series, social media users often use colloquial language and first-person terminology when posting content. When added to a combination of keywords, or a ‘search string’, such terms can be a useful way to return results including more ‘user-generated content’ as opposed to news articles or NGO reports.
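As a minimal sketch of how spelling and transliteration variants might be folded into a single search clause, the Python snippet below joins a list of variants with OR and quotes multi-word phrases. The function name `or_group` is hypothetical, and the exact syntax a given platform accepts should be checked before use.

```python
def or_group(terms):
    """Join variant spellings into one parenthesised OR clause.

    Multi-word phrases are wrapped in double quotes; duplicates that
    differ only by letter case are dropped.
    """
    unique = []
    seen_lower = set()
    for term in terms:
        term = term.strip()
        if term and term.lower() not in seen_lower:
            seen_lower.add(term.lower())
            unique.append(term)
    quoted = [f'"{t}"' if " " in t else t for t in unique]
    return "(" + " OR ".join(quoted) + ")"


# Variants of the place name mentioned in the text above.
place_variants = ["Nay Pyi Taw", "Nay Pyi Daw", "Naypyidaw", "naypyidaw"]
print(or_group(place_variants))
```

The resulting clause can then be combined with other keywords, so that one query covers every known spelling instead of running a separate search for each.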
3. Advanced search and Boolean basics
To make the most effective use of search terms, it will be necessary to apply additional techniques for filtering search results, such as through an ‘advanced search’ option, and by using multiple keywords in combination.
Whether using Google Advanced Search or searching directly on social media, one of the best options for filtering results is to restrict the date range of your search results to the time period of a specific event or prolonged conflict/crisis. Other filtering options make it possible to restrict results to certain websites (‘site:’), such as particular news websites or social media platforms, as well as filetypes (‘filetype:’).
Another method for improving online search results is to experiment with different combinations of keywords using Boolean search operators.
The key search operators to understand are:
- AND – returns only results that contain all of two or more search terms (e.g. police AND protest)
- OR – returns results that include either of the keywords, useful for accounting for variations in language, spelling or grammar (e.g. shot OR shooting)
- NOT – excludes results containing certain keywords that are not relevant to the investigation (e.g. police NOT municipal)
Searches can also include more than one Boolean search operator or other filtering option, allowing for long, complex search strings, for example: police AND protest AND (shot OR shooting) NOT municipal. Tweetdeck is a useful tool for experimenting with such search strings, making it possible to construct multiple searches and compare results side-by-side.
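Combining operators and filters in this way can be sketched as a small Python helper. This is an illustration only: the function names `quote` and `build_query` are hypothetical, `example.org` is a placeholder domain, and platforms differ in the syntax they accept (some search engines, for instance, expect a minus sign before a term in place of NOT), so the output may need adapting.

```python
def quote(term):
    """Wrap multi-word phrases in double quotes so they are searched as a unit."""
    return f'"{term}"' if " " in term else term


def build_query(all_of=(), any_of=(), none_of=(), site=None, filetype=None):
    """Assemble a Boolean search string from keyword lists and optional filters."""
    clauses = [quote(t) for t in all_of]      # every term must appear (AND)
    if any_of:                                # at least one variant must appear (OR)
        clauses.append("(" + " OR ".join(quote(t) for t in any_of) + ")")
    query = " AND ".join(clauses)
    for term in none_of:                      # terms to exclude (NOT)
        query += f" NOT {quote(term)}"
    if site:                                  # restrict to one website
        query += f" site:{site}"
    if filetype:                              # restrict to one file type
        query += f" filetype:{filetype}"
    return query.strip()


print(build_query(
    all_of=["police", "protest"],
    any_of=["shot", "shooting"],
    none_of=["municipal"],
    site="example.org",  # placeholder domain, not a real source
))
```

Keeping the keyword lists separate from the assembled string makes it easier to swap individual terms in and out between search attempts.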
Most importantly, online discovery should be an iterative process. Determining the best combinations of keywords and filters is only possible by paying attention to the results of each search attempt, identifying what information is present, and just as importantly, using knowledge of the subject of the investigation to determine what types of information might be missing from the results. Additional search terms and hashtags can also be identified from relevant search results and integrated into subsequent queries.
4. Documenting your online inquiries
Create a search database
There is no single approach that is guaranteed to provide the best results and the process of online discovery is often one involving substantial experimentation. However, like any good experiment, it is best to approach this in a systematic and structured manner, which may take the form of a search database or spreadsheet.
This document can include an inventory of search terms identified while conducting background research and throughout the inquiry process, along with translations for all relevant languages. It can also be used to track popular hashtags (these often need to be updated when researching ongoing events), record possible keyword combinations or exclusions to use with Boolean operators, as well as document attempted search strings and their results on specific platforms.
Such a record is vital not just for producing effective and efficient searches but also for developing a methodology that allows for regular evaluation of the information gathered to identify and, wherever possible, mitigate the influence of potential bias. For example, only by keeping a detailed record of the search terms used throughout an investigation can a researcher determine possible gaps in the information produced as well as demonstrate steps taken to address them.
TIP: When working as part of a team with multiple researchers, use Google Sheets to create a collaborative search database for the investigation.
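For teams that prefer to start from a plain file, the search record described in this section could be kept as a CSV log and later imported into a shared spreadsheet. The sketch below is one possible layout, not a prescribed format: the column names, the `log_search` function, and every value in the example rows (platform labels, dates, result counts, notes) are made up for illustration.

```python
import csv
import io
from datetime import date

# Hypothetical column layout for a search log.
FIELDS = ["date", "platform", "language", "search_string", "results_seen", "notes"]


def log_search(handle, platform, language, search_string, results_seen,
               notes="", day=None):
    """Append one search attempt to a CSV log, writing the header if the log is empty."""
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    if handle.tell() == 0:  # nothing written yet, so add the header row first
        writer.writeheader()
    row = {
        "date": (day or date.today()).isoformat(),
        "platform": platform,
        "language": language,
        "search_string": search_string,
        "results_seen": results_seen,
        "notes": notes,
    }
    writer.writerow(row)
    return row


# An in-memory buffer stands in for a real file here; all values are illustrative.
log = io.StringIO()
log_search(log, "platform A", "es", "(zorrillo OR guanaco) AND protesta", 40,
           notes="mostly first-person posts", day=date(2019, 10, 25))
log_search(log, "platform B", "en", "police AND protest NOT municipal", 120,
           day=date(2019, 10, 26))
print(log.getvalue())
```

Recording the exact search string alongside the platform and date makes it possible to rerun a query later and to see which combinations have not yet been tried.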
This blog series has attempted to highlight potential sources of bias that may influence an online investigation during the inquiry stage, as well as mitigation strategies that can be employed throughout the research. This guide is not meant to be prescriptive, but rather seeks to highlight where and when such biases are likely to arise and share knowledge about how Amnesty International has sought to limit their influence.
When developing or evaluating any methodology for online inquiry, researchers should keep in mind the following key elements to ensure that they are effectively accounting for and mitigating biases within their investigation:
- Awareness – Researchers should be aware of, and actively seek to identify, potential sources of bias within an investigation. This includes cognitive biases stemming from the researcher’s approach, knowledge, and decision-making as well as technical biases native to the tools and platforms that the researcher is using.
- Context – Understanding the context within which a research project is taking place is essential to designing and carrying out an investigation. For open source research, this includes not only gathering background information on the subject matter, but also looking into the digital landscape to identify which technologies are used to capture and share content, who has access to these technologies, and what limitations might affect the research.
- Reflexivity – Developing a consistent and systematic approach to online inquiry is an important step to carrying out an effective and efficient investigation. However, investigation is often an iterative process that builds on knowledge gained while researching. Methodologies for inquiry should be similarly adaptive, able to track the results of searches in order to identify fruitful keyword combinations and account for any gaps in the accumulated information.