Part Two: Browser and search engine basics
In the second of this three-part guide examining open source information gathering, we look at the basic tools used for online inquiries, selecting the best ones for the job, and how to use them most effectively.
In the first part of this blog series on conducting effective online inquiries, we discussed some of the potential sources of bias that can negatively impact the information-gathering phase of an investigation. In this second post, we look at how researchers can set themselves up with the right tools for effective and efficient online inquiries, as well as strategies for how to use them in ways that mitigate bias and produce the most relevant results.
Prior to conducting any specific online searches, researchers should consider a number of factors regarding tools. These considerations include which web browser to use for online research and how to conduct more ‘neutral’ searches with search engines. They also include questions around protecting digital security while searching for online content. Ultimately, the decisions made before even entering a single search term can have a substantial impact on the effectiveness of an online inquiry, providing more useful search results while mitigating some algorithmic bias.
1. Selecting a browser
In order to conduct searches online, first select a web browser to use from among the many options available. Each browser differs in significant ways, including the user interface, customization features, and online privacy settings. For open source investigations, I recommend using Google Chrome, as it offers the widest array of useful browser extensions to assist with your research and allows you to sync bookmarks so you can access them across devices. Firefox is a good, open source alternative for this same reason. Some of the most useful Chrome extensions include the Google Translate plugin for inline translations of web content and the InVID plugin as an all-purpose verification toolkit.
However, while Chrome is in many ways the most effective web browser for open source investigations, it also harvests substantially more data on users than other browsers do. This has the potential to expose researchers to additional risk as well as increase the influence of algorithmic bias on search results. For this reason, I suggest taking steps towards greater anonymity online (see section 2) and creating conditions for a more neutral search when conducting online inquiries (see section 3).
2. Setting up a research account
Open source researchers may need to supply login credentials for many activities they engage in during an online inquiry. This could be for a Google account in Chrome or a profile for a social media platform. In some cases, doing so might even involve giving these platforms access to your content. The Berkeley Protocol on Digital Open Source Investigations recommends that, for security purposes, researchers avoid using personal accounts for investigations and create ‘virtual identities’ to act as a protective layer of separation between their online activity and their personal identity. This can help mitigate the risk to the researcher, the investigation, or anyone supporting the investigation of being compromised during the research process. Depending on the platform, account holders may be required to provide photographs, emails, or telephone numbers, which the Protocol recommends should also be divorced from the researcher’s personal identity. However, while creating such virtual identities is beneficial for digital security, it is important to evaluate this against the challenges it can bring, including conflict with these platforms’ terms of service.
Creating such accounts for research purposes also has advantages when it comes to conducting searches and mitigating algorithmic bias. For example, having a research account allows a researcher to easily erase search history and activity when using Chrome, meaning they can start each investigation with a clean slate. Platforms like YouTube, which heavily apply algorithms in promoting content to users, may also be used to the researcher’s advantage, as content relevant to the investigation may be promoted without the researcher seeking it out.
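One way to approximate a clean slate for each investigation is to give each one its own browser profile folder. The sketch below is illustrative only: it builds, but does not run, a launch command. The binary name `google-chrome`, the `research_profiles` folder, and the function name `chrome_command` are all assumptions to adapt to your own system. `--user-data-dir` and `--incognito` are standard Chromium command-line switches.

```python
import shlex
from pathlib import Path


def chrome_command(investigation: str, base_dir: str = "research_profiles",
                   binary: str = "google-chrome", incognito: bool = False) -> list[str]:
    """Build a command that opens Chrome with a profile dedicated to one investigation.

    `binary` and `base_dir` are assumptions; adjust them for your system.
    """
    # Keep only characters that are safe in a folder name.
    safe_name = "".join(c if c.isalnum() or c in "-_" else "_" for c in investigation)
    profile_dir = Path(base_dir) / safe_name
    cmd = [binary, f"--user-data-dir={profile_dir}"]
    if incognito:
        cmd.append("--incognito")
    return cmd


if __name__ == "__main__":
    # Print the command rather than running it, so nothing is launched by accident.
    print(shlex.join(chrome_command("case 2021/04", incognito=True)))
```

Because each investigation gets its own folder, history and cookies from one case are stored separately from those of another, and deleting the folder clears that case's local browsing data.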
3. Neutral searches
In the previous blog, I discussed the impact that algorithms can have on the results yielded by search engines. Not only can this result in information bias, it can also make it difficult to distinguish the information that is most useful to the open source researcher from the general noise of the internet. For example, search engines may be more likely to return content that has received heavy traffic, such as articles from well-known international media outlets, than less visible information from local information sources. Likewise, certain searches may return irrelevant results linked to the researcher’s working location, such as local places (see example below) or news items only loosely related to the search terms entered.
However, there are steps that researchers can take to minimize the effect of search algorithms and get closer to a ‘neutral’ search when conducting online inquiries. This can lead to more efficient and effective searching, while also partially mitigating the influence of algorithmic bias on the information gathered. The best approach that researchers can take to achieve a more neutral search involves limiting the amount of information that search engines may draw upon when a query is made. This can include:
- regularly erasing browser search history: this limits the degree to which search results are shaped by previous searches
- using a private (incognito) browser window: as with the point above, this prevents search results from being influenced by previous searches and user preferences stored in the browser
- experimenting with different search engines: while Google is often the most effective search engine for returning useful results, others such as Bing and DuckDuckGo use different search algorithms and may rank useful content higher in their results, making relevant information easier to find
- using quotation marks around search terms or phrases: while this verges into strategies for constructing searches, which will be covered in Part Three, it bears mentioning that using quotation marks around search terms minimizes the influence of search algorithms by guaranteeing that those exact search terms appear somewhere on the page for each result (see example below)
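The last two tips can be combined by running the same exact-phrase query across several search engines. The following is a minimal sketch rather than a definitive tool: the URL patterns are the engines' publicly visible search addresses at the time of writing and may change, and the function name `exact_phrase_urls` is hypothetical.

```python
from urllib.parse import urlencode

# Public search URL patterns; these are assumptions that may change over time.
SEARCH_ENGINES = {
    "google": "https://www.google.com/search",
    "bing": "https://www.bing.com/search",
    "duckduckgo": "https://duckduckgo.com/",
}


def exact_phrase_urls(phrase: str, extra_terms: str = "") -> dict[str, str]:
    """Return one search URL per engine, with `phrase` wrapped in quotation marks."""
    # Remove any quotation marks the caller already added, then add exactly one pair.
    cleaned = phrase.strip().strip('"')
    query = f'"{cleaned}"'
    if extra_terms.strip():
        query = f"{query} {extra_terms.strip()}"
    # urlencode percent-encodes the quotation marks and turns spaces into "+".
    return {
        name: f"{base}?{urlencode({'q': query})}"
        for name, base in SEARCH_ENGINES.items()
    }


if __name__ == "__main__":
    for engine, url in exact_phrase_urls("Berkeley Protocol", "investigations").items():
        print(engine, url)
```

Opening each URL in a private window lets a researcher compare how the different engines rank results for the same quoted phrase.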
While these tips may not be necessary or desirable in all cases, they can often help researchers access information that is more relevant to their investigation by avoiding results that would otherwise be privileged by the search engine. Importantly, while these strategies can also help to limit the influence of search algorithms on the information-gathering process, a completely ‘neutral’ search is impossible — technical bias will always have some effect on the search results and the researcher should be aware of this.
One additional factor to consider when preparing to conduct online inquiries is the location from which the search is being conducted, particularly when it is different from that of the investigation’s subject. Many search engines use information about the location from which a search is conducted to tailor results to what the algorithm deems the user is likely to be interested in. While this could at one time be configured on Google by searching via different country domains, it appears that Google search results are now influenced primarily by the IP address of the user.
For this reason, it may be helpful to use a virtual private network (VPN). This allows a researcher to connect to the web via an IP address in the location that is the focus of the investigation. Search algorithms may then be more likely to prioritize content that is most relevant to your research, making inquiries more efficient and effective.
These tips and suggestions are meant to help open source researchers consider what steps they might take prior to beginning their online inquiry in order to make the most effective use of basic tools, including understanding potential biases and security risks. However, each investigator should weigh these considerations against their own research needs and security assessment to ensure that they develop the approach best suited to them. In the third and final part of this blog series, I will discuss strategies and techniques for carrying out online inquiries using these tools, including how to select keywords and write effective search queries.