The Dilemma of Ethics and Accuracy

The readings that we have done so far have demonstrated that data is very much subject to both present and historical biases and, as such, cannot be taken at face value, nor can it necessarily be considered reliable or ethical. Lara Putnam, in her article “The Transnational and the Text-Searchable: Digitized Sources and the Shadows They Cast,” details the advantages of digitization for researchers, who were previously constrained by the location and accessibility of archives. However, Putnam notes that despite the convenience that digital research offers, it is still imperative for researchers to interpret their findings critically. Putnam references E. H. Carr’s argument that historians often unintentionally select their facts, comparing the historian to a fisherman whose catch depends on the tackle he chooses. Putnam notes that this problem is exacerbated by digital methods, stating that “…if the fact is out there anywhere, it will be on your hook in a nanosecond” (Putnam 390).
This problem is further complicated by the ways in which the companies controlling search engines categorize data, and by the political implications of certain identities. The chapter “The Future of Knowledge in the Public” from Safiya Umoja Noble’s Algorithms of Oppression details the ways in which corporations and government institutions often categorize information according to white, Anglo-American male hegemony. This has led to racialized categories in the Library of Congress and to specific cases such as Google autocorrecting “herself” to “himself” as late as 2016 (Noble 6).
The complicated aspects of data and categorization are elaborated even further by Catherine D’Ignazio and Lauren Klein in Chapter Three, “What Gets Counted Counts,” of Data Feminism. The authors note that something as simple as a user account can be anything but simple, because such systems, which demand that users categorize themselves, often disregard the identities of non-binary and trans people. Furthermore, D’Ignazio and Klein note that Facebook, which permits users to write in their own gender identity, still often categorizes users as male or female in order to appease potential advertisers. The authors also provide an example of a case in which data cannot be transmitted at all, and of the implications of such a refusal. The O’odham Nation of the Southwestern United States was unable to provide the United States government with the locations of its burial grounds, because that information constituted sacred knowledge. As a result, the United States destroyed many burial grounds in order to construct a border fence.
Joanna Radin, in “‘Digital Natives’: How Medical and Indigenous Histories Matter for Big Data,” demonstrates that the Pima people of the Gila River Indian Community, although they have participated in and furnished data for medical studies since the early twentieth century, have retained no control over the data they provided. Kimberly Christen, however, shows a way in which this situation could be corrected in her article “Relationships, Not Records: Digital Heritage and the Ethics of Sharing Indigenous Knowledge Online.” She demonstrates that several Indigenous nations, when creating their own digital archives, attach specific conditions to the access and use of the data, thereby retaining control over their own information. In this light, while data and its categorization may be inherently problematic, they can be adapted to better reflect the people who actually provide the data.

Colonialism and the Violent Academy

The readings of the past two weeks have defined digital humanities and outlined the ways this field can uphold or challenge colonialism and sexism through careful contextualization of data (Risam 2018), collaborative stewardship (Christen 2018), and critical reflection on the histories of constructed categories of data (Aronova et al. 2017; Noble 2018; Radin 2017). In “Decolonizing the Digital Humanities in Theory and Practice,” Risam warns against disingenuous decolonization efforts in which assembling a diverse body of researchers is treated as the endpoint of decolonization in academia, rather than the dismantling of colonial epistemologies and practices. This “add and stir” approach echoes colonialism in that researchers belonging to minority groups are expected either to conform to the structures of the academy and act as figureheads for decolonization efforts, or to transform a violent and oppressive system from the inside out with no support. The visibility of these researchers both within and outside academia exposes them to additional violence in an increasingly accessible digital world (Bailey and Gossett 2018). This violence is especially clear in Bailey’s section, which describes how contributors to the development and proliferation of the term misogynoir were removed from Wikipedia because they lacked academic credentials or publications, even though many of the individuals who edit Wikipedia lack these same qualifications.

Fortunately, many of the authors offer meaningful methodological changes that include and center knowledge originating outside academic institutions. Christen (2018) provides the most straightforward approach by outlining ETHICS, a series of steps for reflexive archival practice. In this model, digital archives are built from communities’ stated needs, and the power to view, modify, and change the digital record rests with the people from whom these data were taken. In a similar vein, Risam (2018, p. 82) suggests that an emphasis on the local “…demands acknowledgement that there is not a single world or way of being within the world but rather a proliferation of worlds, traditions, and forms of knowledge.” While these works provide methods to practice decolonization, rather than just speak to it, it is unclear to me whether methods like ETHICS can effectively be used to decolonize “big data.”

The move to decolonize has spread across multiple disciplines in the humanities, with detrimental effects on researchers of color. Particularly in anthropology, which has a long legacy as an investigative tool of colonial powers, researchers of color are regularly expected to perform integral decolonization work in addition to (and often in lieu of) the academic labor that departments use to measure progress. For example, Savannah Martin, a Siletz researcher (@SavvyOlogy on Twitter), was criticized by her department for not meeting the writing benchmarks for her dissertation, despite being an invited speaker on multiple panels challenging colonial narratives in anthropology. Similarly, Shay-Akil McLean, a queer trans man (@Hood_Biologist on Twitter) who founded decolonizeallthethings.com and has been an invited speaker on multiple panels covering decolonization in anthropology, left anthropology for more supportive humanities departments after facing racial discrimination as an anthropology PhD student. While the move to decolonize theory and practice in the digital humanities is encouraging, I am unsure (as I am unfamiliar with the discipline) whether these efforts have extended to department-level initiatives that actively support the people challenging colonialism in academia.

New Methods, Same Standards, New Problems

The first two weeks of class and readings have illuminated several of the course’s learning objectives and thematic questions. In particular, many of the articles focused on how information flows in and out of socio-technical systems and the ways that researchers access, arrange, organize, and describe information. Safiya Umoja Noble’s Algorithms of Oppression argued that information found online through web searches perpetuates misinformation and oppression. Noble contended that “Lack of attention to the current exploitative nature of online keyword searches only further entrenches the problematic identities in the media for women of color” (14). Noble showed that online searches and algorithms shape the data people receive when searching the web. This point relates directly to the course’s theme of understanding and critiquing the ways in which researchers receive information: search engines like Google can promote and perpetuate problematic and disingenuous information about identity and ethnicity.

Catherine D’Ignazio and Lauren Klein’s chapter “What Gets Counted Counts” also explored socio-technical systems and the ways in which data can become problematic. Put simply, the two authors argued that data can become disingenuous through the ways information is stratified and labeled. Data collected on the basis of a gender binary leaves out an entire population and fails to acknowledge the existence of people who do not label themselves as either male or female. As a result, the ways in which data is being collected misrepresent (or fail to represent at all) different groups of people and also produce inaccurate data about the men and women who are counted. Lara Putnam makes a complementary point about how search itself restructures data: “Web-based full-text search decouples data from place. In doing so, it dissolves the structural constraints that kept history bound to political-territorial units long after the intellectual liabilities of that bond were well known” (377). The scholarship of Noble and of D’Ignazio and Klein reveals latent dysfunctions in the ways data is currently produced. Who gets to control what is shared, how and when it is shared, and the framework through which data is collected? The article and chapter highlight questions that become increasingly important with the rise of big data.

As a historian, I found that Putnam’s article strongly resonated with my work and provoked questions that directly pertain to what I do. Most of my research involves traveling to archives, learning the culture and decorum of each city and archive, and then conducting research on location for weeks. Recently, one of my primary archives went digital, so I will no longer be required to conduct research on site there. Whereas Putnam argued that my appreciation for the data and the archive will diminish, and that this loss will show in my work, I take a much different view. Although the digitization of data may make the historian’s “journey” less adventurous, I believe that online access to data and archives alleviates economic barriers to research. Not every individual can afford to travel and live abroad for a month or longer to conduct research; the traditional methodology of conducting global research is largely gatekept by and for the academic elite. With the digitization of archival data, historical research becomes far more accessible. Furthermore, scholars can fact-check and cross-reference sources cited in monographs and other scholarly works. Engaging in a rigorous historical dialectic should become more common with the digitization of data.

As with the previous two readings, however, problems remain when data is digitized. In terms of the course question of how information flows in and out of socio-technical systems, one drawback of digitizing an archive is that not every document is uploaded. Who are the individuals choosing what to include and exclude? Is an archive deliberately constructing a narrative through what it uploads versus what it discards? Archives are not always neutral spaces; the digitization of their data and sources will raise new questions about the selection process and the potential manipulation of uploaded sources.

Overview Analysis and Reflections

In the readings for our first two overview weeks, I was most interested in the ways in which power and privilege have both structured, and are evident in, systems of classification and practices of representation and misrepresentation. For my response, I will consider the discussion of these themes in our readings by Safiya Umoja Noble and Roopika Risam.

In Safiya Umoja Noble’s chapter, “The Future of Knowledge in the Public,” Noble discusses how the Library of Congress Subject Headings (LCSH) have reflected and reinforced the history of characterizing certain individuals as “problem people” based on aspects of their identity or their position within a group (2). In addition to reflecting the attitudes of those involved in developing them, information systems such as the LCSH and the Internet continue to shape how certain individuals and groups are characterized and perceived in the present. They do so by authoritatively identifying those individuals and groups and by locating related material under subject headings and search results that participate in “‘legitimizing the ideology of dominant groups’ to the detriment of people of color” (2). Noble’s discussion of the movement led by students at Dartmouth College, and supported by campus librarians and the American Library Association, to have the Library of Congress replace the term “illegal aliens” with terms preferred by undocumented immigrants and their advocates illustrates the importance of self-representation: preferred terms should be adopted in consultation with the individuals and groups to whom those terms and classifications refer, particularly in systems that have been structured by power and privilege. Near the end of the chapter, Noble notes a commitment to “ensure that traditionally underrepresented ideas and perspectives are included in the shaping of the field—to surface counternarratives” (14), a commitment that Roopika Risam emphasizes as central to the development of a postcolonial digital humanities.

In Roopika Risam’s essay, “Decolonizing the Digital Humanities in Theory and Practice,” Risam characterizes a postcolonial digital humanities as one that centers intersectional engagement with various “axes of identity” that shape the production of knowledge, in contrast to colonial and neo-colonial information institutions and systems that situate the colonizer at the center and privilege certain Western perspectives and forms of knowledge (78). Risam describes postcolonial approaches to digital humanities as those that center and affirm local and indigenous forms of knowledge and knowledge production while questioning and seeking to dismantle the imposition of colonial and neo-colonial perspectives. For me, Risam’s essay recalls Safiya Umoja Noble’s discussion of the LCSH and the student-led and librarian-supported movement to involve, if not center, those to whom the subject headings refer in replacing existing terminology with preferred terms.

The readings for our first two overview weeks, represented here by Noble’s chapter and Risam’s essay, encouraged me to think critically about the ways in which information institutions and systems construct and present information and the accumulation of that information as knowledge. I am thinking here of two of my courses from last semester, Cultural Identities in Medieval Europe and History and Ethics of Collecting and Collections, in which we discussed how individual, social, and cultural biases have informed Western scholarly narratives of the history of art, including representations and misrepresentations of the individuals, societies, and cultures involved in the production of works of art and other cultural objects.

Inclusivity?

Last year, when I was helping a little with the metadata for the manuscripts included on the Italian Paleography website at the Newberry, which I mentioned earlier, I was laughing a bit at how specific some of the Library of Congress subject headings were and how unspecific others were. As I tried to categorize some items from Catholic Renaissance Italy, I could see how obviously the subject headings had been created by White Anglo-Saxon Protestant Americans. Of course, this bias is relatively minor compared with others brought up in the chapter we read by Safiya Umoja Noble and in the other writings pointing out the place of bias in the categorization of data. But I mention it because, even though all of us working on the project could see the limitations of the Library of Congress subject headings, we kept using them because they allowed the project to maintain a level of standardization that makes it useful and compatible with other databases. In a small way, then, this experience reveals why it can be difficult to move away from faulty systems of data categorization riddled with problematic ideologies in order to create a totally inclusive space in the digital humanities. It seems that a broad and, more or less, simultaneous overhaul of these “standard” systems would be needed to make a real impact.

Several of the articles we have read have highlighted localized efforts to handle data more inclusively, but I find them an unsatisfactory solution and am having difficulty putting my finger on why. I think part of it is that these localized efforts seem to segregate data. For example, the kinds of projects dealing with Indigenous archival records that Kimberly Christen addresses, or those related to dismantling binaries that Moya Bailey and Reina Gossett mention, all seem to segregate the data of these communities rather than opening up inclusive space within already established systems. Is this really the best way to break down biases and foster inclusivity? Or are we actually just perpetuating divisions? And, of course, to a certain extent the whole point of having systems of categorization is to break data down into groups so that it can be analyzed and used more easily. So, is it really possible to create a totally inclusive and non-segregated system for working with data? These are the questions I’m left pondering as we finish our first two weeks. I don’t expect to come away with answers, but perhaps with greater clarity about the problems and the solutions that have been attempted.

Reflections on the Overview

Aronova, von Oertzen, and Sepkoski’s “Introduction” provides a comprehensive foundation from which to discuss Big Data, computers, and science. In their writing, they reflect critically on the legal, ethical, and political implications of today’s information technologies and the high value these technologies place on data. They ask, then, “what is the source of the new value?”[1] By illustrating that the collection of large amounts of data was not invented with computers but in fact has a long epistemological history, the authors argue for more encompassing historiographies of data, science, and computers that include the natural, social, and human sciences.

In their volume, Aronova et al. also question the emergence of a “new elite.” Safiya Noble’s article, “The Future of Knowledge in the Public,” takes issue with these new elites and argues for studying the social context of those who organize information online. For example, Noble discusses how systems of organization inherit their creators’ assumptions, like the classification of people as “illegal aliens” in a library system. D’Ignazio & Klein also wrestle with the difficulties of classification in their work, “What Gets Counted Counts.” In it, they examine the online classification of gender, using Facebook as a particularly strong example.

In her “Conclusion” to Programmed Inequality, Marie Hicks also investigates gender inequalities in the technology field. Using gender as a category of historical analysis, Hicks shows how the women who defined technological change and shaped key technologies were rendered absent from its history. The piece concludes by stating that “the process of rendering invisible certain categories of workers” aligned with the nation-building project.[2] Such unequal relationships of power are also evident in Bailey & Gossett’s chapter “Analog Girls in Digital Worlds.” While similarly concerned with gender, their chapter renders visible the intersections of race, class, and sexuality within the digital humanities. Bailey’s section, especially, explores the relationship between academia and non-academic digital spaces, including the value and usefulness of both.

The power imbalance in the digital sphere is also evident in the pieces by Kimberly Christen, Joanna Radin, and Roopika Risam, who all examine the legacies of colonialism and indigeneity in the digital world. Risam questions how the digital humanities have contributed to the epistemic violence of colonialism and neo-colonialism, and suggests some methods of decolonizing, for example, by focusing on the local context. When researchers take data out of context, as we see in Radin’s piece, it can lead to profound negative consequences. Additionally, Christen shows how the utopian ideal of digital “openness” disregards the cultural, social, and historical conditions of oppression that Native peoples have endured.

Lara Putnam’s “The Transnational and the Text-Searchable” provides further insight into historical and digital research praxis. Rather than focusing on data mining, Putnam highlights how historians use digital methods for “finding and finding out,”[3] and the consequences of research moving away from the physical and geographic spaces of archives. This shift has affected the “peripheral vision” and social interactions that scholars experience in the physical archive. Although digitization has weakened some traditional barriers, Putnam concludes, the benefits may be canceled out by superficiality and new blind spots.

In all, the readings for the past two weeks illustrate the subjectivity of socio-technical systems, which are often touted as egalitarian, neutral, and liberating. Though the authors provide a wide range of considerations, the literature revolves around North American and European experiences. What new insights might we encounter concerning data, the digital, gender, and race from voices trained in and hailing from South America, Africa, or South or East Asia?

[1] Aronova et al., 4.

[2] Hicks, 238.

[3] Putnam, 378.