Digitization and Decontextualization

The conversations and readings of the past week, while revealing the ways in which the digitization of textual data and text analysis can be extremely helpful to historical research, nonetheless show the many limitations of text analysis and, as Lara Putnam noted in Sharon Leon’s “The Peril and Promise of Historians as Data Creators: Perspective, Structure, and the Problem of Representation,” the dangers of the “decontextualization of data.” This is also reflected in Jo Guldi’s assessment of keyword analysis of the transcripts of parliamentary debates, which reveal only the dynamics of legislation within the British Isles, and not the British Empire as a whole. Furthermore, as John Markoff noted concerning his study of the French parish cahiers, the regional languages of France in 1789, as well as the different terms that could be employed for the same aristocratic privileges, made personal, specialized analysis indispensable to his study, even though it took digital form. In light of these analyses, it is questionable whether human analysis can ever be replaced by digital text analysis, as certain elements of the data could be overlooked if they are not considered within a larger base of information.
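Markoff’s point about multiple terms for the same privilege can be made concrete with a small keyword count. The following is a minimal Python sketch, assuming a hypothetical, hand-built mapping from one analytic category to several eighteenth-century French terms; the term list and the sample sentence are illustrative assumptions, not Markoff’s actual coding scheme or data. A search for a single phrase misses every variant, whereas the specialist’s list catches them.

```python
import re

# Hypothetical concept map: one analytic category, several period terms.
# Illustrative assumption only, not Markoff's actual coding scheme.
CONCEPT_TERMS = {
    "seigneurial_dues": ["droits seigneuriaux", "banalités", "cens", "champart"],
}


def naive_count(text: str, keyword: str) -> int:
    """Count whole-word, case-insensitive occurrences of one keyword or phrase."""
    pattern = rf"\b{re.escape(keyword)}\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def concept_count(text: str, concept: str) -> int:
    """Count occurrences of every term a specialist has mapped to the concept."""
    return sum(naive_count(text, term) for term in CONCEPT_TERMS[concept])


# Invented sample sentence in the style of a grievance list
cahier = "Que les banalités soient abolies, ainsi que le cens et le champart."
print(naive_count(cahier, "droits seigneuriaux"))  # 0: the single phrase misses everything
print(concept_count(cahier, "seigneurial_dues"))   # 3: banalités, cens, champart
```

The quality of the second count depends entirely on the hand-built term list, which is precisely where specialist knowledge enters the analysis.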
This is an issue that I encounter in my own research, which primarily engages with documents written by French officials in New France for either the Governor-General or the Ministry of the Marine. Although I was unaware of this when I began my research, features of a document beyond its text can reveal details important to its interpretation. This is particularly the case with handwriting, and with documents that bear two different dates. One date is the year in which the document was received by the Ministry of the Marine, whereas the other is likely the day on which it was written. The only way to distinguish the two dates, and thus to establish the correct date of the document, is through an analysis of the handwriting. Handwriting can also reveal which officials had a secretary. More importantly, personal handwriting can indicate that an official who would otherwise have had a secretary at his disposal was in a location where bringing a secretary was impossible. This was the case with Charles le Moyne de Longueuil, who was the Governor of Montreal and simultaneously maintained a residence in the Onondaga Nation. Because he would have had a secretary in Montreal, an analysis of the handwriting in his letters to the Governor-General of New France or the Ministry of the Marine can show where he was when he wrote them.
In light of the readings and the subsequent conversations in class, I wonder whether these nuances, which can be pivotal to the correct interpretation of a document, can be captured purely through digital text analysis. Furthermore, if they were captured, would other information present in the document and its composition be overlooked? In this respect, can text analysis become a reliable means of evaluating (or at least summarizing) archival sources without human intervention?

Moderation in the Age of Global Pandemics

Content moderation, while posing a possible threat to the freedom of information and expression, nonetheless plays an important role in regulating what is posted on internet platforms. While the internet was initially perceived as a potentially free community for the sharing of information and ideas, some measure of moderation must exist to ensure that such digital spaces serve as safe sources of information. This, however, can be intensely difficult to maintain: voluntary moderation can be sporadic and, as Roberts notes, people who are specifically employed to moderate digital platforms are repeatedly exposed to disturbing images and content, while being confined to low-paying, low-prestige positions. Relying on algorithms instead can stifle not only free speech but also valid sources of information, as Facebook’s temporary removal of articles from The Atlantic shows. In light of this incident, human labor seems like the most reasonable solution to online moderation. However, such labor is blatantly undervalued, and those who perform it work so far out of the public eye that the conditions of their work, and even its importance, remain relatively unknown. Furthermore, attempts to gamify online moderation, while providing incentives for voluntary moderation, would not be sufficient to compensate voluntary moderators who take on the responsibilities of professional moderators, as they would likely still be exposed to the same material and left with the same psychological scars.
These issues have been exacerbated by the COVID-19 pandemic: with social distancing and stay-at-home orders in place, the population depends on digital media for news, entertainment, and work more than ever. This has led to increased online activity and, as people seek information about the virus, an immediate need for access to valid reports and reliable news outlets. At the same time, false information about the virus or government policies could have severe effects. As a result, moderators are under intense strain, and the weaknesses of algorithms are being exposed as online traffic increases. Such a dynamic could diminish the quality of moderation or, given the gap between increased online traffic and the existing pool of human moderators, lead to a shortage of moderation altogether. It is also possible that these added pressures, combined with the stress that moderators already experience, could exacerbate the detrimental effects their jobs already have on them. In this regard, COVID-19 poses a unique challenge to online moderators, and while it may be too late to adapt, these circumstances may inform future decisions regarding the refinement of algorithms, as well as the treatment of human moderators.

Time and Place in Colonial North America

The World Historical Gazetteer and Recogito, while formidable research tools in development, are nonetheless, at the moment, relatively limited for certain kinds of analysis. In my experience with the World Historical Gazetteer, I was unfortunately unable to upload the data provided by the World Historical Gazetteer. However, upon typing names into the gazetteer’s search, I was struck to find names listed in different languages, even indigenous languages such as Seneca. The gazetteer nonetheless seems to operate primarily by proper place names, as the term “Seneca Nation” furnished no results, while “Tonawanda,” a reservation properly called the Tonawanda Seneca Nation, did return a result. Furthermore, in the case of North America, the World Historical Gazetteer seems to be primarily concerned with place names that either existed in the recent past or continue to be in use. The place names included in the gazetteer are also sporadic: both Elba, NY and East Elba (a very small unincorporated community within the Town of Elba in Genesee County) appear in the gazetteer, whereas ARTPARK, a state park in Lewiston, NY that once held a Seneca village and French trading post, is absent.
While working with Recogito, I had a comparable experience. While Recogito could recognize place names from the distant past, such as fortifications and British and French colonies, it could not establish boundaries for historical entities such as New France or the colony of New York. This is understandable, as the nature of colonial boundaries is itself obscure. While European states like France and Britain claimed vast swaths of territory, they were unable to govern and occupy most of it. Furthermore, such colonial land claims did not reflect the indigenous peoples who lived within them and exercised sovereignty over their ancestral lands despite European claims. Mapping such dynamics would prove very difficult even for experienced colonial historians. Recogito is also limited by spelling and language. It would be impossible to upload original French place names or French transliterations of indigenous place names and have them recognized, as spelling during the eighteenth century was inconsistent, and many such place names appear only in historical documents and are difficult to connect to present localities. This, however, is only symptomatic of a gazetteer still in development. As Ryan Shaw demonstrates in Placing Names: Enriching and Integrating Gazetteers, gazetteers will ideally indicate the presence of other place names, such as early modern Spanish place names, in a given locality (Shaw 54), thereby serving as an important research tool for historians. This, however, leaves me with a major question: since a location in New France or the territory of the Iroquois Confederacy could have several different names depending on which indigenous group was speaking, and since the indigenous names could be transliterated into French in multiple ways, could a comprehensive gazetteer of places in North America be created? Would it be truly all-encompassing, or will there always be limitations on the information both available to and from the gazetteer?
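The spelling problem described above is, in part, a matching problem. A minimal Python sketch, assuming a hypothetical mini-gazetteer in which each place record carries a list of attested variant names (the records and spellings below are illustrative assumptions, not entries from the World Historical Gazetteer or Recogito), shows one common approach: normalize accents and case, try an exact match against all variants, and fall back to approximate string matching with the standard library’s `difflib`.

```python
import difflib
import unicodedata

# Hypothetical mini-gazetteer: one canonical name plus attested variants per place.
# Entries and spellings are illustrative assumptions only.
GAZETTEER = {
    "niagara": {"name": "Niagara", "variants": ["Niagara", "Onguiaahra", "Ongniaahra", "Ongiara"]},
    "onondaga": {"name": "Onondaga", "variants": ["Onondaga", "Onontagué", "Onnontagué", "Onnontague"]},
}


def normalize(name: str) -> str:
    """Lowercase and strip accents, so 'Onontagué' and 'Onontague' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


# Index every normalized variant back to the id of its place record.
VARIANT_INDEX = {
    normalize(v): place_id for place_id, rec in GAZETTEER.items() for v in rec["variants"]
}


def lookup(name: str, cutoff: float = 0.8) -> list[str]:
    """Return place ids for an exact normalized match, else for close spellings."""
    key = normalize(name)
    if key in VARIANT_INDEX:
        return [VARIANT_INDEX[key]]
    close = difflib.get_close_matches(key, list(VARIANT_INDEX), n=3, cutoff=cutoff)
    # Several variants can point to one place; keep each id once, in ranked order.
    return list(dict.fromkeys(VARIANT_INDEX[c] for c in close))


print(lookup("Onontagué"))      # ['onondaga'] (exact match after normalization)
print(lookup("Onontaguez"))     # ['onondaga'] (approximate match)
print(lookup("Seneca Nation"))  # [] (no record lists this name)
```

As with the “Seneca Nation” search, approximate matching only helps with spelling drift among names the gazetteer already records; it cannot supply names that were never entered.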

Citation Analysis and Historical Journals

For my citation analysis exercise, I downloaded 500 articles from the American Historical Review and subjected them to both text and citation analysis. The articles were drawn from Web of Science, and I did not set any time frame for them. The text analysis showed a web of connections between different key terms, such as “Empire” and “United States,” and showed how such terms are grouped more closely depending on the articles in which they were employed.
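Keyword maps of this kind are generally built from co-occurrence counts: two terms are linked more strongly the more articles they appear in together. A simplified Python sketch, assuming each article has already been reduced to a set of key terms (the three keyword sets are hypothetical, not drawn from the AHR records), follows; VOSviewer’s own pipeline adds term extraction and normalization steps that are not reproduced here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(articles: list[set[str]]) -> Counter:
    """Count, for each unordered pair of key terms, how many articles contain both."""
    pairs: Counter = Counter()
    for terms in articles:
        # Sorting gives each pair one canonical order, so (a, b) and (b, a) merge.
        for a, b in combinations(sorted(terms), 2):
            pairs[(a, b)] += 1
    return pairs


# Hypothetical keyword sets for three articles (not actual AHR data)
articles = [
    {"Empire", "United States", "slavery"},
    {"Empire", "United States"},
    {"Empire", "France"},
]
edges = cooccurrence(articles)
print(edges[("Empire", "United States")])  # 2: the strongest link in this toy map
print(edges[("Empire", "France")])         # 1
```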
I created two bibliometric networks, one dealing with key terms present in the articles and the other dealing with shared citations. The exercise showed me that while key terms and the concepts they embody are shared by historians, historical articles rarely share citations. For the citation analysis, I initially set a criterion of two shared citations. However, with that criterion, only two authors were connected. When I lowered the criterion to one shared citation, only six authors had shared connections. This may be characteristic of historical articles, as many are based on original research and cite relatively little secondary literature, which reduces the potential for shared citations. Furthermore, the few shared citations may have been due to book reviews, as an author would be cited by another author reviewing their work. Also, since such authors would work in a similar field (if not the same field), the likelihood of them citing each other is higher.
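The shared-citation criterion works like a threshold on bibliographic coupling, in which two articles are linked by the number of works they both cite. A minimal Python sketch, assuming each article’s references are available as simple strings (the reference lists are hypothetical, and whether this mirrors VOSviewer’s exact threshold setting is an assumption), shows how raising the threshold from one to two can remove every link when articles cite mostly unshared primary sources.

```python
from itertools import combinations


def coupling_edges(refs: dict[str, set[str]], min_shared: int = 1) -> dict[tuple[str, str], int]:
    """Link two articles when their reference lists share at least `min_shared` works."""
    edges = {}
    for a, b in combinations(sorted(refs), 2):
        shared = len(refs[a] & refs[b])  # coupling strength = number of common references
        if shared >= min_shared:
            edges[(a, b)] = shared
    return edges


# Hypothetical reference lists: mostly unique archival sources, one shared monograph
refs = {
    "article_A": {"Thompson 1963", "Archive box 12"},
    "article_B": {"Thompson 1963", "Parish register 4"},
    "article_C": {"Archive box 7", "Court record 3"},
}
print(coupling_edges(refs, min_shared=1))  # {('article_A', 'article_B'): 1}
print(coupling_edges(refs, min_shared=2))  # {}
```

With a threshold of one, only the two articles that share a secondary work are linked; with a threshold of two, the network is empty.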
For the citation exercise, the structure of my data was limited by the minimum number of citations that I selected. It was further constrained by the limitations of citation analysis when applied to historical journals, as mutual citation is not emphasized in history to the same degree as in the sciences. While VOSviewer was helpful in showing the connections between key concepts and keywords across journal entries, its analysis of citations mainly reflected the nature of articles and writing in the field of history.

Text Analysis of the American Historical Review
Citation Analysis of the American Historical Review

The Limitations of Citation Analysis

John Tipton

Jonathan M. Weiner, “Radical Historians and the Crisis in American History, 1959-1980,” Journal of American History 76 (1989).

What is the total number of citations?
1.

What can you learn about the number of citations to this article per year since it was published?
This article may have been perceived to be more relevant in 2019, as that was the only year in which it was cited.

What can you learn about who cites this article? What are their disciplinary identifications?
This article was cited by James Barrett in “Making and Unmaking the Working Class: E.P. Thompson and the ‘New Labor History’ in the United States,” Historical Reflections/Réflexions Historiques 41, no. 1.
James Barrett, before his retirement, was a prominent labor historian at the University of Illinois, and his article deals primarily with the legacy of E.P. Thompson’s The Making of the English Working Class. He likely cited Weiner because Weiner’s article dealt extensively with the rise of the New Left and history from below, in which E.P. Thompson played a prominent role.

What is the total number of publications?
1.

What is the H-index?
The H-index is 1.

What are the average citations per item?
1.
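The H-index is the largest number h such that h of an author’s publications have each been cited at least h times, so it can never exceed the number of publications. A short Python sketch of that definition, alongside average citations per item, follows; the citation lists are illustrative inputs, not Web of Science data.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that h publications each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` items all have at least `rank` citations
        else:
            break
    return h


def average_citations(citations: list[int]) -> float:
    """Total citations divided by the number of publications (0.0 if none)."""
    return sum(citations) / len(citations) if citations else 0.0


print(h_index([1]), average_citations([1]))  # 1 1.0
print(h_index([10, 8, 5, 4, 3]))             # 4
```

For a single publication cited once, both measures equal 1.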

Which of these numbers would you prefer to have used in evaluations for hiring and tenure? Why?
I would rather have the H-index taken into account for hiring and tenure, as it would indicate that my work had a wider reach and was of greater significance to my field.

Is this kind of analysis appropriate for all academic fields? Why or why not?
I do not believe that this kind of analysis is appropriate for all academic fields. It is heavily geared toward the sciences, to the point that when I attempted to refine my search, the only refinement option applicable to history was “interdisciplinary humanities.” Furthermore, the author’s full name is “Jonathan M. Weiner,” yet of the five articles returned by the basic search, only one was written by him; all the others were written by either “Jesse Weiner” or “JF Weiner.” In addition, when I searched for a more recent article, Web of Science indicated that it had been cited 80 times. The accuracy of Web of Science’s citation index and analysis may therefore be limited by the age of an article. If this is the case, significant articles that were instrumental in the development of a field, but have not been cited in the recent past, may be inaccurately represented.
Given the limitations of Web of Science’s citation analysis when applied to the humanities, as well as the imprecisions and constraints of its citation search and analysis, I would be reluctant to employ it when evaluating the significance of a body of work within my field. Furthermore, as far as I was able to tell, citation analysis does not indicate to what degree such works are being employed to support someone’s argument. It is entirely possible that an author’s work is being rebutted while being appropriately cited, and depending solely on citation analysis would give a skewed perspective concerning the current significance of an individual’s argument or methodology within their respective field.

Legal Impediments

As I was studying the Woodrow Wilson Center’s Women in Public Service Project and the Global Women’s Leadership Initiative (GWLI) Index, I decided to research women’s presence in the judiciaries of various countries. In evaluating women’s presence in that domain, their presence in the civil service, their attainment of university degrees, and their presence in decision-making civil service roles could all be important indicators. However, the data available for these indicators was most complete for developed countries. Countries like the United States, the United Kingdom, and France had complete data, from which the position of women in both parliamentary houses and the civil service could be established. For countries like Albania, by contrast, the data was far from complete, and only GWLI indicators such as the literacy rate, the marriage rate, and the presence of women in the civil service were shown. For France, the GWLI Index included indicators such as women in the civil service, women with post-secondary education, and women in the workforce, suggesting that the completeness of such indexes can be limited by a country’s economic development. There is also a distinction within women’s presence in a nation’s civil service: the share of women in the civil service as a whole is consistently much larger than the share of women in “decision making in civil service” roles. As such, many of these nations reflect what the “Roadmap to 50×50: Power and Parity in Women’s Leadership” terms “flat parity,” in that women work in a variety of different capacities but are nonetheless largely prevented from obtaining positions of leadership.
Moreover, the presence of women in “decision making in civil service” roles does not adequately reflect the presence of women in the judiciary. In France, for example, women hold only thirty percent of “decision making in civil service” roles while, according to the United Nations’ Special Rapporteur on the Independence of Judges and Lawyers, women constitute 70.9 percent of the judiciary.
The website of the National Association of Women Judges also furnishes a state-by-state breakdown of women in state-level courts, showing the prevalence of women in the judiciary throughout the United States and giving a sense of gender equality on a state-by-state basis. Overall, as of 2019, women constituted only 34 percent of the United States judiciary. The United Kingdom also has a low proportion of women judges: Diane Taylor’s 2019 article in The Guardian, “Lady Hale: at least half of UK judiciary should be female,” shows that only 29 percent of judges in lower courts are women, although the share of women judges increases in higher courts. The disparity between the prevalence of women in parliaments and women in the judiciary may be caused by the fact that entering the judicial system requires an appointment. While certain countries may have incentives and policies to ensure gender equality in the court system, such policies may not translate into effective implementation at the national level. Furthermore, the different structures of each level of a given judiciary may make certain branches more resistant to change and more independent from the central government. Measuring the presence of women in the judiciary is likely less challenging than evaluating gender equality in more private aspects of life, such as home life. However, the structural differences between tiers of courts in a country must be contended with, and the prevalence of women at higher levels does not necessarily indicate an increase in gender equality at the national level.
While limited access to education and economic freedom may hinder women from entering the judicial system in developing countries, the greatest obstacle in developed countries would seem to be the need for an appointment to enter the judiciary, and a potential reluctance on the part of local authorities to alter a male-dominated establishment.

Methods and Measurements

The articles for this week reveal the limitations of quantitative indicators of gender equality. Hanny Cueva Beteta notes that the general indicator used to measure gender equality, the presence of female politicians at the national level, may not accurately reflect gender equality in a given society. Cueva Beteta notes that in developing countries, the ability of female politicians to advocate for gender equality is limited by a variety of factors, such as gaining a parliamentary position through family connections, the multiplicity of identities, and the elimination of feminist agendas, which are seen as an “electoral liability” (Cueva Beteta 225). Furthermore, as Melanie Hughes pointed out during seminar last week, states may adopt quotas for female representatives in order to obtain aid, even though their parliament has little power compared to the executive branch.
Fulvia Mecatti, Franca Crippa, and Patrizia Farina note that there are other indicators of gender equality or inequality that often go unevaluated, such as freedom of movement and dress (Mecatti, Crippa, and Farina 460). SDG 16.7.1 offers one solution to this problem. Instead of evaluating only the presence of women legislators at the national level, SDG 16.7.1 catalogues the prevalence of women in positions of authority at the local level in addition to national parliaments. While obtaining such data would be substantially more difficult than measuring the number of women in national parliaments, such an analysis could reveal a more nuanced picture of gender equality in developing nations. This type of analysis is familiar to me, as in my field of research, what appear to be general state or colonial policies very rarely affect the reality of life on the ground. Furthermore, while the use of quantitative data to measure social conditions is relatively new to me, given the examples presented by Melanie Hughes and the articles, I believe that, with adequate sources, I could apply such a method to my own research.

The Dilemma of Ethics and Accuracy

The readings that we have done so far have demonstrated that data is very much subject to both present and historical biases and as such cannot be taken at face value, nor assumed to be reliable and ethical. Lara Putnam, in her article “The Transnational and the Text-Searchable: Digitized Sources and the Shadows They Cast,” details the advantages of digitization for researchers, who were previously constrained by archives and their accessibility. However, Putnam notes that despite the convenience that digital research offers, it is still imperative to interpret one’s findings critically. Putnam references E. H. Carr’s argument that historians often unintentionally select their facts, comparing historical research to fishing, in which the catch depends on the tackle the historian chooses. Putnam notes that this is exacerbated by digital methods, stating that “…if the fact is out there anywhere, it will be on your hook in a nanosecond” (Putnam 390).
This is further complicated by the categorization of data by the companies controlling search engines and by the political implications of certain identities. The chapter “The Future of Knowledge in the Public” from Safiya Umoja Noble’s Algorithms of Oppression details the ways in which corporations and government institutions often categorize information based on white, Anglo-American male hegemony, leading to racialized categorizations in the Library of Congress, as well as the specific example of Google autocorrecting “herself” to “himself” as late as 2016 (Noble 6).
The complicated aspects of data and categorization are elaborated upon even further by Catherine D’Ignazio and Lauren Klein in “What Gets Counted Counts,” chapter three of Data Feminism. The authors note that something as seemingly simple as a user account can be anything but simple, as such systems, which demand that users categorize themselves, often disregard the identities of non-binary and trans people. Furthermore, D’Ignazio and Klein note that Facebook, which permits users to write in their own gender identity, often still categorizes users as male or female in order to appease potential advertisers. The authors also provide an example of a case in which data could not be shared at all, and the consequences of that refusal. The O’odham Nation of the Southwestern United States was unable to provide the United States government with the locations of burial grounds, as that information constituted sacred knowledge. As a result, the United States destroyed many burial grounds in order to construct a border fence.
Joanna Radin, in “‘Digital Natives’: How Medical and Indigenous Histories Matter for Big Data,” demonstrates that the people of the Pima Gila River Indian Community, although they have participated in and furnished data for medical studies since the early twentieth century, did not retain any control over the data they provided. Kimberly Christen, however, shows how this could be corrected in her article “Relationships, Not Records: Digital Heritage and the Ethics of Sharing Indigenous Knowledge Online.” She demonstrates that several indigenous nations, while generating their own digital archives, often include specific conditions on the access and use of the data, thereby retaining control over their own information. In this light, while data and its categorization may be inherently problematic, they may be adapted to better reflect the people who actually provide the data.

John’s Intro

I am a second-year PhD student in the Department of History, and my focus is on interactions between factions of the Haudenosaunee (Iroquois) and French mediators in what is today Upstate New York. My thesis is particularly concerned with Fort Niagara, a French fortification built on the territory of the Seneca Nation in 1724, and the diplomatic interactions between the French, English, and Haudenosaunee that resulted from its construction. Currently, I am sorting through the documents that I compiled at the Archives Nationales in Paris and trying to situate them, and my project as a whole, within the body of scholarship concerning indigenous and French concepts of sovereignty and alliance, as well as the limited control that both the French and the Seneca had over the territory in question.
In this class, I would like to learn how I can employ digital methods to further my understanding of the data that I have. This interest extends beyond documents and correspondence, as I would very much like to include maps, and other possible forms of spatial data, in my project. As such, I am keen to learn how such data can deepen my comprehension of the subject, and to what degree it can be trusted at all.