Digitization and Decontextualization

The conversations and readings of the past week, while revealing the ways in which the digitization of textual data and text analysis can be extremely helpful to historical research, nonetheless show the many limitations of text analysis and, as Lara Putnam noted in Sharon Leon’s “The Peril and Promise of Historians as Data Creators: Perspective, Structure, and the Problem of Representation,” the dangers present in the “decontextualization of data.” This is also reflected in Jo Guldi’s assessment of the analysis of key words present in the transcripts of parliamentary debates, which reveals only the dynamics of legislation within the British Isles, and not the British Empire as a whole. Furthermore, as John Markoff noted concerning his study of French parish cahiers, the regional languages of France in 1789, as well as the different terms that could be employed for the same aristocratic privileges, rendered personal, specialized analysis indispensable for the creation of this study, even though it took on a digital medium. In light of these analyses, it is questionable whether human analysis can ever be replaced by digital text analysis, as certain elements of the data could be overlooked if they are not considered within a larger base of information.
This is an issue that I come across in my own research, which primarily engages with documents written by French officials in New France for either the Governor-General or the Ministry of the Marine. While I was unaware of this when I first began my research, certain elements of the text can reveal details that are important for interpreting the documents. This is particularly the case with handwriting, as well as with documents that bear two different dates. One date is the year in which the document was received by the Ministry of the Marine, whereas the other would likely be the day on which it was written. The only way to distinguish the two dates, and thus to establish the correct date of the document, is through an analysis of handwriting. Furthermore, handwriting can also reflect which officials had a secretary. More importantly, however, personal handwriting can indicate if an official, who would have otherwise had a secretary at his disposal, was in a location where bringing a secretary was impossible. This was the case with Charles le Moyne de Longueuil, who was the Governor of Montreal and simultaneously maintained a residence in the Onondaga Nation. An analysis of the handwriting in letters from him to the Governor-General of New France or the Ministry of the Marine can therefore show where he was when he wrote those letters, as he would have had a secretary in Montreal.
In light of the readings and the following conversations in class, I am wondering if these nuances, which can be pivotal in the correct interpretation of a document, can be reflected purely through digital text analysis. Furthermore, if they were reflected, would other information present in the document and its composition be overlooked? In this capacity, is it possible for text analysis to become a reliable means to evaluate (or at least summarize) archival sources without human intervention?

Moderation in the Age of Global Pandemics

Content moderation, while posing a possible threat to the freedom of information and expression, nonetheless plays an important role in regulating what is posted on internet platforms. While the internet was initially perceived as a potentially free community for the sharing of information and ideas, some measure of moderation must exist to ensure that such digital spaces serve as safe sources of information. This, however, can prove to be intensely difficult to maintain, as voluntary moderation could prove to be sporadic and, as Roberts notes, people who are specifically employed to moderate digital platforms are repeatedly exposed to disturbing images and content, and are furthermore subjected to low-pay and low-prestige positions. Using algorithms for moderation, however, can serve to stifle not only free speech, but valid sources of information, as Facebook’s temporary elimination of articles from The Atlantic shows. In light of this incident, human labor seems like the most reasonable solution to online moderation. However, such labor is blatantly undervalued, and those who perform it are sufficiently out of the public eye for the dynamics of their work, and even its importance, to be relatively unknown. Furthermore, attempts to gamify online moderation, while providing incentives for voluntary moderation, would not be sufficient to compensate voluntary moderators who take on the responsibilities of professional moderators, as they would likely still be exposed to the same material, which would in turn leave them with the same psychological scars.
These issues have been exacerbated by the COVID-19 pandemic as, with the onset of social distancing and stay-at-home orders, the dependence of the population on digital media for news, entertainment and work is greater than ever. This has led to increased online activity and, in an effort to obtain information about the virus, an immediate need for access to valid reports and reliable news outlets. Furthermore, false information about the virus or government policies could have severe effects. As such, moderators are under intense strain, and the weaknesses of algorithms are being exposed as online traffic increases. Such a dynamic could diminish the quality of moderation or, given the increase in online traffic relative to the existing pool of human moderators, produce a shortage of moderation altogether. It is also possible that the added stresses of moderation during the pandemic, combined with the pre-existing stress that moderators experience, could exacerbate the detrimental toll that their jobs already take on them. In this regard, COVID-19 poses a unique challenge to online moderators, and while it may be too late to adapt, it is possible that these circumstances will inform future decisions regarding the refinement of algorithms, as well as the treatment of human moderators.

What of all this will stay with us in the future?

Hi everyone, it was difficult for me to focus during this past week, so I only managed to write my reflection on the readings and discussion in a series of not very well-connected paragraphs. I apologize and thank you for your understanding.

In the first session of the seminar, we discussed how content on the Internet, although accessible from all over the world, is created in a specific place. The same consideration could be extended to online platforms. People around the globe use online social media or service platforms where they create their own culture-specific content, but these platforms were also designed in a particular cultural and temporal context. We know that platforms constantly evolve and readapt in response to multiple factors, but I wonder whether the original way and place in which they were designed can contribute to turning certain elements that were previously specific to a given culture into global culture. I have in mind the concept of the school yearbook and its derivative form, Facebook. But I also think about how the now normalized use of emojis and GIFs has replaced the verbalization of ideas and feelings, a phenomenon that decades ago was associated mostly with technology users in East Asia.

The video lecture “Algorithmic Cruelty and the hidden costs of ghost work” brought feelings of empathy towards those workers trapped in the mechanisms of on-demand online platforms 🙁, not only because I can imagine their precarious situation, but also because many of the traits of these jobs are common to other productive activities, one of them being graduate life and, perhaps, academic life more generally. Having no 9-to-5 shift and never being able to finish your workday is the most evident connection. Grant writing and competition can be seen as task-based work, in which we are also required to hunt constantly, on our own, for grants and calls for papers. Most of the time, our chances of receiving a grant depend on who else applied for it, but institutions would never reveal this information to the applicants. We are kept in a state of isolation and ignorance about this process, not knowing who exactly gets to choose or hire us, and why.

A recent Pittwire email on Zoom protocols reminded me of the flagging system in social media, and of how the role of the moderator, as well as the expected behavior of participants, has to be better defined. I was disappointed not to find among the protocols a suggestion that participants avoid having any direct source of light, such as a lamp, in their background; it could be replaced with an interesting but not too distracting object to which other zoomers could direct their gaze when they get tired of looking at people’s faces. Nor did the protocols encourage attendees to change into more appropriate attire, even if only for the Zoom session, or only from the waist up.

Temporarily, virtual technology mediates and shapes all our social and work interactions, but this process will impact the more permanent life and work of the future. Just like Alison, I want to be optimistic and believe that we will develop an aversion to this type of communication that will push us to avoid it, but it is hard not to consider another scenario. In a virtual conversation with my father about this possible future where more and more activities are transferred into the virtual world, his deepest hope was that “church could be one of them.” As graduate students, should we start developing our online pedagogical skills and portfolio more seriously?

Places as City, What About Villages?

While engaging with the World Historical Gazetteer project and the Recogito project, I experimented with different places and with various versions of their names. A few of the places that I tried were Delhi, the capital city of India; Allahabad, a city in the northern part of India; and Jim Corbett National Park. I entered various spellings and versions of Delhi, such as New Delhi, Indraprastha, Dili and Dilli, and, much to my surprise, the first result in the list provided was always Delhi, followed by other places that have the same or similar names.
Similarly, I tried to search for places like Allahabad, which is also known as Prayagraj and has a spelling variant, Illahabad. In my search, Allahabad was referred to as a city and was displayed with its other name as well. However, the spelling variant was not accounted for. In addition, Allahabad is also the name of the district in which the city of Allahabad is located. This was of particular interest to me because the WHG referred to Allahabad as a city and not as a district. Lastly, I tried to search for Corbett National Park, a national park and a small village located northeast of Delhi, for which there were no results.
I tried to incorporate these findings into the data set I decided to upload to the WHG; however, I encountered an error.
My experience with Recogito was slightly different. The interface of the project, along with its many features, was easy to understand. I reviewed a few of the case studies that were uploaded to the Recogito blog to understand how the project is utilized by scholars of various fields. However, when I tried to upload the article Critical Regionalism by Kenneth Frampton, there was an error in uploading the file. I tried multiple file types (doc, docx and pdf), with the same result.
The overall questions and observations that emerged from engaging with both projects are:
(a) Cities are given more importance as places than national parks, villages, and other kinds of spatial categories.
(b) Although I could not upload my text to Recogito, the ability to edit and add places on the map in order to annotate them is an enriching innovation; however, do we have a way to verify the content?
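
As a rough illustration of why a spelling variant such as Illahabad could, in principle, be caught automatically while others cannot, here is a minimal Python sketch using approximate string matching from the standard library. The list of names and the similarity cutoff are assumptions made for the example; this is not how the WHG search is known to work.

```python
import difflib

# Hypothetical mini-index of place names (not data taken from the WHG).
KNOWN_NAMES = ["Delhi", "New Delhi", "Allahabad", "Prayagraj"]

def suggest(query, names=KNOWN_NAMES, cutoff=0.8):
    """Return the known name most similar to `query`, or None if none is close enough."""
    # Compare in lower case, but hand back the original spelling.
    lowered = {name.lower(): name for name in names}
    hits = difflib.get_close_matches(query.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[hits[0]] if hits else None

print(suggest("Illahabad"))  # one letter away from "Allahabad", so it is matched
print(suggest("Dilli"))      # too different from "Delhi" at this cutoff
```

Under these assumptions, a one-letter variant is recovered, but a variant like Dilli is not, which suggests that string similarity alone cannot replace a curated list of alternate names.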

Not a place, but an event

Just like others, I was also unsuccessful in uploading the same data to the WHG. Instead, I spent more time looking up places. My first exploration was my hometown in Mexico, Chihuahua, which before my search I associated with both the name of a city and of a state. There were 35 variant names for Chihuahua (Altepetl Chihuahuah ; Byen Chihuahua ; CUU ; Chihuahua ; Chihuahua City ; Chihuahua by ; Chiuaua ; Chiuauae ; Chivava ; Cihuahua ; Ciuaua ; Dakbayan sa Chihuahua ; Tsiouaoua ; chiuaua ; chiwawa ; chyywaywa ; chyywaywa ; qi wa wa shi ; zhi hua hua shi ; Čihuahua ; Čiuaua ; Τσιουάουα ; Чивава ; Чиуауæ ; Чиуауа ; Чіуауа ; ציוואווה ; ჩიუაუა ; チワワ ; 奇瓦瓦市 ; 芝華華市 ; 치와와, plus three entries in scripts that my word processor could not reproduce). This is a considerable number of variants, especially when compared to other, more widely known places, such as London. The fact that the spelling of Chihuahua is not intuitive might contribute to its having so many variants. But this is not a large number compared to some Chinese cities, as I will show later.

In the WHG, Chihuahua was connected to what I thought was a well-known geologic formation within the state of Chihuahua, the Copper Canyon. But on the map, this place was located in California, near the border with Mexico. Then I searched for Copper Canyon, and results appeared only in the United States. I tried searching using the Spanish names Cañón del Cobre and Barranca(s) del Cobre, and there were no results available.

My second exploration was in China. In my research as an art historian, when trying to follow the itinerary of Latin American artists or intellectuals who traveled to China after WWII, I often find it challenging to know exactly where they went. This is because, in both published books and manuscripts, the “Spanish” names of Chinese cities are always spelled differently from the contemporary standard names, which use pinyin (phonetic transcription). Especially in manuscripts, the authors often write something that sounds like a place, but they might be making up the spelling.

When I searched for the Chinese city that I know best, Hangzhou, there were 69 name variants in total (Chan’nktsoou ; Chang-cou ; Chang-čou ; HGH ; Hancheum ; Hanchow-fu ; Hanczou ; Handzou ; Handžou ; Hang ; Hang Chau ; Hang-chiu-chhi ; Hang-chou ; Hang-chou-shih ; Hang-hsien ; Hangchow ; Hangcsou ; Hangdzou ; Hangdžou ; Hanggouo ; Hangtsjou ; Hangzcouh ; Hangzhou ; Hangzhou Shi ; Hangĝoŭo ; Hančžou ; Hong-chu-su ; Hong-ciu ; Hàng Châu ; Hâng-chiu-chhī ; Hòng-chû-sṳ ; Hòng-ciŭ ; Khanchzhou ; Khandzhou ; Khangdzou ; Khanzhou ; Xanchjou ; hang cow ; hang zhou ; hang zhou shi ; hangacau ; hangajho’u ; hanghtshw ; hangjeou ; hangjeou si ; hangju ; hangzhw ; hanjha ; hannaco ; hʼnggwʼw ; kancu ; Χανγκτσόου ; Хангџоу ; Ханджоу ; Ханжоу ; Ханчжоу ; האנגגואו ; हांगचौ ; हांगझोऊ ; ਹਾਂਙਚੋ ; காங்சூ ; ഹാങ്ഝൗ ; หางโจว ; ཧང་ཀྲོའུ། ; 杭州 ; 杭州市 ; 항저우 ; 항저우 시 ; 항주, plus five entries in scripts that my word processor could not reproduce).

In a way, this makes me feel more relaxed, as I can imagine that I am not the only one dealing with this toponymy issue. I checked other Chinese city and province names (Sichuan, Suzhou, Dalian), and they did not have as many variant names as Hangzhou. Why is there so much information about this particular city in the WHG? The WHG’s collection of variant names for Chinese places could be useful in my future research. Just as with any other resource, it makes sense to know what the strengths of the WHG are as a tool for our own research, instead of expecting it to have an equal amount and quality of information about all places in the world.
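
To make concrete how a collection of variant names could help with this kind of toponymy problem, here is a minimal Python sketch of a lookup table that maps each variant back to one standard name. It uses only a handful of the variants listed above, and the data layout and function names are hypothetical, not the WHG’s actual format or API.

```python
import unicodedata

# Illustrative subset of the variants listed above; the layout is an assumption.
VARIANTS = {
    "Hangzhou": ["Hangchow", "Hang-chou", "Hanchow-fu", "Khanchzhou", "杭州", "항저우"],
    "Chihuahua": ["Chiuaua", "Chivava", "Чивава", "チワワ"],
}

def normalize(name):
    """Apply Unicode NFKC and case folding so equivalent spellings compare equal."""
    return unicodedata.normalize("NFKC", name).casefold().strip()

def build_index(variants):
    """Map every normalized variant (and the standard name itself) to the standard name."""
    index = {}
    for standard, names in variants.items():
        for name in [standard, *names]:
            index[normalize(name)] = standard
    return index

INDEX = build_index(VARIANTS)

def resolve(name):
    """Return the standard name for a variant, or None if the variant is unknown."""
    return INDEX.get(normalize(name))

print(resolve("hangchow"))
print(resolve("杭州"))
print(resolve("Copper Canyon"))
```

A table like this only resolves names that someone has already recorded, so an improvised spelling in a manuscript would still need a human reader or an approximate matching step.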

For the annotated text, I tried a section of the article “Not a Place, but a Project. Bandung, TWAIL, and the Aesthetics of Thirdness” by Vik Kanwar. I am interested in mapping events, which often take their name from the place where they were first hosted. As in the case of Bandung, which names both the 1955 conference and the Indonesian city, the overlap between event and place labels happens often; Versailles and Westphalia were the other two cases in this text. I can imagine that something similar could happen with mapping biennales or other artistic events, such as Art Basel, which happens in Basel, but also in Miami and Hong Kong.

One idea from the readings that stayed with me was that a gazetteer is a collection of triples <N, F, T>: name, footprint, and type. That is, the basic information required to understand a place better includes what it is called, where it is, and what kind of thing it is (Goodchild, 28-29). Adding the time element makes the definition of a place more complete. I shared this formula with the students of the Intro to Western Architecture course for which I am now the TA, suggesting that they use it when writing about a place, e.g. the Aswan High Dam, built in the Nile Valley in the 1960s. I hope this formula helps them, and me, to improve our writing.
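
The formula can also be written down as a small data structure. The following Python sketch is only one possible reading of the <N, F, T> triple plus time: the class name, the field names, and the choice of a single latitude/longitude point as the footprint are my assumptions, and the coordinates of the dam are approximate.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class PlaceRecord:
    """One gazetteer entry: name (N), footprint (F), type (T), plus an optional time span."""
    name: str                                    # N: what it is called
    footprint: Tuple[float, float]               # F: where it is, simplified to (lat, lon)
    feature_type: str                            # T: what kind of thing it is
    timespan: Optional[Tuple[int, int]] = None   # added element: (start year, end year)

    def describe(self):
        lat, lon = self.footprint
        text = f"{self.name} ({self.feature_type}) at about {lat:.2f}, {lon:.2f}"
        if self.timespan is not None:
            start, end = self.timespan
            text += f", {start}-{end}"
        return text

# Example from the text; coordinates are approximate and the decade is encoded as 1960-1969.
aswan = PlaceRecord("Aswan High Dam", (23.97, 32.88), "dam", (1960, 1969))
print(aswan.describe())
```

A real footprint would often be a line or a polygon rather than a point, which is one reason this remains a simplification.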

Time and Place in Colonial North America

The World Historical Gazetteer and Recogito, while formidable research tools in development, are nonetheless, at the moment, relatively limited, depending on the analysis that is being conducted. In my experience with the World Historical Gazetteer, I was unfortunately unable to upload the data provided by the World Historical Gazetteer. Upon typing names into the World Historical Gazetteer search, however, I was struck to find names listed in different languages, even indigenous languages such as Seneca. The gazetteer nonetheless seems to operate primarily by proper place names, as the term “Seneca Nation” furnished no results, while “Tonawanda”, a reservation properly called the Tonawanda Seneca Nation, did return a result. It also seems as though the World Historical Gazetteer, in the case of North America, is primarily concerned with place names that either existed in the recent past or continue to be in use. The place names indicated by the gazetteer are also sporadic; both Elba, NY and East Elba (a very small unincorporated community within the Town of Elba in Genesee County) are shown by the gazetteer. However, ARTPARK, a state park in Lewiston, NY that once held a Seneca village and French trading post, is not present.
While working with Recogito, I had a comparable experience. While Recogito could recognize place names from the distant past, such as fortifications and British and French colonies, it could not establish boundaries for historical entities such as New France or the colony of New York. This is understandable, as the nature of colonial boundaries is itself obscure. While European states like France and Britain would claim vast swaths of territory, they were unable to govern and occupy most of it. Furthermore, such colonial land claims would not reflect the indigenous peoples who lived within them and exercised sovereignty over their ancestral lands despite European claims. Mapping such dynamics would prove to be very difficult even for experienced colonial historians. Furthermore, Recogito is limited by spelling and language. It would be impossible to upload original French place names or French transliterations of indigenous place names, as spelling during the 18th century was sporadic, and many such place names are only present in historical documents and are difficult to connect to present localities. This, however, is only symptomatic of a developing gazetteer. As Ryan Shaw demonstrates in Placing Names: Enriching and Integrating Gazetteers, gazetteers will ideally indicate the presence of other place names, such as early modern Spanish place names, in a given locality (Shaw 54), thereby serving as an important research tool for historians. This, however, leaves a major question for me: as a location in New France or the territory of the Iroquois Confederacy could have several different place names depending on which indigenous group was speaking, and the indigenous names would be transliterated into French letters in multiple ways, could a comprehensive gazetteer of places in North America be created, and would it be truly all-encompassing, or will there always be limitations on the information both available to and from the gazetteer?

Citation Analysis and Historical Journals

For my citation analysis exercise, I downloaded 500 editions of the American Historical Review and subjected them to both text and citation analysis. The American Historical Review articles that I employed were drawn from Web of Science, and I did not set any time frame for them. The text analysis showed a web of connections between different key terms, such as “Empire” and “United States,” and showed how such terms are grouped more closely depending on the articles in which they were employed.
I created two bibliometric networks, one dealing with key terms present in the articles and the other dealing with shared citations. The exercise showed me that while key terms and the concepts they embody are shared by historians, historical articles do not often contain shared citations. For the citation analysis portion, I initially employed the criterion of two shared citations. However, in doing so, only two authors were connected. When I set the criterion to one shared citation, there were only six authors with shared connections. This may be a nuance characteristic of historical articles, as many articles are based on original research and have very few citations of secondary literature, which reduces the potential for shared citations. Furthermore, the few shared citations may have been due to book reviews, as an author would be cited by another author reviewing their work. Also, as such authors would be in a similar field (if not the same field), the likelihood of them citing each other is higher.
The structure that I employed for my data in the citation exercise was limited by the minimum number of citations that I selected. This was further constrained by the limitations of citation analysis when applied to historical journals, as mutual citations are not stressed to the same degree in the field of history as they are in the sciences. While VOSviewer was helpful in showing the connections between key concepts and keywords across journal entries, the analysis of citations only further reflects the nature of articles and writing in the field of history.
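
The effect of the minimum-shared-citations criterion can be illustrated with a small Python sketch. It links two articles whenever their reference lists overlap by at least a chosen number of works, which bibliometrics usually calls bibliographic coupling; whether this matches the exact analysis type selected in VOSviewer is an assumption, and the articles and references below are invented placeholders rather than data from the American Historical Review.

```python
from itertools import combinations

# Hypothetical mapping from each article to the set of works it cites.
CITED = {
    "Article A": {"ref1", "ref2", "ref3"},
    "Article B": {"ref2", "ref3", "ref4"},
    "Article C": {"ref3", "ref5"},
    "Article D": {"ref6"},
}

def coupling_edges(cited, min_shared):
    """Return {(article, article): number of shared references} for pairs meeting the threshold."""
    edges = {}
    for a, b in combinations(sorted(cited), 2):
        shared = len(cited[a] & cited[b])  # size of the overlap between the two reference lists
        if shared >= min_shared:
            edges[(a, b)] = shared
    return edges

print(coupling_edges(CITED, min_shared=2))  # only the A-B pair remains
print(coupling_edges(CITED, min_shared=1))  # A, B and C are all linked; D stays isolated
```

With sparse reference overlap, raising the threshold from one to two removes most links, which is consistent with how few authors remained connected in the exercise.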

Text Analysis of the American Historical Review
Citation Analysis of the American Historical Review

Why have certain articles not been cited so many times?

(1) Linda Nochlin, “Why Have There Been No Great Women Artists?” ARTnews, January 1971, 194–204.

What is the total number of citations?  30

What can you learn about the number of citations to this article per year since it was published?  From what I can understand, the number of citations started to increase in 2015, and 2019 had the highest number of citations, with a total of 8.

What can you learn about who cites this article?  What are their disciplinary identifications? The majority of the articles were published in journals of art history, history, art and culture, museum and curatorial studies, or feminist and gender studies. It was also cited in publications in literature, theater, design, or area-specific fields, such as Japanese studies or Latin American studies. Also, one citation was from an article in a journal of Information Science (Ciencias de la información in Spanish; I am not sure if that is the correct translation). All articles were written in English, but a few were published in journals with Spanish or German titles.

(2) Nochlin, L

What is the total number of publications? 46

What is the H-index? 3

What are the average citations per item? 0.5
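
For reference, both of these numbers can be computed from a list of citation counts, one per publication. The Python sketch below assumes the standard definition of the h-index (the largest h such that h publications have at least h citations each); the counts are invented for illustration and are not the Web of Science record for this author.

```python
def h_index(citations):
    """Largest h such that at least h publications have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank      # the top `rank` publications all have at least `rank` citations
        else:
            break
    return h

def average_citations(citations):
    """Mean number of citations per publication (0.0 for an empty list)."""
    return sum(citations) / len(citations) if citations else 0.0

# Hypothetical citation counts for six publications.
counts = [10, 5, 3, 1, 0, 0]
print(h_index(counts))            # 3
print(average_citations(counts))
```

The example shows how the two measures diverge: a single highly cited item raises the average but can only add one to the h-index.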

Which of these numbers would you prefer to have used in evaluations for hiring and tenure?  Why?

My first idea is that both numbers could offer information for the evaluation but should not be taken as single or central parameters. Evaluating a candidate should be a more holistic process that takes into account many other aspects of the person’s profile, such as age or time working in the field. As we have seen in the readings, the reasons why an article or an author gets cited are many and not always linked to the quality of the content. But if one of the institution’s main values or objectives is to increase the productivity statistics of its faculty, based on publication and impact in high-profile journals, then it would make sense to use these numbers and to prioritize the candidates who bring the statistics up. But if the institution has other interests, such as an orientation towards teaching or reaching out to the broad community beyond academia, then the citation indexes are not very relevant.

Is this kind of analysis appropriate for all academic fields? Why or why not?

No, the nature and structure of each field are different. From the readings, I understood that the Web of Science only includes certain types of “top” journals and prioritizes articles written in English. Just to mention one example, for academics working in region-specific fields, it seems logical to write in the languages of those regions, and not necessarily always or most of the time in English.

The Limitations of Citation Analysis

John Tipton

Jonathan M. Weiner, “Radical Historians and the Crisis in American History, 1959-1980,” Journal of American History 76 (1989).

What is the total number of citations?
1.

What can you learn about the number of citations to this article per year since it was published?
This article may have been perceived to be more relevant in 2019, as that was the only year in which it was cited.

What can you learn about who cites this article? What are their disciplinary identifications?
This article was cited by James Barrett in the article “Making and Unmaking the Working Class: E.P. Thompson and the ‘New Labor History’ in the United States”, Historical Reflections-Reflexions Historiques, vol 41, issue 1.
James Barrett, before his retirement, was a prominent labor historian at the University of Illinois, and his article dealt primarily with the legacy of E.P. Thompson’s The Making of the English Working Class. He likely cited Weiner because Weiner’s article dealt extensively with the rise of the New Left and History from Below, in which E.P. Thompson played a prominent role.

What is the total number of publications?
1.

What is the H-index?
The H-index is 2.

What are the average citations per item?
1.

Which of these numbers would you prefer to have used in evaluations for hiring and tenure? Why?
I would rather have the H-index taken into account for hiring and tenure, as it would indicate that my work had a wider reach and was of greater significance to my field.

Is this kind of analysis appropriate for all academic fields? Why or why not?
I do not believe that this kind of analysis is appropriate for all academic fields. It is heavily geared toward the sciences, to the point that when I attempted to refine my search, the only search term applicable to history was “interdisciplinary humanities.” Furthermore, the author’s full name is “Jonathan M. Weiner.” After I performed the basic search, in which five articles were furnished, only one was written by this author. All other articles were written by either “Jesse Weiner” or “JF Weiner.” Furthermore, upon searching for a more recent article, Web of Science indicated that it had been cited 80 times. Therefore, the accuracy of Web of Science’s citation index and analysis may be limited by the age of the article. If this is the case, significant articles that were instrumental in the development of a field, but have not been cited within the recent past, may be inaccurately represented.
Given the limitations of Web of Science’s citation analysis when applied to the humanities, as well as the imprecisions and constraints of its citation search and analysis, I would be reluctant to employ it when evaluating the significance of a body of work within my field. Furthermore, as far as I was able to tell, citation analysis does not indicate to what degree such works are being employed to support someone’s argument. It is entirely possible that an author’s work is being rebutted while being appropriately cited, and depending solely on citation analysis would give a skewed perspective concerning the current significance of an individual’s argument or methodology within their respective field.

Paradoxical Presence of Hierarchy and Subversion

In the seminar so far, I have dealt with scholarship that is unfamiliar to me and written from a different perspective. It is helping me develop an understanding of the nature of the data I use in my own research and of how certain biases and prejudices, to which I am oblivious, are built into the data. I often take for granted the categorizations I use in my research, and the readings and discussions we have had in the seminar thus far have made me reflect on these biases. I realized how these categorizations can themselves assert hegemonic voices in research, voices that I constantly try to subvert in my own work. The paradoxical nature of the data and of my interest in it is something I realized through my conversations. In the coming sessions of the seminar, I hope to find tools through which we can target these biases in humanities scholarship.

Unit 2a