Online versus Physical Space: Are they really that different?

With the rise of text-based search queries and online databases, I wonder how text searches and their algorithms might create unpredictable new modes of research. In a physical archive, a researcher can consult a finding aid and seek help from an archivist to create a clear game plan, then work through a collection methodically and with precision. With an online database, a historian will often simply plug in keywords and set limits on the search. The researcher is no longer digging through boxes and folders to uncover data; instead, they are opening curated data that an algorithm decided was most relevant. Does this negatively impact the discovery of new sources? There have been countless times when a researcher, while looking for one source, discovered new and exciting information in a folder or box they happened to be going through. Because text-based searches return only content curated around your query, will these accidental yet important discoveries occur less often? Are historians and researchers missing out on important data by relying on online databases?

In some instances, I have found that this question goes both ways. I browse online databases often, usually examining newspaper articles. When I click on an individual newspaper article, the page that appears includes a sidebar list, produced by an algorithm, of suggested primary sources the site thinks are related to my latest click. From these suggestions, I have uncovered important articles and primary sources that would not otherwise have appeared in a search. The system and its UX recommended these links to me, as if replicating an archivist in a physical space who takes an interest in my work and offers suggestions on where else to look. I often wonder whether I would have found these articles in a physical archive. They were not directly connected to what I was looking for, but after browsing through them, these articles offered valuable data and contextualization for the events I am researching. In rare cases, the suggested links turned out to be consequential, excellent finds that directly influenced my argument. The biggest concern, however, is that these suggested links were based primarily on my keyword search and the sources I clicked on once I received my results. If I had used a different keyword, would these articles ever have been suggested to me?

In short, the online archive is increasingly becoming a mainstay of historical research, and I believe it can bring the sense of discovery that historians clamor for in physical spaces. The inconsistency of these discoveries, and the inability to replicate them because they depend on algorithms, remains an issue for historians. Still, I have to ask: is this any different from the data we historians never come across in a physical archive because of the subjectivity of where an archivist files and stratifies sources? How is it any different from valuable data sitting, essentially at random, in boxes we would never look through because of the subjectivity of archivists and finding aids? Online databases and physical archives are vastly different experiences, but are the problems that come with them really that different?

Content Moderation and the User Experience

The past week of discussions has been incredibly timely; given current events, it is important to understand and analyze the ways in which content moderation can be manipulated to craft a specific narrative. As humans work behind the scenes to code and deploy algorithms that generate public content, I wonder how this form of automation might influence what we read and what is accessible for us to view. At what point might a content generator value high-volume traffic over factual, objective data? For example, would a private company, whose financial goal is (usually) to turn a profit, use algorithms to promote content that is more likely to be viewed, even when more thorough, factual articles are available but less viewed? Are developers encouraged through gamification to moderate content that will be most financially successful rather than most informative and objective? Unlike much academic writing and analysis, content moderators rarely expose their algorithms and methodology, largely to maintain the integrity of their moderation (and to avoid manipulation of the user experience). As a result, we the users are not privy to the algorithmic decisions being made, decisions that would allow us to better adjudicate and understand automated content modification. Scholars can use footnotes and sources to better understand an academic work. We, however, are largely left in the dark when it comes to internal gamification and content moderation.

Moreover, in "Platforms Are Not Intermediaries," Tarleton Gillespie wrote about how content moderation shapes and determines the ways in which platforms conceive of their user base. We are no longer simply consumers of a medium. Instead, we, the users, are becoming a dataset and a signifier, but also the prison guard of the entire system. Gillespie argued that this happens through user moderation and tools such as flagging. Does this give the user-moderator a greater sense of control over what remains a largely automated system? Does a medium appear more or less trustworthy to the public when there are user content moderators rather than an entirely algorithmic system? This is a form of labor, as Gillespie points out, but one that is often not recognized.

In short, and unfortunately, until content moderation is forced to be transparent, we the users have a startling lack of control over both the content we view and our curated experience. Gamification will often determine which algorithms are used to produce the content that generates maximum revenue. Until there is a system of enforcement and transparency, and (as Gillespie suggests) until researchers are given access to dig into specific examples of content moderation, we the consumers are left with a user experience that is largely curated outside of our control.

Recogito and Gazetteer

With Recogito, I uploaded a text document pertaining to the removal of cherry blossom trees to make way for the Thomas Jefferson Memorial in Washington, D.C. I wanted to mark on the gazetteer where the shipments of cherry blossoms were originally planted, the site of the Jefferson Memorial, and where many of the trees were moved after the memorial's construction. This was a seamless task, and the Recogito interface worked fluidly and intuitively with the gazetteer map. One question I had (and I am not sure whether it is technologically feasible): could the system eventually accept PDF files? I had originally uploaded one, only for it to be rejected because of the file type. I do not know whether PDF uploads will be possible in the future, but they would greatly expand what I am able to upload and annotate.

I unfortunately struggled to get the gazetteer to cooperate with my files and received an error notification; I believe this is largely due to my misunderstanding of document types and upload criteria. However, I played around with the gazetteer without my own files and was impressed with its relative ease of access and the amount of data already in the system. A lake from my hometown of ten thousand people was marked with a brief description, which was very exciting to see. Like my colleagues, I would love to see the zoom-in feature expanded. This would enable scholars working on micro-level projects (streets, alleyways) to use the program. However, if a board is to be constructed to review submissions, expanding the scope of the program to the street level would be difficult to administer and adjudicate.

Urban agricultural history

The first network I used was "Urban agricultural history." This search brought back 529 documents. Of the 17,076 terms found, 308 occurred more than 10 times. The terms deemed most relevant were quite surprising to me: species richness, metal, species composition, lead, and stream were all ranked as most relevant, with scores in the 3.0s, yet none of these terms exceeded 20 occurrences. The most frequently used terms, however, were site (107 occurrences), effect (103), species (90), and century (80). Taking out these most frequent terms changes the chart considerably: the leading cluster terms become human activity, water quality, agricultural activity, and several others that were not previously the leading cluster terms.

The second cluster I made eliminated most terms by viewing only those that appeared more than thirty times. Out of the 17,076 terms, only 58 were selected, and only 35 of those were deemed relevant. The most used terms were now history (353 occurrences), area (251), development (173), and process (109). Furthermore, many terms, like China, were still included in the results but no longer had a visual representation on the map. Moreover, in the second cluster set, nearly every term visually represented on the map connected with every other term. This nearly universal, ubiquitous relationship between phrases stood out compared to the first cluster.
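To make the thresholding step concrete, here is a minimal sketch of the kind of minimum-occurrence filter described above. Only the four high-frequency counts come from the numbers reported here; the low-frequency counts are invented placeholders, and tools like VOSviewer apply this sort of filter internally before drawing the term map.

```python
# A minimal sketch of the minimum-occurrence threshold described above.
# The four high-frequency counts come from the first map reported here;
# the low-frequency counts are invented placeholders for illustration.
from collections import Counter

term_counts = Counter({
    "site": 107, "effect": 103, "species": 90, "century": 80,          # reported above
    "species richness": 14, "water quality": 12, "human activity": 11, # invented
    "stream": 13, "metal": 12,                                         # invented
})

def filter_terms(counts, min_occurrences):
    """Keep only terms that occur at least `min_occurrences` times."""
    return {term: n for term, n in counts.items() if n >= min_occurrences}

first_map = filter_terms(term_counts, 10)    # analogous to the 308-term map
second_map = filter_terms(term_counts, 30)   # analogous to the 58-term map

print(sorted(second_map, key=second_map.get, reverse=True))
```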

Second Cluster

I found it fascinating to observe clusters that started from methodological phrases like "case study" and then to trace which other terms were associated with it. I also loved using the software to trace clusters and patterns. Unfortunately, I am not sure my understanding of the software, or of manipulating the data sets, is currently strong enough to articulate meaningful criticisms of the program. (Which should make tomorrow's class even more interesting!)

Metrics and Citations

Epstein, Richard A. "Caste and the Civil Rights Laws: From Jim Crow to Same-Sex Marriages." Michigan Law Review 92, no. 8 (1994): 2456-2478.


http://apps.webofknowledge.com.pitt.idm.oclc.org/CitationReport.do?product=UA&search_mode=CitationReport&SID=7EEgg7EEbMxjPWfsX3T&page=1&cr_pqid=39&viewType=summary


Total number of citations:

191

What can you learn about the number of citations to this article per year since it was published?

The article was published in 1995, and the highest number of citations occurred in 1997. The number dipped to 15 in 1998, and by 2000 the article was cited only eight times; in 2015 it received just two citations. The results therefore suggest that an article is most likely to be cited shortly after its publication (within roughly five years), when it is most groundbreaking. However, there was a renewed spike from 2016 to 2018. Are there social and political factors that influence research? Can we, as researchers, gauge and synthesize the intersections between political and social events and the research they influence? 2016 to 2018 marked the beginning of the current presidential administration. Did this spur renewed interest in an article pertaining to Jim Crow and inequality in the United States? This data can therefore potentially reveal the ways in which (and why) research ebbs and flows in being cited.
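For readers who want to tabulate this kind of per-year citation data outside the database interface, here is a small sketch. The two-column CSV layout and the file name are assumptions made for illustration, not the actual export format of the Web of Science citation report.

```python
# Sketch: tallying citations per year from a hypothetical "year,citations" CSV.
# The file name and column layout are assumptions, not the actual
# Web of Science citation-report export format.
import csv
from pathlib import Path

def citations_by_year(path):
    """Return {year: citation_count} from a two-column CSV export."""
    counts = {}
    with Path(path).open(newline="") as f:
        for row in csv.DictReader(f):
            counts[int(row["year"])] = int(row["citations"])
    return counts

if __name__ == "__main__":
    counts = citations_by_year("citation_report.csv")
    peak = max(counts, key=counts.get)
    print(f"peak year: {peak} ({counts[peak]} citations)")
    print(f"total citations: {sum(counts.values())}")
```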

What can you learn about who cites this article? What are their disciplinary identifications?

Analyzing who is citing the article, and for what purpose, reveals which disciplines are engaging with the text and which fields find the research most integral to their own studies. In this instance, the article is heavily utilized by legal scholars.

What is the total number of publications?

11

What is the H-index?

6

What are the average citations per item?

17.36

Which of these numbers would you prefer to have used in evaluations for hiring and tenure? Why?

Neither. But unfortunately, that does not answer the question. Depending on which metrics a department is interested in, the H-index can be a powerful tool for showcasing the influence a researcher has in high-impact journals. This can be useful in determining the relevance and timeliness of one's scholarship, and if a department is seeking influential leaders within a field, the H-index can help. The H-index, however, cannot compare professors one-to-one; external factors tied to specific journals and fields influence one's H-index. In short, the index can be an intriguing metric for crediting influential scholarship, but I do not believe an H-index can be used to discredit the influence of one's work.
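For reference, here is a minimal sketch of how the two metrics reported above are computed. The per-publication citation counts are invented placeholders, chosen only so that they add up to the figures reported in the citation report (191 citations across 11 publications, H-index 6, average 17.36).

```python
# Minimal sketch of the H-index and average-citations-per-item calculations.
# The per-publication counts are hypothetical, chosen so the totals match the
# figures reported above (191 citations, 11 publications).

def h_index(citations):
    """Largest h such that h publications each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)

citations_per_pub = [120, 25, 15, 10, 8, 6, 3, 2, 1, 1, 0]  # hypothetical breakdown

print("H-index:", h_index(citations_per_pub))                      # 6
print("average citations per item:",
      round(sum(citations_per_pub) / len(citations_per_pub), 2))   # 17.36
```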

Is this kind of analysis appropriate for all academic fields? Why or why not?

It is not. The H-index, for instance, does not necessarily account for the varying citation practices of different fields. Will articles in fields that cite less frequently be viewed as less impactful? Put simply, can an H-index really be used to compare work from one department with work from an entirely different discipline that has different modes of access and standards? Likewise, the reasons for citing (reflected in the average citations per item) can be shaped by external factors that have little to do with the quality of one's research.

Gender Inequality and Doctoral Degrees

The dimension I chose for gender inequality was advanced degrees awarded at the PhD level. I wanted to learn what the disparity between men and women was in advanced degrees awarded, how this compared to undergraduate degrees, and which countries had the most equal rates of degrees awarded by gender. The most important indicators for this domain were undergraduate degrees awarded and PhD degrees awarded. Both matter: the United States awarded forty-six percent of its PhD degrees to women, a number that appears innocuous and largely equal. However, the indicator for bachelor's degrees shows that sixty percent of bachelor's degrees in the United States went to women from 2012 to 2016. Why do sixty percent of bachelor's degrees go to women, compared to only forty-six percent of doctoral degrees? This drop-off certainly complicates the United States' apparent equality in doctoral degrees awarded to women. To obtain a more complete analysis of doctoral degrees awarded by gender, then, it becomes imperative to also examine undergraduate degrees awarded.

The National Center for Education Statistics and the US Department of Education compile many statistics to examine and confront gender inequality within the academy. Not all nations, however, have such thorough data. Although finding PhDs awarded by nation was quick, undergraduate degrees by gender for countries outside North America proved much more difficult. Many nations do not track, or only poorly track, the gender divide within academia at the undergraduate level. Adding even more complexity, when observing international rates of doctoral degrees awarded by gender, the area or department of study becomes even more difficult to navigate. Tracking degrees awarded does not offer a nuanced synthesis of gender inequality within the academy: are departments or areas of study at the doctoral level contributing to a more equal balance by gender, or are degrees largely segregated by gender? I had trouble finding in-depth indicators for doctoral degrees awarded by gender for countries outside the United States.

The National Science Foundation conducted a powerful study breaking down doctoral degrees by gender across fifty-six nations. Unfortunately, with only fifty-six nations included, a vast amount of data and indicators still needs to be collected and analyzed. As a result, I have access to overall doctoral degrees awarded by gender for a wide variety of countries, but many of these nations do not have (from what I could find) in-depth statistics offering a more detailed breakdown, for example by field of study. To write a synthesis and contextualization of the results, more in-depth and nuanced data is needed.

With the information provided, the countries with the smallest gender gap in PhDs awarded were Australia, Israel, Macedonia, Croatia, Italy, Estonia, New Zealand, Finland, Ukraine, Kyrgyzstan, Argentina, and Mongolia. In each of these nations, fifty to fifty-five percent of PhDs went to women. The countries with the greatest gender inequality in PhDs awarded were Taiwan, Georgia, South Korea, Iran, Jordan, Uganda, Malaysia, Saudi Arabia, Armenia, and Colombia. Taiwan awarded a stunningly low twenty-six percent of doctoral degrees to women, while Colombia reached thirty-nine percent. The data is only from 2010, however; more years need to be included so that outliers have less influence on the results.

Above all, it is imperative that more countries participate in the surveys; fifty-six nations do not offer an adequate basis for analysis. Moreover, countries need to supply both undergraduate and graduate statistics to verify that the two tiers are in proportion with one another. If they are not, then social scientists will have to ask what factors are causing men or women to enter the workforce after obtaining a bachelor's degree. Is it a correlation with particular departments that are more segregated by gender? Why are certain departments lacking gender diversity? Social scientists are well aware of these questions and will continue to seek more data and indicators in the future.

The Familiar and Slightly Less Familiar

Last week's discussion was illuminating, partly because of my previous ignorance of the topic. In many ways, the discussion points were both familiar and unfamiliar. The articles critiqued the shortcomings of the data used to reveal gender equality rates. This was familiar to me; as a historian, I am accustomed to questioning evidence and deconstructing an argument that relies on controversial data. In our class discussion, it was fascinating to see the ways in which data can be manipulated to make a country appear more equal than it is. I also loved looking at possible solutions to the problems sociologists encounter, only to realize that many of the solutions suggested in class (including my own) do not adequately address what determines when a country's reported gender gap is legitimate. Interpreting the data and trying to find a universal model that can determine gender inequality, to the standards of the UN, seems like a gargantuan task.

The articles were unfamiliar not just because the topic concerned contemporary events rather than historical patterns over time, but also because the analysis focused heavily on data points and their inconsistencies. Although the style of writing (abstract through conclusion) was much different from the narrative, storytelling focus of many histories, I really appreciated the articles as an example of the difference between a social science project and a humanities project. This raises the question: how can we bring the two together? They certainly do not need to be mutually exclusive. But in intertwining the two, the next question becomes: who are you writing for? And would the result still be regarded as a serious body of scholarship (compared to other social science works)?


Bryan Paradis

New Methods, Same Standards, New Problems

The first two weeks of class and readings have illuminated several of the course's learning objectives and thematic questions. In particular, many of the articles focused on how information flows in and out of socio-technical systems and on the ways that researchers access, arrange, organize, and describe information. Safiya Umoja Noble's Algorithms of Oppression argued that information found online through web searches perpetuates misinformation and oppression. As Noble contended, "Lack of attention to the current exploitative nature of online keyword searches only further entrenches the problematic identities in the media for women of color" (14). Noble showed that online searches and algorithms shape the data people receive when searching the web. This relates directly to the course's theme of understanding and critiquing the ways in which researchers receive information. In this instance, search engines like Google can promote and perpetuate problematic and disingenuous information about identity and ethnicity.

Catherine D'Ignazio and Lauren Klein's chapter "What Gets Counted Counts" also explored socio-technical systems and the ways in which data can become problematic. (Lara Putnam makes a related point about search itself: "Web-based full-text search decouples data from place. In doing so, it dissolves the structural constraints that kept history bound to political-territorial units long after the intellectual liabilities of that bond were well known" [377].) Put simply, D'Ignazio and Klein argued that data can become disingenuous through how information is stratified and labeled. Data collected on a gender binary leaves out an entire population and fails to acknowledge the existence of people who identify as neither male nor female. As a result, the ways in which data is collected misrepresent (or fail to represent at all) different groups of people and also produce inaccurate data about men and women. Noble's and D'Ignazio and Klein's scholarship reflects the problematic latent dysfunctions of the ways in which data is currently produced. Who gets to control what is shared, how and when it is shared, and the framework within which data is collected? The article and chapter highlight these increasingly important questions raised by the rise of big data.

As a historian, I found that Putnam's article strongly resonated with my work and provoked questions that pertain directly to what I do. Most of my research involves traveling to archives, learning the culture and decorum of each city and archive, and then conducting research on site for weeks. Recently, one of my primary archives went digital, so I will no longer be required to conduct research on site there. Whereas Putnam argued that my appreciation for the data and the archive will diminish and will not be reflected in my work, I have a much different take. While the digitization of data may make the historian's "journey" less adventurous, I believe that online access to data and archives alleviates economic barriers to research. Not every individual has the ability to travel for a month or longer and live abroad to conduct research; the traditional methodology of conducting global research is largely gatekept for the academic elite. With the digitization of archival data, historical research becomes far more accessible. Furthermore, scholars can fact-check and cross-reference sources from monographs or other scholarly works. Engaging in a rigorous historical dialectic should become more common with the digitization of data.

As with the previous two readings, however, problems can remain when data is digitized. Returning to the course question of how information flows in and out of socio-technical systems, one drawback of digitizing an archive is that not every document is uploaded. Who are the individuals choosing what to include and exclude? Is a deliberate narrative being constructed by an archive through what is uploaded versus discarded? Archives are not always neutral spaces; the digitization of their data and sources will raise new questions about the process and the potential manipulation of uploaded sources.

Bryan’s Intro

Hello everyone! I hope you all had an enjoyable winter break! My name is Bryan and I am in the Department of History. My original research explored historical memory and communal identity within communities of color after integration in the United States. More specifically, I examined the phenomenon of Jim Crow nostalgia: how many residents of historically black neighborhoods across the United States have glorified and commemorated a period of segregation and racial divide in order to preserve the lore, perseverance, and memory of their community.

I am currently interested in the intersection between voting patterns/rates, approval ratings, and economic trends within the United States. Yeah, I am aware, that’s quite the shift!

Regardless, I am looking forward to this class and am interested in learning about the advantages and obstacles of digital methods. Above all, I am excited to be placed out of my comfort zone and to be exposed to new methods of conducting, analyzing, and presenting research.