Latinx and color

The term I analyzed was Latinx, a twenty-first-century gender-neutral neologism that refers to people of Latin American cultural or ethnic identity in the United States. A search in the Web of Knowledge returned 338 records with this term in the title. With this dataset, I created a couple of networks. The first one showed the most common terms in the titles and abstracts of those articles:

I could not figure out the logic behind the color-coding in this network. The only two terms in blue were color and Latinx child, while terms similar to the latter, such as Latinx adolescent and Latinx youth, were in green. Words with related meanings, including recent study, paper, and article, appeared in both the red and green sections. The most prominent term was student.

The second network was on co-citations:

Beyond the information on co-citation, this visualization allowed me to see more clearly in which areas of study and journals the term Latinx is being used. In this case, the color-coding was easier for me to interpret, showing three main fields of study: health, behavior, and education.

When I searched the Web of Science for the terms Latinx and art, only one result was found: “Take two: Prescribing Latinx and medicine as aesthetic form.” Searches in WorldCat and the Pitt library were also not abundant. If I develop an art historical research project that deals with the term Latinx, it would be useful to look at how other fields have previously approached or used it.

Co-Authorship and Dental Anthropology

When I first approached this problem, I wanted to look at the field of Dental Anthropology, a sub-specialization within the many iterations of biological anthropology. I decided to start with the common terms used in the abstracts and titles of papers in dental anthropology to see if I could find any patterns.

Looking at the network analysis, it appears that the red section relates most closely to age estimation using dental eruption, green relates to identifying human remains in forensic cases as well as forensic dentistry, blue relates to dental traits and morphology, and yellow seems to be a miscellaneous category including site descriptions and journal names. This analysis was interesting because so many words refer to the same phenomenon; for example, the term ASUDAS was counted separately from its full name, the Arizona State University Dental Anthropology System. Excluding some general terms like man, woman, etc. would likely have resulted in a clearer network.
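One way to handle those duplicates, I later learned, is a VOSviewer thesaurus file, which maps variant labels onto a single canonical term and can also drop terms entirely. Here is a rough Python sketch of how one might be generated; the tab-separated “label” / “replace by” format is the one described in the VOSviewer manual, and the specific file name and terms are just illustrative:

```python
import csv

# Variant labels to merge into one canonical term (illustrative examples).
merge = {
    "asudas": "arizona state university dental anthropology system",
    "dental morphology": "dental traits",
}

# Overly general terms to drop from the map entirely.
exclude = ["man", "woman"]

# VOSviewer reads a tab-separated thesaurus file with a "label" column
# and a "replace by" column; an empty replacement removes the term.
with open("term_thesaurus.txt", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerow(["label", "replace by"])
    for label, canonical in merge.items():
        writer.writerow([label, canonical])
    for label in exclude:
        writer.writerow([label, ""])
```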

I also wanted to look at the countries publishing work in dental anthropology and at whether co-authorship in the field transcended national borders, which resulted in the following map.

The strongest connection in co-authorship was between the US and England, which wasn’t surprising considering that anthropology tends to be rooted in colonial empires. What was interesting was that the US and Germany were basically on top of each other, sharing a nearly identical co-author network, which suggests a close-knit group of academics regularly publishing together when viewed in a field this small.

Caroline Walker Bynum’s “Why All the Fuss about the Body”

I chose to examine just one article and its citation network: Bynum, Caroline Walker. “Why All the Fuss about the Body? A Medievalist’s Perspective.” Critical Inquiry 22, no. 1 (Autumn 1995): 1–33.

It pulled up 130 citing articles and 683 total citations, with an h-index of 16. Its number of citations grew steadily from its publication in 1995 to a peak in 2012, with 64 citations that year. With the exception of 2014, when it had just 45 citations, it remained around 60 citations per year through 2019.

Bibliographic Coupling by Country, Overlay Visualization

Bibliographic Coupling by Author, Network Visualization

Bibliographic Coupling by Document, Network Visualization

Citation by Author, Density Visualization

Citation by Documents, Density Visualization

Citation by Source, Density Visualization

What can you learn from the bibliometric network you have created?

I created a variety of networks from this data but, for the most part, I’m not finding the information they provide all that useful. From the bibliographic coupling networks by author and by country, I’m able to see that the majority of the articles come from the US, the UK, and Canada, and from a pretty close-knit scholarly group with a handful of outliers. The density visualizations of citation networks by author and by document surface a couple of authors and articles that cite Bynum’s article densely. Perhaps the most interesting density visualization was the citation network by source, which highlighted for me, among the many medieval journals, two interesting hot spots: American Anthropologist and Eighteenth-Century Studies. When I looked back at the data in the Web of Science citation report, however, I realized that the visualization (or at least the screenshot of it) was actually misleading: although there were 7 articles from American Anthropologist that cited Bynum, there were 14 from the Cambridge Archaeological Journal (which I can’t see in the visualization) and 11 in Environmental History (which barely stands out in the network). There were 3 articles from Eighteenth-Century Studies, which is the same number as in Speculum (a source that doesn’t stand out at all in the visualization).

How does your choice of data limit your analysis?

My analysis is obviously limited because of Web of Science’s limited resources for humanities scholarship. By choosing to take data for one article, I’m also working with a rather small data set.

How can you structure your data to change your analysis?

My data set includes self-citations; removing those would change my analysis slightly. I also did not examine text networks, which I suspect would give me a broader picture of the citing literature.

Urban agricultural history

The first network I built was from the search “Urban agricultural history.” This search returned 529 documents. Of the 17,076 terms found, 308 occurred over 10 times. The terms deemed most relevant were quite surprising to me: species richness, metal, species composition, lead, and stream were all rated most relevant, with scores in the 3.0s. None of these terms exceeded 20 occurrences. The terms most frequently used, however, were site (107 occurrences), effect (103), species (90), and century (80). Taking out these most frequently used terms makes the chart quite different: the largest clusters now center on human activity, water quality, agricultural activity, and several other terms that previously did not dominate.

The second cluster I made got rid of most terms by viewing only terms that appeared over thirty times. Out of the 17,076 terms, only 58 were selected, and only 35 of those were deemed relevant. The most used terms were now history (353 occurrences), area (251), development (173), and process (109). Furthermore, many terms like China were included in the results but no longer had a visual representation on the map. Moreover, in the second cluster set, nearly every term visually represented on the map connected with every other word or term. The nearly ubiquitous relationships among all phrases stood out compared to the first cluster.

Second Cluster

I found it fascinating to observe clusters that started from methodological phrases like case study and then to trace what other terms were associated with case study. I also loved using the software to trace clusters and patterns. Unfortunately, I am not sure my understanding of the software and manipulation of data sets is currently strong enough to articulate meaningful criticisms of the program. (Which should make tomorrow’s class even more interesting!)
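One thing I did feel I could pin down is what the occurrence threshold and the removal of frequent terms are actually doing. A minimal Python sketch, with toy stand-in terms rather than the real corpus:

```python
from collections import Counter

# Toy stand-ins for terms extracted from titles and abstracts.
docs = [
    ["site", "species", "water quality", "human activity"],
    ["site", "effect", "agricultural activity", "species"],
    ["site", "effect", "century", "human activity"],
]

# Total occurrences of each term across all documents.
counts = Counter(term for doc in docs for term in doc)

# Keep only terms at or above a minimum occurrence threshold,
# mirroring VOSviewer's "minimum number of occurrences" setting.
threshold = 2
kept = {t: n for t, n in counts.items() if n >= threshold}

# Dropping the most frequent (least informative) terms changes which
# terms dominate the map, as happened when "site" and "effect" were cut.
most_frequent = {t for t, _ in counts.most_common(2)}
pruned = {t: n for t, n in kept.items() if t not in most_frequent}

print(kept)
print(pruned)
```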

Citation Analysis and Historical Journals

For my citation analysis exercise, I downloaded 500 records from the American Historical Review and ran both text and citation analysis on them. The American Historical Review articles that I used were drawn from Web of Science, and I did not set any time frame for them. The text analysis showed a web of connections between different key terms, such as “Empire” and “United States,” and how such terms are grouped more closely depending on the articles in which they were employed.
I created two bibliometric networks, one dealing with key terms present in the articles and the other dealing with shared citations. The exercise showed me that while key terms and the concepts they embody are shared by historians, historical articles do not often contain shared citations. For the citation analysis portion, I initially employed a criterion of two shared citations; in doing so, however, only two authors were connected. When I set the criterion to one shared citation, there were still only six authors with shared connections. This may be characteristic of historical articles: many are based on original research and have very few citations of secondary literature, which reduces the potential for shared citations. Furthermore, the few shared citations may have been due to book reviews, as an author would be cited by another author reviewing their work. Also, as such authors would be in a similar field (if not the same field), the likelihood of them citing each other is higher.
The structure that I employed for my data, pertaining to the citation exercise, was limited by the minimum number of shared citations that I selected. This was further constrained by the limitations of citation analysis on historical journals, as mutual citations are not stressed to the same degree in the field of history as in the sciences. While VOSviewer was helpful in showing the connections between key concepts and keywords across journal entries, the analysis of citations mostly reflects the nature of articles and writing in the field of history.
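To make the shared-citation criterion concrete, here is a small Python sketch of the bibliographic coupling logic as I understand it, using made-up reference lists rather than the actual AHR data:

```python
from itertools import combinations

# Made-up reference lists per author (not the actual AHR data).
references = {
    "author_a": {"ref1", "ref2", "ref3"},
    "author_b": {"ref2", "ref3", "ref4"},
    "author_c": {"ref5", "ref6"},
}

# Bibliographic coupling links two authors when their reference lists
# share at least `min_shared` items; raising the minimum prunes links,
# which is why my network shrank as I increased the criterion.
def coupling_links(refs, min_shared):
    links = []
    for a, b in combinations(refs, 2):
        shared = len(refs[a] & refs[b])
        if shared >= min_shared:
            links.append((a, b, shared))
    return links

print(coupling_links(references, min_shared=1))  # links author_a and author_b
print(coupling_links(references, min_shared=3))  # no links at all
```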

Text Analysis of the American Historical Review
Citation Analysis of the American Historical Review

Word Embeddings

I chose to look at the topic of word embeddings, as it was and still is a popular method for semantic analysis of texts. As output, word embeddings return words in a corpus that are used in similar or interchangeable contexts. Last time, I was looking at documents within the field of Slavic languages and literatures; however, it was difficult to find texts prominently used in the field, as Web of Science caters to article publications, typically within STEM. So, I decided to look at a topic within computer science that still has a relation to text (and sometimes literature). Also, since word embeddings are still relatively “new” within computer science (the most prominent algorithm, word2vec, was developed in 2013), there weren’t too many articles I needed to download. In total, the search returned 847 articles.
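For anyone unfamiliar with what word embeddings return, here is a minimal sketch using the gensim library (I am assuming gensim 4.x parameter names, and the toy corpus is obviously nothing like real research text):

```python
from gensim.models import Word2Vec

# A toy corpus; word2vec normally needs far more text than this.
sentences = [
    ["the", "novel", "explores", "memory"],
    ["the", "poem", "explores", "memory"],
    ["the", "novel", "examines", "history"],
    ["the", "poem", "examines", "history"],
]

# Train skip-gram embeddings (sg=1); gensim 4.x calls the dimension
# parameter vector_size rather than size.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)

# Words used in similar or interchangeable contexts get similar vectors;
# with a real corpus, "poem" would rank near "novel".
print(model.wv.most_similar("novel", topn=3))
```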

I generated the first network by following the directions in the tutorial (binary count, title plus abstract as text data). Prominent topics were “language,” “recurrent neural network,” “convolutional neural network,” and “sentiment analysis.” I then changed the count to full count to see the difference in clustered groups. With full count, specific languages such as Spanish and Hindi were included as nodes in the network, and topics such as social media appeared. It seems that binary count gave a broader overview of general technical terms related to the algorithm, while full count included topics more specific to each article’s research area. (I tried to upload images, but the file size was too large and attempts to compress the images didn’t really work out.)

My choice of data limits me to a specific research tool within a domain; moreover, it is limited to the term “word embeddings,” which refers only to the output of an algorithm, not the algorithm itself. If I wanted to look at the algorithm word2vec specifically, results could differ, as 325 articles are returned when searching “word2vec” as a topic on Web of Science, as opposed to “word embeddings.” A network anchored to a specific algorithm might include more specific terms as nodes. I could also choose to eliminate the most frequent, less content-specific terms from the data before generating the network (this is a common technique in CS to reduce noise). Overall, this approach produces a fairly general model of an academic domain, which can be useful; however, I wouldn’t say that the networks produced speak to the full usage of word embeddings in research. This is probably due to the somewhat limited scope of articles in the database.

“Medievalism”

After the difficulty I had with the Web of Science for the Citation Analysis Exercise, I was pleased that the first topic of interest I chose for the Network Analysis Exercise, “medievalism,” generated 482 results. For this blog post, I will discuss two of the networks I created in VOSviewer using data generated by the Web of Science.

Network 1

This network is based on text data from the journal articles’ title and abstract fields using binary counting, which only counts whether a term is present or absent in a given document.

The map that visualizes this network shows that the term “middle age” has the highest occurrence among the seventy-seven terms with a minimum occurrence of ten; “middle age” occurs ninety-six times. For comparison, the terms with the lowest occurrence are “remembrance,” “medieval memory,” “person,” and “medieval period,” each of which occurs the minimum of ten times. Interestingly, “remembrance” and “medieval memory” were the two terms with the highest relevance, at 7.83. The term with the lowest relevance was “present,” at 0.06. It also seems worth noting that while “new medievalism,” “neo medievalism,” and “nineteenth century medievalism” were included among the forty-six of seventy-seven terms selected by VOSviewer based on relevance, the term “medievalism” itself was not.

I am interested in how VOSviewer uses the data generated by the Web of Science to determine which terms are grouped, or “clustered,” together using color in the network visualization. For example, “Britain” and “Germany” are grouped together using yellow, along with “great war,” “war,” and “fantasy,” but “England” and “modern England” are in the group designated by the color blue. To return to my observation on the term “medievalism” not being selected as a relevant term, I am also interested in what relationship may exist between a term’s occurrence and its relevance.

Network 2

This network is based on text data from the journal articles’ title and abstract fields using full counting, which counts all of the occurrences of a term in a given document.

For this network, I used the same parameters as the first network—text data from the journal articles’ title and abstract fields, selection of terms with a minimum occurrence of ten, and selection of the forty-six most relevant terms—except that, instead of using binary counting, I used full counting. The map of this second network resembles the map of the first network somewhat, though this network’s visualization uses six colors rather than four to group terms together, with the addition of purple and a second shade of blue. The terms included in the first and second networks, created using binary and full counting, respectively, also varied.
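To see the difference between the two counting methods side by side, here is a small Python sketch with toy documents standing in for the actual “medievalism” records:

```python
# Toy documents standing in for titles + abstracts.
docs = [
    "middle age middle age remembrance",
    "middle age fantasy",
    "remembrance remembrance",
]

term = "middle age"  # VOSviewer treats multiword phrases as single terms

# Binary counting: does the term occur in a document at all?
binary_count = sum(1 for doc in docs if term in doc)

# Full counting: every occurrence is counted.
full_count = sum(doc.count(term) for doc in docs)

print(binary_count)  # 2 (two documents contain the term)
print(full_count)    # 3 (three occurrences in total)
```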

Terms included in both networks (or, terms counted by both binary and full counting) (26/46 terms):

analysis
author
Britain
debate
development
England
fantasy
Germany
great war
knowledge
medieval memory
modern England
narrative
neo medievalism
new medievalism
nineteenth century medievalism
novel
order
period
person
power
relation
remembrance
rhetoric
state
war

Terms included in only the first network (binary counting) (20/46 terms):

art
concept
context
essay
example
form
influence
medieval period
memory
middle age
modernity
past
present
relationship
return
role
text
tradition
use
way

Terms included in only the second network (full counting) (20/46 terms):

approach
country
field
hand
indigenous knowledge
interest
Joan
language
Morris
nature
nostalgia
place
play
research
romanticism
self
Shakespeare
sovereignty
Spain
violence

What can you learn from the bibliometric network you have created?

The bibliometric networks I created and the visualization of these networks using VOSviewer reveal prevalent terms and the connections, or “links,” among these terms in journal articles that address various aspects and instances of medievalism. While I have concerns about the data provided by the Web of Science, which I addressed in my last blog post and will refer to again later in this post, I found the visualization of this data to be useful in thinking about the ways in which these terms are related and how these terms have been used in scholarship.

How does your choice of data limit your analysis?

My VOSviewer bibliometric networks were created using the most prevalent terms in the titles and abstracts of journal articles included in the Web of Science’s Basic Search results for “medievalism,” with the range of the articles’ publication dates determined by Pitt’s subscription. As I mentioned in my last blog post, the restriction of the Web of Science’s searchable publications to academic journals of interest does not account for other publication formats, such as books, essays in edited volumes, and exhibition catalogues, or for material published prior to 1945, which is the earliest year included in Pitt’s Web of Science subscription. Using the Web of Science’s data reproduces its limits in VOSviewer network visualizations.

How can you structure your data to change your analysis?

The first and second bibliometric networks I created show how the structure of data and the way data are counted can change an analysis. By first creating a map based on text data using binary counting and then creating a map based on the same text data using full counting, I saw differences in the terms included in the networks, the relationships among terms, and the ways in which terms were grouped. I could also have changed my analysis by increasing or decreasing the minimum number of occurrences per term, or by changing my selections in other aspects of the map creation process.

What models of the academic world do these metrics produce?

The choice and structure of data based on the interests of those developing searchable resources, datasets, and visualizations call into question, for me at least, the extent to which such metrics should be valued, both in general and in hiring, evaluation, and tenure processes in the academic world, especially when such metrics are used uncritically across fields of study, as we have discussed in class. I am having difficulty articulating my thoughts in response to this question, so I will be interested in hearing from others as we continue our discussion in class.

Academic Journals’ Word Networks

Hello All,

I originally struggled to get Web of Science to give me articles related to my interests. Entering terms like “Brazilian history” produced over 5,500 hits, with articles concerning topics like the prevalence of syphilis in female prisons or the spatial niche modelling of five endemic cacti from the Brazilian Caatinga. Instead of “topic” searches, then, I chose “publication name” and limited my results to the top academic journals in my field. I was interested in seeing if there were notable differences between the journals’ word networks. For this post, I will present five related journals: Hispanic American Historical Review (HAHR), Journal of Latin American Studies (JLAS), Latin American Research Review (LARR), Revista de Indias, and Luso-Brazilian Review (LBR).

First is the Latin American Research Review (3,037 search results; 16,414 terms with 318 meeting the threshold = 1.94%):

“The Latin American Research Review (LARR) publishes original research and review essays on Latin America, the Caribbean, and Latina/Latino studies. LARR covers the social sciences and the humanities, including the fields of anthropology, economics, history, literature and cultural studies, political science, and sociology. The journal reviews and publishes papers in English, Spanish, and Portuguese. All papers, except for book and documentary film review essays, are subject to double-blind peer review. LARR, the academic journal of the Latin American Studies Association, has been in continuous publication since 1965.”

The first time I ran the program with LARR’s articles, the network was dominated by the word “America.”

I found this unhelpful because the journal already limits its publications to the Americas, and so I could assume “America” was the default common denominator in all the articles. I wanted to know what other words would dominate if I removed it. I ran the program again and removed some of the more frequent words, including “america,” “forward,” “editor,” and “vol.” These last three words, I assumed, were related to the journal’s text format and did not contribute to the articles’ themes or contents.

By removing “America,” I was able to more easily see the relationships between other key words.

In general, the networks remained similar, with only slight changes. Main nodes such as “culture” and “world” changed clusters. Likewise, “violence” was originally grouped with “revolution,” but after I removed “America,” “violence” moved to the cluster that included “women.” What in the program’s algorithm would shift these words and clusters? These two images may lead to different assumptions and conclusions about the journal’s subject matter.

Next is the Hispanic American Historical Review (2,692 search results; 6,680 terms with 103 meeting the threshold = 1.5%):

“Published in cooperation with the Conference on Latin American History of the American Historical Association. Hispanic American Historical Review pioneered the study of Latin American history and culture in the United States and remains the most widely respected journal in the field. HAHR’s comprehensive book review section provides commentary, ranging from brief notices to review essays, on every facet of scholarship on Latin American history and culture.”

Why were there far fewer terms for this journal according to VOSviewer? Above, LARR had over 16,000 terms to calculate, while HAHR had only about 6,700. How does the program determine the terms it will use? As with LARR, I removed the term “america” from HAHR’s calculation. How does the program differentiate between individual words like “central,” “spanish,” and “america” and phrases like “central america” or “spanish america”?
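I don’t know VOSviewer’s internals, but its manual describes extracting noun phrases rather than splitting text on spaces, so a multiword phrase is counted as a single term. A crude Python approximation of the idea, counting adjacent word pairs alongside single words (the documents here are made up):

```python
from collections import Counter

# Toy tokenized abstracts (made up, not the HAHR data).
docs = [
    ["spanish", "america", "trade", "in", "spanish", "america"],
    ["central", "america", "and", "the", "spanish", "crown"],
]

# VOSviewer identifies multiword noun phrases; as a crude stand-in,
# count adjacent word pairs (bigrams) alongside single words.
unigrams = Counter(w for doc in docs for w in doc)
bigrams = Counter(
    " ".join(pair) for doc in docs for pair in zip(doc, doc[1:])
)

print(unigrams["america"])         # 3: counted on its own
print(bigrams["spanish america"])  # 2: counted as a phrase
print(bigrams["central america"])  # 1
```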

Another journal I researched was the Revista de Indias (1,452 search results; 9,580 terms with 202 meeting the threshold = 2.1%):

“Since 1940, Revista de Indias is a well-known forum for debates in the History of America targeted to specialized readers. It publishes original articles aimed at improving knowledge, encouraging scientific debates among researchers, and promoting the development and diffusion of state-of-the-art investigation in the field of the History of America. The contents are open to different topics and study areas such as social, cultural, political and economical, encompassing from the Pre-Hispanic world to the present Ibero-American issues. The Journal publishes articles in Spanish, English and Portuguese. Besides the regular issues, one monographical issue is published every year.”

You can see that this journal has several publications focused on Cuba and Peru with other locations like Argentina, Puerto Rico, New Spain, and Quito on the peripheries.

The journal Luso-Brazilian Review is a smaller, more specific journal (534 search results; 2,733 terms with 17 meeting the threshold = 0.6%):

“Luso-Brazilian Review publishes interdisciplinary scholarship on Portuguese, Brazilian, and Lusophone African cultures, with special emphasis on scholarly works in literature, history, and the social sciences. Each issue of the Luso-Brazilian Review includes articles and book reviews, which may be written in either English or Portuguese.”

I’m unsure why the spacing on the right side is so wide. If you zoom in on the blue cluster, it reads from left to right, “study, time, Portugal,” and the red, “identity, history, politic, and Brazil.” Oddly enough, the overlapping green cluster on the far right consists of two nodes, “Assis” and “Machado.” Joaquim Maria Machado de Assis is one individual, a nineteenth-century author who is often referred to as “Machado de Assis,” so why would the program split his name? As a smaller journal with a more specific topic, it makes sense that there are fewer search results, fewer terms, and fewer still that reached the threshold. LBR has the lowest percentage of terms that met the threshold, at only 0.6%.

Lastly, I investigated the Journal of Latin American Studies (3,733 search results; 16,217 terms of which 330 met the threshold = 2%):

“Journal of Latin American Studies presents recent research in the field of Latin American studies in development studies, economics, geography, history, politics and international relations, public policy, sociology and social anthropology. Regular features include articles on contemporary themes, short thematic commentaries on key issues, and an extensive section of book reviews.”

You can see that, although the journal advertises a plethora of fields of study, History sits in the center. JLAS is similar in size and terms to LARR (the first journal above). “History,” “revolution,” “Peru,” “violence,” and “Cuba” are some terms that stand out to me. It would be interesting to speculate as to why that is… it could relate to the common scholarly interests of researchers, or their training, or the availability of funds to study these terms, or the accessibility of archives, or the sexiness of the topic and location… Comparing the two journals may yield interesting findings about the field and its publications.

I was also interested in the JLAS’s change over time.

So here we have the same map as above, but viewed through the “Overlay Visualization,” which colors each term by a score; judging from the 2004–2010 key, that score here is the average publication year of the documents in which a term appears. You can see the shift from “economy” on the right, to “revolution” in the center, to “memory” on the left. What I don’t understand about this map’s key of 2004–2010 is whether, for example, “economy” was at its peak in 2004 and slowly decreased in relation to the other terms (and that’s why it is purple), or whether “economy” remained prominent through 2010 and was joined by other terms.

Overall, I found this exercise entertaining. I was able to see the frequency of certain terms and their relationships with other terms while comparing the different journals. I find the visual representation of each journal to correspond with its description. The differences in subject matter between the journals may be implicit to those in the field; the networks provide a visualization of those differences.

Information Literacy in VOSviewer

For this exercise I chose the topic “information literacy” which had over 3,000 hits in Web of Science.

I was expecting there to be quite a few hits for this, so I was pleased by the number. Looking through the list, I found the results varied in terms of related topics and disciplines.

Importing things into VOSviewer wasn’t too bad with the tutorial’s help. Here is the first network result:

And then when I started playing around, I liked the density mapping with different colors. The heat-mapping effect gave a good sense of where to spend attention. The colored topic differences were also a feature I found helpful in my viewing:

What can you learn from the bibliometric network you have created?

It’s always interesting to look at how things are related. Since this is a topic I’m fairly conversant in, I would have expected to see things like “library sessions” and “library” come up more than they did. I see “library user” is one of the nodes. “Higher education” and “course” made sense to me. Something that I didn’t expect, but that makes sense in hindsight, is the set of words associated with the kinds of studies done about information literacy: words like “survey,” “mean score,” “post-test,” “scale,” and “predictor.” Since this is coming from Web of Science, I can assume that the research methods skew empirical in the sample dataset, and those kinds of words and topics would make sense in association with studies on information literacy.

How does your choice of data limit your analysis?

Again, since this is Web of Science, we are getting a lot of studies about information literacy. I’d guess that if you did this kind of analysis with more widely circulated texts used by run-of-the-mill librarians, you’d get more topics that are case studies, anecdotes, or about teaching one-shot library sessions. You’d find more on pedagogy and teaching practice, in other words. Instruction librarians tend to be practitioners, so lesson plans tend to trump empirical research. That isn’t to say the research isn’t out there, or being done, or being read. It might just not be as immediate as this data set would make it seem.

Obviously, there is a very large node that says “information literacy instruction,” but one thing that is missing is the connection of some of these other ideas to it: one-shots, pedagogy, and the like.

How can you structure your data to change your analysis?

I’ve played around with a few things here. One thing that I really like is the set of exploratory features that allow you to zoom in on certain data points. So, since the question of pedagogy came up for me, I tried to use the filters on the side to see what terms were associated with “library session,” and I found the following: information literacy session, library instruction session, and session. Then, I can click “session” and see what this word co-occurs with to see what those relationships are. This is a cool way of answering questions that arise from the initial overview with all of the terms. I felt like this part of the network was lost to me at first, but here I see that there are some articles about education, pedagogy, and library teaching in this dataset. I like to see how they are related and the relative frequency of each (and how related they are, as evidenced by proximity).
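As far as I can tell, clicking a term is essentially asking which other terms co-occur with it in the same documents, and how often. A toy Python sketch of that lookup, with made-up term sets rather than the real dataset:

```python
from collections import Counter

# Made-up term sets per article, standing in for the real dataset.
docs = [
    {"information literacy", "library session", "pedagogy"},
    {"library session", "higher education", "session"},
    {"session", "survey", "post-test"},
    {"information literacy", "survey", "mean score"},
]

# Clicking a term in VOSviewer is, roughly, asking which other terms
# co-occur with it in the same documents, and how often.
focus = "session"
neighbors = Counter()
for doc in docs:
    if focus in doc:
        for term in doc - {focus}:
            neighbors[term] += 1

print(neighbors.most_common())
```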

I’ll be honest, I’m trying to manipulate the data in other ways but am not seeing huge differences in my network output, so I’ll be excited to learn more about this in class.