While exploring the dataset this week I couldn’t (as someone trained in rhetoric and composition) get past the fact that countries with unequal access to power were termed “pursuing parity.” To me that was a way of packaging the blow that these countries were at the bottom of the spectrum in a “works-in-progress” sort of way. This is where the politics of the situation gets involved. I imagine since these countries opted into the study and made their data available then they are considered works in progress? I wonder what is gained by terming it this way as opposed to “unequal” or “no parity” or something like that.
With that in mind I decided to compare a few of the countries at the bottom of the list: China, Saudi Arabia, Russia and Brazil.
When I search by education rates there is obviously some data missing. Brazil reports no data, and Saudi Arabia’s data is partially there. Russia and China have datasets. So how do you compile a list with incomplete datasets I wonder?
Likewise, the gender wage gap does not include data on any of the poorer performing countries I listed above. With this in mind I went back to read more about the methodology to see what I was missing. How did they account for working with incomplete data and then drawing up a list based on those data?
Behold! An answer to my question! It looks like there are answers for when data is missing and they are discussed in the limitations part of the methodology. While this information is under “limitations” it also serves as a justification for how data was dealt with then it could not be gathered. I note here they especially talk about the missing educational data. I am not knowledgeable enough in statistical methods to know what this means for their findings, but given the status of the project, and the partners working on it, I do find it comforting that the limitations are acknowledged. Maybe this goes back to why these underperforming countries are termed “pursuing parity?” How can you say something dire about them when you don’t have all the data, after all? (Though educated guesses might help us fill in the blanks here. or is that just my Western point of view talking? It may very well be).
I compare the two examples above to the indicator “# of female heads of state.” What this teaches me about data gathering and data availability is that some indicators researchers choose to use are those that they can get answers for with or without a country’s help. It’s easy enough for a researcher to compile a list of female heads of state and compare across nations that way. The countries themselves don’t need to provide the data. This shows that these kinds of data might be privileged over, say, educational data that may or may not be collected by countries. Furthermore, who is to say if the data collection methods in certain countries would be reliable or not? There are social and political factors involved here too.
I’m going to add onto this blog post a bit to report on the data availability project that we were to execute.
I was interesting in learning more about women in tech and STEM. I looked at the Statista database first (which, sadly, Pitt doesn’t subscribe to, but I have super shady ways of getting into the database. Librarian skills and all…). Statista is an interesting database because it’s marketed as a “Global business database” so the information therein is ostensibly to help businesses make choices about products, hiring. Or, for individuals who are interested in global trading markets and companies.
One of the reasons I do like Statista, though, is that oftentimes you can go back to the original dataset, look at methodologies, reports, and decide if you want to use the data, or what it’s “good” for.
I also found datasets through the National Science foundation (nice, because you can download the set) and PEW research (but that is US-based so not cross-cultural).
In terms of my findings, I tend to be more interested in how people access data (as a librarian) and less what they do with it. I understand that this project had to do with both of those things, though. Yet most of the time spent on this project was me imagining how a student or a lay-member of the public might actually get at this information. Mostly, when people Google questions like this they’ll read a report that breaks down the data for them. Fewer people will be able to actually go look at the data itself. Luckily for this issue the datasets are publicly available (like through the NSF). However, some of the more nuanced data I found through Statista: a database you have to subscribe to. So even where there is a lot of data (women and STEM is one of those areas), who has access to it?
Furthermore, how are people actually searching for these things? What keywords are they using? How does that allow them to find information? Or, does it hide information? Statista’s keyword search works differently than a traditional database. The Wikipedia article on Women in STEM is well-researched but are folks going to the works cited list and clicking on those links? Those are the kinds of information behaviors I’m wondering about with the issue of data-access.
Jim
February 3, 2020 — 11:53 am
So many of these posts this week deal with, “So how do you compile a list with incomplete datasets I wonder?”! — I’m relieved that I’m in good company…
Reading your piece I was struck with wondering how our research perspectives would change (for better or worse?) if we shift to concern from “incomplete data” to “incomplete questions.” In a lot of these UN concerns, there is an intrinsic universalism of “importance” in asking these sort of questions. I think this relates to your “Pursuing Parity” dilemma in some way?