Case Study 1: Itinera (Alison and Drew)
Drew Armstrong: Itinera arose because of the limitations of print sources for research, teaching, and extensibility. Computers are bad at dealing with images, so we use metadata. Itinera is intended to represent networks of people, objects, places, etc., and how people and artifacts move, tied in to images on the site. Priority on images, objects, flora, and who created them: we want to maintain the connection to the object, e.g., remain sensitive to the difference between “Athens in year X” and “Athens in year X in this picture.” Itinera deals with representations of real objects, what they meant in their time, and how we represent them in our time. Very complicated: “This is what the post-tenure period should be for.”
Alison: Itinera doubles as a classroom, so that students can learn to work with sources (e.g., “these sources disagree; now what do I do?”) and create content. “Tour stops” on the site are not real; they are generated by the software. The Grand Tour stands behind some of this, but not all of it. “Tour stops” are sometimes “life events” (e.g., “baptism”). Objects can have life events, too.
Drew: Three things to plot:
- nature and intensity,
Change of state or status (of people and physical objects) is a challenge, e.g., someone who becomes a noble but wasn’t born that way (“time-based change”). Should the strength or polarity of relationships be represented (given that both can change over time)? A “friend of Voltaire” wasn’t always a friend; friendship has gradations, etc.
There are no time sliders yet, but they will be added. For now there are only dots and no lines, because we don’t know the routes: each event is simply plotted, and humans are the ones connecting the dots. Example: Cornelis Kruseman’s tour stops. Why did he avoid most of the Holy Roman Empire, yet go out of his way to visit one place in it?
CollectiveAccess: the open-source package underlying Itinera. Mainly a collections management system (http://www.collectiveaccess.org/), popular with the museum community; it deals with images better than other open-source alternatives (superior to Omeka) and supports BRA44 standard relational metadata (e.g., the real Mona Lisa, oil on board, the X-ray pictures of it, etc.).
The life-roles list is currently flat, but a controlled vocabulary suffers if the drop-down list gets too long. (We can’t use natural language processing on this, can we? We can do string matches.)
Example: an ego network of William Kent (D3). Double-click a node to make it the new center. Relationships have date ranges; for a node you can retrieve “related entities” (joins) and click the paper-clip icon to retrieve information about the relationship. The paper clips are “interstitial cataloging” that retrieves source documentation; the strength of that documentation may be important.
Relationships have date ranges: does the range between Kent and his mother run from his birth to her death?
What Alison hopes to learn from this workshop: what should go on these interstitial records?
Relationships are bi-directional and invertible: as they add relationships, they indicate who’s connected to whom and how: student of, teacher of, acquaintance of, appointed, appointed by, child of, spouse of, etc.
Categorizing women is tricky: if we label someone as “wife”, do we have to label all men who were married as “husband”? And what if a little-known or untraced woman ends up labelled merely “human being” because she has no occupation on our list?
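The string-matching fallback mentioned above could be as simple as fuzzy lookup against the controlled list. A minimal Python sketch using the standard library's difflib; the role list and the suggest_roles helper are hypothetical, not Itinera's actual vocabulary or code:

```python
import difflib

# Hypothetical controlled vocabulary of life roles (not Itinera's actual list).
LIFE_ROLES = ["architect", "antiquarian", "art dealer", "collector",
              "engraver", "painter", "sculptor", "traveler"]


def suggest_roles(typed: str, limit: int = 3) -> list[str]:
    """Suggest controlled terms for free-text input by string similarity."""
    typed = typed.strip().lower()
    # Exact prefix matches first: cheap and predictable for a drop-down.
    prefix = [role for role in LIFE_ROLES if role.startswith(typed)]
    # Then fuzzy matches, to catch misspellings such as "scupltor".
    fuzzy = difflib.get_close_matches(typed, LIFE_ROLES, n=limit, cutoff=0.6)
    # Merge, keeping order and dropping duplicates.
    merged = list(dict.fromkeys(prefix + fuzzy))
    return merged[:limit]
```

Prefix matches come first so a typed fragment narrows the drop-down predictably; the fuzzy pass catches misspellings without any NLP.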
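If the default answer is “from the child's birth to the parent's death,” the general rule is the overlap of two lifespans. A hypothetical Python sketch of that default (the Person class and default_relationship_range function are illustrative only); a cataloger would still override it from the sources:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    name: str
    born: Optional[int] = None  # year; None if unknown
    died: Optional[int] = None  # year; None if unknown


def default_relationship_range(a: Person, b: Person) -> tuple[Optional[int], Optional[int]]:
    """Proposed default span for a relationship: from the later birth to the
    earlier death. An endpoint is None when either person's date is unknown."""
    start = max(a.born, b.born) if a.born is not None and b.born is not None else None
    end = min(a.died, b.died) if a.died is not None and b.died is not None else None
    if start is not None and end is not None and start > end:
        raise ValueError(f"{a.name} and {b.name} were never alive at the same time")
    return start, end
```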
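One way to make relationships invertible is to store each assertion together with its inverse. A minimal sketch, assuming a hypothetical inverse table built from the terms above (“parent of” is an assumed counterpart the notes do not name) and a plain set of triples standing in for the database:

```python
# Hypothetical inverse pairs; "parent of" is assumed, not taken from the notes.
_PAIRS = [
    ("student of", "teacher of"),
    ("appointed", "appointed by"),
    ("child of", "parent of"),
    ("acquaintance of", "acquaintance of"),  # symmetric
    ("spouse of", "spouse of"),              # symmetric
]

# Build a lookup that works in both directions.
INVERSE: dict[str, str] = {}
for left, right in _PAIRS:
    INVERSE[left] = right
    INVERSE[right] = left


def add_relationship(graph: set, subject: str, term: str, obj: str) -> None:
    """Record an assertion and its inverse, so the link reads from either end."""
    graph.add((subject, term, obj))
    graph.add((obj, INVERSE[term], subject))
```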
Changeable data might include religion; legal regime also matters (e.g., women have a different legal status in different places); how do we avoid prejudging how much should be part of the profile of the human individual? There’s a mummy; the person had a tour, but so did the mummy: a change in status over time (!). Are the taxonomies specific to the early modern European world, or are they true in general of humanity?
Q: Will Humboldt’s travels to Latin America be plotted? (Travel seems Euro-centric at this point).
A: The issue is that our data set isn’t complete. They could plot globally and not just Europe; at the end of the day, the ontology system should be able to accommodate that.
Dan Edelstein: A pitch for minimalism: why have the computer do things that humans (scholars and expert users) do better? Keep the relationship data simple, or at a level of generality (just say that people knew each other without taking it further), so the scholar is invited to dig deeper. Do we need to capture everything from Wikipedia here? Aren’t some things better in prose? Do we need to include diverging scholarly opinions (Itinera does)?
Comment from Aaron Brenner: Other projects, too, started out with very precise relationship definitions but backed off to more workable levels of generality. In FOAF, the specification of “knows” is very complex; the developers opted for a general term because finer detail (e.g., “is friend of”) proved too difficult. “The problems are social, rather than technical.”
Drew: Citing sources is important for scholars (what do we need to know to facilitate an understanding of relationships?). We also envision using this in a classroom as a teaching device: What do we need to know to help us teach?
Q: Think about the costs and benefits of minimalism and maximalism in identifiers vs. relationships.
Alison: Our visualizations SHOW our biases… we “bake them in.”
Case Study 2: Mapping the Republic of Letters (Dan Edelstein)
(He discussed much of this yesterday in another talk at Pitt.) Data: dealing with a wide range of occupations. The Stanford team needed a network ontology but had no experience with this, so they followed the Getty Thesauri.
Like the Itinera team, they weren’t really sure what to do about the clergy either. They’re treating religion as an occupation, unless it’s a network they’re capturing, like French Protestants (and he knows this is highly problematic).
Basically four columns of occupation data: Knowledge network, social network, professional network, religious network
They can filter out anyone who’s not in the Knowledge network, to indicate who isn’t included. The list is extensible, with the underscore character (_) as the hierarchy delimiter: letters, letters_literary, letters_literary_drama.
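Because the hierarchy lives in the code itself, filtering by a branch needs no separate tree. A hypothetical sketch (function names and sample data are illustrative, not the project's code) that compares whole underscore-delimited segments:

```python
def ancestors(code: str) -> list[str]:
    """Expand a code such as 'letters_literary_drama' into every level of its path."""
    parts = code.split("_")
    return ["_".join(parts[: i + 1]) for i in range(len(parts))]


def in_category(code: str, category: str) -> bool:
    """True if `code` is `category` itself or sits anywhere beneath it."""
    return category in ancestors(code)


def filter_people(occupations: dict[str, list[str]], category: str) -> list[str]:
    """Names of people holding at least one occupation code under `category`."""
    return sorted(name for name, codes in occupations.items()
                  if any(in_category(code, category) for code in codes))
```

Comparing segments rather than using str.startswith keeps a category such as letters_lit from accidentally matching letters_literary.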
Elisa: likes the extensibility of the Mapping project’s lists: seems like you can easily adapt the project to new research questions.
Alison: The problem with this is cataloging: Library of Congress starter subject headings are constricting. How efficiently we can *search* is the real issue.
Q: How did you define your networks?
A (Edelstein): They sought categories that people would have recognized at the time as networks; categories are based on how they would have been perceived at the time.
What we call it doesn’t matter so much as what people would have recognized in their time.
Ad hoc additions are based on the interests and needs of people working on a cluster of letters (e.g., a Ben Franklin researcher wanted “Planters” added, so they added it).
Is there a need to be more consistent about this? How broad or how specific should these categories be?
Q: Shouldn’t you be using the French words for categories, if you’re prioritizing what people in your network knew? Using the English “equivalents” is problematic. A: Yeah, maybe…
Q: Is this minimalist enough? (A: probably can’t be pared down further)
Q: Hierarchies: when they overlap, is this a downward-branching tree, and do the hierarchies split and converge again?
Q: Is this available for undergrad course use? A: Not ready for sharing until the article is written…
Q: Re Procope, the Mapping the Republic of Letters metadata schema: undergrads don’t have a good framework for how to search for things. Do YOUR minimalist categories make it hard for anyone other than a professional to understand how to navigate?
Examples:
- Collector [includes connoisseurs, bibliophiles, art dealers]
- Domestic [includes servants, slaves, stewards, factotums, etc.]
- Gardening-farming [includes horticulturalists, agronomists, stock breeders, garden designers, nurserymen, botanists (who might also be placed in the “Sciences_Natural” knowledge network)]
A: This isn’t so much a search function as a filter.
A (Alison): This is search vs. browse
Q: When you’re matching occupations to individuals, is there a way of dealing with contested occupations (e.g., pretensions to being an artist rather than actually being an artist)?
A: Their categories are based on the social perspective of the time: if someone’s community thought of them as an artist, they turn up as an artist.
Follow-up Q: What about people who weren’t KNOWN as artists in their time because they were reclusive? Do we exclude them?
Elisa: Suggestion: maybe we should make a decision at this point about whether we only ever represent how people were known at a specific moment, and not allow mixing what’s understood in our time with what was understood then.
Q/A re Gens du monde [ebb: didn’t catch]
Q: Is there much discussion of versioning ontologies?
A: Maybe after this workshop, there will be!
Daniel Balderston (www.borges.pitt.edu): I’m worried about the terminology of categories: it’s slippery and unstable over time.
Dan Shore (6degs-Bacon): Should we have an unbounded hierarchy? Should categories be generalizable? Is it better to have only downward-branching hierarchies, with extensibility at the lower ends? [Discussion of this to come in workshop groups]
Alison: an invitation to post or publish our notes on her website.
Case Study 3: Six Degrees of Francis Bacon
Chris Warren: What can you do with Six Degrees?
- Learn: Explore relationships and see early modern networks
- Query: Ask how individuals or groups were connected
- Contribute: Add names, annotate relationships, flag errors, share knowledge
Practical Example: A scholar has recently learned that Sir Charles Danvers was involved in the Essex Rebellion, so we can pull out everyone within two degrees of Sir Charles Danvers, search for shared networks, and find mutual connections between Danvers and the Earl of Essex.
They have a slider that sets different thresholds for the probability of relationships.
Pull up the Virginia Company group and find Sir John Danvers, who’s also classed as a regicide. (He’s bringing out how the network-analysis interface highlights overlapping categories.)
This is prosopographical research, aimed at being really accessible to everyone from students to advanced scholars (learning AND querying).
What people can contribute: if we know that Sir John Danvers and Sir Charles Danvers are brothers, a user can tag this; it then goes through a validation process and is eventually posted.
Re Aaron Brenner’s mention of the FOAF schema for relationship vocabulary: 6degs used something similar, but is in the process of modifying it to balance several pressures:
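The two-degree query and the probability slider can be sketched as a breadth-first search over only those edges whose confidence clears the threshold. This Python sketch is an assumption about how such a query might work, not Six Degrees' implementation; the edge list and confidence scores are toy data:

```python
from collections import deque

# Toy data: (person, person, confidence that the relationship existed).
EDGES = [
    ("A", "B", 0.9),
    ("B", "C", 0.7),
    ("C", "D", 0.8),
    ("A", "E", 0.3),
    ("E", "C", 0.6),
]


def build_graph(edges, threshold):
    """Undirected adjacency sets, keeping only edges at or above the slider threshold."""
    graph: dict[str, set[str]] = {}
    for u, v, confidence in edges:
        if confidence >= threshold:
            graph.setdefault(u, set()).add(v)
            graph.setdefault(v, set()).add(u)
    return graph


def within_degrees(graph, start, max_depth=2):
    """Breadth-first search: map each person within max_depth steps of start to their distance."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_depth:
            continue  # don't expand beyond the requested radius
        for neighbour in graph.get(node, set()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    del dist[start]
    return dist


def mutual_connections(graph, a, b):
    """People directly connected to both a and b."""
    return graph.get(a, set()) & graph.get(b, set())
```

Lowering the threshold admits weaker edges, so the same two-degree query can return a larger neighbourhood.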
- Fidelity to period’s own categories
- Ability to search and sort
- Categories of scholarly community
- Hold linked data
6 Degrees ontology is not adequate to any of these competing/overlapping pressures but IS responsive to all of them.
Jessica Otis (6degs): They are using current ODNB theme data (Current ODNB Early Modern Themes), which should make it possible for their network analysis to connect with other information resources.
Practical issue: How do we want to define groups—what rules and restriction? How to organize them as the list grows?
- Make a loose goal for defining a group: just say there’s *something* in common. It doesn’t matter to them whether people knew each other personally, as long as there’s *some* connection by affiliation, etc.
–Groups like Dissenters: these people would not necessarily have known each other; they’re affiliated by group identity
–Gender is treated as a group, so people can study this as social group
–Any conceivable sub-network of English society as a whole
- They expect their group data to explode, so they need to find a way to order and limit it:
–alphabetize the hierarchies, and
–deal with de-duplication
Categories are neither mutually exclusive nor required:
–simply provide scholars shortcuts to make collections of groups that may be of interest
Six overarching groups to organize categories:
These categories make more of a statement about our scholarly interests now… If there’s an influx of data from musicologists, they may need to add a new group + categories…
High-level categories (controlled):
From what position are types maintained and enforced? Standardizing is really problematic because humanities scholarship raises issues with these types.
Why not use an uncontrolled vocabulary?
An uncontrolled vocabulary would make it hard to do network analysis
Invokes Aristotle: Dispensing with categories dispenses with science. (So, we’re doing Aristotelian work as soon as we aim to produce knowledge by categorization.)
Kinship relations are generalizable across periods, even though what counts as family membership varies over time and across cultures. (Cf. the queer-theory debate over what counts as a family relationship: not necessarily blood, etc.)
The Bacon project deals with this by making top-level categories controlled and low-level categories uncontrolled:
Low-level categories: Uncontrolled (examples):
Aunt/Uncle of <> Niece/Nephew of
Apprentice of <> Master of
Mentor of <> Mentee of
An entry given a low-level category such as Son of / Father of could potentially belong to multiple high-level categories: family, legal/commercial, affective, local, intellectual/educational.
The Bacon team aims to combine the science of categorization with a compromise that accommodates the complicating factors of relationships.
Q: What would you do with more standard prosopographical data, such as place of birth? Is that a group?
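The controlled/uncontrolled split can be enforced with a fixed top-level set and a free-text lower level in which every term must point at one or more top-level categories. A hypothetical sketch using the five categories named above as the controlled list (the project's full list may differ):

```python
# Controlled top level: a fixed set the software enforces (assumed list).
HIGH_LEVEL = frozenset({
    "Family", "Legal/Commercial", "Affective", "Local", "Intellectual/Educational",
})

# Uncontrolled low level: any term may be added, but each must map to at least
# one controlled category so network analysis can still aggregate.
low_level: dict[str, set[str]] = {}


def add_low_level(term: str, categories: set[str]) -> None:
    """Register a free-text term under one or more controlled categories."""
    if not categories:
        raise ValueError("a low-level term needs at least one high-level category")
    unknown = categories - HIGH_LEVEL
    if unknown:
        raise ValueError(f"not a controlled category: {sorted(unknown)}")
    low_level.setdefault(term, set()).update(categories)


def terms_under(category: str) -> list[str]:
    """All low-level terms filed under a given high-level category."""
    return sorted(term for term, cats in low_level.items() if category in cats)
```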
A: Groups are properties of a node.
Q: What data is separate from your groups? What about dates?
A (Jessica): They have titles, names, dates of birth and death, historical significance… This isn’t group data but information on each node, all hard-coded into the node separately from the groups.
Q: The crowd-sourcing aspect raises quality-control questions: What systems do you have in place? Who’s responsible? Is there a potential scaling problem if you’re swamped with data?
A: Admin approval is required for just about anything brought in by crowd-sourcing.
They decided to open contributions to the project to the public (not just universities).
There is also a set of user tiers:
1) ordinary user (just prove you are a person)
2) curators (people accepted as knowing what they’re doing)
3) admins (Jessica, Chris, etc.)
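The tiers and the admin-approval rule could be modelled roughly as follows. This is a hypothetical sketch: the notes do not say what curators may do, so here only admins can approve or reject, and every submission starts out pending:

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    USER = 1     # has only proved they are a person
    CURATOR = 2  # accepted as knowing what they're doing
    ADMIN = 3    # project staff


@dataclass
class Contribution:
    description: str
    contributor_tier: Tier
    status: str = "pending"


def submit(description: str, tier: Tier) -> Contribution:
    """Every crowd-sourced contribution enters the queue as pending, whatever the tier."""
    return Contribution(description, tier)


def review(item: Contribution, reviewer: Tier, approve: bool) -> None:
    """Assumption: only admins may approve or reject a contribution."""
    if reviewer < Tier.ADMIN:
        raise PermissionError("admin approval required")
    item.status = "approved" if approve else "rejected"
```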
Q: What about temporal relationships? Is a group identification marked for how long it holds, even if that can’t be displayed?
A: We’ve programmed in the capacity for that, though it’s not yet functional on the site. Hope it’s functional by spring when 6degs goes live!
Closing comments: crowd-sourcing allows for debate over relationships: users can challenge a group identification, and the site preserves a record of the scholarly debate.
Very conscious of the idea that we’re mapping scholarly knowledge, not actuality!
Case Study 4: Manner of Belonging: Interstitial Description of Dr. Johnson’s Circle (MOB:ID)
Mark: Our metadata is the interface. Mark is a metadata coordinator at Yale and is collaborating with Susan, who holds a similar position at Harvard.
MOB:ID: a controlled vocabulary describing relationships among persons and resources connected with the 18th-c. lexicographer Samuel Johnson
The project title draws from definitions in Johnson’s Dictionary (1763): using Dr. Johnson’s definitions of relationship and network rather than FOAF’s.
MOB:ID builds on a past project, Connecting the Dots: Using EAC-CPF to Reunite Samuel Johnson and His Circle, which started with 78 carefully researched and encoded records:
<eac-cpf>: a standard for describing corporate bodies, persons, and families; the idea has been around since the 1980s.
Beinecke and Houghton librarians describe the resources:
<identity>: keeps track of name variations and entity type. 78 records, 79 identities (re King George III).
<description>: narrative description, timeline. It would be nice if we could map our places and times. Occupations: 231 occupations, of which 95 are distinct. Occupations taken from ___? source
<relations>: relations between persons, corporate bodies, and families, and between a resource and the entity
Susan speaks on the development of their vocabulary:
In the past year (January–March 2014): hired a 3-month assistant, surveyed ontologies, analyzed all the relations in the 78 EAC-CPF records, created a draft vocabulary, and documented EVERY decision. (So this is all new.)
Syntax: looked at models: FOAF, AgRelOn, RELATIONSHIP
Debated verb vs. noun constructions and, more broadly, clarity vs. consistency: clarity won out.
Struggled with verb tenses: used the present tense except when describing single events.
Indirect creative relationships: e.g., printedBy/printerOf. Semantics debates: historical context vs. possible broader application:
Patron, mentor, protégé, champion: kept only those that applied to our 78 entities with their 18th-c. meaning.
Multifaceted relationships hit a technical roadblock: EAC-CPF allows only one relationType for each cpfRelation, so they would need to choose which facet matters most. Decision for MOB:ID (to build in the flexibility missing from EAC-CPF): repeat the cpfRelation, describing it each time with a different relationship term. Examples:
James Boswell to Johnson: friendOf, writesAbout
David Garrick to Johnson: studentOf, friendOf
This expands their relationship data considerably. It also allows antagonistic facets to be expressed: foeOf alongside friendOf (during times of rift).
Not all relationships are mutual, for example non-contemporaneous relationships. Is there a relationship between 20th-c. scholars and Johnson or Boswell, or just a shared asynchronous connection to the resources? They settled on unidirectional relationships: e.g., the collectors Donald and Mary Hyde have a relationship to Johnson, but there is no reciprocal link.
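The repeated-relation approach and one-way links can be sketched with plain triples. The Boswell and Garrick facets come from the examples above; the collectorOf, teacherOf, and subjectOf terms and the inverse table are hypothetical, and no EAC-CPF XML is generated here:

```python
from collections import defaultdict

# One triple per facet, mirroring MOB:ID's repeated cpfRelation.
RELATIONS = [
    ("James Boswell", "friendOf", "Samuel Johnson"),
    ("James Boswell", "writesAbout", "Samuel Johnson"),
    ("David Garrick", "studentOf", "Samuel Johnson"),
    ("David Garrick", "friendOf", "Samuel Johnson"),
    ("Donald Hyde", "collectorOf", "Samuel Johnson"),  # hypothetical one-way term
]

# Hypothetical inverse table; None marks a term that gets no reciprocal link.
INVERSE = {
    "friendOf": "friendOf",
    "studentOf": "teacherOf",
    "writesAbout": "subjectOf",
    "collectorOf": None,
}


def facets(relations):
    """Regroup repeated relations into one multifaceted relationship per ordered pair."""
    grouped = defaultdict(list)
    for source, term, target in relations:
        grouped[(source, target)].append(term)
    return dict(grouped)


def with_inverses(relations):
    """Add the reciprocal triple for every term that has one; one-way terms are left alone."""
    result = list(relations)
    for source, term, target in relations:
        inverse = INVERSE.get(term)
        if inverse is not None:
            result.append((target, inverse, source))
    return result
```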
Resource Relationships hierarchy:
Challenges: How to categorize Johnson’s cat, Hodge? Are pets owned or cohabitants?
Decision: Hodge (entityType=person) is linked to other entities with the term “cohabiteeOf” (as opposed to ownership!) [ebb sidenote: The Digital Mitford project made a similar decision to include named pets on its historical persons list.]
What about emancipated slaves (and, for that matter, is slavery a personal or a professional relationship)?
Example: Francis Barber was a slave when he first entered Johnson’s service and was later freed. How should his relationship be described? Is it personal or professional in kind?
Next steps: the vocabulary isn’t published yet; they plan to publish it in RDF form. They’re also interested in cross-walking MOB:ID to GEXF, an interchangeable graph representation.
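A crosswalk to GEXF might look like the sketch below, which writes relationship triples as a directed GEXF graph with each relationship term as an edge label. It assumes the GEXF 1.2draft namespace and uses only Python's xml.etree.ElementTree; it is not MOB:ID's actual crosswalk:

```python
import xml.etree.ElementTree as ET

GEXF_NS = "http://www.gexf.net/1.2draft"  # assumed target version: GEXF 1.2draft
ET.register_namespace("", GEXF_NS)        # serialize as the default namespace


def _tag(name: str) -> str:
    """Qualify an element name with the GEXF namespace."""
    return f"{{{GEXF_NS}}}{name}"


def to_gexf(relations: list[tuple[str, str, str]]) -> str:
    """Serialize (source, term, target) triples as a directed GEXF graph."""
    root = ET.Element(_tag("gexf"), version="1.2")
    graph = ET.SubElement(root, _tag("graph"), defaultedgetype="directed")
    nodes = ET.SubElement(graph, _tag("nodes"))
    edges = ET.SubElement(graph, _tag("edges"))

    ids: dict[str, str] = {}  # entity name -> GEXF node id
    for source, _term, target in relations:
        for name in (source, target):
            if name not in ids:
                ids[name] = str(len(ids))
                ET.SubElement(nodes, _tag("node"), id=ids[name], label=name)

    for i, (source, term, target) in enumerate(relations):
        ET.SubElement(edges, _tag("edge"), id=str(i),
                      source=ids[source], target=ids[target], label=term)

    return ET.tostring(root, encoding="unicode")
```

Repeated facets become parallel edges between the same two nodes; some graph tools may merge these on import, so check how the target tool treats them.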
Relevant Resources: Their starting point (EAC-CPF): http://www2.archivists.org/groups/technical-subcommittee-on-eac-cpf/encoded-archival-context-corporate-bodies-persons-and-families-eac-cpf
Project links: Connecting the Dots: Using EAC-CPF to Reunite Samuel Johnson and His Circle: http://wiki.harvard.edu/confluence/display/connectingdots/
Manner of Belonging: Interstitial Description of Dr. Johnson’s Circle: an ontology describing relationships for Samuel Johnson and His Circle: http://projects.iq.harvard.edu/johnson/home
Q (Chris Warren): How comfortable would you be with others adapting this?
A: Yes, absolutely comfortable, just as we adapt Dublin Core. Would love to adapt this to the Bloomsbury Group!
Chris follow-up: I’m not hearing the same worries about historical specificity (?)
A: No, there were some concerns… Susan would love to see some discussion of standardization: the structure and underlying procedure of making terminology.
Q: Could we use Bloomsbury Group to define the terms?
Elisa: Aren’t we trying to stay centered in the Early Modern world? (Lots of projects here have done that.) How does period-specific terminology fit in?
Alison: Suppose we agree on a content standard; we could then unite different vocabularies through a crosswalk. “We make explicit our crosswalk” between the periods.
Q (audience member): Have you looked at Z3950, ISO, and other work explicitly addressing standards and linked data, as the Getty has been doing? Z39 and an ISO standard in two large parts address some of this specifically: syntax (structure); the logic behind choosing, selecting, and combining terms (e.g., polyvocabularies); and the specification and conjoining of different terminologies.
Alison: What do these bring?
Audience member: They bring logic for structures, specificity, and conjoining different terminologies. ISO parts 1 and 2 are 300 pages long!
Alison: The ISO standards aren’t informed by or accessible to humanists. How important is it for humanists to use these?
Elisa: The ISO standard on gender is notoriously reductive and problematic for humanities applications. I’m still using it in my TEI projects because it’s built in to the personography markup there, and after all it’s a helpful, generalizable way to distinguish “male” and “female”.
There’s a gap between humanists and ISO standards.
David Mundie: Not happy with the way the word “ontology” is being bandied around here: it means a formal syntax! We should be using OWL because it deals with levels of abstraction and with when to create new categories.
6Degs team (Chris?): OWL handles and describes it, but doesn’t DICTATE what to do!
Alison: implementation of OWL and RDF is coming at a later stage.
David Mundie: Using OWL’s formal syntax will solve a lot of problems.
Dan Edelstein: What about biographical standards (as opposed to just relationship standards)? Some data is just biographical data; when does it make sense to keep that separate? Do we need to think about translating biographical data into group data (properties of being in a place at a time)?
Dan Balderston: What happens when, for example, different cultures have different understandings of the concept of “son”?
Alison: We can build crosswalks between the concepts in different cultures/language groups.
David Birnbaum: It’s not cross-walkable: The richer taxonomy may not be inferable from the more impoverished one.
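Birnbaum's objection can be made concrete: a crosswalk from a richer vocabulary to a poorer one is a many-to-one mapping, so it cannot be inverted. A hypothetical sketch with invented kinship terms:

```python
# Hypothetical crosswalk from a richer kinship vocabulary to a poorer one.
RICH_TO_POOR = {
    "eldest son of": "son of",
    "younger son of": "son of",
    "adopted son of": "son of",  # whether adoption belongs here is itself a scholarly choice
    "son-in-law of": "in-law of",
}


def crosswalk(term: str) -> str:
    """Map a rich term onto the poorer vocabulary (always possible)."""
    return RICH_TO_POOR[term]


def reverse_candidates(poor_term: str) -> list[str]:
    """Going back only yields candidates; the original term cannot be recovered."""
    return sorted(rich for rich, poor in RICH_TO_POOR.items() if poor == poor_term)


def is_invertible(mapping: dict[str, str]) -> bool:
    """A crosswalk can be reversed without loss only if no two terms share a target."""
    return len(set(mapping.values())) == len(mapping)
```

Crosswalking down always succeeds; crosswalking back yields a list of possibilities, which is exactly the information a richer taxonomy cannot infer from a poorer one.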
Possible Discussion Topics for the Afternoon Workgroups
Objects vs. People
Search-maximized or Browse maximized?
Just life roles? What about relationships and networks?
How we need it now? Period fidelity/specificity: 1 – 5th degrees that “represent the past”? Or our own categories?
Multiplicity / palimpsest of meanings?
Meta-structure: language-independent, discipline-independent, bounded/unbounded (OWL-based, ISO standards, etc.)
Groups as a property refinement; entities vs. roles
Precision & recall: Do we need to decide which to prioritize? Proposed audience? Does that change the ontologies?
Syntax: nouns vs. verbs
Implementation details: RDF, OWL, interchange formats (Gephi, etc.)
Keep visualizations/audiences in mind: Do we have to keep ALL end-cases in mind? Can we please EVERYONE and anticipate the needs of undergrads, etc?
Localization of ISO Standards to the Humanistic Discipline
Linked Data: an intellectual commitment connected with humanities questions