DS Catalog:SPARQL Query Service/example queries
Revision as of 22:51, 12 February 2024
Using the SPARQL Query Service for the DS Catalog
This page provides basic example queries for exploring the DS Wikibase using SPARQL, a query language designed for RDF-encoded linked datasets. Familiarity with the properties used in the DS Data Model is helpful for understanding how the queries operate, but the queries also contain comments (marked by the hash character "#") that walk through the individual steps, so users can see how each query is constructed to arrive at a solution.
Manuscripts and DS Records
In redeveloping the DS data model, the project team made an explicit choice to differentiate between the metadata description (the DS Record) and the manuscript object (Manuscript). Although separate, the data model links the DS Record to the Manuscript, such that a DS Record contains data about the manuscript object from institutional records that provide metadata about the object itself.
The decision to separate but link metadata descriptions from their manuscript objects was purposeful so as not to make any direct claims or assertions about the manuscript object other than its existence (which happens through assignment of a unique persistent identifier: the DS ID). In this way, the DS Record is conceptualized as a document which makes statements about a manuscript object which are not inherent to the manuscript object itself and can be revised at any time. Although the DS data model is designed to have only one DS Record linked to a Manuscript, this conceptualization of descriptive documents as separate from described objects potentially allows many different (and potentially competing) descriptions to be linked to the same object simultaneously.
Because of this data structure, users may find that SPARQL queries seem circuitous at first in comparison to traditional library catalogs or search interfaces (like the one for the DS Catalog). This is because graph databases like the DS Wikibase are queried by pattern matching on particular entities (items) and the relationships between them. A machine rapidly traverses the graph, finding patterns that match the path indicated by the query. For purposes of querying DS data, that means that seemingly disparate elements of DS Records, Manuscripts, and even Holding Information (i.e., information about and assigned by the institution that owns and/or contributes data about a manuscript object) may all need to be invoked in a single query in order to answer seemingly simple questions (such as which institutions own items with texts authored by Avicenna). Taking some time to understand the items, properties, and linking structures in the DS data model and its substantiation in the DS Wikibase will help to elucidate how queries of this nature can be constructed.
To help users better understand how queries are constructed, the example queries found below provide comments (which are preceded by the # character) to explain how clauses and asserted triple patterns function in the context of a query. We hope that working through some of these examples will allow users to construct their own more complex queries as they learn more about how the DS data model operates in concert with their research questions.
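As a rough illustration of the path described above, the following sketch walks from a DS Record to its Manuscript, then to the Holding information, and finally to the holding institution. It is only a sketch: the property numbers (P1, P2, P3, P4, P5) are taken from the Receipt Generator query further down this page, and the variable names and the LIMIT value are illustrative assumptions.

```sparql
PREFIX wdt: <https://catalog.digital-scriptorium.org/prop/direct/>
PREFIX p: <https://catalog.digital-scriptorium.org/prop/>
PREFIX pq: <https://catalog.digital-scriptorium.org/prop/qualifier/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?record ?manuscript ?dsid ?holding ?holderName
WHERE {
  # a DS Record describes a Manuscript (P3)
  ?record wdt:P3 ?manuscript .
  # the Manuscript carries its DS ID (P1) and links to its Holding information (P2)
  ?manuscript wdt:P1 ?dsid ;
              wdt:P2 ?holding .
  # the Holding has a "holding institution as recorded" statement (P5)
  ?holding p:P5 ?holdingStatement .
  # that statement is qualified by the institution's authority item (P4)
  ?holdingStatement pq:P4 ?holder .
  # the authority item has a human-readable label
  ?holder rdfs:label ?holderName .
}
LIMIT 25
# LIMIT is an arbitrary cap to keep the first test run small
```

Each triple pattern shares a variable with the one before it, which is how the query engine is told to follow a link from one entity to the next.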
Prefix Declarations
Why are they used?
Prefix declarations made at the beginning of a SPARQL query state which namespaces (ontologies, data models, or other specifications) the query will use to construct its triples. Declaring prefixes lets you abbreviate the URIs used later in the query, rather than writing out a long URI every time an entity is referenced.
For instance, by declaring the following prefixes at the beginning of the query,
PREFIX wd: <https://catalog.digital-scriptorium.org/entity/>
PREFIX wdt: <https://catalog.digital-scriptorium.org/prop/direct/>
instead of having to type out
<https://catalog.digital-scriptorium.org/entity/Q88> <https://catalog.digital-scriptorium.org/prop/direct/P16> <https://catalog.digital-scriptorium.org/entity/Q13> .
after declaring prefixes, you can type out
wd:Q88 wdt:P16 wd:Q13 .
As you can see, the Q and P values are appended to the end of the base URIs, so that you only need to know the prefix (e.g., wd, wdt) and the appropriate Q or P number to construct the triple pattern you want to use. This makes SPARQL queries much more readable and editable by human beings.
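To show the prefixes at work in a complete query, here is a minimal sketch that asks for ten items that are an instance of (P16) the Holding type (Q2). The reading of P16 as "instance of" and Q2 as the Holding type is taken from the Receipt Generator query on this page; the LIMIT value is an arbitrary choice.

```sparql
PREFIX wd: <https://catalog.digital-scriptorium.org/entity/>
PREFIX wdt: <https://catalog.digital-scriptorium.org/prop/direct/>

SELECT ?item
WHERE {
  # ?item is a variable; wdt:P16 and wd:Q2 expand to the full URIs declared above
  ?item wdt:P16 wd:Q2 .
}
LIMIT 10
# LIMIT keeps the result list short while experimenting
```

Swapping wd:Q2 for another Q number changes which kind of item is returned without touching the rest of the query.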
Which prefix declarations will I need to use to query the DS Wikibase?
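The following prefix declarations should be at the beginning of any SPARQL query made at the DS Wikibase Query Service endpoint. Some queries on this page also use the xsd: (<http://www.w3.org/2001/XMLSchema#>) and bd: (<http://www.bigdata.com/rdf#>) prefixes, which are not listed here; declare them as well if the endpoint does not predefine them.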
PREFIX wd: <https://catalog.digital-scriptorium.org/entity/>
PREFIX wds: <https://catalog.digital-scriptorium.org/entity/statement/>
PREFIX wdv: <https://catalog.digital-scriptorium.org/value/>
PREFIX wdt: <https://catalog.digital-scriptorium.org/prop/direct/>
PREFIX p: <https://catalog.digital-scriptorium.org/prop/>
PREFIX ps: <https://catalog.digital-scriptorium.org/prop/statement/>
PREFIX pq: <https://catalog.digital-scriptorium.org/prop/qualifier/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
Basic Example Queries
Below is a taxonomy of two types of basic queries based on whether the records in the DS Catalog are described by a particular data element in general (e.g., have any author, any assigned genre, or any place of production) or whether the records meet certain criteria (e.g., were produced by a specific author, were assigned a specific genre, or identified as produced in a particular place). The following queries were originally developed by L.P. Coladangelo (DS Catalog and Data Manager) for prototype testing, and adapted by LEADING Fellows Mace Jones and Jade Snelling as part of their fellowship research.
All manuscripts and their DS records
These queries will return lists of manuscript records and the associated data values, including both the string value as recorded in the original catalog record (the as_recorded value) and the authority value from a Linked Open Vocabulary to which the as_recorded value has been linked (the authority value). You should expect to see a list of all records and manuscripts in the DS Catalog which have values for the below data types.
Find all DS records describing manuscripts by their...
Artists
Authors
Centuries of Production
Dates of Production
Dated status
Former Owners
Genres
Holding Institutions
Languages
Materials
Other associated names/agents
Places of Production
Scribes
Subjects
Titles
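As a sketch of what a query in this group might look like, the following lists DS records with their as_recorded name strings and the authority values linked to them. It is not one of the linked queries itself. It assumes the name properties used in the Unenriched Strings Generator further down this page (P14 for the name as recorded, P17 for the linked authority value, P15 for the role); the variable names and the LIMIT value are illustrative.

```sparql
PREFIX p: <https://catalog.digital-scriptorium.org/prop/>
PREFIX ps: <https://catalog.digital-scriptorium.org/prop/statement/>
PREFIX pq: <https://catalog.digital-scriptorium.org/prop/qualifier/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?record ?recordLabel ?name ?authority ?authorityLabel ?roleLabel
WHERE {
  # records that have a name statement (P14)
  ?record p:P14 ?nameStatement ;
          rdfs:label ?recordLabel .
  # the as_recorded string and the authority value it has been linked to (P17)
  ?nameStatement ps:P14 ?name ;
                 pq:P17 ?authority .
  ?authority rdfs:label ?authorityLabel .
  # role information (P15), where present
  OPTIONAL {
    ?nameStatement pq:P15 ?role .
    ?role rdfs:label ?roleLabel .
  }
}
ORDER BY ASC(?name)
LIMIT 100
```

Because pq:P17 is required here, name strings that have not yet been linked to an authority value are left out of the results.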
Specific manuscripts and their DS records
These queries will return lists of manuscript records based on or limited by a specific value from an associated DS authority record, including both the string value as recorded in the original catalog record (the as_recorded value) and the authority value from a Linked Open Vocabulary to which the as_recorded value has been linked (the authority value). You should expect to see a list of all records and manuscripts in the DS Catalog which meet the conditions of having a specific value for the below data types.
Find all DS records describing manuscripts by a specific...
Artist
Author
Century of Production
Date of Production
- Start date
- End date
- Date range
- Inside date range
- Outside date range
Dated status
- Dated
- Non-dated
Former Owner
Genre
Holding Institution
Language
Material
Other associated name/agent
Place of Production
Scribe
Subject
Title
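A query limited to a specific value can be sketched by fixing one variable in the pattern. The example below restricts results to one holding institution, using the Q number listed for the University of Pennsylvania (Q374) in the Receipt Generator on this page. The property numbers are taken from that same query, and the use of VALUES to fix the institution is an illustrative choice rather than the method of the linked queries.

```sparql
PREFIX wd: <https://catalog.digital-scriptorium.org/entity/>
PREFIX wdt: <https://catalog.digital-scriptorium.org/prop/direct/>
PREFIX p: <https://catalog.digital-scriptorium.org/prop/>
PREFIX pq: <https://catalog.digital-scriptorium.org/prop/qualifier/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?record ?recordLabel ?dsid ?shelfmark
WHERE {
  # fix the holding institution (change the Q number to query another institution)
  VALUES ?holder { wd:Q374 }
  # holdings whose institution statement (P5) is qualified by that authority item (P4)
  ?holding p:P5 ?holdingStatement .
  ?holdingStatement pq:P4 ?holder .
  # shelfmark (P8), where present
  OPTIONAL { ?holding wdt:P8 ?shelfmark }
  # the manuscript linked to the holding (P2) and its DS ID (P1)
  ?manuscript wdt:P2 ?holding ;
              wdt:P1 ?dsid .
  # the DS Record describing that manuscript (P3)
  ?record wdt:P3 ?manuscript ;
          rdfs:label ?recordLabel .
}
ORDER BY ASC(?shelfmark)
```

Listing several Q numbers inside the VALUES braces would return records for all of those institutions at once.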
User generated examples
TBD
Technical Queries
Authority Record Generator
This query generates a list of authority records in the Wikibase by authority value type (i.e., all items which are an instance of a particular Authority Type).
Languages, Materials, Places
Personal Names and Corporate Names
Genres and Subjects
Standard Titles
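The general shape of such a generator can be sketched as follows. This is not the linked query itself: it lists every item together with the type it is an instance of (P16, as used in the Receipt Generator below), which is one way to discover the Q number of an authority type before restricting to it. The Q number in the commented VALUES line is a placeholder to be filled in, and the LIMIT value is arbitrary.

```sparql
PREFIX wd: <https://catalog.digital-scriptorium.org/entity/>
PREFIX wdt: <https://catalog.digital-scriptorium.org/prop/direct/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?type ?typeLabel ?item ?itemLabel
WHERE {
  # to list a single authority type, remove the comment tag and supply its Q number
  # VALUES ?type { wd:Qxxxxx }
  # every item and the type it is an instance of
  ?item wdt:P16 ?type .
  ?type rdfs:label ?typeLabel .
  ?item rdfs:label ?itemLabel .
}
ORDER BY ASC(?typeLabel) ASC(?itemLabel)
LIMIT 200
```

Without the VALUES restriction the results also include non-authority items such as records, manuscripts, and holdings.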
Dated Classification Generator
This query generates a list of manuscript items which have and have not been classified as dated.
Receipt Generator
This query generates a list of DS Wikibase items which shows successful ingest of DS Records, creation of Manuscript items / DS IDs, and assignment of Holding information for an institution, to serve as a receipt for data contributions made to the DS Catalog.
SELECT
# Variables for holding values used in the construction of the receipt
(?holderName as ?holding_institution)
(?holder as ?ds_holding_inst_url)
(?dsid as ?ds_id)
(?manuscript as ?ds_manuscript_url)
(?shelfmark as ?holding_inst_shelfmark)
(?institutionalID as ?holding_inst_id)
(?linkToRecord as ?holding_inst_link)
(?iiifManifest as ?iiif_manifest)
(?holdingLabel as ?ds_holding)
(?holding as ?ds_holding_info_url)
(?dateAdded as ?date_added_to_ds)
(?ds20RecordLabel as ?ds_record)
(?ds20Record as ?ds_record_url)
(?lastUpdated as ?ds_record_last_updated)
{
# Specify a holding institution (remove comment tag)
# BIND (wd:Q28019 as ?holder) # burke - Burke Library at Union Theological Seminary
# BIND (wd:Q16442 as ?holder) # bpl - Boston Public Library
# BIND (wd:Q27887 as ?holder) # columbia - Columbia University Rare Book and Manuscript Library
# BIND (wd:Q667 as ?holder) # conception - Conception Abbey
# BIND (wd:Q825 as ?holder) # csl - State of California Library
# BIND (wd:Q858 as ?holder) # cuny - City College of New York
# BIND (wd:Q18629 as ?holder) # flp - Free Library of Philadelphia
# BIND (wd:Q868 as ?holder) # grolier - Grolier Club
# BIND (wd:Q1487 as ?holder) # gts - General Theological Seminary
# BIND (wd:Q17632 as ?holder) # hrc - Harry Ransom Center
# BIND (wd:Qxxxxx as ?holder) # huntington - The Huntington
# BIND (wd:Q1521 as ?holder) # indiana - Indiana University
# BIND (wd:Q6060 as ?holder) # kansas - University of Kansas
# BIND (wd:Q1123 as ?holder) # nelsonatkins - Nelson-Atkins Museum of Art
# BIND (wd:Q1914 as ?holder) # nyu - New York University
# BIND (wd:Q10856 as ?holder) # oregon - University of Oregon
# BIND (wd:Q374 as ?holder) # penn - University of Pennsylvania
# BIND (wd:Q12264 as ?holder) # princeton - Princeton University
# BIND (wd:Q801 as ?holder) # providence - Providence Public Library
# BIND (wd:Q1101 as ?holder) # rome - American Academy in Rome
# BIND (wd:Q1936 as ?holder) # rutgers - Rutgers University
# BIND (wd:Q27854 as ?holder) # shi - Science History Institute
# BIND (wd:Q1247 as ?holder) # smith - Smith College
# BIND (wd:Q27869 as ?holder) # wmu - Western Michigan University
# holding and holding properties
BIND ( wd:Q2 as ?holdingType )
BIND ( wdt:P16 as ?instanceOf )
BIND ( wdt:P2 as ?hasHolding )
BIND ( pq:P4 as ?qualifierHoldingInstInAuthFile )
BIND ( p:P5 as ?holdingInstitutionAsRecStmt )
BIND ( ps:P5 as ?holdingInstAsRecValue )
BIND ( wdt:P7 as ?hasInstID )
BIND ( wdt:P8 as ?hasShelfmark )
BIND ( wdt:P9 as ?hasLinkToInstRecord )
BIND ( wdt:P38 as ?hasHoldingAddedDate )
BIND ( wdt:P39 as ?hasHoldingEndDate )
# manuscript properties
BIND ( wdt:P1 as ?hasDSID )
# DS 2.0 Record properties
BIND ( wdt:P3 as ?describesManuscript )
BIND ( wdt:P35 as ?hasDateLastUpdated )
BIND ( wdt:P41 as ?hasIIIFManifest )
# holding information
?holding ?instanceOf ?holdingType ;
?holdingInstitutionAsRecStmt ?holdingInstStatement ;
?hasHoldingAddedDate ?dateAdded .
OPTIONAL { ?holding ?hasInstID ?institutionalID }
OPTIONAL { ?holding ?hasShelfmark ?shelfmark }
OPTIONAL { ?holding ?hasLinkToInstRecord ?linkToRecord }
?holdingInstStatement ?qualifierHoldingInstInAuthFile ?holder .
?holder rdfs:label ?holderName .
# finding linked manuscript objects to holding information patterns above
?manuscript ?hasHolding ?holding ;
?hasDSID ?dsid .
?holding rdfs:label ?holdingLabel .
# finding linked DS records to manuscript object patterns above
?ds20Record ?describesManuscript ?manuscript ;
?hasDateLastUpdated ?lastUpdated ;
rdfs:label ?ds20RecordLabel .
# filter results by a period of time when records were last updated
FILTER ((?lastUpdated > "2023-01-01"^^xsd:dateTime) && (?lastUpdated < "2024-01-01"^^xsd:dateTime)).
# display IIIF manifest URLs if present
OPTIONAL { ?ds20Record ?hasIIIFManifest ?iiifManifest }
# get alphanumerical IDs (e.g., "Q374") from Wikibase URIs as plain strings
# (an ID that keeps its "Q" prefix is not a valid xsd:integer, so no datatype is assigned)
BIND (REPLACE(STR(?holder), "http.+/entity/", "") as ?holderQID)
BIND (REPLACE(STR(?holding), "http.+/entity/", "") as ?holdingQID)
BIND (REPLACE(STR(?manuscript), "http.+/entity/", "") as ?manuscriptQID)
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
# allows English language labels to be returned for Wikibase items
}
} ORDER BY DESC(?lastUpdated) ASC(?shelfmark)
# sort results by date updated in Wikibase and then by shelfmark
Statement Count Generator
This query generates a count of the number of statements (triples) in the DS Wikibase matching a particular pattern.
SELECT (COUNT(?string) AS ?stringCount)
#declared variable that will be counted and passed to another variable to display as a number
WHERE {
?stringStatement ps:P10 ?string .
# use "as recorded" property P-value for the data you want to count
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
# allows English language labels to be returned for Wikibase items
}
}
# no GROUP BY clause is needed here: without grouping, COUNT aggregates over all matching statements and returns a single number
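A related sketch, for cases where a count per distinct value is more useful than one overall total: grouping by the string itself returns each as recorded value alongside the number of statements that use it. It uses the same P10 property as the query above; the ordering and the LIMIT value are illustrative choices.

```sparql
PREFIX ps: <https://catalog.digital-scriptorium.org/prop/statement/>

SELECT ?string (COUNT(?stringStatement) AS ?stringCount)
WHERE {
  # use the "as recorded" property P-value for the data you want to count
  ?stringStatement ps:P10 ?string .
}
GROUP BY ?string
# one result row per distinct string value
ORDER BY DESC(?stringCount)
# most frequently occurring strings first
LIMIT 50
```

Here GROUP BY names the plain variable that defines the groups, and the aggregate is computed once per group.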
Unenriched Strings Generator
This query generates a list of string values of a particular type of as recorded data occurring in a DS Record which have not been qualified by an authority value (i.e., reconciled to their linked data equivalents in a Linked Open Vocabulary or Authority). This example uses properties for name data, but any authority enriched data can be queried using a similar query structure.
SELECT
# find values for the following declared variables matching the below WHERE pattern
?record
# a link to a DS record
?recordLabel
# the name/label of the DS record
?string
# the string value as recorded in the original catalog record
?authority
# a link to the authority record to which the string value has been reconciled
?authorityLabel
# a label for the authority record in the DS database
#?roleLabel
# where applicable, a label for role information (un-comment roleLabel variable when querying name data to get role information)
WHERE
# the patterns or conditions that need to be met to return values for the above variables
{
?record p:P14 ?stringStatement .
# identifies records with statements with corresponding property (change P-value for as recorded value to be queried)
?stringStatement ps:P14 ?string .
# identifies statements that have strings with corresponding property (change P-value for as recorded value to be queried)
FILTER NOT EXISTS { ?stringStatement pq:P17 ?authority . }
# identifies those statements which have not been enriched with authority values (change P-value for authority file value to be queried)
#OPTIONAL { ?stringStatement pq:P15 ?role . }
# only used for name data, un-comment optional clause when querying name data
SERVICE wikibase:label {
bd:serviceParam wikibase:language "en" .
# allows English language labels to be returned for Wikibase items
}
}
ORDER BY ASC (?string)
# this sorts the results alphabetically by as recorded string values