DS Catalog:SPARQL Query Service/example queries: Difference between revisions

From DS 2.0 Catalog

Revision as of 14:56, 30 January 2024

Using the SPARQL Query Service for the DS Catalog

This page provides basic example queries for exploring the DS Wikibase using SPARQL, a query language designed for RDF-encoded linked datasets. Familiarity with the properties used in the DS Data Model is helpful for understanding how the queries operate, but the queries also contain comments (marked with the hash character "#") that walk through the individual steps, so that users can see how each query is constructed to arrive at a solution.

Manuscripts and DS Records

In redeveloping the DS data model, the project team made an explicit choice to differentiate between the metadata description (the DS Record) and the manuscript object (Manuscript). Although separate, the data model links the DS Record to the Manuscript, such that a DS Record contains data about the manuscript object from institutional records that provide metadata about the object itself.

The decision to separate but link metadata descriptions from their manuscript objects was purposeful so as not to make any direct claims or assertions about the manuscript object other than its existence (which happens through assignment of a unique persistent identifier: the DS ID). In this way, the DS Record is conceptualized as a document which makes statements about a manuscript object which are not inherent to the manuscript object itself and can be revised at any time. Although the DS data model is designed to have only one DS Record linked to a Manuscript, this conceptualization of descriptive documents as separate from described objects potentially allows many different (and potentially competing) descriptions to be linked to the same object simultaneously.

Because of this data structure, users may find that SPARQL queries at first seem circuitous compared with traditional library catalogs or search interfaces (like the one for the DS Catalog). This is because graph databases like the DS Wikibase are queried by pattern matching on particular entities (items) and the relationships between them. A machine rapidly traverses the graph, finding patterns that match the path indicated by the query. For purposes of querying DS data, that means that seemingly disparate elements of DS Records, Manuscripts, and even Holding Information (i.e., information about and assigned by the institution that owns and/or contributes data about a manuscript object) may all need to be invoked in a constructed query in order to get solutions to seemingly simple questions (such as which institutions own items with texts authored by Avicenna). Taking some time to understand the items, properties, and linking structures in the DS data model and its substantiation in the DS Wikibase will help to elucidate how queries of this nature can be constructed.
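
As a minimal sketch of this kind of path traversal, the query below walks from DS Records to Manuscripts to Holding Information. It assumes the property numbers bound in the Receipt Generator query on this page (P3 for "describes manuscript", P1 for the DS ID, P2 for "has holding", P8 for shelfmark); the LIMIT is added only to keep the illustration small.

```sparql
PREFIX wdt: <https://catalog.digital-scriptorium.org/prop/direct/>

SELECT ?record ?manuscript ?dsid ?holding ?shelfmark
WHERE {
  # a DS Record describes a Manuscript (P3)
  ?record wdt:P3 ?manuscript .
  # the Manuscript carries the DS ID (P1) and links to its Holding item (P2)
  ?manuscript wdt:P1 ?dsid ;
              wdt:P2 ?holding .
  # the Holding item may carry a shelfmark (P8); OPTIONAL keeps holdings without one
  OPTIONAL { ?holding wdt:P8 ?shelfmark }
}
LIMIT 25
```

Each triple pattern is one step along the path, which is why three kinds of items appear even though the question ("which manuscripts have which shelfmarks?") sounds simple.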

To help users better understand how queries are constructed, the example queries found below provide comments (which are preceded by # tags) to explain how clauses and asserted triple patterns function in the context of a query. We hope that working through some of these examples will allow users to construct their own more complex queries as they learn more about how the DS data model operates in concert with their research questions.

Prefix Declarations

Why are they used?

Prefix declarations made at the beginning of a SPARQL query tell you which namespaces (ontologies, data models, or other specifications) will be used by the query to construct its triples. Rather than having to write out a long URI every time an entity is referenced, by declaring prefixes, you can abbreviate the URIs used later in the query.

For instance, by declaring the following prefixes at the beginning of the query,

PREFIX wd: <https://catalog.digital-scriptorium.org/entity/>
PREFIX wdt: <https://catalog.digital-scriptorium.org/prop/direct/>

instead of having to type out

<https://catalog.digital-scriptorium.org/entity/Q88> <https://catalog.digital-scriptorium.org/prop/direct/P16> <https://catalog.digital-scriptorium.org/entity/Q13> .

you can simply type

wd:Q88 wdt:P16 wd:Q13 .

As you can see, the Q and P values are appended to the end of the base URIs, so that you only need to know the prefix (e.g., wd, wdt) and the appropriate Q or P number to construct the triple pattern you want to use. This makes SPARQL queries much more readable and editable by human beings.
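
Put together, a complete query using these two prefixes might look like the sketch below. It reuses the identifiers from the triple above; P16 is bound as "instance of" in the Receipt Generator query on this page, while what Q13 denotes is not stated here, so treat it as a placeholder for whichever class you want to list.

```sparql
PREFIX wd: <https://catalog.digital-scriptorium.org/entity/>
PREFIX wdt: <https://catalog.digital-scriptorium.org/prop/direct/>

SELECT ?item
WHERE {
  # every item that stands in the P16 relationship to Q13,
  # i.e. the same pattern as "wd:Q88 wdt:P16 wd:Q13" with the subject left open
  ?item wdt:P16 wd:Q13 .
}
LIMIT 10
```

Replacing the fixed subject wd:Q88 with the variable ?item is what turns a single statement into a search pattern.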

Which prefix declarations will I need to use to query the DS Wikibase?

The following prefix declarations should be at the beginning of any SPARQL query made at the DS Wikibase Query Service endpoint.

PREFIX wd: <https://catalog.digital-scriptorium.org/entity/>
PREFIX wds: <https://catalog.digital-scriptorium.org/entity/statement/>
PREFIX wdv: <https://catalog.digital-scriptorium.org/value/>
PREFIX wdt: <https://catalog.digital-scriptorium.org/prop/direct/>
PREFIX p: <https://catalog.digital-scriptorium.org/prop/>
PREFIX ps: <https://catalog.digital-scriptorium.org/prop/statement/>
PREFIX pq: <https://catalog.digital-scriptorium.org/prop/qualifier/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

Basic Example Queries

Below is a taxonomy of two types of basic queries based on whether the records in the DS Catalog are described by a particular data element in general (e.g., have any author, any assigned genre, or any place of production) or whether the records meet certain criteria (e.g., were produced by a specific author, were assigned a specific genre, or identified as produced in a particular place). The following queries were originally developed by L.P. Coladangelo (DS Catalog and Data Manager) for prototype testing, and adapted by LEADING Fellows Mace Jones and Jade Snelling as part of their fellowship research.

All manuscripts and their DS records

These queries will return lists of manuscript records and the associated data values, including both the string value as recorded in the original catalog record (the as_recorded value) and the authority value from a Linked Open Vocabulary to which the as_recorded value has been linked (the authority value). You should expect to see a list of all records and manuscripts in the DS Catalog which have values for the below data types.

Find all DS records describing manuscripts by their...

Artists

Authors

Centuries of Production

Dates of Production

Dated status

Former Owners

Genres

Holding Institutions

Languages

Materials

Other associated names/agents

Places of Production

Scribes

Subjects

Titles
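
The general shape of these queries can be sketched for name data as follows. This is an illustration rather than the page's own query: it assumes the property numbers used in the Unenriched Strings Generator further down (P14 for the as recorded name, with P17 as the authority qualifier), and it reads labels directly through rdfs:label.

```sparql
PREFIX p: <https://catalog.digital-scriptorium.org/prop/>
PREFIX ps: <https://catalog.digital-scriptorium.org/prop/statement/>
PREFIX pq: <https://catalog.digital-scriptorium.org/prop/qualifier/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?record ?string ?authority ?authorityLabel
WHERE {
  # records with a name statement, and the as_recorded string on that statement
  ?record p:P14 ?nameStatement .
  ?nameStatement ps:P14 ?string .
  # the authority value, where the string has been reconciled to one
  OPTIONAL {
    ?nameStatement pq:P17 ?authority .
    ?authority rdfs:label ?authorityLabel .
    FILTER (LANG(?authorityLabel) = "en")
  }
}
ORDER BY ?string
LIMIT 100
```

Swapping the P-values for those of another data type (genre, language, place, and so on) should give the corresponding query, provided that type follows the same as_recorded/authority structure.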

Specific manuscripts and their DS records

These queries will return lists of manuscript records based on or limited by a specific value from an associated DS authority record, including both the string value as recorded in the original catalog record (the as_recorded value) and the authority value from a Linked Open Vocabulary to which the as_recorded value has been linked (the authority value). You should expect to see a list of all records and manuscripts in the DS Catalog which meet the conditions of having a specific value for the below data types.

Find all DS records describing manuscripts by a specific...

Artist

Author

Century of Production

Date of Production

  • Start date
  • End date
  • Date range
    • Inside date range
    • Outside date range

Dated status

  • Dated
  • Non-dated

Former Owner

Genre

Holding Institution

Language

Material

Other associated name/agent

Place of Production

Scribe

Subject

Title
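
As a sketch of how a specific value narrows the results, the query below limits records to one holding institution. It assumes the holding pattern from the Receipt Generator query on this page (a P5 statement on the Holding item, qualified by P4 with the institution's authority item) and uses wd:Q374, listed there for the University of Pennsylvania.

```sparql
PREFIX wd: <https://catalog.digital-scriptorium.org/entity/>
PREFIX wdt: <https://catalog.digital-scriptorium.org/prop/direct/>
PREFIX p: <https://catalog.digital-scriptorium.org/prop/>
PREFIX pq: <https://catalog.digital-scriptorium.org/prop/qualifier/>

SELECT ?record ?manuscript ?dsid ?shelfmark
WHERE {
  # holdings whose holding-institution statement points to the chosen authority item
  ?holding p:P5 ?holderStatement .
  ?holderStatement pq:P4 wd:Q374 .
  # the manuscript linked to that holding, and its DS ID
  ?manuscript wdt:P2 ?holding ;
              wdt:P1 ?dsid .
  # the DS Record describing the manuscript
  ?record wdt:P3 ?manuscript .
  OPTIONAL { ?holding wdt:P8 ?shelfmark }
}
ORDER BY ?shelfmark
```

The only difference from an "all records" query is that the authority position holds a fixed item rather than a variable.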

User generated examples

TBD

Technical Queries

Authority Record Generator

This query generates a list of authority records in the Wikibase by authority value type (i.e., all items which are an instance of a particular Authority Type).

Dated Classification Generator

This query generates a list of manuscript items which have and have not been classified as dated.

Receipt Generator

This query generates a list of DS Wikibase items which shows successful ingest of DS Records, creation of Manuscript items / DS IDs, and assignment of Holding information for an institution, to serve as a receipt for data contributions made to the DS Catalog.

SELECT
  # Variables for holding values used in the construction of the receipt
  (?holderName as ?holding_institution) 
  (?holder as ?ds_holding_inst_url)
  (?dsid as ?ds_id)
  (?manuscript as ?ds_manuscript_url)
  (?shelfmark as ?holding_inst_shelfmark)
  (?institutionalID as ?holding_inst_id)
  (?linkToRecord as ?holding_inst_link)
  (?iiifManifest as ?iiif_manifest)
  (?holdingLabel as ?ds_holding)
  (?holding as ?ds_holding_info_url)
  (?dateAdded as ?date_added_to_ds) 
  (?ds20RecordLabel as ?ds_record)
  (?ds20Record as ?ds_record_url)
  (?lastUpdated as ?ds_record_last_updated)
{

  # Specify a holding institution (remove comment tag)
#  BIND (wd:Q16442 as ?holder) # bpl - Boston Public Library
#  BIND (wd:Qxxxx as ?holder) # columbia - Columbia University
#  BIND (wd:Q667 as ?holder) # conception - Conception Abbey
#  BIND (wd:Q825 as ?holder) # csl - State of California Library
#  BIND (wd:Q858 as ?holder) # cuny - City College of New York
#  BIND (wd:Q18629 as ?holder) # flp - Free Library of Philadelphia
#  BIND (wd:Q868 as ?holder) # grolier - Grolier Club
#  BIND (wd:Q1487 as ?holder) # gts - General Theological Seminary
#  BIND (wd:Q17632 as ?holder) # hrc - Harry Ransom Center
#  BIND (wd:Qxxxx as ?holder) # huntington - The Huntington
#  BIND (wd:Q1521 as ?holder) # indiana - Indiana University
#  BIND (wd:Q6060 as ?holder) # kansas - University of Kansas
#  BIND (wd:Q1123 as ?holder) # nelsonatkins - Nelson-Atkins Museum of Art
#  BIND (wd:Q1914 as ?holder) # nyu - New York University
#  BIND (wd:Q10856 as ?holder) # oregon - University of Oregon
#  BIND (wd:Q374 as ?holder) # penn - University of Pennsylvania
#  BIND (wd:Q12264 as ?holder) # princeton - Princeton University
#  BIND (wd:Q801 as ?holder) # providence - Providence Public Library
#  BIND (wd:Q1101 as ?holder) # rome - American Academy in Rome
#  BIND (wd:Q1936 as ?holder) # rutgers - Rutgers University
#  BIND (wd:Qxxxx as ?holder) # shi - Science History Institute
#  BIND (wd:Q1247 as ?holder) # smith - Smith College
#  BIND (wd:Qxxxx as ?holder) # wmu - Western Michigan University
  
  # holding and holding properties
  BIND ( wd:Q2 as ?holdingType )
  BIND ( wdt:P16 as ?instanceOf )
  BIND ( wdt:P2 as ?hasHolding )
  BIND ( pq:P4 as ?qualifierHoldingInstInAuthFile )
  BIND ( p:P5 as ?holdingInstitutionAsRecStmt )
  BIND ( ps:P5 as ?holdingInstAsRecValue )
  BIND ( wdt:P7 as ?hasInstID )
  BIND ( wdt:P8 as ?hasShelfmark )
  BIND ( wdt:P9 as ?hasLinkToInstRecord )
  BIND ( wdt:P38 as ?hasHoldingAddedDate )
  BIND ( wdt:P39 as ?hasHoldingEndDate )
  
  # manuscript properties
  BIND ( wdt:P1 as ?hasDSID )
  
  # DS 2.0 Record properties
  BIND ( wdt:P3 as ?describesManuscript )
  BIND ( wdt:P35 as ?hasDateLastUpdated )
  BIND ( wdt:P41 as ?hasIIIFManifest )


  # holding information
  ?holding ?instanceOf ?holdingType ;
           ?holdingInstitutionAsRecStmt ?holdingInstStatement ;
           ?hasHoldingAddedDate ?dateAdded .
                      
  OPTIONAL { ?holding ?hasInstID ?institutionalID }
  OPTIONAL { ?holding ?hasShelfmark ?shelfmark }
  OPTIONAL { ?holding ?hasLinkToInstRecord ?linkToRecord }

  ?holdingInstStatement ?qualifierHoldingInstInAuthFile ?holder .
  ?holder rdfs:label ?holderName .
  
  # finding linked manuscript objects to holding information patterns above
  ?manuscript ?hasHolding ?holding ;
              ?hasDSID ?dsid .
  ?holding rdfs:label ?holdingLabel .
  
  # finding linked DS records to manuscript object patterns above
  ?ds20Record ?describesManuscript ?manuscript ;
              ?hasDateLastUpdated ?lastUpdated ;
              rdfs:label ?ds20RecordLabel .
  
  OPTIONAL { ?ds20Record ?hasIIIFManifest ?iiifManifest }

  # get alphanumerical IDs from Wikibase URIs
  # (kept as plain strings: a value such as "Q374" is not a valid integer)
  BIND (REPLACE(STR(?holder), "http.+/entity/", "") as ?holderQID)
  BIND (REPLACE(STR(?holding), "http.+/entity/", "") as ?holdingQID)
  BIND (REPLACE(STR(?manuscript), "http.+/entity/", "") as ?manuscriptQID)
  
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
      # allows English language labels to be returned for Wikibase items
  }
} ORDER BY DESC(?lastUpdated) ASC(?shelfmark)
  # sort results by date updated in Wikibase and then by shelfmark

Statement Count Generator

This query generates a count of the number of statements (triples) in the DS Wikibase matching a particular pattern.

SELECT (COUNT(?string) AS ?stringCount)
  #declared variable that will be counted and passed to another variable to display as a number

WHERE {
  ?stringStatement ps:P10 ?string .
    # use "as recorded" property P-value for the data you want to count
  
    SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
      # allows English language labels to be returned for Wikibase items
  }
}
  # no GROUP BY clause is needed here: without one, COUNT aggregates over every statement matched in the WHERE pattern
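
GROUP BY becomes relevant when the count should be broken down rather than totalled. The sketch below counts the same P10 statements per record; it assumes that records link to those statements through p:P10, following the statement pattern used in the Unenriched Strings Generator on this page.

```sparql
PREFIX p: <https://catalog.digital-scriptorium.org/prop/>
PREFIX ps: <https://catalog.digital-scriptorium.org/prop/statement/>

SELECT ?record (COUNT(?string) AS ?stringCount)
WHERE {
  ?record p:P10 ?stringStatement .
  ?stringStatement ps:P10 ?string .
}
GROUP BY ?record
  # one result row per record, each with its own count
ORDER BY DESC(?stringCount)
  # sorting is done by ORDER BY, not by GROUP BY
LIMIT 20
```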

Unenriched Strings Generator

This query generates a list of string values of a particular type of as recorded data occurring in a DS Record which have not been qualified by an authority value (i.e., reconciled to their linked data equivalents in a Linked Open Vocabulary or Authority). This example uses properties for name data, but any authority enriched data can be queried using a similar query structure.

SELECT
  # find values for the following declared variables matching the below WHERE pattern
?record
  # a link to a DS record
?recordLabel
  # the name/label of the DS record
?string
  # the string value as recorded in the original catalog record
?authority
  # a link to the authority record to which the string value has been reconciled
?authorityLabel
  # a label for the authority record in the DS database
#?roleLabel
  # where applicable, a label for role information (un-comment roleLabel variable when querying name data to get role information)

WHERE
  # the patterns or conditions that need to be met to return values for the above variables
  {
    ?record p:P14 ?stringStatement .
	  # identifies records with statements with corresponding property (change P-value for as recorded value to be queried)
    ?stringStatement ps:P14 ?string .
	  # identifies statements that have strings with corresponding property (change P-value for as recorded value to be queried)
    FILTER NOT EXISTS { ?stringStatement pq:P17 ?authority . }
	  # identifies those statements which have not been enriched with authority values (change P-value for authority file value to be queried)
    #OPTIONAL { ?stringStatement pq:P15 ?role . }
	  # only used for name data, un-comment optional clause when querying name data
  
    SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
      # allows English language labels to be returned for Wikibase items
  }
}
ORDER BY ASC (?string)
	# this sorts the results alphabetically by as recorded string values