DS Catalog:SPARQL Query Service/example queries

From DS 2.0 Catalog

Using the SPARQL Query Service for the DS Catalog

This page provides basic example queries for exploring the DS Wikibase using SPARQL, a query language designed for linked datasets encoded in RDF. Familiarity with the properties used in the DS Data Model is helpful for understanding how the queries operate, but each query also includes comments (marked with the hash character "#") that walk users through the individual steps and show how the query is constructed to derive a solution.

Manuscripts and DS Records

In redeveloping the DS data model, the project team made an explicit choice to differentiate between the metadata description (the DS Record) and the manuscript object (the Manuscript). Although the two are separate, the data model links each DS Record to its Manuscript, so that the DS Record carries data about the manuscript object drawn from the institutional records that describe it.

The decision to separate but link metadata descriptions and the manuscript objects they describe was deliberate: it avoids making any direct claims or assertions about the manuscript object other than its existence (which is established by assigning it a unique persistent identifier, the DS ID). In this way, the DS Record is conceptualized as a document that makes statements about a manuscript object; those statements are not inherent to the object itself and can be revised at any time. Although the DS data model is designed to link only one DS Record to a Manuscript, treating descriptive documents as separate from the objects they describe could allow many different (and potentially competing) descriptions to be linked to the same object simultaneously.

Because of this data structure, users may at first find SPARQL queries circuitous compared with traditional library catalogs or search interfaces (such as the DS Catalog's own search interface). This is because graph databases like the DS Wikibase are queried by matching patterns of particular entities (items) and the relationships between them: the query engine rapidly traverses the graph, finding paths that match the pattern specified in the query. For querying DS data, this means that seemingly disparate elements of DS Records, Manuscripts, and even Holding Information (i.e., information about, and assigned by, the institution that owns and/or contributes data about a manuscript object) may all need to be invoked in a single query to answer a seemingly simple question, such as which institutions own manuscripts containing texts authored by Avicenna. Taking some time to understand the items, properties, and linking structures in the DS data model, and how they are implemented in the DS Wikibase, will help clarify how queries of this kind can be constructed.
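The pattern matching described above can be illustrated with a toy Python sketch. It is not how the DS Wikibase query service works internally: it simply stores a handful of triples in memory and joins triple patterns left to right, binding `?`-variables as it goes, much like the triple patterns in a SPARQL WHERE clause. Every identifier in it (`ex:record1`, `ex:describes`, `ex:holdsManuscript`, `ex:LibraryA`, and so on) is hypothetical and does not correspond to real DS items or properties; only the shape of the problem carries over, namely walking from record to manuscript to holding information.

```python
# Toy, in-memory sketch of SPARQL-style pattern matching.
# All identifiers below are HYPOTHETICAL placeholders, not real DS Wikibase
# items or properties.
from typing import Dict, Iterator, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

TRIPLES: List[Triple] = [
    # Hypothetical DS Record -> Manuscript links
    ("ex:record1", "ex:describes", "ex:ms1"),
    ("ex:record2", "ex:describes", "ex:ms2"),
    # Hypothetical authors recorded in each DS Record
    ("ex:record1", "ex:author", "ex:Avicenna"),
    ("ex:record2", "ex:author", "ex:Galen"),
    # Hypothetical holding information linking manuscripts to institutions
    ("ex:holding1", "ex:holdsManuscript", "ex:ms1"),
    ("ex:holding1", "ex:institution", "ex:LibraryA"),
    ("ex:holding2", "ex:holdsManuscript", "ex:ms2"),
    ("ex:holding2", "ex:institution", "ex:LibraryB"),
]


def is_variable(term: str) -> bool:
    """Terms starting with '?' are variables, as in SPARQL."""
    return term.startswith("?")


def match(pattern: Triple, binding: Binding) -> Iterator[Binding]:
    """Yield every extension of `binding` under which `pattern` matches a stored triple."""
    for triple in TRIPLES:
        candidate = dict(binding)
        for pattern_term, value in zip(pattern, triple):
            if is_variable(pattern_term):
                # A variable bound by an earlier pattern must agree with this triple
                if candidate.setdefault(pattern_term, value) != value:
                    break
            elif pattern_term != value:
                # A constant term must match exactly
                break
        else:
            yield candidate


def query(patterns: List[Triple]) -> List[Binding]:
    """Join the patterns left to right, like the triple patterns in a WHERE clause."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        bindings = [extended for b in bindings for extended in match(pattern, b)]
    return bindings


if __name__ == "__main__":
    # "Which institutions hold manuscripts with texts authored by Avicenna?"
    results = query([
        ("?record", "ex:author", "ex:Avicenna"),
        ("?record", "ex:describes", "?ms"),
        ("?holding", "ex:holdsManuscript", "?ms"),
        ("?holding", "ex:institution", "?institution"),
    ])
    print(sorted({r["?institution"] for r in results}))  # ['ex:LibraryA']
```

Each added pattern narrows the set of bindings, which is why a question that sounds simple can still require several linked patterns.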

To help users understand how queries are constructed, the example queries below include comments (preceded by the # character) that explain how each clause and triple pattern functions in the context of the query. We hope that working through these examples will allow users to construct their own, more complex queries as they learn how the DS data model can be applied to their research questions.

Prefix Declarations

Why are they used?

Prefix declarations at the beginning of a SPARQL query state which namespaces (ontologies, data models, or other specifications) the query will use to construct its triple patterns. Declaring prefixes lets you abbreviate the URIs used later in the query, rather than writing out a long URI every time an entity is referenced.

For instance, by declaring the following prefixes at the beginning of the query,

PREFIX wd: <https://catalog.digital-scriptorium.org/entity/>
PREFIX wdt: <https://catalog.digital-scriptorium.org/prop/direct/>

you no longer have to type out the full URIs, as in

<https://catalog.digital-scriptorium.org/entity/Q88> <https://catalog.digital-scriptorium.org/prop/direct/P16> <https://catalog.digital-scriptorium.org/entity/Q13> .

and can instead type

wd:Q88 wdt:P16 wd:Q13 .

As you can see, the Q and P identifiers are appended to the end of the base URIs, so you only need to know the prefix (e.g., wd, wdt) and the appropriate Q or P number to construct the triple pattern you want. This makes SPARQL queries much easier for people to read and edit.
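To make the expansion concrete, here is a small Python sketch (the function names are illustrative) that rewrites prefixed names into full URIs using the two prefixes declared above. It deliberately handles only simple names such as `wd:Q88`; the SPARQL grammar for prefixed names allows more than this.

```python
# Minimal sketch of prefix expansion: "prefix:local" becomes "<base + local>".
# Simplified: only plain names like wd:Q88 are handled; variables, full IRIs,
# and punctuation such as "." are passed through unchanged.
PREFIXES = {
    "wd": "https://catalog.digital-scriptorium.org/entity/",
    "wdt": "https://catalog.digital-scriptorium.org/prop/direct/",
}


def expand_term(term: str) -> str:
    """Expand one prefixed name if its prefix is declared; otherwise return it as-is."""
    prefix, colon, local = term.partition(":")
    if colon and prefix in PREFIXES:
        return f"<{PREFIXES[prefix]}{local}>"
    return term


def expand_pattern(pattern: str) -> str:
    """Expand every whitespace-separated term of a triple pattern."""
    return " ".join(expand_term(term) for term in pattern.split())


# Prints the full-URI form of the triple pattern shown above
print(expand_pattern("wd:Q88 wdt:P16 wd:Q13 ."))
```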

Which prefix declarations will I need to use to query the DS Wikibase?

The following prefix declarations should be at the beginning of any SPARQL query made at the DS Wikibase Query Service endpoint.

PREFIX wd: <https://catalog.digital-scriptorium.org/entity/>
PREFIX wds: <https://catalog.digital-scriptorium.org/entity/statement/>
PREFIX wdv: <https://catalog.digital-scriptorium.org/value/>
PREFIX wdt: <https://catalog.digital-scriptorium.org/prop/direct/>
PREFIX p: <https://catalog.digital-scriptorium.org/prop/>
PREFIX ps: <https://catalog.digital-scriptorium.org/prop/statement/>
PREFIX pq: <https://catalog.digital-scriptorium.org/prop/qualifier/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
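If you draft queries in a text editor or script rather than in the Query Service interface, it can help to keep these declarations in one place. The Python sketch below (the helper names `prefix_header` and `with_prefixes` are hypothetical) prepends all nine declarations listed above to a query body. Unused prefix declarations are permitted in SPARQL, so including the full set every time is harmless.

```python
# Sketch: keep the DS prefix declarations in one place and prepend them to a
# query body. Helper names are hypothetical, not part of any DS tooling.
DS_PREFIXES = {
    "wd": "https://catalog.digital-scriptorium.org/entity/",
    "wds": "https://catalog.digital-scriptorium.org/entity/statement/",
    "wdv": "https://catalog.digital-scriptorium.org/value/",
    "wdt": "https://catalog.digital-scriptorium.org/prop/direct/",
    "p": "https://catalog.digital-scriptorium.org/prop/",
    "ps": "https://catalog.digital-scriptorium.org/prop/statement/",
    "pq": "https://catalog.digital-scriptorium.org/prop/qualifier/",
    "wikibase": "http://wikiba.se/ontology#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def prefix_header() -> str:
    """Render one PREFIX line per namespace, in the order declared above."""
    return "\n".join(f"PREFIX {name}: <{iri}>" for name, iri in DS_PREFIXES.items())


def with_prefixes(body: str) -> str:
    """Return the query body with the full set of DS prefix declarations in front."""
    return f"{prefix_header()}\n\n{body.strip()}\n"


if __name__ == "__main__":
    print(with_prefixes("SELECT ?item WHERE { ?item wdt:P16 wd:Q13 . } LIMIT 10"))
```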

Basic Example Queries

Below is a taxonomy of two types of basic queries, distinguished by whether the records in the DS Catalog are described by a particular data element in general (e.g., have any author, any assigned genre, or any place of production) or meet certain criteria (e.g., have a specific author, were assigned a specific genre, or were identified as produced in a particular place). The following queries were originally developed by L.P. Coladangelo (DS Catalog and Data Manager) for prototype testing and adapted by LEADING Fellows Mace Jones and Jade Snelling as part of their fellowship research.

All manuscripts and their DS records

These queries return lists of manuscript records and their associated data values, including both the string value as recorded in the original catalog record (the as_recorded value) and the value from a Linked Open Vocabulary to which the as_recorded value has been linked (the authority value). You should expect to see a list of all records and manuscripts in the DS Catalog that have a value for the data type being queried.

Find all DS records describing manuscripts by their...

Artists

Authors

Centuries of Production

Dates of Production

Dated status

Former Owners

Genres

Holding Institutions

Languages

Materials

Other associated names/agents

Places of Production

Scribes

Subjects

Titles
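The general ("any value") queries share a common shape. The Python sketch below builds that shape for one data type. It assumes, following the variable names in the Unenriched Strings Generator query on this page, that an as-recorded value is a statement on a DS Record (`p:P14` / `ps:P14`) whose `pq:P17` qualifier links it to an authority item; the function name and defaults are illustrative, so check the DS Data Model for the property that matches the data type you want. The generated text needs the prefix declarations listed above, plus `PREFIX bd: <http://www.bigdata.com/rdf#>` if your endpoint does not predefine `bd:` for the label service.

```python
import re


def any_value_query(as_recorded_pid: str = "P14", authority_pid: str = "P17") -> str:
    """Build a query listing every DS Record with a value for one as-recorded property.

    ASSUMPTION (hypothetical defaults): P14 holds the as-recorded string and its
    pq:P17 qualifier points to the linked authority item.
    """
    for pid in (as_recorded_pid, authority_pid):
        # Accept only bare property IDs such as "P14", so nothing else is injected
        if not re.fullmatch(r"P\d+", pid):
            raise ValueError(f"not a property ID: {pid!r}")
    return f"""SELECT ?record ?recordLabel ?as_recorded ?authority ?authorityLabel WHERE {{
  # Every statement of this type on a record, with its as-recorded string
  ?record p:{as_recorded_pid} ?statement .
  ?statement ps:{as_recorded_pid} ?as_recorded .
  # The linked authority value, when the string has been reconciled
  OPTIONAL {{ ?statement pq:{authority_pid} ?authority . }}
  # Fill in ?recordLabel and ?authorityLabel
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}} ORDER BY ?as_recorded"""


if __name__ == "__main__":
    print(any_value_query())
```

Making the authority pattern OPTIONAL keeps records whose strings have not yet been reconciled; without OPTIONAL, only reconciled values would be returned.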

Specific manuscripts and their DS records

These queries return lists of manuscript records filtered by a specific value from an associated DS authority record, including both the string value as recorded in the original catalog record (the as_recorded value) and the value from a Linked Open Vocabulary to which the as_recorded value has been linked (the authority value). You should expect to see a list of all records and manuscripts in the DS Catalog that have the specified value for the data type being queried.

Find all DS records describing manuscripts by a specific...

Artist

Author

Century of Production

Date of Production

  • Start date
  • End date
  • Date range
    • Inside date range
    • Outside date range

Dated status

  • Dated
  • Non-dated

Former Owner

Genre

Holding Institution

Language

Material

Other associated name/agent

Place of Production

Scribe

Subject

Title
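The specific-value queries differ mainly in fixing the authority value instead of leaving it as a variable. The Python sketch below shows that variant. It assumes, following the variable names in the Unenriched Strings Generator query on this page, that the as-recorded value is a `p:P14` / `ps:P14` statement on a DS Record whose `pq:P17` qualifier points to the authority item; these defaults and the function name are illustrative, and `Q123` is a placeholder, not a real DS authority item. Date-range queries would need a FILTER on date values instead and are not covered here.

```python
import re


def specific_value_query(authority_qid: str, as_recorded_pid: str = "P14",
                         authority_pid: str = "P17") -> str:
    """Build a query listing DS Records whose as-recorded value is linked to one authority item.

    ASSUMPTION (hypothetical defaults): P14 = as-recorded statement, P17 = authority qualifier.
    """
    if not re.fullmatch(r"Q\d+", authority_qid):
        raise ValueError(f"not an item ID: {authority_qid!r}")
    for pid in (as_recorded_pid, authority_pid):
        if not re.fullmatch(r"P\d+", pid):
            raise ValueError(f"not a property ID: {pid!r}")
    return f"""SELECT ?record ?recordLabel ?as_recorded WHERE {{
  ?record p:{as_recorded_pid} ?statement .
  ?statement ps:{as_recorded_pid} ?as_recorded .
  # A fixed authority item instead of a variable: only matching records remain
  ?statement pq:{authority_pid} wd:{authority_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}} ORDER BY ?as_recorded"""


if __name__ == "__main__":
    # "Q123" is a placeholder, not a real DS authority item
    print(specific_value_query("Q123"))
```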

User generated examples

TBD

Technical Queries

Authority Record Generator

This query generates a list of authority records in the Wikibase by authority value type (i.e., all items which are an instance of a particular Authority Type).
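Listing every item of a given Authority Type needs only a single direct-property (`wdt:`) pattern, since the query uses the item's main value rather than its qualifiers. In the Python sketch below, both IDs are hypothetical parameters: neither the property used for "instance of" nor the Authority Type item IDs are documented on this page, so look them up in the DS Wikibase before use.

```python
import re


def authority_items_query(instance_of_pid: str, authority_type_qid: str) -> str:
    """List items that are an instance of one Authority Type.

    Both IDs are HYPOTHETICAL placeholders to be looked up in the DS Wikibase.
    """
    if not re.fullmatch(r"P\d+", instance_of_pid):
        raise ValueError(f"not a property ID: {instance_of_pid!r}")
    if not re.fullmatch(r"Q\d+", authority_type_qid):
        raise ValueError(f"not an item ID: {authority_type_qid!r}")
    return f"""SELECT ?item ?itemLabel WHERE {{
  # Direct (wdt:) value: ?item is an instance of the given Authority Type
  ?item wdt:{instance_of_pid} wd:{authority_type_qid} .
  SERVICE wikibase:label {{ bd:serviceParam wikibase:language "en" . }}
}}"""


if __name__ == "__main__":
    print(authority_items_query("P999", "Q999"))  # placeholder IDs
```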

Dated Classification Generator

This query generates a list of manuscript items that have and have not been classified as dated.
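One way to return both groups in a single query is to put the "dated" pattern inside an OPTIONAL clause. In the W3C SPARQL 1.1 JSON results format, a variable left unbound by OPTIONAL is simply absent from that row, so the results can be split afterwards. The Python sketch below assumes hypothetical result variables `?manuscript` and `?dated` and hypothetical sample data; the actual Dated Classification Generator query may name or structure its variables differently.

```python
import json
from typing import List, Tuple

# HYPOTHETICAL SPARQL 1.1 JSON results: only the first row matched the OPTIONAL pattern
SAMPLE_RESULTS = json.dumps({
    "head": {"vars": ["manuscript", "dated"]},
    "results": {"bindings": [
        {"manuscript": {"type": "uri", "value": "https://example.org/ms1"},
         "dated": {"type": "literal", "value": "true"}},
        {"manuscript": {"type": "uri", "value": "https://example.org/ms2"}},
    ]},
})


def split_by_bound(results_json: str, item_var: str = "manuscript",
                   flag_var: str = "dated") -> Tuple[List[str], List[str]]:
    """Split result rows by whether `flag_var` was bound (OPTIONAL matched) or not."""
    rows = json.loads(results_json)["results"]["bindings"]
    bound: List[str] = []
    unbound: List[str] = []
    for row in rows:
        # An unbound variable is omitted from the row entirely
        target = bound if flag_var in row else unbound
        target.append(row[item_var]["value"])
    return bound, unbound


if __name__ == "__main__":
    print(split_by_bound(SAMPLE_RESULTS))
```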

Receipt Generator

This query generates a list of DS Wikibase items showing the successful ingest of DS Records, the creation of Manuscript items and DS IDs, and the assignment of Holding information for an institution, serving as a receipt for data contributions made to the DS Catalog.
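To keep a receipt as a file, query results can be saved as CSV. The Python sketch below converts a W3C SPARQL 1.1 JSON results document into CSV text using only the standard library, writing an empty cell wherever a variable is unbound. The column names and values in its sample (`record`, `manuscript`, `example.org` URIs) are hypothetical; the real Receipt Generator query may return different variables.

```python
import csv
import io
import json

# HYPOTHETICAL SPARQL 1.1 JSON results; the second row has no ?manuscript binding
SAMPLE_RESULTS = json.dumps({
    "head": {"vars": ["record", "manuscript"]},
    "results": {"bindings": [
        {"record": {"type": "uri", "value": "https://example.org/record1"},
         "manuscript": {"type": "uri", "value": "https://example.org/ms1"}},
        {"record": {"type": "uri", "value": "https://example.org/record2"}},
    ]},
})


def bindings_to_csv(results_json: str) -> str:
    """Convert SPARQL 1.1 JSON results to CSV text, one column per result variable."""
    data = json.loads(results_json)
    columns = data["head"]["vars"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for row in data["results"]["bindings"]:
        # Unbound variables are missing from the row; write an empty cell for them
        writer.writerow({var: row.get(var, {}).get("value", "") for var in columns})
    return buffer.getvalue()


if __name__ == "__main__":
    print(bindings_to_csv(SAMPLE_RESULTS))
```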

Statement Count Generator

This query counts the statements (triples) in the DS Wikibase that match a particular pattern.
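A statement count wraps the pattern of interest in `SELECT (COUNT(*) AS ?count)`, which counts the solutions that match it; for a single triple pattern, that is the number of matching triples. The Python sketch below (function names illustrative) builds such a query and reads the number back out of a W3C SPARQL 1.1 JSON response, where the count arrives as a string-valued literal. The value `42` in the sample response is made up.

```python
import json

# HYPOTHETICAL SPARQL 1.1 JSON response to a COUNT query
SAMPLE_RESPONSE = json.dumps({
    "head": {"vars": ["count"]},
    "results": {"bindings": [{"count": {
        "type": "literal",
        "datatype": "http://www.w3.org/2001/XMLSchema#integer",
        "value": "42",
    }}]},
})


def count_query(pattern: str) -> str:
    """Wrap a graph pattern in a SELECT that counts its matching solutions."""
    return f"SELECT (COUNT(*) AS ?count) WHERE {{ {pattern} }}"


def read_count(results_json: str) -> int:
    """Read ?count from the first result row; the JSON value is a string, so convert it."""
    first_row = json.loads(results_json)["results"]["bindings"][0]
    return int(first_row["count"]["value"])


if __name__ == "__main__":
    print(count_query("?record p:P14 ?statement ."))
    print(read_count(SAMPLE_RESPONSE))
```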

Unenriched Strings Generator

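This query generates a list of string values of a particular type of as-recorded data in DS Records that have not been qualified by an authority value (i.e., reconciled to their linked data equivalent in a Linked Open Vocabulary or Authority).
<syntaxhighlight lang="SPARQL">
PREFIX p: <https://catalog.digital-scriptorium.org/prop/>
PREFIX ps: <https://catalog.digital-scriptorium.org/prop/statement/>
PREFIX pq: <https://catalog.digital-scriptorium.org/prop/qualifier/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX bd: <http://www.bigdata.com/rdf#>

SELECT ?record ?recordLabel ?string ?roleLabel WHERE {
  # Each as-recorded (P14) statement on a DS Record, and its string value
  ?record p:P14 ?stringStatement .
  ?stringStatement ps:P14 ?string .
  # Keep only statements that have no authority (pq:P17) qualifier
  FILTER NOT EXISTS { ?stringStatement pq:P17 ?authority . }
  # Include the role qualifier (pq:P15) when one is present
  OPTIONAL { ?stringStatement pq:P15 ?role . }
  # Generate ?recordLabel and ?roleLabel in English
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en" .
  }
} ORDER BY ASC(?string)
</syntaxhighlight>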
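Once the unenriched strings are returned, it can help to see which ones recur most often, since reconciling those first would qualify the most statements. The Python sketch below tallies the `?string` column of the query's W3C SPARQL 1.1 JSON results; the helper name and the sample values (`Aristotle`, `Galen`, `example.org` URIs) are hypothetical.

```python
import json
from collections import Counter
from typing import List, Tuple

# HYPOTHETICAL sample using only two of the query's result columns
SAMPLE_RESULTS = json.dumps({
    "head": {"vars": ["record", "string"]},
    "results": {"bindings": [
        {"record": {"type": "uri", "value": "https://example.org/record1"},
         "string": {"type": "literal", "value": "Aristotle"}},
        {"record": {"type": "uri", "value": "https://example.org/record2"},
         "string": {"type": "literal", "value": "Galen"}},
        {"record": {"type": "uri", "value": "https://example.org/record3"},
         "string": {"type": "literal", "value": "Aristotle"}},
    ]},
})


def most_common_strings(results_json: str, top: int = 10) -> List[Tuple[str, int]]:
    """Count how often each unenriched ?string value appears, most frequent first."""
    rows = json.loads(results_json)["results"]["bindings"]
    counts = Counter(row["string"]["value"] for row in rows if "string" in row)
    return counts.most_common(top)


if __name__ == "__main__":
    print(most_common_strings(SAMPLE_RESULTS))
```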