
A Triple which is not a Quad #144

@awwright


It came to my attention in #124 that we don't really have a way of talking about triples without implying that they're part of a graph. Since #124 is about a slightly different issue (whether Triple should be aliased to Quad), I'd like to separately raise adding a Triple interface.

I think it's important to have separate Triple and Quad instances, because they're not the same thing. A Triple is an axiomatic statement; a Quad additionally signifies a Triple exists in a single graph. But sometimes I want to be able to talk about an RDF statement without implying membership in a graph.
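To make the distinction concrete, here is a minimal sketch of what separate instances might look like. The class names, the `namedNode` helper, and the `termEquals` comparison are all hypothetical illustrations, not part of any existing interface, and the example IRIs are made up.

```javascript
// Hypothetical sketch: these class and helper names are illustrative only.
const namedNode = (value) => ({ termType: 'NamedNode', value });

// Simplified term comparison; real literals would also compare language and datatype.
const termEquals = (a, b) =>
  Boolean(a && b) && a.termType === b.termType && a.value === b.value;

class Triple {
  constructor(subject, predicate, object) {
    this.subject = subject;
    this.predicate = predicate;
    this.object = object;
  }

  // A triple is identified by (subject, predicate, object) alone.
  equals(other) {
    return other instanceof Triple &&
      termEquals(this.subject, other.subject) &&
      termEquals(this.predicate, other.predicate) &&
      termEquals(this.object, other.object);
  }
}

// A Quad pairs a statement with the graph that contains it; it is not a Triple.
class Quad {
  constructor(triple, graph) {
    this.triple = triple;
    this.graph = graph;
  }

  equals(other) {
    return other instanceof Quad &&
      this.triple.equals(other.triple) &&
      termEquals(this.graph, other.graph);
  }
}

const statement = new Triple(
  namedNode('http://example.com/alice'),
  namedNode('http://xmlns.com/foaf/0.1/knows'),
  namedNode('http://example.com/bob')
);
const inGraphA = new Quad(statement, namedNode('http://example.com/graphA'));
```

Here composition is used rather than inheritance, so a Quad can hand out its statement (`inGraphA.triple`) without the statement carrying any graph membership of its own. That is only one possible design.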

So far we've supposed the DefaultGraph should be sufficient if graph membership is unimportant—just treat it as extraneous information. Perhaps we add the requirement that RDF sources add configuration options on how to generate graph names. But this is a workaround; it adds additional complexity to many components of an ecosystem that could be dispensed with entirely.

For example, suppose I parse two Turtle documents and want to test if they're isomorphic. What does this mean if I'm returned a Dataset, without any interface-level guarantee that all the triples will be in a single graph? Confusing Quad for Triple muddies the semantics of RDF, which does not define interpretations/entailment over anything other than a single graph. RDF uniquely identifies statements by (subject, predicate, object), and this triple is the same triple even if present in multiple graphs. But the current implementation considers them to be different quads, so there is no way to test for triple-equality.
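The equality problem can be sketched with plain objects. The helper names (`iri`, `sameTerm`, `quadEquals`, `sameStatement`) and the example IRIs are assumptions made up for this illustration; the point is only that a four-component comparison reports a difference where the statement itself is identical.

```javascript
// Hypothetical helpers; quads are modeled as plain objects for illustration.
const iri = (value) => ({ termType: 'NamedNode', value });
const sameTerm = (a, b) => a.termType === b.termType && a.value === b.value;

// Quad-style equality: all four components must match.
const quadEquals = (a, b) =>
  sameTerm(a.subject, b.subject) &&
  sameTerm(a.predicate, b.predicate) &&
  sameTerm(a.object, b.object) &&
  sameTerm(a.graph, b.graph);

// Triple-style equality: the graph component is ignored.
const sameStatement = (a, b) =>
  sameTerm(a.subject, b.subject) &&
  sameTerm(a.predicate, b.predicate) &&
  sameTerm(a.object, b.object);

// The same statement as it might come back from two parsed documents,
// each assigned its own (made-up) graph name.
const fromDocA = {
  subject: iri('http://example.com/s'),
  predicate: iri('http://example.com/p'),
  object: iri('http://example.com/o'),
  graph: iri('http://example.com/docA'),
};
const fromDocB = { ...fromDocA, graph: iri('http://example.com/docB') };

console.log(quadEquals(fromDocA, fromDocB));    // false
console.log(sameStatement(fromDocA, fromDocB)); // true
```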

Adding a graph property immediately doubles the memory requirements to have a fully indexed RDF store. For applications that don't need a graph property—such as testing isomorphism or entailment—this can be quite significant.

URIs/IRIs are supposed to be universal, and so this adds a requirement that each component agree on how to name graphs & treat graph names. While this shouldn't be a foreign concept to RDF developers, a fourth dimension of IRI to maintain is not insignificant, and in my experience working with RDF, not typically necessary; as a result, we now have to decide how to configure a parser that should be zero-configuration.

Sometimes I want to be able to hold multiple graphs in memory without naming them. What are the semantics of having two Quad stores with different information for the same graphs? It's probably possible to figure out, but it's not immediately apparent to me.

It appears to me that Quad stores and named graphs were invented for applications that can't store graphs without names; for example SPARQL, where the graph name is an alternative to a file on the filesystem. But we don't have this limitation in ECMAScript, and I don't think we should limit the data interface to things describable over SPARQL.


For some perspective: Presently I'm working on an application that uses and produces RDFa data. (In the future, it'll do the same with JSON-LD and JSON Hyper-schema.) It uses datasets and quads to identify which RDFa document makes which statements. This is done with a library I've maintained, itself derived from webr3's work.

First, I want the application to manage the namespace for the graphs, as opposed to libraries I call out to. I've tried managing the data a few different ways, and I've simply found it's simpler if I work with Triples when I'm dealing with graphs, and Quads in a single case where I'm aggregating all the information together or querying it.

Second, several of the document operations demand the use of Triple, because I have a Graph implementation that provides useful methods that only make sense when defined over graphs: things like unions, merges, equality/isomorphism testing, and so on. We're defining an OO interface, and so I would like to define methods that are defined over a graph and not an entire dataset.
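As a rough illustration of graph-level methods, here is a sketch of an unnamed Graph holding triples. This is not the Graph implementation mentioned above; the class, the `tripleKey` helper, and the example IRIs are hypothetical, and blank nodes are deliberately ignored, so `equals` is plain set equality rather than full isomorphism.

```javascript
// Hypothetical Graph sketch: a set of triples with no graph name attached.
const node = (value) => ({ termType: 'NamedNode', value });

// Build a string key from the three terms so triples can be indexed in a Map.
const tripleKey = (t) =>
  JSON.stringify([t.subject, t.predicate, t.object].map((x) => [x.termType, x.value]));

class Graph {
  constructor(triples = []) {
    this.index = new Map();
    for (const t of triples) this.add(t);
  }

  add(triple) {
    this.index.set(tripleKey(triple), triple);
    return this;
  }

  has(triple) {
    return this.index.has(tripleKey(triple));
  }

  get size() {
    return this.index.size;
  }

  // Set union. A true RDF merge would also rename blank nodes apart first.
  union(other) {
    return new Graph([...this.index.values(), ...other.index.values()]);
  }

  // Set equality only. Full isomorphism would also need a blank node mapping.
  equals(other) {
    if (this.size !== other.size) return false;
    for (const key of this.index.keys()) {
      if (!other.index.has(key)) return false;
    }
    return true;
  }
}

const t1 = { subject: node('http://example.com/a'), predicate: node('http://example.com/p'), object: node('http://example.com/b') };
const t2 = { subject: node('http://example.com/b'), predicate: node('http://example.com/p'), object: node('http://example.com/c') };

const g1 = new Graph([t1, t2]);
const g2 = new Graph([t2, t1]);
const merged = g1.union(new Graph([t1]));
```

With these definitions, `g1.equals(g2)` holds regardless of insertion order, and `merged` contains two triples because the duplicate statement collapses. None of the methods needs a graph name to be defined.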

Additionally, I've been considering adding these methods to Triple, because a Triple can be considered a singleton Graph. But a Triple is not a Quad: a Quad implies two pieces of axiomatic information (both a statement and its membership in a single graph), and sometimes these methods are only defined over one or the other, not both.


I hope this makes a convincing point; I'm happy to answer any questions or consider any feedback. Thanks!
