TL;DR: I found it harder to implement this in the way the spec defines, and found that a split between read and write makes things a lot easier.
As context for this: I previously tried to implement this using async iterables in place of "streams", where the dataset also implemented `[Symbol.asyncIterator]`. However, the implementation was messy and didn't fall cleanly into place; it required awkward inheritance where a factory was needed to create new datasets within `DatasetCore`.

I had hoped that using async iterables would help with the lazy style of reading I wanted to achieve, but it turns out you can achieve the same thing with sync Iterables, as can be seen in the code snippets below.
I have had great success implementing the dataset using two separate concepts.

First, we have our "read" dataset:
```ts
export interface FilterIterateeFn<T> {
  (value: T): boolean
}

export interface RunIteratee<T> {
  (value: T): void
}

export interface MapIteratee<T, R> {
  (value: T): R
}
```
```ts
export interface ReadonlyDataset extends Iterable<Quad> {
  size: number
  empty: boolean
  filter(iteratee: FilterIterateeFn<Quad>): ReadonlyDataset
  except(iteratee: FilterIterateeFn<Quad>): ReadonlyDataset
  match(find: Quad | QuadFind): ReadonlyDataset
  without(find: Quad | QuadFind): ReadonlyDataset
  has(find: Quad | QuadFind): boolean
  contains(dataset: Iterable<Quad | QuadLike>): boolean
  difference(dataset: Iterable<Quad | QuadLike>): ReadonlyDataset
  equals(dataset: Iterable<Quad | QuadLike>): boolean
  every(iteratee: FilterIterateeFn<Quad>): boolean
  forEach(iteratee: RunIteratee<Quad>): void
  intersection(dataset: Iterable<Quad | QuadLike>): ReadonlyDataset
  map(iteratee: MapIteratee<Quad, QuadLike>): ReadonlyDataset
  some(iteratee: FilterIterateeFn<Quad>): boolean
  toArray(): Quad[]
  union(dataset: Iterable<Quad | QuadLike>): ReadonlyDataset
}
```
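To make the lazy reading concrete, here is a minimal sketch (illustrative only, not the actual implementation) of how a read dataset like this can wrap any source `Iterable<Quad>`, with `filter` returning a new read dataset over a generator:

```ts
// Sketch only: a read dataset backed by any Iterable<Quad>.
// Nothing is evaluated until the result is iterated.
class SketchReadonlyDataset implements Iterable<Quad> {
  constructor(private readonly source: Iterable<Quad> = []) {}

  *[Symbol.iterator]() {
    yield* this.source
  }

  get size(): number {
    let count = 0
    for (const _ of this) count += 1
    return count
  }

  get empty(): boolean {
    for (const _ of this) return false
    return true
  }

  filter(iteratee: FilterIterateeFn<Quad>): SketchReadonlyDataset {
    const source = this
    return new SketchReadonlyDataset({
      *[Symbol.iterator]() {
        for (const quad of source) {
          if (iteratee(quad)) yield quad
        }
      }
    })
  }
}
```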
And our write dataset...
```ts
export interface Dataset extends ReadonlyDataset {
  add(value: Quad | QuadLike): Dataset
  addAll(dataset: Iterable<Quad | QuadLike>): Dataset
  import(dataset: AsyncIterable<Quad | QuadLike>): Promise<unknown>
  delete(quad: Quad | QuadLike | QuadFind): Dataset
}
```
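As a sketch of how little the write side needs to add on top of the read dataset sketch above (`delete` omitted for brevity; `fromQuadLike` is a hypothetical helper for normalising a `QuadLike` into a `Quad`, not part of the actual implementation):

```ts
// Sketch only: a write dataset backed by a Set. A Set is itself an
// Iterable<Quad>, so the read side can iterate it directly and stays live.
declare function fromQuadLike(value: Quad | QuadLike): Quad // hypothetical

class SketchDataset extends SketchReadonlyDataset {
  private readonly quads: Set<Quad>

  constructor(quads: Set<Quad> = new Set()) {
    super(quads)
    this.quads = quads
  }

  add(value: Quad | QuadLike): this {
    this.quads.add(fromQuadLike(value))
    return this
  }

  addAll(dataset: Iterable<Quad | QuadLike>): this {
    for (const quad of dataset) this.add(quad)
    return this
  }

  async import(dataset: AsyncIterable<Quad | QuadLike>): Promise<unknown> {
    for await (const quad of dataset) this.add(quad)
    return undefined
  }
}
```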
If we want an immutable write dataset...
```ts
export interface ImmutableDataset extends Dataset {
  add(value: Quad | QuadLike): ImmutableDataset
  addAll(dataset: Iterable<Quad | QuadLike>): ImmutableDataset
  import(dataset: AsyncIterable<Quad | QuadLike>): Promise<ImmutableDataset>
  delete(quad: Quad | QuadLike | QuadFind): ImmutableDataset
}
```
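And a matching sketch of the immutable variant, where each write copies into a new backing Set and returns a fresh dataset (same assumptions as the sketches above):

```ts
// Sketch only: writes never mutate; each returns a new dataset over a copy.
class SketchImmutableDataset extends SketchReadonlyDataset {
  constructor(private readonly quads: Set<Quad> = new Set()) {
    super(quads)
  }

  add(value: Quad | QuadLike): SketchImmutableDataset {
    const next = new Set(this.quads)
    next.add(fromQuadLike(value)) // hypothetical normaliser, as above
    return new SketchImmutableDataset(next)
  }

  async import(dataset: AsyncIterable<Quad | QuadLike>): Promise<SketchImmutableDataset> {
    let result: SketchImmutableDataset = this
    for await (const quad of dataset) {
      result = result.add(quad)
    }
    return result
  }
}
```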
A couple of specific changes:

- `DatasetCore` is "moved up" to become the write dataset on top
- `Dataset` is "moved down" to become the read dataset
- Write functions return writable datasets
- Read functions return readable datasets
In terms of implementation, this felt a lot more natural and took a lot less time than trying to follow the spec one-to-one. I've used TypeScript types here specifically, but I think they show nicely how the implementation works.
Behind the scenes I was able to utilise a Set as the backing collection for quads.

The read dataset in this implementation accepts a source Iterable, which the read dataset also implements itself, meaning we don't need any intermediate steps between creating new datasets; a Set implements Iterable too, making everything very seamless.

Chaining using the read dataset is very clean, as can be seen in the implementation of the read dataset itself.
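As a sketch of what that chaining can look like (illustrative only, building on the read dataset sketch above; `isMatch` is a hypothetical comparator for a `Quad` against a `QuadFind`, not part of the actual implementation):

```ts
// Sketch only: derived read operations chain on top of `filter`,
// so each one stays a lazy one-liner over the source iterable.
declare function isMatch(quad: Quad, find: Quad | QuadFind): boolean // hypothetical

class SketchChainingDataset extends SketchReadonlyDataset {
  except(iteratee: FilterIterateeFn<Quad>) {
    return this.filter(quad => !iteratee(quad))
  }

  match(find: Quad | QuadFind) {
    return this.filter(quad => isMatch(quad, find))
  }

  without(find: Quad | QuadFind) {
    return this.filter(quad => !isMatch(quad, find))
  }

  has(find: Quad | QuadFind) {
    return !this.match(find).empty
  }
}
```

Because every returned dataset just wraps a generator over its source, nothing is evaluated until the result is iterated, which is also what makes the "live" view below possible.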
Using iterables also enables this kind of usage where the returned read dataset is a "live" view of the write dataset:
```js
import { Dataset } from "../esnext/index.js"
import { DefaultDataFactory } from "@opennetwork/rdf-data-model"

const dataset = new Dataset()

const aNameMatch = {
  subject: DefaultDataFactory.blankNode("a"),
  predicate: DefaultDataFactory.namedNode("http://xmlns.com/foaf/0.1/name"),
  graph: DefaultDataFactory.defaultGraph()
}

const aMatcher = dataset.match(aNameMatch)

dataset.add({
  ...aNameMatch,
  object: DefaultDataFactory.literal(`"A"@en`)
})
dataset.add({
  subject: DefaultDataFactory.blankNode("s"),
  predicate: DefaultDataFactory.namedNode("http://xmlns.com/foaf/0.1/name"),
  object: DefaultDataFactory.literal(`"s"@en`),
  graph: DefaultDataFactory.defaultGraph()
})

console.log({ a: aMatcher.size, total: dataset.size })

dataset.add({
  ...aNameMatch,
  object: DefaultDataFactory.literal(`"B"@en`)
})

console.log({ a: aMatcher.size, total: dataset.size })

dataset.add({
  ...aNameMatch,
  object: DefaultDataFactory.literal(`"C"@en`)
})

console.log({ a: aMatcher.size, total: dataset.size })

console.log({ aObjects: aMatcher.toArray().map(({ object }) => object) })
```
This snippet outputs:
```
{ a: 1, total: 2 }
{ a: 2, total: 3 }
{ a: 3, total: 4 }
{
  aObjects: [
    LiteralImplementation {
      termType: 'Literal',
      value: 'A',
      language: 'en',
      datatype: [NamedNodeImplementation]
    },
    LiteralImplementation {
      termType: 'Literal',
      value: 'B',
      language: 'en',
      datatype: [NamedNodeImplementation]
    },
    LiteralImplementation {
      termType: 'Literal',
      value: 'C',
      language: 'en',
      datatype: [NamedNodeImplementation]
    }
  ]
}
```
I have implemented these datasets as sync because I only want to know what's in memory right now, not what's available in some remote dataset. If you want to import information from a remote dataset, you should utilise `import` if you have an async iterable (Node.js ReadableStream, MongoDB cursors, etc.), or `addAll` if you have another in-memory dataset or a sync iterable (Arrays, Sets, etc.).
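For illustration, a sketch of both entry points (the async generator here is just a stand-in for any remote source of quads):

```ts
import { Dataset } from "../esnext/index.js"
import { DefaultDataFactory } from "@opennetwork/rdf-data-model"

const local = new Dataset()

// Sync iterables (Arrays, Sets, other in-memory datasets) go in via addAll.
local.addAll([
  {
    subject: DefaultDataFactory.blankNode("a"),
    predicate: DefaultDataFactory.namedNode("http://xmlns.com/foaf/0.1/name"),
    object: DefaultDataFactory.literal(`"A"@en`),
    graph: DefaultDataFactory.defaultGraph()
  }
])

// Async iterables (Node.js streams, database cursors, remote sources) go via import.
// This generator is a stand-in for quads arriving from elsewhere.
async function* remoteQuads() {
  yield {
    subject: DefaultDataFactory.blankNode("b"),
    predicate: DefaultDataFactory.namedNode("http://xmlns.com/foaf/0.1/name"),
    object: DefaultDataFactory.literal(`"B"@en`),
    graph: DefaultDataFactory.defaultGraph()
  }
}
await local.import(remoteQuads())
```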