-
Notifications
You must be signed in to change notification settings - Fork 8
RFC: pubOp for creating and updating pubs more easily #965
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
+ cleanup of api
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are helper methods to make matching pubvalues easier.
eg instead of doing
expect(pub.values).toHaveLength(3);
pub.values.sort(/* sorting bc you want the order of the values to be stable over different tests */)
expect(pub.values).toMatchObject([
...
]);you just do
expect(pub).toHaveValues([...])Similarly for
expect(pubId).toExist()just easier than manually looking up the pub yourself
| it("should handle complex nested relation scenarios", async () => { | ||
| const trx = getTrx(); | ||
| // manual rollback try/catch bc we are manually setting pubIds, so a failure in the middle of this will leave the db in a weird state | ||
| try { | ||
| // Create all pubs with meaningful IDs | ||
| const pubA = "aaaaaaaa-0000-0000-0000-000000000000" as PubsId; | ||
| const pubB = "bbbbbbbb-0000-0000-0000-000000000000" as PubsId; | ||
| const pubC = "cccccccc-0000-0000-0000-000000000000" as PubsId; | ||
| const pubD = "dddddddd-0000-0000-0000-000000000000" as PubsId; | ||
| const pubE = "eeeeeeee-0000-0000-0000-000000000000" as PubsId; | ||
| const pubF = "ffffffff-0000-0000-0000-000000000000" as PubsId; | ||
| const pubG = "11111111-0000-0000-0000-000000000000" as PubsId; | ||
| const pubH = "22222222-0000-0000-0000-000000000000" as PubsId; | ||
| const pubI = "33333333-0000-0000-0000-000000000000" as PubsId; | ||
| const pubJ = "44444444-0000-0000-0000-000000000000" as PubsId; | ||
| const pubK = "55555555-0000-0000-0000-000000000000" as PubsId; | ||
| const pubL = "66666666-0000-0000-0000-000000000000" as PubsId; | ||
|
|
||
| // create the graph structure: | ||
| // A J | ||
| // / \ | | ||
| // / \ | | ||
| // B C --> I | ||
| // | / \ | ||
| // G --> E D | ||
| // / \ | ||
| // F H | ||
| // / \ | ||
| // K --> L | ||
|
|
||
| // create leaf nodes first |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is by far the most complex test, and very curious whether you agree with whether this should be the behavior.
i spent a bit too long trying to figure this out haha
see my comment in pub-op.ts for a more in depth explanation about the "algorithm" for determining orphans
| /** | ||
| * remove pubs that have been disconnected/their value removed, | ||
| * has `deleteOrphaned` set to true for their relevant relation operation, | ||
| * AND have no other relations | ||
| * | ||
| * curently it's not possible to forcibly remove pubs if they are related to other pubs | ||
| * perhaps this could be yet another setting | ||
| * | ||
| * ### Brief explanation | ||
| * | ||
| * Say we have the following graph of pubs, | ||
| * where `A --> C` indicates the existence of a `pub_value` | ||
| * ```ts | ||
| * { | ||
| * pubId: "A", | ||
| * relatedPubId: "C", | ||
| * } | ||
| * ``` | ||
| * | ||
| * ``` | ||
| * A J | ||
| * ┌──┴───┐ │ | ||
| * ▼ ▼ ▼ | ||
| * B C ────────► I | ||
| * │ ┌─┴────┐ | ||
| * ▼ ▼ ▼ | ||
| * G ─► E D | ||
| * │ │ | ||
| * ▼ ▼ | ||
| * F H | ||
| * ┌─┴──┐ | ||
| * ▼ ▼ | ||
| * K ──► L | ||
| * ``` | ||
| * | ||
| * Say we now disconnect `C` from `A`, i.e. we remove the `pub_value` where `pubId = "A"` and `relatedPubId = "C"` | ||
| * | ||
| * | ||
| * Now we disrelate C from A, which should | ||
| * orphan everything from D down, | ||
| * but should not orphan I, bc J still points to it | ||
| * and should not orphan G, bc B still points to it | ||
| * it orphans L, even though K points to it, because K is itself an orphan | ||
| * ``` | ||
| * A J | ||
| * ┌──┴ │ | ||
| * ▼ ▼ | ||
| * B C ────────► I | ||
| * │ ┌─┴────┐ | ||
| * ▼ ▼ ▼ | ||
| * G ─► E D | ||
| * │ │ | ||
| * ▼ ▼ | ||
| * F H | ||
| * ┌─┴──┐ | ||
| * ▼ ▼ | ||
| * K ──► L | ||
| * ``` | ||
| * | ||
| * Then by using the following rules, we can determine which pubs should be deleted: | ||
| * | ||
| * 1. All pubs down from the disconnected pub | ||
| * 2. Which are not reachable from any other pub not in the tree | ||
| * | ||
| * Using these two rules, we can determine which pubs should be deleted: | ||
| * 1. C, as C is disconnected is not the target of any other relation | ||
| * 2. D, F, H, K, and L, as they are only reachable from C, which is being deleted | ||
| * | ||
| * Notably, E and I are not deleted, because | ||
| * 1. E is the target of a relation from G, which, while still a relation itself, is not reachable from the C-tree | ||
| * 2. I is the target of a relation from J, which, while still a relation itself, is not reachable from the C-tree | ||
| * | ||
| * So this should be the resulting graph: | ||
| * | ||
| * ``` | ||
| * A J | ||
| * ┌──┴ │ | ||
| * ▼ ▼ | ||
| * B I | ||
| * │ | ||
| * ▼ | ||
| * G ─► E | ||
| * │ | ||
| * ▼ | ||
| * F | ||
| * ``` | ||
| * | ||
| * | ||
| */ | ||
| private async cleanupOrphanedPubs( | ||
| trx: Transaction<Database>, | ||
| orphanedPubIds: PubsId[] | ||
| ): Promise<void> { | ||
| if (orphanedPubIds.length === 0) { | ||
| return; | ||
| } | ||
|
|
||
| const pubsToDelete = await trx | ||
| .withRecursive("affected_pubs", (db) => { | ||
| // Base case: direct connections from the to-be-removed-pubs down | ||
| const initial = db | ||
| .selectFrom("pub_values") | ||
| .select(["pubId as id", sql<string[]>`array["pubId"]`.as("path")]) | ||
| .where("pubId", "in", orphanedPubIds); | ||
|
|
||
| // Recursive case: keep traversing outward | ||
| const recursive = db | ||
| .selectFrom("pub_values") | ||
| .select([ | ||
| "relatedPubId as id", | ||
| sql<string[]>`affected_pubs.path || array["relatedPubId"]`.as("path"), | ||
| ]) | ||
| .innerJoin("affected_pubs", "pub_values.pubId", "affected_pubs.id") | ||
| .where((eb) => eb.not(eb("relatedPubId", "=", eb.fn.any("affected_pubs.path")))) // Prevent cycles | ||
| .$narrowType<{ id: PubsId }>(); | ||
|
|
||
| return initial.union(recursive); | ||
| }) | ||
| // pubs in the affected_pubs table but which should not be deleted because they are still related to other pubs | ||
| .with("safe_pubs", (db) => { | ||
| return ( | ||
| db | ||
| .selectFrom("pub_values") | ||
| .select(["relatedPubId as id"]) | ||
| .distinct() | ||
| // crucial part: | ||
| // find all the pub_values which | ||
| // - point to a node in the affected_pubs | ||
| // - but are not themselves affected | ||
| // these are the "safe" nodes | ||
| .innerJoin("affected_pubs", "pub_values.relatedPubId", "affected_pubs.id") | ||
| .where((eb) => | ||
| eb.not( | ||
| eb.exists((eb) => | ||
| eb | ||
| .selectFrom("affected_pubs") | ||
| .select("id") | ||
| .whereRef("id", "=", "pub_values.pubId") | ||
| ) | ||
| ) | ||
| ) | ||
| ); | ||
| }) | ||
| .selectFrom("affected_pubs") | ||
| .select(["id", "path"]) | ||
| .distinctOn("id") | ||
| .where((eb) => | ||
| eb.not( | ||
| eb.exists((eb) => | ||
| eb | ||
| .selectFrom("safe_pubs") | ||
| .select("id") | ||
| .where(sql<boolean>`safe_pubs.id = any(affected_pubs.path)`) | ||
| ) | ||
| ) | ||
| ) | ||
| .execute(); | ||
|
|
||
| if (pubsToDelete.length > 0) { | ||
| await deletePub({ | ||
| pubId: pubsToDelete.map((p) => p.id), | ||
| communityId: this.options.communityId, | ||
| lastModifiedBy: this.options.lastModifiedBy, | ||
| trx, | ||
| }); | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is by far the most complex part of the PR, am curious if you think the general approach is correct! obviously very annoying to check the actual sql, but im more curious whether you think my definition of orphans here tracks
| protected async executeWithTrx(trx: Transaction<Database>): Promise<PubsId> { | ||
| const operations = this.collectOperations(); | ||
|
|
||
| await this.createAllPubs(trx, operations); | ||
| await this.processStages(trx, operations); | ||
| await this.processRelations(trx, operations); | ||
| await this.processValues(trx, operations); | ||
|
|
||
| return this.id; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is basically entirely what a pubOp can do. First create all pubs, add them to stages, do relation stuff, then do value stuff (those last two might be able to be done at the same time, or changed)
notably, all of these things happen at the same time for all pubs. so no more recursive calls, we first collect all operations and then do these things one by one. much more easy to reason about that way i think
| if (nullStages.length > 0) { | ||
| await autoRevalidate( | ||
| trx.deleteFrom("PubsInStages").where( | ||
| "pubId", | ||
| "in", | ||
| nullStages.map(({ pubId }) => pubId) | ||
| ) | ||
| ).execute(); | ||
| } | ||
|
|
||
| const nonNullStages = stagesToUpdate.filter(({ stageId }) => stageId !== null); | ||
|
|
||
| if (nonNullStages.length > 0) { | ||
| await autoRevalidate( | ||
| trx | ||
| .with("deletedStages", (db) => | ||
| db | ||
| .deleteFrom("PubsInStages") | ||
| .where((eb) => | ||
| eb.or( | ||
| nonNullStages.map((stageOp) => eb("pubId", "=", stageOp.pubId)) | ||
| ) | ||
| ) | ||
| ) | ||
| .insertInto("PubsInStages") | ||
| .values( | ||
| nonNullStages.map((stageOp) => ({ | ||
| pubId: stageOp.pubId, | ||
| stageId: stageOp.stageId, | ||
| })) | ||
| ) | ||
| ).execute(); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hopefully thisll be a bit simpler once #967 lands!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seperated this out into a separate file so you can createSeed while still having the automatic rollbacks work
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah missed this earlier, awesome!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please look here for usage!
im really curious what you all think. please comment any ideas/feedback you have for better naming or anything else regarding the api! really interested in anything, especially "i hate this" or something!
| import { CoreSchemaType, MemberRole } from "db/public"; | ||
|
|
||
| import type { Seed } from "~/prisma/seed/seedCommunity"; | ||
| import { createSeed } from "~/prisma/seed/createSeed"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
iirc importing createSeed up here did make data persist whereas importing just the type { Seed } didn't?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, but i think that was bc they were both imported from /seedCommunity. I separated out the createSeed to a different file, which (I did not explicitly check) i think should fix that issue.
i do think using createSeed is slightly better as that preserves the type info once you pass it to seedCommunity, eg if you have pubTypes: { 'SomepubType': ... } in your seed, the output of seedcommunity will have pubTypes.SomePubType as well
core/lib/__tests__/matchers.ts
Outdated
| expected: Partial<ProcessedPub["values"][number]>[] | ||
| ) { | ||
| if (typeof received === "string") { | ||
| throw new Error("toHaveValues() can only be called with a ProcessedPub"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how come received: PubsId | ProcessedPub up above instead of received: ProcessedPub then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch! it was left over bc i think i wanted to be able to pass in both in either case, and was having some trouble with the types. i later decided to just use PubsId for toExist and ProcessedPub for toHaveValues, but never updated it! I'll fix it
| const pub2 = await PubOp.upsert(crypto.randomUUID() as PubsId, { | ||
| communityId: seededCommunity.community.id, | ||
| pubTypeId: seededCommunity.pubTypes["Basic Pub"].id, | ||
| lastModifiedBy: createLastModifiedBy("system"), | ||
| }) | ||
| .relate(seededCommunity.pubFields["Some relation"].slug, "test relations value", pub.id) | ||
| .execute(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is quite nice! just so I understand, if pub2 were already created, would we also have to call .upsert(...).relate(...)? i.e. there's nothing like PubOp.relate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah correct!
You'd more likely do PubOp.update(pub2Id).relate() if you did not want to override the existing relations, but yes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for context, initially i did only have upsert, not create and update, but i thought it would be best to have separate ones bc
pubOp.upsert().unrelate()is kind of weird I think- I wanted to have
upsertmore act like aPUTthan aPATCHif the pub already exists, ie get rid of the existing values/relations by default if they are not mentioned in theupsert. But i ofc did want to keep the functionality to only update a subset of values, so we would need anupdate(or force consumers of this api to do a bunch of unnatural configuration, which is what i was doing before in feat: add proper pub upsert functions #914 ) - it's good to get an error if you are explicitly trying to create a pub that already exists, rather than silently update it when that's maybe not what you wanted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it, thanks for the context, makes sense!
| /** | ||
| * remove pubs that have been disconnected/their value removed, | ||
| * has `deleteOrphaned` set to true for their relevant relation operation, | ||
| * AND have no other relations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it's a better idea to have this be a configuration setting on the PubType (or the pubField?)
like, for some relations this could be desireable, like versions and discussions in the current arcadia community. you probably want to delete all the versions of the pub if that pub is deleted.
but for others, like tag or something, you would not want this. you wouldn't want to delete a Tag pub if the you delete the last Pub that is relating to it, you probably want to keep it around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the UI i think we can just force the user to manually delete all the relations first, but i think it would be good in certain places to automatically decide this (import actions)
| lastModifiedBy: createLastModifiedBy("system"), | ||
| }) | ||
| .set(seededCommunity.pubFields["Title"].slug, "Test") | ||
| .relate(seededCommunity.pubFields["Some relation"].slug, "relation1", (pubOp) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that's really nice! 🙌
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tests look like they cover most of the PubOp behavior and the logic for pub orphans seems sound. Looking forward to tomorrow's chat about the recursive CTE.
Issue(s) Resolved
Adds a new PubOp system for creating, updating and managing relationships between Pubs using a fluent-like API/
High-level Explanation of PR
Note
This PR is an RFC! Meaning I'm totally happy not merging it!
I'm more interested in your general thoughts on this approach, please nitpick it to hell!
This PR introduces a new
PubOpsystem that provides a fluent, builder-style API for managing Pub operations. The system handles complex scenarios like creating multiple related pubs, managing nested relationships, and handling pub value updates in a single transaction.API
The API follows a builder pattern, allowing for chaining operations:
All builders
Currently there's 3 kinds of operations you can do:
createupdateupsertFor
create, there's also an explicitcreateWithId.For both
updateandupsert, there's anupdateByValueand anupsertByValue. These allow you to, instead of selecting a pub by id, select a pub by a specific value, eg a google drive id.Basic capabilities
Currently, the
PubOpallows you to do a bunch of things, here's a small table that shows what you can do with each of the operations:Set values
Set many values in one go
Relate pubs
Relate multiple pubs in one go
Nested relate
You can nest pub operations, allowing you to create, relate, and/or update multiple pubs in a single operation:
If you don't want to keep repeating
{ communityId: ..., lastModifiedBy: ... }, you also define a function insteadThis just returns a new
PubOpwithcommunityIdetc already set.You'll still need to provide the
pubTypeIdforupsertandcreateoperations.Test Plan
Look at the tests in
platform/core/lib/server/pub-op.db.test.ts
Lines 1 to 1329 in 64ebece
Write one extra test to get a feel for it (don't need to push it, but would be appreciated!)
Screenshots (if applicable)
Notes
Note
This new API isn't used anywhere yet, that comes later!