- 
                Notifications
    You must be signed in to change notification settings 
- Fork 1.1k
Enable custom aggregate functions (take 2) #529
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| cc @lovasoa | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good, thank you !
A few things:
- Can we avoid calling initin create_aggregate itself ? I think this is a very surprising behavior for the programmer.initshould be called every time the aggregate function is called.
- Can we make the step function return the new state, like the standard reduce ? This would avoid the awkward function init() { return {sum: 0} }
- Taking three functions as arguments makes the code using create_aggregate a little bit hard to read. Can we take an object instead ?
db.create_aggregate("json_array", {
    init: () => [],
    step: (state, value) => [...state, value],
    finalize: state => JSON.stringify(state)
})- Can we make init and finalize optional ? init should default to () => nulland finalize tostate => state. In the end we should be able to call
db.create_aggregate("js_sum", { step: (a,b) => a+b })- Can we add memory leak tests like the ones we have in test/test_functions_recreate.js
| 
 Yes, thank you! Bad mental model striking there. I like all your proposed changes, will do. | 
| A question: when an instance of  That is, if I say:  I'm assuming I can? If not, storing the state is tricky, because I need to account for this scenario: 
 I would try to answer this for myself but the web platform is a really tricky runtime so I'm asking in case you know off hand? | 
| The way that proper sqlite3 extensions do it is by hanging their state off the execution context like so. Currently,  | 
| 
 I don't think so? We're not the ones calling the step function, that's sqlite doing it 
 We can make  We can't make  (Maybe I'm not being clever enough and there is a way for both!) | 
Basically it seems that the sqlite extension pattern of 'allocate a struct and stick it in the context pointer' is not going to work for us here. I wonder if using the id of the pointer returned by sqlite3_aggregate_context would be enough? Since no two functions could use the same pointer, per https://www.sqlite.org/c3ref/aggregate_context.html ?
| I managed to make  This branch is WIP status, and I apologize for the noise on it. I will look into getting it down to a single function argument tomorrow. | 
| About defaults for init and finalize: Can't we just have create_aggregate_func start like that : function create_aggregate_func<T>(
   name: string,
   methods: {
       init?: () => T,
       step: (state: T, value: any) => T,
       finalize?: (state: T) => any 
   }
) {
    let init = methods.init || (() => null);
    let step = (state: T, value: any) => { 
        state = methods.step(state, value);
        __update_sqlite_internal_state(state);
    };
    let finalize = methods.finalize || (state => state);
    ...
}About thread-safety: there is no such thing as multi-threading with concurrent access in javascript. The javascript runtime guarantees that there is a single execution thread that can access mutable javascript objects. That said, even within a single thread, there can be multiple aggregate functions that are at various stages of there execution simultaneously. The following has to work: SELECT js_agg_fn(x),  js_agg_fn(y) FROM t;A test case like this one should probably be added to the tests too. | 
| Having to return values from  db.create_aggregate(
    "json_agg", {
        step: function(state, val) { state = (state || []); state.push(val); return state; },
        finalize: function(state) { return JSON.stringify(state); }
    }
);And with modifying state (and with state defaulting to  db.create_aggregate(
    "json_agg", {
        step: function(state, val) { state.push(val); },
        finalize: function(state) { return JSON.stringify(state); }
    }
);I think we might even want to make  
 (I also wonder if there's efficency trickery we could do to accumulate values in a typed array?) edit: standard deviation is another example of an aggregation that can be completed without accumulating all values | 
| 
 No, that's ugly, you wouldn't write it like that ! 🙂  This is a very common pattern in javascript, in many state-management libraries. You would write it as  
 Let's just stick to what developers are used to. Reduce (sometimes called fold) is a very common pattern, and users will appreciate it if you don't try to reinvent the wheel here.  | 
| Can we also add a test with 
 | 
| 
 | 
| You can see from the changes in ac548d4 how much that simplifies the step functions | 
| OK, I think this is ready for re-review now. Thanks for working through this with me! | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good to me ! Can you just fix the documentation, I'll read it one last time, and we'll merge :)
Co-authored-by: Ophir LOJKINE <[email protected]>
* initial commit. * documentation * remove no-longer-valid type * close over state initialization for performance * link documentation in comment * more testing * run tests if they're main * accept a single arg * this kind of works but I'm abandoning this branch Basically it seems that the sqlite extension pattern of 'allocate a struct and stick it in the context pointer' is not going to work for us here. I wonder if using the id of the pointer returned by sqlite3_aggregate_context would be enough? Since no two functions could use the same pointer, per https://www.sqlite.org/c3ref/aggregate_context.html ? * a middle road sqlite3_agg_context solution * try out auto-updating state * improve quantile test, add multiple agg test * add a null to the test * acorn fails to parse ||=, whatever * make eslint happy * make initial_value an argument * test step and finalize exceptions * add memory leak test * update docs to current interface * delete state in exception handlers * remove null state * return init function and document object * more tests and update back to init function * update redefinition test for new interface * update README to match fixed signature * more consistent test formatting * Update README.md Co-authored-by: Ophir LOJKINE <[email protected]> * clarify what exactly the result will contain * Update README.md * Update README.md * Update README.md * Update README.md * Update README.md * Improve documentation and type annotations * ignore documentation in eslintrc * reduce code size Thanks a lot, @llimllib ! Co-authored-by: dogquery <> Co-authored-by: Ophir LOJKINE <[email protected]>
This PR builds on #407, updating it to current HEAD, adding more tests, and documenting it.
This PR closes #204, and is composed mostly of the work that @AnyhowStep did so long ago in that thread.
#407 notes that aggregate window functions do not work, but I think that's a different issue that should be tackled in another PR.
The full list of changes follows:
create_aggregateparseFunctionArgumentsinto a function that can be used by bothcreate_aggregateandcreate_functionsetFunctionResultsinto a functionsqlite3_aggregate_contextfunctioncreate_aggregatefunctioncreate_aggregate(function_name, initial_value, { step: (state, ...value) =>, finalize: (state) =>})create_aggregatecallssqlite3_aggregate_contextto allocate 1 byte of memory per instance of the aggregate function, which it uses as a key for its state array.sqlite3_aggregate_contextdocumentation