Skip to content

Conversation

@simlrh
Copy link
Contributor

@simlrh simlrh commented Oct 13, 2025

Adds a public function to the JS class for each external routine, which does the same getting & setting of the current word as stemWord.

I've left stemWord untouched for backwards compatibility - not sure if that's the right thing to do, it creates a reference to a non-existent #stem function if the program you're writing isn't a stemmer, although that's not a problem in practice.

The stemWord function is now added only if there is an external with the exact name stem.

Addresses one of the minor points in #244

@simlrh simlrh marked this pull request as draft October 13, 2025 11:04
@simlrh
Copy link
Contributor Author

simlrh commented Oct 13, 2025

Converting to draft as an invalid reference to the #stem function actually causes an error in some environments

@simlrh simlrh marked this pull request as ready for review October 13, 2025 11:29
@simlrh
Copy link
Contributor Author

simlrh commented Oct 13, 2025

Ok, above issue addressed. A public function is now generated for each external, and the stemWord function is added if and only if there is an external named stem.

@ojwb
Copy link
Member

ojwb commented Oct 13, 2025

We could also merge the code from #stem into stemWord (and similarly for the other external methods we generate) when the external isn't called from within the Snowball program. The language does allow calling an external like a routine, but it's unlikely to be done in practice (nothing we ship does it). We do something similar for Go and Rust already (search for This external needs to be callable as a routine).

Code minimisation or an optimising JS implementation may well do something equivalent, but it was easy to do in the Snowball compiler for Go and Rust at least so may be worthwhile doing here too.

Do you have a use case for multiple externals BTW? It certainly good to have them actually usable, but also useful to know if people actually have a current use for them.

w(g, "~Mreturn this.getCurrent();~N");
w(g, "~-~M}~N");

if (!strcmp((const char *) q->s, "stem")) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

q->s isn't nul-terminated so use the same test as the other generators which special-case stem have:

if (SIZE(q->s) == 4 && memcmp(q->s, "stem", 4) == 0) {

(Your version probably worked in testing because newly allocated memory is typically zeroed by the OS so as not to reveal data from other processes, but relying on that is undefined behaviour.)

@simlrh
Copy link
Contributor Author

simlrh commented Oct 14, 2025

Do you have a use case for multiple externals BTW?

Yeah, I am writing a stemmer for a minority language, but snowball is great so I'm also using it for other string processing tools. Phonemiser, tokeniser, conjugating verbs. Also since (as I understand it) externals can only take one argument I'll likely have multiple externals for setting options, like regional variants for the phonemiser.

Will get to changes today, thanks for the review

@ojwb
Copy link
Member

ojwb commented Oct 14, 2025

If you're mostly needing getters/setters for variables, we could provide a way to generate those (you can write them yourself in Snowball for strings, but it's harder to for integers and booleans; even for strings, we can probably generate slightly more efficient ones for some target languages).

There already is a way to do this with the C generator (-vprefix) but it apparently isn't actively used as it was accidentally broken in Snowball 2.1.0 and I only happened to notice a few months ago while working on code in this area - there's a fix committed which will be in the next release.

-vprefix defines macros which provide access to the C struct members where the Snowball variables are stored, and just exposes all variables.

It'd be cleaner for the program to specify what to expose - e.g. externals could be extended to allow:

externals (
    stem
    booleans ( flag )
    integers ( variant )
    strings ( type )
)

Where flag, variant and type work in the usual way inside the Snowball code, but we'd also generate public getters and setters for them (or something equivalent which makes sense in the target language - e.g. it could use the existing -vprefix mechanism for C).

This would also mean the compiler would know not to try to localise these variables (there's work in progress to automatically detect Snowball variables which can be made local to one function in the generated target language code).

@ojwb
Copy link
Member

ojwb commented Oct 25, 2025

This would also mean the compiler would know not to try to localise these variables (there's work in progress to automatically detect Snowball variables which can be made local to one function in the generated target language code).

FYI, this variable localisation is now merged.

Will get to changes today, thanks for the review

Did you do this? I don't see a push since.

If it's just the single change I suggested and that looks good to you, I can just fold that in.

@simlrh
Copy link
Contributor Author

simlrh commented Oct 26, 2025

I haven't yet sorry, and there's a bug in my stemWord, I will spend some time first thing Monday finishing up.

@simlrh
Copy link
Contributor Author

simlrh commented Oct 27, 2025

Ok I've corrected the string comparison for "stem".

I looked at changing the way externals are generated based on whether they are called internally, but it was more complicated to implement than in Rust or Go since in JS the externals return a different type, and they require additional code at the beginning and end of the function. Complicating the code didn't seem worth it to save the overhead of a single function call.

@ojwb ojwb merged commit 5b5f559 into snowballstem:master Oct 27, 2025
44 checks passed
@ojwb
Copy link
Member

ojwb commented Oct 27, 2025

Complicating the code didn't seem worth it to save the overhead of a single function call.

Seems reasonable. If the code is minimised it'll probably eliminate that anyway, and an optimising JS interpreter may be able to. It happens for every word stemmed, so may be worth profiling at some point just to check if it's worth trying harder to change, but really the key thing is it should be no worse than before your patch (since stemWord = this.stem should just mean that stemWord is exactly the same function as before).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants