@oschulz
Contributor

@oschulz oschulz commented May 12, 2024

Adds several things to ElasticManager:

  • A callback option - this can be used to automatically run init code on new workers, add them to and remove them from worker pools, add custom logging when workers connect, etc.

  • More debug logging - often necessary to find out what's wrong if workers won't connect.

  • A mechanism to forward environment variables to workers. I haven't found a way to set them before the Julia worker process starts up, but this at least sets them before the worker does anything else.

I'm field-testing this via a local copy of ElasticManager in ParallelProcessingTools.jl (will release a new version soon), so I can still make breaking changes if necessary. I'll sync this PR for upstreaming once it seems fully stable (looking pretty good so far, so hopefully soon).

oschulz added 3 commits May 7, 2024 16:18
* Add callback mechanism. Allows users to automatically initialize
  new workers, add workers to a given worker pool, etc.

* Make it easy to set worker timeout.

* Add debug logging, often necessary to figure out worker connection
  problems.
Revise has Distributed support, workers shouldn't run Revise separately.
@oschulz
Contributor Author

oschulz commented May 12, 2024

CC @JBlaschke , thanks for pointing out the potential of ElasticManager to me.

@oschulz
Contributor Author

oschulz commented Jul 13, 2024

It will take a bit longer before I upstream the ElasticManager changes from ParallelProcessingTools. I want to see if there's a clean way to handle network device selection, and whether that requires interface changes.

@oschulz
Contributor Author

oschulz commented Jan 2, 2025

@DilumAluthge , sorry, I neglected this a bit, I should really get on with getting this release-ready.

@DilumAluthge
Member

@oschulz We currently do not have a maintainer for the ElasticManager functionality in this package.

Do you actively use the ElasticManager functionality? If so, would you be interested in becoming the maintainer for the ElasticManager functionality?

@DilumAluthge
Member

Also @oschulz it looks like there are some merge conflicts here.

Could you rebase this PR and fix the merge conflicts?

@oschulz
Contributor Author

oschulz commented Feb 10, 2025

> Do you actively use the ElasticManager functionality?

Yes, we do, quite actively, but currently the experimental version in ParallelProcessingTools. The plan is still to re-upstream it though.

I'll rebase and test and get on with this - gimme a bit.

> If so, would you be interested in becoming the maintainer for the ElasticManager functionality?

Sure, I can take that over.

@DilumAluthge
Member

For the other cluster managers (e.g. Slurm and LSF), I've moved the managers out to separate packages (SlurmClusterManager.jl and LSFClusterManager.jl), with the idea being that each manager has different maintainers, tests, CI, etc.

What do you think about moving the elastic manager out to a new standalone package, e.g. ElasticClusterManager.jl?

@oschulz
Contributor Author

oschulz commented Feb 12, 2025

> What do you think about moving the elastic manager out to a new standalone package, e.g. ElasticClusterManager.jl?

I'd be all for it! We have to release a ClusterManagers v2.0 then though, right?

@DilumAluthge
Member

> I'd be all for it! We have to release a ClusterManagers v2.0 then though, right?

Yep, which I'll need to do anyway once I remove Slurm from this package.

@oschulz
Contributor Author

oschulz commented Feb 12, 2025

> Yep, which I'll need to do anyway once I remove Slurm from this package.

Ok, that's perfect then, because I can then upstream my changes to ElasticClusterManager directly. I was hesitant to do that because I suspected I might need to make more breaking changes. But if ElasticClusterManager has its own version number, it's easy.

@DilumAluthge
Member

DilumAluthge commented Feb 16, 2025

I created the new repo:

@oschulz I've invited you to the repo: https://github.com/JuliaParallel/ElasticClusterManager.jl

You can accept the invitation here: https://github.com/JuliaParallel/ElasticClusterManager.jl/invitations

@oschulz
Contributor Author

oschulz commented Feb 16, 2025

> @oschulz I've invited you to the repo. You can accept the invitation here:

Thanks!

@DilumAluthge
Member

DilumAluthge commented Mar 23, 2025

For those following this thread: We will be moving this PR to the new package (ElasticClusterManager.jl).

@oschulz
Contributor Author

oschulz commented Mar 23, 2025

See JuliaParallel/ElasticClusterManager.jl#5

@oschulz oschulz closed this Mar 23, 2025