
@lkdvos (Member) commented Jul 6, 2025

This PR updates the F-symbols and R-symbols so that they are stored on disk as well.
The main motivation is that the CGCs typically take up far more space than the F- and R-symbols themselves, so it may be reasonable to have a single machine precompute a large number of them and then not bother copying the (possibly terabytes of) CGCs.

For reference, I currently have:

CGC disk cache info:
====================
* SU(3) - Float64 - 3439 entries - 203.759 MiB
* SU(4) - Float64 - 10833 entries - 51.241 GiB
* SU(5) - Float64 - 1680 entries - 1.004 GiB

F disk cache info:
==================
* SU(3) - 22079 files - 424083 entries - 101.769 MiB
* SU(4) - 16468 files - 130003 entries - 37.697 MiB
* SU(5) - 5848 files - 18989 entries - 7.093 MiB

R disk cache info:
==================
* SU(3) - 116 files - 512 entries - 146.276 KiB
* SU(4) - 141 files - 488 entries - 159.426 KiB
* SU(5) - 149 files - 542 entries - 173.374 KiB
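To put the listing above in perspective, the short Python script below (an illustration only, not part of the package) totals the reported sizes per cache, converting everything to MiB with 1 GiB = 1024 MiB and 1 MiB = 1024 KiB.

```python
# Sizes copied from the disk cache listing, converted to MiB
# (1 GiB = 1024 MiB, 1 MiB = 1024 KiB).
cgc_mib = {"SU(3)": 203.759, "SU(4)": 51.241 * 1024, "SU(5)": 1.004 * 1024}
f_mib = {"SU(3)": 101.769, "SU(4)": 37.697, "SU(5)": 7.093}
r_mib = {"SU(3)": 146.276 / 1024, "SU(4)": 159.426 / 1024, "SU(5)": 173.374 / 1024}

cgc_total = sum(cgc_mib.values())
f_total = sum(f_mib.values())
r_total = sum(r_mib.values())
ratio = cgc_total / (f_total + r_total)

print(f"CGC total: {cgc_total:10.1f} MiB")
print(f"F total:   {f_total:10.1f} MiB")
print(f"R total:   {r_total:10.3f} MiB")
print(f"CGC / (F + R): {ratio:.0f}x")
```

With these numbers the CGCs take roughly 365 times more space than the F- and R-symbols combined, which is the gap that motivates keeping only the latter around.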

@lkdvos (Member, Author) commented Jul 6, 2025

@LHerviou How do you feel about these changes? I am mostly wondering how to organize the files: it might be reasonable to simply store all Fs in one file and all Rs in another.
For the CGCs, I didn't feel comfortable doing that since those files get so large, but if our assumptions are correct, then the F and R files shouldn't get that large.
The only drawback I can think of is that this doesn't allow parallel read/write access, but I'm not sure that was ever really an issue.
Additionally, if something goes wrong and the file gets corrupted mid-write, that would also be quite unpleasant...
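One common way to limit the damage from an interrupted write, sketched here in Python rather than the package's Julia code, is to write to a temporary file in the same directory and then rename it over the target. The function names `atomic_save` and `load_entries` are hypothetical, and pickle is only a stand-in for whatever serialization format the cache actually uses.

```python
import os
import pickle
import tempfile


def atomic_save(path, obj):
    """Write `obj` to `path` so that readers never see a half-written file.

    The data goes to a temporary file in the same directory, is flushed to
    disk, and is then moved into place with os.replace.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes hit the disk before renaming
        os.replace(tmp_path, path)  # rename over the old file in one step
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_entries(path):
    with open(path, "rb") as f:
        return pickle.load(f)
```

Readers then see either the old or the new file, never a partial one, as long as the rename is atomic on the underlying filesystem (it is on local POSIX filesystems; some network filesystems used on clusters give weaker guarantees). This does not coordinate concurrent writers: if two processes rewrite the same file, the last rename wins and the other process's new entries are lost.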

@codecov bot commented Jul 6, 2025

Codecov Report

Attention: Patch coverage is 73.85621% with 40 lines in your changes missing coverage. Please review.

Files with missing lines    Patch %    Lines
src/caching.jl              71.22%     40 Missing ⚠️

Files with missing lines    Coverage Δ
src/sector.jl               98.50% <100.00%> (ø)
src/caching.jl              76.95% <71.22%> (-9.22%) ⬇️

@LHerviou (Contributor) commented Jul 7, 2025

@lkdvos Thanks for such a quick implementation.

I think a single file should be fine: read/write access should be much faster. It might be worth considering whether F_N_abc and R_N_ab files would be a better alternative, though that would significantly increase the number of files. I am indeed worried about corruption on clusters, where instabilities can occur.

One or two quick checks/questions for parallel runs:

  • Does the code check the storage file every time an unknown F is requested?
  • Do the CGCs still stay in memory during a single Julia run, and are they just not saved to disk?

@lkdvos (Member, Author) commented Jul 7, 2025

Just to be clear: the current implementation uses separate files R_N_ab and F_N_abcd.
Do I understand correctly that you think it is better to just have a single file R_N and F_N?
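The two layouts differ only in how entries are grouped into files. The Python sketch below (not the package's Julia code) makes that concrete; the sector names, the entry tuple `(kind, N, outer labels, remaining labels)` and the file-name functions are all hypothetical.

```python
from collections import defaultdict


def per_key_file(kind, n, outer):
    # One file per symbol kind, N and outer sector labels, e.g. "F_3_a_b_c_d".
    return f"{kind}_{n}_" + "_".join(outer)


def per_n_file(kind, n, outer):
    # One file per symbol kind and N, e.g. "F_3"; the outer labels become keys inside it.
    return f"{kind}_{n}"


def group_entries(entries, layout):
    """Return {file name: [(outer, remaining), ...]} for the given layout function."""
    files = defaultdict(list)
    for kind, n, outer, remaining in entries:
        files[layout(kind, n, outer)].append((outer, remaining))
    return dict(files)


# Hypothetical entries: (kind, N, outer sector labels, remaining labels).
entries = [
    ("F", 3, ("a", "b", "c", "d"), ("e1", "f1")),
    ("F", 3, ("a", "b", "c", "d"), ("e2", "f2")),
    ("F", 3, ("a", "a", "b", "b"), ("e1", "f1")),
    ("R", 3, ("a", "b"), ("c",)),
]

for layout in (per_key_file, per_n_file):
    files = group_entries(entries, layout)
    print(layout.__name__, "->", sorted(files))
```

The per-key layout produces many small files that can be written independently, while the per-N layout produces one larger file per kind that has to be rewritten as a whole whenever a new entry is added.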

Whenever an F or R or CGC is called, the callstack goes:

  1. check if we have this in memory
  2. check if we have this in file
  3. compute and store

where computing F and R asks for the CGC, which will again go through the same process.
So this implementation definitely stores everything on disk; it is mainly meant as a first step towards having a precomputed set of F- and R-symbols, such that the CGCs are no longer really needed.
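The three-step lookup above can be sketched as follows. This is a minimal Python illustration, not the package's Julia implementation: the class name `TieredCache`, the one-pickle-file-per-key naming and the `compute` callback are all hypothetical.

```python
import os
import pickle


class TieredCache:
    """Memory -> disk -> compute lookup.

    `compute` is a user-supplied function; it may itself call another
    TieredCache (e.g. an F computation asking a CGC cache), which then goes
    through the same three steps.
    """

    def __init__(self, name, directory, compute):
        self.name = name
        self.directory = directory
        self.compute = compute
        self.memory = {}
        os.makedirs(directory, exist_ok=True)

    def _path(self, key):
        # Hypothetical file naming: one pickle file per key, e.g. "F_1_2.pkl".
        return os.path.join(self.directory, f"{self.name}_" + "_".join(map(str, key)) + ".pkl")

    def get(self, key):
        # 1. check if we have this in memory
        if key in self.memory:
            return self.memory[key]
        # 2. check if we have this in a file
        path = self._path(key)
        if os.path.exists(path):
            with open(path, "rb") as f:
                value = pickle.load(f)
        else:
            # 3. compute, then store on disk
            value = self.compute(*key)
            with open(path, "wb") as f:
                pickle.dump(value, f)
        # Keep the value in memory for the rest of the run.
        self.memory[key] = value
        return value
```

A second process pointed at the same directory starts with an empty `memory` but finds the files, so it skips the computation. For brevity the disk write here is not protected against interruption.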

I'll also add a switch for turning disk storage on and off, so you can select which ones you want to store persistently.
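A per-kind switch could look roughly like this. It is again a hypothetical Python sketch: `PERSIST`, `set_disk_cache` and `lookup` are made-up names, and a plain dict stands in for the on-disk store.

```python
# Hypothetical per-kind switches: persist F and R, keep CGCs in memory only.
PERSIST = {"CGC": False, "F": True, "R": True}


def set_disk_cache(kind, enabled):
    if kind not in PERSIST:
        raise KeyError(f"unknown cache kind: {kind!r}")
    PERSIST[kind] = enabled


def lookup(kind, key, memory, disk, compute):
    """Memory first, then disk (only if enabled for this kind), then compute."""
    if key in memory:
        return memory[key]
    use_disk = PERSIST[kind]
    if use_disk and key in disk:
        value = disk[key]
    else:
        value = compute(key)
        if use_disk:
            disk[key] = value  # only persisted when the switch is on
    memory[key] = value
    return value
```

With `CGC` set to `False`, CGCs are still kept in memory for the duration of a run, but only F- and R-symbols end up on disk, which matches the use case described at the top of the PR.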
