Description
The rasterization step logs errors for a small fraction of the staged files. According to log.log, this occurs at both the maximum z-level and the parent z-levels. For the lake size change data sample, the GeoPackage files that errored during one parsl run are not obviously corrupt: when rasterization is run on them again (outside of the workflow, but still with parallelization), they are rasterized successfully. See this documented here. The same behavior was seen in the ray workflow with the IWP dataset, for which Robyn noted:
IWP run update: Of 10,805,019 staged tiles, I managed to rasterize and transfer (to scratch) 10,741,400, which means 63,619 tiles (~0.5%) got lost along the way. This could be that some files didn’t transfer to scratch before the 24 hour job limit ran out, or it could be some other problem. I did see some warnings that implied that some geopackage files were corrupt. Since it’s overall a small percent that are missing, I am going to continue with the next steps so that we can visualize what we already have. We can always go back and compare the list of geotiff tiles to staged tiles to see which ones we are missing and try to rasterize just those ones.
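The comparison Robyn describes (geotiff tiles versus staged tiles) could be sketched roughly as below. This is only a minimal sketch: it assumes that a staged tile and its GeoTIFF share the same relative path under their respective root directories and differ only in file extension (`.gpkg` vs `.tif`), which may not match the actual directory layout. The function names `tile_keys` and `missing_tiles` are hypothetical.

```python
from pathlib import Path


def tile_keys(root, suffix):
    """Collect relative tile paths under `root`, with the file suffix stripped.

    Assumption: the relative path (minus extension) identifies a tile.
    """
    root = Path(root)
    return {
        p.relative_to(root).with_suffix("").as_posix()
        for p in root.rglob(f"*{suffix}")
        if p.is_file()
    }


def missing_tiles(staged_dir, geotiff_dir,
                  staged_suffix=".gpkg", raster_suffix=".tif"):
    """Return the staged tiles that have no matching GeoTIFF, sorted."""
    staged = tile_keys(staged_dir, staged_suffix)
    rasterized = tile_keys(geotiff_dir, raster_suffix)
    return sorted(staged - rasterized)
```

The resulting list could then be passed back to the rasterization step to retry only those tiles. For a dataset with millions of tiles, walking the directory tree this way may be slow, so it is meant as a starting point rather than a tuned solution.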
It would be helpful to do more runs with various datasets to determine whether the same files fail consistently. That would help distinguish between three possibilities: the errors are random (and can therefore be resolved by simply rasterizing the few failed staged files again with the same approach), the files are actually corrupt, or the files are not being transferred to the scratch dir, as Robyn suggested.
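One way to start separating these cases is sketched below, under stated assumptions. It assumes the failed tile paths for each run have already been extracted from log.log (the log format is not assumed here). It also relies on the fact that a GeoPackage is an SQLite database, so SQLite's `PRAGMA integrity_check` can serve as a rough corruption test. The function names are hypothetical, and passing the check does not prove the file's geometries are valid.

```python
import sqlite3
from pathlib import Path


def classify_failures(failed_by_run):
    """Split failed tiles into (failed in every run, failed in only some runs).

    `failed_by_run` maps a run label to an iterable of failed tile paths.
    """
    sets = [set(paths) for paths in failed_by_run.values()]
    if not sets:
        return set(), set()
    always = set.intersection(*sets)
    sometimes = set.union(*sets) - always
    return always, sometimes


def gpkg_passes_integrity_check(path):
    """Return True if SQLite opens the file and integrity_check reports 'ok'.

    Caveat: a zero-byte file counts as an empty SQLite database and passes.
    """
    try:
        # Open read-only so the check cannot modify or create the file.
        uri = Path(path).resolve().as_uri() + "?mode=ro"
        con = sqlite3.connect(uri, uri=True)
        try:
            rows = con.execute("PRAGMA integrity_check").fetchall()
        finally:
            con.close()
    except sqlite3.Error:
        return False
    return rows == [("ok",)]
```

Tiles that fail in every run and also fail the integrity check would point toward corrupt files, while tiles that fail in only some runs and pass the check would be more consistent with random errors.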