Conversation
| ) | ||
| # NaN members are zeroed so their contributions to β_0 and β_1 are 0. | ||
| β_0 = B.sum(fct, axis=-1) / M_eff | ||
| β_1 = B.sum(fct * B.arange(0, M), axis=-1) / (M_eff * (M_eff - 1.0)) |
There was a problem hiding this comment.
I assume we would need to replace B.arange(0, M) with B.arange(0, M_eff), since we're only using the non-nans to calculate the score?
There was a problem hiding this comment.
We could check whether this is necessary by testing whether the crps is the same with the "fair" and "pwm" estimators when nan_policy = "omit"
| if nan_mask is not None: | ||
| # Assumes the ensemble is sorted with NaN members zeroed at the end. | ||
| M_eff = B.sum(B.where(nan_mask, B.asarray(0.0), B.asarray(1.0)), axis=-1) | ||
| alpha = B.arange(1, M + 1) - 0.5 # shape (M,) |
There was a problem hiding this comment.
Similarly here, possibly we should use M_eff instead of M. This becomes a bit messier though, since alpha would need to be different lengths in different dimensions. Not sure what the best solution is yet.
This could be checked by testing whether these functions give the same output as the numba functions, since the corresponding gufunc uses M_eff.
There was a problem hiding this comment.
If I understand correctly, this is also the same issue when implementing this for the int estimator
| Default is False. | ||
| estimator : str | ||
| Indicates the CRPS estimator to be used. | ||
| nan_policy : {'propagate', 'omit', 'raise'}, default 'propagate' |
There was a problem hiding this comment.
I like these three options
| return e_1 - 0.5 * e_2 | ||
| if nan_mask is not None: | ||
| M_eff = B.sum(B.where(nan_mask, B.asarray(0.0), B.asarray(1.0)), axis=-1) | ||
| e_1 = ( |
There was a problem hiding this comment.
I assume I am overlooking something, but is there a reason for not defining a B.nansum function, and then just using this instead of B.sum? This would avoid needing to specify nan_mask, and would keep the code essentially the same as before. If the nan policy is "propagate", then we could just fill the resulting vector with nans in the correct places: e.g. something along the lines of
s = B.where(B.any(B.isnan(fct), axis=-1), B.nan, s)
or we could specify the nan_policy as an argument, and then use B.nansum or B.sum depending on whether nan_policy is propagate or omit.
One issue arises when all ensemble members are nan, in which case np.nansum returns zero, I think. But this is handled already using
B.where(M_eff <=1, B.asarray(float("nan")), result)
There was a problem hiding this comment.
An alternative straightforward approach would be to use the weighted ensemble functions, and set the weight of the nan ensemble members to zero. This would arguably not be the most computationally efficient approach, but this would require only very small changes to the code, and could also readily be used for the energy score, variogram score, kernel scores, etc.
e.g. by setting ens_w = B.where(B.isnan(fct), 0.0, ens_w)
| / M_eff | ||
| ) | ||
| result = e_1 - 0.5 * e_2 | ||
| return B.where(M_eff == 0, B.asarray(float("nan")), result) |
There was a problem hiding this comment.
Minor comment, but M_eff == 0 is used here, whereas M_eff <= 1 is typically used above. Is there a reason why these are different?
| return e_1 - 0.5 * e_2 | ||
| if nan_mask is not None: | ||
| M_eff = B.sum(B.where(nan_mask, B.asarray(0.0), B.asarray(1.0)), axis=-1) | ||
| fw = B.where(nan_mask, B.asarray(0.0), fw) |
There was a problem hiding this comment.
I think this could actually be simplified by just setting fw = B.where(nan_mask, B.asarray(0.0), fw), renormalising, and then calculating the owcrps with these updated weights. This would simply put zero weight on the ensemble members that are nans. The same should also hold for the vrcrps below.
| for i, forecast in enumerate(fct): | ||
| if i == 0: | ||
| i = M - 1 | ||
| i = M |
| def apply_nan_policy_ens_uv(obs, fct, nan_policy="propagate", backend=None): | ||
| """Apply NaN policy to univariate ensemble forecasts (fct shape: ..., M). | ||
|
|
||
| For 'propagate': no-op, returns (obs, fct, None). |
There was a problem hiding this comment.
Perhaps not immediately clear from the docstring that the function will return (obs, fct, nan_mask). I was initially a bit confused about what the extra None was for for propogate
There was a problem hiding this comment.
Yep, type hints are missing.
| "NaN values encountered in input. " | ||
| "Use nan_policy='propagate' or nan_policy='omit' to handle NaN values." | ||
| ) | ||
| return obs, fct, None |
There was a problem hiding this comment.
Why are the obs and forecasts still returned in this case?
There was a problem hiding this comment.
Because we have to when the policy is "omit", and we cannot always adjust the number of expected return arguments whenever we call this function based on the policy. The signature must stay the same.
| The array backend uses a roll-based masking approach for the circular kernel, | ||
| which masks out pairs involving NaN members but also loses the wrap-around pair | ||
| when NaN sits at the tail. This means the array-backend result does NOT match | ||
| the clean-ensemble result (a known limitation). The numba gufuncs correctly |
There was a problem hiding this comment.
Ah yes, interesting point. I don't see a nice way round this
| _FCT_CLEAN = np.array([0.1, 0.3, 0.7, 0.9, 1.1]) | ||
| _FCT_WITH_NAN = np.array([0.1, np.nan, 0.3, np.nan, 0.7, 0.9, 1.1]) | ||
| # NaN members of _FCT_WITH_NAN replaced by _OBS — same shape, no NaN, used for | ||
| # building 2-row batches where one row is clean and the other contains NaN. |
There was a problem hiding this comment.
I don't understand this yet. Setting the nans to the obs will not change the first term of the CRPS, since the differences will all be zero, but it will change the second term.
No description provided.