Conversation

dopplershift
Member

Description Of Changes

When parsing a part of the file fails, allow parsing to continue with the remaining parts.
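As a rough illustration of the idea (not the code in this PR), continuing past a bad part could look like the sketch below. The names `parse_parts` and `parse_one` are hypothetical, and the choice of which exceptions to catch is an assumption based on the failures reported in this thread.

```python
import logging

log = logging.getLogger(__name__)


def parse_parts(parts, parse_one):
    """Parse each part independently; log and skip the ones that fail.

    `parse_one` is a hypothetical callable that parses a single part.
    """
    parsed = []
    for index, part in enumerate(parts):
        try:
            parsed.append(parse_one(part))
        except (IndexError, ValueError) as exc:
            # A bad part is reported but does not abort the remaining parts.
            log.warning('Skipping part %d: %s', index, exc)
    return parsed
```

For example, `parse_parts(['1', 'x', '3'], int)` keeps the two parts that parse and logs a warning for the middle one.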

@dopplershift dopplershift requested a review from a team as a code owner September 25, 2025 20:57
@dopplershift dopplershift added the Type: Enhancement Enhancement to existing functionality label Sep 25, 2025
@dopplershift dopplershift requested review from dcamron and removed request for a team September 25, 2025 20:57
@dopplershift dopplershift added the Area: IO Pertains to reading data label Sep 25, 2025
@akrherz
Contributor

akrherz commented Sep 26, 2025

FWIW, here is an example from 2000 that results in an IndexError:
CODSUS.txt

  File "/home/akrherz/projects/MetPy/src/metpy/io/text.py", line 140, in parse_wpc_surface_bulletin
    boundary = LineString(boundary) if len(boundary) > 1 else boundary[0]
                                                        ~~~~~~~~^^^
IndexError: list index out of range
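The traceback shows that `boundary[0]` is reached with an empty list. A minimal sketch of a guard, assuming an empty boundary should be reported as `None` rather than raising; `build_geometry` and `make_line` are hypothetical names, and in MetPy's case the caller would presumably pass shapely's `LineString` as `make_line`:

```python
def build_geometry(points, make_line=tuple):
    """Return None for no points, the lone point for one, else a line.

    `make_line` stands in for a line constructor such as shapely's
    LineString; `tuple` is used here only to keep the sketch dependency-free.
    """
    if not points:
        # The case that raised IndexError above: nothing was decoded.
        return None
    if len(points) == 1:
        return points[0]
    return make_line(points)
```

The caller can then skip or log a `None` result instead of failing the whole bulletin.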

@akrherz
Contributor

akrherz commented Sep 26, 2025

More FWIW: of the ~7,000 CODSUS products I have for 2024, just one fails on the current MetPy main branch :)
CODSUS_2024.txt

  File "/home/akrherz/projects/MetPy/src/metpy/io/text.py", line 139, in parse_wpc_surface_bulletin
    boundary = [Point(_decode_coords(point)) for point in boundary]
                      ~~~~~~~~~~~~~~^^^^^^^
  File "/home/akrherz/projects/MetPy/src/metpy/io/text.py", line 59, in _decode_coords
    lat = float(f'{lat[:2]}.{lat[2:]}') * flip
          ~~~~~^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: could not convert string to float: '.'
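The `'.'` in the message means `lat` was an empty string when the f-string on line 59 was built, so `float('.')` was attempted. A minimal sketch of validating the digits first, assuming a malformed value should yield `None`; `decode_tenths` is a hypothetical helper and the two-digit minimum is an assumption, not MetPy's actual rule:

```python
def decode_tenths(digits):
    """Turn a digit string such as '473' into 47.3; return None if malformed."""
    # Assumption: at least two ASCII digits are required for a valid value.
    if len(digits) < 2 or not (digits.isascii() and digits.isdigit()):
        return None
    # Same construction as in the traceback: two digits, a dot, then the rest.
    return float(f'{digits[:2]}.{digits[2:]}')
```

With this check an empty or non-numeric string is rejected before `float` is ever called, so the caller can decide whether to skip the point or the whole feature.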

And here's a script to run them in bulk:

import sys
import traceback
from io import BytesIO

import httpx
from metpy.io import parse_wpc_surface_bulletin
from tqdm import tqdm


def main(argv):
    """Go Main Go."""
    year = int(argv[1])
    # Fetch one year of CODSUS products from the AFOS retrieval service.
    resp = httpx.get(
        "https://mesonet.agron.iastate.edu/cgi-bin/afos/retrieve.py?"
        f"sdate={year}-01-01&edate={year + 1}-01-01&limit=9999"
        "&pil=CODSUS&fmt=text",
        timeout=60,
    )
    # Stop on an HTTP error instead of trying to parse an error page.
    resp.raise_for_status()
    failure = 0
    # Products in the response are separated by the ETX (\003) byte.
    progress = tqdm(resp.content.split(b"\003"))
    for prod in progress:
        progress.set_description(f"Failures: {failure}")
        bio = BytesIO(prod)
        try:
            parse_wpc_surface_bulletin(bio, year=year)
        except Exception:
            # Print the traceback and save the offending product for inspection.
            traceback.print_exc()
            with open(f"CODSUS_fail_{failure:04d}.txt", "wb") as fh:
                fh.write(prod)
            failure += 1


if __name__ == "__main__":
    main(sys.argv)

@dopplershift
Member Author

Thanks @akrherz! That's really helpful, and we can definitely include a few more fixes here.

@dopplershift dopplershift marked this pull request as draft September 26, 2025 16:52
Labels

Area: IO Pertains to reading data Type: Enhancement Enhancement to existing functionality

Development

Successfully merging this pull request may close these issues.

invalid characters in CODSUS text causes parse_wpc_surface_bulletin to fail

2 participants