TL;DR: All NV child fatality PDFs -> 1 CSV per county
This project is a first step toward scraping Nevada child fatality PDFs and converting them into a consistent CSV schema across Nevada counties.
$ conda create --name child_nv python=3.10 # create python3.10 virtual env
$ conda activate child_nv # turn on virtual env
$ pip install tox # install tox
$ tox # run tox: installs requirements.txt & runs the pytest suite
CONFIG.py - change values such as SCRAPE_YEARS to scrape additional years (see the sketch below)
$ python main.py # run primary pdf scraper per county (don't need virtual env to run this)
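For reference, here is a minimal sketch of what the relevant part of config/CONFIG.py might look like. Only SCRAPE_YEARS is named in this README; the other names and values are illustrative assumptions, not the repository's actual settings.

```python
# config/CONFIG.py -- illustrative sketch; only SCRAPE_YEARS is mentioned in this README
SCRAPE_YEARS = [2021, 2022, 2023]   # add years here to scrape additional years
OUTPUT_DIR = "output_files"         # where per-county CSVs and PDFs are written (assumed)
COUNTIES = ["Clark", "Washoe"]      # counties to scrape (assumed)
```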
state_of_nv_child_fatalities/
├── config                          # hardcoded URLs, paths, and values
│   └── CONFIG.py                   # change values in here
├── output_files                    # output CSVs and PDFs
├── research                        # original exploration and debugging
├── scripts                         # actual Python code
│   ├── child_fatality_scraper.py   # core file
│   └── prior_history.py            # gets number of past calls on the child
├── tests                           # unit tests of Python functions
├── requirements.txt                # package requirements
├── tox.ini                         # multi-environment Python testing
└── Dockerfile                      # Docker build image: runs tox, executes main.py
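To give a feel for what scripts/child_fatality_scraper.py does, here is a hedged, self-contained sketch of the core flow (download a disclosure PDF, extract its text, and write per-county rows to CSV). The function names, fields, and PDF library are illustrative assumptions, not the repository's actual API.

```python
# Illustrative sketch only -- not the repository's actual code.
import csv
import io

import requests
from pypdf import PdfReader  # assumed PDF library; the repo may use a different one


def pdf_to_row(pdf_url: str, county: str) -> dict:
    """Download one disclosure PDF and pull a few fields out of its text."""
    resp = requests.get(pdf_url, timeout=30)
    resp.raise_for_status()
    reader = PdfReader(io.BytesIO(resp.content))
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    # Real parsing would use regexes/keyword searches tuned to each county's PDF layout.
    return {"county": county, "source_url": pdf_url, "raw_text_length": len(text)}


def write_county_csv(rows: list[dict], county: str) -> None:
    """Write extracted rows for one county to output_files/child_fatality_<county>.csv."""
    if not rows:
        return
    path = f"output_files/child_fatality_{county}.csv"
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)
```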
Example:
- Clark County (Las Vegas): original URL for all PDFs
- Example single PDF: near fatality, marijuana access
- Output table for Clark: output_files/child_fatality_Clark.csv
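Once main.py has run, a quick way to inspect the Clark output is the snippet below. It assumes pandas is installed; no column names are hard-coded, since the schema is whatever the scraper produced.

```python
import pandas as pd

# Load the per-county output produced by main.py
df = pd.read_csv("output_files/child_fatality_Clark.csv")
print(df.columns.tolist())   # see which fields were extracted
print(df.head())             # preview the first few records
```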
Docker Run:
$ docker build -t child_fatalities . # build docker image
$ docker run -d --name child_fatalities_container child_fatalities # run the container (executes the Python job)
$ docker ps # confirm you see container running
$ docker exec -it child_fatalities_container /bin/bash # look inside container
$ cd output_files && ls # view output csv files after container finishes
$ docker logs child_fatalities_container # look at internal print statements/logs
WARNING: Only do this to start from scratch; it will delete ALL images & containers.
Stop & delete containers:
$ docker stop $(docker ps -aq)
$ docker rm $(docker ps -aq)
Remove all images:
$ docker rmi $(docker images -aq)
Remove any dangling/unused images:
$ docker system prune -a -f
Note: Google's Container Registry is being deprecated, so images will now live in Artifact Registry.
1. Create a .env file at the project root
2. Create & set 3 env variables inside .env: GCP_PROJECT_ID, GCP_SERVICE_ACCOUNT_KEY_PATH, TAG (example: TAG="v1.0")
2a. You will need privileges from admin to access these.
3. Run bash script
$ ./build_and_push.sh
Note: If you get a permission error, run:
$ chmod +x build_and_push.sh
$ ./build_and_push.sh
4. Answer "yes" to any potential prompts
5. Check that the image uploaded successfully: GCP -> Artifact Registry -> Repositories -> gcr.io.
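If you want to sanity-check the .env file before running the build script, a small sketch like the one below works. It uses python-dotenv, which is an assumed dependency, not one listed in this README.

```python
# check_env.py -- illustrative sketch; python-dotenv is an assumed dependency
import os

from dotenv import load_dotenv

load_dotenv()  # reads .env at the project root

required = ["GCP_PROJECT_ID", "GCP_SERVICE_ACCOUNT_KEY_PATH", "TAG"]
missing = [name for name in required if not os.getenv(name)]
if missing:
    raise SystemExit(f"Missing variables in .env: {', '.join(missing)}")
print("All required GCP variables are set.")
```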
- Containerize the application as in the "Upload to Google Cloud Platform" section above.
- Configure a Cloud Run Service inside GCP.
- Deploy the Cloud Run Service.
- Create a Cloud Function with a "Cloud Run" trigger and select your Cloud Run Service.
- Set up a scheduler job.