SimpleComicCrawler

A simple comic crawler that crawls comics from https://comicbus.com/.

Code Structure

To execute

  1. Create a request file that lists the comics and episodes to download (a helper sketch that writes both files follows this list), for example:
{
    "食戟之靈": {
        "01話":[],
        "02話":[],
        "03話":[],
        "04話":[]
    }
}
  2. Create a db file to store the comics' image URLs, for example:
{
}
  3. Execute the sample script
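
The following sketch writes both files in one go. It is only a convenience helper and is not part of this repository; the output paths request_file.json and db.json are placeholders for whatever paths you pass to the entry scripts, and the comic title and episode keys are copied from the example above.

#!/usr/bin/env python3
# make_request_files.py -- hypothetical helper, not part of this repository.
# Writes a request file matching the example above and an empty db file.
import json

request = {
    "食戟之靈": {
        "01話": [],
        "02話": [],
        "03話": [],
        "04話": [],
    }
}

with open("request_file.json", "w", encoding="utf-8") as f:
    json.dump(request, f, ensure_ascii=False, indent=4)

with open("db.json", "w", encoding="utf-8") as f:
    json.dump({}, f)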

Scrape the comic website directly

$ ./src/basic_main.py [request_file.json] scripts [db.json]

Scrape the comic website through a managed worker

a. Download schedular and follow the instructions in its README.md to start a worker service.

b. Run the worker script and point it at schedular:

$ ./src/worker_main.py [request_file.json] scripts [db.json] http://0.0.0.0:5000/execute
# Note: 'http://0.0.0.0:5000/execute' is the default URL that schedular listens on;
# adjust this argument to match your actual environment.
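
Before launching worker_main.py, it can help to confirm that the worker service is actually listening. The snippet below is only a reachability check under the assumption that schedular answers HTTP requests at the given URL; it does not submit a job, and the real request format is documented in the schedular README.

#!/usr/bin/env python3
# check_worker.py -- hypothetical helper, not part of this repository.
# Confirms the scheduler endpoint is reachable before starting a crawl.
import sys
import urllib.error
import urllib.request

url = sys.argv[1] if len(sys.argv) > 1 else "http://0.0.0.0:5000/execute"

try:
    urllib.request.urlopen(url, timeout=5)
    print(f"{url} is reachable")
except urllib.error.HTTPError as err:
    # Any HTTP response (even 404/405) means the service is up.
    print(f"{url} is reachable (service answered with HTTP {err.code})")
except (urllib.error.URLError, OSError) as err:
    print(f"{url} is not reachable: {err}")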
