SepSpider: Spider for UCAS SEP platform

SepSpider is a spider based on Scrapy and Selenium that automatically crawls and updates UCAS course resources. It is also suitable for server deployment.

Requirements

Linux (Ubuntu recommended)
Python3
- scrapy
- pillow
- selenium
Firefox

Usage

Set up your account info in sep_spider/custom_setting.py:

# sep user info
SEP_USER = 'your sep account (email)'
SEP_PASSWD = 'your password'

# The spider uses Yundama for captcha recognition.
# If you don't have an account, create a User account (NOT a Developer one) at:
# http://www.yundama.com/
# and add some credit to it.

# yundama user info
YDM_INFO = {
    'username': 'yundama username',
    'password': 'yundama password',
    'appid': 8741,
    'appkey': 'c2f7b93e58d4721079da3822c7aad5d4'
}

# where to store your files
CUSTOM_FILES_STORE = 'sep/'

# touch reload path used by listener
RELOAD_PATH = './reload'

A Yundama account is needed for captcha recognition. Create a User (NOT Developer) account at http://www.yundama.com/ and add a small amount of credit to it. 5 RMB is enough for a year's use.
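
Before the first crawl, it can help to confirm that the template values in custom_setting.py were actually replaced. The following is a minimal sketch of such a check. The helper find_unset_settings is hypothetical and not part of the repository; it only assumes the setting names and placeholder strings shown in the template above.

```python
# Hypothetical helper, not part of sep_spider: flags settings that are
# missing, empty, or still set to the placeholder text from the template.
PLACEHOLDERS = {
    "your sep account (email)",
    "your password",
    "yundama username",
    "yundama password",
}


def _is_unset(value):
    """True if the value is missing, empty, or a template placeholder."""
    return not value or value in PLACEHOLDERS


def find_unset_settings(settings):
    """Return the names of settings that still need to be filled in."""
    problems = []
    for name in ("SEP_USER", "SEP_PASSWD"):
        if _is_unset(settings.get(name)):
            problems.append(name)
    ydm_info = settings.get("YDM_INFO") or {}
    for key in ("username", "password"):
        if _is_unset(ydm_info.get(key)):
            problems.append("YDM_INFO[%r]" % key)
    return problems
```

One way to use it would be to pass in a dictionary built from the module's attributes and stop with an error message if the returned list is not empty.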

Run the spider

cd sep_spider
scrapy crawl sep_spider

The default location for downloaded files is ./sep. It can be changed via CUSTOM_FILES_STORE in sep_spider/custom_setting.py.

Listener usage

For server deployment, you can run listener.py, which watches the reload file for changes and treats each change as a signal to recrawl. Recrawling downloads only files that do not already exist locally; existing files are not modified.
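
The "download only what is missing" behaviour can be illustrated with a short sketch. This is not the spider's actual code; the function files_to_download and its arguments are assumptions made for illustration, and the check is a plain existence test on the local path.

```python
import os


def files_to_download(store_dir, remote_paths):
    """Return the relative paths that have no local copy under store_dir.

    Hypothetical helper: existing files are never touched, they are
    simply left out of the result.
    """
    missing = []
    for rel_path in remote_paths:
        local_path = os.path.join(store_dir, rel_path)
        if not os.path.exists(local_path):
            missing.append(rel_path)
    return missing
```

With this kind of check, a file that was edited locally stays as it is, and a file that was deleted locally is fetched again on the next recrawl.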

Set up your environment and account info (see above)
Run listener.py

python listener.py

Every time you modify the reload file, the spider recrawls the resources. The location of the reload file can be customized via RELOAD_PATH in sep_spider/custom_setting.py.

touch reload
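
To make the reload mechanism concrete, here is a minimal sketch of how a listener of this kind could work: it polls the modification time of the reload file and starts a crawl when the time changes. This is an illustration under stated assumptions, not the implementation in listener.py. The names ReloadWatcher, recrawl, and run_listener are hypothetical, and the sketch assumes it is started from the directory that contains scrapy.cfg.

```python
import os
import subprocess
import time


def read_mtime(path):
    """Return the file's modification time in nanoseconds, or None if absent."""
    try:
        return os.stat(path).st_mtime_ns
    except FileNotFoundError:
        return None


class ReloadWatcher:
    """Remembers the last seen mtime of a file and reports changes."""

    def __init__(self, path):
        self.path = path
        self.last_seen = read_mtime(path)

    def poll(self):
        """Return True if the file exists and its mtime differs from the last poll."""
        current = read_mtime(self.path)
        changed = current is not None and current != self.last_seen
        self.last_seen = current
        return changed


def recrawl():
    """Run the same command shown in the Usage section (assumed working directory)."""
    subprocess.run(["scrapy", "crawl", "sep_spider"], check=False)


def run_listener(path="./reload", poll_interval=1.0):
    """Poll forever; trigger a recrawl each time the reload file is touched."""
    watcher = ReloadWatcher(path)
    while True:
        time.sleep(poll_interval)
        if watcher.poll():
            recrawl()
```

Calling run_listener() starts the loop. Because touch updates the file's modification time (and creates the file if it is missing), running touch reload is enough to trigger one recrawl in this sketch.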
