
Using the Scrapy Shell (Part 3)

1. Basic Scrapy commands

Scrapy shell commands

Listing all Scrapy commands
(venv) ╭─lzq@localhost.localdomain ~/PycharmProjects/python_reptiles/teach/scrapy/basic/example  
╰─➤ scrapy --help
Scrapy 1.7.1 - project: example

Usage:
scrapy <command> [options] [args]

Available commands:
bench Run quick benchmark test
check Check spider contracts
crawl Run a spider
edit Edit spider
fetch Fetch a URL using the Scrapy downloader
genspider Generate new spider using pre-defined templates
list List available spiders
parse Parse URL (using its spider) and print the results
runspider Run a self-contained spider (without creating a project)
settings Get settings values
shell Interactive scraping console
startproject Create new project
version Print Scrapy version
view Open URL in browser, as seen by Scrapy

Use "scrapy <command> -h" to see more info about a command

As the output above shows, these are the commands Scrapy provides when run inside a project.

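Note: run scrapy --help inside a project created by Scrapy. Outside a project, only a subset of the commands is shown (project-only commands such as crawl, check, list, edit, and parse are missing).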
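To make "inside a project" concrete: as far as I know, Scrapy treats the current directory as part of a project when it finds a scrapy.cfg file there or in one of the parent directories. The Python sketch below imitates that lookup under this assumption. find_scrapy_cfg is a hypothetical helper name used only for illustration, not a Scrapy API.

```python
from pathlib import Path


def find_scrapy_cfg(start="."):
    """Return the nearest scrapy.cfg at or above `start`, or None.

    Hypothetical helper: it only imitates the project lookup described
    above and does not call into Scrapy itself.
    """
    current = Path(start).resolve()
    # Check the starting directory first, then each parent up to the root.
    for directory in (current, *current.parents):
        candidate = directory / "scrapy.cfg"
        if candidate.is_file():
            return candidate
    return None
```

Calling find_scrapy_cfg() from the directory where you plan to run scrapy --help gives a quick hint about whether the full command list is likely to appear.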

Detailed usage of each command

As the help output indicates, appending -h to any command shows its detailed usage:

scrapy bench -h         
scrapy check -h
scrapy crawl -h
scrapy edit -h
scrapy fetch -h
scrapy genspider -h
scrapy list -h
scrapy parse -h
scrapy runspider -h
scrapy settings -h
scrapy shell -h
scrapy startproject -h
scrapy version -h
scrapy view -h
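If you would rather read all of the help pages in one go, the loop can be scripted. This is only a sketch: it assumes the scrapy executable is on PATH, and COMMANDS, help_argv, and show_all_help are names made up for this example. The command list is copied from the scrapy --help output above.

```python
import shutil
import subprocess

# Command names copied from the `scrapy --help` output shown earlier.
COMMANDS = [
    "bench", "check", "crawl", "edit", "fetch", "genspider", "list",
    "parse", "runspider", "settings", "shell", "startproject",
    "version", "view",
]


def help_argv(command):
    """Build the argument list for `scrapy <command> -h`."""
    return ["scrapy", command, "-h"]


def show_all_help():
    """Print the help text of every command, if scrapy is installed."""
    if shutil.which("scrapy") is None:
        print("scrapy was not found on PATH; nothing to run")
        return
    for command in COMMANDS:
        # check=False: keep going even if one command exits non-zero,
        # for example a project-only command run outside a project.
        subprocess.run(help_argv(command), check=False)
```

Call show_all_help() from inside a project directory so that the project-only commands also respond.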

scrapy version

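Shows the Scrapy version. With -v it also prints the versions of Python, the main dependencies, and the platform: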

scrapy version -v

(venv) ╭─lzq@localhost.localdomain ~/PycharmProjects/python_reptiles  
╰─➤ scrapy version -v
Scrapy : 1.7.1
lxml : 4.3.4.0
libxml2 : 2.9.9
cssselect : 1.0.3
parsel : 1.5.1
w3lib : 1.20.0
Twisted : 16.6.0
Python : 3.6.1 (default, Jun 20 2019, 10:32:22) - [GCC 6.1.0]
pyOpenSSL : 18.0.0 (OpenSSL 1.1.0i 14 Aug 2018)
cryptography : 2.3.1
Platform : Linux-3.10.0-957.5.1.el7.x86_64-x86_64-with-centos-7.6.1810-Core
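A similar report can be pulled together from Python itself, which is handy in a script or a bug report. The sketch below is not how scrapy version -v works internally. It assumes Python 3.8 or newer (for importlib.metadata, so it would not run on the 3.6.1 interpreter shown above), and installed_versions and report are hypothetical names. It only covers Python packages, so C libraries such as libxml2 are not listed.

```python
import platform
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in this environment.
            found[name] = None
    return found


def report(packages=("Scrapy", "lxml", "cssselect", "parsel", "w3lib",
                     "Twisted", "pyOpenSSL", "cryptography")):
    """Return a multi-line version report in the style of the output above."""
    lines = [
        f"{name:<12} : {ver or 'not installed'}"
        for name, ver in installed_versions(packages).items()
    ]
    lines.append(f"{'Python':<12} : {sys.version.split()[0]}")
    lines.append(f"{'Platform':<12} : {platform.platform()}")
    return "\n".join(lines)
```

print(report()) lists whatever is installed in the active virtual environment and marks missing packages as not installed.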

scrapy startproject <project name>

Creates a new Scrapy project.

scrapy startproject picture

(venv) ╭─lzq@localhost.localdomain ~/PycharmProjects/python_reptiles/teach/scrapy/basic  
╰─➤ scrapy startproject picture
New Scrapy project 'picture', using template directory '/home/lzq/PycharmProjects/python_reptiles/venv/lib/python3.6/site-packages/scrapy/templates/project', created in:
/home/lzq/PycharmProjects/python_reptiles/teach/scrapy/basic/picture

You can start your first spider with:
cd picture
scrapy genspider example example.com

scrapy genspider <spider name> <domain>

Generates a spider.

Run the following command inside the project you created:

(venv) ╭─lzq@localhost.localdomain ~/PycharmProjects/python_reptiles/teach/scrapy/basic  
╰─➤ scrapy genspider jd_spider jd.com

The general form is: scrapy genspider <spider name> <domain>

scrapy list

Lists the spiders in the project.

(venv) ╭─lzq@localhost.localdomain ~/PycharmProjects/python_reptiles/teach/scrapy/basic/picture  
╰─➤ scrapy list
jd_spider

scrapy edit <spider name>

Opens a spider for editing.

(venv) ╭─lzq@localhost.localdomain ~/PycharmProjects/python_reptiles/teach/scrapy/basic/picture  
╰─➤ scrapy edit jd_spider

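This command is of little practical use: it simply opens the spider file in the editor set by the EDITOR environment variable (or the EDITOR setting), much like opening the file with vim yourself.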

scrapy fetch <URL>

scrapy fetch http://jd.com

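This command downloads the page source at the given URL using the Scrapy downloader. The result is much the same as what you would get from urllib or wget.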
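For comparison, here is roughly what the urllib route looks like. It is a minimal sketch rather than an equivalent of scrapy fetch: it skips Scrapy's settings, middlewares, and retry handling. fetch_page and the User-Agent string are assumptions made for this example.

```python
from urllib.request import Request, urlopen


def fetch_page(url, timeout=10.0):
    """Download `url` and return the body decoded as text.

    Hypothetical helper for illustration only.
    """
    # Some sites reject the default urllib User-Agent, so send our own.
    request = Request(url, headers={"User-Agent": "example-fetch/0.1"})
    with urlopen(request, timeout=timeout) as response:
        # Use the charset declared by the server, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

fetch_page("http://jd.com") would return the page source as a string, whereas scrapy fetch prints it to standard output.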

scrapy crawl <spider name>

This command runs a spider.

scrapy crawl jd_spider

scrapy runspider <file>

Runs a self-contained spider file without creating a project.

scrapy runspider stackoverflow_spider.py

scrapy shell <URL>

scrapy shell http://jd.com