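Why isn't my Scrapy spider scraping any data?
I'd like to ask for some help. I'm writing a crawler, and the first step is to scrape the codes and names of all the stocks. The URL is http://app.finance.ifeng.com/list/stock.php?t=ha&f=symbol&o=asc&p=1
My items.py looks like this: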
```python
import scrapy


class NameItem(scrapy.Item):
    code = scrapy.Field()
    name = scrapy.Field()
```
My spider script looks like this:
```python
from scrapy.spider import BaseSpider
from Stock.items import NameItem
from scrapy.selector import Selector
from scrapy.http import Request


class StockNameSpider(BaseSpider):
    name = "stock_name"
    allowed_domains = ["http://app.finance.ifeng.com"]
    start_urls = ["http://app.finance.ifeng.com/list/stock.php?t=ha"]

    def parse(self, response):
        sel = Selector(response)
        links = sel.xpath('//*[@class= "tab01"]/table/tbody/tr')
        for link in links:
            code = link.xpath('td[1]/a/text()').extract()
            name = link.xpath('td[2]/a/text()').extract()
            nameitem = NameItem()
            nameitem['code'] = code[0] if code else None
            nameitem['name'] = name[0] if name else None
            yield nameitem
```
The XPath is not wrong; I have already tested it in the Shell.
No errors were reported during the run.
Here is the run log:
2015-02-19 20:22:49+0800 [scrapy] INFO: Scrapy 0.24.4 started (bot: Stock)
2015-02-19 20:22:49+0800 [scrapy] INFO: Optional features available: ssl, http11, boto
2015-02-19 20:22:49+0800 [scrapy] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'Stock.spiders', 'SPIDER_MODULES': ['Stock.spiders'], 'LOG_FILE': 'test.log', 'BOT_NAME': 'Stock'}
2015-02-19 20:22:50+0800 [scrapy] INFO: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
2015-02-19 20:22:51+0800 [scrapy] INFO: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2015-02-19 20:22:51+0800 [scrapy] INFO: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2015-02-19 20:22:51+0800 [scrapy] INFO: Enabled item pipelines:
2015-02-19 20:22:51+0800 [stock_name] INFO: Spider opened
2015-02-19 20:22:51+0800 [stock_name] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2015-02-19 20:22:51+0800 [scrapy] DEBUG: Telnet console listening on 127.0.0.1:6023
2015-02-19 20:22:51+0800 [scrapy] DEBUG: Web service listening on 127.0.0.1:6080
2015-02-19 20:22:51+0800 [stock_name] DEBUG: Crawled (200) <GET http://app.finance.ifeng.com/list/stock.php?t=ha> (referer: None)
2015-02-19 20:22:51+0800 [stock_name] INFO: Closing spider (finished)
2015-02-19 20:22:51+0800 [stock_name] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 239,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 11784,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2015, 2, 19, 12, 22, 51, 897000),
'log_count/DEBUG': 3,
'log_count/INFO': 7,
'response_received_count': 1,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'start_time': datetime.datetime(2015, 2, 19, 12, 22, 51, 352000)}
2015-02-19 20:22:51+0800 [stock_name] INFO: Spider closed (finished)
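But in the end no data was scraped at all.
Can anyone tell me why this happens? I'm a beginner and I'm waiting online for replies. Thank you very much.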
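
One thing that may be worth ruling out (an assumption, not something the log confirms): the log shows the page was fetched with status 200 but zero items were scraped, which is consistent with the row XPath matching nothing. Browsers insert `<tbody>` elements into the DOM even when the raw HTML has none, so an XPath that works in the browser's inspector can return an empty list on the response that Scrapy downloads. The sketch below uses made-up sample markup (not the real page) and the standard library's `xml.etree.ElementTree` as a stand-in for Scrapy's `Selector`, just to show the effect:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, NOT the real ifeng page:
# a table whose raw HTML contains no <tbody> element.
RAW_HTML = """
<html><body>
  <div class="tab01">
    <table>
      <tr><td><a href="#">600000</a></td><td><a href="#">Sample Bank</a></td></tr>
      <tr><td><a href="#">600004</a></td><td><a href="#">Sample Airport</a></td></tr>
    </table>
  </div>
</body></html>
"""

root = ET.fromstring(RAW_HTML)

# XPath as written in the spider: it requires a <tbody> step.
rows_with_tbody = root.findall(".//*[@class='tab01']/table/tbody/tr")

# Same XPath with the <tbody> step removed.
rows_without_tbody = root.findall(".//*[@class='tab01']/table/tr")

# Extract the code and name from each row that was actually matched.
codes = [row.findtext("td[1]/a") for row in rows_without_tbody]
names = [row.findtext("td[2]/a") for row in rows_without_tbody]

print(len(rows_with_tbody), len(rows_without_tbody))
print(codes, names)
```

If the real page behaves like this sample, the loop in `parse` never runs, so nothing is yielded and no error is raised. The same comparison can be run in `scrapy shell` against the downloaded response, trying the row XPath with and without `/tbody`.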