Scrapy uses a signaling mechanism, described in its documentation, to notify various entities when something of interest happens, so that callbacks can be triggered and deferred calls processed. Signaling is also exposed through Scrapy's core API for use in extensions and middleware. The current mechanism is based on the pydispatcher library, which serves the purpose well but often tends to be slower than the actual HTML parsing and can bottleneck spiders. Django refactored its pydispatcher-based code in its 1.0 release, which they claim resulted in a speed improvement of up to 90% and also simplified the API. The task at hand is to bring the same efficiency to Scrapy while retaining as much backward API compatibility as possible and not breaking any user code.
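
To make the direction of such a refactor more concrete, here is a minimal sketch, assuming a design in which each signal object keeps its own list of receivers instead of routing every call through a global dispatcher registry. The `Signal` class, the `item_scraped` signal, and the `count_fields` receiver are hypothetical names used for illustration only; this is not Scrapy's, Django's, or pydispatcher's actual code, and it keeps strong references to receivers for simplicity.

```python
class Signal:
    """Hypothetical minimal signal that owns its receiver list."""

    def __init__(self):
        # Strong references, for simplicity (an assumption of this sketch).
        self._receivers = []

    def connect(self, receiver):
        # Register a receiver once; repeated connects are ignored.
        if receiver not in self._receivers:
            self._receivers.append(receiver)

    def disconnect(self, receiver):
        # Return True if the receiver was registered and has been removed.
        if receiver in self._receivers:
            self._receivers.remove(receiver)
            return True
        return False

    def send(self, sender, **named):
        # Call every receiver and collect (receiver, response) pairs.
        return [
            (receiver, receiver(signal=self, sender=sender, **named))
            for receiver in self._receivers
        ]


# Hypothetical signal and receiver, named only for this example.
item_scraped = Signal()


def count_fields(signal, sender, item, **kwargs):
    # Accepting **kwargs lets the signal grow new arguments later
    # without breaking existing receivers.
    return len(item)


item_scraped.connect(count_fields)
results = item_scraped.send(sender="demo-spider", item={"title": "example"})
```

Because dispatch here is a plain loop over a per-signal list, sending a signal involves no lookup keyed on sender and signal. Keeping the `connect`/`send` call shape close to what user code already relies on is one possible way to limit backward-compatibility breakage.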




  • Jakob de Maeyer
  • dangra
  • ptremberth