Python Page Spider Web Crawler Tutorial
The code for my tutorials is in my GitHub repository, along with more free code: http://github.com/creeveshft
I build a Python page spider algorithm using a stack and a queue. I push URLs onto a stack and pop them off to keep track of scheduled page requests. URLs are only ever pushed onto the history array, never popped, so that I visit every page only once.
This web crawler can be used for scraping articles or any other data.
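Here is a minimal sketch of the stack-and-history idea, not the exact code from the video. The names LinkParser, crawl, fetch, max_pages and the PAGES dictionary are all made up for illustration. PAGES stands in for real HTTP requests so the example runs without a network connection.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=50):
    """Visit pages reachable from start_url, each at most once.

    fetch is a hypothetical callable: it takes a URL and returns the
    page's HTML as a string, or None if the page cannot be loaded.
    """
    stack = [start_url]  # scheduled page requests
    history = []         # pages already visited; only ever appended to

    while stack and len(history) < max_pages:
        url = stack.pop()
        if url in history:
            continue
        history.append(url)

        html = fetch(url)
        if html is None:
            continue

        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if link not in history and link not in stack:
                stack.append(link)

    return history


# Assumed sample data: a tiny three-page site held in memory.
PAGES = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.com/a": '<a href="/">home</a> <a href="/b">B</a>',
    "http://example.com/b": '<a href="/a">A</a>',
}

visited = crawl("http://example.com/", PAGES.get)
print(visited)
```

Because stack.pop() takes the most recently added URL, this version goes depth first. Swapping the list for a collections.deque and calling popleft() would turn it into a queue and crawl breadth first. To crawl real pages, fetch could be a small wrapper around urllib.request.urlopen.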
In the future we will be using the meta tags to come up with new related search terms for our spider algorithm. We will need to use mechanize for this feature.
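As a rough idea of what reading the meta tags could look like, here is a sketch that pulls the keywords meta tag out of a page. It uses the standard library html.parser instead of mechanize, and the names MetaKeywordParser, extract_keywords and the sample HTML are assumptions for illustration only.

```python
from html.parser import HTMLParser


class MetaKeywordParser(HTMLParser):
    """Collects the comma-separated terms from <meta name="keywords">."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() == "keywords":
            content = attrs.get("content") or ""
            # Split on commas and drop empty entries
            self.keywords.extend(
                term.strip() for term in content.split(",") if term.strip()
            )


def extract_keywords(html):
    """Return the list of keyword terms found in the page's meta tags."""
    parser = MetaKeywordParser()
    parser.feed(html)
    return parser.keywords


# Assumed sample page, not taken from a real site.
sample = (
    "<html><head>"
    '<meta name="keywords" content="python, web crawler, scraping">'
    "</head></html>"
)

terms = extract_keywords(sample)
print(terms)
```

Each term returned could then be used as a new search term for the spider.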
Sorry if this tutorial was confusing.
Learn about a stack and a queue in order to understand what I am doing in this tutorial.
To see my data feeds and other software products for sale or lease, visit my website.
Follow me on Twitter: http://twitter.com/cjreeves2011
The web scraping news system is located here
For consulting work over $50,000, or for comments and suggestions, email email@example.com
Read my personal blog: http://blog.christopherreevesofficial.com