Beautiful Soup

Beautiful Soup is a Python library that is very handy for projects like screen scraping. Here’s a brief tutorial on how to scrape the list of the top 250 movies from IMDB and write the titles to a local text file:

1) Download Beautiful Soup

Downloading Beautiful Soup is very easy. I’m currently using version 3, so I simply downloaded the tarball and copied it to my Python project folder.

2) Copy IMDB Top 250 Movies Web Page Locally

Since my Python application is not sending an HTTP user-agent, any requests that it sends to IMDB are rejected. I’ll probably fix this at some point, but for now the easiest solution was to save a copy of the Top 250 Movies web page to my local hard drive, e.g. as imdb250.htm.
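For anyone who would rather fix the user-agent problem than save the page by hand, here is a minimal sketch using only the standard library. It assumes Python 3, and the URL and User-Agent string are placeholders I made up for illustration; I have not verified that this header alone is enough for IMDB to accept the request.

```python
import urllib.request

# Hypothetical values for illustration only: substitute the real page URL
# and a User-Agent string that honestly describes your client.
PAGE_URL = "https://www.example.com/top250"
USER_AGENT = "Mozilla/5.0 (compatible; my-scraper/0.1)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request object that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_to_file(url, path):
    """Download the page at `url` and save the raw bytes to `path`."""
    with urllib.request.urlopen(build_request(url)) as response:
        data = response.read()
    with open(path, "wb") as f:
        f.write(data)
```

Calling fetch_to_file(PAGE_URL, "imdb250.htm") would then stand in for the manual save, provided the site accepts the request.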

3) Write Python Code Using Beautiful Soup

4) Run Program

First, I check whether my output file, “imdb.txt”, exists. If it already exists, then I don’t need to do anything. If it doesn’t exist, then I open the local copy of the IMDB web page, “imdb250.htm”, in read mode.
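That first step can be sketched roughly as follows. This is an illustration rather than the original program: it assumes Python 3, the function name load_page_if_needed is hypothetical, and the encoding settings are a guess at what a saved web page needs.

```python
import os

OUTPUT_FILE = "imdb.txt"     # output file named in the text
INPUT_FILE = "imdb250.htm"   # local copy of the web page named in the text


def load_page_if_needed(input_path=INPUT_FILE, output_path=OUTPUT_FILE):
    """Return the saved page's HTML, or None if the output file already exists."""
    if os.path.exists(output_path):
        # Output is already there, so there is nothing to do.
        return None
    # Assumption: the saved page is UTF-8; undecodable bytes are replaced
    # rather than raising an error.
    with open(input_path, "r", encoding="utf-8", errors="replace") as f:
        return f.read()
```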

Next I instantiate the BeautifulSoup class with the HTML from that web page. I then use BeautifulSoup to find any HTML tables in the page, and within them any <a> tags (which I know are links to the movie pages).
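Here is a small sketch of that parsing step. Note the assumptions: it uses the newer Beautiful Soup 4 package (imported from bs4) rather than version 3, whose import was `from BeautifulSoup import BeautifulSoup`, and the function name extract_link_texts is hypothetical. It also collects every link inside any table, so on a real page some filtering may still be needed.

```python
from bs4 import BeautifulSoup  # assumption: Beautiful Soup 4, not version 3


def extract_link_texts(html):
    """Return the text of every <a> tag that sits inside an HTML table."""
    # "html.parser" is the parser bundled with Python, so no extra install.
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for link in soup.find_all("a"):
        # Keep only links nested somewhere inside a <table>; checking the
        # parent avoids counting a link twice when tables are nested.
        if link.find_parent("table") is None:
            continue
        text = link.get_text(strip=True)
        if text:
            titles.append(text)
    return titles
```

Links outside any table, such as navigation links, are skipped, which mirrors the "tables first, then links" approach described above.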

Then I open my output file, “imdb.txt”, in write mode and write the string value of each link, i.e. the title of each movie, to that text file. Then I close the file and we’re done.
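The writing step might look something like this. It is a sketch under assumptions: Python 3, one title per line, and a hypothetical function name write_titles.

```python
def write_titles(titles, output_path="imdb.txt"):
    """Write each title on its own line to the output file."""
    # The `with` block closes the file automatically, even if a write fails.
    with open(output_path, "w", encoding="utf-8") as f:
        for title in titles:
            f.write(title + "\n")
```

Using a `with` block replaces the explicit close described above; the effect is the same.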

Whenever I need to re-run this, I just make another local copy of the IMDB web page and then delete the “imdb.txt” file.

Here is an example of the “imdb.txt” file created by this program.