Beautiful Soup is a Python library that is very handy for projects like screen scraping. Here's a brief tutorial on how to scrape a list of the top 250 movies from IMDB.com and write them to a local text file:

1) Download Beautiful Soup

Downloading Beautiful Soup is very easy. I'm currently using version 3, so I simply downloaded the tarball and copied BeautifulSoup.py to my Python project folder.

2) Copy IMDB Top 250 Movies Web Page Locally

Since my Python application does not send a browser-like HTTP User-Agent, any requests that it sends to IMDB.com are rejected. I'll probably fix this at some point, but for now the easiest solution was to save a copy of the Top 250 Movies web page to my local hard drive, e.g. imdb250.htm.

3) Write Python Code Using Beautiful Soup (imdb.py)

    from urllib import urlopen
    from BeautifulSoup import BeautifulSoup

    try:
        # If imdb.txt can be opened, it was generated on an earlier run.
        with open('imdb.txt') as f:
            pass
        print "File (imdb.txt) already exists."
    except IOError:
        print "Generating new file (imdb.txt)."
        try:
            # urlopen() also accepts a local path, so this reads the saved page.
            text = urlopen('imdb250.htm').read()
        except IOError:
            print "Target file (imdb250.htm) could not be found."
        else:
            soup = BeautifulSoup(text)
            table = soup.find('table')
            links = table.findAll('a')
            with open('imdb.txt', 'w') as f:
                for item in links:
                    # Links with no plain text (e.g. images) have item.string == None.
                    if item.string:
                        # Encode explicitly so non-ASCII titles can be written.
                        f.write(item.string.encode('utf-8') + '\n')

...
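The rejected-request problem mentioned in step 2 of the Beautiful Soup post comes down to the User-Agent header, so here is a minimal sketch of how a request could carry one explicitly. It uses the Python 3 standard library (`urllib.request`), whereas the post's own script is Python 2. The URL and the user-agent string are hypothetical placeholders that do not come from the post, and whether any particular site accepts such a request is not something this sketch can promise.

```python
# Python 3 sketch; the post's own script targets Python 2.
from urllib.request import Request, urlopen

# Hypothetical values for illustration only; neither appears in the post.
EXAMPLE_URL = "http://www.example.com/top250.htm"
EXAMPLE_USER_AGENT = "Mozilla/5.0 (compatible; imdb-scraper-example)"


def build_request(url, user_agent=EXAMPLE_USER_AGENT):
    """Return a Request object that carries an explicit User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=EXAMPLE_USER_AGENT, timeout=10):
    """Download the page body as bytes using the header-carrying request."""
    with urlopen(build_request(url, user_agent), timeout=timeout) as response:
        return response.read()


# Building the request does not touch the network; only fetch() does.
request = build_request(EXAMPLE_URL)
# urllib stores header names in capitalized form, hence "User-agent".
print(request.get_header("User-agent"))
```

The bytes returned by `fetch()` could then be handed to the parser in place of the locally saved copy of the page.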
These days you find wi-fi networks everywhere. This is good. And it's bad. It's great having the convenience of wi-fi at coffee shops, restaurants, hotels etc. But it's a pain when all your neighbors have wi-fi networks that keep interfering with each other.

I recently discovered a handy free utility for the Mac called iStumbler. It lists the visible wireless networks in your area and provides lots of useful information such as protocol, signal strength, frequency, MAC address etc. But probably the most useful information (if you, like me, are having problems with wi-fi networks interfering with each other) is the channel that each wi-fi network is using.

It's amazing how many of my neighbors' routers are using the same channel (which is probably the default setting they came with). I read somewhere that two wireless networks on the same channel share its bandwidth, so each gets roughly half. I'm not sure if that's true, but I've got 4 neighbors using the same channel, so their Internet connections must be really slow and certainly not anywhere close to what they're probably paying for with Comcast.

Anyway, I just found a free channel that wasn't being used by anyone, switched my router to it, restarted the router and I'm now getting double the download speed (around 10 Mbps). Not bad for an old G...
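The channel hunt described in the wi-fi post can be sketched in a few lines of Python: count how many visible networks sit on each channel, then list the channels nobody is using. The scan results below are made-up sample data rather than iStumbler output, the function names are hypothetical, and the candidate range of channels 1 to 11 is an assumption that depends on the band and region.

```python
from collections import Counter

# Hypothetical scan results as (network name, channel) pairs.
# This is invented sample data, not real scanner output.
SAMPLE_SCAN = [
    ("HomeNet", 6),
    ("Linksys", 6),
    ("NETGEAR", 6),
    ("default", 6),
    ("CoffeeShop", 1),
]


def channel_usage(networks):
    """Count how many networks are broadcasting on each channel."""
    return Counter(channel for _name, channel in networks)


def unused_channels(networks, candidates=range(1, 12)):
    """Return the candidate channels that no visible network is using."""
    used = channel_usage(networks)
    return [channel for channel in candidates if used[channel] == 0]


print(channel_usage(SAMPLE_SCAN).most_common(1))  # [(6, 4)]
print(unused_channels(SAMPLE_SCAN))               # [2, 3, 4, 5, 7, 8, 9, 10, 11]
```

An unused channel number is not automatically free of interference, since neighboring 2.4 GHz channels overlap, so the count is a starting point rather than a verdict.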
Python allows functions to have default values for arguments. These are used when the function is called without the argument. But where it gets really interesting is when you realize that in Python it's also possible to name the arguments and pass them in any order, which can be very useful.

    def shirt(styleID, size='Medium', color='White'):
        ...

For example, in the 'shirt' function above, styleID is a required argument (since it has no default value), but the other two arguments, size and color, are optional (since they do have default values assigned). There are several ways to call this function:

    shirt(1)

styleID gets a value of 1, and size and color are assigned their default values.

    shirt(1, "Large")

styleID gets a value of 1 and size gets a value of "Large", while color gets the default value.

    shirt(1, color="Black")

styleID gets a value of 1 and color gets a value of "Black", while size gets the default value.

    shirt(size="Small", styleID=2)

size gets a value of "Small" and styleID gets a value of 2, while color gets the default...
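To try the calls from the default-arguments post, here is a small runnable version of `shirt`. The post elides the function body, so the body below is a hypothetical stand-in that simply returns the three values, which makes it easy to see which argument ended up where.

```python
def shirt(styleID, size='Medium', color='White'):
    # Hypothetical body: the post does not show the real one.
    return (styleID, size, color)


# Positional only: both defaults are used.
print(shirt(1))                          # (1, 'Medium', 'White')

# Second positional argument fills size.
print(shirt(1, "Large"))                 # (1, 'Large', 'White')

# Keyword argument skips size and sets color.
print(shirt(1, color="Black"))           # (1, 'Medium', 'Black')

# All keywords, in a different order from the definition.
print(shirt(size="Small", styleID=2))    # (2, 'Small', 'White')

# Leaving out the required argument raises a TypeError.
try:
    shirt(size="Large")
except TypeError as error:
    print("TypeError: %s" % error)
```

Keyword arguments have to come after any positional ones in a call, so something like `shirt(size="Small", 2)` is rejected as a syntax error before the code even runs.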
Unlike other languages such as Ruby, Python has no built-in string method for reversing a string. Here are two possible ways to accomplish that:

1) The 'Long' Way

This approach uses a list comprehension (a compact for-loop) with the join method to reverse a string:

    def rev(s):
        return ''.join([s[i] for i in range(len(s)-1,-1,-1)])

    >>> print rev("abcde")
    edcba

2) The Simpler Way

This approach uses Python's 'slice notation' to accomplish the same result:

    def rev(s):
        return s[::-1]

    >>> print rev("abcde")
    edcba

I think I read somewhere that #1 (the loop) is faster, though in CPython slicing is generally the quicker of the two, and I do love its simplicity. It looks a little strange at first if you're not used to Python, but it really is pretty easy to use once you get the hang of...
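The speed question raised in the string-reversal post is easy to measure rather than guess, so here is a rough sketch using the standard `timeit` module. The names `rev_loop` and `rev_slice` are just relabelled copies of the two `rev` functions from the post, `rev_reversed` is a third variant that is not in the post, and the sample string and repeat count are arbitrary choices. The timings depend on the machine and Python version, so none are quoted here.

```python
import timeit


def rev_loop(s):
    # The 'long' way from the post: index backwards, then join.
    return ''.join([s[i] for i in range(len(s) - 1, -1, -1)])


def rev_slice(s):
    # The slice-notation way from the post.
    return s[::-1]


def rev_reversed(s):
    # Extra variant (not in the post): the built-in reversed() plus join.
    return ''.join(reversed(s))


sample = "abcde" * 200  # arbitrary 1000-character test string

# All three approaches must agree before their timings mean anything.
assert rev_loop(sample) == rev_slice(sample) == rev_reversed(sample)

for func in (rev_loop, rev_slice, rev_reversed):
    seconds = timeit.timeit(lambda: func(sample), number=2000)
    print("%-12s %.4f s" % (func.__name__, seconds))
```

Running it prints one line per function with the total time for 2000 calls, which is enough to see the relative ordering on your own setup.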
Hi. I'm Omer! I live in sunny Seattle. I usually write about technology and other stuff that I think is worth sharing with the world.