Javier's Blog

Mostly computers and other tech stuff...

Monday, September 11, 2006

1st Dive into Python

Getting the links out of an HTML document using Python's built-in regular expressions:

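from urllib import urlopen
import re

def links(url):
    socket = urlopen(url)
    # Strip newlines so that '.' in the pattern can match across line breaks.
    html = re.sub('\n', '', socket.read())
    # Capture the href value of every anchor tag, ignoring case.
    return re.findall('< a href="(.*?)">.*?< /a>', html, re.IGNORECASE)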

The regular expression might not show up as intended!
Remove the space after each less-than sign, so that the pattern starts with <a and ends with </a>.
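For anyone on Python 3, where `from urllib import urlopen` no longer works, here is a rough sketch of the same idea. It is not a drop-in replacement. The helper name `extract_links`, the UTF-8 decoding and the slightly looser pattern (which tolerates extra attributes around href) are my own assumptions. Instead of stripping newlines, it uses re.DOTALL so that '.' can match across line breaks.

```python
import re
from urllib.request import urlopen

# Matches <a ... href="..."> ... </a>; group 1 is the href value.
# Only double-quoted href attributes are handled (same as the original pattern).
LINK_RE = re.compile(r'<a\s[^>]*?href="([^"]*)"[^>]*>.*?</a>',
                     re.IGNORECASE | re.DOTALL)

def extract_links(html):
    """Return the href value of every matching anchor tag in html (hypothetical helper)."""
    return LINK_RE.findall(html)

def links(url):
    """Fetch url and return the links found in its HTML."""
    with urlopen(url) as response:
        # Assumption: the page is UTF-8; undecodable bytes are replaced.
        html = response.read().decode("utf-8", errors="replace")
    return extract_links(html)
```

Splitting the fetch from the matching makes it possible to try the pattern on a plain string without touching the network. A regular expression will still miss plenty of real-world markup (single-quoted or unquoted attributes, for instance), so the standard library's html.parser module is probably the safer route for anything serious.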

