Javier's Blog

Mostly computers and other tech stuff...

Monday, September 11, 2006

1st Dive into Python

Getting the links out of an HTML document using Python's built-in regular expressions:

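from urllib import urlopen  # Python 2; in Python 3 this is urllib.request.urlopen
import re

def links(url):
    # Fetch the page and strip newlines so a link can match across line breaks.
    socket = urlopen(url)
    html = re.sub('\n', '', socket.read())
    socket.close()
    # Capture the href value of every anchor tag, ignoring case.
    return re.findall('< a href="(.*?)">.*?< /a>', html, re.IGNORECASE)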



The regular expression might not show up as intended!
Remove the space after each less-than sign (before the a and before the /a).
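
To try the pattern without fetching anything, here is a small sketch in Python 3 that runs the same regular expression (spaces removed) over an HTML string held in memory. The names find_links, LINK_PATTERN and sample are made up for this example, and the sample HTML is invented. Note that the pattern only catches anchors where href is the first and only attribute and is double-quoted, so treat it as a quick hack rather than a real HTML parser.

```python
import re

# Same pattern as in the post, with the extra spaces removed.
LINK_PATTERN = re.compile(r'<a href="(.*?)">.*?</a>', re.IGNORECASE)

def find_links(html):
    """Return the href value of every simple <a href="..."> tag in html."""
    # Drop newlines so an anchor split across lines can still match.
    flattened = html.replace('\n', '')
    return LINK_PATTERN.findall(flattened)

# Invented sample input: one lowercase anchor, one uppercase anchor
# whose link text is split across two lines.
sample = '''<p><a href="http://example.com/">Example</a> and
<A HREF="/about.html">About
us</A></p>'''

print(find_links(sample))  # ['http://example.com/', '/about.html']
```

An anchor such as <a class="x" href="y"> is skipped, because the pattern expects href right after the a.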
