I'm writing a small site decorator to make my local airport's website work with standard HTML.

On my local computer, I use Python's mechanize and BeautifulSoup packages to scrape and parse the site's contents, and everything seems to work fine. I installed these packages via apt-get.

On my shared hosting account (at DreamHost), I downloaded the .tar.gz files, extracted the packages, renamed the directories (e.g., from BeautifulSoup-3.1.0.tar.gz to BeautifulSoup), and tried to run the same commands.

I got a strange error from BeautifulSoup. I'm not sure whether it's caused by the older version of Python on DreamHost, by the directory names, or by something else.

[sanjose]$ python

Python 2.4.4 (#2, Jan 24 2010, 11:50:13)

[GCC 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)] on linux2

Type "help", "copyright", "credits" or "license" for more information.

>>> from BeautifulSoup import BeautifulSoup

>>> import mechanize

>>> url='http://www.iaa.gov.il/Rashat/he-IL/Airports/BenGurion/informationForTravelers/OnlineFlights.aspx?flightsType=arr'

>>> br=mechanize.Browser()

>>> br.addheaders = [('User-agent', 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)')]

>>> r=br.open(url)

>>> html=r.read()

>>> type(html)

<type 'str'>

I did this to show that the input really is a string. Now let's run the command that works on my local computer:

>>> soup    =   BeautifulSoup.BeautifulSoup(html)

Traceback (most recent call last):

  File "<stdin>", line 1, in ?

  File "/home/adamatan/matan.title/natbug/BeautifulSoup/BeautifulSoup.py", line 1493, in __init__

    BeautifulStoneSoup.__init__(self, *args, **kwargs)

  File "/home/adamatan/matan.title/natbug/BeautifulSoup/BeautifulSoup.py", line 1224, in __init__

    self._feed(isHTML=isHTML)

  File "/home/adamatan/matan.title/natbug/BeautifulSoup/BeautifulSoup.py", line 1257, in _feed

    self.builder.feed(markup)

  File "/usr/lib/python2.4/HTMLParser.py", line 108, in feed

    self.goahead()

  File "/usr/lib/python2.4/HTMLParser.py", line 148, in goahead

    k = self.parse_starttag(i)

  File "/usr/lib/python2.4/HTMLParser.py", line 268, in parse_starttag

    self.handle_starttag(tag, attrs)

  File "/home/adamatan/matan.title/natbug/BeautifulSoup/BeautifulSoup.py", line 1011, in handle_starttag

    self.soup.unknown_starttag(name, attrs)

  File "/home/adamatan/matan.title/natbug/BeautifulSoup/BeautifulSoup.py", line 1408, in unknown_starttag

    tag = Tag(self, name, attrs, self.currentTag, self.previous)

  File "/home/adamatan/matan.title/natbug/BeautifulSoup/BeautifulSoup.py", line 525, in __init__

    self.attrs = map(convert, self.attrs)

  File "/home/adamatan/matan.title/natbug/BeautifulSoup/BeautifulSoup.py", line 524, in <lambda>

    val))

  File "/usr/lib/python2.4/sre.py", line 142, in sub

    return _compile(pattern, 0).sub(repl, string, count)

TypeError: expected string or buffer

Any ideas?

Adam

You are using BeautifulSoup version 3.1.0, which is for Python 3.x. Use a 3.0 version of BeautifulSoup for Python 2.x.
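If it helps, the rule above can be turned into a quick sanity check to run on each machine before parsing anything. This is only a rough sketch: `series_for_python` and `matches_interpreter` are made-up helper names, not part of BeautifulSoup, and the version string is passed in by hand (on a real install you would probably read it from the module's `__version__` attribute, assuming your copy defines one).

```python
import sys


def series_for_python(py_major):
    """Return the BeautifulSoup 3 series suggested for a Python major version.

    Encodes only the rule from the answer: 3.1.x for Python 3.x,
    3.0.x for Python 2.x. (Hypothetical helper, not a BeautifulSoup API.)
    """
    if py_major >= 3:
        return "3.1"
    return "3.0"


def matches_interpreter(bs_version, py_major=None):
    """Check whether a BeautifulSoup 3 version string fits the interpreter.

    bs_version is a dotted string such as "3.1.0"; py_major defaults to
    the major version of the interpreter running this code.
    """
    if py_major is None:
        py_major = sys.version_info[0]
    # Compare only the first two components, e.g. "3.1.0" -> "3.1".
    series = ".".join(bs_version.split(".")[:2])
    return series == series_for_python(py_major)


# The situation in the question: BeautifulSoup 3.1.0 on Python 2.4.
print(matches_interpreter("3.1.0", py_major=2))  # prints False
```

Running the same check on both the local machine and the shared host should show quickly whether the two environments ended up with different BeautifulSoup series.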