Selenium now wraps HtmlUnit, so you don't need to start a browser anymore. The new WebDriver API is very easy to use, too. The first example uses the HtmlUnit driver.
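A minimal sketch of what that looks like, assuming the Selenium 2 (WebDriver) jars are on the classpath; the URL and link text below are placeholders, not from any real site:

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.htmlunit.HtmlUnitDriver;

public class HtmlUnitDriverExample {
    public static void main(String[] args) {
        // HtmlUnitDriver is headless: no real browser window is opened.
        WebDriver driver = new HtmlUnitDriver();
        driver.get("http://www.example.com/");  // placeholder URL

        // Locate and interact with elements through the same WebDriver API
        // you would use with a real browser driver.
        WebElement link = driver.findElement(By.linkText("More information"));  // placeholder link text
        link.click();

        System.out.println(driver.getTitle());
        driver.quit();
    }
}
```

Swapping `HtmlUnitDriver` for `FirefoxDriver` later gives you the same script running in a real browser.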
It would be very difficult to code a solution that would work with any arbitrary site out there. Each navigation menu implementation can be quite unique. I've worked a great deal with scrapers, and, provided you know the site you wish to target, here is how I'd approach it.
If you're still having problems, or need to emulate form POSTs or Ajax, get Firefox and install the LiveHttpHeaders plugin. It lets you browse the site manually and capture the URLs being navigated, along with any cookies passed during your browsing. That is what your scraper bot needs to send in a request to get a valid response from the target web server(s). The plugin also captures any Ajax calls being made, and in many cases the same Ajax calls must be implemented in your scraper to get the responses you want.
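Once you've captured the headers and cookies, your bot has to replay them. A rough sketch using only the Java standard library; every URL and header value here is a placeholder for whatever LiveHttpHeaders actually showed you:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class ReplayCapturedRequest {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.example.com/members/page");  // placeholder URL
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();

        // Send the same headers you observed while browsing manually;
        // these values are placeholders for your captured ones.
        conn.setRequestProperty("User-Agent", "Mozilla/5.0 (X11; Linux x86_64)");
        conn.setRequestProperty("Cookie", "JSESSIONID=abc123; logged_in=true");
        conn.setRequestProperty("Referer", "http://www.example.com/login");

        // Read the response the way the browser would have received it.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```

If the cookies expire, you'll need to script the login request first and harvest the `Set-Cookie` response headers instead of hard-coding them.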
You can try the open-source screen scraper from Scrape.it.
Update: As of April 4th, 2013, the Scrape.it Screen Scraper is open source on GitHub.
@insin Watir is not IE only.
Personally, I'm most familiar with Selenium, which supports writing automation scripts in a good number of languages and has more mature tooling, such as the excellent Selenium IDE extension for Firefox, which can be used to write and run test cases and can export test scripts to many languages.
Using HtmlUnit is also a possibility.
HtmlUnit is a "GUI-Less browser for Java programs". It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc... just like you do in your "normal" browser.
It is typically used for testing purposes or to retrieve information from web sites.
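For example, a sketch of driving HtmlUnit directly, assuming the HtmlUnit jar is on the classpath; the URL, form name, and field names are placeholders for the real site's markup:

```java
import com.gargoylesoftware.htmlunit.WebClient;
import com.gargoylesoftware.htmlunit.html.HtmlForm;
import com.gargoylesoftware.htmlunit.html.HtmlPage;
import com.gargoylesoftware.htmlunit.html.HtmlSubmitInput;
import com.gargoylesoftware.htmlunit.html.HtmlTextInput;

public class HtmlUnitFormExample {
    public static void main(String[] args) throws Exception {
        try (WebClient webClient = new WebClient()) {
            // Invoke a page just like a browser would.
            HtmlPage page = webClient.getPage("http://www.example.com/");  // placeholder URL

            // Fill out a form and click its submit button; "search", "q",
            // and "go" are placeholder names, not from a real site.
            HtmlForm form = page.getFormByName("search");
            HtmlTextInput query = form.getInputByName("q");
            query.type("screen scraping");
            HtmlSubmitInput button = form.getInputByName("go");
            HtmlPage results = button.click();

            System.out.println(results.getTitleText());
        }
    }
}
```

Because there's no GUI, this runs fine on a headless server, which makes it a good fit for scheduled scraping jobs.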
Mozenda is a great tool to use as well.
I've been using Selenium for this and I find that it works great. Selenium runs in the browser and works with Firefox, WebKit, and IE. http://selenium.openqa.org/