JoBo

JoBo is described as "free software" and the project page on SourceForge says it is GPL, but there are no explicit licence terms in the project source code.

  • Several classes depend on the Apache log4j logging package, org.apache.log4j.*, released under the Apache Software License.
  • Two classes depend on the Apache Regular Expressions package, org.apache.regexp, released under the Apache Software License.
  • Two classes depend on the Castor XML framework packages, org.exolab.castor.mapping and org.exolab.castor.xml, which are released under a "BSD-like" licence; see the master licence.
  • Several classes depend on the org.w3c.dom package, which is released under the W3C® Software Notice and License.
  • Two classes depend on the JTidy package, org.w3c.tidy, which is released under the W3C® Software Notice and License.
  • Various classes depend on the packages javax.swing and javax.swing.table, which may not be implemented by GNU Classpath.

Initial review notes

JoBo looks a strong candidate because the Apache classes are only loosely coupled to the core code. The first issue would be removing the logging dependencies from the WebRobot, FormFiller, HtmlDocument and HttpTool classes; logging could instead be handled dynamically through an interface adapter. Secondly, the Apache regular expression handling in the RegExpRule and RegExpURLCheck classes would have to be switched to use the GNU RegExp package.
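As a rough illustration of the interface adapter idea, the sketch below shows one way the core classes could log through a small interface while the concrete adapter is loaded by name at run time. Every name here (LogAdapterSketch, RobotLog, NullLog, MemoryLog, load) is hypothetical and does not come from the JoBo source; a log4j-backed adapter would simply be another implementation of the same interface, kept outside the core code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: none of these names come from the JoBo source. */
final class LogAdapterSketch {

    /** The only logging type the core classes would need to reference. */
    public interface RobotLog {
        void debug(String message);
        void error(String message, Throwable cause);
    }

    /** Fallback that discards everything, so no logging library is required. */
    public static final class NullLog implements RobotLog {
        public void debug(String message) { }
        public void error(String message, Throwable cause) { }
    }

    /** Adapter that keeps messages in memory; stands in for a real back end. */
    public static final class MemoryLog implements RobotLog {
        public final List<String> lines = new ArrayList<>();

        public void debug(String message) {
            lines.add("DEBUG " + message);
        }

        public void error(String message, Throwable cause) {
            lines.add("ERROR " + message + ": " + cause);
        }
    }

    /** Loads an adapter by class name, falling back to NullLog if that fails. */
    public static RobotLog load(String adapterClassName) {
        try {
            Class<?> type = Class.forName(adapterClassName);
            return (RobotLog) type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException | LinkageError e) {
            // Adapter missing or unusable: carry on without logging.
            return new NullLog();
        }
    }
}
```

With this arrangement the spider classes would compile without any logging library on the class path, and the choice of back end becomes a configuration matter.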

One of the strengths of JoBo is that it is already part-integrated with JTidy and has an HttpDocManager interface for post-processing documents. The default document interface saves all content to individual files.
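To make the post-processing idea concrete, here is a minimal sketch of a document sink that writes each fetched document to its own file. DocSink and FileSink are hypothetical stand-ins: the actual method signatures of JoBo's HttpDocManager have not been checked for this note, and the mapping from URL to file name is an assumption.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical sketch; DocSink is a stand-in, not JoBo's HttpDocManager. */
final class DocSinkSketch {

    /** Receives each fetched document for post-processing or storage. */
    interface DocSink {
        void store(URI source, byte[] content) throws IOException;
    }

    /** Saves every document to its own file under a root directory. */
    static final class FileSink implements DocSink {
        private final Path root;

        FileSink(Path root) {
            this.root = root.toAbsolutePath().normalize();
        }

        /** Maps http://host/a/b.html to root/host/a/b.html. */
        Path target(URI source) throws IOException {
            String host = source.getHost();
            if (host == null) {
                throw new IOException("URI has no host: " + source);
            }
            String path = source.getPath();
            if (path == null || path.isEmpty()) {
                path = "/";
            }
            if (path.endsWith("/")) {
                path += "index.html"; // directory-style URL
            }
            Path hostDir = root.resolve(host);
            Path file = hostDir.resolve(path.substring(1)).normalize();
            if (!file.startsWith(hostDir)) {
                throw new IOException("Path escapes the host directory: " + source);
            }
            return file;
        }

        @Override
        public void store(URI source, byte[] content) throws IOException {
            Path file = target(source);
            Files.createDirectories(file.getParent());
            Files.write(file, content);
        }
    }
}
```

The sketch ignores query strings, so two URLs that differ only in their query would overwrite each other; a real implementation would need a policy for that. A sink that passes content through JTidy before storing it could implement the same interface.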

JoBo appears to have advanced HTTP support, including cookies and form handling, and respects the robots exclusion protocol. The rate of spidering can also be throttled to moderate the load on the origin servers.
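For reference, throttling of this kind can be as simple as enforcing a minimum interval between successive requests. The sketch below is a generic illustration under that assumption, not a description of how JoBo implements it; ThrottleSketch, reserve and await are hypothetical names.

```java
/** Hypothetical sketch of a politeness delay; not JoBo's own throttling code. */
final class ThrottleSketch {
    private final long minIntervalMillis;
    private long nextSlotMillis = 0;

    ThrottleSketch(long minIntervalMillis) {
        if (minIntervalMillis < 0) {
            throw new IllegalArgumentException("interval must not be negative");
        }
        this.minIntervalMillis = minIntervalMillis;
    }

    /**
     * Reserves the next request slot and returns how many milliseconds
     * the caller should wait before sending, given the current time.
     */
    synchronized long reserve(long nowMillis) {
        long start = Math.max(nowMillis, nextSlotMillis);
        nextSlotMillis = start + minIntervalMillis;
        return start - nowMillis;
    }

    /** Blocks until the reserved slot arrives. */
    void await() throws InterruptedException {
        long wait = reserve(System.currentTimeMillis());
        if (wait > 0) {
            Thread.sleep(wait);
        }
    }
}
```

A spider would call await() before each request, typically keeping one instance per origin host so that a slow limit on one server does not hold up requests to another.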

This document was last modified by Philip Shaw on 2004-11-04 01:51:16
Copyright MKDoc Ltd. and others.
The Free Documentation License http://www.gnu.org/copyleft/fdl.html