
Excluded spiders

These spiders have been excluded because their licence terms are not explicit or are not compatible with the GPL, or because the project is not sufficiently mature to be considered.


Arale is declared open source but has no explicit licence terms; it seems to be intended primarily for personal use.


This project is in its very early days and is not suitable.

Nutch is based on Apache Lucene, is released under the Apache Software License, and appears to have its own supplementary licence terms.
Oxyus is a search engine based on the Apache Lucene indexer and is released under a version of the Apache Software License; see the OpenSymphony Software License page for details.

Spider is designed to handle custom post-processing of data acquired through spidering. It is released under an OSI-approved licence, but it has significant dependencies on Apache software, so it is not suitable.

  • The class com.tempeststrings.spider.util.SpiderHostnameVerifier depends on the class.
  • Many classes depend on Apache packages released under the Apache Software License version 2.0:
    • The Avalon framework, org.apache.avalon.*.
    • The Command Line Interface (CLI) package, org.apache.commons.cli.
    • The Commons Digester package, org.apache.commons.digester.
    • The Commons HTTP Client package, org.apache.commons.httpclient.
    • The Commons Lang package, org.apache.commons.lang.
    • The Commons Logging package, org.apache.commons.logging.
    • The Excalibur package, org.apache.excalibur.*.
    • The Jakarta ORO package, org.apache.oro.*.
    • The Xerces XML parser packages, org.apache.xerces.* and org.apache.xml.*.
  • JUnit tests depend on the junit.framework package; see secondary dependencies on JUnit below.
  • The class com.tempeststrings.spider.manager.FeedManagerSpiderImpl depends on the class, which may not be fully implemented by GNU Classpath.
  • Two classes depend on the javax.servlet.http.Cookie class, which should be compatible with the GNU Servlet API.

WebSphinx is released under an "Apache-style" licence; see the master version for details.


This document was last modified by Philip Shaw on 2004-11-03 06:57:17
Copyright MKDoc Ltd. and others.
The Free Documentation License