Validator

Component description

The MKSearch validator component ensures that source documents are well-formed, valid XML; to that end it converts HTML documents to XHTML on the fly. The validator consists largely of JTidy, which checks and corrects common HTML markup errors, and a validating XML parser.
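
The two stages described above can be sketched roughly as follows. This is a minimal illustration, not the MKSearch implementation: `MarkupCleaner` is a hypothetical stand-in for the JTidy stage, `ValidatorPipelineSketch` is an invented class name, and the parser here only checks well-formedness with the JDK's SAX parser rather than validating against a DTD.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the JTidy stage: raw HTML bytes in, XHTML bytes out.
interface MarkupCleaner {
    byte[] toXhtml(byte[] html);
}

// Hypothetical class name; not part of MKSearch.
class ValidatorPipelineSketch {
    private final MarkupCleaner cleaner;

    ValidatorPipelineSketch(MarkupCleaner cleaner) {
        this.cleaner = cleaner;
    }

    // Returns true when the cleaned markup parses as well-formed XML.
    boolean accepts(String html) {
        byte[] xhtml = cleaner.toXhtml(html.getBytes(StandardCharsets.UTF_8));
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            // DefaultHandler rethrows fatal errors, so malformed input lands in the catch block.
            factory.newSAXParser().parse(new ByteArrayInputStream(xhtml), new DefaultHandler());
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return false;
        }
    }
}
```

For a quick experiment, an identity cleaner such as `new ValidatorPipelineSketch(html -> html)` passes the input straight to the parser; a real cleaner would repair the markup first.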

Beta 2 development plans

Integrated exception handling
The alpha version of MKSearch has only basic handling for HTML documents that JTidy cannot parse: JTidy's output stream is empty, the XML parser throws a SAXException when it reads that stream, and the document is not indexed. The beta validator will have to report such failures to the checker component so that any existing records for the problem document can be purged from the repository. The next release of JTidy is expected to provide a MessageListener interface that can be used to monitor the parse (see the JTidy upgrade item below).
Upgrade to release r8 of JTidy
A significant number of bugs have been reported against the current r7 release of JTidy, many of which are expected to be fixed in the next release. None is known to affect MKSearch, but system testing has been limited to date, so it would be better to work with a cleaner version.
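
The failure path planned under "Integrated exception handling" might look something like the sketch below. It assumes a hypothetical `ProblemReporter` callback standing in for the checker component and an invented `ParseFailureSketch` class. It does not use JTidy or its expected MessageListener interface, since that API is not described here; an empty byte array simply plays the part of JTidy's empty output stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical callback to the checker component, which would purge stale records.
interface ProblemReporter {
    void documentFailed(String uri, String reason);
}

// Hypothetical class name; not part of MKSearch.
class ParseFailureSketch {

    // Parses the cleaner's output; reports the document instead of silently skipping it.
    static boolean parseOrReport(String uri, byte[] tidyOutput, ProblemReporter reporter) {
        if (tidyOutput.length == 0) {
            // The cleaner gave up: report directly rather than waiting for a SAXException.
            reporter.documentFailed(uri, "cleaner produced no output");
            return false;
        }
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(tidyOutput), new DefaultHandler());
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            reporter.documentFailed(uri, String.valueOf(e.getMessage()));
            return false;
        }
    }
}
```

Checking for empty output before parsing is one possible design choice: it lets the report carry a specific reason instead of the parser's generic error for an empty stream.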

This document was last modified by Philip Shaw on 2005-08-04 08:30:43
Copyright MKDoc Ltd. and others.
The Free Documentation License http://www.gnu.org/copyleft/fdl.html