

A new kind of search engine

MKSearch is a research project to develop a metadata search engine. The system is composed of two linked components: an indexing Web crawler and a public query interface. The indexing component extracts Dublin Core metadata from Web documents and stores it in RDF format. The query interface matches documents in the index using an RDF query language and can return the results in a variety of formats, including standard HTML and a standing RSS feed.
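As a rough illustration of the indexing step, the sketch below turns Dublin Core name/value pairs for one document into RDF statements in N-Triples syntax. It is not taken from the MKSearch source: the class name `DublinCoreTriples`, the method names and the `DC.`-prefixed key convention are assumptions made for this example, and only simple, unqualified element names are handled.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of MKSearch: serialises Dublin Core
// name/value pairs for a single document as N-Triples statements.
class DublinCoreTriples {
    // Namespace of the Dublin Core Metadata Element Set, version 1.1.
    static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    // Escape the characters that may not appear raw in an N-Triples literal.
    public static String escape(String value) {
        return value.replace("\\", "\\\\")
                    .replace("\"", "\\\"")
                    .replace("\n", "\\n")
                    .replace("\r", "\\r");
    }

    // Emit one triple per "DC.<element>" entry; other entries are skipped.
    public static String toNTriples(String documentUrl, Map<String, String> metadata) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            String name = entry.getKey();
            if (!name.regionMatches(true, 0, "DC.", 0, 3)) {
                continue; // not a Dublin Core entry
            }
            String element = name.substring(3).toLowerCase(Locale.ROOT);
            if (element.isEmpty() || element.indexOf('.') >= 0) {
                continue; // assumption: qualified names such as DC.date.created are ignored
            }
            out.append('<').append(documentUrl).append("> <")
               .append(DC_NS).append(element).append("> \"")
               .append(escape(entry.getValue())).append("\" .\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("DC.title", "A new kind of search engine");
        metadata.put("DC.creator", "Philip Shaw");
        System.out.print(toNTriples("http://example.org/index.html", metadata));
    }
}
```

Statements in this form can be loaded into an RDF store and matched with an RDF query language. The example URL is a placeholder.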

Project status

The project has recently completed an alpha proof-of-concept stage. A set of prototype components has been developed, integrated and tested end to end using limited test data. The project is now entering a beta development phase, in which the prototype will be developed further to implement the system's full feature set.

The MKSearch system is being developed in the Java programming language and is licensed under the GNU General Public Licence. All software is compiled and tested using both the Sun and GNU Java compilers. All project source material is available through the public MKSearch Subversion repository.

System composition

The MKSearch system is built from several existing free software components. Further details and development plans are available in the Plans section.

JSpider is a Java Web crawler engine with pluggable interfaces for adding custom processing and content handling. MKSearch uses custom SAX-based content handlers to extract metadata from Web documents.
Sesame is a set of RDF processing and storage APIs and applications that includes RDF data query facilities. MKSearch uses Sesame to store indexed metadata in RDF format and to search the repository via the public query interface.
JTidy is a utility for correcting common HTML markup errors and is used to convert HTML documents to XHTML so they can be processed using SAX.
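To make the SAX-based extraction more concrete, here is a minimal sketch of a handler that collects Dublin Core `meta` elements from a well-formed XHTML document, such as one already cleaned up by a tool like JTidy. It uses only the standard JDK SAX API. The class `DublinCoreExtractor` and the `DC.` name prefix convention are assumptions for illustration and are not the project's actual handlers; the JSpider plug-in wiring is omitted.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler, not part of MKSearch: records every
// <meta name="DC.*" content="..."/> element seen during a SAX parse.
class DublinCoreExtractor extends DefaultHandler {
    private final Map<String, String> metadata = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // The default factory is not namespace-aware, so match on qName.
        if (!"meta".equals(qName)) {
            return;
        }
        String name = attrs.getValue("name");
        String content = attrs.getValue("content");
        // Assumption: Dublin Core entries use a case-insensitive "DC." prefix.
        if (name != null && content != null && name.regionMatches(true, 0, "DC.", 0, 3)) {
            metadata.put(name, content);
        }
    }

    // Parse a well-formed XHTML string and return the Dublin Core entries found.
    public static Map<String, String> extract(String xhtml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        DublinCoreExtractor handler = new DublinCoreExtractor();
        parser.parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.metadata;
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><head>"
                + "<meta name=\"DC.title\" content=\"Example page\"/>"
                + "<meta name=\"keywords\" content=\"not Dublin Core\"/>"
                + "</head><body/></html>";
        System.out.println(extract(page));
    }
}
```

A streaming handler like this never builds a full document tree, which suits a crawler that only needs a few header fields from each page. The input must be well-formed XML, which is why an HTML-to-XHTML conversion step comes first.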


This document was last modified by Philip Shaw on 2005-02-09 10:33:06
Copyright MKDoc Ltd. and others.
The Free Documentation License