A few elements in the interface are specific and need an explanation.
A udi (unique document identifier) identifies a document. Because of limitations inside the index engine, it is restricted in length (to 200 bytes), which is why a regular URI cannot be used. The structure and contents of the udi are defined by the application and opaque to the index engine. For example, the internal file system indexer uses the complete document path (file path + internal path), truncated to length, the suppressed part being replaced by a hash value.
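The truncate-and-hash scheme described above can be sketched as follows. This is only an illustration of the idea, not the code used by the Recoll file system indexer: the function name make_udi, the "|" separator and the choice of MD5 are all assumptions.

```python
import hashlib

UDI_MAX_BYTES = 200  # length limit stated in the text


def make_udi(file_path, ipath="", max_bytes=UDI_MAX_BYTES):
    """Build a udi from the full document path (hypothetical helper).

    Short paths are used as is. Long ones are truncated and the
    suppressed tail is replaced by a hash of that tail, so that two
    long paths sharing a prefix still get distinct identifiers.
    """
    raw = (file_path + "|" + ipath).encode("utf-8")
    if len(raw) <= max_bytes:
        return raw
    digest_len = 32  # length of an MD5 hex digest (assumed hash choice)
    keep = max_bytes - digest_len
    digest = hashlib.md5(raw[keep:]).hexdigest().encode("ascii")
    return raw[:keep] + digest


short_udi = make_udi("/home/me/a.txt")
long_udi = make_udi("/" + "x" * 500, "3")
```

Whatever the scheme, the application must be able to recompute the same udi for the same document on every indexing pass, which is why the sketch is purely deterministic.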
The ipath data value (set as a field in the Doc object) is stored, along with the URL, but not indexed by Recoll. Its contents are not interpreted, and its use is up to the application. For example, the Recoll internal file system indexer stores the part of the document access path internal to the container file (ipath in this case is a list of subdocument sequential numbers). url and ipath are returned in every search result and permit access to the original document.
The fields file inside
          the Recoll configuration defines which document fields are
          either "indexed" (searchable), "stored" (retrievable with
          search results), or both.
Data for an external indexer should be stored in a separate index, not the one for the Recoll internal file system indexer (except if the latter is not used at all). The reason is that the main document indexer's purge pass would remove all the other indexer's documents, as they were not seen during indexing. The main indexer's documents would also probably be a problem for the external indexer's purge operation.
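The purge problem can be illustrated with a toy model. The dictionary standing in for an index, the purge function and the sample identifiers are all hypothetical. They only mimic the behaviour described above, where a purge pass removes every document that was not seen during the indexing pass.

```python
def purge(index, seen_udis):
    """Remove every document whose udi was not seen during indexing."""
    for udi in list(index):
        if udi not in seen_udis:
            del index[udi]


# One index shared by two indexers (the situation to avoid).
shared_index = {
    "/home/me/notes.txt": "file system indexer",
    "/home/me/report.pdf": "file system indexer",
    "mail:42": "external indexer",
}

# The file system indexer only sees its own documents...
seen_by_fs_indexer = {"/home/me/notes.txt", "/home/me/report.pdf"}

# ...so its purge pass also deletes the external indexer's document.
purge(shared_index, seen_by_fs_indexer)
remaining = sorted(shared_index)
```

After the purge, only the two file system documents remain, which is the data loss that a separate index prevents.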
Recoll versions after 1.11 define a Python programming interface, both for searching and indexing. The indexing portion has seen little use, but the searching one is used in the Recoll Ubuntu Unity Lens and Recoll Web UI.
The API is inspired by the Python database API specification. There were two major changes in recent Recoll versions:
The recoll module became a package (with an internal recoll module) as of Recoll version 1.19, in order to add more functions. For existing code, this only changes the way the interface must be imported.
We will mostly describe the new API and package structure here. A paragraph at the end of this section will explain a few differences and ways to write code compatible with both versions.
The Python interface can be found in the source package,
          under python/recoll.
The python/recoll/ directory
	  contains the usual setup.py. After
	  configuring the main Recoll code, you can use the script to
	  build and install the Python module:
          
            cd recoll-xxx/python/recoll
            python setup.py build
            python setup.py install
          
The normal Recoll installer installs the Python API along with the main code.
When installing from a repository, and depending on the distribution, the Python API can sometimes be found in a separate package.
The recoll package contains two
          modules:
          
The recoll module contains
                functions and classes used to query (or update) the
                index.
The rclextract module contains
                functions and classes used to access document
                data.
The connect() function connects to one or several Recoll index(es) and returns a Db object.
confdir may specify a configuration directory. The usual defaults apply.
extra_dbs is a list of additional indexes (Xapian directories).
writable decides if we can index new data through this connection.
A Db object is created by a connect() call and holds a connection to a Recoll index.
Methods
Db object after this.
Db.query(): returns a Query object for this index.
Db.setAbstractParams(maxchars, contextwords): maxchars defines the maximum total size of the abstract. contextwords defines how many terms are shown around the keyword.
match_type can be either of wildcard, regexp or stem. Returns a list of terms expanded from the input expression.
              A Query object (equivalent to a
            cursor in the Python DB API) is created by
            a Db.query() call. It is used to
            execute index searches.
Methods
Sorts results by fieldname, in ascending or descending order. Must be called before executing the search.
execute(): starts a search for query_string, a Recoll search language string.
Fetches the next Doc objects in the current search results, and returns them as an array of the required size, which is by default the value of the arraysize data member.
fetchone(): fetches the next Doc object from the current search results.
scroll(): mode can be relative or absolute.
highlight(): ishtml can be set to indicate that the input text is HTML and that HTML special characters should not be escaped. methods, if set, should be an object with methods startMatch(i) and endMatch(), which will be called for each match and should return a begin and end tag.
Creates an abstract for doc (a Doc object) by selecting text around the match terms. If methods is set, will also perform highlighting. See the highlight method.
Iteration is supported, so that for doc in query: will work.
Data descriptors
scroll()). Starts at 0.
A Doc object contains index data for a given document. The data is extracted from the index when searching, or set by the indexer program when updating. The Doc object has many attributes to be read or set by its user. It matches exactly the Rcl::Doc C++ object. Some of the attributes are predefined, but, especially when indexing, others can be set, the names of which will be processed as field names by the indexing configuration. Inputs can be specified as Unicode or strings. Outputs are Unicode objects. All dates are specified as Unix timestamps, printed as strings. Please refer to the rcldb/rcldoc.h C++ file for a description of the predefined attributes.
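Since dates come back as Unix timestamps printed as strings, a caller will usually want to convert them before display. The following is a minimal sketch using only the standard library. format_doc_date is a hypothetical helper name and the sample value is made up, not taken from a real index.

```python
import time


def format_doc_date(stamp):
    """Convert a Unix timestamp given as a string into a UTC date string."""
    seconds = int(stamp)  # date values are strings, not numbers
    return time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(seconds))


formatted = format_doc_date("86400")  # made-up sample value
```

Using time.gmtime() keeps the result independent of the local time zone. time.localtime() could be substituted where local dates are wanted.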
At query time, only the fields that are defined as stored, either by default or in the fields configuration file, will be meaningful in the Doc object. In particular, this will not be the case for the document text. See the rclextract module for accessing document contents.
Methods
A SearchData object allows building a query by combining clauses, for execution by Query.executesd(). It can be used as a replacement for the query language approach. The interface is going to change a little, so no detailed doc for now...
Methods
Index queries do not provide document content (only a partial and imprecise reconstruction is performed to show the snippets text). In order to access the actual document data, the data extraction part of the indexing process must be performed (subdocument access and format translation). This is not trivial in general. The rclextract module currently provides a single class which can be used to access the data content for result documents.
Methods
An Extractor object is built from a Doc object, output from a query.
textextract(): extracts the document defined by ipath and returns a Doc object. The doc.text field has the document text converted to either text/plain or text/html according to doc.mimetype. The typical use would be as follows:

    qdoc = query.fetchone()
    extractor = rclextract.Extractor(qdoc)
    doc = extractor.textextract(qdoc.ipath)
    # use doc.text, e.g. for previewing

Similarly, to extract the document to a file with idoctofile():

    qdoc = query.fetchone()
    extractor = rclextract.Extractor(qdoc)
    filename = extractor.idoctofile(qdoc.ipath, qdoc.mimetype)
The following sample would query the index with a user
        language string. See the python/samples
          directory inside the Recoll source for other
          examples. The recollgui subdirectory
          has a very embryonic GUI which demonstrates the
          highlighting and data extraction functions.
#!/usr/bin/env python
from recoll import recoll

# Connect to the default index and set the snippet parameters
db = recoll.connect()
db.setAbstractParams(maxchars=80, contextwords=4)

# Run a search expressed in the Recoll query language
query = db.query()
nres = query.execute("some user question")
print "Result count: ", nres

# Show at most the first 5 results
if nres > 5:
    nres = 5
for i in range(nres):
    doc = query.fetchone()
    print "Result #%d" % (query.rownumber,)
    for k in ("title", "size"):
        print k, ":", getattr(doc, k).encode('utf-8')
    # Named 'abstract' to avoid shadowing the abs() builtin
    abstract = db.makeDocAbstract(doc, query).encode('utf-8')
    print abstract
    print
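The methods argument of the highlight method described earlier expects an object providing startMatch(i) and endMatch(). A minimal sketch of such an object follows. The class name, the tags it returns and the interpretation of i as a plain match index are assumptions made for illustration.

```python
class MatchTagger(object):
    """Hypothetical helper returning the tags to wrap around each match."""

    def startMatch(self, idx):
        # idx is whatever value the caller passes for this match;
        # here it is only used to build a CSS class name.
        return '<span class="match term%d">' % idx

    def endMatch(self):
        return '</span>'


# Stand-alone demonstration of what would be wrapped around one match.
tagger = MatchTagger()
wrapped = tagger.startMatch(0) + "question" + tagger.endMatch()

# With a live query, the object would be passed along the lines of:
#   text = query.highlight(text, methods=tagger)
```

Keeping the tag generation in a separate object lets the same search code produce output for different display toolkits.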
The following code fragments can be used to ensure that code can run with both the old and the new API (as long as it does not use the new abilities of the new API of course).
Adapting to the new package structure:
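try:
    # New API: recoll is a package containing the recoll and rclextract modules
    from recoll import recoll
    from recoll import rclextract
    hasextract = True
except ImportError:
    # Old API: recoll is a plain module, with no extraction support
    import recoll
    hasextract = False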
Adapting to the change of nature of the next Query member. The same test can be used to choose between using the scroll() method (new) and setting the next value (old).
rownum = query.next if isinstance(query.next, int) else query.rownumber
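The test can be tried without an index by using stand-in objects. The two classes below are hypothetical stubs that only mimic the relevant member of each API version. They are not part of Recoll.

```python
class OldStyleQuery(object):
    """Stub for the old API, where next is an integer data member."""
    next = 3


class NewStyleQuery(object):
    """Stub for the new API, where next is a method and rownumber a member."""
    rownumber = 7

    def next(self):
        raise StopIteration


def current_row(query):
    """Return the index of the next result for either API version."""
    nxt = getattr(query, "next", None)
    return nxt if isinstance(nxt, int) else query.rownumber


old_row = current_row(OldStyleQuery())
new_row = current_row(NewStyleQuery())
```

Wrapping the test in a function keeps the version check in one place instead of repeating it at every access.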