xapian-core 1.5.1
An indexed database of documents.
#include <database.h>
Public Member Functions | |
| void | add_database (const Database &other) |
| Add shards from another Database. | |
| size_t | size () const |
| Return number of shards in this Database object. | |
| Database () | |
| Construct a Database containing no shards. | |
| Database (std::string_view path, int flags=0) | |
| Open a Database. | |
| Database (int fd, int flags=0) | |
| Open a single-file Database. | |
| virtual | ~Database () |
| Destructor. | |
| Database (const Database &o) | |
| Copy constructor. | |
| Database & | operator= (const Database &o) |
| Assignment operator. | |
| Database (Database &&o) | |
| Move constructor. | |
| Database & | operator= (Database &&o) |
| Move assignment operator. | |
| bool | reopen () |
| Reopen the database at the latest available revision. | |
| void | close () |
| Close the database. | |
| virtual std::string | get_description () const |
| Return a string describing this object. | |
| PostingIterator | postlist_begin (std::string_view term) const |
| Start iterating the postings of a term. | |
| PostingIterator | postlist_end (std::string_view) const noexcept |
| End iterator corresponding to postlist_begin(). | |
| TermIterator | termlist_begin (Xapian::docid did) const |
| Start iterating the terms in a document. | |
| TermIterator | termlist_end (Xapian::docid) const noexcept |
| End iterator corresponding to termlist_begin(). | |
| bool | has_positions () const |
| Does this database have any positional information? | |
| PositionIterator | positionlist_begin (Xapian::docid did, std::string_view term) const |
| Start iterating positions for a term in a document. | |
| PositionIterator | positionlist_end (Xapian::docid, std::string_view) const noexcept |
| End iterator corresponding to positionlist_begin(). | |
| TermIterator | allterms_begin (std::string_view prefix={}) const |
| Start iterating all terms in the database with a given prefix. | |
| TermIterator | allterms_end (std::string_view={}) const noexcept |
| End iterator corresponding to allterms_begin(prefix). | |
| Xapian::doccount | get_doccount () const |
| Get the number of documents in the database. | |
| Xapian::docid | get_lastdocid () const |
| Get the highest document id which has been used in the database. | |
| double | get_average_length () const |
| Get the mean document length in the database. | |
| double | get_avlength () const |
| Old name for get_average_length() for backward compatibility. | |
| Xapian::totallength | get_total_length () const |
| Get the total length of all the documents in the database. | |
| Xapian::doccount | get_termfreq (std::string_view term) const |
| Get the number of documents indexed by a specified term. | |
| bool | term_exists (std::string_view term) const |
| Test if a particular term is present in any document. | |
| Xapian::termcount | get_collection_freq (std::string_view term) const |
| Get the total number of occurrences of a specified term. | |
| Xapian::doccount | get_value_freq (Xapian::valueno slot) const |
| Return the frequency of a given value slot. | |
| std::string | get_value_lower_bound (Xapian::valueno slot) const |
| Get a lower bound on the values stored in the given value slot. | |
| std::string | get_value_upper_bound (Xapian::valueno slot) const |
| Get an upper bound on the values stored in the given value slot. | |
| Xapian::termcount | get_doclength_lower_bound () const |
| Get a lower bound on the length of a document in this DB. | |
| Xapian::termcount | get_doclength_upper_bound () const |
| Get an upper bound on the length of a document in this DB. | |
| Xapian::termcount | get_wdf_upper_bound (std::string_view term) const |
| Get an upper bound on the wdf of term term. | |
| Xapian::termcount | get_unique_terms_lower_bound () const |
| Get a lower bound on the unique terms size of a document in this DB. | |
| Xapian::termcount | get_unique_terms_upper_bound () const |
| Get an upper bound on the unique terms size of a document in this DB. | |
| ValueIterator | valuestream_begin (Xapian::valueno slot) const |
| Return an iterator over the value in slot slot for each document. | |
| ValueIterator | valuestream_end (Xapian::valueno) const noexcept |
| Return end iterator corresponding to valuestream_begin(). | |
| Xapian::termcount | get_doclength (Xapian::docid did) const |
| Get the length of a specified document. | |
| Xapian::termcount | get_unique_terms (Xapian::docid did) const |
| Get the number of unique terms in a specified document. | |
| Xapian::termcount | get_wdfdocmax (Xapian::docid did) const |
| Get the maximum wdf value in a specified document. | |
| void | keep_alive () |
| Send a keep-alive message. | |
| Xapian::Document | get_document (Xapian::docid did, unsigned flags=0) const |
| Get a document from the database. | |
| std::string | get_spelling_suggestion (std::string_view word, unsigned max_edit_distance=2) const |
| Suggest a spelling correction. | |
| Xapian::TermIterator | spellings_begin () const |
| An iterator which returns all the spelling correction targets. | |
| Xapian::TermIterator | spellings_end () const noexcept |
| End iterator corresponding to spellings_begin(). | |
| Xapian::TermIterator | synonyms_begin (std::string_view term) const |
| An iterator which returns all the synonyms for a given term. | |
| Xapian::TermIterator | synonyms_end (std::string_view) const noexcept |
| End iterator corresponding to synonyms_begin(term). | |
| Xapian::TermIterator | synonym_keys_begin (std::string_view prefix={}) const |
| An iterator which returns all terms which have synonyms. | |
| Xapian::TermIterator | synonym_keys_end (std::string_view={}) const noexcept |
| End iterator corresponding to synonym_keys_begin(prefix). | |
| std::string | get_metadata (std::string_view key) const |
| Get the user-specified metadata associated with a given key. | |
| Xapian::TermIterator | metadata_keys_begin (std::string_view prefix={}) const |
| An iterator which returns all user-specified metadata keys. | |
| Xapian::TermIterator | metadata_keys_end (std::string_view={}) const noexcept |
| End iterator corresponding to metadata_keys_begin(). | |
| std::string | get_uuid () const |
| Get the UUID for the database. | |
| bool | locked () const |
| Test if this database is currently locked for writing. | |
| Xapian::WritableDatabase | lock (int flags=0) |
| Lock a read-only database for writing. | |
| Xapian::Database | unlock () |
| Release a database write lock. | |
| Xapian::rev | get_revision () const |
| Get the revision of the database. | |
| void | compact (std::string_view output, unsigned flags=0, int block_size=0) |
| Produce a compact version of this database. | |
| void | compact (int fd, unsigned flags=0, int block_size=0) |
| Produce a compact version of this database. | |
| void | compact (std::string_view output, unsigned flags, int block_size, Xapian::Compactor &compactor) |
| Produce a compact version of this database. | |
| void | compact (int fd, unsigned flags, int block_size, Xapian::Compactor &compactor) |
| Produce a compact version of this database. | |
| std::string | reconstruct_text (Xapian::docid did, size_t length=0, std::string_view prefix={}, Xapian::termpos start_pos=0, Xapian::termpos end_pos=0) const |
| Reconstruct document text. | |
Static Public Member Functions | |
| static size_t | check (std::string_view path, int opts=0, std::ostream *out=NULL) |
| Check the integrity of a database or database table. | |
| static size_t | check (int fd, int opts=0, std::ostream *out=NULL) |
| Check the integrity of a single file database. | |
An indexed database of documents.
A Database object contains zero or more shards, and operations are performed across these shards.
To perform a search on a Database, you need to use an Enquire object.
Most methods can throw:
| Xapian::DatabaseCorruptError | if database corruption is detected |
| Xapian::DatabaseError | in various situations (for example, if there's an I/O error). |
| Xapian::DatabaseModifiedError | if the revision being read has been discarded |
| Xapian::DatabaseClosedError | may be thrown by some methods after close() has been called |
| Xapian::NetworkError | when remote databases are in use |
| Xapian::Database::Database | ( | ) |
Construct a Database containing no shards.
You can then add shards by calling add_database(). A Database containing no shards can also be useful in situations where you need an empty database.
| Xapian::Database::Database | ( | std::string_view | path, |
| int | flags = 0 ) |
explicit
Open a Database.
| path | Filing system path to open database from |
| flags | Bitwise-or of Xapian::DB_* constants |
The path can be a file (for a stub database or a single-file glass database) or a directory (for a standard glass database). If flags includes DB_BACKEND_INMEMORY then path is ignored.
| Xapian::DatabaseOpeningError | if the specified database cannot be opened |
| Xapian::DatabaseVersionError | if the specified database has a format too old or too new to be supported. |
| Xapian::Database::Database | ( | int | fd, |
| int | flags = 0 ) |
explicit
Open a single-file Database.
This method opens a single-file Database given a file descriptor open on it. Xapian starts reading at the current file offset, allowing a single-file database to be easily embedded within another file.
| fd | File descriptor for the file. Xapian takes ownership of this and will close it when the database is closed. |
| flags | Bitwise-or of Xapian::DB_* constants. |
| Xapian::DatabaseOpeningError | if the specified database cannot be opened |
| Xapian::DatabaseVersionError | if the specified database has a format too old or too new to be supported. |
| Xapian::Database::Database | ( | const Database & | o | ) |
Copy constructor.
| void Xapian::Database::add_database | ( | const Database & | other | ) |
inline
Add shards from another Database.
Any shards in other are appended to the list of shards in this object. The shards are reference counted and also remain in other.
| other | Another Database to add shards from |
| Xapian::InvalidArgumentError | if other is the same object as this. |
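The sharing semantics described above can be illustrated with a toy model. This is a minimal sketch using only the standard library, not Xapian's implementation: `ToyShard`, `ToyDatabase` and `shard_use_count()` are hypothetical names, and `std::invalid_argument` stands in for Xapian::InvalidArgumentError.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for one shard.
struct ToyShard {
    std::string name;
};

// Toy model of a database as a list of shared, reference-counted shards.
class ToyDatabase {
    std::vector<std::shared_ptr<ToyShard>> shards_;

  public:
    // A database containing no shards.
    ToyDatabase() = default;

    // A database containing a single shard.
    explicit ToyDatabase(const std::string& name) {
        shards_.push_back(std::make_shared<ToyShard>(ToyShard{name}));
    }

    // Append the shards of `other`; they stay in `other` as well.
    void add_database(const ToyDatabase& other) {
        if (&other == this)
            throw std::invalid_argument("can't add a database to itself");
        shards_.insert(shards_.end(), other.shards_.begin(), other.shards_.end());
    }

    std::size_t size() const { return shards_.size(); }

    // Number of owners of shard i (hypothetical helper for inspection).
    long shard_use_count(std::size_t i) const { return shards_.at(i).use_count(); }
};
```

After `combined.add_database(a)`, the shard is owned by both `a` and `combined`, which is the sense in which the shards "also remain in other".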
| TermIterator Xapian::Database::allterms_begin | ( | std::string_view | prefix = {} | ) | const |
Start iterating all terms in the database with a given prefix.
The terms are returned in ascending string order (by byte value).
| prefix | The prefix to restrict the returned terms to (default: iterate all terms) |
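To make "ascending string order (by byte value)" concrete, here is a small standard-library sketch that filters a list of terms by prefix and sorts them by byte value. It does not call Xapian; `terms_with_prefix()` is a hypothetical helper that only models the ordering and the prefix restriction.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Return the terms that start with `prefix`, sorted in ascending byte order.
// std::string's operator< compares characters as unsigned bytes, so plain
// std::sort yields byte-value ordering.
std::vector<std::string> terms_with_prefix(std::vector<std::string> terms,
                                           const std::string& prefix = std::string()) {
    terms.erase(std::remove_if(terms.begin(), terms.end(),
                               [&prefix](const std::string& t) {
                                   // Keep t only if its first bytes equal prefix.
                                   return t.compare(0, prefix.size(), prefix) != 0;
                               }),
                terms.end());
    std::sort(terms.begin(), terms.end());
    return terms;
}
```

One consequence of byte ordering is that upper-case ASCII terms sort before lower-case ones, and terms starting with multi-byte UTF-8 sequences sort after all ASCII terms.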
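| static size_t Xapian::Database::check | ( | int | fd, |
| int | opts = 0, | | |
| std::ostream * | out = NULL ) |
inline static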
Check the integrity of a single file database.
| fd | file descriptor for the database. The current file offset is used, allowing checking a single file database which is embedded within another file. Xapian takes ownership of the file descriptor and will close it before returning. |
| opts | Options to use for check |
| out | std::ostream to write output to (NULL for no output) |
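| static size_t Xapian::Database::check | ( | std::string_view | path, |
| int | opts = 0, | | |
| std::ostream * | out = NULL ) |
inline static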
Check the integrity of a database or database table.
| path | Path to database or table |
| opts | Options to use for check |
| out | std::ostream to write output to (NULL for no output) |
| void Xapian::Database::close | ( | ) |
Close the database.
This closes the database and closes all its file handles.
For a WritableDatabase, if a transaction is active it will be aborted, while if no transaction is active commit() will be implicitly called. Also the write lock is released.
Calling close() on an object cannot be undone - in particular, a subsequent call to reopen() on the same object will not reopen it, but will instead throw a Xapian::DatabaseClosedError exception.
Calling close() again on an object which has already been closed has no effect (and doesn't raise an exception).
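After close() has been called, calls to other methods of the database, and to methods of other objects associated with the database, will either behave exactly as they would have done if the database had not been closed (which can only happen when all the required data is already cached) or throw a Xapian::DatabaseClosedError exception.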
The reason for this behaviour is that otherwise we'd have to check that the database is still open on every method call on every object associated with a Database, when in many cases they are working on data which has already been loaded and so they are able to just behave correctly.
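| void Xapian::Database::compact | ( | int | fd, |
| unsigned | flags, | | |
| int | block_size, | | |
| Xapian::Compactor & | compactor ) |
inline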
Produce a compact version of this database.
The compactor functor allows handling progress output and specifying how user metadata is merged.
This variant writes a single-file database to the specified file descriptor. Only the glass backend supports such databases, so this form is only supported for this backend.
| fd | File descriptor to write the compact version to. The descriptor needs to be readable and writable (open with O_RDWR) and seekable. The current file offset is used, allowing compacting to a single file database embedded within another file. Xapian takes ownership of the file descriptor and will close it before returning. |
| flags | Zero or more compaction flags combined using bitwise-or (| in C++). |
| block_size | This specifies the block size (in bytes) to use for the output. For glass, the block size must be a power of 2 between 2048 and 65536 (inclusive), and the default (also used if an invalid value is passed) is 8192 bytes. |
| compactor | Functor |
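| void Xapian::Database::compact | ( | int | fd, |
| unsigned | flags = 0, | | |
| int | block_size = 0 ) |
inline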
Produce a compact version of this database.
This variant writes a single-file database to the specified file descriptor. Only the glass backend supports such databases, so this form is only supported for this backend.
| fd | File descriptor to write the compact version to. The descriptor needs to be readable and writable (open with O_RDWR) and seekable. The current file offset is used, allowing compacting to a single file database embedded within another file. Xapian takes ownership of the file descriptor and will close it before returning. |
| flags | Zero or more compaction flags combined using bitwise-or (| in C++). |
| block_size | This specifies the block size (in bytes) to use for the output. For glass, the block size must be a power of 2 between 2048 and 65536 (inclusive), and the default (also used if an invalid value is passed) is 8192 bytes. |
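| void Xapian::Database::compact | ( | std::string_view | output, |
| unsigned | flags, | | |
| int | block_size, | | |
| Xapian::Compactor & | compactor ) |
inline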
Produce a compact version of this database.
The compactor functor allows handling progress output and specifying how user metadata is merged.
| output | Path to write the compact version to. This can be the same as an input if that input is a stub database (in which case the database(s) listed in the stub will be compacted to a new database and then the stub will be atomically updated to point to this new database). |
| flags | Zero or more compaction flags combined using bitwise-or (| in C++). |
| block_size | This specifies the block size (in bytes) to use for the output. For glass, the block size must be a power of 2 between 2048 and 65536 (inclusive), and the default (also used if an invalid value is passed) is 8192 bytes. |
| compactor | Functor |
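| void Xapian::Database::compact | ( | std::string_view | output, |
| unsigned | flags = 0, | | |
| int | block_size = 0 ) |
inline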
Produce a compact version of this database.
| output | Path to write the compact version to. This can be the same as an input if that input is a stub database (in which case the database(s) listed in the stub will be compacted to a new database and then the stub will be atomically updated to point to this new database). |
| flags | Zero or more compaction flags combined using bitwise-or (| in C++). |
| block_size | This specifies the block size (in bytes) to use for the output. For glass, the block size must be a power of 2 between 2048 and 65536 (inclusive), and the default (also used if an invalid value is passed) is 8192 bytes. |
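The block size rule stated for glass in the compact() variants above can be written as a small check. This is a sketch of the documented rule only; `effective_block_size()` is a hypothetical helper, not part of the Xapian API.

```cpp
// Return the block size that the documented rule for glass implies for a
// requested value: a power of 2 in [2048, 65536] is used as-is, anything
// else (including the default argument 0) falls back to 8192.
int effective_block_size(int requested) {
    const int default_block_size = 8192;
    const bool in_range = requested >= 2048 && requested <= 65536;
    // A positive integer is a power of 2 exactly when it has one bit set.
    const bool power_of_two = requested > 0 && (requested & (requested - 1)) == 0;
    return (in_range && power_of_two) ? requested : default_block_size;
}
```

Checking the value yourself before calling compact() may be worthwhile, since according to the description an invalid value is silently replaced by the default rather than reported.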
| Xapian::termcount Xapian::Database::get_collection_freq | ( | std::string_view | term | ) | const |
Get the total number of occurrences of a specified term.
The collection frequency of a term is defined as the total number of times it occurs in the database, which is the sum of its wdf in all the documents it indexes.
| term | The term to get the collection frequency of. An empty string acts as a special pseudo-term which indexes all the documents in the database, so returns get_doccount(). If the term isn't present in the database, 0 is returned. |
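The difference between collection frequency and term frequency (see get_termfreq()) can be shown on a toy index. This is a standard-library sketch of the definitions, not of how Xapian stores or computes them; `ToyIndex`, `toy_termfreq()` and `toy_collection_freq()` are hypothetical names.

```cpp
#include <map>
#include <string>

// Toy index: document id -> (term -> wdf).
using ToyIndex = std::map<unsigned, std::map<std::string, unsigned>>;

// Number of documents indexed by `term`.
unsigned toy_termfreq(const ToyIndex& index, const std::string& term) {
    // The empty term acts as a pseudo-term indexing every document.
    if (term.empty()) return static_cast<unsigned>(index.size());
    unsigned n = 0;
    for (const auto& doc : index)
        if (doc.second.count(term)) ++n;
    return n;
}

// Sum of the wdf of `term` over all the documents it indexes.
unsigned toy_collection_freq(const ToyIndex& index, const std::string& term) {
    // The empty pseudo-term yields the document count.
    if (term.empty()) return static_cast<unsigned>(index.size());
    unsigned total = 0;
    for (const auto& doc : index) {
        auto it = doc.second.find(term);
        if (it != doc.second.end()) total += it->second;
    }
    return total;
}
```

For example, if "cat" has wdf 3 in document 1 and wdf 2 in document 2, its term frequency is 2 and its collection frequency is 5. A term absent from the index gives 0 for both.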
| std::string Xapian::Database::get_description | ( | ) | const |
virtual
Return a string describing this object.
Reimplemented in Xapian::WritableDatabase.
| Xapian::termcount Xapian::Database::get_doclength | ( | Xapian::docid | did | ) | const |
Get the length of a specified document.
| did | The document id of the document |
Xapian defines a document's length as the sum of the wdf of all the terms which index it.
| Xapian::termcount Xapian::Database::get_doclength_lower_bound | ( | ) | const |
Get a lower bound on the length of a document in this DB.
This bound does not include any zero-length documents.
| Xapian::Document Xapian::Database::get_document | ( | Xapian::docid | did, |
| unsigned | flags = 0 ) const |
Get a document from the database.
The returned object acts as a handle which lazily fetches information about the specified document from the database.
| did | The document ID of the document to get |
| flags | Zero or more flags bitwise-or-ed together (currently only Xapian::DOC_ASSUME_VALID is supported). (default: 0) |
| Xapian::InvalidArgumentError | is thrown if did is 0. |
| Xapian::DocNotFoundError | is thrown if the specified docid is not present in this database. |
| std::string Xapian::Database::get_metadata | ( | std::string_view | key | ) | const |
Get the user-specified metadata associated with a given key.
User-specified metadata allows you to store arbitrary information in the form of (key, value) pairs. See WritableDatabase::set_metadata() for more information.
When invoked on a Xapian::Database object representing multiple databases, currently only the metadata for the first is considered but this behaviour may change in the future.
If there is no piece of metadata associated with the specified key, an empty string is returned (this applies even for backends which don't support metadata).
Empty keys are not valid, and specifying one will cause an exception.
| key | The key of the metadata item to access. |
| Xapian::InvalidArgumentError | will be thrown if the key supplied is empty. |
| Xapian::rev Xapian::Database::get_revision | ( | ) | const |
Get the revision of the database.
The revision is an unsigned integer which increases with each commit.
| Xapian::InvalidOperationError | If the database consists of more than one shard. |
| Xapian::UnimplementedError | Currently this is only implemented for glass. |
In Xapian < 1.4.13 an exception was thrown if the database consists of no shards; in Xapian >= 1.4.13 this method returns 0 if there are no shards.
Experimental - see https://xapian.org/docs/deprecation#experimental-features
| std::string Xapian::Database::get_spelling_suggestion | ( | std::string_view | word, |
| unsigned | max_edit_distance = 2 ) const |
Suggest a spelling correction.
| word | The potentially misspelled word. |
| max_edit_distance | Only consider words which are at most max_edit_distance edits from word. An edit is a character insertion, deletion, or the transposition of two adjacent characters (default is 2). |
| Xapian::doccount Xapian::Database::get_termfreq | ( | std::string_view | term | ) | const |
Get the number of documents indexed by a specified term.
| term | The term to get the frequency of. An empty string acts as a special pseudo-term which indexes all the documents in the database, so returns get_doccount(). If the term isn't present in the database, 0 is returned. |
| Xapian::totallength Xapian::Database::get_total_length | ( | ) | const |
Get the total length of all the documents in the database.
| Xapian::termcount Xapian::Database::get_unique_terms | ( | Xapian::docid | did | ) | const |
Get the number of unique terms in a specified document.
| did | The document id of the document |
This is the number of different terms which index the given document.
| Xapian::termcount Xapian::Database::get_unique_terms_lower_bound | ( | ) | const |
Get a lower bound on the unique terms size of a document in this DB.
| Xapian::termcount Xapian::Database::get_unique_terms_upper_bound | ( | ) | const |
Get an upper bound on the unique terms size of a document in this DB.
| std::string Xapian::Database::get_uuid | ( | ) | const |
Get the UUID for the database.
The UUID will persist for the lifetime of the database.
Replicas (eg, made with the replication protocol, or by copying all the database files) will have the same UUID. However, copies (made with copydatabase, or xapian-compact) will have different UUIDs.
If the backend does not support UUIDs or this database has no sub-databases, the UUID will be empty.
If this database has multiple sub-databases, the UUID string will contain the UUIDs of all the sub-databases separated by colons.
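Given the colon-separated format described above, the per-shard UUIDs can be recovered by splitting the returned string. The following is a minimal sketch; `split_uuids()` is a hypothetical helper, and it assumes that an individual UUID never contains a colon.

```cpp
#include <string>
#include <vector>

// Split the string returned for a multi-shard database into one UUID per
// sub-database. An empty input (no sub-databases, or no UUID support)
// yields an empty vector.
std::vector<std::string> split_uuids(const std::string& combined) {
    std::vector<std::string> parts;
    if (combined.empty()) return parts;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type colon = combined.find(':', start);
        if (colon == std::string::npos) {
            parts.push_back(combined.substr(start));
            break;
        }
        parts.push_back(combined.substr(start, colon - start));
        start = colon + 1;
    }
    return parts;
}
```

With Xapian available this would be called as `split_uuids(db.get_uuid())`, and the size of the result would be expected to match `db.size()` when every shard's backend supports UUIDs.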
| Xapian::doccount Xapian::Database::get_value_freq | ( | Xapian::valueno | slot | ) | const |
Return the frequency of a given value slot.
This is the number of documents which have a (non-empty) value stored in the slot.
| slot | The value slot to examine. |
| std::string Xapian::Database::get_value_lower_bound | ( | Xapian::valueno | slot | ) | const |
Get a lower bound on the values stored in the given value slot.
If there are no values stored in the given value slot, this will return an empty string.
| slot | The value slot to examine. |
| std::string Xapian::Database::get_value_upper_bound | ( | Xapian::valueno | slot | ) | const |
Get an upper bound on the values stored in the given value slot.
If there are no values stored in the given value slot, this will return an empty string.
| slot | The value slot to examine. |
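The three value slot statistics (get_value_freq(), get_value_lower_bound() and get_value_upper_bound()) can be modelled on a plain list of per-document values. This sketch assumes values are compared as byte strings and computes the exact minimum and maximum, which are the tightest valid bounds; a real backend is only documented to return *a* bound, so it may report looser ones. `ToySlotStats` and `toy_slot_stats()` are hypothetical names.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct ToySlotStats {
    unsigned freq = 0;        // documents with a non-empty value in the slot
    std::string lower_bound;  // empty if the slot holds no values
    std::string upper_bound;  // empty if the slot holds no values
};

// `values` holds one entry per document; an empty string means the
// document has no value in this slot.
ToySlotStats toy_slot_stats(const std::vector<std::string>& values) {
    ToySlotStats s;
    for (const std::string& v : values) {
        if (v.empty()) continue;
        if (s.freq == 0) {
            s.lower_bound = v;
            s.upper_bound = v;
        } else {
            s.lower_bound = std::min(s.lower_bound, v);
            s.upper_bound = std::max(s.upper_bound, v);
        }
        ++s.freq;
    }
    return s;
}
```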
| Xapian::termcount Xapian::Database::get_wdfdocmax | ( | Xapian::docid | did | ) | const |
Get the maximum wdf value in a specified document.
| did | The document id of the document |
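The per-document statistics returned by get_doclength(), get_unique_terms() and get_wdfdocmax() are all derived from the wdf values of the terms indexing the document. The sketch below computes them from a term-to-wdf map; it models the definitions only, and `ToyDocStats` and `toy_doc_stats()` are hypothetical names.

```cpp
#include <algorithm>
#include <map>
#include <string>

struct ToyDocStats {
    unsigned length = 0;        // sum of the wdf of all terms in the document
    unsigned unique_terms = 0;  // number of different terms in the document
    unsigned wdf_max = 0;       // largest wdf of any term in the document
};

// Compute the statistics for one document given its term -> wdf map.
ToyDocStats toy_doc_stats(const std::map<std::string, unsigned>& term_wdfs) {
    ToyDocStats s;
    for (const auto& entry : term_wdfs) {
        s.length += entry.second;
        ++s.unique_terms;
        s.wdf_max = std::max(s.wdf_max, entry.second);
    }
    return s;
}
```

For a document with terms "cat" (wdf 3), "dog" (wdf 1) and "fish" (wdf 2), this gives a length of 6, 3 unique terms and a maximum wdf of 3.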
| void Xapian::Database::keep_alive | ( | ) |
Send a keep-alive message.
For remote databases, this method sends a message to the server to reset the timeout timer. As well as preventing timeouts at the Xapian remote protocol level, this message will also avoid timeouts at lower levels.
For local databases, this method does nothing.
| Xapian::WritableDatabase Xapian::Database::lock | ( | int | flags = 0 | ) |
Lock a read-only database for writing.
If the database is actually already writable (i.e. a WritableDatabase via a Database reference) then the same database is returned (with its flags updated, so this provides an efficient way to modify flags on an open WritableDatabase).
Unlike unlock(), the object this is called on remains open.
| flags | The flags to use for the writable database. Flags which specify how to open the database are ignored (e.g. DB_CREATE_OR_OVERWRITE doesn't result in the database being wiped), and flags which specify the backend are also ignored as they are only relevant when creating a new database. |
| bool Xapian::Database::locked | ( | ) | const |
Test if this database is currently locked for writing.
If the underlying object is actually a WritableDatabase, always returns true unless close() has been called.
Otherwise this tests whether a writer is holding the lock. If the lock can't be tested without taking it on the current platform, Xapian::UnimplementedError is thrown. If there's an error while trying to test the lock, Xapian::DatabaseLockError is thrown.
For multi-databases, this tests each sub-database and returns true if any of them are locked.
| Xapian::TermIterator Xapian::Database::metadata_keys_begin | ( | std::string_view | prefix = {} | ) | const |
An iterator which returns all user-specified metadata keys.
When invoked on a Xapian::Database object representing multiple databases, currently only the metadata for the first is considered but this behaviour may change in the future.
If the backend doesn't support metadata, then this method returns an iterator which compares equal to that returned by metadata_keys_end().
| prefix | If non-empty, only keys with this prefix are returned. |
| Xapian::UnimplementedError | will be thrown if the backend implements user-specified metadata, but doesn't implement iterating its keys (currently this happens for the InMemory backend). |
| Database & Xapian::Database::operator= | ( | const Database & | o | ) |
Assignment operator.
The internals are reference counted, so assignment is cheap.
| PositionIterator Xapian::Database::positionlist_begin | ( | Xapian::docid | did, |
| std::string_view | term ) const |
Start iterating positions for a term in a document.
| did | The document id of the document |
| term | The term |
| PostingIterator Xapian::Database::postlist_begin | ( | std::string_view | term | ) | const |
Start iterating the postings of a term.
| term | The term to iterate the postings of. An empty string acts as a special pseudo-term which indexes all the documents in the database with a wdf of 1. |
| std::string Xapian::Database::reconstruct_text | ( | Xapian::docid | did, |
| size_t | length = 0, | ||
| std::string_view | prefix = {}, | ||
| Xapian::termpos | start_pos = 0, | ||
| Xapian::termpos | end_pos = 0 ) const |
Reconstruct document text.
This uses term positional information to reconstruct the document text which was indexed. Reading the required positional information is potentially quite I/O intensive.
The reconstructed text will be missing punctuation and most capitalisation.
| did | The document id of the document to reconstruct |
| length | Number of bytes of text to aim for - note that slightly more may be returned (default: 0 meaning unlimited) |
| prefix | Term prefix to reconstruct (default: none) |
| start_pos | First position to reconstruct (default: 0) |
| end_pos | Last position to reconstruct (default: 0 meaning all) |
| bool Xapian::Database::reopen | ( | ) |
Reopen the database at the latest available revision.
Xapian databases (at least with most backends) support versioning such that a Database object uses a snapshot of the database. However, write operations may cause this snapshot to be discarded, which can cause Xapian::DatabaseModifiedError to be thrown. You can recover from this situation by calling reopen() and restarting the search operation.
All shards are updated to the latest available revision. This should be a cheap operation if they're already at the latest revision, so if you're using the same Database object for many searches it's reasonable to call reopen() before each search.
| Xapian::DatabaseError | is thrown if close() has been called on any of the shards. |
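The recovery pattern described above (catch the "modified" error, call reopen(), restart the operation) can be sketched as a generic helper. To keep the example compilable without Xapian, it is written as a template and exercised with stand-ins: `retry_with_reopen()`, `StubDb` and `StubModifiedError` are hypothetical names, and the attempt limit of 3 is an arbitrary choice. With Xapian you would presumably instantiate it with Xapian::DatabaseModifiedError and pass a Xapian::Database.

```cpp
#include <stdexcept>

// Hypothetical stand-in for Xapian::DatabaseModifiedError.
struct StubModifiedError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical stand-in for a database whose snapshot goes stale.
struct StubDb {
    int stale_reads;       // number of reads that fail before one succeeds
    int reopen_calls = 0;  // how often reopen() was called
    explicit StubDb(int n) : stale_reads(n) {}
    bool reopen() { ++reopen_calls; return true; }
    int read() {
        if (stale_reads > 0) {
            --stale_reads;
            throw StubModifiedError("snapshot discarded");
        }
        return 42;
    }
};

// Run op(db); if it throws ModifiedError, reopen the database and restart
// the operation, giving up (and rethrowing) after max_attempts tries.
template <class ModifiedError, class Db, class Op>
auto retry_with_reopen(Db& db, Op&& op, int max_attempts = 3) -> decltype(op(db)) {
    for (int attempt = 1;; ++attempt) {
        try {
            return op(db);
        } catch (const ModifiedError&) {
            if (attempt >= max_attempts) throw;
            db.reopen();
        }
    }
}
```

A call looks like `retry_with_reopen<StubModifiedError>(db, [](StubDb& d) { return d.read(); })`. Bounding the number of attempts avoids looping forever against a database that is being updated very frequently.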
| size_t Xapian::Database::size | ( | ) | const |
Return number of shards in this Database object.
If you want the number of documents, see get_doccount().
| Xapian::TermIterator Xapian::Database::spellings_begin | ( | ) | const |
An iterator which returns all the spelling correction targets.
This returns all the words which are considered as targets for the spelling correction algorithm. The frequency of each word is available as the term frequency of each entry in the returned iterator.
| Xapian::TermIterator Xapian::Database::synonym_keys_begin | ( | std::string_view | prefix = {} | ) | const |
An iterator which returns all terms which have synonyms.
| prefix | If non-empty, only terms with this prefix are returned. |
| Xapian::TermIterator Xapian::Database::synonyms_begin | ( | std::string_view | term | ) | const |
An iterator which returns all the synonyms for a given term.
| term | The term to return synonyms for. |
| bool Xapian::Database::term_exists | ( | std::string_view | term | ) | const |
Test if a particular term is present in any document.
| term | The term to test for. An empty string acts as a special pseudo-term which indexes all the documents in the database, so returns true if the database contains any documents. |
db.term_exists(t) gives the same answer as db.get_termfreq(t) != 0, but is typically more efficient.
| TermIterator Xapian::Database::termlist_begin | ( | Xapian::docid | did | ) | const |
Start iterating the terms in a document.
| did | The document id to iterate terms from |
The terms are returned in ascending string order (by byte value).
| Xapian::Database Xapian::Database::unlock | ( | ) |