xapian-core
1.5.1
Xapian::Weight subclass implementing the tf-idf weighting scheme.
#include <weight.h>
Public Types | |
| enum class | wdf_norm : unsigned char { NONE = 1 , BOOLEAN = 2 , SQUARE = 3 , LOG = 4 , PIVOTED = 5 , LOG_AVERAGE = 6 , AUG_LOG = 7 , SQRT = 8 , AUG_AVERAGE = 9 , MAX = 10 , AUG = 11 } |
| Wdf normalizations. | |
| enum class | idf_norm : unsigned char { NONE = 1 , TFIDF = 2 , SQUARE = 3 , FREQ = 4 , PROB = 5 , PIVOTED = 6 , GLOBAL_FREQ = 7 , LOG_GLOBAL_FREQ = 8 , INCREMENTED_GLOBAL_FREQ = 9 , SQRT_GLOBAL_FREQ = 10 } |
| Idf normalizations. | |
| enum class | wt_norm : unsigned char { NONE = 1 } |
| Weight normalizations. | |
Public Member Functions | |
| TfIdfWeight (const std::string &normalizations) | |
| Construct a TfIdfWeight. | |
| TfIdfWeight (const std::string &normalizations, double slope, double delta) | |
| Construct a TfIdfWeight. | |
| TfIdfWeight (wdf_norm wdf_normalization, idf_norm idf_normalization, wt_norm wt_normalization) | |
| Construct a TfIdfWeight. | |
| TfIdfWeight (wdf_norm wdf_norm_, idf_norm idf_norm_, wt_norm wt_norm_, double slope, double delta) | |
| Construct a TfIdfWeight. | |
| TfIdfWeight () | |
| Construct a TfIdfWeight using the default normalizations ("ntn"). | |
| std::string | name () const |
| Return the name of this weighting scheme, e.g. "bm25+". | |
| std::string | serialise () const |
| Return this object's parameters serialised as a single string. | |
| TfIdfWeight * | unserialise (const std::string &serialised) const |
| Unserialise parameters. | |
| double | get_sumpart (Xapian::termcount wdf, Xapian::termcount doclen, Xapian::termcount uniqterm, Xapian::termcount wdfdocmax) const |
| Calculate the weight contribution for this object's term to a document. | |
| double | get_maxpart () const |
| Return an upper bound on what get_sumpart() can return for any document. | |
| TfIdfWeight * | create_from_parameters (const char *params) const |
| Create from a human-readable parameter string. | |
| Public Member Functions inherited from Xapian::Weight | |
| Weight () | |
| Default constructor, needed by subclass constructors. | |
| virtual | ~Weight () |
| Virtual destructor, because we have virtual methods. | |
| virtual double | get_sumextra (Xapian::termcount doclen, Xapian::termcount uniqterms, Xapian::termcount wdfdocmax) const |
| Calculate the term-independent weight component for a document. | |
| virtual double | get_maxextra () const |
| Return an upper bound on what get_sumextra() can return for any document. | |
Additional Inherited Members | |
| Static Public Member Functions inherited from Xapian::Weight | |
| static const Weight * | create (const std::string &scheme, const Registry &reg = Registry()) |
| Return the appropriate weighting scheme object. | |
| Protected Types inherited from Xapian::Weight | |
| enum | stat_flags { COLLECTION_SIZE = 0 , RSET_SIZE = 0 , AVERAGE_LENGTH = 4 , TERMFREQ = 1 , RELTERMFREQ = 1 , QUERY_LENGTH = 0 , WQF = 0 , WDF = 2 , DOC_LENGTH = 8 , DOC_LENGTH_MIN = 16 , DOC_LENGTH_MAX = 32 , WDF_MAX = 64 , COLLECTION_FREQ = 1 , UNIQUE_TERMS = 128 , TOTAL_LENGTH = 256 , WDF_DOC_MAX = 512 , UNIQUE_TERMS_MIN = 1024 , UNIQUE_TERMS_MAX = 2048 , DB_DOC_LENGTH_MIN = 4096 , DB_DOC_LENGTH_MAX = 8192 , DB_UNIQUE_TERMS_MIN = 16384 , DB_UNIQUE_TERMS_MAX = 32768 , DB_WDF_MAX = 65536 , IS_BOOLWEIGHT_ = static_cast<int>(0x80000000) } |
| Stats which the weighting scheme can use (see need_stat()). | |
| Protected Member Functions inherited from Xapian::Weight | |
| void | need_stat (stat_flags flag) |
| Tell Xapian that your subclass will want a particular statistic. | |
| Weight (const Weight &) | |
| Don't allow copying. | |
| Xapian::doccount | get_collection_size () const |
| The number of documents in the collection. | |
| Xapian::doccount | get_rset_size () const |
| The number of documents marked as relevant. | |
| Xapian::doclength | get_average_length () const |
| The average length of a document in the collection. | |
| Xapian::doccount | get_termfreq () const |
| The number of documents which this term indexes. | |
| Xapian::doccount | get_reltermfreq () const |
| The number of relevant documents which this term indexes. | |
| Xapian::termcount | get_collection_freq () const |
| The collection frequency of the term. | |
| Xapian::termcount | get_query_length () const |
| The length of the query. | |
| Xapian::termcount | get_wqf () const |
| The within-query-frequency of this term. | |
| Xapian::termcount | get_doclength_upper_bound () const |
| An upper bound on the maximum length of any document in the shard. | |
| Xapian::termcount | get_doclength_lower_bound () const |
| A lower bound on the minimum length of any document in the shard. | |
| Xapian::termcount | get_wdf_upper_bound () const |
| An upper bound on the wdf of this term in the shard. | |
| Xapian::totallength | get_total_length () const |
| Total length of all documents in the collection. | |
| Xapian::termcount | get_unique_terms_upper_bound () const |
| An upper bound on the number of unique terms in any document in the shard. | |
| Xapian::termcount | get_unique_terms_lower_bound () const |
| A lower bound on the number of unique terms in any document in the shard. | |
| Xapian::termcount | get_db_doclength_upper_bound () const |
| An upper bound on the maximum length of any document in the database. | |
| Xapian::termcount | get_db_doclength_lower_bound () const |
| A lower bound on the minimum length of any document in the database. | |
| Xapian::termcount | get_db_unique_terms_upper_bound () const |
| An upper bound on the number of unique terms in any document in the database. | |
| Xapian::termcount | get_db_unique_terms_lower_bound () const |
| A lower bound on the number of unique terms in any document in the database. | |
| Xapian::termcount | get_db_wdf_upper_bound () const |
| An upper bound on the wdf of this term in the database. | |
Xapian::Weight subclass implementing the tf-idf weighting scheme.
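enum class Xapian::TfIdfWeight::idf_norm : unsigned char [strong]
Idf normalizations.
enum class Xapian::TfIdfWeight::wdf_norm : unsigned char [strong]
Wdf normalizations.
enum class Xapian::TfIdfWeight::wt_norm : unsigned char [strong]
Weight normalizations.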
Xapian::TfIdfWeight::TfIdfWeight(const std::string &normalizations) [inline, explicit]
Construct a TfIdfWeight.
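| normalizations | A three-character string indicating the normalizations to be used for the tf (wdf), the idf and the document weight. (default: "ntn") |
The normalizations string works like so: the first character selects the wdf normalization, the second the idf normalization, and the third the document-weight normalization.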
Implementing support for more normalizations of each type would require extending the backend to track more statistics.
References TfIdfWeight().
Referenced by create_from_parameters(), TfIdfWeight(), TfIdfWeight(), and unserialise().
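The positional layout of the normalizations string can be sketched as follows. This is a minimal, hypothetical helper (`split_normalizations` and `NormCodes` are illustrative names, not part of Xapian): it only checks the length and splits the string by position, and it does not validate which letters each slot accepts. Throwing `std::invalid_argument` is an illustrative choice, not necessarily what Xapian does.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical: the three per-position codes of a normalizations string.
struct NormCodes {
    char wdf;  // first character: wdf (tf) normalization
    char idf;  // second character: idf normalization
    char wt;   // third character: document-weight normalization
};

// Hypothetical helper: splits e.g. "ntn" by position. Letter meanings are
// not checked here; only the three-character layout is enforced.
NormCodes split_normalizations(const std::string& normalizations) {
    if (normalizations.size() != 3) {
        throw std::invalid_argument("normalizations must be exactly 3 characters");
    }
    return NormCodes{normalizations[0], normalizations[1], normalizations[2]};
}
```

For the default "ntn", this yields 'n' for the wdf, 't' for the idf and 'n' for the document weight.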
Xapian::TfIdfWeight::TfIdfWeight(const std::string &normalizations, double slope, double delta)
Construct a TfIdfWeight.
| normalizations | A three-character string indicating the normalizations to be used for the tf (wdf), the idf and the document weight. (default: "ntn") |
| slope | Extra parameter for "Pivoted" tf normalization. (default: 0.2) |
| delta | Extra parameter for "Pivoted" tf normalization. (default: 1.0) |
The normalizations string works like so: the first character selects the wdf normalization, the second the idf normalization, and the third the document-weight normalization.
Implementing support for more normalizations of each type would require extending the backend to track more statistics.
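This page does not give the exact "Pivoted" formula, so the sketch below assumes a textbook pivoted length normalization: a double-log dampened wdf divided by the pivot factor 1 - slope + slope * doclen / avg_len, plus delta. It uses the documented defaults (slope 0.2, delta 1.0). `pivoted_wdf` is a hypothetical name and the formula is an assumption, not a copy of Xapian's implementation; it only illustrates how slope and delta shape the result.

```cpp
#include <cassert>
#include <cmath>

// Assumed textbook pivoted wdf normalization (not taken from Xapian's source).
// slope and delta default to the documented defaults, 0.2 and 1.0.
double pivoted_wdf(double wdf, double doclen, double avg_len,
                   double slope = 0.2, double delta = 1.0) {
    assert(avg_len > 0.0);
    if (wdf <= 0.0) return 0.0;  // term absent: no contribution
    // Double-log dampening of the raw wdf: 1 + ln(1 + ln(wdf)).
    double damped = 1.0 + std::log(1.0 + std::log(wdf));
    // Documents longer than average get a divisor above 1, shorter ones below 1.
    double pivot = 1.0 - slope + slope * (doclen / avg_len);
    return damped / pivot + delta;
}
```

With wdf = 1 in an average-length document the dampened wdf and the pivot are both 1, so the result is 1 + delta; raising slope makes document length matter more.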
Xapian::TfIdfWeight::TfIdfWeight(wdf_norm wdf_norm_, idf_norm idf_norm_, wt_norm wt_norm_) [inline]
Construct a TfIdfWeight.
| wdf_norm_ | The normalization for the wdf. |
| idf_norm_ | The normalization for the idf. |
| wt_norm_ | The normalization for the document weight. |
Implementing support for more normalizations of each type would require extending the backend to track more statistics.
References TfIdfWeight().
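The enum-based constructors take strongly typed values instead of letters. The sketch below mirrors the three enums with the values listed in the class summary (these are hypothetical copies at global scope, not Xapian's own definitions) to show that each choice is a one-byte value and that scoped enums need an explicit cast to reach that byte. `pack_norms` is a hypothetical helper and its byte layout is illustrative only.

```cpp
#include <cassert>
#include <string>

// Hypothetical mirrors of the scoped enums, using the values from the summary.
enum class wdf_norm : unsigned char {
    NONE = 1, BOOLEAN = 2, SQUARE = 3, LOG = 4, PIVOTED = 5, LOG_AVERAGE = 6,
    AUG_LOG = 7, SQRT = 8, AUG_AVERAGE = 9, MAX = 10, AUG = 11
};
enum class idf_norm : unsigned char {
    NONE = 1, TFIDF = 2, SQUARE = 3, FREQ = 4, PROB = 5, PIVOTED = 6,
    GLOBAL_FREQ = 7, LOG_GLOBAL_FREQ = 8, INCREMENTED_GLOBAL_FREQ = 9,
    SQRT_GLOBAL_FREQ = 10
};
enum class wt_norm : unsigned char { NONE = 1 };

// Scoped enums do not convert implicitly, so each value is cast to its
// underlying unsigned char before being appended as one byte.
std::string pack_norms(wdf_norm w, idf_norm i, wt_norm t) {
    std::string out;
    out += static_cast<char>(static_cast<unsigned char>(w));
    out += static_cast<char>(static_cast<unsigned char>(i));
    out += static_cast<char>(static_cast<unsigned char>(t));
    return out;
}
```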
Xapian::TfIdfWeight::TfIdfWeight(wdf_norm wdf_norm_, idf_norm idf_norm_, wt_norm wt_norm_, double slope, double delta)
Construct a TfIdfWeight.
| wdf_norm_ | The normalization for the wdf. |
| idf_norm_ | The normalization for the idf. |
| wt_norm_ | The normalization for the document weight. |
| slope | Extra parameter for "Pivoted" tf normalization. (default: 0.2) |
| delta | Extra parameter for "Pivoted" tf normalization. (default: 1.0) |
Implementing support for more normalizations of each type would require extending the backend to track more statistics.
TfIdfWeight * Xapian::TfIdfWeight::create_from_parameters(const char *params) const [virtual]
Create from a human-readable parameter string.
| params | string containing weighting scheme parameter values. |
Reimplemented from Xapian::Weight.
References TfIdfWeight().
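The exact layout of the parameter string is not described here. The sketch below assumes, hypothetically, whitespace-separated fields: the three-character normalizations string, optionally followed by slope and delta, with the documented defaults ("ntn", 0.2, 1.0) used for anything omitted. `TfIdfParams`, `to_double` and `parse_tfidf_params` are illustrative names, not Xapian API.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical parsed parameters, initialised to the documented defaults.
struct TfIdfParams {
    std::string normalizations = "ntn";
    double slope = 0.2;
    double delta = 1.0;
};

// Strict number conversion: rejects trailing junk such as "0.3x".
static double to_double(const std::string& s) {
    std::size_t used = 0;
    double v = std::stod(s, &used);  // throws std::invalid_argument if no number
    if (used != s.size()) throw std::invalid_argument("bad number: " + s);
    return v;
}

// Assumed format: "<normalizations> [<slope> <delta>]"; empty keeps defaults.
TfIdfParams parse_tfidf_params(const char* params) {
    TfIdfParams p;
    std::istringstream in(params ? params : "");
    std::vector<std::string> fields;
    for (std::string tok; in >> tok;) fields.push_back(tok);
    if (fields.empty()) return p;
    if (fields.size() != 1 && fields.size() != 3)
        throw std::invalid_argument("expected 1 or 3 fields");
    if (fields[0].size() != 3)
        throw std::invalid_argument("normalizations must be 3 characters");
    p.normalizations = fields[0];
    if (fields.size() == 3) {
        p.slope = to_double(fields[1]);
        p.delta = to_double(fields[2]);
    }
    return p;
}
```

Requiring slope and delta together avoids silently mixing one user value with one default.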
double Xapian::TfIdfWeight::get_maxpart() const [virtual]
Return an upper bound on what get_sumpart() can return for any document.
This information is used by the matcher to perform various optimisations, so strive to make the bound as tight as possible.
Implements Xapian::Weight.
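A tight bound falls out naturally when the per-term score is monotonic in the wdf. The sketch assumes the default "ntn" scheme means raw wdf times an idf of log(N / termfreq) with no further document-weight normalization; that idf formula is an assumption here. Because the idf is fixed per term and non-negative, evaluating the score at the wdf upper bound gives a bound that is actually reached. All function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Assumed idf for the 't' normalization: log(N / termfreq).
double idf_tfidf(double collection_size, double termfreq) {
    assert(termfreq > 0.0 && termfreq <= collection_size);
    return std::log(collection_size / termfreq);
}

// Assumed "ntn" per-term score: raw wdf times the idf.
double sumpart_ntn(double wdf, double collection_size, double termfreq) {
    return wdf * idf_tfidf(collection_size, termfreq);
}

// The score is non-decreasing in wdf (idf >= 0), so plugging in the wdf
// upper bound yields an attainable, and therefore tight, upper bound.
double maxpart_ntn(double wdf_upper_bound, double collection_size,
                   double termfreq) {
    return sumpart_ntn(wdf_upper_bound, collection_size, termfreq);
}
```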
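double Xapian::TfIdfWeight::get_sumpart(Xapian::termcount wdf, Xapian::termcount doclen, Xapian::termcount uniqterm, Xapian::termcount wdfdocmax) const [virtual]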
Calculate the weight contribution for this object's term to a document.
The parameters give information about the document which may be used in the calculations:
| wdf | The within-document frequency of the term in the document. You need to call need_stat(WDF) if you use this value. |
| doclen | The document's length (unnormalised). You need to call need_stat(DOC_LENGTH) if you use this value. |
| uniqterms | Number of unique terms in the document. You need to call need_stat(UNIQUE_TERMS) if you use this value. |
| wdfdocmax | Maximum wdf value in the document. You need to call need_stat(WDF_DOC_MAX) if you use this value. |
You can rely on wdf <= doclen if you call both need_stat(WDF) and need_stat(DOC_LENGTH). This is trivially true for terms, but Xapian also ensures it's true for OP_SYNONYM, where the wdf is approximated.
Implements Xapian::Weight.
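The need_stat() requirements above amount to bit flags. The sketch mirrors a few stat_flags values from the class summary in a hypothetical `StatRequests` class (not Xapian's internals) to show how requests accumulate in a mask, and that the wdf <= doclen guarantee only applies once both WDF and DOC_LENGTH are requested. Flags with value 0, such as COLLECTION_SIZE, are presumably always available, so checking for them always succeeds.

```cpp
#include <cassert>

// Hypothetical mirror of a few stat_flags values listed in the summary.
enum stat_flags : unsigned {
    COLLECTION_SIZE = 0,
    WDF = 2,
    DOC_LENGTH = 8,
    UNIQUE_TERMS = 128,
    WDF_DOC_MAX = 512
};

// Hypothetical stand-in for the bookkeeping behind need_stat().
class StatRequests {
public:
    // Each request ORs its flag into the mask.
    void need_stat(stat_flags flag) { mask_ |= flag; }
    // True if every bit of flag was requested (always true for 0-valued flags).
    bool has(stat_flags flag) const { return (mask_ & flag) == flag; }
    // wdf <= doclen may only be relied on when both stats were requested.
    bool wdf_bounded_by_doclen() const { return has(WDF) && has(DOC_LENGTH); }
private:
    unsigned mask_ = 0;
};
```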
std::string Xapian::TfIdfWeight::name() const [virtual]
Return the name of this weighting scheme, e.g. "bm25+".
This is the name that the weighting scheme gets registered under when passed to Xapian::Registry::register_weighting_scheme().
As a result, this name is how Weight::create() and the remote backend identify the weighting scheme.
For 1.4.x and earlier we recommended returning the full namespace-qualified name of your class here, but now we recommend returning just the name in lower case, e.g. "foo" instead of "FooWeight", "bm25+" instead of "Xapian::BM25PlusWeight".
If you don't want to support creation via Weight::create() or the remote backend, you can use the default implementation which simply returns an empty string.
Reimplemented from Xapian::Weight.
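Why the name matters can be illustrated with a tiny lookup table keyed on name(). `SchemeRegistry`, `SchemeBase` and `TfIdfLike` are hypothetical stand-ins, not Xapian::Registry, and the short name "tfidf" is assumed for illustration. Mirroring the default described above, an empty name is treated as opting out of lookup by name.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical base: the default name() is empty, i.e. not creatable by name.
struct SchemeBase {
    virtual ~SchemeBase() = default;
    virtual std::string name() const { return ""; }
};

struct TfIdfLike : SchemeBase {
    std::string name() const override { return "tfidf"; }  // assumed short name
};

// Hypothetical registry: maps a scheme's name() to a factory.
class SchemeRegistry {
public:
    using Factory = std::function<std::unique_ptr<SchemeBase>()>;
    void add(const SchemeBase& prototype, Factory make) {
        std::string key = prototype.name();
        if (key.empty()) throw std::invalid_argument("scheme has no name");
        factories_[key] = std::move(make);
    }
    // Returns nullptr when no scheme is registered under that name.
    std::unique_ptr<SchemeBase> create(const std::string& key) const {
        auto it = factories_.find(key);
        if (it == factories_.end()) return nullptr;
        return it->second();
    }
private:
    std::map<std::string, Factory> factories_;
};
```

Lookups must match the registered string exactly, which is one reason a short, stable, lower-case name is easier to work with than a namespace-qualified class name.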
std::string Xapian::TfIdfWeight::serialise() const [virtual]
Return this object's parameters serialised as a single string.
If you don't want to support the remote backend, you can use the default implementation which simply throws Xapian::UnimplementedError.
Reimplemented from Xapian::Weight.
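The parameters this class needs to carry are the three normalizations plus slope and delta. The sketch packs them into one string and checks that a round trip preserves them. The wire format (three bytes, then the raw bytes of two doubles) is entirely hypothetical and assumes both ends share the same double representation; it is not Xapian's serialisation format. `TfIdfState`, `serialise_state` and `unserialise_state` are illustrative names.

```cpp
#include <cassert>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical parameter bundle.
struct TfIdfState {
    std::string norms;  // e.g. "ntn"
    double slope;
    double delta;
};

// Hypothetical format: 3 normalization bytes, then raw slope and delta bytes.
std::string serialise_state(const TfIdfState& s) {
    if (s.norms.size() != 3) throw std::invalid_argument("bad normalizations");
    std::string out = s.norms;
    char buf[sizeof(double)];
    std::memcpy(buf, &s.slope, sizeof buf);
    out.append(buf, sizeof buf);
    std::memcpy(buf, &s.delta, sizeof buf);
    out.append(buf, sizeof buf);
    return out;
}

// Inverse of serialise_state(); rejects input of the wrong length.
TfIdfState unserialise_state(const std::string& data) {
    if (data.size() != 3 + 2 * sizeof(double))
        throw std::invalid_argument("bad serialised length");
    TfIdfState s;
    s.norms = data.substr(0, 3);
    std::memcpy(&s.slope, data.data() + 3, sizeof(double));
    std::memcpy(&s.delta, data.data() + 3 + sizeof(double), sizeof(double));
    return s;
}
```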
TfIdfWeight * Xapian::TfIdfWeight::unserialise(const std::string &serialised) const [virtual]
Unserialise parameters.
This method unserialises parameters serialised by the serialise() method and allocates and returns a new object initialised with them.
If you don't want to support the remote backend, you can use the default implementation which simply throws Xapian::UnimplementedError.
Note that the returned object will be deallocated by Xapian after use with "delete". If you want to handle the deletion in a special way (for example when wrapping the Xapian API for use from another language) then you can define a static operator delete method in your subclass as shown here: https://trac.xapian.org/ticket/554#comment:1
| serialised | A string containing the serialised parameters. |
Reimplemented from Xapian::Weight.
References TfIdfWeight().
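The note about deallocation relies on a standard C++ rule: when an object is deleted through a base pointer whose destructor is virtual, the class-specific operator delete of the object's dynamic type is used. The sketch demonstrates this with hypothetical `WeightBase` and `MyWeight` classes (stand-ins, not Xapian types). The counter merely proves the custom function ran; a language binding would release memory in its own way there.

```cpp
#include <cassert>
#include <new>

// Hypothetical base with a virtual destructor, standing in for a weight base.
struct WeightBase {
    virtual ~WeightBase() = default;
    virtual WeightBase* unserialise(const char* data) const = 0;
};

struct MyWeight : WeightBase {
    static int deletes;  // number of objects freed via operator delete below

    // Returns a freshly allocated object that the caller later destroys
    // with "delete".
    MyWeight* unserialise(const char*) const override { return new MyWeight; }

    // Class-specific deallocation function. Because ~WeightBase() is virtual,
    // "delete base_ptr" on a MyWeight object still selects this function.
    static void operator delete(void* p) {
        ++deletes;
        ::operator delete(p);
    }
};
int MyWeight::deletes = 0;
```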