maxSize+1
, in {@link #initialize}.
+ - // extends getSentinelObject() to return a non-null value. - PriorityQueue pq = new MyQueue(numHits); - // save the 'top' element, which is guaranteed to not be null. - MyObject pqTop = (MyObject) pq.top(); - <...> - // now in order to add a new element, which is 'better' than top (after - // you've verified it is better), it is as simple as: - pqTop.change(). - pqTop = pq.updateTop(); -- - NOTE: if this method returns a non-null value, it will be called by - {@link #Initialize(int)} {@link #Size()} times, relying on a new object to - be returned and will not check if it's null again. Therefore you should - ensure any call to this method creates a new instance and behaves - consistently, e.g., it cannot return null if it previously returned - non-null. +
- pq.top().change(); - pq.adjustTop(); -- - instead of - -
- o = pq.pop(); - o.change(); - pq.push(o); -+
- pq.top().change(); - pq.updateTop(); -- - instead of - -
- o = pq.pop(); - o.change(); - pq.push(o); -- -
n
in the
+ array used to construct this searcher/reader.
Weight
is used in the following way:
+ Weight
is constructed by a top-level query, given a
+ Searcher
({@link Query#CreateWeight(Searcher)}).Weight
to compute the query normalization factor
+ {@link Similarity#QueryNorm(float)} of the query clauses contained in the
+ query.Scorer
is constructed by {@link #Scorer(IndexReader,boolean,boolean)}.scoreDocsInOrder
.
+
+ NOTE: even if scoreDocsInOrder
is false, it is
+ recommended to check whether the returned Scorer
indeed scores
+ documents out of order (i.e., call {@link #ScoresDocsOutOfOrder()}), as
+ some Scorer
implementations will always return documents
+ in-order.false
, i.e.
+ the Scorer
scores documents in-order.
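A minimal sketch of that lifecycle, assuming 2.9-era Java-style names (the query, searcher and reader are placeholders; Query.weight(Searcher) performs the createWeight / sumOfSquaredWeights / queryNorm / normalize steps internally):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.Weight;

// Hedged sketch of the Weight/Scorer lifecycle described above.
static Scorer buildScorer(Query query, Searcher searcher, IndexReader reader) throws IOException {
  Weight weight = query.weight(searcher);      // constructs and normalizes the Weight
  return weight.scorer(reader, true, false);   // scoreDocsInOrder=true, topScorer=false
}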
+ app*
.
+
+ This query uses the {@link
+ MultiTermQuery#CONSTANT_SCORE_AUTO_REWRITE_DEFAULT}
+ rewrite method.
+ b
. Documents
+ matching this clause will (in addition to the normal weightings) have
+ their score multiplied by b
.
+ b
. The boost is 1.0 by default.
+ field
assumed to be the
+ default field and omitted.
+ The representation used is one that is supposed to be readable
+ by {@link Lucene.Net.QueryParsers.QueryParser QueryParser}. However,
+ there are the following limitations:
+ term
.prefix
. MultiTermQueryWrapperFilter
is not designed to
+ be used by itself. Normally you subclass it to provide a Filter
+ counterpart for a {@link MultiTermQuery} subclass.
+
+ For example, {@link TermRangeFilter} and {@link PrefixFilter} extend
+ MultiTermQueryWrapperFilter
.
+ This class also provides the functionality behind
+ {@link MultiTermQuery#CONSTANT_SCORE_FILTER_REWRITE};
+ this is why it is not abstract.
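As a hedged sketch of that subclassing pattern (the class name is made up; wrapping a PrefixQuery mirrors what PrefixFilter does, per the text above):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.MultiTermQueryWrapperFilter;
import org.apache.lucene.search.PrefixQuery;

// Illustrative only: a Filter counterpart for PrefixQuery built on the wrapper.
public class MyPrefixFilter extends MultiTermQueryWrapperFilter {
  public MyPrefixFilter(Term prefix) {
    super(new PrefixQuery(prefix));
  }
}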
+ getDirectory
methods default to use
- {@link SimpleFSLockFactory} for backwards compatibility.
- The system properties
- org.apache.lucene.store.FSDirectoryLockFactoryClass
- and org.apache.lucene.FSDirectory.class
- are deprecated and only used by the deprecated
- getDirectory
methods. The system property
- org.apache.lucene.lockDir
is ignored completely,
- If you really want to store locks
- elsewhere, you can create your own {@link
- SimpleFSLockFactory} (or {@link NativeFSLockFactory},
- etc.) passing in your preferred lock directory.
+ NOTE: The doc that is passed to the collect
+ method is relative to the current reader. If your
+ collector needs to resolve this to the docID space of the
+ Multi*Reader, you must re-base it by recording the
+ docBase from the most recent setNextReader call. Here's
+ a simple example showing how to collect docIDs into a
+ BitSet:
- In 3.0 this class will become abstract.
+ + Searcher searcher = new IndexSearcher(indexReader); + final BitSet bits = new BitSet(indexReader.maxDoc()); + searcher.search(query, new Collector() { + private int docBase; -
true
, call {@link #Close()} method on source directory
-
- getDirectory
- respect this setting.
- Integer.MAX_VALUE
.
boolean skipTo(int target) {
  do {
    if (!next())
      return false;
  } while (target > doc());
  return true;
}
Most implementations are considerably more efficient than that.

  // accept docs out of order (for a BitSet it doesn't matter)
  public boolean acceptsDocsOutOfOrder() { return true; }

  public void collect(int doc) {
    bits.set(doc + docBase);
  }

  public void setNextReader(IndexReader reader, int docBase) {
    this.docBase = docBase;
  }
});
true
if this collector does not
+ require the matching docIDs to be delivered in int sort
+ order (smallest to largest) to {@link #collect}.
+
+ Most Lucene Query implementations will visit
+ matching docIDs in order. However, some queries
+ (currently limited to certain cases of {@link
+ BooleanQuery}) can achieve faster searching if the
+ Collector
allows them to deliver the
+ docIDs out of order.
+
+ Many collectors don't mind getting docIDs out of
+ order, so it's important to return true
+ here.
+
+ slop
total unmatched positions between
+ them. When inOrder
is true, the spans from each clause
+ must be ordered as in clauses
.
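As a hedged illustration of the slop/inOrder semantics (field and terms are made up):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.spans.SpanNearQuery;
import org.apache.lucene.search.spans.SpanQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

// Match "quick" followed by "fox" with at most 2 unmatched positions between
// them, in the order given (inOrder = true).
SpanQuery[] clauses = new SpanQuery[] {
  new SpanTermQuery(new Term("body", "quick")),
  new SpanTermQuery(new Term("body", "fox"))
};
SpanNearQuery near = new SpanNearQuery(clauses, 2 /* slop */, true /* inOrder */);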
+ o
is equal to this.

Query q = NumericRangeQuery.newFloatRange("weight",
                                          new Float(0.3f), new Float(0.10f),
                                          true, true);

matches all documents whose float valued "weight" field
ranges from 0.3 to 0.10, inclusive.

The performance of NumericRangeQuery is much better
than the corresponding {@link TermRangeQuery} because the
number of terms that must be searched is usually far
fewer, thanks to trie indexing, described below.

You can optionally specify a
precisionStep
+ when creating this query. This is necessary if you've
+ changed this configuration from its default (4) during
+ indexing. Lower values consume more disk space but speed
+ up searching. Suitable values are between 1 and
+ 8. A good starting point to test is 4,
+ which is the default value for all Numeric*
+ classes. See below for
+ details.
+
+ This query defaults to {@linkplain
+ MultiTermQuery#CONSTANT_SCORE_AUTO_REWRITE_DEFAULT} for
+ 32 bit (int/float) ranges with precisionStep <8 and 64
+ bit (long/double) ranges with precisionStep <6.
+ Otherwise it uses {@linkplain
+ MultiTermQuery#CONSTANT_SCORE_FILTER_REWRITE} as the
+ number of terms is likely to be high. With precision
+ steps of <4, this query can be run with one of the
+ BooleanQuery rewrite methods without changing
+ BooleanQuery's default max clause count.
+
+ NOTE: This API is experimental and
+ might change in incompatible ways in the next release.
+
+ TrieRangeQuery
):
+
+ Schindler, U, Diepenbroek, M, 2008. + Generic XML-based Framework for Metadata Portals. + Computers & Geosciences 34 (12), 1947-1955. + doi:10.1016/j.cageo.2008.02.023+ + A quote from this paper: Because Apache Lucene is a full-text + search engine and not a conventional database, it cannot handle numerical ranges + (e.g., field value is inside user defined bounds, even dates are numerical values). + We have developed an extension to Apache Lucene that stores + the numerical values in a special string-encoded format with variable precision + (all numerical values like doubles, longs, floats, and ints are converted to + lexicographic sortable string representations and stored with different precisions + (for a more detailed description of how the values are stored, + see {@link NumericUtils}). A range is then divided recursively into multiple intervals for searching: + The center of the range is searched only with the lowest possible precision in the trie, + while the boundaries are matched more exactly. This reduces the number of terms dramatically. + + For the variant that stores long values in 8 different precisions (each reduced by 8 bits) that + uses a lowest precision of 1 byte, the index contains only a maximum of 256 distinct values in the + lowest precision. Overall, a range could consist of a theoretical maximum of +
7*255*2 + 255 = 3825
distinct terms (when there is a term for every distinct value of an
+ 8-byte-number in the index and the range covers almost all of them; a maximum of 255 distinct values is used
+ because it would always be possible to reduce the full 256 values to one term with degraded precision).
+ In practice, we have seen up to 300 terms in most cases (index with 500,000 metadata records
+ and a uniform value distribution).
+
+ precisionStep
when encoding values.
+ Lower step values mean more precision levels and therefore more terms in the index (so the index gets larger).
+ On the other hand, the maximum number of terms that must be matched shrinks, which speeds up queries.
+ The formula to calculate the maximum term count is:
+
+ n = [ (bitsPerValue/precisionStep - 1) * (2^precisionStep - 1) * 2 ] + (2^precisionStep - 1)
+
+ (this formula is only correct when bitsPerValue/precisionStep
is an integer;
+ in other cases, the value must be rounded up, and the last summand must use the remainder of the
+ division as its precision step).
+ For longs stored using a precision step of 4, n = 15*15*2 + 15 = 465
, and for a precision
+ step of 2, n = 31*3*2 + 3 = 189
. But the gain in search speed is partly offset by more seeking
+ in the term enum of the index. Because of this, the ideal precisionStep
value can only
+ be determined by testing. Important: You can index with a lower precision step value and test search speed
+ using a multiple of the original step value.
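A small sketch that evaluates this formula, including the rounded-up case (pure arithmetic; it reproduces the worked values above):

// Maximum number of terms a range can expand to for the given precisionStep.
static long maxTermCount(int bitsPerValue, int precisionStep) {
  int levels = (bitsPerValue + precisionStep - 1) / precisionStep; // rounded up
  int lastStep = bitsPerValue % precisionStep;                     // remainder used by the coarsest level
  if (lastStep == 0) lastStep = precisionStep;
  long perStep = (1L << precisionStep) - 1;
  return (levels - 1) * perStep * 2 + ((1L << lastStep) - 1);
}
// maxTermCount(64, 4) == 465 and maxTermCount(64, 2) == 189, matching the worked examples above.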
+
+ Good values for precisionStep
depend on usage and data type:
+ precisionStep
is given.precisionStep
). Using {@link NumericField NumericFields} for sorting
+ is ideal, because building the field cache is much faster than with text-only numbers.
+ Sorting is also possible with range query optimized fields using one of the above precisionSteps
.NumericRangeQuery
, that queries a long
+ range using the given precisionStep
.
+ You can have half-open ranges (which are in fact </≤ or >/≥ queries)
+ by setting the min or max value to null
. Setting inclusive to false matches all documents excluding the bounds; with inclusive on, the boundary values are also hits.
NumericRangeQuery
, that queries a long
+ range using the default precisionStep
{@link NumericUtils#PRECISION_STEP_DEFAULT} (4).
+ You can have half-open ranges (which are in fact </≤ or >/≥ queries)
+ by setting the min or max value to null
. Setting inclusive to false matches all documents excluding the bounds; with inclusive on, the boundary values are also hits.
NumericRangeQuery
, that queries a int
+ range using the given precisionStep
.
+ You can have half-open ranges (which are in fact </≤ or >/≥ queries)
+ by setting the min or max value to null
. Setting inclusive to false matches all documents excluding the bounds; with inclusive on, the boundary values are also hits.
+ NumericRangeQuery
, that queries a int
+ range using the default precisionStep
{@link NumericUtils#PRECISION_STEP_DEFAULT} (4).
+ You can have half-open ranges (which are in fact </≤ or >/≥ queries)
+ by setting the min or max value to null
. Setting inclusive to false matches all documents excluding the bounds; with inclusive on, the boundary values are also hits.
+ NumericRangeQuery
, that queries a double
+ range using the given precisionStep
.
+ You can have half-open ranges (which are in fact </≤ or >/≥ queries)
+ by setting the min or max value to null
. Setting inclusive to false matches all documents excluding the bounds; with inclusive on, the boundary values are also hits.
+ NumericRangeQuery
, that queries a double
+ range using the default precisionStep
{@link NumericUtils#PRECISION_STEP_DEFAULT} (4).
+ You can have half-open ranges (which are in fact </≤ or >/≥ queries)
+ by setting the min or max value to null
. Setting inclusive to false matches all documents excluding the bounds; with inclusive on, the boundary values are also hits.
+ NumericRangeQuery
, that queries a float
+ range using the given precisionStep
.
+ You can have half-open ranges (which are in fact </≤ or >/≥ queries)
+ by setting the min or max value to null
. Setting inclusive to false matches all documents excluding the bounds; with inclusive on, the boundary values are also hits.
+ NumericRangeQuery
, that queries a float
+ range using the default precisionStep
{@link NumericUtils#PRECISION_STEP_DEFAULT} (4).
+ You can have half-open ranges (which are in fact </≤ or >/≥ queries)
+ by setting the min or max value to null
. Setting inclusive to false matches all documents excluding the bounds; with inclusive on, the boundary values are also hits.
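As a hedged illustration of a half-open range built with one of the factories described above (field name and bound are made up):

import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;

// Half-open range: weight >= 0.3 with no upper bound (max = null).
// Passing false for the lower inclusive flag would turn this into weight > 0.3.
Query q = NumericRangeQuery.newFloatRange("weight", new Float(0.3f), null, true, false);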
+ true
if the lower endpoint is inclusive true
if the upper endpoint is inclusive query
.
public boolean skipTo(Term target) {
  do {
    if (!next())
      return false;
  } while (target > term());
  return true;
}

Some implementations *could* be considerably more efficient than a linear scan.
Check the implementation to be sure.
false
ends iterating the current enum
+ and forwards to the next sub-range.
+ shift
value (number of bits removed) used
+ during encoding.
+
+ To also index floating point numbers, this class supplies two methods to convert them
+ to integer values by changing their bit layout: {@link #doubleToSortableLong},
+ {@link #floatToSortableInt}. There is no precision loss when
+ converting floating point numbers to integers and back (only that the integer form
+ is not directly usable). Other data types like dates can easily be converted to longs or ints (e.g.
+ date to long: {@link java.util.Date#getTime}).
+
+ For easy usage, the trie algorithm is implemented for indexing inside
+ {@link NumericTokenStream} that can index int
, long
,
+ float
, and double
. For querying,
+ {@link NumericRangeQuery} and {@link NumericRangeFilter} implement the query part
+ for the same data types.
+
+ This class can also be used to generate lexicographically sortable (according to
+ {@link String#compareTo(String)}) representations of numeric data types for other
+ uses (e.g. sorting).
+
+ NOTE: This API is experimental and
+ might change in incompatible ways in the next release.
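A hedged end-to-end sketch of the indexing/querying pairing described above (Java-style names; the writer, field name and bounds are placeholders):

import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.Query;

// Index a long with the trie-encoded NumericField, then build a range query for it.
static Query indexAndQuery(IndexWriter writer, long value, long min, long max) throws IOException {
  Document doc = new Document();
  doc.add(new NumericField("timestamp", Field.Store.YES, true).setLongValue(value));
  writer.addDocument(doc);
  return NumericRangeQuery.newLongRange("timestamp", min, max, true, true);
}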
+
+ char[]
buffer size)
+ for encoding long
values.
+ char[]
buffer size)
+ for encoding int
values.
+ SHIFT_START_LONG+shift
in the first character
+ SHIFT_START_INT+shift
in the first character
+ shift
bits.
+ This method is used by {@link NumericTokenStream}.
+ shift
bits.
+ This method is used by {@link LongRangeBuilder}.
+ shift
bits.
+ This method is used by {@link NumericTokenStream}.
+ shift
bits.
+ This method is used by {@link IntRangeBuilder}.
+ double
value to a sortable signed long
.
+ The value is converted by taking its IEEE 754 floating-point "double format"
+ bit layout and swapping some bits so that the result can be compared as a long.
+ Precision is not reduced, and the value can easily be used as a long.
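A hedged roundtrip sketch using the two conversion methods named above (plain Java statements):

import org.apache.lucene.util.NumericUtils;

// Doubles compare the same way their sortable-long forms do, and the conversion is lossless.
long a = NumericUtils.doubleToSortableLong(1.5);
long b = NumericUtils.doubleToSortableLong(2.25);
assert a < b;                                        // ordering preserved
assert NumericUtils.sortableLongToDouble(a) == 1.5;  // value round-trips exactly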
+ long
back to a double
.float
value to a sortable signed int
.
+ The value is converted by taking its IEEE 754 floating-point "float format"
+ bit layout and swapping some bits so that the result can be compared as an int.
+ Precision is not reduced, and the value can easily be used as an int.
+ int
back to a float
.Float.NaN
if this
+ DocValues instance does not contain any value.
+
+ This operation is optional
+
+
+ Float.NaN
if this
+ DocValues instance does not contain any value.
+ Float.NaN
if this
+ DocValues instance does not contain any value.
+
+ This operation is optional
+
+
+ Float.NaN
if this
+ DocValues instance does not contain any value.
+ Float.NaN
if this
+ DocValues instance does not contain any value.
+
+ This operation is optional
+
+
+ Float.NaN
if this
+ DocValues instance does not contain any value
+ field
as a single byte and returns an array
+ of size reader.maxDoc()
of the value each document
+ has in the given field.
i
should come before j
i
should come after j
0
if they are equal
+ field
as bytes and returns an array of
+ size reader.maxDoc()
of the value each document has in the
+ given field.
+ field
as shorts and returns an array
+ of size reader.maxDoc()
of the value each document
+ has in the given field.
+ field
as shorts and returns an array of
+ size reader.maxDoc()
of the value each document has in the
+ given field.
+ field
as integers and returns an array
+ of size reader.maxDoc()
of the value each document
+ has in the given field.
+ field
as integers and returns an array of
+ size reader.maxDoc()
of the value each document has in the
+ given field.
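A hedged sketch of one such lookup (Java-style FieldCache API; the field name is made up):

import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;

// One cached int per document, indexed by docID; the array has reader.maxDoc() entries.
// Repeated calls for the same reader and field are served from the cache.
static int[] loadYears(IndexReader reader) throws IOException {
  return FieldCache.DEFAULT.getInts(reader, "year");
}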
+ field
as floats and returns an array
+ of size reader.maxDoc()
of the value each document
+ has in the given field.
+ field
as floats and returns an array
+ of size reader.maxDoc()
of the value each document
+ has in the given field.
+ field
as longs and returns an array
+ of size reader.maxDoc()
of the value each document
+ has in the given field.
+
+ field
as longs and returns an array of
+ size reader.maxDoc()
of the value each document has in the
+ given field.
+
+ field
as integers and returns an array
+ of size reader.maxDoc()
of the value each document
+ has in the given field.
+
+ field
as doubles and returns an array of
+ size reader.maxDoc()
of the value each document has in the
+ given field.
+
+ field
and returns an array
+ of size reader.maxDoc()
containing the value each document
+ has in the given field.
+ field
and returns
+ an array of them in natural order, along with an array telling
+ which element in the term array each document uses.
+ field
to see if it contains integers, longs, floats
+ or strings, and then calls one of the other methods in this class to get the
+ values. For string values, a StringIndex is returned. After
+ calling this method, there is an entry in the cache for both
+ type AUTO
and the actual found type.
+ field
and calls the given SortComparator
+ to get the sort values. A hit in the cache will happen if reader
,
+ field
, and comparator
are the same (using equals()
)
+ as a previous call to this method.
+ lowerTerm
+ but less/equal than upperTerm
.
+
+
+ If an endpoint is null, it is said
+ to be "open". Either or both endpoints may be open. Open endpoints may not
+ be exclusive (you can't select all but the first or last term without
+ explicitly specifying the term to exclude).
+
+ lowerTerm
is
+ included in the range.
+
+ If true, the upperTerm
is
+ included in the range.
+
+ lowerTerm
but less/equal than upperTerm
.
+
+ If an endpoint is null, it is said
+ to be "open". Either or both endpoints may be open. Open endpoints may not
+ be exclusive (you can't select all but the first or last term without
+ explicitly specifying the term to exclude).
+
+ If collator
is not null, it will be used to decide whether
+ index terms are within the given range, rather than using the Unicode code
+ point order in which index terms are stored.
+
+ WARNING: Using this constructor and supplying a non-null
+ value in the collator
parameter will cause every single
+ index Term in the Field referenced by lowerTerm and/or upperTerm to be
+ examined. Depending on the number of index Terms in this Field, the
+ operation could be very slow.
+
+ lowerTerm
is
+ included in the range.
+
+ If true, the upperTerm
is
+ included in the range.
+
+ The collator to use to collate index Terms, to determine
+ their membership in the range bounded by lowerTerm
and
+ upperTerm
.
+
+ true
if the lower endpoint is inclusive true
if the upper endpoint is inclusive

boolean skipTo(int target) {
  do {
    if (!next())
      return false;
  } while (target > doc());
  return true;
}

Some implementations are considerably more efficient than that.
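A hedged sketch of the collator-based range construction warned about above (locale, field and bounds are made up):

import java.text.Collator;
import java.util.Locale;
import org.apache.lucene.search.TermRangeQuery;

// Term membership in the range is decided by the French collator rather than by
// Unicode code-point order; note the cost warning above about examining every term.
TermRangeQuery q = new TermRangeQuery("author", "dubois", "moreau", true, true,
                                      Collator.getInstance(Locale.FRENCH));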
SortField.SCORE
,
- SortField.DOC
, SortField.STRING
,
- SortField.INTEGER
, SortField.FLOAT
or
- SortField.CUSTOM
. It is not valid to return
- SortField.AUTO
.
- This is used by multisearchers to determine how to collate results
- from their searchers.
+ SegmentInfos
) to do its job;
+ if you implement your own MergePolicy, you'll need to put
+ it in package Lucene.Net.Index in order to use
+ these APIs.
+ stored
attribute instead.
+
+ stored
attribute instead.
+
+ TokenStream
enumerates the sequence of tokens, either from
+ {@link Field}s of a {@link Document} or from query text.
+
+ This is an abstract class. Concrete subclasses are:
+ TokenStream
whose input is a Reader; andTokenStream
whose input is another
+ TokenStream
.TokenStream
API was introduced in Lucene 2.9. This API
+ has moved from being {@link Token} based to {@link Attribute} based. While
+ {@link Token} still exists in 2.9 as a convenience class, the preferred way
+ to store the information of a {@link Token} is to use {@link AttributeImpl}s.
+
+ TokenStream
now extends {@link AttributeSource}, which provides
+ access to all of the token {@link Attribute}s for the TokenStream
.
+ Note that only one instance per {@link AttributeImpl} is created and reused
+ for every token. This approach reduces object creation and allows local
+ caching of references to the {@link AttributeImpl}s. See
+ {@link #IncrementToken()} for further details.
+
+ The workflow of the new TokenStream
API is as follows:
+ TokenStream
/{@link TokenFilter}s which add/get
+ attributes to/from the {@link AttributeSource}.TokenStream
TokenStream
+ , e.g. for buffering purposes (see {@link CachingTokenFilter},
+ {@link TeeSinkTokenFilter}). For this use case,
+ {@link AttributeSource#CaptureState} and {@link AttributeSource#RestoreState}
+ can be used.
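A hedged sketch of that consumer workflow (2.9-era Java attribute names; the analyzer, field name and text are placeholders):

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;

// Fetch the attributes once, then pull tokens with incrementToken() and read
// the reused attribute instances, as described in the workflow above.
static void consume(Analyzer analyzer, String text) throws IOException {
  TokenStream stream = analyzer.tokenStream("content", new StringReader(text));
  TermAttribute term = (TermAttribute) stream.addAttribute(TermAttribute.class);
  OffsetAttribute offset = (OffsetAttribute) stream.addAttribute(OffsetAttribute.class);
  while (stream.incrementToken()) {
    System.out.println(term.term() + " [" + offset.startOffset() + "-" + offset.endOffset() + "]");
  }
  stream.end();
  stream.close();
}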
+ public Iterator<Class<? extends Attribute>> getAttributeClassesIterator()
+
+ Note that this return value is different from Java in that it enumerates over the values
+ and not the keys
+ public Iterator<AttributeImpl> getAttributeImplsIterator()
+ public <T extends Attribute> T addAttribute(Class<T>)
+ public boolean hasAttribute(Class<? extends Attribute>)
+ public <T extends Attribute> T getAttribute(Class<T>)
+
+ public AttributeImpl createAttributeInstance(Class<? extends Attribute> attClass)
+ Impl
to it.
+ TokenStream
s use the new API and
+ implement {@link #IncrementToken}. This setting can only be enabled
+ globally.
+
+ This setting only affects TokenStream
s instantiated after this
+ call. All TokenStream
s already created use the other setting.
+
+ All core {@link Analyzer}s are compatible with this setting. If you have
+ your own TokenStream
s that are also compatible, you should enable
+ this.
+
+ When enabled, tokenization may throw {@link UnsupportedOperationException}
+ s if the whole tokenizer chain is not compatible, e.g. one of the
+ TokenStream
s does not implement the new TokenStream
API.
+
+ The default is false
, so there is the fallback to the old API
+ available.
+
+ false
+ (using the new TokenStream
API). Streams implementing the old API
+ should upgrade to use this feature.
+
+ This method can be used to perform any end-of-stream operations, such as
+ setting the final offset of a stream. The final offset of a stream might
+ differ from the offset of the last token, e.g. if one or more whitespace characters
+ followed the last token and a {@link WhitespaceTokenizer} was used.
+
+ TokenStream
are intended to be consumed more than once, it is
+ necessary to implement {@link #Reset()}. Note that if your TokenStream
+ caches tokens and feeds them back again after a reset, it is imperative
+ that you clone the tokens when you store them away (on the first pass) as
+ well as when you return them (on future passes after {@link #Reset()}).
+ end()
on the
+ input TokenStream.
+ NOTE: Be sure to call super.end()
first when overriding this method.
+ stopWords
is an instance of {@link CharArraySet} (true if
+ makeStopSet()
was used to construct the set) it will be directly used
+ and ignoreCase
will be ignored since CharArraySet
+ directly controls case sensitivity.
+
+ If stopWords
is not an instance of {@link CharArraySet},
+ a new CharArraySet will be constructed and ignoreCase
will be
+ used to specify the case sensitivity of that set.
+
+ stopWords
is an instance of {@link CharArraySet} (true if
+ makeStopSet()
was used to construct the set) it will be directly used
+ and ignoreCase
will be ignored since CharArraySet
+ directly controls case sensitivity.
+
+ If stopWords
is not an instance of {@link CharArraySet},
+ a new CharArraySet will be constructed and ignoreCase
will be
+ used to specify the case sensitivity of that set.
+
+ true
, this StopFilter will preserve
+ positions of the incoming tokens (i.e., accumulate and
+ set position increments of the removed stop tokens).
+ Generally, true
is best as it does not
+ lose information (positions of the original tokens)
+ during indexing.
+
+ When set, when a token is stopped
+ (omitted), the position increment of the following
+ token is incremented.
+
+ NOTE: be sure to also
+ set {@link QueryParser#setEnablePositionIncrements} if
+ you use QueryParser to create queries.
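A hedged construction sketch (2.9-era Java constructor; the stop list and tokenizer are made up):

import java.io.StringReader;
import java.util.Set;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;

// Keep position increments so phrase and position queries still see the gaps
// left by removed stop words; the stop set here is case-insensitive.
Set stopSet = StopFilter.makeStopSet(new String[] { "the", "a", "an" }, true /* ignoreCase */);
TokenStream ts = new StopFilter(true /* enablePositionIncrements */,
                                new WhitespaceTokenizer(new StringReader("the quick fox")),
                                stopSet,
                                true /* ignoreCase */);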
+ null
or
+ {@linkplain #EMPTY_DOCIDSET}.iterator()
if there
+ are no docs that match.
+ DocIdSet
+ should be cached without copying it into a BitSet. The default is to return
+ false
. If you have your own DocIdSet
implementation
+ that iterates very efficiently without doing disk I/O,
+ override this method and return true.
+
pq.top().change();
pq.adjustTop();

instead of

o = pq.pop();
o.change();
pq.push(o);
OpenBitSet
is faster than java.util.BitSet
in most operations
+ and *much* faster at calculating cardinality of sets and results of set operations.
+ It can also handle sets of larger cardinality (up to 64 * 2**32-1)
+
+ The goals of OpenBitSet
are the fastest implementation possible, and
+ maximum code reuse. Extra safety and encapsulation
+ may always be built on top, but if that's built in, the cost can never be removed (and
+ hence people re-implement their own version in order to get better performance).
+ If you want a "safe", totally encapsulated (and slower and limited) BitSet
+ class, use java.util.BitSet
.
+
|  | cardinality | intersect_count | union | nextSetBit | get | iterator |
|---|---|---|---|---|---|---|
| 50% full | 3.36 | 3.96 | 1.44 | 1.46 | 1.99 | 1.58 |
| 1% full | 3.31 | 3.90 | 1.04 | 0.99 |  |  |
|  | cardinality | intersect_count | union | nextSetBit | get | iterator |
|---|---|---|---|---|---|---|
| 50% full | 2.50 | 3.50 | 1.00 | 1.03 | 1.12 | 1.25 |
| 1% full | 2.51 | 3.49 | 1.00 | 1.02 |  |  |
new Lock.With(directory.makeLock("my.lock")) {
  public Object doBody() {
    ... code to execute while locked ...
  }
}.run();
super()
in order to filter which
+ documents are collected.
+
Searcher searcher = new IndexSearcher(indexReader);
final BitSet bits = new BitSet(indexReader.maxDoc());
searcher.search(query, new HitCollector() {
  public void collect(int doc, float score) {
    bits.set(doc);
  }
});

Note: This is called in an inner search loop. For good search
performance, implementations of this method should not call
{@link Searcher#Doc(int)} or
{@link Lucene.Net.Index.IndexReader#Document(int)} on every
document number encountered. Doing so can slow searches by an order
of magnitude or more.
Note: The
score
passed to this method is a raw score.
+ In other words, the score will not necessarily be a float whose value is
+ between 0 and 1.
t1 t2 t3

with slop at least 1, the fragment:

t1 t2 t1 t3 t2 t3

matches twice:

t1 t2 .. t3
t1 .. t2 t3

Expert:
Only public for subclassing. Most implementations should not need this class.
boolean skipTo(int target) {
  do {
    if (!next())
      return false;
  } while (target > doc());
  return true;
}

Most implementations are considerably more efficient than that.
teacherid: 1
studentfirstname: james
studentsurname: jones

teacherid: 2
studentfirstname: james
studentsurname: smith
studentfirstname: sally
studentsurname: jones

a SpanNearQuery with a slop of 0 can be applied across two
{@link SpanTermQuery} objects as follows:
SpanQuery q1 = new SpanTermQuery(new Term("studentfirstname", "james"));
SpanQuery q2 = new SpanTermQuery(new Term("studentsurname", "jones"));
SpanQuery q2m = new FieldMaskingSpanQuery(q2, "studentfirstname");
Query q = new SpanNearQuery(new SpanQuery[]{q1, q2m}, -1, false);

to search for 'studentfirstname:james studentsurname:jones' and find
teacherid 1 without matching teacherid 2 (which has a 'james' in position 0
and 'jones' in position 1).

Note: as {@link #GetField()} returns the masked field, scoring will be
done using the norms of the field name supplied. This may lead to unexpected
scoring behaviour.
query
.
Filter f = NumericRangeFilter.newFloatRange("weight",
                                            new Float(0.3f), new Float(0.10f),
                                            true, true);

accepts all documents whose float valued "weight" field
ranges from 0.3 to 0.10, inclusive.
See {@link NumericRangeQuery} for details on how Lucene
indexes and searches numeric valued fields.

NOTE: This API is experimental and
might change in incompatible ways in the next
release.
NumericRangeFilter
, that filters a long
+ range using the given precisionStep
.
+ You can have half-open ranges (which are in fact </≤ or >/≥ queries)
+ by setting the min or max value to null
. Setting inclusive to false matches all documents excluding the bounds; with inclusive on, the boundary values are also hits.
+ NumericRangeFilter
, that queries a long
+ range using the default precisionStep
{@link NumericUtils#PRECISION_STEP_DEFAULT} (4).
+ You can have half-open ranges (which are in fact </≤ or >/≥ queries)
+ by setting the min or max value to null
. Setting inclusive to false matches all documents excluding the bounds; with inclusive on, the boundary values are also hits.
+ NumericRangeFilter
, that filters a int
+ range using the given precisionStep
.
+ You can have half-open ranges (which are in fact </≤ or >/≥ queries)
+ by setting the min or max value to null
. Setting inclusive to false matches all documents excluding the bounds; with inclusive on, the boundary values are also hits.
+ NumericRangeFilter
, that queries a int
+ range using the default precisionStep
{@link NumericUtils#PRECISION_STEP_DEFAULT} (4).
+ You can have half-open ranges (which are in fact </≤ or >/≥ queries)
+ by setting the min or max value to null
. Setting inclusive to false matches all documents excluding the bounds; with inclusive on, the boundary values are also hits.
+ NumericRangeFilter
, that filters a double
+ range using the given precisionStep
.
+ You can have half-open ranges (which are in fact </≤ or >/≥ queries)
+ by setting the min or max value to null
. Setting inclusive to false matches all documents excluding the bounds; with inclusive on, the boundary values are also hits.
+ NumericRangeFilter
, that queries a double
+ range using the default precisionStep
{@link NumericUtils#PRECISION_STEP_DEFAULT} (4).
+ You can have half-open ranges (which are in fact </≤ or >/≥ queries)
+ by setting the min or max value to null
. Setting inclusive to false matches all documents excluding the bounds; with inclusive on, the boundary values are also hits.
+ NumericRangeFilter
, that filters a float
+ range using the given precisionStep
.
+ You can have half-open ranges (which are in fact </≤ or >/≥ queries)
+ by setting the min or max value to null
. Setting inclusive to false matches all documents excluding the bounds; with inclusive on, the boundary values are also hits.
+ NumericRangeFilter
, that queries a float
+ range using the default precisionStep
{@link NumericUtils#PRECISION_STEP_DEFAULT} (4).
+ You can have half-open ranges (which are in fact </≤ or >/≥ queries)
+ by setting the min or max value to null
. Setting inclusive to false matches all documents excluding the bounds; with inclusive on, the boundary values are also hits.
+ true
if the lower endpoint is inclusive true
if the upper endpoint is inclusive getShorts()
and makes those values
- available as other numeric types, casting as needed.
-
-
- WARNING: The status of the Search.Function package is experimental.
- The APIs introduced here might change in the future and will not be
- supported anymore in such a case.
+ Zero
value.
-
-
- WARNING: The status of the Search.Function package is experimental.
- The APIs introduced here might change in the future and will not be
- supported anymore in such a case.
-
- NOTE: with the switch in 2.9 to segment-based
- searching, if {@link #getValues} is invoked with a
- composite (multi-segment) reader, this can easily cause
- double RAM usage for the values in the FieldCache. It's
- best to switch your application to pass only atomic
- (single segment) readers to this API. Alternatively, for
- a short-term fix, you could wrap your ValueSource using
- {@link MultiValueSource}, which costs more CPU per lookup
- but will not consume double the FieldCache RAM.
- Float.NaN
if this
- DocValues instance does not contain any value.
-
- This operation is optional
-
-
- Float.NaN
if this
- DocValues instance does not contain any value.
- Float.NaN
if this
- DocValues instance does not contain any value.
-
- This operation is optional
-
-
- Float.NaN
if this
- DocValues instance does not contain any value.
- Float.NaN
if this
- DocValues instance does not contain any value. *
-
- This operation is optional
-
-
- Float.NaN
if this
- DocValues instance does not contain any value
- FieldCache.DEFAULT
for maintaining
- internal term lookup tables.
+ Uses FieldCache.DEFAULT
for maintaining internal term lookup tables.
- This class will not resolve SortField.AUTO types, and expects the type
- of all SortFields used for construction to already have been resolved.
- {@link SortField#DetectFieldType(IndexReader, String)} is a utility method which
- may be used for field type detection.
-
- NOTE: This API is experimental and might change in
- incompatible ways in the next release.
+ Created: Dec 8, 2003 12:56:03 PM
numHits
.
+ NOTE: This class pre-allocates a full array of
+ length maxSize+1
, in {@link #initialize}.
null
or empty
+ + // extends getSentinelObject() to return a non-null value. + PriorityQueue pq = new MyQueue(numHits); + // save the 'top' element, which is guaranteed to not be null. + MyObject pqTop = (MyObject) pq.top(); + <...> + // now in order to add a new element, which is 'better' than top (after + // you've verified it is better), it is as simple as: + pqTop.change(). + pqTop = pq.updateTop(); ++ + NOTE: if this method returns a non-null value, it will be called by + {@link #Initialize(int)} {@link #Size()} times, relying on a new object to + be returned and will not check if it's null again. Therefore you should + ensure any call to this method creates a new instance and behaves + consistently, e.g., it cannot return null if it previously returned + non-null. + +
+ pq.top().change(); + pq.adjustTop(); ++ + instead of + +
+ o = pq.pop(); + o.change(); + pq.push(o); ++ +
+ pq.top().change(); + pq.updateTop(); ++ + instead of + +
+ o = pq.pop(); + o.change(); + pq.push(o); ++ +
null
or empty.
+
+ The number of hits to retain. Must be greater than zero.
a
is less relevant than b
.true
if document a
should be sorted after document b
.
NOT_ANALYZED
field to ensure that
- there is only a single term.
-
- This class does not have a constructor; use one of the static factory methods available,
- that create a correct instance for different data types supported by {@link FieldCache}.
null
.
- null
.
- null
.
- null
.
- null
.
- null
.
- null
.
- null
.
- null
.
- null
.
- null
.
- null
.
- null
.
- null
or
- {@linkplain #EMPTY_DOCIDSET}.iterator()
if there
- are no docs that match.
- DocIdSet
- should be cached without copying it into a BitSet. The default is to return
- false
. If you have an own DocIdSet
implementation
- that does its iteration very effective and fast without doing disk I/O,
- override this method and return true.
- i
should come before j
i
should come after j
0
if they are equal
+ SortField.SCORE
,
+ SortField.DOC
, SortField.STRING
,
+ SortField.INTEGER
, SortField.FLOAT
or
+ SortField.CUSTOM
. It is not valid to return
+ SortField.AUTO
.
+ This is used by multisearchers to determine how to collate results
+ from their searchers.
+ MUST_NOT
clause.
TermFreqVector
to provide additional information about
positions in which each of the terms is found. A TermPositionVector not necessarily
@@ -2891,1948 +6629,17 @@
segments_N
). This point in time, when the
- action of writing of a new segments file to the directory
- is completed, is an index commit.
-
- Each index commit point has a unique segments file
- associated with it. The segments file associated with a
- later index commit point would have a larger N.
-
- WARNING: This API is a new and experimental and
- may suddenly change.
- segments_N
) associated
- with this commit point.
- segments_N
) associated
- with this commit point.
- TokenStream
enumerates the sequence of tokens, either from
- {@link Field}s of a {@link Document} or from query text.
-
- This is an abstract class. Concrete subclasses are:
- TokenStream
whose input is a Reader; andTokenStream
whose input is another
- TokenStream
.TokenStream
API has been introduced with Lucene 2.9. This API
- has moved from being {@link Token} based to {@link Attribute} based. While
- {@link Token} still exists in 2.9 as a convenience class, the preferred way
- to store the information of a {@link Token} is to use {@link AttributeImpl}s.
-
- TokenStream
now extends {@link AttributeSource}, which provides
- access to all of the token {@link Attribute}s for the TokenStream
.
- Note that only one instance per {@link AttributeImpl} is created and reused
- for every token. This approach reduces object creation and allows local
- caching of references to the {@link AttributeImpl}s. See
- {@link #IncrementToken()} for further details.
-
- The workflow of the new TokenStream
API is as follows:
- TokenStream
/{@link TokenFilter}s which add/get
- attributes to/from the {@link AttributeSource}.TokenStream
TokenStream
- , e. g. for buffering purposes (see {@link CachingTokenFilter},
- {@link TeeSinkTokenFilter}). For this usecase
- {@link AttributeSource#CaptureState} and {@link AttributeSource#RestoreState}
- can be used.
- public Iterator<Class<? extends Attribute>> getAttributeClassesIterator()
-
- Note that this return value is different from Java in that it enumerates over the values
- and not the keys
- public Iterator<AttributeImpl> getAttributeImplsIterator()
- public <T extends Attribute> T addAttribute(Class<T>)
- public boolean hasAttribute(Class<? extends Attribute>)
- public <T extends Attribute> T getAttribute(Class<T>)
-
- public AttributeImpl createAttributeInstance(Class<? extends Attribute> attClass)
- Impl
to it.
- TokenStream
s use the new API and
- implement {@link #IncrementToken}. This setting can only be enabled
- globally.
-
- This setting only affects TokenStream
s instantiated after this
- call. All TokenStream
s already created use the other setting.
-
- All core {@link Analyzer}s are compatible with this setting, if you have
- your own TokenStream
s that are also compatible, you should enable
- this.
-
- When enabled, tokenization may throw {@link UnsupportedOperationException}
- s, if the whole tokenizer chain is not compatible eg one of the
- TokenStream
s does not implement the new TokenStream
API.
-
- The default is false
, so there is the fallback to the old API
- available.
-
- false
- (using the new TokenStream
API). Streams implementing the old API
- should upgrade to use this feature.
-
- This method can be used to perform any end-of-stream operations, such as
- setting the final offset of a stream. The final offset of a stream might
- differ from the offset of the last token eg in case one or more whitespaces
- followed after the last token, but a {@link WhitespaceTokenizer} was used.
-
- TokenStream
are intended to be consumed more than once, it is
- necessary to implement {@link #Reset()}. Note that if your TokenStream
- caches tokens and feeds them back again after a reset, it is imperative
- that you clone the tokens when you store them away (on the first pass) as
- well as when you return them (on future passes after {@link #Reset()}).
- currentOff
.
- - public String toString() { - return "start=" + startOffset + ",end=" + endOffset; - } -- - This method may be overridden by subclasses. -
- public int hashCode() { - int code = startOffset; - code = code * 31 + endOffset; - return code; - } -- - see also {@link #equals(Object)} -
true
, if the workaround
- can be enabled (with no guarantees).
- true
, if this platform supports unmapping mmaped files.IndexInput
- is closed while another thread is still accessing it (SIGSEGV).
- false
and the workaround cannot be enabled.
- true
, if the unmap workaround is enabled.write.lock
- could not be acquired. This
- happens when a writer tries to open an index
- that another writer already has open.
- "new york"
.
-
- This query may be combined with other terms or queries with a {@link BooleanQuery}.
- b
. Documents
- matching this clause will (in addition to the normal weightings) have
- their score multiplied by b
.
- b
. The boost is 1.0 by default.
- field
assumed to be the
- default field and omitted.
- The representation used is one that is supposed to be readable
- by {@link Lucene.Net.QueryParsers.QueryParser QueryParser}. However,
- there are the following limitations:
- WITHIN
or NEAR
operator.
- The slop is in fact an edit-distance, where the units correspond to
- moves of terms in the query phrase out of position. For example, to switch
- the order of two words requires two moves (the first move places the words
- atop one another), so to permit re-orderings of phrases, the slop must be
- at least two.
- More exact matches are scored higher than sloppier matches, thus search
- results are sorted by exactness.
- The slop is zero by default, requiring exact matches.
- o
is equal to this. Weight
is used in the following way:
- Weight
is constructed by a top-level query, given a
- Searcher
({@link Query#CreateWeight(Searcher)}).Weight
to compute the query normalization factor
- {@link Similarity#QueryNorm(float)} of the query clauses contained in the
- query.Scorer
is constructed by {@link #Scorer(IndexReader,boolean,boolean)}.scoreDocsInOrder
.
-
- NOTE: even if scoreDocsInOrder
is false, it is
- recommended to check whether the returned Scorer
indeed scores
- documents out of order (i.e., call {@link #ScoresDocsOutOfOrder()}), as
- some Scorer
implementations will always return documents
- in-order.false
, i.e.
- the Scorer
scores documents in-order.
- slop
total unmatched positions between
- them. * When inOrder
is true, the spans from each clause
- must be * ordered as in clauses
.
- o
is equal to this. - public boolean skipTo(Term target) { - do { - if (!next()) - return false; - } while (target > term()); - return true; - } -- Some implementations *could* be considerably more efficient than a linear scan. - Check the implementation to be sure. -
field
as a single byte and returns an array
- of size reader.maxDoc()
of the value each document
- has in the given field.
- field
as bytes and returns an array of
- size reader.maxDoc()
of the value each document has in the
- given field.
- field
as shorts and returns an array
- of size reader.maxDoc()
of the value each document
- has in the given field.
- field
as shorts and returns an array of
- size reader.maxDoc()
of the value each document has in the
- given field.
- field
as integers and returns an array
- of size reader.maxDoc()
of the value each document
- has in the given field.
- field
as integers and returns an array of
- size reader.maxDoc()
of the value each document has in the
- given field.
- field
as floats and returns an array
- of size reader.maxDoc()
of the value each document
- has in the given field.
- field
as floats and returns an array
- of size reader.maxDoc()
of the value each document
- has in the given field.
- field
as longs and returns an array
- of size reader.maxDoc()
of the value each document
- has in the given field.
-
- field
as longs and returns an array of
- size reader.maxDoc()
of the value each document has in the
- given field.
-
- field
as integers and returns an array
- of size reader.maxDoc()
of the value each document
- has in the given field.
-
- field
as doubles and returns an array of
- size reader.maxDoc()
of the value each document has in the
- given field.
-
- field
and returns an array
- of size reader.maxDoc()
containing the value each document
- has in the given field.
- field
and returns
- an array of them in natural order, along with an array telling
- which element in the term array each document uses.
- field
to see if it contains integers, longs, floats
- or strings, and then calls one of the other methods in this class to get the
- values. For string values, a StringIndex is returned. After
- calling this method, there is an entry in the cache for both
- type AUTO
and the actual found type.
- field
and calls the given SortComparator
- to get the sort values. A hit in the cache will happen if reader
,
- field
, and comparator
are the same (using equals()
)
- as a previous call to this method.
- o
is equal to this. FilterIndexReader
contains another IndexReader, which it
- uses as its basic source of data, possibly transforming the data along the
- way or providing additional functionality. The class
- FilterIndexReader
itself simply implements all abstract methods
- of IndexReader
with versions that pass all requests to the
- contained index reader. Subclasses of FilterIndexReader
may
- further override some of these methods and may also provide additional
- methods and fields.
- open()
methods, e.g. {@link
- #Open(String, boolean)}.
- For efficiency, in this API documents are often referred to via
- document numbers, non-negative integers which each name a unique
- document in the index. These document numbers are ephemeral--they may change
- as documents are added to and deleted from an index. Clients should thus not
- rely on a given document having the same number between sessions.
- An IndexReader can be opened on a directory for which an IndexWriter is
- opened already, but it cannot be used to delete documents from the index then.
-
- NOTE: for backwards API compatibility, several methods are not listed
- as abstract, but have no useful implementations in this base class and
- instead always throw UnsupportedOperationException. Subclasses are
- strongly encouraged to override these methods, but in many cases may not
- need to.
-
-
- NOTE: as of 2.4, it's possible to open a read-only
- IndexReader using one of the static open methods that
- accepts the boolean readOnly parameter. Such a reader has
- better concurrency as it's not necessary to synchronize on
- the isDeleted method. Currently the default for readOnly
- is false, meaning if not specified you will get a
- read/write IndexReader. But in 3.0 this default will
- change to true, meaning you must explicitly specify false
- if you want to make changes with the resulting IndexReader.
-
- NOTE: {@link
- IndexReader
} instances are completely thread
- safe, meaning multiple threads can call any of its methods,
- concurrently. If your application requires external
- synchronization, you should not synchronize on the
- IndexReader
instance; use your own
- (non-Lucene) objects instead.
- - IndexReader reader = ... - ... - IndexReader newReader = r.reopen(); - if (newReader != reader) { - ... // reader was reopened - reader.close(); - } - reader = newReader; - ... -- - Be sure to synchronize that code so that other threads, - if present, can never use reader after it has been - closed and before it's switched to newReader. - - NOTE: If this reader is a near real-time - reader (obtained from {@link IndexWriter#GetReader()}, - reopen() will simply call writer.getReader() again for - you, though this may change in the future. - -
true
if the index is optimized; false
otherwise
- true
if an index exists at the specified directory.
- If the directory does not exist or if there is no index in it.
- false
is returned.
- true
if an index exists; false
otherwise
- true
if an index exists at the specified directory.
- If the directory does not exist or if there is no index in it.
- true
if an index exists; false
otherwise
- true
if an index exists at the specified directory.
- If the directory does not exist or if there is no index in it.
- true
if an index exists; false
otherwise
- n
th
- Document
in this index.
-
- NOTE: for performance reasons, this method does not check if the
- requested document is deleted, and therefore asking for a deleted document
- may yield unspecified results. Usually this is not required, however you
- can call {@link #IsDeleted(int)} with the requested document ID to verify
- the document is not deleted.
-
- n
- th position. The {@link FieldSelector} may be used to determine
- what {@link Lucene.Net.Documents.Field}s to load and how they should
- be loaded. NOTE: If this Reader (more specifically, the underlying
- FieldsReader
) is closed before the lazy
- {@link Lucene.Net.Documents.Field} is loaded an exception may be
- thrown. If you want the value of a lazy
- {@link Lucene.Net.Documents.Field} to be available after closing you
- must explicitly load it or fetch the Document again with a new loader.
-
- NOTE: for performance reasons, this method does not check if the
- requested document is deleted, and therefore asking for a deleted document
- may yield unspecified results. Usually this is not required, however you
- can call {@link #IsDeleted(int)} with the requested document ID to verify
- the document is not deleted.
-
- n
th position
-
- The {@link FieldSelector} to use to determine what
- Fields should be loaded on the Document. May be null, in which case
- all Fields will be loaded.
-
- write.lock
could not
- be obtained)
- write.lock
could not
- be obtained)
- t
.term
. For each document, the document number, the frequency of
- the term in that document is also provided, for use in
- search scoring. If term is null, then all non-deleted
- docs are returned with freq=1.
- Thus, this method implements the mapping:
- term
. For each document, in addition to the document number
- and frequency of the term in that document, a list of all of the ordinal
- positions of the term in the document is available. Thus, this method
- implements the mapping:
-
- docNum
. Once a document is
- deleted it will not appear in TermDocs or TermPostitions enumerations.
- Attempts to read its field with the {@link #document}
- method will result in an error. The presence of this document may still be
- reflected in the {@link #docFreq} statistic, though
- this will be corrected eventually as the index is further modified.
-
- write.lock
could not
- be obtained)
- docNum
.
- Applications should call {@link #DeleteDocument(int)} or {@link #DeleteDocuments(Term)}.
- term
indexed.
- This is useful if one uses a document field to hold a unique ID string for
- the document. Then to delete such a document, one merely constructs a
- term with the appropriate field and the unique ID string as its text and
- passes it to this method.
- See {@link #DeleteDocument(int)} for information about when this deletion will
- become effective.
-
- write.lock
could not
- be obtained)
- write.lock
could not
- be obtained)
- true
iff the index in the named directory is
- currently locked.
- true
iff the index in the named directory is
- currently locked.
- - boolean skipTo(int target) { - do { - if (!next()) - return false; - } while (target > doc()); - return true; - } -- Some implementations are considerably more efficient than that. -
null
as its
- detail message. The cause is not initialized, and may subsequently be
- initialized by a call to {@link #innerException}.
- cause
is not automatically incorporated in
- this runtime exception's detail message.
-
- stored
attribute instead.
-
- stored
attribute instead.
-
- len
chars of text
starting at off
- are in the set
- System.String
is in the set null
.
- nextCharArray
for more efficient access.
- OpenBitSet
is faster than java.util.BitSet
in most operations
- and *much* faster at calculating cardinality of sets and results of set operations.
- It can also handle sets of larger cardinality (up to 64 * 2**32-1)
-
- The goals of OpenBitSet
are the fastest implementation possible, and
- maximum code reuse. Extra safety and encapsulation
- may always be built on top, but if that's built in, the cost can never be removed (and
- hence people re-implement their own version in order to get better performance).
- If you want a "safe", totally encapsulated (and slower and limited) BitSet
- class, use java.util.BitSet
.
-
- cardinality | intersect_count | union | nextSetBit | get | iterator | -|
---|---|---|---|---|---|---|
50% full | 3.36 | 3.96 | 1.44 | 1.46 | 1.99 | 1.58 | -
1% full | 3.31 | 3.90 | 1.04 | 0.99 | -
cardinality | intersect_count | union | nextSetBit | get | iterator | -|
---|---|---|---|---|---|---|
50% full | 2.50 | 3.50 | 1.00 | 1.03 | 1.12 | 1.25 | -
1% full | 2.51 | 3.49 | 1.00 | 1.02 | -
- new Lock.With(directory.makeLock("my.lock")) { - public Object doBody() { - ... code to execute while locked ... - } - }.run(); -- - -
super()
in order to filter which
- documents are collected.
-
- - Searcher searcher = new IndexSearcher(indexReader); - final BitSet bits = new BitSet(indexReader.maxDoc()); - searcher.search(query, new HitCollector() { - public void collect(int doc, float score) { - bits.set(doc); - } - }); -- - Note: This is called in an inner search loop. For good search - performance, implementations of this method should not call - {@link Searcher#Doc(int)} or - {@link Lucene.Net.Index.IndexReader#Document(int)} on every - document number encountered. Doing so can slow searches by an order - of magnitude or more. - Note: The
score
passed to this method is a raw score.
- In other words, the score will not necessarily be a float whose value is
- between 0 and 1.
- document.add (new Field ("byNumber", Integer.toString(x), Field.Store.NO, Field.Index.NOT_ANALYZED));
-
-
- Integer.MIN_VALUE
and Integer.MAX_VALUE
inclusive.
- Documents which should appear first in the sort
- should have low value integers, later documents high values
- (i.e. the documents should be numbered 1..n
where
- 1
is the first and n
the last).
-
- Long term values should contain only digits and an optional
- preceding negative sign. Values must be base 10 and in the range
- Long.MIN_VALUE
and Long.MAX_VALUE
inclusive.
- Documents which should appear first in the sort
- should have low value integers, later documents high values.
-
- Float term values should conform to values accepted by
- {@link Float Float.valueOf(String)} (except that NaN
- and Infinity
are not supported).
- Documents which should appear first in the sort
- should have low values, later documents high values.
-
- String term values can contain any valid String, but should
- not be tokenized. The values are sorted according to their
- {@link Comparable natural order}. Note that using this type
- of term value has higher memory requirements than the other
- two types.
-
- IndexReader.maxDoc()
for each field
- name for which a sort is performed. In other words, the size of the
- cache in bytes is:
-
- 4 * IndexReader.maxDoc() * (# of different fields actually used to sort)
-
- For String fields, the cache is larger: in addition to the
- above array, the value of every term in the field is kept in memory.
- If there are many unique terms in the field, this could
- be quite large.
-
- Note that the size of the cache is not affected by how many
- fields are in the index and might be used to sort - only by
- the ones actually used to sort a result set.
-
- Created: Feb 12, 2004 10:53:57 AM
-
- field
then by index order (document
- number). The type of value in field
is determined
- automatically.
-
- field
then by
- index order (document number). The type of value in field
is
- determined automatically.
-
- field
is determined automatically.
-
- field
then by index order
- (document number).
- field
possibly in reverse,
- then by index order (document number).
- o
is equal to this. o
is equal to this. type
param tells how to parse the field string values into a numeric score value.
- q
for document d
correlates to the
- cosine-distance or dot-product between document and query vectors in a
-
- Vector Space Model (VSM) of Information Retrieval.
- A document whose vector is closer to the query vector in that model is scored higher.
-
- The score is computed as follows:
-
-
-
-
-             score(q,d)  =  coord(q,d) · queryNorm(q) · ∑ (t in q) ( tf(t in d) · idf(t)² · t.getBoost() · norm(t,d) )
-
-          where:
-             {@link Lucene.Net.Search.DefaultSimilarity#Tf(float) tf(t in d)}  =  frequency½
-             {@link Lucene.Net.Search.DefaultSimilarity#Idf(int, int) idf(t)}  =  1 + log( numDocs / (docFreq + 1) )
-             queryNorm(q)  =  {@link Lucene.Net.Search.DefaultSimilarity#QueryNorm(float) queryNorm(sumOfSquaredWeights)}  =  1 / sumOfSquaredWeights½
-             {@link Lucene.Net.Search.Weight#SumOfSquaredWeights() sumOfSquaredWeights}  =  {@link Lucene.Net.Search.Query#GetBoost() q.getBoost()}² · ∑ (t in q) ( idf(t) · t.getBoost() )²
-             norm(t,d)  =  {@link Lucene.Net.Documents.Document#GetBoost() doc.getBoost()} · {@link #LengthNorm(String, int) lengthNorm(field)} · ∏ (field f in d named as t) {@link Lucene.Net.Documents.Fieldable#GetBoost() f.getBoost()}
numTokens
is large,
- and larger values when numTokens
is small.
-
- Note that the return values are computed under
- {@link Lucene.Net.Index.IndexWriter#AddDocument(Lucene.Net.Documents.Document)}
- and then stored using
- {@link #EncodeNorm(float)}.
- Thus they have limited precision, and documents
- must be re-indexed if this method is altered.
-
- freq
is large, and smaller values when freq
- is small.
-
- The default implementation calls {@link #Tf(float)}.
-
- freq
is large, and smaller values when freq
- is small.
-
- - return idf(searcher.docFreq(term), searcher.maxDoc()); -- - Note that {@link Searcher#MaxDoc()} is used instead of - {@link Lucene.Net.Index.IndexReader#NumDocs()} because it is proportional to - {@link Searcher#DocFreq(Term)} , i.e., when one is inaccurate, - so is the other, and in the same direction. - -
- idf(searcher.docFreq(term), searcher.maxDoc()); -- - Note that {@link Searcher#MaxDoc()} is used instead of - {@link Lucene.Net.Index.IndexReader#NumDocs()} because it is - proportional to {@link Searcher#DocFreq(Term)} , i.e., when one is - inaccurate, so is the other, and in the same direction. - -
state.getBoost()*lengthNorm(numTerms)
, where
- numTerms
is {@link FieldInvertState#GetLength()} if {@link
- #setDiscountOverlaps} is false, else it's {@link
- FieldInvertState#GetLength()} - {@link
- FieldInvertState#GetNumOverlap()}.
-
- WARNING: This API is new and experimental, and may suddenly
- change.
- 1/sqrt(numTerms)
. 1/sqrt(sumOfSquaredWeights)
. sqrt(freq)
. 1 / (distance + 1)
. log(numDocs/(docFreq+1)) + 1
. overlap / maxOverlap
. - java -ea:Lucene.Net... Lucene.Net.Index.CheckIndex pathToIndex [-fix] [-segment X] [-segment Y] --
-fix
: actually write a new segments_N file, removing any problematic segments-segment X
: only check the specified
- segment(s). This can be specified multiple times,
- to check more than one segment, eg -segment _2
- -segment _a
. You can't use this with the -fix
- option.-fix
should only be used on an emergency basis as it will cause
- documents (perhaps many) to be permanently removed from the index. Always make
- a backup copy of your index before running this! Do not run this tool on an index
- that is actively being written to. You have been warned!
- Run without -fix, this tool will open the index, report version information
- and report any exceptions it hits and what action it would take if -fix were
- specified. With -fix, this tool will remove any segments that have issues and
- write a new segments_N file. This means all documents contained in the affected
- segments will be removed.
-
- This tool exits with exit code 1 if the index cannot be opened or has any
- corruption, else 0.
- - SinkTokenizer sink1 = new SinkTokenizer(); - SinkTokenizer sink2 = new SinkTokenizer(); - TokenStream source1 = new TeeTokenFilter(new TeeTokenFilter(new WhitespaceTokenizer(reader1), sink1), sink2); - TokenStream source2 = new TeeTokenFilter(new TeeTokenFilter(new WhitespaceTokenizer(reader2), sink1), sink2); - TokenStream final1 = new LowerCaseFilter(source1); - TokenStream final2 = source2; - TokenStream final3 = new EntityDetect(sink1); - TokenStream final4 = new URLDetect(sink2); - d.add(new Field("f1", final1)); - d.add(new Field("f2", final2)); - d.add(new Field("f3", final3)); - d.add(new Field("f4", final4)); -- In this example,
sink1
and sink2
will both get tokens from both
- reader1
and reader2
after whitespace tokenizer
- and now we can further wrap any of these in extra analysis, and more "sources" can be inserted if desired.
- It is important that tees are consumed before sinks (in the above example, the field names of the
- tees must be less than the sink's field names).
- Note, the EntityDetect and URLDetect TokenStreams are for the example and do not currently exist in Lucene.
-
-
- See LUCENE-1058.
-
- WARNING: {@link TeeTokenFilter} and {@link SinkTokenizer} only work with the old TokenStream API.
- If you switch to the new API, you need to use {@link TeeSinkTokenFilter} instead, which offers
- the same functionality.
- end()
on the
- input TokenStream.
- NOTE: Be sure to call super.end()
first when overriding this method.
- n
bits. bit
to one. bit
to true, and
- returns true if bit was already set
- bit
to zero. true
if bit
is one and
- false
if it is zero.
- name
in Directory
- d
, in a format that can be read by the constructor {@link
- #BitVector(Directory, String)}.
- name
in Directory
- d
, as written by the {@link #write} method.
- - Searcher searcher = new IndexSearcher(indexReader); - final BitSet bits = new BitSet(indexReader.maxDoc()); - searcher.search(query, new Collector() { - private int docBase; - - // ignore scorer - public void setScorer(Scorer scorer) { - } - - // accept docs out of order (for a BitSet it doesn't matter) - public boolean acceptsDocsOutOfOrder() { - return true; - } - - public void collect(int doc) { - bits.set(doc + docBase); - } - - public void setNextReader(IndexReader reader, int docBase) { - this.docBase = docBase; - } - }); -- - Not all collectors will need to rebase the docID. For - example, a collector that simply counts the total number - of hits would skip it. - - NOTE: Prior to 2.9, Lucene silently filtered - out hits with score <= 0. As of 2.9, the core Collectors - no longer do that. It's very unusual to have such hits - (a negative query boost, or function query returning - negative custom scores, could cause it to happen). If - you need that behavior, use {@link - PositiveScoresOnlyCollector}. - - NOTE: This API is experimental and might change - in incompatible ways in the next release. - -
true
if this collector does not
- * require the matching docIDs to be delivered in int sort
- * order (smallest to largest) to {@link #collect}.
- *
- * Most Lucene Query implementations will visit
- * matching docIDs in order. However, some queries
- * (currently limited to certain cases of {@link
- * BooleanQuery}) can achieve faster searching if the
- * Collector
allows them to deliver the
- * docIDs out of order.
- *
- * Many collectors don't mind getting docIDs out of
- * order, so it's important to return true
- * here.
- *
- results
is null it means there are no results to return,
- either because there were 0 calls to collect() or because the arguments to
- topDocs were invalid.
- start
, you should call {@link #TopDocs()} and work
- with the returned {@link TopDocs} object, which will contain all the
- results this search execution collected.
- t
. o
is equal to this. include
which
- have no overlap with spans from exclude
.
- o
is equal to this. Scorer
implements {@link Scorer#SkipTo(int)}.
- ReqOptScorer
.- ModifiedScore = valSrcScore * valSrcScores[0] * valSrcScores[1] * ... --
fieldName
should be loaded.
- - document.add(new NumericField(name).setIntValue(value)); -- - For optimal performance, re-use the -
NumericField
and {@link Document} instance for more than
- one document:
-
- - NumericField field = new NumericField(name); - Document document = new Document(); - document.add(field); - - for(all documents) { - ... - field.setIntValue(value) - writer.addDocument(document); - ... - } -- - The java native types
int
, long
,
- float
and double
are
- directly supported. However, any value that can be
- converted into these native types can also be indexed.
- For example, date/time values represented by a
- {@link java.util.Date} can be translated into a long
- value using the {@link java.util.Date#getTime} method. If you
- don't need millisecond precision, you can quantize the
- value, either by dividing the result of
- {@link java.util.Date#getTime} or using the separate getters
- (for year, month, etc.) to construct an int
or
- long
value.
-
- To perform range querying or filtering against a
- NumericField
, use {@link NumericRangeQuery} or {@link
- NumericRangeFilter}. To sort according to a
- NumericField
, use the normal numeric sort types, eg
- {@link SortField#INT} (note that {@link SortField#AUTO}
- will not work with these fields). NumericField
values
- can also be loaded directly from {@link FieldCache}.
-
- By default, a NumericField
's value is not stored but
- is indexed for range filtering and sorting. You can use
- the {@link #NumericField(String,Field.Store,boolean)}
- constructor if you need to change these defaults.
-
- You may add the same field name as a NumericField
to
- the same document more than once. Range querying and
- filtering will be the logical OR of all values; so a range query
- will hit all documents that have at least one value in
- the range. However sort behavior is not defined. If you need to sort,
- you should separately index a single-valued NumericField
.
-
- A NumericField
will consume somewhat more disk space
- in the index than an ordinary single-valued field.
- However, for a typical index that includes substantial
- textual content per document, this increase will likely
- be in the noise.
-
- Within Lucene, each numeric value is indexed as a
- trie structure, where each term is logically
- assigned to larger and larger pre-defined brackets (which
- are simply lower-precision representations of the value).
- The step size between each successive bracket is called the
- precisionStep
, measured in bits. Smaller
- precisionStep
- values result in a larger number
- of brackets, which consumes more disk space in the index
- but may result in faster range search performance. The
- default value, 4, was selected for a reasonable tradeoff
- of disk space consumption versus performance. You can
- use the expert constructor {@link
- #NumericField(String,int,Field.Store,boolean)} if you'd
- like to change the value. Note that you must also
- specify a congruent value when creating {@link
- NumericRangeQuery} or {@link NumericRangeFilter}.
- For low cardinality fields larger precision steps are good.
- If the cardinality is < 100, it is fair
- to use {@link Integer#MAX_VALUE}, which produces one
- term per value.
-
- For more information on the internals of numeric trie
- indexing, including the precisionStep
- configuration, see {@link NumericRangeQuery}. The format of
- indexed values is described in {@link NumericUtils}.
-
- If you only need to sort by numeric value, and never
- run range querying/filtering, you can index using a
- precisionStep
of {@link Integer#MAX_VALUE}.
- This will minimize disk space consumed.
-
- More advanced users can instead use {@link
- NumericTokenStream} directly, when indexing numbers. This
- class is a wrapper around this token stream type for
- easier, more intuitive usage.
-
- NOTE: This class is only used during
- indexing. When retrieving the stored field value from a
- {@link Document} instance after search, you will get a
- conventional {@link Fieldable} instance where the numeric
- values are returned as {@link String}s (according to
- toString(value)
of the used data type).
-
- NOTE: This API is
- experimental and might change in incompatible ways in the
- next release.
-
- precisionStep
- {@link NumericUtils#PRECISION_STEP_DEFAULT} (4). The instance is not yet initialized with
- a numeric value, before indexing a document containing this field,
- set a value using the various set???Value() methods.
- This constructor creates an indexed, but not stored field.
- precisionStep
- {@link NumericUtils#PRECISION_STEP_DEFAULT} (4). The instance is not yet initialized with
- a numeric value, before indexing a document containing this field,
- set a value using the various set???Value() methods.
- toString(value)
of the used data type)
-
- if the field should be indexed using {@link NumericTokenStream}
-
- precisionStep
. The instance is not yet initialized with
- a numeric value, before indexing a document containing this field,
- set a value using the various set???Value() methods.
- This constructor creates an indexed, but not stored field.
- precisionStep
. The instance is not yet initialized with
- a numeric value, before indexing a document containing this field,
- set a value using the various set???Value() methods.
- toString(value)
of the used data type)
-
- if the field should be indexed using {@link NumericTokenStream}
-
- null
for numeric fields null
for numeric fields null
for numeric fields null
if not yet initialized. long
value.document.add(new NumericField(name, precisionStep).SetLongValue(value))
- int
value.document.add(new NumericField(name, precisionStep).setIntValue(value))
- double
value.document.add(new NumericField(name, precisionStep).setDoubleValue(value))
- float
value.document.add(new NumericField(name, precisionStep).setFloatValue(value))
- word\tstem- (i.e. two tab separated words) - -
- { pq.top().change(); pq.adjustTop(); } -instead of
- { o = pq.pop(); o.change(); pq.push(o); } --
RAMDirectory
instance from a different
- Directory
implementation. This can be used to load
- a disk-based index into memory.
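- A minimal sketch of that use (the path is illustrative, not taken from this documentation):
-
-   Directory diskDir = FSDirectory.open(new File("/path/to/index"));
-   Directory ramDir = new RAMDirectory(diskDir);   // fully independent in-memory copy
-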
-
- This should be used only with indices that can fit into memory.
-
- Note that the resulting RAMDirectory
instance is fully
- independent from the original Directory
(it is a
- complete copy). Any subsequent changes to the
- original Directory
will not be visible in the
- RAMDirectory
instance.
-
- Directory
value
-
- RAMDirectory
instance from the {@link FSDirectory}.
-
- File
specifying the index directory
-
-
- RAMDirectory
instance from the {@link FSDirectory}.
-
- String
specifying the full index directory path
-
-
- numHits
, and fill the array with sentinel
- objects.
- term
.lowerTerm
- but less/equal than upperTerm
.
-
-
- If an endpoint is null, it is said
- to be "open". Either or both endpoints may be open. Open endpoints may not
- be exclusive (you can't select all but the first or last term without
- explicitly specifying the term to exclude.)
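- For example, a hedged sketch of such a filter (field name and term values are illustrative):
-
-   // matches documents whose "title" term is between "apple" and "banana", inclusive at both ends
-   Filter titleRange = new TermRangeFilter("title", "apple", "banana", true, true);
-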
-
- lowerTerm
is
- included in the range.
-
- If true, the upperTerm
is
- included in the range.
-
- lowerTerm
but less/equal than upperTerm
.
-
- If an endpoint is null, it is said
- to be "open". Either or both endpoints may be open. Open endpoints may not
- be exclusive (you can't select all but the first or last term without
- explicitly specifying the term to exclude.)
-
- If collator
is not null, it will be used to decide whether
- index terms are within the given range, rather than using the Unicode code
- point order in which index terms are stored.
-
- WARNING: Using this constructor and supplying a non-null
- value in the collator
parameter will cause every single
- index Term in the Field referenced by lowerTerm and/or upperTerm to be
- examined. Depending on the number of index Terms in this Field, the
- operation could be very slow.
-
- lowerTerm
is
- included in the range.
-
- If true, the upperTerm
is
- included in the range.
-
- The collator to use to collate index Terms, to determine
- their membership in the range bounded by lowerTerm
and
- upperTerm
.
-
- true
if the lower endpoint is inclusive true
if the upper endpoint is inclusive MultiTermQueryWrapperFilter
is not designed to
- be used by itself. Normally you subclass it to provide a Filter
- counterpart for a {@link MultiTermQuery} subclass.
-
- For example, {@link TermRangeFilter} and {@link PrefixFilter} extend
- MultiTermQueryWrapperFilter
.
- This class also provides the functionality behind
- {@link MultiTermQuery#CONSTANT_SCORE_FILTER_REWRITE};
- this is why it is not abstract.
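- As a sketch of the subclassing pattern described above (the class name and the wrapped
- {@link PrefixQuery} are illustrative, not part of this API):
-
-   public class MyPrefixFilter extends MultiTermQueryWrapperFilter {
-     public MyPrefixFilter(Term prefix) {
-       super(new PrefixQuery(prefix));   // delegate matching to the wrapped MultiTermQuery
-     }
-   }
-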
- n
- > current_{@link #Length()} (but n
< {@link #Length()}
- _at_start).
-
- - TopDocs topDocs = searcher.Search(query, numHits); - ScoreDoc[] hits = topDocs.scoreDocs; - for (int i = 0; i < hits.Length; i++) { - int docId = hits[i].doc; - Document d = searcher.Doc(docId); - // do something with current hit - ... --
min
has been retrieved.
- +
) or a minus (-
) sign, indicating
- that the clause is required or prohibited respectively; or+
/-
prefix to require any of a set of
- terms.- Query ::= ( Clause )* - Clause ::= ["+", "-"] [<TERM> ":"] ( <TERM> | "(" Query ")" ) -- - - Examples of appropriately formatted queries can be found in the query syntax - documentation. - - - - In {@link TermRangeQuery}s, QueryParser tries to detect date values, e.g. - date:[6/1/2005 TO 6/4/2005] produces a range query that searches - for "date" fields between 2005-06-01 and 2005-06-04. Note that the format - of the accepted input depends on {@link #SetLocale(Locale) the locale}. - By default a date is converted into a search term using the deprecated - {@link DateField} for compatibility reasons. - To use the new {@link DateTools} to convert dates, a - {@link Lucene.Net.Documents.DateTools.Resolution} has to be set. - - - The date resolution that shall be used for RangeQueries can be set - using {@link #SetDateResolution(DateTools.Resolution)} - or {@link #SetDateResolution(String, DateTools.Resolution)}. The former - sets the default date resolution for all fields, whereas the latter can - be used to set field specific date resolutions. Field specific date - resolutions take, if set, precedence over the default date resolution. - - - If you use neither {@link DateField} nor {@link DateTools} in your - index, you can create your own - query parser that inherits QueryParser and overwrites - {@link #GetRangeQuery(String, String, String, boolean)} to - use a different method for date conversion. - - - Note that QueryParser is not thread-safe. - - NOTE: there is a new QueryParser in contrib, which matches - the same syntax as this class, but is more modular, - enabling substantial customization to how a query is created. - - NOTE: there is a new QueryParser in contrib, which matches - the same syntax as this class, but is more modular, - enabling substantial customization to how a query is created. - NOTE: You must specify the required {@link Version} compatibility when - creating QueryParser: -
true
to allow leading wildcard characters.
-
- When set, *
or ?
are allowed as
- the first character of a PrefixQuery and WildcardQuery.
- Note that this can produce very slow
- queries on big indexes.
-
- Default: false.
- true
to enable position increments in result query.
-
- When set, result phrase and multi-phrase queries will
- be aware of position increments.
- Useful when e.g. a StopFilter increases the position increment of
- the token that follows an omitted token.
-
- Default: false.
- OR_OPERATOR
) terms without any modifiers
- are considered optional: for example capital of Hungary
is equal to
- capital OR of OR Hungary
.AND_OPERATOR
mode terms are considered to be in conjunction: the
- above mentioned query is parsed as capital AND of AND Hungary
- true
.
- \\u0041
to A
.
-
- \
.
- java Lucene.Net.QueryParsers.QueryParser <input>
- stopWords
is an instance of {@link CharArraySet} (true if
- makeStopSet()
was used to construct the set) it will be directly used
- and ignoreCase
will be ignored since CharArraySet
- directly controls case sensitivity.
-
- If stopWords
is not an instance of {@link CharArraySet},
- a new CharArraySet will be constructed and ignoreCase
will be
- used to specify the case sensitivity of that set.
-
- stopWords
is an instance of {@link CharArraySet} (true if
- makeStopSet()
was used to construct the set) it will be directly used
- and ignoreCase
will be ignored since CharArraySet
- directly controls case sensitivity.
-
- If stopWords
is not an instance of {@link CharArraySet},
- a new CharArraySet will be constructed and ignoreCase
will be
- used to specify the case sensitivity of that set.
-
- true
, this StopFilter will preserve
- positions of the incoming tokens (ie, accumulate and
- set position increments of the removed stop tokens).
- Generally, true
is best as it does not
- lose information (positions of the original tokens)
- during indexing.
-
- When set, when a token is stopped
- (omitted), the position increment of the following
- token is incremented.
-
- NOTE: be sure to also
- set {@link QueryParser#setEnablePositionIncrements} if
- you use QueryParser to create queries.
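- A rough sketch of wiring this up (the tokenizer and stop words are illustrative):
-
-   Set stopWords = StopFilter.makeStopSet(new String[] { "a", "an", "the" });
-   TokenStream stream = new WhitespaceTokenizer(reader);
-   stream = new StopFilter(true, stream, stopWords);   // true = preserve position increments
-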
- File.createNewFile
contain a vague
- yet spooky warning about not using the API for file
- locking. This warning was added due to this
- bug, and in fact the only known problem with using
- this API for locking is that the Lucene write lock may
- not be released when the JVM exits abnormally.
- When this happens, a {@link LockObtainFailedException}
- is hit when trying to create a writer, in which case you
- need to explicitly clear the lock file first. You can
- either manually remove the file, or use the {@link
- org.apache.lucene.index.IndexReader#unlock(Directory)}
- API. But, first be certain that no writer is in fact
- writing to the index otherwise you can easily corrupt
- your index.
-
- If you suspect that this or any other LockFactory is
- not working properly in your environment, you can easily
- test it by using {@link VerifyingLockFactory}, {@link
- LockVerifyServer} and {@link LockStressTest}.
-
- WildcardTermEnum
.
-
- After calling the constructor the enumeration is already pointing to the first
- valid term if such a term exists.
- - teacherid: 1 - studentfirstname: james - studentsurname: jones - - teacherid: 2 - studenfirstname: james - studentsurname: smith - studentfirstname: sally - studentsurname: jones -- - a SpanNearQuery with a slop of 0 can be applied across two - {@link SpanTermQuery} objects as follows: -
- SpanQuery q1 = new SpanTermQuery(new Term("studentfirstname", "james")); - SpanQuery q2 = new SpanTermQuery(new Term("studentsurname", "jones")); - SpanQuery q2m = new FieldMaskingSpanQuery(q2, "studentfirstname"); - Query q = new SpanNearQuery(new SpanQuery[]{q1, q2m}, -1, false); -- to search for 'studentfirstname:james studentsurname:jones' and find - teacherid 1 without matching teacherid 2 (which has a 'james' in position 0 - and 'jones' in position 1). - - Note: as {@link #GetField()} returns the masked field, scoring will be - done using the norms of the field name supplied. This may lead to unexpected - scoring behaviour. -
query
.
- app*
.
-
- This query uses the {@link
- MultiTermQuery#CONSTANT_SCORE_AUTO_REWRITE_DEFAULT}
- rewrite method.
- prefix
. getFloats()
and makes those values
- available as other numeric types, casting as needed.
-
-
- WARNING: The status of the Search.Function package is experimental.
- The APIs introduced here might change in the future and will not be
- supported anymore in such a case.
-
- getBytes()
and makes those values
- available as other numeric types, casting as needed.
-
-
- WARNING: The status of the Search.Function package is experimental.
- The APIs introduced here might change in the future and will not be
- supported anymore in such a case.
-
- IndexWriter
creates and maintains an index.
@@ -16573,241 +8977,4051 @@
been carried over to the merged segment.
open()
methods, e.g. {@link
+ #Open(String, boolean)}.
+ For efficiency, in this API documents are often referred to via
+ document numbers, non-negative integers which each name a unique
+ document in the index. These document numbers are ephemeral--they may change
+ as documents are added to and deleted from an index. Clients should thus not
+ rely on a given document having the same number between sessions.
+ An IndexReader can be opened on a directory for which an IndexWriter is
+ opened already, but it cannot be used to delete documents from the index then.
+
+ NOTE: for backwards API compatibility, several methods are not listed
+ as abstract, but have no useful implementations in this base class and
+ instead always throw UnsupportedOperationException. Subclasses are
+ strongly encouraged to override these methods, but in many cases may not
+ need to.
+
+
+ NOTE: as of 2.4, it's possible to open a read-only
+ IndexReader using one of the static open methods that
+ accepts the boolean readOnly parameter. Such a reader has
+ better concurrency as it's not necessary to synchronize on
+ the isDeleted method. Currently the default for readOnly
+ is false, meaning if not specified you will get a
+ read/write IndexReader. But in 3.0 this default will
+ change to true, meaning you must explicitly specify false
+ if you want to make changes with the resulting IndexReader.
+
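+ A minimal sketch of opening such a reader (the directory variable is illustrative):
+
+   IndexReader reader = IndexReader.open(directory, true);   // readOnly = true for better concurrency
+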
+ NOTE: {@link
+ IndexReader
} instances are completely thread
+ safe, meaning multiple threads can call any of its methods,
+ concurrently. If your application requires external
+ synchronization, you should not synchronize on the
+ IndexReader
instance; use your own
+ (non-Lucene) objects instead.
+ + IndexReader reader = ... + ... + IndexReader newReader = r.reopen(); + if (newReader != reader) { + ... // reader was reopened + reader.close(); + } + reader = newReader; + ... ++ + Be sure to synchronize that code so that other threads, + if present, can never use reader after it has been + closed and before it's switched to newReader. + + NOTE: If this reader is a near real-time + reader (obtained from {@link IndexWriter#GetReader()}, + reopen() will simply call writer.getReader() again for + you, though this may change in the future. + +
true
if the index is optimized; false
otherwise
+ true
if an index exists at the specified directory.
+ If the directory does not exist or if there is no index in it.
+ false
is returned.
+ true
if an index exists; false
otherwise
+ true
if an index exists at the specified directory.
+ If the directory does not exist or if there is no index in it.
+ true
if an index exists; false
otherwise
+ true
if an index exists at the specified directory.
+ If the directory does not exist or if there is no index in it.
+ true
if an index exists; false
otherwise
+ n
th
+ Document
in this index.
+
+ NOTE: for performance reasons, this method does not check if the
+ requested document is deleted, and therefore asking for a deleted document
+ may yield unspecified results. Usually this is not required, however you
+ can call {@link #IsDeleted(int)} with the requested document ID to verify
+ the document is not deleted.
+
+ n
+ th position. The {@link FieldSelector} may be used to determine
+ what {@link Lucene.Net.Documents.Field}s to load and how they should
+ be loaded. NOTE: If this Reader (more specifically, the underlying
+ FieldsReader
) is closed before the lazy
+ {@link Lucene.Net.Documents.Field} is loaded an exception may be
+ thrown. If you want the value of a lazy
+ {@link Lucene.Net.Documents.Field} to be available after closing you
+ must explicitly load it or fetch the Document again with a new loader.
+
+ NOTE: for performance reasons, this method does not check if the
+ requested document is deleted, and therefore asking for a deleted document
+ may yield unspecified results. Usually this is not required, however you
+ can call {@link #IsDeleted(int)} with the requested document ID to verify
+ the document is not deleted.
+
+ n
th position
+
+ The {@link FieldSelector} to use to determine what
+ Fields should be loaded on the Document. May be null, in which case
+ all Fields will be loaded.
+
+ write.lock
could not
+ be obtained)
+ write.lock
could not
+ be obtained)
+ t
.term
. For each document, the document number, the frequency of
+ the term in that document is also provided, for use in
+ search scoring. If term is null, then all non-deleted
+ docs are returned with freq=1.
+ Thus, this method implements the mapping:
+ term
. For each document, in addition to the document number
+ and frequency of the term in that document, a list of all of the ordinal
+ positions of the term in the document is available. Thus, this method
+ implements the mapping:
+
+ docNum
. Once a document is
+ deleted it will not appear in TermDocs or TermPositions enumerations.
+ Attempts to read its field with the {@link #document}
+ method will result in an error. The presence of this document may still be
+ reflected in the {@link #docFreq} statistic, though
+ this will be corrected eventually as the index is further modified.
+
+ write.lock
could not
+ be obtained)
+ docNum
.
+ Applications should call {@link #DeleteDocument(int)} or {@link #DeleteDocuments(Term)}.
+ term
indexed.
+ This is useful if one uses a document field to hold a unique ID string for
+ the document. Then to delete such a document, one merely constructs a
+ term with the appropriate field and the unique ID string as its text and
+ passes it to this method.
+ See {@link #DeleteDocument(int)} for information about when this deletion will
+ become effective.
+
+ write.lock
could not
+ be obtained)
+ write.lock
could not
+ be obtained)
+ true
iff the index in the named directory is
+ currently locked.
+ true
iff the index in the named directory is
+ currently locked.
+ fieldsToLoad
and lazyFieldsToLoad
, lazy has precedence.
+ segments_N
) associated
+ with this commit point.
+ currentOff
.
+ super()
in order to filter which
- documents are collected.
+ match
whose end
- position is less than or equal to end
.
+ getDirectory
methods default to use
+ {@link SimpleFSLockFactory} for backwards compatibility.
+ The system properties
+ org.apache.lucene.store.FSDirectoryLockFactoryClass
+ and org.apache.lucene.FSDirectory.class
+ are deprecated and only used by the deprecated
+ getDirectory
methods. The system property
+ org.apache.lucene.lockDir
is ignored completely.
+ If you really want to store locks
+ elsewhere, you can create your own {@link
+ SimpleFSLockFactory} (or {@link NativeFSLockFactory},
+ etc.) passing in your preferred lock directory.
+
+ In 3.0 this class will become abstract.
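+ A hedged sketch of supplying your own lock directory (paths are illustrative):
+
+   LockFactory lockFactory = new SimpleFSLockFactory(new File("/path/to/locks"));
+   Directory dir = FSDirectory.open(new File("/path/to/index"), lockFactory);
+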
+
+ FieldCache.DEFAULT
for maintaining internal term lookup tables.
+ true
, call {@link #Close()} method on source directory
+
+ getDirectory
+ respect this setting.
+ Integer.MAX_VALUE
.
+ Scorer
implements {@link Scorer#SkipTo(int)}.
+ ReqOptScorer
.Searchables
.
+
+ Applications usually need only call the inherited {@link #Search(Query)}
+ or {@link #Search(Query,Filter)} methods.
+ Searchables
.
+
+ Applications usually need only call the inherited {@link #Search(Query)}
+ or {@link #Search(Query,Filter)} methods.
+ n
in the array
+ used to construct this searcher.
+ n
within its
+ sub-index.
+ FieldCache.DEFAULT
for maintaining
+ internal term lookup tables.
+
+ This class will not resolve SortField.AUTO types, and expects the type
+ of all SortFields used for construction to already have been resolved.
+ {@link SortField#DetectFieldType(IndexReader, String)} is a utility method which
+ may be used for field type detection.
+
+ NOTE: This API is experimental and might change in
+ incompatible ways in the next release.
+
+ numHits
.
+
+ null
or empty
- Fieldable names, in priority order (highest priority first). Cannot be null
or empty.
-
- The number of hits to retain. Must be greater than zero.
+ The number of hits to retain. Must be greater than zero.
a
is less relevant than b
.true
if document a
should be sorted after document b
.
title
and body
):
-
-
-
- (title:term1 body:term1) (title:term2 body:term2)
-
-
-
- When setDefaultOperator(AND_OPERATOR) is set, the result will be:
-
-
-
- +(title:term1 body:term1) +(title:term2 body:term2)
-
-
-
- When you pass a boost (title=>5 body=>10) you can get
-
-
-
- +(title:term1^5.0 body:term1^10.0) +(title:term2^5.0 body:term2^10.0)
-
-
-
- In other words, all the query's terms must appear, but it doesn't matter
- in what fields they appear.
-
-
- title
and body
):
-
-
-
- (title:term1 body:term1) (title:term2 body:term2)
-
-
-
- When setDefaultOperator(AND_OPERATOR) is set, the result will be:
-
-
-
- +(title:term1 body:term1) +(title:term2 body:term2)
-
-
-
- When you pass a boost (title=>5 body=>10) you can get
-
-
-
- +(title:term1^5.0 body:term1^10.0) +(title:term2^5.0 body:term2^10.0)
-
-
-
- In other words, all the query's terms must appear, but it doesn't matter
- in what fields they appear.
-
+ null
.
+ This is to handle the case using ParallelMultiSearcher where the
+ original list contains AUTO and we don't know the actual sort
+ type until the values come back. The fields can only be set once.
+ This method is thread safe.
title
and body
):
-
-
-
- (title:term1 body:term1) (title:term2 body:term2)
-
-
-
- When setDefaultOperator(AND_OPERATOR) is set, the result will be:
-
-
-
- +(title:term1 body:term1) +(title:term2 body:term2)
-
-
-
- In other words, all the query's terms must appear, but it doesn't matter
- in what fields they appear.
-
-
- title
and body
):
-
-
-
- (title:term1 body:term1) (title:term2 body:term2)
-
-
-
- When setDefaultOperator(AND_OPERATOR) is set, the result will be:
-
-
-
- +(title:term1 body:term1) +(title:term2 body:term2)
-
-
-
- In other words, all the query's terms must appear, but it doesn't matter
- in what fields they appear.
-
+ null
. The collators
+ correspond to any SortFields which were given a specific locale.
- <code> - (field1:query1) (field2:query2) (field3:query3)...(fieldx:queryx) - </code> -- -
- <code> - (field1:query1) (field2:query2) (field3:query3)...(fieldx:queryx) - </code> -- -
- Usage:
-
- String[] fields = {"filename", "contents", "description"};
- BooleanClause.Occur[] flags = {BooleanClause.Occur.SHOULD,
- BooleanClause.Occur.MUST,
- BooleanClause.Occur.MUST_NOT};
- MultiFieldQueryParser.parse("query", fields, flags, analyzer);
-
-
-
- The code above would construct a query:
-
-
- (filename:query) +(contents:query) -(description:query)
-
-
-
- - Usage: - <code> - String[] fields = {"filename", "contents", "description"}; - BooleanClause.Occur[] flags = {BooleanClause.Occur.SHOULD, - BooleanClause.Occur.MUST, - BooleanClause.Occur.MUST_NOT}; - MultiFieldQueryParser.parse("query", fields, flags, analyzer); - </code> -- - The code above would construct a query: - -
- <code> - (filename:query) +(contents:query) -(description:query) - </code> -- -
- Usage:
-
- String[] query = {"query1", "query2", "query3"};
- String[] fields = {"filename", "contents", "description"};
- BooleanClause.Occur[] flags = {BooleanClause.Occur.SHOULD,
- BooleanClause.Occur.MUST,
- BooleanClause.Occur.MUST_NOT};
- MultiFieldQueryParser.parse(query, fields, flags, analyzer);
-
-
-
- The code above would construct a query:
-
-
- (filename:query1) +(contents:query2) -(description:query3)
-
-
-
- - Usage: - <code> - String[] query = {"query1", "query2", "query3"}; - String[] fields = {"filename", "contents", "description"}; - BooleanClause.Occur[] flags = {BooleanClause.Occur.SHOULD, - BooleanClause.Occur.MUST, - BooleanClause.Occur.MUST_NOT}; - MultiFieldQueryParser.parse(query, fields, flags, analyzer); - </code> -- - The code above would construct a query: - -
- <code> - (filename:query1) +(contents:query2) -(description:query3) - </code> -- -
true
if the index is optimized; false
otherwise
+ null
.
write.lock
could not be
- obtained)
- a
is less relevant than b
.true
if document a
should be sorted after document b
.
+ indexOf
method.
+ MultipleTermPositions
instance.
+
+ FilterIndexReader
contains another IndexReader, which it
+ uses as its basic source of data, possibly transforming the data along the
+ way or providing additional functionality. The class
+ FilterIndexReader
itself simply implements all abstract methods
+ of IndexReader
with versions that pass all requests to the
+ contained index reader. Subclasses of FilterIndexReader
may
+ further override some of these methods and may also provide additional
+ methods and fields.
+ len
chars of text
starting at off
+ are in the set
+ System.String
is in the set null
.
+ nextCharArray
for more efficient access.
+ q
for document d
correlates to the
+ cosine-distance or dot-product between document and query vectors in a
+
+ Vector Space Model (VSM) of Information Retrieval.
+ A document whose vector is closer to the query vector in that model is scored higher.
+
+ The score is computed as follows:
+
+
+
+
+             score(q,d)  =  coord(q,d) · queryNorm(q) · ∑ (t in q) ( tf(t in d) · idf(t)² · t.getBoost() · norm(t,d) )
+
+          where:
+             {@link Lucene.Net.Search.DefaultSimilarity#Tf(float) tf(t in d)}  =  frequency½
+             {@link Lucene.Net.Search.DefaultSimilarity#Idf(int, int) idf(t)}  =  1 + log( numDocs / (docFreq + 1) )
+             queryNorm(q)  =  {@link Lucene.Net.Search.DefaultSimilarity#QueryNorm(float) queryNorm(sumOfSquaredWeights)}  =  1 / sumOfSquaredWeights½
+             {@link Lucene.Net.Search.Weight#SumOfSquaredWeights() sumOfSquaredWeights}  =  {@link Lucene.Net.Search.Query#GetBoost() q.getBoost()}² · ∑ (t in q) ( idf(t) · t.getBoost() )²
+             norm(t,d)  =  {@link Lucene.Net.Documents.Document#GetBoost() doc.getBoost()} · {@link #LengthNorm(String, int) lengthNorm(field)} · ∏ (field f in d named as t) {@link Lucene.Net.Documents.Fieldable#GetBoost() f.getBoost()}
numTokens
is large,
+ and larger values when numTokens
is small.
+
+ Note that the return values are computed under
+ {@link Lucene.Net.Index.IndexWriter#AddDocument(Lucene.Net.Documents.Document)}
+ and then stored using
+ {@link #EncodeNorm(float)}.
+ Thus they have limited precision, and documents
+ must be re-indexed if this method is altered.
+
+ freq
is large, and smaller values when freq
+ is small.
+
+ The default implementation calls {@link #Tf(float)}.
+
+ freq
is large, and smaller values when freq
+ is small.
+
+ + return idf(searcher.docFreq(term), searcher.maxDoc()); ++ + Note that {@link Searcher#MaxDoc()} is used instead of + {@link Lucene.Net.Index.IndexReader#NumDocs()} because it is proportional to + {@link Searcher#DocFreq(Term)} , i.e., when one is inaccurate, + so is the other, and in the same direction. + +
+ idf(searcher.docFreq(term), searcher.maxDoc()); ++ + Note that {@link Searcher#MaxDoc()} is used instead of + {@link Lucene.Net.Index.IndexReader#NumDocs()} because it is + proportional to {@link Searcher#DocFreq(Term)} , i.e., when one is + inaccurate, so is the other, and in the same direction. + +
state.getBoost()*lengthNorm(numTerms)
, where
+ numTerms
is {@link FieldInvertState#GetLength()} if {@link
+ #setDiscountOverlaps} is false, else it's {@link
+ FieldInvertState#GetLength()} - {@link
+ FieldInvertState#GetNumOverlap()}.
+
+ WARNING: This API is new and experimental, and may suddenly
+ change.
+ 1/sqrt(numTerms)
. 1/sqrt(sumOfSquaredWeights)
. sqrt(freq)
. 1 / (distance + 1)
. log(numDocs/(docFreq+1)) + 1
. overlap / maxOverlap
. o
is equal to this. .f
+ a number and
- from .s
+ a number. Also note that
- Lucene's segments_N
files do not have any
- filename extension.
+ segments_N
). This point in time, when the
+ action of writing of a new segments file to the directory
+ is completed, is an index commit.
+
+ Each index commit point has a unique segments file
+ associated with it. The segments file associated with a
+ later index commit point would have a larger N.
+
+ WARNING: This API is new and experimental and
+ may suddenly change.
+ segments_N
) associated
+ with this commit point.
+ + document.add(new NumericField(name).setIntValue(value)); ++ + For optimal performance, re-use the +
NumericField
and {@link Document} instance for more than
+ one document:
+
+ + NumericField field = new NumericField(name); + Document document = new Document(); + document.add(field); + + for(all documents) { + ... + field.setIntValue(value) + writer.addDocument(document); + ... + } ++ + The java native types
int
, long
,
+ float
and double
are
+ directly supported. However, any value that can be
+ converted into these native types can also be indexed.
+ For example, date/time values represented by a
+ {@link java.util.Date} can be translated into a long
+ value using the {@link java.util.Date#getTime} method. If you
+ don't need millisecond precision, you can quantize the
+ value, either by dividing the result of
+ {@link java.util.Date#getTime} or using the separate getters
+ (for year, month, etc.) to construct an int
or
+ long
value.
+
+ To perform range querying or filtering against a
+ NumericField
, use {@link NumericRangeQuery} or {@link
+ NumericRangeFilter}. To sort according to a
+ NumericField
, use the normal numeric sort types, eg
+ {@link SortField#INT} (note that {@link SortField#AUTO}
+ will not work with these fields). NumericField
values
+ can also be loaded directly from {@link FieldCache}.
+
+ By default, a NumericField
's value is not stored but
+ is indexed for range filtering and sorting. You can use
+ the {@link #NumericField(String,Field.Store,boolean)}
+ constructor if you need to change these defaults.
+
+ You may add the same field name as a NumericField
to
+ the same document more than once. Range querying and
+ filtering will be the logical OR of all values; so a range query
+ will hit all documents that have at least one value in
+ the range. However sort behavior is not defined. If you need to sort,
+ you should separately index a single-valued NumericField
.
+
+ A NumericField
will consume somewhat more disk space
+ in the index than an ordinary single-valued field.
+ However, for a typical index that includes substantial
+ textual content per document, this increase will likely
+ be in the noise.
+
+ Within Lucene, each numeric value is indexed as a
+ trie structure, where each term is logically
+ assigned to larger and larger pre-defined brackets (which
+ are simply lower-precision representations of the value).
+ The step size between each successive bracket is called the
+ precisionStep
, measured in bits. Smaller
+ precisionStep
+ values result in a larger number
+ of brackets, which consumes more disk space in the index
+ but may result in faster range search performance. The
+ default value, 4, was selected for a reasonable tradeoff
+ of disk space consumption versus performance. You can
+ use the expert constructor {@link
+ #NumericField(String,int,Field.Store,boolean)} if you'd
+ like to change the value. Note that you must also
+ specify a congruent value when creating {@link
+ NumericRangeQuery} or {@link NumericRangeFilter}.
+ For low cardinality fields larger precision steps are good.
+ If the cardinality is < 100, it is fair
+ to use {@link Integer#MAX_VALUE}, which produces one
+ term per value.
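+ A sketch of keeping the index and query congruent (field name, precision step and bounds are illustrative):
+
+   int precisionStep = 4;
+   document.add(new NumericField("price", precisionStep).setIntValue(value));
+   // the query must be created with the same precisionStep
+   Query q = NumericRangeQuery.newIntRange("price", precisionStep, 10, 20, true, true);
+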
+
+ For more information on the internals of numeric trie
+ indexing, including the precisionStep
+ configuration, see {@link NumericRangeQuery}. The format of
+ indexed values is described in {@link NumericUtils}.
+
+ If you only need to sort by numeric value, and never
+ run range querying/filtering, you can index using a
+ precisionStep
of {@link Integer#MAX_VALUE}.
+ This will minimize disk space consumed.
+
+ More advanced users can instead use {@link
+ NumericTokenStream} directly, when indexing numbers. This
+ class is a wrapper around this token stream type for
+ easier, more intuitive usage.
+
+ NOTE: This class is only used during
+ indexing. When retrieving the stored field value from a
+ {@link Document} instance after search, you will get a
+ conventional {@link Fieldable} instance where the numeric
+ values are returned as {@link String}s (according to
+ toString(value)
of the used data type).
+
+ NOTE: This API is
+ experimental and might change in incompatible ways in the
+ next release.
+
+ precisionStep
+ {@link NumericUtils#PRECISION_STEP_DEFAULT} (4). The instance is not yet initialized with
+ a numeric value, before indexing a document containing this field,
+ set a value using the various set???Value() methods.
+ This constructor creates an indexed, but not stored field.
+ precisionStep
+ {@link NumericUtils#PRECISION_STEP_DEFAULT} (4). The instance is not yet initialized with
+ a numeric value, before indexing a document containing this field,
+ set a value using the various set???Value() methods.
+ toString(value)
of the used data type)
+
+ if the field should be indexed using {@link NumericTokenStream}
+
+ precisionStep
. The instance is not yet initialized with
+ a numeric value, before indexing a document containing this field,
+ set a value using the various set???Value() methods.
+ This constructor creates an indexed, but not stored field.
+ precisionStep
. The instance is not yet initialized with
+ a numeric value, before indexing a document containing this field,
+ set a value using the various set???Value() methods.
+ toString(value)
of the used data type)
+
+ if the field should be indexed using {@link NumericTokenStream}
+
+ null
for numeric fields null
for numeric fields null
for numeric fields null
if not yet initialized. long
value.document.add(new NumericField(name, precisionStep).SetLongValue(value))
+ int
value.document.add(new NumericField(name, precisionStep).setIntValue(value))
+ double
value.document.add(new NumericField(name, precisionStep).setDoubleValue(value))
+ float
value.document.add(new NumericField(name, precisionStep).setFloatValue(value))
+ + SinkTokenizer sink1 = new SinkTokenizer(); + SinkTokenizer sink2 = new SinkTokenizer(); + TokenStream source1 = new TeeTokenFilter(new TeeTokenFilter(new WhitespaceTokenizer(reader1), sink1), sink2); + TokenStream source2 = new TeeTokenFilter(new TeeTokenFilter(new WhitespaceTokenizer(reader2), sink1), sink2); + TokenStream final1 = new LowerCaseFilter(source1); + TokenStream final2 = source2; + TokenStream final3 = new EntityDetect(sink1); + TokenStream final4 = new URLDetect(sink2); + d.add(new Field("f1", final1)); + d.add(new Field("f2", final2)); + d.add(new Field("f3", final3)); + d.add(new Field("f4", final4)); ++ In this example,
sink1
and sink2
will both get tokens from both
+ reader1
and reader2
after whitespace tokenizer
+ and now we can further wrap any of these in extra analysis, and more "sources" can be inserted if desired.
+ It is important that tees are consumed before sinks (in the above example, the field names of the
+ tees must be less than the sink's field names).
+ Note, the EntityDetect and URLDetect TokenStreams are for the example and do not currently exist in Lucene.
+
+
+ See LUCENE-1058.
+
+ WARNING: {@link TeeTokenFilter} and {@link SinkTokenizer} only work with the old TokenStream API.
+ If you switch to the new API, you need to use {@link TeeSinkTokenFilter} instead, which offers
+ the same functionality.
value
should be stored in the index
-
- Whether the field should be indexed, and if so, if it should
- be tokenized before indexing
-
- null
value
should be stored in the index
-
- Whether the field should be indexed, and if so, if it should
- be tokenized before indexing
-
- Whether term vector should be stored
-
- null
TermVector.YES
value
should be stored in the index
-
- Whether the field should be indexed, and if so, if it should
- be tokenized before indexing
-
- Whether term vector should be stored
-
- null
TermVector.YES
null
null
null
null
value
should be stored (compressed or not)
-
- Store.NO
value
should be stored (compressed or not)
-
- Store.NO
results
is null it means there are no results to return,
+ either because there were 0 calls to collect() or because the arguments to
+ topDocs were invalid.
+ start
, you should call {@link #TopDocs()} and work
+ with the returned {@link TopDocs} object, which will contain all the
+ results this search execution collected.
+ getFloats()
and makes those values
+ available as other numeric types, casting as needed.
+
+
+ WARNING: The status of the Search.Function package is experimental.
+ The APIs introduced here might change in the future and will not be
+ supported anymore in such a case.
Zero
value.
+
+
+ WARNING: The status of the Search.Function package is experimental.
+ The APIs introduced here might change in the future and will not be
+ supported anymore in such a case.
+
+ NOTE: with the switch in 2.9 to segment-based
+ searching, if {@link #getValues} is invoked with a
+ composite (multi-segment) reader, this can easily cause
+ double RAM usage for the values in the FieldCache. It's
+ best to switch your application to pass only atomic
+ (single segment) readers to this API. Alternatively, for
+ a short-term fix, you could wrap your ValueSource using
+ {@link MultiValueSource}, which costs more CPU per lookup
+ but will not consume double the FieldCache RAM.
+ Resolution.DAY
or lower.
+
+
+ Another approach is {@link NumericUtils}, which provides
+ a sortable binary representation (prefix encoded) of numeric values, which
+ date/time values are.
+ For indexing a {@link Date} or {@link Calendar}, just get the unix timestamp as
+ long
using {@link Date#getTime} or {@link Calendar#getTimeInMillis} and
+ index this as a numeric value with {@link NumericField}
+ and use {@link NumericRangeQuery} to query it.
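+ A hedged sketch of both approaches (field names and the date value are illustrative):
+
+   // DateTools: index a day-resolution term
+   String day = DateTools.dateToString(new Date(), DateTools.Resolution.DAY);
+   document.add(new Field("created", day, Field.Store.NO, Field.Index.NOT_ANALYZED));
+   // NumericField: index the raw timestamp for use with {@link NumericRangeQuery}
+   document.add(new NumericField("createdMillis").setLongValue(new Date().getTime()));
+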
+ yyyyMMddHHmmssSSS
or shorter,
+ depending on resolution
; using GMT as timezone
+ yyyyMMddHHmmssSSS
or shorter,
+ depending on resolution
; using GMT as timezone
+ timeToString
or
+ DateToString
back to a time, represented as the
+ number of milliseconds since January 1, 1970, 00:00:00 GMT.
+
+ dateString
is not in the timeToString
or
+ DateToString
back to a time, represented as a
+ Date object.
+
+ dateString
is not in the 2004-09-21 13:50:11
+ will be changed to 2004-09-01 00:00:00
when using
+ Resolution.MONTH
.
+
+ resolution
+ set to 0 or 1
+ 1095767411000
+ (which represents 2004-09-21 13:50:11) will be changed to
+ 1093989600000
(2004-09-01 00:00:00) when using
+ Resolution.MONTH
.
+
+ resolution
+ set to 0 or 1, expressed as milliseconds since January 1, 1970, 00:00:00 GMT
+ + TeeSinkTokenFilter source1 = new TeeSinkTokenFilter(new WhitespaceTokenizer(reader1)); + TeeSinkTokenFilter.SinkTokenStream sink1 = source1.newSinkTokenStream(); + TeeSinkTokenFilter.SinkTokenStream sink2 = source1.newSinkTokenStream(); + TeeSinkTokenFilter source2 = new TeeSinkTokenFilter(new WhitespaceTokenizer(reader2)); + source2.addSinkTokenStream(sink1); + source2.addSinkTokenStream(sink2); + TokenStream final1 = new LowerCaseFilter(source1); + TokenStream final2 = source2; + TokenStream final3 = new EntityDetect(sink1); + TokenStream final4 = new URLDetect(sink2); + d.add(new Field("f1", final1)); + d.add(new Field("f2", final2)); + d.add(new Field("f3", final3)); + d.add(new Field("f4", final4)); ++ In this example,
sink1
and sink2
will both get tokens from both
+ reader1
and reader2
after whitespace tokenizer
+ and now we can further wrap any of these in extra analysis, and more "sources" can be inserted if desired.
+ It is important that tees are consumed before sinks (in the above example, the field names of the
+ tees must be less than the sink's field names). If you are not sure which stream is consumed first, you can simply
+ add another sink and then pass all tokens to the sinks at once using {@link #consumeAllTokens}.
+ This TokenFilter is exhausted after that call. In that case, change
+ the example above to:
+ + ... + TokenStream final1 = new LowerCaseFilter(source1.newSinkTokenStream()); + TokenStream final2 = source2.newSinkTokenStream(); + sink1.consumeAllTokens(); + sink2.consumeAllTokens(); + ... ++ In this case, the fields can be added in any order, because the sources are not used anymore and all sinks are ready. + Note, the EntityDetect and URLDetect TokenStreams are for the example and do not currently exist in Lucene. +
TeeSinkTokenFilter
+ to this one. The supplied stream will also receive all consumed tokens.
+ This method can be used to pass tokens from two different tees to one sink.
+ TeeSinkTokenFilter
passes all tokens to the added sinks
+ when itself is consumed. To be sure, that all tokens from the input
+ stream are passed to the sinks, you can call this methods.
+ This instance is exhausted after this, but all sinks are instant available.
+ Scorer
for documents matching a Term
.TermScorer
.
+
+ Term
in the query.
+
+ An iterator over the documents matching the Term
.
+
+ The Similarity
implementation to be used for score
+ computations.
+
+ The field norms of the document fields for the Term
.
+
+ TermScorer
. document.add (new Field ("byNumber", Integer.toString(x), Field.Store.NO, Field.Index.NOT_ANALYZED));
+
+
+ Integer.MIN_VALUE
and Integer.MAX_VALUE
inclusive.
+ Documents which should appear first in the sort
+ should have low value integers, later documents high values
+ (i.e. the documents should be numbered 1..n
where
+ 1
is the first and n
the last).
+
+ Long term values should contain only digits and an optional
+ preceding negative sign. Values must be base 10 and in the range
+ Long.MIN_VALUE
and Long.MAX_VALUE
inclusive.
+ Documents which should appear first in the sort
+ should have low value integers, later documents high values.
+
+ Float term values should conform to values accepted by
+ {@link Float Float.valueOf(String)} (except that NaN
+ and Infinity
are not supported).
+ Documents which should appear first in the sort
+ should have low values, later documents high values.
+
+ String term values can contain any valid String, but should
+ not be tokenized. The values are sorted according to their
+ {@link Comparable natural order}. Note that using this type
+ of term value has higher memory requirements than the other
+ two types.
+
+ IndexReader.maxDoc()
for each field
+ name for which a sort is performed. In other words, the size of the
+ cache in bytes is:
+
+ 4 * IndexReader.maxDoc() * (# of different fields actually used to sort)
+
+ For String fields, the cache is larger: in addition to the
+ above array, the value of every term in the field is kept in memory.
+ If there are many unique terms in the field, this could
+ be quite large.
+
+ Note that the size of the cache is not affected by how many
+ fields are in the index or might be used to sort; it depends only on
+ the fields actually used to sort a result set.
+
+ Created: Feb 12, 2004 10:53:57 AM
+
+ field
then by index order (document
+ number). The type of value in field
is determined
+ automatically.
+
+ field
then by
+ index order (document number). The type of value in field
is
+ determined automatically.
+
+ field
is determined automatically.
+
+ field
then by index order
+ (document number).
+ field
possibly in reverse,
+ then by index order (document number).
+ o
is equal to this. lowerTerm
but less than upperTerm
.
+ There must be at least one term and either term may be null,
+ in which case there is no bound on that side, but if there are
+ two terms, both terms must be for the same field.
+
+ lowerTerm
and
+ upperTerm
will themselves be included in the range.
+
+ lowerTerm
but less than upperTerm
.
+ There must be at least one term and either term may be null,
+ in which case there is no bound on that side, but if there are
+ two terms, both terms must be for the same field.
+
+ If collator
is not null, it will be used to decide whether
+ index terms are within the given range, rather than using the Unicode code
+ point order in which index terms are stored.
+
+ WARNING: Using this constructor and supplying a non-null
+ value in the collator
parameter will cause every single
+ index Term in the Field referenced by lowerTerm and/or upperTerm to be
+ examined. Depending on the number of index Terms in this Field, the
+ operation could be very slow.
+
+ lowerTerm
and
+ upperTerm
will themselves be included in the range.
+
+ The collator to use to collate index Terms, to determine
+ their membership in the range bounded by lowerTerm
and
+ upperTerm
.
+
+ true
if the range query is inclusive o
is equal to this. +
) or a minus (-
) sign, indicating
+ that the clause is required or prohibited respectively; or+
/-
prefix to require any of a set of
+ terms.+ Query ::= ( Clause )* + Clause ::= ["+", "-"] [<TERM> ":"] ( <TERM> | "(" Query ")" ) ++ + + Examples of appropriately formatted queries can be found in the query syntax + documentation. + + + + In {@link TermRangeQuery}s, QueryParser tries to detect date values, e.g. + date:[6/1/2005 TO 6/4/2005] produces a range query that searches + for "date" fields between 2005-06-01 and 2005-06-04. Note that the format + of the accepted input depends on {@link #SetLocale(Locale) the locale}. + By default a date is converted into a search term using the deprecated + {@link DateField} for compatibility reasons. + To use the new {@link DateTools} to convert dates, a + {@link Lucene.Net.Documents.DateTools.Resolution} has to be set. + + + The date resolution that shall be used for RangeQueries can be set + using {@link #SetDateResolution(DateTools.Resolution)} + or {@link #SetDateResolution(String, DateTools.Resolution)}. The former + sets the default date resolution for all fields, whereas the latter can + be used to set field specific date resolutions. Field specific date + resolutions take, if set, precedence over the default date resolution. + + + If you use neither {@link DateField} nor {@link DateTools} in your + index, you can create your own + query parser that inherits QueryParser and overwrites + {@link #GetRangeQuery(String, String, String, boolean)} to + use a different method for date conversion. + + + Note that QueryParser is not thread-safe. + + NOTE: there is a new QueryParser in contrib, which matches + the same syntax as this class, but is more modular, + enabling substantial customization to how a query is created. + + NOTE: there is a new QueryParser in contrib, which matches + the same syntax as this class, but is more modular, + enabling substantial customization to how a query is created. + NOTE: You must specify the required {@link Version} compatibility when + creating QueryParser: +
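+ A hedged sketch (the version constant, default field and analyzer are illustrative):
+
+   QueryParser parser = new QueryParser(Version.LUCENE_29, "body",
+                                        new StandardAnalyzer(Version.LUCENE_29));
+   Query q = parser.parse("+apache +lucene");
+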
+ true to allow leading wildcard characters.
+
+ When set, * or ? are allowed as
+ the first character of a PrefixQuery and WildcardQuery.
+ Note that this can produce very slow
+ queries on big indexes.
+
+ Default: false.
+ true to enable position increments in result query.
+
+ When set, result phrase and multi-phrase queries will
+ be aware of position increments.
+ Useful when e.g. a StopFilter increases the position increment of
+ the token that follows an omitted token.
+
+ Default: false.
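
A short sketch combining the two settings above (parser construction, field name and analyzer are assumptions):

  QueryParser parser = new QueryParser(Version.LUCENE_29, "body", analyzer);
  parser.setAllowLeadingWildcard(true);      // permit e.g. "*ology"; can be very slow on big indexes
  parser.setEnablePositionIncrements(true);  // keep gaps left by e.g. a StopFilter in phrase queries
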
+ In OR_OPERATOR mode (the default) terms without any modifiers
+ are considered optional: for example capital of Hungary
+ is equal to capital OR of OR Hungary.
+ In AND_OPERATOR mode terms are considered to be in conjunction: the
+ above mentioned query is parsed as capital AND of AND Hungary
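
For example (a sketch; the parser construction is assumed as above):

  parser.setDefaultOperator(QueryParser.AND_OPERATOR);
  // "capital of Hungary" now parses as: capital AND of AND Hungary
  Query q = parser.parse("capital of Hungary");
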
+ true.
+ \\u0041
to A.
+
+ \.
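
A small, hedged example of escaping user input before parsing (the input string is illustrative):

  String raw     = "C++ (2009)";
  String escaped = QueryParser.escape(raw);   // -> C\+\+ \(2009\)
  Query q = parser.parse(escaped);
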
+ java Lucene.Net.QueryParsers.QueryParser <input>
+
+ word\tstem
+ (i.e. two tab separated words)
+ public String toString() {
+   return "start=" + startOffset + ",end=" + endOffset;
+ }
+
+ This method may be overridden by subclasses.
+
+ public int hashCode() {
+   int code = startOffset;
+   code = code * 31 + endOffset;
+   return code;
+ }
+
+ see also {@link #equals(Object)}
- shift value (number of bits removed) used
- during encoding.
-
- To also index floating point numbers, this class supplies two methods to convert them
- to integer values by changing their bit layout: {@link #doubleToSortableLong},
- {@link #floatToSortableInt}. You will have no precision loss by
- converting floating point numbers to integers and back (only that the integer form
- is not usable). Other data types like dates can easily be converted to longs or ints (e.g.
- date to long: {@link java.util.Date#getTime}).
-
- For easy usage, the trie algorithm is implemented for indexing inside
- {@link NumericTokenStream} that can index int, long,
- float, and double. For querying,
- {@link NumericRangeQuery} and {@link NumericRangeFilter} implement the query part
- for the same data types.
-
- This class can also be used to generate lexicographically sortable (according to
- {@link String#compareTo(String)}) representations of numeric data types for other
- usages (e.g. sorting).
-
- NOTE: This API is experimental and
- might change in incompatible ways in the next release.
-
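
A minimal sketch of the lossless round trip described above:

  double value = 2.5d;

  // Reorder the IEEE 754 bits so the resulting longs sort the same way as the doubles.
  long sortable = NumericUtils.doubleToSortableLong(value);
  double back   = NumericUtils.sortableLongToDouble(sortable);

  assert back == value;  // no precision is lost by the round trip
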
+ WildcardTermEnum.
+
+ After calling the constructor the enumeration is already pointing to the first
+ valid term if such a term exists.
- char[] buffer size) for encoding long values.
+
- char[] buffer size) for encoding int values.
+
- SHIFT_START_LONG+shift in the first character
- SHIFT_START_INT+shift in the first character
-
- shift bits.
- This method is used by {@link NumericTokenStream}.
-
- shift bits.
- This method is used by {@link LongRangeBuilder}.
-
- shift bits.
- This method is used by {@link NumericTokenStream}.
-
- shift bits.
- This method is used by {@link IntRangeBuilder}.
-
- double value to a sortable signed long.
- The value is converted by getting its IEEE 754 floating-point "double format"
- bit layout and then some bits are swapped, to be able to compare the result as a long.
- By this the precision is not reduced, but the value can easily be used as a long.
-
- long back to a double.
-
- float value to a sortable signed int.
- The value is converted by getting its IEEE 754 floating-point "float format"
- bit layout and then some bits are swapped, to be able to compare the result as an int.
- By this the precision is not reduced, but the value can easily be used as an int.
-
- int back to a float.
-
- Scorer implements {@link Scorer#SkipTo(int)},
- and it uses skipTo() on the given scorers.
-
- ReqExclScorer.
+
+ include which
+ have no overlap with spans from exclude.
+
+ o is equal to this.
-
- SegmentInfos) to do its job;
- if you implement your own MergePolicy, you'll need to put
- it in package Lucene.Net.Index in order to use
- these APIs.
-
- byte[] payloads
- query.
-
- o is equal to this.
-
- reader which share a prefix of
- length prefixLength with term and which have a fuzzy similarity >
- minSimilarity.
-
- After calling the constructor the enumeration is already pointing to the first
- valid term if such a term exists.
-
-   editDistance < maximumEditDistance
- Otherwise it returns:
-   1 - (editDistance / length)
- where length is the length of the shortest term (text or target) including a
- prefix that are identical and editDistance is the Levenshtein distance for
- the two words.
-
- Embedded within this algorithm is a fail-fast Levenshtein distance
- algorithm. The fail-fast algorithm differs from the standard Levenshtein
- distance algorithm in that it is aborted if it is discovered that the
- minimum distance between the words is greater than some threshold.
-
- To calculate the maximum distance threshold we use the following formula:
-   (1 - minimumSimilarity) * length
- where length is the shortest term including any prefix that is not part of the
- similarity comparison. This formula was derived by solving for what maximum value
- of distance returns false for the following statements:
-   similarity = 1 - ((float)distance / (float) (prefixLength + Math.min(textlen, targetlen)));
-   return (similarity > minimumSimilarity);
- where distance is the Levenshtein distance for the two words.
-
- Levenshtein distance (also known as edit distance) is a measure of similarity
- between two strings where the distance is measured as the number of character
- deletions, insertions or substitutions required to transform one string to
- the other string.
-
- null.
-
- Filter to apply to query results, cannot be null.
-
- o is equal to this.
-
- ConjunctionScorer.
- This Scorer implements {@link Scorer#SkipTo(int)} and uses skipTo() on the given Scorers.
- TODO: Implement score(HitCollector, int).
-
- currentSumScore is the total score of the current matching doc,
- nrMatchers is the number of matching scorers,
- and all scorers are after the matching doc, or are exhausted.
-
- DisjunctionScorer.
- minimumNrMatchers is bigger than
- the number of subScorers,
- no matches will be produced.
-
- ConjunctionScorer.
-
- DisjunctionScorer, using one as the minimum number
- of matching subscorers.
-
- scorerDocQueue.
-
- scorerDocQueue.
- Repeat until at least the minimum number of subscorers match on the same
- document and all subscorers are after that document or are exhausted.
-
- scorerDocQueue has at least minimumNrMatchers
- available. At least the scorer with the minimum document number will be advanced.
-
- currentDoc, currentSumScore,
- and nrMatchers describe the match.
-
- TODO: Investigate whether it is possible to use skipTo() when
- the minimum number of matchers is bigger than one, i.e. try and use the
- character of ConjunctionScorer for the minimum number of matchers.
- Also delay calling score() on the sub scorers until the minimum number of
- matchers is reached.
-
- dir or name is null
-
- file is the string by which the
- sub-stream will be known in the compound stream.
-
- file is null
-
-   t1 t2 t3
- with slop at least 1, the fragment:
-   t1 t2 t1 t3 t2 t3
- matches twice:
-   t1 t2 .. t3
-   t1 .. t2 t3
-
- Expert:
- Only public for subclassing. Most implementations should not need this class
-
- lowerTerm but less than upperTerm.
- There must be at least one term and either term may be null,
- in which case there is no bound on that side, but if there are
- two terms, both terms must be for the same field.
-
- lowerTerm and upperTerm will themselves be included in the range.
-
- lowerTerm but less than upperTerm.
- There must be at least one term and either term may be null,
- in which case there is no bound on that side, but if there are
- two terms, both terms must be for the same field.
-
- If collator is not null, it will be used to decide whether
- index terms are within the given range, rather than using the Unicode code
- point order in which index terms are stored.
-
- WARNING: Using this constructor and supplying a non-null
- value in the collator parameter will cause every single
- index Term in the Field referenced by lowerTerm and/or upperTerm to be
- examined. Depending on the number of index Terms in this Field, the
- operation could be very slow.
-
- lowerTerm and upperTerm will themselves be included in the range.
-
- The collator to use to collate index Terms, to determine
- their membership in the range bounded by lowerTerm and upperTerm.
-
- true if the range query is inclusive
-
- o is equal to this.
- Searchables.
-
- Applications usually need only call the inherited {@link #Search(Query)}
- or {@link #Search(Query,Filter)} methods.
-
- n in the array
- used to construct this searcher.
-
- n within its sub-index.
-
- size elements. If
- prePopulate is set to true, the queue will pre-populate itself
- with sentinel objects and set its {@link #Size()} to size. In
- that case, you should not rely on {@link #Size()} to get the number of
- actual elements that were added to the queue, but keep track yourself.
-
- If prePopulate is true, you should pop
- elements from the queue using the following code example:
-
- PriorityQueue pq = new HitQueue(10, true); // pre-populate.
- ScoreDoc top = pq.top();
-
- // Add/Update one element.
- top.score = 1.0f;
- top.doc = 0;
- top = (ScoreDoc) pq.updateTop();
- int totalHits = 1;
-
- // Now pop only the elements that were *truly* inserted.
- // First, pop all the sentinel elements (there are pq.size() - totalHits).
- for (int i = pq.size() - totalHits; i > 0; i--) pq.pop();
-
- // Now pop the truly added elements.
- ScoreDoc[] results = new ScoreDoc[totalHits];
- for (int i = totalHits - 1; i >= 0; i--) {
-   results[i] = (ScoreDoc) pq.pop();
- }
-
- NOTE: This class pre-allocates a full array of
- length size.
-
- positionIncrement == 0.
-
- Resolution.DAY or lower.
-
- Another approach is {@link NumericUtils}, which provides
- a sortable binary representation (prefix encoded) of numeric values, which
- date/time are.
- For indexing a {@link Date} or {@link Calendar}, just get the unix timestamp as
- long using {@link Date#getTime} or {@link Calendar#getTimeInMillis} and
- index this as a numeric value with {@link NumericField}
- and use {@link NumericRangeQuery} to query it.
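
A hedged sketch of that approach (the field name "modified", and the writer and range bounds, are assumptions):

  // Index the timestamp as a trie-encoded numeric value.
  Document doc = new Document();
  doc.add(new NumericField("modified").setLongValue(new Date().getTime()));
  writer.addDocument(doc);

  // Query the same field for a time range, inclusive on both ends.
  Query range = NumericRangeQuery.newLongRange("modified", lowerMillis, upperMillis, true, true);
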
- yyyyMMddHHmmssSSS or shorter,
- depending on resolution; using GMT as timezone
-
- yyyyMMddHHmmssSSS or shorter,
- depending on resolution; using GMT as timezone
-
- timeToString or
- DateToString back to a time, represented as the
- number of milliseconds since January 1, 1970, 00:00:00 GMT.
-
- dateString is not in the expected format
-
- timeToString or
- DateToString back to a time, represented as a
- Date object.
-
- dateString is not in the expected format
-
- 2004-09-21 13:50:11
- will be changed to 2004-09-01 00:00:00 when using
- Resolution.MONTH.
-
- resolution set to 0 or 1
-
- 1095767411000
- (which represents 2004-09-21 13:50:11) will be changed to
- 1093989600000 (2004-09-01 00:00:00) when using
- Resolution.MONTH.
-
- resolution set to 0 or 1, expressed as milliseconds since January 1, 1970, 00:00:00 GMT
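
A short sketch of the conversions described above:

  Date now = new Date();

  // Encode at day resolution ("yyyyMMdd", GMT) and decode again.
  String indexed    = DateTools.dateToString(now, DateTools.Resolution.DAY);
  Date roundTripped = DateTools.stringToDate(indexed);

  // Round a date down to the start of its month, as in the MONTH examples above.
  Date monthStart = DateTools.round(now, DateTools.Resolution.MONTH);
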
- int field:
-
- Field field = new Field(name, new NumericTokenStream(precisionStep).setIntValue(value));
- field.setOmitNorms(true);
- field.setOmitTermFreqAndPositions(true);
- document.add(field);
-
- For optimal performance, re-use the TokenStream and Field instance
- for more than one document:
-
- NumericTokenStream stream = new NumericTokenStream(precisionStep);
- Field field = new Field(name, stream);
- field.setOmitNorms(true);
- field.setOmitTermFreqAndPositions(true);
- Document document = new Document();
- document.add(field);
-
- for(all documents) {
-   stream.setIntValue(value);
-   writer.addDocument(document);
- }
-
- This stream is not intended to be used in analyzers;
- it's more for iterating the different precisions during
- indexing a specific numeric value.
-
- NOTE: as token streams are only consumed once
- the document is added to the index, if you index more
- than one numeric field, use a separate NumericTokenStream
- instance for each.
-
- See {@link NumericRangeQuery} for more details on the
- precisionStep parameter as well as how numeric fields work under the hood.
-
- NOTE: This API is experimental and
- might change in incompatible ways in the next release.
-
- precisionStep {@link NumericUtils#PRECISION_STEP_DEFAULT} (4). The stream is not yet initialized;
- before using it, set a value using the various set???Value() methods.
-
- precisionStep. The stream is not yet initialized;
- before using it, set a value using the various set???Value() methods.
-
- precisionStep using the given {@link AttributeSource}.
- The stream is not yet initialized;
- before using it, set a value using the various set???Value() methods.
-
- precisionStep using the given
- {@link org.apache.lucene.util.AttributeSource.AttributeFactory}.
- The stream is not yet initialized;
- before using it, set a value using the various set???Value() methods.
-
- long value. new Field(name, new NumericTokenStream(precisionStep).SetLongValue(value))
-
- int value. new Field(name, new NumericTokenStream(precisionStep).SetIntValue(value))
-
- double value. new Field(name, new NumericTokenStream(precisionStep).SetDoubleValue(value))
-
- float value. new Field(name, new NumericTokenStream(precisionStep).SetFloatValue(value))
- singleMatch occurs in
- the input, it will be replaced with
- replacement.
-
- Scorer for documents matching a Term.
-
- TermScorer.
-
- Term in the query.
-
- An iterator over the documents matching the Term.
-
- The Similarity implementation to be used for score
- computations.
-
- The field norms of the document fields for the Term.
-
- TermScorer.
- Searchables.
-
- Applications usually need only call the inherited {@link #Search(Query)}
- or {@link #Search(Query,Filter)} methods.
-
- null.
- This is to handle the case using ParallelMultiSearcher where the
- original list contains AUTO and we don't know the actual sort
- type until the values come back. The fields can only be set once.
- This method is thread safe.
-
- null. The collators
- correspond to any SortFields which were given a specific locale.
-
- null.
-
- a is less relevant than b.
-
- true if document a should be sorted after document b.
-
- fields object array
- will have three elements, corresponding respectively to
- the term values for the document in fields "a", "b" and "c".
- The class of each element in the array will be either
- Integer, Float or String depending on the type of values
- in the terms of each field.
-
- Created: Feb 11, 2004 1:23:38 PM
-
+ .f + a number and
+ from .s + a number. Also note that
+ Lucene's segments_N files do not have any
+ filename extension.
+ true if the index is optimized; false otherwise
+
+ write.lock could not be obtained)
+
+ fieldName should be loaded.
+
+ fieldsToLoad and lazyFieldsToLoad, lazy has precedence.
+
+ input to a newly created JFlex scanner.
+
+ input to the newly created JFlex scanner.
+
+ input to the newly created JFlex scanner.
+ File.createNewFile contain a vague
+ yet spooky warning about not using the API for file
+ locking. This warning was added due to this
+ bug, and in fact the only known problem with using
+ this API for locking is that the Lucene write lock may
+ not be released when the JVM exits abnormally.
+ When this happens, a {@link LockObtainFailedException}
+ is hit when trying to create a writer, in which case you
+ need to explicitly clear the lock file first. You can
+ either manually remove the file, or use the {@link
+ org.apache.lucene.index.IndexReader#unlock(Directory)}
+ API. But, first be certain that no writer is in fact
+ writing to the index; otherwise you can easily corrupt
+ your index.
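
A minimal sketch of clearing a stale lock as described above (only safe once you are certain no writer is open on directory, which is assumed here):

  if (IndexReader.isLocked(directory)) {
    // No IndexWriter may be writing to this index when we do this!
    IndexReader.unlock(directory);
  }
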
- NOTE: The instances returned by this method
- pre-allocate a full array of length
- numHits.
+
+ If you suspect that this or any other LockFactory is
+ not working properly in your environment, you can easily
+ test it by using {@link VerifyingLockFactory}, {@link
+ LockVerifyServer} and {@link LockStressTest}.
- trackDocScores to true as well.
-
- specifies whether documents are scored in doc Id order or not by
- the given {@link Scorer} in {@link #SetScorer(Scorer)}.
-
- Scorer implements {@link Scorer#SkipTo(int)},
+ and it uses skipTo() on the given scorers.
+
+ ReqExclScorer.
+
+ position as location - offset, so that a
- matching exact phrase is easily identified when all PhrasePositions
- have exactly the same position.
+
+ n > current_{@link #Length()} (but n < {@link #Length()}_at_start).
+
+ TopDocs topDocs = searcher.Search(query, numHits);
+ ScoreDoc[] hits = topDocs.scoreDocs;
+ for (int i = 0; i < hits.Length; i++) {
+   int docId = hits[i].doc;
+   Document d = searcher.Doc(docId);
+   // do something with current hit
+   ...
+ }
- min has been retrieved.
-
- o is equal to this.
-
- class MyAnalyzer extends Analyzer {
-   public final TokenStream tokenStream(String fieldName, Reader reader) {
-     return new PorterStemFilter(new LowerCaseTokenizer(reader));
-   }
- }
-
- n in the
- array used to construct this searcher/reader.
-
- indexOf method.
+
+ write.lock could not be acquired. This
+ happens when a writer tries to open an index
+ that another writer already has open.
+
+ reader which share a prefix of
+ length prefixLength with term and which have a fuzzy similarity >
+ minSimilarity.
+
+ After calling the constructor the enumeration is already pointing to the first
+ valid term if such a term exists.
+   editDistance < maximumEditDistance
+ Otherwise it returns:
+   1 - (editDistance / length)
+ where length is the length of the shortest term (text or target) including a
+ prefix that are identical and editDistance is the Levenshtein distance for
+ the two words.
+
+ Embedded within this algorithm is a fail-fast Levenshtein distance
+ algorithm. The fail-fast algorithm differs from the standard Levenshtein
+ distance algorithm in that it is aborted if it is discovered that the
+ minimum distance between the words is greater than some threshold.
+
+ To calculate the maximum distance threshold we use the following formula:
+   (1 - minimumSimilarity) * length
+ where length is the shortest term including any prefix that is not part of the
+ similarity comparison. This formula was derived by solving for what maximum value
+ of distance returns false for the following statements:
+   similarity = 1 - ((float)distance / (float) (prefixLength + Math.min(textlen, targetlen)));
+   return (similarity > minimumSimilarity);
+ where distance is the Levenshtein distance for the two words.
+
+ Levenshtein distance (also known as edit distance) is a measure of similarity
+ between two strings where the distance is measured as the number of character
+ deletions, insertions or substitutions required to transform one string to
+ the other string.
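
For illustration, a plain (non fail-fast) Levenshtein distance in the spirit of the description above; this is a sketch, not the aborting variant FuzzyTermEnum itself uses:

  static int levenshtein(String a, String b) {
    int[][] d = new int[a.length() + 1][b.length() + 1];
    for (int i = 0; i <= a.length(); i++) d[i][0] = i;           // deletions
    for (int j = 0; j <= b.length(); j++) d[0][j] = j;           // insertions
    for (int i = 1; i <= a.length(); i++) {
      for (int j = 1; j <= b.length(); j++) {
        int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1; // substitution
        d[i][j] = Math.min(Math.min(d[i - 1][j] + 1, d[i][j - 1] + 1),
                           d[i - 1][j - 1] + cost);
      }
    }
    return d[a.length()][b.length()];
  }
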
+ o is equal to this.
+
+ positionIncrement == 0.
+
+ dir or name is null
+
+ file is the string by which the
+ sub-stream will be known in the compound stream.
+
+ file is null