diff --git a/lib/lucene.net/Lucene.Net.dll b/lib/lucene.net/Lucene.Net.dll index 7f78e8b04..2d17e209a 100644 Binary files a/lib/lucene.net/Lucene.Net.dll and b/lib/lucene.net/Lucene.Net.dll differ diff --git a/lib/lucene.net/Lucene.Net.pdb b/lib/lucene.net/Lucene.Net.pdb index 97ead5096..3f9e6a4c0 100644 Binary files a/lib/lucene.net/Lucene.Net.pdb and b/lib/lucene.net/Lucene.Net.pdb differ diff --git a/lib/lucene.net/Lucene.Net.xml b/lib/lucene.net/Lucene.Net.xml index c9de818bb..9391dfb18 100644 --- a/lib/lucene.net/Lucene.Net.xml +++ b/lib/lucene.net/Lucene.Net.xml @@ -4,1311 +4,808 @@ Lucene.Net - - A PriorityQueue maintains a partial ordering of its elements such that the - least element can always be found in constant time. Put()'s and pop()'s - require log(size) time. - -

NOTE: This class pre-allocates a full array of - length maxSize+1, in {@link #initialize}. + +

Helper methods to ease implementing {@link Object#toString()}. + + + for printing boost only if not 1.0 + + + Common util methods for dealing with {@link IndexReader}s. - - Determines the ordering of objects in this priority queue. Subclasses - must define this one method. - - - - This method can be overridden by extending classes to return a sentinel - object which will be used by {@link #Initialize(int)} to fill the queue, so - that the code which uses that queue can always assume it's full and only - change the top without attempting to insert any new object.
- - Those sentinel values should always compare worse than any non-sentinel - value (i.e., {@link #LessThan(Object, Object)} should always favor the - non-sentinel values).
- - By default, this method returns false, which means the queue will not be - filled with sentinel values. Otherwise, the value returned will be used to - pre-populate the queue. Adds sentinel values to the queue.
- - If this method is extended to return a non-null value, then the following - usage pattern is recommended: - -
-            // extends getSentinelObject() to return a non-null value.
-            PriorityQueue pq = new MyQueue(numHits);
-            // save the 'top' element, which is guaranteed to not be null.
-            MyObject pqTop = (MyObject) pq.top();
-            <...>
-            // now in order to add a new element, which is 'better' than top (after 
-            // you've verified it is better), it is as simple as:
-            pqTop.change();
-            pqTop = pq.updateTop();
-            
- - NOTE: if this method returns a non-null value, it will be called by - {@link #Initialize(int)} {@link #Size()} times, relying on a new object to - be returned and will not check if it's null again. Therefore you should - ensure any call to this method creates a new instance and behaves - consistently, e.g., it cannot return null if it previously returned - non-null. + + Gathers sub-readers from reader into a List. - the sentinel object to use to pre-populate the queue, or null if - sentinel objects are not supported. - - - - Subclass constructors must call this. - - - Adds an Object to a PriorityQueue in log(size) time. If one tries to add - more objects than maxSize from initialize a RuntimeException - (ArrayIndexOutOfBound) is thrown. - - - use {@link #Add(Object)} which returns the new top object, - saving an additional call to {@link #Top()}. - - - - Adds an Object to a PriorityQueue in log(size) time. If one tries to add - more objects than maxSize from initialize an - {@link ArrayIndexOutOfBoundsException} is thrown. - - - the new 'top' element in the queue. - - - - Adds element to the PriorityQueue in log(size) time if either the - PriorityQueue is not full, or not lessThan(element, top()). - - - + + + - true if element is added, false otherwise. - - use {@link #InsertWithOverflow(Object)} instead, which - encourages objects reuse. - - - insertWithOverflow() is the same as insert() except its - return value: it returns the object (if any) that was - dropped off the heap because it was full. This can be - the given parameter (in case it is smaller than the - full heap's minimum, and couldn't be added), or another - object that was previously the smallest value in the - heap and now has been replaced by a larger one, or null - if the queue wasn't yet full with maxSize elements. - - - - Returns the least element of the PriorityQueue in constant time. - - - Removes and returns the least element of the PriorityQueue in log(size) - time. - - - - Should be called when the Object at top changes values. Still log(n) worst - case, but it's at least twice as fast to - -
-            pq.top().change();
-            pq.adjustTop();
-            
- - instead of - -
-            o = pq.pop();
-            o.change();
-            pq.push(o);
-            
+ + Returns sub IndexReader that contains the given document id. - use {@link #UpdateTop()} which returns the new top element and - saves an additional call to {@link #Top()}. - - - - Should be called when the Object at top changes values. Still log(n) worst - case, but it's at least twice as fast to - -
-            pq.top().change();
-            pq.updateTop();
-            
- - instead of - -
-            o = pq.pop();
-            o.change();
-            pq.push(o);
-            
- -
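As an illustration of the subclassing and Top()/UpdateTop() reuse pattern described above, here is a minimal C# sketch (not part of the library); the ScoreItem element type is hypothetical, and member visibility may differ slightly between Lucene.Net versions.

            using Lucene.Net.Util;

            class ScoreItem { public int Doc; public float Score; }

            // Minimal subclass: LessThan() defines the ordering, Initialize() sizes the heap.
            class ScoreItemQueue : PriorityQueue
            {
                public ScoreItemQueue(int maxSize)
                {
                    Initialize(maxSize);
                }

                public override bool LessThan(object a, object b)
                {
                    // The least-scored item stays at the top of the queue.
                    return ((ScoreItem)a).Score < ((ScoreItem)b).Score;
                }
            }

            class PriorityQueueSketch
            {
                static void Main()
                {
                    ScoreItemQueue pq = new ScoreItemQueue(10);
                    // InsertWithOverflow keeps the queue bounded and returns whatever fell off (or null).
                    pq.InsertWithOverflow(new ScoreItem { Doc = 1, Score = 0.4f });
                    pq.InsertWithOverflow(new ScoreItem { Doc = 2, Score = 0.9f });

                    // Reuse pattern from the docs: mutate the current top, then re-heapify.
                    ScoreItem top = (ScoreItem)pq.Top();
                    top.Score = 1.5f;
                    top = (ScoreItem)pq.UpdateTop();
                }
            }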
- the new 'top' element. + id of document + + parent reader + + sub reader of parent which contains the specified doc id
- - Returns the number of elements currently stored in the PriorityQueue. + + Returns sub-reader subIndex from reader. + + + parent reader + + index of desired sub reader + + the subreader at subINdex + - - Removes all entries from the PriorityQueue. - - - A straightforward implementation of {@link FSDirectory} - using java.io.RandomAccessFile. However, this class has - poor concurrent performance (multiple threads will - bottleneck) as it synchronizes when multiple threads - read from the same file. It's usually better to use - {@link NIOFSDirectory} or {@link MMapDirectory} instead. + + Returns index of the searcher/reader for document n in the + array used to construct this searcher/reader. - - - Base class for Directory implementations that store index - files in the file system. There are currently three core - subclasses: + + Returns primitive memory sizes for estimating RAM usage. + + + + + size of array beyond contents + + + + Class size overhead + + + + a primitive Class - bool, byte, char, short, long, float, + short, double, int + + the size in bytes of given primitive Class + + + + size of reference + + + + Base interface for attributes. + + + Expert: Calculate query weights and build query scorers. +

+ The purpose of {@link Weight} is to ensure searching does not + modify a {@link Query}, so that a {@link Query} instance can be reused.
+ {@link Searcher} dependent state of the query should reside in the + {@link Weight}.
+ {@link IndexReader} dependent state should reside in the {@link Scorer}. +

+ A Weight is used in the following way: +

    +
  1. A Weight is constructed by a top-level query, given a Searcher ({@link Query#CreateWeight(Searcher)}).
  2. The {@link #SumOfSquaredWeights()} method is called on the Weight to compute the query normalization factor {@link Similarity#QueryNorm(float)} of the query clauses contained in the query.
  3. The query normalization factor is passed to {@link #Normalize(float)}. At this point the weighting is complete.
  4. A Scorer is constructed by {@link #Scorer(IndexReader,boolean,boolean)}.
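A minimal C# sketch of those four steps, using the member names referenced in this documentation; the Directory argument is assumed to hold an existing index, and the exact accessibility of CreateWeight()/Scorer() may vary between Lucene.Net versions.

            using Lucene.Net.Index;
            using Lucene.Net.Search;
            using Lucene.Net.Store;

            class WeightLifecycleSketch
            {
                static void Run(Directory dir)
                {
                    IndexReader reader = IndexReader.Open(dir, true);        // read-only reader
                    Searcher searcher = new IndexSearcher(reader);

                    Query query = new TermQuery(new Term("body", "lucene"));
                    Weight weight = query.CreateWeight(searcher);            // step 1
                    float sum = weight.SumOfSquaredWeights();                // step 2
                    float norm = searcher.GetSimilarity().QueryNorm(sum);    // step 3: query normalization factor
                    weight.Normalize(norm);                                  //         weighting is now complete
                    Scorer scorer = weight.Scorer(reader, true, false);      // step 4: request an in-order scorer
                }
            }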
+ +
+ 2.9 + +
+ + An explanation of the score computation for the named document. + + sub-reader containing the given doc + + + + an Explanation for the score + + IOException + + The query that this concerns. + + The weight for this query. + + Assigns the query normalization factor to this. + + Returns a {@link Scorer} which scores documents in/out-of order according + to scoreDocsInOrder.

+ NOTE: even if scoreDocsInOrder is false, it is + recommended to check whether the returned Scorer indeed scores + documents out of order (i.e., call {@link #ScoresDocsOutOfOrder()}), as + some Scorer implementations will always return documents + in-order.
+ NOTE: null can be returned if no documents will be scored by this + query. + +

+ + the {@link IndexReader} for which to return the {@link Scorer}. + + specifies whether in-order scoring of documents is required. Note + that if set to false (i.e., out-of-order scoring is required), + this method can return whatever scoring mode it supports, as every + in-order scorer is also an out-of-order one. However, an + out-of-order scorer may not support {@link Scorer#NextDoc()} + and/or {@link Scorer#Advance(int)}, therefore it is recommended to + request an in-order scorer if use of these methods is required. + + + if true, {@link Scorer#Score(Collector)} will be called; if false, + {@link Scorer#NextDoc()} and/or {@link Scorer#Advance(int)} will + be called. + + a {@link Scorer} which scores documents in/out-of order. + + IOException +
+ + The sum of squared weights of contained query clauses. + + + Returns true iff this implementation scores docs only out of order. This + method is used in conjunction with {@link Collector}'s + {@link Collector#AcceptsDocsOutOfOrder() acceptsDocsOutOfOrder} and + {@link #Scorer(Lucene.Net.Index.IndexReader, boolean, boolean)} to + create a matching {@link Scorer} instance for a given {@link Collector}, or + vice versa. +

+ NOTE: the default implementation returns false, i.e. + the Scorer scores documents in-order. +

+
+ + A Query that matches documents containing terms with a specified prefix. A PrefixQuery + is built by QueryParser for input like app*. + +

This query uses the {@link + MultiTermQuery#CONSTANT_SCORE_AUTO_REWRITE_DEFAULT} + rewrite method. +

+
+ + An abstract {@link Query} that matches documents + containing a subset of terms provided by a {@link + FilteredTermEnum} enumeration. + +

This query cannot be used directly; you must subclass + it and define {@link #getEnum} to provide a {@link + FilteredTermEnum} that iterates through the terms to be + matched. + +

NOTE: if {@link #setRewriteMethod} is either + {@link #CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE} or {@link + #SCORING_BOOLEAN_QUERY_REWRITE}, you may encounter a + {@link BooleanQuery.TooManyClauses} exception during + searching, which happens when the number of terms to be + searched exceeds {@link + BooleanQuery#GetMaxClauseCount()}. Setting {@link + #setRewriteMethod} to {@link #CONSTANT_SCORE_FILTER_REWRITE} + prevents this. + +

The recommended rewrite method is {@link + #CONSTANT_SCORE_AUTO_REWRITE_DEFAULT}: it doesn't spend CPU + computing unhelpful scores, and it tries to pick the most + performant rewrite method given the query. + + Note that {@link QueryParser} produces + MultiTermQueries using {@link + #CONSTANT_SCORE_AUTO_REWRITE_DEFAULT} by default. +
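For example, a minimal C# sketch of forcing the constant-score filter rewrite on a prefix query; the field and prefix are illustrative only, and this choice avoids the TooManyClauses exception discussed above.

            using Lucene.Net.Index;
            using Lucene.Net.Search;

            class RewriteMethodSketch
            {
                static Query Build()
                {
                    // PrefixQuery is a MultiTermQuery, so a rewrite method can be set on it.
                    PrefixQuery query = new PrefixQuery(new Term("title", "app"));
                    query.SetRewriteMethod(MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE);
                    return query;
                }
            }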

+
+ + The abstract base class for queries. +

Instantiable subclasses are: +

    +
  • {@link TermQuery}
  • {@link MultiTermQuery}
  • {@link BooleanQuery}
  • {@link WildcardQuery}
  • {@link PhraseQuery}
  • {@link PrefixQuery}
  • {@link MultiPhraseQuery}
  • {@link FuzzyQuery}
  • {@link TermRangeQuery}
  • {@link NumericRangeQuery}
  • {@link Lucene.Net.Search.Spans.SpanQuery}
+

A parser for queries is contained in: +

    +
  • {@link Lucene.Net.QueryParsers.QueryParser QueryParser}
+
+
+ + Sets the boost for this query clause to b. Documents + matching this clause will (in addition to the normal weightings) have + their score multiplied by b. + + + + Gets the boost for this clause. Documents matching + this clause will (in addition to the normal weightings) have their score + multiplied by b. The boost is 1.0 by default. + + + + Prints a query to a string, with field assumed to be the + default field and omitted. +

The representation used is one that is supposed to be readable + by {@link Lucene.Net.QueryParsers.QueryParser QueryParser}. However, + there are the following limitations: +

    +
  • If the query was created by the parser, the printed representation may not be exactly what was parsed. For example, characters that need to be escaped will be represented without the required backslash.
  • Some of the more complicated queries (e.g. span queries) don't have a representation that can be parsed by QueryParser.
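A short C# sketch of SetBoost() together with ToString(field), subject to the limitations above; the field and term values are made up for illustration.

            using Lucene.Net.Index;
            using Lucene.Net.Search;

            class BoostSketch
            {
                static void Run()
                {
                    TermQuery title = new TermQuery(new Term("title", "lucene"));
                    title.SetBoost(2.0f);                       // this clause's score contribution is doubled
                    TermQuery body = new TermQuery(new Term("body", "lucene"));

                    BooleanQuery query = new BooleanQuery();
                    query.Add(title, BooleanClause.Occur.SHOULD);
                    query.Add(body, BooleanClause.Occur.SHOULD);

                    // With "body" assumed as the default field, prints something like: title:lucene^2.0 lucene
                    System.Console.WriteLine(query.ToString("body"));
                }
            }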
+
+
+ + Prints a query to a string. + + + Expert: Constructs an appropriate Weight implementation for this query. + +

+ Only implemented by primitive queries, which re-write to themselves. +

+
+ + Expert: Constructs and initializes a Weight for a top-level query. + + + Expert: called to re-write queries into primitive queries. For example, + a PrefixQuery will be rewritten into a BooleanQuery that consists + of TermQuerys. + + + + Expert: called when re-writing queries under MultiSearcher. + + Create a single query suitable for use by all subsearchers (in 1-1 + correspondence with queries). This is an optimization of the OR of + all queries. We handle the common optimization cases of equal + queries and overlapping clauses of boolean OR queries (as generated + by MultiTermQuery.rewrite()). + Be careful overriding this method as queries[0] determines which + method will be called and is not necessarily of the same type as + the other queries. + + + + Expert: adds all terms occuring in this query to the terms set. Only + works if this query is in its {@link #rewrite rewritten} form. + + + UnsupportedOperationException if this query is not yet rewritten + + + Expert: merges the clauses of a set of BooleanQuery's into a single + BooleanQuery. + +

A utility for use by {@link #Combine(Query[])} implementations. +

+
+ + Expert: Returns the Similarity implementation to be used for this query. + Subclasses may override this method to specify their own Similarity + implementation, perhaps one that delegates through that of the Searcher. + By default the Searcher's Similarity implementation is returned. + + + + Returns a clone of this query. + + + A rewrite method that first creates a private Filter, + by visiting each term in sequence and marking all docs + for that term. Matching documents are assigned a + constant score equal to the query's boost. + +

This method is faster than the BooleanQuery + rewrite methods when the number of matched terms or + matched documents is non-trivial. Also, it will never + hit an errant {@link BooleanQuery.TooManyClauses} + exception. + +

+ + +
+ + A rewrite method that first translates each term into + {@link BooleanClause.Occur#SHOULD} clause in a + BooleanQuery, and keeps the scores as computed by the + query. Note that typically such scores are + meaningless to the user, and require non-trivial CPU + to compute, so it's almost always better to use {@link + #CONSTANT_SCORE_AUTO_REWRITE_DEFAULT} instead. + +

NOTE: This rewrite method will hit {@link + BooleanQuery.TooManyClauses} if the number of terms + exceeds {@link BooleanQuery#getMaxClauseCount}. + +

+ + +
+ + Like {@link #SCORING_BOOLEAN_QUERY_REWRITE} except + scores are not computed. Instead, each matching + document receives a constant score equal to the + query's boost. + +

NOTE: This rewrite method will hit {@link + BooleanQuery.TooManyClauses} if the number of terms + exceeds {@link BooleanQuery#getMaxClauseCount}. + +

+ + +
+ + Read-only default instance of {@link + ConstantScoreAutoRewrite}, with {@link + ConstantScoreAutoRewrite#setTermCountCutoff} set to + {@link + ConstantScoreAutoRewrite#DEFAULT_TERM_COUNT_CUTOFF} + and {@link + ConstantScoreAutoRewrite#setDocCountPercent} set to + {@link + ConstantScoreAutoRewrite#DEFAULT_DOC_COUNT_PERCENT}. + Note that you cannot alter the configuration of this + instance; you'll need to create a private instance + instead. + + + + Constructs a query for terms matching term. + check sub class for possible term access - the Term does not + make sense for all MultiTermQuerys and will be removed. + + + + Constructs a query matching terms that cannot be represented with a single + Term. + + + + Returns the pattern term. + check sub class for possible term access - getTerm does not + make sense for all MultiTermQuerys and will be removed. + + + + Construct the enumeration to be used, expanding the pattern term. + + + Expert: Return the number of unique terms visited during execution of the query. + If there are many of them, you may consider using another query type + or optimize your total term count in index. +

This method is not thread safe, be sure to only call it when no query is running! + If you re-use the same query instance for another + search, be sure to first reset the term counter + with {@link #clearTotalNumberOfTerms}. +

On optimized indexes / no MultiReaders, you get the correct number of + unique terms for the whole index. Use this number to compare different queries. + For non-optimized indexes this number can also be achieved in + non-constant-score mode. In constant-score mode you get the total number of + terms sought for all segments / sub-readers.
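A small C# sketch of checking and resetting the counter described above; the searcher, field, and pattern are assumptions, and member casing follows the Lucene.Net port rather than the Java names in the text.

            using Lucene.Net.Index;
            using Lucene.Net.Search;

            class TermCountSketch
            {
                static void Run(Searcher searcher)
                {
                    WildcardQuery query = new WildcardQuery(new Term("title", "luc*"));
                    searcher.Search(query, 10);

                    int visited = query.GetTotalNumberOfTerms();   // unique terms enumerated by the last search
                    query.ClearTotalNumberOfTerms();               // reset before reusing this instance
                    System.Console.WriteLine(visited);
                }
            }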

+ + +
+ + Expert: Resets the counting of unique terms. + Do this before executing the query/filter. + + + + + + + + + + Sets the rewrite method to be used when executing the + query. You can use one of the four core methods, or + implement your own subclass of {@link RewriteMethod}. + + + + A rewrite method that tries to pick the best + constant-score rewrite method based on term and + document counts from the query. If both the number of + terms and documents is small enough, then {@link + #CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE} is used. + Otherwise, {@link #CONSTANT_SCORE_FILTER_REWRITE} is + used. + + + + Abstract class that defines how the query is rewritten. + + + If the number of terms in this query is equal to or + larger than this setting then {@link + #CONSTANT_SCORE_FILTER_REWRITE} is used. + + + + + + + + If the number of documents to be visited in the + postings exceeds this specified percentage of the + maxDoc() for the index, then {@link + #CONSTANT_SCORE_FILTER_REWRITE} is used. + + 0.0 to 100.0 + + + + + + + + Constructs a query for terms starting with prefix. + + + Returns the prefix of this query. + + + Prints a user-readable version of this query. + + + A Filter that restricts search results to values that have a matching prefix in a given + field. + + + + A wrapper for {@link MultiTermQuery}, that exposes its + functionality as a {@link Filter}. +

+ MultiTermQueryWrapperFilter is not designed to + be used by itself. Normally you subclass it to provide a Filter + counterpart for a {@link MultiTermQuery} subclass. +

+ For example, {@link TermRangeFilter} and {@link PrefixFilter} extend + MultiTermQueryWrapperFilter. + This class also provides the functionality behind + {@link MultiTermQuery#CONSTANT_SCORE_FILTER_REWRITE}; + this is why it is not abstract. +

+
+ + Abstract base class for restricting which documents may be returned during searching. +

+ Note: In Lucene 3.0 {@link #Bits(IndexReader)} will be removed + and {@link #GetDocIdSet(IndexReader)} will be defined as abstract. + All implementing classes must therefore implement {@link #GetDocIdSet(IndexReader)} + in order to work with Lucene 3.0. +

+
+ + + + Creates a {@link DocIdSet} enumerating the documents that should be + permitted in search results. NOTE: null can be + returned if no documents are accepted by this Filter. +

+ Note: This method will be called once per segment in + the index during searching. The returned {@link DocIdSet} + must refer to document IDs for that segment, not for + the top-level reader. + + @param reader a {@link IndexReader} instance opened on the index currently + searched on. Note, it is likely that the provided reader does not + represent the whole underlying index i.e. if the index has more than + one segment the given reader only represents a single segment. + +

+ a DocIdSet that provides the documents which should be permitted or + prohibited in search results. NOTE: null can be returned if + no documents will be accepted by this Filter. + + + +
+ + Wrap a {@link MultiTermQuery} as a Filter. + + + Expert: Return the number of unique terms visited during execution of the filter. + If there are many of them, you may consider using another filter type + or optimize your total term count in index. +

This method is not thread safe, be sure to only call it when no filter is running! + If you re-use the same filter instance for another + search, be sure to first reset the term counter + with {@link #clearTotalNumberOfTerms}. +

+ + +
+ + Expert: Resets the counting of unique terms. + Do this before executing the filter. + + + + + + Returns a BitSet with true for documents which should be + permitted in search results, and false for those that should + not. + + Use {@link #GetDocIdSet(IndexReader)} instead. + + + + Returns a DocIdSet with documents that should be + permitted in search results. + + + + Prints a user-readable version of this query. + + + A {@link Collector} implementation which wraps another + {@link Collector} and makes sure only documents with + scores > 0 are collected. + + + +
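A short C# sketch of the Filter side of this machinery: restricting a search with a PrefixFilter, which builds on MultiTermQueryWrapperFilter as noted above. The searcher and field name are assumptions.

            using Lucene.Net.Index;
            using Lucene.Net.Search;

            class PrefixFilterSketch
            {
                static TopDocs Run(Searcher searcher)
                {
                    Filter filter = new PrefixFilter(new Term("category", "elec"));
                    // Only documents accepted by the filter's DocIdSet can appear in the results.
                    return searcher.Search(new MatchAllDocsQuery(), filter, 10);
                }
            }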

Expert: Collectors are primarily meant to be used to + gather raw results from a search, and implement sorting + or custom result filtering, collation, etc.

+ +

As of 2.9, this class replaces the deprecated + HitCollector, and offers an API for efficient collection + of hits across sequential {@link IndexReader}s. {@link + IndexSearcher} advances the collector through each of the + sub readers, in an arbitrary order. This results in a + higher performance means of collection.

+ +

Lucene's core collectors are derived from Collector. + Likely your application can use one of these classes, or + subclass {@link TopDocsCollector}, instead of + implementing Collector directly:

- Unfortunately, because of system peculiarities, there is - no single overall best implementation. Therefore, we've - added the {@link #open} method, to allow Lucene to choose - the best FSDirectory implementation given your - environment, and the known limitations of each - implementation. For users who have no reason to prefer a - specific implementation, it's best to simply use {@link - #open}. For all others, you should instantiate the - desired implementation directly. +

Collector decouples the score from the collected doc: + the score computation is skipped entirely if it's not + needed. Collectors that do need the score should + implement the {@link #setScorer} method, to hold onto the + passed {@link Scorer} instance, and call {@link + Scorer#Score()} within the collect method to compute the + current hit's score. If your collector may request the + score for a single hit multiple times, you should use + {@link ScoreCachingWrappingScorer}.

-

The locking implementation is by default {@link - NativeFSLockFactory}, but can be changed by - passing in a custom {@link LockFactory} instance. - The deprecated getDirectory methods default to use - {@link SimpleFSLockFactory} for backwards compatibility. - The system properties - org.apache.lucene.store.FSDirectoryLockFactoryClass - and org.apache.lucene.FSDirectory.class - are deprecated and only used by the deprecated - getDirectory methods. The system property - org.apache.lucene.lockDir is ignored completely, - If you really want to store locks - elsewhere, you can create your own {@link - SimpleFSLockFactory} (or {@link NativeFSLockFactory}, - etc.) passing in your preferred lock directory. +

NOTE: The doc that is passed to the collect + method is relative to the current reader. If your + collector needs to resolve this to the docID space of the + Multi*Reader, you must re-base it by recording the + docBase from the most recent setNextReader call. Here's + a simple example showing how to collect docIDs into a + BitSet:

-

In 3.0 this class will become abstract. +

+            Searcher searcher = new IndexSearcher(indexReader);
+            final BitSet bits = new BitSet(indexReader.maxDoc());
+            searcher.search(query, new Collector() {
+            private int docBase;
             
-            
- - - - - A Directory is a flat list of files. Files may be written once, when they - are created. Once a file is created it may only be opened for read, or - deleted. Random access is permitted both when reading and writing. + // ignore scorer + public void setScorer(Scorer scorer) { + } -

Java's i/o APIs not used directly, but rather all i/o is - through this API. This permits things such as:

    -
  • implementation of RAM-based indices;
  • implementation of indices stored in a database, via JDBC;
  • implementation of an index as a single file;
- - Directory locking is implemented by an instance of {@link - LockFactory}, and can be changed for each Directory - instance using {@link #setLockFactory}. - -
-
- - Holds the LockFactory instance (implements locking for - this Directory instance). - - - - For some Directory implementations ({@link - FSDirectory}, and its subclasses), this method - silently filters its results to include only index - files. Please use {@link #listAll} instead, which - does no filtering. - - - - Returns an array of strings, one for each file in the - directory. Unlike {@link #list} this method does no - filtering of the contents in a directory, and it will - never return null (throws IOException instead). - - Currently this method simply fallsback to {@link - #list} for Directory impls outside of Lucene's core & - contrib, but in 3.0 that method will be removed and - this method will become abstract. - - - - Returns true iff a file with the given name exists. - - - Returns the time the named file was last modified. - - - Set the modified time of an existing file to now. - - - Removes an existing file in the directory. - - - Renames an existing file in the directory. - If a file already exists with the new name, then it is replaced. - This replacement is not guaranteed to be atomic. - - - - - - Returns the length of a file in the directory. - - - Creates a new, empty file in the directory with the given name. - Returns a stream writing this file. - - - - Ensure that any writes to this file are moved to - stable storage. Lucene uses this to properly commit - changes to the index, to prevent a machine/OS crash - from corrupting the index. - - - - Returns a stream reading an existing file. - - - Returns a stream reading an existing file, with the - specified read buffer size. The particular Directory - implementation may ignore the buffer size. Currently - the only Directory implementations that respect this - parameter are {@link FSDirectory} and {@link - Lucene.Net.Index.CompoundFileReader}. - - - - Construct a {@link Lock}. - the name of the lock file - - - - Attempt to clear (forcefully unlock and remove) the - specified lock. Only call this at a time when you are - certain this lock is no longer in use. - - name of the lock to be cleared. - - - - Closes the store. - - - Set the LockFactory that this Directory instance should - use for its locking implementation. Each * instance of - LockFactory should only be used for one directory (ie, - do not share a single instance across multiple - Directories). - - - instance of {@link LockFactory}. - - - - Get the LockFactory that this Directory instance is - using for its locking implementation. Note that this - may be null for Directory implementations that provide - their own locking implementation. - - - - Return a string identifier that uniquely differentiates - this Directory instance from other Directory instances. - This ID should be the same if two Directory instances - (even in different JVMs and/or on different machines) - are considered "the same index". This is how locking - "scopes" to the right index. - - - - Copy contents of a directory src to a directory dest. - If a file in src already exists in dest then the - one in dest will be blindly overwritten. - -

NOTE: the source directory cannot change - while this method is running. Otherwise the results - are undefined and you could easily hit a - FileNotFoundException. - -

NOTE: this method only copies files that look - like index files (ie, have extensions matching the - known extensions of index files). - -

- source directory - - destination directory - - if true, call {@link #Close()} method on source directory - - IOException -
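A minimal C# sketch of Copy() as described above; the source path is an assumption and the source directory is left open (closeDirSrc = false).

            using Lucene.Net.Store;

            class CopySketch
            {
                static void Run()
                {
                    Directory src = FSDirectory.Open(new System.IO.DirectoryInfo("/path/to/src-index"));
                    Directory dest = new RAMDirectory();
                    Directory.Copy(src, dest, false);   // copies index files; the source stays open
                    src.Close();
                    dest.Close();
                }
            }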
- - AlreadyClosedException if this Directory is closed - - - This cache of directories ensures that there is a unique Directory - instance per path, so that synchronization on the Directory can be used to - synchronize access between readers and writers. We use - refcounts to ensure when the last use of an FSDirectory - instance for a given canonical path is closed, we remove the - instance from the cache. See LUCENE-776 - for some relevant discussion. - - Not used by any non-deprecated methods anymore - - - - Set whether Lucene's use of lock files is disabled. By default, - lock files are enabled. They should only be disabled if the index - is on a read-only medium like a CD-ROM. - - Use a {@link #open(File, LockFactory)} or a constructor - that takes a {@link LockFactory} and supply - {@link NoLockFactory#getNoLockFactory}. This setting does not work - with {@link #open(File)} only the deprecated getDirectory - respect this setting. - - - - Returns whether Lucene's use of lock files is disabled. - true if locks are disabled, false if locks are enabled. - - - - Use a constructor that takes a {@link LockFactory} and - supply {@link NoLockFactory#getNoLockFactory}. - - - - The default class which implements filesystem-based directories. - - - A buffer optionally used in renameTo method - - - Returns the directory instance for the named location. - - - Use {@link #Open(File)} - - - the path to the directory. - - the FSDirectory for the named file. - - - - Returns the directory instance for the named location. - - - Use {@link #Open(File, LockFactory)} - - - the path to the directory. - - instance of {@link LockFactory} providing the - locking implementation. - - the FSDirectory for the named file. - - - - Returns the directory instance for the named location. - - - Use {@link #Open(File)} - - - the path to the directory. - - the FSDirectory for the named file. - - - - Returns the directory instance for the named location. - - - Use {@link #Open(File)} - - - the path to the directory. - - the FSDirectory for the named file. - - - - Returns the directory instance for the named location. - - - Use {@link #Open(File, LockFactory)} - - - the path to the directory. - - instance of {@link LockFactory} providing the - locking implementation. - - the FSDirectory for the named file. - - - - Returns the directory instance for the named location. - - - Use {@link #Open(File, LockFactory)} - - - the path to the directory. - - instance of {@link LockFactory} providing the - locking implementation. - - the FSDirectory for the named file. - - - - Returns the directory instance for the named location. - - - Use IndexWriter's create flag, instead, to - create a new index. - - - the path to the directory. - - if true, create, or erase any existing contents. - - the FSDirectory for the named file. - - - - Returns the directory instance for the named location. - - - Use IndexWriter's create flag, instead, to - create a new index. - - - the path to the directory. - - if true, create, or erase any existing contents. - - the FSDirectory for the named file. - - - - Returns the directory instance for the named location. - - - Use IndexWriter's create flag, instead, to - create a new index. - - - the path to the directory. - - if true, create, or erase any existing contents. - - the FSDirectory for the named file. - - - - - - - - Initializes the directory to create a new file with the given name. - This method should be used in {@link #createOutput}. 
- - - - The underlying filesystem directory - - - - - - - - - - - Create a new FSDirectory for the named location (ctor for subclasses). - the path of the directory - - the lock factory to use, or null for the default - ({@link NativeFSLockFactory}); - - IOException - - - Creates an FSDirectory instance, trying to pick the - best implementation given the current environment. - The directory returned uses the {@link NativeFSLockFactory}. - -

Currently this returns {@link SimpleFSDirectory} as - NIOFSDirectory is currently not supported. - -


NOTE: this method may suddenly change which - implementation is returned from release to release, in - the event that higher performance defaults become - possible; if the precise implementation is important to - your application, please instantiate it directly, - instead. On 64 bit systems, it may also be good to - return {@link MMapDirectory}, but this is disabled - because of officially missing unmap support in Java. - For optimal performance you should consider using - this implementation on 64 bit JVMs. -

See above -
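A one-method C# sketch of the recommended usage; the index path is an assumption.

            using Lucene.Net.Store;

            class OpenSketch
            {
                static Directory OpenIndex()
                {
                    // Lets Lucene pick the FSDirectory implementation (currently SimpleFSDirectory,
                    // per the note above) with the default NativeFSLockFactory.
                    return FSDirectory.Open(new System.IO.DirectoryInfo("/path/to/index"));
                }
            }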

-
- - Creates an FSDirectory instance, trying to pick the - best implementation given the current environment. - The directory returned uses the {@link NativeFSLockFactory}. - -

Currently this returns {@link SimpleFSDirectory} as - NIOFSDirectory is currently not supported. - -

NOTE: this method may suddenly change which - implementation is returned from release to release, in - the event that higher performance defaults become - possible; if the precise implementation is important to - your application, please instantiate it directly, - instead. On 64 bit systems, it may also be good to - return {@link MMapDirectory}, but this is disabled - because of officially missing unmap support in Java. - For optimal performance you should consider using - this implementation on 64 bit JVMs. -

See above -

-
- - Just like {@link #Open(File)}, but allows you to - also specify a custom {@link LockFactory}. - - - - Lists all files (not subdirectories) in the - directory. This method never returns null (throws - {@link IOException} instead). - - - NoSuchDirectoryException if the directory - does not exist, or does exist but is not a - directory. - - IOException if list() returns null - - - Lists all files (not subdirectories) in the - directory. This method never returns null (throws - {@link IOException} instead). - - - NoSuchDirectoryException if the directory - does not exist, or does exist but is not a - directory. - - IOException if list() returns null - - - Lists all files (not subdirectories) in the - directory. - - - - - - Returns true iff a file with the given name exists. - - - Returns the time the named file was last modified. - - - Returns the time the named file was last modified. - - - Set the modified time of an existing file to now. - - - Returns the length in bytes of a file in the directory. - - - Removes an existing file in the directory. - - - Renames an existing file in the directory. - Warning: This is not atomic. - - - - - - Creates an IndexOutput for the file with the given name. - In 3.0 this method will become abstract. - - - - Creates an IndexInput for the file with the given name. - In 3.0 this method will become abstract. - - - - So we can do some byte-to-hexchar conversion below - - - Closes the store to future operations. - - - For debug output. - - - Default read chunk size. This is a conditional - default: on 32bit JVMs, it defaults to 100 MB. On - 64bit JVMs, it's Integer.MAX_VALUE. - - - - - - Sets the maximum number of bytes read at once from the - underlying file during {@link IndexInput#readBytes}. - The default value is {@link #DEFAULT_READ_CHUNK_SIZE}; - -

This was introduced due to Sun - JVM Bug 6478546, which throws an incorrect - OutOfMemoryError when attempting to read too many bytes - at once. It only happens on 32bit JVMs with a large - maximum heap size.

- -

Changes to this value will not impact any - already-opened {@link IndexInput}s. You should call - this before attempting to open an index on the - directory.

- -

NOTE: This value should be as large as - possible to reduce any possible performance impact. If - you still encounter an incorrect OutOfMemoryError, - try lowering the chunk size.
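A C# sketch of lowering the chunk size as suggested above; the path, the 16 MB value, and the exact setter name in the .NET port are assumptions.

            using Lucene.Net.Store;

            class ChunkSizeSketch
            {
                static FSDirectory Configure()
                {
                    FSDirectory dir = FSDirectory.Open(new System.IO.DirectoryInfo("/path/to/index"));
                    dir.SetReadChunkSize(16 * 1024 * 1024);   // affects IndexInputs opened after this call
                    return dir;
                }
            }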

-

-
- - The maximum number of bytes to read at once from the - underlying file during {@link IndexInput#readBytes}. - - - - - - Use SimpleFSDirectory.SimpleFSIndexInput instead - - - - Base implementation class for buffered {@link IndexInput}. - - - Abstract base class for input from a file in a {@link Directory}. A - random-access input stream. Used for all Lucene index input operations. - - - - - - Reads and returns a single byte. - - - - - Reads a specified number of bytes into an array at the specified offset. - the array to read bytes into - - the offset in the array to start storing bytes - - the number of bytes to read - - - - - - Reads a specified number of bytes into an array at the - specified offset with control over whether the read - should be buffered (callers who have their own buffer - should pass in "false" for useBuffer). Currently only - {@link BufferedIndexInput} respects this parameter. - - the array to read bytes into - - the offset in the array to start storing bytes - - the number of bytes to read - - set to false if the caller will handle - buffering. - - - - - - Reads four bytes and returns an int. - - - - - Reads an int stored in variable-length format. Reads between one and - five bytes. Smaller values take fewer bytes. Negative numbers are not - supported. - - - - - - Reads eight bytes and returns a long. - - - - - Reads a long stored in variable-length format. Reads between one and - nine bytes. Smaller values take fewer bytes. Negative numbers are not - supported. - - - - Call this if readString should read characters stored - in the old modified UTF8 format (length in java chars - and java's modified UTF8 encoding). This is used for - indices written pre-2.4 See LUCENE-510 for details. - - - - Reads a string. - - - - - Reads Lucene's old "modified UTF-8" encoded - characters into an array. - - the array to read characters into - - the offset in the array to start storing characters - - the number of characters to read - - - - -- please use readString or readBytes - instead, and construct the string - from those utf8 bytes - - - - Expert - - Similar to {@link #ReadChars(char[], int, int)} but does not do any conversion operations on the bytes it is reading in. It still - has to invoke {@link #ReadByte()} just as {@link #ReadChars(char[], int, int)} does, but it does not need a buffer to store anything - and it does not have to do any of the bitwise operations, since we don't actually care what is in the byte except to determine - how many more bytes to read - - The number of chars to read - - this method operates on old "modified utf8" encoded - strings - - - - Closes the stream to futher operations. - - - Returns the current position in this file, where the next read will - occur. - - - - - - Sets current position in this file, where the next read will occur. - - - - - The number of bytes in the file. - - - Returns a clone of this stream. - -

Clones of a stream access the same data, and are positioned at the same - point as the stream they were cloned from. - -

Expert: Subclasses must ensure that clones may be positioned at - different points in the input from each other and from the stream they - were cloned from. -

-
- - Default buffer size - - - Inits BufferedIndexInput with a specific bufferSize - - - Change the buffer size used by this IndexInput - - - - - - - Expert: implements buffer refill. Reads bytes from the current position - in the input. - - the array to read bytes into - - the offset in the array to start storing bytes - - the number of bytes to read - - - - Expert: implements seek. Sets current position in this file, where the - next {@link #ReadInternal(byte[],int,int)} will occur. - - - - - - A straightforward implementation of {@link FSDirectory} - using java.io.RandomAccessFile. However, this class has - poor concurrent performance (multiple threads will - bottleneck) as it synchronizes when multiple threads - read from the same file. It's usually better to use - {@link NIOFSDirectory} or {@link MMapDirectory} instead. - - - - Create a new SimpleFSDirectory for the named location. - - - the path of the directory - - the lock factory to use, or null for the default. - - IOException - - - Create a new SimpleFSDirectory for the named location. - - - the path of the directory - - the lock factory to use, or null for the default. - - IOException - - - Create a new SimpleFSDirectory for the named location and the default lock factory. - - - the path of the directory - - IOException - - - - - - - Create a new SimpleFSDirectory for the named location and the default lock factory. - - - the path of the directory - - IOException - - - Creates an IndexOutput for the file with the given name. - - - Creates an IndexInput for the file with the given name. - - - Please use ctor taking chunkSize - - - - Please use ctor taking chunkSize - - - - IndexInput methods - - - Method used for testing. Returns true if the underlying - file descriptor is valid. - - - - Base implementation class for buffered {@link IndexOutput}. - - - Abstract base class for output to a file in a Directory. A random-access - output stream. Used for all Lucene index output operations. - - - - - - - - Writes a single byte. - - - - - Writes an array of bytes. - the bytes to write - - the number of bytes to write - - - - - - Writes an array of bytes. - the bytes to write - - the offset in the byte array - - the number of bytes to write - - - - - - Writes an int as four bytes. - - - - - Writes an int in a variable-length format. Writes between one and - five bytes. Smaller values take fewer bytes. Negative numbers are not - supported. - - - - - - Writes a long as eight bytes. - - - - - Writes an long in a variable-length format. Writes between one and five - bytes. Smaller values take fewer bytes. Negative numbers are not - supported. - - - - - - Writes a string. - - - - - Writes a sub sequence of characters from s as the old - format (modified UTF-8 encoded bytes). - - the source of the characters - - the first character in the sequence - - the number of characters in the sequence - - -- please pre-convert to utf8 bytes - instead or use {@link #writeString} - - - - Writes a sub sequence of characters from char[] as - the old format (modified UTF-8 encoded bytes). - - the source of the characters - - the first character in the sequence - - the number of characters in the sequence - - -- please pre-convert to utf8 bytes instead or use {@link #writeString} - - - - Copy numBytes bytes from input to ourself. - - - Forces any buffered output to be written. - - - Closes this stream to further operations. - - - Returns the current position in this file, where the next write will - occur. 
- - - - - - Sets current position in this file, where the next write will occur. - - - - - The number of bytes in the file. - - - Set the file length. By default, this method does - nothing (it's optional for a Directory to implement - it). But, certain Directory implementations (for - - can use this to inform the - underlying IO system to pre-allocate the file to the - specified size. If the length is longer than the - current file length, the bytes added to the file are - undefined. Otherwise the file is truncated. - - file length - - - - Writes a single byte. - - - - - Writes an array of bytes. - the bytes to write - - the number of bytes to write - - - - - - Forces any buffered output to be written. - - - Expert: implements buffer write. Writes bytes at the current position in - the output. - - the bytes to write - - the number of bytes to write - - - - Expert: implements buffer write. Writes bytes at the current position in - the output. - - the bytes to write - - the offset in the byte array - - the number of bytes to write - - - - Closes this stream to further operations. - - - Returns the current position in this file, where the next write will - occur. - - - - - - Sets current position in this file, where the next write will occur. - - - - - The number of bytes in the file. - - - output methods: - - - Random-access methods - - - - - - - - - - - - - - - - - - - Use SimpleFSDirectory.SimpleFSIndexOutput instead - - - - - - - - Expert: - Public for extension only - - - - Expert: an enumeration of span matches. Used to implement span searching. - Each span represents a range of term positions within a document. Matches - are enumerated in order, by increasing document number, within that by - increasing start position and finally by increasing end position. - - - - Move to the next match, returning true iff any such exists. - - - Skips to the first match beyond the current, whose document number is - greater than or equal to target.

Returns true iff there is such - a match.

Behaves as if written:

-            boolean skipTo(int target) {
-            do {
-            if (!next())
-            return false;
-            } while (target > doc());
+            // accept docs out of order (for a BitSet it doesn't matter)
+            public boolean acceptsDocsOutOfOrder() {
             return true;
             }
+            
+            public void collect(int doc) {
+            bits.set(doc + docBase);
+            }
+            
+            public void setNextReader(IndexReader reader, int docBase) {
+            this.docBase = docBase;
+            }
+            });
             
- Most implementations are considerably more efficient than that. -
-
- - Returns the document number of the current match. Initially invalid. - - - Returns the start position of the current match. Initially invalid. - - - Returns the end position of the current match. Initially invalid. - - - Returns the payload data for the current span. - This is invalid until {@link #Next()} is called for - the first time. - This method must not be called more than once after each call - of {@link #Next()}. However, most payloads are loaded lazily, - so if the payload data for the current position is not needed, - this method may not be called at all for performance reasons. An ordered - SpanQuery does not lazy load, so if you have payloads in your index and - you do not want ordered SpanNearQuerys to collect payloads, you can - disable collection with a constructor option.
- Note that the return type is a collection, thus the ordering should not be relied upon. -
-

- WARNING: The status of the Payloads feature is experimental. - The APIs introduced here might change in the future and will not be - supported anymore in such a case.

+

Not all collectors will need to rebase the docID. For + example, a collector that simply counts the total number + of hits would skip it.

+ +

NOTE: Prior to 2.9, Lucene silently filtered + out hits with score <= 0. As of 2.9, the core Collectors + no longer do that. It's very unusual to have such hits + (a negative query boost, or function query returning + negative custom scores, could cause it to happen). If + you need that behavior, use {@link + PositiveScoresOnlyCollector}.

+ +

NOTE: This API is experimental and might change + in incompatible ways in the next release.

- a List of byte arrays containing the data of this payload, otherwise null if isPayloadAvailable is false - - java.io.IOException + 2.9 +
- - Checks if a payload can be loaded at this position. + + Called before successive calls to {@link #Collect(int)}. Implementations + that need the score of the current document (passed-in to + {@link #Collect(int)}), should save the passed-in Scorer and call + scorer.score() when needed. + + + + Called once for every document matching a query, with the unbased document + number. +

- Payloads can only be loaded once per call to - {@link #Next()}. + Note: This is called in an inner search loop. For good search performance, + implementations of this method should not call {@link Searcher#Doc(int)} or + {@link Lucene.Net.Index.IndexReader#Document(int)} on every hit. + Doing so can slow searches by an order of magnitude or more. +

+
+ + Called before collecting from each IndexReader. All doc ids in + {@link #Collect(int)} will correspond to reader. + + Add docBase to the current IndexReaders internal document id to re-base ids + in {@link #Collect(int)}. - true if there is a payload available at this position that can be loaded - + next IndexReader + + + + + + + * Return true if this collector does not + * require the matching docIDs to be delivered in int sort + * order (smallest to largest) to {@link #collect}. + * + *

Most Lucene Query implementations will visit + * matching docIDs in order. However, some queries + * (currently limited to certain cases of {@link + * BooleanQuery}) can achieve faster searching if the + * Collector allows them to deliver the + * docIDs out of order. + * + *

Many collectors don't mind getting docIDs out of + * order, so it's important to return true + * here. + * +
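For reference, a C# rendering of the BitSet collector example shown earlier in this Collector documentation (the original is in Java); member casing follows the Lucene.Net port, and BitArray stands in for java.util.BitSet.

            using System.Collections;
            using Lucene.Net.Index;
            using Lucene.Net.Search;

            class BitArrayCollector : Collector
            {
                private readonly BitArray bits;
                private int docBase;

                public BitArrayCollector(BitArray bits) { this.bits = bits; }

                public override void SetScorer(Scorer scorer) { /* scores are not needed */ }

                public override void Collect(int doc)
                {
                    bits.Set(doc + docBase, true);   // re-base into the top-level docID space
                }

                public override void SetNextReader(IndexReader reader, int docBase)
                {
                    this.docBase = docBase;
                }

                public override bool AcceptsDocsOutOfOrder()
                {
                    return true;                      // order doesn't matter when setting bits
                }
            }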

+ +
+ + This class is very similar to + {@link Lucene.Net.Search.Spans.SpanNearQuery} except that it factors + in the value of the payloads located at each of the positions where the + {@link Lucene.Net.Search.Spans.TermSpans} occurs. +

+ In order to take advantage of this, you must override + {@link Lucene.Net.Search.Similarity#ScorePayload(String, byte[],int,int)} + which returns 1 by default. +

+ Payload scores are aggregated using a pluggable {@link PayloadFunction}. + +

+ + +
+ + Matches spans which are near one another. One can specify slop, the + maximum number of intervening unmatched positions, as well as whether + matches are required to be in-order. + + + + Base class for span-based queries. + + + Expert: Returns the matches for this query in an index. Used internally + to search for spans. + + + + Returns the name of the field matched by this query. + + + Returns a collection of all terms matched by this query. + use extractTerms instead + + + + + + Construct a SpanNearQuery. Matches spans matching a span from each + clause, with up to slop total unmatched positions between + them. * When inOrder is true, the spans from each clause + must be * ordered as in clauses. + + + + Return the clauses whose spans are matched. + + + Return the maximum number of intervening unmatched positions permitted. + + + Return true if matches are required to be in-order. + + + Returns a collection of all terms matched by this query. + use extractTerms instead + + + + + + Returns true iff o is equal to this. + + + Expert-only. Public for use by other weight implementations Public for extension only. @@ -1534,202 +1031,4502 @@ use {@link #DocID()} instead. - - Constrains search results to only match those which also match a provided - query. Also provides position information about where each document matches - at the cost of extra space compared with the QueryWrapperFilter. - There is an added cost to this above what is stored in a {@link QueryWrapperFilter}. Namely, - the position information for each matching document is stored. -

- This filter does not cache. See the {@link Lucene.Net.Search.CachingSpanFilter} for a wrapper that - caches. - + +

By default, uses the {@link PayloadFunction} to score the payloads, but + can be overridden to do other things. - $Id:$ - + The payloads + + The start position of the span being scored + + The end position of the span being scored + + + + - - Abstract base class providing a mechanism to restrict searches to a subset - of an index and also maintains and returns position information. - This is useful if you want to compare the positions from a SpanQuery with the positions of items in - a filter. For instance, if you had a SpanFilter that marked all the occurrences of the word "foo" in documents, - and then you entered a new SpanQuery containing bar, you could not only filter by the word foo, but you could - then compare position information for post processing. + +

A {@link Query} that matches numeric values within a + specified range. To use this, you must first index the + numeric values using {@link NumericField} (expert: {@link + NumericTokenStream}). If your terms are instead textual, + you should use {@link TermRangeQuery}. {@link + NumericRangeFilter} is the filter equivalent of this + query.

+ +

You create a new NumericRangeQuery with the static + factory methods, eg: + +

+            Query q = NumericRangeQuery.newFloatRange("weight",
+            new Float(0.3f), new Float(0.10f),
+            true, true);
+            
+ + matches all documents whose float valued "weight" field + ranges from 0.3 to 0.10, inclusive. + +

The performance of NumericRangeQuery is much better + than the corresponding {@link TermRangeQuery} because the + number of terms that must be searched is usually far + fewer, thanks to trie indexing, described below.

+ +

You can optionally specify a precisionStep + when creating this query. This is necessary if you've + changed this configuration from its default (4) during + indexing. Lower values consume more disk space but speed + up searching. Suitable values are between 1 and + 8. A good starting point to test is 4, + which is the default value for all Numeric* + classes. See below for + details. + +

This query defaults to {@linkplain + MultiTermQuery#CONSTANT_SCORE_AUTO_REWRITE_DEFAULT} for + 32 bit (int/float) ranges with precisionStep <8 and 64 + bit (long/double) ranges with precisionStep <6. + Otherwise it uses {@linkplain + MultiTermQuery#CONSTANT_SCORE_FILTER_REWRITE} as the + number of terms is likely to be high. With precision + steps of <4, this query can be run with one of the + BooleanQuery rewrite methods without changing + BooleanQuery's default max clause count. + +

NOTE: This API is experimental and + might change in incompatible ways in the next release. + +

How it works

+ +

See the publication about panFMP, + where this algorithm was described (referred to as TrieRangeQuery): + +

Schindler, U, Diepenbroek, M, 2008. + Generic XML-based Framework for Metadata Portals. + Computers & Geosciences 34 (12), 1947-1955. + doi:10.1016/j.cageo.2008.02.023
+ +

A quote from this paper: Because Apache Lucene is a full-text + search engine and not a conventional database, it cannot handle numerical ranges + (e.g., field value is inside user defined bounds, even dates are numerical values). + We have developed an extension to Apache Lucene that stores + the numerical values in a special string-encoded format with variable precision + (all numerical values like doubles, longs, floats, and ints are converted to + lexicographic sortable string representations and stored with different precisions + (for a more detailed description of how the values are stored, + see {@link NumericUtils}). A range is then divided recursively into multiple intervals for searching: + The center of the range is searched only with the lowest possible precision in the trie, + while the boundaries are matched more exactly. This reduces the number of terms dramatically.

+ +

For the variant that stores long values in 8 different precisions (each reduced by 8 bits) that + uses a lowest precision of 1 byte, the index contains only a maximum of 256 distinct values in the + lowest precision. Overall, a range could consist of a theoretical maximum of + 7*255*2 + 255 = 3825 distinct terms (when there is a term for every distinct value of an + 8-byte-number in the index and the range covers almost all of them; a maximum of 255 distinct values is used + because it would always be possible to reduce the full 256 values to one term with degraded precision). + In practice, we have seen up to 300 terms in most cases (index with 500,000 metadata records + and a uniform value distribution).

+ +

Precision Step

+

You can choose any precisionStep when encoding values. + Lower step values mean more precisions and so more terms in the index (and the index gets larger). + On the other hand, the maximum number of terms to match is reduced, which optimizes query speed. + The formula to calculate the maximum term count is:

+            n = [ (bitsPerValue/precisionStep - 1) * (2^precisionStep - 1 ) * 2 ] + (2^precisionStep - 1 )
+            
+

(this formula is only correct, when bitsPerValue/precisionStep is an integer; + in other cases, the value must be rounded up and the last summand must contain the modulo of the division as + precision step). + For longs stored using a precision step of 4, n = 15*15*2 + 15 = 465, and for a precision + step of 2, n = 31*3*2 + 3 = 189. But the faster search speed is reduced by more seeking + in the term enum of the index. Because of this, the ideal precisionStep value can only + be found out by testing. Important: You can index with a lower precision step value and test search speed + using a multiple of the original step value.

+ +

Good values for precisionStep depend on usage and data type: +

    +
  • The default for all data types is 4, which is used when no precisionStep is given.
  • +
  • Ideal value in most cases for 64 bit data types (long, double) is 6 or 8.
  • +
  • Ideal value in most cases for 32 bit data types (int, float) is 4.
  • +
  • Steps >64 for long/double and >32 for int/float produce one token + per value in the index, and querying is as slow as a conventional {@link TermRangeQuery}. But this can be used + to produce fields that are solely used for sorting (in this case simply use {@link Integer#MAX_VALUE} as + precisionStep; see the sketch after this list). Using {@link NumericField NumericFields} for sorting + is ideal because building the field cache is much faster than with text-only numbers. + Sorting is also possible with range-query-optimized fields using one of the above precisionSteps.
  • +
+ +
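+ A sketch of such a sorting-only field, as referenced in the last point of the list above
+ (assuming the port exposes NumericField(name, precisionStep, store, index) and SetLongValue
+ like the Java API; "price" and writer are illustrative):
+            // precisionStep = int.MaxValue: exactly one indexed token per value, intended for sorting only.
+            Document doc = new Document();
+            NumericField price = new NumericField("price", int.MaxValue, Field.Store.NO, true);
+            price.SetLongValue(1299L);
+            doc.Add(price);
+            writer.AddDocument(doc);
+            // at search time, sort results on that field:
+            Sort sort = new Sort(new SortField("price", SortField.LONG));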

Comparisons of the different types of RangeQueries on an index with about 500,000 docs showed + that {@link TermRangeQuery} in boolean rewrite mode (with raised {@link BooleanQuery} clause count) + took about 30-40 secs to complete, {@link TermRangeQuery} in constant score filter rewrite mode took 5 secs + and executing this class took <100ms to complete (on an Opteron64 machine, Java 1.5, 8 bit + precision step). This query type was developed for a geographic portal, where the performance for + e.g. bounding boxes or exact date/time stamps is important.

+ +

+ 2.9 + + +
+ + Factory that creates a NumericRangeQuery, that queries a long + range using the given precisionStep. + You can have half-open ranges (which are in fact </≤ or >/≥ queries) + by setting the min or max value to null. By setting inclusive to false, it will + match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. - - Abstract base class for restricting which documents may be returned during searching. -

- Note: In Lucene 3.0 {@link #Bits(IndexReader)} will be removed - and {@link #GetDocIdSet(IndexReader)} will be defined as abstract. - All implementing classes must therefore implement {@link #GetDocIdSet(IndexReader)} - in order to work with Lucene 3.0. + +

Factory that creates a NumericRangeQuery, that queries a long + range using the default precisionStep {@link NumericUtils#PRECISION_STEP_DEFAULT} (4). + You can have half-open ranges (which are in fact </≤ or >/≥ queries) + by setting the min or max value to null. By setting inclusive to false, it will + match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. - - + + Factory that creates a NumericRangeQuery, that queries a int + range using the given precisionStep. + You can have half-open ranges (which are in fact </≤ or >/≥ queries) + by setting the min or max value to null. By setting inclusive to false, it will + match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. + + + + Factory that creates a NumericRangeQuery, that queries a int + range using the default precisionStep {@link NumericUtils#PRECISION_STEP_DEFAULT} (4). + You can have half-open ranges (which are in fact </≤ or >/≥ queries) + by setting the min or max value to null. By setting inclusive to false, it will + match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. + + + + Factory that creates a NumericRangeQuery, that queries a double + range using the given precisionStep. + You can have half-open ranges (which are in fact </≤ or >/≥ queries) + by setting the min or max value to null. By setting inclusive to false, it will + match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. + + + + Factory that creates a NumericRangeQuery, that queries a double + range using the default precisionStep {@link NumericUtils#PRECISION_STEP_DEFAULT} (4). + You can have half-open ranges (which are in fact </≤ or >/≥ queries) + by setting the min or max value to null. By setting inclusive to false, it will + match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. + + + + Factory that creates a NumericRangeQuery, that queries a float + range using the given precisionStep. + You can have half-open ranges (which are in fact </≤ or >/≥ queries) + by setting the min or max value to null. By setting inclusive to false, it will + match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. + + + + Factory that creates a NumericRangeQuery, that queries a float + range using the default precisionStep {@link NumericUtils#PRECISION_STEP_DEFAULT} (4). + You can have half-open ranges (which are in fact </≤ or >/≥ queries) + by setting the min or max value to null. By setting inclusive to false, it will + match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. + + + + Returns the field name for this query + + + Returns true if the lower endpoint is inclusive + + + Returns true if the upper endpoint is inclusive + + + Returns the lower value of this range query + + + Returns the upper value of this range query + + - Creates a {@link DocIdSet} enumerating the documents that should be - permitted in search results. NOTE: null can be - returned if no documents are accepted by this Filter. -

- Note: This method will be called once per segment in - the index during searching. The returned {@link DocIdSet} - must refer to document IDs for that segment, not for - the top-level reader. - - @param reader a {@link IndexReader} instance opened on the index currently - searched on. Note, it is likely that the provided reader does not - represent the whole underlying index i.e. if the index has more than - one segment the given reader only represents a single segment. - + Lucene.Net specific. Needed for Serialization

- a DocIdSet that provides the documents which should be permitted or - prohibited in search results. NOTE: null can be returned if - no documents will be accepted by this Filter. - - - + +
- - Returns a SpanFilterResult with true for documents which should be permitted in - search results, and false for those that should not and Spans for where the true docs match. - - The {@link Lucene.Net.Index.IndexReader} to load position and DocIdSet information from - - A {@link SpanFilterResult} - - java.io.IOException if there was an issue accessing the necessary information - + + + Lucene.Net specific. Needed for deserialization + + - - Constructs a filter which only matches documents matching - query. - - The {@link Lucene.Net.Search.Spans.SpanQuery} to use as the basis for the Filter. - - - - The results of a SpanQueryFilter. Wraps the BitSet and the position information from the SpanQuery - + + Subclass of FilteredTermEnum for enumerating all terms that match the + sub-ranges for trie range queries.

- NOTE: This API is still experimental and subject to change. + WARNING: This term enumeration is not guaranteed to be always ordered by + {@link Term#compareTo}. + The ordering depends on how {@link NumericUtils#splitLongRange} and + {@link NumericUtils#splitIntRange} generates the sub-ranges. For + {@link MultiTermQuery} ordering is not relevant. +

+
+ + Abstract class for enumerating a subset of all terms. +

Term enumerations are always ordered by Term.compareTo(). Each term in + the enumeration is greater than all that precede it. +

+
+ + Abstract class for enumerating terms. +

Term enumerations are always ordered by Term.compareTo(). Each term in + the enumeration is greater than all that precede it. +

+
+ + Increments the enumeration to the next element. True if one exists. + + + Returns the current Term in the enumeration. + + + Returns the docFreq of the current Term in the enumeration. + + + Closes the enumeration to further activity, freeing resources. + + + Skips terms to the first beyond the current whose value is + greater or equal to target.

Returns true iff there is such + an entry.

Behaves as if written:

+            public boolean skipTo(Term target) {
+                do {
+                    if (!next())
+                        return false;
+                } while (target > term());
+                return true;
+            }
+            
+ Some implementations *could* be considerably more efficient than a linear scan. + Check the implementation to be sure. +
+ This method is not performant and will be removed in Lucene 3.0. + Use {@link IndexReader#Terms(Term)} to create a new TermEnum positioned at a + given term. + +
+ + the current term + + + the delegate enum - to set this member use {@link #setEnum} + + + Equality compare on the term + + + Equality measure on the term + + + Indicates the end of the enumeration has been reached + + + use this method to set the actual TermEnum (e.g. in ctor), + it will be automatically positioned on the first matching term. + + + + Returns the docFreq of the current Term in the enumeration. + Returns -1 if no Term matches or all terms have been enumerated. + + + + Increments the enumeration to the next element. True if one exists. + + + Returns the current Term in the enumeration. + Returns null if no Term matches or all terms have been enumerated. + + + + Closes the enumeration to further activity, freeing resources. + + + this is a dummy, it is not used by this class. + + + Compares if current upper bound is reached, + this also updates the term count for statistics. + In contrast to {@link FilteredTermEnum}, a return value + of false ends iterating the current enum + and forwards to the next sub-range. + + + + Increments the enumeration to the next element. True if one exists. + + + Closes the enumeration to further activity, freeing resources. + + + Expert: Callback for {@link #splitLongRange}. + You need to overwrite only one of the methods. +

NOTE: This is a very low-level interface, + the method signatures may change in later versions. +

+
+ + This is a helper class to generate prefix-encoded representations for numerical values + and supplies converters to represent float/double values as sortable integers/longs. + +

To quickly execute range queries in Apache Lucene, a range is divided recursively + into multiple intervals for searching: The center of the range is searched only with + the lowest possible precision in the trie, while the boundaries are matched + more exactly. This reduces the number of terms dramatically. + +

This class generates terms to achieve this: First the numerical integer values need to + be converted to strings. For that, integer values (32 bit or 64 bit) are made unsigned, + and the bits are converted to ASCII chars, 7 bits per char. The resulting string is + sortable like the original integer value. Each value is also prefixed + (in the first char) by the shift value (the number of bits removed) used + during encoding. + +

To also index floating point numbers, this class supplies two methods to convert them + to integer values by changing their bit layout: {@link #doubleToSortableLong}, + {@link #floatToSortableInt}. There is no precision loss when + converting floating point numbers to integers and back (only that the integer form + is not usable). Other data types like dates can easily be converted to longs or ints (e.g. + date to long: {@link java.util.Date#getTime}). + +
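+ For illustration, a round trip through the sortable form (method casing assumed to follow the
+ .NET convention, DoubleToSortableLong / SortableLongToDouble; adjust to the actual port):
+            long sortable = NumericUtils.DoubleToSortableLong(3.25);
+            double back = NumericUtils.SortableLongToDouble(sortable);   // == 3.25, nothing lost
+            // the sortable longs order exactly like the original doubles, e.g.
+            // NumericUtils.DoubleToSortableLong(-1.5) < NumericUtils.DoubleToSortableLong(2.0)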

For easy usage, the trie algorithm is implemented for indexing inside + {@link NumericTokenStream} that can index int, long, + float, and double. For querying, + {@link NumericRangeQuery} and {@link NumericRangeFilter} implement the query part + for the same data types. + +

This class can also be used to generate lexicographically sortable (according to + {@link String#compareTo(String)}) representations of numeric data types for other + usages (e.g. sorting). + +

NOTE: This API is experimental and + might change in incompatible ways in the next release. + +

+ 2.9 + +
+ + The default precision step used by {@link NumericField}, {@link NumericTokenStream}, + {@link NumericRangeQuery}, and {@link NumericRangeFilter} as default + + + + Expert: The maximum term length (used for char[] buffer size) + for encoding long values. + + + + + + Expert: The maximum term length (used for char[] buffer size) + for encoding int values. + + + + + + Expert: Longs are stored at lower precision by shifting off lower bits. The shift count is + stored as SHIFT_START_LONG+shift in the first character + + + + Expert: Integers are stored at lower precision by shifting off lower bits. The shift count is + stored as SHIFT_START_INT+shift in the first character + + + + Expert: Returns prefix coded bits after reducing the precision by shift bits. + This is method is used by {@link NumericTokenStream}. + + the numeric value + + how many bits to strip from the right + + that will contain the encoded chars, must be at least of {@link #BUF_SIZE_LONG} + length + + number of chars written to buffer + + + + Expert: Returns prefix coded bits after reducing the precision by shift bits. + This is method is used by {@link LongRangeBuilder}. + + the numeric value + + how many bits to strip from the right + + + + This is a convenience method, that returns prefix coded bits of a long without + reducing the precision. It can be used to store the full precision value as a + stored field in index. +

To decode, use {@link #prefixCodedToLong}. +
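+ A sketch of that usage (the surrounding Document handling is illustrative and the .NET method casing is assumed):
+            string coded = NumericUtils.LongToPrefixCoded(42L);            // full precision, shift = 0
+            doc.Add(new Field("idCoded", coded, Field.Store.YES, Field.Index.NOT_ANALYZED));
+            // later, after reading the stored field back:
+            long decoded = NumericUtils.PrefixCodedToLong(coded);          // 42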

+
+ + Expert: Returns prefix coded bits after reducing the precision by shift bits. + This is method is used by {@link NumericTokenStream}. + + the numeric value + + how many bits to strip from the right + + that will contain the encoded chars, must be at least of {@link #BUF_SIZE_INT} + length + + number of chars written to buffer + + + + Expert: Returns prefix coded bits after reducing the precision by shift bits. + This is method is used by {@link IntRangeBuilder}. + + the numeric value + + how many bits to strip from the right + + + + This is a convenience method, that returns prefix coded bits of an int without + reducing the precision. It can be used to store the full precision value as a + stored field in index. +

To decode, use {@link #prefixCodedToInt}. +

+
+ + Returns a long from prefixCoded characters. + Rightmost bits will be zero for lower precision codes. + This method can be used to decode e.g. a stored field. + + NumberFormatException if the supplied string is + not correctly prefix encoded. + + + + + + Returns an int from prefixCoded characters. + Rightmost bits will be zero for lower precision codes. + This method can be used to decode e.g. a stored field. + + NumberFormatException if the supplied string is + not correctly prefix encoded. + + + + + + Converts a double value to a sortable signed long. + The value is converted by getting their IEEE 754 floating-point "double format" + bit layout and then some bits are swapped, to be able to compare the result as long. + By this the precision is not reduced, but the value can easily used as a long. + + + + + + Convenience method: this just returns: + longToPrefixCoded(doubleToSortableLong(val)) + + + + Converts a sortable long back to a double. + + + + + Convenience method: this just returns: + sortableLongToDouble(prefixCodedToLong(val)) + + + + Converts a float value to a sortable signed int. + The value is converted by getting their IEEE 754 floating-point "float format" + bit layout and then some bits are swapped, to be able to compare the result as int. + By this the precision is not reduced, but the value can easily used as an int. + + + + + + Convenience method: this just returns: + intToPrefixCoded(floatToSortableInt(val)) + + + + Converts a sortable int back to a float. + + + + + Convenience method: this just returns: + sortableIntToFloat(prefixCodedToInt(val)) + + + + Expert: Splits a long range recursively. + You may implement a builder that adds clauses to a + {@link Lucene.Net.Search.BooleanQuery} for each call to its + {@link LongRangeBuilder#AddRange(String,String)} + method. +

This method is used by {@link NumericRangeQuery}. +
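+ An illustrative builder along those lines (assuming this port exposes NumericUtils.SplitLongRange
+ and a nested LongRangeBuilder whose AddRange(string, string) can be overridden, as in the Java
+ version; the field name and bounds are hypothetical):
+            class BooleanRangeBuilder : NumericUtils.LongRangeBuilder
+            {
+                public readonly BooleanQuery Result = new BooleanQuery(true);   // coord disabled
+                public override void AddRange(string minPrefixCoded, string maxPrefixCoded)
+                {
+                    // each callback is one inclusive sub-range over the prefix-coded terms
+                    Result.Add(new TermRangeQuery("myLongField", minPrefixCoded, maxPrefixCoded, true, true),
+                               BooleanClause.Occur.SHOULD);
+                }
+            }
+            // usage sketch: NumericUtils.SplitLongRange(new BooleanRangeBuilder(), 4, lower, upper);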

+
+ + Expert: Splits an int range recursively. + You may implement a builder that adds clauses to a + {@link Lucene.Net.Search.BooleanQuery} for each call to its + {@link IntRangeBuilder#AddRange(String,String)} + method. +

This method is used by {@link NumericRangeQuery}. +

+
+ + This helper does the splitting for both 32 and 64 bit. + + + Helper that delegates to correct range builder + + + Expert: Callback for {@link #splitLongRange}. + You need to overwrite only one of the methods. +

NOTE: This is a very low-level interface, + the method signatures may change in later versions. +

+
+ + Overwrite this method, if you like to receive the already prefix encoded range bounds. + You can directly build classical (inclusive) range queries from them. + + + + Overwrite this method, if you like to receive the raw long range bounds. + You can use this for e.g. debugging purposes (print out range bounds). + + + + Expert: Callback for {@link #splitIntRange}. + You need to overwrite only one of the methods. +

NOTE: This is a very low-level interface, + the method signatures may change in later versions. +

+
+ + Overwrite this method, if you like to receive the already prefix encoded range bounds. + You can directly build classical range (inclusive) queries from them. + + + + Overwrite this method, if you like to receive the raw int range bounds. + You can use this for e.g. debugging purposes (print out range bounds). + + + + Expert: obtains the ordinal of the field value from the default Lucene + {@link Lucene.Net.Search.FieldCache Fieldcache} using getStringIndex(). +

+ The native lucene index order is used to assign an ordinal value for each field value. +

+ Field values (terms) are lexicographically ordered by unicode value, and numbered starting at 1. +

+ Example: +
If there were only three field values: "apple","banana","pear" +
then ord("apple")=1, ord("banana")=2, ord("pear")=3 +

+ WARNING: + ord() depends on the position in an index and can thus change + when other documents are inserted or deleted, + or if a MultiSearcher is used. + +

+ WARNING: The status of the Search.Function package is experimental. + The APIs introduced here might change in the future and will not be + supported anymore in such a case. + +

NOTE: with the switch in 2.9 to segment-based + searching, if {@link #getValues} is invoked with a + composite (multi-segment) reader, this can easily cause + double RAM usage for the values in the FieldCache. It's + best to switch your application to pass only atomic + (single segment) readers to this API. Alternatively, for + a short-term fix, you could wrap your ValueSource using + {@link MultiValueSource}, which costs more CPU per lookup + but will not consume double the FieldCache RAM.
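+ A short-term wrapping sketch as described above (class names assume the port mirrors the Java
+ Search.Function package; "category" is an illustrative field):
+            ValueSource src = new MultiValueSource(new OrdFieldSource("category"));
+            Query q = new ValueSourceQuery(src);   // extra per-lookup binary search, but no doubled FieldCache RAM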

+

+
+ + Expert: source of values for basic function queries. +

At its default/simplest form, values - one per doc - are used as the score of that doc. +

Values are instantiated as + {@link Lucene.Net.Search.Function.DocValues DocValues} for a particular reader. +

ValueSource implementations differ in RAM requirements: it would always be a factor + of the number of documents, but for each document the number of bytes can be 1, 2, 4, or 8. + +

+ WARNING: The status of the Search.Function package is experimental. + The APIs introduced here might change in the future and will not be + supported anymore in such a case.

- - - + + Return the DocValues used by the function query. + the IndexReader used to read these values. + If any caching is involved, that caching would also be IndexReader based. + + IOException for any error. - - - The bits for the Filter - - A List of {@link Lucene.Net.Search.SpanFilterResult.PositionInfo} objects - - Use {@link #SpanFilterResult(DocIdSet, List)} instead - + + description of field, used in explain() - - - The DocIdSet for the Filter - - A List of {@link Lucene.Net.Search.SpanFilterResult.PositionInfo} objects + + Needed for possible caching of query results - used by {@link ValueSourceQuery#equals(Object)}. + + + + + Needed for possible caching of query results - used by {@link ValueSourceQuery#hashCode()}. + + + + + Constructor for a certain field. + field whose values order is used. - - The first entry in the array corresponds to the first "on" bit. - Entries are increasing by document order - - A List of PositionInfo objects - - - - Use {@link #GetDocIdSet()} - - - - Returns the docIdSet - - - - A List of {@link Lucene.Net.Search.SpanFilterResult.StartEnd} objects - - - - - The end position of this match - - - - The Start position - The start position of this match - - - - Expert: Compares two ScoreDoc objects for sorting. + + Expert: represents field values as different types. + Normally created via a + {@link Lucene.Net.Search.Function.ValueSource ValueSuorce} + for a particular field and reader. -

Created: Feb 3, 2004 9:00:16 AM +

+ WARNING: The status of the Search.Function package is experimental. + The APIs introduced here might change in the future and will not be + supported anymore in such a case. + + +

+
+ + Return doc value as a float. +

Mandatory: every DocValues implementation must implement at least this method. +

+ document whose float value is requested. + +
+ + Return doc value as an int. +

Optional: DocValues implementation can (but don't have to) override this method. +

+ document whose int value is requested. + +
+ + Return doc value as a long. +

Optional: DocValues implementation can (but don't have to) override this method. +

+ document whose long value is requested. + +
+ + Return doc value as a double. +

Optional: DocValues implementation can (but don't have to) override this method. +

+ document whose double value is requested. + +
+ + Return doc value as a string. +

Optional: DocValues implementation can (but don't have to) override this method. +

+ document whose string value is requested. + +
+ + Return a string representation of a doc value, as required for Explanations. + + + Explain the scoring value for the input doc. + + + Expert: for test purposes only, return the inner array of values, or null if not applicable. +

+ Allows tests to verify that loaded values are: +

    +
  1. indeed cached/reused.
  2. +
  3. stored in the expected size/type (byte/short/int/float).
  4. +
+ Note: implementations of DocValues must override this method for + these test elements to be tested. Otherwise the test would not fail, just + print a warning. +
+
+ + Returns the minimum of all values or Float.NaN if this + DocValues instance does not contain any value. +

+ This operation is optional +

+ +

+ the minimum of all values or Float.NaN if this + DocValues instance does not contain any value. + +
+ + Returns the maximum of all values or Float.NaN if this + DocValues instance does not contain any value. +

+ This operation is optional +

+ +

+ the maximum of all values or Float.NaN if this + DocValues instance does not contain any value. + +
+ + Returns the average of all values or Float.NaN if this + DocValues instance does not contain any value. * +

+ This operation is optional +

+ +

+ the average of all values or Float.NaN if this + DocValues instance does not contain any value + +
+ + This class wraps another ValueSource, but protects + against accidental double RAM usage in FieldCache when + a composite reader is passed to {@link #getValues}. + +

NOTE: this class adds a CPU penalty to every + lookup, as it must resolve the incoming document to the + right sub-reader using a binary search.

+ +

+ This class is temporary, to ease the + migration to segment-based searching. Please change your + code to not pass composite readers to these APIs. + +
+ + Expert: Maintains caches of term values. + +

Created: May 19, 2004 11:13:14 AM

lucene 1.4 - $Id: ScoreDocComparator.java 738219 2009-01-27 20:15:21Z mikemccand $ + $Id: FieldCache.java 807841 2009-08-25 22:27:31Z markrmiller $ - use {@link FieldComparator} + + +
+ + Expert: Stores term text values and document ordering data. + + + All the term values, in natural order. + + + For each document, an index into the lookup array. + + + Creates one of these objects + + + EXPERT: A unique Identifier/Description for each item in the FieldCache. + Can be useful for logging/debugging. +

+ EXPERIMENTAL API: This API is considered extremely advanced + and experimental. It may be removed or altered w/o warning in future + releases + of Lucene. +

+

+
+ + + + + + Computes (and stores) the estimated size of the cache Value + + + + + The most recently estimated size of the value, null unless + estimateSize has been called. + + + + Indicator for StringIndex values in the cache. + + + Expert: The cache used internally by sorting and range query classes. + + + The default parser for byte values, which are encoded by {@link Byte#toString(byte)} + + + The default parser for short values, which are encoded by {@link Short#toString(short)} + + + The default parser for int values, which are encoded by {@link Integer#toString(int)} + + + The default parser for float values, which are encoded by {@link Float#toString(float)} + + + The default parser for long values, which are encoded by {@link Long#toString(long)} + + + The default parser for double values, which are encoded by {@link Double#toString(double)} + + + A parser instance for int values encoded by {@link NumericUtils#IntToPrefixCoded(int)}, e.g. when indexed + via {@link NumericField}/{@link NumericTokenStream}. + + + + A parser instance for float values encoded with {@link NumericUtils}, e.g. when indexed + via {@link NumericField}/{@link NumericTokenStream}. + + + + A parser instance for long values encoded by {@link NumericUtils#LongToPrefixCoded(long)}, e.g. when indexed + via {@link NumericField}/{@link NumericTokenStream}. + + + + A parser instance for double values encoded with {@link NumericUtils}, e.g. when indexed + via {@link NumericField}/{@link NumericTokenStream}. + + + + Interface to parse bytes from document fields. + + + + + Marker interface as super-interface to all parsers. It + is used to specify a custom parser to {@link + SortField#SortField(String, FieldCache.Parser)}. + + + + Return a single Byte representation of this field's value. + + + Interface to parse shorts from document fields. + + + + + Return a short representation of this field's value. + + + Interface to parse ints from document fields. + + + + + Return an integer representation of this field's value. + + + Interface to parse floats from document fields. + + + + + Return an float representation of this field's value. + + + Interface to parse long from document fields. + + + Use {@link FieldCache.LongParser}, this will be removed in Lucene 3.0 - - Special comparator for sorting hits according to computed relevance (document score). + + Return an long representation of this field's value. - - Special comparator for sorting hits according to index order (document number). + + Interface to parse doubles from document fields. + + + Use {@link FieldCache.DoubleParser}, this will be removed in Lucene 3.0 + - - Compares two ScoreDoc objects and returns a result indicating their - sort order. + + Return an long representation of this field's value. + + + Checks the internal cache for an appropriate entry, and if none is + found, reads the terms in field as a single byte and returns an array + of size reader.maxDoc() of the value each document + has in the given field. - First ScoreDoc + Used to get field values. - Second ScoreDoc + Which field contains the single byte values. - a negative integer if i should come before j
- a positive integer if i should come after j
- 0 if they are equal + The values in the given field for each document. - + IOException If any error occurs. +
+ + Checks the internal cache for an appropriate entry, and if none is found, + reads the terms in field as bytes and returns an array of + size reader.maxDoc() of the value each document has in the + given field. + + Used to get field values. + + Which field contains the bytes. + + Computes byte for string values. + + The values in the given field for each document. + + IOException If any error occurs. + + + Checks the internal cache for an appropriate entry, and if none is + found, reads the terms in field as shorts and returns an array + of size reader.maxDoc() of the value each document + has in the given field. + + Used to get field values. + + Which field contains the shorts. + + The values in the given field for each document. + + IOException If any error occurs. + + + Checks the internal cache for an appropriate entry, and if none is found, + reads the terms in field as shorts and returns an array of + size reader.maxDoc() of the value each document has in the + given field. + + Used to get field values. + + Which field contains the shorts. + + Computes short for string values. + + The values in the given field for each document. + + IOException If any error occurs. + + + Checks the internal cache for an appropriate entry, and if none is + found, reads the terms in field as integers and returns an array + of size reader.maxDoc() of the value each document + has in the given field. + + Used to get field values. + + Which field contains the integers. + + The values in the given field for each document. + + IOException If any error occurs. + + + Checks the internal cache for an appropriate entry, and if none is found, + reads the terms in field as integers and returns an array of + size reader.maxDoc() of the value each document has in the + given field. + + Used to get field values. + + Which field contains the integers. + + Computes integer for string values. + + The values in the given field for each document. + + IOException If any error occurs. + + + Checks the internal cache for an appropriate entry, and if + none is found, reads the terms in field as floats and returns an array + of size reader.maxDoc() of the value each document + has in the given field. + + Used to get field values. + + Which field contains the floats. + + The values in the given field for each document. + + IOException If any error occurs. + + + Checks the internal cache for an appropriate entry, and if + none is found, reads the terms in field as floats and returns an array + of size reader.maxDoc() of the value each document + has in the given field. + + Used to get field values. + + Which field contains the floats. + + Computes float for string values. + + The values in the given field for each document. + + IOException If any error occurs. + + + Checks the internal cache for an appropriate entry, and if none is + found, reads the terms in field as longs and returns an array + of size reader.maxDoc() of the value each document + has in the given field. + + + Used to get field values. + + Which field contains the longs. + + The values in the given field for each document. + + java.io.IOException If any error occurs. + + + Checks the internal cache for an appropriate entry, and if none is found, + reads the terms in field as longs and returns an array of + size reader.maxDoc() of the value each document has in the + given field. + + + Used to get field values. + + Which field contains the longs. + + Computes integer for string values. + + The values in the given field for each document. 
+ + IOException If any error occurs. + + + Checks the internal cache for an appropriate entry, and if none is + found, reads the terms in field as integers and returns an array + of size reader.maxDoc() of the value each document + has in the given field. + + + Used to get field values. + + Which field contains the doubles. + + The values in the given field for each document. + + IOException If any error occurs. + + + Checks the internal cache for an appropriate entry, and if none is found, + reads the terms in field as doubles and returns an array of + size reader.maxDoc() of the value each document has in the + given field. + + + Used to get field values. + + Which field contains the doubles. + + Computes integer for string values. + + The values in the given field for each document. + + IOException If any error occurs. + + + Checks the internal cache for an appropriate entry, and if none + is found, reads the term values in field and returns an array + of size reader.maxDoc() containing the value each document + has in the given field. + + Used to get field values. + + Which field contains the strings. + + The values in the given field for each document. + + IOException If any error occurs. + + + Checks the internal cache for an appropriate entry, and if none + is found reads the term values in field and returns + an array of them in natural order, along with an array telling + which element in the term array each document uses. + + Used to get field values. + + Which field contains the strings. + + Array of terms and index into the array for each document. + + IOException If any error occurs. + + + Checks the internal cache for an appropriate entry, and if + none is found reads field to see if it contains integers, longs, floats + or strings, and then calls one of the other methods in this class to get the + values. For string values, a StringIndex is returned. After + calling this method, there is an entry in the cache for both + type AUTO and the actual found type. + + Used to get field values. + + Which field contains the values. + + int[], long[], float[] or StringIndex. + + IOException If any error occurs. + Please specify the exact type, instead. + Especially, guessing does not work with the new + {@link NumericField} type. + + + + Checks the internal cache for an appropriate entry, and if none + is found reads the terms out of field and calls the given SortComparator + to get the sort values. A hit in the cache will happen if reader, + field, and comparator are the same (using equals()) + as a previous call to this method. + + Used to get field values. + + Which field contains the values. + + Used to convert terms into something to sort by. + + Array of sort objects, one for each document. + + IOException If any error occurs. + Please implement {@link + FieldComparatorSource} directly, instead. + + + + EXPERT: Generates an array of CacheEntry objects representing all items + currently in the FieldCache. +

+ NOTE: These CacheEntry objects maintain a strong reference to the + Cached Values. Maintaining references to a CacheEntry after the IndexReader + associated with it has been garbage collected will prevent the Value itself + from being garbage collected when the Cache drops the WeakReference. +

+

+ EXPERIMENTAL API: This API is considered extremely advanced + and experimental. It may be removed or altered w/o warning in future + releases + of Lucene. +

+

+
+ +

+ EXPERT: Instructs the FieldCache to forcibly expunge all entries + from the underlying caches. This is intended only to be used for + test methods as a way to ensure a known base state of the Cache + (without needing to rely on GC to free WeakReferences). + It should not be relied on for "Cache maintenance" in general + application code. +

+

+ EXPERIMENTAL API: This API is considered extremely advanced + and experimental. It may be removed or altered w/o warning in future + releases + of Lucene. +

+

+
+ + If non-null, FieldCacheImpl will warn whenever + entries are created that are not sane according to + {@link Lucene.Net.Util.FieldCacheSanityChecker}. + + + + counterpart of {@link #SetInfoStream(PrintStream)} + + + A range query that returns a constant score equal to its boost for + all documents in the exclusive range of terms. + +

It does not have an upper bound on the number of clauses covered in the range. + +

This query matches the documents looking for terms that fall into the + supplied range according to {@link String#compareTo(String)}. It is not intended + for numerical ranges, use {@link NumericRangeQuery} instead. + +

This query is hardwired to {@link MultiTermQuery#CONSTANT_SCORE_AUTO_REWRITE_DEFAULT}. + If you want to change this, use {@link TermRangeQuery} instead. + +

+ Use {@link TermRangeQuery} for term ranges or + {@link NumericRangeQuery} for numeric ranges instead. + This class will be removed in Lucene 3.0. + + $Id: ConstantScoreRangeQuery.java 797694 2009-07-25 00:03:33Z mikemccand $ + +
+ + A Query that matches documents within an exclusive range of terms. + +

This query matches the documents looking for terms that fall into the + supplied range according to {@link String#compareTo(String)}. It is not intended + for numerical ranges, use {@link NumericRangeQuery} instead. + +

This query uses the {@link + MultiTermQuery#CONSTANT_SCORE_AUTO_REWRITE_DEFAULT} + rewrite method. +
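+ For example (the field and bounds are illustrative; pass null for an endpoint to leave it open):
+            // all "title" terms from "apple" (inclusive) up to "banana" (exclusive)
+            Query q = new TermRangeQuery("title", "apple", "banana", true, false);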

+ 2.9 + +
+ + Constructs a query selecting all terms greater/equal than lowerTerm + but less/equal than upperTerm. + +

+ If an endpoint is null, it is said + to be "open". Either or both endpoints may be open. Open endpoints may not + be exclusive (you can't select all but the first or last term without + explicitly specifying the term to exclude.) + +

+ The field that holds both lower and upper terms. + + The term text at the lower end of the range + + The term text at the upper end of the range + + If true, the lowerTerm is + included in the range. + + If true, the upperTerm is + included in the range. + +
+ + Constructs a query selecting all terms greater/equal than + lowerTerm but less/equal than upperTerm. +

+ If an endpoint is null, it is said + to be "open". Either or both endpoints may be open. Open endpoints may not + be exclusive (you can't select all but the first or last term without + explicitly specifying the term to exclude.) +

+ If collator is not null, it will be used to decide whether + index terms are within the given range, rather than using the Unicode code + point order in which index terms are stored. +

+ WARNING: Using this constructor and supplying a non-null + value in the collator parameter will cause every single + index Term in the Field referenced by lowerTerm and/or upperTerm to be + examined. Depending on the number of index Terms in this Field, the + operation could be very slow. + +

+ The Term text at the lower end of the range + + The Term text at the upper end of the range + + If true, the lowerTerm is + included in the range. + + If true, the upperTerm is + included in the range. + + The collator to use to collate index Terms, to determine + their membership in the range bounded by lowerTerm and + upperTerm. + +
+ + Returns the field name for this query + + + Returns the lower value of this range query + + + Returns the upper value of this range query + + + Returns true if the lower endpoint is inclusive + + + Returns true if the upper endpoint is inclusive + + + Returns the collator used to determine range inclusion, if any. + + + Prints a user-readable version of this query. + + + Changes of mode are not supported by this class (fixed to constant score rewrite mode) + + + Lucene's package information, including version. * + + + TermPositions provides an interface for enumerating the <document, + frequency, <position>* > tuples for a term.

The document and + frequency are the same as for a TermDocs. The positions portion lists the ordinal + positions of each occurrence of a term in a document. + +

+
- - Returns the value used to sort the given document. The - object returned must implement the java.io.Serializable - interface. This is used by multisearchers to determine how - to collate results from their searchers. + + TermDocs provides an interface for enumerating <document, frequency> + pairs for a term.

The document portion names each document containing + the term. Documents are indicated by number. The frequency portion gives + the number of times the term occurred in each document.

The pairs are + ordered by document number.
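+ A typical enumeration loop over this interface (a sketch; reader is an open IndexReader and the
+ term is illustrative):
+            TermDocs td = reader.TermDocs(new Term("body", "lucene"));
+            try
+            {
+                while (td.Next())
+                {
+                    int docId = td.Doc();   // document number containing the term
+                    int freq = td.Freq();   // occurrences of the term in that document
+                }
+            }
+            finally { td.Close(); }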

- + - Document - - Serializable object +
+ + Sets this to the data for a term. + The enumeration is reset to the start of the data for this term. + + + + Sets this to the data for the current term in a {@link TermEnum}. + This may be optimized in some implementations. + + + + Returns the current document number.

This is invalid until {@link + #Next()} is called for the first time. +

+
+ + Returns the frequency of the term within the current document.

This + is invalid until {@link #Next()} is called for the first time. +

+
+ + Moves to the next pair in the enumeration.

Returns true iff there is + such a next pair in the enumeration. +

+
+ + Attempts to read multiple entries from the enumeration, up to length of + docs. Document numbers are stored in docs, and term + frequencies are stored in freqs. The freqs array must be as + long as the docs array. + +

Returns the number of entries read. Zero is only returned when the + stream has been exhausted. +

+
+ + Skips entries to the first beyond the current whose document number is + greater than or equal to target.

Returns true iff there is such + an entry.

Behaves as if written:

+            boolean skipTo(int target) {
+                do {
+                    if (!next())
+                        return false;
+                } while (target > doc());
+                return true;
+            }
+            
+ Some implementations are considerably more efficient than that. +
+
+ + Frees associated resources. + + + Returns next position in the current document. It is an error to call + this more than {@link #Freq()} times + without calling {@link #Next()}

This is + invalid until {@link #Next()} is called for + the first time. +

+
+ + Returns the length of the payload at the current term position. + This is invalid until {@link #NextPosition()} is called for + the first time.
+
+ length of the current payload in number of bytes
- - Returns the type of sort. Should return SortField.SCORE, - SortField.DOC, SortField.STRING, - SortField.INTEGER, SortField.FLOAT or - SortField.CUSTOM. It is not valid to return - SortField.AUTO. - This is used by multisearchers to determine how to collate results - from their searchers. + + Returns the payload data at the current term position. + This is invalid until {@link #NextPosition()} is called for + the first time. + This method must not be called more than once after each call + of {@link #NextPosition()}. However, payloads are loaded lazily, + so if the payload data for the current position is not needed, + this method may not be called at all for performance reasons.
+
- One of the constants in SortField. + the array into which the data of this payload is to be + stored, if it is big enough; otherwise, a new byte[] array + is allocated for this purpose. + + the offset in the array into which the data of this payload + is to be stored. + + a byte[] array containing the data of this payload - + IOException +
+ + Checks if a payload can be loaded at this position. +

+ Payloads can only be loaded once per call to + {@link #NextPosition()}. + +

+ true if there is a payload available at this position that can be loaded + +
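+ A sketch of reading payloads with these methods (method names assume the usual Get-prefixed
+ .NET casing of this port; reader and the term are illustrative):
+            TermPositions tp = reader.TermPositions(new Term("body", "lucene"));
+            while (tp.Next())
+            {
+                for (int i = 0; i < tp.Freq(); i++)
+                {
+                    int position = tp.NextPosition();
+                    if (tp.IsPayloadAvailable())
+                    {
+                        byte[] payload = tp.GetPayload(new byte[tp.GetPayloadLength()], 0);
+                    }
+                }
+            }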
+ + Optimized implementation. + + + Overridden by SegmentTermPositions to skip in prox stream. + + + Optimized implementation. + + + Called by super.skipTo(). + + + This is a {@link LogMergePolicy} that measures size of a + segment as the total byte size of the segment's files. + + + +

This class implements a {@link MergePolicy} that tries + to merge segments into levels of exponentially + increasing size, where each level has fewer segments than + the value of the merge factor. Whenever extra segments + (beyond the merge factor upper bound) are encountered, + all segments within the level are merged. You can get or + set the merge factor using {@link #GetMergeFactor()} and + {@link #SetMergeFactor(int)} respectively.

+ +

This class is abstract and requires a subclass to + define the {@link #size} method which specifies how a + segment's size is determined. {@link LogDocMergePolicy} + is one subclass that measures size by document count in + the segment. {@link LogByteSizeMergePolicy} is another + subclass that measures size as the total byte size of the + file(s) for the segment.
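+ For example, these knobs are usually tuned through IndexWriter's convenience setters, which
+ delegate to the current log merge policy (a sketch; writer is an existing IndexWriter):
+            // larger merge factor: faster batch indexing, slower searches on unoptimized indexes
+            writer.SetMergeFactor(20);
+            // cap the size (in documents) of segments that may still be merged
+            writer.SetMaxMergeDocs(100000);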

+

+
+ +

Expert: a MergePolicy determines the sequence of + primitive merge operations to be used for overall merge + and optimize operations.

+ +

Whenever the segments in an index have been altered by + {@link IndexWriter}, either the addition of a newly + flushed segment, addition of many segments from + addIndexes* calls, or a previous merge that may now need + to cascade, {@link IndexWriter} invokes {@link + #findMerges} to give the MergePolicy a chance to pick + merges that are now required. This method returns a + {@link MergeSpecification} instance describing the set of + merges that should be done, or null if no merges are + necessary. When IndexWriter.optimize is called, it calls + {@link #findMergesForOptimize} and the MergePolicy should + then return the necessary merges.

+ +

Note that the policy can return more than one merge at + a time. In this case, if the writer is using {@link + SerialMergeScheduler}, the merges will be run + sequentially but if it is using {@link + ConcurrentMergeScheduler} they will be run concurrently.

+ +

The default MergePolicy is {@link + LogByteSizeMergePolicy}.

+ +

NOTE: This API is new and still experimental + (subject to change suddenly in the next release)

+ +

NOTE: This class typically requires access to + package-private APIs (e.g. SegmentInfos) to do its job; + if you implement your own MergePolicy, you'll need to put + it in package Lucene.Net.Index in order to use + these APIs. +

+
+ + Determine what set of merge operations are now necessary on the index. + {@link IndexWriter} calls this whenever there is a change to the segments. + This call is always synchronized on the {@link IndexWriter} instance so + only one thread at a time will call this method. + + + the total set of segments in the index + + + + Determine what set of merge operations is necessary in order to optimize + the index. {@link IndexWriter} calls this when its + {@link IndexWriter#Optimize()} method is called. This call is always + synchronized on the {@link IndexWriter} instance so only one thread at a + time will call this method. + + + the total set of segments in the index + + requested maximum number of segments in the index (currently this + is always 1) + + contains the specific SegmentInfo instances that must be merged + away. This may be a subset of all SegmentInfos. + + + + Determine what set of merge operations is necessary in order to expunge all + deletes from the index. + + + the total set of segments in the index + + + + Release all resources for the policy. + + + Returns true if a newly flushed (not from merge) + segment should use the compound file format. + + + + Returns true if the doc store files should use the + compound file format. + + + + OneMerge provides the information necessary to perform + an individual primitive merge operation, resulting in + a single new segment. The merge spec includes the + subset of segments to be merged as well as whether the + new segment should use the compound file format. + + + + Record that an exception occurred while executing + this merge + + + + Retrieve previous exception set by {@link + #setException}. + + + + Mark this merge as aborted. If this is called + before the merge is committed then the merge will + not be committed. + + + + Returns true if this merge was aborted. + + + A MergeSpecification instance provides the information + necessary to perform multiple merges. It simply + contains a list of {@link OneMerge} instances. + + + + The subset of segments to be included in the primitive merge. + + + Exception thrown if there are any problems while + executing a merge. + + + + + Use {@link #MergePolicy.MergeException(String,Directory)} instead + + + + + Use {@link #MergePolicy.MergeException(Throwable,Directory)} instead + + + + Returns the {@link Directory} of the index that hit + the exception. + + + + Defines the allowed range of log(size) for each + level. A level is computed by taking the max segment + log size, minus LEVEL_LOG_SPAN, and finding all + segments falling within that range. + + + + Default merge factor, which is how many segments are + merged at a time + + + + Default maximum segment size. A segment of this size + + +

Returns the number of segments that are merged at + once and also controls the total number of segments + allowed to accumulate in the index.

+

+
+ + Determines how often segment indices are merged by + addDocument(). With smaller values, less RAM is used + while indexing, and searches on unoptimized indices are + faster, but indexing speed is slower. With larger + values, more RAM is used during indexing, and while + searches on unoptimized indices are slower, indexing is + faster. Thus larger values (> 10) are best for batch + index creation, and smaller values (< 10) for indices + that are interactively maintained. + + + + Sets whether compound file format should be used for + newly flushed and newly merged segments. + + + + Returns true if newly flushed and newly merge segments + + + + + Sets whether compound file format should be used for + newly flushed and newly merged doc store + segment files (term vectors and stored fields). + + + + Returns true if newly flushed and newly merge doc + store segment files (term vectors and stored fields) + + + + + + Sets whether the segment size should be calibrated by + the number of deletes when choosing segments for merge. + + + + Returns true if the segment size should be calibrated + by the number of deletes when choosing segments for merge. + + + + Returns true if this single info is optimized (has no + pending norms or deletes, is in the same dir as the + writer, and matches the current compound file setting + + + + Returns the merges necessary to optimize the index. + This merge policy defines "optimized" to mean only one + segment in the index, where that segment has no + deletions pending nor separate norms, and it is in + compound file format if the current useCompoundFile + setting is true. This method returns multiple merges + (mergeFactor at a time) so the {@link MergeScheduler} + in use may make use of concurrency. + + + + Finds merges necessary to expunge all deletes from the + index. We simply merge adjacent segments that have + deletes, up to mergeFactor at a time. + + + + Checks if any merges are now necessary and returns a + {@link MergePolicy.MergeSpecification} if so. A merge + is necessary when there are more than {@link + #setMergeFactor} segments at a given level. When + multiple levels have too many segments, this method + will return multiple merges, allowing the {@link + MergeScheduler} to use concurrency. + + + +

Determines the largest segment (measured by + document count) that may be merged with other segments. + Small values (e.g., less than 10,000) are best for + interactive indexing, as this limits the length of + pauses while indexing to a few seconds. Larger values + are best for batched indexing and speedier + searches.

+ +

The default value is {@link Integer#MAX_VALUE}.

+ +

The default merge policy ({@link + LogByteSizeMergePolicy}) also allows you to set this + limit by net size (in MB) of the segment, using {@link + LogByteSizeMergePolicy#setMaxMergeMB}.

+

+
+ + Returns the largest segment (measured by document + count) that may be merged with other segments. + + + + + + + + + + Default maximum segment size. A segment of this size + + + + +

Determines the largest segment (measured by total + byte size of the segment's files, in MB) that may be + merged with other segments. Small values (e.g., less + than 50 MB) are best for interactive indexing, as this + limits the length of pauses while indexing to a few + seconds. Larger values are best for batched indexing + and speedier searches.

+ +

Note that {@link #setMaxMergeDocs} is also + used to check whether a segment is too large for + merging (it's either or).

+

+
+ + Returns the largest segment (meaured by total byte + size of the segment's files, in MB) that may be merged + with other segments. + + + + + + Sets the minimum size for the lowest level segments. + Any segments below this size are considered to be on + the same level (even if they vary drastically in size) + and will be merged whenever there are mergeFactor of + them. This effectively truncates the "long tail" of + small segments that would otherwise be created into a + single level. If you set this too large, it could + greatly increase the merging cost during indexing (if + you flush many small segments). + + + + Get the minimum size for a segment to remain + un-merged. + + + + + + Bulk write a contiguous series of documents. The + lengths array is the length (in bytes) of each raw + document. The stream IndexInput is the + fieldsStream from which we should bulk-copy all + bytes. + + + + Access to the Fieldable Info file that describes document fields and whether or + not they are indexed. Each segment has a separate Fieldable Info file. Objects + of this class are thread-safe for multiple readers, but only one thread can + be adding documents at a time, with no other reader or writer threads + accessing this object. + + + + Construct a FieldInfos object using the directory and the name of the file + IndexInput + + The directory to open the IndexInput from + + The name of the file to open the IndexInput from in the Directory + + IOException + + + Returns a deep clone of this FieldInfos instance. + + + Adds field info for a Document. + + + Returns true if any fields do not omitTermFreqAndPositions + + + Add fields that are indexed. Whether they have termvectors has to be specified. + + + The names of the fields + + Whether the fields store term vectors or not + + true if positions should be stored. + + true if offsets should be stored + + + + Assumes the fields are not storing term vectors. + + + The names of the fields + + Whether the fields are indexed or not + + + + + + + Calls 5 parameter add with false for all TermVector parameters. + + + The name of the Fieldable + + true if the field is indexed + + + + + + Calls 5 parameter add with false for term vector positions and offsets. + + + The name of the field + + true if the field is indexed + + true if the term vector should be stored + + + + If the field is not yet known, adds it. If it is known, checks to make + sure that the isIndexed flag is the same as was given previously for this + field. If not - marks it as being indexed. Same goes for the TermVector + parameters. + + + The name of the field + + true if the field is indexed + + true if the term vector should be stored + + true if the term vector with positions should be stored + + true if the term vector with offsets should be stored + + + + If the field is not yet known, adds it. If it is known, checks to make + sure that the isIndexed flag is the same as was given previously for this + field. If not - marks it as being indexed. Same goes for the TermVector + parameters. + + + The name of the field + + true if the field is indexed + + true if the term vector should be stored + + true if the term vector with positions should be stored + + true if the term vector with offsets should be stored + + true if the norms for the indexed field should be omitted + + + + If the field is not yet known, adds it. If it is known, checks to make + sure that the isIndexed flag is the same as was given previously for this + field. If not - marks it as being indexed. 
Same goes for the TermVector + parameters. + + + The name of the field + + true if the field is indexed + + true if the term vector should be stored + + true if the term vector with positions should be stored + + true if the term vector with offsets should be stored + + true if the norms for the indexed field should be omitted + + true if payloads should be stored for this field + + true if term freqs should be omitted for this field + + + + Return the fieldName identified by its number. + + + + + the fieldName or an empty string when the field + with the given number doesn't exist. + + + + Return the fieldinfo object referenced by the fieldNumber. + + + the FieldInfo object or null when the given fieldNumber + doesn't exist. + + + + Implements the skip list writer for the default posting list format + that stores positions and payloads. + + + + + This abstract class writes skip lists with multiple levels. + + Example for skipInterval = 3: + c (skip level 2) + c c c (skip level 1) + x x x x x x x x x x (skip level 0) + d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d (posting list) + 3 6 9 12 15 18 21 24 27 30 (df) + + d - document + x - skip data + c - skip data with child pointer + + Skip level i contains every skipInterval-th entry from skip level i-1. + Therefore the number of entries on level i is: floor(df / ((skipInterval ^ (i + 1))). + + Each skip entry on a level i>0 contains a pointer to the corresponding skip entry in list i-1. + This guarantess a logarithmic amount of skips to find the target document. + + While this class takes care of writing the different skip levels, + subclasses must define the actual format of the skip data. + + + + + Subclasses must implement the actual skip data encoding in this method. + + + the level skip data shall be writting for + + the skip buffer to write to + + + + Writes the current skip data to the buffers. The current document frequency determines + the max level is skip data is to be written to. + + + the current document frequency + + IOException + + + Writes the buffered skip lists to the given output. + + + the IndexOutput the skip lists shall be written to + + the pointer the skip list starts + + + + Sets the values for the current skip data. + + + This exception is thrown when Lucene detects + an inconsistency in the index. + + + + + Base class for enumerating all but deleted docs. + +

NOTE: this class is meant only to be used internally + by Lucene; it's only public so it can be shared across + packages. This means the API is freely subject to + change, and the class could be removed entirely, in any + Lucene release. Use directly at your own risk! +

+
+ + + + + + + + Synonymous with {@link Field}. + +

WARNING: This interface may change within minor versions, despite Lucene's backward compatibility requirements. + This means new methods may be added from version to version. This change only affects the Fieldable API; other backwards + compatibility promises remain intact. For example, Lucene can still + read and write indices created within the same major version. +

+ + +

+
+ + Sets the boost factor for hits on this field. This value will be + multiplied into the score of all hits on this field of this + document. +

The boost is multiplied by {@link Lucene.Net.Documents.Document#GetBoost()} of the document + containing this field. If a document has multiple fields with the same + name, all such values are multiplied together. This product is then + used to compute the norm factor for the field. By + default, in the {@link + Lucene.Net.Search.Similarity#ComputeNorm(String, + FieldInvertState)} method, the boost value is multiplied + by the {@link + Lucene.Net.Search.Similarity#LengthNorm(String, + int)} and then rounded by {@link Lucene.Net.Search.Similarity#EncodeNorm(float)} before it is stored in the + index. One should attempt to ensure that this product does not overflow + the range of that encoding. + +

+ + + + + + +
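+ As a concrete illustration of how a field boost feeds into the norm described above, here is a minimal sketch in the Java flavour used by the other examples in these docs; the field name, text and boost value are illustrative assumptions only:
+            // assumes org.apache.lucene.document.* on the classpath
+            Document doc = new Document();
+            Field title = new Field("title", "Lucene in Action",
+                                    Field.Store.YES, Field.Index.ANALYZED);
+            title.setBoost(2.0f);     // folded into the field's norm at index time
+            doc.add(title);
+            // writer.addDocument(doc);  // 'writer' is an assumed open IndexWriter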
+ + Returns the boost factor for hits for this field. + +

The default value is 1.0. + +

Note: this value is not stored directly with the document in the index. + Documents returned from {@link Lucene.Net.Index.IndexReader#Document(int)} and + {@link Lucene.Net.Search.Hits#Doc(int)} may thus not have the same value present as when + this field was indexed. + +

+ + +
+ + Returns the name of the field as an interned string. + For example "date", "title", "body", ... + + + + The value of the field as a String, or null. +

+ For indexing, if isStored()==true, the stringValue() will be used as the stored field value + unless isBinary()==true, in which case binaryValue() will be used. + + If isIndexed()==true and isTokenized()==false, this String value will be indexed as a single token. + If isIndexed()==true and isTokenized()==true, then tokenStreamValue() will be used to generate indexed tokens if not null, + else readerValue() will be used to generate indexed tokens if not null, else stringValue() will be used to generate tokens. +

+
+ + The value of the field as a Reader, which can be used at index time to generate indexed tokens. + + + + + The value of the field in Binary, or null. + + + + + The TokenStream for this field to be used when indexing, or null. + + + + + True if the value of the field is to be stored in the index for return + with search hits. + + + + True if the value of the field is to be indexed, so that it may be + searched on. + + + + True if the value of the field should be tokenized as text prior to + indexing. Un-tokenized fields are indexed as a single word and may not be + Reader-valued. + + + + True if the value of the field is stored and compressed within the index + + + True if the term or terms used to index this field are stored as a term + vector, available from {@link Lucene.Net.Index.IndexReader#GetTermFreqVector(int,String)}. + These methods do not provide access to the original content of the field, + only to terms used to index it. If the original content must be + preserved, use the stored attribute instead. + + + + + + + True if terms are stored as term vector together with their offsets + (start and end positon in source text). + + + + True if terms are stored as term vector together with their token positions. + + + True if the value of the field is stored as binary + + + True if norms are omitted for this indexed field + + + Expert: + + If set, omit normalization factors associated with this indexed field. + This effectively disables indexing boosts and length normalization for this field. + + + + Renamed to {@link AbstractField#setOmitTermFreqAndPositions} + + + + Renamed to {@link AbstractField#getOmitTermFreqAndPositions} + + + + Indicates whether a Field is Lazy or not. The semantics of Lazy loading are such that if a Field is lazily loaded, retrieving + it's values via {@link #StringValue()} or {@link #BinaryValue()} is only valid as long as the {@link Lucene.Net.Index.IndexReader} that + retrieved the {@link Document} is still open. + + + true if this field can be loaded lazily + + + + Returns offset into byte[] segment that is used as value, if Field is not binary + returned value is undefined + + index of the first character in byte[] segment that represents this Field value + + + + Returns length of byte[] segment that is used as value, if Field is not binary + returned value is undefined + + length of byte[] segment that represents this Field value + + + + Return the raw byte[] for the binary field. Note that + you must also call {@link #getBinaryLength} and {@link + #getBinaryOffset} to know which range of bytes in this + returned array belong to the field. + + reference to the Field value as byte[]. + + + + Return the raw byte[] for the binary field. Note that + you must also call {@link #getBinaryLength} and {@link + #getBinaryOffset} to know which range of bytes in this + returned array belong to the field.

+ About reuse: if you pass in the result byte[] and it is + used, likely the underlying implementation will hold + onto this byte[] and return it in future calls to + {@link #BinaryValue()} or {@link #GetBinaryValue()}. + So if you subsequently re-use the same byte[] elsewhere + it will alter this Fieldable's value. +

+ User defined buffer that will be used if + possible. If this is null or not large enough, a new + buffer is allocated + + reference to the Field value as byte[]. + +
+ + Sets the boost factor for hits on this field. This value will be + multiplied into the score of all hits on this field of this + document. +

The boost is multiplied by {@link Lucene.Net.Documents.Document#GetBoost()} of the document + containing this field. If a document has multiple fields with the same + name, all such values are multiplied together. This product is then + used to compute the norm factor for the field. By + default, in the {@link + Lucene.Net.Search.Similarity#ComputeNorm(String, + FieldInvertState)} method, the boost value is multiplied + by the {@link + Lucene.Net.Search.Similarity#LengthNorm(String, + int)} and then + rounded by {@link Lucene.Net.Search.Similarity#EncodeNorm(float)} before it is stored in the + index. One should attempt to ensure that this product does not overflow + the range of that encoding. +

+ + + + + + +
+ + Returns the boost factor for hits for this field. + +

The default value is 1.0. + +

Note: this value is not stored directly with the document in the index. + Documents returned from {@link Lucene.Net.Index.IndexReader#Document(int)} and + {@link Lucene.Net.Search.Hits#Doc(int)} may thus not have the same value present as when + this field was indexed. + +

+ + +
+ + Returns the name of the field as an interned string. + For example "date", "title", "body", ... + + + + True iff the value of the field is to be stored in the index for return + with search hits. It is an error for this to be true if a field is + Reader-valued. + + + + True iff the value of the field is to be indexed, so that it may be + searched on. + + + + True iff the value of the field should be tokenized as text prior to + indexing. Un-tokenized fields are indexed as a single word and may not be + Reader-valued. + + + + True if the value of the field is stored and compressed within the index + + + True iff the term or terms used to index this field are stored as a term + vector, available from {@link Lucene.Net.Index.IndexReader#GetTermFreqVector(int,String)}. + These methods do not provide access to the original content of the field, + only to terms used to index it. If the original content must be + preserved, use the stored attribute instead. + + + + + + + True iff terms are stored as term vector together with their offsets + (start and end position in source text). + + + + True iff terms are stored as term vector together with their token positions. + + + True iff the value of the filed is stored as binary + + + Return the raw byte[] for the binary field. Note that + you must also call {@link #getBinaryLength} and {@link + #getBinaryOffset} to know which range of bytes in this + returned array belong to the field. + + reference to the Field value as byte[]. + + + + Returns length of byte[] segment that is used as value, if Field is not binary + returned value is undefined + + length of byte[] segment that represents this Field value + + + + Returns offset into byte[] segment that is used as value, if Field is not binary + returned value is undefined + + index of the first character in byte[] segment that represents this Field value + + + + True if norms are omitted for this indexed field + + + Renamed to {@link #getOmitTermFreqAndPositions} + + + + + + + + Expert: + + If set, omit normalization factors associated with this indexed field. + This effectively disables indexing boosts and length normalization for this field. + + + + Renamed to {@link #setOmitTermFreqAndPositions} + + + + Expert: + + If set, omit term freq, positions and payloads from + postings for this field. + +

NOTE: While this option reduces storage space + required in the index, it also means any query + requiring positional information, such as {@link + PhraseQuery} or {@link SpanQuery} subclasses will + silently fail to find results. +

+
+ + Prints a Field for human consumption. + + + Removes stop words from a token stream. + + + A TokenFilter is a TokenStream whose input is another TokenStream. +

+ This is an abstract class; subclasses must override {@link #IncrementToken()}. + +

+ + +
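+ A minimal sketch of such a subclass, written in the Java style of the other examples in these docs (the class name and its lower-casing behaviour are illustrative assumptions; Lucene already ships a real LowerCaseFilter):
+            import java.io.IOException;
+            import org.apache.lucene.analysis.TokenFilter;
+            import org.apache.lucene.analysis.TokenStream;
+            import org.apache.lucene.analysis.tokenattributes.TermAttribute;
+            
+            public final class LowerCasingFilterSketch extends TokenFilter {
+                private final TermAttribute termAtt;
+            
+                public LowerCasingFilterSketch(TokenStream input) {
+                    super(input);
+                    // one shared attribute instance, retrieved once at construction time
+                    termAtt = (TermAttribute) addAttribute(TermAttribute.class);
+                }
+            
+                public boolean incrementToken() throws IOException {
+                    if (!input.incrementToken()) return false;   // end of stream
+                    termAtt.setTermBuffer(termAtt.term().toLowerCase());
+                    return true;
+                }
+            }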
+ + A TokenStream enumerates the sequence of tokens, either from + {@link Field}s of a {@link Document} or from query text. +

+ This is an abstract class. Concrete subclasses are: +

  • {@link Tokenizer}, a TokenStream whose input is a Reader; and
  • {@link TokenFilter}, a TokenStream whose input is another TokenStream.
+ A new TokenStream API has been introduced with Lucene 2.9. This API + has moved from being {@link Token} based to {@link Attribute} based. While + {@link Token} still exists in 2.9 as a convenience class, the preferred way + to store the information of a {@link Token} is to use {@link AttributeImpl}s. +

+ TokenStream now extends {@link AttributeSource}, which provides + access to all of the token {@link Attribute}s for the TokenStream. + Note that only one instance per {@link AttributeImpl} is created and reused + for every token. This approach reduces object creation and allows local + caching of references to the {@link AttributeImpl}s. See + {@link #IncrementToken()} for further details. +

+ The workflow of the new TokenStream API is as follows: +

  1. Instantiation of TokenStream/{@link TokenFilter}s which add/get attributes to/from the {@link AttributeSource}.
  2. The consumer calls {@link TokenStream#Reset()}.
  3. The consumer retrieves attributes from the stream and stores local references to all attributes it wants to access.
  4. The consumer calls {@link #IncrementToken()} until it returns false and consumes the attributes after each call.
  5. The consumer calls {@link #End()} so that any end-of-stream operations can be performed.
  6. The consumer calls {@link #Close()} to release any resource when finished using the TokenStream.
+ To make sure that filters and consumers know which attributes are available, + the attributes must be added during instantiation. Filters and consumers are + not required to check for availability of attributes in + {@link #IncrementToken()}. +

+ You can find some example code for the new API in the analysis package level + Javadoc. +

+ Sometimes it is desirable to capture a current state of a TokenStream, + e.g. for buffering purposes (see {@link CachingTokenFilter}, + {@link TeeSinkTokenFilter}). For this use case + {@link AttributeSource#CaptureState} and {@link AttributeSource#RestoreState} + can be used. +

+
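+ The numbered workflow above, expressed as a small consumer sketch in the Java style of the other examples here; the analyzer, field name and text are illustrative assumptions, and the cast reflects the pre-generics signature noted for AddAttribute below:
+            // assumes java.io.StringReader, org.apache.lucene.analysis.* and
+            // org.apache.lucene.analysis.tokenattributes.TermAttribute
+            TokenStream ts = new WhitespaceAnalyzer().tokenStream(
+                "body", new StringReader("the quick brown fox"));
+            TermAttribute termAtt =
+                (TermAttribute) ts.addAttribute(TermAttribute.class);  // steps 1/3: add and keep references
+            ts.reset();                                                // step 2
+            while (ts.incrementToken()) {                              // step 4
+                System.out.println(termAtt.term());
+            }
+            ts.end();                                                  // step 5
+            ts.close();                                                // step 6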
+ + An AttributeSource contains a list of different {@link AttributeImpl}s, + and methods to add and get them. There can only be a single instance + of an attribute in the same AttributeSource instance. This is ensured + by passing in the actual type of the Attribute (Class<Attribute>) to + the {@link #AddAttribute(Class)}, which then checks if an instance of + that type is already present. If yes, it returns the instance, otherwise + it creates a new instance and returns it. + + + + An AttributeSource using the default attribute factory {@link AttributeSource.AttributeFactory#DEFAULT_ATTRIBUTE_FACTORY}. + + + An AttributeSource that uses the same attributes as the supplied one. + + + An AttributeSource using the supplied {@link AttributeFactory} for creating new {@link Attribute} instances. + + + returns the used AttributeFactory. + + + Returns a new iterator that iterates the attribute classes + in the same order they were added in. + Signature for Java 1.5: public Iterator<Class<? extends Attribute>> getAttributeClassesIterator() + + Note that this return value is different from Java in that it enumerates over the values + and not the keys + + + + Returns a new iterator that iterates all unique Attribute implementations. + This iterator may contain less entries that {@link #getAttributeClassesIterator}, + if one instance implements more than one Attribute interface. + Signature for Java 1.5: public Iterator<AttributeImpl> getAttributeImplsIterator() + + + + a cache that stores all interfaces for known implementation classes for performance (slow reflection) + + + Adds a custom AttributeImpl instance with one or more Attribute interfaces. + + + The caller must pass in a Class<? extends Attribute> value. + This method first checks if an instance of that class is + already in this AttributeSource and returns it. Otherwise a + new instance is created, added to this AttributeSource and returned. + Signature for Java 1.5: public <T extends Attribute> T addAttribute(Class<T>) + + + + Returns true, iff this AttributeSource has any attributes + + + The caller must pass in a Class<? extends Attribute> value. + Returns true, iff this AttributeSource contains the passed-in Attribute. + Signature for Java 1.5: public boolean hasAttribute(Class<? extends Attribute>) + + + + The caller must pass in a Class<? extends Attribute> value. + Returns the instance of the passed in Attribute contained in this AttributeSource + Signature for Java 1.5: public <T extends Attribute> T getAttribute(Class<T>) + + + IllegalArgumentException if this AttributeSource does not contain the + Attribute. It is recommended to always use {@link #addAttribute} even in consumers + of TokenStreams, because you cannot know if a specific TokenStream really uses + a specific Attribute. {@link #addAttribute} will automatically make the attribute + available. If you want to only use the attribute, if it is available (to optimize + consuming), use {@link #hasAttribute}. + + + + Resets all Attributes in this AttributeSource by calling + {@link AttributeImpl#Clear()} on each Attribute implementation. + + + + Captures the state of all Attributes. The return value can be passed to + {@link #restoreState} to restore the state of this or another AttributeSource. + + + + Restores this state by copying the values of all attribute implementations + that this state contains into the attributes implementations of the targetStream. + The targetStream must contain a corresponding instance for each argument + contained in this state (e.g. 
it is not possible to restore the state of + an AttributeSource containing a TermAttribute into a AttributeSource using + a Token instance as implementation). + + Note that this method does not affect attributes of the targetStream + that are not contained in this state. In other words, if for example + the targetStream contains an OffsetAttribute, but this state doesn't, then + the value of the OffsetAttribute remains unchanged. It might be desirable to + reset its value to the default, in which case the caller should first + call {@link TokenStream#ClearAttributes()} on the targetStream. + + + + Performs a clone of all {@link AttributeImpl} instances returned in a new + AttributeSource instance. This method can be used to e.g. create another TokenStream + with exactly the same attributes (using {@link #AttributeSource(AttributeSource)}) + + + + An AttributeFactory creates instances of {@link AttributeImpl}s. + + + returns an {@link AttributeImpl} for the supplied {@link Attribute} interface class. +

Signature for Java 1.5: public AttributeImpl createAttributeInstance(Class<? extends Attribute> attClass) +

+
+ + This is the default factory that creates {@link AttributeImpl}s using the + class name of the supplied {@link Attribute} interface class by appending Impl to it. + + + + This class holds the state of an AttributeSource. + + + + + + + Remove this when old API is removed! + + + + Remove this when old API is removed! + + + + Remove this when old API is removed! + + + + Remove this when old API is removed! + + + + Remove this when old API is removed! + + + + Remove this when old API is removed! + + + + A TokenStream using the default attribute factory. + + + A TokenStream that uses the same attributes as the supplied one. + + + A TokenStream using the supplied AttributeFactory for creating new {@link Attribute} instances. + + + Remove this when old API is removed! + + + + Remove this when old API is removed! + + + + For extra performance you can globally enable the new + {@link #IncrementToken} API using {@link Attribute}s. There will be a + small, but in most cases negligible performance increase by enabling this, + but it only works if all TokenStreams use the new API and + implement {@link #IncrementToken}. This setting can only be enabled + globally. +

+ This setting only affects TokenStreams instantiated after this + call. All TokenStreams already created use the other setting. +

+ All core {@link Analyzer}s are compatible with this setting; if you have + your own TokenStreams that are also compatible, you should enable + this. +

+ When enabled, tokenization may throw an {@link UnsupportedOperationException} + if the whole tokenizer chain is not compatible, e.g., one of the + TokenStreams does not implement the new TokenStream API. +

+ The default is false, so there is the fallback to the old API + available. + +

+ This setting will no longer be needed in Lucene 3.0 as the old + API will be removed. + +
+ + Returns if only the new API is used. + + + + + This setting will no longer be needed in Lucene 3.0 as + the old API will be removed. + + + + Consumers (i.e., {@link IndexWriter}) use this method to advance the stream to + the next token. Implementing classes must implement this method and update + the appropriate {@link AttributeImpl}s with the attributes of the next + token. + + The producer must make no assumptions about the attributes after the + method has been returned: the caller may arbitrarily change it. If the + producer needs to preserve the state for subsequent calls, it can use + {@link #captureState} to create a copy of the current attribute state. + + This method is called for every token of a document, so an efficient + implementation is crucial for good performance. To avoid calls to + {@link #AddAttribute(Class)} and {@link #GetAttribute(Class)} or downcasts, + references to all {@link AttributeImpl}s that this stream uses should be + retrieved during instantiation. + + To ensure that filters and consumers know which attributes are available, + the attributes must be added during instantiation. Filters and consumers + are not required to check for availability of attributes in + {@link #IncrementToken()}. + + + false for end of stream; true otherwise + + Note that this method will be defined abstract in Lucene + 3.0. + + + + This method is called by the consumer after the last token has been + consumed, after {@link #IncrementToken()} returned false + (using the new TokenStream API). Streams implementing the old API + should upgrade to use this feature. +

+ This method can be used to perform any end-of-stream operations, such as + setting the final offset of a stream. The final offset of a stream might + differ from the offset of the last token, e.g., when one or more whitespace + characters followed the last token but a {@link WhitespaceTokenizer} was used. +

+ IOException +
+ + Returns the next token in the stream, or null at EOS. When possible, the + input Token should be used as the returned Token (this gives fastest + tokenization performance), but this is not required and a new Token may be + returned. Callers may re-use a single Token instance for successive calls + to this method. + + This implicitly defines a "contract" between consumers (callers of this + method) and producers (implementations of this method that are the source + for tokens): +
  • A consumer must fully consume the previously returned {@link Token} before calling this method again.
  • A producer must call {@link Token#Clear()} before setting the fields in it and returning it.
+ Also, the producer must make no assumptions about a {@link Token} after it + has been returned: the caller may arbitrarily change it. If the producer + needs to hold onto the {@link Token} for subsequent calls, it must clone() + it before storing it. Note that a {@link TokenFilter} is considered a + consumer. + +
+ a {@link Token} that may or may not be used to return; + this parameter should never be null (the callee is not required to + check for null before using it, but it is a good idea to assert that + it is not null.) + + next {@link Token} in the stream or null if end-of-stream was hit + + The new {@link #IncrementToken()} and {@link AttributeSource} + APIs should be used instead. + +
+ + Returns the next {@link Token} in the stream, or null at EOS. + + + The returned Token is a "full private copy" (not re-used across + calls to {@link #Next()}) but will be slower than calling + {@link #Next(Token)} or using the new {@link #IncrementToken()} + method with the new {@link AttributeSource} API. + + + + Resets this stream to the beginning. This is an optional operation, so + subclasses may or may not implement this method. {@link #Reset()} is not needed for + the standard indexing process. However, if the tokens of a + TokenStream are intended to be consumed more than once, it is + necessary to implement {@link #Reset()}. Note that if your TokenStream + caches tokens and feeds them back again after a reset, it is imperative + that you clone the tokens when you store them away (on the first pass) as + well as when you return them (on future passes after {@link #Reset()}). + + + + Releases resources associated with this stream. + + + Remove this when old API is removed! + + + + Remove this when old API is removed! + + + + The source of tokens for this filter. + + + Construct a token stream filtering the given input. + + + Performs end-of-stream operations, if any, and calls then end() on the + input TokenStream.

+ NOTE: Be sure to call super.end() first when overriding this method. +

+
+ + Close the input TokenStream. + + + Reset the filter as well as the input TokenStream. + + + Construct a token stream filtering the given input. + Use {@link #StopFilter(boolean, TokenStream, String[])} instead + + + + Construct a token stream filtering the given input. + true if token positions should record the removed stop words + + input TokenStream + + array of stop words + + Use {@link #StopFilter(boolean, TokenStream, Set)} instead. + + + + Constructs a filter which removes words from the input + TokenStream that are named in the array of words. + + Use {@link #StopFilter(boolean, TokenStream, String[], boolean)} instead + + + + Constructs a filter which removes words from the input + TokenStream that are named in the array of words. + + true if token positions should record the removed stop words + + input TokenStream + + array of stop words + + true if case is ignored + + Use {@link #StopFilter(boolean, TokenStream, Set, boolean)} instead. + + + + Construct a token stream filtering the given input. + If stopWords is an instance of {@link CharArraySet} (true if + makeStopSet() was used to construct the set) it will be directly used + and ignoreCase will be ignored since CharArraySet + directly controls case sensitivity. +

+ If stopWords is not an instance of {@link CharArraySet}, + a new CharArraySet will be constructed and ignoreCase will be + used to specify the case sensitivity of that set. + +

+ + + The set of Stop Words. + + Ignore case when stopping. + + Use {@link #StopFilter(boolean, TokenStream, Set, boolean)} instead +
+ + Construct a token stream filtering the given input. + If stopWords is an instance of {@link CharArraySet} (true if + makeStopSet() was used to construct the set) it will be directly used + and ignoreCase will be ignored since CharArraySet + directly controls case sensitivity. +

+ If stopWords is not an instance of {@link CharArraySet}, + a new CharArraySet will be constructed and ignoreCase will be + used to specify the case sensitivity of that set. + +

+ + true if token positions should record the removed stop words + + Input TokenStream + + The set of Stop Words. + + Ignore case when stopping. +
+ + Constructs a filter which removes words from the input + TokenStream that are named in the Set. + + + + + Use {@link #StopFilter(boolean, TokenStream, Set)} instead + + + + Constructs a filter which removes words from the input + TokenStream that are named in the Set. + + + true if token positions should record the removed stop words + + Input stream + + The set of Stop Words. + + + + + + Builds a Set from an array of stop words, + appropriate for passing into the StopFilter constructor. + This permits this stopWords construction to be cached once when + an Analyzer is constructed. + + + passing false to ignoreCase + + + + Builds a Set from an array of stop words, + appropriate for passing into the StopFilter constructor. + This permits this stopWords construction to be cached once when + an Analyzer is constructed. + + + passing false to ignoreCase + + + + + An array of stopwords + + If true, all words are lower cased first. + + a Set containing the words + + + + + A List of Strings representing the stopwords + + if true, all words are lower cased first + + A Set containing the words + + + + Returns the next input Token whose term() is not a stop word. + + + + + Please specify this when you create the StopFilter + + + + Returns version-dependent default for enablePositionIncrements. Analyzers + that embed StopFilter use this method when creating the StopFilter. Prior + to 2.9, this returns {@link #getEnablePositionIncrementsDefault}. On 2.9 + or later, it returns true. + + + + Set the default position increments behavior of every StopFilter created + from now on. +

+ Note: behavior of a single StopFilter instance can be modified with + {@link #SetEnablePositionIncrements(boolean)}. This static method allows + control over behavior of classes using StopFilters internally, for + example {@link Lucene.Net.Analysis.Standard.StandardAnalyzer + StandardAnalyzer} if used with the no-arg ctor. +

+ Default: false. +

+ + + Please specify this when you create the StopFilter + +
+ + + + + + If true, this StopFilter will preserve + positions of the incoming tokens (i.e., accumulate and + set position increments of the removed stop tokens). Generally, true is best as it does not + lose information (positions of the original tokens) + during indexing. +

When set, when a token is stopped + (omitted), the position increment of the following + token is incremented. + +

NOTE: be sure to also + set {@link QueryParser#setEnablePositionIncrements} if + you use QueryParser to create queries. +

+
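+ Tying the constructors and makeStopSet helpers above together, a minimal usage sketch in the Java style of the other examples; the stop words, the WhitespaceTokenizer and the 'reader' variable are illustrative assumptions:
+            Set stopWords = StopFilter.makeStopSet(new String[] { "the", "a", "an" });
+            TokenStream ts = new StopFilter(true,                        // enablePositionIncrements
+                                            new WhitespaceTokenizer(reader),
+                                            stopWords,
+                                            true);                       // ignoreCase
+            // consume ts with IncrementToken() as shown earlier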
+ + Base utility class for implementing a {@link CharFilter}. + You subclass this, and then record mappings by calling + {@link #addOffCorrectMap}, and then invoke the correct + method to correct an offset. + +

NOTE: This class is not particularly efficient. + For example, a new class instance is created for every + call to {@link #addOffCorrectMap}, which is then appended + to a private list. +

+
+ + Subclasses of CharFilter can be chained to filter CharStream. + They can be used as {@link java.io.Reader} with additional offset + correction. {@link Tokenizer}s will automatically use {@link #CorrectOffset} + if a CharFilter/CharStream subclass is used. + + + $Id$ + + + + + CharStream adds {@link #CorrectOffset} + functionality over {@link Reader}. All Tokenizers accept a + CharStream instead of {@link Reader} as input, which enables + arbitrary character based filtering before tokenization. + The {@link #CorrectOffset} method fixed offsets to account for + removal or insertion of characters, so that the offsets + reported in the tokens match the character offsets of the + original Reader. + + + + Called by CharFilter(s) and Tokenizer to correct token offset. + + + offset as seen in the output + + corrected offset based on the input + + + + Subclass may want to override to correct the current offset. + + + current offset + + corrected offset + + + + Chains the corrected offset through the input + CharFilter. + + + + Retrieve the corrected offset. Note that this method + is slow, if you correct positions far before the most + recently added position, as it's a simple linear + search backwards through all offset corrections added + by {@link #addOffCorrectMap}. + + + + Stores and iterate on sorted integers in compressed form in RAM.
+ The code for compressing the differences between ascending integers was + borrowed from {@link Lucene.Net.Store.IndexInput} and + {@link Lucene.Net.Store.IndexOutput}. +

+ NOTE: this class assumes the stored integers are doc Ids (hence why it + extends {@link DocIdSet}). Therefore its {@link #Iterator()} assumes {@link + DocIdSetIterator#NO_MORE_DOCS} can be used as a sentinel. If you intend to use + this value, then make sure it's not used during search flow. +

+
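+ For reference, iterating any DocIdSet (this class included) follows the sentinel convention mentioned above; a minimal sketch in the Java style of the other examples, where 'docIdSet' is an assumed instance:
+            DocIdSetIterator it = docIdSet.iterator();
+            int doc;
+            while ((doc = it.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
+                // ... process doc ...
+            }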
+ + A DocIdSet contains a set of doc ids. Implementing classes must + only implement {@link #iterator} to provide access to the set. + + + + An empty {@code DocIdSet} instance for easy use, e.g. in Filters that hit no documents. + + + Provides a {@link DocIdSetIterator} to access the set. + This implementation can return null or + {@linkplain #EMPTY_DOCIDSET}.iterator() if there + are no docs that match. + + + + This method is a hint for {@link CachingWrapperFilter}, if this DocIdSet + should be cached without copying it into a BitSet. The default is to return + false. If you have an own DocIdSet implementation + that does its iteration very effective and fast without doing disk I/O, + override this method and return true. + + + + When a BitSet has fewer than 1 in BITS2VINTLIST_SIZE bits set, + a SortedVIntList representing the index numbers of the set bits + will be smaller than that BitSet. + + + + Create a SortedVIntList from all elements of an array of integers. + + + A sorted array of non negative integers. + + + + Create a SortedVIntList from an array of integers. + An array of sorted non negative integers. + + The number of integers to be used from the array. + + + + Create a SortedVIntList from a BitSet. + A bit set representing a set of integers. + + + + Create a SortedVIntList from an OpenBitSet. + A bit set representing a set of integers. + + + + Create a SortedVIntList. + An iterator providing document numbers as a set of integers. + This DocIdSetIterator is iterated completely when this constructor + is called and it must provide the integers in non + decreasing order. + + + + The total number of sorted integers. + + + + The size of the byte array storing the compressed sorted integers. + + + + This DocIdSet implementation is cacheable. + + + An iterator over the sorted integers. + + + + use {@link #DocID()} instead. + + + + use {@link #NextDoc()} instead. + + + + use {@link #Advance(int)} instead. + + + + A ScorerDocQueue maintains a partial ordering of its Scorers such that the + least Scorer can always be found in constant time. Put()'s and pop()'s + require log(size) time. The ordering is by Scorer.doc(). + + + + Create a ScorerDocQueue with a maximum size. + + + Adds a Scorer to a ScorerDocQueue in log(size) time. + If one tries to add more Scorers than maxSize + a RuntimeException (ArrayIndexOutOfBound) is thrown. + + + + Adds a Scorer to the ScorerDocQueue in log(size) time if either + the ScorerDocQueue is not full, or not lessThan(scorer, top()). + + + + true if scorer is added, false otherwise. + + + + Returns the least Scorer of the ScorerDocQueue in constant time. + Should not be used when the queue is empty. + + + + Returns document number of the least Scorer of the ScorerDocQueue + in constant time. + Should not be used when the queue is empty. + + + + Removes and returns the least scorer of the ScorerDocQueue in log(size) + time. + Should not be used when the queue is empty. + + + + Removes the least scorer of the ScorerDocQueue in log(size) time. + Should not be used when the queue is empty. + + + + Should be called when the scorer at top changes doc() value. + Still log(n) worst case, but it's at least twice as fast to
+            { pq.top().change(); pq.adjustTop(); }
+            
instead of
+            { o = pq.pop(); o.change(); pq.push(o); }
+            
+
+
+ + Returns the number of scorers currently stored in the ScorerDocQueue. + + + Removes all entries from the ScorerDocQueue. + + + Simple cache implementation that uses a HashMap to store (key, value) pairs. + This cache is not synchronized, use {@link Cache#SynchronizedCache(Cache)} + if needed. + + + + Base class for cache implementations. + + + Returns a thread-safe cache backed by the specified cache. + In order to guarantee thread-safety, all access to the backed cache must + be accomplished through the returned cache. + + + + Called by {@link #SynchronizedCache(Cache)}. This method + returns a {@link SynchronizedCache} instance that wraps + this instance by default and can be overridden to return + e. g. subclasses of {@link SynchronizedCache} or this + in case this cache is already synchronized. + + + + Puts a (key, value)-pair into the cache. + + + Returns the value for the given key. + + + Returns whether the given key is in this cache. + + + Closes the cache. + + + Simple Cache wrapper that synchronizes all + calls that access the cache. + + + + Returns a Set containing all keys in this cache. + + + An "open" BitSet implementation that allows direct access to the array of words + storing the bits. +

+ Unlike java.util.BitSet, the fact that bits are packed into an array of longs + is part of the interface. This allows efficient implementation of other algorithms + by someone other than the author. It also allows one to efficiently implement + alternate serialization or interchange formats. +

+ OpenBitSet is faster than java.util.BitSet in most operations + and *much* faster at calculating cardinality of sets and results of set operations. + It can also handle sets of larger cardinality (up to 64 * 2**32-1) +

+ The goals of OpenBitSet are the fastest implementation possible, and + maximum code reuse. Extra safety and encapsulation + may always be built on top, but if that's built in, the cost can never be removed (and + hence people re-implement their own version in order to get better performance). + If you want a "safe", totally encapsulated (and slower and limited) BitSet + class, use java.util.BitSet. +

+

Performance Results

+ + Test system: Pentium 4, Sun Java 1.5_06 -server -Xbatch -Xmx64M +
BitSet size = 1,000,000 +
Results are java.util.BitSet time divided by OpenBitSet time.
            cardinality   intersect_count   union   nextSetBit   get    iterator
  50% full  3.36          3.96              1.44    1.46         1.99   1.58
   1% full  3.31          3.90              -       1.04         -      0.99
+
+ Test system: AMD Opteron, 64 bit linux, Sun Java 1.5_06 -server -Xbatch -Xmx64M +
BitSet size = 1,000,000 +
Results are java.util.BitSet time divided by OpenBitSet time.
            cardinality   intersect_count   union   nextSetBit   get    iterator
  50% full  2.50          3.50              1.00    1.03         1.12   1.25
   1% full  2.51          3.49              -       1.00         -      1.02
+
+ $Id$ + +
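+ A small usage sketch in the Java style of the other examples (the sizes and bit indexes are arbitrary assumptions):
+            OpenBitSet a = new OpenBitSet(1000);                   // capacity in bits
+            OpenBitSet b = new OpenBitSet(1000);
+            a.set(3); a.set(64); b.set(64);
+            long setBits  = a.cardinality();                       // 2
+            long inCommon = OpenBitSet.intersectionCount(a, b);    // 1
+            boolean touch = a.intersects(b);                       // true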
+ + Constructs an OpenBitSet large enough to hold numBits. + + + + + + + Constructs an OpenBitSet from an existing long[]. +
+ The first 64 bits are in long[0], + with bit index 0 at the least significant bit, and bit index 63 at the most significant. + Given a bit index, + the word containing it is long[index/64], and it is at bit number index%64 within that word. +

+ numWords is the number of elements in the array that contain + set bits (non-zero longs). + numWords should be <= bits.length, and + any existing words in the array at position >= numWords should be zero. +

+
+ + This DocIdSet implementation is cacheable. + + + Returns the current capacity in bits (1 greater than the index of the last bit) + + + Returns the current capacity of this set. Included for + compatibility. This is *not* equal to {@link #cardinality} + + + + Returns true if there are no set bits + + + Expert: returns the long[] storing the bits + + + Expert: sets a new long[] to use as the bit storage + + + Expert: gets the number of longs in the array that are in use + + + Expert: sets the number of longs in the array that are in use + + + Returns true or false for the specified bit index. + + + Returns true or false for the specified bit index. + The index should be less than the OpenBitSet size + + + + Returns true or false for the specified bit index + + + Returns true or false for the specified bit index. + The index should be less than the OpenBitSet size. + + + + returns 1 if the bit is set, 0 if not. + The index should be less than the OpenBitSet size + + + + sets a bit, expanding the set size if necessary + + + Sets the bit at the specified index. + The index should be less than the OpenBitSet size. + + + + Sets the bit at the specified index. + The index should be less than the OpenBitSet size. + + + + Sets a range of bits, expanding the set size if necessary + + + lower index + + one-past the last bit to set + + + + clears a bit. + The index should be less than the OpenBitSet size. + + + + clears a bit. + The index should be less than the OpenBitSet size. + + + + clears a bit, allowing access beyond the current set size without changing the size. + + + Clears a range of bits. Clearing past the end does not change the size of the set. + + + lower index + + one-past the last bit to clear + + + + Clears a range of bits. Clearing past the end does not change the size of the set. + + + lower index + + one-past the last bit to clear + + + + Sets a bit and returns the previous value. + The index should be less than the OpenBitSet size. + + + + Sets a bit and returns the previous value. + The index should be less than the OpenBitSet size. + + + + flips a bit. + The index should be less than the OpenBitSet size. + + + + flips a bit. + The index should be less than the OpenBitSet size. + + + + flips a bit, expanding the set size if necessary + + + flips a bit and returns the resulting bit value. + The index should be less than the OpenBitSet size. + + + + flips a bit and returns the resulting bit value. + The index should be less than the OpenBitSet size. + + + + Flips a range of bits, expanding the set size if necessary + + + lower index + + one-past the last bit to flip + + + + the number of set bits + + + + Returns the popcount or cardinality of the intersection of the two sets. + Neither set is modified. + + + + Returns the popcount or cardinality of the union of the two sets. + Neither set is modified. + + + + Returns the popcount or cardinality of "a and not b" + or "intersection(a, not(b))". + Neither set is modified. + + + + Returns the popcount or cardinality of the exclusive-or of the two sets. + Neither set is modified. + + + + Returns the index of the first set bit starting at the index specified. + -1 is returned if there are no more set bits. + + + + Returns the index of the first set bit starting at the index specified. + -1 is returned if there are no more set bits. + + + + this = this AND other + + + this = this OR other + + + Remove all elements set in other. 
this = this AND_NOT other + + + this = this XOR other + + + returns true if the sets have any elements in common + + + Expand the long[] with the size given as a number of words (64 bit longs). + getNumWords() is unchanged by this call. + + + + Ensure that the long[] is big enough to hold numBits, expanding it if necessary. + getNumWords() is unchanged by this call. + + + + Lowers numWords, the number of words in use, + by checking for trailing zero words. + + + + returns the number of 64 bit words it would take to hold numBits + + + returns true if both sets have the same bits set + + + Construct an OpenBitSetDISI with its bits set + from the doc ids of the given DocIdSetIterator. + Also give a maximum size one larger than the largest doc id for which a + bit may ever be set on this OpenBitSetDISI. + + + + Construct an OpenBitSetDISI with no bits set, and a given maximum size + one larger than the largest doc id for which a bit may ever be set + on this OpenBitSetDISI. + + + + Perform an inplace OR with the doc ids from a given DocIdSetIterator, + setting the bit for each such doc id. + These doc ids should be smaller than the maximum size passed to the + constructor. + + + + Perform an inplace AND with the doc ids from a given DocIdSetIterator, + leaving only the bits set for which the doc ids are in common. + These doc ids should be smaller than the maximum size passed to the + constructor. + + + + Perform an inplace NOT with the doc ids from a given DocIdSetIterator, + clearing all the bits for each such doc id. + These doc ids should be smaller than the maximum size passed to the + constructor. + + + + Perform an inplace XOR with the doc ids from a given DocIdSetIterator, + flipping all the bits for each such doc id. + These doc ids should be smaller than the maximum size passed to the + constructor. + + + + Simple DocIdSet and DocIdSetIterator backed by a BitSet + + + This DocIdSet implementation is cacheable. + + + Returns the underlying BitSet. + + + use {@link #DocID()} instead. + + + + use {@link #NextDoc()} instead. + + + + use {@link #Advance(int)} instead. + + + + Methods for manipulating arrays. + + + Parses the string argument as if it was an int value and returns the + result. Throws NumberFormatException if the string does not represent an + int quantity. + + + a string representation of an int quantity. + + int the value represented by the argument + + NumberFormatException if the argument could not be parsed as an int quantity. + + + Parses a char array into an int. + the character array + + The offset into the array + + The length + + the int + + NumberFormatException if it can't parse + + + Parses the string argument as if it was an int value and returns the + result. Throws NumberFormatException if the string does not represent an + int quantity. The second argument specifies the radix to use when parsing + the value. + + + a string representation of an int quantity. + + the base to use for conversion. + + int the value represented by the argument + + NumberFormatException if the argument could not be parsed as an int quantity. + + + Returns hash of chars in range start (inclusive) to + end (inclusive) + + + + Returns hash of chars in range start (inclusive) to + end (inclusive) + + + +

Implements {@link LockFactory} using native OS file + locks. Note that because this LockFactory relies on + java.nio.* APIs for locking, any problems with those APIs + will cause locking to fail. Specifically, on certain NFS + environments the java.nio.* locks will fail (the lock can + incorrectly be double acquired) whereas {@link + SimpleFSLockFactory} worked perfectly in those same + environments. For NFS based access to an index, it's + recommended that you try {@link SimpleFSLockFactory} + first and work around the one limitation that a lock file + could be left when the JVM exits abnormally.

+ +

The primary benefit of {@link NativeFSLockFactory} is + that lock files will be properly removed (by the OS) if + the JVM has an abnormal exit.

+ +

Note that, unlike {@link SimpleFSLockFactory}, the existence of + leftover lock files in the filesystem on exiting the JVM + is fine because the OS will free the locks held against + these files even though the files still remain.

+ +

If you suspect that this or any other LockFactory is + not working properly in your environment, you can easily + test it by using {@link VerifyingLockFactory}, {@link + LockVerifyServer} and {@link LockStressTest}.

+ +

+ + +
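+ A minimal wiring sketch in the Java style of the other examples; the index path is an arbitrary assumption:
+            // assumes java.io.File and org.apache.lucene.store.*
+            NativeFSLockFactory lockFactory = new NativeFSLockFactory();
+            Directory dir = FSDirectory.open(new File("/path/to/index"), lockFactory);
+            // the factory's lock directory is then set to the index directory itself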
+ + Base class for file system based locking implementation. + + +

Base class for Locking implementation. {@link Directory} uses + instances of this class to implement locking.

+ +

Note that there are some useful tools to verify that + your LockFactory is working correctly: {@link + VerifyingLockFactory}, {@link LockStressTest}, {@link + LockVerifyServer}.

+ +

+ + + + + + +
+ + Set the prefix in use for all locks created in this + LockFactory. This is normally called once, when a + Directory gets this LockFactory instance. However, you + can also call this (after this instance is assigned to + a Directory) to override the prefix in use. This + is helpful if you're running Lucene on machines that + have different mount points for the same shared + directory. + + + + Get the prefix in use for all locks created in this LockFactory. + + + Return a new Lock instance identified by lockName. + name of the lock to be created. + + + + Attempt to clear (forcefully unlock and remove) the + specified lock. Only call this at a time when you are + certain this lock is no longer in use. + + name of the lock to be cleared. + + + + Directory for the lock files. + + + Set the lock directory. This method can be only called + once to initialize the lock directory. It is used by {@link FSDirectory} + to set the lock directory to itsself. + Subclasses can also use this method to set the directory + in the constructor. + + + + Retrieve the lock directory. + + + Create a NativeFSLockFactory instance, with null (unset) + lock directory. When you pass this factory to a {@link FSDirectory} + subclass, the lock directory is automatically set to the + directory itsself. Be sure to create one instance for each directory + your create! + + + + Create a NativeFSLockFactory instance, storing lock + files into the specified lockDirName: + + + where lock files are created. + + + + Create a NativeFSLockFactory instance, storing lock + files into the specified lockDir: + + + where lock files are created. + + + + Create a NativeFSLockFactory instance, storing lock + files into the specified lockDir: + + + where lock files are created. + + + + An interprocess mutex lock. +

Typical use might look like:

+            new Lock.With(directory.makeLock("my.lock")) {
+                public Object doBody() {
+                    ... code to execute while locked ...
+                }
+            }.run();
+            
+ + +
+ $Id: Lock.java 769409 2009-04-28 14:05:43Z mikemccand $ + + + +
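+ Where the With helper above is not convenient, the same effect can be had manually; a minimal sketch in the Java style of the other examples, using the same assumed 'directory' variable:
+            Lock lock = directory.makeLock("my.lock");
+            if (lock.obtain()) {               // immediate attempt; see Obtain(long) to poll
+                try {
+                    // ... code to execute while locked ...
+                } finally {
+                    lock.release();
+                }
+            }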
+ + Pass this value to {@link #Obtain(long)} to try + forever to obtain the lock. + + + + How long {@link #Obtain(long)} waits, in milliseconds, + in between attempts to acquire the lock. + + + + Attempts to obtain exclusive access and immediately return + upon success or failure. + + true iff exclusive access is obtained + + + + If a lock obtain called, this failureReason may be set + with the "root cause" Exception as to why the lock was + not obtained. + + + + Attempts to obtain an exclusive lock within amount of + time given. Polls once per {@link #LOCK_POLL_INTERVAL} + (currently 1000) milliseconds until lockWaitTimeout is + passed. + + length of time to wait in + milliseconds or {@link + #LOCK_OBTAIN_WAIT_FOREVER} to retry forever + + true if lock was obtained + + LockObtainFailedException if lock wait times out + IllegalArgumentException if lockWaitTimeout is + out of bounds + + IOException if obtain() throws IOException + + + Releases exclusive access. + + + Returns true if the resource is currently locked. Note that one must + still call {@link #Obtain()} before using the resource. + + + + Utility class for executing code with exclusive access. + + + Constructs an executor that will grab the named lock. + + + Code to execute with exclusive access. + + + Calls {@link #doBody} while lock is obtained. Blocks if lock + cannot be obtained immediately. Retries to obtain lock once per second + until it is obtained, or until it has tried ten times. Lock is released when + {@link #doBody} exits. + + LockObtainFailedException if lock could not + be obtained + + IOException if {@link Lock#obtain} throws IOException + + + Abstract base class for input from a file in a {@link Directory}. A + random-access input stream. Used for all Lucene index input operations. + + + + + + Reads and returns a single byte. + + + + + Reads a specified number of bytes into an array at the specified offset. + the array to read bytes into + + the offset in the array to start storing bytes + + the number of bytes to read + + + + + + Reads a specified number of bytes into an array at the + specified offset with control over whether the read + should be buffered (callers who have their own buffer + should pass in "false" for useBuffer). Currently only + {@link BufferedIndexInput} respects this parameter. + + the array to read bytes into + + the offset in the array to start storing bytes + + the number of bytes to read + + set to false if the caller will handle + buffering. + + + + + + Reads four bytes and returns an int. + + + + + Reads an int stored in variable-length format. Reads between one and + five bytes. Smaller values take fewer bytes. Negative numbers are not + supported. + + + + + + Reads eight bytes and returns a long. + + + + + Reads a long stored in variable-length format. Reads between one and + nine bytes. Smaller values take fewer bytes. Negative numbers are not + supported. + + + + Call this if readString should read characters stored + in the old modified UTF8 format (length in java chars + and java's modified UTF8 encoding). This is used for + indices written pre-2.4 See LUCENE-510 for details. + + + + Reads a string. + + + + + Reads Lucene's old "modified UTF-8" encoded + characters into an array. 
+ + the array to read characters into + + the offset in the array to start storing characters + + the number of characters to read + + + + -- please use readString or readBytes + instead, and construct the string + from those utf8 bytes + + + + Expert + + Similar to {@link #ReadChars(char[], int, int)} but does not do any conversion operations on the bytes it is reading in. It still + has to invoke {@link #ReadByte()} just as {@link #ReadChars(char[], int, int)} does, but it does not need a buffer to store anything + and it does not have to do any of the bitwise operations, since we don't actually care what is in the byte except to determine + how many more bytes to read + + The number of chars to read + + this method operates on old "modified utf8" encoded + strings + + + + Closes the stream to futher operations. + + + Returns the current position in this file, where the next read will + occur. + + + + + + Sets current position in this file, where the next read will occur. + + + + + The number of bytes in the file. + + + Returns a clone of this stream. + +

Clones of a stream access the same data, and are positioned at the same + point as the stream they were cloned from. + +

Expert: Subclasses must ensure that clones may be positioned at + different points in the input from each other and from the stream they + were cloned from. +

+
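+ The variable-length int format mentioned for ReadVInt above stores seven bits per byte, low-order bytes first, with the high bit of each byte flagging a continuation; a minimal decoding sketch in the Java style of the other examples (the helper name is illustrative):
+            int readVIntSketch(IndexInput in) throws IOException {
+                byte b = in.readByte();
+                int value = b & 0x7F;                      // low seven bits
+                for (int shift = 7; (b & 0x80) != 0; shift += 7) {
+                    b = in.readByte();
+                    value |= (b & 0x7F) << shift;          // next seven bits
+                }
+                return value;
+            }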
+ + A {@link HitCollector} implementation that collects the top-scoring + documents, returning them as a {@link TopDocs}. This is used by {@link + IndexSearcher} to implement {@link TopDocs}-based search. + +

This may be extended, overriding the collect method to, e.g., + conditionally invoke super() in order to filter which + documents are collected. + +

+ Please use {@link TopScoreDocCollector} + instead, which has better performance. + + +
+ + Lower-level search API.
+ HitCollectors are primarily meant to be used to implement queries, sorting + and filtering. See {@link Collector} for a lower level and higher performance + (on a multi-segment index) API. + +
+ + + $Id: HitCollector.java 764551 2009-04-13 18:33:56Z mikemccand $ + + Please use {@link Collector} instead. + +
+ + Called once for every document matching a query, with the document + number and its raw score. + +

If, for example, an application wished to collect all of the hits for a + query in a BitSet, then it might:

+            Searcher searcher = new IndexSearcher(indexReader);
+            final BitSet bits = new BitSet(indexReader.maxDoc());
+            searcher.search(query, new HitCollector() {
+                public void collect(int doc, float score) {
+                    bits.set(doc);
+                }
+            });
+            
+ +

Note: This is called in an inner search loop. For good search + performance, implementations of this method should not call + {@link Searcher#Doc(int)} or + {@link Lucene.Net.Index.IndexReader#Document(int)} on every + document number encountered. Doing so can slow searches by an order + of magnitude or more. +

Note: The score passed to this method is a raw score. + In other words, the score will not necessarily be a float whose value is + between 0 and 1. +

+
+ + The total number of hits the collector encountered. + + + The priority queue which holds the top-scoring documents. + + + Construct to collect a given number of hits. + the maximum number of hits to collect + + + + use TopDocCollector(hq) instead. numHits is not used by this + constructor. It will be removed in a future release. + + + + Constructor to collect the top-scoring documents by using the given PQ. + the PQ to use by this instance. + + + + The total number of documents that matched this query. + + + The top-scoring hits. + + + A Spans that is formed from the ordered subspans of a SpanNearQuery + where the subspans do not overlap and have a maximum slop between them. +

+ The formed spans contain only minimum-slop matches.
+ The matching slop is computed from the distance(s) between + the non-overlapping matching Spans.
+ Successive matches are always formed from the successive Spans + of the SpanNearQuery. +

+ The formed spans may contain overlaps when the slop is at least 1. + For example, when querying using +

t1 t2 t3
+ with slop at least 1, the fragment: +
t1 t2 t1 t3 t2 t3
+ matches twice: +
t1 t2 .. t3      
+
      t1 .. t2 t3
+ + + Expert: + Only public for subclassing. Most implementations should not need this class +
+
+ + Expert: an enumeration of span matches. Used to implement span searching. + Each span represents a range of term positions within a document. Matches + are enumerated in order, by increasing document number, within that by + increasing start position and finally by increasing end position. + + + + Move to the next match, returning true iff any such exists. + + + Skips to the first match beyond the current, whose document number is + greater than or equal to target.

Returns true iff there is such + a match.

Behaves as if written:

+            boolean skipTo(int target) {
+                do {
+                    if (!next())
+                        return false;
+                } while (target > doc());
+                return true;
+            }
+            
+ Most implementations are considerably more efficient than that. +
+
+ + Returns the document number of the current match. Initially invalid. + + + Returns the start position of the current match. Initially invalid. + + + Returns the end position of the current match. Initially invalid. + + + Returns the payload data for the current span. + This is invalid until {@link #Next()} is called for + the first time. + This method must not be called more than once after each call + of {@link #Next()}. However, most payloads are loaded lazily, + so if the payload data for the current position is not needed, + this method may not be called at all for performance reasons. An ordered + SpanQuery does not lazy load, so if you have payloads in your index and + you do not want ordered SpanNearQuerys to collect payloads, you can + disable collection with a constructor option.
+ + Note that the return type is a collection, thus the ordering should not be relied upon. +
+

+ WARNING: The status of the Payloads feature is experimental. + The APIs introduced here might change in the future and will not be + supported anymore in such a case.

+ +

+ a List of byte arrays containing the data of this payload, otherwise null if isPayloadAvailable is false + + java.io.IOException +
+ + Checks if a payload can be loaded at this position. +

+ Payloads can only be loaded once per call to + {@link #Next()}. + +

+ true if there is a payload available at this position that can be loaded + +
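+ For illustration, a hedged sketch of consuming a Spans enumeration
+ (the query, reader, and output handling are assumptions); the payload
+ is requested at most once per call to next():
+ 
+            Spans spans = query.getSpans(reader);
+            while (spans.next()) {
+            System.out.println(spans.doc() + ": [" + spans.start() + ", " + spans.end() + ")");
+            if (spans.isPayloadAvailable()) {
+            Collection payload = spans.getPayload();  // byte[] entries; ordering not guaranteed
+            }
+            }
+            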
+ + The spans in the same order as the SpanNearQuery + + + Indicates that all subSpans have same doc() + + + Advances the subSpans to just after an ordered match with a minimum slop + that is smaller than the slop allowed by the SpanNearQuery. + + true iff there is such a match. + + + + Advance the subSpans to the same document + + + Check whether two Spans in the same document are ordered. + + + + + true iff spans1 starts before spans2 + or the spans start at the same position, + and spans1 ends before spans2. + + + + Like {@link #DocSpansOrdered(Spans,Spans)}, but use the spans + starts and ends as parameters. + + + + Order the subSpans within the same document by advancing all later spans + after the previous one. + + + + The subSpans are ordered in the same doc, so there is a possible match. + Compute the slop while making the match as short as possible by advancing + all subSpans except the last one in reverse order. + + + +

Wrapper to allow {@link SpanQuery} objects to participate in composite + single-field SpanQueries by 'lying' about their search field. That is, + the masked SpanQuery will function as normal, + but {@link SpanQuery#GetField()} simply hands back the value supplied + in this class's constructor.

+ +

This can be used to support Queries like {@link SpanNearQuery} or + {@link SpanOrQuery} across different fields, which is not ordinarily + permitted.

+ +

This can be useful for denormalized relational data: for example, when + indexing a document with conceptually many 'children':

+ +

+            teacherid: 1
+            studentfirstname: james
+            studentsurname: jones
+            
+            teacherid: 2
+            studentfirstname: james
+            studentsurname: smith
+            studentfirstname: sally
+            studentsurname: jones
+            
+ +

a SpanNearQuery with a slop of 0 can be applied across two + {@link SpanTermQuery} objects as follows: +

+            SpanQuery q1  = new SpanTermQuery(new Term("studentfirstname", "james"));
+            SpanQuery q2  = new SpanTermQuery(new Term("studentsurname", "jones"));
+            SpanQuery q2m = new FieldMaskingSpanQuery(q2, "studentfirstname");
+            Query q = new SpanNearQuery(new SpanQuery[]{q1, q2m}, -1, false);
+            
+ to search for 'studentfirstname:james studentsurname:jones' and find + teacherid 1 without matching teacherid 2 (which has a 'james' in position 0 + and 'jones' in position 1).

+ +

Note: as {@link #GetField()} returns the masked field, scoring will be + done using the norms of the field name supplied. This may lead to unexpected + scoring behaviour.

+

+
+ + use {@link #ExtractTerms(Set)} instead. + + + + Expert: returns a comparator for sorting ScoreDocs. + +

+ Created: Apr 21, 2004 3:49:28 PM + + This class will be used as part of a key to a FieldCache value. You must + implement hashCode and equals to avoid an explosion in RAM usage if you use + instances that are not the same instance. If you are searching using the + Remote contrib, the same instance of this class on the client will be a new + instance on every call to the server, so hashCode/equals is very important in + that situation. + +

+ $Id: SortComparatorSource.java 747019 2009-02-23 13:59:50Z + mikemccand $ + + 1.4 + + Please use {@link FieldComparatorSource} instead. + +
+ + Creates a comparator for the field in the given index. + Index to create comparator for. + + Name of the field to create comparator for. + + Comparator of ScoreDoc objects. + + IOException If an error occurs reading the index. + + + Expert: Returned by low-level search implementations. + + + + + Expert: The score of this document for the query. + + + Expert: A hit document's number. + + + + + Expert: Constructs a ScoreDoc. + + + Constrains search results to only match those which also match a provided + query. Results are cached, so that searches after the first on the same + index using this filter are much faster. + + + $Id: QueryFilter.java 528298 2007-04-13 00:59:28Z hossman $ + + use a CachingWrapperFilter with QueryWrapperFilter + + + + Wraps another filter's result and caches it. The purpose is to allow + filters to simply filter, and then wrap with this class to add caching. + + + + A transient Filter cache. + + + Filter to cache results of + + + + Use {@link #GetDocIdSet(IndexReader)} instead. + + + + Provide the DocIdSet to be cached, using the DocIdSet provided + by the wrapped Filter. + This implementation returns the given DocIdSet. + + + + Constructs a filter which only matches documents matching + query. + + + + A {@link Filter} that only accepts numeric values within + a specified range. To use this, you must first index the + numeric values using {@link NumericField} (expert: {@link + NumericTokenStream}). + +

You create a new NumericRangeFilter with the static + factory methods, eg: + +

+            Filter f = NumericRangeFilter.newFloatRange("weight",
+            new Float(0.03f), new Float(0.10f),
+            true, true);
+            
+ + accepts all documents whose float valued "weight" field + ranges from 0.03 to 0.10, inclusive. + See {@link NumericRangeQuery} for details on how Lucene + indexes and searches numeric valued fields. +
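+ A half-open range is expressed by passing null for the missing bound;
+ a hedged sketch (the "timestamp" field name and value are only an
+ illustration):
+ 
+            Filter upTo = NumericRangeFilter.newLongRange("timestamp",
+            null, new Long(1262304000000L), false, true);
+            
+ accepts all documents whose long valued "timestamp" field is less than
+ or equal to the given upper bound, with no lower bound at all.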

NOTE: This API is experimental and + might change in incompatible ways in the next + release. + +

+ 2.9 + + +
+ + Factory that creates a NumericRangeFilter, that filters a long + range using the given precisionStep. + You can have half-open ranges (which are in fact </≤ or >/≥ queries) + by setting the min or max value to null. By setting inclusive to false, it will + match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. + + + + Factory that creates a NumericRangeFilter, that queries a long + range using the default precisionStep {@link NumericUtils#PRECISION_STEP_DEFAULT} (4). + You can have half-open ranges (which are in fact </≤ or >/≥ queries) + by setting the min or max value to null. By setting inclusive to false, it will + match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. + + + + Factory that creates a NumericRangeFilter, that filters a int + range using the given precisionStep. + You can have half-open ranges (which are in fact </≤ or >/≥ queries) + by setting the min or max value to null. By setting inclusive to false, it will + match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. + + + + Factory that creates a NumericRangeFilter, that queries a int + range using the default precisionStep {@link NumericUtils#PRECISION_STEP_DEFAULT} (4). + You can have half-open ranges (which are in fact </≤ or >/≥ queries) + by setting the min or max value to null. By setting inclusive to false, it will + match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. + + + + Factory that creates a NumericRangeFilter, that filters a double + range using the given precisionStep. + You can have half-open ranges (which are in fact </≤ or >/≥ queries) + by setting the min or max value to null. By setting inclusive to false, it will + match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. + + + + Factory that creates a NumericRangeFilter, that queries a double + range using the default precisionStep {@link NumericUtils#PRECISION_STEP_DEFAULT} (4). + You can have half-open ranges (which are in fact </≤ or >/≥ queries) + by setting the min or max value to null. By setting inclusive to false, it will + match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. + + + + Factory that creates a NumericRangeFilter, that filters a float + range using the given precisionStep. + You can have half-open ranges (which are in fact </≤ or >/≥ queries) + by setting the min or max value to null. By setting inclusive to false, it will + match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. + + + + Factory that creates a NumericRangeFilter, that queries a float + range using the default precisionStep {@link NumericUtils#PRECISION_STEP_DEFAULT} (4). + You can have half-open ranges (which are in fact </≤ or >/≥ queries) + by setting the min or max value to null. By setting inclusive to false, it will + match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. + + + + Returns the field name for this filter + + + Returns true if the lower endpoint is inclusive + + + Returns true if the upper endpoint is inclusive + + + Returns the lower value of this range filter + + + Returns the upper value of this range filter + Implements search over a single IndexReader. @@ -2183,309 +5980,255 @@ If true, then the max score for all matching docs is computed. 
- - Expert: obtains short field values from the - {@link Lucene.Net.Search.FieldCache FieldCache} - using getShorts() and makes those values - available as other numeric types, casting as needed. - -

- WARNING: The status of the Search.Function package is experimental. - The APIs introduced here might change in the future and will not be - supported anymore in such a case. + +

Wrapper for ({@link HitCollector}) implementations, which simply re-bases the + incoming docID before calling {@link HitCollector#collect}. - for requirements - on the field. - -

NOTE: with the switch in 2.9 to segment-based - searching, if {@link #getValues} is invoked with a - composite (multi-segment) reader, this can easily cause - double RAM usage for the values in the FieldCache. It's - best to switch your application to pass only atomic - (single segment) readers to this API. Alternatively, for - a short-term fix, you could wrap your ValueSource using - {@link MultiValueSource}, which costs more CPU per lookup - but will not consume double the FieldCache RAM.

- - - -

Expert: A base class for ValueSource implementations that retrieve values for - a single field from the {@link Lucene.Net.Search.FieldCache FieldCache}. -

- Fields used herein must be indexed (doesn't matter if these fields are stored or not). -

- It is assumed that each such indexed field is untokenized, or at least has a single token in a document. - For documents with multiple tokens of the same field, behavior is undefined (It is likely that current - code would use the value of one of these tokens, but this is not guaranteed). -

- Documents with no tokens in this field are assigned the Zero value. -

- WARNING: The status of the Search.Function package is experimental. - The APIs introduced here might change in the future and will not be - supported anymore in such a case. - -

NOTE: with the switch in 2.9 to segment-based - searching, if {@link #getValues} is invoked with a - composite (multi-segment) reader, this can easily cause - double RAM usage for the values in the FieldCache. It's - best to switch your application to pass only atomic - (single segment) readers to this API. Alternatively, for - a short-term fix, you could wrap your ValueSource using - {@link MultiValueSource}, which costs more CPU per lookup - but will not consume double the FieldCache RAM.

-

- - - Expert: source of values for basic function queries. -

At its default/simplest form, values - one per doc - are used as the score of that doc. -

Values are instantiated as - {@link Lucene.Net.Search.Function.DocValues DocValues} for a particular reader. -

ValueSource implementations differ in RAM requirements: it would always be a factor - of the number of documents, but for each document the number of bytes can be 1, 2, 4, or 8. - -

- WARNING: The status of the Search.Function package is experimental. - The APIs introduced here might change in the future and will not be - supported anymore in such a case. - - -

-
- - Return the DocValues used by the function query. - the IndexReader used to read these values. - If any caching is involved, that caching would also be IndexReader based. - - IOException for any error. - - - description of field, used in explain() - - - Needed for possible caching of query results - used by {@link ValueSourceQuery#equals(Object)}. - - - - - Needed for possible caching of query results - used by {@link ValueSourceQuery#hashCode()}. - - - - - Create a cached field source for the input field. - - - Return cached DocValues for input field and reader. - FieldCache so that values of a field are loaded once per reader (RAM allowing) - - Field for which values are required. - - - - - - Check if equals to another {@link FieldCacheSource}, already knowing that cache and field are equal. - - - - - Return a hash code of a {@link FieldCacheSource}, without the hash-codes of the field - and the cache (those are taken care of elsewhere). - - - - - - Create a cached short field source with default string-to-short parser. - - - Create a cached short field source with a specific string-to-short parser. - - - Expert: represents field values as different types. - Normally created via a - {@link Lucene.Net.Search.Function.ValueSource ValueSuorce} - for a particular field and reader. - -

- WARNING: The status of the Search.Function package is experimental. - The APIs introduced here might change in the future and will not be - supported anymore in such a case. - - -

-
- - Return doc value as a float. -

Mandatory: every DocValues implementation must implement at least this method. -

- document whose float value is requested. - -
- - Return doc value as an int. -

Optional: DocValues implementation can (but don't have to) override this method. -

- document whose int value is requested. - -
- - Return doc value as a long. -

Optional: DocValues implementation can (but don't have to) override this method. -

- document whose long value is requested. - -
- - Return doc value as a double. -

Optional: DocValues implementation can (but don't have to) override this method. -

- document whose double value is requested. - -
- - Return doc value as a string. -

Optional: DocValues implementation can (but don't have to) override this method. -

- document whose string value is requested. - -
- - Return a string representation of a doc value, as required for Explanations. - - Explain the scoring value for the input doc. - - Expert: for test purposes only, return the inner array of values, or null if not applicable.

- Allows tests to verify that loaded values are: -

    -
  1. indeed cached/reused.
  2. -
  3. stored in the expected size/type (byte/short/int/float).
  4. -
- Note: implementations of DocValues must override this method for - these test elements to be tested, Otherwise the test would not fail, just - print a warning. -
-
- - Returns the minimum of all values or Float.NaN if this - DocValues instance does not contain any value. -

- This operation is optional -

- -

- the minimum of all values or Float.NaN if this - DocValues instance does not contain any value. - -
- - Returns the maximum of all values or Float.NaN if this - DocValues instance does not contain any value. -

- This operation is optional -

- -

- the maximum of all values or Float.NaN if this - DocValues instance does not contain any value. - -
- - Returns the average of all values or Float.NaN if this - DocValues instance does not contain any value. * -

- This operation is optional -

- -

- the average of all values or Float.NaN if this - DocValues instance does not contain any value - -
- - This class wraps another ValueSource, but protects - against accidental double RAM usage in FieldCache when - a composite reader is passed to {@link #getValues}. - -

NOTE: this class adds a CPU penalty to every - lookup, as it must resolve the incoming document to the - right sub-reader using a binary search.

- -

- This class is temporary, to ease the - migration to segment-based searching. Please change your - code to not pass composite readers to these APIs. + Please migrate custom HitCollectors to the new {@link Collector} + class. This class will be removed when {@link HitCollector} is + removed.
- + + Expert: obtains the ordinal of the field value from the default Lucene + {@link Lucene.Net.Search.FieldCache FieldCache} using getStringIndex() + and reverses the order. +

+ The native lucene index order is used to assign an ordinal value for each field value. +

+ Field values (terms) are lexicographically ordered by unicode value, and numbered starting at 1. +
+ Example of reverse ordinal (rord): +
If there were only three field values: "apple","banana","pear" +
then rord("apple")=3, rord("banana")=2, ord("pear")=1 +

+ WARNING: + rord() depends on the position in an index and can thus change + when other documents are inserted or deleted, + or if a MultiSearcher is used. + +

+ WARNING: The status of the Search.Function package is experimental. + The APIs introduced here might change in the future and will not be + supported anymore in such a case. + +

NOTE: with the switch in 2.9 to segment-based + searching, if {@link #getValues} is invoked with a + composite (multi-segment) reader, this can easily cause + double RAM usage for the values in the FieldCache. It's + best to switch your application to pass only atomic + (single segment) readers to this API. Alternatively, for + a short-term fix, you could wrap your ValueSource using + {@link MultiValueSource}, which costs more CPU per lookup + but will not consume double the FieldCache RAM.

+

+
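+ A hedged usage sketch (the field name "category" and the surrounding
+ query are assumptions): reverse ordinals are normally consumed through
+ a ValueSourceQuery, so that documents whose term sorts first receive
+ the highest value, as in the rord() example above:
+ 
+            ValueSource source = new ReverseOrdFieldSource("category");
+            Query q = new ValueSourceQuery(source);
+            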
+ + Contructor for a certain field. + field whose values reverse order is used. + + + Expert: A hit queue for sorting by hits by terms in more than one field. - Uses FieldCache.DEFAULT for maintaining - internal term lookup tables. + Uses FieldCache.DEFAULT for maintaining internal term lookup tables. - This class will not resolve SortField.AUTO types, and expects the type - of all SortFields used for construction to already have been resolved. - {@link SortField#DetectFieldType(IndexReader, String)} is a utility method which - may be used for field type detection. - - NOTE: This API is experimental and might change in - incompatible ways in the next release. +

Created: Dec 8, 2003 12:56:03 PM

- 2.9 + lucene 1.4 - $Id: + $Id: FieldSortedHitQueue.java 803676 2009-08-12 19:31:38Z hossman $ + see {@link FieldValueHitQueue} +
- - Creates a hit queue sorted by the given list of fields. + + A PriorityQueue maintains a partial ordering of its elements such that the + least element can always be found in constant time. Put()'s and pop()'s + require log(size) time. -

NOTE: The instances returned by this method - pre-allocate a full array of length numHits. +

NOTE: This class pre-allocates a full array of + length maxSize+1, in {@link #initialize}.

- SortField array we are sorting by in priority order (highest - priority first); cannot be null or empty +
+ + Determines the ordering of objects in this priority queue. Subclasses + must define this one method. + + + + This method can be overridden by extending classes to return a sentinel + object which will be used by {@link #Initialize(int)} to fill the queue, so + that the code which uses that queue can always assume it's full and only + change the top without attempting to insert any new object.
+ + Those sentinel values should always compare worse than any non-sentinel + value (i.e., {@link #LessThan(Object, Object)} should always favor the + non-sentinel values).
+ + By default, this method returns null, which means the queue will not be + filled with sentinel values. Otherwise, the value returned will be used to + pre-populate the queue with sentinel values.
+ + If this method is extended to return a non-null value, then the following + usage pattern is recommended: + +
+            // extends getSentinelObject() to return a non-null value.
+            PriorityQueue pq = new MyQueue(numHits);
+            // save the 'top' element, which is guaranteed to not be null.
+            MyObject pqTop = (MyObject) pq.top();
+            <...>
+            // now in order to add a new element, which is 'better' than top (after 
+            // you've verified it is better), it is as simple as:
+            pqTop.change();
+            pqTop = pq.updateTop();
+            
+ + NOTE: if this method returns a non-null value, it will be called by + {@link #Initialize(int)} {@link #Size()} times, relying on a new object to + be returned and will not check if it's null again. Therefore you should + ensure any call to this method creates a new instance and behaves + consistently, e.g., it cannot return null if it previously returned + non-null. + +
+ the sentinel object to use to pre-populate the queue, or null if + sentinel objects are not supported. + +
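+ For reference, a minimal subclass (a hedged Java-style sketch; the
+ element type and comparison used here are assumptions) only calls
+ initialize from its constructor and defines lessThan:
+ 
+            class FloatQueue extends PriorityQueue {
+            FloatQueue(int maxSize) {
+            initialize(maxSize);
+            }
+            protected boolean lessThan(Object a, Object b) {
+            return ((Float) a).floatValue() < ((Float) b).floatValue();
+            }
+            }
+            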
+ + Subclass constructors must call this. + + + Adds an Object to a PriorityQueue in log(size) time. If one tries to add + more objects than maxSize from initialize a RuntimeException + (ArrayIndexOutOfBound) is thrown. + + + use {@link #Add(Object)} which returns the new top object, + saving an additional call to {@link #Top()}. + + + + Adds an Object to a PriorityQueue in log(size) time. If one tries to add + more objects than maxSize from initialize an + {@link ArrayIndexOutOfBoundsException} is thrown. + + + the new 'top' element in the queue. + + + + Adds element to the PriorityQueue in log(size) time if either the + PriorityQueue is not full, or not lessThan(element, top()). + + + - The number of hits to retain. Must be greater than zero. + true if element is added, false otherwise. + + use {@link #InsertWithOverflow(Object)} instead, which + encourages objects reuse. + + + + insertWithOverflow() is the same as insert() except its + return value: it returns the object (if any) that was + dropped off the heap because it was full. This can be + the given parameter (in case it is smaller than the + full heap's minimum, and couldn't be added), or another + object that was previously the smallest value in the + heap and now has been replaced by a larger one, or null + if the queue wasn't yet full with maxSize elements. + + + + Returns the least element of the PriorityQueue in constant time. + + + Removes and returns the least element of the PriorityQueue in log(size) + time. + + + + Should be called when the Object at top changes values. Still log(n) worst + case, but it's at least twice as fast to + +
+            pq.top().change();
+            pq.adjustTop();
+            
+ + instead of + +
+            o = pq.pop();
+            o.change();
+            pq.push(o);
+            
+ +
+ use {@link #UpdateTop()} which returns the new top element and + saves an additional call to {@link #Top()}. + +
+ + Should be called when the Object at top changes values. Still log(n) worst + case, but it's at least twice as fast to + +
+            pq.top().change();
+            pq.updateTop();
+            
+ + instead of + +
+            o = pq.pop();
+            o.change();
+            pq.push(o);
+            
+ +
+ the new 'top' element. + +
+ + Returns the number of elements currently stored in the PriorityQueue. + + + Removes all entries from the PriorityQueue. + + + Creates a hit queue sorted by the given list of fields. + Index to use. + + Fieldable names, in priority order (highest priority first). Cannot be null or empty. + + The number of hits to retain. Must be greater than zero. IOException - + + Stores a comparator corresponding to each field being sorted by + + Stores the sort criteria being used. - - Given a queue Entry, creates a corresponding FieldDoc - that contains the values used to sort the given document. - These values are not the raw values out of the index, but the internal - representation of them. This is so the given search hit can be collated by - a MultiSearcher with other search hits. - - - The Entry used to create a FieldDoc - - The newly created FieldDoc - - - + + Stores the maximum score value encountered, needed for normalizing. - - Returns the SortFields being used by this hit queue. + + returns the maximum score encountered by elements inserted via insert() - - An implementation of {@link FieldValueHitQueue} which is optimized in case - there is just one comparator. - - - + Returns whether a is less relevant than b. ScoreDoc @@ -2494,173 +6237,275 @@ true if document a should be sorted after document b. - - An implementation of {@link FieldValueHitQueue} which is optimized in case - there is more than one comparator. + + Given a FieldDoc object, stores the values used + to sort the given document. These values are not the raw + values out of the index, but the internal representation + of them. This is so the given search hit can be collated + by a MultiSearcher with other search hits. + + The FieldDoc to store sort values into. + + The same FieldDoc passed in. + + + + + + Returns the SortFields being used by this hit queue. + + + Internal cache of comparators. Similar to FieldCache, only + caches comparators instead of term values. - - A range filter built on top of a cached single term field (in {@link FieldCache}). + + Returns a comparator for sorting hits according to a field containing bytes. + Index to use. + + Fieldable containing integer values. + + Comparator for sorting hits. + + IOException If an error occurs reading the index. + + + Returns a comparator for sorting hits according to a field containing shorts. + Index to use. + + Fieldable containing integer values. + + Comparator for sorting hits. + + IOException If an error occurs reading the index. + + + Returns a comparator for sorting hits according to a field containing integers. + Index to use. + + Fieldable containing integer values. + + Comparator for sorting hits. + + IOException If an error occurs reading the index. + + + Returns a comparator for sorting hits according to a field containing integers. + Index to use. + + Fieldable containing integer values. + + Comparator for sorting hits. + + IOException If an error occurs reading the index. + + + Returns a comparator for sorting hits according to a field containing floats. + Index to use. + + Fieldable containing float values. + + Comparator for sorting hits. + + IOException If an error occurs reading the index. + + + Returns a comparator for sorting hits according to a field containing doubles. + Index to use. + + Fieldable containing float values. + + Comparator for sorting hits. + + IOException If an error occurs reading the index. + + + Returns a comparator for sorting hits according to a field containing strings. + Index to use. 
+ + Fieldable containing string values. + + Comparator for sorting hits. + + IOException If an error occurs reading the index. + + + Returns a comparator for sorting hits according to a field containing strings. + Index to use. + + Fieldable containing string values. + + Comparator for sorting hits. + + IOException If an error occurs reading the index. + + + Returns a comparator for sorting hits according to values in the given field. + The terms in the field are looked at to determine whether they contain integers, + floats or strings. Once the type is determined, one of the other static methods + in this class is called to get the comparator. + + Index to use. + + Fieldable containing values. + + Comparator for sorting hits. + + IOException If an error occurs reading the index. + + + Expert: Internal cache. + + + Expert: The default cache implementation, storing all values in memory. + A WeakHashMap is used for storage. -

FieldCacheRangeFilter builds a single cache for the field the first time it is used. - Each subsequent FieldCacheRangeFilter on the same field then reuses this cache, - even if the range itself changes. +

Created: May 19, 2004 4:40:36 PM -

This means that FieldCacheRangeFilter is much faster (sometimes more than 100x as fast) - as building a {@link TermRangeFilter} (or {@link ConstantScoreRangeQuery} on a {@link TermRangeFilter}) - for each query, if using a {@link #newStringRange}. However, if the range never changes it - is slower (around 2x as slow) than building a CachingWrapperFilter on top of a single TermRangeFilter. - - For numeric data types, this filter may be significantly faster than {@link NumericRangeFilter}. - Furthermore, it does not need the numeric values encoded by {@link NumericField}. But - it has the problem that it only works with exact one value/document (see below). - -

As with all {@link FieldCache} based functionality, FieldCacheRangeFilter is only valid for - fields which exact one term for each document (except for {@link #newStringRange} - where 0 terms are also allowed). Due to a restriction of {@link FieldCache}, for numeric ranges - all terms that do not have a numeric value, 0 is assumed. - -

Thus it works on dates, prices and other single value fields but will not work on - regular text fields. It is preferable to use a NOT_ANALYZED field to ensure that - there is only a single term. - -

This class does not have an constructor, use one of the static factory methods available, - that create a correct instance for different data types supported by {@link FieldCache}.

+ lucene 1.4 + + $Id: FieldCacheImpl.java 807572 2009-08-25 11:44:45Z mikemccand $ +
- - This method is implemented for each data type - - - Creates a string range query using {@link FieldCache#getStringIndex}. This works with all - fields containing zero or one term in the field. The range can be half-open by setting one - of the values to null. - - - - Creates a numeric range query using {@link FieldCache#GetBytes(IndexReader,String)}. This works with all - byte fields containing exactly one numeric term in the field. The range can be half-open by setting one - of the values to null. - - - - Creates a numeric range query using {@link FieldCache#GetBytes(IndexReader,String,FieldCache.ByteParser)}. This works with all - byte fields containing exactly one numeric term in the field. The range can be half-open by setting one - of the values to null. - - - - Creates a numeric range query using {@link FieldCache#GetShorts(IndexReader,String)}. This works with all - short fields containing exactly one numeric term in the field. The range can be half-open by setting one - of the values to null. - - - - Creates a numeric range query using {@link FieldCache#GetShorts(IndexReader,String,FieldCache.ShortParser)}. This works with all - short fields containing exactly one numeric term in the field. The range can be half-open by setting one - of the values to null. - - - - Creates a numeric range query using {@link FieldCache#GetInts(IndexReader,String)}. This works with all - int fields containing exactly one numeric term in the field. The range can be half-open by setting one - of the values to null. - - - - Creates a numeric range query using {@link FieldCache#GetInts(IndexReader,String,FieldCache.IntParser)}. This works with all - int fields containing exactly one numeric term in the field. The range can be half-open by setting one - of the values to null. - - - - Creates a numeric range query using {@link FieldCache#GetLongs(IndexReader,String)}. This works with all - long fields containing exactly one numeric term in the field. The range can be half-open by setting one - of the values to null. - - - - Creates a numeric range query using {@link FieldCache#GetLongs(IndexReader,String,FieldCache.LongParser)}. This works with all - long fields containing exactly one numeric term in the field. The range can be half-open by setting one - of the values to null. - - - - Creates a numeric range query using {@link FieldCache#GetFloats(IndexReader,String)}. This works with all - float fields containing exactly one numeric term in the field. The range can be half-open by setting one - of the values to null. - - - - Creates a numeric range query using {@link FieldCache#GetFloats(IndexReader,String,FieldCache.FloatParser)}. This works with all - float fields containing exactly one numeric term in the field. The range can be half-open by setting one - of the values to null. - - - - Creates a numeric range query using {@link FieldCache#GetDoubles(IndexReader,String)}. This works with all - double fields containing exactly one numeric term in the field. The range can be half-open by setting one - of the values to null. - - - - Creates a numeric range query using {@link FieldCache#GetDoubles(IndexReader,String,FieldCache.DoubleParser)}. This works with all - double fields containing exactly one numeric term in the field. The range can be half-open by setting one - of the values to null. - - - - A DocIdSet contains a set of doc ids. Implementing classes must - only implement {@link #iterator} to provide access to the set. - - - - An empty {@code DocIdSet} instance for easy use, e.g. 
in Filters that hit no documents. - - - Provides a {@link DocIdSetIterator} to access the set. - This implementation can return null or - {@linkplain #EMPTY_DOCIDSET}.iterator() if there - are no docs that match. - - - - This method is a hint for {@link CachingWrapperFilter}, if this DocIdSet - should be cached without copying it into a BitSet. The default is to return - false. If you have an own DocIdSet implementation - that does its iteration very effective and fast without doing disk I/O, - override this method and return true. - - - - this method checks, if a doc is a hit, should throw AIOBE, when position invalid - - - this DocIdSet is cacheable, if it works solely with FieldCache and no TermDocs - - - @deprecated use {@link #NextDoc()} instead. - - - use {@link #Advance(int)} instead. + + Will be removed in 3.0, this is for binary compatibility only - - use {@link #DocID()} instead. + + Will be removed in 3.0, this is for binary compatibility only - - use {@link #NextDoc()} instead. + + Will be removed in 3.0, this is for binary compatibility only - - use {@link #Advance(int)} instead. + + Will be removed in 3.0, this is for binary compatibility only - - use {@link #DocID()} instead. + + The pattern used to detect float values in a field + removed for java 1.3 compatibility + protected static final Object pFloats = Pattern.compile ("[0-9+\\-\\.eEfFdD]+"); + + + + + + Only needed because of Entry (ab)use by + FieldSortedHitQueue, remove when FieldSortedHitQueue + is removed + + + + Only needed because of Entry (ab)use by + FieldSortedHitQueue, remove when FieldSortedHitQueue + is removed + + + + Adds warning to super.toString if Local or sortFieldType were specified + Only needed because of Entry (ab)use by + FieldSortedHitQueue, remove when FieldSortedHitQueue + is removed + + + + Hack: When thrown from a Parser (NUMERIC_UTILS_* ones), this stops + processing terms and returns the current FieldCache + array. + + + + Expert: Internal cache. + + + Expert: Every composite-key in the internal cache is of this type. + + + Only (ab)used by FieldSortedHitQueue, + remove when FieldSortedHitQueue is removed + + + + Only (ab)used by FieldSortedHitQueue, + remove when FieldSortedHitQueue is removed + + + + Only (ab)used by FieldSortedHitQueue, + remove when FieldSortedHitQueue is removed + + + + Creates one of these objects for a custom comparator/parser. + + + Only (ab)used by FieldSortedHitQueue, + remove when FieldSortedHitQueue is removed + + + + Two of these are equal iff they reference the same field and type. + + + Composes a hashcode based on the field and type. + + + Please specify the exact type, instead. + Especially, guessing does not work with the new + {@link NumericField} type. + + + + + + + + Compares two ScoreDoc objects and returns a result indicating their + sort order. + + First ScoreDoc + + Second ScoreDoc + + a negative integer if i should come before j
+ a positive integer if i should come after j
+ 0 if they are equal +
+ + +
+ + Returns the value used to sort the given document. The + object returned must implement the java.io.Serializable + interface. This is used by multisearchers to determine how + to collate results from their searchers. + + + + Document + + Serializable object + + + + Returns the type of sort. Should return SortField.SCORE, + SortField.DOC, SortField.STRING, + SortField.INTEGER, SortField.FLOAT or + SortField.CUSTOM. It is not valid to return + SortField.AUTO. + This is used by multisearchers to determine how to collate results + from their searchers. + + One of the constants in SortField. + + + + A clause in a BooleanQuery. @@ -2709,113 +6554,6 @@ of a MUST_NOT clause.
- - Describes the input token stream. - - - An integer that describes the kind of this token. This numbering - system is determined by JavaCCParser, and a table of these numbers is - stored in the file ...Constants.java. - - - - The line number of the first character of this Token. - - - The column number of the first character of this Token. - - - The line number of the last character of this Token. - - - The column number of the last character of this Token. - - - The string image of the token. - - - A reference to the next regular (non-special) token from the input - stream. If this is the last token from the input stream, or if the - token manager has not read tokens beyond this one, this field is - set to null. This is true only if this token is also a regular - token. Otherwise, see below for a description of the contents of - this field. - - - - This field is used to access special tokens that occur prior to this - token, but after the immediately preceding regular (non-special) token. - If there are no such special tokens, this field is set to null. - When there are more than one such special token, this field refers - to the last of these special tokens, which in turn refers to the next - previous special token through its specialToken field, and so on - until the first special token (whose specialToken field is null). - The next fields of special tokens refer to other special tokens that - immediately follow it (without an intervening regular token). If there - is no such token, this field is null. - - - - An optional attribute value of the Token. - Tokens which are not used as syntactic sugar will often contain - meaningful values that will be used later on by the compiler or - interpreter. This attribute value is often different from the image. - Any subclass of Token that actually wants to return a non-null value can - override this method as appropriate. - - - - No-argument constructor - - - Constructs a new token for the specified Image. - - - Constructs a new token for the specified Image and Kind. - - - Returns the image. - - - Returns a new Token object, by default. However, if you want, you - can create and return subclass objects based on the value of ofKind. - Simply add the cases to the switch for all those special cases. - For example, if you have a subclass of Token called IDToken that - you want to create if ofKind is ID, simply add something like : - - case MyParserConstants.ID : return new IDToken(ofKind, image); - - to the following switch statement. Then you can cast matchedToken - variable to the appropriate type and use sit in your lexical actions. - - - - This class implements {@link InvertedDocConsumer}, which - is passed each token produced by the analyzer on each - field. It stores these tokens in a hash table, and - allocates separate byte streams per token. Consumers of - this class, eg {@link FreqProxTermsWriter} and {@link - TermVectorsTermsWriter}, write their own byte streams - under each term. - - - - Add a new thread - - - Abort (called after hitting AbortException) - - - Flush a new segment - - - Close doc stores - - - Attempt to free RAM, returning true if any RAM was - freed - - Extends TermFreqVector to provide additional information about positions in which each of the terms is found. A TermPositionVector not necessarily @@ -2891,1948 +6629,17 @@ An array of TermVectorOffsetInfo objects or the empty list - -

Expert: represents a single commit into an index as seen by the - {@link IndexDeletionPolicy} or {@link IndexReader}.

- -

Changes to the content of an index are made visible - only after the writer who made that change commits by - writing a new segments file - (segments_N). This point in time, when the - action of writing of a new segments file to the directory - is completed, is an index commit.

- -

Each index commit point has a unique segments file - associated with it. The segments file associated with a - later index commit point would have a larger N.

- -

WARNING: This API is a new and experimental and - may suddenly change.

-

+ + A TermInfo is the record of information stored for a term. - - Please subclass IndexCommit class instead - + + The number of documents which contain the term. - - Get the segments file (segments_N) associated - with this commit point. - - - - Returns all index files referenced by this commit point. - - - Delete this commit point. -

- Upon calling this, the writer is notified that this commit - point should be deleted. -

- Decision that a commit-point should be deleted is taken by the {@link IndexDeletionPolicy} in effect - and therefore this should only be called by its {@link IndexDeletionPolicy#onInit onInit()} or - {@link IndexDeletionPolicy#onCommit onCommit()} methods. -

-
- - Get the segments file (segments_N) associated - with this commit point. - - - - Returns all index files referenced by this commit point. - - - Returns the {@link Directory} for the index. - - - Delete this commit point. This only applies when using - the commit point in the context of IndexWriter's - IndexDeletionPolicy. -

- Upon calling this, the writer is notified that this commit - point should be deleted. -

- Decision that a commit-point should be deleted is taken by the {@link IndexDeletionPolicy} in effect - and therefore this should only be called by its {@link IndexDeletionPolicy#onInit onInit()} or - {@link IndexDeletionPolicy#onCommit onCommit()} methods. -

-
- - Returns true if this commit is an optimized index. - - - Two IndexCommits are equal if both their Directory and versions are equal. - - - Returns the version for this IndexCommit. This is the - same value that {@link IndexReader#getVersion} would - return if it were opened on this commit. - - - - Returns the generation (the _N in segments_N) for this - IndexCommit - - - - Convenience method that returns the last modified time - of the segments_N file corresponding to this index - commit, equivalent to - getDirectory().fileModified(getSegmentsFileName()). - - - - Returns userData, previously passed to {@link - IndexWriter#Commit(Map)} for this commit. Map is - String -> String. - - - - This is the base class for an in-memory posting list, - keyed by a Token. {@link TermsHash} maintains a hash - table holding one instance of this per unique Token. - Consumers of TermsHash ({@link TermsHashConsumer}) must - subclass this class with its own concrete class. - FreqProxTermsWriter.PostingList is a private inner class used - for the freq/prox postings, and - TermVectorsTermsWriter.PostingList is a private inner class - used to hold TermVectors postings. - - - - An {@link Analyzer} that filters {@link LetterTokenizer} - with {@link LowerCaseFilter} - - - - An Analyzer builds TokenStreams, which analyze text. It thus represents a - policy for extracting index terms from text. -

- Typical implementations first build a Tokenizer, which breaks the stream of - characters from the Reader into raw Tokens. One or more TokenFilters may - then be applied to the output of the Tokenizer. -

-
- - Creates a TokenStream which tokenizes all the text in the provided - Reader. Must be able to handle null field name for - backward compatibility. - - - - Creates a TokenStream that is allowed to be re-used - from the previous time that the same thread called - this method. Callers that do not need to use more - than one TokenStream at the same time from this - analyzer should use this method for better - performance. - - - - Used by Analyzers that implement reusableTokenStream - to retrieve previously saved TokenStreams for re-use - by the same thread. - - - - Used by Analyzers that implement reusableTokenStream - to save a TokenStream for later re-use by the same - thread. - - - - This is only present to preserve - back-compat of classes that subclass a core analyzer - and override tokenStream but not reusableTokenStream - - - - Invoked before indexing a Fieldable instance if - terms have already been added to that field. This allows custom - analyzers to place an automatic position increment gap between - Fieldable instances using the same field name. The default value - position increment gap is 0. With a 0 position increment gap and - the typical default token position increment of 1, all terms in a field, - including across Fieldable instances, are in successive positions, allowing - exact PhraseQuery matches, for instance, across Fieldable instance boundaries. - - - Fieldable name being indexed. - - position increment gap, added to the next token emitted from {@link #TokenStream(String,Reader)} - - - - Just like {@link #getPositionIncrementGap}, except for - Token offsets instead. By default this returns 1 for - tokenized fields and, as if the fields were joined - with an extra space character, and 0 for un-tokenized - fields. This method is only called if the field - produced at least one token for indexing. - - - the field just indexed - - offset gap, added to the next token emitted from {@link #TokenStream(String,Reader)} - - - - Frees persistent resources used by this Analyzer - - - Emits the entire input as a single token. - - - A Tokenizer is a TokenStream whose input is a Reader. -

- This is an abstract class; subclasses must override {@link #IncrementToken()} -

- NOTE: Subclasses overriding {@link #next(Token)} must call - {@link AttributeSource#ClearAttributes()} before setting attributes. - Subclasses overriding {@link #IncrementToken()} must call - {@link Token#Clear()} before setting Token attributes. -

-
- - A TokenStream enumerates the sequence of tokens, either from - {@link Field}s of a {@link Document} or from query text. -

- This is an abstract class. Concrete subclasses are: -

    -
  • {@link Tokenizer}, a TokenStream whose input is a Reader; and
  • -
  • {@link TokenFilter}, a TokenStream whose input is another - TokenStream.
  • -
- A new TokenStream API has been introduced with Lucene 2.9. This API - has moved from being {@link Token} based to {@link Attribute} based. While - {@link Token} still exists in 2.9 as a convenience class, the preferred way - to store the information of a {@link Token} is to use {@link AttributeImpl}s. -

- TokenStream now extends {@link AttributeSource}, which provides - access to all of the token {@link Attribute}s for the TokenStream. - Note that only one instance per {@link AttributeImpl} is created and reused - for every token. This approach reduces object creation and allows local - caching of references to the {@link AttributeImpl}s. See - {@link #IncrementToken()} for further details. -

- The workflow of the new TokenStream API is as follows: -

    -
  1. Instantiation of TokenStream/{@link TokenFilter}s which add/get - attributes to/from the {@link AttributeSource}.
  2. -
  3. The consumer calls {@link TokenStream#Reset()}.
  4. -
  5. The consumer retrieves attributes from the stream and stores local - references to all attributes it wants to access
  6. -
  7. The consumer calls {@link #IncrementToken()} until it returns false and - consumes the attributes after each call.
  8. -
  9. The consumer calls {@link #End()} so that any end-of-stream operations - can be performed.
  10. -
  11. The consumer calls {@link #Close()} to release any resource when finished - using the TokenStream
  12. -
- To make sure that filters and consumers know which attributes are available, - the attributes must be added during instantiation. Filters and consumers are - not required to check for availability of attributes in - {@link #IncrementToken()}. -

- You can find some example code for the new API in the analysis package level - Javadoc. -

- Sometimes it is desirable to capture a current state of a TokenStream - , e. g. for buffering purposes (see {@link CachingTokenFilter}, - {@link TeeSinkTokenFilter}). For this usecase - {@link AttributeSource#CaptureState} and {@link AttributeSource#RestoreState} - can be used. -

-
- - An AttributeSource contains a list of different {@link AttributeImpl}s, - and methods to add and get them. There can only be a single instance - of an attribute in the same AttributeSource instance. This is ensured - by passing in the actual type of the Attribute (Class<Attribute>) to - the {@link #AddAttribute(Class)}, which then checks if an instance of - that type is already present. If yes, it returns the instance, otherwise - it creates a new instance and returns it. - - - - An AttributeSource using the default attribute factory {@link AttributeSource.AttributeFactory#DEFAULT_ATTRIBUTE_FACTORY}. - - - An AttributeSource that uses the same attributes as the supplied one. - - - An AttributeSource using the supplied {@link AttributeFactory} for creating new {@link Attribute} instances. - - - returns the used AttributeFactory. - - - Returns a new iterator that iterates the attribute classes - in the same order they were added in. - Signature for Java 1.5: public Iterator<Class<? extends Attribute>> getAttributeClassesIterator() - - Note that this return value is different from Java in that it enumerates over the values - and not the keys - - - - Returns a new iterator that iterates all unique Attribute implementations. - This iterator may contain less entries that {@link #getAttributeClassesIterator}, - if one instance implements more than one Attribute interface. - Signature for Java 1.5: public Iterator<AttributeImpl> getAttributeImplsIterator() - - - - a cache that stores all interfaces for known implementation classes for performance (slow reflection) - - - Adds a custom AttributeImpl instance with one or more Attribute interfaces. - - - The caller must pass in a Class<? extends Attribute> value. - This method first checks if an instance of that class is - already in this AttributeSource and returns it. Otherwise a - new instance is created, added to this AttributeSource and returned. - Signature for Java 1.5: public <T extends Attribute> T addAttribute(Class<T>) - - - - Returns true, iff this AttributeSource has any attributes - - - The caller must pass in a Class<? extends Attribute> value. - Returns true, iff this AttributeSource contains the passed-in Attribute. - Signature for Java 1.5: public boolean hasAttribute(Class<? extends Attribute>) - - - - The caller must pass in a Class<? extends Attribute> value. - Returns the instance of the passed in Attribute contained in this AttributeSource - Signature for Java 1.5: public <T extends Attribute> T getAttribute(Class<T>) - - - IllegalArgumentException if this AttributeSource does not contain the - Attribute. It is recommended to always use {@link #addAttribute} even in consumers - of TokenStreams, because you cannot know if a specific TokenStream really uses - a specific Attribute. {@link #addAttribute} will automatically make the attribute - available. If you want to only use the attribute, if it is available (to optimize - consuming), use {@link #hasAttribute}. - - - - Resets all Attributes in this AttributeSource by calling - {@link AttributeImpl#Clear()} on each Attribute implementation. - - - - Captures the state of all Attributes. The return value can be passed to - {@link #restoreState} to restore the state of this or another AttributeSource. - - - - Restores this state by copying the values of all attribute implementations - that this state contains into the attributes implementations of the targetStream. - The targetStream must contain a corresponding instance for each argument - contained in this state (e.g. 
it is not possible to restore the state of - an AttributeSource containing a TermAttribute into a AttributeSource using - a Token instance as implementation). - - Note that this method does not affect attributes of the targetStream - that are not contained in this state. In other words, if for example - the targetStream contains an OffsetAttribute, but this state doesn't, then - the value of the OffsetAttribute remains unchanged. It might be desirable to - reset its value to the default, in which case the caller should first - call {@link TokenStream#ClearAttributes()} on the targetStream. - - - - Performs a clone of all {@link AttributeImpl} instances returned in a new - AttributeSource instance. This method can be used to e.g. create another TokenStream - with exactly the same attributes (using {@link #AttributeSource(AttributeSource)}) - - - - An AttributeFactory creates instances of {@link AttributeImpl}s. - - - returns an {@link AttributeImpl} for the supplied {@link Attribute} interface class. -

Signature for Java 1.5: public AttributeImpl createAttributeInstance(Class%lt;? extends Attribute> attClass) -

-
- - This is the default factory that creates {@link AttributeImpl}s using the - class name of the supplied {@link Attribute} interface class by appending Impl to it. - - - - This class holds the state of an AttributeSource. - - - - - - - Remove this when old API is removed! - - - - Remove this when old API is removed! - - - - Remove this when old API is removed! - - - - Remove this when old API is removed! - - - - Remove this when old API is removed! - - - - Remove this when old API is removed! - - - - A TokenStream using the default attribute factory. - - - A TokenStream that uses the same attributes as the supplied one. - - - A TokenStream using the supplied AttributeFactory for creating new {@link Attribute} instances. - - - Remove this when old API is removed! - - - - Remove this when old API is removed! - - - - For extra performance you can globally enable the new - {@link #IncrementToken} API using {@link Attribute}s. There will be a - small, but in most cases negligible performance increase by enabling this, - but it only works if all TokenStreams use the new API and - implement {@link #IncrementToken}. This setting can only be enabled - globally. -

- This setting only affects TokenStreams instantiated after this - call. All TokenStreams already created use the other setting. -

- All core {@link Analyzer}s are compatible with this setting, if you have - your own TokenStreams that are also compatible, you should enable - this. -

- When enabled, tokenization may throw {@link UnsupportedOperationException} - s, if the whole tokenizer chain is not compatible eg one of the - TokenStreams does not implement the new TokenStream API. -

- The default is false, so there is the fallback to the old API - available. - -

- This setting will no longer be needed in Lucene 3.0 as the old - API will be removed. - -
- - Returns if only the new API is used. - - - - - This setting will no longer be needed in Lucene 3.0 as - the old API will be removed. - - - - Consumers (i.e., {@link IndexWriter}) use this method to advance the stream to - the next token. Implementing classes must implement this method and update - the appropriate {@link AttributeImpl}s with the attributes of the next - token. - - The producer must make no assumptions about the attributes after the - method has been returned: the caller may arbitrarily change it. If the - producer needs to preserve the state for subsequent calls, it can use - {@link #captureState} to create a copy of the current attribute state. - - This method is called for every token of a document, so an efficient - implementation is crucial for good performance. To avoid calls to - {@link #AddAttribute(Class)} and {@link #GetAttribute(Class)} or downcasts, - references to all {@link AttributeImpl}s that this stream uses should be - retrieved during instantiation. - - To ensure that filters and consumers know which attributes are available, - the attributes must be added during instantiation. Filters and consumers - are not required to check for availability of attributes in - {@link #IncrementToken()}. - - - false for end of stream; true otherwise - - Note that this method will be defined abstract in Lucene - 3.0. - - - - This method is called by the consumer after the last token has been - consumed, after {@link #IncrementToken()} returned false - (using the new TokenStream API). Streams implementing the old API - should upgrade to use this feature. -

- This method can be used to perform any end-of-stream operations, such as setting the final offset of a stream. The final offset of a stream might differ from the offset of the last token, e.g. when one or more whitespace characters followed the last token but a {@link WhitespaceTokenizer} was used. - -

- IOException -
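For orientation, here is a minimal, hypothetical Java sketch of the IncrementToken()/End() contract described above, against the 2.9-era API these ported docs mirror; the field name and sample text are made up:

            import java.io.StringReader;
            import org.apache.lucene.analysis.TokenStream;
            import org.apache.lucene.analysis.WhitespaceAnalyzer;
            import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
            import org.apache.lucene.analysis.tokenattributes.TermAttribute;

            public class ConsumeTokenStream {
              public static void main(String[] args) throws Exception {
                TokenStream ts = new WhitespaceAnalyzer()
                    .tokenStream("body", new StringReader("two tokens "));
                // Fetch attribute references once, up front, as recommended above.
                TermAttribute term = (TermAttribute) ts.addAttribute(TermAttribute.class);
                OffsetAttribute offsets = (OffsetAttribute) ts.addAttribute(OffsetAttribute.class);
                while (ts.incrementToken()) {              // false signals end of stream
                  System.out.println(term.term() + " [" + offsets.startOffset()
                      + "," + offsets.endOffset() + ")");
                }
                ts.end();    // final offset may differ from the last token's end offset
                ts.close();
              }
            }

The attribute lookups happen once before the loop, which is exactly the pattern the documentation above asks producers and consumers to follow.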
- - Returns the next token in the stream, or null at EOS. When possible, the - input Token should be used as the returned Token (this gives fastest - tokenization performance), but this is not required and a new Token may be - returned. Callers may re-use a single Token instance for successive calls - to this method. - - This implicitly defines a "contract" between consumers (callers of this - method) and producers (implementations of this method that are the source - for tokens): -
  • A consumer must fully consume the previously returned {@link Token} before calling this method again.
  • A producer must call {@link Token#Clear()} before setting the fields in it and returning it.
- Also, the producer must make no assumptions about a {@link Token} after it - has been returned: the caller may arbitrarily change it. If the producer - needs to hold onto the {@link Token} for subsequent calls, it must clone() - it before storing it. Note that a {@link TokenFilter} is considered a - consumer. - -
- a {@link Token} that may or may not be used to return; - this parameter should never be null (the callee is not required to - check for null before using it, but it is a good idea to assert that - it is not null.) - - next {@link Token} in the stream or null if end-of-stream was hit - - The new {@link #IncrementToken()} and {@link AttributeSource} - APIs should be used instead. - -
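For the producer side of the contract above, a hypothetical stream that emits a single token might look like the sketch below (illustrative only; the class name is made up and the new IncrementToken() API is the preferred route):

            import java.io.IOException;
            import org.apache.lucene.analysis.Token;
            import org.apache.lucene.analysis.TokenStream;

            // Emits exactly one token and then signals end-of-stream with null.
            class OneTokenStream extends TokenStream {
              private final String text;
              private boolean done = false;

              OneTokenStream(String text) {
                this.text = text;
              }

              public Token next(final Token reusableToken) throws IOException {
                assert reusableToken != null;
                if (done) {
                  return null;                       // end of stream
                }
                done = true;
                reusableToken.clear();               // required before setting fields
                reusableToken.setTermBuffer(text);
                reusableToken.setStartOffset(0);
                reusableToken.setEndOffset(text.length());
                return reusableToken;
              }
            }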
- - Returns the next {@link Token} in the stream, or null at EOS. - - - The returned Token is a "full private copy" (not re-used across - calls to {@link #Next()}) but will be slower than calling - {@link #Next(Token)} or using the new {@link #IncrementToken()} - method with the new {@link AttributeSource} API. - - - - Resets this stream to the beginning. This is an optional operation, so - subclasses may or may not implement this method. {@link #Reset()} is not needed for - the standard indexing process. However, if the tokens of a - TokenStream are intended to be consumed more than once, it is - necessary to implement {@link #Reset()}. Note that if your TokenStream - caches tokens and feeds them back again after a reset, it is imperative - that you clone the tokens when you store them away (on the first pass) as - well as when you return them (on future passes after {@link #Reset()}). - - - - Releases resources associated with this stream. - - - Remove this when old API is removed! - - - - Remove this when old API is removed! - - - - The text source for this Tokenizer. - - - Construct a tokenizer with null input. - - - Construct a token stream processing the given input. - - - Construct a tokenizer with null input using the given AttributeFactory. - - - Construct a token stream processing the given input using the given AttributeFactory. - - - Construct a token stream processing the given input using the given AttributeSource. - - - Construct a token stream processing the given input using the given AttributeSource. - - - By default, closes the input Reader. - - - Return the corrected offset. If {@link #input} is a {@link CharStream} subclass - this method calls {@link CharStream#CorrectOffset}, else returns currentOff. - - offset as seen in the output - - corrected offset based on the input - - - - - - Expert: Reset the tokenizer to a new reader. Typically, an - analyzer (in its reusableTokenStream method) will use - this to re-use a previously created tokenizer. - - - - Will be removed in Lucene 3.0. This method is final, as it should - not be overridden. Delegates to the backwards compatibility layer. - - - - Will be removed in Lucene 3.0. This method is final, as it should - not be overridden. Delegates to the backwards compatibility layer. - - - - Provides support for converting byte sequences to Strings and back again. - The resulting Strings preserve the original byte sequences' sort order. - - The Strings are constructed using a Base 8000h encoding of the original - binary data - each char of an encoded String represents a 15-bit chunk - from the byte sequence. Base 8000h was chosen because it allows for all - lower 15 bits of char to be used without restriction; the surrogate range - [U+D8000-U+DFFF] does not represent valid chars, and would require - complicated handling to avoid them and allow use of char's high bit. - - Although unset bits are used as padding in the final char, the original - byte sequence could contain trailing bytes with no set bits (null bytes): - padding is indistinguishable from valid information. To overcome this - problem, a char is appended, indicating the number of encoded bytes in the - final content char. - - This class's operations are defined over CharBuffers and ByteBuffers, to - allow for wrapped arrays to be reused, reducing memory allocation costs for - repeated operations. Note that this class calls array() and arrayOffset() - on the CharBuffers and ByteBuffers it uses, so only wrapped arrays may be - used. 
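A hypothetical round trip through this encoding is sketched below (the byte values are arbitrary; note that both buffers must be array-backed, which is why the decode side wraps a char[] rather than the String itself):

            import java.nio.ByteBuffer;
            import java.nio.CharBuffer;
            import org.apache.lucene.util.IndexableBinaryStringTools;

            public class BinaryRoundTrip {
              public static void main(String[] args) {
                byte[] original = { 1, 2, 3, (byte) 0xFF, 0 };

                // Encode: the result's limit() is one past the final encoded char.
                CharBuffer encoded = IndexableBinaryStringTools.encode(ByteBuffer.wrap(original));
                String indexable = new String(encoded.array(), 0, encoded.limit());

                // Decode: wrap a char[] so the buffer is array-backed.
                ByteBuffer decoded =
                    IndexableBinaryStringTools.decode(CharBuffer.wrap(indexable.toCharArray()));
                byte[] roundTripped = new byte[decoded.limit()];
                System.arraycopy(decoded.array(), 0, roundTripped, 0, decoded.limit());

                System.out.println(java.util.Arrays.equals(original, roundTripped)); // true
              }
            }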
This class interprets the arrayOffset() and limit() values returned by - its input buffers as beginning and end+1 positions on the wrapped array, - resprectively; similarly, on the output buffer, arrayOffset() is the first - position written to, and limit() is set to one past the final output array - position. - - - - Returns the number of chars required to encode the given byte sequence. - - - The byte sequence to be encoded. Must be backed by an array. - - The number of chars required to encode the given byte sequence - - IllegalArgumentException If the given ByteBuffer is not backed by an array - - - Returns the number of bytes required to decode the given char sequence. - - - The char sequence to be encoded. Must be backed by an array. - - The number of bytes required to decode the given char sequence - - IllegalArgumentException If the given CharBuffer is not backed by an array - - - Encodes the input byte sequence into the output char sequence. Before - calling this method, ensure that the output CharBuffer has sufficient - capacity by calling {@link #GetEncodedLength(java.nio.ByteBuffer)}. - - - The byte sequence to encode - - Where the char sequence encoding result will go. The limit - is set to one past the position of the final char. - - IllegalArgumentException If either the input or the output buffer - is not backed by an array - - - - Decodes the input char sequence into the output byte sequence. Before - calling this method, ensure that the output ByteBuffer has sufficient - capacity by calling {@link #GetDecodedLength(java.nio.CharBuffer)}. - - - The char sequence to decode - - Where the byte sequence decoding result will go. The limit - is set to one past the position of the final char. - - IllegalArgumentException If either the input or the output buffer - is not backed by an array - - - - Decodes the given char sequence, which must have been encoded by - {@link #Encode(java.nio.ByteBuffer)} or - {@link #Encode(java.nio.ByteBuffer, java.nio.CharBuffer)}. - - - The char sequence to decode - - A byte sequence containing the decoding result. The limit - is set to one past the position of the final char. - - IllegalArgumentException If the input buffer is not backed by an - array - - - - Encodes the input byte sequence. - - - The byte sequence to encode - - A char sequence containing the encoding result. The limit is set - to one past the position of the final char. - - IllegalArgumentException If the input buffer is not backed by an - array - - - - Base class for Attributes that can be added to a - {@link Lucene.Net.Util.AttributeSource}. -

- Attributes are used to add data in a dynamic, yet type-safe way to a source - of usually streamed objects, e. g. a {@link Lucene.Net.Analysis.TokenStream}. -

-
- - Base interface for attributes. - - - Clears the values in this AttributeImpl and resets it to its - default value. If this implementation implements more than one Attribute interface - it clears all. - - - - The default implementation of this method accesses all declared - fields of this object and prints the values in the following syntax: - -
            public String toString() {
              return "start=" + startOffset + ",end=" + endOffset;
            }
- - This method may be overridden by subclasses. -
-
- - Subclasses must implement this method and should compute - a hashCode similar to this: -
            public int hashCode() {
              int code = startOffset;
              code = code * 31 + endOffset;
              return code;
            }
- - see also {@link #equals(Object)} -
-
- - All values used for computation of {@link #hashCode()} - should be checked here for equality. - - see also {@link Object#equals(Object)} - - - - Copies the values from this Attribute into the passed-in - target attribute. The target implementation must support all the - Attributes this implementation supports. - - - - Shallow clone. Subclasses must override this if they - need to clone any members deeply, - - - - File-based {@link Directory} implementation that uses - mmap for reading, and {@link - SimpleFSDirectory.SimpleFSIndexOutput} for writing. - -

NOTE: memory mapping uses up a portion of the virtual memory address space in your process equal to the size of the file being mapped. Before using this class, be sure you have plenty of virtual address space, e.g. by using a 64 bit JRE, or a 32 bit JRE with indexes that are guaranteed to fit within the address space. On 32 bit platforms also consult {@link #setMaxChunkSize} if you have problems with mmap failing because of fragmented address space. If you get an OutOfMemoryException, it is recommended to reduce the chunk size until it works. - -

Due to - this bug in Sun's JRE, MMapDirectory's {@link IndexInput#close} - is unable to close the underlying OS file handle. Only when GC - finally collects the underlying objects, which could be quite - some time later, will the file handle be closed. - -

This will consume additional transient disk usage: on Windows, - attempts to delete or overwrite the files will result in an - exception; on other platforms, which typically have a "delete on - last close" semantics, while such operations will succeed, the bytes - are still consuming space on disk. For many applications this - limitation is not a problem (e.g. if you have plenty of disk space, - and you don't rely on overwriting files on Windows) but it's still - an important limitation to be aware of. - -

This class supplies the workaround mentioned in the bug report - (disabled by default, see {@link #setUseUnmap}), which may fail on - non-Sun JVMs. It forcefully unmaps the buffer on close by using - an undocumented internal cleanup functionality. - {@link #UNMAP_SUPPORTED} is true, if the workaround - can be enabled (with no guarantees). -

-
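A hypothetical way to wire this up is sketched below (the index path is made up; enabling unmap is the optional, best-effort workaround discussed above):

            import java.io.File;
            import org.apache.lucene.index.IndexReader;
            import org.apache.lucene.search.IndexSearcher;
            import org.apache.lucene.store.MMapDirectory;

            public class MMapSetup {
              public static void main(String[] args) throws Exception {
                MMapDirectory dir = new MMapDirectory(new File("/path/to/index"));
                if (MMapDirectory.UNMAP_SUPPORTED) {
                  dir.setUseUnmap(true);      // unsupported hack; see the note above
                }
                // On 32 bit JVMs a smaller chunk size can help with fragmented address space:
                // dir.setMaxChunkSize(256 * 1024 * 1024);
                IndexReader reader = IndexReader.open(dir, true);   // read-only reader
                IndexSearcher searcher = new IndexSearcher(reader);
                System.out.println("docs: " + reader.numDocs());
                searcher.close();
                reader.close();
              }
            }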
- - Create a new MMapDirectory for the named location. - - - the path of the directory - - the lock factory to use, or null for the default. - - IOException - - - Create a new MMapDirectory for the named location. - - - the path of the directory - - the lock factory to use, or null for the default. - - IOException - - - Create a new MMapDirectory for the named location and the default lock factory. - - - the path of the directory - - IOException - - - Create a new MMapDirectory for the named location and the default lock factory. - - - the path of the directory - - IOException - - - - - - - true, if this platform supports unmapping mmaped files. - - - This method enables the workaround for unmapping the buffers - from address space after closing {@link IndexInput}, that is - mentioned in the bug report. This hack may fail on non-Sun JVMs. - It forcefully unmaps the buffer on close by using - an undocumented internal cleanup functionality. -

NOTE: Enabling this is completely unsupported by Java and may lead to JVM crashes if IndexInput is closed while another thread is still accessing it (SIGSEGV). -

- IllegalArgumentException if {@link #UNMAP_SUPPORTED} - is false and the workaround cannot be enabled. - -
- - Returns true, if the unmap workaround is enabled. - - - - Try to unmap the buffer; this method silently fails if the JVM does not support it. On Windows, this means that mmapped files cannot be modified or deleted. - - - Sets the maximum chunk size (default is {@link Integer#MAX_VALUE} for 64 bit JVMs and 256 MiBytes for 32 bit JVMs) used for memory mapping. Especially on 32 bit platforms, the address space can be very fragmented, so large index files cannot be mapped. Using a lower chunk size makes the directory implementation a little bit slower (as the correct chunk must be resolved on each seek) but the chance is higher that mmap does not fail. On 64 bit Java platforms, this parameter should always be {@link Integer#MAX_VALUE}, as the address space is big enough. - - - Returns the current mmap chunk size. - - - - - Creates an IndexInput for the file with the given name. - - - Creates an IndexOutput for the file with the given name. - - - This exception is thrown when the write.lock could not be acquired. This happens when a writer tries to open an index that another writer already has open. - - - - - - A Query that matches documents containing a particular sequence of terms. A PhraseQuery is built by QueryParser for input like "new york". -

This query may be combined with other terms or queries with a {@link BooleanQuery}. -

-
- - The abstract base class for queries. -

Instantiable subclasses are: -

  • {@link TermQuery}
  • {@link MultiTermQuery}
  • {@link BooleanQuery}
  • {@link WildcardQuery}
  • {@link PhraseQuery}
  • {@link PrefixQuery}
  • {@link MultiPhraseQuery}
  • {@link FuzzyQuery}
  • {@link TermRangeQuery}
  • {@link NumericRangeQuery}
  • {@link Lucene.Net.Search.Spans.SpanQuery}

A parser for queries is contained in: -

  • {@link Lucene.Net.QueryParsers.QueryParser QueryParser}
-
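To make the relationship concrete, here is a hypothetical sketch building a comparable query both programmatically from the primitive classes listed above and via the parser (field name and text are made up):

            import org.apache.lucene.analysis.standard.StandardAnalyzer;
            import org.apache.lucene.index.Term;
            import org.apache.lucene.queryParser.QueryParser;
            import org.apache.lucene.search.BooleanClause;
            import org.apache.lucene.search.BooleanQuery;
            import org.apache.lucene.search.PrefixQuery;
            import org.apache.lucene.search.Query;
            import org.apache.lucene.search.TermQuery;
            import org.apache.lucene.util.Version;

            public class BuildQueries {
              public static void main(String[] args) throws Exception {
                // Programmatic construction from the primitive query classes.
                BooleanQuery bq = new BooleanQuery();
                bq.add(new TermQuery(new Term("body", "lucene")), BooleanClause.Occur.MUST);
                bq.add(new PrefixQuery(new Term("body", "index")), BooleanClause.Occur.SHOULD);

                // A similar query produced by the parser.
                QueryParser parser = new QueryParser(Version.LUCENE_29, "body",
                    new StandardAnalyzer(Version.LUCENE_29));
                Query parsed = parser.parse("+lucene index*");

                System.out.println(bq.toString("body"));
                System.out.println(parsed.toString("body"));
              }
            }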
- - Sets the boost for this query clause to b. Documents - matching this clause will (in addition to the normal weightings) have - their score multiplied by b. - - - - Gets the boost for this clause. Documents matching - this clause will (in addition to the normal weightings) have their score - multiplied by b. The boost is 1.0 by default. - - - - Prints a query to a string, with field assumed to be the - default field and omitted. -

The representation used is one that is supposed to be readable - by {@link Lucene.Net.QueryParsers.QueryParser QueryParser}. However, - there are the following limitations: -

  • If the query was created by the parser, the printed representation may not be exactly what was parsed. For example, characters that need to be escaped will be represented without the required backslash.
  • Some of the more complicated queries (e.g. span queries) don't have a representation that can be parsed by QueryParser.
-
- - Prints a query to a string. - - - Expert: Constructs an appropriate Weight implementation for this query. - -

- Only implemented by primitive queries, which re-write to themselves. -

-
- - Expert: Constructs and initializes a Weight for a top-level query. - - Expert: called to re-write queries into primitive queries. For example, a PrefixQuery will be rewritten into a BooleanQuery that consists of TermQuerys. - - - Expert: called when re-writing queries under MultiSearcher. - Create a single query suitable for use by all subsearchers (in 1-1 correspondence with queries). This is an optimization of the OR of all queries. We handle the common optimization cases of equal queries and overlapping clauses of boolean OR queries (as generated by MultiTermQuery.rewrite()). Be careful overriding this method as queries[0] determines which method will be called and is not necessarily of the same type as the other queries. - - - Expert: adds all terms occurring in this query to the terms set. Only works if this query is in its {@link #rewrite rewritten} form. - - UnsupportedOperationException if this query is not yet rewritten - - - Expert: merges the clauses of a set of BooleanQuery's into a single BooleanQuery. -

A utility for use by {@link #Combine(Query[])} implementations. -

-
- - Expert: Returns the Similarity implementation to be used for this query. - Subclasses may override this method to specify their own Similarity - implementation, perhaps one that delegates through that of the Searcher. - By default the Searcher's Similarity implementation is returned. - - - - Returns a clone of this query. - - - Constructs an empty phrase query. - - - Sets the number of other words permitted between words in query phrase. - If zero, then this is an exact phrase search. For larger values this works - like a WITHIN or NEAR operator. -

The slop is in fact an edit-distance, where the units correspond to - moves of terms in the query phrase out of position. For example, to switch - the order of two words requires two moves (the first move places the words - atop one another), so to permit re-orderings of phrases, the slop must be - at least two. -

More exact matches are scored higher than sloppier matches, thus search - results are sorted by exactness. -

The slop is zero by default, requiring exact matches. -

-
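A hypothetical sketch of the slop behaviour described above ("new york" as an exact phrase versus allowing a re-ordering with slop 2):

            import org.apache.lucene.index.Term;
            import org.apache.lucene.search.PhraseQuery;

            public class SlopExample {
              public static void main(String[] args) {
                PhraseQuery exact = new PhraseQuery();
                exact.add(new Term("body", "new"));
                exact.add(new Term("body", "york"));
                // slop defaults to 0: only "new york" in that order matches

                PhraseQuery sloppy = new PhraseQuery();
                sloppy.add(new Term("body", "new"));
                sloppy.add(new Term("body", "york"));
                sloppy.setSlop(2);   // two moves also permit "york new"

                System.out.println(exact);    // prints body:"new york"
                System.out.println(sloppy);   // prints body:"new york"~2
              }
            }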
- - Returns the slop. See setSlop(). - - - Adds a term to the end of the query phrase. - The relative position of the term is the one immediately after the last term added. - - - - Adds a term to the end of the query phrase. - The relative position of the term within the phrase is specified explicitly. - This allows e.g. phrases with more than one term at the same position - or phrases with gaps (e.g. in connection with stopwords). - - - - - - - - - Returns the set of terms in this phrase. - - - Returns the relative positions of terms in this phrase. - - - - - - - Prints a user-readable version of this query. - - - Returns true iff o is equal to this. - - - Returns a hash code value for this object. - - - Expert: Calculate query weights and build query scorers. -

- The purpose of {@link Weight} is to ensure searching does not - modify a {@link Query}, so that a {@link Query} instance can be reused.
- {@link Searcher} dependent state of the query should reside in the - {@link Weight}.
- {@link IndexReader} dependent state should reside in the {@link Scorer}. -

- A Weight is used in the following way: -

  1. A Weight is constructed by a top-level query, given a Searcher ({@link Query#CreateWeight(Searcher)}).
  2. The {@link #SumOfSquaredWeights()} method is called on the Weight to compute the query normalization factor {@link Similarity#QueryNorm(float)} of the query clauses contained in the query.
  3. The query normalization factor is passed to {@link #Normalize(float)}. At this point the weighting is complete.
  4. A Scorer is constructed by {@link #Scorer(IndexReader,boolean,boolean)}.
- 2.9 - -
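These numbered steps can also be driven by hand. The following hypothetical, expert-level Java sketch does so for a simple TermQuery against an already-built index (field and term names are made up; in normal use a Searcher performs these steps internally):

            import org.apache.lucene.index.IndexReader;
            import org.apache.lucene.index.Term;
            import org.apache.lucene.search.DocIdSetIterator;
            import org.apache.lucene.search.IndexSearcher;
            import org.apache.lucene.search.Query;
            import org.apache.lucene.search.Scorer;
            import org.apache.lucene.search.Searcher;
            import org.apache.lucene.search.TermQuery;
            import org.apache.lucene.search.Weight;
            import org.apache.lucene.store.Directory;

            public class WeightLifecycle {
              public static void score(Directory dir) throws Exception {
                IndexReader reader = IndexReader.open(dir, true);
                Searcher searcher = new IndexSearcher(reader);
                Query query = new TermQuery(new Term("body", "lucene"));

                Weight weight = query.weight(searcher);                      // step 1
                float sum = weight.sumOfSquaredWeights();                    // step 2
                weight.normalize(searcher.getSimilarity().queryNorm(sum));   // step 3
                Scorer scorer = weight.scorer(reader, true, false);          // step 4, may be null

                if (scorer != null) {
                  int doc;
                  while ((doc = scorer.nextDoc()) != DocIdSetIterator.NO_MORE_DOCS) {
                    System.out.println("doc " + doc + " score " + scorer.score());
                  }
                }
                searcher.close();
                reader.close();
              }
            }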
- - An explanation of the score computation for the named document. - - - sub-reader containing the give doc - - - - an Explanation for the score - - IOException - - - The query that this concerns. - - - The weight for this query. - - - Assigns the query normalization factor to this. - - - Returns a {@link Scorer} which scores documents in/out-of order according - to scoreDocsInOrder. -

- NOTE: even if scoreDocsInOrder is false, it is - recommended to check whether the returned Scorer indeed scores - documents out of order (i.e., call {@link #ScoresDocsOutOfOrder()}), as - some Scorer implementations will always return documents - in-order.
- NOTE: null can be returned if no documents will be scored by this - query. - -

- - the {@link IndexReader} for which to return the {@link Scorer}. - - specifies whether in-order scoring of documents is required. Note - that if set to false (i.e., out-of-order scoring is required), - this method can return whatever scoring mode it supports, as every - in-order scorer is also an out-of-order one. However, an - out-of-order scorer may not support {@link Scorer#NextDoc()} - and/or {@link Scorer#Advance(int)}, therefore it is recommended to - request an in-order scorer if use of these methods is required. - - - if true, {@link Scorer#Score(Collector)} will be called; if false, - {@link Scorer#NextDoc()} and/or {@link Scorer#Advance(int)} will - be called. - - a {@link Scorer} which scores documents in/out-of order. - - IOException -
- - The sum of squared weights of contained query clauses. - - - Returns true iff this implementation scores docs only out of order. This - method is used in conjunction with {@link Collector}'s - {@link Collector#AcceptsDocsOutOfOrder() acceptsDocsOutOfOrder} and - {@link #Scorer(Lucene.Net.Index.IndexReader, boolean, boolean)} to - create a matching {@link Scorer} instance for a given {@link Collector}, or - vice versa. -

- NOTE: the default implementation returns false, i.e. - the Scorer scores documents in-order. -

-
- - This class is very similar to - {@link Lucene.Net.Search.Spans.SpanNearQuery} except that it factors - in the value of the payloads located at each of the positions where the - {@link Lucene.Net.Search.Spans.TermSpans} occurs. -

- In order to take advantage of this, you must override - {@link Lucene.Net.Search.Similarity#ScorePayload(String, byte[],int,int)} - which returns 1 by default. -

- Payload scores are aggregated using a pluggable {@link PayloadFunction}. - -

- - -
- - Matches spans which are near one another. One can specify slop, the maximum number of intervening unmatched positions, as well as whether matches are required to be in-order. - - - Base class for span-based queries. - - Expert: Returns the matches for this query in an index. Used internally to search for spans. - - - Returns the name of the field matched by this query. - - - Returns a collection of all terms matched by this query. - use extractTerms instead - - - - - - Construct a SpanNearQuery. Matches spans matching a span from each clause, with up to slop total unmatched positions between them. When inOrder is true, the spans from each clause must be ordered as in clauses. - - - Return the clauses whose spans are matched. - - - Return the maximum number of intervening unmatched positions permitted. - - - Return true if matches are required to be in-order. - - - Returns a collection of all terms matched by this query. - use extractTerms instead - - - - - - Returns true iff o is equal to this. - - - Expert-only. Public for use by other weight implementations - - - By default, uses the {@link PayloadFunction} to score the payloads, but can be overridden to do other things. - - - The payloads - - The start position of the span being scored - - The end position of the span being scored - - - - - - - Filter caching singleton. It can be used to save filters locally for reuse. This class makes it possible to cache Filters even when using RMI, as it keeps the cache on the searcher side of the RMI connection. - - Also could be used as a persistent storage for any filter as long as the filter provides a proper hashCode(), as that is used as the key in the cache. - - The cache is periodically cleaned up from a separate thread to ensure the cache doesn't exceed the maximum size. - - - - The default maximum number of Filters in the cache - - - The default frequency of cache cleanup - - - The cache itself - - - Maximum allowed cache size - - - Cache cleaning frequency - - - Cache cleaner that runs in a separate thread - - - Sets up the FilterManager singleton. - - - Sets the max size that the cache should reach before it is cleaned up - maximum allowed cache size - - - - Sets the cache cleaning frequency in milliseconds. - cleaning frequency in milliseconds - - - - Returns the cached version of the filter. Allows the caller to pass up a small filter but this will keep a persistent version around and allow the caching filter to do its job. - - - The input filter - - The cached version of the filter - - - - Holds the filter and the last time the filter was used, to make LRU-based cache cleaning possible. - TODO: Clean this up when we switch to Java 1.5 - - - - Keeps the cache from getting too big. - If we were using Java 1.5, we could use LinkedHashMap and we would not need this thread to clean out the cache. - - The SortedSet sortedFilterItems is used only to sort the items from the cache, so when it's time to clean up we have the TreeSet sort the FilterItems by timestamp. - - Removes 1.5 * the number of items to make the cache smaller. For example: if the cache clean size is 10, and the cache is at 15, we would remove (15 - 10) * 1.5 = 7.5, rounded up to 8. This way we clean the cache a bit more, and avoid having the cache cleaner do it frequently. - - - - - This interface should be implemented by any class whose instances are intended to be executed by a thread.
- - - - - This method has to be implemented in order that starting of the thread causes the object's - run method to be called in that separately executing thread. - - - - Abstract class for enumerating a subset of all terms. -

Term enumerations are always ordered by Term.compareTo(). Each term in - the enumeration is greater than all that precede it. -

-
- - Abstract class for enumerating terms. -

Term enumerations are always ordered by Term.compareTo(). Each term in - the enumeration is greater than all that precede it. -

-
- - Increments the enumeration to the next element. True if one exists. - - - Returns the current Term in the enumeration. - - - Returns the docFreq of the current Term in the enumeration. - - - Closes the enumeration to further activity, freeing resources. - - - Skips terms to the first beyond the current whose value is - greater or equal to target.

Returns true iff there is such - an entry.

Behaves as if written:

            public boolean skipTo(Term target) {
              do {
                if (!next())
                  return false;
              } while (target > term());
              return true;
            }
- Some implementations *could* be considerably more efficient than a linear scan. - Check the implementation to be sure. -
- This method is not performant and will be removed in Lucene 3.0. - Use {@link IndexReader#Terms(Term)} to create a new TermEnum positioned at a - given term. - -
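A hypothetical walk over a TermEnum obtained from {@link IndexReader#Terms(Term)}, as the note above suggests (the field name is made up):

            import org.apache.lucene.index.IndexReader;
            import org.apache.lucene.index.Term;
            import org.apache.lucene.index.TermEnum;
            import org.apache.lucene.store.Directory;

            public class DumpTerms {
              public static void dump(Directory dir) throws Exception {
                IndexReader reader = IndexReader.open(dir, true);
                TermEnum terms = reader.terms(new Term("body", ""));  // positioned at first "body" term
                try {
                  do {
                    Term t = terms.term();
                    if (t == null || !"body".equals(t.field())) {
                      break;                                          // ran past the field
                    }
                    System.out.println(t.text() + " df=" + terms.docFreq());
                  } while (terms.next());
                } finally {
                  terms.close();
                }
                reader.close();
              }
            }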
- - the current term - - - the delegate enum - to set this member use {@link #setEnum} - - - Equality compare on the term - - - Equality measure on the term - - - Indicates the end of the enumeration has been reached - - - use this method to set the actual TermEnum (e.g. in ctor), - it will be automatically positioned on the first matching term. - - - - Returns the docFreq of the current Term in the enumeration. - Returns -1 if no Term matches or all terms have been enumerated. - - - - Increments the enumeration to the next element. True if one exists. - - - Returns the current Term in the enumeration. - Returns null if no Term matches or all terms have been enumerated. - - - - Closes the enumeration to further activity, freeing resources. - - - Expert: The default cache implementation, storing all values in memory. - A WeakHashMap is used for storage. - -

Created: May 19, 2004 4:40:36 PM - -

- lucene 1.4 - - $Id: FieldCacheImpl.java 807572 2009-08-25 11:44:45Z mikemccand $ - -
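For orientation, a hypothetical use of the cache through the public FieldCache entry point (the field names are made up; values are loaded once per reader and then served from the WeakHashMap described above):

            import org.apache.lucene.index.IndexReader;
            import org.apache.lucene.search.FieldCache;
            import org.apache.lucene.store.Directory;

            public class FieldCacheExample {
              public static void show(Directory dir) throws Exception {
                IndexReader reader = IndexReader.open(dir, true);
                // One entry per document (reader.maxDoc() values), cached per reader.
                int[] prices = FieldCache.DEFAULT.getInts(reader, "price");
                String[] titles = FieldCache.DEFAULT.getStrings(reader, "title");
                for (int doc = 0; doc < reader.maxDoc(); doc++) {
                  System.out.println(doc + ": " + titles[doc] + " / " + prices[doc]);
                }
                reader.close();
              }
            }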
- - Checks the internal cache for an appropriate entry, and if none is - found, reads the terms in field as a single byte and returns an array - of size reader.maxDoc() of the value each document - has in the given field. - - Used to get field values. - - Which field contains the single byte values. - - The values in the given field for each document. - - IOException If any error occurs. - - - Checks the internal cache for an appropriate entry, and if none is found, - reads the terms in field as bytes and returns an array of - size reader.maxDoc() of the value each document has in the - given field. - - Used to get field values. - - Which field contains the bytes. - - Computes byte for string values. - - The values in the given field for each document. - - IOException If any error occurs. - - - Checks the internal cache for an appropriate entry, and if none is - found, reads the terms in field as shorts and returns an array - of size reader.maxDoc() of the value each document - has in the given field. - - Used to get field values. - - Which field contains the shorts. - - The values in the given field for each document. - - IOException If any error occurs. - - - Checks the internal cache for an appropriate entry, and if none is found, - reads the terms in field as shorts and returns an array of - size reader.maxDoc() of the value each document has in the - given field. - - Used to get field values. - - Which field contains the shorts. - - Computes short for string values. - - The values in the given field for each document. - - IOException If any error occurs. - - - Checks the internal cache for an appropriate entry, and if none is - found, reads the terms in field as integers and returns an array - of size reader.maxDoc() of the value each document - has in the given field. - - Used to get field values. - - Which field contains the integers. - - The values in the given field for each document. - - IOException If any error occurs. - - - Checks the internal cache for an appropriate entry, and if none is found, - reads the terms in field as integers and returns an array of - size reader.maxDoc() of the value each document has in the - given field. - - Used to get field values. - - Which field contains the integers. - - Computes integer for string values. - - The values in the given field for each document. - - IOException If any error occurs. - - - Checks the internal cache for an appropriate entry, and if - none is found, reads the terms in field as floats and returns an array - of size reader.maxDoc() of the value each document - has in the given field. - - Used to get field values. - - Which field contains the floats. - - The values in the given field for each document. - - IOException If any error occurs. - - - Checks the internal cache for an appropriate entry, and if - none is found, reads the terms in field as floats and returns an array - of size reader.maxDoc() of the value each document - has in the given field. - - Used to get field values. - - Which field contains the floats. - - Computes float for string values. - - The values in the given field for each document. - - IOException If any error occurs. - - - Checks the internal cache for an appropriate entry, and if none is - found, reads the terms in field as longs and returns an array - of size reader.maxDoc() of the value each document - has in the given field. - - - Used to get field values. - - Which field contains the longs. - - The values in the given field for each document. - - java.io.IOException If any error occurs. 
- - - Checks the internal cache for an appropriate entry, and if none is found, - reads the terms in field as longs and returns an array of - size reader.maxDoc() of the value each document has in the - given field. - - - Used to get field values. - - Which field contains the longs. - - Computes integer for string values. - - The values in the given field for each document. - - IOException If any error occurs. - - - Checks the internal cache for an appropriate entry, and if none is - found, reads the terms in field as integers and returns an array - of size reader.maxDoc() of the value each document - has in the given field. - - - Used to get field values. - - Which field contains the doubles. - - The values in the given field for each document. - - IOException If any error occurs. - - - Checks the internal cache for an appropriate entry, and if none is found, - reads the terms in field as doubles and returns an array of - size reader.maxDoc() of the value each document has in the - given field. - - - Used to get field values. - - Which field contains the doubles. - - Computes integer for string values. - - The values in the given field for each document. - - IOException If any error occurs. - - - Checks the internal cache for an appropriate entry, and if none - is found, reads the term values in field and returns an array - of size reader.maxDoc() containing the value each document - has in the given field. - - Used to get field values. - - Which field contains the strings. - - The values in the given field for each document. - - IOException If any error occurs. - - - Checks the internal cache for an appropriate entry, and if none - is found reads the term values in field and returns - an array of them in natural order, along with an array telling - which element in the term array each document uses. - - Used to get field values. - - Which field contains the strings. - - Array of terms and index into the array for each document. - - IOException If any error occurs. - - - Checks the internal cache for an appropriate entry, and if - none is found reads field to see if it contains integers, longs, floats - or strings, and then calls one of the other methods in this class to get the - values. For string values, a StringIndex is returned. After - calling this method, there is an entry in the cache for both - type AUTO and the actual found type. - - Used to get field values. - - Which field contains the values. - - int[], long[], float[] or StringIndex. - - IOException If any error occurs. - Please specify the exact type, instead. - Especially, guessing does not work with the new - {@link NumericField} type. - - - - Checks the internal cache for an appropriate entry, and if none - is found reads the terms out of field and calls the given SortComparator - to get the sort values. A hit in the cache will happen if reader, - field, and comparator are the same (using equals()) - as a previous call to this method. - - Used to get field values. - - Which field contains the values. - - Used to convert terms into something to sort by. - - Array of sort objects, one for each document. - - IOException If any error occurs. - Please implement {@link - FieldComparatorSource} directly, instead. - - - - EXPERT: Generates an array of CacheEntry objects representing all items - currently in the FieldCache. -

- NOTE: These CacheEntry objects maintain a strong reference to the Cached Values. Maintaining references to a CacheEntry after the IndexReader associated with it has been garbage collected will prevent the Value itself from being garbage collected when the Cache drops the WeakReference. -

-

- EXPERIMENTAL API: This API is considered extremely advanced - and experimental. It may be removed or altered w/o warning in future - releases - of Lucene. -

-

-
- -

- EXPERT: Instructs the FieldCache to forcibly expunge all entries from the underlying caches. This is intended only to be used for test methods as a way to ensure a known base state of the Cache (without needing to rely on GC to free WeakReferences). It should not be relied on for "Cache maintenance" in general application code. -

-

- EXPERIMENTAL API: This API is considered extremely advanced - and experimental. It may be removed or altered w/o warning in future - releases - of Lucene. -

-

-
- - If non-null, FieldCacheImpl will warn whenever - entries are created that are not sane according to - {@link Lucene.Net.Util.FieldCacheSanityChecker}. - - - - counterpart of {@link #SetInfoStream(PrintStream)} - - - Will be removed in 3.0, this is for binary compatibility only - - - - Will be removed in 3.0, this is for binary compatibility only - - - - Will be removed in 3.0, this is for binary compatibility only - - - - Will be removed in 3.0, this is for binary compatibility only - - - - The pattern used to detect float values in a field - removed for java 1.3 compatibility - protected static final Object pFloats = Pattern.compile ("[0-9+\\-\\.eEfFdD]+"); - - - - - - - - EXPERT: A unique Identifier/Description for each item in the FieldCache. - Can be useful for logging/debugging. -

- EXPERIMENTAL API: This API is considered extremely advanced - and experimental. It may be removed or altered w/o warning in future - releases - of Lucene. -

-

-
- - - - - - Computes (and stores) the estimated size of the cache Value - - - - - The most recently estimated size of the value, null unless - estimateSize has been called. - - - - Only needed because of Entry (ab)use by - FieldSortedHitQueue, remove when FieldSortedHitQueue - is removed - - - - Only needed because of Entry (ab)use by - FieldSortedHitQueue, remove when FieldSortedHitQueue - is removed - - - - Adds warning to super.toString if Local or sortFieldType were specified - Only needed because of Entry (ab)use by - FieldSortedHitQueue, remove when FieldSortedHitQueue - is removed - - - - Hack: When thrown from a Parser (NUMERIC_UTILS_* ones), this stops - processing terms and returns the current FieldCache - array. - - - - Expert: Internal cache. - - - Expert: Every composite-key in the internal cache is of this type. - - - Only (ab)used by FieldSortedHitQueue, - remove when FieldSortedHitQueue is removed - - - - Only (ab)used by FieldSortedHitQueue, - remove when FieldSortedHitQueue is removed - - - - Only (ab)used by FieldSortedHitQueue, - remove when FieldSortedHitQueue is removed - - - - Creates one of these objects for a custom comparator/parser. - - - Only (ab)used by FieldSortedHitQueue, - remove when FieldSortedHitQueue is removed - - - - Two of these are equal iff they reference the same field and type. - - - Composes a hashcode based on the field and type. - - - Please specify the exact type, instead. - Especially, guessing does not work with the new - {@link NumericField} type. - - - - - - - - A query that wraps a filter and simply returns a constant score equal to the - query boost for every document in the filter. - - - - $Id: ConstantScoreQuery.java 807180 2009-08-24 12:26:43Z markrmiller $ - - - - Returns the encapsulated filter - - - Prints a user-readable version of this query. - - - Returns true if o is equal to this. - - - Returns a hash code value for this object. - - - use {@link #NextDoc()} instead. - - - - use {@link #DocID()} instead. - - - - use {@link #Advance(int)} instead. - - - - Scorer for conjunctions, sets of queries, all of which are required. - - - use {@link #DocID()} instead. - - - - use {@link #NextDoc()} instead. - - - - use {@link #Advance(int)} instead. - - - - Compares {@link Lucene.Net.Index.TermVectorEntry}s first by frequency and then by - the term (case-sensitive) - - - - - - A {@link IndexDeletionPolicy} that wraps around any other - {@link IndexDeletionPolicy} and adds the ability to hold and - later release a single "snapshot" of an index. While - the snapshot is held, the {@link IndexWriter} will not - remove any files associated with it even if the index is - otherwise being actively, arbitrarily changed. Because - we wrap another arbitrary {@link IndexDeletionPolicy}, this - gives you the freedom to continue using whatever {@link - IndexDeletionPolicy} you would normally want to use with your - index. Note that you can re-use a single instance of - SnapshotDeletionPolicy across multiple writers as long - as they are against the same index Directory. Any - snapshot held when a writer is closed will "survive" - when the next writer is opened. - -

WARNING: This API is new and experimental and may suddenly change.
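A hypothetical backup flow built on the snapshot/release mechanism described above (the copy step is left abstract; the analyzer and writer settings are only illustrative):

            import org.apache.lucene.analysis.standard.StandardAnalyzer;
            import org.apache.lucene.index.IndexCommit;
            import org.apache.lucene.index.IndexWriter;
            import org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy;
            import org.apache.lucene.index.SnapshotDeletionPolicy;
            import org.apache.lucene.store.Directory;
            import org.apache.lucene.util.Version;

            public class BackupSketch {
              public static void backup(Directory dir) throws Exception {
                SnapshotDeletionPolicy snapshotter =
                    new SnapshotDeletionPolicy(new KeepOnlyLastCommitDeletionPolicy());
                IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(Version.LUCENE_29),
                    snapshotter, IndexWriter.MaxFieldLength.UNLIMITED);
                try {
                  // Files referenced by the snapshot will not be deleted while it is held.
                  IndexCommit commit = (IndexCommit) snapshotter.snapshot();
                  for (Object name : commit.getFileNames()) {
                    System.out.println("copy " + name);   // copy each file somewhere safe
                  }
                } finally {
                  snapshotter.release();                  // allow the files to be cleaned up again
                  writer.close();
                }
              }
            }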

+ +

This {@link IndexDeletionPolicy} implementation that + keeps only the most recent commit and immediately removes + all prior commits after a new commit is done. This is + the default deletion policy. @@ -4912,9414 +6719,11 @@ sorted by age (the 0th one is the oldest commit). - - Take a snapshot of the most recent commit to the - index. You must call release() to free this snapshot. - Note that while the snapshot is held, the files it - references will not be deleted, which will consume - additional disk space in your index. If you take a - snapshot at a particularly bad time (say just before - you call optimize()) then in the worst case this could - consume an extra 1X of your total index size, until - you release the snapshot. - + + Deletes all commits except the most recent one. - - Release the currently held snapshot. - - - NOTE: this API is experimental and will likely change - - - Adds a new term in this field; term ends with U+FFFF - char - - - - Called when we are done adding terms to this field - - - A FilterIndexReader contains another IndexReader, which it - uses as its basic source of data, possibly transforming the data along the - way or providing additional functionality. The class - FilterIndexReader itself simply implements all abstract methods - of IndexReader with versions that pass all requests to the - contained index reader. Subclasses of FilterIndexReader may - further override some of these methods and may also provide additional - methods and fields. - - - - IndexReader is an abstract class, providing an interface for accessing an - index. Search of an index is done entirely through this abstract interface, - so that any subclass which implements it is searchable. -

Concrete subclasses of IndexReader are usually constructed with a call to - one of the static open() methods, e.g. {@link - #Open(String, boolean)}. -

For efficiency, in this API documents are often referred to via - document numbers, non-negative integers which each name a unique - document in the index. These document numbers are ephemeral--they may change - as documents are added to and deleted from an index. Clients should thus not - rely on a given document having the same number between sessions. -

An IndexReader can be opened on a directory for which an IndexWriter is - opened already, but it cannot be used to delete documents from the index then. -

- NOTE: for backwards API compatibility, several methods are not listed - as abstract, but have no useful implementations in this base class and - instead always throw UnsupportedOperationException. Subclasses are - strongly encouraged to override these methods, but in many cases may not - need to. -

-

- NOTE: as of 2.4, it's possible to open a read-only - IndexReader using one of the static open methods that - accepts the boolean readOnly parameter. Such a reader has - better concurrency as it's not necessary to synchronize on - the isDeleted method. Currently the default for readOnly - is false, meaning if not specified you will get a - read/write IndexReader. But in 3.0 this default will - change to true, meaning you must explicitly specify false - if you want to make changes with the resulting IndexReader. -

-

NOTE: {@link - IndexReader} instances are completely thread - safe, meaning multiple threads can call any of its methods, - concurrently. If your application requires external - synchronization, you should not synchronize on the - IndexReader instance; use your own - (non-Lucene) objects instead. -

- $Id: IndexReader.java 826049 2009-10-16 19:28:55Z mikemccand $ - -
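A hypothetical read-only session, keeping the document-number caveats above in mind (the field name is made up):

            import org.apache.lucene.document.Document;
            import org.apache.lucene.index.IndexReader;
            import org.apache.lucene.store.Directory;

            public class ReaderBasics {
              public static void show(Directory dir) throws Exception {
                IndexReader reader = IndexReader.open(dir, true);   // read-only for better concurrency
                try {
                  System.out.println("numDocs=" + reader.numDocs() + " maxDoc=" + reader.maxDoc());
                  for (int doc = 0; doc < reader.maxDoc(); doc++) {
                    if (reader.isDeleted(doc)) {
                      continue;                                     // doc numbers are ephemeral
                    }
                    Document d = reader.document(doc);
                    System.out.println(doc + ": " + d.get("title"));
                  }
                } finally {
                  reader.close();
                }
              }
            }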
- - Expert: returns the current refCount for this reader - - - Expert: increments the refCount of this IndexReader - instance. RefCounts are used to determine when a - reader can be closed safely, i.e. as soon as there are - no more references. Be sure to always call a - corresponding {@link #decRef}, in a finally clause; - otherwise the reader may never be closed. Note that - {@link #close} simply calls decRef(), which means that - the IndexReader will not really be closed until {@link - #decRef} has been called for all outstanding - references. - - - - - - - Expert: decreases the refCount of this IndexReader - instance. If the refCount drops to 0, then pending - changes (if any) are committed to the index and this - reader is closed. - - - IOException in case an IOException occurs in commit() or doClose() - - - - - - - will be deleted when IndexReader(Directory) is deleted - - - - - - Legacy Constructor for backwards compatibility. - -

- This Constructor should not be used, it exists for backwards - compatibility only to support legacy subclasses that did not "own" - a specific directory, but needed to specify something to be returned - by the directory() method. Future subclasses should delegate to the - no arg constructor and implement the directory() method as appropriate. - -

- Directory to be returned by the directory() method - - - - - use IndexReader() - -
- - AlreadyClosedException if this IndexReader is closed - - - Returns a read/write IndexReader reading the index in an FSDirectory in the named - path. - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - Use {@link #Open(Directory, boolean)} instead. - This method will be removed in the 3.0 release. - - - the path to the index directory - - - - Returns an IndexReader reading the index in an - FSDirectory in the named path. You should pass - readOnly=true, since it gives much better concurrent - performance, unless you intend to do write operations - (delete documents or change norms) with the reader. - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - the path to the index directory - - true if this should be a readOnly - reader - - Use {@link #Open(Directory, boolean)} instead. - This method will be removed in the 3.0 release. - - - - - Returns a read/write IndexReader reading the index in an FSDirectory in the named - path. - - the path to the index directory - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - Use {@link #Open(Directory, boolean)} instead. - This method will be removed in the 3.0 release. - - - - - Returns an IndexReader reading the index in an - FSDirectory in the named path. You should pass - readOnly=true, since it gives much better concurrent - performance, unless you intend to do write operations - (delete documents or change norms) with the reader. - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - the path to the index directory - - true if this should be a readOnly - reader - - Use {@link #Open(Directory, boolean)} instead. - This method will be removed in the 3.0 release. - - - - - Returns a read/write IndexReader reading the index in - the given Directory. - - the index directory - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - Use {@link #Open(Directory, boolean)} instead - This method will be removed in the 3.0 release. - - - - - Returns an IndexReader reading the index in the given - Directory. You should pass readOnly=true, since it - gives much better concurrent performance, unless you - intend to do write operations (delete documents or - change norms) with the reader. - - the index directory - - true if no changes (deletions, norms) will be made with this IndexReader - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - - - Expert: returns a read/write IndexReader reading the index in the given - {@link IndexCommit}. - - the commit point to open - - CorruptIndexException if the index is corrupt - Use {@link #Open(IndexCommit, boolean)} instead. - This method will be removed in the 3.0 release. - - - IOException if there is a low-level IO error - - - Expert: returns an IndexReader reading the index in the given - {@link IndexCommit}. You should pass readOnly=true, since it - gives much better concurrent performance, unless you - intend to do write operations (delete documents or - change norms) with the reader. - - the commit point to open - - true if no changes (deletions, norms) will be made with this IndexReader - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - - - Expert: returns a read/write IndexReader reading the index in the given - Directory, with a custom {@link IndexDeletionPolicy}. 
- - the index directory - - a custom deletion policy (only used - if you use this reader to perform deletes or to set - norms); see {@link IndexWriter} for details. - - Use {@link #Open(Directory, IndexDeletionPolicy, boolean)} instead. - This method will be removed in the 3.0 release. - - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - - - Expert: returns an IndexReader reading the index in - the given Directory, with a custom {@link - IndexDeletionPolicy}. You should pass readOnly=true, - since it gives much better concurrent performance, - unless you intend to do write operations (delete - documents or change norms) with the reader. - - the index directory - - a custom deletion policy (only used - if you use this reader to perform deletes or to set - norms); see {@link IndexWriter} for details. - - true if no changes (deletions, norms) will be made with this IndexReader - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - - - Expert: returns an IndexReader reading the index in - the given Directory, with a custom {@link - IndexDeletionPolicy}. You should pass readOnly=true, - since it gives much better concurrent performance, - unless you intend to do write operations (delete - documents or change norms) with the reader. - - the index directory - - a custom deletion policy (only used - if you use this reader to perform deletes or to set - norms); see {@link IndexWriter} for details. - - true if no changes (deletions, norms) will be made with this IndexReader - - Subsamples which indexed - terms are loaded into RAM. This has the same effect as {@link - IndexWriter#setTermIndexInterval} except that setting - must be done at indexing time while this setting can be - set per reader. When set to N, then one in every - N*termIndexInterval terms in the index is loaded into - memory. By setting this to a value > 1 you can reduce - memory usage, at the expense of higher latency when - loading a TermInfo. The default value is 1. Set this - to -1 to skip loading the terms index entirely. - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - - - Expert: returns a read/write IndexReader reading the index in the given - Directory, using a specific commit and with a custom - {@link IndexDeletionPolicy}. - - the specific {@link IndexCommit} to open; - see {@link IndexReader#listCommits} to list all commits - in a directory - - a custom deletion policy (only used - if you use this reader to perform deletes or to set - norms); see {@link IndexWriter} for details. - - Use {@link #Open(IndexCommit, IndexDeletionPolicy, boolean)} instead. - This method will be removed in the 3.0 release. - - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - - - Expert: returns an IndexReader reading the index in - the given Directory, using a specific commit and with - a custom {@link IndexDeletionPolicy}. You should pass - readOnly=true, since it gives much better concurrent - performance, unless you intend to do write operations - (delete documents or change norms) with the reader. - - the specific {@link IndexCommit} to open; - see {@link IndexReader#listCommits} to list all commits - in a directory - - a custom deletion policy (only used - if you use this reader to perform deletes or to set - norms); see {@link IndexWriter} for details. 
- - true if no changes (deletions, norms) will be made with this IndexReader - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - - - Expert: returns an IndexReader reading the index in - the given Directory, using a specific commit and with - a custom {@link IndexDeletionPolicy}. You should pass - readOnly=true, since it gives much better concurrent - performance, unless you intend to do write operations - (delete documents or change norms) with the reader. - - the specific {@link IndexCommit} to open; - see {@link IndexReader#listCommits} to list all commits - in a directory - - a custom deletion policy (only used - if you use this reader to perform deletes or to set - norms); see {@link IndexWriter} for details. - - true if no changes (deletions, norms) will be made with this IndexReader - - Subsambles which indexed - terms are loaded into RAM. This has the same effect as {@link - IndexWriter#setTermIndexInterval} except that setting - must be done at indexing time while this setting can be - set per reader. When set to N, then one in every - N*termIndexInterval terms in the index is loaded into - memory. By setting this to a value > 1 you can reduce - memory usage, at the expense of higher latency when - loading a TermInfo. The default value is 1. Set this - to -1 to skip loading the terms index entirely. - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - - - Refreshes an IndexReader if the index has changed since this instance - was (re)opened. -

- Opening an IndexReader is an expensive operation. This method can be used - to refresh an existing IndexReader to reduce these costs. This method - tries to only load segments that have changed or were created after the - IndexReader was (re)opened. -

- If the index has not changed since this instance was (re)opened, then this - call is a NOOP and returns this instance. Otherwise, a new instance is - returned. The old instance is not closed and remains usable.
-

- If the reader is reopened, even though they share - resources internally, it's safe to make changes - (deletions, norms) with the new reader. All shared - mutable state obeys "copy on write" semantics to ensure - the changes are not seen by other readers. -

- You can determine whether a reader was actually reopened by comparing the - old instance with the instance returned by this method: -

            IndexReader reader = ...
            ...
            IndexReader newReader = reader.reopen();
            if (newReader != reader) {
              ...     // reader was reopened
              reader.close();
            }
            reader = newReader;
            ...
- - Be sure to synchronize that code so that other threads, - if present, can never use reader after it has been - closed and before it's switched to newReader. - -

NOTE: If this reader is a near real-time reader (obtained from {@link IndexWriter#GetReader()}), reopen() will simply call writer.getReader() again for you, though this may change in the future. -

- CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error -
- - Just like {@link #Reopen()}, except you can change the - readOnly of the original reader. If the index is - unchanged but readOnly is different then a new reader - will be returned. - - - - Expert: reopen this reader on a specific commit point. - This always returns a readOnly reader. If the - specified commit point matches what this reader is - already on, and this reader is already readOnly, then - this same instance is returned; if it is not already - readOnly, a readOnly clone is returned. - - - - Efficiently clones the IndexReader (sharing most - internal state). -

- On cloning a reader with pending changes (deletions, - norms), the original reader transfers its write lock to - the cloned reader. This means only the cloned reader - may make further changes to the index, and commit the - changes to the index on close, but the old reader still - reflects all changes made up until it was cloned. -

- Like {@link #Reopen()}, it's safe to make changes to - either the original or the cloned reader: all shared - mutable state obeys "copy on write" semantics to ensure - the changes are not seen by other readers. -

-

- CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error -
- - Clones the IndexReader and optionally changes readOnly. A readOnly - reader cannot open a writeable reader. - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - - - Returns the directory associated with this index. The Default - implementation returns the directory specified by subclasses when - delegating to the IndexReader(Directory) constructor, or throws an - UnsupportedOperationException if one was not specified. - - UnsupportedOperationException if no directory - - - Returns the time the index in the named directory was last modified. - Do not use this to check whether the reader is still up-to-date, use - {@link #IsCurrent()} instead. - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - Use {@link #LastModified(Directory)} instead. - This method will be removed in the 3.0 release. - - - - Returns the time the index in the named directory was last modified. - Do not use this to check whether the reader is still up-to-date, use - {@link #IsCurrent()} instead. - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - Use {@link #LastModified(Directory)} instead. - This method will be removed in the 3.0 release. - - - - - Returns the time the index in the named directory was last modified. - Do not use this to check whether the reader is still up-to-date, use - {@link #IsCurrent()} instead. - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - - - Reads version number from segments files. The version number is - initialized with a timestamp and then increased by one for each change of - the index. - - - where the index resides. - - version number. - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - Use {@link #GetCurrentVersion(Directory)} instead. - This method will be removed in the 3.0 release. - - - - Reads version number from segments files. The version number is - initialized with a timestamp and then increased by one for each change of - the index. - - - where the index resides. - - version number. - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - Use {@link #GetCurrentVersion(Directory)} instead. - This method will be removed in the 3.0 release. - - - - Reads version number from segments files. The version number is - initialized with a timestamp and then increased by one for each change of - the index. - - - where the index resides. - - version number. - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - - - Reads commitUserData, previously passed to {@link - IndexWriter#Commit(Map)}, from current index - segments file. This will return null if {@link - IndexWriter#Commit(Map)} has never been called for - this index. - - - where the index resides. - - commit userData. - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - - - - - - - Version number when this IndexReader was opened. Not implemented in the - IndexReader base class. - -

- If this reader is based on a Directory (ie, was created by calling - {@link #Open}, or {@link #Reopen} on a reader based on a Directory), then - this method returns the version recorded in the commit that the reader - opened. This version is advanced every time {@link IndexWriter#Commit} is - called. -

- -

- If instead this reader is a near real-time reader (ie, obtained by a call - to {@link IndexWriter#GetReader}, or by calling {@link #Reopen} on a near - real-time reader), then this method returns the version of the last - commit done by the writer. Note that even as further changes are made - with the writer, the version will not change until a commit is - completed. Thus, you should not rely on this method to determine when a - near real-time reader should be opened. Use {@link #IsCurrent} instead. -

- -

- UnsupportedOperationException - unless overridden in subclass - -
- - Retrieve the String userData optionally passed to - IndexWriter#commit. This will return null if {@link - IndexWriter#Commit(Map)} has never been called for - this index. - - - - - - -

For IndexReader implementations that use - TermInfosReader to read terms, this sets the - indexDivisor to subsample the number of indexed terms - loaded into memory. This has the same effect as {@link - IndexWriter#setTermIndexInterval} except that setting - must be done at indexing time while this setting can be - set per reader. When set to N, then one in every - N*termIndexInterval terms in the index is loaded into - memory. By setting this to a value > 1 you can reduce - memory usage, at the expense of higher latency when - loading a TermInfo. The default value is 1.

- - NOTE: you must call this before the term - index is loaded. If the index is already loaded, - an IllegalStateException is thrown. -

- IllegalStateException if the term index has already been loaded into memory - Please use {@link IndexReader#Open(Directory, IndexDeletionPolicy, boolean, int)} to specify the required TermInfos index divisor instead. - -
- -

For IndexReader implementations that use - TermInfosReader to read terms, this returns the - current indexDivisor as specified when the reader was - opened. -

-
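For illustration (a sketch, not from the original documentation; dir is an assumed Directory), the replacement suggested by the deprecation note might be used like this:
-            // Read-only reader that loads only every 4th indexed term into memory,
-            // trading slower TermInfo lookups for a smaller in-memory term index.
-            IndexReader reader = IndexReader.open(dir, null, true, 4);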
- - Check whether any new changes have occurred to the index since this - reader was opened. - -

- If this reader is based on a Directory (ie, was created by calling - {@link #open}, or {@link #reopen} on a reader based on a Directory), then - this method checks if any further commits (see {@link IndexWriter#commit}) - have occurred in that directory. -

- -

- If instead this reader is a near real-time reader (ie, obtained by a call - to {@link IndexWriter#getReader}, or by calling {@link #reopen} on a near - real-time reader), then this method checks if either a new commit has - occurred, or any new uncommitted changes have taken place via the writer. - Note that even if the writer has only performed merging, this method will - still return false. -

- -

- In any event, if this returns false, you should call {@link #reopen} to - get a new reader that sees the changes. -

- -

- CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - UnsupportedOperationException unless overridden in subclass -
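A minimal refresh sketch based on the note above (variable names are assumptions, not from the original documentation):
-            if (!reader.isCurrent()) {                 // changes exist since this reader was opened
-              IndexReader newReader = reader.reopen();
-              if (newReader != reader) {               // reopen may return the same instance
-                reader.close();
-                reader = newReader;
-              }
-            }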
- - Checks if the index is optimized (if it has a single segment and - no deletions). Not implemented in the IndexReader base class. - - true if the index is optimized; false otherwise - - UnsupportedOperationException unless overridden in subclass - - - Return an array of term frequency vectors for the specified document. - The array contains a vector for each vectorized field in the document. - Each vector contains terms and frequencies for all terms in a given vectorized field. - If no such fields existed, the method returns null. The term vectors that are - returned may either be of type {@link TermFreqVector} - or of type {@link TermPositionVector} if - positions or offsets have been stored. - - - document for which term frequency vectors are returned - - array of term frequency vectors. May be null if no term vectors have been - stored for the specified document. - - IOException if index cannot be accessed - - - - - Return a term frequency vector for the specified document and field. The - returned vector contains terms and frequencies for the terms in - the specified field of this document, if the field had the storeTermVector - flag set. If termvectors had been stored with positions or offsets, a - {@link TermPositionVector} is returned. - - - document for which the term frequency vector is returned - - field for which the term frequency vector is returned. - - term frequency vector. May be null if field does not exist in the specified - document or term vector was not stored. - - IOException if index cannot be accessed - - - - - Load the Term Vector into a user-defined data structure instead of relying on the parallel arrays of - the {@link TermFreqVector}. - - The number of the document to load the vector for - - The name of the field to load - - The {@link TermVectorMapper} to process the vector. Must not be null - - IOException if term vectors cannot be accessed or if they do not exist on the field and doc. specified. - - - - - Map all the term vectors for all fields in a Document - The number of the document to load the vector for - - The {@link TermVectorMapper} to process the vector. Must not be null - - IOException if term vectors cannot be accessed or if they do not exist on the field and doc. specified. - - - Returns true if an index exists at the specified directory. - If the directory does not exist or if there is no index in it, - false is returned. - - the directory to check for an index - - true if an index exists; false otherwise - - Use {@link #IndexExists(Directory)} instead - This method will be removed in the 3.0 release. - - - - - Returns true if an index exists at the specified directory. - If the directory does not exist or if there is no index in it, - false is returned. - - the directory to check for an index - - true if an index exists; false otherwise - - Use {@link #IndexExists(Directory)} instead. - This method will be removed in the 3.0 release. - - - - - Returns true if an index exists at the specified directory. - If the directory does not exist or if there is no index in it, - false is returned. - - the directory to check for an index - - true if an index exists; false otherwise - - IOException if there is a problem with accessing the index - - - Returns the number of documents in this index. - - - Returns one greater than the largest possible document number. - This may be used to, e.g., determine how big to allocate an array which - will have an element for every document number in an index. - - - - Returns the number of deleted documents. 
- - - Returns the stored fields of the nth - Document in this index. -

- NOTE: for performance reasons, this method does not check if the - requested document is deleted, and therefore asking for a deleted document - may yield unspecified results. Usually this is not required, however you - can call {@link #IsDeleted(int)} with the requested document ID to verify - the document is not deleted. - -

- CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error -
- - Get the {@link Lucene.Net.Documents.Document} at the nth position. The {@link FieldSelector} may be used to determine - what {@link Lucene.Net.Documents.Field}s to load and how they should - be loaded. NOTE: If this Reader (more specifically, the underlying - FieldsReader) is closed before the lazy - {@link Lucene.Net.Documents.Field} is loaded, an exception may be - thrown. If you want the value of a lazy - {@link Lucene.Net.Documents.Field} to be available after closing, you - must explicitly load it or fetch the Document again with a new loader. -

- NOTE: for performance reasons, this method does not check if the - requested document is deleted, and therefore asking for a deleted document - may yield unspecified results. Usually this is not required, however you - can call {@link #IsDeleted(int)} with the requested document ID to verify - the document is not deleted. - -

- Get the document at the nth position - - The {@link FieldSelector} to use to determine what - Fields should be loaded on the Document. May be null, in which case - all Fields will be loaded. - - The stored fields of the - {@link Lucene.Net.Documents.Document} at the nth position - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - - - - - - - - -
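A hypothetical selector that loads only a "title" field might look like this (field and variable names are assumptions, not from the original documentation):
-            FieldSelector titleOnly = new FieldSelector() {
-              public FieldSelectorResult accept(String fieldName) {
-                return "title".equals(fieldName)
-                    ? FieldSelectorResult.LOAD
-                    : FieldSelectorResult.NO_LOAD;
-              }
-            };
-            Document doc = reader.document(docId, titleOnly);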
- - Returns true if document n has been deleted - - - Returns true if any documents have been deleted - - - Returns true if there are norms stored for this field. - - - Returns the byte-encoded normalization factor for the named field of - every document. This is used by the search code to score documents. - - - - - - - Reads the byte-encoded normalization factor for the named field of every - document. This is used by the search code to score documents. - - - - - - - Expert: Resets the normalization factor for the named field of the named - document. The norm represents the product of the field's {@link - Lucene.Net.Documents.Fieldable#SetBoost(float) boost} and its {@link Similarity#LengthNorm(String, - int) length normalization}. Thus, to preserve the length normalization - values when resetting this, one should base the new value upon the old. - - NOTE: If this field does not store norms, then - this method call will silently do nothing. - - - - - - - StaleReaderException if the index has changed - since this reader was opened - - CorruptIndexException if the index is corrupt - LockObtainFailedException if another writer - has this index open (write.lock could not - be obtained) - - IOException if there is a low-level IO error - - - Implements setNorm in subclass. - - - Expert: Resets the normalization factor for the named field of the named - document. - - - - - - - - StaleReaderException if the index has changed - since this reader was opened - - CorruptIndexException if the index is corrupt - LockObtainFailedException if another writer - has this index open (write.lock could not - be obtained) - - IOException if there is a low-level IO error - - - Returns an enumeration of all the terms in the index. The - enumeration is ordered by Term.compareTo(). Each term is greater - than all that precede it in the enumeration. Note that after - calling terms(), {@link TermEnum#Next()} must be called - on the resulting enumeration before calling other methods such as - {@link TermEnum#Term()}. - - IOException if there is a low-level IO error - - - Returns an enumeration of all terms starting at a given term. If - the given term does not exist, the enumeration is positioned at the - first term greater than the supplied term. The enumeration is - ordered by Term.compareTo(). Each term is greater than all that - precede it in the enumeration. - - IOException if there is a low-level IO error - - - Returns the number of documents containing the term t. - IOException if there is a low-level IO error - - - Returns an enumeration of all the documents which contain - term. For each document, the document number, the frequency of - the term in that document is also provided, for use in - search scoring. If term is null, then all non-deleted - docs are returned with freq=1. - Thus, this method implements the mapping: -

    - Term    =>    <docNum, freq>* -
-

The enumeration is ordered by document number. Each document number - is greater than all that precede it in the enumeration. -

- IOException if there is a low-level IO error -
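For example, a sketch of walking the <docNum, freq> pairs for a single term (the field and term are assumptions, not from the original documentation):
-            TermDocs termDocs = reader.termDocs(new Term("contents", "lucene"));
-            try {
-              while (termDocs.next()) {
-                int docNum = termDocs.doc();   // document containing the term
-                int freq = termDocs.freq();    // occurrences of the term in that document
-              }
-            } finally {
-              termDocs.close();
-            }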
- - Returns an unpositioned {@link TermDocs} enumerator. - IOException if there is a low-level IO error - - - Returns an enumeration of all the documents which contain - term. For each document, in addition to the document number - and frequency of the term in that document, a list of all of the ordinal - positions of the term in the document is available. Thus, this method - implements the mapping: - -

    - Term    =>    <docNum, freq, - <pos1, pos2, ... - posfreq-1> - >* -
-

This positional information facilitates phrase and proximity searching. -

The enumeration is ordered by document number. Each document number is - greater than all that precede it in the enumeration. -

- IOException if there is a low-level IO error -
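A similar sketch for positions (again, the field and term are assumptions, not from the original documentation):
-            TermPositions termPositions = reader.termPositions(new Term("contents", "lucene"));
-            try {
-              while (termPositions.next()) {
-                for (int i = 0; i < termPositions.freq(); i++) {
-                  int position = termPositions.nextPosition();  // ordinal position in the doc
-                }
-              }
-            } finally {
-              termPositions.close();
-            }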
- - Returns an unpositioned {@link TermPositions} enumerator. - IOException if there is a low-level IO error - - - Deletes the document numbered docNum. Once a document is - deleted it will not appear in TermDocs or TermPostitions enumerations. - Attempts to read its field with the {@link #document} - method will result in an error. The presence of this document may still be - reflected in the {@link #docFreq} statistic, though - this will be corrected eventually as the index is further modified. - - - StaleReaderException if the index has changed - since this reader was opened - - CorruptIndexException if the index is corrupt - LockObtainFailedException if another writer - has this index open (write.lock could not - be obtained) - - IOException if there is a low-level IO error - - - Implements deletion of the document numbered docNum. - Applications should call {@link #DeleteDocument(int)} or {@link #DeleteDocuments(Term)}. - - - - Deletes all documents that have a given term indexed. - This is useful if one uses a document field to hold a unique ID string for - the document. Then to delete such a document, one merely constructs a - term with the appropriate field and the unique ID string as its text and - passes it to this method. - See {@link #DeleteDocument(int)} for information about when this deletion will - become effective. - - - the number of documents deleted - - StaleReaderException if the index has changed - since this reader was opened - - CorruptIndexException if the index is corrupt - LockObtainFailedException if another writer - has this index open (write.lock could not - be obtained) - - IOException if there is a low-level IO error - - - Undeletes all documents currently marked as deleted in this index. - - - StaleReaderException if the index has changed - since this reader was opened - - LockObtainFailedException if another writer - has this index open (write.lock could not - be obtained) - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - - - Implements actual undeleteAll() in subclass. - - - Does nothing by default. Subclasses that require a write lock for - index modifications must implement this method. - - - - - IOException - - - Opaque Map (String -> String) - that's recorded into the segments file in the index, - and retrievable by {@link - IndexReader#getCommitUserData}. - - IOException - - - Commit changes resulting from delete, undeleteAll, or - setNorm operations - - If an exception is hit, then either no changes or all - changes will have been committed to the index - (transactional semantics). - - IOException if there is a low-level IO error - - - Commit changes resulting from delete, undeleteAll, or - setNorm operations - - If an exception is hit, then either no changes or all - changes will have been committed to the index - (transactional semantics). - - IOException if there is a low-level IO error - - - Implements commit. - Please implement {@link #DoCommit(Map) - instead}. - - - - Implements commit. NOTE: subclasses should override - this. In 3.0 this will become an abstract method. - - - - Closes files associated with this index. - Also saves any new deletions to disk. - No other methods should be called after this has been called. - - IOException if there is a low-level IO error - - - Implements close. - - - Get a list of unique field names that exist in this index and have the specified - field option information. 
- - specifies which field option should be available for the returned fields - - Collection of Strings indicating the names of the fields. - - - - - - Returns true iff the index in the named directory is - currently locked. - - the directory to check for a lock - - IOException if there is a low-level IO error - Please use {@link IndexWriter#IsLocked(Directory)} instead. - This method will be removed in the 3.0 release. - - - - - Returns true iff the index in the named directory is - currently locked. - - the directory to check for a lock - - IOException if there is a low-level IO error - Use {@link #IsLocked(Directory)} instead. - This method will be removed in the 3.0 release. - - - - - Forcibly unlocks the index in the named directory. -

- Caution: this should only be used by failure recovery code, - when it is known that no other process or thread is in fact - currently accessing this index. -

- Please use {@link IndexWriter#Unlock(Directory)} instead. - This method will be removed in the 3.0 release. - - -
- - Expert: return the IndexCommit that this reader has - opened. This method is only implemented by those - readers that correspond to a Directory with its own - segments_N file. - -

WARNING: this API is new and experimental and - may suddenly change.

-

-
- - Prints the filename and size of each file within a given compound file. - Add the -extract flag to extract files to the current working directory. - In order to make the extracted version of the index work, you have to copy - the segments file from the compound index into the directory where the extracted files are stored. - - Usage: Lucene.Net.Index.IndexReader [-extract] <cfsfile> - - - - Returns all commit points that exist in the Directory. - Normally, because the default is {@link - KeepOnlyLastCommitDeletionPolicy}, there would be only - one commit point. But if you're using a custom {@link - IndexDeletionPolicy} then there could be many commits. - Once you have a given commit, you can open a reader on - it by calling {@link IndexReader#Open(IndexCommit)}. - There must be at least one commit in - the Directory, else this method throws {@link - java.io.IOException}. Note that if a commit is in - progress while this method is running, that commit - may or may not be in the returned array. - - - - Expert: returns the sequential sub readers that this - reader is logically composed of. For example, - IndexSearcher uses this API to drive searching by one - sub reader at a time. If this reader is not composed - of sequential child readers, it should return null. - If this method returns an empty array, that means this - reader is a null reader (for example a MultiReader - that has no sub readers). -

- NOTE: You should not try using sub-readers returned by - this method to make any changes (setNorm, deleteDocument, - etc.). While this might succeed for one composite reader - (like MultiReader), it will most likely lead to index - corruption for other readers (like DirectoryReader obtained - through {@link #open}). Use the parent reader directly. -

-
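A read-only, per-segment iteration sketch (not from the original documentation; the fallback wrapper is an illustrative choice):
-            IndexReader[] subReaders = reader.getSequentialSubReaders();
-            if (subReaders == null) {
-              subReaders = new IndexReader[] { reader };  // reader has no sequential children
-            }
-            for (int i = 0; i < subReaders.length; i++) {
-              // read-only use of subReaders[i]; make changes only through the parent reader
-            }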
- - Expert - - - - - Returns the number of unique terms (across all fields) - in this reader. - - This method returns long, even though internally - Lucene cannot handle more than 2^31 unique terms, for - a possible future when this limitation is removed. - - - UnsupportedOperationException if this count - cannot be easily determined (eg Multi*Readers). - Instead, you should call {@link - #getSequentialSubReaders} and ask each sub reader for - its unique term count. - - - - Expert: Return the state of the flag that disables fakes norms in favor of representing the absence of field norms with null. - true if fake norms are disabled - - This currently defaults to false (to remain - back-compatible), but in 3.0 it will be hardwired to - true, meaning the norms() methods will return null for - fields that had disabled norms. - - - - Expert: Set the state of the flag that disables fakes norms in favor of representing the absence of field norms with null. - true to disable fake norms, false to preserve the legacy behavior - - This currently defaults to false (to remain - back-compatible), but in 3.0 it will be hardwired to - true, meaning the norms() methods will return null for - fields that had disabled norms. - - - - Utility class for executing code that needs to do - something with the current segments file. This is - necessary with lock-less commits because from the time - you locate the current segments file name, until you - actually open it, read its contents, or check modified - time, etc., it could have been deleted due to a writer - commit finishing. - - - - A collection of segmentInfo objects with methods for operating on - those segments in relation to the file system. - -

NOTE: This API is new and still experimental - (subject to change suddenly in the next release)

-

-
- - The file format version, a negative number. - - - This format adds details used for lockless commits. It differs - slightly from the previous format in that file names - are never re-used (write once). Instead, each file is - written to the next generation. For example, - segments_1, segments_2, etc. This allows us to not use - a commit lock. See file - formats for details. - - - - This format adds a "hasSingleNormFile" flag into each segment info. - See LUCENE-756 - for details. - - - - This format allows multiple segments to share a single - vectors and stored fields file. - - - - This format adds a checksum at the end of the file to - ensure all bytes were successfully written. - - - - This format adds the deletion count for each segment. - This way IndexWriter can efficiently report numDocs(). - - - - This format adds the boolean hasProx to record if any - fields in the segment store prox information (ie, have - omitTermFreqAndPositions==false) - - - - This format adds optional commit userData (String) storage. - - - This format adds optional per-segment String - dianostics storage, and switches userData to Map - - - - counts how often the index has been changed by adding or deleting docs. - starting with the current time in milliseconds forces to create unique version numbers. - - - - If non-null, information about loading segments_N files - - - - - Get the generation (N) of the current segments_N file - from a list of files. - - - -- array of file names to check - - - - Get the generation (N) of the current segments_N file - in the directory. - - - -- directory to search for the latest segments_N file - - - - Get the filename of the current segments_N file - from a list of files. - - - -- array of file names to check - - - - Get the filename of the current segments_N file - in the directory. - - - -- directory to search for the latest segments_N file - - - - Get the segments_N filename in use by this segment infos. - - - Parse the generation off the segments file name and - return it. - - - - Get the next segments_N filename that will be written. - - - Read a particular segmentFileName. Note that this may - throw an IOException if a commit is in process. - - - -- directory containing the segments file - - -- segment file to load - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - - - This version of read uses the retry logic (for lock-less - commits) to find the right segments file to load. - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - - - Returns a copy of this instance, also copying each - SegmentInfo. - - - - version number when this SegmentInfos was generated. - - - Current version number from segments file. - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - - - Returns userData from latest segments file - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - - - If non-null, information about retries when loading - the segments file will be printed to this. - - - - Advanced: set how many times to try loading the - segments.gen file contents to determine current segment - generation. This file is only referenced when the - primary method (listing the directory) fails. - - - - - - - - Advanced: set how many milliseconds to pause in between - attempts to load the segments.gen file. - - - - - - - - Advanced: set how many times to try incrementing the - gen when loading the segments file. 
This only runs if - the primary (listing directory) and secondary (opening - segments.gen file) methods fail to find the segments - file. - - - - - - - - - - - - Returns a new SegmentInfos containg the SegmentInfo - instances in the specified range first (inclusive) to - last (exclusive), so total number of segments returned - is last-first. - - - - Call this to start a commit. This writes the new - segments file, but writes an invalid checksum at the - end, so that it is not visible to readers. Once this - is called you must call {@link #finishCommit} to complete - the commit or {@link #rollbackCommit} to abort it. - - - - Returns all file names referenced by SegmentInfo - instances matching the provided Directory (ie files - associated with any "external" segments are skipped). - The returned collection is recomputed on each - invocation. - - - - Writes & syncs to the Directory dir, taking care to - remove the segments file on exception - - - - Replaces all segments in this instance, but keeps - generation, version, counter so that future commits - remain write once. - - - - - Simple brute force implementation. - If size is equal, compare items one by one. - - SegmentInfos object to check equality for - true if lists are equal, false otherwise - - - - Calculate hash code of SegmentInfos - - hash code as in java version of ArrayList - - - Utility class for executing code that needs to do - something with the current segments file. This is - necessary with lock-less commits because from the time - you locate the current segments file name, until you - actually open it, read its contents, or check modified - time, etc., it could have been deleted due to a writer - commit finishing. - - - - Subclass must implement this. The assumption is an - IOException will be thrown if something goes wrong - during the processing that could have been caused by - a writer committing. - - - - Constants describing field properties, for example used for - {@link IndexReader#GetFieldNames(FieldOption)}. - - - - All fields - - - All indexed fields - - - All fields that store payloads - - - All fields that omit tf - - - Renamed to {@link #OMIT_TERM_FREQ_AND_POSITIONS} - - - - All fields which are not indexed - - - All fields which are indexed with termvectors enabled - - - All fields which are indexed but don't have termvectors enabled - - - All fields with termvectors enabled. Please note that only standard termvector fields are returned - - - All fields with termvectors with position values enabled - - - All fields with termvectors with offset values enabled - - - All fields with termvectors with offset values and position values enabled - - -

Construct a FilterIndexReader based on the specified base reader. - Directory locking for delete, undeleteAll, and setNorm operations is - left to the base reader.

-

Note that base reader is closed if this FilterIndexReader is closed.

-

- specified base reader. - -
- - - - - - Base class for filtering {@link TermDocs} implementations. - - - TermDocs provides an interface for enumerating <document, frequency> - pairs for a term.

The document portion names each document containing - the term. Documents are indicated by number. The frequency portion gives - the number of times the term occurred in each document.

The pairs are - ordered by document number. -

- - -
- - Sets this to the data for a term. - The enumeration is reset to the start of the data for this term. - - - - Sets this to the data for the current term in a {@link TermEnum}. - This may be optimized in some implementations. - - - - Returns the current document number.

This is invalid until {@link - #Next()} is called for the first time. -

-
- - Returns the frequency of the term within the current document.

This - is invalid until {@link #Next()} is called for the first time. -

-
- - Moves to the next pair in the enumeration.

Returns true iff there is - such a next pair in the enumeration. -

-
- - Attempts to read multiple entries from the enumeration, up to length of - docs. Document numbers are stored in docs, and term - frequencies are stored in freqs. The freqs array must be as - long as the docs array. - -

Returns the number of entries read. Zero is only returned when the - stream has been exhausted. -

-
- - Skips entries to the first beyond the current whose document number is - greater than or equal to target.

Returns true iff there is such - an entry.

Behaves as if written:

-            boolean skipTo(int target) {
-              do {
-                if (!next())
-                  return false;
-              } while (target > doc());
-              return true;
-            }
-            
- Some implementations are considerably more efficient than that. -
-
- - Frees associated resources. - - - Base class for filtering {@link TermPositions} implementations. - - - TermPositions provides an interface for enumerating the <document, - frequency, <position>* > tuples for a term.

The document and - frequency are the same as for a TermDocs. The positions portion lists the ordinal - positions of each occurrence of a term in a document. - -

- - -
- - Returns next position in the current document. It is an error to call - this more than {@link #Freq()} times - without calling {@link #Next()}

This is - invalid until {@link #Next()} is called for - the first time. -

-
- - Returns the length of the payload at the current term position. - This is invalid until {@link #NextPosition()} is called for - the first time.
-
- length of the current payload in number of bytes - -
- - Returns the payload data at the current term position. - This is invalid until {@link #NextPosition()} is called for - the first time. - This method must not be called more than once after each call - of {@link #NextPosition()}. However, payloads are loaded lazily, - so if the payload data for the current position is not needed, - this method may not be called at all for performance reasons.
- -
- the array into which the data of this payload is to be - stored, if it is big enough; otherwise, a new byte[] array - is allocated for this purpose. - - the offset in the array into which the data of this payload - is to be stored. - - a byte[] array containing the data of this payload - - IOException -
- - Checks if a payload can be loaded at this position. -

- Payloads can only be loaded once per call to - {@link #NextPosition()}. - -

- true if there is a payload available at this position that can be loaded - -
- - Base class for filtering {@link TermEnum} implementations. - - - - - - - - - Constructs a new runtime exception with null as its - detail message. The cause is not initialized, and may subsequently be - initialized by a call to {@link #innerException}. - - - - Constructs a new runtime exception with the specified cause and a - detail message of (cause==null ? null : cause.toString()) - (which typically contains the class and detail message of - cause). -

- This constructor is useful for runtime exceptions - that are little more than wrappers for other throwables. - -

- the cause (which is saved for later retrieval by the - {@link #InnerException()} method). (A null value is - permitted, and indicates that the cause is nonexistent or - unknown.) - - 1.4 - -
- - Constructs a new runtime exception with the specified detail message. - The cause is not initialized, and may subsequently be initialized by a - call to {@link #innerException}. - - - the detail message. The detail message is saved for - later retrieval by the {@link #getMessage()} method. - - - - Constructs a new runtime exception with the specified detail message and - cause.

Note that the detail message associated with - cause is not automatically incorporated in - this runtime exception's detail message. - -

- the detail message (which is saved for later retrieval - by the {@link #getMessage()} method). - - the cause (which is saved for later retrieval by the - {@link #InnerException()} method). (A null value is - permitted, and indicates that the cause is nonexistent or - unknown.) - - 1.4 - -
- - Holds buffered deletes, by docID, term or query. We - hold two instances of this class: one for the deletes - prior to the last flush, the other for deletes after - the last flush. This is so if we need to abort - (discard all buffered docs) we can also discard the - buffered deletes yet keep the deletes done during - previously flushed segments. - - - - - - - - - - Synonymous with {@link Field}. - -

WARNING: This interface may change within minor versions, despite Lucene's backward compatibility requirements. - This means new methods may be added from version to version. This change only affects the Fieldable API; other backwards - compatibility promises remain intact. For example, Lucene can still - read and write indices created within the same major version. -

- - -

-
- - Sets the boost factor for hits on this field. This value will be - multiplied into the score of all hits on this field of this - document. -

The boost is multiplied by {@link Lucene.Net.Documents.Document#GetBoost()} of the document - containing this field. If a document has multiple fields with the same - name, all such values are multiplied together. This product is then - used to compute the norm factor for the field. By - default, in the {@link - Lucene.Net.Search.Similarity#ComputeNorm(String, - FieldInvertState)} method, the boost value is multiplied - by the {@link - Lucene.Net.Search.Similarity#LengthNorm(String, - int)} and then rounded by {@link Lucene.Net.Search.Similarity#EncodeNorm(float)} before it is stored in the - index. One should attempt to ensure that this product does not overflow - the range of that encoding. - -

- - - - - - -
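For example (illustrative field and boost value, not from the original documentation):
-            Field title = new Field("title", "Lucene in Action",
-                                    Field.Store.YES, Field.Index.ANALYZED);
-            title.setBoost(2.0f);  // folded into this field's norm at index time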
- - Returns the boost factor for hits for this field. - -

The default value is 1.0. - -

Note: this value is not stored directly with the document in the index. - Documents returned from {@link Lucene.Net.Index.IndexReader#Document(int)} and - {@link Lucene.Net.Search.Hits#Doc(int)} may thus not have the same value present as when - this field was indexed. - -

- - -
- - Returns the name of the field as an interned string. - For example "date", "title", "body", ... - - - - The value of the field as a String, or null. -

- For indexing, if isStored()==true, the stringValue() will be used as the stored field value - unless isBinary()==true, in which case binaryValue() will be used. - - If isIndexed()==true and isTokenized()==false, this String value will be indexed as a single token. - If isIndexed()==true and isTokenized()==true, then tokenStreamValue() will be used to generate indexed tokens if not null, - else readerValue() will be used to generate indexed tokens if not null, else stringValue() will be used to generate tokens. -

-
- - The value of the field as a Reader, which can be used at index time to generate indexed tokens. - - - - - The value of the field in Binary, or null. - - - - - The TokenStream for this field to be used when indexing, or null. - - - - - True if the value of the field is to be stored in the index for return - with search hits. - - - - True if the value of the field is to be indexed, so that it may be - searched on. - - - - True if the value of the field should be tokenized as text prior to - indexing. Un-tokenized fields are indexed as a single word and may not be - Reader-valued. - - - - True if the value of the field is stored and compressed within the index - - - True if the term or terms used to index this field are stored as a term - vector, available from {@link Lucene.Net.Index.IndexReader#GetTermFreqVector(int,String)}. - These methods do not provide access to the original content of the field, - only to terms used to index it. If the original content must be - preserved, use the stored attribute instead. - - - - - - - True if terms are stored as term vector together with their offsets - (start and end positon in source text). - - - - True if terms are stored as term vector together with their token positions. - - - True if the value of the field is stored as binary - - - True if norms are omitted for this indexed field - - - Expert: - - If set, omit normalization factors associated with this indexed field. - This effectively disables indexing boosts and length normalization for this field. - - - - Renamed to {@link AbstractField#setOmitTermFreqAndPositions} - - - - Renamed to {@link AbstractField#getOmitTermFreqAndPositions} - - - - Indicates whether a Field is Lazy or not. The semantics of Lazy loading are such that if a Field is lazily loaded, retrieving - it's values via {@link #StringValue()} or {@link #BinaryValue()} is only valid as long as the {@link Lucene.Net.Index.IndexReader} that - retrieved the {@link Document} is still open. - - - true if this field can be loaded lazily - - - - Returns offset into byte[] segment that is used as value, if Field is not binary - returned value is undefined - - index of the first character in byte[] segment that represents this Field value - - - - Returns length of byte[] segment that is used as value, if Field is not binary - returned value is undefined - - length of byte[] segment that represents this Field value - - - - Return the raw byte[] for the binary field. Note that - you must also call {@link #getBinaryLength} and {@link - #getBinaryOffset} to know which range of bytes in this - returned array belong to the field. - - reference to the Field value as byte[]. - - - - Return the raw byte[] for the binary field. Note that - you must also call {@link #getBinaryLength} and {@link - #getBinaryOffset} to know which range of bytes in this - returned array belong to the field.

- About reuse: if you pass in the result byte[] and it is - used, likely the underlying implementation will hold - onto this byte[] and return it in future calls to - {@link #BinaryValue()} or {@link #GetBinaryValue()}. - So if you subsequently re-use the same byte[] elsewhere - it will alter this Fieldable's value. -

- User defined buffer that will be used if - possible. If this is null or not large enough, a new - buffer is allocated - - reference to the Field value as byte[]. - -
- - Sets the boost factor for hits on this field. This value will be - multiplied into the score of all hits on this field of this - document. -

- The boost is multiplied by {@link Lucene.Net.Documents.Document#GetBoost()} of the document - containing this field. If a document has multiple fields with the same - name, all such values are multiplied together. This product is then - used to compute the norm factor for the field. By - default, in the {@link - Lucene.Net.Search.Similarity#ComputeNorm(String, - FieldInvertState)} method, the boost value is multiplied - by the {@link - Lucene.Net.Search.Similarity#LengthNorm(String, - int)} and then - rounded by {@link Lucene.Net.Search.Similarity#EncodeNorm(float)} before it is stored in the - index. One should attempt to ensure that this product does not overflow - the range of that encoding. -

- - - - - - -
- - Returns the boost factor for hits for this field. - -

The default value is 1.0. - -

Note: this value is not stored directly with the document in the index. - Documents returned from {@link Lucene.Net.Index.IndexReader#Document(int)} and - {@link Lucene.Net.Search.Hits#Doc(int)} may thus not have the same value present as when - this field was indexed. - -

- - -
- - Returns the name of the field as an interned string. - For example "date", "title", "body", ... - - - - True iff the value of the field is to be stored in the index for return - with search hits. It is an error for this to be true if a field is - Reader-valued. - - - - True iff the value of the field is to be indexed, so that it may be - searched on. - - - - True iff the value of the field should be tokenized as text prior to - indexing. Un-tokenized fields are indexed as a single word and may not be - Reader-valued. - - - - True if the value of the field is stored and compressed within the index - - - True iff the term or terms used to index this field are stored as a term - vector, available from {@link Lucene.Net.Index.IndexReader#GetTermFreqVector(int,String)}. - These methods do not provide access to the original content of the field, - only to terms used to index it. If the original content must be - preserved, use the stored attribute instead. - - - - - - - True iff terms are stored as term vector together with their offsets - (start and end position in source text). - - - - True iff terms are stored as term vector together with their token positions. - - - True iff the value of the filed is stored as binary - - - Return the raw byte[] for the binary field. Note that - you must also call {@link #getBinaryLength} and {@link - #getBinaryOffset} to know which range of bytes in this - returned array belong to the field. - - reference to the Field value as byte[]. - - - - Returns length of byte[] segment that is used as value, if Field is not binary - returned value is undefined - - length of byte[] segment that represents this Field value - - - - Returns offset into byte[] segment that is used as value, if Field is not binary - returned value is undefined - - index of the first character in byte[] segment that represents this Field value - - - - True if norms are omitted for this indexed field - - - Renamed to {@link #getOmitTermFreqAndPositions} - - - - - - - - Expert: - - If set, omit normalization factors associated with this indexed field. - This effectively disables indexing boosts and length normalization for this field. - - - - Renamed to {@link #setOmitTermFreqAndPositions} - - - - Expert: - - If set, omit term freq, positions and payloads from - postings for this field. - -

NOTE: While this option reduces storage space - required in the index, it also means any query - requiring positional information, such as {@link - PhraseQuery} or {@link SpanQuery} subclasses will - silently fail to find results. -

-
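A sketch of the trade-off described above (illustrative field only, not from the original documentation):
-            Field id = new Field("id", "doc-42", Field.Store.YES, Field.Index.NOT_ANALYZED);
-            // Smaller postings, but PhraseQuery/SpanQuery on this field silently find nothing.
-            id.setOmitTermFreqAndPositions(true);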
- - Prints a Field for human consumption. - - - An Analyzer that uses {@link WhitespaceTokenizer}. - - - A simple class that stores Strings as char[]'s in a - hash table. Note that this is not a general purpose - class. For example, it cannot remove items from the - set, nor does it resize its hash table to be smaller, - etc. It is designed to be quick to test if a char[] - is in the set without the necessity of converting it - to a String first. - - - - Create set with enough capacity to hold startSize - terms - - - - Create set from a Collection of char[] or String - - - Create set from entries - - - true if the len chars of text starting at off - are in the set - - - - true if the System.String is in the set - - - Returns true if the String is in the set - - - Add this String into the set - - - Add this char[] directly to the set. - If ignoreCase is true for this Set, the text array will be directly modified. - The user should never modify this text array after calling this method. - - - - Returns an unmodifiable {@link CharArraySet}. This allows to provide - unmodifiable views of internal sets for "read-only" use. - - - a set for which the unmodifiable set is returned. - - an new unmodifiable {@link CharArraySet}. - - NullPointerException - if the given set is null. - - - - Adds all of the elements in the specified collection to this collection - - - Removes all elements from the set - - - Removes from this set all of its elements that are contained in the specified collection - - - Retains only the elements in this set that are contained in the specified collection - - - The Iterator<String> for this set. Strings are constructed on the fly, so - use nextCharArray for more efficient access. - - - - do not modify the returned char[] - - - Returns the next String, as a Set<String> would... - use nextCharArray() for better efficiency. - - - - Efficient unmodifiable {@link CharArraySet}. This implementation does not - delegate calls to a give {@link CharArraySet} like - {@link Collections#UnmodifiableSet(java.util.Set)} does. Instead is passes - the internal representation of a {@link CharArraySet} to a super - constructor and overrides all mutators. - - - - An "open" BitSet implementation that allows direct access to the array of words - storing the bits. -

- Unlike java.util.bitset, the fact that bits are packed into an array of longs - is part of the interface. This allows efficient implementation of other algorithms - by someone other than the author. It also allows one to efficiently implement - alternate serialization or interchange formats. -

- OpenBitSet is faster than java.util.BitSet in most operations - and *much* faster at calculating cardinality of sets and results of set operations. - It can also handle sets of larger cardinality (up to 64 * 2**32-1) -

- The goals of OpenBitSet are the fastest implementation possible, and - maximum code reuse. Extra safety and encapsulation - may always be built on top, but if that's built in, the cost can never be removed (and - hence people re-implement their own version in order to get better performance). - If you want a "safe", totally encapsulated (and slower and limited) BitSet - class, use java.util.BitSet. -

-

Performance Results

- - Test system: Pentium 4, Sun Java 1.5_06 -server -Xbatch -Xmx64M -
BitSet size = 1,000,000 -
Results are java.util.BitSet time divided by OpenBitSet time.
           cardinality  intersect_count  union  nextSetBit  get   iterator
 50% full  3.36         3.96             1.44   1.46        1.99  1.58
 1% full   3.31         3.90                    1.04              0.99
-
- Test system: AMD Opteron, 64 bit linux, Sun Java 1.5_06 -server -Xbatch -Xmx64M -
BitSet size = 1,000,000 -
Results are java.util.BitSet time divided by OpenBitSet time.
           cardinality  intersect_count  union  nextSetBit  get   iterator
 50% full  2.50         3.50             1.00   1.03        1.12  1.25
 1% full   2.51         3.49                    1.00              1.02
-
- $Id$ - -
- - Constructs an OpenBitSet large enough to hold numBits. - - - - - - - Constructs an OpenBitSet from an existing long[]. -
- The first 64 bits are in long[0], - with bit index 0 at the least significant bit, and bit index 63 at the most significant. - Given a bit index, - the word containing it is long[index/64], and it is at bit number index%64 within that word. -

- numWords are the number of elements in the array that contain - set bits (non-zero longs). - numWords should be <= bits.length, and - any existing words in the array at position >= numWords should be zero. - -

-
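As a worked illustration of that layout (a sketch, not from the original documentation): bit index 131 lives in word 131/64 = 2, at bit 131%64 = 3 within that word:
-            long[] words = new long[4];            // room for 256 bits
-            int index = 131;
-            int wordNum = index >> 6;              // index / 64  -> 2
-            long bitmask = 1L << (index & 0x3f);   // index % 64  -> bit 3 of that word
-            words[wordNum] |= bitmask;
-            OpenBitSet bits = new OpenBitSet(words, wordNum + 1);  // words 0..2 are in use
-            // bits.get(131) is now true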
- - This DocIdSet implementation is cacheable. - - - Returns the current capacity in bits (1 greater than the index of the last bit) - - - Returns the current capacity of this set. Included for - compatibility. This is *not* equal to {@link #cardinality} - - - - Returns true if there are no set bits - - - Expert: returns the long[] storing the bits - - - Expert: sets a new long[] to use as the bit storage - - - Expert: gets the number of longs in the array that are in use - - - Expert: sets the number of longs in the array that are in use - - - Returns true or false for the specified bit index. - - - Returns true or false for the specified bit index. - The index should be less than the OpenBitSet size - - - - Returns true or false for the specified bit index - - - Returns true or false for the specified bit index. - The index should be less than the OpenBitSet size. - - - - returns 1 if the bit is set, 0 if not. - The index should be less than the OpenBitSet size - - - - sets a bit, expanding the set size if necessary - - - Sets the bit at the specified index. - The index should be less than the OpenBitSet size. - - - - Sets the bit at the specified index. - The index should be less than the OpenBitSet size. - - - - Sets a range of bits, expanding the set size if necessary - - - lower index - - one-past the last bit to set - - - - clears a bit. - The index should be less than the OpenBitSet size. - - - - clears a bit. - The index should be less than the OpenBitSet size. - - - - clears a bit, allowing access beyond the current set size without changing the size. - - - Clears a range of bits. Clearing past the end does not change the size of the set. - - - lower index - - one-past the last bit to clear - - - - Clears a range of bits. Clearing past the end does not change the size of the set. - - - lower index - - one-past the last bit to clear - - - - Sets a bit and returns the previous value. - The index should be less than the OpenBitSet size. - - - - Sets a bit and returns the previous value. - The index should be less than the OpenBitSet size. - - - - flips a bit. - The index should be less than the OpenBitSet size. - - - - flips a bit. - The index should be less than the OpenBitSet size. - - - - flips a bit, expanding the set size if necessary - - - flips a bit and returns the resulting bit value. - The index should be less than the OpenBitSet size. - - - - flips a bit and returns the resulting bit value. - The index should be less than the OpenBitSet size. - - - - Flips a range of bits, expanding the set size if necessary - - - lower index - - one-past the last bit to flip - - - - the number of set bits - - - - Returns the popcount or cardinality of the intersection of the two sets. - Neither set is modified. - - - - Returns the popcount or cardinality of the union of the two sets. - Neither set is modified. - - - - Returns the popcount or cardinality of "a and not b" - or "intersection(a, not(b))". - Neither set is modified. - - - - Returns the popcount or cardinality of the exclusive-or of the two sets. - Neither set is modified. - - - - Returns the index of the first set bit starting at the index specified. - -1 is returned if there are no more set bits. - - - - Returns the index of the first set bit starting at the index specified. - -1 is returned if there are no more set bits. - - - - this = this AND other - - - this = this OR other - - - Remove all elements set in other. 
this = this AND_NOT other - - - this = this XOR other - - - returns true if the sets have any elements in common - - - Expand the long[] with the size given as a number of words (64 bit longs). - getNumWords() is unchanged by this call. - - - - Ensure that the long[] is big enough to hold numBits, expanding it if necessary. - getNumWords() is unchanged by this call. - - - - Lowers numWords, the number of words in use, - by checking for trailing zero words. - - - - returns the number of 64 bit words it would take to hold numBits - - - returns true if both sets have the same bits set - - - Implements {@link LockFactory} for a single in-process instance, - meaning all locking will take place through this one instance. - Only use this {@link LockFactory} when you are certain all - IndexReaders and IndexWriters for a given index are running - against a single shared in-process Directory instance. This is - currently the default locking for RAMDirectory. - - - - - - -

Base class for Locking implementation. {@link Directory} uses - instances of this class to implement locking.

- -

Note that there are some useful tools to verify that - your LockFactory is working correctly: {@link - VerifyingLockFactory}, {@link LockStressTest}, {@link - LockVerifyServer}.

- -

- - - - - - -
- - Set the prefix in use for all locks created in this - LockFactory. This is normally called once, when a - Directory gets this LockFactory instance. However, you - can also call this (after this instance is assigned to - a Directory) to override the prefix in use. This - is helpful if you're running Lucene on machines that - have different mount points for the same shared - directory. - - - - Get the prefix in use for all locks created in this LockFactory. - - - Return a new Lock instance identified by lockName. - name of the lock to be created. - - - - Attempt to clear (forcefully unlock and remove) the - specified lock. Only call this at a time when you are - certain this lock is no longer in use. - - name of the lock to be cleared. - - - - An interprocess mutex lock. -

Typical use might look like:

-            new Lock.With(directory.makeLock("my.lock")) {
-              public Object doBody() {
-                ... code to execute while locked ...
-              }
-            }.run();
-            
- - -
- $Id: Lock.java 769409 2009-04-28 14:05:43Z mikemccand $ - - - -
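Besides the Lock.With helper shown above, a lock can also be managed directly; a sketch (directory is an assumed Directory instance):
-            Lock lock = directory.makeLock("my.lock");
-            if (lock.obtain()) {            // immediate attempt; false if already held
-              try {
-                // ... code that requires exclusive access ...
-              } finally {
-                lock.release();
-              }
-            }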
- - Pass this value to {@link #Obtain(long)} to try - forever to obtain the lock. - - - - How long {@link #Obtain(long)} waits, in milliseconds, - in between attempts to acquire the lock. - - - - Attempts to obtain exclusive access and immediately return - upon success or failure. - - true iff exclusive access is obtained - - - - If a lock obtain called, this failureReason may be set - with the "root cause" Exception as to why the lock was - not obtained. - - - - Attempts to obtain an exclusive lock within amount of - time given. Polls once per {@link #LOCK_POLL_INTERVAL} - (currently 1000) milliseconds until lockWaitTimeout is - passed. - - length of time to wait in - milliseconds or {@link - #LOCK_OBTAIN_WAIT_FOREVER} to retry forever - - true if lock was obtained - - LockObtainFailedException if lock wait times out - IllegalArgumentException if lockWaitTimeout is - out of bounds - - IOException if obtain() throws IOException - - - Releases exclusive access. - - - Returns true if the resource is currently locked. Note that one must - still call {@link #Obtain()} before using the resource. - - - - Utility class for executing code with exclusive access. - - - Constructs an executor that will grab the named lock. - - - Code to execute with exclusive access. - - - Calls {@link #doBody} while lock is obtained. Blocks if lock - cannot be obtained immediately. Retries to obtain lock once per second - until it is obtained, or until it has tried ten times. Lock is released when - {@link #doBody} exits. - - LockObtainFailedException if lock could not - be obtained - - IOException if {@link Lock#obtain} throws IOException - - - A {@link HitCollector} implementation that collects the top-scoring - documents, returning them as a {@link TopDocs}. This is used by {@link - IndexSearcher} to implement {@link TopDocs}-based search. - -

This may be extended, overriding the collect method to, e.g., - conditionally invoke super() in order to filter which - documents are collected. - -

- Please use {@link TopScoreDocCollector} - instead, which has better performance. - - -
- - Lower-level search API.
- HitCollectors are primarily meant to be used to implement queries, sorting - and filtering. See {@link Collector} for a lower level and higher performance - (on a multi-segment index) API. - -
- - - $Id: HitCollector.java 764551 2009-04-13 18:33:56Z mikemccand $ - - Please use {@link Collector} instead. - -
- - Called once for every document matching a query, with the document - number and its raw score. - -

If, for example, an application wished to collect all of the hits for a - query in a BitSet, then it might:

-            Searcher searcher = new IndexSearcher(indexReader);
-            final BitSet bits = new BitSet(indexReader.maxDoc());
-            searcher.search(query, new HitCollector() {
-              public void collect(int doc, float score) {
-                bits.set(doc);
-              }
-            });
-            
- -

Note: This is called in an inner search loop. For good search - performance, implementations of this method should not call - {@link Searcher#Doc(int)} or - {@link Lucene.Net.Index.IndexReader#Document(int)} on every - document number encountered. Doing so can slow searches by an order - of magnitude or more. -

Note: The score passed to this method is a raw score. - In other words, the score will not necessarily be a float whose value is - between 0 and 1. -

-
- - The total number of hits the collector encountered. - - - The priority queue which holds the top-scoring documents. - - - Construct to collect a given number of hits. - the maximum number of hits to collect - - - - use TopDocCollector(hq) instead. numHits is not used by this - constructor. It will be removed in a future release. - - - - Constructor to collect the top-scoring documents by using the given PQ. - the PQ to use by this instance. - - - - The total number of documents that matched this query. - - - The top-scoring hits. - - - Encapsulates sort criteria for returned hits. - -

The fields used to determine sort order must be carefully chosen. - Documents must contain a single term in such a field, - and the value of the term should indicate the document's relative position in - a given sort order. The field must be indexed, but should not be tokenized, - and does not need to be stored (unless you happen to want it back with the - rest of your document data). In other words: - -

document.add (new Field ("byNumber", Integer.toString(x), Field.Store.NO, Field.Index.NOT_ANALYZED));

- - -

Valid Types of Values

- -

There are four possible kinds of term values which may be put into - sorting fields: Integers, Longs, Floats, or Strings. Unless - {@link SortField SortField} objects are specified, the type of value - in the field is determined by parsing the first term in the field. - -

Integer term values should contain only digits and an optional - preceding negative sign. Values must be base 10 and in the range - Integer.MIN_VALUE and Integer.MAX_VALUE inclusive. - Documents which should appear first in the sort - should have low value integers, later documents high values - (i.e. the documents should be numbered 1..n where - 1 is the first and n the last). - -

Long term values should contain only digits and an optional preceding negative sign. Values must be base 10 and in the range Long.MIN_VALUE and Long.MAX_VALUE inclusive. Documents which should appear first in the sort should have low values, later documents high values.

Float term values should conform to values accepted by - {@link Float Float.valueOf(String)} (except that NaN - and Infinity are not supported). - Documents which should appear first in the sort - should have low values, later documents high values. - -

String term values can contain any valid String, but should not be tokenized. The values are sorted according to their {@link Comparable natural order}. Note that using this type of term value has higher memory requirements than the other types.

Object Reuse

- -

One of these objects can be - used multiple times and the sort order changed between usages. - -

This class is thread safe. - -

Memory Usage

- -

Sorting uses caches of term values maintained by the internal HitQueue(s). The cache is static and contains an integer or float array of length IndexReader.maxDoc() for each field name for which a sort is performed. In other words, the size of the cache in bytes is:

4 * IndexReader.maxDoc() * (# of different fields actually used to sort) - -

For String fields, the cache is larger: in addition to the - above array, the value of every term in the field is kept in memory. - If there are many unique terms in the field, this could - be quite large. - -

Note that the size of the cache is not affected by how many - fields are in the index and might be used to sort - only by - the ones actually used to sort a result set. - -

Created: Feb 12, 2004 10:53:57 AM - -

- lucene 1.4 - - $Id: Sort.java 795179 2009-07-17 18:23:30Z mikemccand $ - -
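For illustration, a search sorted by such a field, with relevance as the tie-breaker, might be issued as in the sketch below (the field name "byNumber" and the searcher/query variables are hypothetical):
-            Sort sort = new Sort(new SortField[] {
-            new SortField("byNumber", SortField.INT),   // primary: numeric field, ascending
-            SortField.FIELD_SCORE });                   // secondary: relevance
-            TopFieldDocs docs = searcher.search(query, null, 10, sort);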
- - Represents sorting by computed relevance. Using this sort criteria returns - the same results as calling - {@link Searcher#Search(Query) Searcher#search()}without a sort criteria, - only with slightly more overhead. - - - - Represents sorting by index order. - - - Sorts by computed relevance. This is the same sort criteria as calling - {@link Searcher#Search(Query) Searcher#search()}without a sort criteria, - only with slightly more overhead. - - - - Sorts by the terms in field then by index order (document - number). The type of value in field is determined - automatically. - - - - - Please specify the type explicitly by - first creating a {@link SortField} and then use {@link - #Sort(SortField)} - - - - Sorts possibly in reverse by the terms in field then by - index order (document number). The type of value in field is - determined automatically. - - - - - Please specify the type explicitly by - first creating a {@link SortField} and then use {@link - #Sort(SortField)} - - - - Sorts in succession by the terms in each field. The type of value in - field is determined automatically. - - - - - Please specify the type explicitly by - first creating {@link SortField}s and then use {@link - #Sort(SortField[])} - - - - Sorts by the criteria in the given SortField. - - - Sorts in succession by the criteria in each SortField. - - - Sets the sort to the terms in field then by index order - (document number). - - Please specify the type explicitly by - first creating a {@link SortField} and then use {@link - #SetSort(SortField)} - - - - Sets the sort to the terms in field possibly in reverse, - then by index order (document number). - - Please specify the type explicitly by - first creating a {@link SortField} and then use {@link - #SetSort(SortField)} - - - - Sets the sort to the terms in each field in succession. - Please specify the type explicitly by - first creating {@link SortField}s and then use {@link - #SetSort(SortField[])} - - - - Sets the sort to the given criteria. - - - Sets the sort to the given criteria in succession. - - - Representation of the sort criteria. - Array of SortField objects used in this sort criteria - - - - Returns true if o is equal to this. - - - Returns a hash code value for this object. - - - Calculate the final score as the average score of all payloads seen. -

- Is thread safe and completely reusable. - - -

-
- - An abstract class that defines a way for Payload*Query instances - to transform the cumulative effects of payload scores for a document. - - - for more information - -

- This class and its derivations are experimental and subject to change - - - - - -

Calculate the score up to this point for this doc and field - The current doc - - The field - - The start position of the matching Span - - The end position of the matching Span - - The number of payloads seen so far - - The current score so far - - The score for the current payload - - The new current Score - - - - -
- - Calculate the final score for all the payloads seen so far for this doc/field - The current doc - - The current field - - The total number of payloads seen on this document - - The raw score for those payloads - - The final score for the payloads - - - - A query that scores each document as the value of the numeric input field. -

- The query matches all documents, and scores each document according to the numeric - value of that field. -

- It is assumed, and expected, that: -

  • The field used here is indexed, and has exactly one token in every scored document.
  • Best if this field is un_tokenized.
  • That token is parsable to the selected type.

- Combining this query in a FunctionQuery allows much freedom in affecting document scores. Note that with this freedom comes responsibility: it is more than likely that the default Lucene scoring is superior in quality to scoring modified as explained here. However, in some cases, and certainly for research experiments, this capability may prove useful.

- When constructing this query, select the appropriate type. That type should match the data stored in the field. So in fact the "right" type should be selected before indexing. Type selection affects RAM usage:

  • {@link Type#BYTE} consumes 1 * maxDocs bytes.
  • {@link Type#SHORT} consumes 2 * maxDocs bytes.
  • {@link Type#INT} consumes 4 * maxDocs bytes.
  • {@link Type#FLOAT} consumes 8 * maxDocs bytes.

- Caching: - Values for the numeric field are loaded once and cached in memory for further use with the same IndexReader. - To take advantage of this, it is extremely important to reuse index-readers or index-searchers, - otherwise, for instance if for each query a new index reader is opened, large penalties would be - paid for loading the field values into memory over and over again! - -

- WARNING: The status of the Search.Function package is experimental. - The APIs introduced here might change in the future and will not be - supported anymore in such a case. -

-
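A minimal usage sketch, assuming a hypothetical numeric field named "popularity" that was indexed un_tokenized with exactly one token per document:
-            FieldScoreQuery scoreQuery = new FieldScoreQuery("popularity", FieldScoreQuery.Type.FLOAT);
-            TopDocs docs = searcher.search(scoreQuery, 10);   // matches every document; score is the field value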
- - Expert: A Query that sets the scores of document to the - values obtained from a {@link Lucene.Net.Search.Function.ValueSource ValueSource}. -

- This query provides a score for each and every undeleted document in the index. -

- The value source can be based on a (cached) value of an indexed field, but it - can also be based on an external source, e.g. values read from an external database. -

- Score is set as: Score(doc,query) = query.getBoost()2 * valueSource(doc). - -

- WARNING: The status of the Search.Function package is experimental. - The APIs introduced here might change in the future and will not be - supported anymore in such a case. -

-
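For example (a sketch only; the field name is hypothetical), a cached field value source from this package can be wrapped directly:
-            ValueSource source = new IntFieldSource("popularity");   // values come from the FieldCache
-            ValueSourceQuery query = new ValueSourceQuery(source);
-            query.setBoost(2.0f);                                     // score(doc) = boost^2 * valueSource(doc)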
- - Create a value source query - provides the values defines the function to be used for scoring - - - - Returns true if o is equal to this. - - - Returns a hash code value for this object. - - - A scorer that (simply) matches all documents, and scores each document with - the value of the value soure in effect. As an example, if the value source - is a (cached) field source, then value of that field in that document will - be used. (assuming field is indexed for this doc, with a single token.) - - - - use {@link #NextDoc()} instead. - - - - use {@link #DocID()} instead. - - - - use {@link #Advance(int)} instead. - - - - Create a FieldScoreQuery - a query that scores each document as the value of the numeric input field. -

- The type param tells how to parse the field string values into a numeric score value. -

- the numeric field to be used. - - the type of the field: either - {@link Type#BYTE}, {@link Type#SHORT}, {@link Type#INT}, or {@link Type#FLOAT}. - -
- - Type of score field, indicating how field values are interpreted/parsed. -

- The type selected at search time should match the data stored in the field. Different types have different RAM requirements:

  • {@link #BYTE} consumes 1 * maxDocs bytes.
  • {@link #SHORT} consumes 2 * maxDocs bytes.
  • {@link #INT} consumes 4 * maxDocs bytes.
  • {@link #FLOAT} consumes 8 * maxDocs bytes.
-
- - field values are interpreted as numeric byte values. - - - field values are interpreted as numeric short values. - - - field values are interpreted as numeric int values. - - - field values are interpreted as numeric float values. - - - Abstract decorator class for a DocIdSet implementation - that provides on-demand filtering/validation - mechanism on a given DocIdSet. - -

- - Technically, this same functionality could be achieved - with ChainedFilter (under contrib/misc), however the - benefit of this class is it never materializes the full - bitset for the filter. Instead, the {@link #match} - method is invoked on-demand, per docID visited during - searching. If you know few docIDs will be visited, and - the logic behind {@link #match} is relatively costly, - this may be a better way to filter than ChainedFilter. - -

- - -
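A sketch of the intended usage, where the per-docID test stands in for arbitrary application logic (innerSet and isAcceptable are hypothetical):
-            DocIdSet filtered = new FilteredDocIdSet(innerSet) {
-            protected boolean match(int docid) {
-            return isAcceptable(docid);   // invoked lazily, only for docIDs actually visited
-            }
-            };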
- - Constructor. - Underlying DocIdSet - - - - This DocIdSet implementation is cacheable if the inner set is cacheable. - - - Validation method to determine whether a docid should be in the result set. - docid to be tested - - true if input docid should be in the result set, false otherwise. - - - - Implementation of the contract to build a DocIdSetIterator. - - - - - - - Abstract decorator class of a DocIdSetIterator - implementation that provides on-demand filter/validation - mechanism on an underlying DocIdSetIterator. See {@link - FilteredDocIdSet}. - - - - Constructor. - Underlying DocIdSetIterator. - - - - Validation method to determine whether a docid should be in the result set. - docid to be tested - - true if input docid should be in the result set, false otherwise. - - - - - - use {@link #DocID()} instead. - - - - use {@link #NextDoc()} instead. - - - - use {@link #Advance(int)} instead. - - - - This interface is obsolete, use {@link FieldCache} instead. - - - Use {@link FieldCache}, this will be removed in Lucene 3.0 - - - - - Use {@link FieldCache#DEFAULT}; this will be removed in Lucene 3.0 - - - - Use {@link FieldCache.LongParser}, this will be removed in Lucene 3.0 - - - - Interface to parse long from document fields. - - - Use {@link FieldCache.LongParser}, this will be removed in Lucene 3.0 - - - - Marker interface as super-interface to all parsers. It - is used to specify a custom parser to {@link - SortField#SortField(String, FieldCache.Parser)}. - - - - Return an long representation of this field's value. - - - Use {@link FieldCache.DoubleParser}, this will be removed in Lucene 3.0 - - - - Interface to parse doubles from document fields. - - - Use {@link FieldCache.DoubleParser}, this will be removed in Lucene 3.0 - - - - Return an long representation of this field's value. - - - Expert: Describes the score computation for document and query. - - - Indicates whether or not this Explanation models a good match. - -

- By default, an Explanation represents a "match" if the value is positive. -

-

- - -
- - The value assigned to this explanation node. - - - Sets the value assigned to this explanation node. - - - A description of this explanation node. - - - Sets the description of this explanation node. - - - A short one line summary which should contain all high level - information about this Explanation, without the "Details" - - - - The sub-nodes of this explanation node. - - - Adds a sub-node to this explanation node. - - - Render an explanation as text. - - - Render an explanation as HTML. - - - Small Util class used to pass both an idf factor as well as an - explanation for that factor. - - This class will likely be held on a {@link Weight}, so be aware - before storing any large or un-serializable fields. - - - - - the idf factor - - - - This should be calculated lazily if possible. - - - the explanation for the idf factor. - - - - Expert: Scoring functionality for phrase queries. -
A document is considered matching if it contains the phrase-query terms at "valid" positions. What "valid positions" are depends on the type of the phrase query: for an exact phrase query terms are required to appear in adjacent locations, while for a sloppy phrase query some distance between the terms is allowed. The abstract method {@link #PhraseFreq()} of extending classes is invoked for each document containing all the phrase query terms, in order to compute the frequency of the phrase query in that document. A non-zero frequency means a match.
-
- - use {@link #DocID()} instead. - - - - use {@link #NextDoc()} instead. - - - - use {@link #Advance(int)} instead. - - - - For a document containing all the phrase query terms, compute the - frequency of the phrase in that document. - A non zero frequency means a match. -
Note that containing all phrase terms does not guarantee a match - they have to be found in matching locations.
- frequency of the phrase in current doc, 0 if not found. - -
- - Expert: Default scoring implementation. - - - Expert: Scoring API. -

Subclasses implement search scoring. - -

The score of query q for document d correlates to the - cosine-distance or dot-product between document and query vectors in a - - Vector Space Model (VSM) of Information Retrieval. - A document whose vector is closer to the query vector in that model is scored higher. - - The score is computed as follows: - -

-            score(q,d)  =  coord(q,d) · queryNorm(q) · Σ ( tf(t in d) · idf(t)² · t.getBoost() · norm(t,d) )   summed over each term t in q

where -

    -
  1. - - tf(t in d) - correlates to the term's frequency, - defined as the number of times term t appears in the currently scored document d. - Documents that have more occurrences of a given term receive a higher score. - The default computation for tf(t in d) in - {@link Lucene.Net.Search.DefaultSimilarity#Tf(float) DefaultSimilarity} is: - -
     
-            {@link Lucene.Net.Search.DefaultSimilarity#Tf(float) tf(t in d)}  =  frequency½
     
  2. idf(t) stands for Inverse Document Frequency. This value correlates to the inverse of docFreq (the number of documents in which the term t appears). This means rarer terms give higher contribution to the total score. The default computation for idf(t) in {@link Lucene.Net.Search.DefaultSimilarity#Idf(int, int) DefaultSimilarity} is:
     
-            {@link Lucene.Net.Search.DefaultSimilarity#Idf(int, int) idf(t)}  =  1 + log ( numDocs / (docFreq + 1) )
  3. coord(q,d) is a score factor based on how many of the query terms are found in the specified document. Typically, a document that contains more of the query's terms will receive a higher score than another document with fewer query terms. This is a search time factor computed in {@link #Coord(int, int) coord(q,d)} by the Similarity in effect at search time.
     
  4. queryNorm(q) is a normalizing factor used to make scores between queries comparable. This factor does not affect document ranking (since all ranked documents are multiplied by the same factor), but rather just attempts to make scores from different queries (or even different indexes) comparable. This is a search time factor computed by the Similarity in effect at search time. The default computation in {@link Lucene.Net.Search.DefaultSimilarity#QueryNorm(float) DefaultSimilarity} is:
     
-            queryNorm(q)  =  {@link Lucene.Net.Search.DefaultSimilarity#QueryNorm(float) queryNorm(sumOfSquaredWeights)}  =  1 / sumOfSquaredWeights½
     
    - - The sum of squared weights (of the query terms) is - computed by the query {@link Lucene.Net.Search.Weight} object. - For example, a {@link Lucene.Net.Search.BooleanQuery boolean query} - computes this value as: - -
     
-            {@link Lucene.Net.Search.Weight#SumOfSquaredWeights() sumOfSquaredWeights}  =  {@link Lucene.Net.Search.Query#GetBoost() q.getBoost()}² · Σ ( idf(t) · t.getBoost() )²   summed over each term t in q
     
    - -
  5. t.getBoost() is a search time boost of term t in the query q as specified in the query text (see query syntax), or as set by application calls to {@link Lucene.Net.Search.Query#SetBoost(float) setBoost()}. Notice that there is really no direct API for accessing the boost of one term in a multi-term query; rather, multi terms are represented in a query as multiple {@link Lucene.Net.Search.TermQuery TermQuery} objects, and so the boost of a term in the query is accessible by calling the sub-query {@link Lucene.Net.Search.Query#GetBoost() getBoost()}.
     
  6. norm(t,d) encapsulates a few (indexing time) boost and length factors:
    • Document boost - set by calling {@link Lucene.Net.Documents.Document#SetBoost(float) doc.setBoost()} before adding the document to the index.
    • Field boost - set by calling {@link Lucene.Net.Documents.Fieldable#SetBoost(float) field.setBoost()} before adding the field to a document.
    • {@link #LengthNorm(String, int) lengthNorm(field)} - computed when the document is added to the index in accordance with the number of tokens of this field in the document, so that shorter fields contribute more to the score. LengthNorm is computed by the Similarity class in effect at indexing.

    - When a document is added to the index, all the above factors are multiplied. - If the document has multiple fields with the same name, all their boosts are multiplied together: - -
     
-            norm(t,d)  =  {@link Lucene.Net.Documents.Document#GetBoost() doc.getBoost()} · {@link #LengthNorm(String, int) lengthNorm(field)} · ∏ {@link Lucene.Net.Documents.Fieldable#GetBoost() f.getBoost()}   taken over each field f in d named as t
     
- However, the resulting norm value is {@link #EncodeNorm(float) encoded} as a single byte before being stored. At search time, the norm byte value is read from the index {@link Lucene.Net.Store.Directory directory} and {@link #DecodeNorm(byte) decoded} back to a float norm value. This encoding/decoding, while reducing index size, comes with the price of precision loss - it is not guaranteed that decode(encode(x)) = x. For instance, decode(encode(0.89)) = 0.75. Also notice that search time is too late to modify this norm part of scoring, e.g. by using a different {@link Similarity} for search.
     
    -

- -
- - - - - - -
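As a small, hedged example of plugging into this API, a Similarity that flattens length normalization while inheriting everything else from the defaults could be sketched as follows (method names follow the 2.9-era API):
-            public class FlatLengthSimilarity extends DefaultSimilarity {
-            // treat every field as if it had the same length
-            public float lengthNorm(String fieldName, int numTokens) {
-            return 1.0f;
-            }
-            }
-            // then, before indexing and searching:
-            Similarity.setDefault(new FlatLengthSimilarity());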
- - Set the default Similarity implementation used by indexing and search - code. - - - - - - - - - Return the default Similarity implementation used by indexing and search - code. - -

This is initially an instance of {@link DefaultSimilarity}. - -

- - - - -
- - Cache of decoded bytes. - - - Decodes a normalization factor stored in an index. - - - - - Returns a table for decoding normalization bytes. - - - - - Compute the normalization value for a field, given the accumulated - state of term processing for this field (see {@link FieldInvertState}). - -

Implementations should calculate a float value based on the field - state and then return that value. - -

For backward compatibility this method by default calls - {@link #LengthNorm(String, int)} passing - {@link FieldInvertState#GetLength()} as the second argument, and - then multiplies this value by {@link FieldInvertState#GetBoost()}.

- -

WARNING: This API is new and experimental and may - suddenly change.

- -

- field name - - current processing state for this field - - the calculated float norm - -
- - Computes the normalization value for a field given the total number of - terms contained in a field. These values, together with field boosts, are - stored in an index and multipled into scores for hits on each field by the - search code. - -

Matches in longer fields are less precise, so implementations of this - method usually return smaller values when numTokens is large, - and larger values when numTokens is small. - -

Note that the return values are computed under - {@link Lucene.Net.Index.IndexWriter#AddDocument(Lucene.Net.Documents.Document)} - and then stored using - {@link #EncodeNorm(float)}. - Thus they have limited precision, and documents - must be re-indexed if this method is altered. - -

- the name of the field - - the total number of tokens contained in fields named - fieldName of doc. - - a normalization factor for hits on this field of this document - - - - -
- - Computes the normalization value for a query given the sum of the squared - weights of each of the query terms. This value is then multipled into the - weight of each query term. - -

This does not affect ranking, but rather just attempts to make scores - from different queries comparable. - -

- the sum of the squares of query term weights - - a normalization factor for query weights - -
- - Encodes a normalization factor for storage in an index. - -

The encoding uses a three-bit mantissa, a five-bit exponent, and - the zero-exponent point at 15, thus - representing values from around 7x10^9 to 2x10^-9 with about one - significant decimal digit of accuracy. Zero is also represented. - Negative numbers are rounded up to zero. Values too large to represent - are rounded down to the largest representable value. Positive values too - small to represent are rounded up to the smallest positive representable - value. - -

- - - - -
- - Computes a score factor based on a term or phrase's frequency in a - document. This value is multiplied by the {@link #Idf(Term, Searcher)} - factor for each term in the query and these products are then summed to - form the initial score for a document. - -

Terms and phrases repeated in a document indicate the topic of the - document, so implementations of this method usually return larger values - when freq is large, and smaller values when freq - is small. - -

The default implementation calls {@link #Tf(float)}. - -

- the frequency of a term within a document - - a score factor based on a term's within-document frequency - -
- - Computes the amount of a sloppy phrase match, based on an edit distance. - This value is summed for each sloppy phrase match in a document to form - the frequency that is passed to {@link #Tf(float)}. - -

A phrase match with a small edit distance to a document passage more - closely matches the document, so implementations of this method usually - return larger values when the edit distance is small and smaller values - when it is large. - -

- - - the edit distance of this sloppy phrase match - - the frequency increment for this match - -
- - Computes a score factor based on a term or phrase's frequency in a - document. This value is multiplied by the {@link #Idf(Term, Searcher)} - factor for each term in the query and these products are then summed to - form the initial score for a document. - -

Terms and phrases repeated in a document indicate the topic of the - document, so implementations of this method usually return larger values - when freq is large, and smaller values when freq - is small. - -

- the frequency of a term within a document - - a score factor based on a term's within-document frequency - -
- - Computes a score factor for a simple term. - -

The default implementation is:

-            return idf(searcher.docFreq(term), searcher.maxDoc());
-            
- - Note that {@link Searcher#MaxDoc()} is used instead of - {@link Lucene.Net.Index.IndexReader#NumDocs()} because it is proportional to - {@link Searcher#DocFreq(Term)} , i.e., when one is inaccurate, - so is the other, and in the same direction. - -
- the term in question - - the document collection being searched - - a score factor for the term - - see {@link #IdfExplain(Term, Searcher)} - -
- - Computes a score factor for a simple term and returns an explanation - for that score factor. - -

- The default implementation uses: - -

-            idf(searcher.docFreq(term), searcher.maxDoc());
-            
- - Note that {@link Searcher#MaxDoc()} is used instead of - {@link Lucene.Net.Index.IndexReader#NumDocs()} because it is - proportional to {@link Searcher#DocFreq(Term)} , i.e., when one is - inaccurate, so is the other, and in the same direction. - -
- the term in question - - the document collection being searched - - an IDFExplain object that includes both an idf score factor - and an explanation for the term. - - IOException -
- - Computes a score factor for a phrase. - -

The default implementation sums the {@link #Idf(Term,Searcher)} factor - for each term in the phrase. - -

- the terms in the phrase - - the document collection being searched - - idf score factor - - see {@link #idfExplain(Collection, Searcher)} - -
- - Computes a score factor for a phrase. - -

- The default implementation sums the idf factor for - each term in the phrase. - -

- the terms in the phrase - - the document collection being searched - - an IDFExplain object that includes both an idf - score factor for the phrase and an explanation - for each term. - - IOException -
- - Computes a score factor based on a term's document frequency (the number - of documents which contain the term). This value is multiplied by the - {@link #Tf(int)} factor for each term in the query and these products are - then summed to form the initial score for a document. - -

Terms that occur in fewer documents are better indicators of topic, so - implementations of this method usually return larger values for rare terms, - and smaller values for common terms. - -

- the number of documents which contain the term - - the total number of documents in the collection - - a score factor based on the term's document frequency - -
- - Computes a score factor based on the fraction of all query terms that a - document contains. This value is multiplied into scores. - -

The presence of a large portion of the query terms indicates a better - match with the query, so implementations of this method usually return - larger values when the ratio between these parameters is large and smaller - values when the ratio between them is small. - -

- the number of query terms matched in the document - - the total number of terms in the query - - a score factor based on term overlap with the query - -
- - Calculate a scoring factor based on the data in the payload. Overriding implementations - are responsible for interpreting what is in the payload. Lucene makes no assumptions about - what is in the byte array. -

- The default implementation returns 1. - -

- The fieldName of the term this payload belongs to - - The payload byte array to be scored - - The offset into the payload array - - The length in the array - - An implementation dependent float to be used as a scoring factor - - - See {@link #ScorePayload(int, String, int, int, byte[], int, int)} - -
- - Calculate a scoring factor based on the data in the payload. Overriding implementations - are responsible for interpreting what is in the payload. Lucene makes no assumptions about - what is in the byte array. -

- The default implementation returns 1. - -

- The docId currently being scored. If this value is {@link #NO_DOC_ID_PROVIDED}, then it should be assumed that the PayloadQuery implementation does not provide document information - - The fieldName of the term this payload belongs to - - The start position of the payload - - The end position of the payload - - The payload byte array to be scored - - The offset into the payload array - - The length in the array - - An implementation dependent float to be used as a scoring factor - - -
- - Remove this when old API is removed! - - - - Remove this when old API is removed! - - - - Remove this when old API is removed! - - - - The Similarity implementation used by default. - TODO: move back to top when old API is removed! - - - - - Remove this when old API is removed! - - - - Remove this when old API is removed! - - - - Remove this when old API is removed! - - - - Implemented as - state.getBoost()*lengthNorm(numTerms), where - numTerms is {@link FieldInvertState#GetLength()} if {@link - #setDiscountOverlaps} is false, else it's {@link - FieldInvertState#GetLength()} - {@link - FieldInvertState#GetNumOverlap()}. - -

WARNING: This API is new and experimental, and may suddenly - change.

-

-
- - Implemented as 1/sqrt(numTerms). - - - Implemented as 1/sqrt(sumOfSquaredWeights). - - - Implemented as sqrt(freq). - - - Implemented as 1 / (distance + 1). - - - Implemented as log(numDocs/(docFreq+1)) + 1. - - - Implemented as overlap / maxOverlap. - - - Determines whether overlap tokens (Tokens with - 0 position increment) are ignored when computing - norm. By default this is false, meaning overlap - tokens are counted just like non-overlap tokens. - -

WARNING: This API is new and experimental, and may suddenly - change.

- -

- - -
- - - - - - An efficient implementation of JavaCC's CharStream interface.

Note that - this does not do line-number counting, but instead keeps track of the - character position of the token in the input, as required by Lucene's {@link - Lucene.Net.Analysis.Token} API. - -

-
- - This interface describes a character stream that maintains line and - column number positions of the characters. It also has the capability - to backup the stream to some extent. An implementation of this - interface is used in the TokenManager implementation generated by - JavaCCParser. - - All the methods except backup can be implemented in any fashion. backup - needs to be implemented correctly for the correct operation of the lexer. - Rest of the methods are all used to get information like line number, - column number and the String that constitutes a token and are not used - by the lexer. Hence their implementation won't affect the generated lexer's - operation. - - - - Returns the next character from the selected input. The method - of selecting the input is the responsibility of the class - implementing this interface. Can throw any java.io.IOException. - - - - Returns the column position of the character last read. - - - - - - - Returns the line number of the character last read. - - - - - - - Returns the column number of the last character for current token (being - matched after the last call to BeginTOken). - - - - Returns the line number of the last character for current token (being - matched after the last call to BeginTOken). - - - - Returns the column number of the first character for current token (being - matched after the last call to BeginTOken). - - - - Returns the line number of the first character for current token (being - matched after the last call to BeginTOken). - - - - Backs up the input stream by amount steps. Lexer calls this method if it - had already read some characters, but could not use them to match a - (longer) token. So, they will be used again as the prefix of the next - token and it is the implemetation's responsibility to do this right. - - - - Returns the next character that marks the beginning of the next token. - All characters must remain in the buffer between two successive calls - to this method to implement backup correctly. - - - - Returns a string made up of characters from the marked token beginning - to the current buffer position. Implementations have the choice of returning - anything that they want to. For example, for efficiency, one might decide - to just return null, which is a valid implementation. - - - - Returns an array of characters that make up the suffix of length 'len' for - the currently matched token. This is used to build up the matched string - for use in actions in the case of MORE. A simple and inefficient - implementation of this is as follows : - - { - String t = GetImage(); - return t.substring(t.length() - len, t.length()).toCharArray(); - } - - - - The lexer calls this function to indicate that it is done with the stream - and hence implementations can free any resources held by this class. - Again, the body of this function can be just empty and it will not - affect the lexer's operation. - - - - Constructs from a Reader. - - - This exception is thrown when an {@link IndexReader} - tries to make changes to the index (via {@link - IndexReader#deleteDocument}, {@link - IndexReader#undeleteAll} or {@link IndexReader#setNorm}) - but changes have already been committed to the index - since this reader was instantiated. When this happens - you must open a new reader on the current index to make - the changes. 
- - - - Called when DocumentsWriter decides to create a new - segment - - - - Called when DocumentsWriter decides to close the doc - stores - - - - Called when an aborting exception is hit - - - Add a new thread - - - Called when DocumentsWriter is using too much RAM. - The consumer should free RAM, if possible, returning - true if any RAM was in fact freed. - - - - Basic tool and API to check the health of an index and - write a new segments file that removes reference to - problematic segments. - -

As this tool checks every byte in the index, on a large - index it can take quite a long time to run. - -

WARNING: this tool and API are new and experimental and are subject to change suddenly in the next release. Please make a complete backup of your index before using this to fix your index!

-
- - Default PrintStream for all CheckIndex instances. - Use {@link #setInfoStream} per instance, - instead. - - - - Create a new CheckIndex on the directory. - - - Set infoStream where messages should go. If null, no - messages are printed - - - - Returns true if index is clean, else false. - Please instantiate a CheckIndex and then use {@link #CheckIndex()} instead - - - - Returns true if index is clean, else false. - Please instantiate a CheckIndex and then use {@link #CheckIndex(List)} instead - - - - Returns a {@link Status} instance detailing - the state of the index. - -

As this method checks every byte in the index, on a large - index it can take quite a long time to run. - -

WARNING: make sure - you only call this when the index is not opened by any - writer. -

-
- - Returns a {@link Status} instance detailing - the state of the index. - - - list of specific segment names to check - -

As this method checks every byte in the specified - segments, on a large index it can take quite a long - time to run. - -

WARNING: make sure - you only call this when the index is not opened by any - writer. - - - -

Test field norms. -
- - Test the term index. - - - Test stored fields for a segment. - - - Test term vectors for a segment. - - - Repairs the index using previously returned result - from {@link #checkIndex}. Note that this does not - remove any of the unreferenced files after it's done; - you must separately open an {@link IndexWriter}, which - deletes unreferenced files when it's created. - -

WARNING: this writes a - new segments file into the index, effectively removing - all documents in broken segments from the index. - BE CAREFUL. - -

WARNING: Make sure you only call this when the - index is not opened by any writer. -

-
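Programmatically, the check-then-repair flow described above might be sketched like this (directory is assumed to be an open Directory on which no IndexWriter is currently open):
-            CheckIndex checker = new CheckIndex(directory);
-            checker.setInfoStream(System.out);
-            CheckIndex.Status status = checker.checkIndex();
-            if (!status.clean) {
-            // WARNING: permanently drops all documents in broken segments
-            checker.fixIndex(status);
-            }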
- - Command-line interface to check and fix an index. -

- Run it like this: -

-            java -ea:Lucene.Net... Lucene.Net.Index.CheckIndex pathToIndex [-fix] [-segment X] [-segment Y]
-            
-
  • -fix: actually write a new segments_N file, removing any problematic segments
  • -segment X: only check the specified segment(s). This can be specified multiple times, to check more than one segment, eg -segment _2 -segment _a. You can't use this with the -fix option.

WARNING: -fix should only be used on an emergency basis as it will cause - documents (perhaps many) to be permanently removed from the index. Always make - a backup copy of your index before running this! Do not run this tool on an index - that is actively being written to. You have been warned! -

Run without -fix, this tool will open the index, report version information - and report any exceptions it hits and what action it would take if -fix were - specified. With -fix, this tool will remove any segments that have issues and - write a new segments_N file. This means all documents contained in the affected - segments will be removed. -

- This tool exits with exit code 1 if the index cannot be opened or has any - corruption, else 0. -

-
- - Returned from {@link #CheckIndex()} detailing the health and status of the index. - -

WARNING: this API is new and experimental and is subject to change suddenly in the next release.

-
- - True if no problems were found with the index. - - - True if we were unable to locate and load the segments_N file. - - - True if we were unable to open the segments_N file. - - - True if we were unable to read the version number from segments_N file. - - - Name of latest segments_N file in the index. - - - Number of segments in the index. - - - String description of the version of the index. - - - Empty unless you passed specific segments list to check as optional 3rd argument. - - - - - True if the index was created with a newer version of Lucene than the CheckIndex tool. - - - List of {@link SegmentInfoStatus} instances, detailing status of each segment. - - - Directory index is in. - - - SegmentInfos instance containing only segments that - had no problems (this is used with the {@link CheckIndex#fixIndex} - method to repair the index. - - - - How many documents will be lost to bad segments. - - - How many bad segments were found. - - - True if we checked only specific segments ({@link - #CheckIndex(List)}) was called with non-null - argument). - - - - Holds the userData of the last commit in the index - - - Holds the status of each segment in the index. - See {@link #segmentInfos}. - -

WARNING: this API is new and experimental and is subject to change suddenly in the next release.

-
- - Name of the segment. - - - Document count (does not take deletions into account). - - - True if segment is compound file format. - - - Number of files referenced by this segment. - - - Net size (MB) of the files referenced by this - segment. - - - - Doc store offset, if this segment shares the doc - store files (stored fields and term vectors) with - other segments. This is -1 if it does not share. - - - - String of the shared doc store segment, or null if - this segment does not share the doc store files. - - - - True if the shared doc store files are compound file - format. - - - - True if this segment has pending deletions. - - - Name of the current deletions file name. - - - Number of deleted documents. - - - True if we were able to open a SegmentReader on this - segment. - - - - Number of fields in this segment. - - - True if at least one of the fields in this segment - does not omitTermFreqAndPositions. - - - - - - Map<String, String> that includes certain - debugging details that IndexWriter records into - each segment it creates - - - - Status for testing of field norms (null if field norms could not be tested). - - - Status for testing of indexed terms (null if indexed terms could not be tested). - - - Status for testing of stored fields (null if stored fields could not be tested). - - - Status for testing of term vectors (null if term vectors could not be tested). - - - Status from testing field norms. - - - Number of fields successfully tested - - - Exception thrown during term index test (null on success) - - - Status from testing term index. - - - Total term count - - - Total frequency across all terms. - - - Total number of positions. - - - Exception thrown during term index test (null on success) - - - Status from testing stored fields. - - - Number of documents tested. - - - Total number of stored fields tested. - - - Exception thrown during stored fields test (null on success) - - - Status from testing stored fields. - - - Number of documents tested. - - - Total number of term vectors tested. - - - Exception thrown during term vector test (null on success) - - - Optimized implementation. - - - Overridden by SegmentTermPositions to skip in prox stream. - - - Optimized implementation. - - - The payload of a Token. See also {@link Payload}. - - - Returns this Token's payload. - - - Sets this Token's payload. - - - Works in conjunction with the SinkTokenizer to provide the ability to set aside tokens - that have already been analyzed. This is useful in situations where multiple fields share - many common analysis steps and then go their separate ways. -

- It is also useful for doing things like entity extraction or proper noun analysis as - part of the analysis workflow and saving off those tokens for use in another field. - -

-            SinkTokenizer sink1 = new SinkTokenizer();
-            SinkTokenizer sink2 = new SinkTokenizer();
-            TokenStream source1 = new TeeTokenFilter(new TeeTokenFilter(new WhitespaceTokenizer(reader1), sink1), sink2);
-            TokenStream source2 = new TeeTokenFilter(new TeeTokenFilter(new WhitespaceTokenizer(reader2), sink1), sink2);
-            TokenStream final1 = new LowerCaseFilter(source1);
-            TokenStream final2 = source2;
-            TokenStream final3 = new EntityDetect(sink1);
-            TokenStream final4 = new URLDetect(sink2);
-            d.add(new Field("f1", final1));
-            d.add(new Field("f2", final2));
-            d.add(new Field("f3", final3));
-            d.add(new Field("f4", final4));
-            
- In this example, sink1 and sink2 will both get tokens from both - reader1 and reader2 after whitespace tokenizer - and now we can further wrap any of these in extra analysis, and more "sources" can be inserted if desired. - It is important, that tees are consumed before sinks (in the above example, the field names must be - less the sink's field names). - Note, the EntityDetect and URLDetect TokenStreams are for the example and do not currently exist in Lucene -

- - See LUCENE-1058. -

- WARNING: {@link TeeTokenFilter} and {@link SinkTokenizer} only work with the old TokenStream API. - If you switch to the new API, you need to use {@link TeeSinkTokenFilter} instead, which offers - the same functionality. -

- - - Use {@link TeeSinkTokenFilter} instead - - -
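With the new TokenStream API the same tee/sink wiring is expressed through TeeSinkTokenFilter; a rough, simplified sketch with a single source (EntityDetect and URLDetect remain hypothetical, as above):
-            TeeSinkTokenFilter source = new TeeSinkTokenFilter(new WhitespaceTokenizer(reader1));
-            TeeSinkTokenFilter.SinkTokenStream sink1 = source.newSinkTokenStream();
-            TeeSinkTokenFilter.SinkTokenStream sink2 = source.newSinkTokenStream();
-            TokenStream final1 = new LowerCaseFilter(source);
-            TokenStream final3 = new EntityDetect(sink1);
-            TokenStream final4 = new URLDetect(sink2);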
- - A TokenFilter is a TokenStream whose input is another TokenStream. -

- This is an abstract class; subclasses must override {@link #IncrementToken()}. - -

- - -
- - The source of tokens for this filter. - - - Construct a token stream filtering the given input. - - - Performs end-of-stream operations, if any, and calls then end() on the - input TokenStream.

- NOTE: Be sure to call super.end() first when overriding this method. -

-
- - Close the input TokenStream. - - - Reset the filter as well as the input TokenStream. - - - A filter that replaces accented characters in the ISO Latin 1 character set - (ISO-8859-1) by their unaccented equivalent. The case will not be altered. -

- For instance, 'À' will be replaced by 'a'. -

- -

- in favor of {@link ASCIIFoldingFilter} which covers a superset - of Latin 1. This class will be removed in Lucene 3.0. - -
- - Will be removed in Lucene 3.0. This method is final, as it should - not be overridden. Delegates to the backwards compatibility layer. - - - - Will be removed in Lucene 3.0. This method is final, as it should - not be overridden. Delegates to the backwards compatibility layer. - - - - To replace accented characters in a String by unaccented equivalents. - - - CharStream adds {@link #CorrectOffset} - functionality over {@link Reader}. All Tokenizers accept a - CharStream instead of {@link Reader} as input, which enables - arbitrary character based filtering before tokenization. - The {@link #CorrectOffset} method fixed offsets to account for - removal or insertion of characters, so that the offsets - reported in the tokens match the character offsets of the - original Reader. - - - - Called by CharFilter(s) and Tokenizer to correct token offset. - - - offset as seen in the output - - corrected offset based on the input - - - - Subclasses of StringInterner are required to - return the same single String object for all equal strings. - Depending on the implementation, this may not be - the same object returned as String.intern(). - - This StringInterner base class simply delegates to String.intern(). - - - - Returns a single object instance for each equal string. - - - Returns a single object instance for each equal string. - - - Optimized implementation of a vector of bits. This is more-or-less like - java.util.BitSet, but also includes the following: -
  • a count() method, which efficiently computes the number of one bits;
  • optimized read from and write to disk;
  • inlinable get() method;
  • store and load, as bit set or d-gaps, depending on sparseness;
- $Id: BitVector.java 765649 2009-04-16 14:29:26Z mikemccand $ - -
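A small sketch of typical use (the directory and file name are arbitrary):
-            BitVector bits = new BitVector(reader.maxDoc());
-            bits.set(42);                         // mark one document
-            int ones = bits.count();              // cached count of set bits
-            bits.write(directory, "docs.bits");   // stored as bit set or d-gaps, depending on sparseness
-            BitVector reloaded = new BitVector(directory, "docs.bits");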
- - Constructs a vector capable of holding n bits. - - - Sets the value of bit to one. - - - Sets the value of bit to true, and - returns true if bit was already set - - - - Sets the value of bit to zero. - - - Returns true if bit is one and - false if it is zero. - - - - Returns the number of bits in this vector. This is also one greater than - the number of the largest valid bit number. - - - - Returns the total number of one bits in this vector. This is efficiently - computed and cached, so that, if the vector is not changed, no - recomputation is done for repeated calls. - - - - Writes this vector to the file name in Directory - d, in a format that can be read by the constructor {@link - #BitVector(Directory, String)}. - - - - Write as a bit set - - - Write as a d-gaps list - - - Indicates if the bit vector is sparse and should be saved as a d-gaps list, or dense, and should be saved as a bit set. - - - Constructs a bit vector from the file name in Directory - d, as written by the {@link #write} method. - - - - Read as a bit set - - - read as a d-gaps list - - - Retrieve a subset of this BitVector. - - - starting index, inclusive - - ending index, exclusive - - subset - - - - - Contains conversion support elements such as classes, interfaces and static methods. - - - - - Copies an array of chars obtained from a String into a specified array of chars - - The String to get the chars from - Position of the String to start getting the chars - Position of the String to end getting the chars - Array to return the chars - Position of the destination array of chars to start storing the chars - An array of chars - - - - Support class used to handle threads - - - - - The instance of System.Threading.Thread - - - - - Initializes a new instance of the ThreadClass class - - - - - Initializes a new instance of the Thread class. - - The name of the thread - - - - Initializes a new instance of the Thread class. - - A ThreadStart delegate that references the methods to be invoked when this thread begins executing - - - - Initializes a new instance of the Thread class. - - A ThreadStart delegate that references the methods to be invoked when this thread begins executing - The name of the thread - - - - This method has no functionality unless the method is overridden - - - - - Causes the operating system to change the state of the current thread instance to ThreadState.Running - - - - - Interrupts a thread that is in the WaitSleepJoin thread state - - - - - Blocks the calling thread until a thread terminates - - - - - Blocks the calling thread until a thread terminates or the specified time elapses - - Time of wait in milliseconds - - - - Blocks the calling thread until a thread terminates or the specified time elapses - - Time of wait in milliseconds - Time of wait in nanoseconds - - - - Resumes a thread that has been suspended - - - - - Raises a ThreadAbortException in the thread on which it is invoked, - to begin the process of terminating the thread. Calling this method - usually terminates the thread - - - - - Raises a ThreadAbortException in the thread on which it is invoked, - to begin the process of terminating the thread while also providing - exception information about the thread termination. - Calling this method usually terminates the thread. 
- - An object that contains application-specific information, such as state, which can be used by the thread being aborted - - - - Suspends the thread, if the thread is already suspended it has no effect - - - - - Obtain a String that represents the current object - - A String that represents the current object - - - - Gets the currently running thread - - The currently running thread - - - - Gets the current thread instance - - - - - Gets or sets the name of the thread - - - - - Gets or sets a value indicating the scheduling priority of a thread - - - - - Gets a value indicating the execution status of the current thread - - - - - Gets or sets a value indicating whether or not a thread is a background thread. - - - - - Represents the methods to support some operations over files. - - - - - Returns an array of abstract pathnames representing the files and directories of the specified path. - - The abstract pathname to list it childs. - An array of abstract pathnames childs of the path specified or null if the path is not a directory - - - - Returns a list of files in a give directory. - - The full path name to the directory. - - An array containing the files. - - - - Flushes the specified file stream. Ensures that all buffered - data is actually written to the file system. - - The file stream. - - - - A simple class for number conversions. - - - - - Min radix value. - - - - - Max radix value. - - - - - Converts a number to System.String. - - - - - - - Converts a number to System.String. - - - - - - - Converts a number to System.String in the specified radix. - - A number to be converted. - A radix. - A System.String representation of the number in the specified redix. - - - - Parses a number in the specified radix. - - An input System.String. - A radix. - The parsed number in the specified radix. - - - - Performs an unsigned bitwise right shift with the specified number - - Number to operate on - Ammount of bits to shift - The resulting number from the shift operation - - - - Performs an unsigned bitwise right shift with the specified number - - Number to operate on - Ammount of bits to shift - The resulting number from the shift operation - - - - Returns the index of the first bit that is set to true that occurs - on or after the specified starting index. If no such bit exists - then -1 is returned. - - The BitArray object. - The index to start checking from (inclusive). - The index of the next set bit. - - - - Converts a System.String number to long. - - - - - - - Mimics Java's Character class. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - This class provides supporting methods of java.util.BitSet - that are not present in System.Collections.BitArray. - - - - - Returns the next set bit at or after index, or -1 if no such bit exists. - - - the index of bit array at which to start checking - the next set bit or -1 - - - - Returns the next un-set bit at or after index, or -1 if no such bit exists. - - - the index of bit array at which to start checking - the next set bit or -1 - - - - Returns the number of bits set to true in this BitSet. - - The BitArray object. - The number of bits set to true in this BitSet. - - - - Summary description for TestSupportClass. - - - - - Compares two Term arrays for equality. 
- - First Term array to compare - Second Term array to compare - true if the Terms are equal in both arrays, false otherwise - - - - A Hashtable which holds weak references to its keys so they - can be collected during GC. - - - - - Serves as a simple "GC Monitor" that indicates whether cleanup is needed. - If collectableObject.IsAlive is false, GC has occurred and we should perform cleanup - - - - - Customize the hashtable lookup process by overriding KeyEquals. KeyEquals - will compare both WeakKey to WeakKey and WeakKey to real keys - - - - - Perform cleanup if GC occurred - - - - - Iterate over all keys and remove keys that were collected - - - - - Wrap each key with a WeakKey and add it to the hashtable - - - - - Create a temporary copy of the real keys and return that - - - - - A weak referene wrapper for the hashtable keys. Whenever a key\value pair - is added to the hashtable, the key is wrapped using a WeakKey. WeakKey saves the - value of the original object hashcode for fast comparison. - - - - - A Dictionary enumerator which wraps the original hashtable enumerator - and performs 2 tasks: Extract the real key from a WeakKey and skip keys - that were already collected. - - - - - Support class used to handle Hashtable addition, which does a check - first to make sure the added item is unique in the hash. - - - - - Converts the specified collection to its string representation. - - The collection to convert to string. - A string representation of the specified collection. - - - - Compares two string arrays for equality. - - First string array list to compare - Second string array list to compare - true if the strings are equal in both arrays, false otherwise - - - - Sorts an IList collections - - The System.Collections.IList instance that will be sorted - The Comparator criteria, null to use natural comparator. - - - - Fills the array with an specific value from an specific index to an specific index. - - The array to be filled. - The first index to be filled. - The last index to be filled. - The value to fill the array with. - - - - Fills the array with an specific value. - - The array to be filled. - The value to fill the array with. - - - - Compares the entire members of one array whith the other one. - - The array to be compared. - The array to be compared with. - Returns true if the two specified arrays of Objects are equal - to one another. The two arrays are considered equal if both arrays - contain the same number of elements, and all corresponding pairs of - elements in the two arrays are equal. Two objects e1 and e2 are - considered equal if (e1==null ? e2==null : e1.equals(e2)). In other - words, the two arrays are equal if they contain the same elements in - the same order. Also, two array references are considered equal if - both are null. - - - A collection of which can be - looked up by instances of . - The type of the items contains in this - collection. - The type of the keys that can be used to look - up the items. - - - Creates a new instance of the - class. - The which will convert - instances of to - when the override of is called. - - - The which will convert - instances of to - when the override of is called. - - - Converts an item that is added to the collection to - a key. - The instance of - to convert into an instance of . - The instance of which is the - key for this item. - - - Determines if a key for an item exists in this - collection. - The instance of - to see if it exists in this collection. 
- True if the key exists in the collection, false otherwise. - - - Represents a strongly typed list of objects that can be accessed by index. - Provides methods to search, sort, and manipulate lists. Also provides functionality - to compare lists against each other through an implementations of - . - The type of elements in the list. - - - Initializes a new instance of the - class that is empty and has the - default initial capacity. - - - Initializes a new instance of the - class that contains elements copied from the specified collection and has - sufficient capacity to accommodate the number of elements copied. - The collection whose elements are copied to the new list. - - - Initializes a new instance of the - class that is empty and has the specified initial capacity. - The number of elements that the new list can initially store. - - - Adds a range of objects represented by the - implementation. - The - implementation to add to this list. - - - Compares the counts of two - implementations. - This uses a trick in LINQ, sniffing types for implementations - of interfaces that might supply shortcuts when trying to make comparisons. - In this case, that is the and - interfaces, either of which can provide a count - which can be used in determining the equality of sequences (if they don't have - the same count, then they can't be equal). - The from the left hand side of the - comparison to check the count of. - The from the right hand side of the - comparison to check the count of. - Null if the result is indeterminate. This occurs when either - or doesn't implement or . - Otherwise, it will get the count from each and return true if they are equal, false otherwise. - - - Compares the contents of a - implementation to another one to determine equality. - Thinking of the implementation as - a string with any number of characters, the algorithm checks - each item in each list. If any item of the list is not equal (or - one list contains all the elements of another list), then that list - element is compared to the other list element to see which - list is greater. - The implementation - that is considered the left hand side. - The implementation - that is considered the right hand side. - True if the items are equal, false otherwise. - - - Compares this sequence to another - implementation, returning true if they are equal, false otherwise. - The other implementation - to compare against. - True if the sequence in - is the same as this one. - - - Compares this object for equality against other. - The other object to compare this object against. - True if this object and are equal, false - otherwise. - - - Gets the hash code for the list. - The hash code value. - - - Gets the hash code for the list. - The - implementation which will have all the contents hashed. - The hash code value. - - - Clones the . - This is a shallow clone. - A new shallow clone of this - . - - - - A simple wrapper to allow for the use of the GeneralKeyedCollection. The - wrapper is required as there can be several keys for an object depending - on how many interfaces it implements. - - - - - Provides platform infos. - - - - - Whether we run under a Unix platform. - - - - - Whether we run under a supported Windows platform. - - - - A memory-resident {@link IndexOutput} implementation. - - - $Id: RAMOutputStream.java 691694 2008-09-03 17:34:29Z mikemccand $ - - - - Construct an empty output buffer. - - - Copy the current contents of this buffer to the named output. - - - Resets this to an empty buffer. 
- - - Returns byte usage of all buffers. - - - A base class for all collectors that return a {@link TopDocs} output. This - collector allows easy extension by providing a single constructor which - accepts a {@link PriorityQueue} as well as protected members for that - priority queue and a counter of the number of total hits.
- Extending classes can override {@link #TopDocs(int, int)} and - {@link #GetTotalHits()} in order to provide their own implementation. -
-
- -

Expert: Collectors are primarily meant to be used to - gather raw results from a search, and implement sorting - or custom result filtering, collation, etc.

- -

As of 2.9, this class replaces the deprecated - HitCollector, and offers an API for efficient collection - of hits across sequential {@link IndexReader}s. {@link - IndexSearcher} advances the collector through each of the - sub readers, in an arbitrary order. This results in a - higher performance means of collection.

- -

Lucene's core collectors are derived from Collector. - Likely your application can use one of these classes, or - subclass {@link TopDocsCollector}, instead of - implementing Collector directly: - -

    - -
  • {@link TopDocsCollector} is an abstract base class - that assumes you will retrieve the top N docs, - according to some criteria, after collection is - done.
  • - -
  • {@link TopScoreDocCollector} is a concrete subclass - {@link TopDocsCollector} and sorts according to score + - docID. This is used internally by the {@link - IndexSearcher} search methods that do not take an - explicit {@link Sort}. It is likely the most frequently - used collector.
  • - -
  • {@link TopFieldCollector} subclasses {@link - TopDocsCollector} and sorts according to a specified - {@link Sort} object (sort by field). This is used - internally by the {@link IndexSearcher} search methods - that take an explicit {@link Sort}.
  • - -
  • {@link TimeLimitingCollector}, which wraps any other - Collector and aborts the search if it's taken too much - time, will subclass Collector in 3.0 (presently it - subclasses the deprecated HitCollector).
  • - -
  • {@link PositiveScoresOnlyCollector} wraps any other - Collector and prevents collection of hits whose score - is <= 0.0
  • - -
- -

Collector decouples the score from the collected doc: - the score computation is skipped entirely if it's not - needed. Collectors that do need the score should - implement the {@link #setScorer} method, to hold onto the - passed {@link Scorer} instance, and call {@link - Scorer#Score()} within the collect method to compute the - current hit's score. If your collector may request the - score for a single hit multiple times, you should use - {@link ScoreCachingWrappingScorer}.

- -

NOTE: The doc that is passed to the collect - method is relative to the current reader. If your - collector needs to resolve this to the docID space of the - Multi*Reader, you must re-base it by recording the - docBase from the most recent setNextReader call. Here's - a simple example showing how to collect docIDs into a - BitSet:

- -

-            Searcher searcher = new IndexSearcher(indexReader);
-            final BitSet bits = new BitSet(indexReader.maxDoc());
-            searcher.search(query, new Collector() {
-            private int docBase;
-            
-            // ignore scorer
-            public void setScorer(Scorer scorer) {
-            }
-            
-            // accept docs out of order (for a BitSet it doesn't matter)
-            public boolean acceptsDocsOutOfOrder() {
-            return true;
-            }
-            
-            public void collect(int doc) {
-            bits.set(doc + docBase);
-            }
-            
-            public void setNextReader(IndexReader reader, int docBase) {
-            this.docBase = docBase;
-            }
-            });
-            
- -

Not all collectors will need to rebase the docID. For - example, a collector that simply counts the total number - of hits would skip it.

- -

NOTE: Prior to 2.9, Lucene silently filtered - out hits with score <= 0. As of 2.9, the core Collectors - no longer do that. It's very unusual to have such hits - (a negative query boost, or function query returning - negative custom scores, could cause it to happen). If - you need that behavior, use {@link - PositiveScoresOnlyCollector}.

- -

NOTE: This API is experimental and might change - in incompatible ways in the next release.

- -

- 2.9 - -
- - Called before successive calls to {@link #Collect(int)}. Implementations - that need the score of the current document (passed-in to - {@link #Collect(int)}), should save the passed-in Scorer and call - scorer.score() when needed. - - - - Called once for every document matching a query, with the unbased document - number. - -

- Note: This is called in an inner search loop. For good search performance, - implementations of this method should not call {@link Searcher#Doc(int)} or - {@link Lucene.Net.Index.IndexReader#Document(int)} on every hit. - Doing so can slow searches by an order of magnitude or more. -

-
- - Called before collecting from each IndexReader. All doc ids in - {@link #Collect(int)} will correspond to reader. - - Add docBase to the current IndexReader's internal document id to re-base ids - in {@link #Collect(int)}. - - - next IndexReader - - - - - - - * Return true if this collector does not - * require the matching docIDs to be delivered in int sort - * order (smallest to largest) to {@link #collect}. - * - *

Most Lucene Query implementations will visit - * matching docIDs in order. However, some queries - * (currently limited to certain cases of {@link - * BooleanQuery}) can achieve faster searching if the - * Collector allows them to deliver the - * docIDs out of order. - * - *

Many collectors don't mind getting docIDs out of - * order, so it's important to return true - * here. - * -
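For instance, a collector that keeps only the docIDs whose score exceeds some cutoff needs nothing beyond the callbacks described above. The following is a sketch against the Java Lucene 2.9 API that these comments mirror; it assumes an open IndexSearcher named searcher and a Query named query, and the 0.5f cutoff is purely illustrative:
            final float cutoff = 0.5f;                        // illustrative threshold
            final List<Integer> accepted = new ArrayList<Integer>();
            searcher.search(query, new Collector() {
              private Scorer scorer;
              private int docBase;

              public void setScorer(Scorer scorer) {
                this.scorer = scorer;                         // keep the scorer so collect() can ask for the score
              }

              public void collect(int doc) throws IOException {
                if (scorer.score() > cutoff) {
                  accepted.add(doc + docBase);                // re-base to the top-level docID space
                }
              }

              public void setNextReader(IndexReader reader, int docBase) {
                this.docBase = docBase;
              }

              public boolean acceptsDocsOutOfOrder() {
                return true;                                  // out-of-order delivery is fine here
              }
            });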

- -
- - The priority queue which holds the top documents. Note that different - implementations of PriorityQueue give different meaning to 'top documents'. - HitQueue for example aggregates the top scoring documents, while other PQ - implementations may hold documents sorted by other criteria. - - - - The total number of documents that the collector encountered. - - - Populates the results array with the ScoreDoc instances. This can be - overridden in case a different ScoreDoc type should be returned. - - - - Returns a {@link TopDocs} instance containing the given results. If - results is null it means there are no results to return, - either because there were 0 calls to collect() or because the arguments to - topDocs were invalid. - - - - The total number of documents that matched this query. - - - Returns the top docs that were collected by this collector. - - - Returns the documents in the range [start .. pq.size()) that were collected - by this collector. Note that if start >= pq.size(), an empty TopDocs is - returned.
- This method is convenient to call if the application always asks for the - last results, starting from the last 'page'.
- NOTE: you cannot call this method more than once for each search - execution. If you need to call it more than once, passing each time a - different start, you should call {@link #TopDocs()} and work - with the returned {@link TopDocs} object, which will contain all the - results this search execution collected. -
-
- - Returns the documents in the range [start .. start+howMany) that were - collected by this collector. Note that if start >= pq.size(), an empty - TopDocs is returned, and if pq.size() - start < howMany, then only the - available documents in [start .. pq.size()) are returned.
- This method is useful to call in case pagination of search results is - allowed by the search application; it also attempts to minimize the - memory used by allocating only as much as requested by howMany.
- NOTE: you cannot call this method more than once for each search - execution. If you need to call it more than once, passing each time a - different range, you should call {@link #TopDocs()} and work with the - returned {@link TopDocs} object, which will contain all the results this - search execution collected. -
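As a sketch of the intended usage (Java Lucene 2.9 API; searcher, query and the page size of 10 are illustrative), a second page of results can be pulled out of a single collection pass like this:
            TopScoreDocCollector collector = TopScoreDocCollector.create(30, true); // top 30 hits overall
            searcher.search(query, collector);
            TopDocs secondPage = collector.topDocs(10, 10);   // hits 10..19 only
            for (ScoreDoc sd : secondPage.scoreDocs) {
              Document d = searcher.doc(sd.doc);
              // render the hit ...
            }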
-
- - A Query that matches documents containing a term. - This may be combined with other terms with a {@link BooleanQuery}. - - - - Constructs a query for the term t. - - - Returns the term of this query. - - - Prints a user-readable version of this query. - - - Returns true iff o is equal to this. - - - Returns a hash code value for this object. - - - Removes matches which overlap with another SpanQuery. - - - Construct a SpanNotQuery matching spans from include which - have no overlap with spans from exclude. - - - - Return the SpanQuery whose matches are filtered. - - - Return the SpanQuery whose matches must not overlap those returned. - - - Returns a collection of all terms matched by this query. - use extractTerms instead - - - - - - Returns true iff o is equal to this. - - - A Scorer for queries with a required part and an optional part. - Delays skipTo() on the optional part until a score() is needed. -
- This Scorer implements {@link Scorer#SkipTo(int)}. -
-
- - The scorers passed from the constructor. - These are set to null as soon as their next() or skipTo() returns false. - - - - Construct a ReqOptScorer. - The required scorer. This must match. - - The optional scorer. This is used for scoring only. - - - - use {@link #NextDoc()} instead. - - - - use {@link #Advance(int)} instead. - - - - use {@link #DocID()} instead. - - - - Returns the score of the current document matching the query. - Initially invalid, until {@link #Next()} is called the first time. - - The score of the required scorer, eventually increased by the score - of the optional scorer when it also matches the current document. - - - - Explain the score of a document. - TODO: Also show the total score. - See BooleanScorer.explain() on how to do this. - - - - The BoostingTermQuery is very similar to the {@link Lucene.Net.Search.Spans.SpanTermQuery} except - that it factors in the value of the payload located at each of the positions where the - {@link Lucene.Net.Index.Term} occurs. -

- In order to take advantage of this, you must override {@link Lucene.Net.Search.Similarity#ScorePayload(String, byte[],int,int)} - which returns 1 by default. -

- Payload scores are averaged across term occurrences in the document. - -

- - - - See {@link Lucene.Net.Search.Payloads.PayloadTermQuery} - -
- - This class is very similar to - {@link Lucene.Net.Search.Spans.SpanTermQuery} except that it factors - in the value of the payload located at each of the positions where the - {@link Lucene.Net.Index.Term} occurs. -

- In order to take advantage of this, you must override - {@link Lucene.Net.Search.Similarity#ScorePayload(String, byte[],int,int)} - which returns 1 by default. -

- Payload scores are aggregated using a pluggable {@link PayloadFunction}. - -
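A minimal sketch of constructing such a query (Java Lucene 2.9 API; the field name is illustrative, and it assumes payloads were written at indexing time by a payload-aware TokenStream):
            SpanQuery q = new PayloadTermQuery(new Term("body", "lucene"),
                                               new MaxPayloadFunction()); // or Min/AveragePayloadFunction
            TopDocs hits = searcher.search(q, 10);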

-
- - Matches spans containing a term. - - - Construct a SpanTermQuery matching the named term's spans. - - - Return the term whose spans are matched. - - - Returns a collection of all terms matched by this query. - use extractTerms instead - - - - - - - {@link #GetSpanScore()} * {@link #GetPayloadScore()} - - IOException - - - Returns the SpanScorer score only. -

- Should not be overridden without good cause! - -

- the score for just the Span part w/o the payload - - IOException - - - - -
- - The score for the payload - - - The score, as calculated by - {@link PayloadFunction#DocScore(int, String, int, float)} - - - - An iterator over {@link Hits} that provides lazy fetching of each document. - {@link Hits#Iterator()} returns an instance of this class. Calls to {@link #next()} - return a {@link Hit} instance. - - - Use {@link TopScoreDocCollector} and {@link TopDocs} instead. Hits will be removed in Lucene 3.0. - - - - Constructed from {@link Hits#Iterator()}. - - - true if current hit is less than the total number of {@link Hits}. - - - - Unsupported operation. - - - UnsupportedOperationException - - - Returns the total number of hits. - - - Returns a {@link Hit} instance representing the next hit in {@link Hits}. - - - Next {@link Hit}. - - - - - - Creates a new instance of the provider class for the given IndexReader. - - - - - * Compute a custom score by the subQuery score and a number of - ValueSourceQuery scores. -

- Subclasses can override this method to modify the custom score. -

- If your custom scoring is different than the default herein you - should override at least one of the two customScore() methods. - If the number of ValueSourceQueries is always < 2 it is - sufficient to override the other - {@link #customScore(int, float, float) customScore()} - method, which is simpler. -

- The default computation herein is a multiplication of given scores: -

-                ModifiedScore = valSrcScore * valSrcScores[0] * valSrcScores[1] * ...
-            
-
- id of scored doc - score of that doc by the subQuery - scores of that doc by the ValueSourceQuery - custom score -
- - - - Explain the custom score. - Whenever overriding {@link #customScore(int, float, float[])}, - this method should also be overridden to provide the correct explanation - for the part of the custom scoring. - - doc being explained - explanation for the sub-query part - explanation for the value source part - an explanation for the custom score - - - - Explain the custom score. - Whenever overriding {@link #customScore(int, float, float)}, - this method should also be overridden to provide the correct explanation - for the part of the custom scoring. - - - doc being explained - explanation for the sub-query part - explanation for the value source part - an explanation for the custom score - - - Token Manager Error. - - - Lexical error occurred. - - - An attempt was made to create a second instance of a static token manager. - - - Tried to change to an invalid lexical state. - - - Detected (and bailed out of) an infinite loop in the token manager. - - - Indicates the reason why the exception is thrown. It will have - one of the above 4 values. - - - - Replaces unprintable characters by their escaped (or unicode escaped) - equivalents in the given string - - - - Returns a detailed message for the Error when it is thrown by the - token manager to indicate a lexical error. - Parameters : - EOFSeen : indicates if EOF caused the lexical error - curLexState : lexical state in which this error occurred - errorLine : line number when the error occurred - errorColumn : column number when the error occurred - errorAfter : prefix that was seen before this error occurred - curchar : the offending character - Note: You can customize the lexical error message by modifying this method. - - - - No arg constructor. - - - Constructor with message and reason. - - - Full Constructor. - - - You can also modify the body of this method to customize your error messages. - For example, cases like LOOP_DETECTED and INVALID_LEXICAL_STATE are not - of end-users concern, so you can return something like : - - "Internal Error : Please file a bug report .... " - - from this method for such cases in the release version of your parser. - - - - The SegmentMerger class combines two or more Segments, represented by an IndexReader ({@link #add}, - into a single Segment. After adding the appropriate readers, call the merge method to combine the - segments. -

- If the compoundFile flag is set, then the segments will be merged into a compound file. - - -

- - - - -
- - Maximum number of contiguous documents to bulk-copy - when merging stored fields - - - - norms header placeholder - - - This ctor used only by test code. - - - The Directory to merge the other segments into - - The name of the new segment - - - - Add an IndexReader to the collection of readers that are to be merged - - - - - - The index of the reader to return - - The ith reader to be merged - - - - Merges the readers specified by the {@link #add} method into the directory passed to the constructor - The number of documents that were merged - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - - - Merges the readers specified by the {@link #add} method - into the directory passed to the constructor. - - if false, we will not merge the - stored fields nor vectors files - - The number of documents that were merged - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - - - close all IndexReaders that have been added. - Should not be called before merge(). - - IOException - - - - The number of documents in all of the readers - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - - - Merge the TermVectors from each of the segments into the new one. - IOException - - - Process postings from multiple segments all positioned on the - same term. Writes out merged entries into freqOutput and - the proxOutput streams. - - - array of segments - - number of cells in the array actually occupied - - number of documents across all segments where this term was found - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - - - Records the fact that roughly units amount of work - have been done since this method was last called. - When adding time-consuming code into SegmentMerger, - you should test different values for units to ensure - that the time in between calls to merge.checkAborted - is up to ~ 1 second. - - - - Similar to a {@link java.io.FileFilter}, the FieldSelector allows one to make decisions about - what Fields get loaded on a {@link Document} by {@link Lucene.Net.Index.IndexReader#Document(int,Lucene.Net.Documents.FieldSelector)} - - - - - - - the field to accept or reject - - an instance of {@link FieldSelectorResult} - if the {@link Field} named fieldName should be loaded. - - - - This is a DocFieldConsumer that inverts each field, - separately, from a Document, and accepts a - InvertedTermsConsumer to process those terms. - - - - Processes all occurrences of a single field - - - This class keeps track of closing the underlying directory. It is used to wrap - DirectoryReaders, that are created using a String/File parameter - in IndexReader.open() with FSDirectory.getDirectory(). - - This helper class is removed with all String/File - IndexReader.open() methods in Lucene 3.0 - - - - This member contains the ref counter, that is passed to each instance after cloning/reopening, - and is global to all DirectoryOwningReader derived from the original one. - This reuses the class {@link SegmentReader.Ref} - - - -

This class provides a {@link Field} that enables indexing - of numeric values for efficient range filtering and - sorting. Here's an example usage, adding an int value: -

-            document.add(new NumericField(name).setIntValue(value));
-            
- - For optimal performance, re-use the - NumericField and {@link Document} instance for more than - one document: - -
-            NumericField field = new NumericField(name);
-            Document document = new Document();
-            document.add(field);
-            
-            for(all documents) {
-            ...
-            field.setIntValue(value)
-            writer.addDocument(document);
-            ...
-            }
-            
- -

The java native types int, long, - float and double are - directly supported. However, any value that can be - converted into these native types can also be indexed. - For example, date/time values represented by a - {@link java.util.Date} can be translated into a long - value using the {@link java.util.Date#getTime} method. If you - don't need millisecond precision, you can quantize the - value, either by dividing the result of - {@link java.util.Date#getTime} or using the separate getters - (for year, month, etc.) to construct an int or - long value.

- -

To perform range querying or filtering against a - NumericField, use {@link NumericRangeQuery} or {@link - NumericRangeFilter}. To sort according to a - NumericField, use the normal numeric sort types, eg - {@link SortField#INT} (note that {@link SortField#AUTO} - will not work with these fields). NumericField values - can also be loaded directly from {@link FieldCache}.

- -

By default, a NumericField's value is not stored but - is indexed for range filtering and sorting. You can use - the {@link #NumericField(String,Field.Store,boolean)} - constructor if you need to change these defaults.

- -

You may add the same field name as a NumericField to - the same document more than once. Range querying and - filtering will be the logical OR of all values; so a range query - will hit all documents that have at least one value in - the range. However sort behavior is not defined. If you need to sort, - you should separately index a single-valued NumericField.

- -

A NumericField will consume somewhat more disk space - in the index than an ordinary single-valued field. - However, for a typical index that includes substantial - textual content per document, this increase will likely - be in the noise.

- -

Within Lucene, each numeric value is indexed as a - trie structure, where each term is logically - assigned to larger and larger pre-defined brackets (which - are simply lower-precision representations of the value). - The step size between each successive bracket is called the - precisionStep, measured in bits. Smaller - precisionStep values result in larger number - of brackets, which consumes more disk space in the index - but may result in faster range search performance. The - default value, 4, was selected for a reasonable tradeoff - of disk space consumption versus performance. You can - use the expert constructor {@link - #NumericField(String,int,Field.Store,boolean)} if you'd - like to change the value. Note that you must also - specify a congruent value when creating {@link - NumericRangeQuery} or {@link NumericRangeFilter}. - For low cardinality fields larger precision steps are good. - If the cardinality is < 100, it is fair - to use {@link Integer#MAX_VALUE}, which produces one - term per value. - -

For more information on the internals of numeric trie - indexing, including the precisionStep - configuration, see {@link NumericRangeQuery}. The format of - indexed values is described in {@link NumericUtils}. - -

If you only need to sort by numeric value, and never - run range querying/filtering, you can index using a - precisionStep of {@link Integer#MAX_VALUE}. - This will minimize disk space consumed.

- -

More advanced users can instead use {@link - NumericTokenStream} directly, when indexing numbers. This - class is a wrapper around this token stream type for - easier, more intuitive usage.

- -

NOTE: This class is only used during - indexing. When retrieving the stored field value from a - {@link Document} instance after search, you will get a - conventional {@link Fieldable} instance where the numeric - values are returned as {@link String}s (according to - toString(value) of the used data type). - -

NOTE: This API is - experimental and might change in incompatible ways in the - next release. - -

- 2.9 - -
- - Creates a field for numeric values using the default precisionStep - {@link NumericUtils#PRECISION_STEP_DEFAULT} (4). The instance is not yet initialized with - a numeric value, before indexing a document containing this field, - set a value using the various set???Value() methods. - This constructor creates an indexed, but not stored field. - - the field name - - - - Creates a field for numeric values using the default precisionStep - {@link NumericUtils#PRECISION_STEP_DEFAULT} (4). The instance is not yet initialized with - a numeric value, before indexing a document containing this field, - set a value using the various set???Value() methods. - - the field name - - if the field should be stored in plain text form - (according to toString(value) of the used data type) - - if the field should be indexed using {@link NumericTokenStream} - - - - Creates a field for numeric values with the specified - precisionStep. The instance is not yet initialized with - a numeric value, before indexing a document containing this field, - set a value using the various set???Value() methods. - This constructor creates an indexed, but not stored field. - - the field name - - the used precision step - - - - Creates a field for numeric values with the specified - precisionStep. The instance is not yet initialized with - a numeric value, before indexing a document containing this field, - set a value using the various set???Value() methods. - - the field name - - the used precision step - - if the field should be stored in plain text form - (according to toString(value) of the used data type) - - if the field should be indexed using {@link NumericTokenStream} - - - - Returns a {@link NumericTokenStream} for indexing the numeric value. - - - Returns always null for numeric fields - - - Returns always null for numeric fields - - - Returns always null for numeric fields - - - Returns the numeric value as a string (how it is stored, when {@link Field.Store#YES} is chosen). - - - Returns the current numeric value as a subclass of {@link Number}, null if not yet initialized. - - - Initializes the field with the supplied long value. - the numeric value - - this instance, because of this you can use it the following way: - document.add(new NumericField(name, precisionStep).SetLongValue(value)) - - - - Initializes the field with the supplied int value. - the numeric value - - this instance, because of this you can use it the following way: - document.add(new NumericField(name, precisionStep).setIntValue(value)) - - - - Initializes the field with the supplied double value. - the numeric value - - this instance, because of this you can use it the following way: - document.add(new NumericField(name, precisionStep).setDoubleValue(value)) - - - - Initializes the field with the supplied float value. - the numeric value - - this instance, because of this you can use it the following way: - document.add(new NumericField(name, precisionStep).setFloatValue(value)) - - - - Provides support for converting longs to Strings, and back again. The strings - are structured so that lexicographic sorting order is preserved. - -

- That is, if l1 is less than l2 for any two longs l1 and l2, then - NumberTools.longToString(l1) is lexicographically less than - NumberTools.longToString(l2). (Similarly for "greater than" and "equals".) - -

- This class handles all long values (unlike - {@link Lucene.Net.Documents.DateField}). - -

- For new indexes use {@link NumericUtils} instead, which - provides a sortable binary representation (prefix encoded) of numeric - values. - To index and efficiently query numeric values use {@link NumericField} - and {@link NumericRangeQuery}. - This class is included for use with existing - indices and will be removed in a future release. - -
- - Equivalent to longToString(Long.MIN_VALUE) - - - Equivalent to longToString(Long.MAX_VALUE) - - - The length of (all) strings returned by {@link #longToString} - - - Converts a long to a String suitable for indexing. - - - Converts a String that was returned by {@link #longToString} back to a - long. - - - IllegalArgumentException - if the input is null - - NumberFormatException - if the input does not parse (it was not a String returned by - longToString()). - - - - Loader for text files that represent a list of stopwords. - - - - $Id: WordlistLoader.java 706342 2008-10-20 17:19:29Z gsingers $ - - - - Loads a text file and adds every line as an entry to a HashSet (omitting - leading and trailing whitespace). Every line of the file should contain only - one word. The words need to be in lowercase if you make use of an - Analyzer which uses LowerCaseFilter (like StandardAnalyzer). - - - File containing the wordlist - - A HashSet with the file's words - - - - Loads a text file and adds every non-comment line as an entry to a HashSet (omitting - leading and trailing whitespace). Every line of the file should contain only - one word. The words need to be in lowercase if you make use of an - Analyzer which uses LowerCaseFilter (like StandardAnalyzer). - - - File containing the wordlist - - The comment string to ignore - - A HashSet with the file's words - - - - Reads lines from a Reader and adds every line as an entry to a HashSet (omitting - leading and trailing whitespace). Every line of the Reader should contain only - one word. The words need to be in lowercase if you make use of an - Analyzer which uses LowerCaseFilter (like StandardAnalyzer). - - - Reader containing the wordlist - - A HashSet with the reader's words - - - - Reads lines from a Reader and adds every non-comment line as an entry to a HashSet (omitting - leading and trailing whitespace). Every line of the Reader should contain only - one word. The words need to be in lowercase if you make use of an - Analyzer which uses LowerCaseFilter (like StandardAnalyzer). - - - Reader containing the wordlist - - The string representing a comment. - - A HashSet with the reader's words - - - - Reads a stem dictionary. Each line contains: -
word\tstem
- (i.e. two tab-separated words) - -
- stem dictionary that overrules the stemming algorithm - - IOException -
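A short sketch of loading the word lists described above (Java Lucene 2.9 API; file names and the "#" comment marker are illustrative):
            HashSet stopWords     = WordlistLoader.getWordSet(new File("stopwords.txt"), "#"); // one word per line
            HashMap stemOverrides = WordlistLoader.getStemDict(new File("stemdict.txt"));      // word\tstem per line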
- - The start and end character offset of a Token. - - - Returns this Token's starting offset, the position of the first character - corresponding to this token in the source text. - Note that the difference between endOffset() and startOffset() may not be - equal to termText.length(), as the term text may have been altered by a - stemmer or some other filter. - - - - Set the starting and ending offset. - See StartOffset() and EndOffset() - - - - Returns this Token's ending offset, one greater than the position of the - last character corresponding to this token in the source text. The length - of the token in the source text is (endOffset - startOffset). - - - - An abstract base class for simple, character-oriented tokenizers. - - - Returns true iff a character should be included in a token. This - tokenizer generates as tokens adjacent sequences of characters which - satisfy this predicate. Characters for which this is false are used to - define token boundaries and are not included in tokens. - - - - Called on each token character to normalize it before it is added to the - token. The default implementation does nothing. Subclasses may use this - to, e.g., lowercase tokens. - - - - Will be removed in Lucene 3.0. This method is final, as it should - not be overridden. Delegates to the backwards compatibility layer. - - - - Will be removed in Lucene 3.0. This method is final, as it should - not be overridden. Delegates to the backwards compatibility layer. - - - - A ScorerDocQueue maintains a partial ordering of its Scorers such that the - least Scorer can always be found in constant time. Put()'s and pop()'s - require log(size) time. The ordering is by Scorer.doc(). - - - - Create a ScorerDocQueue with a maximum size. - - - Adds a Scorer to a ScorerDocQueue in log(size) time. - If one tries to add more Scorers than maxSize - a RuntimeException (ArrayIndexOutOfBound) is thrown. - - - - Adds a Scorer to the ScorerDocQueue in log(size) time if either - the ScorerDocQueue is not full, or not lessThan(scorer, top()). - - - - true if scorer is added, false otherwise. - - - - Returns the least Scorer of the ScorerDocQueue in constant time. - Should not be used when the queue is empty. - - - - Returns document number of the least Scorer of the ScorerDocQueue - in constant time. - Should not be used when the queue is empty. - - - - Removes and returns the least scorer of the ScorerDocQueue in log(size) - time. - Should not be used when the queue is empty. - - - - Removes the least scorer of the ScorerDocQueue in log(size) time. - Should not be used when the queue is empty. - - - - Should be called when the scorer at top changes doc() value. - Still log(n) worst case, but it's at least twice as fast to
-            { pq.top().change(); pq.adjustTop(); }
-            
instead of
-            { o = pq.pop(); o.change(); pq.push(o); }
-            
-
-
- - Returns the number of scorers currently stored in the ScorerDocQueue. - - - Removes all entries from the ScorerDocQueue. - - - Simple cache implementation that uses a HashMap to store (key, value) pairs. - This cache is not synchronized, use {@link Cache#SynchronizedCache(Cache)} - if needed. - - - - Base class for cache implementations. - - - Returns a thread-safe cache backed by the specified cache. - In order to guarantee thread-safety, all access to the backed cache must - be accomplished through the returned cache. - - - - Called by {@link #SynchronizedCache(Cache)}. This method - returns a {@link SynchronizedCache} instance that wraps - this instance by default and can be overridden to return - e. g. subclasses of {@link SynchronizedCache} or this - in case this cache is already synchronized. - - - - Puts a (key, value)-pair into the cache. - - - Returns the value for the given key. - - - Returns whether the given key is in this cache. - - - Closes the cache. - - - Simple Cache wrapper that synchronizes all - calls that access the cache. - - - - Returns a Set containing all keys in this cache. - - - - The maximum number of items to cache. - - - - - The list to efficiently maintain the LRU state. - - - - - The dictionary to hash into any location in the list. - - - - - The node instance to use/re-use when adding an item to the cache. - - - - - Container to hold the key and value to aid in removal from - the dictionary when an item is removed from cache. - - - - Helper class for keeping Listss of Objects associated with keys. WARNING: THIS CLASS IS NOT THREAD SAFE - - - the backing store for this object - - - - direct access to the map backing this object. - - - - Adds val to the Set associated with key in the Map. If key is not - already in the map, a new Set will first be created. - - the size of the Set associated with key once val is added to it. - - - - Adds multiple vals to the Set associated with key in the Map. - If key is not - already in the map, a new Set will first be created. - - the size of the Set associated with key once val is added to it. - - - - Some useful constants. - - - - $Id: Constants.java 828327 2009-10-22 06:47:40Z uschindler $ - - - - - The value of System.getProperty("java.version"). * - - - True iff this is Java version 1.1. - - - True iff this is Java version 1.2. - - - True iff this is Java version 1.3. - - - The value of System.getProperty("os.name"). * - - - True iff running on Linux. - - - True iff running on Windows. - - - True iff running on SunOS. - - - A memory-resident {@link Directory} implementation. Locking - implementation is by default the {@link SingleInstanceLockFactory} - but can be changed with {@link #setLockFactory}. - - - $Id: RAMDirectory.java 781333 2009-06-03 10:38:57Z mikemccand $ - - - - Constructs an empty {@link Directory}. - - - Creates a new RAMDirectory instance from a different - Directory implementation. This can be used to load - a disk-based index into memory. -

- This should be used only with indices that can fit into memory. -

- Note that the resulting RAMDirectory instance is fully - independent from the original Directory (it is a - complete copy). Any subsequent changes to the - original Directory will not be visible in the - RAMDirectory instance. - -

- a Directory value - - if an error occurs - -
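A sketch of loading a disk-based index into memory this way (Java Lucene 2.9 API; the index path is illustrative and the index must fit comfortably in the heap):
            Directory diskDir = FSDirectory.open(new File("/path/to/index"));
            Directory ramDir = new RAMDirectory(diskDir);              // full, independent in-memory copy
            IndexSearcher searcher = new IndexSearcher(ramDir, true);  // read-only searcher over the copy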
- - Creates a new RAMDirectory instance from the {@link FSDirectory}. - - - a File specifying the index directory - - - - - Use {@link #RAMDirectory(Directory)} instead - - - - Creates a new RAMDirectory instance from the {@link FSDirectory}. - - - a String specifying the full index directory path - - - - - Use {@link #RAMDirectory(Directory)} instead - - - - Returns true iff the named file exists in this directory. - - - Returns the time the named file was last modified. - IOException if the file does not exist - - - Set the modified time of an existing file to now. - IOException if the file does not exist - - - Returns the length in bytes of a file in the directory. - IOException if the file does not exist - - - Return total size in bytes of all files in this - directory. This is currently quantized to - RAMOutputStream.BUFFER_SIZE. - - - - Removes an existing file in the directory. - IOException if the file does not exist - - - Renames an existing file in the directory. - FileNotFoundException if from does not exist - - - - - Creates a new, empty file in the directory with the given name. Returns a stream writing this file. - - - Returns a stream reading an existing file. - - - Closes the store to future operations, releasing associated memory. - - - A {@link Collector} implementation that collects the top-scoring hits, - returning them as a {@link TopDocs}. This is used by {@link IndexSearcher} to - implement {@link TopDocs}-based search. Hits are sorted by score descending - and then (when the scores are tied) docID ascending. When you create an - instance of this collector you should know in advance whether documents are - going to be collected in doc Id order or not. - -

NOTE: The values {@link Float#NaN} and - {Float#NEGATIVE_INFINITY} are not valid scores. This - collector will not properly collect hits with such - scores. -

-
- - Creates a new {@link TopScoreDocCollector} given the number of hits to - collect and whether documents are scored in order by the input - {@link Scorer} to {@link #SetScorer(Scorer)}. - -

NOTE: The instances returned by this method - pre-allocate a full array of length - numHits, and fill the array with sentinel - objects. -
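A minimal sketch of the intended usage (Java Lucene 2.9 API; searcher and query are assumed to exist):
            TopScoreDocCollector collector = TopScoreDocCollector.create(10, true); // true: docs arrive in docID order
            searcher.search(query, collector);
            ScoreDoc[] top10 = collector.topDocs().scoreDocs;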

-
- - A Query that matches documents within an exclusive range of terms. - -

This query matches the documents looking for terms that fall into the - supplied range according to {@link String#compareTo(String)}. It is not intended - for numerical ranges, use {@link NumericRangeQuery} instead. - -

This query uses the {@link - MultiTermQuery#CONSTANT_SCORE_AUTO_REWRITE_DEFAULT} - rewrite method. -
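For example (Java Lucene 2.9 API; field and terms are illustrative), the following matches all documents whose "author" term sorts between "adams" and "baker", endpoints included:
            Query q = new TermRangeQuery("author", "adams", "baker", true, true);
            TopDocs results = searcher.search(q, 10);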

- 2.9 - -
- - An abstract {@link Query} that matches documents - containing a subset of terms provided by a {@link - FilteredTermEnum} enumeration. - -

This query cannot be used directly; you must subclass - it and define {@link #getEnum} to provide a {@link - FilteredTermEnum} that iterates through the terms to be - matched. - -

NOTE: if {@link #setRewriteMethod} is either - {@link #CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE} or {@link - #SCORING_BOOLEAN_QUERY_REWRITE}, you may encounter a - {@link BooleanQuery.TooManyClauses} exception during - searching, which happens when the number of terms to be - searched exceeds {@link - BooleanQuery#GetMaxClauseCount()}. Setting {@link - #setRewriteMethod} to {@link #CONSTANT_SCORE_FILTER_REWRITE} - prevents this. - -

The recommended rewrite method is {@link - #CONSTANT_SCORE_AUTO_REWRITE_DEFAULT}: it doesn't spend CPU - computing unhelpful scores, and it tries to pick the most - performant rewrite method given the query. - - Note that {@link QueryParser} produces - MultiTermQueries using {@link - #CONSTANT_SCORE_AUTO_REWRITE_DEFAULT} by default. -
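A sketch of picking a rewrite method explicitly (Java Lucene 2.9 API; PrefixQuery is just one MultiTermQuery subclass, and the field/prefix are illustrative):
            PrefixQuery pq = new PrefixQuery(new Term("title", "luc"));
            pq.setRewriteMethod(MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE); // never throws TooManyClauses
            TopDocs hits = searcher.search(pq, 10);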

-
- - A rewrite method that first creates a private Filter, - by visiting each term in sequence and marking all docs - for that term. Matching documents are assigned a - constant score equal to the query's boost. - -

This method is faster than the BooleanQuery - rewrite methods when the number of matched terms or - matched documents is non-trivial. Also, it will never - hit an errant {@link BooleanQuery.TooManyClauses} - exception. - -

- - -
- - A rewrite method that first translates each term into - {@link BooleanClause.Occur#SHOULD} clause in a - BooleanQuery, and keeps the scores as computed by the - query. Note that typically such scores are - meaningless to the user, and require non-trivial CPU - to compute, so it's almost always better to use {@link - #CONSTANT_SCORE_AUTO_REWRITE_DEFAULT} instead. - -

NOTE: This rewrite method will hit {@link - BooleanQuery.TooManyClauses} if the number of terms - exceeds {@link BooleanQuery#getMaxClauseCount}. - -

- - -
- - Like {@link #SCORING_BOOLEAN_QUERY_REWRITE} except - scores are not computed. Instead, each matching - document receives a constant score equal to the - query's boost. - -

NOTE: This rewrite method will hit {@link - BooleanQuery.TooManyClauses} if the number of terms - exceeds {@link BooleanQuery#getMaxClauseCount}. - -

- - -
- - Read-only default instance of {@link - ConstantScoreAutoRewrite}, with {@link - ConstantScoreAutoRewrite#setTermCountCutoff} set to - {@link - ConstantScoreAutoRewrite#DEFAULT_TERM_COUNT_CUTOFF} - and {@link - ConstantScoreAutoRewrite#setDocCountPercent} set to - {@link - ConstantScoreAutoRewrite#DEFAULT_DOC_COUNT_PERCENT}. - Note that you cannot alter the configuration of this - instance; you'll need to create a private instance - instead. - - - - Constructs a query for terms matching term. - check sub class for possible term access - the Term does not - make sense for all MultiTermQuerys and will be removed. - - - - Constructs a query matching terms that cannot be represented with a single - Term. - - - - Returns the pattern term. - check sub class for possible term access - getTerm does not - make sense for all MultiTermQuerys and will be removed. - - - - Construct the enumeration to be used, expanding the pattern term. - - - Expert: Return the number of unique terms visited during execution of the query. - If there are many of them, you may consider using another query type - or optimize your total term count in index. -

This method is not thread safe, be sure to only call it when no query is running! - If you re-use the same query instance for another - search, be sure to first reset the term counter - with {@link #clearTotalNumberOfTerms}. -

On optimized indexes / no MultiReaders, you get the correct number of - unique terms for the whole index. Use this number to compare different queries. - For non-optimized indexes this number can also be achieved in - non-constant-score mode. In constant-score mode you get the total number of - terms seeked for all segments / sub-readers. -

- - -
- - Expert: Resets the counting of unique terms. - Do this before executing the query/filter. - - - - - - - - - - Sets the rewrite method to be used when executing the - query. You can use one of the four core methods, or - implement your own subclass of {@link RewriteMethod}. - - - - A rewrite method that tries to pick the best - constant-score rewrite method based on term and - document counts from the query. If both the number of - terms and documents is small enough, then {@link - #CONSTANT_SCORE_BOOLEAN_QUERY_REWRITE} is used. - Otherwise, {@link #CONSTANT_SCORE_FILTER_REWRITE} is - used. - - - - Abstract class that defines how the query is rewritten. - - - If the number of terms in this query is equal to or - larger than this setting then {@link - #CONSTANT_SCORE_FILTER_REWRITE} is used. - - - - - - - - If the number of documents to be visited in the - postings exceeds this specified percentage of the - maxDoc() for the index, then {@link - #CONSTANT_SCORE_FILTER_REWRITE} is used. - - 0.0 to 100.0 - - - - - - - - Constructs a query selecting all terms greater/equal than lowerTerm - but less/equal than upperTerm. - -

- If an endpoint is null, it is said - to be "open". Either or both endpoints may be open. Open endpoints may not - be exclusive (you can't select all but the first or last term without - explicitly specifying the term to exclude.) - -

- The field that holds both lower and upper terms. - - The term text at the lower end of the range - - The term text at the upper end of the range - - If true, the lowerTerm is - included in the range. - - If true, the upperTerm is - included in the range. - -
- - Constructs a query selecting all terms greater/equal than - lowerTerm but less/equal than upperTerm. -

- If an endpoint is null, it is said - to be "open". Either or both endpoints may be open. Open endpoints may not - be exclusive (you can't select all but the first or last term without - explicitly specifying the term to exclude.) -

- If collator is not null, it will be used to decide whether - index terms are within the given range, rather than using the Unicode code - point order in which index terms are stored. -

- WARNING: Using this constructor and supplying a non-null - value in the collator parameter will cause every single - index Term in the Field referenced by lowerTerm and/or upperTerm to be - examined. Depending on the number of index Terms in this Field, the - operation could be very slow. - -

- The Term text at the lower end of the range - - The Term text at the upper end of the range - - If true, the lowerTerm is - included in the range. - - If true, the upperTerm is - included in the range. - - The collator to use to collate index Terms, to determine - their membership in the range bounded by lowerTerm and - upperTerm. - -
- - Returns the field name for this query - - - Returns the lower value of this range query - - - Returns the upper value of this range query - - - Returns true if the lower endpoint is inclusive - - - Returns true if the upper endpoint is inclusive - - - Returns the collator used to determine range inclusion, if any. - - - Prints a user-readable version of this query. - - - A Filter that restricts search results to values that have a matching prefix in a given - field. - - - - A wrapper for {@link MultiTermQuery}, that exposes its - functionality as a {@link Filter}. -

- MultiTermQueryWrapperFilter is not designed to - be used by itself. Normally you subclass it to provide a Filter - counterpart for a {@link MultiTermQuery} subclass. -

- For example, {@link TermRangeFilter} and {@link PrefixFilter} extend - MultiTermQueryWrapperFilter. - This class also provides the functionality behind - {@link MultiTermQuery#CONSTANT_SCORE_FILTER_REWRITE}; - this is why it is not abstract. -
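For instance, PrefixFilter can be used to restrict any other query (Java Lucene 2.9 API; field name and prefix are illustrative):
            Filter categoryFilter = new PrefixFilter(new Term("category", "sci"));
            TopDocs hits = searcher.search(query, categoryFilter, 10);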

-
- - Wrap a {@link MultiTermQuery} as a Filter. - - - Expert: Return the number of unique terms visited during execution of the filter. - If there are many of them, you may consider using another filter type - or optimize your total term count in index. -

This method is not thread safe, be sure to only call it when no filter is running! - If you re-use the same filter instance for another - search, be sure to first reset the term counter - with {@link #clearTotalNumberOfTerms}. -

- - -
- - Expert: Resets the counting of unique terms. - Do this before executing the filter. - - - - - - Returns a BitSet with true for documents which should be - permitted in search results, and false for those that should - not. - - Use {@link #GetDocIdSet(IndexReader)} instead. - - - - Returns a DocIdSet with documents that should be - permitted in search results. - - - - Prints a user-readable version of this query. - - - A {@link Collector} implementation which wraps another - {@link Collector} and makes sure only documents with - scores > 0 are collected. - - - - A ranked list of documents, used to hold search results. -

- Caution: Iterate only over the hits needed. Iterating over all hits is - generally not desirable and may be the source of performance issues. If you - need to iterate over many or all hits, consider using the search method that - takes a {@link HitCollector}. -

-

- Note: Deleting matching documents concurrently with traversing the - hits might, when deleting hits that were not yet retrieved, decrease - {@link #Length()}. In such a case, - {@link java.util.ConcurrentModificationException - ConcurrentModificationException} is thrown when accessing hit n - > current_{@link #Length()} (but n < {@link #Length()} - _at_start). - -

- see {@link Searcher#Search(Query, int)}, - {@link Searcher#Search(Query, Filter, int)} and - {@link Searcher#Search(Query, Filter, int, Sort)}:
- -
-            TopDocs topDocs = searcher.Search(query, numHits);
-            ScoreDoc[] hits = topDocs.scoreDocs;
-            for (int i = 0; i < hits.Length; i++) {
-            int docId = hits[i].doc;
-            Document d = searcher.Doc(docId);
-            // do something with current hit
-            ...
-            
-
-
- - Tries to add new documents to hitDocs. - Ensures that the hit numbered min has been retrieved. - - - - Returns the total number of hits available in this set. - - - Returns the stored fields of the nth document in this set. -

Documents are cached, so that repeated requests for the same element may - return the same Document object. -

- CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error -
- - Returns the score for the nth document in this set. - - - Returns the id for the nth document in this set. - Note that ids may change when the index changes, so you cannot - rely on the id to be stable. - - - - Returns a {@link HitIterator} to navigate the Hits. Each item returned - from {@link Iterator#next()} is a {@link Hit}. -

- Caution: Iterate only over the hits needed. Iterating over all - hits is generally not desirable and may be the source of - performance issues. If you need to iterate over many or all hits, consider - using a search method that takes a {@link HitCollector}. -

-

-
- - Expert: Maintains caches of term values. - -

Created: May 19, 2004 11:13:14 AM - -

- lucene 1.4 - - $Id: FieldCache.java 807841 2009-08-25 22:27:31Z markrmiller $ - - - -
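A sketch of reading cached per-document values (Java Lucene 2.9 API; it assumes an open IndexReader named reader and single-valued, un-tokenized "year" and "title" fields):
            int[] years = FieldCache.DEFAULT.getInts(reader, "year");       // first call populates the cache
            String[] titles = FieldCache.DEFAULT.getStrings(reader, "title");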
- - Expert: Stores term text values and document ordering data. - - - All the term values, in natural order. - - - For each document, an index into the lookup array. - - - Creates one of these objects - - - Indicator for StringIndex values in the cache. - - - Expert: The cache used internally by sorting and range query classes. - - - The default parser for byte values, which are encoded by {@link Byte#toString(byte)} - - - The default parser for short values, which are encoded by {@link Short#toString(short)} - - - The default parser for int values, which are encoded by {@link Integer#toString(int)} - - - The default parser for float values, which are encoded by {@link Float#toString(float)} - - - The default parser for long values, which are encoded by {@link Long#toString(long)} - - - The default parser for double values, which are encoded by {@link Double#toString(double)} - - - A parser instance for int values encoded by {@link NumericUtils#IntToPrefixCoded(int)}, e.g. when indexed - via {@link NumericField}/{@link NumericTokenStream}. - - - - A parser instance for float values encoded with {@link NumericUtils}, e.g. when indexed - via {@link NumericField}/{@link NumericTokenStream}. - - - - A parser instance for long values encoded by {@link NumericUtils#LongToPrefixCoded(long)}, e.g. when indexed - via {@link NumericField}/{@link NumericTokenStream}. - - - - A parser instance for double values encoded with {@link NumericUtils}, e.g. when indexed - via {@link NumericField}/{@link NumericTokenStream}. - - - - Interface to parse bytes from document fields. - - - - - Return a single Byte representation of this field's value. - - - Interface to parse shorts from document fields. - - - - - Return a short representation of this field's value. - - - Interface to parse ints from document fields. - - - - - Return an integer representation of this field's value. - - - Interface to parse floats from document fields. - - - - - Return an float representation of this field's value. - - - This class is generated by JavaCC. The most important method is - {@link #Parse(String)}. - - The syntax for query strings is as follows: - A Query is a series of clauses. - A clause may be prefixed by: -
    -
  • a plus (+) or a minus (-) sign, indicating - that the clause is required or prohibited respectively; or
  • -
  • a term followed by a colon, indicating the field to be searched. - This enables one to construct queries which search multiple fields.
  • -
- - A clause may be either: -
    -
  • a term, indicating all the documents that contain this term; or
  • -
  • a nested query, enclosed in parentheses. Note that this may be used - with a +/- prefix to require any of a set of - terms.
  • -
- - Thus, in BNF, the query grammar is: -
-            Query  ::= ( Clause )*
-            Clause ::= ["+", "-"] [<TERM> ":"] ( <TERM> | "(" Query ")" )
-            
- -

- Examples of appropriately formatted queries can be found in the query syntax - documentation. -

- -

- In {@link TermRangeQuery}s, QueryParser tries to detect date values, e.g. - date:[6/1/2005 TO 6/4/2005] produces a range query that searches - for "date" fields between 2005-06-01 and 2005-06-04. Note that the format - of the accepted input depends on {@link #SetLocale(Locale) the locale}. - By default a date is converted into a search term using the deprecated - {@link DateField} for compatibility reasons. - To use the new {@link DateTools} to convert dates, a - {@link Lucene.Net.Documents.DateTools.Resolution} has to be set. -

-

- The date resolution that shall be used for RangeQueries can be set - using {@link #SetDateResolution(DateTools.Resolution)} - or {@link #SetDateResolution(String, DateTools.Resolution)}. The former - sets the default date resolution for all fields, whereas the latter can - be used to set field specific date resolutions. Field specific date - resolutions take, if set, precedence over the default date resolution. -

-

- If you use neither {@link DateField} nor {@link DateTools} in your - index, you can create your own - query parser that inherits QueryParser and overwrites - {@link #GetRangeQuery(String, String, String, boolean)} to - use a different method for date conversion. -
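 For illustration, a minimal sketch of the DateTools-based configuration described above; parser is a QueryParser instance as in the earlier sketch, and the "date"/"modified" field names are arbitrary assumptions:

            // Use DateTools with day granularity for all fields instead of the deprecated DateField...
            parser.SetDateResolution(DateTools.Resolution.DAY);
            // ...and a field-specific resolution that takes precedence for the "modified" field.
            parser.SetDateResolution("modified", DateTools.Resolution.HOUR);
            Query range = parser.Parse("date:[6/1/2005 TO 6/4/2005]");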

- -

Note that QueryParser is not thread-safe.

- -

NOTE: there is a new QueryParser in contrib, which matches - the same syntax as this class, but is more modular, - enabling substantial customization to how a query is created. - -

NOTE: You must specify the required {@link Version} compatibility when creating QueryParser:

  • As of 2.9, {@link #SetEnablePositionIncrements} is true by default.
-
-
- - Token literal values and constants. - Generated by org.javacc.parser.OtherFilesGen#start() - - - - End of File. - - - RegularExpression Id. - - - RegularExpression Id. - - - RegularExpression Id. - - - RegularExpression Id. - - - RegularExpression Id. - - - RegularExpression Id. - - - RegularExpression Id. - - - RegularExpression Id. - - - RegularExpression Id. - - - RegularExpression Id. - - - RegularExpression Id. - - - RegularExpression Id. - - - RegularExpression Id. - - - RegularExpression Id. - - - RegularExpression Id. - - - RegularExpression Id. - - - RegularExpression Id. - - - RegularExpression Id. - - - RegularExpression Id. - - - RegularExpression Id. - - - RegularExpression Id. - - - RegularExpression Id. - - - RegularExpression Id. - - - RegularExpression Id. - - - RegularExpression Id. - - - RegularExpression Id. - - - RegularExpression Id. - - - RegularExpression Id. - - - RegularExpression Id. - - - RegularExpression Id. - - - RegularExpression Id. - - - RegularExpression Id. - - - Lexical state. - - - Lexical state. - - - Lexical state. - - - Lexical state. - - - Literal token values. - - - Alternative form of QueryParser.Operator.AND - - - Alternative form of QueryParser.Operator.OR - - - The actual operator that parser uses to combine query terms - - - Constructs a query parser. - the default field for query terms. - - used to find terms in the query text. - - Use {@link #QueryParser(Version, String, Analyzer)} instead - - - - Constructs a query parser. - - - Lucene version to match. See above) - - the default field for query terms. - - used to find terms in the query text. - - - - Parses a query string, returning a {@link Lucene.Net.Search.Query}. - the query string to be parsed. - - ParseException if the parsing fails - - - Returns the analyzer. - - - - Returns the field. - - - - Get the minimal similarity for fuzzy queries. - - - Set the minimum similarity for fuzzy queries. - Default is 0.5f. - - - - Get the prefix length for fuzzy queries. - Returns the fuzzyPrefixLength. - - - - Set the prefix length for fuzzy queries. Default is 0. - The fuzzyPrefixLength to set. - - - - Sets the default slop for phrases. If zero, then exact phrase matches - are required. Default value is zero. - - - - Gets the default slop for phrases. - - - Set to true to allow leading wildcard characters. -

- When set, * or ? are allowed as - the first character of a PrefixQuery and WildcardQuery. - Note that this can produce very slow - queries on big indexes. -

- Default: false. -
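 For illustration, a sketch assuming the setter is SetAllowLeadingWildcard (mirroring the Java API) and parser is a QueryParser instance:

            parser.SetAllowLeadingWildcard(true);
            // Without the setting above, a leading wildcard such as this raises a ParseException.
            Query q = parser.Parse("*ology");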

-
- - - - - - Set to true to enable position increments in result query. -

- When set, result phrase and multi-phrase queries will - be aware of position increments. - Useful when e.g. a StopFilter increases the position increment of - the token that follows an omitted token. -

- Default: false. -
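 A minimal sketch, assuming parser is a QueryParser instance and its analyzer removes stop words with a position-increment-aware StopFilter:

            parser.SetEnablePositionIncrements(true);
            // The resulting PhraseQuery keeps the hole left by the removed stop word "the".
            Query phrase = parser.Parse("\"the quick fox\"");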

-
- - - - - - Sets the boolean operator of the QueryParser. - In default mode (OR_OPERATOR) terms without any modifiers - are considered optional: for example capital of Hungary is equal to - capital OR of OR Hungary.
- In AND_OPERATOR mode terms are considered to be in conjunction: the - above mentioned query is parsed as capital AND of AND Hungary -
-
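 For example (a sketch; parser is a QueryParser instance as above):

            parser.SetDefaultOperator(QueryParser.AND_OPERATOR);
            // Now parsed as: capital AND of AND Hungary
            Query q = parser.Parse("capital of Hungary");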
- - Gets implicit operator setting, which will be either AND_OPERATOR - or OR_OPERATOR. - - - - Whether terms of wildcard, prefix, fuzzy and range queries are to be automatically - lower-cased or not. Default is true. - - - - - - - - Please use {@link #setMultiTermRewriteMethod} instead. - - - - Please use {@link #getMultiTermRewriteMethod} instead. - - - - By default QueryParser uses {@link MultiTermQuery#CONSTANT_SCORE_AUTO_REWRITE_DEFAULT} - when creating a PrefixQuery, WildcardQuery or RangeQuery. This implementation is generally preferable because it - a) Runs faster b) Does not have the scarcity of terms unduly influence score - c) avoids any "TooManyBooleanClauses" exception. - However, if your application really needs to use the - old-fashioned BooleanQuery expansion rewriting and the above - points are not relevant then use this to change - the rewrite method. - - - - - - - - Set locale used by date range parsing. - - - Returns current locale, allowing access by subclasses. - - - Sets the default date resolution used by RangeQueries for fields for which no - specific date resolutions has been set. Field specific resolutions can be set - with {@link #SetDateResolution(String, DateTools.Resolution)}. - - - the default date resolution to set - - - - Sets the date resolution used by RangeQueries for a specific field. - - - field for which the date resolution is to be set - - date resolution to set - - - - Returns the date resolution that is used by RangeQueries for the given field. - Returns null, if no default or field specific date resolution has been set - for the given field. - - - - - Sets the collator used to determine index term inclusion in ranges - for RangeQuerys. -

- WARNING: Setting the rangeCollator to a non-null - collator using this method will cause every single index Term in the - Field referenced by lowerTerm and/or upperTerm to be examined. - Depending on the number of index Terms in this Field, the operation could - be very slow. - -

- the collator to use when constructing RangeQuerys - -
- - the collator used to determine index term inclusion in ranges - for RangeQuerys. - - - - use {@link #AddClause(List, int, int, Query)} instead. - - - - throw in overridden method to disallow - - - - Base implementation delegates to {@link #GetFieldQuery(String,String)}. - This method may be overridden, for example, to return - a SpanNearQuery instead of a PhraseQuery. - - - throw in overridden method to disallow - - - - throw in overridden method to disallow - - - - Builds a new BooleanQuery instance - disable coord - - new BooleanQuery instance - - - - Builds a new BooleanClause instance - sub query - - how this clause should occur when matching documents - - new BooleanClause instance - - - - Builds a new TermQuery instance - term - - new TermQuery instance - - - - Builds a new PhraseQuery instance - new PhraseQuery instance - - - - Builds a new MultiPhraseQuery instance - new MultiPhraseQuery instance - - - - Builds a new PrefixQuery instance - Prefix term - - new PrefixQuery instance - - - - Builds a new FuzzyQuery instance - Term - - minimum similarity - - prefix length - - new FuzzyQuery Instance - - - - Builds a new TermRangeQuery instance - Field - - min - - max - - true if range is inclusive - - new TermRangeQuery instance - - - - Builds a new MatchAllDocsQuery instance - new MatchAllDocsQuery instance - - - - Builds a new WildcardQuery instance - wildcard term - - new WildcardQuery instance - - - - Factory method for generating query, given a set of clauses. - By default creates a boolean query composed of clauses passed in. - - Can be overridden by extending classes, to modify query being - returned. - - - List that contains {@link BooleanClause} instances - to join. - - - Resulting {@link Query} object. - - throw in overridden method to disallow - - use {@link #GetBooleanQuery(List)} instead - - - - Factory method for generating query, given a set of clauses. - By default creates a boolean query composed of clauses passed in. - - Can be overridden by extending classes, to modify query being - returned. - - - List that contains {@link BooleanClause} instances - to join. - - - Resulting {@link Query} object. - - throw in overridden method to disallow - - - - Factory method for generating query, given a set of clauses. - By default creates a boolean query composed of clauses passed in. - - Can be overridden by extending classes, to modify query being - returned. - - - List that contains {@link BooleanClause} instances - to join. - - true if coord scoring should be disabled. - - - Resulting {@link Query} object. - - throw in overridden method to disallow - - use {@link #GetBooleanQuery(List, boolean)} instead - - - - Factory method for generating query, given a set of clauses. - By default creates a boolean query composed of clauses passed in. - - Can be overridden by extending classes, to modify query being - returned. - - - List that contains {@link BooleanClause} instances - to join. - - true if coord scoring should be disabled. - - - Resulting {@link Query} object. - - throw in overridden method to disallow - - - - Factory method for generating a query. Called when parser - parses an input term token that contains one or more wildcard - characters (? and *), but is not a prefix term token (one - that has just a single * character at the end) -

- Depending on settings, prefix term may be lower-cased - automatically. It will not go through the default Analyzer, - however, since normal Analyzers are unlikely to work properly - with wildcard templates. -

- Can be overridden by extending classes, to provide custom handling for - wildcard queries, which may be necessary due to missing analyzer calls. - -

- Name of the field query will use. - - Term token that contains one or more wild card - characters (? or *), but is not simple prefix term - - - Resulting {@link Query} built for the term - - throw in overridden method to disallow - -
- - Factory method for generating a query (similar to - {@link #getWildcardQuery}). Called when parser parses an input term - token that uses prefix notation; that is, contains a single '*' wildcard - character as its last character. Since this is a special case - of generic wildcard term, and such a query can be optimized easily, - this usually results in a different query object. -

- Depending on settings, a prefix term may be lower-cased - automatically. It will not go through the default Analyzer, - however, since normal Analyzers are unlikely to work properly - with wildcard templates. -

- Can be overridden by extending classes, to provide custom handling for - wild card queries, which may be necessary due to missing analyzer calls. - -

- Name of the field query will use. - - Term token to use for building term for the query - (without trailing '*' character!) - - - Resulting {@link Query} built for the term - - throw in overridden method to disallow - -
- - Factory method for generating a query (similar to - {@link #getWildcardQuery}). Called when parser parses - an input term token that has the fuzzy suffix (~) appended. - - - Name of the field query will use. - - Term token to use for building term for the query - - - Resulting {@link Query} built for the term - - throw in overridden method to disallow - - - - Returns a String where the escape char has been - removed, or kept only once if there was a double escape. - - Supports escaped unicode characters, e. g. translates - \\u0041 to A. - - - - - Returns the numeric value of the hexadecimal character - - - Returns a String where those characters that QueryParser - expects to be escaped are escaped by a preceding \. - - - - Command line tool to test QueryParser, using {@link Lucene.Net.Analysis.SimpleAnalyzer}. - Usage:
- java Lucene.Net.QueryParsers.QueryParser <input> -
-
- - Generated Token Manager. - - - Current token. - - - Next token. - - - Constructor with user supplied CharStream. - - - Reinitialise. - - - Constructor with generated Token Manager. - - - Reinitialise. - - - Get the next Token. - - - Get the specific Token. - - - Generate ParseException. - - - Enable tracing. - - - Disable tracing. - - - The default operator for parsing queries. - Use {@link QueryParser#setDefaultOperator} to change it. - - - - This exception is thrown when parse errors are encountered. - You can explicitly create objects of this exception type by - calling the method generateParseException in the generated - parser. - - You can modify this class to customize your error reporting - mechanisms so long as you retain the public fields. - - - - This constructor is used by the method "generateParseException" - in the generated parser. Calling this constructor generates - a new object of this type with the fields "currentToken", - "expectedTokenSequences", and "tokenImage" set. The boolean - flag "specialConstructor" is also set to true to indicate that - this constructor was used to create this object. - This constructor calls its super class with the empty string - to force the "toString" method of parent class "Throwable" to - print the error message in the form: - ParseException: <result of getMessage> - - - - The following constructors are for use by you for whatever - purpose you can think of. Constructing the exception in this - manner makes the exception behave in the normal way - i.e., as - documented in the class "Throwable". The fields "errorToken", - "expectedTokenSequences", and "tokenImage" do not contain - relevant information. The JavaCC generated code does not use - these constructors. - - - - Constructor with message. - - - Constructor with message. - - - This variable determines which constructor was used to create - this object and thereby affects the semantics of the - "getMessage" method (see below). - - - - This is the last token that has been consumed successfully. If - this object has been created due to a parse error, the token - followng this token will (therefore) be the first error token. - - - - Each entry in this array is an array of integers. Each array - of integers represents a sequence of tokens (by their ordinal - values) that is expected at this point of the parse. - - - - This is a reference to the "tokenImage" array of the generated - parser within which the parse error occurred. This array is - defined in the generated ...Constants interface. - - - - The end of line string for this machine. - - - Used to convert raw characters to their escaped version - when these raw version cannot be used as part of an ASCII - string literal. - - - - This method has the standard behavior when this object has been - created using the standard constructors. Otherwise, it uses - "currentToken" and "expectedTokenSequences" to generate a parse - error message and returns it. If this object has been created - due to a parse error, and you do not catch it (it gets thrown - from the parser), then this method is called during the printing - of the final stack trace, and hence the correct error message - gets displayed. - - - - For each Field, store position by position information. It ignores frequency information -

- This is not thread-safe. -

-
- - The TermVectorMapper can be used to map Term Vectors into your own - structure instead of the parallel array structure used by - {@link Lucene.Net.Index.IndexReader#GetTermFreqVector(int,String)}. -

- It is up to the implementation to make sure it is thread-safe. - - - -
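 As a hedged illustration of the mapping contract described here (not taken from the library itself), a mapper might simply collect term frequencies into a dictionary; it could then be handed to IndexReader.GetTermFreqVector(docNumber, field, mapper), assuming that mapper-accepting overload is available as in the Java original:

            public class FrequencyMapper : TermVectorMapper
            {
                // term -> frequency for the field currently being mapped
                public System.Collections.Generic.IDictionary<string, int> Frequencies =
                    new System.Collections.Generic.Dictionary<string, int>();

                public override void SetExpectations(string field, int numTerms,
                                                     bool storeOffsets, bool storePositions)
                {
                    // No-op: this mapper only cares about terms and their frequencies.
                }

                public override void Map(string term, int frequency,
                                         TermVectorOffsetInfo[] offsets, int[] positions)
                {
                    Frequencies[term] = frequency;
                }
            }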

-
- - - true if this mapper should tell Lucene to ignore positions even if they are stored - - similar to ignoringPositions - - - - Tell the mapper what to expect in regards to field, number of terms, offset and position storage. - This method will be called once before retrieving the vector for a field. - - This method will be called before {@link #Map(String,int,TermVectorOffsetInfo[],int[])}. - - The field the vector is for - - The number of terms that need to be mapped - - true if the mapper should expect offset information - - true if the mapper should expect positions info - - - - Map the Term Vector information into your own structure - The term to add to the vector - - The frequency of the term in the document - - null if the offset is not specified, otherwise the offset into the field of the term - - null if the position is not specified, otherwise the position in the field of the term - - - - Indicate to Lucene that even if there are positions stored, this mapper is not interested in them and they - can be skipped over. Derived classes should set this to true if they want to ignore positions. The default - is false, meaning positions will be loaded if they are stored. - - false - - - - - Same principal as {@link #IsIgnoringPositions()}, but applied to offsets. false by default. - - false - - - - Passes down the index of the document whose term vector is currently being mapped, - once for each top level call to a term vector reader. -

- Default implementation IGNORES the document number. Override if your implementation needs the document number. -

- NOTE: Document numbers are internal to Lucene and subject to change depending on indexing operations. - -

- index of document currently being mapped - -
- - A Map of Integer and TVPositionInfo - - - - - - - - Never ignores positions. This mapper doesn't make much sense unless there are positions - false - - - - Callback for the TermVectorReader. - - - - - - - - - - - Callback mechanism used by the TermVectorReader - The field being read - - The number of terms in the vector - - Whether offsets are available - - Whether positions are available - - - - Get the mapping between fields and terms, sorted by the comparator - - - A map between field names and a Map. The sub-Map key is the position as the integer, the value is {@link Lucene.Net.Index.PositionBasedTermVectorMapper.TVPositionInfo}. - - - - Container for a term at a position - - - - The position of the term - - - - Note, there may be multiple terms at the same position - A List of Strings - - - - Parallel list (to {@link #getTerms()}) of TermVectorOffsetInfo objects. There may be multiple entries since there may be multiple terms at a position - A List of TermVectorOffsetInfo objects, if offsets are store. - - - - Change to true to see details of reference counts when - infoStream != null - - - - Initialize the deleter: find all previous commits in - the Directory, incref the files they reference, call - the policy to let it delete commits. This will remove - any files not referenced by any of the commits. - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - - - Remove the CommitPoints in the commitsToDelete List by - DecRef'ing all files from each SegmentInfos. - - - - Writer calls this when it has hit an error and had to - roll back, to tell us that there may now be - unreferenced files in the filesystem. So we re-list - the filesystem and delete such files. If segmentName - is non-null, we will only delete files corresponding to - that segment. - - - - For definition of "check point" see IndexWriter comments: - "Clarification: Check Points (and commits)". - - Writer calls this when it has made a "consistent - change" to the index, meaning new files are written to - the index and the in-memory SegmentInfos have been - modified to point to those files. - - This may or may not be a commit (segments_N may or may - not have been written). - - We simply incref the files referenced by the new - SegmentInfos and decref the files we had previously - seen (if any). - - If this is a commit, we also call the policy to give it - a chance to remove other commits. If any commits are - removed, we decref their files as well. - - - - Deletes the specified files, but only if they are new - (have not yet been incref'd). - - - - Tracks the reference count for a single index file: - - - Holds details for each commit point. This class is - also passed to the deletion policy. Note: this class - has a natural ordering that is inconsistent with - equals. - - - - Called only be the deletion policy, to remove this - commit point from the index. - - - - Bulk write a contiguous series of documents. The - lengths array is the length (in bytes) of each raw - document. The stream IndexInput is the - fieldsStream from which we should bulk-copy all - bytes. - - - - Class responsible for access to stored document fields. -

- It uses <segment>.fdt and <segment>.fdx files.

- $Id: FieldsReader.java 801344 2009-08-05 18:05:06Z yonik $ - -
- - Returns a cloned FieldsReader that shares open - IndexInputs with the original one. It is the caller's - job not to close the original FieldsReader until all - clones are called (eg, currently SegmentReader manages - this logic). - - - - AlreadyClosedException if this FieldsReader is closed - - - Closes the underlying {@link Lucene.Net.Store.IndexInput} streams, including any ones associated with a - lazy implementation of a Field. This means that the Fields values will not be accessible. - - - IOException - - - Returns the length in bytes of each raw document in a - contiguous range of length numDocs starting with - startDocID. Returns the IndexInput (the fieldStream), - already seeked to the starting point for startDocID. - - - - Skip the field. We still have to read some of the information about the field, but can skip past the actual content. - This will have the most payoff on large fields. - - - - A Lazy implementation of Fieldable that differs loading of fields until asked for, instead of when the Document is - loaded. - - - - The value of the field in Binary, or null. If null, the Reader value, - String value, or TokenStream value is used. Exactly one of stringValue(), - readerValue(), binaryValue(), and tokenStreamValue() must be set. - - - - The value of the field as a Reader, or null. If null, the String value, - binary value, or TokenStream value is used. Exactly one of stringValue(), - readerValue(), binaryValue(), and tokenStreamValue() must be set. - - - - The value of the field as a TokenStream, or null. If null, the Reader value, - String value, or binary value is used. Exactly one of stringValue(), - readerValue(), binaryValue(), and tokenStreamValue() must be set. - - - - The value of the field as a String, or null. If null, the Reader value, - binary value, or TokenStream value is used. Exactly one of stringValue(), - readerValue(), binaryValue(), and tokenStreamValue() must be set. - - - - Access to the Fieldable Info file that describes document fields and whether or - not they are indexed. Each segment has a separate Fieldable Info file. Objects - of this class are thread-safe for multiple readers, but only one thread can - be adding documents at a time, with no other reader or writer threads - accessing this object. - - - - Construct a FieldInfos object using the directory and the name of the file - IndexInput - - The directory to open the IndexInput from - - The name of the file to open the IndexInput from in the Directory - - IOException - - - Returns a deep clone of this FieldInfos instance. - - - Adds field info for a Document. - - - Returns true if any fields do not omitTermFreqAndPositions - - - Add fields that are indexed. Whether they have termvectors has to be specified. - - - The names of the fields - - Whether the fields store term vectors or not - - true if positions should be stored. - - true if offsets should be stored - - - - Assumes the fields are not storing term vectors. - - - The names of the fields - - Whether the fields are indexed or not - - - - - - - Calls 5 parameter add with false for all TermVector parameters. - - - The name of the Fieldable - - true if the field is indexed - - - - - - Calls 5 parameter add with false for term vector positions and offsets. - - - The name of the field - - true if the field is indexed - - true if the term vector should be stored - - - - If the field is not yet known, adds it. If it is known, checks to make - sure that the isIndexed flag is the same as was given previously for this - field. 
If not - marks it as being indexed. Same goes for the TermVector - parameters. - - - The name of the field - - true if the field is indexed - - true if the term vector should be stored - - true if the term vector with positions should be stored - - true if the term vector with offsets should be stored - - - - If the field is not yet known, adds it. If it is known, checks to make - sure that the isIndexed flag is the same as was given previously for this - field. If not - marks it as being indexed. Same goes for the TermVector - parameters. - - - The name of the field - - true if the field is indexed - - true if the term vector should be stored - - true if the term vector with positions should be stored - - true if the term vector with offsets should be stored - - true if the norms for the indexed field should be omitted - - - - If the field is not yet known, adds it. If it is known, checks to make - sure that the isIndexed flag is the same as was given previously for this - field. If not - marks it as being indexed. Same goes for the TermVector - parameters. - - - The name of the field - - true if the field is indexed - - true if the term vector should be stored - - true if the term vector with positions should be stored - - true if the term vector with offsets should be stored - - true if the norms for the indexed field should be omitted - - true if payloads should be stored for this field - - true if term freqs should be omitted for this field - - - - Return the fieldName identified by its number. - - - - - the fieldName or an empty string when the field - with the given number doesn't exist. - - - - Return the fieldinfo object referenced by the fieldNumber. - - - the FieldInfo object or null when the given fieldNumber - doesn't exist. - - - - - Base class for enumerating all but deleted docs. - -

NOTE: this class is meant only to be used internally - by Lucene; it's only public so it can be shared across - packages. This means the API is freely subject to - change, and, the class could be removed entirely, in any - Lucene release. Use directly at your own risk! */ -

-
- - Removes stop words from a token stream. - - - Construct a token stream filtering the given input. - Use {@link #StopFilter(boolean, TokenStream, String[])} instead - - - - Construct a token stream filtering the given input. - true if token positions should record the removed stop words - - input TokenStream - - array of stop words - - Use {@link #StopFilter(boolean, TokenStream, Set)} instead. - - - - Constructs a filter which removes words from the input - TokenStream that are named in the array of words. - - Use {@link #StopFilter(boolean, TokenStream, String[], boolean)} instead - - - - Constructs a filter which removes words from the input - TokenStream that are named in the array of words. - - true if token positions should record the removed stop words - - input TokenStream - - array of stop words - - true if case is ignored - - Use {@link #StopFilter(boolean, TokenStream, Set, boolean)} instead. - - - - Construct a token stream filtering the given input. - If stopWords is an instance of {@link CharArraySet} (true if - makeStopSet() was used to construct the set) it will be directly used - and ignoreCase will be ignored since CharArraySet - directly controls case sensitivity. -

- If stopWords is not an instance of {@link CharArraySet}, - a new CharArraySet will be constructed and ignoreCase will be - used to specify the case sensitivity of that set. - -

- - - The set of Stop Words. - - -Ignore case when stopping. - - Use {@link #StopFilter(boolean, TokenStream, Set, boolean)} instead - -
- - Construct a token stream filtering the given input. - If stopWords is an instance of {@link CharArraySet} (true if - makeStopSet() was used to construct the set) it will be directly used - and ignoreCase will be ignored since CharArraySet - directly controls case sensitivity. -

- If stopWords is not an instance of {@link CharArraySet}, - a new CharArraySet will be constructed and ignoreCase will be - used to specify the case sensitivity of that set. - -

- true if token positions should record the removed stop words - - Input TokenStream - - The set of Stop Words. - - -Ignore case when stopping. - -
- - Constructs a filter which removes words from the input - TokenStream that are named in the Set. - - - - - Use {@link #StopFilter(boolean, TokenStream, Set)} instead - - - - Constructs a filter which removes words from the input - TokenStream that are named in the Set. - - - true if token positions should record the removed stop words - - Input stream - - The set of Stop Words. - - - - - - Builds a Set from an array of stop words, - appropriate for passing into the StopFilter constructor. - This permits this stopWords construction to be cached once when - an Analyzer is constructed. - - - passing false to ignoreCase - - - - Builds a Set from an array of stop words, - appropriate for passing into the StopFilter constructor. - This permits this stopWords construction to be cached once when - an Analyzer is constructed. - - - passing false to ignoreCase - - - - - An array of stopwords - - If true, all words are lower cased first. - - a Set containing the words - - - - - A List of Strings representing the stopwords - - if true, all words are lower cased first - - A Set containing the words - - - - Returns the next input Token whose term() is not a stop word. - - - - - Please specify this when you create the StopFilter - - - - Returns version-dependent default for enablePositionIncrements. Analyzers - that embed StopFilter use this method when creating the StopFilter. Prior - to 2.9, this returns {@link #getEnablePositionIncrementsDefault}. On 2.9 - or later, it returns true. - - - - Set the default position increments behavior of every StopFilter created - from now on. -

- Note: behavior of a single StopFilter instance can be modified with - {@link #SetEnablePositionIncrements(boolean)}. This static method allows - control over behavior of classes using StopFilters internally, for - example {@link Lucene.Net.Analysis.Standard.StandardAnalyzer - StandardAnalyzer} if used with the no-arg ctor. -

- Default : false. - -

- - - Please specify this when you create the StopFilter - -
- - - - - - If true, this StopFilter will preserve - positions of the incoming tokens (ie, accumulate and - set position increments of the removed stop tokens). - Generally, true is best as it does not - lose information (positions of the original tokens) - during indexing. - -

When set, when a token is stopped - (omitted), the position increment of the following - token is incremented. - -

NOTE: be sure to also - set {@link QueryParser#setEnablePositionIncrements} if - you use QueryParser to create queries. -
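 A brief sketch tying this together, using the constructor documented above (reader is an assumed System.IO.TextReader over the text being analyzed):

            // Record position increments for the removed stop words, as recommended above.
            TokenStream stream = new StopFilter(true,
                                                new WhitespaceTokenizer(reader),
                                                new string[] { "the", "of", "and" });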

-
- - Normalizes tokens extracted with {@link StandardTokenizer}. - - - Construct filtering in. - - - Returns the next token in the stream, or null at EOS. -

Removes 's from the end of words. -

Removes dots from acronyms. -

-
- - Simplistic {@link CharFilter} that applies the mappings - contained in a {@link NormalizeCharMap} to the character - stream, and correcting the resulting changes to the - offsets. - - - - Base utility class for implementing a {@link CharFilter}. - You subclass this, and then record mappings by calling - {@link #addOffCorrectMap}, and then invoke the correct - method to correct an offset. - -

NOTE: This class is not particularly efficient. - For example, a new class instance is created for every - call to {@link #addOffCorrectMap}, which is then appended - to a private list. -

-
- - Subclasses of CharFilter can be chained to filter CharStream. - They can be used as {@link java.io.Reader} with additional offset - correction. {@link Tokenizer}s will automatically use {@link #CorrectOffset} - if a CharFilter/CharStream subclass is used. - - - $Id$ - - - - - Subclass may want to override to correct the current offset. - - - current offset - - corrected offset - - - - Chains the corrected offset through the input - CharFilter. - - - - Retrieve the corrected offset. Note that this method - is slow, if you correct positions far before the most - recently added position, as it's a simple linear - search backwards through all offset corrections added - by {@link #addOffCorrectMap}. - - - - Default constructor that takes a {@link CharStream}. - - - Easy-use constructor that takes a {@link Reader}. - - - A LetterTokenizer is a tokenizer that divides text at non-letters. That's - to say, it defines tokens as maximal strings of adjacent letters, as defined - by java.lang.Character.isLetter() predicate. - Note: this does a decent job for most European languages, but does a terrible - job for some Asian languages, where words are not separated by spaces. - - - - Construct a new LetterTokenizer. - - - Construct a new LetterTokenizer using a given {@link AttributeSource}. - - - Construct a new LetterTokenizer using a given {@link Lucene.Net.Util.AttributeSource.AttributeFactory}. - - - Collects only characters which satisfy - {@link Character#isLetter(char)}. - - - - CharReader is a Reader wrapper. It reads chars from - Reader and outputs {@link CharStream}, defining an - identify function {@link #CorrectOffset} method that - simply returns the provided offset. - - - - Use by certain classes to match version compatibility - across releases of Lucene. -

- WARNING: When changing the version parameter - that you supply to components in Lucene, do not simply - change the version at search-time, but instead also adjust - your indexing code to match, and re-index. -

-
- - -

WARNING: if you use this setting, and then - upgrade to a newer release of Lucene, sizable changes - may happen. If precise back compatibility is important - then you should instead explicitly specify an actual - version. - If you use this constant then you may need to - re-index all of your documents when upgrading - Lucene, as the way text is indexed may have changed. - Additionally, you may need to re-test your entire - application to ensure it behaves as expected, as - some defaults may have changed and may break functionality - in your application. -

-
- - Match settings and bugs in Lucene's 2.0 release. - - - Match settings and bugs in Lucene's 2.1 release. - - - Match settings and bugs in Lucene's 2.2 release. - - - Match settings and bugs in Lucene's 2.3 release. - - - Match settings and bugs in Lucene's 2.3 release. - - - - Helper methods to ease implementing {@link Object#toString()}. - - - for printing boost only if not 1.0 - - - Stores and iterate on sorted integers in compressed form in RAM.
- The code for compressing the differences between ascending integers was - borrowed from {@link Lucene.Net.Store.IndexInput} and - {@link Lucene.Net.Store.IndexOutput}. -

- NOTE: this class assumes the stored integers are doc Ids (hence why it extends {@link DocIdSet}). Therefore its {@link #Iterator()} assumes {@link DocIdSetIterator#NO_MORE_DOCS} can be used as a sentinel. If you intend to use this value, then make sure it's not used during search flow.
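 A small usage sketch consistent with the description above (the doc ids are made up):

            SortedVIntList vints = new SortedVIntList(new int[] { 3, 5, 8, 42 });
            DocIdSetIterator it = vints.Iterator();
            while (it.NextDoc() != DocIdSetIterator.NO_MORE_DOCS)
            {
                System.Console.WriteLine(it.DocID());
            }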

-
- - When a BitSet has fewer than 1 in BITS2VINTLIST_SIZE bits set, - a SortedVIntList representing the index numbers of the set bits - will be smaller than that BitSet. - - - - Create a SortedVIntList from all elements of an array of integers. - - - A sorted array of non negative integers. - - - - Create a SortedVIntList from an array of integers. - An array of sorted non negative integers. - - The number of integers to be used from the array. - - - - Create a SortedVIntList from a BitSet. - A bit set representing a set of integers. - - - - Create a SortedVIntList from an OpenBitSet. - A bit set representing a set of integers. - - - - Create a SortedVIntList. - An iterator providing document numbers as a set of integers. - This DocIdSetIterator is iterated completely when this constructor - is called and it must provide the integers in non - decreasing order. - - - - The total number of sorted integers. - - - - The size of the byte array storing the compressed sorted integers. - - - - This DocIdSet implementation is cacheable. - - - An iterator over the sorted integers. - - - - use {@link #DocID()} instead. - - - - use {@link #NextDoc()} instead. - - - - use {@link #Advance(int)} instead. - - - - Construct an OpenBitSetDISI with its bits set - from the doc ids of the given DocIdSetIterator. - Also give a maximum size one larger than the largest doc id for which a - bit may ever be set on this OpenBitSetDISI. - - - - Construct an OpenBitSetDISI with no bits set, and a given maximum size - one larger than the largest doc id for which a bit may ever be set - on this OpenBitSetDISI. - - - - Perform an inplace OR with the doc ids from a given DocIdSetIterator, - setting the bit for each such doc id. - These doc ids should be smaller than the maximum size passed to the - constructor. - - - - Perform an inplace AND with the doc ids from a given DocIdSetIterator, - leaving only the bits set for which the doc ids are in common. - These doc ids should be smaller than the maximum size passed to the - constructor. - - - - Perform an inplace NOT with the doc ids from a given DocIdSetIterator, - clearing all the bits for each such doc id. - These doc ids should be smaller than the maximum size passed to the - constructor. - - - - Perform an inplace XOR with the doc ids from a given DocIdSetIterator, - flipping all the bits for each such doc id. - These doc ids should be smaller than the maximum size passed to the - constructor. - - - - Returns primitive memory sizes for estimating RAM usage. - - - - - size of array beyond contents - - - - Class size overhead - - - - a primitive Class - bool, byte, char, short, long, float, - short, double, int - - the size in bytes of given primitive Class - - - - size of reference - - - - Provides methods for sanity checking that entries in the FieldCache - are not wasteful or inconsistent. -

-

- Lucene 2.9 Introduced numerous enhancements into how the FieldCache - is used by the low levels of Lucene searching (for Sorting and - ValueSourceQueries) to improve both the speed for Sorting, as well - as reopening of IndexReaders. But these changes have shifted the - usage of FieldCache from "top level" IndexReaders (frequently a - MultiReader or DirectoryReader) down to the leaf level SegmentReaders. - As a result, existing applications that directly access the FieldCache - may find RAM usage increase significantly when upgrading to 2.9 or - Later. This class provides an API for these applications (or their - Unit tests) to check at run time if the FieldCache contains "insane" - usages of the FieldCache. -

-

- EXPERIMENTAL API: This API is considered extremely advanced and - experimental. It may be removed or altered w/o warning in future releases - of Lucene. -

-

- - - - - - -
- - If set, will be used to estimate size for all CacheEntry objects - dealt with. - - - - Quick and dirty convenience method - - - - - Quick and dirty convenience method that instantiates an instance with - "good defaults" and uses it to test the CacheEntry[] - - - - - - Tests a CacheEntry[] for indication of "insane" cache usage. -

- NOTE: FieldCache CreationPlaceholder objects are ignored. (:TODO: is this a bad idea? are we masking a real problem?)

-

-
- - Internal helper method used by check that iterates over - valMismatchKeys and generates a Collection of Insanity - instances accordingly. The MapOfSets are used to populate - the Insantiy objects. - - - - - - Internal helper method used by check that iterates over - the keys of readerFieldToValIds and generates a Collection - of Insanity instances whenever two (or more) ReaderField instances are - found that have an ancestery relationships. - - - - - - - Checks if the seed is an IndexReader, and if so will walk - the hierarchy of subReaders building up a list of the objects - returned by obj.getFieldCacheKey() - - - - Simple pair object for using "readerKey + fieldName" a Map key - - - Simple container for a collection of related CacheEntry objects that - in conjunction with eachother represent some "insane" usage of the - FieldCache. - - - - Type of insane behavior this object represents - - - Description of hte insane behavior - - - CacheEntry objects which suggest a problem - - - Multi-Line representation of this Insanity object, starting with - the Type and Msg, followed by each CacheEntry.toString() on it's - own line prefaced by a tab character - - - - An Enumaration of the differnet types of "insane" behavior that - may be detected in a FieldCache. - - - - - - - - - - - Indicates an overlap in cache usage on a given field - in sub/super readers. - - - -

- Indicates entries have the same reader+fieldname but - different cached values. This can happen if different datatypes, - or parsers are used -- and while it's not necessarily a bug - it's typically an indication of a possible problem. -

-

- NOTE: Only the reader, fieldname, and cached value are actually tested -- if two cache entries have different parsers or datatypes but the cached values are the same Object (== not just equal()) this method does not consider that a red flag. This allows for subtle variations in the way a Parser is specified (null vs DEFAULT_LONG_PARSER, etc...)

-

-
- - Indicates an expected bit of "insanity". This may be useful for - clients that wish to preserve/log information about insane usage - but indicate that it was expected. - - - -

Implements {@link LockFactory} using {@link - File#createNewFile()}.

- -

NOTE: the javadocs - for File.createNewFile contain a vague - yet spooky warning about not using the API for file - locking. This warning was added due to this - bug, and in fact the only known problem with using - this API for locking is that the Lucene write lock may - not be released when the JVM exits abnormally.

-

When this happens, a {@link LockObtainFailedException} - is hit when trying to create a writer, in which case you - need to explicitly clear the lock file first. You can - either manually remove the file, or use the {@link - org.apache.lucene.index.IndexReader#unlock(Directory)} - API. But, first be certain that no writer is in fact - writing to the index otherwise you can easily corrupt - your index.

- -

If you suspect that this or any other LockFactory is - not working properly in your environment, you can easily - test it by using {@link VerifyingLockFactory}, {@link - LockVerifyServer} and {@link LockStressTest}.
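 A hedged sketch (the index path is a placeholder, and the FSDirectory.Open overload taking a LockFactory is assumed to mirror the Java API):

            LockFactory lockFactory = new SimpleFSLockFactory("/tmp/testindex");
            Directory dir = FSDirectory.Open(new System.IO.DirectoryInfo("/tmp/testindex"), lockFactory);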

- -

- - -
- - Base class for file system based locking implementation. - - - Directory for the lock files. - - - Set the lock directory. This method can be only called - once to initialize the lock directory. It is used by {@link FSDirectory} - to set the lock directory to itsself. - Subclasses can also use this method to set the directory - in the constructor. - - - - Retrieve the lock directory. - - - Create a SimpleFSLockFactory instance, with null (unset) - lock directory. When you pass this factory to a {@link FSDirectory} - subclass, the lock directory is automatically set to the - directory itsself. Be sure to create one instance for each directory - your create! - - - - Instantiate using the provided directory (as a File instance). - where lock files should be created. - - - - Instantiate using the provided directory (as a File instance). - where lock files should be created. - - - - Instantiate using the provided directory name (String). - where lock files should be created. - - - - Simple standalone server that must be running when you - use {@link VerifyingLockFactory}. This server simply - verifies at most one process holds the lock at a time. - Run without any args to see usage. - - - - - - - - - Subclass of FilteredTermEnum for enumerating all terms that match the - specified wildcard filter term. -

- Term enumerations are always ordered by Term.compareTo(). Each term in - the enumeration is greater than all that precede it. - -

- $Id: WildcardTermEnum.java 783371 2009-06-10 14:39:56Z mikemccand $ - -
- - ***************************************** - String equality with support for wildcards - ****************************************** - - - - Creates a new WildcardTermEnum. -

- After calling the constructor the enumeration is already pointing to the first - valid term if such a term exists. -

-
- - Determines if a word matches a wildcard pattern. - Work released by Granta Design Ltd after originally being done on - company time. - - - -

Wrapper to allow {@link SpanQuery} objects to participate in composite single-field SpanQueries by 'lying' about their search field. That is, the masked SpanQuery will function as normal, but {@link SpanQuery#GetField()} simply hands back the value supplied in this class's constructor.

- -

This can be used to support Queries like {@link SpanNearQuery} or - {@link SpanOrQuery} across different fields, which is not ordinarily - permitted.

- -

This can be useful for denormalized relational data: for example, when - indexing a document with conceptually many 'children':

- -

-            teacherid: 1
-            studentfirstname: james
-            studentsurname: jones
-            
-            teacherid: 2
-            studentfirstname: james
-            studentsurname: smith
-            studentfirstname: sally
-            studentsurname: jones
-            
- -

a SpanNearQuery with a slop of 0 can be applied across two - {@link SpanTermQuery} objects as follows: -

-            SpanQuery q1  = new SpanTermQuery(new Term("studentfirstname", "james"));
-            SpanQuery q2  = new SpanTermQuery(new Term("studentsurname", "jones"));
-            SpanQuery q2m = new FieldMaskingSpanQuery(q2, "studentfirstname");
-            Query q = new SpanNearQuery(new SpanQuery[]{q1, q2m}, -1, false);
-            
- to search for 'studentfirstname:james studentsurname:jones' and find - teacherid 1 without matching teacherid 2 (which has a 'james' in position 0 - and 'jones' in position 1).

- -

Note: as {@link #GetField()} returns the masked field, scoring will be - done using the norms of the field name supplied. This may lead to unexpected - scoring behaviour.

-

-
- - use {@link #ExtractTerms(Set)} instead. - - - - Constrains search results to only match those which also match a provided - query. Results are cached, so that searches after the first on the same - index using this filter are much faster. - - - $Id: QueryFilter.java 528298 2007-04-13 00:59:28Z hossman $ - - use a CachingWrapperFilter with QueryWrapperFilter - - - - Wraps another filter's result and caches it. The purpose is to allow - filters to simply filter, and then wrap with this class to add caching. - - - - A transient Filter cache. - - - Filter to cache results of - - - - Use {@link #GetDocIdSet(IndexReader)} instead. - - - - Provide the DocIdSet to be cached, using the DocIdSet provided - by the wrapped Filter. - This implementation returns the given DocIdSet. - - - - Constructs a filter which only matches documents matching - query. - - - - A Query that matches documents containing terms with a specified prefix. A PrefixQuery - is built by QueryParser for input like app*. - -

This query uses the {@link - MultiTermQuery#CONSTANT_SCORE_AUTO_REWRITE_DEFAULT} - rewrite method. -
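 For example (field and prefix are arbitrary):

            // Matches every term in the "title" field starting with "app", i.e. the parser input title:app*
            Query q = new PrefixQuery(new Term("title", "app"));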

-
- - Constructs a query for terms starting with prefix. - - - Returns the prefix of this query. - - - Prints a user-readable version of this query. - - - Expert: obtains float field values from the - {@link Lucene.Net.Search.FieldCache FieldCache} - using getFloats() and makes those values - available as other numeric types, casting as needed. - -

- WARNING: The status of the Search.Function package is experimental. - The APIs introduced here might change in the future and will not be - supported anymore in such a case. - -

- for requirements" - on the field. - -

NOTE: with the switch in 2.9 to segment-based - searching, if {@link #getValues} is invoked with a - composite (multi-segment) reader, this can easily cause - double RAM usage for the values in the FieldCache. It's - best to switch your application to pass only atomic - (single segment) readers to this API. Alternatively, for - a short-term fix, you could wrap your ValueSource using - {@link MultiValueSource}, which costs more CPU per lookup - but will not consume double the FieldCache RAM.

- - - -

Create a cached float field source with default string-to-float parser. -
- - Create a cached float field source with a specific string-to-float parser. - - - Expert: obtains single byte field values from the - {@link Lucene.Net.Search.FieldCache FieldCache} - using getBytes() and makes those values - available as other numeric types, casting as needed. - -

- WARNING: The status of the Search.Function package is experimental. - The APIs introduced here might change in the future and will not be - supported anymore in such a case. - -

- for requirements" - on the field. - -

NOTE: with the switch in 2.9 to segment-based - searching, if {@link #getValues} is invoked with a - composite (multi-segment) reader, this can easily cause - double RAM usage for the values in the FieldCache. It's - best to switch your application to pass only atomic - (single segment) readers to this API. Alternatively, for - a short-term fix, you could wrap your ValueSource using - {@link MultiValueSource}, which costs more CPU per lookup - but will not consume double the FieldCache RAM.

- - - -

Create a cached byte field source with default string-to-byte parser. -
- - Create a cached byte field source with a specific string-to-byte parser. - - - Convenience class for holding TermVector information. - - - Collapse the hash table & sort in-place. - - - Compares term text for two Posting instance and - returns -1 if p1 < p2; 1 if p1 > p2; else 0. - - - - Test whether the text for current RawPostingList p equals - current tokenText. - - - - Called when postings hash is too small (> 50% - occupied) or too large (< 20% occupied). - - - - Called by super.skipTo(). + + Deletes all commits except the most recent one. An IndexWriter creates and maintains an index. @@ -16573,241 +8977,4051 @@ been carried over to the merged segment. - - A {@link MergeScheduler} that runs each merge using a - separate thread, up until a maximum number of threads - ({@link #setMaxThreadCount}) at which when a merge is - needed, the thread(s) that are updating the index will - pause until one or more merges completes. This is a - simple way to use concurrency in the indexing process - without having to create and manage application level - threads. + + IndexReader is an abstract class, providing an interface for accessing an + index. Search of an index is done entirely through this abstract interface, + so that any subclass which implements it is searchable. +

Concrete subclasses of IndexReader are usually constructed with a call to + one of the static open() methods, e.g. {@link + #Open(String, boolean)}. +

For efficiency, in this API documents are often referred to via + document numbers, non-negative integers which each name a unique + document in the index. These document numbers are ephemeral--they may change + as documents are added to and deleted from an index. Clients should thus not + rely on a given document having the same number between sessions. +

An IndexReader can be opened on a directory for which an IndexWriter is + opened already, but it cannot be used to delete documents from the index then. +

+ NOTE: for backwards API compatibility, several methods are not listed + as abstract, but have no useful implementations in this base class and + instead always throw UnsupportedOperationException. Subclasses are + strongly encouraged to override these methods, but in many cases may not + need to. +

+

+ NOTE: as of 2.4, it's possible to open a read-only + IndexReader using one of the static open methods that + accepts the boolean readOnly parameter. Such a reader has + better concurrency as it's not necessary to synchronize on + the isDeleted method. Currently the default for readOnly + is false, meaning if not specified you will get a + read/write IndexReader. But in 3.0 this default will + change to true, meaning you must explicitly specify false + if you want to make changes with the resulting IndexReader. +

+

NOTE: {@link + IndexReader} instances are completely thread + safe, meaning multiple threads can call any of its methods, + concurrently. If your application requires external + synchronization, you should not synchronize on the + IndexReader instance; use your own + (non-Lucene) objects instead. +

+ $Id: IndexReader.java 826049 2009-10-16 19:28:55Z mikemccand $ + +
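 A minimal sketch of the recommended read-only usage (dir is any Lucene.Net.Store.Directory opened elsewhere):

            IndexReader reader = IndexReader.Open(dir, true);   // readOnly = true for better concurrency
            try
            {
                int docs = reader.NumDocs();
            }
            finally
            {
                reader.Close();   // Close() simply decrements the reference count
            }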
+ + Expert: returns the current refCount for this reader + + + Expert: increments the refCount of this IndexReader + instance. RefCounts are used to determine when a + reader can be closed safely, i.e. as soon as there are + no more references. Be sure to always call a + corresponding {@link #decRef}, in a finally clause; + otherwise the reader may never be closed. Note that + {@link #close} simply calls decRef(), which means that + the IndexReader will not really be closed until {@link + #decRef} has been called for all outstanding + references. + + + + + + + Expert: decreases the refCount of this IndexReader + instance. If the refCount drops to 0, then pending + changes (if any) are committed to the index and this + reader is closed. + + + IOException in case an IOException occurs in commit() or doClose() + + + + + + + will be deleted when IndexReader(Directory) is deleted + + + + + + Legacy Constructor for backwards compatibility. + +

+ This Constructor should not be used, it exists for backwards + compatibility only to support legacy subclasses that did not "own" + a specific directory, but needed to specify something to be returned + by the directory() method. Future subclasses should delegate to the + no arg constructor and implement the directory() method as appropriate. + +

+ Directory to be returned by the directory() method + + + + - use IndexReader() + +
+ + AlreadyClosedException if this IndexReader is closed + + + Returns a read/write IndexReader reading the index in an FSDirectory in the named + path. + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + Use {@link #Open(Directory, boolean)} instead. + This method will be removed in the 3.0 release. + + + the path to the index directory + + + + Returns an IndexReader reading the index in an + FSDirectory in the named path. You should pass + readOnly=true, since it gives much better concurrent + performance, unless you intend to do write operations + (delete documents or change norms) with the reader. + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + the path to the index directory + + true if this should be a readOnly + reader + + Use {@link #Open(Directory, boolean)} instead. + This method will be removed in the 3.0 release. + + + + + Returns a read/write IndexReader reading the index in an FSDirectory in the named + path. + + the path to the index directory + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + Use {@link #Open(Directory, boolean)} instead. + This method will be removed in the 3.0 release. + + + + + Returns an IndexReader reading the index in an + FSDirectory in the named path. You should pass + readOnly=true, since it gives much better concurrent + performance, unless you intend to do write operations + (delete documents or change norms) with the reader. + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + the path to the index directory + + true if this should be a readOnly + reader + + Use {@link #Open(Directory, boolean)} instead. + This method will be removed in the 3.0 release. + + + + + Returns a read/write IndexReader reading the index in + the given Directory. + + the index directory + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + Use {@link #Open(Directory, boolean)} instead + This method will be removed in the 3.0 release. + + + + + Returns an IndexReader reading the index in the given + Directory. You should pass readOnly=true, since it + gives much better concurrent performance, unless you + intend to do write operations (delete documents or + change norms) with the reader. + + the index directory + + true if no changes (deletions, norms) will be made with this IndexReader + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + + + Expert: returns a read/write IndexReader reading the index in the given + {@link IndexCommit}. + + the commit point to open + + CorruptIndexException if the index is corrupt + Use {@link #Open(IndexCommit, boolean)} instead. + This method will be removed in the 3.0 release. + + + IOException if there is a low-level IO error + + + Expert: returns an IndexReader reading the index in the given + {@link IndexCommit}. You should pass readOnly=true, since it + gives much better concurrent performance, unless you + intend to do write operations (delete documents or + change norms) with the reader. + + the commit point to open + + true if no changes (deletions, norms) will be made with this IndexReader + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + + + Expert: returns a read/write IndexReader reading the index in the given + Directory, with a custom {@link IndexDeletionPolicy}. 
+ + the index directory + + a custom deletion policy (only used + if you use this reader to perform deletes or to set + norms); see {@link IndexWriter} for details. + + Use {@link #Open(Directory, IndexDeletionPolicy, boolean)} instead. + This method will be removed in the 3.0 release. + + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + + + Expert: returns an IndexReader reading the index in + the given Directory, with a custom {@link + IndexDeletionPolicy}. You should pass readOnly=true, + since it gives much better concurrent performance, + unless you intend to do write operations (delete + documents or change norms) with the reader. + + the index directory + + a custom deletion policy (only used + if you use this reader to perform deletes or to set + norms); see {@link IndexWriter} for details. + + true if no changes (deletions, norms) will be made with this IndexReader + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + + + Expert: returns an IndexReader reading the index in + the given Directory, with a custom {@link + IndexDeletionPolicy}. You should pass readOnly=true, + since it gives much better concurrent performance, + unless you intend to do write operations (delete + documents or change norms) with the reader. + + the index directory + + a custom deletion policy (only used + if you use this reader to perform deletes or to set + norms); see {@link IndexWriter} for details. + + true if no changes (deletions, norms) will be made with this IndexReader + + Subsamples which indexed + terms are loaded into RAM. This has the same effect as {@link + IndexWriter#setTermIndexInterval} except that setting + must be done at indexing time while this setting can be + set per reader. When set to N, then one in every + N*termIndexInterval terms in the index is loaded into + memory. By setting this to a value > 1 you can reduce + memory usage, at the expense of higher latency when + loading a TermInfo. The default value is 1. Set this + to -1 to skip loading the terms index entirely. + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + + + Expert: returns a read/write IndexReader reading the index in the given + Directory, using a specific commit and with a custom + {@link IndexDeletionPolicy}. + + the specific {@link IndexCommit} to open; + see {@link IndexReader#listCommits} to list all commits + in a directory + + a custom deletion policy (only used + if you use this reader to perform deletes or to set + norms); see {@link IndexWriter} for details. + + Use {@link #Open(IndexCommit, IndexDeletionPolicy, boolean)} instead. + This method will be removed in the 3.0 release. + + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + + + Expert: returns an IndexReader reading the index in + the given Directory, using a specific commit and with + a custom {@link IndexDeletionPolicy}. You should pass + readOnly=true, since it gives much better concurrent + performance, unless you intend to do write operations + (delete documents or change norms) with the reader. + + the specific {@link IndexCommit} to open; + see {@link IndexReader#listCommits} to list all commits + in a directory + + a custom deletion policy (only used + if you use this reader to perform deletes or to set + norms); see {@link IndexWriter} for details. 
+ + true if no changes (deletions, norms) will be made with this IndexReader + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + + + Expert: returns an IndexReader reading the index in + the given Directory, using a specific commit and with + a custom {@link IndexDeletionPolicy}. You should pass + readOnly=true, since it gives much better concurrent + performance, unless you intend to do write operations + (delete documents or change norms) with the reader. + + the specific {@link IndexCommit} to open; + see {@link IndexReader#listCommits} to list all commits + in a directory + + a custom deletion policy (only used + if you use this reader to perform deletes or to set + norms); see {@link IndexWriter} for details. + + true if no changes (deletions, norms) will be made with this IndexReader + + Subsambles which indexed + terms are loaded into RAM. This has the same effect as {@link + IndexWriter#setTermIndexInterval} except that setting + must be done at indexing time while this setting can be + set per reader. When set to N, then one in every + N*termIndexInterval terms in the index is loaded into + memory. By setting this to a value > 1 you can reduce + memory usage, at the expense of higher latency when + loading a TermInfo. The default value is 1. Set this + to -1 to skip loading the terms index entirely. + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + + + Refreshes an IndexReader if the index has changed since this instance + was (re)opened. +

+ Opening an IndexReader is an expensive operation. This method can be used + to refresh an existing IndexReader to reduce these costs. This method + tries to only load segments that have changed or were created after the + IndexReader was (re)opened. +

+ If the index has not changed since this instance was (re)opened, then this + call is a NOOP and returns this instance. Otherwise, a new instance is + returned. The old instance is not closed and remains usable.
+

+ If the reader is reopened, even though the old and new readers share + resources internally, it's safe to make changes + (deletions, norms) with the new reader. All shared + mutable state obeys "copy on write" semantics to ensure + the changes are not seen by other readers. +

+ You can determine whether a reader was actually reopened by comparing the + old instance with the instance returned by this method: +

+            IndexReader reader = ... 
+            ...
+            IndexReader newReader = reader.reopen();
+            if (newReader != reader) {
+            ...     // reader was reopened
+            reader.close(); 
+            }
+            reader = newReader;
+            ...
+            
+ + Be sure to synchronize that code so that other threads, + if present, can never use reader after it has been + closed and before it's switched to newReader. + +

NOTE: If this reader is a near real-time + reader (obtained from {@link IndexWriter#GetReader()}), + reopen() will simply call writer.getReader() again for + you, though this may change in the future. + +

+ CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error +
+ + Just like {@link #Reopen()}, except you can change the + readOnly of the original reader. If the index is + unchanged but readOnly is different then a new reader + will be returned. - -

Expert: {@link IndexWriter} uses an instance - implementing this interface to execute the merges - selected by a {@link MergePolicy}. The default - MergeScheduler is {@link ConcurrentMergeScheduler}.

+ +

Expert: reopen this reader on a specific commit point. + This always returns a readOnly reader. If the + specified commit point matches what this reader is + already on, and this reader is already readOnly, then + this same instance is returned; if it is not already + readOnly, a readOnly clone is returned. + + + + Efficiently clones the IndexReader (sharing most + internal state). +

+ On cloning a reader with pending changes (deletions, + norms), the original reader transfers its write lock to + the cloned reader. This means only the cloned reader + may make further changes to the index, and commit the + changes to the index on close, but the old reader still + reflects all changes made up until it was cloned. +

+ Like {@link #Reopen()}, it's safe to make changes to + either the original or the cloned reader: all shared + mutable state obeys "copy on write" semantics to ensure + the changes are not seen by other readers. +

+

+ CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error +
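A short C# sketch of the clone/write-lock hand-off described above. It assumes the original reader was opened read/write (a readOnly reader cannot produce a writeable clone); the helper name DeleteViaClone is illustrative, not part of the documented API.

    using Lucene.Net.Index;

    public static class CloneExample
    {
        // The clone takes over the write lock, so only it may modify the index from here on;
        // the original keeps its pre-clone view until it is closed.
        public static void DeleteViaClone(IndexReader original, Term toDelete)
        {
            IndexReader writableClone = (IndexReader) original.Clone(false); // false = read/write clone
            try
            {
                writableClone.DeleteDocuments(toDelete);  // changes are committed when the clone closes
            }
            finally
            {
                writableClone.Close();
            }
        }
    }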
+ + Clones the IndexReader and optionally changes readOnly. A readOnly + reader cannot open a writeable reader. + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + + + Returns the directory associated with this index. The Default + implementation returns the directory specified by subclasses when + delegating to the IndexReader(Directory) constructor, or throws an + UnsupportedOperationException if one was not specified. + + UnsupportedOperationException if no directory + + + Returns the time the index in the named directory was last modified. + Do not use this to check whether the reader is still up-to-date, use + {@link #IsCurrent()} instead. + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + Use {@link #LastModified(Directory)} instead. + This method will be removed in the 3.0 release. + + + + Returns the time the index in the named directory was last modified. + Do not use this to check whether the reader is still up-to-date, use + {@link #IsCurrent()} instead. + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + Use {@link #LastModified(Directory)} instead. + This method will be removed in the 3.0 release. + + + + + Returns the time the index in the named directory was last modified. + Do not use this to check whether the reader is still up-to-date, use + {@link #IsCurrent()} instead. + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + + + Reads version number from segments files. The version number is + initialized with a timestamp and then increased by one for each change of + the index. + + + where the index resides. + + version number. + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + Use {@link #GetCurrentVersion(Directory)} instead. + This method will be removed in the 3.0 release. + + + + Reads version number from segments files. The version number is + initialized with a timestamp and then increased by one for each change of + the index. + + + where the index resides. + + version number. + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + Use {@link #GetCurrentVersion(Directory)} instead. + This method will be removed in the 3.0 release. + + + + Reads version number from segments files. The version number is + initialized with a timestamp and then increased by one for each change of + the index. + + + where the index resides. + + version number. + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + + + Reads commitUserData, previously passed to {@link + IndexWriter#Commit(Map)}, from current index + segments file. This will return null if {@link + IndexWriter#Commit(Map)} has never been called for + this index. + + + where the index resides. + + commit userData. + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + + + + + + + Version number when this IndexReader was opened. Not implemented in the + IndexReader base class. + +

+ If this reader is based on a Directory (ie, was created by calling + {@link #Open}, or {@link #Reopen} on a reader based on a Directory), then + this method returns the version recorded in the commit that the reader + opened. This version is advanced every time {@link IndexWriter#Commit} is + called. +

+ +

+ If instead this reader is a near real-time reader (ie, obtained by a call + to {@link IndexWriter#GetReader}, or by calling {@link #Reopen} on a near + real-time reader), then this method returns the version of the last + commit done by the writer. Note that even as further changes are made + with the writer, the version will not change until a commit is + completed. Thus, you should not rely on this method to determine when a + near real-time reader should be opened. Use {@link #IsCurrent} instead. +

+ +

+ UnsupportedOperationException + unless overridden in subclass + +
+ + Retrieve the String userData optionally passed to + IndexWriter#commit. This will return null if {@link + IndexWriter#Commit(Map)} has never been called for + this index. + + + + + + +

For IndexReader implementations that use + TermInfosReader to read terms, this sets the + indexDivisor to subsample the number of indexed terms + loaded into memory. This has the same effect as {@link + IndexWriter#setTermIndexInterval} except that setting + must be done at indexing time while this setting can be + set per reader. When set to N, then one in every + N*termIndexInterval terms in the index is loaded into + memory. By setting this to a value > 1 you can reduce + memory usage, at the expense of higher latency when + loading a TermInfo. The default value is 1.

+ + NOTE: you must call this before the term + index is loaded. If the index is already loaded, + an IllegalStateException is thrown. +

+ IllegalStateException if the term index has already been loaded into memory + Please use {@link IndexReader#Open(Directory, IndexDeletionPolicy, boolean, int)} to specify the required TermInfos index divisor instead. + +
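A hedged C# sketch of the non-deprecated path named in the note above, opening a read-only reader with a per-reader term index divisor. The divisor value of 4 is illustrative, and passing null for the deletion policy assumes the default policy is acceptable.

    using Lucene.Net.Index;
    using Lucene.Net.Store;

    public static class DivisorExample
    {
        public static IndexReader OpenWithDivisor(Directory dir)
        {
            // Load only one in every 4 * termIndexInterval indexed terms into RAM,
            // trading lower memory usage for higher TermInfo lookup latency.
            return IndexReader.Open(dir, null, true, 4);
        }
    }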
+ +

For IndexReader implementations that use + TermInfosReader to read terms, this returns the + current indexDivisor as specified when the reader was + opened. +

+
+ + Check whether any new changes have occurred to the index since this + reader was opened. + +

+ If this reader is based on a Directory (ie, was created by calling + {@link #open}, or {@link #reopen} on a reader based on a Directory), then + this method checks if any further commits (see {@link IndexWriter#commit}) + have occurred in that directory. +

+ +

+ If instead this reader is a near real-time reader (ie, obtained by a call + to {@link IndexWriter#getReader}, or by calling {@link #reopen} on a near + real-time reader), then this method checks if either a new commit has + occurred, or any new uncommitted changes have taken place via the writer. + Note that even if the writer has only performed merging, this method will + still return false. +

+ +

+ In any event, if this returns false, you should call {@link #reopen} to + get a new reader that sees the changes. +

+ +

+ CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + UnsupportedOperationException unless overridden in subclass +
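A minimal C# sketch of the refresh pattern implied above, assuming a Directory-based reader (where IsCurrent() is implemented); the helper name RefreshIfStale is illustrative.

    using Lucene.Net.Index;

    public static class ReaderRefresh
    {
        public static IndexReader RefreshIfStale(IndexReader reader)
        {
            if (reader.IsCurrent())
                return reader;                       // no new commits since this reader was opened

            IndexReader newReader = reader.Reopen(); // loads only changed or newly created segments
            if (newReader != reader)
                reader.Close();                      // Reopen() never closes the old instance itself
            return newReader;
        }
    }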
+ + Checks is the index is optimized (if it has a single segment and + no deletions). Not implemented in the IndexReader base class. + + true if the index is optimized; false otherwise + + UnsupportedOperationException unless overridden in subclass + + + Return an array of term frequency vectors for the specified document. + The array contains a vector for each vectorized field in the document. + Each vector contains terms and frequencies for all terms in a given vectorized field. + If no such fields existed, the method returns null. The term vectors that are + returned may either be of type {@link TermFreqVector} + or of type {@link TermPositionVector} if + positions or offsets have been stored. + + + document for which term frequency vectors are returned + + array of term frequency vectors. May be null if no term vectors have been + stored for the specified document. + + IOException if index cannot be accessed + + + + + Return a term frequency vector for the specified document and field. The + returned vector contains terms and frequencies for the terms in + the specified field of this document, if the field had the storeTermVector + flag set. If termvectors had been stored with positions or offsets, a + {@link TermPositionVector} is returned. + + + document for which the term frequency vector is returned + + field for which the term frequency vector is returned. + + term frequency vector May be null if field does not exist in the specified + document or term vector was not stored. + + IOException if index cannot be accessed + + + + + Load the Term Vector into a user-defined data structure instead of relying on the parallel arrays of + the {@link TermFreqVector}. + + The number of the document to load the vector for + + The name of the field to load + + The {@link TermVectorMapper} to process the vector. Must not be null + + IOException if term vectors cannot be accessed or if they do not exist on the field and doc. specified. + + + + + Map all the term vectors for all fields in a Document + The number of the document to load the vector for + + The {@link TermVectorMapper} to process the vector. Must not be null + + IOException if term vectors cannot be accessed or if they do not exist on the field and doc. specified. + + + Returns true if an index exists at the specified directory. + If the directory does not exist or if there is no index in it. + false is returned. + + the directory to check for an index + + true if an index exists; false otherwise + + Use {@link #IndexExists(Directory)} instead + This method will be removed in the 3.0 release. + + + + + Returns true if an index exists at the specified directory. + If the directory does not exist or if there is no index in it. + + the directory to check for an index + + true if an index exists; false otherwise + + Use {@link #IndexExists(Directory)} instead. + This method will be removed in the 3.0 release. + + + + + Returns true if an index exists at the specified directory. + If the directory does not exist or if there is no index in it. + + the directory to check for an index + + true if an index exists; false otherwise + + IOException if there is a problem with accessing the index + + + Returns the number of documents in this index. + + + Returns one greater than the largest possible document number. + This may be used to, e.g., determine how big to allocate an array which + will have an element for every document number in an index. + + + + Returns the number of deleted documents. 
+ + + Returns the stored fields of the nth + Document in this index. +

+ NOTE: for performance reasons, this method does not check if the + requested document is deleted, and therefore asking for a deleted document + may yield unspecified results. Usually this is not required, however you + can call {@link #IsDeleted(int)} with the requested document ID to verify + the document is not deleted. + +

+ CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error +
+ + Get the {@link Lucene.Net.Documents.Document} at the n + th position. The {@link FieldSelector} may be used to determine + what {@link Lucene.Net.Documents.Field}s to load and how they should + be loaded. NOTE: If this Reader (more specifically, the underlying + FieldsReader) is closed before the lazy + {@link Lucene.Net.Documents.Field} is loaded an exception may be + thrown. If you want the value of a lazy + {@link Lucene.Net.Documents.Field} to be available after closing you + must explicitly load it or fetch the Document again with a new loader. +

+ NOTE: for performance reasons, this method does not check if the + requested document is deleted, and therefore asking for a deleted document + may yield unspecified results. Usually this is not required, however you + can call {@link #IsDeleted(int)} with the requested document ID to verify + the document is not deleted. + +

+ Get the document at the nth position + + The {@link FieldSelector} to use to determine what + Fields should be loaded on the Document. May be null, in which case + all Fields will be loaded. + + The stored fields of the + {@link Lucene.Net.Documents.Document} at the nth position + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + + + + + + + + +
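A small C# sketch of the stored-fields access described above, passing no FieldSelector (equivalent to a null selector, so all stored fields are loaded) and checking IsDeleted(int) first, as the note recommends. The field name "title" is an assumption for illustration.

    using Lucene.Net.Documents;
    using Lucene.Net.Index;

    public static class StoredFieldsExample
    {
        public static void PrintTitles(IndexReader reader)
        {
            for (int i = 0; i < reader.MaxDoc(); i++)
            {
                if (reader.IsDeleted(i))
                    continue;                          // Document(i) itself does not check deletions

                Document doc = reader.Document(i);     // same as Document(i, null): load all stored fields
                System.Console.WriteLine(doc.Get("title"));
            }
        }
    }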
+ + Returns true if document n has been deleted + + + Returns true if any documents have been deleted + + + Returns true if there are norms stored for this field. + + + Returns the byte-encoded normalization factor for the named field of + every document. This is used by the search code to score documents. + + + + + + + Reads the byte-encoded normalization factor for the named field of every + document. This is used by the search code to score documents. + + + + + + + Expert: Resets the normalization factor for the named field of the named + document. The norm represents the product of the field's {@link + Lucene.Net.Documents.Fieldable#SetBoost(float) boost} and its {@link Similarity#LengthNorm(String, + int) length normalization}. Thus, to preserve the length normalization + values when resetting this, one should base the new value upon the old. + + NOTE: If this field does not store norms, then + this method call will silently do nothing. + + + + + + + StaleReaderException if the index has changed + since this reader was opened + + CorruptIndexException if the index is corrupt + LockObtainFailedException if another writer + has this index open (write.lock could not + be obtained) + + IOException if there is a low-level IO error + + + Implements setNorm in subclass. + + + Expert: Resets the normalization factor for the named field of the named + document. + + + + + + + + StaleReaderException if the index has changed + since this reader was opened + + CorruptIndexException if the index is corrupt + LockObtainFailedException if another writer + has this index open (write.lock could not + be obtained) + + IOException if there is a low-level IO error + + + Returns an enumeration of all the terms in the index. The + enumeration is ordered by Term.compareTo(). Each term is greater + than all that precede it in the enumeration. Note that after + calling terms(), {@link TermEnum#Next()} must be called + on the resulting enumeration before calling other methods such as + {@link TermEnum#Term()}. + + IOException if there is a low-level IO error + + + Returns an enumeration of all terms starting at a given term. If + the given term does not exist, the enumeration is positioned at the + first term greater than the supplied term. The enumeration is + ordered by Term.compareTo(). Each term is greater than all that + precede it in the enumeration. + + IOException if there is a low-level IO error + + + Returns the number of documents containing the term t. + IOException if there is a low-level IO error + + + Returns an enumeration of all the documents which contain + term. For each document, the document number, the frequency of + the term in that document is also provided, for use in + search scoring. If term is null, then all non-deleted + docs are returned with freq=1. + Thus, this method implements the mapping: +

    + Term    =>    <docNum, freq>* +
+

The enumeration is ordered by document number. Each document number + is greater than all that precede it in the enumeration. +

+ IOException if there is a low-level IO error +
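A hedged C# sketch of walking the <docNum, freq> mapping above for a single term; the method name SumFrequencies is illustrative.

    using Lucene.Net.Index;

    public static class TermDocsExample
    {
        public static int SumFrequencies(IndexReader reader, Term term)
        {
            int total = 0;
            TermDocs docs = reader.TermDocs(term);   // enumerator positioned at the given term
            try
            {
                while (docs.Next())                  // documents arrive in increasing document number
                {
                    total += docs.Freq();            // docs.Doc() would give the current document number
                }
            }
            finally
            {
                docs.Close();
            }
            return total;
        }
    }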
+ + Returns an unpositioned {@link TermDocs} enumerator. + IOException if there is a low-level IO error + + + Returns an enumeration of all the documents which contain + term. For each document, in addition to the document number + and frequency of the term in that document, a list of all of the ordinal + positions of the term in the document is available. Thus, this method + implements the mapping: + +

    + Term    =>    <docNum, freq, + <pos_1, pos_2, ... + pos_freq-1> + >* +
+

This positional information facilitates phrase and proximity searching. +

The enumeration is ordered by document number. Each document number is + greater than all that precede it in the enumeration. +

+ IOException if there is a low-level IO error +
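A hedged C# sketch of the positional mapping above: for each matching document, Freq() bounds how many times NextPosition() may be called. The output format is illustrative.

    using Lucene.Net.Index;

    public static class TermPositionsExample
    {
        public static void PrintPositions(IndexReader reader, Term term)
        {
            TermPositions positions = reader.TermPositions(term);
            try
            {
                while (positions.Next())
                {
                    int doc = positions.Doc();
                    int freq = positions.Freq();
                    for (int i = 0; i < freq; i++)
                    {
                        // ordinal position of the term within the current document
                        System.Console.WriteLine(doc + ": " + positions.NextPosition());
                    }
                }
            }
            finally
            {
                positions.Close();
            }
        }
    }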
+ + Returns an unpositioned {@link TermPositions} enumerator. + IOException if there is a low-level IO error + + + Deletes the document numbered docNum. Once a document is + deleted it will not appear in TermDocs or TermPostitions enumerations. + Attempts to read its field with the {@link #document} + method will result in an error. The presence of this document may still be + reflected in the {@link #docFreq} statistic, though + this will be corrected eventually as the index is further modified. + + + StaleReaderException if the index has changed + since this reader was opened + + CorruptIndexException if the index is corrupt + LockObtainFailedException if another writer + has this index open (write.lock could not + be obtained) + + IOException if there is a low-level IO error + + + Implements deletion of the document numbered docNum. + Applications should call {@link #DeleteDocument(int)} or {@link #DeleteDocuments(Term)}. + + + + Deletes all documents that have a given term indexed. + This is useful if one uses a document field to hold a unique ID string for + the document. Then to delete such a document, one merely constructs a + term with the appropriate field and the unique ID string as its text and + passes it to this method. + See {@link #DeleteDocument(int)} for information about when this deletion will + become effective. + + + the number of documents deleted + + StaleReaderException if the index has changed + since this reader was opened + + CorruptIndexException if the index is corrupt + LockObtainFailedException if another writer + has this index open (write.lock could not + be obtained) + + IOException if there is a low-level IO error + + + Undeletes all documents currently marked as deleted in this index. + + + StaleReaderException if the index has changed + since this reader was opened + + LockObtainFailedException if another writer + has this index open (write.lock could not + be obtained) + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + + + Implements actual undeleteAll() in subclass. + + + Does nothing by default. Subclasses that require a write lock for + index modifications must implement this method. + + + + + IOException + + + Opaque Map (String -> String) + that's recorded into the segments file in the index, + and retrievable by {@link + IndexReader#getCommitUserData}. + + IOException + + + Commit changes resulting from delete, undeleteAll, or + setNorm operations + + If an exception is hit, then either no changes or all + changes will have been committed to the index + (transactional semantics). + + IOException if there is a low-level IO error + + + Commit changes resulting from delete, undeleteAll, or + setNorm operations + + If an exception is hit, then either no changes or all + changes will have been committed to the index + (transactional semantics). + + IOException if there is a low-level IO error + + + Implements commit. + Please implement {@link #DoCommit(Map) + instead}. + + + + Implements commit. NOTE: subclasses should override + this. In 3.0 this will become an abstract method. + + + + Closes files associated with this index. + Also saves any new deletions to disk. + No other methods should be called after this has been called. + + IOException if there is a low-level IO error + + + Implements close. + + + Get a list of unique field names that exist in this index and have the specified + field option information. 
+ + specifies which field option should be available for the returned fields + + Collection of Strings indicating the names of the fields. + + + + + + Returns true iff the index in the named directory is + currently locked. + + the directory to check for a lock + + IOException if there is a low-level IO error + Please use {@link IndexWriter#IsLocked(Directory)} instead. + This method will be removed in the 3.0 release. + + + + + Returns true iff the index in the named directory is + currently locked. + + the directory to check for a lock + + IOException if there is a low-level IO error + Use {@link #IsLocked(Directory)} instead. + This method will be removed in the 3.0 release. + + + + + Forcibly unlocks the index in the named directory. +

+ Caution: this should only be used by failure recovery code, + when it is known that no other process nor thread is in fact + currently accessing this index. +

+ Please use {@link IndexWriter#Unlock(Directory)} instead. + This method will be removed in the 3.0 release. + + +
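A short C# sketch of the recommended replacement named above. As the caution says, this belongs only in failure-recovery code, when it is certain no other process or thread is using the index; the helper name ForceUnlock is illustrative.

    using Lucene.Net.Index;
    using Lucene.Net.Store;

    public static class UnlockExample
    {
        public static void ForceUnlock(Directory dir)
        {
            if (IndexWriter.IsLocked(dir))
            {
                IndexWriter.Unlock(dir);   // removes a write lock left behind by a crashed writer
            }
        }
    }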
+ + Expert: return the IndexCommit that this reader has + opened. This method is only implemented by those + readers that correspond to a Directory with its own + segments_N file. + +

WARNING: this API is new and experimental and + may suddenly change.

+

+
+ + Prints the filename and size of each file within a given compound file. + Add the -extract flag to extract files to the current working directory. + In order to make the extracted version of the index work, you have to copy + the segments file from the compound index into the directory where the extracted files are stored. + + Usage: Lucene.Net.Index.IndexReader [-extract] <cfsfile> + + + + Returns all commit points that exist in the Directory. + Normally, because the default is {@link + KeepOnlyLastCommitDeletionPolicy}, there would be only + one commit point. But if you're using a custom {@link + IndexDeletionPolicy} then there could be many commits. + Once you have a given commit, you can open a reader on + it by calling {@link IndexReader#Open(IndexCommit)} + There must be at least one commit in + the Directory, else this method throws {@link + java.io.IOException}. Note that if a commit is in + progress while this method is running, that commit + may or may not be returned array. + + + + Expert: returns the sequential sub readers that this + reader is logically composed of. For example, + IndexSearcher uses this API to drive searching by one + sub reader at a time. If this reader is not composed + of sequential child readers, it should return null. + If this method returns an empty array, that means this + reader is a null reader (for example a MultiReader + that has no sub readers). +

+ NOTE: You should not try using sub-readers returned by + this method to make any changes (setNorm, deleteDocument, + etc.). While this might succeed for one composite reader + (like MultiReader), it will most likely lead to index + corruption for other readers (like DirectoryReader obtained + through {@link #open}). Use the parent reader directly. +

+
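A read-only C# sketch of inspecting the sequential sub-readers described above; it only reports per-segment document counts and, per the note, never modifies the sub-readers. The helper name is illustrative.

    using Lucene.Net.Index;

    public static class SubReaderExample
    {
        public static void PrintSubReaderSizes(IndexReader reader)
        {
            IndexReader[] subs = reader.GetSequentialSubReaders();
            if (subs == null)
            {
                // not composed of sequential child readers
                System.Console.WriteLine("atomic reader: " + reader.NumDocs() + " docs");
                return;
            }
            foreach (IndexReader sub in subs)
            {
                System.Console.WriteLine("sub-reader: " + sub.NumDocs() + " docs");
            }
        }
    }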
+ + Expert + + + + + Returns the number of unique terms (across all fields) + in this reader. + + This method returns long, even though internally + Lucene cannot handle more than 2^31 unique terms, for + a possible future when this limitation is removed. + + + UnsupportedOperationException if this count + cannot be easily determined (eg Multi*Readers). + Instead, you should call {@link + #getSequentialSubReaders} and ask each sub reader for + its unique term count. + + + + Expert: Return the state of the flag that disables fakes norms in favor of representing the absence of field norms with null. + true if fake norms are disabled + + This currently defaults to false (to remain + back-compatible), but in 3.0 it will be hardwired to + true, meaning the norms() methods will return null for + fields that had disabled norms. + + + + Expert: Set the state of the flag that disables fakes norms in favor of representing the absence of field norms with null. + true to disable fake norms, false to preserve the legacy behavior + + This currently defaults to false (to remain + back-compatible), but in 3.0 it will be hardwired to + true, meaning the norms() methods will return null for + fields that had disabled norms. + + + + Utility class for executing code that needs to do + something with the current segments file. This is + necessary with lock-less commits because from the time + you locate the current segments file name, until you + actually open it, read its contents, or check modified + time, etc., it could have been deleted due to a writer + commit finishing. + + + + A collection of segmentInfo objects with methods for operating on + those segments in relation to the file system.

NOTE: This API is new and still experimental (subject to change suddenly in the next release)

- -

NOTE: This class typically requires access to - package-private APIs (eg, SegmentInfos) to do its job; - if you implement your own MergePolicy, you'll need to put - it in package Lucene.Net.Index in order to use - these APIs.

- - Run the merges provided by {@link IndexWriter#GetNextMerge()}. + + The file format version, a negative number. - - Close this MergeScheduler. - - - Sets the max # simultaneous threads that may be - running. If a merge is necessary yet we already have - this many threads running, the incoming thread (that - is calling add/updateDocument) will block until - a merge thread has completed. + + This format adds details used for lockless commits. It differs + slightly from the previous format in that file names + are never re-used (write once). Instead, each file is + written to the next generation. For example, + segments_1, segments_2, etc. This allows us to not use + a commit lock. See file + formats for details. - - Get the max # simultaneous threads that may be - + + This format adds a "hasSingleNormFile" flag into each segment info. + See LUCENE-756 + for details. + + + + This format allows multiple segments to share a single + vectors and stored fields file. + + + + This format adds a checksum at the end of the file to + ensure all bytes were successfully written. + + + + This format adds the deletion count for each segment. + This way IndexWriter can efficiently report numDocs(). + + + + This format adds the boolean hasProx to record if any + fields in the segment store prox information (ie, have + omitTermFreqAndPositions==false) + + + + This format adds optional commit userData (String) storage. + + + This format adds optional per-segment String + dianostics storage, and switches userData to Map + + + + counts how often the index has been changed by adding or deleting docs. + starting with the current time in milliseconds forces to create unique version numbers. + + + + If non-null, information about loading segments_N files + - - Return the priority that merge threads run at. By - default the priority is 1 plus the priority of (ie, - slightly higher priority than) the first thread that - calls merge. - - - - Return the priority that merge threads run at. - - - Does the actual merge, by calling {@link IndexWriter#merge} - - - Create and return a new MergeThread - - - Called when an exception is hit in a background merge - thread - - - - Used for testing - - - Used for testing - - - Used for testing - - - Used for testing - - - Used for testing - - - Declare what fields to load normally and what fields to load lazily - + + Get the generation (N) of the current segments_N file + from a list of files. - - - Pass in the Set of {@link Field} names to load and the Set of {@link Field} names to load lazily. If both are null, the - Document will not have any {@link Field} on it. - - A Set of {@link String} field names to load. May be empty, but not null - - A Set of {@link String} field names to load lazily. May be empty, but not null + -- array of file names to check - - Indicate whether to load the field with the given name or not. If the {@link Field#Name()} is not in either of the - initializing Sets, then {@link Lucene.Net.Documents.FieldSelectorResult#NO_LOAD} is returned. If a Field name - is in both fieldsToLoad and lazyFieldsToLoad, lazy has precedence. + + Get the generation (N) of the current segments_N file + in the directory. - The {@link Field} name to check + -- directory to search for the latest segments_N file - The {@link FieldSelectorResult} + + + Get the filename of the current segments_N file + from a list of files. + + + -- array of file names to check + + + + Get the filename of the current segments_N file + in the directory. 
+ + + -- directory to search for the latest segments_N file + + + + Get the segments_N filename in use by this segment infos. + + + Parse the generation off the segments file name and + return it. + + + + Get the next segments_N filename that will be written. + + + Read a particular segmentFileName. Note that this may + throw an IOException if a commit is in process. + + + -- directory containing the segments file + + -- segment file to load + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + + + This version of read uses the retry logic (for lock-less + commits) to find the right segments file to load. + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + + + Returns a copy of this instance, also copying each + SegmentInfo. + + + + version number when this SegmentInfos was generated. + + + Current version number from segments file. + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + + + Returns userData from latest segments file + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + + + If non-null, information about retries when loading + the segments file will be printed to this. + + + + Advanced: set how many times to try loading the + segments.gen file contents to determine current segment + generation. This file is only referenced when the + primary method (listing the directory) fails. + + + + + + + + Advanced: set how many milliseconds to pause in between + attempts to load the segments.gen file. + + + + + + + + Advanced: set how many times to try incrementing the + gen when loading the segments file. This only runs if + the primary (listing directory) and secondary (opening + segments.gen file) methods fail to find the segments + file. + + + + + + + + + + + + Returns a new SegmentInfos containg the SegmentInfo + instances in the specified range first (inclusive) to + last (exclusive), so total number of segments returned + is last-first. + + + + Call this to start a commit. This writes the new + segments file, but writes an invalid checksum at the + end, so that it is not visible to readers. Once this + is called you must call {@link #finishCommit} to complete + the commit or {@link #rollbackCommit} to abort it. + + + + Returns all file names referenced by SegmentInfo + instances matching the provided Directory (ie files + associated with any "external" segments are skipped). + The returned collection is recomputed on each + invocation. + + + + Writes & syncs to the Directory dir, taking care to + remove the segments file on exception + + + + Replaces all segments in this instance, but keeps + generation, version, counter so that future commits + remain write once. + + + + + Simple brute force implementation. + If size is equal, compare items one by one. + + SegmentInfos object to check equality for + true if lists are equal, false otherwise + + + + Calculate hash code of SegmentInfos + + hash code as in java version of ArrayList + + + Utility class for executing code that needs to do + something with the current segments file. This is + necessary with lock-less commits because from the time + you locate the current segments file name, until you + actually open it, read its contents, or check modified + time, etc., it could have been deleted due to a writer + commit finishing. + + + + Subclass must implement this. 
The assumption is an + IOException will be thrown if something goes wrong + during the processing that could have been caused by + a writer committing. + + + + Constants describing field properties, for example used for + {@link IndexReader#GetFieldNames(FieldOption)}. + + + + All fields + + + All indexed fields + + + All fields that store payloads + + + All fields that omit tf + + + Renamed to {@link #OMIT_TERM_FREQ_AND_POSITIONS} + + + + All fields which are not indexed + + + All fields which are indexed with termvectors enabled + + + All fields which are indexed but don't have termvectors enabled + + + All fields with termvectors enabled. Please note that only standard termvector fields are returned + + + All fields with termvectors with position values enabled + + + All fields with termvectors with offset values enabled + + + All fields with termvectors with offset values and position values enabled + + + Please subclass IndexCommit class instead + + + + Get the segments file (segments_N) associated + with this commit point. + + + + Returns all index files referenced by this commit point. + + + Delete this commit point. +

+ Upon calling this, the writer is notified that this commit + point should be deleted. +

+ The decision that a commit point should be deleted is made by the {@link IndexDeletionPolicy} in effect, + and therefore this should only be called by its {@link IndexDeletionPolicy#onInit onInit()} or + {@link IndexDeletionPolicy#onCommit onCommit()} methods. +

+
+ + Add a new position & payload. If payloadLength > 0 + you must read those bytes from the IndexInput. + + + + Called when we are done adding positions & payloads + + + Holds all per thread, per field state. + + + This is just a "splitter" class: it lets you wrap two + DocFieldConsumer instances as a single consumer. + + + + Called when DocumentsWriter decides to create a new + segment + + + + Called when DocumentsWriter decides to close the doc + stores + + + + Called when an aborting exception is hit + + + Add a new thread + + + Called when DocumentsWriter is using too much RAM. + The consumer should free RAM, if possible, returning + true if any RAM was in fact freed. + + + + Consumer returns this on each doc. This holds any + state that must be flushed synchronized "in docID + order". We gather these and flush them in order. + + + + This class accepts multiple added documents and directly + writes a single segment file. It does this more + efficiently than creating a single segment per document + (with DocumentWriter) and doing standard merges on those + segments. + + Each added document is passed to the {@link DocConsumer}, + which in turn processes the document and interacts with + other consumers in the indexing chain. Certain + consumers, like {@link StoredFieldsWriter} and {@link + TermVectorsTermsWriter}, digest a document and + immediately write bytes to the "doc store" files (ie, + they do not consume RAM per document, except while they + are processing the document). + + Other consumers, eg {@link FreqProxTermsWriter} and + {@link NormsWriter}, buffer bytes in RAM and flush only + when a new segment is produced. + Once we have used our allowed RAM buffer, or the number + of added docs is large enough (in the case we are + flushing by doc count instead of RAM usage), we create a + real segment and flush it to the Directory. + + Threads: + + Multiple threads are allowed into addDocument at once. + There is an initial synchronized call to getThreadState + which allocates a ThreadState for this thread. The same + thread will get the same ThreadState over time (thread + affinity) so that if there are consistent patterns (for + example each thread is indexing a different content + source) then we make better use of RAM. Then + processDocument is called on that ThreadState without + synchronization (most of the "heavy lifting" is in this + call). Finally the synchronized "finishDocument" is + called to flush changes to the directory. + + When flush is called by IndexWriter, or, we flush + internally when autoCommit=false, we forcefully idle all + threads and flush only once they are all idle. This + means you can call flush with a given thread even while + other threads are actively adding/deleting documents. + + + Exceptions: + + Because this class directly updates in-memory posting + lists, and flushes stored fields and term vectors + directly to files in the directory, there are certain + limited times when an exception can corrupt this state. + For example, a disk full while flushing stored fields + leaves this file in a corrupt state. Or, an OOM + exception while appending to the in-memory posting lists + can corrupt that posting list. We call such exceptions + "aborting exceptions". In these cases we must call + abort() to discard all docs added since the last flush. + + All other exceptions ("non-aborting exceptions") can + still partially update the index structures. 
These + updates are consistent, but, they represent only a part + of the document seen up until the exception was hit. + When this happens, we immediately mark the document as + deleted so that the document is always atomically ("all + or none") added to the index. + + + + Returns true if any of the fields in the current + buffered docs have omitTermFreqAndPositions==false + + + + If non-null, various details of indexing are printed + here. + + + + Set how much RAM we can use before flushing. + + + Set max buffered docs, which means we will flush by + doc count instead of by RAM usage. + + + + Get current segment name we are writing. + + + Returns how many docs are currently buffered in RAM. + + + Returns the current doc store segment we are writing + to. This will be the same as segment when autoCommit + * is true. + + + + Returns the doc offset into the shared doc store for + the current buffered docs. + + + + Closes the current open doc stores an returns the doc + store segment name. This returns null if there are * + no buffered documents. + + + + Called if we hit an exception at a bad time (when + updating the index files) and must discard all + currently buffered docs. This resets our state, + discarding any docs added since last flush. + + + + Reset after a flush + + + Flush all pending docs to a new segment + + + Build compound file for the segment we just flushed + + + Set flushPending if it is not already set and returns + whether it was set. This is used by IndexWriter to + trigger a single flush even when multiple threads are + trying to do so. + + + + Returns a free (idle) ThreadState that may be used for + indexing this one document. This call also pauses if a + flush is pending. If delTerm is non-null then we + buffer this deleted term after the thread state has + been acquired. + + + + Returns true if the caller (IndexWriter) should now + flush. + + + + Called whenever a merge has completed and the merged segments had deletions + + + Does the synchronized work to finish/flush the + inverted document. + + + + The IndexingChain must define the {@link #GetChain(DocumentsWriter)} method + which returns the DocConsumer that the DocumentsWriter calls to process the + documents. + + + + Consumer returns this on each doc. This holds any + state that must be flushed synchronized "in docID + order". We gather these and flush them in order. + + + + Process the document. If there is + something for this document to be done in docID order, + you should encapsulate that as a + DocumentsWriter.DocWriter and return it. + DocumentsWriter then calls finish() on this object + when it's its turn. + + + + Holds buffered deletes, by docID, term or query. We + hold two instances of this class: one for the deletes + prior to the last flush, the other for deletes after + the last flush. This is so if we need to abort + (discard all buffered docs) we can also discard the + buffered deletes yet keep the deletes done during + previously flushed segments. + + + + A WhitespaceTokenizer is a tokenizer that divides text at whitespace. + Adjacent sequences of non-Whitespace characters form tokens. + + + + An abstract base class for simple, character-oriented tokenizers. + + + A Tokenizer is a TokenStream whose input is a Reader. +

+ This is an abstract class; subclasses must override {@link #IncrementToken()} +

+ NOTE: Subclasses overriding {@link #next(Token)} must call + {@link AttributeSource#ClearAttributes()} before setting attributes. + Subclasses overriding {@link #IncrementToken()} must call + {@link Token#Clear()} before setting Token attributes. +

+
+ + The text source for this Tokenizer. + + + Construct a tokenizer with null input. + + + Construct a token stream processing the given input. + + + Construct a tokenizer with null input using the given AttributeFactory. + + + Construct a token stream processing the given input using the given AttributeFactory. + + + Construct a token stream processing the given input using the given AttributeSource. + + + Construct a token stream processing the given input using the given AttributeSource. + + + By default, closes the input Reader. + + + Return the corrected offset. If {@link #input} is a {@link CharStream} subclass + this method calls {@link CharStream#CorrectOffset}, else returns currentOff. + + offset as seen in the output + + corrected offset based on the input + + + + + + Expert: Reset the tokenizer to a new reader. Typically, an + analyzer (in its reusableTokenStream method) will use + this to re-use a previously created tokenizer. + + + + Returns true iff a character should be included in a token. This + tokenizer generates as tokens adjacent sequences of characters which + satisfy this predicate. Characters for which this is false are used to + define token boundaries and are not included in tokens. + + + + Called on each token character to normalize it before it is added to the + token. The default implementation does nothing. Subclasses may use this + to, e.g., lowercase tokens. + + + + Will be removed in Lucene 3.0. This method is final, as it should + not be overridden. Delegates to the backwards compatibility layer. + + + + Will be removed in Lucene 3.0. This method is final, as it should + not be overridden. Delegates to the backwards compatibility layer. + + + + Construct a new WhitespaceTokenizer. + + + Construct a new WhitespaceTokenizer using a given {@link AttributeSource}. + + + Construct a new WhitespaceTokenizer using a given {@link Lucene.Net.Util.AttributeSource.AttributeFactory}. + + + Collects only characters which do not satisfy + {@link Character#isWhitespace(char)}. + + + + An {@link Analyzer} that filters {@link LetterTokenizer} + with {@link LowerCaseFilter} + + + + An Analyzer builds TokenStreams, which analyze text. It thus represents a + policy for extracting index terms from text. +

+ Typical implementations first build a Tokenizer, which breaks the stream of + characters from the Reader into raw Tokens. One or more TokenFilters may + then be applied to the output of the Tokenizer. +

+
+ + Creates a TokenStream which tokenizes all the text in the provided + Reader. Must be able to handle null field name for + backward compatibility. + + + + Creates a TokenStream that is allowed to be re-used + from the previous time that the same thread called + this method. Callers that do not need to use more + than one TokenStream at the same time from this + analyzer should use this method for better + performance. + + + + Used by Analyzers that implement reusableTokenStream + to retrieve previously saved TokenStreams for re-use + by the same thread. + + + + Used by Analyzers that implement reusableTokenStream + to save a TokenStream for later re-use by the same + thread. + + + + This is only present to preserve + back-compat of classes that subclass a core analyzer + and override tokenStream but not reusableTokenStream + + + + Invoked before indexing a Fieldable instance if + terms have already been added to that field. This allows custom + analyzers to place an automatic position increment gap between + Fieldable instances using the same field name. The default value + position increment gap is 0. With a 0 position increment gap and + the typical default token position increment of 1, all terms in a field, + including across Fieldable instances, are in successive positions, allowing + exact PhraseQuery matches, for instance, across Fieldable instance boundaries. + + + Fieldable name being indexed. + + position increment gap, added to the next token emitted from {@link #TokenStream(String,Reader)} - - A Token's lexical type. The Default value is "word". - - - A Token's lexical type. The Default value is "word". - - - Returns this Token's lexical type. Defaults to "word". - - - Set the lexical type. - - - - - Returns this Token's lexical type. Defaults to "word". - - - Set the lexical type. - - - - - Simple DocIdSet and DocIdSetIterator backed by a BitSet - - - This DocIdSet implementation is cacheable. - - - Returns the underlying BitSet. - - - use {@link #DocID()} instead. - - - - use {@link #NextDoc()} instead. - - - - use {@link #Advance(int)} instead. - - - - A {@link HitCollector} implementation that collects the top-sorting - documents, returning them as a {@link TopFieldDocs}. This is used by {@link - IndexSearcher} to implement {@link TopFieldDocs}-based search. - -

This may be extended, overriding the collect method to, e.g., - conditionally invoke super() in order to filter which - documents are collected. + +

Just like {@link #getPositionIncrementGap}, except for + Token offsets instead. By default this returns 1 for + tokenized fields and, as if the fields were joined + with an extra space character, and 0 for un-tokenized + fields. This method is only called if the field + produced at least one token for indexing. - Please use {@link TopFieldCollector} instead. - - - - Construct to collect a given number of hits. - the index to be searched - - the sort criteria - - the maximum number of hits to collect + the field just indexed + offset gap, added to the next token emitted from {@link #TokenStream(String,Reader)} + - - Matches spans near the beginning of a field. + + Frees persistent resources used by this Analyzer - - Construct a SpanFirstQuery matching spans in match whose end - position is less than or equal to end. + + CharReader is a Reader wrapper. It reads chars from + Reader and outputs {@link CharStream}, defining an + identify function {@link #CorrectOffset} method that + simply returns the provided offset. - - Return the SpanQuery whose matches are filtered. + + This class can be used if the token attributes of a TokenStream + are intended to be consumed more than once. It caches + all token attribute states locally in a List. + +

CachingTokenFilter implements the optional method + {@link TokenStream#Reset()}, which repositions the + stream to the first Token. +

- - Return the maximum end position permitted in a match. - - - Returns a collection of all terms matched by this query. - use extractTerms instead + + Will be removed in Lucene 3.0. This method is final, as it should + not be overridden. Delegates to the backwards compatibility layer. - + + + Will be removed in Lucene 3.0. This method is final, as it should + not be overridden. Delegates to the backwards compatibility layer. + + + + + This interface should be implemented by any class whose instances are intended + to be executed by a thread. + + + + + This method has to be implemented in order that starting of the thread causes the object's + run method to be called in that separately executing thread. + + + + + Contains conversion support elements such as classes, interfaces and static methods. + + + + + Copies an array of chars obtained from a String into a specified array of chars + + The String to get the chars from + Position of the String to start getting the chars + Position of the String to end getting the chars + Array to return the chars + Position of the destination array of chars to start storing the chars + An array of chars + + + + Support class used to handle threads + + + + + The instance of System.Threading.Thread + + + + + Initializes a new instance of the ThreadClass class + + + + + Initializes a new instance of the Thread class. + + The name of the thread + + + + Initializes a new instance of the Thread class. + + A ThreadStart delegate that references the methods to be invoked when this thread begins executing + + + + Initializes a new instance of the Thread class. + + A ThreadStart delegate that references the methods to be invoked when this thread begins executing + The name of the thread + + + + This method has no functionality unless the method is overridden + + + + + Causes the operating system to change the state of the current thread instance to ThreadState.Running + + + + + Interrupts a thread that is in the WaitSleepJoin thread state + + + + + Blocks the calling thread until a thread terminates + + + + + Blocks the calling thread until a thread terminates or the specified time elapses + + Time of wait in milliseconds + + + + Blocks the calling thread until a thread terminates or the specified time elapses + + Time of wait in milliseconds + Time of wait in nanoseconds + + + + Resumes a thread that has been suspended + + + + + Raises a ThreadAbortException in the thread on which it is invoked, + to begin the process of terminating the thread. Calling this method + usually terminates the thread + + + + + Raises a ThreadAbortException in the thread on which it is invoked, + to begin the process of terminating the thread while also providing + exception information about the thread termination. + Calling this method usually terminates the thread. + + An object that contains application-specific information, such as state, which can be used by the thread being aborted + + + + Suspends the thread, if the thread is already suspended it has no effect + + + + + Obtain a String that represents the current object + + A String that represents the current object + + + + Gets the currently running thread + + The currently running thread + + + + Gets the current thread instance + + + + + Gets or sets the name of the thread + + + + + Gets or sets a value indicating the scheduling priority of a thread + + + + + Gets a value indicating the execution status of the current thread + + + + + Gets or sets a value indicating whether or not a thread is a background thread. 
+ + + + + Represents the methods to support some operations over files. + + + + + Returns an array of abstract pathnames representing the files and directories of the specified path. + + The abstract pathname to list it childs. + An array of abstract pathnames childs of the path specified or null if the path is not a directory + + + + Returns a list of files in a give directory. + + The full path name to the directory. + + An array containing the files. + + + + Flushes the specified file stream. Ensures that all buffered + data is actually written to the file system. + + The file stream. + + + + A simple class for number conversions. + + + + + Min radix value. + + + + + Max radix value. + + + + + Converts a number to System.String. + + + + + + + Converts a number to System.String. + + + + + + + Converts a number to System.String in the specified radix. + + A number to be converted. + A radix. + A System.String representation of the number in the specified redix. + + + + Parses a number in the specified radix. + + An input System.String. + A radix. + The parsed number in the specified radix. + + + + Performs an unsigned bitwise right shift with the specified number + + Number to operate on + Ammount of bits to shift + The resulting number from the shift operation + + + + Performs an unsigned bitwise right shift with the specified number + + Number to operate on + Ammount of bits to shift + The resulting number from the shift operation + + + + Returns the index of the first bit that is set to true that occurs + on or after the specified starting index. If no such bit exists + then -1 is returned. + + The BitArray object. + The index to start checking from (inclusive). + The index of the next set bit. + + + + Converts a System.String number to long. + + + + + + + Mimics Java's Character class. + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + This class provides supporting methods of java.util.BitSet + that are not present in System.Collections.BitArray. + + + + + Returns the next set bit at or after index, or -1 if no such bit exists. + + + the index of bit array at which to start checking + the next set bit or -1 + + + + Returns the next un-set bit at or after index, or -1 if no such bit exists. + + + the index of bit array at which to start checking + the next set bit or -1 + + + + Returns the number of bits set to true in this BitSet. + + The BitArray object. + The number of bits set to true in this BitSet. + + + + Summary description for TestSupportClass. + + + + + Compares two Term arrays for equality. + + First Term array to compare + Second Term array to compare + true if the Terms are equal in both arrays, false otherwise + + + + A Hashtable which holds weak references to its keys so they + can be collected during GC. + + + + + Serves as a simple "GC Monitor" that indicates whether cleanup is needed. + If collectableObject.IsAlive is false, GC has occurred and we should perform cleanup + + + + + Customize the hashtable lookup process by overriding KeyEquals. 
KeyEquals + will compare both WeakKey to WeakKey and WeakKey to real keys + + + + + Perform cleanup if GC occurred + + + + + Iterate over all keys and remove keys that were collected + + + + + Wrap each key with a WeakKey and add it to the hashtable + + + + + Create a temporary copy of the real keys and return that + + + + + A weak referene wrapper for the hashtable keys. Whenever a key\value pair + is added to the hashtable, the key is wrapped using a WeakKey. WeakKey saves the + value of the original object hashcode for fast comparison. + + + + + A Dictionary enumerator which wraps the original hashtable enumerator + and performs 2 tasks: Extract the real key from a WeakKey and skip keys + that were already collected. + + + + + Support class used to handle Hashtable addition, which does a check + first to make sure the added item is unique in the hash. + + + + + Converts the specified collection to its string representation. + + The collection to convert to string. + A string representation of the specified collection. + + + + Compares two string arrays for equality. + + First string array list to compare + Second string array list to compare + true if the strings are equal in both arrays, false otherwise + + + + Sorts an IList collections + + The System.Collections.IList instance that will be sorted + The Comparator criteria, null to use natural comparator. + + + + Fills the array with an specific value from an specific index to an specific index. + + The array to be filled. + The first index to be filled. + The last index to be filled. + The value to fill the array with. + + + + Fills the array with an specific value. + + The array to be filled. + The value to fill the array with. + + + + Compares the entire members of one array whith the other one. + + The array to be compared. + The array to be compared with. + Returns true if the two specified arrays of Objects are equal + to one another. The two arrays are considered equal if both arrays + contain the same number of elements, and all corresponding pairs of + elements in the two arrays are equal. Two objects e1 and e2 are + considered equal if (e1==null ? e2==null : e1.equals(e2)). In other + words, the two arrays are equal if they contain the same elements in + the same order. Also, two array references are considered equal if + both are null. + + + A collection of which can be + looked up by instances of . + The type of the items contains in this + collection. + The type of the keys that can be used to look + up the items. + + + Creates a new instance of the + class. + The which will convert + instances of to + when the override of is called. + + + The which will convert + instances of to + when the override of is called. + + + Converts an item that is added to the collection to + a key. + The instance of + to convert into an instance of . + The instance of which is the + key for this item. + + + Determines if a key for an item exists in this + collection. + The instance of + to see if it exists in this collection. + True if the key exists in the collection, false otherwise. + + + Represents a strongly typed list of objects that can be accessed by index. + Provides methods to search, sort, and manipulate lists. Also provides functionality + to compare lists against each other through an implementations of + . + The type of elements in the list. + + + Initializes a new instance of the + class that is empty and has the + default initial capacity. 
+ + + Initializes a new instance of the + class that contains elements copied from the specified collection and has + sufficient capacity to accommodate the number of elements copied. + The collection whose elements are copied to the new list. + + + Initializes a new instance of the + class that is empty and has the specified initial capacity. + The number of elements that the new list can initially store. + + + Adds a range of objects represented by the + implementation. + The + implementation to add to this list. + + + Compares the counts of two + implementations. + This uses a trick in LINQ, sniffing types for implementations + of interfaces that might supply shortcuts when trying to make comparisons. + In this case, that is the and + interfaces, either of which can provide a count + which can be used in determining the equality of sequences (if they don't have + the same count, then they can't be equal). + The from the left hand side of the + comparison to check the count of. + The from the right hand side of the + comparison to check the count of. + Null if the result is indeterminate. This occurs when either + or doesn't implement or . + Otherwise, it will get the count from each and return true if they are equal, false otherwise. + + + Compares the contents of a + implementation to another one to determine equality. + Thinking of the implementation as + a string with any number of characters, the algorithm checks + each item in each list. If any item of the list is not equal (or + one list contains all the elements of another list), then that list + element is compared to the other list element to see which + list is greater. + The implementation + that is considered the left hand side. + The implementation + that is considered the right hand side. + True if the items are equal, false otherwise. + + + Compares this sequence to another + implementation, returning true if they are equal, false otherwise. + The other implementation + to compare against. + True if the sequence in + is the same as this one. + + + Compares this object for equality against other. + The other object to compare this object against. + True if this object and are equal, false + otherwise. + + + Gets the hash code for the list. + The hash code value. + + + Gets the hash code for the list. + The + implementation which will have all the contents hashed. + The hash code value. + + + Clones the . + This is a shallow clone. + A new shallow clone of this + . + + + + A simple wrapper to allow for the use of the GeneralKeyedCollection. The + wrapper is required as there can be several keys for an object depending + on how many interfaces it implements. + + + + + Provides platform infos. + + + + + Whether we run under a Unix platform. + + + + + Whether we run under a supported Windows platform. + + + + A straightforward implementation of {@link FSDirectory} + using java.io.RandomAccessFile. However, this class has + poor concurrent performance (multiple threads will + bottleneck) as it synchronizes when multiple threads + read from the same file. It's usually better to use + {@link NIOFSDirectory} or {@link MMapDirectory} instead. + + + + + Base class for Directory implementations that store index + files in the file system. There are currently three core + subclasses: + + + + Unfortunately, because of system peculiarities, there is + no single overall best implementation. 
Therefore, we've + added the {@link #open} method, to allow Lucene to choose + the best FSDirectory implementation given your + environment, and the known limitations of each + implementation. For users who have no reason to prefer a + specific implementation, it's best to simply use {@link + #open}. For all others, you should instantiate the + desired implementation directly. + +

The locking implementation is by default {@link + NativeFSLockFactory}, but can be changed by + passing in a custom {@link LockFactory} instance. + The deprecated getDirectory methods default to using + {@link SimpleFSLockFactory} for backwards compatibility. + The system properties + org.apache.lucene.store.FSDirectoryLockFactoryClass + and org.apache.lucene.FSDirectory.class + are deprecated and only used by the deprecated + getDirectory methods. The system property + org.apache.lucene.lockDir is ignored completely. + If you really want to store locks + elsewhere, you can create your own {@link + SimpleFSLockFactory} (or {@link NativeFSLockFactory}, + etc.), passing in your preferred lock directory. + 

In 3.0 this class will become abstract. + +

+
- - Expert: A hit queue for sorting by hits by terms in more than one field. - Uses FieldCache.DEFAULT for maintaining internal term lookup tables. + + A Directory is a flat list of files. Files may be written once, when they + are created. Once a file is created it may only be opened for read, or + deleted. Random access is permitted both when reading and writing. -

Created: Dec 8, 2003 12:56:03 PM +

Java's i/o APIs are not used directly; rather, all i/o is + through this API. This permits things such as:

    +
  • implementation of RAM-based indices;
  • +
  • implementation of indices stored in a database, via JDBC;
  • +
  • implementation of an index as a single file;
  • +
+ + Directory locking is implemented by an instance of {@link + LockFactory}, and can be changed for each Directory + instance using {@link #setLockFactory}. + +
+
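To make the Directory/LockFactory relationship described above concrete, here is a small, hedged sketch using the Java API these comments reference; the index and lock paths are invented placeholders:

    import java.io.File;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.SimpleFSLockFactory;

    public class DirectoryLockingSketch {
        public static void main(String[] args) throws Exception {
            // Open a filesystem-backed Directory, supplying a non-default LockFactory
            // (SimpleFSLockFactory writing lock files under a separate directory).
            Directory dir = FSDirectory.open(
                new File("/tmp/my-index"),                            // placeholder index path
                new SimpleFSLockFactory(new File("/tmp/my-locks")));  // placeholder lock dir
            try {
                System.out.println("files in index: " + dir.listAll().length);
            } finally {
                dir.close();
            }
        }
    }

As the setLockFactory notes below point out, each LockFactory instance should be dedicated to a single Directory rather than shared.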
+ + Holds the LockFactory instance (implements locking for + this Directory instance). + + + + For some Directory implementations ({@link + FSDirectory}, and its subclasses), this method + silently filters its results to include only index + files. Please use {@link #listAll} instead, which + does no filtering. + + + + Returns an array of strings, one for each file in the + directory. Unlike {@link #list} this method does no + filtering of the contents in a directory, and it will + never return null (throws IOException instead). + + Currently this method simply fallsback to {@link + #list} for Directory impls outside of Lucene's core & + contrib, but in 3.0 that method will be removed and + this method will become abstract. + + + + Returns true iff a file with the given name exists. + + + Returns the time the named file was last modified. + + + Set the modified time of an existing file to now. + + + Removes an existing file in the directory. + + + Renames an existing file in the directory. + If a file already exists with the new name, then it is replaced. + This replacement is not guaranteed to be atomic. + + + + + + Returns the length of a file in the directory. + + + Creates a new, empty file in the directory with the given name. + Returns a stream writing this file. + + + + Ensure that any writes to this file are moved to + stable storage. Lucene uses this to properly commit + changes to the index, to prevent a machine/OS crash + from corrupting the index. + + + + Returns a stream reading an existing file. + + + Returns a stream reading an existing file, with the + specified read buffer size. The particular Directory + implementation may ignore the buffer size. Currently + the only Directory implementations that respect this + parameter are {@link FSDirectory} and {@link + Lucene.Net.Index.CompoundFileReader}. + + + + Construct a {@link Lock}. + the name of the lock file + + + + Attempt to clear (forcefully unlock and remove) the + specified lock. Only call this at a time when you are + certain this lock is no longer in use. + + name of the lock to be cleared. + + + + Closes the store. + + + Set the LockFactory that this Directory instance should + use for its locking implementation. Each * instance of + LockFactory should only be used for one directory (ie, + do not share a single instance across multiple + Directories). + + + instance of {@link LockFactory}. + + + + Get the LockFactory that this Directory instance is + using for its locking implementation. Note that this + may be null for Directory implementations that provide + their own locking implementation. + + + + Return a string identifier that uniquely differentiates + this Directory instance from other Directory instances. + This ID should be the same if two Directory instances + (even in different JVMs and/or on different machines) + are considered "the same index". This is how locking + "scopes" to the right index. + + + + Copy contents of a directory src to a directory dest. + If a file in src already exists in dest then the + one in dest will be blindly overwritten. + +

NOTE: the source directory cannot change + while this method is running. Otherwise the results + are undefined and you could easily hit a + FileNotFoundException. + +

NOTE: this method only copies files that look + like index files (ie, have extensions matching the + known extensions of index files). + +

+ source directory + + destination directory + + if true, call {@link #Close()} method on source directory + + IOException +
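As a hedged sketch of the copy operation documented above (Java API; the source path is a placeholder), copying an index directory into an in-memory RAMDirectory might look like this:

    import java.io.File;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.RAMDirectory;

    public class DirectoryCopySketch {
        public static void main(String[] args) throws Exception {
            Directory source = FSDirectory.open(new File("/tmp/source-index")); // placeholder
            Directory dest = new RAMDirectory();
            // Copy every index file from source into dest; the 'true' flag closes
            // the source directory afterwards, as described above.
            Directory.copy(source, dest, true);
            System.out.println("copied files: " + dest.listAll().length);
            dest.close();
        }
    }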
+ + AlreadyClosedException if this Directory is closed + + + This cache of directories ensures that there is a unique Directory + instance per path, so that synchronization on the Directory can be used to + synchronize access between readers and writers. We use + refcounts to ensure when the last use of an FSDirectory + instance for a given canonical path is closed, we remove the + instance from the cache. See LUCENE-776 + for some relevant discussion. + + Not used by any non-deprecated methods anymore + + + + Set whether Lucene's use of lock files is disabled. By default, + lock files are enabled. They should only be disabled if the index + is on a read-only medium like a CD-ROM. + + Use a {@link #open(File, LockFactory)} or a constructor + that takes a {@link LockFactory} and supply + {@link NoLockFactory#getNoLockFactory}. This setting does not work + with {@link #open(File)} only the deprecated getDirectory + respect this setting. + + + + Returns whether Lucene's use of lock files is disabled. + true if locks are disabled, false if locks are enabled. + + + + Use a constructor that takes a {@link LockFactory} and + supply {@link NoLockFactory#getNoLockFactory}. + + + + The default class which implements filesystem-based directories. + + + A buffer optionally used in renameTo method + + + Returns the directory instance for the named location. + + + Use {@link #Open(File)} + + + the path to the directory. + + the FSDirectory for the named file. + + + + Returns the directory instance for the named location. + + + Use {@link #Open(File, LockFactory)} + + + the path to the directory. + + instance of {@link LockFactory} providing the + locking implementation. + + the FSDirectory for the named file. + + + + Returns the directory instance for the named location. + + + Use {@link #Open(File)} + + + the path to the directory. + + the FSDirectory for the named file. + + + + Returns the directory instance for the named location. + + + Use {@link #Open(File)} + + + the path to the directory. + + the FSDirectory for the named file. + + + + Returns the directory instance for the named location. + + + Use {@link #Open(File, LockFactory)} + + + the path to the directory. + + instance of {@link LockFactory} providing the + locking implementation. + + the FSDirectory for the named file. + + + + Returns the directory instance for the named location. + + + Use {@link #Open(File, LockFactory)} + + + the path to the directory. + + instance of {@link LockFactory} providing the + locking implementation. + + the FSDirectory for the named file. + + + + Returns the directory instance for the named location. + + + Use IndexWriter's create flag, instead, to + create a new index. + + + the path to the directory. + + if true, create, or erase any existing contents. + + the FSDirectory for the named file. + + + + Returns the directory instance for the named location. + + + Use IndexWriter's create flag, instead, to + create a new index. + + + the path to the directory. + + if true, create, or erase any existing contents. + + the FSDirectory for the named file. + + + + Returns the directory instance for the named location. + + + Use IndexWriter's create flag, instead, to + create a new index. + + + the path to the directory. + + if true, create, or erase any existing contents. + + the FSDirectory for the named file. + + + + + + + + Initializes the directory to create a new file with the given name. + This method should be used in {@link #createOutput}. 
+ + + + The underlying filesystem directory + + + + + + + + + + + Create a new FSDirectory for the named location (ctor for subclasses). + the path of the directory + + the lock factory to use, or null for the default + ({@link NativeFSLockFactory}); + + IOException + + + Creates an FSDirectory instance, trying to pick the + best implementation given the current environment. + The directory returned uses the {@link NativeFSLockFactory}. + +

Currently this returns {@link SimpleFSDirectory} as + NIOFSDirectory is currently not supported. + +


NOTE: this method may suddenly change which + implementation is returned from release to release, in + the event that higher performance defaults become + possible; if the precise implementation is important to + your application, please instantiate it directly + instead. On 64 bit systems, it may also be good to + return {@link MMapDirectory}, but this is disabled + because of officially missing unmap support in Java. + For optimal performance you should consider using + this implementation on 64 bit JVMs. + 

See above +

+
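The advice above about preferring a specific implementation can be illustrated with a brief, hedged sketch (Java API, placeholder path): open() lets Lucene choose, while constructing SimpleFSDirectory directly pins the choice.

    import java.io.File;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.SimpleFSDirectory;

    public class FSDirectoryChoiceSketch {
        public static void main(String[] args) throws Exception {
            File path = new File("/tmp/my-index");   // placeholder path

            // Let Lucene pick the best implementation for this environment...
            FSDirectory picked = FSDirectory.open(path);
            System.out.println("open() chose: " + picked.getClass().getSimpleName());
            picked.close();

            // ...or pin the implementation explicitly when the choice matters.
            FSDirectory pinned = new SimpleFSDirectory(path);
            System.out.println("pinned: " + pinned.getClass().getSimpleName());
            pinned.close();
        }
    }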
+ + Creates an FSDirectory instance, trying to pick the + best implementation given the current environment. + The directory returned uses the {@link NativeFSLockFactory}. + +

Currently this returns {@link SimpleFSDirectory} as + NIOFSDirectory is currently not supported. + +

NOTE: this method may suddenly change which + implementation is returned from release to release, in + the event that higher performance defaults become + possible; if the precise implementation is important to + your application, please instantiate it directly + instead. On 64 bit systems, it may also be good to + return {@link MMapDirectory}, but this is disabled + because of officially missing unmap support in Java. + For optimal performance you should consider using + this implementation on 64 bit JVMs. + 

See above +

+
+ + Just like {@link #Open(File)}, but allows you to + also specify a custom {@link LockFactory}. + + + + Lists all files (not subdirectories) in the + directory. This method never returns null (throws + {@link IOException} instead). + + + NoSuchDirectoryException if the directory + does not exist, or does exist but is not a + directory. + + IOException if list() returns null + + + Lists all files (not subdirectories) in the + directory. This method never returns null (throws + {@link IOException} instead). + + + NoSuchDirectoryException if the directory + does not exist, or does exist but is not a + directory. + + IOException if list() returns null + + + Lists all files (not subdirectories) in the + directory. + + + + + + Returns true iff a file with the given name exists. + + + Returns the time the named file was last modified. + + + Returns the time the named file was last modified. + + + Set the modified time of an existing file to now. + + + Returns the length in bytes of a file in the directory. + + + Removes an existing file in the directory. + + + Renames an existing file in the directory. + Warning: This is not atomic. + + + + + + Creates an IndexOutput for the file with the given name. + In 3.0 this method will become abstract. + + + + Creates an IndexInput for the file with the given name. + In 3.0 this method will become abstract. + + + + So we can do some byte-to-hexchar conversion below + + + Closes the store to future operations. + + + For debug output. + + + Default read chunk size. This is a conditional + default: on 32bit JVMs, it defaults to 100 MB. On + 64bit JVMs, it's Integer.MAX_VALUE. + + + + + + Sets the maximum number of bytes read at once from the + underlying file during {@link IndexInput#readBytes}. + The default value is {@link #DEFAULT_READ_CHUNK_SIZE}; + +

This was introduced due to Sun + JVM Bug 6478546, which throws an incorrect + OutOfMemoryError when attempting to read too many bytes + at once. It only happens on 32bit JVMs with a large + maximum heap size.

+ +

Changes to this value will not impact any + already-opened {@link IndexInput}s. You should call + this before attempting to open an index on the + directory.

+ +

NOTE: This value should be as large as + possible to reduce any possible performance impact. If + you still encounter an incorrect OutOfMemoryError, + try lowering the chunk size.

+

+
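A hedged sketch of adjusting this setting through the Java API described above; the 16 MB figure is an arbitrary illustration, not a recommendation:

    import java.io.File;
    import org.apache.lucene.store.FSDirectory;

    public class ReadChunkSizeSketch {
        public static void main(String[] args) throws Exception {
            FSDirectory dir = FSDirectory.open(new File("/tmp/my-index")); // placeholder
            System.out.println("default chunk size: " + dir.getReadChunkSize());
            // Lower the chunk size only if an incorrect OutOfMemoryError is still
            // observed on a 32-bit JVM; call this before opening any IndexInput,
            // since already-open inputs are unaffected.
            dir.setReadChunkSize(16 * 1024 * 1024);  // arbitrary 16 MB example value
            dir.close();
        }
    }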
+ + The maximum number of bytes to read at once from the + underlying file during {@link IndexInput#readBytes}. + + + + + + Use SimpleFSDirectory.SimpleFSIndexInput instead + + + + Base implementation class for buffered {@link IndexInput}. + + + Default buffer size + + + Inits BufferedIndexInput with a specific bufferSize + + + Change the buffer size used by this IndexInput + + + + + + + Expert: implements buffer refill. Reads bytes from the current position + in the input. + + the array to read bytes into + + the offset in the array to start storing bytes + + the number of bytes to read + + + + Expert: implements seek. Sets current position in this file, where the + next {@link #ReadInternal(byte[],int,int)} will occur. + + + + + + A straightforward implementation of {@link FSDirectory} + using java.io.RandomAccessFile. However, this class has + poor concurrent performance (multiple threads will + bottleneck) as it synchronizes when multiple threads + read from the same file. It's usually better to use + {@link NIOFSDirectory} or {@link MMapDirectory} instead. + + + + Create a new SimpleFSDirectory for the named location. + + + the path of the directory + + the lock factory to use, or null for the default. + + IOException + + + Create a new SimpleFSDirectory for the named location. + + + the path of the directory + + the lock factory to use, or null for the default. + + IOException + + + Create a new SimpleFSDirectory for the named location and the default lock factory. + + + the path of the directory + + IOException + + + + + + + Create a new SimpleFSDirectory for the named location and the default lock factory. + + + the path of the directory + + IOException + + + Creates an IndexOutput for the file with the given name. + + + Creates an IndexInput for the file with the given name. + + + Please use ctor taking chunkSize + + + + Please use ctor taking chunkSize + + + + IndexInput methods + + + Method used for testing. Returns true if the underlying + file descriptor is valid. + + + + Base implementation class for buffered {@link IndexOutput}. + + + Abstract base class for output to a file in a Directory. A random-access + output stream. Used for all Lucene index output operations. + + + + + + + + Writes a single byte. + + + + + Writes an array of bytes. + the bytes to write + + the number of bytes to write + + + + + + Writes an array of bytes. + the bytes to write + + the offset in the byte array + + the number of bytes to write + + + + + + Writes an int as four bytes. + + + + + Writes an int in a variable-length format. Writes between one and + five bytes. Smaller values take fewer bytes. Negative numbers are not + supported. + + + + + + Writes a long as eight bytes. + + + + + Writes an long in a variable-length format. Writes between one and five + bytes. Smaller values take fewer bytes. Negative numbers are not + supported. + + + + + + Writes a string. + + + + + Writes a sub sequence of characters from s as the old + format (modified UTF-8 encoded bytes). + + the source of the characters + + the first character in the sequence + + the number of characters in the sequence + + -- please pre-convert to utf8 bytes + instead or use {@link #writeString} + + + + Writes a sub sequence of characters from char[] as + the old format (modified UTF-8 encoded bytes). 
+ + the source of the characters + + the first character in the sequence + + the number of characters in the sequence + + -- please pre-convert to utf8 bytes instead or use {@link #writeString} + + + + Copy numBytes bytes from input to ourself. + + + Forces any buffered output to be written. + + + Closes this stream to further operations. + + + Returns the current position in this file, where the next write will + occur. + + + + + + Sets current position in this file, where the next write will occur. + + + + + The number of bytes in the file. + + + Set the file length. By default, this method does + nothing (it's optional for a Directory to implement + it). But, certain Directory implementations (for + + can use this to inform the + underlying IO system to pre-allocate the file to the + specified size. If the length is longer than the + current file length, the bytes added to the file are + undefined. Otherwise the file is truncated. + + file length + + + + Writes a single byte. + + + + + Writes an array of bytes. + the bytes to write + + the number of bytes to write + + + + + + Forces any buffered output to be written. + + + Expert: implements buffer write. Writes bytes at the current position in + the output. + + the bytes to write + + the number of bytes to write + + + + Expert: implements buffer write. Writes bytes at the current position in + the output. + + the bytes to write + + the offset in the byte array + + the number of bytes to write + + + + Closes this stream to further operations. + + + Returns the current position in this file, where the next write will + occur. + + + + + + Sets current position in this file, where the next write will occur. + + + + + The number of bytes in the file. + + + output methods: + + + Random-access methods + + + + + + + + + + + + + + + + + + + Use SimpleFSDirectory.SimpleFSIndexOutput instead + + + + + + + + The results of a SpanQueryFilter. Wraps the BitSet and the position information from the SpanQuery + +

+ NOTE: This API is still experimental and subject to change. + + +

+
+ + + + + + + The bits for the Filter + + A List of {@link Lucene.Net.Search.SpanFilterResult.PositionInfo} objects + + Use {@link #SpanFilterResult(DocIdSet, List)} instead + + + + + The DocIdSet for the Filter + + A List of {@link Lucene.Net.Search.SpanFilterResult.PositionInfo} objects + + + + The first entry in the array corresponds to the first "on" bit. + Entries are increasing by document order + + A List of PositionInfo objects + + + + Use {@link #GetDocIdSet()} + + + + Returns the docIdSet + + + + A List of {@link Lucene.Net.Search.SpanFilterResult.StartEnd} objects + + + + + The end position of this match + + + + The Start position + The start position of this match + + + + Expert: Compares two ScoreDoc objects for sorting. + +

Created: Feb 3, 2004 9:00:16 AM

lucene 1.4 - $Id: FieldSortedHitQueue.java 803676 2009-08-12 19:31:38Z hossman $ + $Id: ScoreDocComparator.java 738219 2009-01-27 20:15:21Z mikemccand $ + + use {@link FieldComparator} + +
+ + Special comparator for sorting hits according to computed relevance (document score). + + + Special comparator for sorting hits according to index order (document number). + + + A Scorer for queries with a required part and an optional part. + Delays skipTo() on the optional part until a score() is needed. +
+ This Scorer implements {@link Scorer#SkipTo(int)}. +
+
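For context, a hedged sketch of the kind of query that produces a required-plus-optional scorer like the one described above (Java API; the field and terms are placeholders):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.TermQuery;

    public class ReqOptQuerySketch {
        public static void main(String[] args) {
            BooleanQuery query = new BooleanQuery();
            // Required clause: the document must match this term.
            query.add(new TermQuery(new Term("body", "lucene")), BooleanClause.Occur.MUST);
            // Optional clause: used only to boost the score when it also matches.
            query.add(new TermQuery(new Term("body", "search")), BooleanClause.Occur.SHOULD);
            System.out.println(query);
        }
    }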
+ + The scorers passed from the constructor. + These are set to null as soon as their next() or skipTo() returns false. + + + + Construct a ReqOptScorer. + The required scorer. This must match. + + The optional scorer. This is used for scoring only. + + + + use {@link #NextDoc()} instead. + + + + use {@link #Advance(int)} instead. + + + + use {@link #DocID()} instead. + + + + Returns the score of the current document matching the query. + Initially invalid, until {@link #Next()} is called the first time. + + The score of the required scorer, eventually increased by the score + of the optional scorer when it also matches the current document. + + + + Explain the score of a document. + TODO: Also show the total score. + See BooleanScorer.explain() on how to do this. + + + + Expert: Scoring functionality for phrase queries. +
A document is considered matching if it contains the phrase-query terms + at "valid" positions. What "valid positions" are + depends on the type of the phrase query: for an exact phrase query terms are required + to appear in adjacent locations, while for a sloppy phrase query some distance between + the terms is allowed. The abstract method {@link #PhraseFreq()} of extending classes + is invoked for each document containing all the phrase query terms, in order to + compute the frequency of the phrase query in that document. A non-zero frequency + means a match. 
+
+ + use {@link #DocID()} instead. + + + + use {@link #NextDoc()} instead. + + + + use {@link #Advance(int)} instead. + + + + For a document containing all the phrase query terms, compute the + frequency of the phrase in that document. + A non zero frequency means a match. +
Note that containing all phrase terms does not guarantee a match; they have to be found in matching locations. +
+ frequency of the phrase in current doc, 0 if not found. + +
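The exact-versus-sloppy distinction drawn above can be shown with a short, hedged sketch (Java API; the field and terms are placeholders):

    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.PhraseQuery;

    public class PhraseQuerySketch {
        public static void main(String[] args) {
            // Exact phrase: the terms must appear in adjacent positions.
            PhraseQuery exact = new PhraseQuery();
            exact.add(new Term("body", "quick"));
            exact.add(new Term("body", "fox"));

            // Sloppy phrase: up to two position moves are tolerated between the terms.
            PhraseQuery sloppy = new PhraseQuery();
            sloppy.add(new Term("body", "quick"));
            sloppy.add(new Term("body", "fox"));
            sloppy.setSlop(2);

            System.out.println(exact + "  vs  " + sloppy);
        }
    }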
+ + An abstract class that defines a way for Payload*Query instances + to transform the cumulative effects of payload scores for a document. + + + for more information + +

+ This class and its derivations are experimental and subject to change + + + + + +

Calculate the score up to this point for this doc and field + The current doc + + The field + + The start position of the matching Span + + The end position of the matching Span + + The number of payloads seen so far + + The current score so far + + The score for the current payload + + The new current Score + + + + +
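As a hedged sketch of the currentScore hook documented above, together with the docScore hook documented just after this sketch, here is a maximum-style payload transform written against the Java PayloadFunction signatures these parameters appear to describe; treat the exact base class and signatures as an assumption:

    import org.apache.lucene.search.payloads.PayloadFunction;

    // Keeps the maximum payload score seen for a doc/field and reports it as the
    // final payload score (falling back to a neutral 1.0 if no payloads were seen).
    public class MaxLikePayloadFunction extends PayloadFunction {
        public float currentScore(int docId, String field, int start, int end,
                                  int numPayloadsSeen, float currentScore,
                                  float currentPayloadScore) {
            return numPayloadsSeen == 0
                ? currentPayloadScore
                : Math.max(currentScore, currentPayloadScore);
        }

        public float docScore(int docId, String field, int numPayloadsSeen, float payloadScore) {
            return numPayloadsSeen > 0 ? payloadScore : 1.0f;
        }

        public int hashCode() { return getClass().hashCode(); }

        public boolean equals(Object other) { return other instanceof MaxLikePayloadFunction; }
    }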
+ + Calculate the final score for all the payloads seen so far for this doc/field + The current doc + + The current field + + The total number of payloads seen on this document + + The raw score for those payloads + + The final score for the payloads + + + + Implements parallel search over a set of Searchables. + +

Applications usually need only call the inherited {@link #Search(Query)} + or {@link #Search(Query,Filter)} methods. +

+
+ + Implements search over a set of Searchables. + +

Applications usually need only call the inherited {@link #Search(Query)} + or {@link #Search(Query,Filter)} methods. +

+
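A hedged sketch of searching several Searchables through one searcher (Java API; the two index paths are placeholders):

    import java.io.File;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.MultiSearcher;
    import org.apache.lucene.search.Searchable;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;

    public class MultiSearcherSketch {
        public static void main(String[] args) throws Exception {
            Searchable[] shards = {
                new IndexSearcher(FSDirectory.open(new File("/tmp/index-a"))), // placeholder
                new IndexSearcher(FSDirectory.open(new File("/tmp/index-b")))  // placeholder
            };
            MultiSearcher searcher = new MultiSearcher(shards);
            TopDocs hits = searcher.search(new TermQuery(new Term("body", "lucene")), 10);
            System.out.println("total hits across both indexes: " + hits.totalHits);
            searcher.close();
        }
    }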
+ + Creates a searcher which searches searchers. + + + Return the array of {@link Searchable}s this searches. + + + Returns index of the searcher for document n in the array + used to construct this searcher. + + + + Returns the document number of document n within its + sub-index. + + + + Create weight in multiple index scenario. + + Distributed query processing is done in the following steps: + 1. rewrite query + 2. extract necessary terms + 3. collect dfs for these terms from the Searchables + 4. create query weight using aggregate dfs. + 5. distribute that weight to Searchables + 6. merge results + + Steps 1-4 are done here, 5+6 in the search() methods + + + rewritten queries + + + + Document Frequency cache acting as a Dummy-Searcher. This class is no + full-fledged Searcher, but only supports the methods necessary to + initialize Weights. + + + + Creates a searchable which searches searchables. + + + TODO: parallelize this one too + + + A search implementation which spans a new thread for each + Searchable, waits for each search to complete and merge + the results back together. + + + + A search implementation allowing sorting which spans a new thread for each + Searchable, waits for each search to complete and merges + the results back together. + + + + Lower-level search API. + +

{@link Collector#Collect(int)} is called for every matching document. + +

Applications should only use this if they need all of the + matching documents. The high-level search API ({@link + Searcher#Search(Query)}) is usually more efficient, as it skips + non-high-scoring hits. + +

+ to match documents + + if non-null, a bitset used to eliminate some documents + + to receive hits + + TODO: parallelize this one too + +
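To illustrate the lower-level API above, a hedged sketch of a Collector that is handed every matching document (Java API; the index path and field are placeholders):

    import java.io.File;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Collector;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Scorer;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.store.FSDirectory;

    public class CollectAllSketch {
        public static void main(String[] args) throws Exception {
            IndexSearcher searcher =
                new IndexSearcher(FSDirectory.open(new File("/tmp/my-index"))); // placeholder
            final int[] count = { 0 };
            searcher.search(new TermQuery(new Term("body", "lucene")), new Collector() {
                public void setScorer(Scorer scorer) {}                   // scores not needed here
                public void collect(int doc) { count[0]++; }              // called per matching doc
                public void setNextReader(IndexReader reader, int docBase) {}
                public boolean acceptsDocsOutOfOrder() { return true; }
            });
            System.out.println("matching documents: " + count[0]);
            searcher.close();
        }
    }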
+ + A thread subclass for searching a single searchable + + + Filter caching singleton. It can be used + to save filters locally for reuse. + This class makes it possible to cache Filters even when using RMI, as it + keeps the cache on the searcher side of the RMI connection. + + It can also be used as persistent storage for any filter as long as the + filter provides a proper hashCode(), as that is used as the key in the cache. + + The cache is periodically cleaned up from a separate thread to ensure the + cache doesn't exceed the maximum size. + + + + The default maximum number of Filters in the cache + + + The default frequency of cache cleanup + + + The cache itself + + + Maximum allowed cache size + + + Cache cleaning frequency + + + Cache cleaner that runs in a separate thread + + + Sets up the FilterManager singleton. + + + Sets the max size that the cache should reach before it is cleaned up + maximum allowed cache size + + + + Sets the cache cleaning frequency in milliseconds. + cleaning frequency in milliseconds + + + + Returns the cached version of the filter. Allows the caller to pass up + a small filter but this will keep a persistent version around and allow + the caching filter to do its job. + + + The input filter + + The cached version of the filter + + + + Holds the filter and the last time the filter was used, to make LRU-based + cache cleaning possible. + TODO: Clean this up when we switch to Java 1.5 + + + + Keeps the cache from getting too big. + If we were using Java 1.5, we could use LinkedHashMap and we would not need this thread + to clean out the cache. + + The SortedSet sortedFilterItems is used only to sort the items from the cache, + so when it's time to clean up we have the TreeSet sort the FilterItems by + timestamp. + + Removes 1.5 * the number of items to make the cache smaller. + For example: + If the cache clean size is 10, and the cache is at 15, we would remove (15 - 10) * 1.5 = 7.5, rounded up to 8. + This way we clean the cache a bit more, and avoid having the cache cleaner having to do it frequently. + + + + Expert: A hit queue for sorting hits by terms in more than one field. + Uses FieldCache.DEFAULT for maintaining + internal term lookup tables. + + This class will not resolve SortField.AUTO types, and expects the type + of all SortFields used for construction to already have been resolved. + {@link SortField#DetectFieldType(IndexReader, String)} is a utility method which + may be used for field type detection. + + NOTE: This API is experimental and might change in + incompatible ways in the next release. + + + 2.9 + + $Id: - see {@link FieldValueHitQueue} - - - Creates a hit queue sorted by the given list of fields. - Index to use. + + Creates a hit queue sorted by the given list of fields. + 

NOTE: The instances returned by this method + pre-allocate a full array of length numHits. + +

+ SortField array we are sorting by in priority order (highest + priority first); cannot be null or empty - Fieldable names, in priority order (highest priority first). Cannot be null or empty. - - The number of hits to retain. Must be greater than zero. + The number of hits to retain. Must be greater than zero. IOException
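A hedged sketch of the multi-field sorting this queue supports, driven through the higher-level search API (Java flavour; the numeric "date" field and index path are hypothetical):

    import java.io.File;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Sort;
    import org.apache.lucene.search.SortField;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.FSDirectory;

    public class SortedSearchSketch {
        public static void main(String[] args) throws Exception {
            IndexSearcher searcher =
                new IndexSearcher(FSDirectory.open(new File("/tmp/my-index"))); // placeholder
            // Sort by a hypothetical numeric "date" field, newest first, then by score.
            Sort sort = new Sort(new SortField[] {
                new SortField("date", SortField.LONG, true),
                SortField.FIELD_SCORE
            });
            TopDocs hits = searcher.search(new TermQuery(new Term("body", "lucene")), null, 10, sort);
            System.out.println("hits: " + hits.totalHits);
            searcher.close();
        }
    }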
- - Stores a comparator corresponding to each field being sorted by - - + Stores the sort criteria being used. - - Stores the maximum score value encountered, needed for normalizing. + + Given a queue Entry, creates a corresponding FieldDoc + that contains the values used to sort the given document. + These values are not the raw values out of the index, but the internal + representation of them. This is so the given search hit can be collated by + a MultiSearcher with other search hits. + + + The Entry used to create a FieldDoc + + The newly created FieldDoc + + + - - returns the maximum score encountered by elements inserted via insert() + + Returns the SortFields being used by this hit queue. - + + An implementation of {@link FieldValueHitQueue} which is optimized in case + there is just one comparator. + + + Returns whether a is less relevant than b. ScoreDoc @@ -16816,576 +13030,152 @@ true if document a should be sorted after document b. - - Given a FieldDoc object, stores the values used - to sort the given document. These values are not the raw - values out of the index, but the internal representation - of them. This is so the given search hit can be collated - by a MultiSearcher with other search hits. - - The FieldDoc to store sort values into. - - The same FieldDoc passed in. - - - - - - Returns the SortFields being used by this hit queue. - - - Internal cache of comparators. Similar to FieldCache, only - caches comparators instead of term values. + + An implementation of {@link FieldValueHitQueue} which is optimized in case + there is more than one comparator. - - Returns a comparator for sorting hits according to a field containing bytes. - Index to use. - - Fieldable containing integer values. - - Comparator for sorting hits. - - IOException If an error occurs reading the index. - - - Returns a comparator for sorting hits according to a field containing shorts. - Index to use. - - Fieldable containing integer values. - - Comparator for sorting hits. - - IOException If an error occurs reading the index. - - - Returns a comparator for sorting hits according to a field containing integers. - Index to use. - - Fieldable containing integer values. - - Comparator for sorting hits. - - IOException If an error occurs reading the index. - - - Returns a comparator for sorting hits according to a field containing integers. - Index to use. - - Fieldable containing integer values. - - Comparator for sorting hits. - - IOException If an error occurs reading the index. - - - Returns a comparator for sorting hits according to a field containing floats. - Index to use. - - Fieldable containing float values. - - Comparator for sorting hits. - - IOException If an error occurs reading the index. - - - Returns a comparator for sorting hits according to a field containing doubles. - Index to use. - - Fieldable containing float values. - - Comparator for sorting hits. - - IOException If an error occurs reading the index. - - - Returns a comparator for sorting hits according to a field containing strings. - Index to use. - - Fieldable containing string values. - - Comparator for sorting hits. - - IOException If an error occurs reading the index. - - - Returns a comparator for sorting hits according to a field containing strings. - Index to use. - - Fieldable containing string values. - - Comparator for sorting hits. - - IOException If an error occurs reading the index. - - - Returns a comparator for sorting hits according to values in the given field. 
- The terms in the field are looked at to determine whether they contain integers, - floats or strings. Once the type is determined, one of the other static methods - in this class is called to get the comparator. - - Index to use. - - Fieldable containing values. - - Comparator for sorting hits. - - IOException If an error occurs reading the index. - - - Expert: Describes the score computation for document and query, and - can distinguish a match independent of a positive value. - - - - The match status of this explanation node. - May be null if match status is unknown - - - - Sets the match status assigned to this explanation node. - May be null if match status is unknown - - - - Indicates whether or not this Explanation models a good match. + + Expert: Collects sorted results from Searchable's and collates them. + The elements put into this queue must be of type FieldDoc. -

- If the match status is explicitly set (i.e.: not null) this method - uses it; otherwise it defers to the superclass. -

-

- - -
- - A QueryParser which constructs queries to search multiple fields. +

Created: Feb 11, 2004 2:04:21 PM

- $Revision: 829134 $ + lucene 1.4 + + $Id: FieldDocSortedHitQueue.java 695514 2008-09-15 15:42:11Z otis $
- - Creates a MultiFieldQueryParser. Allows passing of a map with term to - Boost, and the boost to apply to each term. - -

- It will, when parse(String query) is called, construct a query like this - (assuming the query consists of two terms and you specify the two fields - title and body): -

- - - (title:term1 body:term1) (title:term2 body:term2) - - -

- When setDefaultOperator(AND_OPERATOR) is set, the result will be: -

- - - +(title:term1 body:term1) +(title:term2 body:term2) - - -

- When you pass a boost (title=>5 body=>10) you can get -

- - - +(title:term1^5.0 body:term1^10.0) +(title:term2^5.0 body:term2^10.0) - - -

- In other words, all the query's terms must appear, but it doesn't matter - in what fields they appear. -

- -

- Please use - {@link #MultiFieldQueryParser(Version, String[], Analyzer, Map)} - instead - + + Creates a hit queue sorted by the given list of fields. + Fieldable names, in priority order (highest priority first). + + The number of hits to retain. Must be greater than zero. + - - Creates a MultiFieldQueryParser. Allows passing of a map with term to - Boost, and the boost to apply to each term. - -

- It will, when parse(String query) is called, construct a query like this - (assuming the query consists of two terms and you specify the two fields - title and body): -

- - - (title:term1 body:term1) (title:term2 body:term2) - - -

- When setDefaultOperator(AND_OPERATOR) is set, the result will be: -

- - - +(title:term1 body:term1) +(title:term2 body:term2) - - -

- When you pass a boost (title=>5 body=>10) you can get -

- - - +(title:term1^5.0 body:term1^10.0) +(title:term2^5.0 body:term2^10.0) - - -

- In other words, all the query's terms must appear, but it doesn't matter - in what fields they appear. -

+ +

Allows redefinition of sort fields if they are null. + This is to handle the case using ParallelMultiSearcher where the + original list contains AUTO and we don't know the actual sort + type until the values come back. The fields can only be set once. + This method is thread safe. + + - - Creates a MultiFieldQueryParser. - -

- It will, when parse(String query) is called, construct a query like this - (assuming the query consists of two terms and you specify the two fields - title and body): -

- - - (title:term1 body:term1) (title:term2 body:term2) - - -

- When setDefaultOperator(AND_OPERATOR) is set, the result will be: -

- - - +(title:term1 body:term1) +(title:term2 body:term2) - - -

- In other words, all the query's terms must appear, but it doesn't matter - in what fields they appear. -

- -

- Please use - {@link #MultiFieldQueryParser(Version, String[], Analyzer)} - instead - + + Returns the fields being used to sort. - - Creates a MultiFieldQueryParser. - -

- It will, when parse(String query) is called, construct a query like this - (assuming the query consists of two terms and you specify the two fields - title and body): -

- - - (title:term1 body:term1) (title:term2 body:term2) - - -

- When setDefaultOperator(AND_OPERATOR) is set, the result will be: -

- - - +(title:term1 body:term1) +(title:term2 body:term2) - - -

- In other words, all the query's terms must appear, but it doesn't matter - in what fields they appear. -

+ +

Returns an array of collators, possibly null. The collators + correspond to any SortFields which were given a specific locale. - - - Parses a query which searches on the fields specified. -

- If x fields are specified, this effectively constructs: - -

-            <code>
-            (field1:query1) (field2:query2) (field3:query3)...(fieldx:queryx)
-            </code>
-            
- -
- Queries strings to parse + Array of sort fields. - Fields to search on - - Analyzer to use - - ParseException - if query parsing fails - - IllegalArgumentException - if the length of the queries array differs from the length of - the fields array - - Use {@link #Parse(Version,String[],String[],Analyzer)} - instead - -
- - Parses a query which searches on the fields specified. -

- If x fields are specified, this effectively constructs: - -

-            <code>
-            (field1:query1) (field2:query2) (field3:query3)...(fieldx:queryx)
-            </code>
-            
- -
- Lucene version to match; this is passed through to - QueryParser. - - Queries strings to parse - - Fields to search on - - Analyzer to use - - ParseException - if query parsing fails - - IllegalArgumentException - if the length of the queries array differs from the length of - the fields array - -
- - Parses a query, searching on the fields specified. - Use this if you need to specify certain fields as required, - and others as prohibited. -

-            Usage:
-            
-            String[] fields = {"filename", "contents", "description"};
-            BooleanClause.Occur[] flags = {BooleanClause.Occur.SHOULD,
-            BooleanClause.Occur.MUST,
-            BooleanClause.Occur.MUST_NOT};
-            MultiFieldQueryParser.parse("query", fields, flags, analyzer);
-            
-            
-

- The code above would construct a query: -

-            
-            (filename:query) +(contents:query) -(description:query)
-            
-            
- -
- Query string to parse - - Fields to search on - - Flags describing the fields - - Analyzer to use - - ParseException if query parsing fails - IllegalArgumentException if the length of the fields array differs - from the length of the flags array - - Use - {@link #Parse(Version, String, String[], BooleanClause.Occur[], Analyzer)} - instead - -
- - Parses a query, searching on the fields specified. Use this if you need - to specify certain fields as required, and others as prohibited. -

- -

-            Usage:
-            <code>
-            String[] fields = {"filename", "contents", "description"};
-            BooleanClause.Occur[] flags = {BooleanClause.Occur.SHOULD,
-            BooleanClause.Occur.MUST,
-            BooleanClause.Occur.MUST_NOT};
-            MultiFieldQueryParser.parse("query", fields, flags, analyzer);
-            </code>
-            
-

- The code above would construct a query: - -

-            <code>
-            (filename:query) +(contents:query) -(description:query)
-            </code>
-            
- -
- Lucene version to match; this is passed through to - QueryParser. - - Query string to parse - - Fields to search on - - Flags describing the fields - - Analyzer to use - - ParseException - if query parsing fails - - IllegalArgumentException - if the length of the fields array differs from the length of - the flags array - -
- - Parses a query, searching on the fields specified. - Use this if you need to specify certain fields as required, - and others as prohibited. -

-            Usage:
-            
-            String[] query = {"query1", "query2", "query3"};
-            String[] fields = {"filename", "contents", "description"};
-            BooleanClause.Occur[] flags = {BooleanClause.Occur.SHOULD,
-            BooleanClause.Occur.MUST,
-            BooleanClause.Occur.MUST_NOT};
-            MultiFieldQueryParser.parse(query, fields, flags, analyzer);
-            
-            
-

- The code above would construct a query: -

-            
-            (filename:query1) +(contents:query2) -(description:query3)
-            
-            
- -
- Queries string to parse - - Fields to search on - - Flags describing the fields - - Analyzer to use - - ParseException if query parsing fails - IllegalArgumentException if the length of the queries, fields, - and flags array differ - - Used - {@link #Parse(Version, String[], String[], BooleanClause.Occur[], Analyzer)} - instead - -
- - Parses a query, searching on the fields specified. Use this if you need - to specify certain fields as required, and others as prohibited. -

- -

-            Usage:
-            <code>
-            String[] query = {"query1", "query2", "query3"};
-            String[] fields = {"filename", "contents", "description"};
-            BooleanClause.Occur[] flags = {BooleanClause.Occur.SHOULD,
-            BooleanClause.Occur.MUST,
-            BooleanClause.Occur.MUST_NOT};
-            MultiFieldQueryParser.parse(query, fields, flags, analyzer);
-            </code>
-            
-

- The code above would construct a query: - -

-            <code>
-            (filename:query1) +(contents:query2) -(description:query3)
-            </code>
-            
- -
- Lucene version to match; this is passed through to - QueryParser. - - Queries string to parse - - Fields to search on - - Flags describing the fields - - Analyzer to use - - ParseException - if query parsing fails - - IllegalArgumentException - if the length of the queries, fields, and flags array differ - -
- - An IndexReader which reads indexes with multiple segments. - - - Construct reading the named set of readers. - - - This constructor is only used for {@link #Reopen()} - - - Version number when this IndexReader was opened. - - - Checks is the index is optimized (if it has a single segment and no deletions) - true if the index is optimized; false otherwise + Array, possibly null. - - Tries to acquire the WriteLock on this directory. this method is only valid if this IndexReader is directory - owner. - - - StaleReaderException if the index has changed since this reader was opened - CorruptIndexException if the index is corrupt - Lucene.Net.Store.LockObtainFailedException - if another writer has this index open (write.lock could not be - obtained) - - IOException if there is a low-level IO error + + Returns whether a is less relevant than b. + ScoreDoc + + ScoreDoc + + true if document a should be sorted after document b. + - - + + Scorer for conjunctions, sets of queries, all of which are required. + + + use {@link #DocID()} instead. - - Commit changes resulting from delete, undeleteAll, or setNorm operations -

- If an exception is hit, then either no changes or all changes will have been committed to the index (transactional - semantics). + + use {@link #NextDoc()} instead. + + + + use {@link #Advance(int)} instead. + + + +

Fills in no-term-vectors for all docs we haven't seen + since the last doc that had term vectors. + + + + This is the base class for an in-memory posting list, + keyed by a Token. {@link TermsHash} maintains a hash + table holding one instance of this per unique Token. + Consumers of TermsHash ({@link TermsHashConsumer}) must + subclass this class with its own concrete class. + FreqProxTermsWriter.PostingList is a private inner class used + for the freq/prox postings, and + TermVectorsTermsWriter.PostingList is a private inner class + used to hold TermVectors postings. + + + + + The number of the field this vector is associated with + + + + Returns an array of TermVectorOffsetInfo in which the term is found. - IOException if there is a low-level IO error - - - Returns the directory this index resides in. - - - Expert: return the IndexCommit that this reader has opened. -

-

WARNING: this API is new and experimental and may suddenly change.

-

-
- - + The position in the array to get the offsets from + + An array of TermVectorOffsetInfo objects or the empty list + + - - Optimized implementation. + + Returns an array of positions in which the term is found. + Terms are identified by the index at which its number appears in the + term String array obtained from the indexOf method. + + + + Allows you to iterate over the {@link TermPositions} for multiple {@link Term}s as + a single {@link TermPositions}. + + + + + Creates a new MultipleTermPositions instance. + + + + + + + Not implemented. + UnsupportedOperationException + + + Not implemented. + UnsupportedOperationException + + + Not implemented. + UnsupportedOperationException + + + Not implemented. + UnsupportedOperationException + + + Not implemented. + UnsupportedOperationException + + + + false +

[Note that as of 2.1, all but one of the @@ -17754,482 +13544,4286 @@ CorruptIndexException if the index is corrupt IOException if there is a low-level IO error - -

Useful constants representing filenames and extensions used by lucene + + A FilterIndexReader contains another IndexReader, which it + uses as its basic source of data, possibly transforming the data along the + way or providing additional functionality. The class + FilterIndexReader itself simply implements all abstract methods + of IndexReader with versions that pass all requests to the + contained index reader. Subclasses of FilterIndexReader may + further override some of these methods and may also provide additional + methods and fields. + + + +

Construct a FilterIndexReader based on the specified base reader. + Directory locking for delete, undeleteAll, and setNorm operations is + left to the base reader.

+

Note that base reader is closed if this FilterIndexReader is closed.

+

+ specified base reader. + +
+ + + + + + Base class for filtering {@link TermDocs} implementations. + + + Base class for filtering {@link TermPositions} implementations. + + + Base class for filtering {@link TermEnum} implementations. + + + For each Field, store a sorted collection of {@link TermVectorEntry}s +

+ This is not thread-safe. +

+
+ + The TermVectorMapper can be used to map Term Vectors into your own + structure instead of the parallel array structure used by + {@link Lucene.Net.Index.IndexReader#GetTermFreqVector(int,String)}. +

+ It is up to the implementation to make sure it is thread-safe. + +

- $rcs = ' $Id: Exp $ ' ; +
+ + + true if this mapper should tell Lucene to ignore positions even if they are stored + + similar to ignoringPositions + + + + Tell the mapper what to expect in regards to field, number of terms, offset and position storage. + This method will be called once before retrieving the vector for a field. + + This method will be called before {@link #Map(String,int,TermVectorOffsetInfo[],int[])}. + + The field the vector is for + + The number of terms that need to be mapped + + true if the mapper should expect offset information + + true if the mapper should expect positions info + + + + Map the Term Vector information into your own structure + The term to add to the vector + + The frequency of the term in the document + + null if the offset is not specified, otherwise the offset into the field of the term + + null if the position is not specified, otherwise the position in the field of the term + + + + Indicate to Lucene that even if there are positions stored, this mapper is not interested in them and they + can be skipped over. Derived classes should set this to true if they want to ignore positions. The default + is false, meaning positions will be loaded if they are stored. + + false + + + + + Same principal as {@link #IsIgnoringPositions()}, but applied to offsets. false by default. + + false + + + + Passes down the index of the document whose term vector is currently being mapped, + once for each top level call to a term vector reader. +

+ Default implementation IGNORES the document number. Override if your implementation needs the document number. +

+ NOTE: Document numbers are internal to Lucene and subject to change depending on indexing operations. + +

+ index of document currently being mapped + +
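As a concrete illustration of the SetExpectations/Map contract described above, here is a hedged C# sketch of a custom mapper that only records per-field term frequencies. The class name is made up, method casing follows the usual Lucene.Net 2.9 conventions, and offsets and positions are simply ignored.

    using System.Collections.Generic;
    using Lucene.Net.Index;

    // Hypothetical mapper: collects term -> frequency per field and nothing else.
    public class FrequencyOnlyMapper : TermVectorMapper
    {
        private string currentField;

        public IDictionary<string, IDictionary<string, int>> FieldTermFreqs =
            new Dictionary<string, IDictionary<string, int>>();

        // Called once per field, before any Map() calls for that field.
        public override void SetExpectations(string field, int numTerms,
                                             bool storeOffsets, bool storePositions)
        {
            currentField = field;
            FieldTermFreqs[field] = new Dictionary<string, int>(numTerms);
        }

        // Called once per term; offsets/positions are null when not stored.
        public override void Map(string term, int frequency,
                                 TermVectorOffsetInfo[] offsets, int[] positions)
        {
            FieldTermFreqs[currentField][term] = frequency;
        }
    }

Passing such a mapper to one of the IndexReader.GetTermFreqVector overloads that accepts a TermVectorMapper would then fill FieldTermFreqs instead of building the parallel array structure.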
+ + + A Comparator for sorting {@link TermVectorEntry}s + + + + Get the mapping between fields and terms, sorted by the comparator + + + A map between field names and {@link java.util.SortedSet}s per field. SortedSet entries are {@link TermVectorEntry} + + + + The positionIncrement determines the position of this token + relative to the previous Token in a TokenStream, used in phrase + searching. + +

The default value is one. + +

Some common uses for this are:

    + +
  • Set it to zero to put multiple terms in the same position. This is + useful if, e.g., a word has multiple stems. Searches for phrases + including either stem will match. In this case, all but the first stem's + increment should be set to zero: the increment of the first instance + should be one. Repeating a token with an increment of zero can also be + used to boost the scores of matches on that token.
  • Set it to values greater than one to inhibit exact phrase matches. + If, for example, one does not want phrases to match across removed stop + words, then one could build a stop word filter that removes stop words and + also sets the increment to the number of stop words removed before each + non-stop word. Then exact phrase queries will only match when the terms + occur with no intervening stop words.
+ +
+ + +
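To make the two cases above concrete, here is a small standalone C# sketch (no Lucene types involved) showing that a token's absolute position is simply the running sum of the position increments: a removed stop word shows up as an increment of 2, and a stacked synonym as an increment of 0.

    using System;

    class PositionIncrementDemo
    {
        static void Main()
        {
            // (term, positionIncrement) pairs as a filter chain might emit them for
            // "the quick brown fox": "the" removed as a stop word, "fast" injected
            // as a synonym of "quick" at the same position.
            string[] terms      = { "quick", "fast", "brown", "fox" };
            int[]    increments = {    2,      0,      1,      1   };

            int position = -1; // positions are the running sum of the increments
            for (int i = 0; i < terms.Length; i++)
            {
                position += increments[i];
                Console.WriteLine("{0,-6} position={1}", terms[i], position);
            }
            // quick  position=1
            // fast   position=1  (same position, so a phrase such as "fast brown" still matches)
            // brown  position=2
            // fox    position=3
        }
    }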
+ + Set the position increment. The default value is one. + + + the distance from the prior term + + + + Returns the position increment of this Token. + + + + + Emits the entire input as a single token. + + + Will be removed in Lucene 3.0. This method is final, as it should + not be overridden. Delegates to the backwards compatibility layer. + + + + Will be removed in Lucene 3.0. This method is final, as it should + not be overridden. Delegates to the backwards compatibility layer. + + + + A simple class that stores Strings as char[]'s in a + hash table. Note that this is not a general purpose + class. For example, it cannot remove items from the + set, nor does it resize its hash table to be smaller, + etc. It is designed to be quick to test if a char[] + is in the set without the necessity of converting it + to a String first. + + + + Create set with enough capacity to hold startSize + terms + + + + Create set from a Collection of char[] or String + + + Create set from entries + + + true if the len chars of text starting at off + are in the set + + + + true if the System.String is in the set + + + Returns true if the String is in the set + + + Add this String into the set + + + Add this char[] directly to the set. + If ignoreCase is true for this Set, the text array will be directly modified. + The user should never modify this text array after calling this method. + + + + Returns an unmodifiable {@link CharArraySet}. This allows to provide + unmodifiable views of internal sets for "read-only" use. + + + a set for which the unmodifiable set is returned. + + an new unmodifiable {@link CharArraySet}. + + NullPointerException + if the given set is null. + + + + Adds all of the elements in the specified collection to this collection + + + Removes all elements from the set + + + Removes from this set all of its elements that are contained in the specified collection + + + Retains only the elements in this set that are contained in the specified collection + + + The Iterator<String> for this set. Strings are constructed on the fly, so + use nextCharArray for more efficient access. + + + + do not modify the returned char[] + + + Returns the next String, as a Set<String> would... + use nextCharArray() for better efficiency. + + + + Efficient unmodifiable {@link CharArraySet}. This implementation does not + delegate calls to a give {@link CharArraySet} like + {@link Collections#UnmodifiableSet(java.util.Set)} does. Instead is passes + the internal representation of a {@link CharArraySet} to a super + constructor and overrides all mutators. + + + + This class converts alphabetic, numeric, and symbolic Unicode characters + which are not in the first 127 ASCII characters (the "Basic Latin" Unicode + block) into their ASCII equivalents, if one exists. + + Characters from the following Unicode blocks are converted; however, only + those characters with reasonable ASCII alternatives are converted: + + + + See: http://en.wikipedia.org/wiki/Latin_characters_in_Unicode + + The set of character conversions supported by this class is a superset of + those supported by Lucene's {@link ISOLatin1AccentFilter} which strips + accents from Latin1 characters. For example, 'À' will be replaced by + 'a'. + + + + Converts characters above ASCII to their ASCII equivalents. For example, + accents are removed from accented characters. + + The string to fold + + The number of characters in the input string + + + + Use by certain classes to match version compatibility + across releases of Lucene. +

+ WARNING: When changing the version parameter + that you supply to components in Lucene, do not simply + change the version at search-time, but instead also adjust + your indexing code to match, and re-index. +

+
+ + +

WARNING: if you use this setting, and then + upgrade to a newer release of Lucene, sizable changes + may happen. If precise back compatibility is important + then you should instead explicitly specify an actual + version. + If you use this constant then you may need to + re-index all of your documents when upgrading + Lucene, as the way text is indexed may have changed. + Additionally, you may need to re-test your entire + application to ensure it behaves as expected, as + some defaults may have changed and may break functionality + in your application. +

+
+ + Match settings and bugs in Lucene's 2.0 release. + + + Match settings and bugs in Lucene's 2.1 release. + + + Match settings and bugs in Lucene's 2.2 release. + + + Match settings and bugs in Lucene's 2.3 release. + + + Match settings and bugs in Lucene's 2.3 release. + + + + Expert: allocate a new buffer. + Subclasses can allocate differently. + + size of allocated buffer. + + allocated buffer. + + + + Matches spans containing a term. + + + Construct a SpanTermQuery matching the named term's spans. + + + Return the term whose spans are matched. + + + Returns a collection of all terms matched by this query. + use extractTerms instead + + + + + + A {@link Scorer} which wraps another scorer and caches the score of the + current document. Successive calls to {@link #Score()} will return the same + result and will not invoke the wrapped Scorer's score() method, unless the + current document has changed.
+ This class might be useful due to the changes done to the {@link Collector} interface, in which the score is not computed for a document by default, only if the collector requests it. Some collectors may need to use the score in several places; however, all they have in hand is a {@link Scorer} object, and they might end up computing the score of a document more than once. +
+
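The caching idea described above is easy to picture with a small standalone sketch. The interface below is hypothetical and only mimics the relevant slice of Scorer; it is not the Lucene.Net class itself.

    // Hypothetical minimal stand-in for the relevant part of Scorer.
    public interface ISimpleScorer
    {
        int DocID();    // the document the scorer is currently positioned on
        float Score();  // potentially expensive to compute
    }

    // Recomputes the score only when the current document changes, so a collector
    // that reads the score in several places pays for it once per document.
    public class ScoreCachingScorer : ISimpleScorer
    {
        private readonly ISimpleScorer inner;
        private int cachedDoc = -1;
        private float cachedScore;

        public ScoreCachingScorer(ISimpleScorer inner) { this.inner = inner; }

        public int DocID() { return inner.DocID(); }

        public float Score()
        {
            int doc = inner.DocID();
            if (doc != cachedDoc)
            {
                cachedScore = inner.Score();
                cachedDoc = doc;
            }
            return cachedScore;
        }
    }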
+ + Creates a new instance by wrapping the given scorer. + + + use {@link #DocID()} instead. + + + + use {@link #NextDoc()} instead. + + + + use {@link #Advance(int)} instead. + + + + Experimental class to get set of payloads for most standard Lucene queries. + Operates like Highlighter - IndexReader should only contain doc of interest, + best to use MemoryIndex. + +

+ + WARNING: The status of the Payloads feature is experimental. + The APIs introduced here might change in the future and will not be + supported anymore in such a case. + +

+
+ + that contains doc with payloads to extract + + + + Query should be rewritten for wild/fuzzy support. + + + + + payloads Collection + + IOException + + + Expert: Default scoring implementation. + + + Expert: Scoring API. +

Subclasses implement search scoring. + +

The score of query q for document d correlates to the + cosine-distance or dot-product between document and query vectors in a + + Vector Space Model (VSM) of Information Retrieval. + A document whose vector is closer to the query vector in that model is scored higher. + + The score is computed as follows: + +

+ + +
+ + + + + + + + + + + +
score(q,d)  =  coord(q,d) · queryNorm(q) · Σ over t in q of ( tf(t in d) · idf(t)² · t.getBoost() · norm(t,d) )
+
+ +

where +

    +
  1. + + tf(t in d) + correlates to the term's frequency, + defined as the number of times term t appears in the currently scored document d. + Documents that have more occurrences of a given term receive a higher score. + The default computation for tf(t in d) in + {@link Lucene.Net.Search.DefaultSimilarity#Tf(float) DefaultSimilarity} is: + +
     
    + + + + + +
    {@link Lucene.Net.Search.DefaultSimilarity#Tf(float) tf(t in d)}  =  frequency½
    +
     
    +
  2. idf(t) stands for Inverse Document Frequency. This value correlates to the inverse of docFreq (the number of documents in which the term t appears). This means rarer terms give higher contribution to the total score. The default computation for idf(t) in {@link Lucene.Net.Search.DefaultSimilarity#Idf(int, int) DefaultSimilarity} is:
     
    + + + + + + + +
    {@link Lucene.Net.Search.DefaultSimilarity#Idf(int, int) idf(t)}  =  1 + log( numDocs / (docFreq + 1) )
    +
     
    +
  3. coord(q,d) is a score factor based on how many of the query terms are found in the specified document. Typically, a document that contains more of the query's terms will receive a higher score than another document with fewer query terms. This is a search time factor computed in {@link #Coord(int, int) coord(q,d)} by the Similarity in effect at search time.
     
    +
  4. queryNorm(q) is a normalizing factor used to make scores between queries comparable. This factor does not affect document ranking (since all ranked documents are multiplied by the same factor), but rather just attempts to make scores from different queries (or even different indexes) comparable. This is a search time factor computed by the Similarity in effect at search time. The default computation in {@link Lucene.Net.Search.DefaultSimilarity#QueryNorm(float) DefaultSimilarity} is:
     
    + + + + + +
    queryNorm(q)  =  {@link Lucene.Net.Search.DefaultSimilarity#QueryNorm(float) queryNorm(sumOfSquaredWeights)}  =  1 / sumOfSquaredWeights½
    +
    +
     
    + + The sum of squared weights (of the query terms) is + computed by the query {@link Lucene.Net.Search.Weight} object. + For example, a {@link Lucene.Net.Search.BooleanQuery boolean query} + computes this value as: + +
     
    + + + + + + + + + + + +
    {@link Lucene.Net.Search.Weight#SumOfSquaredWeights() sumOfSquaredWeights}  =  {@link Lucene.Net.Search.Query#GetBoost() q.getBoost()}² · Σ over t in q of ( idf(t) · t.getBoost() )²
    +
     
    + +
  5. t.getBoost() is a search time boost of term t in the query q as specified in the query text (see query syntax), or as set by application calls to {@link Lucene.Net.Search.Query#SetBoost(float) setBoost()}. Notice that there is really no direct API for accessing a boost of one term in a multi term query, but rather multi terms are represented in a query as multi {@link Lucene.Net.Search.TermQuery TermQuery} objects, and so the boost of a term in the query is accessible by calling the sub-query {@link Lucene.Net.Search.Query#GetBoost() getBoost()}.
     
    +
  6. norm(t,d) encapsulates a few (indexing time) boost and length factors:
      +
    • Document boost - set by calling + {@link Lucene.Net.Documents.Document#SetBoost(float) doc.setBoost()} + before adding the document to the index. +
    • Field boost - set by calling + {@link Lucene.Net.Documents.Fieldable#SetBoost(float) field.setBoost()} + before adding the field to a document. +
    • {@link #LengthNorm(String, int) lengthNorm(field)} - computed + when the document is added to the index in accordance with the number of tokens + of this field in the document, so that shorter fields contribute more to the score. + LengthNorm is computed by the Similarity class in effect at indexing. +
    + +

    + When a document is added to the index, all the above factors are multiplied. + If the document has multiple fields with the same name, all their boosts are multiplied together: + +
     
    + + + + + + + + + + + +
    norm(t,d)  =  {@link Lucene.Net.Documents.Document#GetBoost() doc.getBoost()} · {@link #LengthNorm(String, int) lengthNorm(field)} · Π over field f in d named as t of {@link Lucene.Net.Documents.Fieldable#GetBoost() f.getBoost()}
    +
     
    However, the resulting norm value is {@link #EncodeNorm(float) encoded} as a single byte before being stored. At search time, the norm byte value is read from the index {@link Lucene.Net.Store.Directory directory} and {@link #DecodeNorm(byte) decoded} back to a float norm value. This encoding/decoding, while reducing index size, comes with the price of precision loss - it is not guaranteed that decode(encode(x)) = x. For instance, decode(encode(0.89)) = 0.75. Also notice that search time is too late to modify this norm part of scoring, e.g. by using a different {@link Similarity} for search.
     
    +

+ +
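The whole computation above can be reproduced with plain arithmetic. The standalone C# sketch below scores one document for a hypothetical two-term query using the default tf and idf formulas given above; all statistics are made-up illustrative values, and no Lucene calls are involved.

    using System;

    class ScoreByHand
    {
        // Default tf and idf, as described above.
        static double Tf(int freq)                  { return Math.Sqrt(freq); }
        static double Idf(int docFreq, int numDocs) { return Math.Log((double)numDocs / (docFreq + 1)) + 1.0; }

        static void Main()
        {
            int numDocs = 1000;                         // size of the (hypothetical) index

            // Per-term statistics for a two-term query against one document d.
            string[] terms   = { "lucene", "search" };
            int[]    freq    = {    3,        1     };  // occurrences of t in d
            int[]    docFreq = {   20,      300     };  // documents containing t
            float[]  boost   = { 1.0f,     1.0f     };  // t.getBoost()
            float[]  norm    = { 0.5f,     0.5f     };  // norm(t,d), already decoded

            // sumOfSquaredWeights = q.getBoost()^2 * sum( (idf(t) * t.getBoost())^2 )
            double queryBoost = 1.0;
            double sumOfSquaredWeights = 0.0;
            for (int i = 0; i < terms.Length; i++)
            {
                double w = Idf(docFreq[i], numDocs) * boost[i];
                sumOfSquaredWeights += w * w;
            }
            sumOfSquaredWeights *= queryBoost * queryBoost;
            double queryNorm = 1.0 / Math.Sqrt(sumOfSquaredWeights);

            double coord = 2.0 / 2.0;                   // both query terms occur in d

            double sum = 0.0;
            for (int i = 0; i < terms.Length; i++)
            {
                double idf = Idf(docFreq[i], numDocs);
                sum += Tf(freq[i]) * idf * idf * boost[i] * norm[i];
            }

            Console.WriteLine("score(q,d) = {0:F4}", coord * queryNorm * sum);
        }
    }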
+ + + + + + +
+ + Set the default Similarity implementation used by indexing and search + code. + + + + + + + + + Return the default Similarity implementation used by indexing and search + code. + +

This is initially an instance of {@link DefaultSimilarity}. + +

+ + + + +
+ + Cache of decoded bytes. + + + Decodes a normalization factor stored in an index. + + + + + Returns a table for decoding normalization bytes. + + + + + Compute the normalization value for a field, given the accumulated + state of term processing for this field (see {@link FieldInvertState}). + +

Implementations should calculate a float value based on the field + state and then return that value. + +

For backward compatibility this method by default calls + {@link #LengthNorm(String, int)} passing + {@link FieldInvertState#GetLength()} as the second argument, and + then multiplies this value by {@link FieldInvertState#GetBoost()}.

+ +

WARNING: This API is new and experimental and may + suddenly change.

+ +

+ field name + + current processing state for this field + + the calculated float norm + +
+ + Computes the normalization value for a field given the total number of terms contained in a field. These values, together with field boosts, are stored in an index and multiplied into scores for hits on each field by the search code. +

Matches in longer fields are less precise, so implementations of this + method usually return smaller values when numTokens is large, + and larger values when numTokens is small. + +

Note that the return values are computed under + {@link Lucene.Net.Index.IndexWriter#AddDocument(Lucene.Net.Documents.Document)} + and then stored using + {@link #EncodeNorm(float)}. + Thus they have limited precision, and documents + must be re-indexed if this method is altered. + +

+ the name of the field + + the total number of tokens contained in fields named + fieldName of doc. + + a normalization factor for hits on this field of this document + + + + +
+ + Computes the normalization value for a query given the sum of the squared weights of each of the query terms. This value is then multiplied into the weight of each query term. +

This does not affect ranking, but rather just attempts to make scores + from different queries comparable. + +

+ the sum of the squares of query term weights + + a normalization factor for query weights + +
+ + Encodes a normalization factor for storage in an index. + +

The encoding uses a three-bit mantissa, a five-bit exponent, and + the zero-exponent point at 15, thus + representing values from around 7x10^9 to 2x10^-9 with about one + significant decimal digit of accuracy. Zero is also represented. + Negative numbers are rounded up to zero. Values too large to represent + are rounded down to the largest representable value. Positive values too + small to represent are rounded up to the smallest positive representable + value. + +

+ + + + +
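A standalone C# sketch of the encoding just described (3 mantissa bits, 5 exponent bits, zero exponent point at 15). It follows the scheme as documented above rather than the library source line for line, and the rounding comments mirror the documented behaviour.

    using System;

    static class NormCodec
    {
        // Encode a float into one byte: 3 mantissa bits, 5 exponent bits,
        // zero exponent point at 15, as described above.
        public static byte Encode(float f)
        {
            int bits = BitConverter.ToInt32(BitConverter.GetBytes(f), 0);
            int smallFloat = bits >> (24 - 3);        // keep sign, exponent and top mantissa bits
            int zero = (63 - 15) << 3;                // re-based zero exponent
            if (smallFloat < zero)                    // zero, negative, or too small
                return bits <= 0 ? (byte)0 : (byte)1; // tiny positives are rounded up
            if (smallFloat >= zero + 0x100)           // too large to represent
                return (byte)0xFF;                    // rounded down to the largest value
            return (byte)(smallFloat - zero);
        }

        public static float Decode(byte b)
        {
            if (b == 0) return 0.0f;
            int bits = (b & 0xFF) << (24 - 3);
            bits += (63 - 15) << 24;
            return BitConverter.ToSingle(BitConverter.GetBytes(bits), 0);
        }

        static void Main()
        {
            // Round-tripping shows the precision loss discussed above: the decoded
            // value is a nearby representable number, not necessarily the input.
            foreach (float f in new[] { 0.0f, 0.1f, 0.89f, 1.0f, 300.0f })
                Console.WriteLine("{0} -> byte {1} -> {2}", f, Encode(f), Decode(Encode(f)));
        }
    }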
+ + Computes a score factor based on a term or phrase's frequency in a + document. This value is multiplied by the {@link #Idf(Term, Searcher)} + factor for each term in the query and these products are then summed to + form the initial score for a document. + +

Terms and phrases repeated in a document indicate the topic of the + document, so implementations of this method usually return larger values + when freq is large, and smaller values when freq + is small. + +

The default implementation calls {@link #Tf(float)}. + +

+ the frequency of a term within a document + + a score factor based on a term's within-document frequency + +
+ + Computes the amount of a sloppy phrase match, based on an edit distance. + This value is summed for each sloppy phrase match in a document to form + the frequency that is passed to {@link #Tf(float)}. + +

A phrase match with a small edit distance to a document passage more + closely matches the document, so implementations of this method usually + return larger values when the edit distance is small and smaller values + when it is large. + +

+ + + the edit distance of this sloppy phrase match + + the frequency increment for this match + +
+ + Computes a score factor based on a term or phrase's frequency in a + document. This value is multiplied by the {@link #Idf(Term, Searcher)} + factor for each term in the query and these products are then summed to + form the initial score for a document. + +

Terms and phrases repeated in a document indicate the topic of the + document, so implementations of this method usually return larger values + when freq is large, and smaller values when freq + is small. + +

+ the frequency of a term within a document + + a score factor based on a term's within-document frequency + +
+ + Computes a score factor for a simple term. + +

The default implementation is:

+            return idf(searcher.docFreq(term), searcher.maxDoc());
+            
+ + Note that {@link Searcher#MaxDoc()} is used instead of + {@link Lucene.Net.Index.IndexReader#NumDocs()} because it is proportional to + {@link Searcher#DocFreq(Term)} , i.e., when one is inaccurate, + so is the other, and in the same direction. + +
+ the term in question + + the document collection being searched + + a score factor for the term + + see {@link #IdfExplain(Term, Searcher)} + +
+ + Computes a score factor for a simple term and returns an explanation + for that score factor. + +

+ The default implementation uses: + +

+            idf(searcher.docFreq(term), searcher.maxDoc());
+            
+ + Note that {@link Searcher#MaxDoc()} is used instead of + {@link Lucene.Net.Index.IndexReader#NumDocs()} because it is + proportional to {@link Searcher#DocFreq(Term)} , i.e., when one is + inaccurate, so is the other, and in the same direction. + +
+ the term in question + + the document collection being searched + + an IDFExplain object that includes both an idf score factor + and an explanation for the term. + + IOException +
+ + Computes a score factor for a phrase. + +

The default implementation sums the {@link #Idf(Term,Searcher)} factor + for each term in the phrase. + +

+ the terms in the phrase + + the document collection being searched + + idf score factor + + see {@link #idfExplain(Collection, Searcher)} + +
+ + Computes a score factor for a phrase. + +

+ The default implementation sums the idf factor for + each term in the phrase. + +

+ the terms in the phrase + + the document collection being searched + + an IDFExplain object that includes both an idf + score factor for the phrase and an explanation + for each term. + + IOException +
+ + Computes a score factor based on a term's document frequency (the number + of documents which contain the term). This value is multiplied by the + {@link #Tf(int)} factor for each term in the query and these products are + then summed to form the initial score for a document. + +

Terms that occur in fewer documents are better indicators of topic, so + implementations of this method usually return larger values for rare terms, + and smaller values for common terms. + +

+ the number of documents which contain the term + + the total number of documents in the collection + + a score factor based on the term's document frequency + +
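A quick numeric check of the claim that rarer terms contribute more, using the default formula log(numDocs/(docFreq+1)) + 1 from {@link DefaultSimilarity}; the statistics are illustrative only.

    using System;

    class IdfDemo
    {
        static double Idf(int docFreq, int numDocs)
        {
            return Math.Log((double)numDocs / (docFreq + 1)) + 1.0;
        }

        static void Main()
        {
            int numDocs = 1000;
            Console.WriteLine(Idf(5, numDocs));    // rare term   -> about 6.1
            Console.WriteLine(Idf(500, numDocs));  // common term -> about 1.7
        }
    }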
+ + Computes a score factor based on the fraction of all query terms that a + document contains. This value is multiplied into scores. + +

The presence of a large portion of the query terms indicates a better + match with the query, so implementations of this method usually return + larger values when the ratio between these parameters is large and smaller + values when the ratio between them is small. + +

+ the number of query terms matched in the document + + the total number of terms in the query + + a score factor based on term overlap with the query + +
+ + Calculate a scoring factor based on the data in the payload. Overriding implementations + are responsible for interpreting what is in the payload. Lucene makes no assumptions about + what is in the byte array. +

+ The default implementation returns 1. + +

+ The fieldName of the term this payload belongs to + + The payload byte array to be scored + + The offset into the payload array + + The length in the array + + An implementation dependent float to be used as a scoring factor + + + See {@link #ScorePayload(int, String, int, int, byte[], int, int)} + +
+ + Calculate a scoring factor based on the data in the payload. Overriding implementations + are responsible for interpreting what is in the payload. Lucene makes no assumptions about + what is in the byte array. +

+ The default implementation returns 1. + +

+ The docId currently being scored. If this value is {@link #NO_DOC_ID_PROVIDED}, then it should be assumed that the PayloadQuery implementation does not provide document information + + The fieldName of the term this payload belongs to + + The start position of the payload + + The end position of the payload + + The payload byte array to be scored + + The offset into the payload array + + The length in the array + + An implementation dependent float to be used as a scoring factor + + +
+ + Remove this when old API is removed! + + + + Remove this when old API is removed! + + + + Remove this when old API is removed! + + + + The Similarity implementation used by default. + TODO: move back to top when old API is removed! + + + + + Small Util class used to pass both an idf factor as well as an + explanation for that factor. + + This class will likely be held on a {@link Weight}, so be aware + before storing any large or un-serializable fields. + + + + + Expert: Describes the score computation for document and query. + + + Indicates whether or not this Explanation models a good match. + +

+ By default, an Explanation represents a "match" if the value is positive. +

+

+ + +
+ + The value assigned to this explanation node. + + + Sets the value assigned to this explanation node. + + + A description of this explanation node. + + + Sets the description of this explanation node. + + + A short one line summary which should contain all high level + information about this Explanation, without the "Details" + + + + The sub-nodes of this explanation node. + + + Adds a sub-node to this explanation node. + + + Render an explanation as text. + + + Render an explanation as HTML. + + + Small Util class used to pass both an idf factor as well as an + explanation for that factor. + + This class will likely be held on a {@link Weight}, so be aware + before storing any large or un-serializable fields. + + + + + the idf factor + + + + This should be calculated lazily if possible. + + + the explanation for the idf factor. + + + + Remove this when old API is removed! + + + + Remove this when old API is removed! + + + + Remove this when old API is removed! + + + + Implemented as + state.getBoost()*lengthNorm(numTerms), where + numTerms is {@link FieldInvertState#GetLength()} if {@link + #setDiscountOverlaps} is false, else it's {@link + FieldInvertState#GetLength()} - {@link + FieldInvertState#GetNumOverlap()}. + +

WARNING: This API is new and experimental, and may suddenly + change.

+

+
+ + Implemented as 1/sqrt(numTerms). + + + Implemented as 1/sqrt(sumOfSquaredWeights). + + + Implemented as sqrt(freq). + + + Implemented as 1 / (distance + 1). + + + Implemented as log(numDocs/(docFreq+1)) + 1. + + + Implemented as overlap / maxOverlap. + + + Determines whether overlap tokens (Tokens with + 0 position increment) are ignored when computing + norm. By default this is false, meaning overlap + tokens are counted just like non-overlap tokens. + +

WARNING: This API is new and experimental, and may suddenly + change.

+ +

+ + +
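As an example of adjusting one of the factors listed above, the hedged sketch below subclasses DefaultSimilarity to turn off field-length normalization, so every field gets a norm contribution of 1.0. Method casing follows the usual Lucene.Net 2.9 conventions and may need adjusting for a specific release.

    using Lucene.Net.Search;

    // A Similarity that ignores field length, so short and long fields
    // contribute equally to norm(t,d). All other factors keep their defaults.
    public class NoLengthNormSimilarity : DefaultSimilarity
    {
        public override float LengthNorm(string fieldName, int numTokens)
        {
            return 1.0f;
        }
    }

    // Typical wiring. Use the same Similarity for indexing and searching,
    // since norms are baked into the index at indexing time:
    //   writer.SetSimilarity(new NoLengthNormSimilarity());
    //   searcher.SetSimilarity(new NoLengthNormSimilarity());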
+ + + + + + A query that wraps a filter and simply returns a constant score equal to the + query boost for every document in the filter. + + + + $Id: ConstantScoreQuery.java 807180 2009-08-24 12:26:43Z markrmiller $ - - Name of the index segment file + + Returns the encapsulated filter - - Name of the generation reference file name + + Prints a user-readable version of this query. - - Name of the index deletable file (only used in - pre-lockless indices) + + Returns true if o is equal to this. + + + Returns a hash code value for this object. + + + use {@link #NextDoc()} instead. + + + + use {@link #DocID()} instead. + + + + use {@link #Advance(int)} instead. + + + + Used by DocumentsWriter to implemented a StringReader + that can be reset to a new string; we use this when + tokenizing the string value from a Field. - - Extension of norms file - - - Extension of freq postings file - - - Extension of prox postings file - - - Extension of terms file - - - Extension of terms index file - - - Extension of stored fields index file - - - Extension of stored fields file - - - Extension of vectors fields file - - - Extension of vectors documents file - - - Extension of vectors index file - - - Extension of compound file - - - Extension of compound file for doc store files - - - Extension of deletes - - - Extension of field infos - - - Extension of plain norms - - - Extension of separate norms - - - Extension of gen file - - - This array contains all filename extensions used by - Lucene's index files, with two exceptions, namely the - extension made up from .f + a number and - from .s + a number. Also note that - Lucene's segments_N files do not have any - filename extension. + + Change to true to see details of reference counts when + infoStream != null - - File extensions that are added to a compound file - (same as above, minus "del", "gen", "cfs"). + + Initialize the deleter: find all previous commits in + the Directory, incref the files they reference, call + the policy to let it delete commits. This will remove + any files not referenced by any of the commits. + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + + + Remove the CommitPoints in the commitsToDelete List by + DecRef'ing all files from each SegmentInfos. - - File extensions of old-style index files + + Writer calls this when it has hit an error and had to + roll back, to tell us that there may now be + unreferenced files in the filesystem. So we re-list + the filesystem and delete such files. If segmentName + is non-null, we will only delete files corresponding to + that segment. + - - File extensions for term vector support - - - Computes the full file name from base, extension and - generation. If the generation is -1, the file name is - null. If it's 0, the file name is - If it's > 0, the file name is + + For definition of "check point" see IndexWriter comments: + "Clarification: Check Points (and commits)". - - -- main part of the file name - - -- extension of the filename (including .) - - -- generation - - - - Returns true if the provided filename is one of the doc - store files (ends with an extension in - STORE_INDEX_EXTENSIONS). + Writer calls this when it has made a "consistent + change" to the index, meaning new files are written to + the index and the in-memory SegmentInfos have been + modified to point to those files. + + This may or may not be a commit (segments_N may or may + not have been written). 
+ + We simply incref the files referenced by the new + SegmentInfos and decref the files we had previously + seen (if any). + + If this is a commit, we also call the policy to give it + a chance to remove other commits. If any commits are + removed, we decref their files as well. - - Holds state for inverting all occurrences of a single - field in the document. This class doesn't do anything - itself; instead, it forwards the tokens produced by - analysis to its own consumer - (InvertedDocConsumerPerField). It also interacts with an - endConsumer (InvertedDocEndConsumerPerField). + + Deletes the specified files, but only if they are new + (have not yet been incref'd). - - Process the document. If there is - something for this document to be done in docID order, - you should encapsulate that as a - DocumentsWriter.DocWriter and return it. - DocumentsWriter then calls finish() on this object - when it's its turn. + + Tracks the reference count for a single index file: + + + Holds details for each commit point. This class is + also passed to the deletion policy. Note: this class + has a natural ordering that is inconsistent with + equals. - - Load the First field and break. + +

Expert: represents a single commit into an index as seen by the + {@link IndexDeletionPolicy} or {@link IndexReader}.

+ +

Changes to the content of an index are made visible only after the writer who made that change commits by writing a new segments file (segments_N). This point in time, when the action of writing a new segments file to the directory is completed, is an index commit.

+ +

Each index commit point has a unique segments file + associated with it. The segments file associated with a + later index commit point would have a larger N.

+ +

WARNING: This API is new and experimental and may suddenly change.

+

+
+ + Get the segments file (segments_N) associated + with this commit point. + + + + Returns all index files referenced by this commit point. + + + Returns the {@link Directory} for the index. + + + Delete this commit point. This only applies when using + the commit point in the context of IndexWriter's + IndexDeletionPolicy.

- See {@link FieldSelectorResult#LOAD_AND_BREAK} + Upon calling this, the writer is notified that this commit + point should be deleted. +

+ The decision that a commit point should be deleted is taken by the {@link IndexDeletionPolicy} in effect, and therefore this method should only be called from its {@link IndexDeletionPolicy#onInit onInit()} or {@link IndexDeletionPolicy#onCommit onCommit()} methods.

- - A field is a section of a Document. Each field has two parts, a name and a - value. Values may be free text, provided as a String or as a Reader, or they - may be atomic keywords, which are not further processed. Such keywords may - be used to represent dates, urls, etc. Fields are optionally stored in the - index, so that they may be returned with hits on the document. + + Returns true if this commit is an optimized index. + + + Two IndexCommits are equal if both their Directory and versions are equal. + + + Returns the version for this IndexCommit. This is the + same value that {@link IndexReader#getVersion} would + return if it were opened on this commit. - - The value of the field as a String, or null. If null, the Reader value or - binary value is used. Exactly one of stringValue(), - readerValue(), and getBinaryValue() must be set. + + Returns the generation (the _N in segments_N) for this + IndexCommit - - The value of the field as a Reader, or null. If null, the String value or - binary value is used. Exactly one of stringValue(), - readerValue(), and getBinaryValue() must be set. + + Convenience method that returns the last modified time + of the segments_N file corresponding to this index + commit, equivalent to + getDirectory().fileModified(getSegmentsFileName()). - - The value of the field in Binary, or null. If null, the Reader value, - or String value is used. Exactly one of stringValue(), - readerValue(), and getBinaryValue() must be set. + + Returns userData, previously passed to {@link + IndexWriter#Commit(Map)} for this commit. Map is + String -> String. - This method must allocate a new byte[] if - the {@link AbstractField#GetBinaryOffset()} is non-zero - or {@link AbstractField#GetBinaryLength()} is not the - full length of the byte[]. Please use {@link - AbstractField#GetBinaryValue()} instead, which simply - returns the byte[]. + + + Called only be the deletion policy, to remove this + commit point from the index. + + + + Used by DocumentsWriter to merge the postings from + multiple ThreadStates when creating a segment + + + +

This class provides a {@link Field} that enables indexing + of numeric values for efficient range filtering and + sorting. Here's an example usage, adding an int value: +

+            document.add(new NumericField(name).setIntValue(value));
+            
+ + For optimal performance, re-use the + NumericField and {@link Document} instance for more than + one document: + +
+            NumericField field = new NumericField(name);
+            Document document = new Document();
+            document.add(field);
+            
+            for(all documents) {
+            ...
+            field.setIntValue(value)
+            writer.addDocument(document);
+            ...
+            }
+            
+ +

The java native types int, long, + float and double are + directly supported. However, any value that can be + converted into these native types can also be indexed. + For example, date/time values represented by a + {@link java.util.Date} can be translated into a long + value using the {@link java.util.Date#getTime} method. If you + don't need millisecond precision, you can quantize the + value, either by dividing the result of + {@link java.util.Date#getTime} or using the separate getters + (for year, month, etc.) to construct an int or + long value.

+ +

To perform range querying or filtering against a + NumericField, use {@link NumericRangeQuery} or {@link + NumericRangeFilter}. To sort according to a + NumericField, use the normal numeric sort types, eg + {@link SortField#INT} (note that {@link SortField#AUTO} + will not work with these fields). NumericField values + can also be loaded directly from {@link FieldCache}.

+ +

By default, a NumericField's value is not stored but + is indexed for range filtering and sorting. You can use + the {@link #NumericField(String,Field.Store,boolean)} + constructor if you need to change these defaults.

+ +

You may add the same field name as a NumericField to + the same document more than once. Range querying and + filtering will be the logical OR of all values; so a range query + will hit all documents that have at least one value in + the range. However sort behavior is not defined. If you need to sort, + you should separately index a single-valued NumericField.

+ +

A NumericField will consume somewhat more disk space + in the index than an ordinary single-valued field. + However, for a typical index that includes substantial + textual content per document, this increase will likely + be in the noise.

+ +

Within Lucene, each numeric value is indexed as a + trie structure, where each term is logically + assigned to larger and larger pre-defined brackets (which + are simply lower-precision representations of the value). + The step size between each successive bracket is called the + precisionStep, measured in bits. Smaller + precisionStep values result in larger number + of brackets, which consumes more disk space in the index + but may result in faster range search performance. The + default value, 4, was selected for a reasonable tradeoff + of disk space consumption versus performance. You can + use the expert constructor {@link + #NumericField(String,int,Field.Store,boolean)} if you'd + like to change the value. Note that you must also + specify a congruent value when creating {@link + NumericRangeQuery} or {@link NumericRangeFilter}. + For low cardinality fields larger precision steps are good. + If the cardinality is < 100, it is fair + to use {@link Integer#MAX_VALUE}, which produces one + term per value. + +

For more information on the internals of numeric trie + indexing, including the precisionStep + configuration, see {@link NumericRangeQuery}. The format of + indexed values is described in {@link NumericUtils}. + +

If you only need to sort by numeric value, and never + run range querying/filtering, you can index using a + precisionStep of {@link Integer#MAX_VALUE}. + This will minimize disk space consumed.

+ +

More advanced users can instead use {@link + NumericTokenStream} directly, when indexing numbers. This + class is a wrapper around this token stream type for + easier, more intuitive usage.

+ +

NOTE: This class is only used during + indexing. When retrieving the stored field value from a + {@link Document} instance after search, you will get a + conventional {@link Fieldable} instance where the numeric + values are returned as {@link String}s (according to + toString(value) of the used data type). + +

NOTE: This API is + experimental and might change in incompatible ways in the + next release. + +

+ 2.9 + +
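Translated to C#, the recommended reuse pattern above looks roughly like the following. Method casing (SetIntValue, AddDocument, and so on) follows the usual Lucene.Net 2.9 conventions and may need minor adjustment; the method assumes an already-opened IndexWriter.

    using Lucene.Net.Documents;
    using Lucene.Net.Index;

    static class NumericFieldExample
    {
        // Reuses one NumericField and one Document across documents, as recommended above.
        public static void IndexPrices(IndexWriter writer, int[] prices)
        {
            NumericField priceField = new NumericField("price", Field.Store.YES, true); // stored + indexed
            Document doc = new Document();
            doc.Add(priceField);

            foreach (int price in prices)
            {
                priceField.SetIntValue(price);
                writer.AddDocument(doc);
            }
            // Range queries and filters then go through NumericRangeQuery /
            // NumericRangeFilter, using the same precisionStep (the default, 4, here).
        }
    }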
+ + Creates a field for numeric values using the default precisionStep + {@link NumericUtils#PRECISION_STEP_DEFAULT} (4). The instance is not yet initialized with + a numeric value, before indexing a document containing this field, + set a value using the various set???Value() methods. + This constructor creates an indexed, but not stored field. + + the field name + + + + Creates a field for numeric values using the default precisionStep + {@link NumericUtils#PRECISION_STEP_DEFAULT} (4). The instance is not yet initialized with + a numeric value, before indexing a document containing this field, + set a value using the various set???Value() methods. + + the field name + + if the field should be stored in plain text form + (according to toString(value) of the used data type) + + if the field should be indexed using {@link NumericTokenStream} + + + + Creates a field for numeric values with the specified + precisionStep. The instance is not yet initialized with + a numeric value, before indexing a document containing this field, + set a value using the various set???Value() methods. + This constructor creates an indexed, but not stored field. + + the field name + + the used precision step + + + + Creates a field for numeric values with the specified + precisionStep. The instance is not yet initialized with + a numeric value, before indexing a document containing this field, + set a value using the various set???Value() methods. + + the field name + + the used precision step + + if the field should be stored in plain text form + (according to toString(value) of the used data type) + + if the field should be indexed using {@link NumericTokenStream} + + + + Returns a {@link NumericTokenStream} for indexing the numeric value. + + + Returns always null for numeric fields + + + Returns always null for numeric fields + + + Returns always null for numeric fields + + + Returns the numeric value as a string (how it is stored, when {@link Field.Store#YES} is chosen). + + + Returns the current numeric value as a subclass of {@link Number}, null if not yet initialized. + + + Initializes the field with the supplied long value. + the numeric value + + this instance, because of this you can use it the following way: + document.add(new NumericField(name, precisionStep).SetLongValue(value)) + + + + Initializes the field with the supplied int value. + the numeric value + + this instance, because of this you can use it the following way: + document.add(new NumericField(name, precisionStep).setIntValue(value)) + + + + Initializes the field with the supplied double value. + the numeric value + + this instance, because of this you can use it the following way: + document.add(new NumericField(name, precisionStep).setDoubleValue(value)) + + + + Initializes the field with the supplied float value. + the numeric value + + this instance, because of this you can use it the following way: + document.add(new NumericField(name, precisionStep).setFloatValue(value)) + + + + Provides information about what should be done with this Field + + + + + + Load this {@link Field} every time the {@link Document} is loaded, reading in the data as it is encountered. + {@link Document#GetField(String)} and {@link Document#GetFieldable(String)} should not return null. +

+ {@link Document#Add(Fieldable)} should be called by the Reader. +

+
+ + Lazily load this {@link Field}. This means the {@link Field} is valid, but it may not actually contain its data until + invoked. {@link Document#GetField(String)} SHOULD NOT BE USED. {@link Document#GetFieldable(String)} is safe to use and should + return a valid instance of a {@link Fieldable}. +

+ {@link Document#Add(Fieldable)} should be called by the Reader. +

+
+ + Do not load the {@link Field}. {@link Document#GetField(String)} and {@link Document#GetFieldable(String)} should return null. + {@link Document#Add(Fieldable)} is not called. +

+ {@link Document#Add(Fieldable)} should not be called by the Reader. +

+
+ + Load this field as in the {@link #LOAD} case, but immediately return from {@link Field} loading for the {@link Document}. Thus, the + Document may not have its complete set of Fields. {@link Document#GetField(String)} and {@link Document#GetFieldable(String)} should + both be valid for this {@link Field} +

+ {@link Document#Add(Fieldable)} should be called by the Reader. +

+
+ + Behaves much like {@link #LOAD} but does not uncompress any compressed data. This is used for internal purposes. + {@link Document#GetField(String)} and {@link Document#GetFieldable(String)} should not return null. +

+ {@link Document#Add(Fieldable)} should be called by + the Reader. +

+ This is an internal option only, and is + no longer needed now that {@link CompressionTools} + is used for field compression.
- - The TokesStream for this field to be used when indexing, or null. If null, the Reader value - or String value is analyzed to produce the indexed tokens. + + Expert: Load the size of this {@link Field} rather than its value. + Size is measured as number of bytes required to store the field == bytes for a binary or any compressed value, and 2*chars for a String value. + The size is stored as a binary value, represented as an int in a byte[], with the higher order byte first in [0] - -

Expert: change the value of this field. This can - be used during indexing to re-use a single Field - instance to improve indexing speed by avoiding GC cost - of new'ing and reclaiming Field instances. Typically - a single {@link Document} instance is re-used as - well. This helps most on small documents.

+ +

Expert: Like {@link #SIZE} but immediately break from the field loading loop, i.e., stop loading further fields, after the size is loaded + + + Works in conjunction with the SinkTokenizer to provide the ability to set aside tokens + that have already been analyzed. This is useful in situations where multiple fields share + many common analysis steps and then go their separate ways. +

+ It is also useful for doing things like entity extraction or proper noun analysis as + part of the analysis workflow and saving off those tokens for use in another field. -

Each Field instance should only be used once - within a single {@link Document} instance. See ImproveIndexingSpeed - for details.

+

+            SinkTokenizer sink1 = new SinkTokenizer();
+            SinkTokenizer sink2 = new SinkTokenizer();
+            TokenStream source1 = new TeeTokenFilter(new TeeTokenFilter(new WhitespaceTokenizer(reader1), sink1), sink2);
+            TokenStream source2 = new TeeTokenFilter(new TeeTokenFilter(new WhitespaceTokenizer(reader2), sink1), sink2);
+            TokenStream final1 = new LowerCaseFilter(source1);
+            TokenStream final2 = source2;
+            TokenStream final3 = new EntityDetect(sink1);
+            TokenStream final4 = new URLDetect(sink2);
+            d.add(new Field("f1", final1));
+            d.add(new Field("f2", final2));
+            d.add(new Field("f3", final3));
+            d.add(new Field("f4", final4));
+            
+ In this example, sink1 and sink2 will both get tokens from both reader1 and reader2 after the whitespace tokenizer, and we can further wrap any of these in extra analysis; more "sources" can be inserted if desired. It is important that tees are consumed before sinks (in the above example, the tee field names must be less than the sink field names). Note that the EntityDetect and URLDetect TokenStreams are for the example only and do not currently exist in Lucene.

+ + See LUCENE-1058. +

+ WARNING: {@link TeeTokenFilter} and {@link SinkTokenizer} only work with the old TokenStream API. + If you switch to the new API, you need to use {@link TeeSinkTokenFilter} instead, which offers + the same functionality.

-
- - Expert: change the value of this field. See setValue(String). - - - Expert: change the value of this field. See setValue(String). - - - Expert: change the value of this field. See setValue(String). - - - Expert: change the value of this field. See setValue(String). - use {@link #setTokenStream} + + + Use {@link TeeSinkTokenFilter} instead + - - Expert: sets the token stream to be used for indexing and causes isIndexed() and isTokenized() to return true. - May be combined with stored values from stringValue() or binaryValue() + + Normalizes tokens extracted with {@link StandardTokenizer}. + + + Construct filtering in. + + + Returns the next token in the stream, or null at EOS. +

Removes 's from the end of words. +

Removes dots from acronyms.

- - Create a field by specifying its name, value and how it will - be saved in the index. Term vectors will not be stored in the index. + + A filter that replaces accented characters in the ISO Latin 1 character set + (ISO-8859-1) by their unaccented equivalent. The case will not be altered. +

+ For instance, 'À' will be replaced by 'a'. +

- The name of the field - - The string to process - - Whether value should be stored in the index - - Whether the field should be indexed, and if so, if it should - be tokenized before indexing - - NullPointerException if name or value is null - IllegalArgumentException if the field is neither stored nor indexed -
- - Create a field by specifying its name, value and how it will - be saved in the index. - - - The name of the field - - The string to process - - Whether value should be stored in the index - - Whether the field should be indexed, and if so, if it should - be tokenized before indexing - - Whether term vector should be stored - - NullPointerException if name or value is null - IllegalArgumentException in any of the following situations: -
    -
  • the field is neither stored nor indexed
  • -
  • the field is not indexed but termVector is TermVector.YES
  • -
-
-
- - Create a field by specifying its name, value and how it will - be saved in the index. - - - The name of the field - - Whether to .intern() name or not - - The string to process - - Whether value should be stored in the index - - Whether the field should be indexed, and if so, if it should - be tokenized before indexing - - Whether term vector should be stored - - NullPointerException if name or value is null - IllegalArgumentException in any of the following situations: -
    -
  • the field is neither stored nor indexed
  • -
  • the field is not indexed but termVector is TermVector.YES
  • -
-
-
- - Create a tokenized and indexed field that is not stored. Term vectors will - not be stored. The Reader is read only when the Document is added to the index, - i.e. you may not close the Reader until {@link IndexWriter#AddDocument(Document)} - has been called. - - - The name of the field - - The reader with the content - - NullPointerException if name or reader is null - - - Create a tokenized and indexed field that is not stored, optionally with - storing term vectors. The Reader is read only when the Document is added to the index, - i.e. you may not close the Reader until {@link IndexWriter#AddDocument(Document)} - has been called. - - - The name of the field - - The reader with the content - - Whether term vector should be stored - - NullPointerException if name or reader is null - - - Create a tokenized and indexed field that is not stored. Term vectors will - not be stored. This is useful for pre-analyzed fields. - The TokenStream is read only when the Document is added to the index, - i.e. you may not close the TokenStream until {@link IndexWriter#AddDocument(Document)} - has been called. - - - The name of the field - - The TokenStream with the content - - NullPointerException if name or tokenStream is null - - - Create a tokenized and indexed field that is not stored, optionally with - storing term vectors. This is useful for pre-analyzed fields. - The TokenStream is read only when the Document is added to the index, - i.e. you may not close the TokenStream until {@link IndexWriter#AddDocument(Document)} - has been called. - - - The name of the field - - The TokenStream with the content - - Whether term vector should be stored - - NullPointerException if name or tokenStream is null - - - Create a stored field with binary value. Optionally the value may be compressed. - - - The name of the field - - The binary value - - How value should be stored (compressed or not) - - IllegalArgumentException if store is Store.NO - - - Create a stored field with binary value. Optionally the value may be compressed. - - - The name of the field - - The binary value - - Starting offset in value where this Field's bytes are - - Number of bytes to use for this Field, starting at offset - - How value should be stored (compressed or not) - - IllegalArgumentException if store is Store.NO - - - Specifies whether and how a field should be stored. - - - Store the original field value in the index in a compressed form. This is - useful for long documents and for binary valued fields. - - Please use {@link CompressionTools} instead. - For string fields that were previously indexed and stored using compression, - the new way to achieve this is: First add the field indexed-only (no store) - and additionally using the same field name as a binary, stored field - with {@link CompressionTools#compressString}. + in favor of {@link ASCIIFoldingFilter} which covers a superset + of Latin 1. This class will be removed in Lucene 3.0. - - Store the original field value in the index. This is useful for short texts - like a document's title which should be displayed with the results. The - value is stored in its original form, i.e. no analyzer is used before it is - stored. - - - - Do not store the field value in the index. - - - Specifies whether and how a field should be indexed. - - - Do not index the field value. This field can thus not be searched, - but one can still access its contents provided it is - {@link Field.Store stored}. - - - - Index the tokens produced by running the field's - value through an Analyzer. 
This is useful for - common text. - - - - this has been renamed to {@link #ANALYZED} + + Will be removed in Lucene 3.0. This method is final, as it should + not be overridden. Delegates to the backwards compatibility layer. - - Index the field's value without using an Analyzer, so it can be searched. - As no analyzer is used the value will be stored as a single term. This is - useful for unique Ids like product numbers. - - - - This has been renamed to {@link #NOT_ANALYZED} + + Will be removed in Lucene 3.0. This method is final, as it should + not be overridden. Delegates to the backwards compatibility layer. - - Expert: Index the field's value without an Analyzer, - and also disable the storing of norms. Note that you - can also separately enable/disable norms by calling - {@link Field#setOmitNorms}. No norms means that - index-time field and document boosting and field - length normalization are disabled. The benefit is - less memory usage as norms take up one byte of RAM - per indexed field for every document in the index, - during searching. Note that once you index a given - field with norms enabled, disabling norms will - have no effect. In other words, for this to have the - above described effect on a field, all instances of - that field must be indexed with NOT_ANALYZED_NO_NORMS - from the beginning. + + To replace accented characters in a String by unaccented equivalents. + + + Class to encode java's UTF16 char[] into UTF8 byte[] + without always allocating a new byte[] as + String.getBytes("UTF-8") does. + +

WARNING: This API is new and experimental and may suddenly change.

- - This has been renamed to - {@link #NOT_ANALYZED_NO_NORMS} - - - - Expert: Index the tokens produced by running the - field's value through an Analyzer, and also - separately disable the storing of norms. See - {@link #NOT_ANALYZED_NO_NORMS} for what norms are - and why you may want to disable them. + + Encode characters from a char[] source, starting at + offset and stopping when the character 0xffff is seen. + Returns the number of bytes written to bytesOut. - - Specifies whether and how a field should have term vectors. - - - Do not store term vectors. - - - Store the term vectors of each document. A term vector is a list - of the document's terms and their number of occurrences in that document. + + Encode characters from a char[] source, starting at + offset for length chars. Returns the number of bytes + written to bytesOut. - - Store the term vector + token position information + + Encode characters from this String, starting at offset + for length characters. Returns the number of bytes + written to bytesOut. + + + + Convert UTF8 bytes into UTF16 characters. If offset + is non-zero, conversion starts at that starting point + in utf8, re-using the results from the previous call + up until offset. + + + + Methods for manipulating strings. + + $Id: StringHelper.java 801344 2009-08-05 18:05:06Z yonik $ + + + + Expert: + The StringInterner implementation used by Lucene. + This shouldn't be changed to an incompatible implementation after other Lucene APIs have been used. + + + + Return the same string object for all equal strings + + + Compares two byte[] arrays, element by element, and returns the + number of elements common to both arrays. - + The first byte[] to compare + + The second byte[] to compare + + The number of common elements. + + + + Compares two strings, character by character, and returns the + first position where the two strings differ from one another. + + + The first string to compare + + The second string to compare + + The first position where the two strings differ. + + + + Floating point numbers smaller than 32 bits. + + + $Id$ + + + + Converts a 32 bit float to an 8 bit float. +
Values less than zero are all mapped to zero. +
Values are truncated (rounded down) to the nearest 8 bit value. +
Values between zero and the smallest representable value + are rounded up. + +
+ the 32 bit float to be converted to an 8 bit float (byte) + + the number of mantissa bits to use in the byte, with the remainder to be used in the exponent + + the zero-point in the range of exponent values + + the 8 bit float representation + +
+ + Converts an 8 bit float to a 32 bit float. + + + floatToByte(b, mantissaBits=3, zeroExponent=15) +
smallest non-zero value = 5.820766E-10 +
largest value = 7.5161928E9 +
epsilon = 0.125 +
+
+ + byteToFloat(b, mantissaBits=3, zeroExponent=15) + + + floatToByte(b, mantissaBits=5, zeroExponent=2) +
smallest nonzero value = 0.033203125 +
largest value = 1984.0 +
epsilon = 0.03125 +
+
+ + byteToFloat(b, mantissaBits=5, zeroExponent=2) + + + A memory-resident {@link IndexOutput} implementation. + + + $Id: RAMOutputStream.java 691694 2008-09-03 17:34:29Z mikemccand $ + + + + Construct an empty output buffer. + + + Copy the current contents of this buffer to the named output. + + + Resets this to an empty buffer. + + + Returns byte usage of all buffers. + + + Simple standalone server that must be running when you + use {@link VerifyingLockFactory}. This server simply + verifies at most one process holds the lock at a time. + Run without any args to see usage. + + + + + - - Store the term vector + Token offset information + + A base class for all collectors that return a {@link TopDocs} output. This + collector allows easy extension by providing a single constructor which + accepts a {@link PriorityQueue} as well as protected members for that + priority queue and a counter of the number of total hits.
+ Extending classes can override {@link #TopDocs(int, int)} and + {@link #GetTotalHits()} in order to provide their own implementation. +
+
+ + The priority queue which holds the top documents. Note that different + implementations of PriorityQueue give different meaning to 'top documents'. + HitQueue for example aggregates the top scoring documents, while other PQ + implementations may hold documents sorted by other criteria. + + + + The total number of documents that the collector encountered. + + + Populates the results array with the ScoreDoc instances. This can be + overridden in case a different ScoreDoc type should be returned. + + + + Returns a {@link TopDocs} instance containing the given results. If + results is null it means there are no results to return, + either because there were 0 calls to collect() or because the arguments to + topDocs were invalid. + + + + The total number of documents that matched this query. + + + Returns the top docs that were collected by this collector. + + + Returns the documents in the range [start .. pq.size()) that were collected + by this collector. Note that if start >= pq.size(), an empty TopDocs is + returned.
+ This method is convenient to call if the application always asks for the + last results, starting from the last 'page'.
+ NOTE: you cannot call this method more than once for each search + execution. If you need to call it more than once, passing each time a + different start, you should call {@link #TopDocs()} and work + with the returned {@link TopDocs} object, which will contain all the + results this search execution collected. +
+
+ + Returns the documents in the range [start .. start+howMany) that were + collected by this collector. Note that if start >= pq.size(), an empty + TopDocs is returned, and if pq.size() - start < howMany, then only the + available documents in [start .. pq.size()) are returned.
+ This method is useful to call when pagination of search results is + allowed by the search application; it also attempts to optimize the + memory used by allocating only as much as is requested by howMany.
+ NOTE: you cannot call this method more than once for each search + execution. If you need to call it more than once, passing each time a + different range, you should call {@link #TopDocs()} and work with the + returned {@link TopDocs} object, which will contain all the results this + search execution collected. +
+
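+ A minimal Java-flavored sketch of paged retrieval (assumes an existing IndexSearcher searcher and Query query; TopScoreDocCollector is one concrete TopDocsCollector):
+            // Collect the 30 best-scoring hits, then read back the second page of 10.
+            TopScoreDocCollector collector = TopScoreDocCollector.create(30, true);
+            searcher.search(query, collector);
+            TopDocs page2 = collector.topDocs(10, 10);   // may only be called once per search
+            for (int i = 0; i < page2.scoreDocs.length; i++) {
+                System.out.println(page2.scoreDocs[i].doc + " score=" + page2.scoreDocs[i].score);
+            }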
+ + Expert: obtains float field values from the + {@link Lucene.Net.Search.FieldCache FieldCache} + using getFloats() and makes those values + available as other numeric types, casting as needed. + +

+ WARNING: The status of the Search.Function package is experimental. + The APIs introduced here might change in the future and will not be + supported anymore in such a case.

- + for requirements" + on the field. + +

NOTE: with the switch in 2.9 to segment-based + searching, if {@link #getValues} is invoked with a + composite (multi-segment) reader, this can easily cause + double RAM usage for the values in the FieldCache. It's + best to switch your application to pass only atomic + (single segment) readers to this API. Alternatively, for + a short-term fix, you could wrap your ValueSource using + {@link MultiValueSource}, which costs more CPU per lookup + but will not consume double the FieldCache RAM.

- -

Store the term vector + Token position and offset information + + Expert: A base class for ValueSource implementations that retrieve values for + a single field from the {@link Lucene.Net.Search.FieldCache FieldCache}. +

+ Fields used herein must be indexed (it doesn't matter whether these fields are stored or not). +

+ It is assumed that each such indexed field is untokenized, or at least has a single token in a document. + For documents with multiple tokens of the same field, behavior is undefined (It is likely that current + code would use the value of one of these tokens, but this is not guaranteed). +

+ Documents with no tokens in this field are assigned the Zero value. + +

+ WARNING: The status of the Search.Function package is experimental. + The APIs introduced here might change in the future and will not be + supported anymore in such a case. + +

NOTE: with the switch in 2.9 to segment-based + searching, if {@link #getValues} is invoked with a + composite (multi-segment) reader, this can easily cause + double RAM usage for the values in the FieldCache. It's + best to switch your application to pass only atomic + (single segment) readers to this API. Alternatively, for + a short-term fix, you could wrap your ValueSource using + {@link MultiValueSource}, which costs more CPU per lookup + but will not consume double the FieldCache RAM.

+

+
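+ A minimal Java-flavored sketch of plugging such a source into a query (the field name is illustrative and assumed to be single-valued and untokenized):
+            // Scores each matching document by the value of its "popularity" field.
+            ValueSource source = new FloatFieldSource("popularity");
+            Query boosted = new ValueSourceQuery(source);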
+ + Create a cached field source for the input field. + + + Return cached DocValues for input field and reader. + FieldCache so that values of a field are loaded once per reader (RAM allowing) + + Field for which values are required. + + + + + + Check if equals to another {@link FieldCacheSource}, already knowing that cache and field are equal. + + + + + Return a hash code of a {@link FieldCacheSource}, without the hash-codes of the field + and the cache (those are taken care of elsewhere). + + + + + + Create a cached float field source with default string-to-float parser. + + + Create a cached float field source with a specific string-to-float parser. + + + Abstract decorator class for a DocIdSet implementation + that provides on-demand filtering/validation + mechanism on a given DocIdSet. + +

+ + Technically, this same functionality could be achieved + with ChainedFilter (under contrib/misc), however the + benefit of this class is it never materializes the full + bitset for the filter. Instead, the {@link #match} + method is invoked on-demand, per docID visited during + searching. If you know few docIDs will be visited, and + the logic behind {@link #match} is relatively costly, + this may be a better way to filter than ChainedFilter.

- + - +
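+ A minimal Java-flavored sketch of a subclass (inner is an assumed pre-existing DocIdSet; the predicate is purely illustrative):
+            // Keep only even docIDs; match() is invoked lazily, per docID visited.
+            DocIdSet evens = new FilteredDocIdSet(inner) {
+                protected boolean match(int docid) {
+                    return (docid & 1) == 0;
+                }
+            };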
+ + Constructor. + Underlying DocIdSet + + + + This DocIdSet implementation is cacheable if the inner set is cacheable. + + + Validation method to determine whether a docid should be in the result set. + docid to be tested + + true if input docid should be in the result set, false otherwise. + + + + Implementation of the contract to build a DocIdSetIterator. + - + + + Abstract decorator class of a DocIdSetIterator + implementation that provides on-demand filter/validation + mechanism on an underlying DocIdSetIterator. See {@link + FilteredDocIdSet}. + + + + Constructor. + Underlying DocIdSetIterator. + + + + Validation method to determine whether a docid should be in the result set. + docid to be tested + + true if input docid should be in the result set, false otherwise. + + + + + + use {@link #DocID()} instead. + + + + use {@link #NextDoc()} instead. + + + + use {@link #Advance(int)} instead. + + + + Provides a {@link FieldComparator} for custom field sorting. + + NOTE: This API is experimental and might change in + incompatible ways in the next release. + + + + + Creates a comparator for the field in the given index. + + + Name of the field to create comparator for. + + FieldComparator. + + IOException + If an error occurs reading the index. + + + + A {@link Filter} that only accepts documents whose single + term value in the specified field is contained in the + provided set of allowed terms. + +

+ + This is the same functionality as TermsFilter (from + contrib/queries), except this filter requires that the + field contains only a single term for all documents. + Because of drastically different implementations, they + also have different performance characteristics, as + described below. + +

+ + The first invocation of this filter on a given field will + be slower, since a {@link FieldCache.StringIndex} must be + created. Subsequent invocations using the same field + will re-use this cache. However, as with all + functionality based on {@link FieldCache}, persistent RAM + is consumed to hold the cache, and is not freed until the + {@link IndexReader} is closed. In contrast, TermsFilter + has no persistent RAM consumption. + + +

+ + With each search, this filter translates the specified + set of Terms into a private {@link OpenBitSet} keyed by + term number per unique {@link IndexReader} (normally one + reader per segment). Then, during matching, the term + number for each docID is retrieved from the cache and + then checked for inclusion using the {@link OpenBitSet}. + Since all testing is done using RAM resident data + structures, performance should be very fast, most likely + fast enough to not require further caching of the + DocIdSet for each possible combination of terms. + However, because docIDs are simply scanned linearly, an + index with a great many small documents may find this + linear scan too costly. + +

+ + In contrast, TermsFilter builds up an {@link OpenBitSet}, + keyed by docID, every time it's created, by enumerating + through all matching docs using {@link TermDocs} to seek + and scan through each term's docID list. While there is + no linear scan of all docIDs, besides the allocation of + the underlying array in the {@link OpenBitSet}, this + approach requires a number of "disk seeks" in proportion + to the number of terms, which can be exceptionally costly + when there are cache misses in the OS's IO cache. + +

+ + Generally, this filter will be slower on the first + invocation for a given field, but subsequent invocations, + even if you change the allowed set of Terms, should be + faster than TermsFilter, especially as the number of + Terms being matched increases. If you are matching only + a very small number of terms, and those terms in turn + match a very small number of documents, TermsFilter may + perform faster. + +

+ + Which filter is best is very application dependent. +

+
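+ A minimal Java-flavored sketch of the intended usage (field, terms, searcher and query are illustrative/assumed):
+            // Only accept documents whose single "category" term is in the allowed set.
+            Filter categories = new FieldCacheTermsFilter("category", new String[] {"sports", "politics"});
+            TopDocs results = searcher.search(query, categories, 10);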
+ + This DocIdSet implementation is cacheable. + + + use {@link #DocID()} instead. + + + + use {@link #NextDoc()} instead. + + + + use {@link #Advance(int)} instead. + + + + Token Manager. + + + Token literal values and constants. + Generated by org.javacc.parser.OtherFilesGen#start() + + + + End of File. + + + RegularExpression Id. + + + RegularExpression Id. + + + RegularExpression Id. + + + RegularExpression Id. + + + RegularExpression Id. + + + RegularExpression Id. + + + RegularExpression Id. + + + RegularExpression Id. + + + RegularExpression Id. + + + RegularExpression Id. + + + RegularExpression Id. + + + RegularExpression Id. + + + RegularExpression Id. + + + RegularExpression Id. + + + RegularExpression Id. + + + RegularExpression Id. + + + RegularExpression Id. + + + RegularExpression Id. + + + RegularExpression Id. + + + RegularExpression Id. + + + RegularExpression Id. + + + RegularExpression Id. + + + RegularExpression Id. + + + RegularExpression Id. + + + RegularExpression Id. + + + RegularExpression Id. + + + RegularExpression Id. + + + RegularExpression Id. + + + RegularExpression Id. + + + RegularExpression Id. + + + RegularExpression Id. + + + RegularExpression Id. + + + Lexical state. + + + Lexical state. + + + Lexical state. + + + Lexical state. + + + Literal token values. + + + Debug output. + + + Set debug output. + + + Token literal values. + + + Lexer state names. + + + Lex State array. + + + Constructor. + + + Constructor. + + + Reinitialise parser. + + + Reinitialise parser. + + + Switch to specified lex state. + + + Get the next Token. + + + Called once per field per document if term vectors + are enabled, to write the vectors to + RAMOutputStream, which is then quickly flushed to + * the real term vectors files in the Directory. + + + + The TermVectorOffsetInfo class holds information pertaining to a Term in a {@link Lucene.Net.Index.TermPositionVector}'s + offset information. This offset information is the character offset as set during the Analysis phase (and thus may not be the actual offset in the + original content). + + + + Convenience declaration when creating a {@link Lucene.Net.Index.TermPositionVector} that stores only position information. + + + The accessor for the ending offset for the term + The offset + + + + The accessor for the starting offset of the term. + + + The offset + + + + Two TermVectorOffsetInfos are equals if both the start and end offsets are the same + The comparison Object + + true if both {@link #GetStartOffset()} and {@link #GetEndOffset()} are the same for both objects. + + + + A Term represents a word from text. This is the unit of search. It is + composed of two elements, the text of the word, as a string, and the name of + the field that the text occured in, an interned string. + Note that terms may represent more than words from text fields, but also + things like dates, email addresses, urls, etc. + + + + Constructs a Term with the given field and text. +

Note that a null field or null text value results in undefined + behavior for most Lucene APIs that accept a Term parameter. +

+
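+ A minimal Java-flavored sketch (field and text values are illustrative):
+            Term t = new Term("contents", "lucene");
+            // Derive another Term on the same field without re-interning the field name.
+            Term t2 = t.createTerm("search");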
+ + Constructs a Term with the given field and empty text. + This serves two purposes: 1) reuse of a Term with the same field. + 2) pattern for a query. + + + + + + + Returns the field of this term, an interned string. The field indicates + the part of a document which this term came from. + + + + Returns the text of this term. In the case of words, this is simply the + text of the word. In the case of dates and other types, this is an + encoding of the object as a string. + + + + Optimized construction of new Terms by reusing same field as this Term + - avoids field.intern() overhead + + The text of the new term (field is implicitly same as this Term instance) + + A new Term + + + + Compares two terms, returning a negative integer if this + term belongs before the argument, zero if this term is equal to the + argument, and a positive integer if this term belongs after the argument. + The ordering of terms is first by field, then by text. + + + + Resets the field and text of a Term. + + + A {@link MergeScheduler} that simply does each merge + sequentially, using the current thread. + + + +

Expert: {@link IndexWriter} uses an instance + implementing this interface to execute the merges + selected by a {@link MergePolicy}. The default + MergeScheduler is {@link ConcurrentMergeScheduler}.

+ +

NOTE: This API is new and still experimental + (subject to change suddenly in the next release)

+ +

NOTE: This class typically requires access to + package-private APIs (eg, SegmentInfos) to do its job; + if you implement your own MergePolicy, you'll need to put + it in package Lucene.Net.Index in order to use + these APIs. +

+
+ + Run the merges provided by {@link IndexWriter#GetNextMerge()}. + + + Close this MergeScheduler. + + + Just do the merges in sequence. We do this + "synchronized" so that even if the application is using + multiple threads, only one merge may run at a time. + + + + NOTE: this API is experimental and will likely change + + + Adds a new term in this field; term ends with U+FFFF + char + + + + Called when we are done adding terms to this field + + + Adds a new term in this field + + + Called when we are done adding terms to this field + + + Provides support for converting longs to Strings, and back again. The strings + are structured so that lexicographic sorting order is preserved. + +

+ That is, if l1 is less than l2 for any two longs l1 and l2, then + NumberTools.longToString(l1) is lexicographically less than + NumberTools.longToString(l2). (Similarly for "greater than" and "equals".) + +

+ This class handles all long values (unlike + {@link Lucene.Net.Documents.DateField}). + +

+ For new indexes use {@link NumericUtils} instead, which + provides a sortable binary representation (prefix encoded) of numeric + values. + To index and efficiently query numeric values use {@link NumericField} + and {@link NumericRangeQuery}. + This class is included for use with existing + indices and will be removed in a future release. + +
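+ A minimal Java-flavored sketch of the round trip (the values are illustrative):
+            String a = NumberTools.longToString(42L);
+            String b = NumberTools.longToString(1000L);
+            // a.compareTo(b) < 0, mirroring 42 < 1000
+            long back = NumberTools.stringToLong(a);   // back == 42L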
+ + Equivalent to longToString(Long.MIN_VALUE) + + + Equivalent to longToString(Long.MAX_VALUE) + + + The length of (all) strings returned by {@link #longToString} + + + Converts a long to a String suitable for indexing. + + + Converts a String that was returned by {@link #longToString} back to a + long. + + + IllegalArgumentException + if the input is null + + NumberFormatException + if the input does not parse (it was not a String returned by + longToString()). + + + + Provides support for converting dates to strings and vice-versa. + The strings are structured so that lexicographic sorting orders + them by date, which makes them suitable for use as field values + and search terms. + +

This class also helps you to limit the resolution of your dates. Do not + save dates with a finer resolution than you really need, as then + RangeQuery and PrefixQuery will require more memory and become slower. + +

Compared to {@link DateField} the strings generated by the methods + in this class take slightly more space, unless your selected resolution + is set to Resolution.DAY or lower. + +

+ Another approach is {@link NumericUtils}, which provides + a sortable binary representation (prefix encoded) of numeric values, which + date/time values are. + For indexing a {@link Date} or {@link Calendar}, just get the unix timestamp as + long using {@link Date#getTime} or {@link Calendar#getTimeInMillis} and + index this as a numeric value with {@link NumericField} + and use {@link NumericRangeQuery} to query it.

+
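+ A minimal Java-flavored sketch (resolution and date are illustrative; stringToDate declares a checked ParseException):
+            String indexed = DateTools.dateToString(new Date(), DateTools.Resolution.DAY);
+            // e.g. "20040921"; lexicographic order now matches chronological order
+            Date roundTripped = DateTools.stringToDate(indexed);   // precision limited to DAY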
+ + Converts a Date to a string suitable for indexing. + + + the date to be converted + + the desired resolution, see + {@link #Round(Date, DateTools.Resolution)} + + a string in format yyyyMMddHHmmssSSS or shorter, + depending on resolution; using GMT as timezone + + + + Converts a millisecond time to a string suitable for indexing. + + + the date expressed as milliseconds since January 1, 1970, 00:00:00 GMT + + the desired resolution, see + {@link #Round(long, DateTools.Resolution)} + + a string in format yyyyMMddHHmmssSSS or shorter, + depending on resolution; using GMT as timezone + + + + Converts a string produced by timeToString or + DateToString back to a time, represented as the + number of milliseconds since January 1, 1970, 00:00:00 GMT. + + + the date string to be converted + + the number of milliseconds since January 1, 1970, 00:00:00 GMT + + ParseException if dateString is not in the + expected format + + + + Converts a string produced by timeToString or + DateToString back to a time, represented as a + Date object. + + + the date string to be converted + + the parsed time as a Date object + + ParseException if dateString is not in the + expected format + + + + Limit a date's resolution. For example, the date 2004-09-21 13:50:11 + will be changed to 2004-09-01 00:00:00 when using + Resolution.MONTH. + + + The desired resolution of the date to be returned + + the date with all values more precise than resolution + set to 0 or 1 + + + + Limit a date's resolution. For example, the date 1095767411000 + (which represents 2004-09-21 13:50:11) will be changed to + 1093989600000 (2004-09-01 00:00:00) when using + Resolution.MONTH. + + + The time in milliseconds (not ticks). + The desired resolution of the date to be returned + + the date with all values more precise than resolution + set to 0 or 1, expressed as milliseconds since January 1, 1970, 00:00:00 GMT + + + + Specifies the time granularity. + + + This TokenFilter provides the ability to set aside attribute states + that have already been analyzed. This is useful in situations where multiple fields share + many common analysis steps and then go their separate ways. +

+ It is also useful for doing things like entity extraction or proper noun analysis as + part of the analysis workflow and saving off those tokens for use in another field. + +

+            TeeSinkTokenFilter source1 = new TeeSinkTokenFilter(new WhitespaceTokenizer(reader1));
+            TeeSinkTokenFilter.SinkTokenStream sink1 = source1.newSinkTokenStream();
+            TeeSinkTokenFilter.SinkTokenStream sink2 = source1.newSinkTokenStream();
+            TeeSinkTokenFilter source2 = new TeeSinkTokenFilter(new WhitespaceTokenizer(reader2));
+            source2.addSinkTokenStream(sink1);
+            source2.addSinkTokenStream(sink2);
+            TokenStream final1 = new LowerCaseFilter(source1);
+            TokenStream final2 = source2;
+            TokenStream final3 = new EntityDetect(sink1);
+            TokenStream final4 = new URLDetect(sink2);
+            d.add(new Field("f1", final1));
+            d.add(new Field("f2", final2));
+            d.add(new Field("f3", final3));
+            d.add(new Field("f4", final4));
+            
In this example, sink1 and sink2 will both get tokens from both + reader1 and reader2 after the whitespace tokenizer, + and we can further wrap any of these in extra analysis; more "sources" can be inserted if desired. + It is important that tees are consumed before sinks (in the above example, the tee field names must + compare less than the sink field names, so the tees are consumed first). If you are not sure which stream is consumed first, you can simply + add another sink and then pass all tokens to the sinks at once using {@link #consumeAllTokens}. + This TokenFilter is exhausted after that. To use this approach, change + the example above to:
+            ...
+            TokenStream final1 = new LowerCaseFilter(source1.newSinkTokenStream());
+            TokenStream final2 = source2.newSinkTokenStream();
+            sink1.consumeAllTokens();
+            sink2.consumeAllTokens();
+            ...
+            
+ In this case, the fields can be added in any order, because the sources are not used anymore and all sinks are ready. +

Note, the EntityDetect and URLDetect TokenStreams are for the example and do not currently exist in Lucene. +

+
+ + Instantiates a new TeeSinkTokenFilter. + + + Returns a new {@link SinkTokenStream} that receives all tokens consumed by this stream. + + + Returns a new {@link SinkTokenStream} that receives all tokens consumed by this stream + that pass the supplied filter. + + + + + + Adds a {@link SinkTokenStream} created by another TeeSinkTokenFilter + to this one. The supplied stream will also receive all consumed tokens. + This method can be used to pass tokens from two different tees to one sink. + + + + TeeSinkTokenFilter passes all tokens to the added sinks + when itself is consumed. To be sure, that all tokens from the input + stream are passed to the sinks, you can call this methods. + This instance is exhausted after this, but all sinks are instant available. + + + + A filter that decides which {@link AttributeSource} states to store in the sink. + + + Returns true, iff the current state of the passed-in {@link AttributeSource} shall be stored + in the sink. + + + + Called by {@link SinkTokenStream#Reset()}. This method does nothing by default + and can optionally be overridden. + + + +

+ The TimeLimitedCollector is used to timeout search requests that take longer + than the maximum allowed search time limit. After this time is exceeded, the + search thread is stopped by throwing a TimeExceeded Exception. +

+ +

+ Use {@link TimeLimitingCollector} instead, which extends the new + {@link Collector}. This class will be removed in 3.0. + +
+ + Default timer resolution. + + + + + Default for {@link #IsGreedy()}. + + + + + Create a TimeLimitedCollector wrapper over another HitCollector with a specified timeout. + the wrapped HitCollector + + max time allowed for collecting hits after which {@link TimeExceededException} is thrown + + + + Calls collect() on the decorated HitCollector. + + + TimeExceededException if the time allowed has been exceeded. + + + Return the timer resolution. + + + + + Set the timer resolution. + The default timer resolution is 20 milliseconds. + This means that a search required to take no longer than + 800 milliseconds may be stopped after 780 to 820 milliseconds. +
Note that:
  • Finer (smaller) resolution is more accurate but less efficient.
  • Setting the resolution to less than 5 milliseconds will be silently modified to 5 milliseconds.
  • Setting the resolution smaller than the current resolution might take effect only after the current
    resolution elapses. (If the current resolution of 20 milliseconds is modified to 5 milliseconds,
    then it can take up to 20 milliseconds for the change to take effect.)
+
+
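+ A minimal Java-flavored sketch of the intended usage (myCollector, searcher and query are assumed to exist; 1000 is a millisecond budget):
+            TimeLimitedCollector collector = new TimeLimitedCollector(myCollector, 1000);
+            try {
+                searcher.search(query, collector);
+            } catch (TimeLimitedCollector.TimeExceededException e) {
+                // hits collected before the timeout are still available in myCollector
+            }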
+ + Checks if this time limited collector is greedy in collecting the last hit. + A non greedy collector, upon a timeout, would throw a {@link TimeExceededException} + without allowing the wrapped collector to collect current doc. A greedy one would + first allow the wrapped hit collector to collect current doc and only then + throw a {@link TimeExceededException}. + + + + + + Sets whether this time limited collector is greedy. + true to make this time limited greedy + + + + + + TimerThread provides a pseudo-clock service to all searching + threads, so that they can count elapsed time with less overhead + than repeatedly calling System.currentTimeMillis. A single + thread should be created to be used for all searches. + + + + Get the timer value in milliseconds. + + + Thrown when elapsed search time exceeds allowed search time. + + + Returns allowed time (milliseconds). + + + Returns elapsed time (milliseconds). + + + Returns last doc that was collected when the search time exceeded. + + + Expert: A Scorer for documents matching a Term. + + + Construct a TermScorer. + + + The weight of the Term in the query. + + An iterator over the documents matching the Term. + + The Similarity implementation to be used for score + computations. + + The field norms of the document fields for the Term. + + + + use {@link #Score(Collector)} instead. + + + + use {@link #Score(Collector, int, int)} instead. + + + + use {@link #DocID()} instead. + + + + Advances to the next document matching the query.
+ The iterator over the matching documents is buffered using + {@link TermDocs#Read(int[],int[])}. + +
+ true iff there is another document matching the query. + + use {@link #NextDoc()} instead. + +
+ + Advances to the next document matching the query.
+ The iterator over the matching documents is buffered using + {@link TermDocs#Read(int[],int[])}. + +
+ the document matching the query or -1 if there are no more documents. + +
+ + Skips to the first match beyond the current whose document number is + greater than or equal to a given target.
+ The implementation uses {@link TermDocs#SkipTo(int)}. + +
+ The target document number. + + true iff there is such a match. + + use {@link #Advance(int)} instead. + +
+ + Advances to the first match beyond the current whose document number is + greater than or equal to a given target.
+ The implementation uses {@link TermDocs#SkipTo(int)}. + +
+ The target document number. + + the matching document or -1 if none exist. + +
+ + Returns an explanation of the score for a document. +
When this method is used, the {@link #Next()} method + and the {@link #Score(HitCollector)} method should not be used. +
+ The document number for the explanation. + +
+ + Returns a string representation of this TermScorer. + + + Encapsulates sort criteria for returned hits. + +

The fields used to determine sort order must be carefully chosen. + Documents must contain a single term in such a field, + and the value of the term should indicate the document's relative position in + a given sort order. The field must be indexed, but should not be tokenized, + and does not need to be stored (unless you happen to want it back with the + rest of your document data). In other words: + +

document.add (new Field ("byNumber", Integer.toString(x), Field.Store.NO, Field.Index.NOT_ANALYZED));

+ + +

Valid Types of Values

+ +

There are four possible kinds of term values which may be put into + sorting fields: Integers, Longs, Floats, or Strings. Unless + {@link SortField SortField} objects are specified, the type of value + in the field is determined by parsing the first term in the field. + +

Integer term values should contain only digits and an optional + preceding negative sign. Values must be base 10 and in the range + Integer.MIN_VALUE and Integer.MAX_VALUE inclusive. + Documents which should appear first in the sort + should have low value integers, later documents high values + (i.e. the documents should be numbered 1..n where + 1 is the first and n the last). + +

Long term values should contain only digits and an optional + preceding negative sign. Values must be base 10 and in the range + Long.MIN_VALUE and Long.MAX_VALUE inclusive. + Documents which should appear first in the sort + should have low value integers, later documents high values. + +

Float term values should conform to values accepted by + {@link Float Float.valueOf(String)} (except that NaN + and Infinity are not supported). + Documents which should appear first in the sort + should have low values, later documents high values. + +

String term values can contain any valid String, but should + not be tokenized. The values are sorted according to their + {@link Comparable natural order}. Note that using this type + of term value has higher memory requirements than the other types. +

Object Reuse

+ +

One of these objects can be + used multiple times and the sort order changed between usages. + +

This class is thread safe. + +

Memory Usage

+ +

Sorting uses caches of term values maintained by the + internal HitQueue(s). The cache is static and contains an integer + or float array of length IndexReader.maxDoc() for each field + name for which a sort is performed. In other words, the size of the + cache in bytes is: +

4 * IndexReader.maxDoc() * (# of different fields actually used to sort) + +

For String fields, the cache is larger: in addition to the + above array, the value of every term in the field is kept in memory. + If there are many unique terms in the field, this could + be quite large. + +

Note that the size of the cache is not affected by how many + fields are in the index and might be used to sort - only by + the ones actually used to sort a result set. + +

Created: Feb 12, 2004 10:53:57 AM + +

+ lucene 1.4 + + $Id: Sort.java 795179 2009-07-17 18:23:30Z mikemccand $ + +
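+ A minimal Java-flavored sketch of sorting by such a field (field name, type and searcher/query are illustrative/assumed):
+            Sort byNumber = new Sort(new SortField("byNumber", SortField.INT, true));   // reverse = descending
+            TopDocs docs = searcher.search(query, null, 10, byNumber);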
+ + Represents sorting by computed relevance. Using this sort criteria returns + the same results as calling + {@link Searcher#Search(Query) Searcher#search()}without a sort criteria, + only with slightly more overhead. + + + + Represents sorting by index order. + + + Sorts by computed relevance. This is the same sort criteria as calling + {@link Searcher#Search(Query) Searcher#search()}without a sort criteria, + only with slightly more overhead. + + + + Sorts by the terms in field then by index order (document + number). The type of value in field is determined + automatically. + + + + + Please specify the type explicitly by + first creating a {@link SortField} and then use {@link + #Sort(SortField)} + + + + Sorts possibly in reverse by the terms in field then by + index order (document number). The type of value in field is + determined automatically. + + + + + Please specify the type explicitly by + first creating a {@link SortField} and then use {@link + #Sort(SortField)} + + + + Sorts in succession by the terms in each field. The type of value in + field is determined automatically. + + + + + Please specify the type explicitly by + first creating {@link SortField}s and then use {@link + #Sort(SortField[])} + + + + Sorts by the criteria in the given SortField. + + + Sorts in succession by the criteria in each SortField. + + + Sets the sort to the terms in field then by index order + (document number). + + Please specify the type explicitly by + first creating a {@link SortField} and then use {@link + #SetSort(SortField)} + + + + Sets the sort to the terms in field possibly in reverse, + then by index order (document number). + + Please specify the type explicitly by + first creating a {@link SortField} and then use {@link + #SetSort(SortField)} + + + + Sets the sort to the terms in each field in succession. + Please specify the type explicitly by + first creating {@link SortField}s and then use {@link + #SetSort(SortField[])} + + + + Sets the sort to the given criteria. + + + Sets the sort to the given criteria in succession. + + + Representation of the sort criteria. + Array of SortField objects used in this sort criteria + + + + Returns true if o is equal to this. + + + Returns a hash code value for this object. + + + A Query that matches documents within an exclusive range of terms. + +

This query matches the documents looking for terms that fall into the + supplied range according to {@link Term#CompareTo(Term)}. It is not intended + for numerical ranges, use {@link NumericRangeQuery} instead. + +

This query uses {@linkplain + MultiTermQuery#SCORING_BOOLEAN_QUERY_REWRITE}. If you + want to change this, use the new {@link TermRangeQuery} + instead. + +

+ Use {@link TermRangeQuery} for term ranges or + {@link NumericRangeQuery} for numeric ranges instead. + This class will be removed in Lucene 3.0. + +
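+ A minimal Java-flavored sketch (field and bounds are illustrative; as noted above, new code should prefer TermRangeQuery):
+            // Match all terms between "apple" and "banana" on the "contents" field, inclusive.
+            RangeQuery range = new RangeQuery(new Term("contents", "apple"),
+                                              new Term("contents", "banana"), true);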
+ + Constructs a query selecting all terms greater than + lowerTerm but less than upperTerm. + There must be at least one term and either term may be null, + in which case there is no bound on that side, but if there are + two terms, both terms must be for the same field. + + + The Term at the lower end of the range + + The Term at the upper end of the range + + If true, both lowerTerm and + upperTerm will themselves be included in the range. + + + + Constructs a query selecting all terms greater than + lowerTerm but less than upperTerm. + There must be at least one term and either term may be null, + in which case there is no bound on that side, but if there are + two terms, both terms must be for the same field. +

+ If collator is not null, it will be used to decide whether + index terms are within the given range, rather than using the Unicode code + point order in which index terms are stored. +

+ WARNING: Using this constructor and supplying a non-null + value in the collator parameter will cause every single + index Term in the Field referenced by lowerTerm and/or upperTerm to be + examined. Depending on the number of index Terms in this Field, the + operation could be very slow. + +

+ The Term at the lower end of the range + + The Term at the upper end of the range + + If true, both lowerTerm and + upperTerm will themselves be included in the range. + + The collator to use to collate index Terms, to determine + their membership in the range bounded by lowerTerm and + upperTerm. + +
+ + Returns the field name for this query + + + Returns the lower term of this range query. + + + Returns the upper term of this range query. + + + Returns true if the range query is inclusive + + + Returns the collator used to determine range inclusion, if any. + + + Prints a user-readable version of this query. + + + Returns true iff o is equal to this. + + + Returns a hash code value for this object. + + + Wrapper used by {@link HitIterator} to provide a lazily loaded hit + from {@link Hits}. + + + Use {@link TopScoreDocCollector} and {@link TopDocs} instead. Hits will be removed in Lucene 3.0. + + + + Constructed from {@link HitIterator} + Hits returned from a search + + Hit index in Hits + + + + Returns document for this hit. + + + + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + + + Returns score for this hit. + + + + + + + Returns id for this hit. + + + + + + + Returns the boost factor for this hit on any field of the underlying document. + + + + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + + + Returns the string value of the field with the given name if any exist in + this document, or null. If multiple fields exist with this name, this + method returns the first value added. If only binary fields with this name + exist, returns null. + + + + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + + + Prints the parameters to be used to discover the promised result. + + + This interface is obsolete, use {@link FieldCache} instead. + + + Use {@link FieldCache}, this will be removed in Lucene 3.0 + + + + + Use {@link FieldCache#DEFAULT}; this will be removed in Lucene 3.0 + + + + Use {@link FieldCache.LongParser}, this will be removed in Lucene 3.0 + + + + Use {@link FieldCache.DoubleParser}, this will be removed in Lucene 3.0 + + + + Token Manager Error. + + + Lexical error occurred. + + + An attempt was made to create a second instance of a static token manager. + + + Tried to change to an invalid lexical state. + + + Detected (and bailed out of) an infinite loop in the token manager. + + + Indicates the reason why the exception is thrown. It will have + one of the above 4 values. + + + + Replaces unprintable characters by their escaped (or unicode escaped) + equivalents in the given string + + + + Returns a detailed message for the Error when it is thrown by the + token manager to indicate a lexical error. + Parameters : + EOFSeen : indicates if EOF caused the lexical error + curLexState : lexical state in which this error occurred + errorLine : line number when the error occurred + errorColumn : column number when the error occurred + errorAfter : prefix that was seen before this error occurred + curchar : the offending character + Note: You can customize the lexical error message by modifying this method. + + + + No arg constructor. + + + Constructor with message and reason. + + + Full Constructor. + + + You can also modify the body of this method to customize your error messages. + For example, cases like LOOP_DETECTED and INVALID_LEXICAL_STATE are not + of end-users concern, so you can return something like : + + "Internal Error : Please file a bug report .... " + + from this method for such cases in the release version of your parser. + + + + This class is generated by JavaCC. The most important method is + {@link #Parse(String)}. + + The syntax for query strings is as follows: + A Query is a series of clauses. 
+ A clause may be prefixed by: +
    +
  • a plus (+) or a minus (-) sign, indicating that the clause is required or prohibited respectively; or
  • a term followed by a colon, indicating the field to be searched. This enables one to construct queries which search multiple fields.

 A clause may be either:
  • a term, indicating all the documents that contain this term; or
  • a nested query, enclosed in parentheses. Note that this may be used with a +/- prefix to require any of a set of terms.
+ + Thus, in BNF, the query grammar is: +
+            Query  ::= ( Clause )*
+            Clause ::= ["+", "-"] [<TERM> ":"] ( <TERM> | "(" Query ")" )
+            
+ +

+ Examples of appropriately formatted queries can be found in the query syntax + documentation. +

+ +

+ In {@link TermRangeQuery}s, QueryParser tries to detect date values, e.g. + date:[6/1/2005 TO 6/4/2005] produces a range query that searches + for "date" fields between 2005-06-01 and 2005-06-04. Note that the format + of the accepted input depends on {@link #SetLocale(Locale) the locale}. + By default a date is converted into a search term using the deprecated + {@link DateField} for compatibility reasons. + To use the new {@link DateTools} to convert dates, a + {@link Lucene.Net.Documents.DateTools.Resolution} has to be set. +

+

+ The date resolution that shall be used for RangeQueries can be set + using {@link #SetDateResolution(DateTools.Resolution)} + or {@link #SetDateResolution(String, DateTools.Resolution)}. The former + sets the default date resolution for all fields, whereas the latter can + be used to set field specific date resolutions. Field specific date + resolutions take, if set, precedence over the default date resolution. +

+

+ If you use neither {@link DateField} nor {@link DateTools} in your + index, you can create your own + query parser that inherits QueryParser and overwrites + {@link #GetRangeQuery(String, String, String, boolean)} to + use a different method for date conversion. +

+ +

Note that QueryParser is not thread-safe.

+ +

NOTE: there is a new QueryParser in contrib, which matches + the same syntax as this class, but is more modular, + enabling substantial customization to how a query is created. + +

NOTE: there is a new QueryParser in contrib, which matches + the same syntax as this class, but is more modular, + enabling substantial customization to how a query is created. + NOTE: You must specify the required {@link Version} compatibility when + creating QueryParser: +

    +
  • As of 2.9, {@link #SetEnablePositionIncrements} is true by default.
+
+
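+ A minimal Java-flavored sketch of parsing a query string (analyzer, field name and query text are illustrative; parse declares a checked ParseException):
+            QueryParser parser = new QueryParser(Version.LUCENE_29, "contents",
+                                                 new StandardAnalyzer(Version.LUCENE_29));
+            Query q = parser.parse("+lucene -jakarta title:search");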
+ + Alternative form of QueryParser.Operator.AND + + + Alternative form of QueryParser.Operator.OR + + + The actual operator that parser uses to combine query terms + + + Constructs a query parser. + the default field for query terms. + + used to find terms in the query text. + + Use {@link #QueryParser(Version, String, Analyzer)} instead + + + + Constructs a query parser. + + + Lucene version to match. See above) + + the default field for query terms. + + used to find terms in the query text. + + + + Parses a query string, returning a {@link Lucene.Net.Search.Query}. + the query string to be parsed. + + ParseException if the parsing fails + + + Returns the analyzer. + + + + Returns the field. + + + + Get the minimal similarity for fuzzy queries. + + + Set the minimum similarity for fuzzy queries. + Default is 0.5f. + + + + Get the prefix length for fuzzy queries. + Returns the fuzzyPrefixLength. + + + + Set the prefix length for fuzzy queries. Default is 0. + The fuzzyPrefixLength to set. + + + + Sets the default slop for phrases. If zero, then exact phrase matches + are required. Default value is zero. + + + + Gets the default slop for phrases. + + + Set to true to allow leading wildcard characters. +

+ When set, * or ? are allowed as + the first character of a PrefixQuery and WildcardQuery. + Note that this can produce very slow + queries on big indexes. +

+ Default: false. +

+
+ + + + + + Set to true to enable position increments in result query. +

+ When set, result phrase and multi-phrase queries will + be aware of position increments. + Useful when e.g. a StopFilter increases the position increment of + the token that follows an omitted token. +

+ Default: false. +

+
+ + + + + + Sets the boolean operator of the QueryParser. + In default mode (OR_OPERATOR) terms without any modifiers + are considered optional: for example capital of Hungary is equal to + capital OR of OR Hungary.
+ In AND_OPERATOR mode terms are considered to be in conjunction: the + above mentioned query is parsed as capital AND of AND Hungary +
+
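+ For example (Java-flavored sketch; assumes a QueryParser instance parser):
+            parser.setDefaultOperator(QueryParser.AND_OPERATOR);
+            // "capital of Hungary" now parses as: capital AND of AND Hungary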
+ + Gets implicit operator setting, which will be either AND_OPERATOR + or OR_OPERATOR. + + + + Whether terms of wildcard, prefix, fuzzy and range queries are to be automatically + lower-cased or not. Default is true. + + + + + + + + Please use {@link #setMultiTermRewriteMethod} instead. + + + + Please use {@link #getMultiTermRewriteMethod} instead. + + + + By default QueryParser uses {@link MultiTermQuery#CONSTANT_SCORE_AUTO_REWRITE_DEFAULT} + when creating a PrefixQuery, WildcardQuery or RangeQuery. This implementation is generally preferable because it + a) Runs faster b) Does not have the scarcity of terms unduly influence score + c) avoids any "TooManyBooleanClauses" exception. + However, if your application really needs to use the + old-fashioned BooleanQuery expansion rewriting and the above + points are not relevant then use this to change + the rewrite method. + + + + + + + + Set locale used by date range parsing. + + + Returns current locale, allowing access by subclasses. + + + Sets the default date resolution used by RangeQueries for fields for which no + specific date resolutions has been set. Field specific resolutions can be set + with {@link #SetDateResolution(String, DateTools.Resolution)}. + + + the default date resolution to set + + + + Sets the date resolution used by RangeQueries for a specific field. + + + field for which the date resolution is to be set + + date resolution to set + + + + Returns the date resolution that is used by RangeQueries for the given field. + Returns null, if no default or field specific date resolution has been set + for the given field. + + + + + Sets the collator used to determine index term inclusion in ranges + for RangeQuerys. +

+ WARNING: Setting the rangeCollator to a non-null + collator using this method will cause every single index Term in the + Field referenced by lowerTerm and/or upperTerm to be examined. + Depending on the number of index Terms in this Field, the operation could + be very slow. + +

+ the collator to use when constructing RangeQuerys + +
+ + the collator used to determine index term inclusion in ranges + for RangeQuerys. + + + + use {@link #AddClause(List, int, int, Query)} instead. + + + + throw in overridden method to disallow + + + + Base implementation delegates to {@link #GetFieldQuery(String,String)}. + This method may be overridden, for example, to return + a SpanNearQuery instead of a PhraseQuery. + + + throw in overridden method to disallow + + + + throw in overridden method to disallow + + + + Builds a new BooleanQuery instance + disable coord + + new BooleanQuery instance + + + + Builds a new BooleanClause instance + sub query + + how this clause should occur when matching documents + + new BooleanClause instance + + + + Builds a new TermQuery instance + term + + new TermQuery instance + + + + Builds a new PhraseQuery instance + new PhraseQuery instance + + + + Builds a new MultiPhraseQuery instance + new MultiPhraseQuery instance + + + + Builds a new PrefixQuery instance + Prefix term + + new PrefixQuery instance + + + + Builds a new FuzzyQuery instance + Term + + minimum similarity + + prefix length + + new FuzzyQuery Instance + + + + Builds a new TermRangeQuery instance + Field + + min + + max + + true if range is inclusive + + new TermRangeQuery instance + + + + Builds a new MatchAllDocsQuery instance + new MatchAllDocsQuery instance + + + + Builds a new WildcardQuery instance + wildcard term + + new WildcardQuery instance + + + + Factory method for generating query, given a set of clauses. + By default creates a boolean query composed of clauses passed in. + + Can be overridden by extending classes, to modify query being + returned. + + + List that contains {@link BooleanClause} instances + to join. + + + Resulting {@link Query} object. + + throw in overridden method to disallow + + use {@link #GetBooleanQuery(List)} instead + + + + Factory method for generating query, given a set of clauses. + By default creates a boolean query composed of clauses passed in. + + Can be overridden by extending classes, to modify query being + returned. + + + List that contains {@link BooleanClause} instances + to join. + + + Resulting {@link Query} object. + + throw in overridden method to disallow + + + + Factory method for generating query, given a set of clauses. + By default creates a boolean query composed of clauses passed in. + + Can be overridden by extending classes, to modify query being + returned. + + + List that contains {@link BooleanClause} instances + to join. + + true if coord scoring should be disabled. + + + Resulting {@link Query} object. + + throw in overridden method to disallow + + use {@link #GetBooleanQuery(List, boolean)} instead + + + + Factory method for generating query, given a set of clauses. + By default creates a boolean query composed of clauses passed in. + + Can be overridden by extending classes, to modify query being + returned. + + + List that contains {@link BooleanClause} instances + to join. + + true if coord scoring should be disabled. + + + Resulting {@link Query} object. + + throw in overridden method to disallow + + + + Factory method for generating a query. Called when parser + parses an input term token that contains one or more wildcard + characters (? and *), but is not a prefix term token (one + that has just a single * character at the end) +

+ Depending on settings, prefix term may be lower-cased + automatically. It will not go through the default Analyzer, + however, since normal Analyzers are unlikely to work properly + with wildcard templates. +

+ Can be overridden by extending classes, to provide custom handling for + wildcard queries, which may be necessary due to missing analyzer calls. + +

+ Name of the field query will use. + + Term token that contains one or more wild card + characters (? or *), but is not simple prefix term + + + Resulting {@link Query} built for the term + + throw in overridden method to disallow + +
+ + Factory method for generating a query (similar to + {@link #getWildcardQuery}). Called when parser parses an input term + token that uses prefix notation; that is, contains a single '*' wildcard + character as its last character. Since this is a special case + of generic wildcard term, and such a query can be optimized easily, + this usually results in a different query object. +

+ Depending on settings, a prefix term may be lower-cased + automatically. It will not go through the default Analyzer, + however, since normal Analyzers are unlikely to work properly + with wildcard templates. +

+ Can be overridden by extending classes, to provide custom handling for + wild card queries, which may be necessary due to missing analyzer calls. + +

+ Name of the field query will use. + + Term token to use for building term for the query + (without trailing '*' character!) + + + Resulting {@link Query} built for the term + + throw in overridden method to disallow + +
+ + Factory method for generating a query (similar to + {@link #getWildcardQuery}). Called when parser parses + an input term token that has the fuzzy suffix (~) appended. + + + Name of the field query will use. + + Term token to use for building term for the query + + + Resulting {@link Query} built for the term + + throw in overridden method to disallow + + + + Returns a String where the escape char has been + removed, or kept only once if there was a double escape. + + Supports escaped unicode characters, e. g. translates + \\u0041 to A. + + + + + Returns the numeric value of the hexadecimal character + + + Returns a String where those characters that QueryParser + expects to be escaped are escaped by a preceding \. + + + + Command line tool to test QueryParser, using {@link Lucene.Net.Analysis.SimpleAnalyzer}. + Usage:
+ java Lucene.Net.QueryParsers.QueryParser <input> +
+
+ + Generated Token Manager. + + + Current token. + + + Next token. + + + Constructor with user supplied CharStream. + + + Reinitialise. + + + Constructor with generated Token Manager. + + + Reinitialise. + + + Get the next Token. + + + Get the specific Token. + + + Generate ParseException. + + + Enable tracing. + + + Disable tracing. + + + The default operator for parsing queries. + Use {@link QueryParser#setDefaultOperator} to change it. + + + + This interface describes a character stream that maintains line and + column number positions of the characters. It also has the capability + to backup the stream to some extent. An implementation of this + interface is used in the TokenManager implementation generated by + JavaCCParser. + + All the methods except backup can be implemented in any fashion. backup + needs to be implemented correctly for the correct operation of the lexer. + Rest of the methods are all used to get information like line number, + column number and the String that constitutes a token and are not used + by the lexer. Hence their implementation won't affect the generated lexer's + operation. + + + + Returns the next character from the selected input. The method + of selecting the input is the responsibility of the class + implementing this interface. Can throw any java.io.IOException. + + + + Returns the column position of the character last read. + + + + + + + Returns the line number of the character last read. + + + + + + + Returns the column number of the last character for current token (being + matched after the last call to BeginTOken). + + + + Returns the line number of the last character for current token (being + matched after the last call to BeginTOken). + + + + Returns the column number of the first character for current token (being + matched after the last call to BeginTOken). + + + + Returns the line number of the first character for current token (being + matched after the last call to BeginTOken). + + + + Backs up the input stream by amount steps. Lexer calls this method if it + had already read some characters, but could not use them to match a + (longer) token. So, they will be used again as the prefix of the next + token and it is the implemetation's responsibility to do this right. + + + + Returns the next character that marks the beginning of the next token. + All characters must remain in the buffer between two successive calls + to this method to implement backup correctly. + + + + Returns a string made up of characters from the marked token beginning + to the current buffer position. Implementations have the choice of returning + anything that they want to. For example, for efficiency, one might decide + to just return null, which is a valid implementation. + + + + Returns an array of characters that make up the suffix of length 'len' for + the currently matched token. This is used to build up the matched string + for use in actions in the case of MORE. A simple and inefficient + implementation of this is as follows : + + { + String t = GetImage(); + return t.substring(t.length() - len, t.length()).toCharArray(); + } + + + + The lexer calls this function to indicate that it is done with the stream + and hence implementations can free any resources held by this class. + Again, the body of this function can be just empty and it will not + affect the lexer's operation. + + + + Convenience class for holding TermVector information. 
+ + + This class implements {@link InvertedDocConsumer}, which + is passed each token produced by the analyzer on each + field. It stores these tokens in a hash table, and + allocates separate byte streams per token. Consumers of + this class, eg {@link FreqProxTermsWriter} and {@link + TermVectorsTermsWriter}, write their own byte streams + under each term. + + + + Add a new thread + + + Abort (called after hitting AbortException) + + + Flush a new segment + + + Close doc stores + + + Attempt to free RAM, returning true if any RAM was + freed + + + + $Id + +

NOTE: This API is new and still experimental + (subject to change suddenly in the next release)

+

+
+ + The class which implements SegmentReader. + + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + + + + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + + + + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + + + Clones the norm bytes. May be overridden by subclasses. New and experimental. + Byte array to clone + + New BitVector + + + + Clones the deleteDocs BitVector. May be overridden by subclasses. New and experimental. + BitVector to clone + + New BitVector + + + + + + + + + + + + Read norms into a pre-allocated array. + + + Create a clone from the initial TermVectorsReader and store it in the ThreadLocal. + TermVectorsReader + + + + Return a term frequency vector for the specified document and field. The + vector returned contains term numbers and frequencies for all terms in + the specified field of this document, if the field had storeTermVector + flag set. If the flag was not set, the method returns null. + + IOException + + + Return an array of term frequency vectors for the specified document. + The array contains a vector for each vectorized field in the document. + Each vector vector contains term numbers and frequencies for all terms + in a given vectorized field. + If no such fields existed, the method returns null. + + IOException + + + Return the name of the segment this reader is reading. + + + Return the SegmentInfo of the segment this reader is reading. + + + Returns the directory this index resides in. + + + Lotsa tests did hacks like:
+ SegmentReader reader = (SegmentReader) IndexReader.open(dir);
+ They broke. This method serves as a hack to keep hacks working +
+
+ + Sets the initial value + + + Java's builtin ThreadLocal has a serious flaw: + it can take an arbitrarily long amount of time to + dereference the things you had stored in it, even once the + ThreadLocal instance itself is no longer referenced. + This is because there is single, master map stored for + each thread, which all ThreadLocals share, and that + master map only periodically purges "stale" entries. + + While not technically a memory leak, because eventually + the memory will be reclaimed, it can take a long time + and you can easily hit OutOfMemoryError because from the + GC's standpoint the stale entries are not reclaimaible. + + This class works around that, by only enrolling + WeakReference values into the ThreadLocal, and + separately holding a hard reference to each stored + value. When you call {@link #close}, these hard + references are cleared and then GC is freely able to + reclaim space by objects stored in it. + + + + Byte[] referencing is used because a new norm object needs + to be created for each clone, and the byte array is all + that is needed for sharing between cloned readers. The + current norm referencing is for sharing between readers + whereas the byte[] referencing is for copy on write which + is independent of reader references (i.e. incRef, decRef). + + + + Loader for text files that represent a list of stopwords. + + + + $Id: WordlistLoader.java 706342 2008-10-20 17:19:29Z gsingers $ + + + + Loads a text file and adds every line as an entry to a HashSet (omitting + leading and trailing whitespace). Every line of the file should contain only + one word. The words need to be in lowercase if you make use of an + Analyzer which uses LowerCaseFilter (like StandardAnalyzer). + + + File containing the wordlist + + A HashSet with the file's words + + + + Loads a text file and adds every non-comment line as an entry to a HashSet (omitting + leading and trailing whitespace). Every line of the file should contain only + one word. The words need to be in lowercase if you make use of an + Analyzer which uses LowerCaseFilter (like StandardAnalyzer). + + + File containing the wordlist + + The comment string to ignore + + A HashSet with the file's words + + + + Reads lines from a Reader and adds every line as an entry to a HashSet (omitting + leading and trailing whitespace). Every line of the Reader should contain only + one word. The words need to be in lowercase if you make use of an + Analyzer which uses LowerCaseFilter (like StandardAnalyzer). + + + Reader containing the wordlist + + A HashSet with the reader's words + + + + Reads lines from a Reader and adds every non-comment line as an entry to a HashSet (omitting + leading and trailing whitespace). Every line of the Reader should contain only + one word. The words need to be in lowercase if you make use of an + Analyzer which uses LowerCaseFilter (like StandardAnalyzer). + + + Reader containing the wordlist + + The string representing a comment. + + A HashSet with the reader's words + + + + Reads a stem dictionary. Each line contains: +
word\tstem
+ (i.e. two tab-separated words)
+ stem dictionary that overrules the stemming algorithm + + IOException +
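+ For illustration only, a minimal sketch of parsing the word\tstem format into a map
+ (wordstemfile is a placeholder variable; this is not the loader method itself):
+            HashMap stemDict = new HashMap();
+            BufferedReader in = new BufferedReader(new FileReader(wordstemfile));
+            for (String line = in.readLine(); line != null; line = in.readLine()) {
+              String[] pair = line.split("\t", 2);      // word TAB stem
+              if (pair.length == 2) stemDict.put(pair[0].trim(), pair[1].trim());
+            }
+            in.close();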
+ + The term text of a Token. + + + Base class for Attributes that can be added to a + {@link Lucene.Net.Util.AttributeSource}. +

+ Attributes are used to add data in a dynamic, yet type-safe way to a source of usually streamed objects, e.g. a {@link Lucene.Net.Analysis.TokenStream}.

+
+ + Clears the values in this AttributeImpl and resets it to its + default value. If this implementation implements more than one Attribute interface + it clears all. + + + + The default implementation of this method accesses all declared + fields of this object and prints the values in the following syntax: + +
+            public String toString() {
+              return "start=" + startOffset + ",end=" + endOffset;
+            }
+            
+ + This method may be overridden by subclasses. +
+
+ + Subclasses must implement this method and should compute + a hashCode similar to this: +
+            public int hashCode() {
+              int code = startOffset;
+              code = code * 31 + endOffset;
+              return code;
+            }
+            
+ + see also {@link #equals(Object)} +
+
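+ As a hedged illustration of keeping {@link #hashCode()} and equals consistent, a
+ hypothetical attribute holding a start/end offset pair (MyOffsetAttributeImpl is not a
+ class of this library):
+            public boolean equals(Object other) {
+              if (other == this) return true;
+              if (!(other instanceof MyOffsetAttributeImpl)) return false;
+              MyOffsetAttributeImpl o = (MyOffsetAttributeImpl) other;
+              return o.startOffset == startOffset && o.endOffset == endOffset;
+            }
+            public int hashCode() {
+              int code = startOffset;
+              code = code * 31 + endOffset;
+              return code;
+            }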
+ + All values used for computation of {@link #hashCode()} + should be checked here for equality. + + see also {@link Object#equals(Object)} + + + + Copies the values from this Attribute into the passed-in + target attribute. The target implementation must support all the + Attributes this implementation supports. + + + + Shallow clone. Subclasses must override this if they + need to clone any members deeply, + + The term text of a Token. @@ -18310,381 +17904,332 @@ the truncated length - - The payload of a Token. See also {@link Payload}. + + Returns the Token's term text. + + This method has a performance penalty + because the text is stored internally in a char[]. If + possible, use {@link #TermBuffer()} and {@link + #TermLength()} directly instead. If you really need a + String, use this method, which is nothing more than + a convenience call to new String(token.termBuffer(), 0, token.termLength()) + - - Initialize this attribute with no payload. + + Copies the contents of buffer, starting at offset for + length characters, into the termBuffer array. + + the buffer to copy + + the index in the buffer of the first character to copy + + the number of characters to copy + - - Initialize this attribute with the given payload. + + Copies the contents of buffer into the termBuffer array. + the buffer to copy + - - Returns this Token's payload. + + Copies the contents of buffer, starting at offset and continuing + for length characters, into the termBuffer array. + + the buffer to copy + + the index in the buffer of the first character to copy + + the number of characters to copy + - - Sets this Token's payload. + + Returns the internal termBuffer character array which + you can then directly alter. If the array is too + small for your token, use {@link + #ResizeTermBuffer(int)} to increase it. After + altering the buffer be sure to call {@link + #setTermLength} to record the number of valid + characters that were placed into the termBuffer. + - - LowerCaseTokenizer performs the function of LetterTokenizer - and LowerCaseFilter together. It divides text at non-letters and converts - them to lower case. While it is functionally equivalent to the combination - of LetterTokenizer and LowerCaseFilter, there is a performance advantage - to doing the two tasks at once, hence this (redundant) implementation. -

+ +

Grows the termBuffer to at least size newSize, preserving the + existing content. Note: If the next operation is to change + the contents of the term buffer use + {@link #SetTermBuffer(char[], int, int)}, + {@link #SetTermBuffer(String)}, or + {@link #SetTermBuffer(String, int, int)} + to optimally combine the resize with the setting of the termBuffer. + + minimum size of the new termBuffer + + newly created termBuffer with length >= newSize + + + + Allocates a buffer char[] of at least newSize, without preserving the existing content. + its always used in places that set the content + + minimum size of the buffer + + + + Return number of valid characters (length of the term) + in the termBuffer array. + + + + Set number of valid characters (length of the term) in + the termBuffer array. Use this to truncate the termBuffer + or to synchronize with external manipulation of the termBuffer. + Note: to grow the size of the array, + use {@link #ResizeTermBuffer(int)} first. + + the truncated length + + + + A LetterTokenizer is a tokenizer that divides text at non-letters. That's + to say, it defines tokens as maximal strings of adjacent letters, as defined + by java.lang.Character.isLetter() predicate. Note: this does a decent job for most European languages, but does a terrible - job for some Asian languages, where words are not separated by spaces. + job for some Asian languages, where words are not separated by spaces. - - Construct a new LowerCaseTokenizer. + + Construct a new LetterTokenizer. - - Construct a new LowerCaseTokenizer using a given {@link AttributeSource}. + + Construct a new LetterTokenizer using a given {@link AttributeSource}. - - Construct a new LowerCaseTokenizer using a given {@link Lucene.Net.Util.AttributeSource.AttributeFactory}. + + Construct a new LetterTokenizer using a given {@link Lucene.Net.Util.AttributeSource.AttributeFactory}. - - Converts char to lower case - {@link Character#toLowerCase(char)}. + + Collects only characters which satisfy + {@link Character#isLetter(char)}. - - Removes words that are too long or too short from the stream. - + + + The maximum number of items to cache. + + + + + The list to efficiently maintain the LRU state. + + + + + The dictionary to hash into any location in the list. + + + + + The node instance to use/re-use when adding an item to the cache. + + + + + Container to hold the key and value to aid in removal from + the dictionary when an item is removed from cache. + + + + This exception is thrown when there is an attempt to + access something that has already been closed. + + + + Subclass of FilteredTermEnum for enumerating all terms that match the + specified wildcard filter term. +

+ Term enumerations are always ordered by Term.compareTo(). Each term in + the enumeration is greater than all that precede it.

- $Id: LengthFilter.java 807201 2009-08-24 13:22:34Z markrmiller $ + $Id: WildcardTermEnum.java 783371 2009-06-10 14:39:56Z mikemccand $
- - Build a filter that removes words that are too long or too - short from the text. + + ***************************************** + String equality with support for wildcards + ****************************************** - - Returns the next input Token whose term() is the right len - - - This is a helper class to generate prefix-encoded representations for numerical values - and supplies converters to represent float/double values as sortable integers/longs. - -

To quickly execute range queries in Apache Lucene, a range is divided recursively - into multiple intervals for searching: The center of the range is searched only with - the lowest possible precision in the trie, while the boundaries are matched - more exactly. This reduces the number of terms dramatically. - -

This class generates terms to achieve this: First, the numerical integer values need to be converted to strings. For that, integer values (32 bit or 64 bit) are made unsigned and the bits are converted to ASCII chars, 7 bits per char. The resulting string is sortable like the original integer value. Each value is also prefixed (in the first char) by the shift value (number of bits removed) used during encoding.

To also index floating point numbers, this class supplies two methods to convert them to integer values by changing their bit layout: {@link #doubleToSortableLong}, {@link #floatToSortableInt}. You will have no precision loss by converting floating point numbers to integers and back (only that the integer form is not usable). Other data types like dates can easily be converted to longs or ints (e.g. date to long: {@link java.util.Date#getTime}).

For easy usage, the trie algorithm is implemented for indexing inside - {@link NumericTokenStream} that can index int, long, - float, and double. For querying, - {@link NumericRangeQuery} and {@link NumericRangeFilter} implement the query part - for the same data types. - -

This class can also be used to generate lexicographically sortable (according to {@link String#compareTo(String)}) representations of numeric data types for other usages (e.g. sorting).

NOTE: This API is experimental and - might change in incompatible ways in the next release. - -
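+ A small sketch, using only the conversion methods named above, of round-tripping a
+ double through its sortable and prefix-coded forms (variable names are illustrative):
+            double price = 42.5d;
+            long sortable = NumericUtils.doubleToSortableLong(price);
+            String prefixCoded = NumericUtils.longToPrefixCoded(sortable);    // full precision
+            double restored = NumericUtils.sortableLongToDouble(
+                NumericUtils.prefixCodedToLong(prefixCoded));                 // == 42.5 again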

- 2.9 - -
- - The default precision step used by {@link NumericField}, {@link NumericTokenStream}, - {@link NumericRangeQuery}, and {@link NumericRangeFilter} as default + + Creates a new WildcardTermEnum. +

+ After calling the constructor the enumeration is already pointing to the first + valid term if such a term exists.
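+ A hedged usage sketch (reader is assumed to be an open IndexReader; the field name and
+ wildcard pattern are illustrative):
+            WildcardTermEnum wildcards = new WildcardTermEnum(reader, new Term("name", "lu*ene"));
+            try {
+              // the constructor already positioned the enum on the first matching term
+              for (Term t = wildcards.term(); t != null; t = wildcards.term()) {
+                System.out.println(t.text());
+                if (!wildcards.next()) break;
+              }
+            } finally {
+              wildcards.close();
+            }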

- - Expert: The maximum term length (used for char[] buffer size) - for encoding long values. + + Determines if a word matches a wildcard pattern. + Work released by Granta Design Ltd after originally being done on + company time. - + + + Represents hits returned by {@link + * Searcher#search(Query,Filter,int)} and {@link + * Searcher#search(Query,int) + + + + The total number of hits for the query. + - - Expert: The maximum term length (used for char[] buffer size) - for encoding int values. + + The top hits for the query. + + + Stores the maximum score value encountered, needed for normalizing. + + + Returns the maximum score value encountered. Note that in case + scores are not tracked, this returns {@link Float#NaN}. - + + + Sets the maximum score value encountered. + + + Constructs a TopDocs with a default maxScore=Float.NaN. + + + + + + The {@link TimeLimitingCollector} is used to timeout search requests that + take longer than the maximum allowed search time limit. After this time is + exceeded, the search thread is stopped by throwing a + {@link TimeExceededException}. + + + + Default timer resolution. + - - Expert: Longs are stored at lower precision by shifting off lower bits. The shift count is - stored as SHIFT_START_LONG+shift in the first character - - - - Expert: Integers are stored at lower precision by shifting off lower bits. The shift count is - stored as SHIFT_START_INT+shift in the first character - - - - Expert: Returns prefix coded bits after reducing the precision by shift bits. - This is method is used by {@link NumericTokenStream}. - - the numeric value - - how many bits to strip from the right - - that will contain the encoded chars, must be at least of {@link #BUF_SIZE_LONG} - length - - number of chars written to buffer - - - - Expert: Returns prefix coded bits after reducing the precision by shift bits. - This is method is used by {@link LongRangeBuilder}. - - the numeric value - - how many bits to strip from the right - - - - This is a convenience method, that returns prefix coded bits of a long without - reducing the precision. It can be used to store the full precision value as a - stored field in index. -

To decode, use {@link #prefixCodedToLong}. -

-
- - Expert: Returns prefix coded bits after reducing the precision by shift bits. - This is method is used by {@link NumericTokenStream}. - - the numeric value - - how many bits to strip from the right - - that will contain the encoded chars, must be at least of {@link #BUF_SIZE_INT} - length - - number of chars written to buffer - - - - Expert: Returns prefix coded bits after reducing the precision by shift bits. - This is method is used by {@link IntRangeBuilder}. - - the numeric value - - how many bits to strip from the right - - - - This is a convenience method, that returns prefix coded bits of an int without - reducing the precision. It can be used to store the full precision value as a - stored field in index. -

To decode, use {@link #prefixCodedToInt}. -

-
- - Returns a long from prefixCoded characters. - Rightmost bits will be zero for lower precision codes. - This method can be used to decode e.g. a stored field. - - NumberFormatException if the supplied string is - not correctly prefix encoded. - - + + Default for {@link #IsGreedy()}. + - - Returns an int from prefixCoded characters. - Rightmost bits will be zero for lower precision codes. - This method can be used to decode e.g. a stored field. - - NumberFormatException if the supplied string is - not correctly prefix encoded. - - - - - - Converts a double value to a sortable signed long. - The value is converted by getting their IEEE 754 floating-point "double format" - bit layout and then some bits are swapped, to be able to compare the result as long. - By this the precision is not reduced, but the value can easily used as a long. - - - - - - Convenience method: this just returns: - longToPrefixCoded(doubleToSortableLong(val)) - - - - Converts a sortable long back to a double. - - - - - Convenience method: this just returns: - sortableLongToDouble(prefixCodedToLong(val)) - - - - Converts a float value to a sortable signed int. - The value is converted by getting their IEEE 754 floating-point "float format" - bit layout and then some bits are swapped, to be able to compare the result as int. - By this the precision is not reduced, but the value can easily used as an int. - - - - - - Convenience method: this just returns: - intToPrefixCoded(floatToSortableInt(val)) - - - - Converts a sortable int back to a float. - - - - - Convenience method: this just returns: - sortableIntToFloat(prefixCodedToInt(val)) - - - - Expert: Splits a long range recursively. - You may implement a builder that adds clauses to a - {@link Lucene.Net.Search.BooleanQuery} for each call to its - {@link LongRangeBuilder#AddRange(String,String)} - method. -

This method is used by {@link NumericRangeQuery}. -

-
- - Expert: Splits an int range recursively. - You may implement a builder that adds clauses to a - {@link Lucene.Net.Search.BooleanQuery} for each call to its - {@link IntRangeBuilder#AddRange(String,String)} - method. -

This method is used by {@link NumericRangeQuery}. -

-
- - This helper does the splitting for both 32 and 64 bit. - - - Helper that delegates to correct range builder - - - Expert: Callback for {@link #splitLongRange}. - You need to overwrite only one of the methods. -

NOTE: This is a very low-level interface, - the method signatures may change in later versions. -

-
- - Overwrite this method, if you like to receive the already prefix encoded range bounds. - You can directly build classical (inclusive) range queries from them. - - - - Overwrite this method, if you like to receive the raw long range bounds. - You can use this for e.g. debugging purposes (print out range bounds). - - - - Expert: Callback for {@link #splitIntRange}. - You need to overwrite only one of the methods. -

NOTE: This is a very low-level interface, - the method signatures may change in later versions. -

-
- - Overwrite this method, if you like to receive the already prefix encoded range bounds. - You can directly build classical range (inclusive) queries from them. - - - - Overwrite this method, if you like to receive the raw int range bounds. - You can use this for e.g. debugging purposes (print out range bounds). - - - - An average, best guess, MemoryModel that should work okay on most systems. - - - - - A Scorer for queries with a required subscorer - and an excluding (prohibited) sub DocIdSetIterator. -
- This Scorer implements {@link Scorer#SkipTo(int)}, - and it uses the skipTo() on the given scorers. -
-
- - Construct a ReqExclScorer. - The scorer that must match, except where + + Create a TimeLimitedCollector wrapper over another {@link Collector} with a specified timeout. + the wrapped {@link Collector} - indicates exclusion. + max time allowed for collecting hits after which {@link TimeExceededException} is thrown - - use {@link #NextDoc()} instead. - + + Return the timer resolution. + + - - Advance to non excluded doc. -
On entry: + + Set the timer resolution. + The default timer resolution is 20 milliseconds. + This means that a search required to take no longer than + 800 milliseconds may be stopped after 780 to 820 milliseconds. +
Note that:
    -
  • reqScorer != null,
  • -
  • exclScorer != null,
  • -
  • reqScorer was advanced once via next() or skipTo() - and reqScorer.doc() may still be excluded.
  • -
- Advances reqScorer a non excluded required doc, if any. +
  • Finer (smaller) resolution is more accurate but less efficient.
  • +
  • Setting resolution to less than 5 milliseconds will be silently modified to 5 milliseconds.
  • +
  • Setting resolution smaller than current resolution might take effect only after current resolution. (Assume current resolution of 20 milliseconds is modified to 5 milliseconds, then it can take up to 20 milliseconds for the change to take effect.)
  • +
    - true iff there is a non excluded required doc. -
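+ A hedged sketch of the wrapper described above (searcher, query and myCollector are
+ assumed to exist; the one-second limit is illustrative):
+            TimeLimitingCollector.setResolution(20);                  // default-sized timer ticks
+            Collector limited = new TimeLimitingCollector(myCollector, 1000);
+            try {
+              searcher.search(query, limited);
+            } catch (TimeLimitingCollector.TimeExceededException e) {
+              // hits collected so far are still in myCollector; e reports allowed/elapsed time
+            }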
    - - use {@link #DocID()} instead. - - - - Returns the score of the current document matching the query. - Initially invalid, until {@link #Next()} is called the first time. + + Checks if this time limited collector is greedy in collecting the last hit. + A non greedy collector, upon a timeout, would throw a {@link TimeExceededException} + without allowing the wrapped collector to collect current doc. A greedy one would + first allow the wrapped hit collector to collect current doc and only then + throw a {@link TimeExceededException}. - The score of the required scorer. - + + - - use {@link #Advance(int)} instead. - + + Sets whether this time limited collector is greedy. + true to make this time limited greedy + + + - - Wrapper for ({@link HitCollector}) implementations, which simply re-bases the - incoming docID before calling {@link HitCollector#collect}. + + Calls {@link Collector#Collect(int)} on the decorated {@link Collector} + unless the allowed time has passed, in which case it throws an exception. - Please migrate custom HitCollectors to the new {@link Collector} - class. This class will be removed when {@link HitCollector} is - removed. + TimeExceededException + if the time allowed has exceeded. + + + + TimerThread provides a pseudo-clock service to all searching + threads, so that they can count elapsed time with less overhead + than repeatedly calling System.currentTimeMillis. A single + thread should be created to be used for all searches. + + + + Get the timer value in milliseconds. + + + Thrown when elapsed search time exceeds allowed search time. + + + Returns allowed time (milliseconds). + + + Returns elapsed time (milliseconds). + + + Returns last doc that was collected when the search time exceeded. + + + Removes matches which overlap with another SpanQuery. + + + Construct a SpanNotQuery matching spans from include which + have no overlap with spans from exclude. + + + + Return the SpanQuery whose matches are filtered. + + + Return the SpanQuery whose matches must not overlap those returned. + + + Returns a collection of all terms matched by this query. + use extractTerms instead + + + + + Returns true iff o is equal to this. + + + Expert: Delegating scoring implementation. Useful in {@link + Query#GetSimilarity(Searcher)} implementations, to override only certain + methods of a Searcher's Similiarty implementation.. + + + + Construct a {@link Similarity} that delegates all methods to another. + + + the Similarity implementation to delegate to + Expert: a FieldComparator compares hits so as to determine their @@ -18889,3057 +18434,6 @@ very fast for very small results sets. - - This stores a monotonically increasing set of <Term, TermInfo> pairs in a - Directory. A TermInfos can be written once, in order. - - - - The file format version, a negative number. - - - Expert: The fraction of terms in the "dictionary" which should be stored - in RAM. Smaller values use more memory, but make searching slightly - faster, while larger values use less memory and make searching slightly - slower. Searching is typically not dominated by dictionary lookup, so - tweaking this is rarely useful. - - - - Expert: The fraction of {@link TermDocs} entries stored in skip tables, - used to accellerate {@link TermDocs#SkipTo(int)}. Larger values result in - smaller indexes, greater acceleration, but fewer accelerable cases, while - smaller values result in bigger indexes, less acceleration and more - accelerable cases. More detailed experiments would be useful here. 
- - - - Expert: The maximum number of skip levels. Smaller values result in - slightly smaller indexes, but slower skipping in big posting lists. - - - - Adds a new <fieldNumber, termBytes>, TermInfo> pair to the set. - Term must be lexicographically greater than all previous Terms added. - TermInfo pointers must be positive and greater than all previous. - - - - Called to complete TermInfos creation. - - - This stores a monotonically increasing set of <Term, TermInfo> pairs in a - Directory. Pairs are accessed either by Term or by ordinal position the - set. - - - - Returns the number of term/value pairs in the set. - - - Returns the offset of the greatest index entry which is less than or equal to term. - - - Returns the TermInfo for a Term in the set, or null. - - - Returns the TermInfo for a Term in the set, or null. - - - Returns the position of a Term in the set or -1. - - - Returns an enumeration of all the Terms and TermInfos in the set. - - - Returns an enumeration of terms starting at or after the named term. - - - Per-thread resources managed by ThreadLocal - - - Store a sorted collection of {@link Lucene.Net.Index.TermVectorEntry}s. Collects all term information - into a single, SortedSet. -
    - NOTE: This Mapper ignores all Field information for the Document. This means that if you are using offset/positions you will not - know what Fields they correlate with. -
    - This is not thread-safe -
    -
    - - Stand-in name for the field in {@link TermVectorEntry}. - - - - A Comparator for sorting {@link TermVectorEntry}s - - - - - The term to map - - The frequency of the term - - Offset information, may be null - - Position information, may be null - - - - The TermVectorEntrySet. A SortedSet of {@link TermVectorEntry} objects. Sort is by the comparator passed into the constructor. -
    - This set will be empty until after the mapping process takes place. - -
    - The SortedSet of {@link TermVectorEntry}. - -
    - - Increments the enumeration to the next element. True if one exists. - - - Optimized scan, without allocating new terms. - Return number of invocations to next(). - - - - Returns the current Term in the enumeration. - Initially invalid, valid after next() called for the first time. - - - - Returns the previous Term enumerated. Initially null. - - - Returns the current TermInfo in the enumeration. - Initially invalid, valid after next() called for the first time. - - - - Sets the argument to the current TermInfo in the enumeration. - Initially invalid, valid after next() called for the first time. - - - - Returns the docFreq from the current TermInfo in the enumeration. - Initially invalid, valid after next() called for the first time. - - - - Closes the enumeration to further activity, freeing resources. - - - An IndexReader which reads multiple, parallel indexes. Each index added - must have the same number of documents, but typically each contains - different fields. Each document contains the union of the fields of all - documents with the same document number. When searching, matches for a - query term are from the first index added that has the field. - -

    This is useful, e.g., with collections that have large fields which - change rarely and small fields that change more frequently. The smaller - fields may be re-indexed in a new index and both indexes may be searched - together. - -

    Warning: It is up to you to make sure all indexes - are created and modified the same way. For example, if you add - documents to one index, you need to add the same documents in the - same order to the other indexes. Failure to do so will result in - undefined behavior. -

    -
    - - Construct a ParallelReader. -

    Note that all subreaders are closed if this ParallelReader is closed.

    -

    -
    - - Construct a ParallelReader. - indicates whether the subreaders should be closed - when this ParallelReader is closed - - - - Add an IndexReader. - IOException if there is a low-level IO error - - - Add an IndexReader whose stored fields will not be returned. This can - accellerate search when stored fields are only needed from a subset of - the IndexReaders. - - - IllegalArgumentException if not all indexes contain the same number - of documents - - IllegalArgumentException if not all indexes have the same value - of {@link IndexReader#MaxDoc()} - - IOException if there is a low-level IO error - - - Tries to reopen the subreaders. -
    - If one or more subreaders could be re-opened (i. e. subReader.reopen() - returned a new instance != subReader), then a new ParallelReader instance - is returned, otherwise this instance is returned. -

    - A re-opened instance might share one or more subreaders with the old - instance. Index modification operations result in undefined behavior - when performed before the old instance is closed. - (see {@link IndexReader#Reopen()}). -

    - If subreaders are shared, then the reference count of those - readers is increased to ensure that the subreaders remain open - until the last referring reader is closed. - -

    - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error -
    - - Checks recursively if all subreaders are up to date. - - - Checks recursively if all subindexes are optimized - - - Not implemented. - UnsupportedOperationException - - - - - - - Remaps docIDs after a merge has completed, where the - merged segments had at least one deletion. This is used - to renumber the buffered deletes in IndexWriter when a - merge of segments with deletions commits. - - - -

    This class implements a {@link MergePolicy} that tries - to merge segments into levels of exponentially - increasing size, where each level has fewer segments than - the value of the merge factor. Whenever extra segments - (beyond the merge factor upper bound) are encountered, - all segments within the level are merged. You can get or - set the merge factor using {@link #GetMergeFactor()} and - {@link #SetMergeFactor(int)} respectively.

    - -

    This class is abstract and requires a subclass to - define the {@link #size} method which specifies how a - segment's size is determined. {@link LogDocMergePolicy} - is one subclass that measures size by document count in - the segment. {@link LogByteSizeMergePolicy} is another - subclass that measures size as the total byte size of the - file(s) for the segment.

    -

    -
    - -

    Expert: a MergePolicy determines the sequence of - primitive merge operations to be used for overall merge - and optimize operations.

    - -

    Whenever the segments in an index have been altered by - {@link IndexWriter}, either the addition of a newly - flushed segment, addition of many segments from - addIndexes* calls, or a previous merge that may now need - to cascade, {@link IndexWriter} invokes {@link - #findMerges} to give the MergePolicy a chance to pick - merges that are now required. This method returns a - {@link MergeSpecification} instance describing the set of - merges that should be done, or null if no merges are - necessary. When IndexWriter.optimize is called, it calls - {@link #findMergesForOptimize} and the MergePolicy should - then return the necessary merges.

    - -

    Note that the policy can return more than one merge at - a time. In this case, if the writer is using {@link - SerialMergeScheduler}, the merges will be run - sequentially but if it is using {@link - ConcurrentMergeScheduler} they will be run concurrently.

    - -

    The default MergePolicy is {@link - LogByteSizeMergePolicy}.

    - -

    NOTE: This API is new and still experimental - (subject to change suddenly in the next release)

    - -

    NOTE: This class typically requires access to - package-private APIs (e.g. SegmentInfos) to do its job; - if you implement your own MergePolicy, you'll need to put - it in package Lucene.Net.Index in order to use - these APIs. -

    -
    - - Determine what set of merge operations are now necessary on the index. - {@link IndexWriter} calls this whenever there is a change to the segments. - This call is always synchronized on the {@link IndexWriter} instance so - only one thread at a time will call this method. - - - the total set of segments in the index - - - - Determine what set of merge operations is necessary in order to optimize - the index. {@link IndexWriter} calls this when its - {@link IndexWriter#Optimize()} method is called. This call is always - synchronized on the {@link IndexWriter} instance so only one thread at a - time will call this method. - - - the total set of segments in the index - - requested maximum number of segments in the index (currently this - is always 1) - - contains the specific SegmentInfo instances that must be merged - away. This may be a subset of all SegmentInfos. - - - - Determine what set of merge operations is necessary in order to expunge all - deletes from the index. - - - the total set of segments in the index - - - - Release all resources for the policy. - - - Returns true if a newly flushed (not from merge) - segment should use the compound file format. - - - - Returns true if the doc store files should use the - compound file format. - - - - OneMerge provides the information necessary to perform - an individual primitive merge operation, resulting in - a single new segment. The merge spec includes the - subset of segments to be merged as well as whether the - new segment should use the compound file format. - - - - Record that an exception occurred while executing - this merge - - - - Retrieve previous exception set by {@link - #setException}. - - - - Mark this merge as aborted. If this is called - before the merge is committed then the merge will - not be committed. - - - - Returns true if this merge was aborted. - - - A MergeSpecification instance provides the information - necessary to perform multiple merges. It simply - contains a list of {@link OneMerge} instances. - - - - The subset of segments to be included in the primitive merge. - - - Exception thrown if there are any problems while - executing a merge. - - - - - Use {@link #MergePolicy.MergeException(String,Directory)} instead - - - - - Use {@link #MergePolicy.MergeException(Throwable,Directory)} instead - - - - Returns the {@link Directory} of the index that hit - the exception. - - - - Defines the allowed range of log(size) for each - level. A level is computed by taking the max segment - log size, minus LEVEL_LOG_SPAN, and finding all - segments falling within that range. - - - - Default merge factor, which is how many segments are - merged at a time - - - - Default maximum segment size. A segment of this size - - - - -

    Returns the number of segments that are merged at - once and also controls the total number of segments - allowed to accumulate in the index.

    -

    -
    - - Determines how often segment indices are merged by - addDocument(). With smaller values, less RAM is used - while indexing, and searches on unoptimized indices are - faster, but indexing speed is slower. With larger - values, more RAM is used during indexing, and while - searches on unoptimized indices are slower, indexing is - faster. Thus larger values (> 10) are best for batch - index creation, and smaller values (< 10) for indices - that are interactively maintained. - - - - Sets whether compound file format should be used for - newly flushed and newly merged segments. - - - - Returns true if newly flushed and newly merge segments - - - - - Sets whether compound file format should be used for - newly flushed and newly merged doc store - segment files (term vectors and stored fields). - - - - Returns true if newly flushed and newly merge doc - store segment files (term vectors and stored fields) - - - - - - Sets whether the segment size should be calibrated by - the number of deletes when choosing segments for merge. - - - - Returns true if the segment size should be calibrated - by the number of deletes when choosing segments for merge. - - - - Returns true if this single info is optimized (has no - pending norms or deletes, is in the same dir as the - writer, and matches the current compound file setting - - - - Returns the merges necessary to optimize the index. - This merge policy defines "optimized" to mean only one - segment in the index, where that segment has no - deletions pending nor separate norms, and it is in - compound file format if the current useCompoundFile - setting is true. This method returns multiple merges - (mergeFactor at a time) so the {@link MergeScheduler} - in use may make use of concurrency. - - - - Finds merges necessary to expunge all deletes from the - index. We simply merge adjacent segments that have - deletes, up to mergeFactor at a time. - - - - Checks if any merges are now necessary and returns a - {@link MergePolicy.MergeSpecification} if so. A merge - is necessary when there are more than {@link - #setMergeFactor} segments at a given level. When - multiple levels have too many segments, this method - will return multiple merges, allowing the {@link - MergeScheduler} to use concurrency. - - - -

    Determines the largest segment (measured by - document count) that may be merged with other segments. - Small values (e.g., less than 10,000) are best for - interactive indexing, as this limits the length of - pauses while indexing to a few seconds. Larger values - are best for batched indexing and speedier - searches.

    - -

    The default value is {@link Integer#MAX_VALUE}.

    - -

    The default merge policy ({@link - LogByteSizeMergePolicy}) also allows you to set this - limit by net size (in MB) of the segment, using {@link - LogByteSizeMergePolicy#setMaxMergeMB}.

    -

    -
    - - Returns the largest segment (measured by document - count) that may be merged with other segments. - - - - - - Used by DocumentsWriter to maintain per-thread state. - We keep a separate Posting hash and other state for each - thread and then merge postings hashes from all threads - when writing the segment. - - - - The positionIncrement determines the position of this token - relative to the previous Token in a {@link TokenStream}, used in phrase - searching. - -

    The default value is one. - -

    Some common uses for this are:

      - -
    • Set it to zero to put multiple terms in the same position. This is - useful if, e.g., a word has multiple stems. Searches for phrases - including either stem will match. In this case, all but the first stem's - increment should be set to zero: the increment of the first instance - should be one. Repeating a token with an increment of zero can also be - used to boost the scores of matches on that token.
    • - -
    • Set it to values greater than one to inhibit exact phrase matches. - If, for example, one does not want phrases to match across removed stop - words, then one could build a stop word filter that removes stop words and - also sets the increment to the number of stop words removed before each - non-stop word. Then exact phrase queries will only match when the terms - occur with no intervening stop words.
    • - -
    -
    -
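+ A hedged sketch (against the attribute-based API of this era) of the first use case above:
+ stacking a synonym at the same position by setting the increment to zero. The filter class
+ and the word pair are illustrative, not part of this library:
+            public final class CarSynonymFilter extends TokenFilter {
+              private final TermAttribute termAtt = (TermAttribute) addAttribute(TermAttribute.class);
+              private final PositionIncrementAttribute posIncrAtt =
+                  (PositionIncrementAttribute) addAttribute(PositionIncrementAttribute.class);
+              private AttributeSource.State pending;
+              public CarSynonymFilter(TokenStream input) { super(input); }
+              public boolean incrementToken() throws IOException {
+                if (pending != null) {                        // emit the stacked synonym
+                  restoreState(pending);
+                  pending = null;
+                  termAtt.setTermBuffer("automobile");
+                  posIncrAtt.setPositionIncrement(0);         // same position as "car"
+                  return true;
+                }
+                if (!input.incrementToken()) return false;
+                if ("car".equals(termAtt.term())) pending = captureState();
+                return true;
+              }
+            }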
    - - The positionIncrement determines the position of this token - relative to the previous Token in a TokenStream, used in phrase - searching. - -

    The default value is one. - -

    Some common uses for this are:

      - -
    • Set it to zero to put multiple terms in the same position. This is - useful if, e.g., a word has multiple stems. Searches for phrases - including either stem will match. In this case, all but the first stem's - increment should be set to zero: the increment of the first instance - should be one. Repeating a token with an increment of zero can also be - used to boost the scores of matches on that token.
    • - -
    • Set it to values greater than one to inhibit exact phrase matches. - If, for example, one does not want phrases to match across removed stop - words, then one could build a stop word filter that removes stop words and - also sets the increment to the number of stop words removed before each - non-stop word. Then exact phrase queries will only match when the terms - occur with no intervening stop words.
    • - -
    - -
    - - -
    - - Set the position increment. The default value is one. - - - the distance from the prior term - - - - Returns the position increment of this Token. - - - - - Set the position increment. The default value is one. - - - the distance from the prior term - - - - Returns the position increment of this Token. - - - - - Class to encode java's UTF16 char[] into UTF8 byte[] - without always allocating a new byte[] as - String.getBytes("UTF-8") does. - -

    WARNING: This API is a new and experimental and - may suddenly change.

    -

    -
    - - Encode characters from a char[] source, starting at - offset and stopping when the character 0xffff is seen. - Returns the number of bytes written to bytesOut. - - - - Encode characters from a char[] source, starting at - offset for length chars. Returns the number of bytes - written to bytesOut. - - - - Encode characters from this String, starting at offset - for length characters. Returns the number of bytes - written to bytesOut. - - - - Convert UTF8 bytes into UTF16 characters. If offset - is non-zero, conversion starts at that starting point - in utf8, re-using the results from the previous call - up until offset. - - - - Expert: A Directory instance that switches files between - two other Directory instances. -

    Files with the specified extensions are placed in the - primary directory; others are placed in the secondary - directory. The provided Set must not change once passed - to this class, and must allow multiple threads to call - contains at once.

    - -

    NOTE: this API is new and experimental and is - subject to suddenly change in the next release. -

    -
    - - Return the primary directory - - - Return the secondary directory - - - Utility method to return a file's extension. - - - Similar to {@link NearSpansOrdered}, but for the unordered case. - - Expert: - Only public for subclassing. Most implementations should not need this class - - - - WARNING: The List is not necessarily in order of the the positions - Collection of byte[] payloads - - IOException - - - Wraps a Spans, and can be used to form a linked list. - - - Constrains search results to only match those which also match a provided - query. - -

    This could be used, for example, with a {@link TermRangeQuery} on a suitably - formatted date field to implement date filtering. One could re-use a single - QueryFilter that matches, e.g., only documents modified within the last - week. The QueryFilter and TermRangeQuery would only need to be reconstructed - once per day. - -
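+ A hedged sketch of that pattern (field name and date bounds are illustrative; the filter
+ instance could be kept and reused across searches for the day):
+            Filter lastWeek = new QueryFilter(
+                new TermRangeQuery("modified", "20091001", "20091008", true, true));
+            TopDocs hits = searcher.search(userQuery, lastWeek, 10);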

    - $Id:$ - -
    - - Constructs a filter which only matches documents matching - query. - - - - Use {@link #GetDocIdSet(IndexReader)} instead. - - - - Subclass of FilteredTermEnum for enumerating all terms that match the - specified prefix filter term. -

    - Term enumerations are always ordered by Term.compareTo(). Each term in - the enumeration is greater than all that precede it. - -

    -
    - - MultiPhraseQuery is a generalized version of PhraseQuery, with an added - method {@link #Add(Term[])}. - To use this class, to search for the phrase "Microsoft app*" first use - add(Term) on the term "Microsoft", then find all terms that have "app" as - prefix using IndexReader.terms(Term), and use MultiPhraseQuery.add(Term[] - terms) to add them to the query. - - - 1.0 - - - - Sets the phrase slop for this query. - - - - - Sets the phrase slop for this query. - - - - - Add a single term at the next position in the phrase. - - - - - Add multiple terms at the next position in the phrase. Any of the terms - may match. - - - - - - - Allows to specify the relative position of terms within the phrase. - - - - - - - - - - - Returns a List<Term[]> of the terms in the multiphrase. - Do not modify the List or its contents. - - - - Returns the relative positions of terms in this phrase. - - - Prints a user-readable version of this query. - - - Returns true if o is equal to this. - - - Returns a hash code value for this object. - - - Subclass of FilteredTermEnum for enumerating all terms that are similiar - to the specified filter term. - -

    Term enumerations are always ordered by Term.compareTo(). Each term in - the enumeration is greater than all that precede it. -

    -
    - - Creates a FuzzyTermEnum with an empty prefix and a minSimilarity of 0.5f. -

    - After calling the constructor the enumeration is already pointing to the first - valid term if such a term exists. - -

    - - - - - IOException - - -
    - - Creates a FuzzyTermEnum with an empty prefix. -

    - After calling the constructor the enumeration is already pointing to the first - valid term if such a term exists. - -

    - - - - - - - IOException - - -
    - - Constructor for enumeration of all terms from specified reader which share a prefix of - length prefixLength with term and which have a fuzzy similarity > - minSimilarity. -

    - After calling the constructor the enumeration is already pointing to the first - valid term if such a term exists. - -

    - Delivers terms. - - Pattern term. - - Minimum required similarity for terms from the reader. Default value is 0.5f. - - Length of required common prefix. Default value is 0. - - IOException -
    - - The termCompare method in FuzzyTermEnum uses Levenshtein distance to - calculate the distance between the given term and the comparing term. - - - - Finds and returns the smallest of three integers - - -

    Similarity returns a number that is 1.0f or less (including negative numbers) - based on how similar the Term is compared to a target term. It returns - exactly 0.0f when -

    -            editDistance < maximumEditDistance
    - Otherwise it returns: -
    -            1 - (editDistance / length)
    - where length is the length of the shortest term (text or target) including a - prefix that are identical and editDistance is the Levenshtein distance for - the two words.

    - -

    Embedded within this algorithm is a fail-fast Levenshtein distance - algorithm. The fail-fast algorithm differs from the standard Levenshtein - distance algorithm in that it is aborted if it is discovered that the - mimimum distance between the words is greater than some threshold. - -

    To calculate the maximum distance threshold we use the following formula: -

    -            (1 - minimumSimilarity) * length
    - where length is the shortest term including any prefix that is not part of the - similarity comparision. This formula was derived by solving for what maximum value - of distance returns false for the following statements: -
    -            similarity = 1 - ((float)distance / (float) (prefixLength + Math.min(textlen, targetlen)));
    -            return (similarity > minimumSimilarity);
    - where distance is the Levenshtein distance for the two words. -

    -

    Levenshtein distance (also known as edit distance) is a measure of similiarity - between two strings where the distance is measured as the number of character - deletions, insertions or substitutions required to transform one string to - the other string. -

    - the target word or phrase - - the similarity, 0.0 or less indicates that it matches less than the required - threshold and 1.0 indicates that the text and target are identical - -
    - - Grow the second dimension of the array, so that we can calculate the - Levenshtein difference. - - - - The max Distance is the maximum Levenshtein distance for the text - compared to some other value that results in score that is - better than the minimum similarity. - - the length of the "other value" - - the maximum levenshtein distance that we care about - - - - A query that applies a filter to the results of another query. - -

    Note: the bits are retrieved from the filter each time this - query is used in a search - use a CachingWrapperFilter to avoid - regenerating the bits every time. - -
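+ A hedged sketch of that caching suggestion (userQuery, dateFilter and searcher are
+ assumed to exist):
+            Filter cached = new CachingWrapperFilter(dateFilter);
+            Query filtered = new FilteredQuery(userQuery, cached);
+            TopDocs hits = searcher.search(filtered, 10);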

    Created: Apr 20, 2004 8:58:29 AM - -

    - 1.4 - - $Id: FilteredQuery.java 807821 2009-08-25 21:55:49Z mikemccand $ - - - -
    - - Constructs a new query which applies a filter to the results of the original query. - Filter.getDocIdSet() will be called every time this query is used in a search. - - Query to be filtered, cannot be null. - - Filter to apply to query results, cannot be null. - - - - Returns a Weight that applies the filter to the enclosed query's Weight. - This is accomplished by overriding the Scorer returned by the Weight. - - - - Rewrites the wrapped query. - - - Prints a user-readable version of this query. - - - Returns true iff o is equal to this. - - - Returns a hash code value for this object. - - - use {@link #NextDoc()} instead. - - - - use {@link #DocID()} instead. - - - - use {@link #Advance(int)} instead. - - - - A Scorer for OR like queries, counterpart of ConjunctionScorer. - This Scorer implements {@link Scorer#SkipTo(int)} and uses skipTo() on the given Scorers. - TODO: Implement score(HitCollector, int). - - - - The number of subscorers. - - - The subscorers. - - - The minimum number of scorers that should match. - - - The scorerDocQueue contains all subscorers ordered by their current doc(), - with the minimum at the top. -
    The scorerDocQueue is initialized the first time next() or skipTo() is called. -
    An exhausted scorer is immediately removed from the scorerDocQueue. -
    If less than the minimumNrMatchers scorers - remain in the scorerDocQueue next() and skipTo() return false. -

    - After each to call to next() or skipTo() - currentSumScore is the total score of the current matching doc, - nrMatchers is the number of matching scorers, - and all scorers are after the matching doc, or are exhausted. -

    -
    - - The document number of the current match. - - - The number of subscorers that provide the current match. - - - Construct a DisjunctionScorer. - A collection of at least two subscorers. - - The positive minimum number of subscorers that should - match to match this query. -
    When minimumNrMatchers is bigger than - the number of subScorers, - no matches will be produced. -
    When minimumNrMatchers equals the number of subScorers, - it more efficient to use ConjunctionScorer. - -
    - - Construct a DisjunctionScorer, using one as the minimum number - of matching subscorers. - - - - Called the first time next() or skipTo() is called to - initialize scorerDocQueue. - - - - Scores and collects all matching documents. - The collector to which all matching documents are passed through - {@link HitCollector#Collect(int, float)}. -
    When this method is used the {@link #Explain(int)} method should not be used. - - use {@link #Score(Collector)} instead. - -
    - - Scores and collects all matching documents. - The collector to which all matching documents are passed through. -
    When this method is used the {@link #Explain(int)} method should not be used. - -
    - - Expert: Collects matching documents in a range. Hook for optimization. - Note that {@link #Next()} must be called once before this method is called - for the first time. - - The collector to which all matching documents are passed through - {@link HitCollector#Collect(int, float)}. - - Do not score documents past this. - - true if more matching documents may remain. - - use {@link #Score(Collector, int, int)} instead. - - - - Expert: Collects matching documents in a range. Hook for optimization. - Note that {@link #Next()} must be called once before this method is called - for the first time. - - The collector to which all matching documents are passed through. - - Do not score documents past this. - - true if more matching documents may remain. - - - - use {@link #NextDoc()} instead. - - - - Advance all subscorers after the current document determined by the - top of the scorerDocQueue. - Repeat until at least the minimum number of subscorers match on the same - document and all subscorers are after that document or are exhausted. -
    On entry the scorerDocQueue has at least minimumNrMatchers - available. At least the scorer with the minimum document number will be advanced. -
    - true iff there is a match. -
    In case there is a match, currentDoc, currentSumScore, - and nrMatchers describe the match. - - TODO: Investigate whether it is possible to use skipTo() when - the minimum number of matchers is bigger than one, ie. try and use the - character of ConjunctionScorer for the minimum number of matchers. - Also delay calling score() on the sub scorers until the minimum number of - matchers is reached. -
    For this, a Scorer array with minimumNrMatchers elements might - hold Scorers at currentDoc that are temporarily popped from scorerQueue. -
    -
    - - Returns the score of the current document matching the query. - Initially invalid, until {@link #Next()} is called the first time. - - - - use {@link #DocID()} instead. - - - - Returns the number of subscorers matching the current document. - Initially invalid, until {@link #Next()} is called the first time. - - - - Skips to the first match beyond the current whose document number is - greater than or equal to a given target.
    - When this method is used the {@link #Explain(int)} method should not be - used.
    - The implementation uses the skipTo() method on the subscorers. - -
    - The target document number. - - true iff there is such a match. - - use {@link #Advance(int)} instead. - -
    - - Advances to the first match beyond the current whose document number is - greater than or equal to a given target.
    - When this method is used the {@link #Explain(int)} method should not be - used.
    - The implementation uses the skipTo() method on the subscorers. - -
    - The target document number. - - the document whose number is greater than or equal to the given - target, or -1 if none exist. - -
    - - An explanation for the score of a given document. - - - - The Scorer for DisjunctionMaxQuery's. The union of all documents generated by the the subquery scorers - is generated in document number order. The score for each document is the maximum of the scores computed - by the subquery scorers that generate that document, plus tieBreakerMultiplier times the sum of the scores - for the other subqueries that generate the document. - - - - Creates a new instance of DisjunctionMaxScorer - - - Multiplier applied to non-maximum-scoring subqueries for a - document as they are summed into the result. - - -- not used since our definition involves neither coord nor terms - directly - - The sub scorers this Scorer should iterate on - - The actual number of scorers to iterate on. Note that the array's - length may be larger than the actual number of scorers. - - - - Generate the next document matching our associated DisjunctionMaxQuery. - - - true iff there is a next document - - use {@link #NextDoc()} instead. - - - - use {@link #DocID()} instead. - - - - Determine the current document score. Initially invalid, until {@link #Next()} is called the first time. - the score of the current generated document - - - - Advance to the first document beyond the current whose number is greater - than or equal to target. - - - the minimum number of the next desired document - - true iff there is a document to be generated whose number is at - least target - - use {@link #Advance(int)} instead. - - - - Explain a score that we computed. UNSUPPORTED -- see explanation capability in DisjunctionMaxQuery. - the number of a document we scored - - the Explanation for our score - - - - Wraps another SpanFilter's result and caches it. The purpose is to allow - filters to simply filter, and then wrap with this class to add caching. - - - - A transient Filter cache. - - - Filter to cache results of - - - - Use {@link #GetDocIdSet(IndexReader)} instead. - - - - - The number of the field this vector is associated with - - - - Taps into DocInverter, as an InvertedDocEndConsumer, - which is called at the end of inverting each field. We - just look at the length for the field (docState.length) - and record the norm. - - - - This is a {@link LogMergePolicy} that measures size of a - segment as the total byte size of the segment's files. - - - - - - - - Default maximum segment size. A segment of this size - - - - -

    Determines the largest segment (measured by total - byte size of the segment's files, in MB) that may be - merged with other segments. Small values (e.g., less - than 50 MB) are best for interactive indexing, as this - limits the length of pauses while indexing to a few - seconds. Larger values are best for batched indexing - and speedier searches.

    - -

    Note that {@link #setMaxMergeDocs} is also - used to check whether a segment is too large for - merging (it's either or).

    -

    -
    - - Returns the largest segment (meaured by total byte - size of the segment's files, in MB) that may be merged - with other segments. - - - - - - Sets the minimum size for the lowest level segments. - Any segments below this size are considered to be on - the same level (even if they vary drastically in size) - and will be merged whenever there are mergeFactor of - them. This effectively truncates the "long tail" of - small segments that would otherwise be created into a - single level. If you set this too large, it could - greatly increase the merging cost during indexing (if - you flush many small segments). - - - - Get the minimum size for a segment to remain - un-merged. - - - - - - Combines multiple files into a single compound file. - The file format:
    -
      -
    • VInt fileCount
    • -
    • {Directory} - fileCount entries with the following structure:
    • -
        -
      • long dataOffset
      • -
      • String fileName
      • -
      -
    • {File Data} - fileCount entries with the raw data of the corresponding file
    • -
    - - The fileCount integer indicates how many files are contained in this compound - file. The {directory} that follows has that many entries. Each directory entry - contains a long pointer to the start of this file's data section, and a String - with that file's name. - - -
    - $Id: CompoundFileWriter.java 690539 2008-08-30 17:33:06Z mikemccand $ - -
    - - Create the compound stream in the specified file. The file name is the - entire name (no extensions are added). - - NullPointerException if dir or name is null - - - Returns the directory of the compound file. - - - Returns the name of the compound file. - - - Add a source stream. file is the string by which the - sub-stream will be known in the compound stream. - - - IllegalStateException if this writer is closed - NullPointerException if file is null - IllegalArgumentException if a file with the same name - has been added already - - - - Merge files with the extensions added up to now. - All files with these extensions are combined sequentially into the - compound stream. After successful merge, the source files - are deleted. - - IllegalStateException if close() had been called before or - if no file has been added to this object - - - - Copy the contents of the file with specified extension into the - provided output stream. Use the provided buffer for moving data - to reduce memory allocation. - - - - source file - - - temporary holder for the start of directory entry for this file - - - temporary holder for the start of this file's data section - - - Class for accessing a compound stream. - This class implements a directory, but is limited to only read operations. - Directory methods that would normally modify data throw an exception. - - - - $Id: CompoundFileReader.java 673371 2008-07-02 11:57:27Z mikemccand $ - - - - Returns an array of strings, one for each file in the directory. - - - Returns true iff a file with the given name exists. - - - Returns the time the compound file was last modified. - - - Set the modified time of the compound file to now. - - - Not implemented - UnsupportedOperationException - - - Not implemented - UnsupportedOperationException - - - Returns the length of a file in the directory. - IOException if the file does not exist - - - Not implemented - UnsupportedOperationException - - - Not implemented - UnsupportedOperationException - - - Implementation of an IndexInput that reads from a portion of the - compound file. The visibility is left as "package" *only* because - this helps with testing since JUnit test cases in a different class - can then access package fields of this class. - - - - Expert: implements buffer refill. Reads bytes from the current - position in the input. - - the array to read bytes into - - the offset in the array to start storing bytes - - the number of bytes to read - - - - Expert: implements seek. Sets current position in this file, where - the next {@link #ReadInternal(byte[],int,int)} will occur. - - - - - - Closes the stream to further operations. - - - Simple utility class providing static methods to - compress and decompress binary data for stored fields. - This class uses java.util.zip.Deflater and Inflater - classes to compress and decompress, which is the same - format previously used by the now deprecated - Field.Store.COMPRESS. - - - - Compresses the specified byte range using the - specified compressionLevel (constants are defined in - java.util.zip.Deflater). - - - - Compresses the specified byte range, with default BEST_COMPRESSION level - - - Compresses all bytes in the array, with default BEST_COMPRESSION level - - - Compresses the String value, with default BEST_COMPRESSION level - - - Compresses the String value using the specified - compressionLevel (constants are defined in - java.util.zip.Deflater). 
- - - - Decompress the byte array previously returned by - compress - - - - Decompress the byte array previously returned by - compressString back into a String - - - - Methods for manipulating strings. - - $Id: StringHelper.java 801344 2009-08-05 18:05:06Z yonik $ - - - - Expert: - The StringInterner implementation used by Lucene. - This shouldn't be changed to an incompatible implementation after other Lucene APIs have been used. - - - - Return the same string object for all equal strings - - - Compares two byte[] arrays, element by element, and returns the - number of elements common to both arrays. - - - The first byte[] to compare - - The second byte[] to compare - - The number of common elements. - - - - Compares two strings, character by character, and returns the - first position where the two strings differ from one another. - - - The first string to compare - - The second string to compare - - The first position where the two strings differ. - - - - A variety of high efficiencly bit twiddling routines. - - - $Id$ - - - - Returns the number of bits set in the long - - - Returns the number of set bits in an array of longs. - - - Returns the popcount or cardinality of the two sets after an intersection. - Neither array is modified. - - - - Returns the popcount or cardinality of the union of two sets. - Neither array is modified. - - - - Returns the popcount or cardinality of A & ~B - Neither array is modified. - - - - table of number of trailing zeros in a byte - - - Returns number of trailing zeros in a 64 bit long value. - - - Returns number of trailing zeros in a 32 bit int value. - - - returns 0 based index of first set bit - (only works for x!=0) -
    This is an alternate implementation of ntz() -
    -
    - - returns 0 based index of first set bit -
    This is an alternate implementation of ntz() -
    -
    - - returns true if v is a power of two or zero - - - returns true if v is a power of two or zero - - - returns the next highest power of two, or the current value if it's already a power of two or zero - - - returns the next highest power of two, or the current value if it's already a power of two or zero - - - Methods for manipulating arrays. - - - Parses the string argument as if it was an int value and returns the - result. Throws NumberFormatException if the string does not represent an - int quantity. - - - a string representation of an int quantity. - - int the value represented by the argument - - NumberFormatException if the argument could not be parsed as an int quantity. - - - Parses a char array into an int. - the character array - - The offset into the array - - The length - - the int - - NumberFormatException if it can't parse - - - Parses the string argument as if it was an int value and returns the - result. Throws NumberFormatException if the string does not represent an - int quantity. The second argument specifies the radix to use when parsing - the value. - - - a string representation of an int quantity. - - the base to use for conversion. - - int the value represented by the argument - - NumberFormatException if the argument could not be parsed as an int quantity. - - - Returns hash of chars in range start (inclusive) to - end (inclusive) - - - - Returns hash of chars in range start (inclusive) to - end (inclusive) - - - - A Spans that is formed from the ordered subspans of a SpanNearQuery - where the subspans do not overlap and have a maximum slop between them. -

- The formed spans contain only minimum-slop matches.
- The matching slop is computed from the distance(s) between - the non-overlapping matching Spans.
    - Successive matches are always formed from the successive Spans - of the SpanNearQuery. -

    - The formed spans may contain overlaps when the slop is at least 1. - For example, when querying using -

    t1 t2 t3
    - with slop at least 1, the fragment: -
    t1 t2 t1 t3 t2 t3
    - matches twice: -
    t1 t2 .. t3      
    -
          t1 .. t2 t3
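For illustration, a hedged Java sketch of how an ordered near query like the one above might be built (the field name "f" is an assumption; this mirrors Lucene's span query API rather than code in this class):

        import org.apache.lucene.index.Term;
        import org.apache.lucene.search.spans.SpanNearQuery;
        import org.apache.lucene.search.spans.SpanQuery;
        import org.apache.lucene.search.spans.SpanTermQuery;

        // "t1 t2 t3" with slop 1, in order: the subspans must match in this
        // order, with at most one position of total slack between them.
        SpanQuery[] clauses = new SpanQuery[] {
            new SpanTermQuery(new Term("f", "t1")),
            new SpanTermQuery(new Term("f", "t2")),
            new SpanTermQuery(new Term("f", "t3"))
        };
        SpanNearQuery query = new SpanNearQuery(clauses, 1, true); // slop = 1, inOrder = true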
    - - - Expert: - Only public for subclassing. Most implementations should not need this class -
    -
    - - The spans in the same order as the SpanNearQuery - - - Indicates that all subSpans have same doc() - - - Advances the subSpans to just after an ordered match with a minimum slop - that is smaller than the slop allowed by the SpanNearQuery. - - true iff there is such a match. - - - - Advance the subSpans to the same document - - - Check whether two Spans in the same document are ordered. - - - - - true iff spans1 starts before spans2 - or the spans start at the same position, - and spans1 ends before spans2. - - - - Like {@link #DocSpansOrdered(Spans,Spans)}, but use the spans - starts and ends as parameters. - - - - Order the subSpans within the same document by advancing all later spans - after the previous one. - - - - The subSpans are ordered in the same doc, so there is a possible match. - Compute the slop while making the match as short as possible by advancing - all subSpans except the last one in reverse order. - - - - Score a candidate doc for all slop-valid position-combinations (matches) - encountered while traversing/hopping the PhrasePositions. -
    The score contribution of a match depends on the distance: -
    - highest score for distance=0 (exact match). -
    - score gets lower as distance gets higher. -
    Example: for query "a b"~2, a document "x a b a y" can be scored twice: - once for "a b" (distance=0), and once for "b a" (distance=2). -
Possibly not all valid combinations are encountered, because for efficiency - we always propagate the least PhrasePosition. This allows us to base the - implementation on a PriorityQueue and move forward faster. - As a result, for example, document "a b c b a" - would score differently for queries "a b c"~4 and "c b a"~4, although - they really are equivalent. - Similarly, for doc "a b c b a f g", query "c b"~2 - would get the same score as "g f"~2, although "c b"~2 could be matched twice. - We may want to fix this in the future (currently not, for performance reasons). -
    -
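As a brief Java sketch of the example above (the field name "f" is assumed), the query "a b"~2 corresponds to a PhraseQuery with a slop of 2:

        import org.apache.lucene.index.Term;
        import org.apache.lucene.search.PhraseQuery;

        PhraseQuery query = new PhraseQuery();
        query.add(new Term("f", "a"));
        query.add(new Term("f", "b"));
        query.setSlop(2); // matches "a b" (distance 0) and the transposed "b a" (distance 2)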
- - Init PhrasePositions in place. - There is a one-time initialization for this scorer: -
- Put in repeats[] each pp that has another pp with the same position in the doc. -
- Also mark each such pp by setting pp.repeats = true. -
Later, termPositionsDiffer(pp) can consult repeats[], making that check efficient. - In particular, this allows queries with no repetitions to be scored with no overhead from this computation. -
    - Example 1 - query with no repetitions: "ho my"~2 -
    - Example 2 - query with repetitions: "ho my my"~2 -
    - Example 3 - query with repetitions: "my ho my"~2 -
Per-doc initialization when the query contains repeats; this includes propagating some repeating pp's to avoid false phrase detection. -
    - end (max position), or -1 if any term ran out (i.e. done) - - IOException -
    - - We disallow two pp's to have the same TermPosition, thereby verifying multiple occurrences - in the query of the same word would go elsewhere in the matched doc. - - null if differ (i.e. valid) otherwise return the higher offset PhrasePositions - out of the first two PPs found to not differ. - - - - A Query that matches documents within an exclusive range of terms. - -

This query matches documents containing terms that fall into the - supplied range according to {@link Term#CompareTo(Term)}. It is not intended - for numerical ranges; use {@link NumericRangeQuery} instead. - -

    This query uses {@linkplain - MultiTermQuery#SCORING_BOOLEAN_QUERY_REWRITE}. If you - want to change this, use the new {@link TermRangeQuery} - instead. - -

    - Use {@link TermRangeQuery} for term ranges or - {@link NumericRangeQuery} for numeric ranges instead. - This class will be removed in Lucene 3.0. - -
    - - Constructs a query selecting all terms greater than - lowerTerm but less than upperTerm. - There must be at least one term and either term may be null, - in which case there is no bound on that side, but if there are - two terms, both terms must be for the same field. - - - The Term at the lower end of the range - - The Term at the upper end of the range - - If true, both lowerTerm and - upperTerm will themselves be included in the range. - - - - Constructs a query selecting all terms greater than - lowerTerm but less than upperTerm. - There must be at least one term and either term may be null, - in which case there is no bound on that side, but if there are - two terms, both terms must be for the same field. -

    - If collator is not null, it will be used to decide whether - index terms are within the given range, rather than using the Unicode code - point order in which index terms are stored. -

    - WARNING: Using this constructor and supplying a non-null - value in the collator parameter will cause every single - index Term in the Field referenced by lowerTerm and/or upperTerm to be - examined. Depending on the number of index Terms in this Field, the - operation could be very slow. - -

    - The Term at the lower end of the range - - The Term at the upper end of the range - - If true, both lowerTerm and - upperTerm will themselves be included in the range. - - The collator to use to collate index Terms, to determine - their membership in the range bounded by lowerTerm and - upperTerm. - -
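A minimal Java sketch of the inclusive constructor documented above (the "date" field and the term values are made up for illustration):

        import org.apache.lucene.index.Term;
        import org.apache.lucene.search.RangeQuery;

        // All terms in "date" between 20040101 and 20041231, bounds included.
        RangeQuery query = new RangeQuery(
            new Term("date", "20040101"),
            new Term("date", "20041231"),
            true); // inclusive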
    - - Returns the field name for this query - - - Returns the lower term of this range query. - - - Returns the upper term of this range query. - - - Returns true if the range query is inclusive - - - Returns the collator used to determine range inclusion, if any. - - - Prints a user-readable version of this query. - - - Returns true iff o is equal to this. - - - Returns a hash code value for this object. - - - Implements search over a set of Searchables. - -

    Applications usually need only call the inherited {@link #Search(Query)} - or {@link #Search(Query,Filter)} methods. -

    -
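A hedged Java sketch of searching two indexes through one MultiSearcher (the index paths, field name and term are assumptions; exception handling omitted):

        import java.io.File;
        import org.apache.lucene.index.Term;
        import org.apache.lucene.search.IndexSearcher;
        import org.apache.lucene.search.MultiSearcher;
        import org.apache.lucene.search.Searchable;
        import org.apache.lucene.search.TermQuery;
        import org.apache.lucene.search.TopDocs;
        import org.apache.lucene.store.FSDirectory;

        // Open one searcher per index, then search them as a single unit.
        Searchable part1 = new IndexSearcher(FSDirectory.open(new File("/indexes/part1")));
        Searchable part2 = new IndexSearcher(FSDirectory.open(new File("/indexes/part2")));
        MultiSearcher searcher = new MultiSearcher(new Searchable[] { part1, part2 });
        TopDocs hits = searcher.search(new TermQuery(new Term("f", "lucene")), 10);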
    - - Creates a searcher which searches searchers. - - - Return the array of {@link Searchable}s this searches. - - - Returns index of the searcher for document n in the array - used to construct this searcher. - - - - Returns the document number of document n within its - sub-index. - - - - Create weight in multiple index scenario. - - Distributed query processing is done in the following steps: - 1. rewrite query - 2. extract necessary terms - 3. collect dfs for these terms from the Searchables - 4. create query weight using aggregate dfs. - 5. distribute that weight to Searchables - 6. merge results - - Steps 1-4 are done here, 5+6 in the search() methods - - - rewritten queries - - - - Document Frequency cache acting as a Dummy-Searcher. This class is no - full-fledged Searcher, but only supports the methods necessary to - initialize Weights. - - - - Creates a new instance with size elements. If - prePopulate is set to true, the queue will pre-populate itself - with sentinel objects and set its {@link #Size()} to size. In - that case, you should not rely on {@link #Size()} to get the number of - actual elements that were added to the queue, but keep track yourself.
    - NOTE: in case prePopulate is true, you should pop - elements from the queue using the following code example: - -
    -            PriorityQueue pq = new HitQueue(10, true); // pre-populate.
    -            ScoreDoc top = pq.top();
    -            
    -            // Add/Update one element.
    -            top.score = 1.0f;
    -            top.doc = 0;
    -            top = (ScoreDoc) pq.updateTop();
    -            int totalHits = 1;
    -            
    -            // Now pop only the elements that were *truly* inserted.
    -            // First, pop all the sentinel elements (there are pq.size() - totalHits).
    -            for (int i = pq.size() - totalHits; i > 0; i--) pq.pop();
    -            
    -            // Now pop the truly added elements.
    -            ScoreDoc[] results = new ScoreDoc[totalHits];
    -            for (int i = totalHits - 1; i >= 0; i--) {
    -            results[i] = (ScoreDoc) pq.pop();
    -            }
    -            
    - -

NOTE: This class pre-allocates a full array of - length size. - -

    - the requested size of this queue. - - specifies whether to pre-populate the queue with sentinel values. - - - -
    - - Default implementation of Message interface. - For Native Language Support (NLS), system of software internationalization. - - - - Message Interface for a lazy loading. - For Native Language Support (NLS), system of software internationalization. - - - - Lucene's package information, including version. * - - - Add a complete document specified by all its term vectors. If document has no - term vectors, add value for tvx. - - - - - IOException - - - Do a bulk copy of numDocs documents from reader to our - streams. This is used to expedite merging, if the - field numbers are congruent. - - - - Close all streams. - - - $Id: TermVectorsReader.java 687046 2008-08-19 13:01:11Z mikemccand $ - - - - Retrieve the length (in bytes) of the tvd and tvf - entries for the next numDocs starting with - startDocID. This is used for bulk copying when - merging segments, if the field numbers are - congruent. Once this returns, the tvf & tvd streams - are seeked to the startDocID. - - - - - The number of documents in the reader - - - - Retrieve the term vector for the given document and field - The document number to retrieve the vector for - - The field within the document to retrieve - - The TermFreqVector for the document and field or null if there is no termVector for this field. - - IOException if there is an error reading the term vector files - - - Return all term vectors stored for this document or null if the could not be read in. - - - The document number to retrieve the vector for - - All term frequency vectors - - IOException if there is an error reading the term vector files - - - - The field to read in - - The pointer within the tvf file where we should start reading - - The mapper used to map the TermVector - - IOException - - - Models the existing parallel array structure - - - Construct the vector - The {@link TermFreqVector} based on the mappings. - - - - A TermInfo is the record of information stored for a term. - - - The number of documents which contain the term. - - - A {@link MergeScheduler} that simply does each merge - sequentially, using the current thread. - - - - Just do the merges in sequence. We do this - "synchronized" so that even if the application is using - multiple threads, only one merge may run at a time. - - - - Used by DocumentsWriter to implemented a StringReader - that can be reset to a new string; we use this when - tokenizing the string value from a Field. - - - - This abstract class reads skip lists with multiple levels. - - See {@link MultiLevelSkipListWriter} for the information about the encoding - of the multi level skip lists. - - Subclasses must implement the abstract method {@link #ReadSkipData(int, IndexInput)} - which defines the actual format of the skip data. - - - - Returns the id of the doc to which the last call of {@link #SkipTo(int)} - has skipped. - - - - Skips entries to the first beyond the current whose document number is - greater than or equal to target. Returns the current doc count. - - - - Seeks the skip entry on the given level - - - initializes the reader - - - Loads the skip levels - - - Subclasses must implement the actual skip data encoding in this method. - - - the level skip data shall be read from - - the skip stream to read from - - - - Copies the values of the last read skip entry on this level - - - used to buffer the top skip levels - - - NOTE: this API is experimental and will likely change - - - Adds a new doc in this term. If this returns null - then we just skip consuming positions/payloads. 
- - - - Called when we are done adding docs to this term - - - This class tracks the number and position / offset parameters of terms - being added to the index. The information collected in this class is - also used to calculate the normalization factor for a field. - -

    WARNING: This API is new and experimental, and may suddenly - change.

    -

    -
    - - Re-initialize the state, using this boost value. - boost value to use. - - - - Get the last processed term position. - the position - - - - Get total number of terms in this field. - the length - - - - Get the number of terms with positionIncrement == 0. - the numOverlap - - - - Get end offset of the last processed term. - the offset - - - - Get boost value. This is the cumulative product of - document boost and field boost for all field instances - sharing the same field name. - - the boost - - - - This class accepts multiple added documents and directly - writes a single segment file. It does this more - efficiently than creating a single segment per document - (with DocumentWriter) and doing standard merges on those - segments. - - Each added document is passed to the {@link DocConsumer}, - which in turn processes the document and interacts with - other consumers in the indexing chain. Certain - consumers, like {@link StoredFieldsWriter} and {@link - TermVectorsTermsWriter}, digest a document and - immediately write bytes to the "doc store" files (ie, - they do not consume RAM per document, except while they - are processing the document). - - Other consumers, eg {@link FreqProxTermsWriter} and - {@link NormsWriter}, buffer bytes in RAM and flush only - when a new segment is produced. - Once we have used our allowed RAM buffer, or the number - of added docs is large enough (in the case we are - flushing by doc count instead of RAM usage), we create a - real segment and flush it to the Directory. - - Threads: - - Multiple threads are allowed into addDocument at once. - There is an initial synchronized call to getThreadState - which allocates a ThreadState for this thread. The same - thread will get the same ThreadState over time (thread - affinity) so that if there are consistent patterns (for - example each thread is indexing a different content - source) then we make better use of RAM. Then - processDocument is called on that ThreadState without - synchronization (most of the "heavy lifting" is in this - call). Finally the synchronized "finishDocument" is - called to flush changes to the directory. - - When flush is called by IndexWriter, or, we flush - internally when autoCommit=false, we forcefully idle all - threads and flush only once they are all idle. This - means you can call flush with a given thread even while - other threads are actively adding/deleting documents. - - - Exceptions: - - Because this class directly updates in-memory posting - lists, and flushes stored fields and term vectors - directly to files in the directory, there are certain - limited times when an exception can corrupt this state. - For example, a disk full while flushing stored fields - leaves this file in a corrupt state. Or, an OOM - exception while appending to the in-memory posting lists - can corrupt that posting list. We call such exceptions - "aborting exceptions". In these cases we must call - abort() to discard all docs added since the last flush. - - All other exceptions ("non-aborting exceptions") can - still partially update the index structures. These - updates are consistent, but, they represent only a part - of the document seen up until the exception was hit. - When this happens, we immediately mark the document as - deleted so that the document is always atomically ("all - or none") added to the index. - - - - Returns true if any of the fields in the current - buffered docs have omitTermFreqAndPositions==false - - - - If non-null, various details of indexing are printed - here. 
- - - - Set how much RAM we can use before flushing. - - - Set max buffered docs, which means we will flush by - doc count instead of by RAM usage. - - - - Get current segment name we are writing. - - - Returns how many docs are currently buffered in RAM. - - - Returns the current doc store segment we are writing - to. This will be the same as segment when autoCommit - * is true. - - - - Returns the doc offset into the shared doc store for - the current buffered docs. - - - - Closes the current open doc stores an returns the doc - store segment name. This returns null if there are * - no buffered documents. - - - - Called if we hit an exception at a bad time (when - updating the index files) and must discard all - currently buffered docs. This resets our state, - discarding any docs added since last flush. - - - - Reset after a flush - - - Flush all pending docs to a new segment - - - Build compound file for the segment we just flushed - - - Set flushPending if it is not already set and returns - whether it was set. This is used by IndexWriter to - trigger a single flush even when multiple threads are - trying to do so. - - - - Returns a free (idle) ThreadState that may be used for - indexing this one document. This call also pauses if a - flush is pending. If delTerm is non-null then we - buffer this deleted term after the thread state has - been acquired. - - - - Returns true if the caller (IndexWriter) should now - flush. - - - - Called whenever a merge has completed and the merged segments had deletions - - - Does the synchronized work to finish/flush the - inverted document. - - - - The IndexingChain must define the {@link #GetChain(DocumentsWriter)} method - which returns the DocConsumer that the DocumentsWriter calls to process the - documents. - - - - Consumer returns this on each doc. This holds any - state that must be flushed synchronized "in docID - order". We gather these and flush them in order. - - - - A {@link FieldSelector} based on a Map of field names to {@link FieldSelectorResult}s - - - - - Create a a MapFieldSelector - maps from field names (String) to {@link FieldSelectorResult}s - - - - Create a a MapFieldSelector - fields to LOAD. List of Strings. All other fields are NO_LOAD. - - - - Create a a MapFieldSelector - fields to LOAD. All other fields are NO_LOAD. - - - - Load field according to its associated value in fieldSelections - a field name - - the fieldSelections value that field maps to or NO_LOAD if none. - - - - Provides support for converting dates to strings and vice-versa. - The strings are structured so that lexicographic sorting orders - them by date, which makes them suitable for use as field values - and search terms. - -

    This class also helps you to limit the resolution of your dates. Do not - save dates with a finer resolution than you really need, as then - RangeQuery and PrefixQuery will require more memory and become slower. - -

    Compared to {@link DateField} the strings generated by the methods - in this class take slightly more space, unless your selected resolution - is set to Resolution.DAY or lower. - -

    - Another approach is {@link NumericUtils}, which provides - a sortable binary representation (prefix encoded) of numeric values, which - date/time are. - For indexing a {@link Date} or {@link Calendar}, just get the unix timestamp as - long using {@link Date#getTime} or {@link Calendar#getTimeInMillis} and - index this as a numeric value with {@link NumericField} - and use {@link NumericRangeQuery} to query it. -

    -
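A small Java sketch of round-tripping a date at day resolution (the resolution choice is only an example; exception handling omitted):

        import java.util.Date;
        import org.apache.lucene.document.DateTools;

        // e.g. "20040921" -- lexicographic order matches chronological order.
        String indexed = DateTools.dateToString(new Date(), DateTools.Resolution.DAY);
        Date roundTripped = DateTools.stringToDate(indexed); // throws ParseException on bad input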
    - - Converts a Date to a string suitable for indexing. - - - the date to be converted - - the desired resolution, see - {@link #Round(Date, DateTools.Resolution)} - - a string in format yyyyMMddHHmmssSSS or shorter, - depending on resolution; using GMT as timezone - - - - Converts a millisecond time to a string suitable for indexing. - - - the date expressed as milliseconds since January 1, 1970, 00:00:00 GMT - - the desired resolution, see - {@link #Round(long, DateTools.Resolution)} - - a string in format yyyyMMddHHmmssSSS or shorter, - depending on resolution; using GMT as timezone - - - - Converts a string produced by timeToString or - DateToString back to a time, represented as the - number of milliseconds since January 1, 1970, 00:00:00 GMT. - - - the date string to be converted - - the number of milliseconds since January 1, 1970, 00:00:00 GMT - - ParseException if dateString is not in the - expected format - - - - Converts a string produced by timeToString or - DateToString back to a time, represented as a - Date object. - - - the date string to be converted - - the parsed time as a Date object - - ParseException if dateString is not in the - expected format - - - - Limit a date's resolution. For example, the date 2004-09-21 13:50:11 - will be changed to 2004-09-01 00:00:00 when using - Resolution.MONTH. - - - The desired resolution of the date to be returned - - the date with all values more precise than resolution - set to 0 or 1 - - - - Limit a date's resolution. For example, the date 1095767411000 - (which represents 2004-09-21 13:50:11) will be changed to - 1093989600000 (2004-09-01 00:00:00) when using - Resolution.MONTH. - - - The time in milliseconds (not ticks). - The desired resolution of the date to be returned - - the date with all values more precise than resolution - set to 0 or 1, expressed as milliseconds since January 1, 1970, 00:00:00 GMT - - - - Specifies the time granularity. - - - A WhitespaceTokenizer is a tokenizer that divides text at whitespace. - Adjacent sequences of non-Whitespace characters form tokens. - - - - Construct a new WhitespaceTokenizer. - - - Construct a new WhitespaceTokenizer using a given {@link AttributeSource}. - - - Construct a new WhitespaceTokenizer using a given {@link Lucene.Net.Util.AttributeSource.AttributeFactory}. - - - Collects only characters which do not satisfy - {@link Character#isWhitespace(char)}. - - - - Filters {@link LetterTokenizer} with {@link LowerCaseFilter} and - {@link StopFilter}. - - -

    - You must specify the required {@link Version} compatibility when creating - StopAnalyzer: -

      -
    • As of 2.9, position increments are preserved
    -
    -
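A minimal Java sketch of the version-aware construction this note refers to (Version.LUCENE_29 is picked here only as an example):

        import org.apache.lucene.analysis.StopAnalyzer;
        import org.apache.lucene.util.Version;

        // For matchVersion >= 2.9, position increments are preserved.
        StopAnalyzer analyzer = new StopAnalyzer(Version.LUCENE_29);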
    - - An array containing some common English words that are not usually useful - for searching. - - Use {@link #ENGLISH_STOP_WORDS_SET} instead - - - - An unmodifiable set containing some common English words that are not usually useful - for searching. - - - - Builds an analyzer which removes words in - ENGLISH_STOP_WORDS. - - Use {@link #StopAnalyzer(Version)} instead - - - - Builds an analyzer which removes words in ENGLISH_STOP_WORDS. - - - Builds an analyzer which removes words in - ENGLISH_STOP_WORDS. - - - See {@link StopFilter#SetEnablePositionIncrements} - - Use {@link #StopAnalyzer(Version)} instead - - - - Builds an analyzer with the stop words from the given set. - Use {@link #StopAnalyzer(Version, Set)} instead - - - - Builds an analyzer with the stop words from the given set. - - - Builds an analyzer with the stop words from the given set. - Set of stop words - - - See {@link StopFilter#SetEnablePositionIncrements} - - Use {@link #StopAnalyzer(Version, Set)} instead - - - - Builds an analyzer which removes words in the provided array. - Use {@link #StopAnalyzer(Set, boolean)} instead - - Use {@link #StopAnalyzer(Version, Set)} instead - - - - Builds an analyzer which removes words in the provided array. - Array of stop words - - - See {@link StopFilter#SetEnablePositionIncrements} - - Use {@link #StopAnalyzer(Version, Set)} instead - - - - Builds an analyzer with the stop words from the given file. - - - Use {@link #StopAnalyzer(Version, File)} instead - - - - Builds an analyzer with the stop words from the given file. - - - File to load stop words from - - - See {@link StopFilter#SetEnablePositionIncrements} - - Use {@link #StopAnalyzer(Version, File)} instead - - - - Builds an analyzer with the stop words from the given file. - - - - - See above - - File to load stop words from - - - - Builds an analyzer with the stop words from the given reader. - - - Use {@link #StopAnalyzer(Version, Reader)} instead - - - - Builds an analyzer with the stop words from the given reader. - - - Reader to load stop words from - - - See {@link StopFilter#SetEnablePositionIncrements} - - Use {@link #StopAnalyzer(Version, Reader)} instead - - - - Builds an analyzer with the stop words from the given reader. - - - See above - - Reader to load stop words from - - - - Filters LowerCaseTokenizer with StopFilter. - - - Filters LowerCaseTokenizer with StopFilter. - - - Expert: This class provides a {@link TokenStream} - for indexing numeric values that can be used by {@link - NumericRangeQuery} or {@link NumericRangeFilter}. - -

    Note that for simple usage, {@link NumericField} is - recommended. {@link NumericField} disables norms and - term freqs, as they are not usually needed during - searching. If you need to change these settings, you - should use this class. - -

    See {@link NumericField} for capabilities of fields - indexed numerically.

    - -

    Here's an example usage, for an int field: - -

    -             Field field = new Field(name, new NumericTokenStream(precisionStep).setIntValue(value));
    -             field.setOmitNorms(true);
    -             field.setOmitTermFreqAndPositions(true);
    -             document.add(field);
    -            
    - -

    For optimal performance, re-use the TokenStream and Field instance - for more than one document: - -

    -             NumericTokenStream stream = new NumericTokenStream(precisionStep);
    -             Field field = new Field(name, stream);
    -             field.setOmitNorms(true);
    -             field.setOmitTermFreqAndPositions(true);
    -             Document document = new Document();
    -             document.add(field);
    -            
    -             for(all documents) {
-               stream.setIntValue(value);
    -               writer.addDocument(document);
    -             }
    -            
    - -

This stream is not intended to be used in analyzers; - it's more for iterating over the different precisions when - indexing a specific numeric value.

    - -

    NOTE: as token streams are only consumed once - the document is added to the index, if you index more - than one numeric field, use a separate NumericTokenStream - instance for each.

    - -

    See {@link NumericRangeQuery} for more details on the - precisionStep - parameter as well as how numeric fields work under the hood.

    - -

    NOTE: This API is experimental and - might change in incompatible ways in the next release. - -

    - 2.9 - -
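For comparison with the stream-level examples above, a hedged Java sketch of the simpler NumericField route recommended earlier (the field name and values are made up):

        import org.apache.lucene.document.Document;
        import org.apache.lucene.document.NumericField;
        import org.apache.lucene.search.NumericRangeQuery;
        import org.apache.lucene.search.Query;

        Document doc = new Document();
        doc.add(new NumericField("price").setIntValue(42));

        // Later, a range query over the same field:
        Query priceRange = NumericRangeQuery.newIntRange("price", 10, 100, true, true);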
    - - The full precision token gets this token type assigned. - - - The lower precision tokens gets this token type assigned. - - - Creates a token stream for numeric values using the default precisionStep - {@link NumericUtils#PRECISION_STEP_DEFAULT} (4). The stream is not yet initialized, - before using set a value using the various set???Value() methods. - - - - Creates a token stream for numeric values with the specified - precisionStep. The stream is not yet initialized, - before using set a value using the various set???Value() methods. - - - - Expert: Creates a token stream for numeric values with the specified - precisionStep using the given {@link AttributeSource}. - The stream is not yet initialized, - before using set a value using the various set???Value() methods. - - - - Expert: Creates a token stream for numeric values with the specified - precisionStep using the given - {@link org.apache.lucene.util.AttributeSource.AttributeFactory}. - The stream is not yet initialized, - before using set a value using the various set???Value() methods. - - - - Initializes the token stream with the supplied long value. - the value, for which this TokenStream should enumerate tokens. - - this instance, because of this you can use it the following way: - new Field(name, new NumericTokenStream(precisionStep).SetLongValue(value)) - - - - Initializes the token stream with the supplied int value. - the value, for which this TokenStream should enumerate tokens. - - this instance, because of this you can use it the following way: - new Field(name, new NumericTokenStream(precisionStep).SetIntValue(value)) - - - - Initializes the token stream with the supplied double value. - the value, for which this TokenStream should enumerate tokens. - - this instance, because of this you can use it the following way: - new Field(name, new NumericTokenStream(precisionStep).SetDoubleValue(value)) - - - - Initializes the token stream with the supplied float value. - the value, for which this TokenStream should enumerate tokens. - - this instance, because of this you can use it the following way: - new Field(name, new NumericTokenStream(precisionStep).SetFloatValue(value)) - - - - Holds a map of String input to String output, to be used - with {@link MappingCharFilter}. - - - - Records a replacement to be applied to the inputs - stream. Whenever singleMatch occurs in - the input, it will be replaced with - replacement. - - - input String to be replaced - - output String - - - - A memory-resident {@link IndexInput} implementation. - - - $Id: RAMInputStream.java 632120 2008-02-28 21:13:59Z mikemccand $ - - - - Expert: A Scorer for documents matching a Term. - - - Construct a TermScorer. - - - The weight of the Term in the query. - - An iterator over the documents matching the Term. - - The Similarity implementation to be used for score - computations. - - The field norms of the document fields for the Term. - - - - use {@link #Score(Collector)} instead. - - - - use {@link #Score(Collector, int, int)} instead. - - - - use {@link #DocID()} instead. - - - - Advances to the next document matching the query.
    - The iterator over the matching documents is buffered using - {@link TermDocs#Read(int[],int[])}. - -
    - true iff there is another document matching the query. - - use {@link #NextDoc()} instead. - -
    - - Advances to the next document matching the query.
    - The iterator over the matching documents is buffered using - {@link TermDocs#Read(int[],int[])}. - -
    - the document matching the query or -1 if there are no more documents. - -
    - - Skips to the first match beyond the current whose document number is - greater than or equal to a given target.
    - The implementation uses {@link TermDocs#SkipTo(int)}. - -
    - The target document number. - - true iff there is such a match. - - use {@link #Advance(int)} instead. - -
    - - Advances to the first match beyond the current whose document number is - greater than or equal to a given target.
    - The implementation uses {@link TermDocs#SkipTo(int)}. - -
    - The target document number. - - the matching document or -1 if none exist. - -
    - - Returns an explanation of the score for a document. -
    When this method is used, the {@link #Next()} method - and the {@link #Score(HitCollector)} method should not be used. -
    - The document number for the explanation. - -
    - - Returns a string representation of this TermScorer. - - - Implements parallel search over a set of Searchables. - -

    Applications usually need only call the inherited {@link #Search(Query)} - or {@link #Search(Query,Filter)} methods. -

    -
    - - Creates a searchable which searches searchables. - - - TODO: parallelize this one too - - - A search implementation which spans a new thread for each - Searchable, waits for each search to complete and merge - the results back together. - - - - A search implementation allowing sorting which spans a new thread for each - Searchable, waits for each search to complete and merges - the results back together. - - - - Lower-level search API. - -

    {@link Collector#Collect(int)} is called for every matching document. - -

    Applications should only use this if they need all of the - matching documents. The high-level search API ({@link - Searcher#Search(Query)}) is usually more efficient, as it skips - non-high-scoring hits. - -

    - to match documents - - if non-null, a bitset used to eliminate some documents - - to receive hits - - TODO: parallelize this one too - -
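A hedged Java sketch of such a low-level collector, gathering every matching doc id and ignoring scores (the class name is made up):

        import java.util.ArrayList;
        import java.util.List;
        import org.apache.lucene.index.IndexReader;
        import org.apache.lucene.search.Collector;
        import org.apache.lucene.search.Scorer;

        public class AllDocsCollector extends Collector {
          private final List<Integer> docs = new ArrayList<Integer>();
          private int docBase;

          public void setScorer(Scorer scorer) {
            // scores are not needed for plain collection
          }

          public void setNextReader(IndexReader reader, int docBase) {
            this.docBase = docBase; // remap per-segment ids to top-level ids
          }

          public void collect(int doc) {
            docs.add(Integer.valueOf(docBase + doc));
          }

          public boolean acceptsDocsOutOfOrder() {
            return true; // collection order does not matter here
          }

          public List<Integer> getDocs() {
            return docs;
          }
        }

It would then be passed to a 2.9-style search call such as searcher.search(query, new AllDocsCollector()).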
    - - A thread subclass for searching a single searchable - - - Expert: Collects sorted results from Searchable's and collates them. - The elements put into this queue must be of type FieldDoc. - -

    Created: Feb 11, 2004 2:04:21 PM - -

    - lucene 1.4 - - $Id: FieldDocSortedHitQueue.java 695514 2008-09-15 15:42:11Z otis $ - -
    - - Creates a hit queue sorted by the given list of fields. - Fieldable names, in priority order (highest priority first). - - The number of hits to retain. Must be greater than zero. - - - - Allows redefinition of sort fields if they are null. - This is to handle the case using ParallelMultiSearcher where the - original list contains AUTO and we don't know the actual sort - type until the values come back. The fields can only be set once. - This method is thread safe. - - - - - - Returns the fields being used to sort. - - - Returns an array of collators, possibly null. The collators - correspond to any SortFields which were given a specific locale. - - Array of sort fields. - - Array, possibly null. - - - - Returns whether a is less relevant than b. - ScoreDoc - - ScoreDoc - - true if document a should be sorted after document b. - - - - Expert: A ScoreDoc which also contains information about - how to sort the referenced document. In addition to the - document number and score, this object contains an array - of values for the document from the field(s) used to sort. - For example, if the sort criteria was to sort by fields - "a", "b" then "c", the fields object array - will have three elements, corresponding respectively to - the term values for the document in fields "a", "b" and "c". - The class of each element in the array will be either - Integer, Float or String depending on the type of values - in the terms of each field. - -

    Created: Feb 11, 2004 1:23:38 PM - -

    - lucene 1.4 - - $Id: FieldDoc.java 773194 2009-05-09 10:36:41Z mikemccand $ - - - - - -
    - - Expert: Returned by low-level search implementations. - - - - - Expert: The score of this document for the query. - - - Expert: A hit document's number. - - - - - Expert: Constructs a ScoreDoc. - - - Expert: The values which are used to sort the referenced document. - The order of these will match the original sort criteria given by a - Sort object. Each Object will be either an Integer, Float or String, - depending on the type of values in the terms of the original field. - - - - - - - - Expert: Creates one of these objects with empty sort information. - - - Expert: Creates one of these objects with the given sort information. - - - Token Manager. - - - Debug output. - - - Set debug output. - - - Token literal values. - - - Lexer state names. - - - Lex State array. - - - Constructor. - - - Constructor. - - - Reinitialise parser. - - - Reinitialise parser. - - - Switch to specified lex state. - - - Get the next Token. - - - MessageBundles classes extend this class, to implement a bundle. - - For Native Language Support (NLS), system of software internationalization. - - This interface is similar to the NLS class in eclipse.osgi.util.NLS class - - initializeMessages() method resets the values of all static strings, should - only be called by classes that extend from NLS (see TestMessages.java for - reference) - performs validation of all message in a bundle, at class load - time - performs per message validation at runtime - see NLSTest.java for - usage reference - - MessageBundle classes may subclass this type. - - - - Initialize a given class with the message bundle Keys Should be called from - a class that extends NLS in a static block at class load time. - - - Property file with that contains the message bundle - - where constants will reside - - - - - - - - - - - - - - - Message Key - - - - - Performs the priviliged action. - - A value that may represent the result of the action. - - - Fills in no-term-vectors for all docs we haven't seen - since the last doc that had term vectors. - - - - The TermVectorOffsetInfo class holds information pertaining to a Term in a {@link Lucene.Net.Index.TermPositionVector}'s - offset information. This offset information is the character offset as set during the Analysis phase (and thus may not be the actual offset in the - original content). - - - - Convenience declaration when creating a {@link Lucene.Net.Index.TermPositionVector} that stores only position information. - - - The accessor for the ending offset for the term - The offset - - - - The accessor for the starting offset of the term. - - - The offset - - - - Two TermVectorOffsetInfos are equals if both the start and end offsets are the same - The comparison Object - - true if both {@link #GetStartOffset()} and {@link #GetEndOffset()} are the same for both objects. - - Information about a segment such as it's name, directory, and files related to the segment. @@ -22017,106 +18511,121 @@ has the same dir and same name. - - This abstract class writes skip lists with multiple levels. - - Example for skipInterval = 3: - c (skip level 2) - c c c (skip level 1) - x x x x x x x x x x (skip level 0) - d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d (posting list) - 3 6 9 12 15 18 21 24 27 30 (df) - - d - document - x - skip data - c - skip data with child pointer - - Skip level i contains every skipInterval-th entry from skip level i-1. - Therefore the number of entries on level i is: floor(df / ((skipInterval ^ (i + 1))). 
- - Each skip entry on a level i>0 contains a pointer to the corresponding skip entry in list i-1. - This guarantess a logarithmic amount of skips to find the target document. - - While this class takes care of writing the different skip levels, - subclasses must define the actual format of the skip data. - - - - - Subclasses must implement the actual skip data encoding in this method. - - - the level skip data shall be writting for - - the skip buffer to write to - - - - Writes the current skip data to the buffers. The current document frequency determines - the max level is skip data is to be written to. - - - the current document frequency - - IOException - - - Writes the buffered skip lists to the given output. - - - the IndexOutput the skip lists shall be written to - - the pointer the skip list starts - - - - This is a {@link LogMergePolicy} that measures size of a - segment as the number of documents (not taking deletions - into account). - - - - - - - - Sets the minimum size for the lowest level segments. - Any segments below this size are considered to be on - the same level (even if they vary drastically in size) - and will be merged whenever there are mergeFactor of - them. This effectively truncates the "long tail" of - small segments that would otherwise be created into a - single level. If you set this too large, it could - greatly increase the merging cost during indexing (if - you flush many small segments). - - - - Get the minimum size for a segment to remain - un-merged. - - - - - - Filename filter that accept filenames and extensions only created by Lucene. + + Useful constants representing filenames and extensions used by lucene $rcs = ' $Id: Exp $ ' ; - - Returns true if this is a file that would be contained - in a CFS file. This function should only be called on - files that pass the above "accept" (ie, are already - known to be a Lucene index file). + + Name of the index segment file + + + Name of the generation reference file name + + + Name of the index deletable file (only used in + pre-lockless indices) - - Used by DocumentsWriter to merge the postings from - multiple ThreadStates when creating a segment + + Extension of norms file + + + Extension of freq postings file + + + Extension of prox postings file + + + Extension of terms file + + + Extension of terms index file + + + Extension of stored fields index file + + + Extension of stored fields file + + + Extension of vectors fields file + + + Extension of vectors documents file + + + Extension of vectors index file + + + Extension of compound file + + + Extension of compound file for doc store files + + + Extension of deletes + + + Extension of field infos + + + Extension of plain norms + + + Extension of separate norms + + + Extension of gen file + + + This array contains all filename extensions used by + Lucene's index files, with two exceptions, namely the + extension made up from .f + a number and + from .s + a number. Also note that + Lucene's segments_N files do not have any + filename extension. + + File extensions that are added to a compound file + (same as above, minus "del", "gen", "cfs"). + + + + File extensions of old-style index files + + + File extensions for term vector support + + + Computes the full file name from base, extension and + generation. If the generation is -1, the file name is + null. If it's 0, the file name is + If it's > 0, the file name is + + + -- main part of the file name + + -- extension of the filename (including .) 
+ + -- generation + + + + Returns true if the provided filename is one of the doc + store files (ends with an extension in + STORE_INDEX_EXTENSIONS). + + + + Add a new position & payload + + + Called when we are done adding positions & payloads + Abstract API that consumes terms, doc, freq, prox and payloads postings. Concrete implementations of this @@ -22132,222 +18641,445 @@ Called when we are done adding everything. - - Gathers all Fieldables for a document under the same - name, updates FieldInfos, and calls per-field consumers - to process field by field. - - Currently, only a single thread visits the fields, - sequentially, for processing. + + Used by DocumentsWriter to maintain per-thread state. + We keep a separate Posting hash and other state for each + thread and then merge postings hashes from all threads + when writing the segment. - - If there are fields we've seen but did not see again - in the last run, then free them up. - + + An IndexReader which reads indexes with multiple segments. - - Implements the skip list writer for the default posting list format - that stores positions and payloads. - - + + Construct reading the named set of readers. - - Sets the values for the current skip data. + + This constructor is only used for {@link #Reopen()} - - Implements the skip list reader for the default posting list format - that stores positions and payloads. - - + + Version number when this IndexReader was opened. - - Returns the freq pointer of the doc to which the last call of - {@link MultiLevelSkipListReader#SkipTo(int)} has skipped. - - - - Returns the prox pointer of the doc to which the last call of - {@link MultiLevelSkipListReader#SkipTo(int)} has skipped. - - - - Returns the payload length of the payload stored just before - the doc to which the last call of {@link MultiLevelSkipListReader#SkipTo(int)} - has skipped. - - - - This exception is thrown when Lucene detects - an inconsistency in the index. - - - - Class to write byte streams into slices of shared - byte[]. This is used by DocumentsWriter to hold the - posting list for many terms in RAM. - - - - Set up the writer to write at address. - - - Write byte into byte slice stream - - - This class wraps a Token and supplies a single attribute instance - where the delegate token can be replaced. - - Will be removed, when old TokenStream API is removed. - - - - This attribute can be used to pass different flags down the {@link Tokenizer} chain, - eg from one TokenFilter to another one. - - - - EXPERIMENTAL: While we think this is here to stay, we may want to change it to be a long. -

    - - Get the bitset for any bits that have been set. This is completely distinct from {@link TypeAttribute#Type()}, although they do share similar purposes. - The flags can be used to encode information about the token for use by other {@link Lucene.Net.Analysis.TokenFilter}s. - - -

    - The bits + + Checks is the index is optimized (if it has a single segment and no deletions) + true if the index is optimized; false otherwise - - + + Tries to acquire the WriteLock on this directory. this method is only valid if this IndexReader is directory + owner. + + + StaleReaderException if the index has changed since this reader was opened + CorruptIndexException if the index is corrupt + Lucene.Net.Store.LockObtainFailedException + if another writer has this index open (write.lock could not be + obtained) + + IOException if there is a low-level IO error + + + + + + + Commit changes resulting from delete, undeleteAll, or setNorm operations +

    + If an exception is hit, then either no changes or all changes will have been committed to the index (transactional + semantics). + +

    + IOException if there is a low-level IO error +
    + + Returns the directory this index resides in. + + + Expert: return the IndexCommit that this reader has opened. +

    +

    WARNING: this API is new and experimental and may suddenly change.

    +

    +
    + + - - The start and end character offset of a Token. + + Optimized implementation. - - Returns this Token's starting offset, the position of the first character - corresponding to this token in the source text. - Note that the difference between endOffset() and startOffset() may not be - equal to termText.length(), as the term text may have been altered by a - stemmer or some other filter. + + A {@link MergeScheduler} that runs each merge using a + separate thread, up until a maximum number of threads + ({@link #setMaxThreadCount}) at which when a merge is + needed, the thread(s) that are updating the index will + pause until one or more merges completes. This is a + simple way to use concurrency in the indexing process + without having to create and manage application level + threads. - - Set the starting and ending offset. - See StartOffset() and EndOffset() + + Sets the max # simultaneous threads that may be + running. If a merge is necessary yet we already have + this many threads running, the incoming thread (that + is calling add/updateDocument) will block until + a merge thread has completed. - - Returns this Token's ending offset, one greater than the position of the - last character corresponding to this token in the source text. The length - of the token in the source text is (endOffset - startOffset). + + Get the max # simultaneous threads that may be + + + + + Return the priority that merge threads run at. By + default the priority is 1 plus the priority of (ie, + slightly higher priority than) the first thread that + calls merge. - - An iterator to iterate over set bits in an OpenBitSet. - This is faster than nextSetBit() for iterating over the complete set of bits, - especially when the density of the bits set is high. + + Return the priority that merge threads run at. + + + Does the actual merge, by calling {@link IndexWriter#merge} + + + Create and return a new MergeThread + + + Called when an exception is hit in a background merge + thread + + + + Used for testing + + + Used for testing + + + Used for testing + + + Used for testing + + + Used for testing + + + Declare what fields to load normally and what fields to load lazily + + + + + + Similar to a {@link java.io.FileFilter}, the FieldSelector allows one to make decisions about + what Fields get loaded on a {@link Document} by {@link Lucene.Net.Index.IndexReader#Document(int,Lucene.Net.Documents.FieldSelector)} + + + + + + + the field to accept or reject + + an instance of {@link FieldSelectorResult} + if the {@link Field} named fieldName should be loaded. + + + + Pass in the Set of {@link Field} names to load and the Set of {@link Field} names to load lazily. If both are null, the + Document will not have any {@link Field} on it. + + A Set of {@link String} field names to load. May be empty, but not null + + A Set of {@link String} field names to load lazily. May be empty, but not null + + + + Indicate whether to load the field with the given name or not. If the {@link Field#Name()} is not in either of the + initializing Sets, then {@link Lucene.Net.Documents.FieldSelectorResult#NO_LOAD} is returned. If a Field name + is in both fieldsToLoad and lazyFieldsToLoad, lazy has precedence. + + + The {@link Field} name to check + + The {@link FieldSelectorResult} + + + + A grammar-based tokenizer constructed with JFlex + +

    This should be a good tokenizer for most European-language documents: + +

      +
• Splits words at punctuation characters, removing punctuation. However, a - dot that's not followed by whitespace is considered part of a token.
• Splits words at hyphens, unless there's a number in the token, in which case - the whole token is interpreted as a product number and is not split.
• Recognizes email addresses and internet hostnames as one token.
    + +

    Many applications have specific tokenizer needs. If this tokenizer does + not suit your application, please consider copying this source code + directory to your project and maintaining your own grammar-based tokenizer. + + +
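A small Java sketch of constructing and consuming this tokenizer with the version-aware constructor mentioned below (the sample text is arbitrary; exception handling omitted):

        import java.io.StringReader;
        import org.apache.lucene.analysis.standard.StandardTokenizer;
        import org.apache.lucene.analysis.tokenattributes.TermAttribute;
        import org.apache.lucene.util.Version;

        StandardTokenizer tokenizer = new StandardTokenizer(Version.LUCENE_29,
            new StringReader("Visit lucene.apache.org or mail dev@lucene.apache.org"));
        TermAttribute term = tokenizer.addAttribute(TermAttribute.class);
        while (tokenizer.incrementToken()) {
          System.out.println(term.term()); // hostnames and email addresses come out as single tokens
        }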

    + You must specify the required {@link Version} compatibility when creating + StandardAnalyzer: +

    +
    +
    + + this solves a bug where HOSTs that end with '.' are identified + as ACRONYMs. It is deprecated and will be removed in the next + release. + + + + A private instance of the JFlex-constructed scanner + + + String token types that correspond to token type int constants + + + Please use {@link #TOKEN_TYPES} instead + + + + Specifies whether deprecated acronyms should be replaced with HOST type. + This is false by default to support backward compatibility. +

    + See http://issues.apache.org/jira/browse/LUCENE-1068 + +

    + this should be removed in the next release (3.0). + +
    + + Set the max allowed token length. Any token longer + than this is skipped. + + + + + + + + Creates a new instance of the {@link StandardTokenizer}. Attaches the + input to a newly created JFlex scanner. + + Use {@link #StandardTokenizer(Version, Reader)} instead + + + + Creates a new instance of the {@link Lucene.Net.Analysis.Standard.StandardTokenizer}. Attaches + the input to the newly created JFlex scanner. + + + The input reader + + Set to true to replace mischaracterized acronyms with HOST. + + See http://issues.apache.org/jira/browse/LUCENE-1068 + + Use {@link #StandardTokenizer(Version, Reader)} instead + + + + Creates a new instance of the + {@link org.apache.lucene.analysis.standard.StandardTokenizer}. Attaches + the input to the newly created JFlex scanner. + + + The input reader + + See http://issues.apache.org/jira/browse/LUCENE-1068 + + + + Creates a new StandardTokenizer with a given {@link AttributeSource}. + Use + {@link #StandardTokenizer(Version, AttributeSource, Reader)} + instead + + + + Creates a new StandardTokenizer with a given {@link AttributeSource}. + + + Creates a new StandardTokenizer with a given {@link Lucene.Net.Util.AttributeSource.AttributeFactory} + Use + {@link #StandardTokenizer(Version, org.apache.lucene.util.AttributeSource.AttributeFactory, Reader)} + instead + + + + Creates a new StandardTokenizer with a given + {@link org.apache.lucene.util.AttributeSource.AttributeFactory} + + + + Will be removed in Lucene 3.0. This method is final, as it should + not be overridden. Delegates to the backwards compatibility layer. + + + + Will be removed in Lucene 3.0. This method is final, as it should + not be overridden. Delegates to the backwards compatibility layer. + + + + Prior to https://issues.apache.org/jira/browse/LUCENE-1068, StandardTokenizer mischaracterized as acronyms tokens like www.abc.com + when they should have been labeled as hosts instead. + + true if StandardTokenizer now returns these tokens as Hosts, otherwise false + + + Remove in 3.X and make true the only valid value + + + + + Set to true to replace mischaracterized acronyms as HOST. + + Remove in 3.X and make true the only valid value + + See https://issues.apache.org/jira/browse/LUCENE-1068 + + + + Removes words that are too long or too short from the stream. + + + + $Id: LengthFilter.java 807201 2009-08-24 13:22:34Z markrmiller $ + + + + Build a filter that removes words that are too long or too + short from the text. + + + + Returns the next input Token whose term() is the right len + + + A variety of high efficiencly bit twiddling routines. $Id$ - - ** the python code that generated bitlist - def bits2int(val): - arr=0 - for shift in range(8,0,-1): - if val & 0x80: - arr = (arr << 4) | shift - val = val << 1 - return arr - def int_table(): - tbl = [ hex(bits2int(val)).strip('L') for val in range(256) ] - return ','.join(tbl) - **** + + Returns the number of bits set in the long + + + Returns the number of set bits in an array of longs. + + + Returns the popcount or cardinality of the two sets after an intersection. + Neither array is modified. - - use {@link #NextDoc()} instead. - - - - use {@link #Advance(int)} instead. - - - - use {@link #DocID()} instead. - - - - This exception is thrown when you try to list a - non-existent directory. + + Returns the popcount or cardinality of the union of two sets. + Neither array is modified. - - Writes bytes through to a primary IndexOutput, computing - checksum. Note that you cannot use seek(). 
+ + Returns the popcount or cardinality of A & ~B + Neither array is modified. - - Starts but does not complete the commit of this file (= - writing of the final checksum at the end). After this - is called must call {@link #finishCommit} and the - {@link #close} to complete the commit. + + table of number of trailing zeros in a byte + + + Returns number of trailing zeros in a 64 bit long value. + + + Returns number of trailing zeros in a 32 bit int value. + + + returns 0 based index of first set bit + (only works for x!=0) +
    This is an alternate implementation of ntz()
    - - See {@link #prepareCommit} + + returns 0 based index of first set bit +
    This is an alternate implementation of ntz() +
    - - A {@link Collector} that sorts by {@link SortField} using - {@link FieldComparator}s. -

    - See the {@link #create(Lucene.Net.Search.Sort, int, boolean, boolean, boolean, boolean)} method - for instantiating a TopFieldCollector. + +
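A hedged usage sketch for the create(...) factory referred to above (not taken from this file): the collector gathers the top hits sorted by a field, and the parameter names, the .NET capitalization of TopFieldCollector.Create, and the searcher/query variables are assumptions based on the usual Lucene.Net 2.9 conventions.

            // Hedged sketch: collect the top 10 hits sorted by a string field, without
            // paying for per-document scores. 'searcher' and 'query' are assumed to exist.
            Sort sort = new Sort(new SortField("date", SortField.STRING, true));
            TopFieldCollector collector = TopFieldCollector.Create(
                sort,    // the sort criteria (SortFields)
                10,      // numHits: number of results to collect
                true,    // fillFields: return the actual field values on the results
                false,   // trackDocScores: scores stay NaN, but collection is cheaper
                false,   // trackMaxScore
                true);   // docsScoredInOrder
            searcher.Search(query, collector);
            TopDocs topDocs = collector.TopDocs();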

    returns true if v is a power of two or zero + + + returns true if v is a power of two or zero + + + returns the next highest power of two, or the current value if it's already a power of two or zero + + + returns the next highest power of two, or the current value if it's already a power of two or zero + + +
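The bit-twiddling helpers described above (trailing-zero count, power-of-two tests, next power of two) are easy to mirror in a few lines. The following is an illustrative C# sketch of equivalent logic, not the BitUtil source itself.

            // Illustrative equivalents of the helpers described above; a sketch, not BitUtil.
            // Ntz is only meaningful for x != 0, as the docs state.
            static int Ntz(ulong x)                     // 0-based index of the first set bit
            {
                int n = 0;
                while ((x & 1UL) == 0UL) { x >>= 1; n++; }
                return n;
            }

            static bool IsPowerOfTwoOrZero(uint v)      // true if v is a power of two or zero
            {
                return (v & (v - 1)) == 0;
            }

            static uint NextHighestPowerOfTwo(uint v)   // returns v itself if already a power of two or zero
            {
                v--;
                v |= v >> 1; v |= v >> 2; v |= v >> 4; v |= v >> 8; v |= v >> 16;
                return v + 1;
            }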

    Implements {@link LockFactory} using {@link + File#createNewFile()}.

    -

    NOTE: This API is experimental and might change in - incompatible ways in the next release.

    -

    -
    - - Creates a new {@link TopFieldCollector} from the given - arguments. +

    NOTE: the javadocs + for File.createNewFile contain a vague + yet spooky warning about not using the API for file + locking. This warning was added due to this + bug, and in fact the only known problem with using + this API for locking is that the Lucene write lock may + not be released when the JVM exits abnormally.

    +

    When this happens, a {@link LockObtainFailedException} + is hit when trying to create a writer, in which case you + need to explicitly clear the lock file first. You can + either manually remove the file, or use the {@link + org.apache.lucene.index.IndexReader#unlock(Directory)} + API. But, first be certain that no writer is in fact + writing to the index otherwise you can easily corrupt + your index.

    -

    NOTE: The instances returned by this method - pre-allocate a full array of length - numHits. +

    If you suspect that this or any other LockFactory is + not working properly in your environment, you can easily + test it by using {@link VerifyingLockFactory}, {@link + LockVerifyServer} and {@link LockStressTest}.
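A minimal sketch of the workflow described above: pass the lock factory to an FSDirectory and, if a stale write lock is hit, clear it explicitly. Method names (FSDirectory.Open, IndexReader.Unlock, MaxFieldLength.LIMITED) follow the usual Lucene.Net 2.9 conventions and are assumptions here; 'analyzer' is assumed to exist.

            // Hedged sketch: SimpleFSLockFactory with an FSDirectory, plus the manual
            // unlock step the docs mention for a stale write lock.
            SimpleFSLockFactory lockFactory = new SimpleFSLockFactory();   // lock dir defaults to the index dir
            Directory dir = FSDirectory.Open(new System.IO.DirectoryInfo("index"), lockFactory);
            try
            {
                IndexWriter writer = new IndexWriter(dir, analyzer, IndexWriter.MaxFieldLength.LIMITED);
                // ... add documents ...
                writer.Close();
            }
            catch (LockObtainFailedException)
            {
                // Only do this after making sure no other writer is really running.
                IndexReader.Unlock(dir);
            }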

    - the sort criteria (SortFields). + + +
    + + Create a SimpleFSLockFactory instance, with null (unset) + lock directory. When you pass this factory to a {@link FSDirectory} + subclass, the lock directory is automatically set to the + directory itsself. Be sure to create one instance for each directory + your create! + + + + Instantiate using the provided directory (as a File instance). + where lock files should be created. - the number of results to collect. + + + Instantiate using the provided directory (as a File instance). + where lock files should be created. - specifies whether the actual field values should be returned on - the results (FieldDoc). + + + Instantiate using the provided directory name (String). + where lock files should be created. - specifies whether document scores should be tracked and set on the - results. Note that if set to false, then the results' scores will - be set to Float.NaN. Setting this to true affects performance, as - it incurs the score computation on each competitive result. - Therefore if document scores are not required by the application, - it is recommended to set it to false. - - specifies whether the query's maxScore should be tracked and set - on the resulting {@link TopDocs}. Note that if set to false, - {@link TopDocs#GetMaxScore()} returns Float.NaN. Setting this to - true affects performance as it incurs the score computation on - each result. Also, setting this true automatically sets - trackDocScores to true as well. - - specifies whether documents are scored in doc Id order or not by - the given {@link Scorer} in {@link #SetScorer(Scorer)}. - - a {@link TopFieldCollector} instance which will sort the results by - the sort criteria. - - IOException Stores information about how to sort documents by terms in an individual @@ -22670,28 +19402,53 @@ - - Expert: Delegating scoring implementation. Useful in {@link - Query#GetSimilarity(Searcher)} implementations, to override only certain - methods of a Searcher's Similiarty implementation.. + + A Scorer for queries with a required subscorer + and an excluding (prohibited) sub DocIdSetIterator. +
    + This Scorer implements {@link Scorer#SkipTo(int)}, + and it uses the skipTo() on the given scorers.
    - - Construct a {@link Similarity} that delegates all methods to another. - - - the Similarity implementation to delegate to + + Construct a ReqExclScorer. + The scorer that must match, except where + + indicates exclusion. - - Position of a term in a document that takes into account the term offset within the phrase. + + use {@link #NextDoc()} instead. + - - Go to next location of this term current document, and set - position as location - offset, so that a - matching exact phrase is easily identified when all PhrasePositions - have exactly the same position. + + Advance to non excluded doc. +
On entry:
• reqScorer != null,
• exclScorer != null,
• reqScorer was advanced once via next() or skipTo() and reqScorer.doc() may still be excluded.
Advances reqScorer to a non-excluded required doc, if any.
    + true iff there is a non excluded required doc. + +
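The advance-to-non-excluded step described above boils down to a small loop. Below is a minimal C# sketch of that loop written against plain delegates rather than Lucene's real Scorer/DocIdSetIterator types; the NoMoreDocs sentinel and the delegate signatures are assumptions for illustration only.

            // reqNext() is the required scorer's "next doc"; exclAdvance(target) advances the
            // excluding iterator to the first doc >= target. Both return NoMoreDocs when done.
            // reqDoc/exclDoc are the iterators' current docs on entry.
            const int NoMoreDocs = int.MaxValue;

            static bool ToNonExcluded(System.Func<int> reqNext, System.Func<int, int> exclAdvance,
                                      int reqDoc, int exclDoc)
            {
                do
                {
                    if (reqDoc < exclDoc)
                        return true;                   // required doc lies before the next excluded doc
                    if (reqDoc > exclDoc)
                    {
                        exclDoc = exclAdvance(reqDoc); // catch the exclusion iterator up
                        if (exclDoc == NoMoreDocs || exclDoc > reqDoc)
                            return true;               // nothing excludes reqDoc
                    }
                    // reqDoc == exclDoc: this candidate is excluded, try the next required doc
                }
                while ((reqDoc = reqNext()) != NoMoreDocs);
                return false;                          // required scorer exhausted
            }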
    + + use {@link #DocID()} instead. + + + + Returns the score of the current document matching the query. + Initially invalid, until {@link #Next()} is called the first time. + + The score of the required scorer. + + + + use {@link #Advance(int)} instead. + Calculates the minimum payload seen @@ -22699,726 +19456,100 @@ - - A query that matches all documents. + + Calculate the final score as the average score of all payloads seen. +

    + Is thread safe and completely reusable. +

    - - Field used for normalization factor (document boost). Null if nothing. - - - - use {@link #DocID()} instead. + + A ranked list of documents, used to hold search results. +

    + Caution: Iterate only over the hits needed. Iterating over all hits is + generally not desirable and may be the source of performance issues. If you + need to iterate over many or all hits, consider using the search method that + takes a {@link HitCollector}. +

    +

+ Note: Deleting matching documents concurrently with traversing the hits might, when deleting hits that were not yet retrieved, decrease {@link #Length()}. In such a case, a {@link java.util.ConcurrentModificationException ConcurrentModificationException} is thrown when accessing hit n > the current {@link #Length()} (even though n < {@link #Length()} at the start of the iteration).

    + see {@link Searcher#Search(Query, int)}, + {@link Searcher#Search(Query, Filter, int)} and + {@link Searcher#Search(Query, Filter, int, Sort)}:
    + +
    +            TopDocs topDocs = searcher.Search(query, numHits);
    +            ScoreDoc[] hits = topDocs.scoreDocs;
    +            for (int i = 0; i < hits.Length; i++) {
+                int docId = hits[i].doc;
+                Document d = searcher.Doc(docId);
+                // do something with current hit
+                ...
+            }
    - - use {@link #NextDoc()} instead. - - - - use {@link #Advance(int)} instead. - - - - A Query that matches documents matching boolean combinations of other - queries, e.g. {@link TermQuery}s, {@link PhraseQuery}s or other - BooleanQuerys. + + Tries to add new documents to hitDocs. + Ensures that the hit numbered min has been retrieved. - - Return the maximum number of clauses permitted, 1024 by default. - Attempts to add more than the permitted number of clauses cause {@link - TooManyClauses} to be thrown. + + Returns the total number of hits available in this set. + + + Returns the stored fields of the nth document in this set. +

    Documents are cached, so that repeated requests for the same element may + return the same Document object.

    - - -
    - - Set the maximum number of clauses permitted per BooleanQuery. - Default value is 1024. - - - - Constructs an empty boolean query. - - - Constructs an empty boolean query. - - {@link Similarity#Coord(int,int)} may be disabled in scoring, as - appropriate. For example, this score factor does not make sense for most - automatically generated queries, like {@link WildcardQuery} and {@link - FuzzyQuery}. - - - disables {@link Similarity#Coord(int,int)} in scoring. - - - - Returns true iff {@link Similarity#Coord(int,int)} is disabled in - scoring for this query instance. - - - - - - Specifies a minimum number of the optional BooleanClauses - which must be satisfied. - -

    - By default no optional clauses are necessary for a match - (unless there are no required clauses). If this method is used, - then the specified number of clauses is required. -

    -

    - Use of this method is totally independent of specifying that - any specific clauses are required (or prohibited). This number will - only be compared against the number of matching optional clauses. -

    -

    - EXPERT NOTE: Using this method may force collecting docs in order, - regardless of whether setAllowDocsOutOfOrder(true) has been called. -

    - -

    - the number of optional clauses that must match - - - -
    - - Gets the minimum number of the optional BooleanClauses - which must be satisifed. - - - - Adds a clause to a boolean query. - - - TooManyClauses if the new number of clauses exceeds the maximum clause number - - - - - Adds a clause to a boolean query. - TooManyClauses if the new number of clauses exceeds the maximum clause number - - - - - Returns the set of clauses in this query. - - - Returns the list of clauses in this query. - - - Whether hit docs may be collected out of docid order. - - - this will not be needed anymore, as - {@link Weight#ScoresDocsOutOfOrder()} is used. - - - - Expert: Indicates whether hit docs may be collected out of docid order. - -

    - Background: although the contract of the Scorer class requires that - documents be iterated in order of doc id, this was not true in early - versions of Lucene. Many pieces of functionality in the current Lucene code - base have undefined behavior if this contract is not upheld, but in some - specific simple cases may be faster. (For example: disjunction queries with - less than 32 prohibited clauses; This setting has no effect for other - queries.) -

    - -

    - Specifics: By setting this option to true, docid N might be scored for a - single segment before docid N-1. Across multiple segments, docs may be - scored out of order regardless of this setting - it only applies to scoring - a single segment. - - Being static, this setting is system wide. -

    - -

    - this is not needed anymore, as - {@link Weight#ScoresDocsOutOfOrder()} is used. - -
    - - Whether hit docs may be collected out of docid order. - - - - - this is not needed anymore, as - {@link Weight#ScoresDocsOutOfOrder()} is used. - - - - Use {@link #SetAllowDocsOutOfOrder(boolean)} instead. - - - - Use {@link #GetAllowDocsOutOfOrder()} instead. - - - - Prints a user-readable version of this query. - - - Returns true iff o is equal to this. - - - Returns a hash code value for this object. - - - Thrown when an attempt is made to add more than {@link - #GetMaxClauseCount()} clauses. This typically happens if - a PrefixQuery, FuzzyQuery, WildcardQuery, or TermRangeQuery - is expanded to many terms during search. - - - - Expert: the Weight for BooleanQuery, used to - normalize, score and explain these queries. - -

    NOTE: this API and implementation is subject to - change suddenly in the next release.

    -

    -
    - - The Similarity implementation. - - - $Id - -

    NOTE: This API is new and still experimental - (subject to change suddenly in the next release)

    -

    -
    - - The class which implements SegmentReader. - - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - - - - CorruptIndexException if the index is corrupt IOException if there is a low-level IO error - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - - + + Returns the score for the nth document in this set. - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - - - Clones the norm bytes. May be overridden by subclasses. New and experimental. - Byte array to clone - - New BitVector - - - - Clones the deleteDocs BitVector. May be overridden by subclasses. New and experimental. - BitVector to clone - - New BitVector - - - - - - - - - - - - Read norms into a pre-allocated array. - - - Create a clone from the initial TermVectorsReader and store it in the ThreadLocal. - TermVectorsReader - - - - Return a term frequency vector for the specified document and field. The - vector returned contains term numbers and frequencies for all terms in - the specified field of this document, if the field had storeTermVector - flag set. If the flag was not set, the method returns null. - - IOException - - - Return an array of term frequency vectors for the specified document. - The array contains a vector for each vectorized field in the document. - Each vector vector contains term numbers and frequencies for all terms - in a given vectorized field. - If no such fields existed, the method returns null. - - IOException - - - Return the name of the segment this reader is reading. - - - Return the SegmentInfo of the segment this reader is reading. - - - Returns the directory this index resides in. - - - Lotsa tests did hacks like:
    - SegmentReader reader = (SegmentReader) IndexReader.open(dir);
    - They broke. This method serves as a hack to keep hacks working + + Returns the id for the nth document in this set. + Note that ids may change when the index changes, so you cannot + rely on the id to be stable. - - Sets the initial value - - - Java's builtin ThreadLocal has a serious flaw: - it can take an arbitrarily long amount of time to - dereference the things you had stored in it, even once the - ThreadLocal instance itself is no longer referenced. - This is because there is single, master map stored for - each thread, which all ThreadLocals share, and that - master map only periodically purges "stale" entries. - - While not technically a memory leak, because eventually - the memory will be reclaimed, it can take a long time - and you can easily hit OutOfMemoryError because from the - GC's standpoint the stale entries are not reclaimaible. - - This class works around that, by only enrolling - WeakReference values into the ThreadLocal, and - separately holding a hard reference to each stored - value. When you call {@link #close}, these hard - references are cleared and then GC is freely able to - reclaim space by objects stored in it. - - - - Byte[] referencing is used because a new norm object needs - to be created for each clone, and the byte array is all - that is needed for sharing between cloned readers. The - current norm referencing is for sharing between readers - whereas the byte[] referencing is for copy on write which - is independent of reader references (i.e. incRef, decRef). - - - - Add a new field - - - Called when we are done adding everything. - - - Holds all per thread, per field state. - - - Provides information about what should be done with this Field - - - - - - Load this {@link Field} every time the {@link Document} is loaded, reading in the data as it is encountered. - {@link Document#GetField(String)} and {@link Document#GetFieldable(String)} should not return null. + + Returns a {@link HitIterator} to navigate the Hits. Each item returned + from {@link Iterator#next()} is a {@link Hit}.

    - {@link Document#Add(Fieldable)} should be called by the Reader. -

    -
    - - Lazily load this {@link Field}. This means the {@link Field} is valid, but it may not actually contain its data until - invoked. {@link Document#GetField(String)} SHOULD NOT BE USED. {@link Document#GetFieldable(String)} is safe to use and should - return a valid instance of a {@link Fieldable}. + Caution: Iterate only over the hits needed. Iterating over all + hits is generally not desirable and may be the source of + performance issues. If you need to iterate over many or all hits, consider + using a search method that takes a {@link HitCollector}.

    - {@link Document#Add(Fieldable)} should be called by the Reader.

    - - Do not load the {@link Field}. {@link Document#GetField(String)} and {@link Document#GetFieldable(String)} should return null. - {@link Document#Add(Fieldable)} is not called. -

    - {@link Document#Add(Fieldable)} should not be called by the Reader. + +

    Collapse the hash table & sort in-place. + + + Compares term text for two Posting instance and + returns -1 if p1 < p2; 1 if p1 > p2; else 0. - - Load this field as in the {@link #LOAD} case, but immediately return from {@link Field} loading for the {@link Document}. Thus, the - Document may not have its complete set of Fields. {@link Document#GetField(String)} and {@link Document#GetFieldable(String)} should - both be valid for this {@link Field} -

    - {@link Document#Add(Fieldable)} should be called by the Reader. + +

    Test whether the text for current RawPostingList p equals + current tokenText. - - Behaves much like {@link #LOAD} but does not uncompress any compressed data. This is used for internal purposes. - {@link Document#GetField(String)} and {@link Document#GetFieldable(String)} should not return null. -

    - {@link Document#Add(Fieldable)} should be called by - the Reader. + +

    Called when postings hash is too small (> 50% + occupied) or too large (< 20% occupied). - This is an internal option only, and is - no longer needed now that {@link CompressionTools} - is used for field compression. - - - - Expert: Load the size of this {@link Field} rather than its value. - Size is measured as number of bytes required to store the field == bytes for a binary or any compressed value, and 2*chars for a String value. - The size is stored as a binary value, represented as an int in a byte[], with the higher order byte first in [0] - - - - Expert: Like {@link #SIZE} but immediately break from the field loading loop, i.e., stop loading further fields, after the size is loaded - - - This attribute can be used to pass different flags down the tokenizer chain, - eg from one TokenFilter to another one. - - - - EXPERIMENTAL: While we think this is here to stay, we may want to change it to be a long. -

    - - Get the bitset for any bits that have been set. This is completely distinct from {@link TypeAttribute#Type()}, although they do share similar purposes. - The flags can be used to encode information about the token for use by other {@link Lucene.Net.Analysis.TokenFilter}s. - - -

    - The bits - -
    - - - - - - A SinkTokenizer can be used to cache Tokens for use in an Analyzer -

    - WARNING: {@link TeeTokenFilter} and {@link SinkTokenizer} only work with the old TokenStream API. - If you switch to the new API, you need to use {@link TeeSinkTokenFilter} instead, which offers - the same functionality. -

    - - - Use {@link TeeSinkTokenFilter} instead - - - -
    - - Get the tokens in the internal List. -

    - WARNING: Adding tokens to this list requires the {@link #Reset()} method to be called in order for them - to be made available. Also, this Tokenizer does nothing to protect against {@link java.util.ConcurrentModificationException}s - in the case of adds happening while {@link #Next(Lucene.Net.Analysis.Token)} is being called. -

    - WARNING: Since this SinkTokenizer can be reset and the cached tokens made available again, do not modify them. Modify clones instead. - -

    - A List of {@link Lucene.Net.Analysis.Token}s - -
    - - Returns the next token out of the list of cached tokens - The next {@link Lucene.Net.Analysis.Token} in the Sink. - - IOException - - - Override this method to cache only certain tokens, or new tokens based - on the old tokens. - - - The {@link Lucene.Net.Analysis.Token} to add to the sink - - - - Reset the internal data structures to the start at the front of the list of tokens. Should be called - if tokens were added to the list after an invocation of {@link #Next(Token)} - - IOException - - - Transforms the token stream as per the Porter stemming algorithm. - Note: the input to the stemming filter must already be in lower case, - so you will need to use LowerCaseFilter or LowerCaseTokenizer farther - down the Tokenizer chain in order for this to work properly! -

    - To use this filter with other analyzers, you'll want to write an - Analyzer class that sets up the TokenStream chain as you want it. - To use this with LowerCaseTokenizer, for example, you'd write an - analyzer like this: -

    -

-            class MyAnalyzer extends Analyzer {
-              public final TokenStream tokenStream(String fieldName, Reader reader) {
-                return new PorterStemFilter(new LowerCaseTokenizer(reader));
-              }
-            }
    -            
    -
    -
    - - Common util methods for dealing with {@link IndexReader}s. - - - - - Gathers sub-readers from reader into a List. - - - - - - - - - Returns sub IndexReader that contains the given document id. - - - id of document - - parent reader - - sub reader of parent which contains the specified doc id - - - - Returns sub-reader subIndex from reader. - - - parent reader - - index of desired sub reader - - the subreader at subINdex - - - - Returns index of the searcher/reader for document n in the - array used to construct this searcher/reader. - - - - Expert: allocate a new buffer. - Subclasses can allocate differently. - - size of allocated buffer. - - allocated buffer. - - - - Expert: returns a comparator for sorting ScoreDocs. - -

    - Created: Apr 21, 2004 3:49:28 PM - - This class will be used as part of a key to a FieldCache value. You must - implement hashCode and equals to avoid an explosion in RAM usage if you use - instances that are not the same instance. If you are searching using the - Remote contrib, the same instance of this class on the client will be a new - instance on every call to the server, so hashCode/equals is very important in - that situation. - -

    - $Id: SortComparatorSource.java 747019 2009-02-23 13:59:50Z - mikemccand $ - - 1.4 - - Please use {@link FieldComparatorSource} instead. - -
    - - Creates a comparator for the field in the given index. - Index to create comparator for. - - Name of the field to create comparator for. - - Comparator of ScoreDoc objects. - - IOException If an error occurs reading the index. - - - Experimental class to get set of payloads for most standard Lucene queries. - Operates like Highlighter - IndexReader should only contain doc of interest, - best to use MemoryIndex. - -

    - - WARNING: The status of the Payloads feature is experimental. - The APIs introduced here might change in the future and will not be - supported anymore in such a case. - -

    -
    - - that contains doc with payloads to extract - - - - Query should be rewritten for wild/fuzzy support. - - - - - payloads Collection - - IOException - - - Provides a {@link FieldComparator} for custom field sorting. - - NOTE: This API is experimental and might change in - incompatible ways in the next release. - - - - - Creates a comparator for the field in the given index. - - - Name of the field to create comparator for. - - FieldComparator. - - IOException - If an error occurs reading the index. - - - - A {@link Filter} that only accepts documents whose single - term value in the specified field is contained in the - provided set of allowed terms. - -

    - - This is the same functionality as TermsFilter (from - contrib/queries), except this filter requires that the - field contains only a single term for all documents. - Because of drastically different implementations, they - also have different performance characteristics, as - described below. - -

    - - The first invocation of this filter on a given field will - be slower, since a {@link FieldCache.StringIndex} must be - created. Subsequent invocations using the same field - will re-use this cache. However, as with all - functionality based on {@link FieldCache}, persistent RAM - is consumed to hold the cache, and is not freed until the - {@link IndexReader} is closed. In contrast, TermsFilter - has no persistent RAM consumption. - - -

    - - With each search, this filter translates the specified - set of Terms into a private {@link OpenBitSet} keyed by - term number per unique {@link IndexReader} (normally one - reader per segment). Then, during matching, the term - number for each docID is retrieved from the cache and - then checked for inclusion using the {@link OpenBitSet}. - Since all testing is done using RAM resident data - structures, performance should be very fast, most likely - fast enough to not require further caching of the - DocIdSet for each possible combination of terms. - However, because docIDs are simply scanned linearly, an - index with a great many small documents may find this - linear scan too costly. - -

    - - In contrast, TermsFilter builds up an {@link OpenBitSet}, - keyed by docID, every time it's created, by enumerating - through all matching docs using {@link TermDocs} to seek - and scan through each term's docID list. While there is - no linear scan of all docIDs, besides the allocation of - the underlying array in the {@link OpenBitSet}, this - approach requires a number of "disk seeks" in proportion - to the number of terms, which can be exceptionally costly - when there are cache misses in the OS's IO cache. - -

    - - Generally, this filter will be slower on the first - invocation for a given field, but subsequent invocations, - even if you change the allowed set of Terms, should be - faster than TermsFilter, especially as the number of - Terms being matched increases. If you are matching only - a very small number of terms, and those terms in turn - match a very small number of documents, TermsFilter may - perform faster. - -

    - - Which filter is best is very application dependent. -

    -
    - - This DocIdSet implementation is cacheable. - - - use {@link #DocID()} instead. - - - - use {@link #NextDoc()} instead. - - - - use {@link #Advance(int)} instead. - - - - use {@link #Score(Collector, int, int)} instead. - - - - use {@link #DocID()} instead. - - - - use {@link #NextDoc()} instead. - - - - use {@link #Score(Collector)} instead. - - - - use {@link #Advance(int)} instead. - - - - use {@link #DocID()} instead. - - - - use {@link #NextDoc()} instead. - - - - use {@link #Advance(int)} instead. - - - - A simple hash table of document scores within a range. Call this if the IndexInput passed to {@link #read} @@ -23426,21 +19557,135 @@ format. - - Returns an array of TermVectorOffsetInfo in which the term is found. + + Increments the enumeration to the next element. True if one exists. + + + Optimized scan, without allocating new terms. + Return number of invocations to next(). + + + + Returns the current Term in the enumeration. + Initially invalid, valid after next() called for the first time. + + + + Returns the previous Term enumerated. Initially null. + + + Returns the current TermInfo in the enumeration. + Initially invalid, valid after next() called for the first time. + + + + Sets the argument to the current TermInfo in the enumeration. + Initially invalid, valid after next() called for the first time. + + + + Returns the docFreq from the current TermInfo in the enumeration. + Initially invalid, valid after next() called for the first time. + + + + Closes the enumeration to further activity, freeing resources. + + + A Payload is metadata that can be stored together with each occurrence + of a term. This metadata is stored inline in the posting list of the + specific term. +

    + To store payloads in the index a {@link TokenStream} has to be used that + produces payload data. +

    + Use {@link TermPositions#GetPayloadLength()} and {@link TermPositions#GetPayload(byte[], int)} + to retrieve the payloads from the index.
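A hedged sketch of both directions described above. The names (PayloadAttribute, Payload, TermPositions and its accessors) follow the usual Lucene.Net 2.9 API and are assumptions here; 'reader' is an assumed IndexReader over an index whose tokens carry payloads.

            // Indexing side, inside a custom TokenFilter: attach one byte to the current token.
            PayloadAttribute payloadAtt = (PayloadAttribute) AddAttribute(typeof(PayloadAttribute));
            payloadAtt.SetPayload(new Payload(new byte[] { 42 }));

            // Search side: read the payload back at each position of a term.
            TermPositions tp = reader.TermPositions(new Term("body", "lucene"));
            while (tp.Next())
            {
                for (int i = 0; i < tp.Freq(); i++)
                {
                    tp.NextPosition();
                    if (tp.IsPayloadAvailable())
                    {
                        byte[] data = tp.GetPayload(new byte[tp.GetPayloadLength()], 0);
                        // ... use data ...
                    }
                }
            }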

    - The position in the array to get the offsets from - - An array of TermVectorOffsetInfo objects or the empty list - - -
    - - Returns an array of positions in which the term is found. - Terms are identified by the index at which its number appears in the - term String array obtained from the indexOf method. + + the byte array containing the payload data + + + the offset within the byte array + + + the length of the payload data + + + Creates an empty payload and does not allocate a byte array. + + + Creates a new payload with the the given array as data. + A reference to the passed-in array is held, i. e. no + copy is made. + + + the data of this payload + + + + Creates a new payload with the the given array as data. + A reference to the passed-in array is held, i. e. no + copy is made. + + + the data of this payload + + the offset in the data byte array + + the length of the data + + + + Sets this payloads data. + A reference to the passed-in array is held, i. e. no + copy is made. + + + + Sets this payloads data. + A reference to the passed-in array is held, i. e. no + copy is made. + + + + Returns a reference to the underlying byte array + that holds this payloads data. + + + + Returns the offset in the underlying byte array + + + Returns the length of the payload data. + + + Returns the byte at the given index. + + + Allocates a new byte array, copies the payload data into it and returns it. + + + Copies the payload data to a byte array. + + + the target byte array + + the offset in the target byte array + + + + Clones this payload by creating a copy of the underlying + byte array. + + + + Remaps docIDs after a merge has completed, where the + merged segments had at least one deletion. This is used + to renumber the buffered deletes in IndexWriter when a + merge of segments with deletions commits. @@ -23615,6 +19860,757 @@ Prints the fields of a document for human consumption. + + An Analyzer that uses {@link WhitespaceTokenizer}. + + + A {@link LockFactory} that wraps another {@link + LockFactory} and verifies that each lock obtain/release + is "correct" (never results in two processes holding the + lock at the same time). It does this by contacting an + external server ({@link LockVerifyServer}) to assert that + at most one process holds the lock at a time. To use + this, you should also run {@link LockVerifyServer} on the + host & port matching what you pass to the constructor. + + + + + + + + + should be a unique id across all clients + + the LockFactory that we are testing + + host or IP where {@link LockVerifyServer} + is running + + the port {@link LockVerifyServer} is + listening on + + + + This exception is thrown when the write.lock + could not be acquired. This + happens when a writer tries to open an index + that another writer already has open. + + + + + + Abstract base class providing a mechanism to restrict searches to a subset + of an index and also maintains and returns position information. + This is useful if you want to compare the positions from a SpanQuery with the positions of items in + a filter. For instance, if you had a SpanFilter that marked all the occurrences of the word "foo" in documents, + and then you entered a new SpanQuery containing bar, you could not only filter by the word foo, but you could + then compare position information for post processing. + + + + Returns a SpanFilterResult with true for documents which should be permitted in + search results, and false for those that should not and Spans for where the true docs match. 
+ + The {@link Lucene.Net.Index.IndexReader} to load position and DocIdSet information from + + A {@link SpanFilterResult} + + java.io.IOException if there was an issue accessing the necessary information + + + + + Subclass of FilteredTermEnum for enumerating all terms that match the + specified prefix filter term. +

    + Term enumerations are always ordered by Term.compareTo(). Each term in + the enumeration is greater than all that precede it. + +

    +
+ + Subclass of FilteredTermEnum for enumerating all terms that are similar + to the specified filter term. + +

    Term enumerations are always ordered by Term.compareTo(). Each term in + the enumeration is greater than all that precede it. +

    +
    + + Creates a FuzzyTermEnum with an empty prefix and a minSimilarity of 0.5f. +

    + After calling the constructor the enumeration is already pointing to the first + valid term if such a term exists. + +

    + + + + + IOException + + +
    + + Creates a FuzzyTermEnum with an empty prefix. +

    + After calling the constructor the enumeration is already pointing to the first + valid term if such a term exists. + +

    + + + + + + + IOException + + +
    + + Constructor for enumeration of all terms from specified reader which share a prefix of + length prefixLength with term and which have a fuzzy similarity > + minSimilarity. +

    + After calling the constructor the enumeration is already pointing to the first + valid term if such a term exists. + +

    + Delivers terms. + + Pattern term. + + Minimum required similarity for terms from the reader. Default value is 0.5f. + + Length of required common prefix. Default value is 0. + + IOException +
    + + The termCompare method in FuzzyTermEnum uses Levenshtein distance to + calculate the distance between the given term and the comparing term. + + + + Finds and returns the smallest of three integers + + +

    Similarity returns a number that is 1.0f or less (including negative numbers) + based on how similar the Term is compared to a target term. It returns + exactly 0.0f when +

    +            editDistance < maximumEditDistance
    + Otherwise it returns: +
    +            1 - (editDistance / length)
    + where length is the length of the shortest term (text or target) including a + prefix that are identical and editDistance is the Levenshtein distance for + the two words.

    + +

Embedded within this algorithm is a fail-fast Levenshtein distance + algorithm. The fail-fast algorithm differs from the standard Levenshtein + distance algorithm in that it is aborted if it is discovered that the + minimum distance between the words is greater than some threshold. +

    To calculate the maximum distance threshold we use the following formula: +

    +            (1 - minimumSimilarity) * length
+ where length is the shortest term including any prefix that is not part of the + similarity comparison. This formula was derived by solving for what maximum value + of distance returns false for the following statements: +
    +            similarity = 1 - ((float)distance / (float) (prefixLength + Math.min(textlen, targetlen)));
    +            return (similarity > minimumSimilarity);
    + where distance is the Levenshtein distance for the two words. +

    +

Levenshtein distance (also known as edit distance) is a measure of similarity + between two strings where the distance is measured as the number of character + deletions, insertions or substitutions required to transform one string to + the other string. +

    + the target word or phrase + + the similarity, 0.0 or less indicates that it matches less than the required + threshold and 1.0 indicates that the text and target are identical + +
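The scoring formula above can be checked with a few lines. The following is a worked C# sketch of that formula, not the FuzzyTermEnum source; the method names and the exact definition of length are assumptions drawn from the surrounding text.

            // similarity = 1 - editDistance / (prefixLength + min(textLen, targetLen))
            static float Similarity(int editDistance, int prefixLength, int textLen, int targetLen)
            {
                int length = prefixLength + System.Math.Min(textLen, targetLen);
                return 1.0f - ((float) editDistance / (float) length);
            }

            // Fail-fast bound: edit distances above this can never reach minimumSimilarity.
            static int MaxDistance(float minimumSimilarity, int prefixLength, int textLen, int targetLen)
            {
                int length = prefixLength + System.Math.Min(textLen, targetLen);
                return (int) ((1.0f - minimumSimilarity) * length);
            }

            // Example: "lucene" vs "lucena" has edit distance 1 and no shared prefix set aside,
            // so Similarity(1, 0, 6, 6) = 1 - 1/6, about 0.83, which passes the default 0.5.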
    + + Grow the second dimension of the array, so that we can calculate the + Levenshtein difference. + + + + The max Distance is the maximum Levenshtein distance for the text + compared to some other value that results in score that is + better than the minimum similarity. + + the length of the "other value" + + the maximum levenshtein distance that we care about + + + + Expert: Describes the score computation for document and query, and + can distinguish a match independent of a positive value. + + + + The match status of this explanation node. + May be null if match status is unknown + + + + Sets the match status assigned to this explanation node. + May be null if match status is unknown + + + + Indicates whether or not this Explanation models a good match. + +

    + If the match status is explicitly set (i.e.: not null) this method + uses it; otherwise it defers to the superclass. +

    +

    + + +
    + + Wraps another SpanFilter's result and caches it. The purpose is to allow + filters to simply filter, and then wrap with this class to add caching. + + + + A transient Filter cache. + + + Filter to cache results of + + + + Use {@link #GetDocIdSet(IndexReader)} instead. + + + + A Query that matches documents matching boolean combinations of other + queries, e.g. {@link TermQuery}s, {@link PhraseQuery}s or other + BooleanQuerys. + + + + Return the maximum number of clauses permitted, 1024 by default. + Attempts to add more than the permitted number of clauses cause {@link + TooManyClauses} to be thrown. + + + + + + Set the maximum number of clauses permitted per BooleanQuery. + Default value is 1024. + + + + Constructs an empty boolean query. + + + Constructs an empty boolean query. + + {@link Similarity#Coord(int,int)} may be disabled in scoring, as + appropriate. For example, this score factor does not make sense for most + automatically generated queries, like {@link WildcardQuery} and {@link + FuzzyQuery}. + + + disables {@link Similarity#Coord(int,int)} in scoring. + + + + Returns true iff {@link Similarity#Coord(int,int)} is disabled in + scoring for this query instance. + + + + + + Specifies a minimum number of the optional BooleanClauses + which must be satisfied. + +

    + By default no optional clauses are necessary for a match + (unless there are no required clauses). If this method is used, + then the specified number of clauses is required. +

    +

    + Use of this method is totally independent of specifying that + any specific clauses are required (or prohibited). This number will + only be compared against the number of matching optional clauses. +

    +

    + EXPERT NOTE: Using this method may force collecting docs in order, + regardless of whether setAllowDocsOutOfOrder(true) has been called. +

    + +

    + the number of optional clauses that must match + + + +
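A hedged usage sketch for the setting described above, assuming the usual Lucene.Net 2.9 names (BooleanClause.Occur.SHOULD, SetMinimumNumberShouldMatch) and placeholder field/term values: of the three optional clauses, at least two must match for a document to qualify.

            BooleanQuery bq = new BooleanQuery();
            bq.Add(new TermQuery(new Term("body", "apache")), BooleanClause.Occur.SHOULD);
            bq.Add(new TermQuery(new Term("body", "lucene")), BooleanClause.Occur.SHOULD);
            bq.Add(new TermQuery(new Term("body", "search")), BooleanClause.Occur.SHOULD);
            bq.SetMinimumNumberShouldMatch(2);   // require at least two of the SHOULD clauses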
+ + Gets the minimum number of the optional BooleanClauses + which must be satisfied. + + + + Adds a clause to a boolean query. + + TooManyClauses if the new number of clauses exceeds the maximum clause number + + + + + Adds a clause to a boolean query. + TooManyClauses if the new number of clauses exceeds the maximum clause number + + + + + Returns the set of clauses in this query. + + + Returns the list of clauses in this query. + + + Whether hit docs may be collected out of docid order. + + + this will not be needed anymore, as + {@link Weight#ScoresDocsOutOfOrder()} is used. + + + + Expert: Indicates whether hit docs may be collected out of docid order. +

    + Background: although the contract of the Scorer class requires that + documents be iterated in order of doc id, this was not true in early + versions of Lucene. Many pieces of functionality in the current Lucene code + base have undefined behavior if this contract is not upheld, but in some + specific simple cases may be faster. (For example: disjunction queries with + less than 32 prohibited clauses; This setting has no effect for other + queries.) +

    + +

    + Specifics: By setting this option to true, docid N might be scored for a + single segment before docid N-1. Across multiple segments, docs may be + scored out of order regardless of this setting - it only applies to scoring + a single segment. + + Being static, this setting is system wide. +

    + +

    + this is not needed anymore, as + {@link Weight#ScoresDocsOutOfOrder()} is used. + +
    + + Whether hit docs may be collected out of docid order. + + + + + this is not needed anymore, as + {@link Weight#ScoresDocsOutOfOrder()} is used. + + + + Use {@link #SetAllowDocsOutOfOrder(boolean)} instead. + + + + Use {@link #GetAllowDocsOutOfOrder()} instead. + + + + Prints a user-readable version of this query. + + + Returns true iff o is equal to this. + + + Returns a hash code value for this object. + + + Thrown when an attempt is made to add more than {@link + #GetMaxClauseCount()} clauses. This typically happens if + a PrefixQuery, FuzzyQuery, WildcardQuery, or TermRangeQuery + is expanded to many terms during search. + + + + Expert: the Weight for BooleanQuery, used to + normalize, score and explain these queries. + +

    NOTE: this API and implementation is subject to + change suddenly in the next release.

    +

    +
    + + The Similarity implementation. + + + This stores a monotonically increasing set of <Term, TermInfo> pairs in a + Directory. A TermInfos can be written once, in order. + + + + The file format version, a negative number. + + + Expert: The fraction of terms in the "dictionary" which should be stored + in RAM. Smaller values use more memory, but make searching slightly + faster, while larger values use less memory and make searching slightly + slower. Searching is typically not dominated by dictionary lookup, so + tweaking this is rarely useful. + + + + Expert: The fraction of {@link TermDocs} entries stored in skip tables, + used to accellerate {@link TermDocs#SkipTo(int)}. Larger values result in + smaller indexes, greater acceleration, but fewer accelerable cases, while + smaller values result in bigger indexes, less acceleration and more + accelerable cases. More detailed experiments would be useful here. + + + + Expert: The maximum number of skip levels. Smaller values result in + slightly smaller indexes, but slower skipping in big posting lists. + + + + Adds a new <fieldNumber, termBytes>, TermInfo> pair to the set. + Term must be lexicographically greater than all previous Terms added. + TermInfo pointers must be positive and greater than all previous. + + + + Called to complete TermInfos creation. + + + This stores a monotonically increasing set of <Term, TermInfo> pairs in a + Directory. Pairs are accessed either by Term or by ordinal position the + set. + + + + Returns the number of term/value pairs in the set. + + + Returns the offset of the greatest index entry which is less than or equal to term. + + + Returns the TermInfo for a Term in the set, or null. + + + Returns the TermInfo for a Term in the set, or null. + + + Returns the position of a Term in the set or -1. + + + Returns an enumeration of all the Terms and TermInfos in the set. + + + Returns an enumeration of terms starting at or after the named term. + + + Per-thread resources managed by ThreadLocal + + + An IndexReader which reads multiple, parallel indexes. Each index added + must have the same number of documents, but typically each contains + different fields. Each document contains the union of the fields of all + documents with the same document number. When searching, matches for a + query term are from the first index added that has the field. + +

    This is useful, e.g., with collections that have large fields which + change rarely and small fields that change more frequently. The smaller + fields may be re-indexed in a new index and both indexes may be searched + together. + +

    Warning: It is up to you to make sure all indexes + are created and modified the same way. For example, if you add + documents to one index, you need to add the same documents in the + same order to the other indexes. Failure to do so will result in + undefined behavior. +
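A hedged sketch of the pattern described above: two indexes with identical document numbering searched as one. The two Directory variables are placeholders, and the read-only IndexReader.Open overload is assumed from the usual Lucene.Net 2.9 API.

            ParallelReader parallel = new ParallelReader();           // closes subreaders when closed
            parallel.Add(IndexReader.Open(stableFieldsDir, true));    // large, rarely changing fields
            parallel.Add(IndexReader.Open(volatileFieldsDir, true));  // small, frequently reindexed fields
            Searcher searcher = new IndexSearcher(parallel);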

    +
    + + Construct a ParallelReader. +

    Note that all subreaders are closed if this ParallelReader is closed.

    +

    +
+ + Construct a ParallelReader. + indicates whether the subreaders should be closed + when this ParallelReader is closed + + + + Add an IndexReader. + IOException if there is a low-level IO error + + + Add an IndexReader whose stored fields will not be returned. This can + accelerate search when stored fields are only needed from a subset of + the IndexReaders. + + + IllegalArgumentException if not all indexes contain the same number + of documents + + IllegalArgumentException if not all indexes have the same value + of {@link IndexReader#MaxDoc()} + + IOException if there is a low-level IO error + + + Tries to reopen the subreaders. +
    + If one or more subreaders could be re-opened (i. e. subReader.reopen() + returned a new instance != subReader), then a new ParallelReader instance + is returned, otherwise this instance is returned. +

    + A re-opened instance might share one or more subreaders with the old + instance. Index modification operations result in undefined behavior + when performed before the old instance is closed. + (see {@link IndexReader#Reopen()}). +

    + If subreaders are shared, then the reference count of those + readers is increased to ensure that the subreaders remain open + until the last referring reader is closed. + +

    + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error +
    + + Checks recursively if all subreaders are up to date. + + + Checks recursively if all subindexes are optimized + + + Not implemented. + UnsupportedOperationException + + + + + + + NOTE: this API is experimental and will likely change + + + Adds a new doc in this term. If this returns null + then we just skip consuming positions/payloads. + + + + Called when we are done adding docs to this term + + + This class tracks the number and position / offset parameters of terms + being added to the index. The information collected in this class is + also used to calculate the normalization factor for a field. + +

    WARNING: This API is new and experimental, and may suddenly + change.

    +

    +
    + + Re-initialize the state, using this boost value. + boost value to use. + + + + Get the last processed term position. + the position + + + + Get total number of terms in this field. + the length + + + + Get the number of terms with positionIncrement == 0. + the numOverlap + + + + Get end offset of the last processed term. + the offset + + + + Get boost value. This is the cumulative product of + document boost and field boost for all field instances + sharing the same field name. + + the boost + + + + Combines multiple files into a single compound file. + The file format:
    +
      +
• VInt fileCount
• {Directory}: fileCount entries with the following structure:
  • long dataOffset
  • String fileName
• {File Data}: fileCount entries with the raw data of the corresponding file
    + + The fileCount integer indicates how many files are contained in this compound + file. The {directory} that follows has that many entries. Each directory entry + contains a long pointer to the start of this file's data section, and a String + with that file's name. + + +
    + $Id: CompoundFileWriter.java 690539 2008-08-30 17:33:06Z mikemccand $ + +
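A hedged usage sketch for the writer described above. The file names are placeholders, 'dir' is a Directory assumed to already contain them, and the AddFile/Close names are taken from the surrounding member descriptions.

            CompoundFileWriter cfw = new CompoundFileWriter(dir, "_1.cfs");
            cfw.AddFile("_1.fdt");
            cfw.AddFile("_1.fdx");
            cfw.Close();   // writes the directory entries, then copies each file's raw data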
    + + Create the compound stream in the specified file. The file name is the + entire name (no extensions are added). + + NullPointerException if dir or name is null + + + Returns the directory of the compound file. + + + Returns the name of the compound file. + + + Add a source stream. file is the string by which the + sub-stream will be known in the compound stream. + + + IllegalStateException if this writer is closed + NullPointerException if file is null + IllegalArgumentException if a file with the same name + has been added already + + + + Merge files with the extensions added up to now. + All files with these extensions are combined sequentially into the + compound stream. After successful merge, the source files + are deleted. + + IllegalStateException if close() had been called before or + if no file has been added to this object + + + + Copy the contents of the file with specified extension into the + provided output stream. Use the provided buffer for moving data + to reduce memory allocation. + + + + source file + + + temporary holder for the start of directory entry for this file + + + temporary holder for the start of this file's data section + + + Class for accessing a compound stream. + This class implements a directory, but is limited to only read operations. + Directory methods that would normally modify data throw an exception. + + + + $Id: CompoundFileReader.java 673371 2008-07-02 11:57:27Z mikemccand $ + + + + Returns an array of strings, one for each file in the directory. + + + Returns true iff a file with the given name exists. + + + Returns the time the compound file was last modified. + + + Set the modified time of the compound file to now. + + + Not implemented + UnsupportedOperationException + + + Not implemented + UnsupportedOperationException + + + Returns the length of a file in the directory. + IOException if the file does not exist + + + Not implemented + UnsupportedOperationException + + + Not implemented + UnsupportedOperationException + + + Implementation of an IndexInput that reads from a portion of the + compound file. The visibility is left as "package" *only* because + this helps with testing since JUnit test cases in a different class + can then access package fields of this class. + + + + Expert: implements buffer refill. Reads bytes from the current + position in the input. + + the array to read bytes into + + the offset in the array to start storing bytes + + the number of bytes to read + + + + Expert: implements seek. Sets current position in this file, where + the next {@link #ReadInternal(byte[],int,int)} will occur. + + + + + + Closes the stream to further operations. + Provides support for converting dates to strings and vice-versa. The strings are structured so that lexicographic sorting orders by date, @@ -23665,2870 +20661,29 @@ Converts a string-encoded date into a Date object. - - This class can be used if the token attributes of a TokenStream - are intended to be consumed more than once. It caches - all token attribute states locally in a List. - -

    CachingTokenFilter implements the optional method - {@link TokenStream#Reset()}, which repositions the - stream to the first Token. -

    + + The payload of a Token. See also {@link Payload}. - - Will be removed in Lucene 3.0. This method is final, as it should - not be overridden. Delegates to the backwards compatibility layer. - + + The payload of a Token. See also {@link Payload}. - - Will be removed in Lucene 3.0. This method is final, as it should - not be overridden. Delegates to the backwards compatibility layer. - + + Returns this Token's payload. - - This class converts alphabetic, numeric, and symbolic Unicode characters - which are not in the first 127 ASCII characters (the "Basic Latin" Unicode - block) into their ASCII equivalents, if one exists. - - Characters from the following Unicode blocks are converted; however, only - those characters with reasonable ASCII alternatives are converted: - - - - See: http://en.wikipedia.org/wiki/Latin_characters_in_Unicode - - The set of character conversions supported by this class is a superset of - those supported by Lucene's {@link ISOLatin1AccentFilter} which strips - accents from Latin1 characters. For example, 'À' will be replaced by - 'a'. - + + Sets this Token's payload. - - Converts characters above ASCII to their ASCII equivalents. For example, - accents are removed from accented characters. - - The string to fold - - The number of characters in the input string - + + Initialize this attribute with no payload. - - Simple lockless and memory barrier free String intern cache that is guaranteed - to return the same String instance as String.intern() does. - + + Initialize this attribute with the given payload. - - Size of the hash table, should be a power of two. - - Maximum length of each bucket, after which the oldest item inserted is dropped. - + + Returns this Token's payload. - - Use this {@link LockFactory} to disable locking entirely. - This LockFactory is used when you call {@link FSDirectory#setDisableLocks}. - Only one instance of this lock is created. You should call {@link - #GetNoLockFactory()} to get the instance. - - - - - - - - Not implemented. Waiting for volunteers. - - - - - Not implemented. Waiting for volunteers. - - - - Simple standalone tool that forever acquires & releases a - lock using a specific LockFactory. Run without any args - to see usage. - - - - - - - - - Writes bytes through to a primary IndexOutput, computing - checksum as it goes. Note that you cannot use seek(). - - - - The {@link TimeLimitingCollector} is used to timeout search requests that - take longer than the maximum allowed search time limit. After this time is - exceeded, the search thread is stopped by throwing a - {@link TimeExceededException}. - - - - Default timer resolution. - - - - - Default for {@link #IsGreedy()}. - - - - - Create a TimeLimitedCollector wrapper over another {@link Collector} with a specified timeout. - the wrapped {@link Collector} - - max time allowed for collecting hits after which {@link TimeExceededException} is thrown - - - - Return the timer resolution. - - - - - Set the timer resolution. - The default timer resolution is 20 milliseconds. - This means that a search required to take no longer than - 800 milliseconds may be stopped after 780 to 820 milliseconds. -
Note that:
• Finer (smaller) resolution is more accurate but less efficient.
• Setting resolution to less than 5 milliseconds will be silently modified to 5 milliseconds.
• Setting resolution smaller than the current resolution might take effect only after the current resolution. (Assume the current resolution of 20 milliseconds is modified to 5 milliseconds; then it can take up to 20 milliseconds for the change to take effect.)
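A hedged sketch of the wrapper described above: give the search a one-second budget and keep whatever was collected if it runs out. Names (TopScoreDocCollector.Create, TimeLimitingCollector, TimeExceededException) follow the usual Lucene.Net 2.9 API and are assumptions here; 'searcher' and 'query' are assumed to exist.

            TopScoreDocCollector hits = TopScoreDocCollector.Create(10, true);
            Collector limited = new TimeLimitingCollector(hits, 1000);   // 1000 ms allowed
            try
            {
                searcher.Search(query, limited);
            }
            catch (TimeLimitingCollector.TimeExceededException)
            {
                // Partial (greedy or non-greedy) results are still available from 'hits'.
            }
            TopDocs topDocs = hits.TopDocs();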
    -
    -
    - - Checks if this time limited collector is greedy in collecting the last hit. - A non greedy collector, upon a timeout, would throw a {@link TimeExceededException} - without allowing the wrapped collector to collect current doc. A greedy one would - first allow the wrapped hit collector to collect current doc and only then - throw a {@link TimeExceededException}. - - - - - - Sets whether this time limited collector is greedy. - true to make this time limited greedy - - - - - - Calls {@link Collector#Collect(int)} on the decorated {@link Collector} - unless the allowed time has passed, in which case it throws an exception. - - - TimeExceededException - if the time allowed has exceeded. - - - - TimerThread provides a pseudo-clock service to all searching - threads, so that they can count elapsed time with less overhead - than repeatedly calling System.currentTimeMillis. A single - thread should be created to be used for all searches. - - - - Get the timer value in milliseconds. - - - Thrown when elapsed search time exceeds allowed search time. - - - Returns allowed time (milliseconds). - - - Returns elapsed time (milliseconds). - - - Returns last doc that was collected when the search time exceeded. - - - Implements the fuzzy search query. The similarity measurement - is based on the Levenshtein (edit distance) algorithm. - - Warning: this query is not very scalable with its default prefix - length of 0 - in this case, *every* term will be enumerated and - cause an edit score calculation. - - - - - Create a new FuzzyQuery that will match terms with a similarity - of at least minimumSimilarity to term. - If a prefixLength > 0 is specified, a common prefix - of that length is also required. - - - the term to search for - - a value between 0 and 1 to set the required similarity - between the query term and the matching terms. For example, for a - minimumSimilarity of 0.5 a term of the same length - as the query term is considered similar to the query term if the edit distance - between both terms is less than length(term)*0.5 - - length of common (non-fuzzy) prefix - - IllegalArgumentException if minimumSimilarity is >= 1 or < 0 - or if prefixLength < 0 - - - - Calls {@link #FuzzyQuery(Term, float) FuzzyQuery(term, minimumSimilarity, 0)}. - - - Calls {@link #FuzzyQuery(Term, float) FuzzyQuery(term, 0.5f, 0)}. - - - Returns the minimum similarity that is required for this query to match. - float value between 0.0 and 1.0 - - - - Returns the non-fuzzy prefix length. This is the number of characters at the start - of a term that must be identical (not fuzzy) to the query term if the query - is to match that term. - - - - Returns the pattern term. - - - A Term represents a word from text. This is the unit of search. It is - composed of two elements, the text of the word, as a string, and the name of - the field that the text occured in, an interned string. - Note that terms may represent more than words from text fields, but also - things like dates, email addresses, urls, etc. - - - - Constructs a Term with the given field and text. -

    Note that a null field or null text value results in undefined - behavior for most Lucene APIs that accept a Term parameter. -

    -
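    Putting the two constructors together, a hedged sketch (the field and term text are illustrative only):

    -            Term term = new Term("contents", "lucene");   // field + text, both non-null
    -            Query exact = new TermQuery(term);             // the unit of search, used directly
    -            Query fuzzy = new FuzzyQuery(term, 0.6f, 2);   // minimumSimilarity 0.6, prefixLength 2 (see above)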
    - - Constructs a Term with the given field and empty text. - This serves two purposes: 1) reuse of a Term with the same field. - 2) pattern for a query. - - - - - - - Returns the field of this term, an interned string. The field indicates - the part of a document which this term came from. - - - - Returns the text of this term. In the case of words, this is simply the - text of the word. In the case of dates and other types, this is an - encoding of the object as a string. - - - - Optimized construction of new Terms by reusing same field as this Term - - avoids field.intern() overhead - - The text of the new term (field is implicitly same as this Term instance) - - A new Term - - - - Compares two terms, returning a negative integer if this - term belongs before the argument, zero if this term is equal to the - argument, and a positive integer if this term belongs after the argument. - The ordering of terms is first by field, then by text. - - - - Resets the field and text of a Term. - - - Writes norms. Each thread X field accumulates the norms - for the doc/fields it saw, then the flush method below - merges all of these together into a single _X.nrm file. - - - - Produce _X.nrm if any document had a field with norms - not disabled - - - - Allows you to iterate over the {@link TermPositions} for multiple {@link Term}s as - a single {@link TermPositions}. - - - - - Creates a new MultipleTermPositions instance. - - - - - - - Not implemented. - UnsupportedOperationException - - - Not implemented. - UnsupportedOperationException - - - Not implemented. - UnsupportedOperationException - - - Not implemented. - UnsupportedOperationException - - - Not implemented. - UnsupportedOperationException - - - - false - - - - Add a new position & payload. If payloadLength > 0 - you must read those bytes from the IndexInput. - - - - Called when we are done adding positions & payloads - - - Add a new position & payload - - - Called when we are done adding positions & payloads - - - For each Field, store a sorted collection of {@link TermVectorEntry}s -

    - This is not thread-safe. -

    -
    - - - A Comparator for sorting {@link TermVectorEntry}s - - - - Get the mapping between fields and terms, sorted by the comparator - - - A map between field names and {@link java.util.SortedSet}s per field. SortedSet entries are {@link TermVectorEntry} - - - - A grammar-based tokenizer constructed with JFlex - -

    This should be a good tokenizer for most European-language documents: - -

      -
    • Splits words at punctuation characters, removing punctuation. However, a - dot that's not followed by whitespace is considered part of a token.
    • -
    • Splits words at hyphens, unless there's a number in the token, in which case - the whole token is interpreted as a product number and is not split.
    • -
    • Recognizes email addresses and internet hostnames as one token.
    • -
    - -

    Many applications have specific tokenizer needs. If this tokenizer does - not suit your application, please consider copying this source code - directory to your project and maintaining your own grammar-based tokenizer. - - -
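    A hedged sketch of driving the tokenizer directly (the Version constant and the non-generic AddAttribute(Type) call are assumptions about the 2.9-era attribute API):

    -            TokenStream ts = new StandardTokenizer(Version.LUCENE_29,
    -                new StringReader("visit lucene.apache.org for wi-fi specs"));
    -            TermAttribute term = (TermAttribute) ts.AddAttribute(typeof(TermAttribute));
    -            TypeAttribute type = (TypeAttribute) ts.AddAttribute(typeof(TypeAttribute));
    -            while (ts.IncrementToken())
    -            {
    -                // "lucene.apache.org" comes back as a single <HOST> token,
    -                // "wi-fi" is split at the hyphen because it contains no digit
    -                Console.WriteLine(term.Term() + " / " + type.Type());
    -            }
    -            ts.Close();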

    - You must specify the required {@link Version} compatibility when creating - StandardTokenizer: -

    -
    -
    - - this solves a bug where HOSTs that end with '.' are identified - as ACRONYMs. It is deprecated and will be removed in the next - release. - - - - A private instance of the JFlex-constructed scanner - - - String token types that correspond to token type int constants - - - Please use {@link #TOKEN_TYPES} instead - - - - Specifies whether deprecated acronyms should be replaced with HOST type. - This is false by default to support backward compatibility. -

    - See http://issues.apache.org/jira/browse/LUCENE-1068 - -

    - this should be removed in the next release (3.0). - -
    - - Set the max allowed token length. Any token longer - than this is skipped. - - - - - - - - Creates a new instance of the {@link StandardTokenizer}. Attaches the - input to a newly created JFlex scanner. - - Use {@link #StandardTokenizer(Version, Reader)} instead - - - - Creates a new instance of the {@link Lucene.Net.Analysis.Standard.StandardTokenizer}. Attaches - the input to the newly created JFlex scanner. - - - The input reader - - Set to true to replace mischaracterized acronyms with HOST. - - See http://issues.apache.org/jira/browse/LUCENE-1068 - - Use {@link #StandardTokenizer(Version, Reader)} instead - - - - Creates a new instance of the - {@link org.apache.lucene.analysis.standard.StandardTokenizer}. Attaches - the input to the newly created JFlex scanner. - - - The input reader - - See http://issues.apache.org/jira/browse/LUCENE-1068 - - - - Creates a new StandardTokenizer with a given {@link AttributeSource}. - Use - {@link #StandardTokenizer(Version, AttributeSource, Reader)} - instead - - - - Creates a new StandardTokenizer with a given {@link AttributeSource}. - - - Creates a new StandardTokenizer with a given {@link Lucene.Net.Util.AttributeSource.AttributeFactory} - Use - {@link #StandardTokenizer(Version, org.apache.lucene.util.AttributeSource.AttributeFactory, Reader)} - instead - - - - Creates a new StandardTokenizer with a given - {@link org.apache.lucene.util.AttributeSource.AttributeFactory} - - - - Will be removed in Lucene 3.0. This method is final, as it should - not be overridden. Delegates to the backwards compatibility layer. - - - - Will be removed in Lucene 3.0. This method is final, as it should - not be overridden. Delegates to the backwards compatibility layer. - - - - Prior to https://issues.apache.org/jira/browse/LUCENE-1068, StandardTokenizer mischaracterized as acronyms tokens like www.abc.com - when they should have been labeled as hosts instead. - - true if StandardTokenizer now returns these tokens as Hosts, otherwise false - - - Remove in 3.X and make true the only valid value - - - - - Set to true to replace mischaracterized acronyms as HOST. - - Remove in 3.X and make true the only valid value - - See https://issues.apache.org/jira/browse/LUCENE-1068 - - - - This analyzer is used to facilitate scenarios where different - fields require different analysis techniques. Use {@link #addAnalyzer} - to add a non-default analyzer on a field name basis. - -

    Example usage: - -

    -            PerFieldAnalyzerWrapper aWrapper =
    -            new PerFieldAnalyzerWrapper(new StandardAnalyzer());
    -            aWrapper.addAnalyzer("firstname", new KeywordAnalyzer());
    -            aWrapper.addAnalyzer("lastname", new KeywordAnalyzer());
    -            
    - -

    In this example, StandardAnalyzer will be used for all fields except "firstname" - and "lastname", for which KeywordAnalyzer will be used. - -

    A PerFieldAnalyzerWrapper can be used like any other analyzer, for both indexing - and query parsing. -
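    For instance, the same wrapper instance can be handed to both sides (a hedged sketch; the IndexWriter and QueryParser overloads are assumed from the 2.9-era API, and 'directory' is not defined here):

    -            IndexWriter writer = new IndexWriter(directory, aWrapper, true,
    -                IndexWriter.MaxFieldLength.UNLIMITED);
    -            // ... add documents, then writer.Close() ...
    -            QueryParser parser = new QueryParser("firstname", aWrapper);
    -            Query q = parser.Parse("firstname:Ada lastname:Lovelace");  // both name fields go through KeywordAnalyzer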

    -
    - - Constructs with default analyzer. - - - Any fields not specifically - defined to use a different analyzer will use the one provided here. - - - - Constructs with default analyzer and a map of analyzers to use for - specific fields. - - - Any fields not specifically - defined to use a different analyzer will use the one provided here. - - a Map (String field name to the Analyzer) to be - used for those fields - - - - Defines an analyzer to use for the specified field. - - - field name requiring a non-default analyzer - - non-default analyzer to use for field - - - - Return the positionIncrementGap from the analyzer assigned to fieldName - - - Replacement for Java 1.5 Character.valueOf() - Move to Character.valueOf() in 3.0 - - - - Returns a Character instance representing the given char value - - - a char value - - a Character representation of the given char value. - - - - A {@link LockFactory} that wraps another {@link - LockFactory} and verifies that each lock obtain/release - is "correct" (never results in two processes holding the - lock at the same time). It does this by contacting an - external server ({@link LockVerifyServer}) to assert that - at most one process holds the lock at a time. To use - this, you should also run {@link LockVerifyServer} on the - host & port matching what you pass to the constructor. - - - - - - - - - should be a unique id across all clients - - the LockFactory that we are testing - - host or IP where {@link LockVerifyServer} - is running - - the port {@link LockVerifyServer} is - listening on - - - - This exception is thrown when there is an attempt to - access something that has already been closed. - - - - Represents hits returned by {@link - * Searcher#search(Query,Filter,int)} and {@link - * Searcher#search(Query,int) - - - - The total number of hits for the query. - - - - - The top hits for the query. - - - Stores the maximum score value encountered, needed for normalizing. - - - Returns the maximum score value encountered. Note that in case - scores are not tracked, this returns {@link Float#NaN}. - - - - Sets the maximum score value encountered. - - - Constructs a TopDocs with a default maxScore=Float.NaN. - - - - - - A Filter that restricts search results to a range of values in a given - field. - -

    This filter matches the documents looking for terms that fall into the - supplied range according to {@link String#compareTo(String)}. It is not intended - for numerical ranges, use {@link NumericRangeFilter} instead. - -

    If you construct a large number of range filters with different ranges but on the - same field, {@link FieldCacheRangeFilter} may have significantly better performance. -

    - 2.9 - -
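    A hedged example of the constructor documented below (field name and bounds are illustrative):

    -            // all terms in "author" from "aardvark" to "zebra", both endpoints included
    -            Filter byAuthor = new TermRangeFilter("author", "aardvark", "zebra", true, true);
    -            TopDocs hits = searcher.Search(query, byAuthor, 10);   // 'searcher' and 'query' assumed to exist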
    - - The field this range applies to - - The lower bound on this range - - The upper bound on this range - - Does this range include the lower bound? - - Does this range include the upper bound? - - IllegalArgumentException if both terms are null or if - lowerTerm is null and includeLower is true (similar for upperTerm - and includeUpper) - - - - WARNING: Using this constructor and supplying a non-null - value in the collator parameter will cause every single - index Term in the Field referenced by lowerTerm and/or upperTerm to be - examined. Depending on the number of index Terms in this Field, the - operation could be very slow. - - - The lower bound on this range - - The upper bound on this range - - Does this range include the lower bound? - - Does this range include the upper bound? - - The collator to use when determining range inclusion; set - to null to use Unicode code point ordering instead of collation. - - IllegalArgumentException if both terms are null or if - lowerTerm is null and includeLower is true (similar for upperTerm - and includeUpper) - - - - Constructs a filter for field fieldName matching - less than or equal to upperTerm. - - - - Constructs a filter for field fieldName matching - greater than or equal to lowerTerm. - - - - Returns the field name for this filter - - - Returns the lower value of this range filter - - - Returns the upper value of this range filter - - - Returns true if the lower endpoint is inclusive - - - Returns true if the upper endpoint is inclusive - - - Returns the collator used to determine range inclusion, if any. - - - Abstract base class for sorting hits returned by a Query. - -

    - This class should only be used if the other SortField types (SCORE, DOC, - STRING, INT, FLOAT) do not provide an adequate sorting. It maintains an - internal cache of values which could be quite large. The cache is an array of - Comparable, one for each document in the index. There is a distinct - Comparable for each unique term in the field - if some documents have the - same term in the field, the cache array will have entries which reference the - same Comparable. - - This class will be used as part of a key to a FieldCache value. You must - implement hashCode and equals to avoid an explosion in RAM usage if you use - instances that are not the same instance. If you are searching using the - Remote contrib, the same instance of this class on the client will be a new - instance on every call to the server, so hashCode/equals is very important in - that situation. - -

    - Created: Apr 21, 2004 5:08:38 PM - - -

    - $Id: SortComparator.java 800119 2009-08-02 17:59:21Z markrmiller $ - - 1.4 - - Please use {@link FieldComparatorSource} instead. - -
    - - Returns an object which, when sorted according to natural order, - will order the Term values in the correct order. -

    For example, if the Terms contained integer values, this method - would return new Integer(termtext). Note that this - might not always be the most efficient implementation - for this - particular example, a better implementation might be to make a - ScoreDocLookupComparator that uses an internal lookup table of int. -

    - The textual value of the term. - - An object representing termtext that sorts according to the natural order of termtext. - - - - - -
    - - Wrapper used by {@link HitIterator} to provide a lazily loaded hit - from {@link Hits}. - - - Use {@link TopScoreDocCollector} and {@link TopDocs} instead. Hits will be removed in Lucene 3.0. - - - - Constructed from {@link HitIterator} - Hits returned from a search - - Hit index in Hits - - - - Returns document for this hit. - - - - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - - - Returns score for this hit. - - - - - - - Returns id for this hit. - - - - - - - Returns the boost factor for this hit on any field of the underlying document. - - - - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - - - Returns the string value of the field with the given name if any exist in - this document, or null. If multiple fields exist with this name, this - method returns the first value added. If only binary fields with this name - exist, returns null. - - - - - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error - - - Prints the parameters to be used to discover the promised result. - - - This is a DocFieldConsumer that writes stored fields. - - - Fills in any hole in the docIDs - - - An IndexReader which reads multiple indexes, appending their content. - - - $Id: MultiReader.java 782406 2009-06-07 16:31:18Z mikemccand $ - - - -

    Construct a MultiReader aggregating the named set of (sub)readers. - Directory locking for delete, undeleteAll, and setNorm operations is - left to the subreaders.

    -

    Note that all subreaders are closed if this MultiReader is closed.

    -

    - set of (sub)readers - - IOException -
    - -

    Construct a MultiReader aggregating the named set of (sub)readers. - Directory locking for delete, undeleteAll, and setNorm operations is - left to the subreaders.

    -

    - indicates whether the subreaders should be closed - when this MultiReader is closed - - set of (sub)readers - - IOException -
    - - Tries to reopen the subreaders. -
    - If one or more subreaders could be re-opened (i.e., subReader.reopen() - returned a new instance != subReader), then a new MultiReader instance - is returned; otherwise this instance is returned. -

    - A re-opened instance might share one or more subreaders with the old - instance. Index modification operations result in undefined behavior - when performed before the old instance is closed. - (see {@link IndexReader#Reopen()}). -

    - If subreaders are shared, then the reference count of those - readers is increased to ensure that the subreaders remain open - until the last referring reader is closed. - -

    - CorruptIndexException if the index is corrupt - IOException if there is a low-level IO error -
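    A hedged sketch of the construct-and-reopen pattern described above (the directory variables and the read-only IndexReader.Open overload are assumptions):

    -            IndexReader r1 = IndexReader.Open(dir1, true);   // read-only subreaders
    -            IndexReader r2 = IndexReader.Open(dir2, true);
    -            IndexReader multi = new MultiReader(new IndexReader[] { r1, r2 });  // closes subreaders on Close()
    -            // ... later, pick up index changes without reloading unchanged subreaders:
    -            IndexReader reopened = multi.Reopen();
    -            if (reopened != multi)
    -            {
    -                multi.Close();     // shared subreaders are reference-counted, so this is safe
    -                multi = reopened;
    -            }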
    - - Clones the subreaders. - (see {@link IndexReader#clone()}). -
    -

    - If subreaders are shared, then the reference count of those - readers is increased to ensure that the subreaders remain open - until the last referring reader is closed. -

    -
    - - If clone is true then we clone each of the subreaders - - - New IndexReader, or same one (this) if - reopen/clone is not necessary - - CorruptIndexException - IOException - - - - - - - Checks recursively if all subreaders are up to date. - - - Not implemented. - UnsupportedOperationException - - - This {@link IndexDeletionPolicy} implementation that - keeps only the most recent commit and immediately removes - all prior commits after a new commit is done. This is - the default deletion policy. - - - - Deletes all commits except the most recent one. - - - Deletes all commits except the most recent one. - - - Adds a new term in this field - - - Called when we are done adding terms to this field - - - This is a DocFieldConsumer that inverts each field, - separately, from a Document, and accepts a - InvertedTermsConsumer to process those terms. - - - - This is a DocConsumer that gathers all fields under the - same name, and calls per-field consumers to process field - by field. This class doesn't doesn't do any "real" work - of its own: it just forwards the fields to a - DocFieldConsumer. - - - - The term text of a Token. - - - Returns the Token's term text. - - This method has a performance penalty - because the text is stored internally in a char[]. If - possible, use {@link #TermBuffer()} and {@link - #TermLength()} directly instead. If you really need a - String, use this method, which is nothing more than - a convenience call to new String(token.termBuffer(), 0, token.termLength()) - - - - Copies the contents of buffer, starting at offset for - length characters, into the termBuffer array. - - the buffer to copy - - the index in the buffer of the first character to copy - - the number of characters to copy - - - - Copies the contents of buffer into the termBuffer array. - the buffer to copy - - - - Copies the contents of buffer, starting at offset and continuing - for length characters, into the termBuffer array. - - the buffer to copy - - the index in the buffer of the first character to copy - - the number of characters to copy - - - - Returns the internal termBuffer character array which - you can then directly alter. If the array is too - small for your token, use {@link - #ResizeTermBuffer(int)} to increase it. After - altering the buffer be sure to call {@link - #setTermLength} to record the number of valid - characters that were placed into the termBuffer. - - - - Grows the termBuffer to at least size newSize, preserving the - existing content. Note: If the next operation is to change - the contents of the term buffer use - {@link #SetTermBuffer(char[], int, int)}, - {@link #SetTermBuffer(String)}, or - {@link #SetTermBuffer(String, int, int)} - to optimally combine the resize with the setting of the termBuffer. - - minimum size of the new termBuffer - - newly created termBuffer with length >= newSize - - - - Allocates a buffer char[] of at least newSize, without preserving the existing content. - its always used in places that set the content - - minimum size of the buffer - - - - Return number of valid characters (length of the term) - in the termBuffer array. - - - - Set number of valid characters (length of the term) in - the termBuffer array. Use this to truncate the termBuffer - or to synchronize with external manipulation of the termBuffer. - Note: to grow the size of the array, - use {@link #ResizeTermBuffer(int)} first. 
- - the truncated length - - - - Filters {@link StandardTokenizer} with {@link StandardFilter}, - {@link LowerCaseFilter} and {@link StopFilter}, using a list of English stop - words. - - -

    - You must specify the required {@link Version} compatibility when creating - StandardAnalyzer: -

    - -
    - $Id: StandardAnalyzer.java 829134 2009-10-23 17:18:53Z mikemccand $ - -
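    A hedged construction sketch (the Version constant name is an assumption):

    -            StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);
    -            analyzer.SetMaxTokenLength(255);   // longer tokens are discarded, see setMaxTokenLength below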
    - - Default maximum allowed token length - - - Specifies whether deprecated acronyms should be replaced with HOST type. - This is false by default to support backward compatibility. - - - this should be removed in the next release (3.0). - - See https://issues.apache.org/jira/browse/LUCENE-1068 - - - - - true if new instances of StandardTokenizer will - replace mischaracterized acronyms - - See https://issues.apache.org/jira/browse/LUCENE-1068 - - This will be removed (hardwired to true) in 3.0 - - - - - Set to true to have new - instances of StandardTokenizer replace mischaracterized - acronyms by default. Set to false to preserve the - previous (before 2.4) buggy behavior. Alternatively, - set the system property - Lucene.Net.Analysis.Standard.StandardAnalyzer.replaceInvalidAcronym - to false. - - See https://issues.apache.org/jira/browse/LUCENE-1068 - - This will be removed (hardwired to true) in 3.0 - - - - An array containing some common English words that are usually not - useful for searching. - - Use {@link #STOP_WORDS_SET} instead - - - - An unmodifiable set containing some common English words that are usually not - useful for searching. - - - - Builds an analyzer with the default stop words ({@link - #STOP_WORDS_SET}). - - Use {@link #StandardAnalyzer(Version)} instead. - - - - Builds an analyzer with the default stop words ({@link - #STOP_WORDS}). - - Lucene version to match See {@link - above} - - - - Builds an analyzer with the given stop words. - Use {@link #StandardAnalyzer(Version, Set)} - instead - - - - Builds an analyzer with the given stop words. - Lucene version to match See {@link - above} - - stop words - - - - Builds an analyzer with the given stop words. - Use {@link #StandardAnalyzer(Version, Set)} instead - - - - Builds an analyzer with the stop words from the given file. - - - Use {@link #StandardAnalyzer(Version, File)} - instead - - - - Builds an analyzer with the stop words from the given file. - - - Lucene version to match See {@link - above} - - File to read stop words from - - - - Builds an analyzer with the stop words from the given reader. - - - Use {@link #StandardAnalyzer(Version, Reader)} - instead - - - - Builds an analyzer with the stop words from the given reader. 
- - - Lucene version to match See {@link - above} - - Reader to read stop words from - - - - - Set to true if this analyzer should replace mischaracterized acronyms in the StandardTokenizer - - See https://issues.apache.org/jira/browse/LUCENE-1068 - - - Remove in 3.X and make true the only valid value - - - - The stopwords to use - - Set to true if this analyzer should replace mischaracterized acronyms in the StandardTokenizer - - See https://issues.apache.org/jira/browse/LUCENE-1068 - - - Remove in 3.X and make true the only valid value - - - - The stopwords to use - - Set to true if this analyzer should replace mischaracterized acronyms in the StandardTokenizer - - See https://issues.apache.org/jira/browse/LUCENE-1068 - - - Remove in 3.X and make true the only valid value - - - - - The stopwords to use - - Set to true if this analyzer should replace mischaracterized acronyms in the StandardTokenizer - - See https://issues.apache.org/jira/browse/LUCENE-1068 - - - Remove in 3.X and make true the only valid value - - - - The stopwords to use - - Set to true if this analyzer should replace mischaracterized acronyms in the StandardTokenizer - - See https://issues.apache.org/jira/browse/LUCENE-1068 - - - Remove in 3.X and make true the only valid value - - - - Constructs a {@link StandardTokenizer} filtered by a {@link - StandardFilter}, a {@link LowerCaseFilter} and a {@link StopFilter}. - - - - Set maximum allowed token length. If a token is seen - that exceeds this length then it is discarded. This - setting only takes effect the next time tokenStream or - reusableTokenStream is called. - - - - - - - - Use {@link #tokenStream} instead - - - - - true if this Analyzer is replacing mischaracterized acronyms in the StandardTokenizer - - See https://issues.apache.org/jira/browse/LUCENE-1068 - - This will be removed (hardwired to true) in 3.0 - - - - - Set to true if this Analyzer is replacing mischaracterized acronyms in the StandardTokenizer - - See https://issues.apache.org/jira/browse/LUCENE-1068 - - This will be removed (hardwired to true) in 3.0 - - - - "Tokenizes" the entire stream as a single token. This is useful - for data like zip codes, ids, and some product names. - - - - Borrowed from Cglib. Allows custom swap so that two arrays can be sorted - at the same time. - - - - Floating point numbers smaller than 32 bits. - - - $Id$ - - - - Converts a 32 bit float to an 8 bit float. -
    Values less than zero are all mapped to zero. -
    Values are truncated (rounded down) to the nearest 8 bit value. -
    Values between zero and the smallest representable value - are rounded up. - -
    - the 32 bit float to be converted to an 8 bit float (byte) - - the number of mantissa bits to use in the byte, with the remainder to be used in the exponent - - the zero-point in the range of exponent values - - the 8 bit float representation - -
    - - Converts an 8 bit float to a 32 bit float. - - - floatToByte(b, mantissaBits=3, zeroExponent=15) -
    smallest non-zero value = 5.820766E-10 -
    largest value = 7.5161928E9 -
    epsilon = 0.125 -
    -
    - - byteToFloat(b, mantissaBits=3, zeroExponent=15) - - - floatToByte(b, mantissaBits=5, zeroExponent=2) -
    smallest nonzero value = 0.033203125 -
    largest value = 1984.0 -
    epsilon = 0.03125 -
    -
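    A hedged round-trip sketch using the general conversion described above (PascalCase method names on SmallFloat are an assumption about the .NET port):

    -            byte b = SmallFloat.FloatToByte(5.5f, 3, 15);   // mantissaBits=3, zeroExp=15, the variant profiled above
    -            float back = SmallFloat.ByteToFloat(b, 3, 15);  // truncated to the 8-bit grid, so back <= 5.5f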
    - - byteToFloat(b, mantissaBits=5, zeroExponent=2) - - - Estimates the size of a given Object using a given MemoryModel for primitive - size information. - - Resource Usage: - - Internally uses a Map to temporally hold a reference to every - object seen. - - If checkIntered, all Strings checked will be interned, but those - that were not already interned will be released for GC when the - estimate is complete. - - - - Constructs this object with an AverageGuessMemoryModel and - checkInterned = true. - - - - check if Strings are interned and don't add to size - if they are. Defaults to true but if you know the objects you are checking - won't likely contain many interned Strings, it will be faster to turn off - intern checking. - - - - MemoryModel to use for primitive object sizes. - - - - MemoryModel to use for primitive object sizes. - - check if Strings are interned and don't add to size - if they are. Defaults to true but if you know the objects you are checking - won't likely contain many interned Strings, it will be faster to turn off - intern checking. - - - - Return good default units based on byte size. - - - Implements the wildcard search query. Supported wildcards are *, which - matches any character sequence (including the empty one), and ?, - which matches any single character. Note this query can be slow, as it - needs to iterate over many terms. In order to prevent extremely slow WildcardQueries, - a Wildcard term should not start with one of the wildcards * or - ?. - -

    This query uses the {@link - MultiTermQuery#CONSTANT_SCORE_AUTO_REWRITE_DEFAULT} - rewrite method. - -

    - - -
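    A hedged example (field name and pattern are illustrative; note the literal prefix, which keeps the term enumeration cheap):

    -            Query q = new WildcardQuery(new Term("filename", "report-200?-*.pdf"));
    -            // avoid patterns like "*report*": a leading wildcard forces every term to be enumerated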
    - - Returns the pattern term. - - - Prints a user-readable version of this query. - - -

    - The TimeLimitedCollector is used to time out search requests that take longer - than the maximum allowed search time. After this time is exceeded, the - search thread is stopped by throwing a {@link TimeExceededException}. -

    - -

    - Use {@link TimeLimitingCollector} instead, which extends the new - {@link Collector}. This class will be removed in 3.0. - -
    - - Default timer resolution. - - - - - Default for {@link #IsGreedy()}. - - - - - Create a TimeLimitedCollector wrapper over another HitCollector with a specified timeout. - the wrapped HitCollector - - max time allowed for collecting hits after which {@link TimeExceededException} is thrown - - - - Calls collect() on the decorated HitCollector. - - - TimeExceededException if the time allowed has been exceeded. - - - Return the timer resolution. - - - - - Set the timer resolution. - The default timer resolution is 20 milliseconds. - This means that a search required to take no longer than - 800 milliseconds may be stopped after 780 to 820 milliseconds. -
    Note that: -
      -
    • Finer (smaller) resolution is more accurate but less efficient.
    • -
    • Setting resolution to less than 5 milliseconds will be silently modified to 5 milliseconds.
    • -
    • A resolution smaller than the current one may only take effect after the current - resolution interval has elapsed. (For example, if the current resolution of 20 milliseconds is changed to 5 milliseconds, - it can take up to 20 milliseconds for the change to take effect.)
    • -
    -
    -
    - - Checks if this time limited collector is greedy in collecting the last hit. - A non greedy collector, upon a timeout, would throw a {@link TimeExceededException} - without allowing the wrapped collector to collect current doc. A greedy one would - first allow the wrapped hit collector to collect current doc and only then - throw a {@link TimeExceededException}. - - - - - - Sets whether this time limited collector is greedy. - true to make this time limited greedy - - - - - - TimerThread provides a pseudo-clock service to all searching - threads, so that they can count elapsed time with less overhead - than repeatedly calling System.currentTimeMillis. A single - thread should be created to be used for all searches. - - - - Get the timer value in milliseconds. - - - Thrown when elapsed search time exceeds allowed search time. - - - Returns allowed time (milliseconds). - - - Returns elapsed time (milliseconds). - - - Returns last doc that was collected when the search time exceeded. - - - A {@link Scorer} which wraps another scorer and caches the score of the - current document. Successive calls to {@link #Score()} will return the same - result and will not invoke the wrapped Scorer's score() method, unless the - current document has changed.
    - This class might be useful due to the changes done to the {@link Collector} - interface, in which the score is not computed for a document by default, only - if the collector requests it. Some collectors may need to use the score in - several places, however all they have in hand is a {@link Scorer} object, and - might end up computing the score of a document more than once. -
    -
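    A hedged sketch of the intended use inside a custom {@link Collector} (the surrounding collector fields are assumptions):

    -            public override void SetScorer(Scorer scorer)
    -            {
    -                // anything below that calls this.scorer.Score() several times for the
    -                // same doc now pays for the score computation only once
    -                this.scorer = new ScoreCachingWrappingScorer(scorer);
    -            }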
    - - Creates a new instance by wrapping the given scorer. - - - use {@link #DocID()} instead. - - - - use {@link #NextDoc()} instead. - - - - use {@link #Advance(int)} instead. - - - - Expert: obtains the ordinal of the field value from the default Lucene - {@link Lucene.Net.Search.FieldCache FieldCache} using getStringIndex() - and reverses the order. -

    - The native lucene index order is used to assign an ordinal value for each field value. -

    - Field values (terms) are lexicographically ordered by unicode value, and numbered starting at 1. -
    - Example of reverse ordinal (rord): -
    If there were only three field values: "apple","banana","pear" -
    then rord("apple")=3, rord("banana")=2, rord("pear")=1 -

    - WARNING: - rord() depends on the position in an index and can thus change - when other documents are inserted or deleted, - or if a MultiSearcher is used. - -

    - WARNING: The status of the Search.Function package is experimental. - The APIs introduced here might change in the future and will not be - supported anymore in such a case. - -

    NOTE: with the switch in 2.9 to segment-based - searching, if {@link #getValues} is invoked with a - composite (multi-segment) reader, this can easily cause - double RAM usage for the values in the FieldCache. It's - best to switch your application to pass only atomic - (single segment) readers to this API. Alternatively, for - a short-term fix, you could wrap your ValueSource using - {@link MultiValueSource}, which costs more CPU per lookup - but will not consume double the FieldCache RAM.

    -

    -
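    A hedged sketch that turns the reverse ordinal into a document score (ValueSourceQuery is assumed to be the companion query class in the same Search.Function namespace):

    -            ValueSource src = new ReverseOrdFieldSource("category");
    -            Query q = new ValueSourceQuery(src);   // score(doc) == rord(value of "category" in doc)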
    - - Constructor for a certain field. - the field whose values' reverse order is used. - - - Expert: obtains int field values from the - {@link Lucene.Net.Search.FieldCache FieldCache} - using getInts() and makes those values - available as other numeric types, casting as needed. - -

    - WARNING: The status of the Search.Function package is experimental. - The APIs introduced here might change in the future and will not be - supported anymore in such a case. - -

    - for requirements - on the field. - -

    NOTE: with the switch in 2.9 to segment-based - searching, if {@link #getValues} is invoked with a - composite (multi-segment) reader, this can easily cause - double RAM usage for the values in the FieldCache. It's - best to switch your application to pass only atomic - (single segment) readers to this API. Alternatively, for - a short-term fix, you could wrap your ValueSource using - {@link MultiValueSource}, which costs more CPU per lookup - but will not consume double the FieldCache RAM.

    - - - -

    Create a cached int field source with default string-to-int parser. -
    - - Create a cached int field source with a specific string-to-int parser. - - - A query that generates the union of documents produced by its subqueries, and that scores each document with the maximum - score for that document as produced by any subquery, plus a tie breaking increment for any additional matching subqueries. - This is useful when searching for a word in multiple fields with different boost factors (so that the fields cannot be - combined equivalently into a single search field). We want the primary score to be the one associated with the highest boost, - not the sum of the field scores (as BooleanQuery would give). - If the query is "albino elephant" this ensures that "albino" matching one field and "elephant" matching - another gets a higher score than "albino" matching both fields. - To get this result, use both BooleanQuery and DisjunctionMaxQuery: for each term a DisjunctionMaxQuery searches for it in - each field, while the set of these DisjunctionMaxQuery's is combined into a BooleanQuery. - The tie breaker capability allows results that include the same term in multiple fields to be judged better than results that - include this term in only the best of those multiple fields, without confusing this with the better case of two different terms - in the multiple fields. - - - - Creates a new empty DisjunctionMaxQuery. Use add() to add the subqueries. - the score of each non-maximum disjunct for a document is multiplied by this weight - and added into the final score. If non-zero, the value should be small, on the order of 0.1, which says that - 10 occurrences of word in a lower-scored field that is also in a higher scored field is just as good as a unique - word in the lower scored field (i.e., one that is not in any higher scored field. - - - - Creates a new DisjunctionMaxQuery - a Collection<Query> of all the disjuncts to add - - the weight to give to each matching non-maximum disjunct - - - - Add a subquery to this disjunction - the disjunct added - - - - Add a collection of disjuncts to this disjunction - via Iterable - - - - An Iterator<Query> over the disjuncts - - - Optimize our representation and our subqueries representations - the IndexReader we query - - an optimized copy of us (which may not be a copy if there is nothing to optimize) - - - - Create a shallow copy of us -- used in rewriting if necessary - a copy of us (but reuse, don't copy, our subqueries) - - - - Prettyprint us. - the field to which we are applied - - a string that shows what we do, of the form "(disjunct1 | disjunct2 | ... | disjunctn)^boost" - - - - Return true iff we represent the same query as o - another object - - true iff o is a DisjunctionMaxQuery with the same boost and the same subqueries, in the same order, as us - - - - Compute a hash code for hashing us - the hash code - - - - Expert: the Weight for DisjunctionMaxQuery, used to - normalize, score and explain these queries. - -

    NOTE: this API and implementation is subject to - change suddenly in the next release.

    -

    -
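    A hedged sketch of the "albino elephant" combination described above (field names are illustrative; the BooleanClause.Occur constant is assumed from the 2.9-era API):

    -            BooleanQuery outer = new BooleanQuery();
    -            foreach (string word in new string[] { "albino", "elephant" })
    -            {
    -                DisjunctionMaxQuery dmq = new DisjunctionMaxQuery(0.1f);   // small tie-breaker
    -                dmq.Add(new TermQuery(new Term("title", word)));
    -                dmq.Add(new TermQuery(new Term("body", word)));
    -                outer.Add(dmq, BooleanClause.Occur.SHOULD);                 // one DisjunctionMaxQuery per term
    -            }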
    - - The Similarity implementation. - - - The Weights for our subqueries, in 1-1 correspondence with disjuncts - - - An alternative to BooleanScorer that also allows a minimum number - of optional scorers that should match. -
    Implements skipTo(), and has no limitations on the numbers of added scorers. -
    Uses ConjunctionScorer, DisjunctionScorer, ReqOptScorer and ReqExclScorer. -
    -
    - - The scorer to which all scoring will be delegated, - except for computing and using the coordination factor. - - - The number of optionalScorers that need to match (if there are any) - - - Creates a {@link Scorer} with the given similarity and lists of required, - prohibited and optional scorers. If no required scorers are added, at least - one of the optional scorers will have to match during the search. - - - The similarity to be used. - - The minimum number of optional added scorers that should match - during the search. In case no required scorers are added, at least - one of the optional scorers will have to match during the search. - - the list of required scorers. - - the list of prohibited scorers. - - the list of optional scorers. - -
    When this method is used the {@link #Explain(int)} method should not be used. - - use {@link #Score(Collector)} instead. - -
    - - Scores and collects all matching documents. - The collector to which all matching documents are passed through. -
    When this method is used the {@link #Explain(int)} method should not be used. - -
    - - Expert: Collects matching documents in a range. -
    Note that {@link #Next()} must be called once before this method is - called for the first time. -
    - The collector to which all matching documents are passed through - {@link HitCollector#Collect(int, float)}. - - Do not score documents past this. - - true if more matching documents may remain. - - use {@link #Score(Collector, int, int)} instead. - -
    - - use {@link #DocID()} instead. - - - - use {@link #NextDoc()} instead. - - - - use {@link #Advance(int)} instead. - - - - Throws an UnsupportedOperationException. - TODO: Implement an explanation of the coordination factor. - - The document number for the explanation. - - UnsupportedOperationException - - - Count a scorer as a single match. - - - use {@link #DocID()} instead. - - - - use {@link #NextDoc()} instead. - - - - use {@link #Advance(int)} instead. - - - - Interface that exceptions should implement to support lazy loading of messages. - - For Native Language Support (NLS), system of software internationalization. - - This Interface should be implemented by all exceptions that require - translation - - - - - a instance of a class that implements the Message interface - - - - Adds a new doc in this term. If this returns null - then we just skip consuming positions/payloads. - - - - Called when we are done adding docs to this term - - - This TokenFilter provides the ability to set aside attribute states - that have already been analyzed. This is useful in situations where multiple fields share - many common analysis steps and then go their separate ways. -

    - It is also useful for doing things like entity extraction or proper noun analysis as - part of the analysis workflow and saving off those tokens for use in another field. - -

    -            TeeSinkTokenFilter source1 = new TeeSinkTokenFilter(new WhitespaceTokenizer(reader1));
    -            TeeSinkTokenFilter.SinkTokenStream sink1 = source1.newSinkTokenStream();
    -            TeeSinkTokenFilter.SinkTokenStream sink2 = source1.newSinkTokenStream();
    -            TeeSinkTokenFilter source2 = new TeeSinkTokenFilter(new WhitespaceTokenizer(reader2));
    -            source2.addSinkTokenStream(sink1);
    -            source2.addSinkTokenStream(sink2);
    -            TokenStream final1 = new LowerCaseFilter(source1);
    -            TokenStream final2 = source2;
    -            TokenStream final3 = new EntityDetect(sink1);
    -            TokenStream final4 = new URLDetect(sink2);
    -            d.add(new Field("f1", final1));
    -            d.add(new Field("f2", final2));
    -            d.add(new Field("f3", final3));
    -            d.add(new Field("f4", final4));
    -            
    - In this example, sink1 and sink2 will both get tokens from both - reader1 and reader2 after the whitespace tokenizer, - and we can further wrap any of these in extra analysis; more "sources" can be inserted if desired. - It is important that tees are consumed before sinks (in the above example, the tee field names must sort - before the sink field names). If you are not sure which stream is consumed first, you can simply - add another sink and then pass all tokens to the sinks at once using {@link #consumeAllTokens}. - This TokenFilter is exhausted after that. To do so, change - the example above to: -
    -            ...
    -            TokenStream final1 = new LowerCaseFilter(source1.newSinkTokenStream());
    -            TokenStream final2 = source2.newSinkTokenStream();
    -            sink1.consumeAllTokens();
    -            sink2.consumeAllTokens();
    -            ...
    -            
    - In this case, the fields can be added in any order, because the sources are not used anymore and all sinks are ready. -

    Note, the EntityDetect and URLDetect TokenStreams are for the example and do not currently exist in Lucene. -

    -
    - - Instantiates a new TeeSinkTokenFilter. - - - Returns a new {@link SinkTokenStream} that receives all tokens consumed by this stream. - - - Returns a new {@link SinkTokenStream} that receives all tokens consumed by this stream - that pass the supplied filter. - - - - - - Adds a {@link SinkTokenStream} created by another TeeSinkTokenFilter - to this one. The supplied stream will also receive all consumed tokens. - This method can be used to pass tokens from two different tees to one sink. - - - - TeeSinkTokenFilter passes all tokens to the added sinks - when itself is consumed. To be sure, that all tokens from the input - stream are passed to the sinks, you can call this methods. - This instance is exhausted after this, but all sinks are instant available. - - - - A filter that decides which {@link AttributeSource} states to store in the sink. - - - Returns true, iff the current state of the passed-in {@link AttributeSource} shall be stored - in the sink. - - - - Called by {@link SinkTokenStream#Reset()}. This method does nothing by default - and can optionally be overridden. - - - - - Stemmer, implementing the Porter Stemming Algorithm - - The Stemmer class transforms a word into its root form. The input - word can be provided a character at time (by calling add()), or at once - by calling one of the various stem(something) methods. - - - - reset() resets the stemmer so it can stem another word. If you invoke - the stemmer by calling add(char) and then stem(), you must call reset() - before starting another word. - - - - Add a character to the word being stemmed. When you are finished - adding characters, you can call stem(void) to process the word. - - - - After a word has been stemmed, it can be retrieved by toString(), - or a reference to the internal buffer can be retrieved by getResultBuffer - and getResultLength (which is generally more efficient.) - - - - Returns the length of the word resulting from the stemming process. - - - Returns a reference to a character buffer containing the results of - the stemming process. You also need to consult getResultLength() - to determine the length of the result. - - - - Stem a word provided as a String. Returns the result as a String. - - - Stem a word contained in a char[]. Returns true if the stemming process - resulted in a word different from the input. You can retrieve the - result with getResultLength()/getResultBuffer() or toString(). - - - - Stem a word contained in a portion of a char[] array. Returns - true if the stemming process resulted in a word different from - the input. You can retrieve the result with - getResultLength()/getResultBuffer() or toString(). - - - - Stem a word contained in a leading portion of a char[] array. - Returns true if the stemming process resulted in a word different - from the input. You can retrieve the result with - getResultLength()/getResultBuffer() or toString(). - - - - Stem the word placed into the Stemmer buffer through calls to add(). - Returns true if the stemming process resulted in a word different - from the input. You can retrieve the result with - getResultLength()/getResultBuffer() or toString(). - - - - Test program for demonstrating the Stemmer. It reads a file and - stems each word, writing the result to standard out. - Usage: Stemmer file-name - - - - Normalizes token text to lower case. - - - $Id: LowerCaseFilter.java 797665 2009-07-24 21:45:48Z buschmi $ - - - -

    Implements {@link LockFactory} using native OS file - locks. Note that because this LockFactory relies on - java.nio.* APIs for locking, any problems with those APIs - will cause locking to fail. Specifically, on certain NFS - environments the java.nio.* locks will fail (the lock can - incorrectly be double acquired) whereas {@link - SimpleFSLockFactory} worked perfectly in those same - environments. For NFS based access to an index, it's - recommended that you try {@link SimpleFSLockFactory} - first and work around the one limitation that a lock file - could be left when the JVM exits abnormally.

    - -

    The primary benefit of {@link NativeFSLockFactory} is - that lock files will be properly removed (by the OS) if - the JVM has an abnormal exit.

    - -

    Note that, unlike {@link SimpleFSLockFactory}, the existence of - leftover lock files in the filesystem on exiting the JVM - is fine because the OS will free the locks held against - these files even though the files still remain.

    - -

    If you suspect that this or any other LockFactory is - not working properly in your environment, you can easily - test it by using {@link VerifyingLockFactory}, {@link - LockVerifyServer} and {@link LockStressTest}.

    - -

    - - -
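    A hedged configuration sketch (the FSDirectory.Open overload, the DirectoryInfo parameter and SetLockFactory are assumptions about the .NET port):

    -            DirectoryInfo path = new DirectoryInfo("/var/data/index");
    -            Directory dir = FSDirectory.Open(path);               // Lucene.Net.Store.Directory
    -            dir.SetLockFactory(new NativeFSLockFactory());        // lock dir defaults to the index directory itself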
    - - Create a NativeFSLockFactory instance, with null (unset) - lock directory. When you pass this factory to a {@link FSDirectory} - subclass, the lock directory is automatically set to the - directory itself. Be sure to create one instance for each directory - you create! - - - Create a NativeFSLockFactory instance, storing lock - files into the specified lockDirName: - - - where lock files are created. - - - Create a NativeFSLockFactory instance, storing lock - files into the specified lockDir: - - - where lock files are created. - - - Create a NativeFSLockFactory instance, storing lock - files into the specified lockDir: - - - where lock files are created. - - - This exception is thrown when the write.lock - could not be released. - - - - - - - Represents hits returned by {@link Searcher#search(Query,Filter,int,Sort)}. - - - - The fields which were used to sort results by. - - - Creates one of these objects. - Total number of hits for the query. - - The top hits for the query. - - The sort criteria used to find the top hits. - - The maximum score encountered. - - - - Subclass of FilteredTermEnum for enumerating all terms that match the - specified range parameters. -

    - Term enumerations are always ordered by Term.compareTo(). Each term in - the enumeration is greater than all that precede it. -

    - 2.9 - -
    - - Enumerates all terms greater/equal than lowerTerm - but less/equal than upperTerm. - - If an endpoint is null, it is said to be "open". Either or both - endpoints may be open. Open endpoints may not be exclusive - (you can't select all but the first or last term without - explicitly specifying the term to exclude.) - - - - - An interned field that holds both lower and upper terms. - - The term text at the lower end of the range - - The term text at the upper end of the range - - If true, the lowerTerm is included in the range. - - If true, the upperTerm is included in the range. - - The collator to use to collate index Terms, to determine their - membership in the range bounded by lowerTerm and - upperTerm. - - - IOException - - - Matches the union of its clauses. - - - Construct a SpanOrQuery merging the provided clauses. - - - Return the clauses whose spans are matched. - - - Returns a collection of all terms matched by this query. - use extractTerms instead - - - - - - A Filter that restricts search results to a range of values in a given - field. - -

    This filter matches the documents looking for terms that fall into the - supplied range according to {@link String#compareTo(String)}. It is not intended - for numerical ranges, use {@link NumericRangeFilter} instead. - -

    If you construct a large number of range filters with different ranges but on the - same field, {@link FieldCacheRangeFilter} may have significantly better performance. - -

    - Use {@link TermRangeFilter} for term ranges or - {@link NumericRangeFilter} for numeric ranges instead. - This class will be removed in Lucene 3.0. - -
    - - The field this range applies to - - The lower bound on this range - - The upper bound on this range - - Does this range include the lower bound? - - Does this range include the upper bound? - - IllegalArgumentException if both terms are null or if - lowerTerm is null and includeLower is true (similar for upperTerm - and includeUpper) - - - - WARNING: Using this constructor and supplying a non-null - value in the collator parameter will cause every single - index Term in the Field referenced by lowerTerm and/or upperTerm to be - examined. Depending on the number of index Terms in this Field, the - operation could be very slow. - - - The lower bound on this range - - The upper bound on this range - - Does this range include the lower bound? - - Does this range include the upper bound? - - The collator to use when determining range inclusion; set - to null to use Unicode code point ordering instead of collation. - - IllegalArgumentException if both terms are null or if - lowerTerm is null and includeLower is true (similar for upperTerm - and includeUpper) - - - - Constructs a filter for field fieldName matching - less than or equal to upperTerm. - - - - Constructs a filter for field fieldName matching - greater than or equal to lowerTerm. - - - - - - - - - - - The original list of terms from the query, can contain duplicates - - - - Returns the maximum payload score seen, else 1 if there are no payloads on the doc. -

    - Is thread safe and completely reusable. - - -

    -
    - -

    A {@link Query} that matches numeric values within a - specified range. To use this, you must first index the - numeric values using {@link NumericField} (expert: {@link - NumericTokenStream}). If your terms are instead textual, - you should use {@link TermRangeQuery}. {@link - NumericRangeFilter} is the filter equivalent of this - query.

    - -

    You create a new NumericRangeQuery with the static - factory methods, eg: - -

    -            Query q = NumericRangeQuery.newFloatRange("weight",
    -            new Float(0.10f), new Float(0.3f),
    -            true, true);
    -            
    - - matches all documents whose float valued "weight" field - ranges from 0.10 to 0.3, inclusive. -

    The performance of NumericRangeQuery is much better - than the corresponding {@link TermRangeQuery} because the - number of terms that must be searched is usually far - fewer, thanks to trie indexing, described below.

    - -

    You can optionally specify a precisionStep - when creating this query. This is necessary if you've - changed this configuration from its default (4) during - indexing. Lower values consume more disk space but speed - up searching. Suitable values are between 1 and - 8. A good starting point to test is 4, - which is the default value for all Numeric* - classes. See below for - details. - -

    This query defaults to {@linkplain - MultiTermQuery#CONSTANT_SCORE_AUTO_REWRITE_DEFAULT} for - 32 bit (int/float) ranges with precisionStep <8 and 64 - bit (long/double) ranges with precisionStep <6. - Otherwise it uses {@linkplain - MultiTermQuery#CONSTANT_SCORE_FILTER_REWRITE} as the - number of terms is likely to be high. With precision - steps of <4, this query can be run with one of the - BooleanQuery rewrite methods without changing - BooleanQuery's default max clause count. - -

    NOTE: This API is experimental and - might change in incompatible ways in the next release. - -

    How it works

    - -

    See the publication about panFMP, - where this algorithm was described (referred to as TrieRangeQuery): - -

    Schindler, U, Diepenbroek, M, 2008. - Generic XML-based Framework for Metadata Portals. - Computers & Geosciences 34 (12), 1947-1955. - doi:10.1016/j.cageo.2008.02.023
    - -

    A quote from this paper: Because Apache Lucene is a full-text - search engine and not a conventional database, it cannot handle numerical ranges - (e.g., field value is inside user defined bounds, even dates are numerical values). - We have developed an extension to Apache Lucene that stores - the numerical values in a special string-encoded format with variable precision - (all numerical values like doubles, longs, floats, and ints are converted to - lexicographic sortable string representations and stored with different precisions - (for a more detailed description of how the values are stored, - see {@link NumericUtils}). A range is then divided recursively into multiple intervals for searching: - The center of the range is searched only with the lowest possible precision in the trie, - while the boundaries are matched more exactly. This reduces the number of terms dramatically.

    - -

    For the variant that stores long values in 8 different precisions (each reduced by 8 bits) that - uses a lowest precision of 1 byte, the index contains only a maximum of 256 distinct values in the - lowest precision. Overall, a range could consist of a theoretical maximum of - 7*255*2 + 255 = 3825 distinct terms (when there is a term for every distinct value of an - 8-byte-number in the index and the range covers almost all of them; a maximum of 255 distinct values is used - because it would always be possible to reduce the full 256 values to one term with degraded precision). - In practice, we have seen up to 300 terms in most cases (index with 500,000 metadata records - and a uniform value distribution).

    - -

    Precision Step

    -

    You can choose any precisionStep when encoding values. - Lower step values mean more precisions and so more terms in the index (and a larger index). - On the other hand, the maximum number of terms to match is reduced, which speeds up queries. - The formula to calculate the maximum term count is: -

    -            n = [ (bitsPerValue/precisionStep - 1) * (2^precisionStep - 1 ) * 2 ] + (2^precisionStep - 1 )
    -            
    -

    (this formula is only correct when bitsPerValue/precisionStep is an integer; - in other cases, the value must be rounded up and the last summand must use the remainder of the division as its - precision step). - For longs stored using a precision step of 4, n = 15*15*2 + 15 = 465, and for a precision - step of 2, n = 31*3*2 + 3 = 189. However, the faster search speed is offset by more seeking - in the term enumeration of the index. Because of this, the ideal precisionStep value can only - be found by testing. Important: you can index with a lower precision step value and test search speed - using a multiple of the original step value.

    - -

    Good values for precisionStep are depending on usage and data type: -

      -
    • The default for all data types is 4, which is used, when no precisionStep is given.
    • -
    • Ideal value in most cases for 64 bit data types (long, double) is 6 or 8.
    • -
    • Ideal value in most cases for 32 bit data types (int, float) is 4.
    • -
    • Steps >64 for long/double and >32 for int/float produce one token - per value in the index, and querying is as slow as a conventional {@link TermRangeQuery}. But this can be used - to produce fields that are solely used for sorting (in this case simply use {@link Integer#MAX_VALUE} as - precisionStep). Using {@link NumericField NumericFields} for sorting - is ideal, because building the field cache is much faster than with text-only numbers. - Sorting is also possible with range-query-optimized fields using one of the above precisionSteps.
    • -
    - -

    Comparisons of the different types of RangeQueries on an index with about 500,000 docs showed - that {@link TermRangeQuery} in boolean rewrite mode (with raised {@link BooleanQuery} clause count) - took about 30-40 secs to complete, {@link TermRangeQuery} in constant score filter rewrite mode took 5 secs - and executing this class took <100ms to complete (on an Opteron64 machine, Java 1.5, 8 bit - precision step). This query type was developed for a geographic portal, where the performance for - e.g. bounding boxes or exact date/time stamps is important.
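    A minimal usage sketch, following the Java-style factory naming used in the examples elsewhere in these docs (the field name, the bounds lo/hi, and the already-open searcher are assumptions of the example):

        // Match documents whose long valued "timestamp" field lies in [lo, hi],
        // indexed with a precisionStep of 6 as suggested above for 64 bit types.
        Query q = NumericRangeQuery.newLongRange("timestamp", 6,
                Long.valueOf(lo), Long.valueOf(hi), true, true);
        TopDocs top = searcher.search(q, 10);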

    - -

    - 2.9 - - -
    - - Factory that creates a NumericRangeQuery, that queries a long - range using the given precisionStep. - You can have half-open ranges (which are in fact </≤ or >/≥ queries) - by setting the min or max value to null. By setting inclusive to false, it will - match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. - -
    - - Factory that creates a NumericRangeQuery, that queries a long - range using the default precisionStep {@link NumericUtils#PRECISION_STEP_DEFAULT} (4). - You can have half-open ranges (which are in fact </≤ or >/≥ queries) - by setting the min or max value to null. By setting inclusive to false, it will - match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. - - - - Factory that creates a NumericRangeQuery, that queries a int - range using the given precisionStep. - You can have half-open ranges (which are in fact </≤ or >/≥ queries) - by setting the min or max value to null. By setting inclusive to false, it will - match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. - - - - Factory that creates a NumericRangeQuery, that queries a int - range using the default precisionStep {@link NumericUtils#PRECISION_STEP_DEFAULT} (4). - You can have half-open ranges (which are in fact </≤ or >/≥ queries) - by setting the min or max value to null. By setting inclusive to false, it will - match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. - - - - Factory that creates a NumericRangeQuery, that queries a double - range using the given precisionStep. - You can have half-open ranges (which are in fact </≤ or >/≥ queries) - by setting the min or max value to null. By setting inclusive to false, it will - match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. - - - - Factory that creates a NumericRangeQuery, that queries a double - range using the default precisionStep {@link NumericUtils#PRECISION_STEP_DEFAULT} (4). - You can have half-open ranges (which are in fact </≤ or >/≥ queries) - by setting the min or max value to null. By setting inclusive to false, it will - match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. - - - - Factory that creates a NumericRangeQuery, that queries a float - range using the given precisionStep. - You can have half-open ranges (which are in fact </≤ or >/≥ queries) - by setting the min or max value to null. By setting inclusive to false, it will - match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. - - - - Factory that creates a NumericRangeQuery, that queries a float - range using the default precisionStep {@link NumericUtils#PRECISION_STEP_DEFAULT} (4). - You can have half-open ranges (which are in fact </≤ or >/≥ queries) - by setting the min or max value to null. By setting inclusive to false, it will - match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. - - - - Returns the field name for this query - - - Returns true if the lower endpoint is inclusive - - - Returns true if the upper endpoint is inclusive - - - Returns the lower value of this range query - - - Returns the upper value of this range query - - - - Lucene.Net specific. Needed for Serialization - - - - - - - Lucene.Net specific. Needed for deserialization - - - - - - Subclass of FilteredTermEnum for enumerating all terms that match the - sub-ranges for trie range queries. -

    - WARNING: This term enumeration is not guaranteed to be always ordered by - {@link Term#compareTo}. - The ordering depends on how {@link NumericUtils#splitLongRange} and - {@link NumericUtils#splitIntRange} generates the sub-ranges. For - {@link MultiTermQuery} ordering is not relevant. -

    -
    - - this is a dummy, it is not used by this class. - - - Compares if current upper bound is reached, - this also updates the term count for statistics. - In contrast to {@link FilteredTermEnum}, a return value - of false ends iterating the current enum - and forwards to the next sub-range. - - - - Increments the enumeration to the next element. True if one exists. - - - Closes the enumeration to further activity, freeing resources. - - - A {@link Filter} that only accepts numeric values within - a specified range. To use this, you must first index the - numeric values using {@link NumericField} (expert: {@link - NumericTokenStream}). - -

    You create a new NumericRangeFilter with the static factory methods, e.g.: - -

    -            Filter f = NumericRangeFilter.newFloatRange("weight",
    -            new Float(0.03f), new Float(0.10f),
    -            true, true);
    -            
    - - accepts all documents whose float valued "weight" field - ranges from 0.03 to 0.10, inclusive. - See {@link NumericRangeQuery} for details on how Lucene - indexes and searches numeric valued fields. - -

    NOTE: This API is experimental and - might change in incompatible ways in the next - release. - -

    - 2.9 - - -
    - - Factory that creates a NumericRangeFilter, that filters a long - range using the given precisionStep. - You can have half-open ranges (which are in fact </≤ or >/≥ queries) - by setting the min or max value to null. By setting inclusive to false, it will - match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. - - - - Factory that creates a NumericRangeFilter, that queries a long - range using the default precisionStep {@link NumericUtils#PRECISION_STEP_DEFAULT} (4). - You can have half-open ranges (which are in fact </≤ or >/≥ queries) - by setting the min or max value to null. By setting inclusive to false, it will - match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. - - - - Factory that creates a NumericRangeFilter, that filters a int - range using the given precisionStep. - You can have half-open ranges (which are in fact </≤ or >/≥ queries) - by setting the min or max value to null. By setting inclusive to false, it will - match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. - - - - Factory that creates a NumericRangeFilter, that queries a int - range using the default precisionStep {@link NumericUtils#PRECISION_STEP_DEFAULT} (4). - You can have half-open ranges (which are in fact </≤ or >/≥ queries) - by setting the min or max value to null. By setting inclusive to false, it will - match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. - - - - Factory that creates a NumericRangeFilter, that filters a double - range using the given precisionStep. - You can have half-open ranges (which are in fact </≤ or >/≥ queries) - by setting the min or max value to null. By setting inclusive to false, it will - match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. - - - - Factory that creates a NumericRangeFilter, that queries a double - range using the default precisionStep {@link NumericUtils#PRECISION_STEP_DEFAULT} (4). - You can have half-open ranges (which are in fact </≤ or >/≥ queries) - by setting the min or max value to null. By setting inclusive to false, it will - match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. - - - - Factory that creates a NumericRangeFilter, that filters a float - range using the given precisionStep. - You can have half-open ranges (which are in fact </≤ or >/≥ queries) - by setting the min or max value to null. By setting inclusive to false, it will - match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. - - - - Factory that creates a NumericRangeFilter, that queries a float - range using the default precisionStep {@link NumericUtils#PRECISION_STEP_DEFAULT} (4). - You can have half-open ranges (which are in fact </≤ or >/≥ queries) - by setting the min or max value to null. By setting inclusive to false, it will - match all documents excluding the bounds, with inclusive on, the boundaries are hits, too. - - - - Returns the field name for this filter - - - Returns true if the lower endpoint is inclusive - - - Returns true if the upper endpoint is inclusive - - - Returns the lower value of this range filter - - - Returns the upper value of this range filter - - - Expert: obtains the ordinal of the field value from the default Lucene - {@link Lucene.Net.Search.FieldCache Fieldcache} using getStringIndex(). -

    - The native lucene index order is used to assign an ordinal value for each field value. -

    - Field values (terms) are lexicographically ordered by unicode value, and numbered starting at 1. -

    - Example: -
    If there were only three field values: "apple","banana","pear" -
    then ord("apple")=1, ord("banana")=2, ord("pear")=3 -

    - WARNING: - ord() depends on the position in an index and can thus change - when other documents are inserted or deleted, - or if a MultiSearcher is used. - -

    - WARNING: The status of the Search.Function package is experimental. - The APIs introduced here might change in the future and will not be - supported anymore in such a case. - -

    NOTE: with the switch in 2.9 to segment-based - searching, if {@link #getValues} is invoked with a - composite (multi-segment) reader, this can easily cause - double RAM usage for the values in the FieldCache. It's - best to switch your application to pass only atomic - (single segment) readers to this API. Alternatively, for - a short-term fix, you could wrap your ValueSource using - {@link MultiValueSource}, which costs more CPU per lookup - but will not consume double the FieldCache RAM.

    -

    -
    - - Constructor for a certain field. - field whose values order is used. - - - - Query that sets document score as a programmatic function of several (sub) scores: -
      -
    1. the score of its subQuery (any query)
    2. (optional) the score of its ValueSourceQuery (or queries). For most simple/convenient use cases this query is likely to be a {@link Lucene.Net.Search.Function.FieldScoreQuery FieldScoreQuery}
    - Subclasses can modify the computation by overriding {@link #getCustomScoreProvider}. - -

    - WARNING: The status of the Search.Function package is experimental. - The APIs introduced here might change in the future and will not be - supported anymore in such a case. -

    -
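    A hedged sketch of a custom scoring combination (the "popularity" field, the textQuery variable, and the boost formula are made up for the example; for per-segment search in 2.9, the docs below recommend overriding getCustomScoreProvider instead):

        // Multiply the text relevance score by (1 + popularity), where the
        // popularity value comes from a cached single-token numeric field.
        ValueSourceQuery popularity =
            new FieldScoreQuery("popularity", FieldScoreQuery.Type.FLOAT);
        CustomScoreQuery q = new CustomScoreQuery(textQuery, popularity) {
            public float customScore(int doc, float subQueryScore, float valSrcScore) {
                return subQueryScore * (1.0f + valSrcScore);
            }
        };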
    - - Create a CustomScoreQuery over input subQuery. - the sub query whose scored is being customed. Must not be null. - - - - Create a CustomScoreQuery over input subQuery and a {@link ValueSourceQuery}. - the sub query whose score is being customed. Must not be null. - - a value source query whose scores are used in the custom score - computation. For most simple/convineient use case this would be a - {@link Lucene.Net.Search.Function.FieldScoreQuery FieldScoreQuery}. - This parameter is optional - it can be null or even an empty array. - - - - Create a CustomScoreQuery over input subQuery and a {@link ValueSourceQuery}. - the sub query whose score is being customized. Must not be null. - - value source queries whose scores are used in the custom score - computation. For most simple/convenient use case these would be - {@link Lucene.Net.Search.Function.FieldScoreQuery FieldScoreQueries}. - This parameter is optional - it can be null or even an empty array. - - - - Returns true if o is equal to this. - - - Returns a hash code value for this object. - - - Returns a {@link CustomScoreProvider} that calculates the custom scores - for the given {@link IndexReader}. The default implementation returns a default - implementation as specified in the docs of {@link CustomScoreProvider}. - @since 2.9.2 - - - - Compute a custom score by the subQuery score and a number of - ValueSourceQuery scores. - - The doc is relative to the current reader, which is - unknown to CustomScoreQuery when using per-segment search (since Lucene 2.9). - Please override {@link #getCustomScoreProvider} and return a subclass - of {@link CustomScoreProvider} for the given {@link IndexReader}. - see CustomScoreProvider#customScore(int,float,float[]) - - - - Compute a custom score by the subQuery score and the ValueSourceQuery score. - - The doc is relative to the current reader, which is - unknown to CustomScoreQuery when using per-segment search (since Lucene 2.9). - Please override {@link #getCustomScoreProvider} and return a subclass - of {@link CustomScoreProvider} for the given {@link IndexReader}. - @see CustomScoreProvider#customScore(int,float,float) - - - - Explain the custom score. - - The doc is relative to the current reader, which is - unknown to CustomScoreQuery when using per-segment search (since Lucene 2.9). - Please override {@link #getCustomScoreProvider} and return a subclass - of {@link CustomScoreProvider} for the given {@link IndexReader}. - - - - Explain the custom score. - The doc is relative to the current reader, which is - unknown to CustomScoreQuery when using per-segment search (since Lucene 2.9). - Please override {@link #getCustomScoreProvider} and return a subclass - of {@link CustomScoreProvider} for the given {@link IndexReader}. - - - - Checks if this is strict custom scoring. - In strict custom scoring, the ValueSource part does not participate in weight normalization. - This may be useful when one wants full control over how scores are modified, and does - not care about normalizing by the ValueSource part. - One particular case where this is useful if for testing this query. -

    - Note: only has effect when the ValueSource part is not null. -

    -
    - - Set the strict mode of this query. - The strict mode to set. - - - - - - A short name of this query, used in {@link #ToString(String)}. - - - A scorer that applies a (callback) function on scores of the subQuery. - - - use {@link #NextDoc()} instead. - - - - use {@link #DocID()} instead. - - - - use {@link #Advance(int)} instead. - - - - A range query that returns a constant score equal to its boost for - all documents in the exclusive range of terms. - -

    It does not have an upper bound on the number of clauses covered in the range. - -

    This query matches documents containing terms that fall into the supplied range according to {@link String#compareTo(String)}. It is not intended for numerical ranges; use {@link NumericRangeQuery} instead. - -

    This query is hardwired to {@link MultiTermQuery#CONSTANT_SCORE_AUTO_REWRITE_DEFAULT}. - If you want to change this, use {@link TermRangeQuery} instead. - -

    - Use {@link TermRangeQuery} for term ranges or - {@link NumericRangeQuery} for numeric ranges instead. - This class will be removed in Lucene 3.0. - - $Id: ConstantScoreRangeQuery.java 797694 2009-07-25 00:03:33Z mikemccand $ - -
    - - Changes of mode are not supported by this class (fixed to constant score rewrite mode) - - - Called once per field per document if term vectors - are enabled, to write the vectors to - RAMOutputStream, which is then quickly flushed to - * the real term vectors files in the Directory. - - - - A Payload is metadata that can be stored together with each occurrence - of a term. This metadata is stored inline in the posting list of the - specific term. -

    - To store payloads in the index, a {@link TokenStream} that produces payload data has to be used. -

    - Use {@link TermPositions#GetPayloadLength()} and {@link TermPositions#GetPayload(byte[], int)} - to retrieve the payloads from the index.
    - -

    -
    - - the byte array containing the payload data - - - the offset within the byte array - - - the length of the payload data - - - Creates an empty payload and does not allocate a byte array. - - - Creates a new payload with the the given array as data. - A reference to the passed-in array is held, i. e. no - copy is made. - - - the data of this payload - - - - Creates a new payload with the the given array as data. - A reference to the passed-in array is held, i. e. no - copy is made. - - - the data of this payload - - the offset in the data byte array - - the length of the data - - - - Sets this payloads data. - A reference to the passed-in array is held, i. e. no - copy is made. - - - - Sets this payloads data. - A reference to the passed-in array is held, i. e. no - copy is made. - - - - Returns a reference to the underlying byte array - that holds this payloads data. - - - - Returns the offset in the underlying byte array - - - Returns the length of the payload data. - - - Returns the byte at the given index. - - - Allocates a new byte array, copies the payload data into it and returns it. - - - Copies the payload data to a byte array. - - - the target byte array - - the offset in the target byte array - - - - Clones this payload by creating a copy of the underlying - byte array. - - - - This is just a "splitter" class: it lets you wrap two - DocFieldConsumer instances as a single consumer. - + + Sets this Token's payload. A Token is an occurrence of a term from the text of a field. It consists of @@ -26630,6 +20785,60 @@ + + A Token's lexical type. The Default value is "word". + + + Returns this Token's lexical type. Defaults to "word". + + + Set the lexical type. + + + + + This attribute can be used to pass different flags down the {@link Tokenizer} chain, + eg from one TokenFilter to another one. + + + + EXPERIMENTAL: While we think this is here to stay, we may want to change it to be a long. +

    + + Get the bitset for any bits that have been set. This is completely distinct from {@link TypeAttribute#Type()}, although they do share similar purposes. + The flags can be used to encode information about the token for use by other {@link Lucene.Net.Analysis.TokenFilter}s. + + +

    + The bits + +
    + + + + + + The start and end character offset of a Token. + + + Returns this Token's starting offset, the position of the first character + corresponding to this token in the source text. + Note that the difference between endOffset() and startOffset() may not be + equal to termText.length(), as the term text may have been altered by a + stemmer or some other filter. + + + + Set the starting and ending offset. + See StartOffset() and EndOffset() + + + + Returns this Token's ending offset, one greater than the position of the + last character corresponding to this token in the source text. The length + of the token in the source text is (endOffset - startOffset). + + We will remove this when we remove the deprecated APIs @@ -27067,6 +21276,5058 @@ + + Provides support for converting byte sequences to Strings and back again. + The resulting Strings preserve the original byte sequences' sort order. + + The Strings are constructed using a Base 8000h encoding of the original + binary data - each char of an encoded String represents a 15-bit chunk + from the byte sequence. Base 8000h was chosen because it allows for all + lower 15 bits of char to be used without restriction; the surrogate range + [U+D8000-U+DFFF] does not represent valid chars, and would require + complicated handling to avoid them and allow use of char's high bit. + + Although unset bits are used as padding in the final char, the original + byte sequence could contain trailing bytes with no set bits (null bytes): + padding is indistinguishable from valid information. To overcome this + problem, a char is appended, indicating the number of encoded bytes in the + final content char. + + This class's operations are defined over CharBuffers and ByteBuffers, to + allow for wrapped arrays to be reused, reducing memory allocation costs for + repeated operations. Note that this class calls array() and arrayOffset() + on the CharBuffers and ByteBuffers it uses, so only wrapped arrays may be + used. This class interprets the arrayOffset() and limit() values returned by + its input buffers as beginning and end+1 positions on the wrapped array, + resprectively; similarly, on the output buffer, arrayOffset() is the first + position written to, and limit() is set to one past the final output array + position. + + + + Returns the number of chars required to encode the given byte sequence. + + + The byte sequence to be encoded. Must be backed by an array. + + The number of chars required to encode the given byte sequence + + IllegalArgumentException If the given ByteBuffer is not backed by an array + + + Returns the number of bytes required to decode the given char sequence. + + + The char sequence to be encoded. Must be backed by an array. + + The number of bytes required to decode the given char sequence + + IllegalArgumentException If the given CharBuffer is not backed by an array + + + Encodes the input byte sequence into the output char sequence. Before + calling this method, ensure that the output CharBuffer has sufficient + capacity by calling {@link #GetEncodedLength(java.nio.ByteBuffer)}. + + + The byte sequence to encode + + Where the char sequence encoding result will go. The limit + is set to one past the position of the final char. + + IllegalArgumentException If either the input or the output buffer + is not backed by an array + + + + Decodes the input char sequence into the output byte sequence. 
Before + calling this method, ensure that the output ByteBuffer has sufficient + capacity by calling {@link #GetDecodedLength(java.nio.CharBuffer)}. + + + The char sequence to decode + + Where the byte sequence decoding result will go. The limit + is set to one past the position of the final char. + + IllegalArgumentException If either the input or the output buffer + is not backed by an array + + + + Decodes the given char sequence, which must have been encoded by + {@link #Encode(java.nio.ByteBuffer)} or + {@link #Encode(java.nio.ByteBuffer, java.nio.CharBuffer)}. + + + The char sequence to decode + + A byte sequence containing the decoding result. The limit + is set to one past the position of the final char. + + IllegalArgumentException If the input buffer is not backed by an + array + + + + Encodes the input byte sequence. + + + The byte sequence to encode + + A char sequence containing the encoding result. The limit is set + to one past the position of the final char. + + IllegalArgumentException If the input buffer is not backed by an + array + + + + Implements {@link LockFactory} for a single in-process instance, + meaning all locking will take place through this one instance. + Only use this {@link LockFactory} when you are certain all + IndexReaders and IndexWriters for a given index are running + against a single shared in-process Directory instance. This is + currently the default locking for RAMDirectory. + + + + + + + The BoostingTermQuery is very similar to the {@link Lucene.Net.Search.Spans.SpanTermQuery} except + that it factors in the value of the payload located at each of the positions where the + {@link Lucene.Net.Index.Term} occurs. +

    + In order to take advantage of this, you must override {@link Lucene.Net.Search.Similarity#ScorePayload(String, byte[],int,int)} + which returns 1 by default. +

    + Payload scores are averaged across term occurrences in the document. + +

    + + + + See {@link Lucene.Net.Search.Payloads.PayloadTermQuery} + +
    + + This class is very similar to + {@link Lucene.Net.Search.Spans.SpanTermQuery} except that it factors + in the value of the payload located at each of the positions where the + {@link Lucene.Net.Index.Term} occurs. +

    + In order to take advantage of this, you must override + {@link Lucene.Net.Search.Similarity#ScorePayload(String, byte[],int,int)} + which returns 1 by default. +

    + Payload scores are aggregated using a pluggable {@link PayloadFunction}. + +

    +
    + + + {@link #GetSpanScore()} * {@link #GetPayloadScore()} + + IOException + + + Returns the SpanScorer score only. +

    + Should not be overriden without good cause! + +

    + the score for just the Span part w/o the payload + + IOException + + + + +
    + + The score for the payload + + + The score, as calculated by + {@link PayloadFunction#DocScore(int, String, int, float)} + + + + MultiPhraseQuery is a generalized version of PhraseQuery, with an added + method {@link #Add(Term[])}. + To use this class, to search for the phrase "Microsoft app*" first use + add(Term) on the term "Microsoft", then find all terms that have "app" as + prefix using IndexReader.terms(Term), and use MultiPhraseQuery.add(Term[] + terms) to add them to the query. + + + 1.0 + + + + Sets the phrase slop for this query. + + + + + Sets the phrase slop for this query. + + + + + Add a single term at the next position in the phrase. + + + + + Add multiple terms at the next position in the phrase. Any of the terms + may match. + + + + + + + Allows to specify the relative position of terms within the phrase. + + + + + + + + + + + Returns a List<Term[]> of the terms in the multiphrase. + Do not modify the List or its contents. + + + + Returns the relative positions of terms in this phrase. + + + Prints a user-readable version of this query. + + + Returns true if o is equal to this. + + + Returns a hash code value for this object. + + + Implements the fuzzy search query. The similarity measurement + is based on the Levenshtein (edit distance) algorithm. + + Warning: this query is not very scalable with its default prefix + length of 0 - in this case, *every* term will be enumerated and + cause an edit score calculation. + + + + + Create a new FuzzyQuery that will match terms with a similarity + of at least minimumSimilarity to term. + If a prefixLength > 0 is specified, a common prefix + of that length is also required. + + + the term to search for + + a value between 0 and 1 to set the required similarity + between the query term and the matching terms. For example, for a + minimumSimilarity of 0.5 a term of the same length + as the query term is considered similar to the query term if the edit distance + between both terms is less than length(term)*0.5 + + length of common (non-fuzzy) prefix + + IllegalArgumentException if minimumSimilarity is >= 1 or < 0 + or if prefixLength < 0 + + + + Calls {@link #FuzzyQuery(Term, float) FuzzyQuery(term, minimumSimilarity, 0)}. + + + Calls {@link #FuzzyQuery(Term, float) FuzzyQuery(term, 0.5f, 0)}. + + + Returns the minimum similarity that is required for this query to match. + float value between 0.0 and 1.0 + + + + Returns the non-fuzzy prefix length. This is the number of characters at the start + of a term that must be identical (not fuzzy) to the query term if the query + is to match that term. + + + + Returns the pattern term. + + + A query that scores each document as the value of the numeric input field. +

    + The query matches all documents, and scores each document according to the numeric + value of that field. +

    + It is assumed, and expected, that: +

      +
    • The field used here is indexed, and has exactly one token in every scored document.
    • Best if this field is un_tokenized.
    • That token is parsable to the selected type.
    +

    + Combining this query in a FunctionQuery allows much freedom in affecting document scores. Note that with this freedom comes responsibility: it is more than likely that the default Lucene scoring is superior in quality to scoring modified as explained here. However, in some cases, and certainly for research experiments, this capability may turn out to be useful. +

    + When constructing this query, select the appropriate type. That type should match the data stored in the field, so in fact the "right" type should be selected before indexing. Type selection affects RAM usage: +

      +
    • {@link Type#BYTE} consumes 1 * maxDocs bytes.
    • {@link Type#SHORT} consumes 2 * maxDocs bytes.
    • {@link Type#INT} consumes 4 * maxDocs bytes.
    • {@link Type#FLOAT} consumes 8 * maxDocs bytes.
    +

    + Caching: + Values for the numeric field are loaded once and cached in memory for further use with the same IndexReader. + To take advantage of this, it is extremely important to reuse index-readers or index-searchers, + otherwise, for instance if for each query a new index reader is opened, large penalties would be + paid for loading the field values into memory over and over again! + +

    + WARNING: The status of the Search.Function package is experimental. + The APIs introduced here might change in the future and will not be + supported anymore in such a case. +

    +
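    A short sketch of the above (the "popularity" field and the reused searcher are assumptions of the example):

        // Every document matches; its score is the cached float value of the
        // "popularity" field. Reuse the reader/searcher so the cache is reused.
        Query q = new FieldScoreQuery("popularity", FieldScoreQuery.Type.FLOAT);
        TopDocs top = searcher.search(q, 10);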
    + + Expert: A Query that sets the scores of document to the + values obtained from a {@link Lucene.Net.Search.Function.ValueSource ValueSource}. +

    + This query provides a score for each and every undeleted document in the index. +

    + The value source can be based on a (cached) value of an indexed field, but it + can also be based on an external source, e.g. values read from an external database. +

    + Score is set as: Score(doc,query) = query.getBoost()^2 * valueSource(doc). +

    + WARNING: The status of the Search.Function package is experimental. + The APIs introduced here might change in the future and will not be + supported anymore in such a case. +

    +
    + + Create a value source query + provides the values defines the function to be used for scoring + + + + Returns true if o is equal to this. + + + Returns a hash code value for this object. + + + A scorer that (simply) matches all documents, and scores each document with + the value of the value soure in effect. As an example, if the value source + is a (cached) field source, then value of that field in that document will + be used. (assuming field is indexed for this doc, with a single token.) + + + + use {@link #NextDoc()} instead. + + + + use {@link #DocID()} instead. + + + + use {@link #Advance(int)} instead. + + + + Create a FieldScoreQuery - a query that scores each document as the value of the numeric input field. +

    + The type param tells how to parse the field string values into a numeric score value. +

    + the numeric field to be used. + + the type of the field: either + {@link Type#BYTE}, {@link Type#SHORT}, {@link Type#INT}, or {@link Type#FLOAT}. + +
    + + Type of score field, indicating how field values are interpreted/parsed. +

    + The type selected at search time should match the data stored in the field. + Different types have different RAM requirements: +

      +
    • {@link #BYTE} consumes 1 * maxDocs bytes.
    • {@link #SHORT} consumes 2 * maxDocs bytes.
    • {@link #INT} consumes 4 * maxDocs bytes.
    • {@link #FLOAT} consumes 8 * maxDocs bytes.
    +
    +
    + + field values are interpreted as numeric byte values. + + + field values are interpreted as numeric short values. + + + field values are interpreted as numeric int values. + + + field values are interpreted as numeric float values. + + + A query that applies a filter to the results of another query. + +

    Note: the bits are retrieved from the filter each time this + query is used in a search - use a CachingWrapperFilter to avoid + regenerating the bits every time. + +

    Created: Apr 20, 2004 8:58:29 AM + +

    + 1.4 + + $Id: FilteredQuery.java 807821 2009-08-25 21:55:49Z mikemccand $ + + + +
    + + Constructs a new query which applies a filter to the results of the original query. + Filter.getDocIdSet() will be called every time this query is used in a search. + + Query to be filtered, cannot be null. + + Filter to apply to query results, cannot be null. + + + + Returns a Weight that applies the filter to the enclosed query's Weight. + This is accomplished by overriding the Scorer returned by the Weight. + + + + Rewrites the wrapped query. + + + Prints a user-readable version of this query. + + + Returns true iff o is equal to this. + + + Returns a hash code value for this object. + + + use {@link #NextDoc()} instead. + + + + use {@link #DocID()} instead. + + + + use {@link #Advance(int)} instead. + + + + A range filter built on top of a cached single term field (in {@link FieldCache}). + +

    FieldCacheRangeFilter builds a single cache for the field the first time it is used. + Each subsequent FieldCacheRangeFilter on the same field then reuses this cache, + even if the range itself changes. + +

    This means that FieldCacheRangeFilter is much faster (sometimes more than 100x as fast) than building a {@link TermRangeFilter} (or {@link ConstantScoreRangeQuery} on a {@link TermRangeFilter}) for each query, if using {@link #newStringRange}. However, if the range never changes, it is slower (around 2x as slow) than building a CachingWrapperFilter on top of a single TermRangeFilter. For numeric data types, this filter may be significantly faster than {@link NumericRangeFilter}. Furthermore, it does not need the numeric values encoded by {@link NumericField}. But it has the limitation that it only works with exactly one value per document (see below).

    As with all {@link FieldCache} based functionality, FieldCacheRangeFilter is only valid for fields containing exactly one term per document (except for {@link #newStringRange}, where 0 terms are also allowed). Due to a restriction of {@link FieldCache}, for numeric ranges a value of 0 is assumed for all documents that do not have a numeric value in the field.

    Thus it works on dates, prices and other single value fields but will not work on + regular text fields. It is preferable to use a NOT_ANALYZED field to ensure that + there is only a single term. + +

    This class does not have a constructor; use one of the static factory methods available, which create the correct instance for the different data types supported by {@link FieldCache}. +

    +
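    A hedged usage sketch (the field name, bounds, and the surrounding query/searcher are assumptions of the example):

        // Filter to documents whose single-valued int field "year" is in [1990, 2000].
        // The underlying FieldCache entry is built once per reader and field.
        Filter yearFilter = FieldCacheRangeFilter.newIntRange("year",
                Integer.valueOf(1990), Integer.valueOf(2000), true, true);
        TopDocs top = searcher.search(query, yearFilter, 10);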
    + + This method is implemented for each data type + + + Creates a string range query using {@link FieldCache#getStringIndex}. This works with all + fields containing zero or one term in the field. The range can be half-open by setting one + of the values to null. + + + + Creates a numeric range query using {@link FieldCache#GetBytes(IndexReader,String)}. This works with all + byte fields containing exactly one numeric term in the field. The range can be half-open by setting one + of the values to null. + + + + Creates a numeric range query using {@link FieldCache#GetBytes(IndexReader,String,FieldCache.ByteParser)}. This works with all + byte fields containing exactly one numeric term in the field. The range can be half-open by setting one + of the values to null. + + + + Creates a numeric range query using {@link FieldCache#GetShorts(IndexReader,String)}. This works with all + short fields containing exactly one numeric term in the field. The range can be half-open by setting one + of the values to null. + + + + Creates a numeric range query using {@link FieldCache#GetShorts(IndexReader,String,FieldCache.ShortParser)}. This works with all + short fields containing exactly one numeric term in the field. The range can be half-open by setting one + of the values to null. + + + + Creates a numeric range query using {@link FieldCache#GetInts(IndexReader,String)}. This works with all + int fields containing exactly one numeric term in the field. The range can be half-open by setting one + of the values to null. + + + + Creates a numeric range query using {@link FieldCache#GetInts(IndexReader,String,FieldCache.IntParser)}. This works with all + int fields containing exactly one numeric term in the field. The range can be half-open by setting one + of the values to null. + + + + Creates a numeric range query using {@link FieldCache#GetLongs(IndexReader,String)}. This works with all + long fields containing exactly one numeric term in the field. The range can be half-open by setting one + of the values to null. + + + + Creates a numeric range query using {@link FieldCache#GetLongs(IndexReader,String,FieldCache.LongParser)}. This works with all + long fields containing exactly one numeric term in the field. The range can be half-open by setting one + of the values to null. + + + + Creates a numeric range query using {@link FieldCache#GetFloats(IndexReader,String)}. This works with all + float fields containing exactly one numeric term in the field. The range can be half-open by setting one + of the values to null. + + + + Creates a numeric range query using {@link FieldCache#GetFloats(IndexReader,String,FieldCache.FloatParser)}. This works with all + float fields containing exactly one numeric term in the field. The range can be half-open by setting one + of the values to null. + + + + Creates a numeric range query using {@link FieldCache#GetDoubles(IndexReader,String)}. This works with all + double fields containing exactly one numeric term in the field. The range can be half-open by setting one + of the values to null. + + + + Creates a numeric range query using {@link FieldCache#GetDoubles(IndexReader,String,FieldCache.DoubleParser)}. This works with all + double fields containing exactly one numeric term in the field. The range can be half-open by setting one + of the values to null. 
+ + + + this method checks, if a doc is a hit, should throw AIOBE, when position invalid + + + this DocIdSet is cacheable, if it works solely with FieldCache and no TermDocs + + + @deprecated use {@link #NextDoc()} instead. + + + use {@link #Advance(int)} instead. + + + + use {@link #DocID()} instead. + + + + use {@link #NextDoc()} instead. + + + + use {@link #Advance(int)} instead. + + + + use {@link #DocID()} instead. + + + + MessageBundles classes extend this class, to implement a bundle. + + For Native Language Support (NLS), system of software internationalization. + + This interface is similar to the NLS class in eclipse.osgi.util.NLS class - + initializeMessages() method resets the values of all static strings, should + only be called by classes that extend from NLS (see TestMessages.java for + reference) - performs validation of all message in a bundle, at class load + time - performs per message validation at runtime - see NLSTest.java for + usage reference + + MessageBundle classes may subclass this type. + + + + Initialize a given class with the message bundle Keys Should be called from + a class that extends NLS in a static block at class load time. + + + Property file with that contains the message bundle + + where constants will reside + + + + + + + + + + + + + + - Message Key + + + + + Performs the priviliged action. + + A value that may represent the result of the action. + + + Compares {@link Lucene.Net.Index.TermVectorEntry}s first by frequency and then by + the term (case-sensitive) + + + + + + Writes norms. Each thread X field accumulates the norms + for the doc/fields it saw, then the flush method below + merges all of these together into a single _X.nrm file. + + + + Produce _X.nrm if any document had a field with norms + not disabled + + + + Holds state for inverting all occurrences of a single + field in the document. This class doesn't do anything + itself; instead, it forwards the tokens produced by + analysis to its own consumer + (InvertedDocConsumerPerField). It also interacts with an + endConsumer (InvertedDocEndConsumerPerField). + + + + Processes all occurrences of a single field + + + Gathers all Fieldables for a document under the same + name, updates FieldInfos, and calls per-field consumers + to process field by field. + + Currently, only a single thread visits the fields, + sequentially, for processing. + + + + If there are fields we've seen but did not see again + in the last run, then free them up. + + + + Filters {@link LetterTokenizer} with {@link LowerCaseFilter} and + {@link StopFilter}. + + +

    + You must specify the required {@link Version} compatibility when creating + StopAnalyzer: +

      +
    • As of 2.9, position increments are preserved
    +
    +
    + + An array containing some common English words that are not usually useful + for searching. + + Use {@link #ENGLISH_STOP_WORDS_SET} instead + + + + An unmodifiable set containing some common English words that are not usually useful + for searching. + + + + Builds an analyzer which removes words in + ENGLISH_STOP_WORDS. + + Use {@link #StopAnalyzer(Version)} instead + + + + Builds an analyzer which removes words in ENGLISH_STOP_WORDS. + + + Builds an analyzer which removes words in + ENGLISH_STOP_WORDS. + + + See {@link StopFilter#SetEnablePositionIncrements} + + Use {@link #StopAnalyzer(Version)} instead + + + + Builds an analyzer with the stop words from the given set. + Use {@link #StopAnalyzer(Version, Set)} instead + + + + Builds an analyzer with the stop words from the given set. + + + Builds an analyzer with the stop words from the given set. + Set of stop words + + + See {@link StopFilter#SetEnablePositionIncrements} + + Use {@link #StopAnalyzer(Version, Set)} instead + + + + Builds an analyzer which removes words in the provided array. + Use {@link #StopAnalyzer(Set, boolean)} instead + + Use {@link #StopAnalyzer(Version, Set)} instead + + + + Builds an analyzer which removes words in the provided array. + Array of stop words + + + See {@link StopFilter#SetEnablePositionIncrements} + + Use {@link #StopAnalyzer(Version, Set)} instead + + + + Builds an analyzer with the stop words from the given file. + + + Use {@link #StopAnalyzer(Version, File)} instead + + + + Builds an analyzer with the stop words from the given file. + + + File to load stop words from + + + See {@link StopFilter#SetEnablePositionIncrements} + + Use {@link #StopAnalyzer(Version, File)} instead + + + + Builds an analyzer with the stop words from the given file. + + + + + See above + + File to load stop words from + + + + Builds an analyzer with the stop words from the given reader. + + + Use {@link #StopAnalyzer(Version, Reader)} instead + + + + Builds an analyzer with the stop words from the given reader. + + + Reader to load stop words from + + + See {@link StopFilter#SetEnablePositionIncrements} + + Use {@link #StopAnalyzer(Version, Reader)} instead + + + + Builds an analyzer with the stop words from the given reader. + + + See above + + Reader to load stop words from + + + + Filters LowerCaseTokenizer with StopFilter. + + + Filters LowerCaseTokenizer with StopFilter. + + + Use this {@link LockFactory} to disable locking entirely. + This LockFactory is used when you call {@link FSDirectory#setDisableLocks}. + Only one instance of this lock is created. You should call {@link + #GetNoLockFactory()} to get the instance. + + + + + + + Expert: A Directory instance that switches files between + two other Directory instances. +

    Files with the specified extensions are placed in the + primary directory; others are placed in the secondary + directory. The provided Set must not change once passed + to this class, and must allow multiple threads to call + contains at once.

    + +

    NOTE: this API is new and experimental and is subject to sudden change in the next release. +

    +
    + + Return the primary directory + + + Return the secondary directory + + + Utility method to return a file's extension. + + + A {@link Collector} implementation that collects the top-scoring hits, + returning them as a {@link TopDocs}. This is used by {@link IndexSearcher} to + implement {@link TopDocs}-based search. Hits are sorted by score descending + and then (when the scores are tied) docID ascending. When you create an + instance of this collector you should know in advance whether documents are + going to be collected in doc Id order or not. + +

    NOTE: The values {@link Float#NaN} and {@link Float#NEGATIVE_INFINITY} are not valid scores. This collector will not properly collect hits with such scores. +

    +
    + + Creates a new {@link TopScoreDocCollector} given the number of hits to + collect and whether documents are scored in order by the input + {@link Scorer} to {@link #SetScorer(Scorer)}. + +

    NOTE: The instances returned by this method + pre-allocate a full array of length + numHits, and fill the array with sentinel + objects. +

    +
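    A minimal usage sketch (the query and searcher are assumed to exist; method names follow the Java 2.9 API and should be treated as assumptions for the .NET port):

        // Collect the 10 best-scoring hits; 'true' means the Scorer will score
        // documents in increasing doc id order (the common case).
        TopScoreDocCollector collector = TopScoreDocCollector.create(10, true);
        searcher.search(query, collector);
        ScoreDoc[] hits = collector.topDocs().scoreDocs;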
    + + Constrains search results to only match those which also match a provided + query. + +

    This could be used, for example, with a {@link TermRangeQuery} on a suitably + formatted date field to implement date filtering. One could re-use a single + QueryFilter that matches, e.g., only documents modified within the last + week. The QueryFilter and TermRangeQuery would only need to be reconstructed + once per day. + +

    + $Id:$ + +
    + + Constructs a filter which only matches documents matching + query. + + + + Use {@link #GetDocIdSet(IndexReader)} instead. + + + + Position of a term in a document that takes into account the term offset within the phrase. + + + Go to next location of this term current document, and set + position as location - offset, so that a + matching exact phrase is easily identified when all PhrasePositions + have exactly the same position. + + + + A query that generates the union of documents produced by its subqueries, and that scores each document with the maximum + score for that document as produced by any subquery, plus a tie breaking increment for any additional matching subqueries. + This is useful when searching for a word in multiple fields with different boost factors (so that the fields cannot be + combined equivalently into a single search field). We want the primary score to be the one associated with the highest boost, + not the sum of the field scores (as BooleanQuery would give). + If the query is "albino elephant" this ensures that "albino" matching one field and "elephant" matching + another gets a higher score than "albino" matching both fields. + To get this result, use both BooleanQuery and DisjunctionMaxQuery: for each term a DisjunctionMaxQuery searches for it in + each field, while the set of these DisjunctionMaxQuery's is combined into a BooleanQuery. + The tie breaker capability allows results that include the same term in multiple fields to be judged better than results that + include this term in only the best of those multiple fields, without confusing this with the better case of two different terms + in the multiple fields. + + + + Creates a new empty DisjunctionMaxQuery. Use add() to add the subqueries. + the score of each non-maximum disjunct for a document is multiplied by this weight + and added into the final score. If non-zero, the value should be small, on the order of 0.1, which says that + 10 occurrences of word in a lower-scored field that is also in a higher scored field is just as good as a unique + word in the lower scored field (i.e., one that is not in any higher scored field. + + + + Creates a new DisjunctionMaxQuery + a Collection<Query> of all the disjuncts to add + + the weight to give to each matching non-maximum disjunct + + + + Add a subquery to this disjunction + the disjunct added + + + + Add a collection of disjuncts to this disjunction + via Iterable + + + + An Iterator<Query> over the disjuncts + + + Optimize our representation and our subqueries representations + the IndexReader we query + + an optimized copy of us (which may not be a copy if there is nothing to optimize) + + + + Create a shallow copy of us -- used in rewriting if necessary + a copy of us (but reuse, don't copy, our subqueries) + + + + Prettyprint us. + the field to which we are applied + + a string that shows what we do, of the form "(disjunct1 | disjunct2 | ... | disjunctn)^boost" + + + + Return true iff we represent the same query as o + another object + + true iff o is a DisjunctionMaxQuery with the same boost and the same subqueries, in the same order, as us + + + + Compute a hash code for hashing us + the hash code + + + + Expert: the Weight for DisjunctionMaxQuery, used to + normalize, score and explain these queries. + +

    NOTE: this API and implementation are subject to change suddenly in the next release.

    +

    +
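    As a usage illustration of the DisjunctionMaxQuery described above (a hedged sketch; the field names and the 0.1f tie breaker are made up for the example):

        // Per term, a DisjunctionMaxQuery takes the best field score; a
        // BooleanQuery then requires both terms ("albino elephant" pattern).
        DisjunctionMaxQuery albino = new DisjunctionMaxQuery(0.1f);
        albino.add(new TermQuery(new Term("title", "albino")));
        albino.add(new TermQuery(new Term("body", "albino")));

        DisjunctionMaxQuery elephant = new DisjunctionMaxQuery(0.1f);
        elephant.add(new TermQuery(new Term("title", "elephant")));
        elephant.add(new TermQuery(new Term("body", "elephant")));

        BooleanQuery q = new BooleanQuery();
        q.add(albino, BooleanClause.Occur.MUST);
        q.add(elephant, BooleanClause.Occur.MUST);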
    + + The Similarity implementation. + + + The Weights for our subqueries, in 1-1 correspondence with disjuncts + + + Default implementation of Message interface. + For Native Language Support (NLS), system of software internationalization. + + + + Message Interface for a lazy loading. + For Native Language Support (NLS), system of software internationalization. + + + + A {@link IndexDeletionPolicy} that wraps around any other + {@link IndexDeletionPolicy} and adds the ability to hold and + later release a single "snapshot" of an index. While + the snapshot is held, the {@link IndexWriter} will not + remove any files associated with it even if the index is + otherwise being actively, arbitrarily changed. Because + we wrap another arbitrary {@link IndexDeletionPolicy}, this + gives you the freedom to continue using whatever {@link + IndexDeletionPolicy} you would normally want to use with your + index. Note that you can re-use a single instance of + SnapshotDeletionPolicy across multiple writers as long + as they are against the same index Directory. Any + snapshot held when a writer is closed will "survive" + when the next writer is opened. + +

    WARNING: This API is new and experimental and may change suddenly.

    +

    +
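    A hedged backup sketch (constructor and method names follow the Java 2.9 API and should be treated as assumptions; 'dir' and 'analyzer' are assumed to exist):

        // Keep the files of the current commit alive while backing them up.
        SnapshotDeletionPolicy snapshotter =
            new SnapshotDeletionPolicy(new KeepOnlyLastCommitDeletionPolicy());
        IndexWriter writer = new IndexWriter(dir, analyzer, snapshotter,
                                             IndexWriter.MaxFieldLength.UNLIMITED);
        try {
            IndexCommit commit = snapshotter.snapshot();   // files now protected
            // copy the files named in commit.getFileNames() to backup storage here
        } finally {
            snapshotter.release();                         // files may be deleted again
        }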
    + + Take a snapshot of the most recent commit to the + index. You must call release() to free this snapshot. + Note that while the snapshot is held, the files it + references will not be deleted, which will consume + additional disk space in your index. If you take a + snapshot at a particularly bad time (say just before + you call optimize()) then in the worst case this could + consume an extra 1X of your total index size, until + you release the snapshot. + + + + Release the currently held snapshot. + + + Taps into DocInverter, as an InvertedDocEndConsumer, + which is called at the end of inverting each field. We + just look at the length for the field (docState.length) + and record the norm. + + + + An IndexReader which reads multiple indexes, appending their content. + + + $Id: MultiReader.java 782406 2009-06-07 16:31:18Z mikemccand $ + + + +

    Construct a MultiReader aggregating the named set of (sub)readers. + Directory locking for delete, undeleteAll, and setNorm operations is + left to the subreaders.

    +

    Note that all subreaders are closed if this MultiReader is closed.

    +

    + set of (sub)readers + + IOException +
    + +

    Construct a MultiReader aggregating the named set of (sub)readers. + Directory locking for delete, undeleteAll, and setNorm operations is + left to the subreaders.

    +

    + indicates whether the subreaders should be closed + when this MultiReader is closed + + set of (sub)readers + + IOException +
    + + Tries to reopen the subreaders. +
    + If one or more subreaders could be re-opened (i.e. subReader.reopen() returned a new instance != subReader), then a new MultiReader instance is returned; otherwise this instance is returned. +

    + A re-opened instance might share one or more subreaders with the old + instance. Index modification operations result in undefined behavior + when performed before the old instance is closed. + (see {@link IndexReader#Reopen()}). +

    + If subreaders are shared, then the reference count of those + readers is increased to ensure that the subreaders remain open + until the last referring reader is closed. + +

    + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error +
    + + Clones the subreaders. + (see {@link IndexReader#clone()}). +
    +

    + If subreaders are shared, then the reference count of those + readers is increased to ensure that the subreaders remain open + until the last referring reader is closed. +

    +
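    A small sketch of the reopen pattern described above (the existing 'multiReader' variable is an assumption of the example):

        // reopen() may return the same instance if nothing changed; only close
        // the old reader when a new one was actually returned.
        IndexReader newReader = multiReader.reopen();
        if (newReader != multiReader) {
            multiReader.close();
            multiReader = newReader;
        }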
    + + If clone is true then we clone each of the subreaders + + + New IndexReader, or same one (this) if + reopen/clone is not necessary + + CorruptIndexException + IOException + + + + + + + Checks recursively if all subreaders are up to date. + + + Not implemented. + UnsupportedOperationException + + + This abstract class reads skip lists with multiple levels. + + See {@link MultiLevelSkipListWriter} for the information about the encoding + of the multi level skip lists. + + Subclasses must implement the abstract method {@link #ReadSkipData(int, IndexInput)} + which defines the actual format of the skip data. + + + + Returns the id of the doc to which the last call of {@link #SkipTo(int)} + has skipped. + + + + Skips entries to the first beyond the current whose document number is + greater than or equal to target. Returns the current doc count. + + + + Seeks the skip entry on the given level + + + initializes the reader + + + Loads the skip levels + + + Subclasses must implement the actual skip data encoding in this method. + + + the level skip data shall be read from + + the skip stream to read from + + + + Copies the values of the last read skip entry on this level + + + used to buffer the top skip levels + + + Add a new field + + + Called when we are done adding everything. + + + This is a DocFieldConsumer that inverts each field, + separately, from a Document, and accepts a + InvertedTermsConsumer to process those terms. + + + + This class keeps track of closing the underlying directory. It is used to wrap + DirectoryReaders, that are created using a String/File parameter + in IndexReader.open() with FSDirectory.getDirectory(). + + This helper class is removed with all String/File + IndexReader.open() methods in Lucene 3.0 + + + + This member contains the ref counter, that is passed to each instance after cloning/reopening, + and is global to all DirectoryOwningReader derived from the original one. + This reuses the class {@link SegmentReader.Ref} + + + + Class to write byte streams into slices of shared + byte[]. This is used by DocumentsWriter to hold the + posting list for many terms in RAM. + + + + Set up the writer to write at address. + + + Write byte into byte slice stream + + + Simple utility class providing static methods to + compress and decompress binary data for stored fields. + This class uses java.util.zip.Deflater and Inflater + classes to compress and decompress, which is the same + format previously used by the now deprecated + Field.Store.COMPRESS. + + + + Compresses the specified byte range using the + specified compressionLevel (constants are defined in + java.util.zip.Deflater). + + + + Compresses the specified byte range, with default BEST_COMPRESSION level + + + Compresses all bytes in the array, with default BEST_COMPRESSION level + + + Compresses the String value, with default BEST_COMPRESSION level + + + Compresses the String value using the specified + compressionLevel (constants are defined in + java.util.zip.Deflater). + + + + Decompress the byte array previously returned by + compress + + + + Decompress the byte array previously returned by + compressString back into a String + + + + Filters {@link StandardTokenizer} with {@link StandardFilter}, + {@link LowerCaseFilter} and {@link StopFilter}, using a list of English stop + words. + + +

    + You must specify the required {@link Version} compatibility when creating + StandardAnalyzer: +

    + +
    + $Id: StandardAnalyzer.java 829134 2009-10-23 17:18:53Z mikemccand $ + +
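    A brief sketch of version-aware construction (the Version constant and the sample text are illustrative only):

        // As of 2.9 the matchVersion argument controls version-dependent defaults
        // such as stop-word handling and position increments.
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_29);
        TokenStream ts = analyzer.tokenStream("body", new StringReader("some text"));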
    + + Default maximum allowed token length + + + Specifies whether deprecated acronyms should be replaced with HOST type. + This is false by default to support backward compatibility. + + + this should be removed in the next release (3.0). + + See https://issues.apache.org/jira/browse/LUCENE-1068 + + + + + true if new instances of StandardTokenizer will + replace mischaracterized acronyms + + See https://issues.apache.org/jira/browse/LUCENE-1068 + + This will be removed (hardwired to true) in 3.0 + + + + + Set to true to have new + instances of StandardTokenizer replace mischaracterized + acronyms by default. Set to false to preserve the + previous (before 2.4) buggy behavior. Alternatively, + set the system property + Lucene.Net.Analysis.Standard.StandardAnalyzer.replaceInvalidAcronym + to false. + + See https://issues.apache.org/jira/browse/LUCENE-1068 + + This will be removed (hardwired to true) in 3.0 + + + + An array containing some common English words that are usually not + useful for searching. + + Use {@link #STOP_WORDS_SET} instead + + + + An unmodifiable set containing some common English words that are usually not + useful for searching. + + + + Builds an analyzer with the default stop words ({@link + #STOP_WORDS_SET}). + + Use {@link #StandardAnalyzer(Version)} instead. + + + + Builds an analyzer with the default stop words ({@link + #STOP_WORDS}). + + Lucene version to match See {@link + above} + + + + Builds an analyzer with the given stop words. + Use {@link #StandardAnalyzer(Version, Set)} + instead + + + + Builds an analyzer with the given stop words. + Lucene version to match See {@link + above} + + stop words + + + + Builds an analyzer with the given stop words. + Use {@link #StandardAnalyzer(Version, Set)} instead + + + + Builds an analyzer with the stop words from the given file. + + + Use {@link #StandardAnalyzer(Version, File)} + instead + + + + Builds an analyzer with the stop words from the given file. + + + Lucene version to match See {@link + above} + + File to read stop words from + + + + Builds an analyzer with the stop words from the given reader. + + + Use {@link #StandardAnalyzer(Version, Reader)} + instead + + + + Builds an analyzer with the stop words from the given reader. 
+ + + Lucene version to match See {@link + above} + + Reader to read stop words from + + + + + Set to true if this analyzer should replace mischaracterized acronyms in the StandardTokenizer + + See https://issues.apache.org/jira/browse/LUCENE-1068 + + + Remove in 3.X and make true the only valid value + + + + The stopwords to use + + Set to true if this analyzer should replace mischaracterized acronyms in the StandardTokenizer + + See https://issues.apache.org/jira/browse/LUCENE-1068 + + + Remove in 3.X and make true the only valid value + + + + The stopwords to use + + Set to true if this analyzer should replace mischaracterized acronyms in the StandardTokenizer + + See https://issues.apache.org/jira/browse/LUCENE-1068 + + + Remove in 3.X and make true the only valid value + + + + + The stopwords to use + + Set to true if this analyzer should replace mischaracterized acronyms in the StandardTokenizer + + See https://issues.apache.org/jira/browse/LUCENE-1068 + + + Remove in 3.X and make true the only valid value + + + + The stopwords to use + + Set to true if this analyzer should replace mischaracterized acronyms in the StandardTokenizer + + See https://issues.apache.org/jira/browse/LUCENE-1068 + + + Remove in 3.X and make true the only valid value + + + + Constructs a {@link StandardTokenizer} filtered by a {@link + StandardFilter}, a {@link LowerCaseFilter} and a {@link StopFilter}. + + + + Set maximum allowed token length. If a token is seen + that exceeds this length then it is discarded. This + setting only takes effect the next time tokenStream or + reusableTokenStream is called. + + + + + + + + Use {@link #tokenStream} instead + + + + + true if this Analyzer is replacing mischaracterized acronyms in the StandardTokenizer + + See https://issues.apache.org/jira/browse/LUCENE-1068 + + This will be removed (hardwired to true) in 3.0 + + + + + Set to true if this Analyzer is replacing mischaracterized acronyms in the StandardTokenizer + + See https://issues.apache.org/jira/browse/LUCENE-1068 + + This will be removed (hardwired to true) in 3.0 + + + + A SinkTokenizer can be used to cache Tokens for use in an Analyzer +

    + WARNING: {@link TeeTokenFilter} and {@link SinkTokenizer} only work with the old TokenStream API. + If you switch to the new API, you need to use {@link TeeSinkTokenFilter} instead, which offers + the same functionality. +

    + + + Use {@link TeeSinkTokenFilter} instead + + + +
    + + Get the tokens in the internal List. +

    + WARNING: Adding tokens to this list requires the {@link #Reset()} method to be called in order for them + to be made available. Also, this Tokenizer does nothing to protect against {@link java.util.ConcurrentModificationException}s + in the case of adds happening while {@link #Next(Lucene.Net.Analysis.Token)} is being called. +

    + WARNING: Since this SinkTokenizer can be reset and the cached tokens made available again, do not modify them. Modify clones instead. + +

    + A List of {@link Lucene.Net.Analysis.Token}s + +
    + + Returns the next token out of the list of cached tokens + The next {@link Lucene.Net.Analysis.Token} in the Sink. + + IOException + + + Override this method to cache only certain tokens, or new tokens based + on the old tokens. + + + The {@link Lucene.Net.Analysis.Token} to add to the sink + + + + Reset the internal data structures to the start at the front of the list of tokens. Should be called + if tokens were added to the list after an invocation of {@link #Next(Token)} + + IOException + + + Optimized implementation of a vector of bits. This is more-or-less like + java.util.BitSet, but also includes the following: +
      +
    + • a count() method, which efficiently computes the number of one bits;
    + • optimized read from and write to disk;
    + • inlinable get() method;
    + • store and load, as bit set or d-gaps, depending on sparseness;
    +
    + $Id: BitVector.java 765649 2009-04-16 14:29:26Z mikemccand $ + +
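    + A rough sketch of the members described below (assumes the class is public in this
    + build; 'reader' and 'directory' stand for an open IndexReader and Directory, and the
    + file name is illustrative):
    +
    +            <code>
    +            BitVector bits = new BitVector(reader.MaxDoc());  // one bit per document
    +            bits.Set(42);                                     // set bit 42 to one
    +            bool isSet = bits.Get(42);                        // true
    +            int ones = bits.Count();                          // cached count of one bits
    +            bits.Write(directory, "_1.del");                  // persist; readable via BitVector(Directory, String)
    +            </code>
    +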
    + + Constructs a vector capable of holding n bits. + + + Sets the value of bit to one. + + + Sets the value of bit to true, and + returns true if bit was already set + + + + Sets the value of bit to zero. + + + Returns true if bit is one and + false if it is zero. + + + + Returns the number of bits in this vector. This is also one greater than + the number of the largest valid bit number. + + + + Returns the total number of one bits in this vector. This is efficiently + computed and cached, so that, if the vector is not changed, no + recomputation is done for repeated calls. + + + + Writes this vector to the file name in Directory + d, in a format that can be read by the constructor {@link + #BitVector(Directory, String)}. + + + + Write as a bit set + + + Write as a d-gaps list + + + Indicates if the bit vector is sparse and should be saved as a d-gaps list, or dense, and should be saved as a bit set. + + + Constructs a bit vector from the file name in Directory + d, as written by the {@link #write} method. + + + + Read as a bit set + + + read as a d-gaps list + + + Retrieve a subset of this BitVector. + + + starting index, inclusive + + ending index, exclusive + + subset + + + + A memory-resident {@link IndexInput} implementation. + + + $Id: RAMInputStream.java 632120 2008-02-28 21:13:59Z mikemccand $ + + + + This exception is thrown when you try to list a + non-existent directory. + + + + A {@link Collector} that sorts by {@link SortField} using + {@link FieldComparator}s. +

    + See the {@link #create(Lucene.Net.Search.Sort, int, boolean, boolean, boolean, boolean)} method + for instantiating a TopFieldCollector. + +

    NOTE: This API is experimental and might change in + incompatible ways in the next release.

    +

    +
    + + Creates a new {@link TopFieldCollector} from the given + arguments. + +

    NOTE: The instances returned by this method + pre-allocate a full array of length + numHits. + +

    + the sort criteria (SortFields). + + the number of results to collect. + + specifies whether the actual field values should be returned on + the results (FieldDoc). + + specifies whether document scores should be tracked and set on the + results. Note that if set to false, then the results' scores will + be set to Float.NaN. Setting this to true affects performance, as + it incurs the score computation on each competitive result. + Therefore if document scores are not required by the application, + it is recommended to set it to false. + + specifies whether the query's maxScore should be tracked and set + on the resulting {@link TopDocs}. Note that if set to false, + {@link TopDocs#GetMaxScore()} returns Float.NaN. Setting this to + true affects performance as it incurs the score computation on + each result. Also, setting this true automatically sets + trackDocScores to true as well. + + specifies whether documents are scored in doc Id order or not by + the given {@link Scorer} in {@link #SetScorer(Scorer)}. + + a {@link TopFieldCollector} instance which will sort the results by + the sort criteria. + + IOException +
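    + A sketch of the create-then-search pattern (member casing follows the .NET port and,
    + together with 'searcher', 'query' and the sort field name, is an assumption here):
    +
    +            <code>
    +            Sort sort = new Sort(new SortField("date", SortField.STRING, true));
    +            TopFieldCollector collector = TopFieldCollector.Create(
    +                sort, 10, true /*fillFields*/, false /*trackDocScores*/,
    +                false /*trackMaxScore*/, true /*docsScoredInOrder*/);
    +            searcher.Search(query, collector);
    +            TopDocs results = collector.TopDocs();
    +            </code>
    +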
    + + Score a candidate doc for all slop-valid position-combinations (matches) + encountered while traversing/hopping the PhrasePositions. +
    The score contribution of a match depends on the distance: +
    - highest score for distance=0 (exact match). +
    - score gets lower as distance gets higher. +
    Example: for query "a b"~2, a document "x a b a y" can be scored twice: + once for "a b" (distance=0), and once for "b a" (distance=2). +
    Possibly not all valid combinations are encountered, because for efficiency + we always propagate the least PhrasePosition. This allows the scorer to be based on + a PriorityQueue and to move forward faster. + As a result, for example, document "a b c b a" + would score differently for queries "a b c"~4 and "c b a"~4, although + they really are equivalent. + Similarly, for doc "a b c b a f g", query "c b"~2 + would get the same score as "g f"~2, although "c b"~2 could be matched twice. + We may want to fix this in the future (currently not done, for performance reasons). +
    +
    + + Init PhrasePositions in place. + There is a one time initialization for this scorer: +
    - Put in repeats[] each pp that has another pp with same position in the doc. +
    - Also mark each such pp by pp.repeats = true. +
    Later, termPositionsDiffer(pp) can consult repeats[], making that check efficient. + In particular, this allows queries with no repetitions to be scored with no overhead from this computation. +
    - Example 1 - query with no repetitions: "ho my"~2 +
    - Example 2 - query with repetitions: "ho my my"~2 +
    - Example 3 - query with repetitions: "my ho my"~2 +
    Init per doc w/repeats in query, includes propagating some repeating pp's to avoid false phrase detection. +
    + end (max position), or -1 if any term ran out (i.e. done) + + IOException +
    + + We disallow two pp's to have the same TermPosition, thereby verifying multiple occurrences + in the query of the same word would go elsewhere in the matched doc. + + null if differ (i.e. valid) otherwise return the higher offset PhrasePositions + out of the first two PPs found to not differ. + + + + A Filter that restricts search results to a range of values in a given + field. + +

    This filter matches documents whose terms fall into the + supplied range according to {@link String#compareTo(String)}. It is not intended + for numerical ranges; use {@link NumericRangeFilter} instead. + +

    If you construct a large number of range filters with different ranges but on the + same field, {@link FieldCacheRangeFilter} may have significantly better performance. + +

    + Use {@link TermRangeFilter} for term ranges or + {@link NumericRangeFilter} for numeric ranges instead. + This class will be removed in Lucene 3.0. + +
    + + The field this range applies to + + The lower bound on this range + + The upper bound on this range + + Does this range include the lower bound? + + Does this range include the upper bound? + + IllegalArgumentException if both terms are null or if + lowerTerm is null and includeLower is true (similar for upperTerm + and includeUpper) + + + + WARNING: Using this constructor and supplying a non-null + value in the collator parameter will cause every single + index Term in the Field referenced by lowerTerm and/or upperTerm to be + examined. Depending on the number of index Terms in this Field, the + operation could be very slow. + + + The lower bound on this range + + The upper bound on this range + + Does this range include the lower bound? + + Does this range include the upper bound? + + The collator to use when determining range inclusion; set + to null to use Unicode code point ordering instead of collation. + + IllegalArgumentException if both terms are null or if + lowerTerm is null and includeLower is true (similar for upperTerm + and includeUpper) + + + + Constructs a filter for field fieldName matching + less than or equal to upperTerm. + + + + Constructs a filter for field fieldName matching + greater than or equal to lowerTerm. + + + + Add a complete document specified by all its term vectors. If document has no + term vectors, add value for tvx. + + + + + IOException + + + Do a bulk copy of numDocs documents from reader to our + streams. This is used to expedite merging, if the + field numbers are congruent. + + + + Close all streams. + + + $Id: TermVectorsReader.java 687046 2008-08-19 13:01:11Z mikemccand $ + + + + Retrieve the length (in bytes) of the tvd and tvf + entries for the next numDocs starting with + startDocID. This is used for bulk copying when + merging segments, if the field numbers are + congruent. Once this returns, the tvf & tvd streams + are seeked to the startDocID. + + + + + The number of documents in the reader + + + + Retrieve the term vector for the given document and field + The document number to retrieve the vector for + + The field within the document to retrieve + + The TermFreqVector for the document and field or null if there is no termVector for this field. + + IOException if there is an error reading the term vector files + + + Return all term vectors stored for this document or null if the could not be read in. + + + The document number to retrieve the vector for + + All term frequency vectors + + IOException if there is an error reading the term vector files + + + + The field to read in + + The pointer within the tvf file where we should start reading + + The mapper used to map the TermVector + + IOException + + + Models the existing parallel array structure + + + Construct the vector + The {@link TermFreqVector} based on the mappings. + + + + This is a {@link LogMergePolicy} that measures size of a + segment as the number of documents (not taking deletions + into account). + + + + + + + + Sets the minimum size for the lowest level segments. + Any segments below this size are considered to be on + the same level (even if they vary drastically in size) + and will be merged whenever there are mergeFactor of + them. This effectively truncates the "long tail" of + small segments that would otherwise be created into a + single level. If you set this too large, it could + greatly increase the merging cost during indexing (if + you flush many small segments). + + + + Get the minimum size for a segment to remain + un-merged. 
+ + + + + + A {@link FieldSelector} based on a Map of field names to {@link FieldSelectorResult}s + + + + + Create a a MapFieldSelector + maps from field names (String) to {@link FieldSelectorResult}s + + + + Create a a MapFieldSelector + fields to LOAD. List of Strings. All other fields are NO_LOAD. + + + + Create a a MapFieldSelector + fields to LOAD. All other fields are NO_LOAD. + + + + Load field according to its associated value in fieldSelections + a field name + + the fieldSelections value that field maps to or NO_LOAD if none. + + + + This class wraps a Token and supplies a single attribute instance + where the delegate token can be replaced. + + Will be removed, when old TokenStream API is removed. + + + + Holds a map of String input to String output, to be used + with {@link MappingCharFilter}. + + + + Records a replacement to be applied to the inputs + stream. Whenever singleMatch occurs in + the input, it will be replaced with + replacement. + + + input String to be replaced + + output String + + + + Borrowed from Cglib. Allows custom swap so that two arrays can be sorted + at the same time. + + + + An iterator to iterate over set bits in an OpenBitSet. + This is faster than nextSetBit() for iterating over the complete set of bits, + especially when the density of the bits set is high. + + + $Id$ + + + + ** the python code that generated bitlist + def bits2int(val): + arr=0 + for shift in range(8,0,-1): + if val & 0x80: + arr = (arr << 4) | shift + val = val << 1 + return arr + def int_table(): + tbl = [ hex(bits2int(val)).strip('L') for val in range(256) ] + return ','.join(tbl) + **** + + + + use {@link #NextDoc()} instead. + + + + use {@link #Advance(int)} instead. + + + + use {@link #DocID()} instead. + + + + Some useful constants. + + + + $Id: Constants.java 828327 2009-10-22 06:47:40Z uschindler $ + + + + + The value of System.getProperty("java.version"). * + + + True iff this is Java version 1.1. + + + True iff this is Java version 1.2. + + + True iff this is Java version 1.3. + + + The value of System.getProperty("os.name"). * + + + True iff running on Linux. + + + True iff running on Windows. + + + True iff running on SunOS. + + + A {@link HitCollector} implementation that collects the top-sorting + documents, returning them as a {@link TopFieldDocs}. This is used by {@link + IndexSearcher} to implement {@link TopFieldDocs}-based search. + +

    This may be extended, overriding the collect method to, e.g., + conditionally invoke super() in order to filter which + documents are collected. + +

    + Please use {@link TopFieldCollector} instead. + +
    + + Construct to collect a given number of hits. + the index to be searched + + the sort criteria + + the maximum number of hits to collect + + + + Matches spans near the beginning of a field. + + + Construct a SpanFirstQuery matching spans in match whose end + position is less than or equal to end. + + + + Return the SpanQuery whose matches are filtered. + + + Return the maximum end position permitted in a match. + + + Returns a collection of all terms matched by this query. + use extractTerms instead + + + + + + Returns the maximum payload score seen, else 1 if there are no payloads on the doc. +

    + Is thread safe and completely reusable. + + +

    +
    + + A query that matches all documents. + + + + + Field used for normalization factor (document boost). Null if nothing. + + + + use {@link #DocID()} instead. + + + + use {@link #NextDoc()} instead. + + + + use {@link #Advance(int)} instead. + + + + Describes the input token stream. + + + An integer that describes the kind of this token. This numbering + system is determined by JavaCCParser, and a table of these numbers is + stored in the file ...Constants.java. + + + + The line number of the first character of this Token. + + + The column number of the first character of this Token. + + + The line number of the last character of this Token. + + + The column number of the last character of this Token. + + + The string image of the token. + + + A reference to the next regular (non-special) token from the input + stream. If this is the last token from the input stream, or if the + token manager has not read tokens beyond this one, this field is + set to null. This is true only if this token is also a regular + token. Otherwise, see below for a description of the contents of + this field. + + + + This field is used to access special tokens that occur prior to this + token, but after the immediately preceding regular (non-special) token. + If there are no such special tokens, this field is set to null. + When there are more than one such special token, this field refers + to the last of these special tokens, which in turn refers to the next + previous special token through its specialToken field, and so on + until the first special token (whose specialToken field is null). + The next fields of special tokens refer to other special tokens that + immediately follow it (without an intervening regular token). If there + is no such token, this field is null. + + + + An optional attribute value of the Token. + Tokens which are not used as syntactic sugar will often contain + meaningful values that will be used later on by the compiler or + interpreter. This attribute value is often different from the image. + Any subclass of Token that actually wants to return a non-null value can + override this method as appropriate. + + + + No-argument constructor + + + Constructs a new token for the specified Image. + + + Constructs a new token for the specified Image and Kind. + + + Returns the image. + + + Returns a new Token object, by default. However, if you want, you + can create and return subclass objects based on the value of ofKind. + Simply add the cases to the switch for all those special cases. + For example, if you have a subclass of Token called IDToken that + you want to create if ofKind is ID, simply add something like : + + case MyParserConstants.ID : return new IDToken(ofKind, image); + + to the following switch statement. Then you can cast matchedToken + variable to the appropriate type and use sit in your lexical actions. + + + + A QueryParser which constructs queries to search multiple fields. + + + $Revision: 829134 $ + + + + Creates a MultiFieldQueryParser. Allows passing of a map with term to + Boost, and the boost to apply to each term. + +

    + It will, when parse(String query) is called, construct a query like this + (assuming the query consists of two terms and you specify the two fields + title and body): +

    + + + (title:term1 body:term1) (title:term2 body:term2) + + +

    + When setDefaultOperator(AND_OPERATOR) is set, the result will be: +

    + + + +(title:term1 body:term1) +(title:term2 body:term2) + + +

    + When you pass a boost (title=>5 body=>10) you can get +

    + + + +(title:term1^5.0 body:term1^10.0) +(title:term2^5.0 body:term2^10.0) + + +

    + In other words, all the query's terms must appear, but it doesn't matter + in what fields they appear. +

    + +

    + Please use + {@link #MultiFieldQueryParser(Version, String[], Analyzer, Map)} + instead + +
    + + Creates a MultiFieldQueryParser. Allows passing of a map with term to + Boost, and the boost to apply to each term. + +

    + It will, when parse(String query) is called, construct a query like this + (assuming the query consists of two terms and you specify the two fields + title and body): +

    + + + (title:term1 body:term1) (title:term2 body:term2) + + +

    + When setDefaultOperator(AND_OPERATOR) is set, the result will be: +

    + + + +(title:term1 body:term1) +(title:term2 body:term2) + + +

    + When you pass a boost (title=>5 body=>10) you can get +

    + + + +(title:term1^5.0 body:term1^10.0) +(title:term2^5.0 body:term2^10.0) + + +

    + In other words, all the query's terms must appear, but it doesn't matter + in what fields they appear. +

    +

    +
    + + Creates a MultiFieldQueryParser. + +

    + It will, when parse(String query) is called, construct a query like this + (assuming the query consists of two terms and you specify the two fields + title and body): +

    + + + (title:term1 body:term1) (title:term2 body:term2) + + +

    + When setDefaultOperator(AND_OPERATOR) is set, the result will be: +

    + + + +(title:term1 body:term1) +(title:term2 body:term2) + + +

    + In other words, all the query's terms must appear, but it doesn't matter + in what fields they appear. +

    + +

    + Please use + {@link #MultiFieldQueryParser(Version, String[], Analyzer)} + instead + +
    + + Creates a MultiFieldQueryParser. + +

    + It will, when parse(String query) is called, construct a query like this + (assuming the query consists of two terms and you specify the two fields + title and body): +

    + + + (title:term1 body:term1) (title:term2 body:term2) + + +

    + When setDefaultOperator(AND_OPERATOR) is set, the result will be: +

    + + + +(title:term1 body:term1) +(title:term2 body:term2) + + +

    + In other words, all the query's terms must appear, but it doesn't matter + in what fields they appear. +

    +

    +
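    + A sketch of this constructor in use (field names, the Version constant and the
    + 'analyzer' instance are illustrative assumptions):
    +
    +            <code>
    +            String[] fields = new String[] { "title", "body" };
    +            MultiFieldQueryParser parser =
    +                new MultiFieldQueryParser(Lucene.Net.Util.Version.LUCENE_29, fields, analyzer);
    +            // expands to: (title:term1 body:term1) (title:term2 body:term2)
    +            Query q = parser.Parse("term1 term2");
    +            </code>
    +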
    + + Parses a query which searches on the fields specified. +

    + If x fields are specified, this effectively constructs: + +

    +            <code>
    +            (field1:query1) (field2:query2) (field3:query3)...(fieldx:queryx)
    +            </code>
    +            
    + +
    + Queries strings to parse + + Fields to search on + + Analyzer to use + + ParseException + if query parsing fails + + IllegalArgumentException + if the length of the queries array differs from the length of + the fields array + + Use {@link #Parse(Version,String[],String[],Analyzer)} + instead + +
    + + Parses a query which searches on the fields specified. +

    + If x fields are specified, this effectively constructs: + +

    +            <code>
    +            (field1:query1) (field2:query2) (field3:query3)...(fieldx:queryx)
    +            </code>
    +            
    + +
    + Lucene version to match; this is passed through to + QueryParser. + + Queries strings to parse + + Fields to search on + + Analyzer to use + + ParseException + if query parsing fails + + IllegalArgumentException + if the length of the queries array differs from the length of + the fields array + +
    + + Parses a query, searching on the fields specified. + Use this if you need to specify certain fields as required, + and others as prohibited. +

    +            Usage:
    +            
    +            String[] fields = {"filename", "contents", "description"};
    +            BooleanClause.Occur[] flags = {BooleanClause.Occur.SHOULD,
    +            BooleanClause.Occur.MUST,
    +            BooleanClause.Occur.MUST_NOT};
    +            MultiFieldQueryParser.parse("query", fields, flags, analyzer);
    +            
    +            
    +

    + The code above would construct a query: +

    +            
    +            (filename:query) +(contents:query) -(description:query)
    +            
    +            
    + +
    + Query string to parse + + Fields to search on + + Flags describing the fields + + Analyzer to use + + ParseException if query parsing fails + IllegalArgumentException if the length of the fields array differs + from the length of the flags array + + Use + {@link #Parse(Version, String, String[], BooleanClause.Occur[], Analyzer)} + instead + +
    + + Parses a query, searching on the fields specified. Use this if you need + to specify certain fields as required, and others as prohibited. +

    + +

    +            Usage:
    +            <code>
    +            String[] fields = {"filename", "contents", "description"};
    +            BooleanClause.Occur[] flags = {BooleanClause.Occur.SHOULD,
    +            BooleanClause.Occur.MUST,
    +            BooleanClause.Occur.MUST_NOT};
    +            MultiFieldQueryParser.parse("query", fields, flags, analyzer);
    +            </code>
    +            
    +

    + The code above would construct a query: + +

    +            <code>
    +            (filename:query) +(contents:query) -(description:query)
    +            </code>
    +            
    + +
    + Lucene version to match; this is passed through to + QueryParser. + + Query string to parse + + Fields to search on + + Flags describing the fields + + Analyzer to use + + ParseException + if query parsing fails + + IllegalArgumentException + if the length of the fields array differs from the length of + the flags array + +
    + + Parses a query, searching on the fields specified. + Use this if you need to specify certain fields as required, + and others as prohibited. +

    +            Usage:
    +            
    +            String[] query = {"query1", "query2", "query3"};
    +            String[] fields = {"filename", "contents", "description"};
    +            BooleanClause.Occur[] flags = {BooleanClause.Occur.SHOULD,
    +            BooleanClause.Occur.MUST,
    +            BooleanClause.Occur.MUST_NOT};
    +            MultiFieldQueryParser.parse(query, fields, flags, analyzer);
    +            
    +            
    +

    + The code above would construct a query: +

    +            
    +            (filename:query1) +(contents:query2) -(description:query3)
    +            
    +            
    + +
    + Queries string to parse + + Fields to search on + + Flags describing the fields + + Analyzer to use + + ParseException if query parsing fails + IllegalArgumentException if the length of the queries, fields, + and flags array differ + + Used + {@link #Parse(Version, String[], String[], BooleanClause.Occur[], Analyzer)} + instead + +
    + + Parses a query, searching on the fields specified. Use this if you need + to specify certain fields as required, and others as prohibited. +

    + +

    +            Usage:
    +            <code>
    +            String[] query = {"query1", "query2", "query3"};
    +            String[] fields = {"filename", "contents", "description"};
    +            BooleanClause.Occur[] flags = {BooleanClause.Occur.SHOULD,
    +            BooleanClause.Occur.MUST,
    +            BooleanClause.Occur.MUST_NOT};
    +            MultiFieldQueryParser.parse(query, fields, flags, analyzer);
    +            </code>
    +            
    +

    + The code above would construct a query: + +

    +            <code>
    +            (filename:query1) +(contents:query2) -(description:query3)
    +            </code>
    +            
    + +
    + Lucene version to match; this is passed through to + QueryParser. + + Queries string to parse + + Fields to search on + + Flags describing the fields + + Analyzer to use + + ParseException + if query parsing fails + + IllegalArgumentException + if the length of the queries, fields, and flags array differ + +
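    + The Usage block above, rendered as a C# sketch (the 'analyzer' instance and the
    + Version constant are illustrative assumptions):
    +
    +            <code>
    +            String[] queries = new String[] { "query1", "query2", "query3" };
    +            String[] fields = new String[] { "filename", "contents", "description" };
    +            BooleanClause.Occur[] flags = new BooleanClause.Occur[] {
    +                BooleanClause.Occur.SHOULD, BooleanClause.Occur.MUST, BooleanClause.Occur.MUST_NOT };
    +            // yields: (filename:query1) +(contents:query2) -(description:query3)
    +            Query q = MultiFieldQueryParser.Parse(
    +                Lucene.Net.Util.Version.LUCENE_29, queries, fields, flags, analyzer);
    +            </code>
    +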
    + + Filename filter that accept filenames and extensions only created by Lucene. + + + $rcs = ' $Id: Exp $ ' ; + + + + Returns true if this is a file that would be contained + in a CFS file. This function should only be called on + files that pass the above "accept" (ie, are already + known to be a Lucene index file). + + + + + + + + + + Constructs a new runtime exception with null as its + detail message. The cause is not initialized, and may subsequently be + initialized by a call to {@link #innerException}. + + + + Constructs a new runtime exception with the specified cause and a + detail message of (cause==null ? null : cause.toString()) + (which typically contains the class and detail message of + cause). +

    + This constructor is useful for runtime exceptions + that are little more than wrappers for other throwables. + +

    + the cause (which is saved for later retrieval by the + {@link #InnerException()} method). (A null value is + permitted, and indicates that the cause is nonexistent or + unknown.) + + 1.4 + +
    + + Constructs a new runtime exception with the specified detail message. + The cause is not initialized, and may subsequently be initialized by a + call to {@link #innerException}. + + + the detail message. The detail message is saved for + later retrieval by the {@link #getMessage()} method. + + + + Constructs a new runtime exception with the specified detail message and + cause.

    Note that the detail message associated with + cause is not automatically incorporated in + this runtime exception's detail message. + +

    + the detail message (which is saved for later retrieval + by the {@link #getMessage()} method). + + the cause (which is saved for later retrieval by the + {@link #InnerException()} method). (A null value is + permitted, and indicates that the cause is nonexistent or + unknown.) + + 1.4 + +
    + + A Token's lexical type. The Default value is "word". + + + Returns this Token's lexical type. Defaults to "word". + + + Set the lexical type. + + + + + This analyzer is used to facilitate scenarios where different + fields require different analysis techniques. Use {@link #addAnalyzer} + to add a non-default analyzer on a field name basis. + +

    Example usage: + +

    +            PerFieldAnalyzerWrapper aWrapper =
    +            new PerFieldAnalyzerWrapper(new StandardAnalyzer());
    +            aWrapper.addAnalyzer("firstname", new KeywordAnalyzer());
    +            aWrapper.addAnalyzer("lastname", new KeywordAnalyzer());
    +            
    + +

    In this example, StandardAnalyzer will be used for all fields except "firstname" + and "lastname", for which KeywordAnalyzer will be used. + +

    A PerFieldAnalyzerWrapper can be used like any other analyzer, for both indexing + and query parsing. +

    +
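    + Continuing the example above, the wrapper is passed wherever an Analyzer is expected
    + (the 'directory' instance, the IndexWriter arguments and the Version constant are
    + illustrative assumptions):
    +
    +            <code>
    +            IndexWriter writer = new IndexWriter(directory, aWrapper, true, IndexWriter.MaxFieldLength.UNLIMITED);
    +            QueryParser parser = new QueryParser(Lucene.Net.Util.Version.LUCENE_29, "firstname", aWrapper);
    +            </code>
    +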
    + + Constructs with default analyzer. + + + Any fields not specifically + defined to use a different analyzer will use the one provided here. + + + + Constructs with default analyzer and a map of analyzers to use for + specific fields. + + + Any fields not specifically + defined to use a different analyzer will use the one provided here. + + a Map (String field name to the Analyzer) to be + used for those fields + + + + Defines an analyzer to use for the specified field. + + + field name requiring a non-default analyzer + + non-default analyzer to use for field + + + + Return the positionIncrementGap from the analyzer assigned to fieldName + + + LowerCaseTokenizer performs the function of LetterTokenizer + and LowerCaseFilter together. It divides text at non-letters and converts + them to lower case. While it is functionally equivalent to the combination + of LetterTokenizer and LowerCaseFilter, there is a performance advantage + to doing the two tasks at once, hence this (redundant) implementation. +

    + Note: this does a decent job for most European languages, but does a terrible + job for some Asian languages, where words are not separated by spaces. +

    +
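    + A minimal sketch (the input text is illustrative):
    +
    +            <code>
    +            TokenStream ts = new LowerCaseTokenizer(new System.IO.StringReader("The Quick BROWN Fox"));
    +            // produces the tokens: "the", "quick", "brown", "fox"
    +            </code>
    +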
    + + Construct a new LowerCaseTokenizer. + + + Construct a new LowerCaseTokenizer using a given {@link AttributeSource}. + + + Construct a new LowerCaseTokenizer using a given {@link Lucene.Net.Util.AttributeSource.AttributeFactory}. + + + Converts char to lower case + {@link Character#toLowerCase(char)}. + + + + Helper class for keeping Listss of Objects associated with keys. WARNING: THIS CLASS IS NOT THREAD SAFE + + + the backing store for this object + + + + direct access to the map backing this object. + + + + Adds val to the Set associated with key in the Map. If key is not + already in the map, a new Set will first be created. + + the size of the Set associated with key once val is added to it. + + + + Adds multiple vals to the Set associated with key in the Map. + If key is not + already in the map, a new Set will first be created. + + the size of the Set associated with key once val is added to it. + + + + + Not implemented. Waiting for volunteers. + + + + + Not implemented. Waiting for volunteers. + + + + Simple standalone tool that forever acquires & releases a + lock using a specific LockFactory. Run without any args + to see usage. + + + + + + + + + Writes bytes through to a primary IndexOutput, computing + checksum as it goes. Note that you cannot use seek(). + + + + + Represents hits returned by {@link Searcher#search(Query,Filter,int,Sort)}. + + + + The fields which were used to sort results by. + + + Creates one of these objects. + Total number of hits for the query. + + The top hits for the query. + + The sort criteria used to find the top hits. + + The maximum score encountered. + + + + A Filter that restricts search results to a range of values in a given + field. + +

    This filter matches documents whose terms fall into the + supplied range according to {@link String#compareTo(String)}. It is not intended + for numerical ranges; use {@link NumericRangeFilter} instead. + +

    If you construct a large number of range filters with different ranges but on the + same field, {@link FieldCacheRangeFilter} may have significantly better performance. +

    + 2.9 + +
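    + A sketch of typical use (the field name, the bounds, 'searcher' and 'query' are
    + illustrative assumptions; comparison is lexicographic, per the note above):
    +
    +            <code>
    +            Filter dateFilter = new TermRangeFilter("date", "20090101", "20091231",
    +                                                    true /*includeLower*/, true /*includeUpper*/);
    +            TopDocs hits = searcher.Search(query, dateFilter, 10);
    +            </code>
    +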
    + + The field this range applies to + + The lower bound on this range + + The upper bound on this range + + Does this range include the lower bound? + + Does this range include the upper bound? + + IllegalArgumentException if both terms are null or if + lowerTerm is null and includeLower is true (similar for upperTerm + and includeUpper) + + + + WARNING: Using this constructor and supplying a non-null + value in the collator parameter will cause every single + index Term in the Field referenced by lowerTerm and/or upperTerm to be + examined. Depending on the number of index Terms in this Field, the + operation could be very slow. + + + The lower bound on this range + + The upper bound on this range + + Does this range include the lower bound? + + Does this range include the upper bound? + + The collator to use when determining range inclusion; set + to null to use Unicode code point ordering instead of collation. + + IllegalArgumentException if both terms are null or if + lowerTerm is null and includeLower is true (similar for upperTerm + and includeUpper) + + + + Constructs a filter for field fieldName matching + less than or equal to upperTerm. + + + + Constructs a filter for field fieldName matching + greater than or equal to lowerTerm. + + + + Returns the field name for this filter + + + Returns the lower value of this range filter + + + Returns the upper value of this range filter + + + Returns true if the lower endpoint is inclusive + + + Returns true if the upper endpoint is inclusive + + + Returns the collator used to determine range inclusion, if any. + + + A Query that matches documents containing a particular sequence of terms. + A PhraseQuery is built by QueryParser for input like "new york". + +

    This query may be combined with other terms or queries with a {@link BooleanQuery}. +

    +
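    + A sketch using the Add and SetSlop members documented below (the field name is an
    + illustrative assumption):
    +
    +            <code>
    +            PhraseQuery phrase = new PhraseQuery();
    +            phrase.Add(new Term("body", "new"));
    +            phrase.Add(new Term("body", "york"));
    +            phrase.SetSlop(0);  // exact phrase; larger values behave like a WITHIN/NEAR operator
    +            </code>
    +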
    + + Constructs an empty phrase query. + + + Sets the number of other words permitted between words in query phrase. + If zero, then this is an exact phrase search. For larger values this works + like a WITHIN or NEAR operator. +

    The slop is in fact an edit-distance, where the units correspond to + moves of terms in the query phrase out of position. For example, to switch + the order of two words requires two moves (the first move places the words + atop one another), so to permit re-orderings of phrases, the slop must be + at least two. +

    More exact matches are scored higher than sloppier matches, thus search + results are sorted by exactness. +

    The slop is zero by default, requiring exact matches. +

    +
    + + Returns the slop. See setSlop(). + + + Adds a term to the end of the query phrase. + The relative position of the term is the one immediately after the last term added. + + + + Adds a term to the end of the query phrase. + The relative position of the term within the phrase is specified explicitly. + This allows e.g. phrases with more than one term at the same position + or phrases with gaps (e.g. in connection with stopwords). + + + + + + + + + Returns the set of terms in this phrase. + + + Returns the relative positions of terms in this phrase. + + + + + + + Prints a user-readable version of this query. + + + Returns true iff o is equal to this. + + + Returns a hash code value for this object. + + + + + Creates a new instance of the provider class for the given IndexReader. + + + + + * Compute a custom score by the subQuery score and a number of + ValueSourceQuery scores. +

    + Subclasses can override this method to modify the custom score. +

    + If your custom scoring is different than the default herein you + should override at least one of the two customScore() methods. + If the number of ValueSourceQueries is always < 2 it is + sufficient to override the other + {@link #customScore(int, float, float) customScore()} + method, which is simpler. +

    + The default computation herein is a multiplication of given scores: +

    +                ModifiedScore = subQueryScore * valSrcScores[0] * valSrcScores[1] * ...
    +            
    +
    + id of scored doc + score of that doc by the subQuery + scores of that doc by the ValueSourceQuery + custom score +
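    + A sketch of overriding this method, making the default multiplication explicit
    + (the subclass, its constructor arguments and the .NET casing of CustomScore are
    + assumptions of this sketch):
    +
    +            <code>
    +            public class MyCustomScoreQuery : CustomScoreQuery
    +            {
    +                public MyCustomScoreQuery(Query subQuery, ValueSourceQuery valSrcQuery)
    +                    : base(subQuery, valSrcQuery) { }
    +
    +                public override float CustomScore(int doc, float subQueryScore, float[] valSrcScores)
    +                {
    +                    float score = subQueryScore;           // start from the subQuery score
    +                    for (int i = 0; i < valSrcScores.Length; i++)
    +                        score *= valSrcScores[i];          // multiply in each value source score
    +                    return score;
    +                }
    +            }
    +            </code>
    +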
    + + + + Explain the custom score. + Whenever overriding {@link #customScore(int, float, float[])}, + this method should also be overridden to provide the correct explanation + for the part of the custom scoring. + + doc being explained + explanation for the sub-query part + explanation for the value source part + an explanation for the custom score + + + + Explain the custom score. + Whenever overriding {@link #customScore(int, float, float)}, + this method should also be overridden to provide the correct explanation + for the part of the custom scoring. + + + doc being explained + explanation for the sub-query part + explanation for the value source part + an explanation for the custom score + + + Expert: obtains single byte field values from the + {@link Lucene.Net.Search.FieldCache FieldCache} + using getBytes() and makes those values + available as other numeric types, casting as needed. + +

    + WARNING: The status of the Search.Function package is experimental. + The APIs introduced here might change in the future and will not be + supported anymore in such a case. + +

    + for requirements" + on the field. + +

    NOTE: with the switch in 2.9 to segment-based + searching, if {@link #getValues} is invoked with a + composite (multi-segment) reader, this can easily cause + double RAM usage for the values in the FieldCache. It's + best to switch your application to pass only atomic + (single segment) readers to this API. Alternatively, for + a short-term fix, you could wrap your ValueSource using + {@link MultiValueSource}, which costs more CPU per lookup + but will not consume double the FieldCache RAM.

    + + + +

    Create a cached byte field source with default string-to-byte parser. +
    + + Create a cached byte field source with a specific string-to-byte parser. + + + use {@link #Score(Collector, int, int)} instead. + + + + use {@link #DocID()} instead. + + + + use {@link #NextDoc()} instead. + + + + use {@link #Score(Collector)} instead. + + + + use {@link #Advance(int)} instead. + + + + use {@link #DocID()} instead. + + + + use {@link #NextDoc()} instead. + + + + use {@link #Advance(int)} instead. + + + + A simple hash table of document scores within a range. + + + An efficient implementation of JavaCC's CharStream interface.

    Note that + this does not do line-number counting, but instead keeps track of the + character position of the token in the input, as required by Lucene's {@link + Lucene.Net.Analysis.Token} API. + +

    +
    + + Constructs from a Reader. + + + Load the First field and break. +

    + See {@link FieldSelectorResult#LOAD_AND_BREAK} +

    +
    + + This attribute can be used to pass different flags down the tokenizer chain, + eg from one TokenFilter to another one. + + + + EXPERIMENTAL: While we think this is here to stay, we may want to change it to be a long. +

    + + Get the bitset for any bits that have been set. This is completely distinct from {@link TypeAttribute#Type()}, although they do share similar purposes. + The flags can be used to encode information about the token for use by other {@link Lucene.Net.Analysis.TokenFilter}s. + + +

    + The bits + +
    + + + + + + Transforms the token stream as per the Porter stemming algorithm. + Note: the input to the stemming filter must already be in lower case, + so you will need to use LowerCaseFilter or LowerCaseTokenizer farther + down the Tokenizer chain in order for this to work properly! +

    + To use this filter with other analyzers, you'll want to write an + Analyzer class that sets up the TokenStream chain as you want it. + To use this with LowerCaseTokenizer, for example, you'd write an + analyzer like this: +

    +

    +            class MyAnalyzer extends Analyzer {
    +              public final TokenStream tokenStream(String fieldName, Reader reader) {
    +                return new PorterStemFilter(new LowerCaseTokenizer(reader));
    +              }
    +            }
    +            
    +
    +
    + + Estimates the size of a given Object using a given MemoryModel for primitive + size information. + + Resource Usage: + + Internally uses a Map to temporally hold a reference to every + object seen. + + If checkIntered, all Strings checked will be interned, but those + that were not already interned will be released for GC when the + estimate is complete. + + + + Constructs this object with an AverageGuessMemoryModel and + checkInterned = true. + + + + check if Strings are interned and don't add to size + if they are. Defaults to true but if you know the objects you are checking + won't likely contain many interned Strings, it will be faster to turn off + intern checking. + + + + MemoryModel to use for primitive object sizes. + + + + MemoryModel to use for primitive object sizes. + + check if Strings are interned and don't add to size + if they are. Defaults to true but if you know the objects you are checking + won't likely contain many interned Strings, it will be faster to turn off + intern checking. + + + + Return good default units based on byte size. + + + Provides methods for sanity checking that entries in the FieldCache + are not wasteful or inconsistent. +

    +

    + Lucene 2.9 Introduced numerous enhancements into how the FieldCache + is used by the low levels of Lucene searching (for Sorting and + ValueSourceQueries) to improve both the speed for Sorting, as well + as reopening of IndexReaders. But these changes have shifted the + usage of FieldCache from "top level" IndexReaders (frequently a + MultiReader or DirectoryReader) down to the leaf level SegmentReaders. + As a result, existing applications that directly access the FieldCache + may find RAM usage increase significantly when upgrading to 2.9 or + Later. This class provides an API for these applications (or their + Unit tests) to check at run time if the FieldCache contains "insane" + usages of the FieldCache. +

    +

    + EXPERIMENTAL API: This API is considered extremely advanced and + experimental. It may be removed or altered w/o warning in future releases + of Lucene. +

    +

    + + + + + + +
    + + If set, will be used to estimate size for all CacheEntry objects + dealt with. + + + + Quick and dirty convenience method + + + + + Quick and dirty convenience method that instantiates an instance with + "good defaults" and uses it to test the CacheEntry[] + + + + + + Tests a CacheEntry[] for indication of "insane" cache usage. +

    + NOTE:FieldCache CreationPlaceholder objects are ignored. + (:TODO: is this a bad idea? are we masking a real problem?) +

    +

    +
    + + Internal helper method used by check that iterates over + valMismatchKeys and generates a Collection of Insanity + instances accordingly. The MapOfSets are used to populate + the Insanity objects. + + + + + + Internal helper method used by check that iterates over + the keys of readerFieldToValIds and generates a Collection + of Insanity instances whenever two (or more) ReaderField instances are + found that have an ancestry relationship. + + + + + + + Checks if the seed is an IndexReader, and if so will walk + the hierarchy of subReaders building up a list of the objects + returned by obj.getFieldCacheKey() + + + + Simple pair object for using "readerKey + fieldName" as a Map key + + + Simple container for a collection of related CacheEntry objects that + in conjunction with each other represent some "insane" usage of the + FieldCache. + + + + Type of insane behavior this object represents + + + Description of the insane behavior + + + CacheEntry objects which suggest a problem + + + Multi-line representation of this Insanity object, starting with + the Type and Msg, followed by each CacheEntry.toString() on its + own line prefaced by a tab character + + + + An enumeration of the different types of "insane" behavior that + may be detected in a FieldCache. + + + + + + + + + + + Indicates an overlap in cache usage on a given field + in sub/super readers. + + +

    + Indicates entries have the same reader+fieldname but + different cached values. This can happen if different datatypes, + or parsers are used -- and while it's not necessarily a bug + it's typically an indication of a possible problem. +

    +

    + NOTE: Only the reader, fieldname, and cached value are actually + tested -- if two cache entries have different parsers or datatypes but + the cached values are the same Object (== not just equal()) this method + does not consider that a red flag. This allows for subtle variations + in the way a Parser is specified (null vs DEFAULT_LONG_PARSER, etc...) +

    +

    +
    + + Indicates an expected bit of "insanity". This may be useful for + clients that wish to preserve/log information about insane usage + but indicate that it was expected. + + + + File-based {@link Directory} implementation that uses + mmap for reading, and {@link + SimpleFSDirectory.SimpleFSIndexOutput} for writing. + +

    NOTE: memory mapping uses up a portion of the + virtual memory address space in your process equal to the + size of the file being mapped. Before using this class, + be sure you have plenty of virtual address space, e.g. by + using a 64 bit JRE, or a 32 bit JRE with indexes that are + guaranteed to fit within the address space. + On 32 bit platforms also consult {@link #setMaxChunkSize} + if you have problems with mmap failing because of fragmented + address space. If you get an OutOfMemoryException, it is recommended + to reduce the chunk size until it works. +

    Due to + this bug in Sun's JRE, MMapDirectory's {@link IndexInput#close} + is unable to close the underlying OS file handle. Only when GC + finally collects the underlying objects, which could be quite + some time later, will the file handle be closed. + +

    This will consume additional transient disk usage: on Windows, + attempts to delete or overwrite the files will result in an + exception; on other platforms, which typically have a "delete on + last close" semantics, while such operations will succeed, the bytes + are still consuming space on disk. For many applications this + limitation is not a problem (e.g. if you have plenty of disk space, + and you don't rely on overwriting files on Windows) but it's still + an important limitation to be aware of. + +

    This class supplies the workaround mentioned in the bug report + (disabled by default, see {@link #setUseUnmap}), which may fail on + non-Sun JVMs. It forcefully unmaps the buffer on close by using + an undocumented internal cleanup functionality. + {@link #UNMAP_SUPPORTED} is true, if the workaround + can be enabled (with no guarantees). +

    +
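    + A sketch of opening a searcher over an MMapDirectory (the path is illustrative, and
    + the DirectoryInfo-based constructor parameter is an assumption about this port):
    +
    +            <code>
    +            Directory dir = new MMapDirectory(new System.IO.DirectoryInfo("/path/to/index"), null); // null = default lock factory
    +            IndexSearcher searcher = new IndexSearcher(dir, true /*readOnly*/);
    +            </code>
    +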
    + + Create a new MMapDirectory for the named location. + + + the path of the directory + + the lock factory to use, or null for the default. + + IOException + + + Create a new MMapDirectory for the named location. + + + the path of the directory + + the lock factory to use, or null for the default. + + IOException + + + Create a new MMapDirectory for the named location and the default lock factory. + + + the path of the directory + + IOException + + + Create a new MMapDirectory for the named location and the default lock factory. + + + the path of the directory + + IOException + + + + + + + true, if this platform supports unmapping mmaped files. + + + This method enables the workaround for unmapping the buffers + from address space after closing {@link IndexInput}, that is + mentioned in the bug report. This hack may fail on non-Sun JVMs. + It forcefully unmaps the buffer on close by using + an undocumented internal cleanup functionality. +

    NOTE: Enabling this is completely unsupported + by Java and may lead to JVM crashes if IndexInput + is closed while another thread is still accessing it (SIGSEGV). +

    + IllegalArgumentException if {@link #UNMAP_SUPPORTED} + is false and the workaround cannot be enabled. + +
    + + Returns true, if the unmap workaround is enabled. + + + + + Try to unmap the buffer, this method silently fails if no support + for that in the JVM. On Windows, this leads to the fact, + that mmapped files cannot be modified or deleted. + + + + Sets the maximum chunk size (default is {@link Integer#MAX_VALUE} for + 64 bit JVMs and 256 MiBytes for 32 bit JVMs) used for memory mapping. + Especially on 32 bit platform, the address space can be very fragmented, + so large index files cannot be mapped. + Using a lower chunk size makes the directory implementation a little + bit slower (as the correct chunk must be resolved on each seek) + but the chance is higher that mmap does not fail. On 64 bit + Java platforms, this parameter should always be {@link Integer#MAX_VALUE}, + as the adress space is big enough. + + + + Returns the current mmap chunk size. + + + + + Creates an IndexInput for the file with the given name. + + + Creates an IndexOutput for the file with the given name. + + + Writes bytes through to a primary IndexOutput, computing + checksum. Note that you cannot use seek(). + + + + Starts but does not complete the commit of this file (= + writing of the final checksum at the end). After this + is called must call {@link #finishCommit} and the + {@link #close} to complete the commit. + + + + See {@link #prepareCommit} + + + Abstract base class for sorting hits returned by a Query. + +

    + This class should only be used if the other SortField types (SCORE, DOC, + STRING, INT, FLOAT) do not provide an adequate sorting. It maintains an + internal cache of values which could be quite large. The cache is an array of + Comparable, one for each document in the index. There is a distinct + Comparable for each unique term in the field - if some documents have the + same term in the field, the cache array will have entries which reference the + same Comparable. + + This class will be used as part of a key to a FieldCache value. You must + implement hashCode and equals to avoid an explosion in RAM usage if you use + instances that are not the same instance. If you are searching using the + Remote contrib, the same instance of this class on the client will be a new + instance on every call to the server, so hashCode/equals is very important in + that situation. + +

    + Created: Apr 21, 2004 5:08:38 PM + + +

    + $Id: SortComparator.java 800119 2009-08-02 17:59:21Z markrmiller $ + + 1.4 + + Please use {@link FieldComparatorSource} instead. + +
    + + Returns an object which, when sorted according to natural order, + will order the Term values in the correct order. +

    For example, if the Terms contained integer values, this method + would return new Integer(termtext). Note that this + might not always be the most efficient implementation - for this + particular example, a better implementation might be to make a + ScoreDocLookupComparator that uses an internal lookup table of int. +

    + The textual value of the term. + + An object representing termtext that sorts according to the natural order of termtext. + + + + + +
    + + A Scorer for OR like queries, counterpart of ConjunctionScorer. + This Scorer implements {@link Scorer#SkipTo(int)} and uses skipTo() on the given Scorers. + TODO: Implement score(HitCollector, int). + + + + The number of subscorers. + + + The subscorers. + + + The minimum number of scorers that should match. + + + The scorerDocQueue contains all subscorers ordered by their current doc(), + with the minimum at the top. +
    The scorerDocQueue is initialized the first time next() or skipTo() is called. +
    An exhausted scorer is immediately removed from the scorerDocQueue. +
    If less than the minimumNrMatchers scorers + remain in the scorerDocQueue next() and skipTo() return false. +

    + After each to call to next() or skipTo() + currentSumScore is the total score of the current matching doc, + nrMatchers is the number of matching scorers, + and all scorers are after the matching doc, or are exhausted. +

    +
    + + The document number of the current match. + + + The number of subscorers that provide the current match. + + + Construct a DisjunctionScorer. + A collection of at least two subscorers. + + The positive minimum number of subscorers that should + match to match this query. +
    When minimumNrMatchers is bigger than + the number of subScorers, + no matches will be produced. +
    When minimumNrMatchers equals the number of subScorers, + it is more efficient to use ConjunctionScorer. +
    + + Construct a DisjunctionScorer, using one as the minimum number + of matching subscorers. + + + + Called the first time next() or skipTo() is called to + initialize scorerDocQueue. + + + + Scores and collects all matching documents. + The collector to which all matching documents are passed through + {@link HitCollector#Collect(int, float)}. +
    When this method is used the {@link #Explain(int)} method should not be used. + + use {@link #Score(Collector)} instead. + +
    + + Scores and collects all matching documents. + The collector to which all matching documents are passed through. +
    When this method is used the {@link #Explain(int)} method should not be used. + +
    + + Expert: Collects matching documents in a range. Hook for optimization. + Note that {@link #Next()} must be called once before this method is called + for the first time. + + The collector to which all matching documents are passed through + {@link HitCollector#Collect(int, float)}. + + Do not score documents past this. + + true if more matching documents may remain. + + use {@link #Score(Collector, int, int)} instead. + + + + Expert: Collects matching documents in a range. Hook for optimization. + Note that {@link #Next()} must be called once before this method is called + for the first time. + + The collector to which all matching documents are passed through. + + Do not score documents past this. + + true if more matching documents may remain. + + + + use {@link #NextDoc()} instead. + + + + Advance all subscorers after the current document determined by the + top of the scorerDocQueue. + Repeat until at least the minimum number of subscorers match on the same + document and all subscorers are after that document or are exhausted. +
    On entry the scorerDocQueue has at least minimumNrMatchers + available. At least the scorer with the minimum document number will be advanced. +
    + true iff there is a match. +
    In case there is a match, currentDoc, currentSumScore, + and nrMatchers describe the match. + + TODO: Investigate whether it is possible to use skipTo() when + the minimum number of matchers is bigger than one, i.e., try to use the + approach of ConjunctionScorer for the minimum number of matchers. + Also delay calling score() on the sub scorers until the minimum number of + matchers is reached. +
    For this, a Scorer array with minimumNrMatchers elements might + hold Scorers at currentDoc that are temporarily popped from scorerQueue. +
    +
    + + Returns the score of the current document matching the query. + Initially invalid, until {@link #Next()} is called the first time. + + + + use {@link #DocID()} instead. + + + + Returns the number of subscorers matching the current document. + Initially invalid, until {@link #Next()} is called the first time. + + + + Skips to the first match beyond the current whose document number is + greater than or equal to a given target.
    + When this method is used the {@link #Explain(int)} method should not be + used.
    + The implementation uses the skipTo() method on the subscorers. + +
    + The target document number. + + true iff there is such a match. + + use {@link #Advance(int)} instead. + +
    + + Advances to the first match beyond the current whose document number is + greater than or equal to a given target.
    + When this method is used the {@link #Explain(int)} method should not be + used.
    + The implementation uses the skipTo() method on the subscorers. + +
    + The target document number. + + the document whose number is greater than or equal to the given + target, or -1 if none exist. + +
    + + An explanation for the score of a given document. + + + + This is a DocFieldConsumer that writes stored fields. + + + Fills in any hole in the docIDs + + + This exception is thrown when an {@link IndexReader} + tries to make changes to the index (via {@link + IndexReader#deleteDocument}, {@link + IndexReader#undeleteAll} or {@link IndexReader#setNorm}) + but changes have already been committed to the index + since this reader was instantiated. When this happens + you must open a new reader on the current index to make + the changes. + + + + For each Field, store position by position information. It ignores frequency information +

    + This is not thread-safe. +
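    A minimal usage sketch, assuming an open IndexReader named reader, a document id docId, and a field "body" that was indexed with term vectors including positions (all names are placeholders; Java-flavored style as in the other examples here):

    +            PositionBasedTermVectorMapper mapper = new PositionBasedTermVectorMapper();
    +            reader.getTermFreqVector(docId, "body", mapper);  // fill the mapper from the stored vector
    +            Map fieldToTerms = mapper.getFieldToTerms();      // field name -> (Integer position -> TVPositionInfo)
    +            Map positionsForBody = (Map) fieldToTerms.get("body");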

    +
    + + A Map of Integer and TVPositionInfo + + + + + + + + Never ignores positions. This mapper doesn't make much sense unless there are positions + false + + + + Callback for the TermVectorReader. + + + + + + + + + + + Callback mechanism used by the TermVectorReader + The field being read + + The number of terms in the vector + + Whether offsets are available + + Whether positions are available + + + + Get the mapping between fields and terms, sorted by the comparator + + + A map between field names and a Map. The sub-Map key is the position as the integer, the value is {@link Lucene.Net.Index.PositionBasedTermVectorMapper.TVPositionInfo}. + + + + Container for a term at a position + + + + The position of the term + + + + Note, there may be multiple terms at the same position + A List of Strings + + + + Parallel list (to {@link #getTerms()}) of TermVectorOffsetInfo objects. There may be multiple entries since there may be multiple terms at a position + A List of TermVectorOffsetInfo objects, if offsets are store. + + + + This is a DocFieldConsumer that inverts each field, + separately, from a Document, and accepts a + InvertedTermsConsumer to process those terms. + + + + The positionIncrement determines the position of this token + relative to the previous Token in a {@link TokenStream}, used in phrase + searching. + +

    The default value is one. + +

    Some common uses for this are:

      + +
    • Set it to zero to put multiple terms in the same position. This is + useful if, e.g., a word has multiple stems. Searches for phrases + including either stem will match. In this case, all but the first stem's + increment should be set to zero: the increment of the first instance + should be one. Repeating a token with an increment of zero can also be + used to boost the scores of matches on that token.
    • + +
    • Set it to values greater than one to inhibit exact phrase matches. + If, for example, one does not want phrases to match across removed stop + words, then one could build a stop word filter that removes stop words and + also sets the increment to the number of stop words removed before each + non-stop word. Then exact phrase queries will only match when the terms + occur with no intervening stop words.
    • + +
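    As a sketch of the first case above (stacking a synonym at the same position), here is a hypothetical TokenFilter written against the Java-flavored 2.9 attribute API that these docs mirror; the class name and the "quick"/"fast" rule are invented for illustration only:

    +            import java.io.IOException;
    +            import org.apache.lucene.analysis.TokenFilter;
    +            import org.apache.lucene.analysis.TokenStream;
    +            import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
    +            import org.apache.lucene.analysis.tokenattributes.TermAttribute;
    +            import org.apache.lucene.util.AttributeSource;
    +            
    +            public final class SynonymAtSamePositionFilter extends TokenFilter {
    +              private final TermAttribute termAtt = (TermAttribute) addAttribute(TermAttribute.class);
    +              private final PositionIncrementAttribute posIncrAtt =
    +                  (PositionIncrementAttribute) addAttribute(PositionIncrementAttribute.class);
    +              private AttributeSource.State pendingState;  // saved state of the token being duplicated
    +              private String pendingSynonym;
    +            
    +              public SynonymAtSamePositionFilter(TokenStream input) { super(input); }
    +            
    +              public boolean incrementToken() throws IOException {
    +                if (pendingSynonym != null) {
    +                  restoreState(pendingState);
    +                  termAtt.setTermBuffer(pendingSynonym);
    +                  posIncrAtt.setPositionIncrement(0);  // stack the synonym on the previous token
    +                  pendingSynonym = null;
    +                  return true;
    +                }
    +                if (!input.incrementToken()) return false;
    +                if ("quick".equals(termAtt.term())) {  // toy rule: also emit "fast" at this position
    +                  pendingState = captureState();
    +                  pendingSynonym = "fast";
    +                }
    +                return true;
    +              }
    +            }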
    +
    +
    + + Set the position increment. The default value is one. + + + the distance from the prior term + + + + Returns the position increment of this Token. + + + + + + Stemmer, implementing the Porter Stemming Algorithm + + The Stemmer class transforms a word into its root form. The input + word can be provided a character at time (by calling add()), or at once + by calling one of the various stem(something) methods. + + + + reset() resets the stemmer so it can stem another word. If you invoke + the stemmer by calling add(char) and then stem(), you must call reset() + before starting another word. + + + + Add a character to the word being stemmed. When you are finished + adding characters, you can call stem(void) to process the word. + + + + After a word has been stemmed, it can be retrieved by toString(), + or a reference to the internal buffer can be retrieved by getResultBuffer + and getResultLength (which is generally more efficient.) + + + + Returns the length of the word resulting from the stemming process. + + + Returns a reference to a character buffer containing the results of + the stemming process. You also need to consult getResultLength() + to determine the length of the result. + + + + Stem a word provided as a String. Returns the result as a String. + + + Stem a word contained in a char[]. Returns true if the stemming process + resulted in a word different from the input. You can retrieve the + result with getResultLength()/getResultBuffer() or toString(). + + + + Stem a word contained in a portion of a char[] array. Returns + true if the stemming process resulted in a word different from + the input. You can retrieve the result with + getResultLength()/getResultBuffer() or toString(). + + + + Stem a word contained in a leading portion of a char[] array. + Returns true if the stemming process resulted in a word different + from the input. You can retrieve the result with + getResultLength()/getResultBuffer() or toString(). + + + + Stem the word placed into the Stemmer buffer through calls to add(). + Returns true if the stemming process resulted in a word different + from the input. You can retrieve the result with + getResultLength()/getResultBuffer() or toString(). + + + + Test program for demonstrating the Stemmer. It reads a file and + stems each word, writing the result to standard out. + Usage: Stemmer file-name + + + + "Tokenizes" the entire stream as a single token. This is useful + for data like zip codes, ids, and some product names. + + + + Simple lockless and memory barrier free String intern cache that is guaranteed + to return the same String instance as String.intern() does. + + + + Subclasses of StringInterner are required to + return the same single String object for all equal strings. + Depending on the implementation, this may not be + the same object returned as String.intern(). + + This StringInterner base class simply delegates to String.intern(). + + + + Returns a single object instance for each equal string. + + + Returns a single object instance for each equal string. + + + Size of the hash table, should be a power of two. + + Maximum length of each bucket, after which the oldest item inserted is dropped. + + + + This exception is thrown when the write.lock + could not be released. + + + + + + Implements the wildcard search query. Supported wildcards are *, which + matches any character sequence (including the empty one), and ?, + which matches any single character. Note this query can be slow, as it + needs to iterate over many terms. 
In order to prevent extremely slow WildcardQueries, + a Wildcard term should not start with one of the wildcards * or + ?. + +

    This query uses the {@link + MultiTermQuery#CONSTANT_SCORE_AUTO_REWRITE_DEFAULT} + rewrite method. + +
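    A small usage sketch (the IndexSearcher named searcher and the field "contents" are placeholders):

    +            // Matches "lucene", "lucent", "lucenes", ... and, per the advice above,
    +            // does not begin with a wildcard.
    +            Query query = new WildcardQuery(new Term("contents", "lucen*"));
    +            TopDocs hits = searcher.search(query, 10);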

    + + +
    + + Returns the pattern term. + + + Prints a user-readable version of this query. + + + A Query that matches documents containing a term. + This may be combined with other terms with a {@link BooleanQuery}. + + + + Constructs a query for the term t. + + + Returns the term of this query. + + + Prints a user-readable version of this query. + + + Returns true iff o is equal to this. + + + Returns a hash code value for this object. + + + Similar to {@link NearSpansOrdered}, but for the unordered case. + + Expert: + Only public for subclassing. Most implementations should not need this class + + + + WARNING: The List is not necessarily in order of the the positions + Collection of byte[] payloads + + IOException + + + Wraps a Spans, and can be used to form a linked list. + + + + + + + + + + The original list of terms from the query, can contain duplicates + + + + Creates a new instance with size elements. If + prePopulate is set to true, the queue will pre-populate itself + with sentinel objects and set its {@link #Size()} to size. In + that case, you should not rely on {@link #Size()} to get the number of + actual elements that were added to the queue, but keep track yourself.
    + NOTE: in case prePopulate is true, you should pop + elements from the queue using the following code example: + +
    +            PriorityQueue pq = new HitQueue(10, true); // pre-populate.
    +            ScoreDoc top = pq.top();
    +            
    +            // Add/Update one element.
    +            top.score = 1.0f;
    +            top.doc = 0;
    +            top = (ScoreDoc) pq.updateTop();
    +            int totalHits = 1;
    +            
    +            // Now pop only the elements that were *truly* inserted.
    +            // First, pop all the sentinel elements (there are pq.size() - totalHits).
    +            for (int i = pq.size() - totalHits; i > 0; i--) pq.pop();
    +            
    +            // Now pop the truly added elements.
    +            ScoreDoc[] results = new ScoreDoc[totalHits];
    +            for (int i = totalHits - 1; i >= 0; i--) {
    +            results[i] = (ScoreDoc) pq.pop();
    +            }
    +            
    + +

    NOTE: This class pre-allocates a full array of + length size. + +

    + the requested size of this queue. + + specifies whether to pre-populate the queue with sentinel values. + + + +
    + + An iterator over {@link Hits} that provides lazy fetching of each document. + {@link Hits#Iterator()} returns an instance of this class. Calls to {@link #next()} + return a {@link Hit} instance. + + + Use {@link TopScoreDocCollector} and {@link TopDocs} instead. Hits will be removed in Lucene 3.0. + + + + Constructed from {@link Hits#Iterator()}. + + + true if current hit is less than the total number of {@link Hits}. + + + + Unsupported operation. + + + UnsupportedOperationException + + + Returns the total number of hits. + + + Returns a {@link Hit} instance representing the next hit in {@link Hits}. + + + Next {@link Hit}. + + + + Query that sets document score as a programmatic function of several (sub) scores: +
      +
    1. the score of its subQuery (any query)
    2. +
    3. (optional) the score of its ValueSourceQuery (or queries). + For most simple/convenient use cases this query is likely to be a + {@link Lucene.Net.Search.Function.FieldScoreQuery FieldScoreQuery}
    4. +
    + Subclasses can modify the computation by overriding {@link #getCustomScoreProvider}. + +

    + WARNING: The status of the Search.Function package is experimental. + The APIs introduced here might change in the future and will not be + supported anymore in such a case. +
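    A hedged sketch of one common pattern: combine a text query with a stored numeric field and override getCustomScoreProvider as recommended above. The field names "body" and "popularity" are invented, and the combination formula is only an example:

    +            Query subQuery = new TermQuery(new Term("body", "lucene"));
    +            ValueSourceQuery popularity = new FieldScoreQuery("popularity", FieldScoreQuery.Type.FLOAT);
    +            CustomScoreQuery q = new CustomScoreQuery(subQuery, popularity) {
    +              protected CustomScoreProvider getCustomScoreProvider(IndexReader reader) {
    +                return new CustomScoreProvider(reader) {
    +                  public float customScore(int doc, float subQueryScore, float valSrcScore) {
    +                    return subQueryScore * (1.0f + valSrcScore);  // example combination of the two scores
    +                  }
    +                };
    +              }
    +            };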

    +
    + + Create a CustomScoreQuery over input subQuery. + the sub query whose scored is being customed. Must not be null. + + + + Create a CustomScoreQuery over input subQuery and a {@link ValueSourceQuery}. + the sub query whose score is being customed. Must not be null. + + a value source query whose scores are used in the custom score + computation. For most simple/convineient use case this would be a + {@link Lucene.Net.Search.Function.FieldScoreQuery FieldScoreQuery}. + This parameter is optional - it can be null or even an empty array. + + + + Create a CustomScoreQuery over input subQuery and a {@link ValueSourceQuery}. + the sub query whose score is being customized. Must not be null. + + value source queries whose scores are used in the custom score + computation. For most simple/convenient use case these would be + {@link Lucene.Net.Search.Function.FieldScoreQuery FieldScoreQueries}. + This parameter is optional - it can be null or even an empty array. + + + + Returns true if o is equal to this. + + + Returns a hash code value for this object. + + + Returns a {@link CustomScoreProvider} that calculates the custom scores + for the given {@link IndexReader}. The default implementation returns a default + implementation as specified in the docs of {@link CustomScoreProvider}. + @since 2.9.2 + + + + Compute a custom score by the subQuery score and a number of + ValueSourceQuery scores. + + The doc is relative to the current reader, which is + unknown to CustomScoreQuery when using per-segment search (since Lucene 2.9). + Please override {@link #getCustomScoreProvider} and return a subclass + of {@link CustomScoreProvider} for the given {@link IndexReader}. + see CustomScoreProvider#customScore(int,float,float[]) + + + + Compute a custom score by the subQuery score and the ValueSourceQuery score. + + The doc is relative to the current reader, which is + unknown to CustomScoreQuery when using per-segment search (since Lucene 2.9). + Please override {@link #getCustomScoreProvider} and return a subclass + of {@link CustomScoreProvider} for the given {@link IndexReader}. + @see CustomScoreProvider#customScore(int,float,float) + + + + Explain the custom score. + + The doc is relative to the current reader, which is + unknown to CustomScoreQuery when using per-segment search (since Lucene 2.9). + Please override {@link #getCustomScoreProvider} and return a subclass + of {@link CustomScoreProvider} for the given {@link IndexReader}. + + + + Explain the custom score. + The doc is relative to the current reader, which is + unknown to CustomScoreQuery when using per-segment search (since Lucene 2.9). + Please override {@link #getCustomScoreProvider} and return a subclass + of {@link CustomScoreProvider} for the given {@link IndexReader}. + + + + Checks if this is strict custom scoring. + In strict custom scoring, the ValueSource part does not participate in weight normalization. + This may be useful when one wants full control over how scores are modified, and does + not care about normalizing by the ValueSource part. + One particular case where this is useful if for testing this query. +

    + Note: only has effect when the ValueSource part is not null. +

    +
    + + Set the strict mode of this query. + The strict mode to set. + + + + + + A short name of this query, used in {@link #ToString(String)}. + + + A scorer that applies a (callback) function on scores of the subQuery. + + + use {@link #NextDoc()} instead. + + + + use {@link #DocID()} instead. + + + + use {@link #Advance(int)} instead. + + + + Expert: A ScoreDoc which also contains information about + how to sort the referenced document. In addition to the + document number and score, this object contains an array + of values for the document from the field(s) used to sort. + For example, if the sort criteria was to sort by fields + "a", "b" then "c", the fields object array + will have three elements, corresponding respectively to + the term values for the document in fields "a", "b" and "c". + The class of each element in the array will be either + Integer, Float or String depending on the type of values + in the terms of each field. + +

    Created: Feb 11, 2004 1:23:38 PM + +

    + lucene 1.4 + + $Id: FieldDoc.java 773194 2009-05-09 10:36:41Z mikemccand $ + + + + + +
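    For illustration, a sort-by-field search that reads the per-document sort values back out of the returned FieldDoc (searcher, query and the string field "date" are placeholders):

    +            Sort sort = new Sort(new SortField("date", SortField.STRING));
    +            TopFieldDocs docs = searcher.search(query, null, 10, sort);
    +            FieldDoc fd = (FieldDoc) docs.scoreDocs[0];
    +            Object dateOfBestHit = fd.fields[0];  // the value that was used to sort the best hit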
    + + Expert: The values which are used to sort the referenced document. + The order of these will match the original sort criteria given by a + Sort object. Each Object will be either an Integer, Float or String, + depending on the type of values in the terms of the original field. + + + + + + + + Expert: Creates one of these objects with empty sort information. + + + Expert: Creates one of these objects with the given sort information. + + + The Scorer for DisjunctionMaxQuery's. The union of all documents generated by the the subquery scorers + is generated in document number order. The score for each document is the maximum of the scores computed + by the subquery scorers that generate that document, plus tieBreakerMultiplier times the sum of the scores + for the other subqueries that generate the document. + + + + Creates a new instance of DisjunctionMaxScorer + + + Multiplier applied to non-maximum-scoring subqueries for a + document as they are summed into the result. + + -- not used since our definition involves neither coord nor terms + directly + + The sub scorers this Scorer should iterate on + + The actual number of scorers to iterate on. Note that the array's + length may be larger than the actual number of scorers. + + + + Generate the next document matching our associated DisjunctionMaxQuery. + + + true iff there is a next document + + use {@link #NextDoc()} instead. + + + + use {@link #DocID()} instead. + + + + Determine the current document score. Initially invalid, until {@link #Next()} is called the first time. + the score of the current generated document + + + + Advance to the first document beyond the current whose number is greater + than or equal to target. + + + the minimum number of the next desired document + + true iff there is a document to be generated whose number is at + least target + + use {@link #Advance(int)} instead. + + + + Explain a score that we computed. UNSUPPORTED -- see explanation capability in DisjunctionMaxQuery. + the number of a document we scored + + the Explanation for our score + + + + An alternative to BooleanScorer that also allows a minimum number + of optional scorers that should match. +
    Implements skipTo(), and has no limitations on the numbers of added scorers. +
    Uses ConjunctionScorer, DisjunctionScorer, ReqOptScorer and ReqExclScorer. +
    +
    + + The scorer to which all scoring will be delegated, + except for computing and using the coordination factor. + + + + The number of optionalScorers that need to match (if there are any) + + + Creates a {@link Scorer} with the given similarity and lists of required, + prohibited and optional scorers. In no required scorers are added, at least + one of the optional scorers will have to match during the search. + + + The similarity to be used. + + The minimum number of optional added scorers that should match + during the search. In case no required scorers are added, at least + one of the optional scorers will have to match during the search. + + the list of required scorers. + + the list of prohibited scorers. + + the list of optional scorers. + + + + Returns the scorer to be used for match counting and score summing. + Uses requiredScorers, optionalScorers and prohibitedScorers. + + + + Returns the scorer to be used for match counting and score summing. + Uses the given required scorer and the prohibitedScorers. + + A required scorer already built. + + + + Scores and collects all matching documents. + The collector to which all matching documents are passed through + {@link HitCollector#Collect(int, float)}. +
    When this method is used the {@link #Explain(int)} method should not be used. + + use {@link #Score(Collector)} instead. + +
    + + Scores and collects all matching documents. + The collector to which all matching documents are passed through. +
    When this method is used the {@link #Explain(int)} method should not be used. + +
    + + Expert: Collects matching documents in a range. +
    Note that {@link #Next()} must be called once before this method is + called for the first time. +
    + The collector to which all matching documents are passed through + {@link HitCollector#Collect(int, float)}. + + Do not score documents past this. + + true if more matching documents may remain. + + use {@link #Score(Collector, int, int)} instead. + +
    + + use {@link #DocID()} instead. + + + + use {@link #NextDoc()} instead. + + + + use {@link #Advance(int)} instead. + + + + Throws an UnsupportedOperationException. + TODO: Implement an explanation of the coordination factor. + + The document number for the explanation. + + UnsupportedOperationException + + + Count a scorer as a single match. + + + use {@link #DocID()} instead. + + + + use {@link #NextDoc()} instead. + + + + use {@link #Advance(int)} instead. + + + + This is a DocConsumer that gathers all fields under the + same name, and calls per-field consumers to process field + by field. This class doesn't doesn't do any "real" work + of its own: it just forwards the fields to a + DocFieldConsumer. + + + + Basic tool and API to check the health of an index and + write a new segments file that removes reference to + problematic segments. + +

    As this tool checks every byte in the index, on a large + index it can take quite a long time to run. + +

    WARNING: this tool and API are new and + experimental and are subject to sudden change in the + next release. Please make a complete backup of your + index before using this to fix your index! +

    +
    + + Default PrintStream for all CheckIndex instances. + Use {@link #setInfoStream} per instance, + instead. + + + + Create a new CheckIndex on the directory. + + + Set infoStream where messages should go. If null, no + messages are printed + + + + Returns true if index is clean, else false. + Please instantiate a CheckIndex and then use {@link #CheckIndex()} instead + + + + Returns true if index is clean, else false. + Please instantiate a CheckIndex and then use {@link #CheckIndex(List)} instead + + + + Returns a {@link Status} instance detailing + the state of the index. + +

    As this method checks every byte in the index, on a large + index it can take quite a long time to run. + +

    WARNING: make sure + you only call this when the index is not opened by any + writer. +

    +
    + + Returns a {@link Status} instance detailing + the state of the index. + + + list of specific segment names to check + +

    As this method checks every byte in the specified + segments, on a large index it can take quite a long + time to run. + +

    WARNING: make sure + you only call this when the index is not opened by any + writer. + + + +

    Test field norms. +
    + + Test the term index. + + + Test stored fields for a segment. + + + Test term vectors for a segment. + + + Repairs the index using previously returned result + from {@link #checkIndex}. Note that this does not + remove any of the unreferenced files after it's done; + you must separately open an {@link IndexWriter}, which + deletes unreferenced files when it's created. + +

    WARNING: this writes a + new segments file into the index, effectively removing + all documents in broken segments from the index. + BE CAREFUL. + +

    WARNING: Make sure you only call this when the + index is not opened by any writer. +
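    A programmatic sketch of the check-then-repair flow described above (dir is a placeholder Directory; heed the warnings about backups and open writers):

    +            CheckIndex checker = new CheckIndex(dir);
    +            checker.setInfoStream(System.out);                // optional: print per-segment diagnostics
    +            CheckIndex.Status status = checker.checkIndex();  // read-only health check
    +            if (!status.clean) {
    +              // Last resort: drops every document in the broken segments.
    +              checker.fixIndex(status);
    +            }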

    +
    + + Command-line interface to check and fix an index. +

    + Run it like this: +

    +            java -ea:Lucene.Net... Lucene.Net.Index.CheckIndex pathToIndex [-fix] [-segment X] [-segment Y]
    +            
    +
      +
    • -fix: actually write a new segments_N file, removing any problematic segments
    • +
    • -segment X: only check the specified + segment(s). This can be specified multiple times, + to check more than one segment, eg -segment _2 + -segment _a. You can't use this with the -fix + option.
    • +
    +

    WARNING: -fix should only be used on an emergency basis as it will cause + documents (perhaps many) to be permanently removed from the index. Always make + a backup copy of your index before running this! Do not run this tool on an index + that is actively being written to. You have been warned! +

    Run without -fix, this tool will open the index, report version information + and report any exceptions it hits and what action it would take if -fix were + specified. With -fix, this tool will remove any segments that have issues and + write a new segments_N file. This means all documents contained in the affected + segments will be removed. +

    + This tool exits with exit code 1 if the index cannot be opened or has any + corruption, else 0. +

    +
    + + Returned from {@link #CheckIndex()} detailing the health and status of the index. + +

    WARNING: this API is new and experimental and is + subject to sudden change in the next release. + +

    +
    + + True if no problems were found with the index. + + + True if we were unable to locate and load the segments_N file. + + + True if we were unable to open the segments_N file. + + + True if we were unable to read the version number from segments_N file. + + + Name of latest segments_N file in the index. + + + Number of segments in the index. + + + String description of the version of the index. + + + Empty unless you passed specific segments list to check as optional 3rd argument. + + + + + True if the index was created with a newer version of Lucene than the CheckIndex tool. + + + List of {@link SegmentInfoStatus} instances, detailing status of each segment. + + + Directory index is in. + + + SegmentInfos instance containing only segments that + had no problems (this is used with the {@link CheckIndex#fixIndex} + method to repair the index. + + + + How many documents will be lost to bad segments. + + + How many bad segments were found. + + + True if we checked only specific segments ({@link + #CheckIndex(List)}) was called with non-null + argument). + + + + Holds the userData of the last commit in the index + + + Holds the status of each segment in the index. + See {@link #segmentInfos}. + +

    WARNING: this API is new and experimental and is + subject to sudden change in the next release. +

    +
    + + Name of the segment. + + + Document count (does not take deletions into account). + + + True if segment is compound file format. + + + Number of files referenced by this segment. + + + Net size (MB) of the files referenced by this + segment. + + + + Doc store offset, if this segment shares the doc + store files (stored fields and term vectors) with + other segments. This is -1 if it does not share. + + + + String of the shared doc store segment, or null if + this segment does not share the doc store files. + + + + True if the shared doc store files are compound file + format. + + + + True if this segment has pending deletions. + + + Name of the current deletions file name. + + + Number of deleted documents. + + + True if we were able to open a SegmentReader on this + segment. + + + + Number of fields in this segment. + + + True if at least one of the fields in this segment + does not omitTermFreqAndPositions. + + + + + + Map<String, String> that includes certain + debugging details that IndexWriter records into + each segment it creates + + + + Status for testing of field norms (null if field norms could not be tested). + + + Status for testing of indexed terms (null if indexed terms could not be tested). + + + Status for testing of stored fields (null if stored fields could not be tested). + + + Status for testing of term vectors (null if term vectors could not be tested). + + + Status from testing field norms. + + + Number of fields successfully tested + + + Exception thrown during term index test (null on success) + + + Status from testing term index. + + + Total term count + + + Total frequency across all terms. + + + Total number of positions. + + + Exception thrown during term index test (null on success) + + + Status from testing stored fields. + + + Number of documents tested. + + + Total number of stored fields tested. + + + Exception thrown during stored fields test (null on success) + + + Status from testing stored fields. + + + Number of documents tested. + + + Total number of term vectors tested. + + + Exception thrown during term vector test (null on success) + + + A field is a section of a Document. Each field has two parts, a name and a + value. Values may be free text, provided as a String or as a Reader, or they + may be atomic keywords, which are not further processed. Such keywords may + be used to represent dates, urls, etc. Fields are optionally stored in the + index, so that they may be returned with hits on the document. + + + + The value of the field as a String, or null. If null, the Reader value or + binary value is used. Exactly one of stringValue(), + readerValue(), and getBinaryValue() must be set. + + + + The value of the field as a Reader, or null. If null, the String value or + binary value is used. Exactly one of stringValue(), + readerValue(), and getBinaryValue() must be set. + + + + The value of the field in Binary, or null. If null, the Reader value, + or String value is used. Exactly one of stringValue(), + readerValue(), and getBinaryValue() must be set. + + This method must allocate a new byte[] if + the {@link AbstractField#GetBinaryOffset()} is non-zero + or {@link AbstractField#GetBinaryLength()} is not the + full length of the byte[]. Please use {@link + AbstractField#GetBinaryValue()} instead, which simply + returns the byte[]. + + + + The TokesStream for this field to be used when indexing, or null. If null, the Reader value + or String value is analyzed to produce the indexed tokens. + + + +

    Expert: change the value of this field. This can + be used during indexing to re-use a single Field + instance to improve indexing speed by avoiding GC cost + of new'ing and reclaiming Field instances. Typically + a single {@link Document} instance is re-used as + well. This helps most on small documents.

    + +

    Each Field instance should only be used once + within a single {@link Document} instance. See ImproveIndexingSpeed + for details.
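    A sketch of that re-use pattern (writer is an IndexWriter and ids is an array of id strings; both are placeholders):

    +            Document doc = new Document();
    +            Field idField = new Field("id", "", Field.Store.YES, Field.Index.NOT_ANALYZED);
    +            doc.add(idField);
    +            for (int i = 0; i < ids.length; i++) {
    +              idField.setValue(ids[i]);  // change the value in place instead of new'ing a Field
    +              writer.addDocument(doc);
    +            }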

    +

    +
    + + Expert: change the value of this field. See setValue(String). + + + Expert: change the value of this field. See setValue(String). + + + Expert: change the value of this field. See setValue(String). + + + Expert: change the value of this field. See setValue(String). + use {@link #setTokenStream} + + + + Expert: sets the token stream to be used for indexing and causes isIndexed() and isTokenized() to return true. + May be combined with stored values from stringValue() or binaryValue() + + + + Create a field by specifying its name, value and how it will + be saved in the index. Term vectors will not be stored in the index. + + + The name of the field + + The string to process + + Whether value should be stored in the index + + Whether the field should be indexed, and if so, if it should + be tokenized before indexing + + NullPointerException if name or value is null + IllegalArgumentException if the field is neither stored nor indexed + + + Create a field by specifying its name, value and how it will + be saved in the index. + + + The name of the field + + The string to process + + Whether value should be stored in the index + + Whether the field should be indexed, and if so, if it should + be tokenized before indexing + + Whether term vector should be stored + + NullPointerException if name or value is null + IllegalArgumentException in any of the following situations: +
      +
    • the field is neither stored nor indexed
    • +
    • the field is not indexed but termVector is TermVector.YES
    • +
    +
    +
    + + Create a field by specifying its name, value and how it will + be saved in the index. + + + The name of the field + + Whether to .intern() name or not + + The string to process + + Whether value should be stored in the index + + Whether the field should be indexed, and if so, if it should + be tokenized before indexing + + Whether term vector should be stored + + NullPointerException if name or value is null + IllegalArgumentException in any of the following situations: +
      +
    • the field is neither stored nor indexed
    • +
    • the field is not indexed but termVector is TermVector.YES
    • +
    +
    +
    + + Create a tokenized and indexed field that is not stored. Term vectors will + not be stored. The Reader is read only when the Document is added to the index, + i.e. you may not close the Reader until {@link IndexWriter#AddDocument(Document)} + has been called. + + + The name of the field + + The reader with the content + + NullPointerException if name or reader is null + + + Create a tokenized and indexed field that is not stored, optionally with + storing term vectors. The Reader is read only when the Document is added to the index, + i.e. you may not close the Reader until {@link IndexWriter#AddDocument(Document)} + has been called. + + + The name of the field + + The reader with the content + + Whether term vector should be stored + + NullPointerException if name or reader is null + + + Create a tokenized and indexed field that is not stored. Term vectors will + not be stored. This is useful for pre-analyzed fields. + The TokenStream is read only when the Document is added to the index, + i.e. you may not close the TokenStream until {@link IndexWriter#AddDocument(Document)} + has been called. + + + The name of the field + + The TokenStream with the content + + NullPointerException if name or tokenStream is null + + + Create a tokenized and indexed field that is not stored, optionally with + storing term vectors. This is useful for pre-analyzed fields. + The TokenStream is read only when the Document is added to the index, + i.e. you may not close the TokenStream until {@link IndexWriter#AddDocument(Document)} + has been called. + + + The name of the field + + The TokenStream with the content + + Whether term vector should be stored + + NullPointerException if name or tokenStream is null + + + Create a stored field with binary value. Optionally the value may be compressed. + + + The name of the field + + The binary value + + How value should be stored (compressed or not) + + IllegalArgumentException if store is Store.NO + + + Create a stored field with binary value. Optionally the value may be compressed. + + + The name of the field + + The binary value + + Starting offset in value where this Field's bytes are + + Number of bytes to use for this Field, starting at offset + + How value should be stored (compressed or not) + + IllegalArgumentException if store is Store.NO + + + Specifies whether and how a field should be stored. + + + Store the original field value in the index in a compressed form. This is + useful for long documents and for binary valued fields. + + Please use {@link CompressionTools} instead. + For string fields that were previously indexed and stored using compression, + the new way to achieve this is: First add the field indexed-only (no store) + and additionally using the same field name as a binary, stored field + with {@link CompressionTools#compressString}. + + + + Store the original field value in the index. This is useful for short texts + like a document's title which should be displayed with the results. The + value is stored in its original form, i.e. no analyzer is used before it is + stored. + + + + Do not store the field value in the index. + + + Specifies whether and how a field should be indexed. + + + Do not index the field value. This field can thus not be searched, + but one can still access its contents provided it is + {@link Field.Store stored}. + + + + Index the tokens produced by running the field's + value through an Analyzer. This is useful for + common text. 
+ + + + this has been renamed to {@link #ANALYZED} + + + + Index the field's value without using an Analyzer, so it can be searched. + As no analyzer is used the value will be stored as a single term. This is + useful for unique Ids like product numbers. + + + + This has been renamed to {@link #NOT_ANALYZED} + + + + Expert: Index the field's value without an Analyzer, + and also disable the storing of norms. Note that you + can also separately enable/disable norms by calling + {@link Field#setOmitNorms}. No norms means that + index-time field and document boosting and field + length normalization are disabled. The benefit is + less memory usage as norms take up one byte of RAM + per indexed field for every document in the index, + during searching. Note that once you index a given + field with norms enabled, disabling norms will + have no effect. In other words, for this to have the + above described effect on a field, all instances of + that field must be indexed with NOT_ANALYZED_NO_NORMS + from the beginning. + + + + This has been renamed to + {@link #NOT_ANALYZED_NO_NORMS} + + + + Expert: Index the tokens produced by running the + field's value through an Analyzer, and also + separately disable the storing of norms. See + {@link #NOT_ANALYZED_NO_NORMS} for what norms are + and why you may want to disable them. + + + + Specifies whether and how a field should have term vectors. + + + Do not store term vectors. + + + Store the term vectors of each document. A term vector is a list + of the document's terms and their number of occurrences in that document. + + + + Store the term vector + token position information + + + + + + + Store the term vector + Token offset information + + + + + + + Store the term vector + Token position and offset information + + + + + + + + + This class is a scanner generated by JFlex 1.4.1 @@ -27277,5 +26538,769 @@ if any I/O-Error occurs + + Expert: This class provides a {@link TokenStream} + for indexing numeric values that can be used by {@link + NumericRangeQuery} or {@link NumericRangeFilter}. + +

    Note that for simple usage, {@link NumericField} is + recommended. {@link NumericField} disables norms and + term freqs, as they are not usually needed during + searching. If you need to change these settings, you + should use this class. + +

    See {@link NumericField} for capabilities of fields + indexed numerically.

    + +

    Here's an example usage, for an int field: + +

    +             Field field = new Field(name, new NumericTokenStream(precisionStep).setIntValue(value));
    +             field.setOmitNorms(true);
    +             field.setOmitTermFreqAndPositions(true);
    +             document.add(field);
    +            
    + +

    For optimal performance, re-use the TokenStream and Field instance + for more than one document: + +

    +             NumericTokenStream stream = new NumericTokenStream(precisionStep);
    +             Field field = new Field(name, stream);
    +             field.setOmitNorms(true);
    +             field.setOmitTermFreqAndPositions(true);
    +             Document document = new Document();
    +             document.add(field);
    +            
    +             for(all documents) {
    +               stream.setIntValue(value)
    +               writer.addDocument(document);
    +             }
    +            
    + +

    This stream is not intended to be used in analyzers; + it's more for iterating the different precisions during + indexing a specific numeric value.

    + +

    NOTE: as token streams are only consumed once + the document is added to the index, if you index more + than one numeric field, use a separate NumericTokenStream + instance for each.

    + +

    See {@link NumericRangeQuery} for more details on the + precisionStep + parameter as well as how numeric fields work under the hood.

    + +

    NOTE: This API is experimental and + might change in incompatible ways in the next release. + +

    + 2.9 + +
    + + The full precision token gets this token type assigned. + + + The lower precision tokens gets this token type assigned. + + + Creates a token stream for numeric values using the default precisionStep + {@link NumericUtils#PRECISION_STEP_DEFAULT} (4). The stream is not yet initialized, + before using set a value using the various set???Value() methods. + + + + Creates a token stream for numeric values with the specified + precisionStep. The stream is not yet initialized, + before using set a value using the various set???Value() methods. + + + + Expert: Creates a token stream for numeric values with the specified + precisionStep using the given {@link AttributeSource}. + The stream is not yet initialized, + before using set a value using the various set???Value() methods. + + + + Expert: Creates a token stream for numeric values with the specified + precisionStep using the given + {@link org.apache.lucene.util.AttributeSource.AttributeFactory}. + The stream is not yet initialized, + before using set a value using the various set???Value() methods. + + + + Initializes the token stream with the supplied long value. + the value, for which this TokenStream should enumerate tokens. + + this instance, because of this you can use it the following way: + new Field(name, new NumericTokenStream(precisionStep).SetLongValue(value)) + + + + Initializes the token stream with the supplied int value. + the value, for which this TokenStream should enumerate tokens. + + this instance, because of this you can use it the following way: + new Field(name, new NumericTokenStream(precisionStep).SetIntValue(value)) + + + + Initializes the token stream with the supplied double value. + the value, for which this TokenStream should enumerate tokens. + + this instance, because of this you can use it the following way: + new Field(name, new NumericTokenStream(precisionStep).SetDoubleValue(value)) + + + + Initializes the token stream with the supplied float value. + the value, for which this TokenStream should enumerate tokens. + + this instance, because of this you can use it the following way: + new Field(name, new NumericTokenStream(precisionStep).SetFloatValue(value)) + + + + Simplistic {@link CharFilter} that applies the mappings + contained in a {@link NormalizeCharMap} to the character + stream, and correcting the resulting changes to the + offsets. + + + + Default constructor that takes a {@link CharStream}. + + + Easy-use constructor that takes a {@link Reader}. + + + Replacement for Java 1.5 Character.valueOf() + Move to Character.valueOf() in 3.0 + + + + Returns a Character instance representing the given char value + + + a char value + + a Character representation of the given char value. + + + + An average, best guess, MemoryModel that should work okay on most systems. + + + + + A memory-resident {@link Directory} implementation. Locking + implementation is by default the {@link SingleInstanceLockFactory} + but can be changed with {@link #setLockFactory}. + + + $Id: RAMDirectory.java 781333 2009-06-03 10:38:57Z mikemccand $ + + + + Constructs an empty {@link Directory}. + + + Creates a new RAMDirectory instance from a different + Directory implementation. This can be used to load + a disk-based index into memory. +

    + This should be used only with indices that can fit into memory. +

    + Note that the resulting RAMDirectory instance is fully + independent from the original Directory (it is a + complete copy). Any subsequent changes to the + original Directory will not be visible in the + RAMDirectory instance. + +

    + a Directory value + + if an error occurs + +
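    For example, loading an existing on-disk index into RAM for read-only searching (the path is a placeholder):

    +            Directory diskDir = FSDirectory.open(new File("/path/to/index"));
    +            Directory ramDir = new RAMDirectory(diskDir);              // full, independent in-memory copy
    +            IndexSearcher searcher = new IndexSearcher(ramDir, true);  // true = open read-only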
    + + Creates a new RAMDirectory instance from the {@link FSDirectory}. + + + a File specifying the index directory + + + + + Use {@link #RAMDirectory(Directory)} instead + + + + Creates a new RAMDirectory instance from the {@link FSDirectory}. + + + a String specifying the full index directory path + + + + + Use {@link #RAMDirectory(Directory)} instead + + + + Returns true iff the named file exists in this directory. + + + Returns the time the named file was last modified. + IOException if the file does not exist + + + Set the modified time of an existing file to now. + IOException if the file does not exist + + + Returns the length in bytes of a file in the directory. + IOException if the file does not exist + + + Return total size in bytes of all files in this + directory. This is currently quantized to + RAMOutputStream.BUFFER_SIZE. + + + + Removes an existing file in the directory. + IOException if the file does not exist + + + Renames an existing file in the directory. + FileNotFoundException if from does not exist + + + + + Creates a new, empty file in the directory with the given name. Returns a stream writing this file. + + + Returns a stream reading an existing file. + + + Closes the store to future operations, releasing associated memory. + + + Subclass of FilteredTermEnum for enumerating all terms that match the + specified range parameters. +

    + Term enumerations are always ordered by Term.compareTo(). Each term in + the enumeration is greater than all that precede it. +

    + 2.9 + +
    + + Enumerates all terms greater/equal than lowerTerm + but less/equal than upperTerm. + + If an endpoint is null, it is said to be "open". Either or both + endpoints may be open. Open endpoints may not be exclusive + (you can't select all but the first or last term without + explicitly specifying the term to exclude.) + + + + + An interned field that holds both lower and upper terms. + + The term text at the lower end of the range + + The term text at the upper end of the range + + If true, the lowerTerm is included in the range. + + If true, the upperTerm is included in the range. + + The collator to use to collate index Terms, to determine their + membership in the range bounded by lowerTerm and + upperTerm. + + + IOException + + + Expert: + Public for extension only + + + + Matches the union of its clauses. + + + Construct a SpanOrQuery merging the provided clauses. + + + Return the clauses whose spans are matched. + + + Returns a collection of all terms matched by this query. + use extractTerms instead + + + + + + Constrains search results to only match those which also match a provided + query. Also provides position information about where each document matches + at the cost of extra space compared with the QueryWrapperFilter. + There is an added cost to this above what is stored in a {@link QueryWrapperFilter}. Namely, + the position information for each matching document is stored. +

    + This filter does not cache. See the {@link Lucene.Net.Search.CachingSpanFilter} for a wrapper that + caches. + + +

    + $Id:$ + +
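    A small sketch (searcher and mainQuery are placeholders, as are the field and term):

    +            SpanQuery spans = new SpanTermQuery(new Term("body", "lucene"));
    +            SpanQueryFilter filter = new SpanQueryFilter(spans);
    +            // Restrict mainQuery to documents that also contain the span.
    +            TopDocs hits = searcher.search(mainQuery, filter, 10);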
    + + Constructs a filter which only matches documents matching + query. + + The {@link Lucene.Net.Search.Spans.SpanQuery} to use as the basis for the Filter. + + + + Expert: obtains short field values from the + {@link Lucene.Net.Search.FieldCache FieldCache} + using getShorts() and makes those values + available as other numeric types, casting as needed. + +

    + WARNING: The status of the Search.Function package is experimental. + The APIs introduced here might change in the future and will not be + supported anymore in such a case. + +

    + for requirements + on the field. + +

    NOTE: with the switch in 2.9 to segment-based + searching, if {@link #getValues} is invoked with a + composite (multi-segment) reader, this can easily cause + double RAM usage for the values in the FieldCache. It's + best to switch your application to pass only atomic + (single segment) readers to this API. Alternatively, for + a short-term fix, you could wrap your ValueSource using + {@link MultiValueSource}, which costs more CPU per lookup + but will not consume double the FieldCache RAM.

    + + + +

    Create a cached short field source with default string-to-short parser. +
    + + Create a cached short field source with a specific string-to-short parser. + + + Expert: obtains int field values from the + {@link Lucene.Net.Search.FieldCache FieldCache} + using getInts() and makes those values + available as other numeric types, casting as needed. + +

    + WARNING: The status of the Search.Function package is experimental. + The APIs introduced here might change in the future and will not be + supported anymore in such a case. + +

    + for requirements + on the field. + +

    NOTE: with the switch in 2.9 to segment-based + searching, if {@link #getValues} is invoked with a + composite (multi-segment) reader, this can easily cause + double RAM usage for the values in the FieldCache. It's + best to switch your application to pass only atomic + (single segment) readers to this API. Alternatively, for + a short-term fix, you could wrap your ValueSource using + {@link MultiValueSource}, which costs more CPU per lookup + but will not consume double the FieldCache RAM.

    + + + +

    Create a cached int field source with default string-to-int parser. +
    + + Create a cached int field source with a specific string-to-int parser. + + + This exception is thrown when parse errors are encountered. + You can explicitly create objects of this exception type by + calling the method generateParseException in the generated + parser. + + You can modify this class to customize your error reporting + mechanisms so long as you retain the public fields. + + + + This constructor is used by the method "generateParseException" + in the generated parser. Calling this constructor generates + a new object of this type with the fields "currentToken", + "expectedTokenSequences", and "tokenImage" set. The boolean + flag "specialConstructor" is also set to true to indicate that + this constructor was used to create this object. + This constructor calls its super class with the empty string + to force the "toString" method of parent class "Throwable" to + print the error message in the form: + ParseException: <result of getMessage> + + + + The following constructors are for use by you for whatever + purpose you can think of. Constructing the exception in this + manner makes the exception behave in the normal way - i.e., as + documented in the class "Throwable". The fields "errorToken", + "expectedTokenSequences", and "tokenImage" do not contain + relevant information. The JavaCC generated code does not use + these constructors. + + + + Constructor with message. + + + Constructor with message. + + + This variable determines which constructor was used to create + this object and thereby affects the semantics of the + "getMessage" method (see below). + + + + This is the last token that has been consumed successfully. If + this object has been created due to a parse error, the token + followng this token will (therefore) be the first error token. + + + + Each entry in this array is an array of integers. Each array + of integers represents a sequence of tokens (by their ordinal + values) that is expected at this point of the parse. + + + + This is a reference to the "tokenImage" array of the generated + parser within which the parse error occurred. This array is + defined in the generated ...Constants interface. + + + + The end of line string for this machine. + + + Used to convert raw characters to their escaped version + when these raw version cannot be used as part of an ASCII + string literal. + + + + This method has the standard behavior when this object has been + created using the standard constructors. Otherwise, it uses + "currentToken" and "expectedTokenSequences" to generate a parse + error message and returns it. If this object has been created + due to a parse error, and you do not catch it (it gets thrown + from the parser), then this method is called during the printing + of the final stack trace, and hence the correct error message + gets displayed. + + + + Interface that exceptions should implement to support lazy loading of messages. + + For Native Language Support (NLS), system of software internationalization. + + This Interface should be implemented by all exceptions that require + translation + + + + + a instance of a class that implements the Message interface + + + + Store a sorted collection of {@link Lucene.Net.Index.TermVectorEntry}s. Collects all term information + into a single, SortedSet. +
    + NOTE: This Mapper ignores all Field information for the Document. This means that if you are using offset/positions you will not + know what Fields they correlate with. +
    + This is not thread-safe +
    +
    + + Stand-in name for the field in {@link TermVectorEntry}. + + + + A Comparator for sorting {@link TermVectorEntry}s + + + + + The term to map + + The frequency of the term + + Offset information, may be null + + Position information, may be null + + + + The TermVectorEntrySet. A SortedSet of {@link TermVectorEntry} objects. Sort is by the comparator passed into the constructor. +
    + This set will be empty until after the mapping process takes place. + +
    + The SortedSet of {@link TermVectorEntry}. + +
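    A usage sketch (reader is an open IndexReader and docId a document whose fields were indexed with term vectors; both are placeholders):

    +            SortedTermVectorMapper mapper =
    +                new SortedTermVectorMapper(new TermVectorEntryFreqSortedComparator());
    +            reader.getTermFreqVector(docId, mapper);  // map all fields of the document
    +            SortedSet entries = mapper.getTermVectorEntrySet();  // sorted by the supplied comparator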
    + + The SegmentMerger class combines two or more Segments, represented by an IndexReader ({@link #add}), + into a single Segment. After adding the appropriate readers, call the merge method to combine the + segments. +

    + If the compoundFile flag is set, then the segments will be merged into a compound file. + + +

    + + + + +
    + + Maximum number of contiguous documents to bulk-copy + when merging stored fields + + + + norms header placeholder + + + This ctor used only by test code. + + + The Directory to merge the other segments into + + The name of the new segment + + + + Add an IndexReader to the collection of readers that are to be merged + + + + + + The index of the reader to return + + The ith reader to be merged + + + + Merges the readers specified by the {@link #add} method into the directory passed to the constructor + The number of documents that were merged + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + + + Merges the readers specified by the {@link #add} method + into the directory passed to the constructor. + + if false, we will not merge the + stored fields nor vectors files + + The number of documents that were merged + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + + + close all IndexReaders that have been added. + Should not be called before merge(). + + IOException + + + + The number of documents in all of the readers + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + + + Merge the TermVectors from each of the segments into the new one. + IOException + + + Process postings from multiple segments all positioned on the + same term. Writes out merged entries into freqOutput and + the proxOutput streams. + + + array of segments + + number of cells in the array actually occupied + + number of documents across all segments where this term was found + + CorruptIndexException if the index is corrupt + IOException if there is a low-level IO error + + + Records the fact that roughly units amount of work + have been done since this method was last called. + When adding time-consuming code into SegmentMerger, + you should test different values for units to ensure + that the time in between calls to merge.checkAborted + is up to ~ 1 second. + + + + Adds a new doc in this term. If this returns null + then we just skip consuming positions/payloads. + + + + Called when we are done adding docs to this term + + + Class responsible for access to stored document fields. +

    + It uses <segment>.fdt and <segment>.fdx files. + +

    + $Id: FieldsReader.java 801344 2009-08-05 18:05:06Z yonik $ + +
    + + Returns a cloned FieldsReader that shares open + IndexInputs with the original one. It is the caller's + job not to close the original FieldsReader until all + clones are called (eg, currently SegmentReader manages + this logic). + + + + AlreadyClosedException if this FieldsReader is closed + + + Closes the underlying {@link Lucene.Net.Store.IndexInput} streams, including any ones associated with a + lazy implementation of a Field. This means that the Fields values will not be accessible. + + + IOException + + + Returns the length in bytes of each raw document in a + contiguous range of length numDocs starting with + startDocID. Returns the IndexInput (the fieldStream), + already seeked to the starting point for startDocID. + + + + Skip the field. We still have to read some of the information about the field, but can skip past the actual content. + This will have the most payoff on large fields. + + + + A Lazy implementation of Fieldable that differs loading of fields until asked for, instead of when the Document is + loaded. + + + + The value of the field in Binary, or null. If null, the Reader value, + String value, or TokenStream value is used. Exactly one of stringValue(), + readerValue(), binaryValue(), and tokenStreamValue() must be set. + + + + The value of the field as a Reader, or null. If null, the String value, + binary value, or TokenStream value is used. Exactly one of stringValue(), + readerValue(), binaryValue(), and tokenStreamValue() must be set. + + + + The value of the field as a TokenStream, or null. If null, the Reader value, + String value, or binary value is used. Exactly one of stringValue(), + readerValue(), binaryValue(), and tokenStreamValue() must be set. + + + + The value of the field as a String, or null. If null, the Reader value, + binary value, or TokenStream value is used. Exactly one of stringValue(), + readerValue(), binaryValue(), and tokenStreamValue() must be set. + + + + Implements the skip list reader for the default posting list format + that stores positions and payloads. + + + + + Returns the freq pointer of the doc to which the last call of + {@link MultiLevelSkipListReader#SkipTo(int)} has skipped. + + + + Returns the prox pointer of the doc to which the last call of + {@link MultiLevelSkipListReader#SkipTo(int)} has skipped. + + + + Returns the payload length of the payload stored just before + the doc to which the last call of {@link MultiLevelSkipListReader#SkipTo(int)} + has skipped. + + + + The start and end character offset of a Token. + + + Returns this Token's starting offset, the position of the first character + corresponding to this token in the source text. + Note that the difference between endOffset() and startOffset() may not be + equal to termText.length(), as the term text may have been altered by a + stemmer or some other filter. + + + + Set the starting and ending offset. + See StartOffset() and EndOffset() + + + + Returns this Token's ending offset, one greater than the position of the + last character corresponding to this token in the source text. The length + of the token in the source text is (endOffset - startOffset). + + + + Normalizes token text to lower case. + + + $Id: LowerCaseFilter.java 797665 2009-07-24 21:45:48Z buschmi $ + +