Returns information and statistics on terms in the fields of a particular document. The document can be stored in the index or artificially provided by the user. Term vectors are realtime by default, not near realtime. This can be changed by setting the realtime parameter to false.

Optionally, you can specify the fields for which the information is retrieved, either with a parameter in the URL or by adding the requested fields to the request body (see example below). Fields can also be specified with wildcards, in a similar way to the multi match query.

Three types of values can be requested: term information, term statistics and field statistics. By default, all term information and field statistics are returned for all fields, but no term statistics.

If the requested information wasn't stored in the index, it will be computed on the fly if possible. Additionally, term vectors can be computed for documents that do not exist in the index at all but are instead provided by the user.

Start and end offsets assume UTF-16 encoding is being used. If you want to use these offsets to recover the original text that produced a token, make sure that the string you are taking a substring of is also encoded using UTF-16.

By default, term statistics are not returned, since they can have a serious performance impact.

Setting dfs to true (default is false) returns the term statistics or the field statistics of the entire index, not just of the shard. Use it with caution, as distributed frequencies can have a serious performance impact.

With the filter parameter, the returned terms can also be filtered based on their tf-idf scores. This can be useful for finding a good characteristic vector of a document. This feature works in a similar manner to the second phase of the More Like This Query. See example 5 for usage.

The term and field statistics are not accurate. Deleted documents are not taken into account.
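As a minimal sketch of the UTF-16 offset caveat above: Python strings are indexed by code point, so a reported offset cannot be used as a plain string index once the text contains characters outside the Basic Multilingual Plane. The helper name slice_by_utf16_offsets, the sample text and the offsets 3 and 10 are all hypothetical and chosen only for illustration; they are not output from a real request.

```python
def slice_by_utf16_offsets(text: str, start_offset: int, end_offset: int) -> str:
    """Return the part of `text` addressed by UTF-16 code-unit offsets."""
    # UTF-16-LE uses exactly 2 bytes per code unit and adds no byte-order mark.
    encoded = text.encode("utf-16-le")
    return encoded[2 * start_offset:2 * end_offset].decode("utf-16-le")


# Hypothetical field value: an emoji (two UTF-16 code units) followed by text.
text = "\U0001F600 twitter test"

# Hypothetical offsets, as they would be reported for the token "twitter".
start_offset, end_offset = 3, 10

naive = text[start_offset:end_offset]  # slices by code point, not code unit
correct = slice_by_utf16_offsets(text, start_offset, end_offset)

print(repr(naive))    # 'witter '
print(repr(correct))  # 'twitter'
```

For text that stays within the Basic Multilingual Plane, both approaches give the same result; the difference only shows up once a character needs a surrogate pair.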
The information is only retrieved for the shard the requested document resides in, unless dfs is set to true. The term and field statistics are therefore only useful as relative measures, whereas the absolute numbers have no meaning in this context. By default, when requesting term vectors of artificial documents, the shard to get the statistics from is randomly selected. Use routing only to hit a particular shard.

Example 1 requests all information and statistics for the field text in document 1 (John Doe).

Term vectors which are not explicitly stored in the index are automatically computed on the fly. Example 2 requests all information and statistics for the fields in document 1, even though the terms haven't been explicitly stored in the index. Note that for the field text, the terms are not re-generated.

Term vectors can also be generated for artificial documents, that is, for documents not present in the index. The syntax is similar to the percolator API. Example 3 supplies such a document in the request and returns the same results as example 1. The mapping used is determined by the index and type. If dynamic mapping is turned on (the default), document fields that are not in the original mapping will be dynamically created.

Additionally, an analyzer different from the one defined for the field may be provided by using the per_field_analyzer parameter, as in example 4. This is useful for generating term vectors in any fashion, especially when using artificial documents. When an analyzer is provided for a field that already stores term vectors, the term vectors will be re-generated.

Finally, the returned terms can be filtered based on their tf-idf scores. Example 5 obtains the three most ‘interesting’ keywords from an artificial document with the given ‘plot’ field value. It also asks for distributed frequencies to obtain more accurate results. Notice that neither the keyword ‘Tony’ nor any stop words are part of the response, as their tf-idf must be too low.
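The request bodies for the scenarios described above might be assembled roughly as follows. This is a sketch under stated assumptions, not the reference requests: the names dfs, filter and per_field_analyzer come from the text, while the body keys fields, doc, term_statistics, field_statistics, max_num_terms, min_term_freq and min_doc_freq are assumed here and should be checked against the API version in use. The helper termvectors_body, the fullname field and all sample field values are hypothetical.

```python
import json


def termvectors_body(fields=None, doc=None, term_statistics=False,
                     field_statistics=True, dfs=False,
                     per_field_analyzer=None, term_filter=None):
    """Build a term vectors request body as a plain dict (nothing is sent)."""
    # Mirror the defaults described in the text: field statistics are on,
    # term statistics are off.
    body = {
        "term_statistics": term_statistics,
        "field_statistics": field_statistics,
    }
    if fields is not None:
        body["fields"] = list(fields)          # restrict to these fields
    if doc is not None:
        body["doc"] = dict(doc)                # artificial document
    if dfs:
        body["dfs"] = True                     # index-wide frequencies
    if per_field_analyzer:
        body["per_field_analyzer"] = dict(per_field_analyzer)
    if term_filter:
        body["filter"] = dict(term_filter)     # tf-idf based term filtering
    return body


# All information and statistics for the field "text" of a stored document.
stored_doc_body = termvectors_body(fields=["text"], term_statistics=True)

# The same request for an artificial document (field values are made up).
artificial_doc_body = termvectors_body(
    doc={"fullname": "John Doe", "text": "twitter test test test"},
    term_statistics=True,
)

# Override the analyzer of one field; "keyword" is an illustrative choice.
custom_analyzer_body = termvectors_body(
    doc={"fullname": "John Doe", "text": "twitter test test test"},
    fields=["fullname"],
    per_field_analyzer={"fullname": "keyword"},
)

# Keep only the three highest-scoring terms, using distributed frequencies.
filtered_body = termvectors_body(
    doc={"plot": "Tony Stark builds an armored suit and decides to use "
                 "its technology to fight against evil."},
    term_statistics=True,
    dfs=True,
    term_filter={"max_num_terms": 3, "min_term_freq": 1, "min_doc_freq": 1},
)

print(json.dumps(filtered_body, indent=2))
```

Building the body as a dict keeps the sketch independent of any particular client library; the result can be serialized with json.dumps and sent with whichever HTTP client is at hand.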