|
Index Server Query Language
Using the Search Criteria Boolean Search (Step #2), you can build
a query using the advanced search syntax of Microsoft Index Server. The
highlights follow.
Boolean and Proximity Operators
Boolean and proximity operators can be used to create a more precise query.
Hints:
Wildcards
Wildcard operators are useful for finding pages with words similar to a given word.
Free-Text Queries
The query engine finds pages that best match the words and phrases in a
free-text query. It does this by automatically finding pages that match the
meaning, not the exact wording, of the query. Boolean, proximity, and wildcard
operators are ignored within a free-text query. Free-text queries are
prefixed with "$contents".
Vector Space Queries
The query engine supports vector space queries. Vector queries return pages
that match a list of words and phrases. The rank of each page indicates how
well the page matched the query.
| To search for | Example | Results |
| pages that contain specific words | light, bulb | files that best match the words |
| pages that contain weighted prefixes, words, and phrases | invent*, light[50], bulb[10], "light bulb"[400] | files that contain words prefixed by "invent", the words "light" and "bulb", and the phrase "light bulb". The terms are weighted. |
- Components in vector queries are separated by commas.
- Components in vector queries can be weighted using the [weight] syntax.
- Pages returned by vector queries don't necessarily match every term in the query.
- Vector queries work best when the results are sorted by rank.
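As a rough illustration of the comma and [weight] syntax described above, the following Python sketch assembles a vector query string from a list of terms. The helper name `build_vector_query` and its input shape are hypothetical and not part of Index Server. It relies only on the syntax shown in the table: commas between components, an optional [weight] suffix, and quotes around phrases.

```python
from typing import Iterable, Optional, Tuple


def build_vector_query(terms: Iterable[Tuple[str, Optional[int]]]) -> str:
    """Join (term, weight) pairs into a vector query string.

    Hypothetical helper: terms containing a space are treated as phrases
    and wrapped in quotes; a weight of None means no [weight] suffix.
    """
    parts = []
    for term, weight in terms:
        text = f'"{term}"' if " " in term else term
        if weight is not None:
            text += f"[{weight}]"
        parts.append(text)
    # Components in vector queries are separated by commas.
    return ", ".join(parts)


query = build_vector_query(
    [("invent*", None), ("light", 50), ("bulb", 10), ("light bulb", 400)]
)
print(query)
```

Keeping the weights as data rather than hand-typing the brackets makes it easier to experiment with different weightings of the same terms.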
Property Value Queries
Property value queries can be used to find files whose property values
match given criteria. The properties over which you can query include
basic file information, such as file name and file size, and OLE properties,
including the document summary that is stored in files created by OLE-aware
applications.
There are two types of property queries: relational queries and regular
expression queries.
Property Names
Property names are preceded by either the at (@) or pound (#) character.
Use (@) for relational queries, and (#) for regular expression queries.
If no property name is specified, @contents is assumed.
Properties available for all files include:
OLE property values can also be used in queries. Web sites with files created
by most OLE-aware applications can be queried for these properties:
A more complete list of properties can be found under List of Property Names.
Relational operators
Relational operators are used in relational property queries.
| To search for | Example | Results |
| property values in relation to a fixed value | @size < 100; @size <= 100; @size = 100; @size != 100; @size >= 100; @size > 100 | files whose size matches the query |
| property values with all of a set of bits on | @attrib ^a 0x820 | compressed files with the archive bit on |
| property values with some of a set of bits on | @attrib ^s 0x20 | files with the archive bit on |
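The ^a and ^s operators can be read as ordinary bitmask tests. The Python sketch below shows that reading, assuming ^a means "every bit in the mask is set" and ^s means "at least one bit in the mask is set". The function names are hypothetical, and the value 0x800 for the compressed bit is inferred from the 0x820 example in the table.

```python
ARCHIVE = 0x20      # archive bit, as in the "@attrib ^s 0x20" example
COMPRESSED = 0x800  # inferred: the other bit in the "@attrib ^a 0x820" example


def all_bits_on(value: int, mask: int) -> bool:
    """Assumed meaning of ^a: every bit of mask is set in value."""
    return (value & mask) == mask


def some_bits_on(value: int, mask: int) -> bool:
    """Assumed meaning of ^s: at least one bit of mask is set in value."""
    return (value & mask) != 0


attrib = ARCHIVE | COMPRESSED
print(all_bits_on(attrib, 0x820))    # both bits are set
print(all_bits_on(ARCHIVE, 0x820))   # compressed bit is missing
print(some_bits_on(ARCHIVE, 0x820))  # one of the two bits is enough
```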
Property values
| To search for | Example | Results |
| a specific value | @DocAuthor = Bill Gates | files authored by "Bill Gates" |
| values beginning with a prefix | #DocAuthor George* | files whose author property begins with "George" |
| files with any of a set of extensions | #filename *.|(exe|,dll|,sys|) | files with ".exe", ".dll", or ".sys" extensions |
| files modified after a date | @write > 96/2/14 10:00:00 | files modified after February 14, 1996 at 10:00 GMT |
| files modified after a relative date | @write > -1d2h | files modified in the last 26 hours |
| vectors matching a vector | @vectorprop = { 10, 15, 20 } | OLE documents with a vectorprop value of { 10, 15, 20 } |
| vectors where each value matches a criterion | @vectorprop >^a 15 | OLE documents with a vectorprop value in which all values in the vector are greater than 15 |
| vectors where at least one value matches a criterion | @vectorprop =^s 15 | OLE documents with a vectorprop value in which at least one value is 15 |
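The last two rows combine a relational operator with ^a (All Of) or ^s (Some Of). A minimal Python sketch of that semantics follows, assuming the comparison is applied to each element of the vector in turn. The function `vector_matches` and the operator table are hypothetical illustrations, not Index Server code.

```python
import operator
from typing import Sequence

# Relational operators listed in this document, mapped to Python equivalents.
OPS = {
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    "!=": operator.ne,
    ">=": operator.ge,
    ">": operator.gt,
}


def vector_matches(values: Sequence[int], op: str, quantifier: str, operand: int) -> bool:
    """Compare each vector element against operand, then combine the results.

    "^a" requires all elements to match; "^s" requires at least one.
    Note: all() is True for an empty vector; how Index Server treats that
    case is not stated in the source.
    """
    compare = OPS[op]
    results = (compare(value, operand) for value in values)
    if quantifier == "^a":
        return all(results)
    if quantifier == "^s":
        return any(results)
    raise ValueError(f"unknown quantifier: {quantifier!r}")


print(vector_matches([10, 15, 20], ">", "^a", 15))  # not all values exceed 15
print(vector_matches([10, 15, 20], "=", "^s", 15))  # one value equals 15
```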
- Be sure to use the pound (#) character before the property name when using a regular expression in a property value, and an at (@) character otherwise. The equal (=) relational operator is assumed for regular expression queries.
- File name (#filename) is the only property that supports regular expressions with wildcards to the left of text. Wildcards in regular expressions for all other properties must come after a prefix.
- Date and time values are of the form yyyy/mm/dd hh:mm:ss. The first two characters of the year and the entire time can be omitted. Dates and times are in GMT.
- Dates and times relative to the current time can be expressed with a minus (-) character followed by zero or more integer and time-unit pairs. Time units are expressed as: (y) for years, (m) for months, (w) for weeks, (d) for days, (h) for hours, (n) for minutes, and (s) for seconds.
- Currency values are of the form x.y, where x is the whole value amount and y is the fractional amount. There is no assumption about units.
- Boolean values are (t) or (true) for true and (f) or (false) for false.
- Vectors (VT_VECTOR) are expressed as an opening brace ({), a comma-separated list of values, then a closing brace (}).
- Single value expressions that are compared against vectors are expressed as a relational operator, then a (^a) for All Of or a (^s) for Some Of.
- Numeric values can be in decimal or hex (preceded by 0x).
- The contents property does not support relational operators. If a relational operator is specified, no results will be found. For example, "@contents Microsoft" will find documents containing Microsoft, but "@contents=Microsoft" will find none.
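The date rules above can be sketched in Python as follows. This is only an illustration: the function names are hypothetical, zero-padded output is assumed to be acceptable for the yyyy/mm/dd hh:mm:ss form, and the (y) and (m) units are left out because the source does not say how long a year or month is taken to be.

```python
import re
from datetime import datetime, timedelta, timezone

# Fixed-length units only; "y" and "m" are deliberately not mapped (see above).
_UNIT_TO_KWARG = {"w": "weeks", "d": "days", "h": "hours", "n": "minutes", "s": "seconds"}


def parse_relative_offset(text: str) -> timedelta:
    """Turn a relative value such as "-1d2h" into a timedelta before "now"."""
    text = text.lower()
    if not text.startswith("-"):
        raise ValueError("relative values start with a minus (-) character")
    body = text[1:]
    pairs = re.findall(r"(\d+)([ymwdhns])", body)
    # Reject anything that is not a clean sequence of integer/unit pairs.
    if "".join(number + unit for number, unit in pairs) != body:
        raise ValueError(f"malformed relative value: {text!r}")
    total = timedelta()
    for number, unit in pairs:
        if unit in ("y", "m"):
            raise NotImplementedError("year and month lengths are not defined in this sketch")
        total += timedelta(**{_UNIT_TO_KWARG[unit]: int(number)})
    return total


def format_query_time(moment: datetime) -> str:
    """Format a timezone-aware datetime as yyyy/mm/dd hh:mm:ss in GMT."""
    return moment.astimezone(timezone.utc).strftime("%Y/%m/%d %H:%M:%S")


print(parse_relative_offset("-1d2h"))
print(format_query_time(datetime(1996, 2, 14, 10, 0, 0, tzinfo=timezone.utc)))
```

The first call corresponds to the "-1d2h" example (26 hours), and the second to the absolute date used in the "@write > 96/2/14 10:00:00" example.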
Regular expressions
Regular expressions in property queries are defined as follows:
- Any character except *, ., ?, and | defaults to matching just itself.
- Regular expressions can be enclosed in matching quotes ("), and must be enclosed in quotes if they contain a space ( ) or closing parenthesis ()).
- *, ., and ? behave as they behave in Windows NT (match any number of characters, match (.) or end of sentence, and match any one character, respectively).
- | is an escape character. After |, the following characters have special meaning:
  - ( opens a group. Must be followed by a matching ).
  - ) closes a group. Must be preceded by a matching (.
  - [ opens a character class. Must be followed by a matching (un-escaped) ].
  - { opens a counted match. Must be followed by a matching }.
  - } closes a counted match. Must be preceded by a matching {.
  - , separates OR clauses.
  - * matches zero or more occurrences of the preceding expression.
  - ? matches zero or one occurrence of the preceding expression.
  - + matches one or more occurrences of the preceding expression.
  - anything else, including |, matches itself.
- Between [ and ] the following characters have special meaning:
  - ^ Match everything but the following classes. Must be the first character.
  - ] Matches ]. May only be preceded by ^, otherwise it closes the class.
  - - Range operator. Preceded and followed by normal characters.
  - anything else matches itself (or begins/ends a range at itself).
- Between { and } the following syntax applies:
  - |{m|} matches exactly m occurrences of the preceding expression. (0 < m < 256)
  - |{m,|} matches at least m occurrences of the preceding expression. (1 < m < 256)
  - |{m,n|} matches between m and n occurrences of the preceding expression, inclusive. (0 < m < 256, 0 < n < 256)
- To match *, ., and ?, enclose them in brackets (e.g. |[*]foo will match "*foo").
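To make the escape rules more concrete, here is a small Python sketch that translates a subset of this syntax into a Python `re` pattern. It is an approximation written for illustration, not Index Server's matcher: the function name `to_python_regex` is hypothetical, an un-escaped "." is simplified to a literal dot, and character classes and counted matches are not handled.

```python
import re


def to_python_regex(pattern: str) -> str:
    """Translate a subset of the property-query regex syntax to Python `re`.

    Handled: un-escaped * and ?, and the escapes |( |) |, |* |? |+.
    Simplification: an un-escaped "." is treated as a literal dot.
    Not handled: |[ ... |] classes and |{ ... |} counted matches.
    """
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "|" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            i += 2
            if nxt == "(":
                out.append("(?:")      # open a group
            elif nxt == ")":
                out.append(")")        # close a group
            elif nxt == ",":
                out.append("|")        # separate OR clauses
            elif nxt in "*?+":
                out.append(nxt)        # repetition of the preceding expression
            elif nxt in "[]{}":
                raise NotImplementedError("classes and counted matches are outside this sketch")
            else:
                out.append(re.escape(nxt))  # anything else matches itself
            continue
        if ch == "*":
            out.append(".*")           # any number of characters
        elif ch == "?":
            out.append(".")            # any one character
        else:
            out.append(re.escape(ch))
        i += 1
    return "".join(out)


translated = to_python_regex("*.|(exe|,dll|,sys|)")
print(translated)
print(bool(re.fullmatch(translated, "notepad.exe", re.IGNORECASE)))
print(bool(re.fullmatch(translated, "readme.txt", re.IGNORECASE)))
```

The sample pattern is the file-extension example from the property values table. Comparing the translated pattern with the original shows how the |( |, |) escapes map onto a conventional alternation group.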
Query Examples
| Example | Results |
| @size > 1000000 | pages larger than one million bytes |
| @write > 95/12/23 | pages modified after the date |
| Apple tree | pages with the phrase "apple tree" |
| "apple tree" | same as above |
| @contents apple tree | same as above |
| Microsoft and @size > 1000000 | pages with the word "Microsoft" that are larger than one million bytes |
| "microsoft and @size > 1000000" | pages with the phrase specified (not the same as above) |
| #filename *.avi | video files (the '#' prefix is used because the query contains a regular expression) |
| @attrib ^s 32 | pages with the archive attribute bit on |
| @docauthor = William Gates | pages with the given author |
| $contents why is the sky blue? | pages that match the query |
| @size < 100 & #filename *.gif | GIF files less than 100 bytes in size |
List of Property Names
These properties are always available for queries. Additional properties may
also be available depending on the configuration of the web server.
| Property Name | Description |
| Access | Last time file was accessed. |
| All | Everything. |
| AllocSize | Size of disk allocation for file. |
| Attrib | File attributes. |
| ClassId | Class Id of object. |
| Change | Last time file was changed (includes changes to attributes). |
| Characterization | Characterization / abstract of document. Computed by Index Server. |
| Contents | Main contents of file. |
| Create | Time file was created. |
| DocAppName | Name of application owning file. |
| DocAuthor | Author of document. |
| DocCharCount | Number of characters in document. |
| DocComments | Comments about document. |
| DocCreatedTm | Time document was created. |
| DocEditTime | Total time spent editing document. |
| DocKeywords | Document keywords. |
| DocLastAuthor | Most recent user who edited document. |
| DocLastPrinted | Time document was last printed. |
| DocLastSavedTm | Time document was last saved. |
| DocPageCount | Number of pages in document. |
| DocRevNumber | Current version number of document. |
| DocSubject | Subject of document. |
| DocTemplate | Name of template for document. |
| DocTitle | Title of document. |
| DocWordCount | Number of words in document. |
| FileIndex | Unique id of file. |
| FileName | Name of file. |
| HitCount | Number of hits (words matching query) in file. |
| HtmlHRef | Text of HTML HREF. |
| Path | Full physical path to file, including filename. |
| Rank | Rank of row. Ranges from 0 to 1000. Larger numbers indicate better matches. |
| SecurityChange | Last time security was changed on file. |
| ShortFileName | Short (8.3) file name. |
| Size | Size of file, in bytes. |
| USN | Update Sequence Number. NTFS drives only. |
| VPath | Full virtual path to file, including filename. If more than one possible path, then the best match for the specific query is chosen. |
Miscellaneous Tips on Query Syntax
- Queries are case-insensitive: you can type your query in uppercase or lowercase.
- You may search for any word except those in the exception list (for English, this includes a, an, and, as, and other common words), which are ignored during a search. Words in the exception list are treated as placeholders in phrase and proximity queries.
- Punctuation marks such as the period (.), colon (:), semicolon (;), and comma (,) are ignored during a search.
- To use specially treated characters (&, |, ^, #, @, $, (, and )) in a query, enclose your query in quotes (").
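The last tip can be applied mechanically when queries are built from user input. The Python sketch below wraps a query in quotes only when it contains one of the specially treated characters listed above. The helper name `quote_if_special` is hypothetical, and because the source does not say how to handle a quote character inside the query itself, the sketch rejects that case rather than guessing.

```python
# Specially treated characters named in the tip above.
SPECIAL_CHARACTERS = set("&|^#@$()")


def quote_if_special(text: str) -> str:
    """Enclose text in quotes if it contains a specially treated character."""
    if '"' in text:
        # Assumption: embedded quotes are out of scope; the source is silent on them.
        raise ValueError("embedded quotes are not handled in this sketch")
    if any(ch in SPECIAL_CHARACTERS for ch in text):
        return f'"{text}"'
    return text


print(quote_if_special("AT&T"))
print(quote_if_special("apple tree"))
```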
|