SQL query syntax for SharePoint 2010 is
similar to that of SharePoint 2007. The most important thing to
understand when constructing SQL query syntax is how to parse the input
to get a representation that reflects what the user expects and is
easily analyzed. This section looks at the different components of the
SQL query syntax, what they do, and how to combine them to achieve a
specific behavior.
Creating the Query
Calling and getting results from search engine can be summarized as follows:
- An instance of the SP query object is created.
- The query object settings get configured.
- The SQL is added to the query object.
- An
execute()
method on the SP query object is called. - The return value of the
execute()
method is an SP object of type ResultTableCollection
.
Important Configuration Options of the Query Object
The SP query object can be configured with a number of properties. The “must know” properties are as follows:
- Enable stemming
- Trim duplicates
- Ignore all noise query
- Highlighted sentence count (default = 3, range =[0,10] or an exception gets thrown)
- Result types
- SQL query string
- Site context (URL)
- Keyword inclusion
SQL Query Predicates
The query string supplied to the search engine
is constructed from the following predicates. These look like normal
T-SQL and should be easily understandable, perhaps with the exception of
the SCOPE statement.
Details on Keyword Inclusion
The KeywordInclusion
property is related to the FREETEXT and CONTAINS predicates. It can be set to one of two enumerations:
AllKeywords
(acts as an AND between all keywords in the FULLTEXT predicate)AnyKeyword
(acts as an OR between all keywords in the FULLTEXT predicate)
The value of the KeywordInclusion
property is set based on how the query tree should look per default. The value AnyKeyword
typically relates to whether an OR node is implicit when constructing the query tree, and AllKeywords
relates to implicit AND searches.
It is not be feasible for the user to be limited
to use either of the two enumerations. It is therefore possible to
override the AnyKeyword
option by adding a + symbol to the keywords, which basically states that it must be contained in the results.
An example could be a query for the following:
SharePoint search or engine
This would then be evaluated as follows:
('SharePoinit' AND ' search') OR 'engine'
With KeywordInclusion = AnyKeyword
The SQL would be as follows:
SELECT WorkId,Rank,Title,Author,ModifiedBy,Size,Path,Description,Created,Write,Filename,
SiteName,SiteTitle,CollapsingStatus,HitHighlightedSummary,HitHighlightedProperties,
ContentClass,IsDocument,ContentType,objectid,PictureURL,WorkEmail,CreatedBy,ContentSource,
FileExtension FROM SCOPE() WHERE FREETEXT(defaultproperties,'+engine') OR
FREETEXT(defaultproperties,'+SharePoint +search') ORDER BY Rank DESC
FREETEXT and CONTAINS Predicates Format
The format of the predicates is as follows:
([<column_identifier>,]'<content_search_condition>' [,LCID])
The column identifier can be either a column or a
column group. In Ontolica the latter is used by setting it to
“defaultproperties”, which is also the default value of the search
engine if none is set.
The content search condition is different from FREETEXT and CONTAINS and is described individually next.
Localization can be achieved by setting the LCID
argument on the predicates, thus allowing for locale-specific searches
(language-specific searches). This is relevant not only for words, but
also for dates and numeric values, currency format, and more. If no LCID
is set, then the systems locale is used.
FREETEXT
Generally as few FREETEXT statements should be
used as possible to improve ranking. If the query can be wrapped into a
single FREETEXT statement, that would be the best solution.
The SP search engine removes noise words during
indexing and also from the search itself. A search for a noise word will
always yield zero results.
A FREETEXT predicate can contain phrases. They must be surrounded by quotation marks.
If more keywords or phrases are present in a
FREETEXT predicate, they are implicitly separated by either AND or OR,
depending on the KeywordInclusion
setting (as described earlier).
CONTAINS
As opposed to the FREETEXT predicate, the CONTAINS predicate should be used for exact matches.
It has a range of options not available in the FREETEXT predicate. Table 1
describes the predicates and arguments available for the CONTAINS
predicate. These are then used to create the individual CONTAINS
predicates to apply to the query. Multiple CONTAINS predicates can be
applied to achieve the behavior desired, if it is not feasible to create
one CONTAINS predicate that achieves it. In the case of multiple
CONTAINS predicates, they are also separated by Boolean operators, which
are required in this case.
Table 1. Different uses of the CONTAINS Predicate
Word |
A single word without spaces or other punctuation |
…WHERE CONTAINS (‘sharepoint’) |
Phrase |
Multiple words or included spaces |
…WHERE CONTAINS(‘sharepoint search’) Or, to include double quotation marks:
… WHERE CONTAINS(‘sharepoint ““search””’) |
Wildcard |
Words or phrases with the asterisk (*) added to the end. This is for performing wildcard searching. |
…WHERE CONTAINS (‘“sea*”’) Matches “search”, “searches”, “searching”, and “searchable”. |
Boolean |
Words, phrases, and wildcard strings
combined by using the Boolean operators AND, OR, or NOT; enclose the
Boolean terms in double quotation marks. |
…WHERE CONTAINS (‘sharepoint’ AND ‘2010’ AND ‘search’) |
Near |
Words, phrases, or wildcards separated by the function NEAR |
…WHERE CONTAINS (‘sharepoint’ NEAR ‘search’) |
FormsOf |
Matches a word and the inflectional versions of that word |
…WHERE CONTAINS (‘FORMSOF (INFLECTIONAL, “search”)) Matches “search”, “searches”, “searching” etc. |
IsAbout |
Combines matching results over multiple words, phrases, or wildcard search terms |
…WHERE CONTAINS (‘ISABOUT ( “sharepoint”, “search”) ’) |
Words and phrases are similar to those of FREETEXT.
Wildcards are similar to a single word phrase (created by code, not the user).
Booleans can be used either inside a single CONTAINS predicate or to separate a number of CONTAINS and FREETEXT predicates.
As mentioned earlier, it is possible to use
either a single CONTAINS predicate with multiple keywords separated by
Booleans or multiple CONTAINS predicates that each have only one keyword
and separate the predicates by Booleans.
The impact on the query results when using either method or a combination is not known or documented.
FormsOf and IsAbout are not used, and the
documentation is not specific enough to make any real estimates on their
performance and effect.
An example of a SQL query using wildcards might look like this:
SELECT WorkId,Rank,Title,Author,ModifiedBy,Size,Path,Description,Created,Write,Filename,
SiteName,SiteTitle,CollapsingStatus,HitHighlightedSummary,HitHighlightedProperties,
ContentClass,IsDocument,ContentType,objectid,PictureURL,WorkEmail,CreatedBy,ContentSource,
FileExtension FROM SCOPE() WHERE FREETEXT(defaultproperties,'+engine') OR (CONTAINS
(' "SharePo*" ') AND CONTAINS(' "sear*" ') ) ORDER BY Rank DESC
In the foregoing SQL, there has been a search for the following:
SharePo * sear* or engine
This is evaluated as follows:
('SharePo *' AND 'sear*') or 'engine'
Notice how the wildcard keywords are wrapped
inside two CONTAINS predicates. According to the documentation, they
might as well have been wrapped inside a single one, but in real life,
that yields an exception if two wildcards are used within the same
CONTAINS predicate. Also notice that there is a ‘+’ operator before the
keyword engine in the FREETEXT predicate. This way it is
specified that the keyword has to exist because an OR-based search
always has the following property set:
KeywordInclusion = AnyKeyword
Otherwise the query would return results that might contain the keyword engine
but not necessarily. (This appears to be prevented by the search
engine, though, as full or empty set queries are not accepted. It seems
that at least one word of a predicate has to be matched if the predicate
should be evaluated.) If the OR was changed to AND, then KeywordInclusion
would be set to AllKeywords
and the + would not be set.