ENTERPRISE

SharePoint 2010 : Searching Through the API - Creating SQL for the FullTextSqlQuery

3/27/2013 7:25:40 PM

SQL query syntax for SharePoint 2010 is similar to that of SharePoint 2007. The most important thing to understand when constructing SQL query syntax is how to parse the input to get a representation that reflects what the user expects and is easily analyzed. This section looks at the different components of the SQL query syntax, what they do, and how to combine them to achieve a specific behavior.

Creating the Query

Calling and getting results from search engine can be summarized as follows:

  1. An instance of the SP query object is created.
  2. The query object settings get configured.
  3. The SQL is added to the query object.
  4. An execute() method on the SP query object is called.
  5. The return value of the execute() method is an SP object of type ResultTableCollection.

Important Configuration Options of the Query Object

The SP query object can be configured with a number of properties. The “must know” properties are as follows:

  • Enable stemming
  • Trim duplicates
  • Ignore all noise query
  • Highlighted sentence count (default = 3, range =[0,10] or an exception gets thrown)
  • Result types
  • SQL query string
  • Site context (URL)
  • Keyword inclusion

SQL Query Predicates

The query string supplied to the search engine is constructed from the following predicates. These look like normal T-SQL and should be easily understandable, perhaps with the exception of the SCOPE statement.

  • SELECT: This states which information you want to get returned from the search engine. These values might be null if they are not available properties for a particular result type.
  • FROM SCOPE: This is responsible for limiting which areas are searched. The functionality of setting this is currently unclear as it is not easily examined in our test setup.
  • WHERE: This contains the predicates that are to be searched. These predicates can be FREETEXT or CONTAINS. Predicates can be concatenated with the logical operators AND and OR. The property does not need to be one of the returned properties of the SELECT statement.
  • FREETEXT: This is in the SP documentation, described as being the best option for finding documents containing combinations of the search words scattered across columns. It is used this way:
    WHERE FREETEXT(defaultproperties,'SharePoint search')
  • CONTAINS: This is best for finding exact matches. It is also used for performing wildcard searches, which is actually also an exact match from a logical point of view but a partial match from a user's point of view. It is used this way:
    WHERE CONTAINS(' "sear*" ')
  • ORDER BY: This instructs the search engine to do a reordering of the returned results by some property.

Details on Keyword Inclusion

The KeywordInclusion property is related to the FREETEXT and CONTAINS predicates. It can be set to one of two enumerations:

  • AllKeywords (acts as an AND between all keywords in the FULLTEXT predicate)
  • AnyKeyword (acts as an OR between all keywords in the FULLTEXT predicate)

The value of the KeywordInclusion property is set based on how the query tree should look per default. The value AnyKeyword typically relates to whether an OR node is implicit when constructing the query tree, and AllKeywords relates to implicit AND searches.

It is not be feasible for the user to be limited to use either of the two enumerations. It is therefore possible to override the AnyKeyword option by adding a + symbol to the keywords, which basically states that it must be contained in the results.

An example could be a query for the following:

SharePoint search or engine

This would then be evaluated as follows:

('SharePoinit' AND ' search') OR 'engine'
With KeywordInclusion = AnyKeyword

The SQL would be as follows:

SELECT WorkId,Rank,Title,Author,ModifiedBy,Size,Path,Description,Created,Write,Filename,Image
SiteName,SiteTitle,CollapsingStatus,HitHighlightedSummary,HitHighlightedProperties,Image
ContentClass,IsDocument,ContentType,objectid,PictureURL,WorkEmail,CreatedBy,ContentSource,Image
FileExtension FROM SCOPE() WHERE FREETEXT(defaultproperties,'+engine') ORImage
 FREETEXT(defaultproperties,'+SharePoint +search') ORDER BY Rank DESC

FREETEXT and CONTAINS Predicates Format

The format of the predicates is as follows:

([<column_identifier>,]'<content_search_condition>' [,LCID])

The column identifier can be either a column or a column group. In Ontolica the latter is used by setting it to “defaultproperties”, which is also the default value of the search engine if none is set.

The content search condition is different from FREETEXT and CONTAINS and is described individually next.

Localization can be achieved by setting the LCID argument on the predicates, thus allowing for locale-specific searches (language-specific searches). This is relevant not only for words, but also for dates and numeric values, currency format, and more. If no LCID is set, then the systems locale is used.

FREETEXT

Generally as few FREETEXT statements should be used as possible to improve ranking. If the query can be wrapped into a single FREETEXT statement, that would be the best solution.

The SP search engine removes noise words during indexing and also from the search itself. A search for a noise word will always yield zero results.

A FREETEXT predicate can contain phrases. They must be surrounded by quotation marks.

If more keywords or phrases are present in a FREETEXT predicate, they are implicitly separated by either AND or OR, depending on the KeywordInclusion setting (as described earlier).

CONTAINS

As opposed to the FREETEXT predicate, the CONTAINS predicate should be used for exact matches.

It has a range of options not available in the FREETEXT predicate. Table 1 describes the predicates and arguments available for the CONTAINS predicate. These are then used to create the individual CONTAINS predicates to apply to the query. Multiple CONTAINS predicates can be applied to achieve the behavior desired, if it is not feasible to create one CONTAINS predicate that achieves it. In the case of multiple CONTAINS predicates, they are also separated by Boolean operators, which are required in this case.

Table 1. Different uses of the CONTAINS Predicate

Word A single word without spaces or other punctuation …WHERE CONTAINS (‘sharepoint’)
Phrase Multiple words or included spaces …WHERE CONTAINS(‘sharepoint search’)
Or, to include double quotation marks:
… WHERE CONTAINS(‘sharepoint ““search””’)
Wildcard Words or phrases with the asterisk (*) added to the end. This is for performing wildcard searching. …WHERE CONTAINS (‘“sea*”’) Matches “search”, “searches”, “searching”, and “searchable”.
Boolean Words, phrases, and wildcard strings combined by using the Boolean operators AND, OR, or NOT; enclose the Boolean terms in double quotation marks. …WHERE CONTAINS (‘sharepoint’ AND ‘2010’ AND ‘search’)
Near Words, phrases, or wildcards separated by the function NEAR …WHERE CONTAINS (‘sharepoint’ NEAR ‘search’)
FormsOf Matches a word and the inflectional versions of that word …WHERE CONTAINS (‘FORMSOF (INFLECTIONAL, “search”)) Matches “search”, “searches”, “searching” etc.
IsAbout Combines matching results over multiple words, phrases, or wildcard search terms …WHERE CONTAINS (‘ISABOUT ( “sharepoint”, “search”) ’)

Words and phrases are similar to those of FREETEXT.

Wildcards are similar to a single word phrase (created by code, not the user).

Booleans can be used either inside a single CONTAINS predicate or to separate a number of CONTAINS and FREETEXT predicates.

As mentioned earlier, it is possible to use either a single CONTAINS predicate with multiple keywords separated by Booleans or multiple CONTAINS predicates that each have only one keyword and separate the predicates by Booleans.

The impact on the query results when using either method or a combination is not known or documented.

FormsOf and IsAbout are not used, and the documentation is not specific enough to make any real estimates on their performance and effect.

An example of a SQL query using wildcards might look like this:

SELECT WorkId,Rank,Title,Author,ModifiedBy,Size,Path,Description,Created,Write,Filename,Image
SiteName,SiteTitle,CollapsingStatus,HitHighlightedSummary,HitHighlightedProperties,Image
ContentClass,IsDocument,ContentType,objectid,PictureURL,WorkEmail,CreatedBy,ContentSource,Image
FileExtension FROM SCOPE() WHERE FREETEXT(defaultproperties,'+engine') OR (CONTAINSImage
(' "SharePo*" ') AND CONTAINS(' "sear*" ') ) ORDER BY Rank DESC

In the foregoing SQL, there has been a search for the following:

SharePo * sear* or engine

This is evaluated as follows:

('SharePo *' AND 'sear*') or 'engine'

Notice how the wildcard keywords are wrapped inside two CONTAINS predicates. According to the documentation, they might as well have been wrapped inside a single one, but in real life, that yields an exception if two wildcards are used within the same CONTAINS predicate. Also notice that there is a ‘+’ operator before the keyword engine in the FREETEXT predicate. This way it is specified that the keyword has to exist because an OR-based search always has the following property set:

KeywordInclusion = AnyKeyword

Otherwise the query would return results that might contain the keyword engine but not necessarily. (This appears to be prevented by the search engine, though, as full or empty set queries are not accepted. It seems that at least one word of a predicate has to be matched if the predicate should be evaluated.) If the OR was changed to AND, then KeywordInclusion would be set to AllKeywords and the + would not be set.

Other  
 
Video
Top 10
Free Mobile And Desktop Apps For Accessing Restricted Websites
MASERATI QUATTROPORTE; DIESEL : Lure of Italian limos
TOYOTA CAMRY 2; 2.5 : Camry now more comely
KIA SORENTO 2.2CRDi : Fuel-sipping slugger
How To Setup, Password Protect & Encrypt Wireless Internet Connection
Emulate And Run iPad Apps On Windows, Mac OS X & Linux With iPadian
Backup & Restore Game Progress From Any Game With SaveGameProgress
Generate A Facebook Timeline Cover Using A Free App
New App for Women ‘Remix’ Offers Fashion Advice & Style Tips
SG50 Ferrari F12berlinetta : Prancing Horse for Lion City's 50th
Popular Tags
Video Tutorail Microsoft Access Microsoft Excel Microsoft OneNote Microsoft PowerPoint Microsoft Project Microsoft Visio Microsoft Word Active Directory Exchange Server Sharepoint Sql Server Windows Server 2008 Windows Server 2012 Windows 7 Windows 8 Adobe Flash Professional Dreamweaver Adobe Illustrator Adobe Photoshop CorelDRAW X5 CorelDraw 10 windows Phone 7 windows Phone 8 Iphone