Having the design document will greatly streamline
the configuration process because the important questions will have
already been answered. This section provides an overview of the
configuration approach as well as explanations about certain
configuration tasks.
Configuration Overview
After SharePoint Server 2010 has been installed, configure search in
the order shown in the following list. The sequence is significant
because certain configurations depend on earlier ones. For example,
crawled properties depend on a full crawl, managed properties depend on
crawled properties, scopes depend on managed properties, and Site
Collection settings depend on scopes.
Configure the server topology and farm-wide settings in the Central Administration, Farm-wide Search Administration page.
Install third-party iFilters and custom document icons (for example, PDF).
From the Central Administration, Search Service Application, configure the following:
From Central Administration, create a search center Site Collection.
Within the Site Collection, configure the following:
In the file system, configure the thesaurus files (optional).
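The dependency chain described above can be checked mechanically. The following Python sketch is a model only, not a SharePoint API. It encodes the dependencies as a graph using hypothetical step names, and it uses the standard library's graphlib module (Python 3.9 or later) to derive a valid configuration order. The "content sources" step is an added assumption, because a full crawl needs something to crawl.

```python
from graphlib import TopologicalSorter

# Hypothetical step names. Each step maps to the set of steps it depends on,
# following the chain described in the text. "content sources" is an added
# assumption: a full crawl needs a content source to crawl.
DEPENDENCIES = {
    "full crawl": {"content sources"},
    "crawled properties": {"full crawl"},
    "managed properties": {"crawled properties"},
    "scopes": {"managed properties"},
    "site collection settings": {"scopes"},
}


def configuration_order(dependencies):
    """Return one order in which every step comes after the steps it depends on.

    graphlib raises CycleError if the dependencies contain a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


order = configuration_order(DEPENDENCIES)
print(order)
```

The chain is linear, so the script prints the steps in the same order the text gives. To add a step with several prerequisites, add another entry to the dictionary.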
Adding and Configuring Content Sources
Content sources are managed from the Central
Administration, Search Service Application, Manage Content Sources
page. As described in the previous section, content becomes available
to search queries only after it has been crawled into the content index.
Content sources can be added and configured to instruct SharePoint to
crawl the following types of locations: SharePoint sites, Web sites that
are not SharePoint sites, file shares, Exchange public folders, line of
business data, and custom repositories. You can specify one or more start
addresses (URLs) for each content source. A start address is the top of
the content’s hierarchy; for example, the root folder of a file share’s
folder structure is considered a start address.
It is possible to have only one content source with
many SharePoint start addresses in it; however, this configuration is
less flexible than if you were to add a separate content source for
each major starting address or SharePoint Web application. Configuring
separate content sources provides greater control because it allows you
to distinguish content within search scopes, allows you to set unique
crawl schedules for different start addresses, and allows you to start
and stop crawls on one start address without interrupting others.
In addition to start addresses, the content source configuration page
lets you specify whether to crawl only the start address or all content
below it.
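The following Python sketch shows how a URL relates to the start addresses of several content sources. It is only an illustration. The content source names and URLs are hypothetical, and SharePoint's own crawl settings vary by content source type and go beyond a simple path-prefix check. The include_children flag stands in for the choice between crawling only the start address and crawling everything below it.

```python
from urllib.parse import urlsplit

# Hypothetical content sources, each with its own start addresses.
CONTENT_SOURCES = {
    "Intranet": ["http://intranet"],
    "HR Portal": ["http://hr.example.local/policies"],
}


def in_crawl_scope(url, start_address, include_children=True):
    """Simplified model: is url the start address itself or (optionally) below it?"""
    start, target = urlsplit(start_address), urlsplit(url)
    # A different scheme or host means the URL is outside this start address.
    if (start.scheme, start.netloc.lower()) != (target.scheme, target.netloc.lower()):
        return False
    start_path = start.path.rstrip("/")
    target_path = target.path.rstrip("/")
    if target_path == start_path:
        return True
    # Requiring a trailing "/" stops "/policies" from matching "/policies2".
    return include_children and target_path.startswith(start_path + "/")


def owning_sources(url, sources, include_children=True):
    """Names of the content sources whose start addresses cover url."""
    return [
        name
        for name, start_addresses in sources.items()
        if any(in_crawl_scope(url, s, include_children) for s in start_addresses)
    ]


print(owning_sources("http://hr.example.local/policies/leave.aspx", CONTENT_SOURCES))
# ['HR Portal']
```

When each major start address has its own content source, a URL maps to a single source name. Separate crawl schedules and content-source scope rules depend on that one-to-one mapping.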
To add a new content source, click the New Content
Source link on the Manage Content Sources page to open the Add Content
Source page. There, you name the content source, provide one or more
start addresses, and establish the crawl schedules. This process is
shown in Figure 1.
To configure a content source for SharePoint sites,
select SharePoint Sites as the content source type. Enter the start
address URLs in the form http://intranet.
Next, configure the crawl settings and set up a schedule. Keep in mind
that a high-intensity crawl may impair the underlying system as well as
the network. A crawl of network shares, for example, could consume enough
bandwidth to be noticeable to workers on the network. Furthermore, if you are
backing up the search components of the SharePoint farm or the crawled
systems, try to run these backup operations during periods when the
crawl is not running. These tips help to reduce contention on the
SharePoint servers as well as the servers storing the crawled content.
It is also a good practice to document the system operation schedules,
such as backups and crawls, and keep this information for reference. On
an ongoing basis, crawl behaviors should be measured and monitored. You
should keep track of the amount of content in the content sources as
well as the amount of time it takes for a crawl to complete on the
content source. If a content source grows, crawls and backups will take
longer, and their schedules can start to overlap, creating contention.
Reviewing and adjusting the crawl schedules should be a regular
responsibility of the support team. Figure 2 shows the crawler impact rules.
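The text recommends documenting the operation schedule and tracking crawl and backup durations. With that information, a small script can detect schedule overlaps. This Python sketch uses hypothetical job names, dates, and durations. In practice, the durations would come from the measured crawl and backup times.

```python
from datetime import datetime, timedelta
from itertools import combinations


def overlaps(a_start, a_end, b_start, b_end):
    """True if the half-open windows [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def find_conflicts(jobs):
    """jobs: list of (name, start, expected_duration); returns overlapping name pairs."""
    windows = [(name, start, start + duration) for name, start, duration in jobs]
    return [
        (a[0], b[0])
        for a, b in combinations(windows, 2)
        if overlaps(a[1], a[2], b[1], b[2])
    ]


# Hypothetical schedule: a nightly full crawl at 22:00 and a farm backup at 01:00.
night = datetime(2024, 1, 1, 22, 0)
backup = ("Farm backup", night + timedelta(hours=3), timedelta(hours=2))

jobs_before = [("Intranet full crawl", night, timedelta(hours=2)), backup]
# After the content source grows, the same crawl takes four hours.
jobs_after = [("Intranet full crawl", night, timedelta(hours=4)), backup]

print(find_conflicts(jobs_before))  # []
print(find_conflicts(jobs_after))   # [('Intranet full crawl', 'Farm backup')]
```

Re-running this check against updated durations is one way to carry out the regular schedule review described above.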
Federated Locations
Federated locations allow users to expand their
searches to include content that is either in a remote SharePoint
environment or available from public Web sites that support OpenSearch
1.0 or 1.1. For example, if Bing is configured as a federated location,
users searching from their SharePoint search
portals will retrieve results both from the local SharePoint index and
from Bing. Federated locations are configurable from the
Central Administration → Search Service Application → Manage Federated
Locations page.
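OpenSearch describes a search endpoint with a URL template. In the template, placeholders such as {searchTerms} are replaced by the user's query, and a trailing ? marks a parameter as optional. The following Python sketch expands such a template to show what a federated query looks like on the wire. It is a simplified model of the OpenSearch template convention, not SharePoint's implementation, and the endpoint URL is hypothetical.

```python
import re
from urllib.parse import quote

# Matches {name} and {name?}; a trailing "?" marks an optional parameter.
TEMPLATE_PARAM = re.compile(r"\{([^}?]+)(\?)?\}")


def expand_opensearch_template(template, values):
    """Fill an OpenSearch-style URL template from a dict of parameter values."""

    def substitute(match):
        name, optional = match.group(1), match.group(2)
        if name in values:
            # Percent-encode the value so spaces and "&" cannot break the URL.
            return quote(str(values[name]), safe="")
        if optional:
            return ""  # unset optional parameters are left empty
        raise KeyError(f"required template parameter missing: {name}")

    return TEMPLATE_PARAM.sub(substitute, template)


# Hypothetical federated location; not a real endpoint.
TEMPLATE = "https://search.example.com/results?q={searchTerms}&count={count?}"
print(expand_opensearch_template(TEMPLATE, {"searchTerms": "expense policy"}))
# https://search.example.com/results?q=expense%20policy&count=
```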
Authoritative Pages and Demoted Sites
Authoritative page settings prioritize locations in
the content index so that results from those sites are more (or less)
likely to appear ranked highly in the result set. Authoritative page
settings are configured in the Search Service Application. Pages can
have one of four ranking levels (most authoritative, second level,
third level, and sites to demote). By default, all top-level pages for
Web applications are added as most authoritative. You can move the
top-level pages to other authoritative page levels or remove them from
authoritative page settings completely.
When planning authoritative page settings, group
sites into the three levels by importance. In addition, group the sites
that are not likely to be relevant as sites to demote.
Demoted sites will typically appear toward the end of the search
results after all other relevance weighting factors have been
considered. Don’t try to assign an authoritative page to every single
site. Start with obvious ones and then adjust the authoritative page
settings based on feedback from users and information in the query logs
and crawl logs. Authoritative pages and demoted sites are configurable
from the Central Administration → Search Service Application →
Specify Authoritative Pages screen.
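One way to picture the four levels is as a boost applied on top of the other relevance factors. The following Python sketch is a deliberately simplified model, not SharePoint's ranking algorithm. The boost factors, URLs, and base scores are hypothetical, and each page takes its level from the longest configured URL that contains it.

```python
# Hypothetical boost factors; SharePoint's real ranking model is not reproduced here.
LEVEL_BOOST = {"most": 1.5, "second": 1.3, "third": 1.1, "demote": 0.5}

# Hypothetical authoritative page settings: URL -> level.
SETTINGS = {
    "http://intranet": "most",
    "http://intranet/archive": "demote",
}


def authority_level(url, settings):
    """Level of the longest configured URL that equals or contains url, or None."""
    best = None
    for prefix, level in settings.items():
        base = prefix.rstrip("/")
        if url == base or url.startswith(base + "/"):
            if best is None or len(base) > len(best[0]):
                best = (base, level)
    return best[1] if best else None


def rerank(results, settings):
    """Sort (url, base_score) pairs by base score times the level's boost."""

    def adjusted(item):
        url, base_score = item
        return base_score * LEVEL_BOOST.get(authority_level(url, settings), 1.0)

    return sorted(results, key=adjusted, reverse=True)


RESULTS = [
    ("http://intranet/archive/old.docx", 0.9),
    ("http://intranet/policies/travel.docx", 0.7),
    ("http://wiki/travel", 0.8),
]
for url, _ in rerank(RESULTS, SETTINGS):
    print(url)
```

In this model, the archived document has the highest base score but ends up last because it sits under a demoted site. Unconfigured sites such as the wiki keep their original score.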
Metadata Properties
When SharePoint crawls content, it stores the property values it
discovers in a database; these are crawled properties. Managed
properties are the set of properties that are provided to the user as
part of the search user experience (for example, the ones that users can
filter on). Each managed property can be mapped to one or more crawled
properties. Some managed properties are created by default; others must
be created and mapped by administrators. For example, to make the file
extension of documents searchable, you must first explicitly specify
that the file extension crawled property be included in the index. This
is done in the Search Service Application, under the Metadata Properties,
Crawled Properties panel. Next, you must create a managed property called
File Extension and map it to the appropriate crawled property. Creating a
managed property allows users to leverage the property in keyword searches
and allows scopes to leverage the managed property as a filter.
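The relationship between crawled and managed properties can be modeled as a mapping from each managed property to an ordered list of crawled properties. In this Python sketch, the crawled property names are hypothetical placeholders. Only crawled properties that are marked for inclusion in the index contribute values. The first_only flag is an assumed simplification of choosing between all mapped values and only the first mapped value.

```python
# Hypothetical names: managed property -> crawled properties, in priority order.
MAPPINGS = {"FileExtension": ["crawled:FileExtension", "crawled:FileType"]}


def managed_values(managed, doc_crawled, mappings, indexed, first_only=False):
    """Values a document exposes through a managed property (simplified model).

    doc_crawled: crawled property name -> value found for one document.
    indexed: crawled properties whose values are included in the index.
    """
    values = [
        doc_crawled[crawled]
        for crawled in mappings.get(managed, [])
        if crawled in indexed and crawled in doc_crawled
    ]
    return values[:1] if first_only else values


doc = {"crawled:FileExtension": "pdf", "crawled:FileType": "PDF"}
print(managed_values("FileExtension", doc, MAPPINGS, indexed=set()))  # []
print(managed_values("FileExtension", doc, MAPPINGS, indexed={"crawled:FileExtension"}))  # ['pdf']
```

The first call returns nothing because no crawled property has been included in the index. This corresponds to the first step of the File Extension example above.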
Search Scopes
A search scope provides a way to filter search
queries by enabling users to focus their queries on a subset of the
total index. Ideally, the result a user is looking for will appear in the
top 20 results when the user issues a query. Scopes let users easily apply
filters to their initial queries, making this benchmark much easier to
reach.
Scopes can be configured to filter search results by
content address, managed property (for example, issue status =
unresolved), or content source. For example, a Medical Records scope
might let a medical doctor search only items that are documents, are
located in a Records Center, and have the file extension PDF.
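A scope built from several rules can be modeled as a list of predicates that an item must satisfy. The following Python sketch expresses the Medical Records example this way. The URLs, content source names, and managed property names (IsDocument, FileExtension) are hypothetical. Treating every rule as required is an assumption made for this example.

```python
# Each rule is a predicate over an indexed item, modeled as a plain dict.
def address_rule(prefix):
    """Match items whose URL is at or below the given address."""
    base = prefix.rstrip("/")
    return lambda item: item["url"] == base or item["url"].startswith(base + "/")


def property_rule(name, value):
    """Match items whose managed property equals value (case-insensitive)."""
    return lambda item: str(item["props"].get(name, "")).lower() == str(value).lower()


def content_source_rule(source):
    """Match items crawled from the named content source."""
    return lambda item: item["source"] == source


def in_scope(item, rules):
    """In this example, an item must satisfy every rule."""
    return all(rule(item) for rule in rules)


# Hypothetical Medical Records scope: documents in a Records Center with a PDF extension.
MEDICAL_RECORDS = [
    property_rule("IsDocument", "true"),
    address_rule("http://records"),
    property_rule("FileExtension", "pdf"),
]

ITEMS = [
    {"url": "http://records/patients/1042.pdf", "source": "Records",
     "props": {"IsDocument": True, "FileExtension": "PDF"}},
    {"url": "http://records/patients/1042.docx", "source": "Records",
     "props": {"IsDocument": True, "FileExtension": "docx"}},
    {"url": "http://intranet/forms/intake.pdf", "source": "Intranet",
     "props": {"IsDocument": True, "FileExtension": "pdf"}},
]

print([i["url"] for i in ITEMS if in_scope(i, MEDICAL_RECORDS)])
# ['http://records/patients/1042.pdf']
```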
Note
Scopes may be created within the Search Service Application as a shared scope or within a Site Collection as a local scope.
To help determine your search scopes, review your
information architecture to identify Content Types and properties that
people want to search. Create shared scopes for content in the
information architecture that is relevant for more than one Site
Collection being hosted in the farm.
To create a search scope, go to the Search Service
Application within Central Administration. Click View Scopes, and then
click New Scope. Enter a title and description, and keep the default
results page. Once the scope is created, click Add rules.
A search scope can contain one or more rules. The rules are
evaluated against the content in the index to determine which items
the scope includes in search results.
You can set rules by:
- Web address (location)
- Properties (managed properties)
- Content source (this is why it is beneficial to use separate content sources for different start addresses)
- All content (everything in the index)
For example, to set up a scope that returns information only from a
specific site, add a Web Address rule where Folder equals the site
URL. This restricts search results to a specific set of content.
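In SharePoint Server 2010, each scope rule also has a behavior (Include, Require, or Exclude) that controls how it combines with the other rules. The following Python sketch models one way these behaviors could interact, using the site example above. The site URLs are hypothetical, and the evaluation logic is an assumption for illustration, not SharePoint's documented algorithm.

```python
from enum import Enum


class Behavior(Enum):
    INCLUDE = "Include"
    REQUIRE = "Require"
    EXCLUDE = "Exclude"


def folder_rule(site_url):
    """Web Address rule modeled as 'URL is the folder or below it'."""
    base = site_url.rstrip("/")
    return lambda url: url == base or url.startswith(base + "/")


def evaluate_scope(url, rules):
    """Assumed semantics: any Exclude match removes the item, every Require
    rule must match, and if Include rules exist, at least one must match."""
    includes = [p for b, p in rules if b is Behavior.INCLUDE]
    requires = [p for b, p in rules if b is Behavior.REQUIRE]
    excludes = [p for b, p in rules if b is Behavior.EXCLUDE]
    if any(p(url) for p in excludes):
        return False
    if not all(p(url) for p in requires):
        return False
    return not includes or any(p(url) for p in includes)


# Hypothetical site scope: the finance site, minus its archive folder.
FINANCE_SCOPE = [
    (Behavior.INCLUDE, folder_rule("http://intranet/sites/finance")),
    (Behavior.EXCLUDE, folder_rule("http://intranet/sites/finance/archive")),
]

for url in [
    "http://intranet/sites/finance/budget.xlsx",
    "http://intranet/sites/hr/handbook.docx",
    "http://intranet/sites/finance/archive/2009.xlsx",
]:
    print(url, evaluate_scope(url, FINANCE_SCOPE))
```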
Search User Interface
Once the search service application is configured,
the next step is to create the user interface for search. When
designing the search experience, you need to decide
whether to create a stand-alone search portal or integrate search
features into an existing portal, such as an intranet or content
management portal. The Search Center site templates may be used in
either case. Once the search site structure is determined,
numerous components within the Site Collection combine to make up
the overall user experience by providing controls to the user for
submitting queries as well as the pieces needed to view and interact
with results. The components of the search results page in Edit mode
include:
- Search pages and search Web Parts (basic, advanced, people)
- Search results pages and search results Web Parts
- Scope display groups
- Search keywords
Keywords and Best Bets
Keywords are words or phrases that SharePoint
administrators have identified as important. They provide a way to
manually place information and links on the initial results page.
Created at the Site Collection level, keywords help to prioritize
content during search queries to display high-relevance content more
prominently in search results. Each keyword should have a definition of
the keyword that appears in search results, one or more synonymous
search terms, and one or more best bets, which are the URLs that
administrators specify as being most relevant for a particular keyword
phrase.
Searches that match keywords (or synonyms of
keywords) show the specific preselected content (definition(s) and best
bet(s)) at the top of search results. Best bets are used to highlight or
promote search results that the search administrator has determined are
more relevant for users of a Site Collection. Start with obvious keywords,
use best bets to publicize very popular sites, and continue to monitor
the effectiveness of the chosen best bets over time.
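The keyword behavior can be sketched in Python as follows. This is a simplified model, not SharePoint's implementation. It assumes a keyword fires only when the whole query matches the keyword phrase or one of its synonyms, ignoring case and extra spaces. The keyword, definition, and URLs are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Keyword:
    phrase: str
    definition: str
    synonyms: list = field(default_factory=list)
    best_bets: list = field(default_factory=list)  # URLs, in display order


def normalize(text):
    """Lowercase and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def build_page(query, keywords, organic_results):
    """Return (definition or None, result URLs with any best bets first)."""
    q = normalize(query)
    for kw in keywords:
        terms = {normalize(kw.phrase), *(normalize(s) for s in kw.synonyms)}
        if q in terms:
            promoted = list(kw.best_bets)
            # Drop organic duplicates of the best bets so nothing is listed twice.
            rest = [u for u in organic_results if u not in promoted]
            return kw.definition, promoted + rest
    return None, list(organic_results)


EXPENSES = Keyword(
    phrase="expense report",
    definition="Form used to claim travel costs.",
    synonyms=["expenses", "travel claim"],
    best_bets=["http://intranet/finance/expenses"],
)
ORGANIC = ["http://wiki/travel", "http://intranet/finance/expenses"]

print(build_page("Travel  Claim", [EXPENSES], ORGANIC))
```

The synonym "travel claim" triggers the keyword, so the definition is returned and the best bet moves to the top. The duplicate entry for the best bet is removed from the regular results.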