programming4us

Windows Server 2003 : Understanding the Indexing Service, Planning Your Indexing Service

10/3/2012 2:25:21 AM

Understanding the Indexing Service

The Indexing Service functions much as one would expect: it catalogs a set of documents, enabling dynamic full-text searches using the search function, a query form, or Microsoft Internet Explorer. Just as an index in a book maps an important word to a page inside the book, content indexing on a computer takes a word within a document and maps it back to that document. Documents to be indexed can be specified in catalogs and can include document properties as well as the actual text in the document. After the Indexing Service is set up, no ongoing maintenance is needed, and administration is required only when you need to change a basic configuration. If you didn’t include the Indexing Service in your original installation of Windows Server 2003, you can add it through Add or Remove Programs in Control Panel.

Note

By default, the Indexing Service is disabled in Windows Server 2003.


Defining Terms

When administering the Indexing Service, you’ll encounter a number of terms that have a special meaning when used in the Indexing Service context. Here are some of the most common ones, with their definitions:

  • Catalog: A directory where all temporary (word lists) and persistent (shadow and master) indexes and cached properties are stored for a particular scope.

  • CiDaemon: A child process created by the Indexing Service (cisvc.exe). CiDaemon works in the background, filtering documents for the Indexing Service.

  • Corpus: The entire collection of HTML pages and other documents indexed by the Indexing Service.

  • Filter: Part of a dynamic-link library (DLL) of filters, each designed to extract textual information and properties from a specific type of formatted document.

  • Master index: A persistent index that contains the indexed data for a large number of documents. This is usually the largest persistent data structure. In an ideal state, this is the only index present because all the indexed data is stored in the master index and there are no shadow indexes or word lists. A master index is created through a master merge.

  • Master merge: The process by which shadow indexes are combined with the current master index into a single master index. Unlike shadow merges, this is usually a fairly long process.

  • Persistent index: Data for an index that is stored on disk. Unlike word lists, which exist only in memory, a persistent index survives shutdowns and restarts. Persistent-index data is stored in a highly compressed format. There are two types of persistent indexes: shadow indexes (also referred to as saved indexes and temporary indexes) and master indexes.

  • Query: A request to search files for specific data.

  • Scan: The process by which files and directories are checked for modifications. Scanning is performed against virtual roots that have been selected for indexing.

  • Scope: The range of documents to be searched when executing a query. Physical paths or virtual roots can specify scopes.

  • Shadow index (also known as saved index): A persistent index created by merging word lists and occasionally other shadow indexes into a single index. A catalog can have multiple shadow indexes.

  • Shadow merge: The process by which word lists and shadow indexes are combined into a single shadow index. A shadow merge is performed to free up memory used by word lists and also to make the filtered data persistent.

  • Virtual root: An alias to a physical location on disk. The Indexing Service can index any directory defined as a virtual root. It can also be set up to work with a central index but point to files on other servers.

  • Word list: When a document is indexed, the index information goes first to a small temporary index, called a word list. Word lists are maintained in memory until the Indexing Service combines them into the existing indexes.

How Indexing Works

The Indexing Service uses filters that can read certain types of documents, extract the text and properties, and send that information to the indexing engine. The filters included with Windows Server 2003 index the following kinds of documents: text, HTML, Microsoft Office 95 and later, and Internet Mail and News (provided that IIS is installed). The Indexing Service can use other filters made available by software vendors. The vendor that supplies the filter also supplies installation instructions.

After extracting the text and properties, the Indexing Service determines the language the document is written in and removes words that are on the language’s exception list. The exception list contains prepositions, pronouns, articles, and so forth, and is appropriately named Noise.xxx, where xxx represents the language. Noise.xxx is in the System32 directory. Figure 1 shows a portion of the Noise.eng file, which contains the exception list for American English. You can add words to or remove words from the exception list using any text editor, such as Notepad.

Figure 1. A portion of the exception list for American English


After words from the exception list are removed, the remaining words are stored first in a word list in memory. At least once a day, the word lists are combined to form temporary saved indexes, and later the Indexing Service consolidates the temporary indexes into a single master index.
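
The flow just described (extract the text, drop the words on the exception list, store what remains in a word list) can be illustrated with a short Python sketch. This is not the Indexing Service's actual code: the NOISE_WORDS set is a tiny hypothetical stand-in for a Noise.xxx file, and filter_words and build_word_list are names invented for this example.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for a Noise.xxx exception list; the real file is longer.
NOISE_WORDS = {"a", "an", "and", "in", "of", "the", "to"}


def filter_words(text):
    """Split text into lowercase words and drop those on the exception list."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in NOISE_WORDS]


def build_word_list(documents):
    """Map each remaining word back to the set of documents that contain it."""
    word_list = defaultdict(set)
    for path, text in documents.items():
        for word in filter_words(text):
            word_list[word].add(path)
    return dict(word_list)


# Two made-up documents, keyed by file name.
docs = {
    "report.txt": "The index of the catalog",
    "notes.txt": "A catalog in memory",
}
word_list = build_word_list(docs)
```

In this sketch, a query for "catalog" would look up word_list["catalog"] and get both file names back, while "the" and "of" never reach the word list at all.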

Planning Your Indexing Service

When designing an indexing site, the first question that arises is how much storage space will be needed. Allocate disk space equal to at least 30 percent of the size of your corpus; 40 percent is better. During a master merge, the Indexing Service can temporarily need up to 45 percent of the corpus size.

Depending on the filters used to index a group of documents, the actual size of the indexes might be less than the standard 30 percent. For example, if you write a filter for indexing large documents (such as large image files), you can limit indexing to the first few hundred bytes (about all you need to get the header information), thus reducing the amount of space needed for the index.
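
As a quick worked example of the percentages above, the following Python sketch turns a corpus size into the three figures mentioned: the 30 percent minimum, the 40 percent recommendation, and the 45 percent that a master merge can temporarily need. The function name index_disk_estimate and the 20 GB corpus are assumptions made for illustration only.

```python
def index_disk_estimate(corpus_bytes):
    """Estimate index disk space (in bytes) from the corpus size in bytes."""
    if corpus_bytes < 0:
        raise ValueError("corpus_bytes cannot be negative")
    return {
        "minimum": corpus_bytes * 30 // 100,            # at least 30 percent
        "recommended": corpus_bytes * 40 // 100,        # 40 percent is better
        "master_merge_peak": corpus_bytes * 45 // 100,  # temporary need during a master merge
    }


# Hypothetical 20 GB corpus.
GB = 1024 ** 3
for name, size in index_disk_estimate(20 * GB).items():
    print(f"{name}: {size / GB:.1f} GB")
```

Size the index volume for the master-merge peak rather than the minimum, because a merge that runs out of space is the case you want to avoid.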

Note

Because most Indexing Service operations are read requests (searching the indexes, returning the results, and then accessing the actual documents), disk striping (RAID-0) or a RAID-5 array is a good way to reduce disk-bound I/O operations. 


Planning for future site growth is essential. Moving documents to larger disks to overcome space limitations can cause query errors until you are able to run a complete reindex, which can take many hours.

Another critical part of planning an Indexing Service site is to make sure that plenty of memory is available on the indexing machine. Table 1 shows the minimum memory required versus the recommended minimum amount for different quantities of documents. As usual, the more memory you have available, the better (and with the price of memory as low as it is, consider 512 MB a minimum for any edition of Windows Server 2003). With large numbers of documents, a faster CPU also speeds up indexing and searching.

Table 1. Memory requirements by number of documents indexed
Number of Documents    Minimum Memory    Recommended Memory
Fewer than 100,000     128 MB            128 MB
100,000 to 250,000     128 MB            128 MB to 256 MB
250,000 to 500,000     128 MB            256 MB to 512 MB
500,000 or more        256 MB            512 MB or more
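
If you size several indexing machines, the rows of Table 1 can be expressed as a small lookup. The Python sketch below is only an illustration; memory_guidance is an invented name. Table 1 lists 250,000 in two rows, so the sketch assumes that a count sitting exactly on a boundary belongs to the higher row.

```python
def memory_guidance(document_count):
    """Return (minimum, recommended) memory for a document count, per Table 1."""
    if document_count < 0:
        raise ValueError("document_count cannot be negative")
    if document_count < 100_000:
        return ("128 MB", "128 MB")
    if document_count < 250_000:      # assumption: exactly 250,000 uses the next row
        return ("128 MB", "128 MB to 256 MB")
    if document_count < 500_000:
        return ("128 MB", "256 MB to 512 MB")
    return ("256 MB", "512 MB or more")


minimum, recommended = memory_guidance(300_000)
print(f"Minimum: {minimum}; recommended: {recommended}")
```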

Merging Indexes

The Indexing Service automatically combines memory-resident word lists into disk-resident temporary indexes and, once a day, merges all temporary indexes into a master index. Depending on the number of temporary indexes, merging can be a long process that uses much of the CPU’s resources. Queries are slower during a merge, and other processes on the computer are slower still.
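
To make the two kinds of merge concrete, here is a minimal Python model of one catalog. It is a sketch under simplifying assumptions, not a description of the on-disk format: every index is an in-memory dictionary from a word to a set of file names, deletions and compression are ignored, and the Catalog class and merge_indexes function are names invented for this example.

```python
def merge_indexes(indexes):
    """Combine several word -> set-of-documents mappings into one."""
    merged = {}
    for index in indexes:
        for word, paths in index.items():
            merged.setdefault(word, set()).update(paths)
    return merged


class Catalog:
    """Toy model of one catalog's word lists, shadow indexes, and master index."""

    def __init__(self):
        self.word_lists = []      # temporary, memory-resident
        self.shadow_indexes = []  # persistent, but still temporary
        self.master_index = {}    # the single long-lived index

    def add_word_list(self, word_list):
        self.word_lists.append(word_list)

    def shadow_merge(self):
        """Fold all word lists into one new shadow index and free the lists."""
        if self.word_lists:
            self.shadow_indexes.append(merge_indexes(self.word_lists))
            self.word_lists = []

    def master_merge(self):
        """Fold every shadow index into the master index."""
        self.master_index = merge_indexes([self.master_index] + self.shadow_indexes)
        self.shadow_indexes = []


catalog = Catalog()
catalog.add_word_list({"catalog": {"a.txt"}})
catalog.add_word_list({"catalog": {"b.txt"}, "merge": {"b.txt"}})
catalog.shadow_merge()
catalog.master_merge()
```

After both merges the model is in the ideal state described in the definitions: there are no word lists or shadow indexes left, and the master index holds everything.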

By default, master merges are done at midnight local time. If this is unsuitable for your system, you can change the time at which the master merge is performed. You can also initiate a merge manually when a large number of documents in a catalog are changed. This section describes how to perform these two tasks.

Setting the Time to Start a Master Merge

To change the operation’s schedule from the default time, follow these steps:

1. Run Regedit.exe.

2. Navigate to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\ContentIndex.

3. In the rightmost pane of the Registry Editor window, double-click the MasterMergeTime value.

4. The DWORD Editor dialog box opens. In the Data box, type the number of minutes after midnight when a master merge should be initiated. Be sure to select Decimal from the Base options.

5. Click OK and close the Registry Editor.

Note

MasterMergeTime has a valid range of values from 0 to 1439 minutes, though no error is reported if you enter a larger value. The default is 0. When the specified number of minutes after midnight has passed, the Indexing Service initiates a master merge.
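
Because the Registry Editor reports no error for an out-of-range value, it can help to work out the number before you type it. The Python sketch below converts a clock time to minutes after midnight and rejects anything outside 0 to 1439. The function name master_merge_minutes is an invented helper, and the sketch only calculates the number; it does not write to the registry.

```python
def master_merge_minutes(hour, minute=0):
    """Convert a 24-hour clock time to minutes after midnight (0 to 1439)."""
    if not (0 <= hour <= 23 and 0 <= minute <= 59):
        raise ValueError("time must be between 00:00 and 23:59")
    return hour * 60 + minute


# A merge at 2:30 A.M. is 150 minutes after midnight.
print(master_merge_minutes(2, 30))
```

Type the resulting number in the Data box with Decimal selected, as described in step 4 of the procedure.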


Manually Merging Indexes

If a large number of documents change in a short period, you might want to perform a merge of the temporary indexes without waiting for the scheduled master merge. To initiate a merge, follow these steps:

1. Open Computer Management, and select Indexing Service in the console tree.

2. Right-click the appropriate catalog, point to All Tasks on the shortcut menu, and choose Merge. (See Figure 2.)

Figure 2. Starting a manual merge

3. You’re asked to confirm that you want to merge the catalog. Click Yes.

Setting Up an Indexing Console

If you need frequent access to the Indexing Service, set up a Microsoft Management Console (MMC) that contains the Indexing Service snap-in. To do so, follow these steps:

1. Choose Run from the Start menu. Type mmc, and press Enter.

2. Choose Add/Remove Snap-in from the File menu. Click Add.

3. In the Add Standalone Snap-In box, select Indexing Service and click Add. Select Local Computer.

4. Click Close and then OK, and you see an Indexing Service MMC like the one shown in Figure 3.

Figure 3. An Indexing Service MMC

The illustrations and examples in the following sections use the Indexing Service MMC, but you can perform the same tasks through Computer Management.
