Searching NSE items whose junction points are several folders w/out fixed locations

Mar 9, 2007 at 2:03 AM
Jerry/James-

In the RegNamespace sample a protocol handler is used to bind to the IFilter implementation for registry items, and protocol handlers seem to be best-suited for NSEs which describe something that is in a fixed location, e.g. the root URL that is indexed is always the same (reg://<UserSID>). What options are available for an NSE which describes something that's not in a fixed location and which isn't simply a file with a known extension?

For example, let's say I have an NSE that describes a collection of files in a folder. All the files in the folder comprise a database, and a given user might have several of these folders, and they can be located anywhere on disk (and moved around, etc.). Each folder is a junction point into the NSE, accomplished by having a Desktop.ini file listing the CLSID of the NSE. Thus the NSE interprets the collection of files in the folder and presents the user with the logical database contents instead of just a bunch of files.

How can the items in the NSE be indexed in the above case? An IFilter is intended to filter a file with a certain extension, and that's not what I have. A protocol handler can also be used to bind to an IFilter, like what your sample does, but my database folders can be located anywhere and can be moved around. To handle that it seems like I would have to either: 1) have each database folder be a root URL for the protocol handler or 2) make each drive a root and have the protocol handler recursively search for items in my namespace to bind to. (1) doesn't seem practical and I'm a bit fuzzy on how (2) would work.

Any suggestions for how to make an NSE like the one described in the second paragraph searchable are greatly appreciated!

Thanks,
-Matt

What is on disk (for illustration only):
C:\folder1.ext\
C:\folder1.ext\Desktop.ini
C:\folder1.ext\dbfile1.ext1
C:\folder1.ext\dbfile1.ext2
C:\folder1.ext\dbfile1.ext3
C:\folder1.ext\dbfile2.ext1
C:\folder1.ext\dbfile2.ext2
C:\folder1.ext\dbfile2.ext3
C:\folder1.ext\dbfile3.ext1
C:\folder1.ext\dbfile3.ext2
C:\folder1.ext\dbfile3.ext3
C:\folder1.ext\images.db
C:\folder1.ext\index.db
C:\folder1.ext\permissions.file
...

In Windows Explorer the NSE presents the directory C:\folder1.ext\ as containing:
FriendlyName1
FriendlyName2
FriendlyName3

Coordinator
Mar 9, 2007 at 8:07 AM
Edited Mar 9, 2007 at 8:14 AM
Matt-

This is a tough problem, and the issues you're facing are similar to the ones that prevented us from indexing the contents of zip files.

I will post your message to an internal MS alias to see if I can generate some ideas.

In the meantime, here are some things that you may investigate:

Maybe you could create a single start address with the search service. Your protocol handler would have to enumerate all physical drives and do a recursive crawl of all of them looking for your particular junction points. There are obvious perf implications here.
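
To make the recursive crawl idea a bit more concrete, here is a rough sketch of just the discovery step, written in Python for brevity rather than as actual protocol handler code. It assumes the junction point is marked by a `CLSID=` entry in the `[.ShellClassInfo]` section of Desktop.ini; the CLSID value and the function names are placeholders, not anything taken from the RegNamespace sample.

```python
import configparser
import os

# Placeholder CLSID (assumption); substitute the CLSID your NSE registers.
NSE_CLSID = "{00000000-0000-0000-0000-000000000000}"


def desktop_ini_clsid(ini_path):
    """Return the lower-cased CLSID named in a Desktop.ini, or "" if none."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    try:
        parser.read(ini_path)
    except (configparser.Error, UnicodeDecodeError):
        # Malformed file or an encoding this sketch does not handle.
        return ""
    value = parser.get(".ShellClassInfo", "CLSID", fallback="")
    return value.strip().lower()


def find_junction_points(root, clsid=NSE_CLSID):
    """Yield each folder under root whose Desktop.ini names the given CLSID."""
    wanted = clsid.lower()
    for folder, subdirs, files in os.walk(root):
        # Match the file name case-insensitively, as Windows would.
        ini = next((f for f in files if f.lower() == "desktop.ini"), None)
        if ini and desktop_ini_clsid(os.path.join(folder, ini)) == wanted:
            yield folder
            # The NSE owns everything below this folder, so stop descending.
            subdirs.clear()
```

Calling `find_junction_points("C:\\")` once per drive would produce the list of database folders to hand to the handler. Every folder on the drive is still visited, which is where the perf cost comes from.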

There are several types of crawls that the Windows Search service can perform. RegNamespace uses an incremental crawl, meaning that a full crawl of the data source is done on a schedule. There are also notification-based crawls, which notify the Windows Search service of changes so that only changed items are updated in the index. The protocol handler for the file system uses notifications, and those notifications are driven by the USN Change Journal. You may be able to leverage the change journal to detect when your junction points are moved around. There isn't much available on MSDN for this yet, but check out IGatherNotifyInline.

-Jerry

Coordinator
Mar 9, 2007 at 8:19 PM
Matt-

Here is a response from a guy on the Windows Desktop Search team:

"I would make a protocol handler that knows how to look for this NSE junction point and crawls the disk looking for them. When he finds one, he would implement the IFilter directly in the protocol handler to query the data and emit the sub-items. You can use an external IFilter implementation because it's not locked to only one file.

To be more efficient, he could create a unique file extension in his NSE folder and an IFilter that will emit a DIR_LINK with his NSE protocol handler. This way, when the file system crawls the disk (it knows nothing about NSEs), it will emit a notification to his protocol handler to process the items under that NSE folder."
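
As a rough illustration of the second suggestion, the sketch below shows only the path-to-URL mapping such an IFilter might perform when it meets the marker file: it takes the marker's path and builds the URL of the containing folder under the NSE's protocol. The `mynse` scheme, the `db.nsemarker` file name, and the URL layout are all made up for the example; the real link format is whatever your protocol handler is written to accept.

```python
from pathlib import PureWindowsPath
from urllib.parse import quote

# Hypothetical URL scheme served by the NSE protocol handler (assumption).
SCHEME = "mynse"


def link_url_for_marker(marker_path):
    """Map a marker file path to the URL of its containing database folder."""
    folder = PureWindowsPath(marker_path).parent
    # Use forward slashes and percent-encode everything except '/' and ':'.
    encoded = quote(folder.as_posix().rstrip("/"), safe="/:")
    return f"{SCHEME}:///{encoded}/"
```

With these assumptions, `link_url_for_marker(r"C:\folder1.ext\db.nsemarker")` gives `mynse:///C:/folder1.ext/`, which is the address the protocol handler would then be asked to crawl.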

Hope this helps.

-JJ