Search by Page module for Drupal

This module adds page-oriented searching to the core Drupal Search module.
It can be used as an additional tab in the core Search page, or you can
display Search by Page separately (though it requires the core Search
module to be enabled, because it uses Search for its indexing).

Contents of this file:
- How it works
- Setup and Configuration
- Usage
- Theming
- Users and roles
- Other suggestions


-- How it works --

The core Search module works by indexing your content whenever cron is run,
and then looking in that index when someone requests a search on your site.
The Content tab of Search indexes all the content items on your site managed
by the core Node module, by loading the content item and indexing the
resulting content (including the body and other fields, as well as comments,
following the display settings on the content type). The User portion
actually doesn't index users -- at search time, it just looks for a user
name that matches (User Search doesn't look in any other profile fields,
except that for an administrative user doing a user search, it looks in the
email field as well). Other modules may add other tabs to Search as well.

Search by Page, in contrast, indexes the content of pages on your site,
which could be content pages, user profile pages, composites of content
(such as Views), or pages that are generated by other modules. The indexing
in Search by Page is done by first building the "content" region of each
page to be indexed, in the language(s) appropriate to the page and with the
viewpoint of the role you configure, and then adding that content to the
Search index. Note that only the "content" region of the page is indexed,
not the sidebars, header, footer, or other block regions in your theme. Also
note that what is indexed is what is output by your theme for each page, in
contrast to core Search, which does not depend on the theme's rendering of
the page.

Search by Page also restricts search results to the currently-enabled
language. The core Search module only does this for content search, and only
if you have the Internationalization module enabled.

One other difference between Search by Page and the usual content search of
the core Search module is in reindexing. Search by Page assumes that your
page content might change over time, so it periodically reindexes the pages
on your site, giving priority to pages that have been edited. In contrast,
the core Search module assumes that if a content item hasn't been edited, it
doesn't need to be reindexed. You have some additional control over this
reindexing -- see the configuration section below.

Your site may experience errors during content indexing, which you can see
in the Recent Log Entries report. The typical reason is that the item being
indexed cannot be viewed by the user role you chose for indexing; other
errors are also possible. If this happens, Search by Page will still mark
the page as "indexed" in the search index, so that the next cron run does
not retry the failed page and hold up the indexing of pages that do work.
If you ever want Search by Page to try indexing the failed pages again
(after fixing the cause of the error, presumably), there is a link to reset
items with no content in the index. This is located in the "Additional
Actions" section of the Search by Page configuration screen.


-- Setup and Configuration --

Search by Page does not know what the pages of your site are, so it doesn't
index anything by itself. You will need to enable and configure at least one
sub-module that lets you add paths to the search index, in order for this
module to do anything. You will also need to set up one or more search
"environments". Each environment defines which paths are searchable, and has
its own search URL and search block. You will also need to make sure that
Search by Page is enabled on the main Search configuration page.

Four sub-modules are provided:
- "Paths" indexes arbitrary paths to pages on your site.
- "Nodes" indexes content items of particular content types.
- "Users" indexes user profile pages for users of particular roles.
- "Attachments" indexes files attached to content items.

The "Paths" sub-module is the most generic, but if you put a lot of paths in
it, your searches will run slower. (The technical reason is that each time
someone searches on your site, this module has to check whether that person
has permission to view each page in the list, to exclude pages the person
doesn't have permission to view from search results, and this has to be done
via a PHP loop rather than an SQL query because of how Drupal permissions
work.)

IMPORTANT NOTE: If you are using Search by Page Paths, your database must be
set up with permission to create temporary tables.

The "Attachments" sub-module indexes the text in certain types of files that
are attached to content items via the core File field. This requires
"helper" programs to extract the text from file attachments, and the helper
programs are configured using the separate Search Files module (which you
can find at http://drupal.org/project/search_files). It is recommended that
you only enable the Search Files API module (and not the other included
modules). This will enable just the helper program setup functionality,
without enabling the other functionality of Search Files.

If you want to write your own sub-modules, see the search_by_page.api.php
file included with this module (or use one of the included sub-modules as an
example).

Once you have enabled sub-module(s), visit the path
admin/config/search/search_by_page to set up search environments and define
pages to index for each environment. Then wait for cron to run (or visit the
status report page, admin/reports/status, and click on "run cron manually").
No pages will be indexed until cron has run, and no search results will come
out until pages have been indexed.
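
As a rough illustration of why a long "Paths" list slows searching (see the
note on the "Paths" sub-module above), the per-path permission check has
roughly the shape sketched below. This is a Python sketch of the idea only,
not the module's PHP code: the function names, the PATH_ACCESS table, and the
sample paths and roles are all hypothetical stand-ins for Drupal's access
checks.

```python
# Illustrative sketch only. The real module is written in PHP and relies on
# Drupal's access system; every name and value below is hypothetical.

# Hypothetical map of indexed path -> roles allowed to view that page.
PATH_ACCESS = {
    "about": {"anonymous", "authenticated"},
    "members/handbook": {"authenticated"},
    "admin/notes": {"administrator"},
}


def user_can_view(path, roles):
    """Stand-in for a per-page access check that cannot be done in SQL."""
    return bool(PATH_ACCESS.get(path, set()) & set(roles))


def viewable_paths(indexed_paths, roles):
    """Check every indexed path in turn, so cost grows with the list size."""
    return [path for path in indexed_paths if user_can_view(path, roles)]


if __name__ == "__main__":
    print(viewable_paths(list(PATH_ACCESS), ["authenticated"]))
```

Because each path is checked one at a time on every search, the work done
per search in this sketch is proportional to the number of paths listed.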

Other configuration options:

* You can change various labels and other text on the Search by Page
  configuration pages.

* You can set the number of items Search by Page will index per cron run on
  the Search by Page configuration pages. This is independent of the
  indexing settings for the core Search module. If you are using Search by
  Page as an independent search (rather than as a tab on the core Search
  page -- see the Usage section below), you might want to set the core
  Search settings cron limit to zero, and turn off searching for the core
  Node and User modules (on the core Search Settings page), so that only
  Search by Page items are added to the search index.

* You can control the reindex cycling described in the "How it works"
  section above by using the minimum/maximum reindexing time settings, which
  are on a per-module, per-environment basis. Setting the minimum reindexing
  time forces Search by Page to wait at least this amount of time before
  reindexing that type of page. Setting the maximum reindexing time forces
  Search by Page to reindex that type of page immediately when this amount
  of time has passed. WARNING: Do not choose too small a maximum reindexing
  time globally! This setting works by marking the pages for immediate
  reindexing when this time has passed, and it can interfere with the
  indexing of new content.

* You can exclude the contents of specific HTML tags from indexing.

* You will also need to set permissions, which are separate from the core
  Search permissions.

* You should also visit the main Search configuration screen, where you can
  set options such as the number of items to index each cron run for core
  Search modules, and the minimum word size for searching. You can also
  watch the progress of indexing on that page (there is a detailed table
  near the bottom, in the Search by Page section).
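
The minimum/maximum reindexing time settings described above can be pictured
with the small Python sketch below. It only models the behavior as this file
describes it and is not the module's actual code; the function name, the
three status labels, and the use of None to mean "no maximum set" are
assumptions made for the illustration.

```python
# Illustrative sketch of the min/max reindexing rules described in this
# file. Names, labels, and the None convention are hypothetical.

DAY = 24 * 60 * 60  # seconds in a day


def reindex_status(last_indexed, now, min_time, max_time=None):
    """Classify a page as 'wait', 'eligible', or 'immediate'.

    All times are in seconds. max_time=None is assumed here to mean that
    no maximum reindexing time has been set.
    """
    age = now - last_indexed
    if max_time is not None and age >= max_time:
        return "immediate"  # maximum passed: marked for immediate reindexing
    if age < min_time:
        return "wait"       # minimum not yet reached: leave the page alone
    return "eligible"       # may be reindexed in the normal cycle


if __name__ == "__main__":
    for days in (0.5, 2, 8):
        print(days, reindex_status(0, days * DAY, 1 * DAY, 7 * DAY))
```

In this model, a very small maximum puts nearly every page in the
"immediate" group on each cron run, which is one way to picture the warning
above about interfering with the indexing of new content.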


-- Usage --

You have two choices for how to use the module, once you have it set up:

a) There will be a new tab (called "Pages" by default), included in Drupal's
   built-in search. If a site visitor performs a search from that tab, they
   will get the Search by Page results. This will use whichever search
   environment you have set as the default, and may be useful if you just
   want to use Search by Page to add a few pages or files to Drupal's
   existing search functionality.

b) You can also use Search by Page as a standalone search. You will need to
   set up your search environment(s) so that all the content you want to
   search is available. To run Search by Page on its own, enable the Search
   by Page blocks for your search environments, and/or add a link to the
   paths you have defined for your search environments to your navigation
   menu system. You will also need to make sure that Search by Page is an
   enabled search module on the Search configuration page
   (admin/config/search/settings).


-- Theming --

The search form that is used by Search by Page on search pages and search
blocks can be themed using the search-by-page-form.tpl.php file provided
(copy that file into your theme and modify it).

Search results are themed using the search-result.tpl.php (each result item)
and search-results.tpl.php (the list of results) theme files from the core
Search module (in directory modules/search in your Drupal installation). If
you are using Search by Page Attachments, an additional variable,
$result['related_node'], is available; it gives you the node object that the
attachment is attached to.


-- Users and roles --

When you set up content items, attachments, etc. for searching within Search
by Page, you will need to choose a role to use for search indexing. This
will make Search by Page render your pages from the point of view of a user
with that role. In order to do this, assuming you have used a non-anonymous
role, Search by Page will create its own user accounts for internal use,
which you will see on your Users management page. For instance, if you set
up Search by Page Nodes to index from the point of view of role "My role",
Search by Page will set up a user called "sbp indexing My role" with role
"My role".

The users that Search by Page sets up will always have their status set to
"blocked". During search indexing, the account is set to "active" only
temporarily, and only for the indexing process, so no one should ever be
able to see these users except site administrators.


-- Other suggestions --

The default behavior for Drupal's core Search module (which is the
technology used for indexing/searching in Search by Page) is that only exact
matches are returned (except for the User search portion of core Search,
which matches substrings of user names). For instance, this means that if
you search for "quake", and a page contains "quakes", "quaking", or
"earthquake", it will not be matched.

To get around this limitation, I suggest using a "stemmer" module, such as
  http://drupal.org/project/porterstemmer
(You can search for "stemmer" on drupal.org to find stemmers for other
languages.) Stemmers enable matching on inflected forms of words (verb
forms, plurals, etc.), so they should give you matches for "quaking" and
"quakes" if you search for "quake". They wouldn't give you a match for
"earthquake", however.
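
The difference between exact matching and stemmed matching can be seen in
the toy Python sketch below. It is only an illustration: naive_stem is a
deliberately crude suffix stripper invented for this example, not the Porter
algorithm and not what any Drupal stemmer module actually does.

```python
# Toy illustration of exact vs. stemmed matching. naive_stem is a
# hypothetical, English-only suffix stripper, NOT the Porter algorithm.

SUFFIXES = ("ing", "es", "s", "e")


def naive_stem(word):
    """Strip one common suffix, keeping at least three leading letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def matches(query, word, stem=False):
    """Compare whole words, optionally after reducing both to a stem."""
    if stem:
        return naive_stem(query) == naive_stem(word)
    return query.lower() == word.lower()


if __name__ == "__main__":
    for word in ("quakes", "quaking", "earthquake"):
        print(word, matches("quake", word), matches("quake", word, stem=True))
```

In this sketch, "quake", "quakes", and "quaking" all reduce to the same stem,
while "earthquake" reduces to a different one, so it still does not match a
search for "quake".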