I've recently completed a project for Swiss Television and Radio. Swiss Television currently uses a Lucene-based search solution, but after comparing it with standard Google search we decided that redeveloping the Lucene solution would cost more than switching to the Google Search Appliance (GSA).
The GSA is delivered as a snazzy yellow rack-mounted server that looks like a big piece of Swiss cheese. It should be connected to a UPS (uninterruptible power supply). Initial setup can be done from a laptop over a crossover network cable. You will want to configure your local DNS, SMTP (mail) and NTP (time) servers. You can also configure HTTP, ping and traceroute (ICMP) targets for diagnosing network errors; in short, the basic tests you need to check network connectivity. In my case the setup was performed by our sysadmin. The interface is multilingual: we had a French edition, and most, but not all, of the text was in French.
If more than one person is going to develop the search application, it probably makes sense to set up some additional user accounts, as well as to change the default admin password from "test". There are two levels of user account: administrators and managers. People who will only be configuring search front ends should be given a manager account; this stops them doing anything they shouldn't, such as deleting front ends and collections, and it simplifies the user interface. You can also prevent browsers from storing this account information locally.
Now it is time to give the Google spider some food. The GSA can consume a wide variety of sources, including public web content (probably already indexed by Google and others) and documents on company intranets, such as databases, CMSs and even Google Apps such as Docs and Gmail.
You have to supply a list of start URLs telling the GSA where to begin crawling, plus URL patterns that the URLs it discovers must match.
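To make the start URL and pattern idea concrete, here is a simplified Python sketch of how a crawler might decide whether a discovered URL is in scope. The pattern list is hypothetical, and the three styles (a plain host/path prefix, a "contains:" substring and a "regexp:" regular expression) are only loosely modelled on the GSA admin console; the appliance's real matching rules have more variants and their own exact semantics, so treat this as an illustration, not a reimplementation.

```python
import re

# Hypothetical follow patterns. The "contains:" and "regexp:" prefixes are
# loosely modelled on the GSA admin console; the matching here is simplified.
FOLLOW_PATTERNS = [
    "www.mysite.ch/news/",
    "contains:/sport/",
    r"regexp:\.html?$",
]


def strip_scheme(url: str) -> str:
    """Drop 'http://', 'https://' etc. so plain patterns act as host/path prefixes."""
    return re.sub(r"^[a-zA-Z][a-zA-Z0-9+.-]*://", "", url)


def matches(url: str, pattern: str) -> bool:
    """Check one URL against one pattern (simplified semantics, see above)."""
    if pattern.startswith("contains:"):
        return pattern[len("contains:"):] in url
    if pattern.startswith("regexp:"):
        return re.search(pattern[len("regexp:"):], url) is not None
    # Plain pattern: treat it as a prefix of the URL without its scheme.
    return strip_scheme(url).startswith(pattern)


def in_scope(url: str, patterns=FOLLOW_PATTERNS) -> bool:
    """A discovered URL is crawled only if it matches at least one pattern."""
    return any(matches(url, p) for p in patterns)
```

For example, in_scope("http://www.mysite.ch/news/2009/story") is accepted by the plain prefix, while an image such as http://www.other.ch/img/logo.png matches none of the patterns and would be skipped.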
It is worth noting that the GSA respects nofollow tags and robots.txt.
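If you want to check in advance whether robots.txt will keep the spider away from part of a site, Python's standard urllib.robotparser applies the same kind of rules. This sketch covers robots.txt only, not nofollow tags. The robots.txt content is a made-up example, and the agent name "gsa-crawler" is my assumption about what the appliance sends, so check your web server logs for the actual user agent.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt that keeps every crawler out of /intern/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /intern/
"""


def crawl_allowed(url: str, agent: str = "gsa-crawler") -> bool:
    """Return True if the robots.txt above lets `agent` fetch `url`.

    The default agent name is an assumption, not confirmed GSA behaviour.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)
```

In a real check you would point RobotFileParser at the site's live robots.txt with set_url() and read() instead of parsing a string.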
Swiss TV runs a big newsroom that constantly updates current events in its eScenic CMS, so one of the requirements was timely updates of the GSA index. One way of doing this is via a connector, a custom piece of code that links the CMS to the GSA. eScenic doesn't (yet) have a connector, but the GSA can work with web feeds, such as the RSS feed from the CMS.
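If the appliance needs the feed in its own XML feed format ("gsafeed") rather than raw RSS, the conversion is small. Below is a minimal Python sketch, assuming a plain RSS 2.0 input, that turns each item's link into a record of a web feed so the appliance recrawls those URLs. The header values (datasource "web", feedtype "incremental") and the DOCTYPE reflect my understanding of the GSA Feeds Protocol and should be checked against the documentation for your GSA version; pushing the result to the appliance is left out.

```python
import xml.etree.ElementTree as ET


def rss_to_gsafeed(rss_xml: str, datasource: str = "web",
                   feedtype: str = "incremental") -> str:
    """Convert RSS 2.0 <item><link> entries into a GSA web feed document.

    Assumption: element names and header values follow the GSA Feeds
    Protocol as I understand it; verify them for your appliance version.
    """
    rss = ET.fromstring(rss_xml)
    feed = ET.Element("gsafeed")
    header = ET.SubElement(feed, "header")
    ET.SubElement(header, "datasource").text = datasource
    ET.SubElement(header, "feedtype").text = feedtype
    group = ET.SubElement(feed, "group")
    # One record per RSS item that actually has a link.
    for item in rss.iter("item"):
        link = item.findtext("link")
        if link:
            ET.SubElement(group, "record", url=link.strip(), mimetype="text/html")
    body = ET.tostring(feed, encoding="unicode")
    return ('<?xml version="1.0" encoding="UTF-8"?>\n'
            '<!DOCTYPE gsafeed PUBLIC "-//Google//DTD GSA Feeds//EN" "">\n'
            + body)
```

Since only the URLs are sent, the appliance still fetches and indexes each page itself, so the feed mainly tells it which pages are new or changed.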
You may want to include images, video or audio in your web search. Standard Google search shows an image where it finds a video embedded in a web page, as a hint to searchers that the page includes video content. It seems that the Google indexer has been modified to look for links to video files (.flv, .avi, .mp4, etc.) and to extract a frame to use as a thumbnail.
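The first half of that behaviour, spotting links to video files, is easy to illustrate. This Python sketch is not Google's implementation; it simply scans a page with the standard HTMLParser for href or src attributes ending in one of the extensions mentioned above and resolves them against the page URL. Extracting a thumbnail frame would need a video tool and is not covered.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Extensions taken from the text; extend the tuple as needed.
VIDEO_EXTENSIONS = (".flv", ".avi", ".mp4")


class VideoLinkFinder(HTMLParser):
    """Collect absolute URLs of links or embeds that point at video files."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.videos = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Videos may be linked (<a href>) or embedded (<embed src>, <source src>).
        for name in ("href", "src"):
            value = attrs.get(name)
            if value and urlparse(value).path.lower().endswith(VIDEO_EXTENSIONS):
                self.videos.append(urljoin(self.base_url, value))


def find_video_links(html: str, base_url: str) -> list:
    finder = VideoLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.videos
```

Checking the URL path rather than the whole URL means that query strings such as ?x=1 do not hide the file extension.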
The Google Search Appliance's indexer does not do this (as of version 6.2). This is probably the right decision: video indexing would add to the workload of what is essentially an enterprise search appliance. There is a workaround using metadata, and MTV.com, which uses the Google Search Appliance, is an example of how to do this.
Searching for, say, "cilmi" shows a results page where some of the results have thumbnails. If you open these pages and look at the source, you will see a lot of meta tags, for example the number of views and a link to the thumbnail:
<meta name="mtvn_views" content="1,102"/>
<meta name="thumbnail" content="/shared/promoimages/bands/c/cilmi_gabriella/sweet_about_me/140x105.jpg"/>
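To check which meta tags a page exposes before wiring them into the front end, a short Python sketch using only the standard library's HTMLParser can list the name/content pairs. These are the pairs that end up as the N and V attributes of MT elements in the results XML that the front end's XSLT works on.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collect <meta name="..." content="..."> pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta .../> tags are also routed here by HTMLParser.
        if tag != "meta":
            return
        attrs = dict(attrs)
        name, content = attrs.get("name"), attrs.get("content")
        if name and content is not None:
            self.meta[name] = content


def meta_tags(html: str) -> dict:
    collector = MetaTagCollector()
    collector.feed(html)
    collector.close()
    return collector.meta
```

Feeding it the two MTV tags shown above returns a dictionary with the keys mtvn_views and thumbnail.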
It is possible to extract this meta information in the front end's XSLT. You can display all the meta tags returned for each result by setting the show_meta_tags variable to 1 (true):
<xsl:variable name="show_meta_tags">1</xsl:variable>
The following block of code loops through all the meta tags (MT elements) of a result, looking for one named "thumbnail" (@N) with non-empty content (@V). It then generates an HTML anchor element with the image as the clickable element.
<xsl:for-each select="MT">
  <xsl:if test="@N='thumbnail' and @V!=''">
    <a href="{$protocol}://{$escaped_url}">
      <img align="left" height="60" width="80" src="{@V}"/>
    </a>
  </xsl:if>
</xsl:for-each>
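The same logic can be prototyped outside the appliance. This Python sketch walks a results document shaped the way I understand the GSA's XML output (R elements containing a U URL and MT elements with N/V attributes) and produces the same anchor markup as the XSLT. The sample document is hand-written, not captured from a real appliance. One deliberate difference: the MTV thumbnail paths are relative, so on a search page served from another host they would not resolve; here they are joined with the result URL.

```python
import xml.etree.ElementTree as ET
from html import escape
from urllib.parse import urljoin


def thumbnail_links(results_xml: str) -> list:
    """Return one <a><img/></a> snippet per result with a non-empty thumbnail.

    Assumes the R/U/MT layout described in the lead-in.
    """
    root = ET.fromstring(results_xml)
    links = []
    for result in root.iter("R"):
        url = result.findtext("U", default="")
        for mt in result.findall("MT"):
            # Same test as the XSLT: @N='thumbnail' and @V!=''
            if mt.get("N") == "thumbnail" and mt.get("V", "") != "":
                src = urljoin(url, mt.get("V"))  # resolve relative thumbnail paths
                links.append(
                    f'<a href="{escape(url)}"><img align="left" height="60" '
                    f'width="80" src="{escape(src)}"/></a>'
                )
    return links
```

Results without a thumbnail tag, or with an empty one, simply produce no snippet, matching the xsl:if in the stylesheet.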
Meta tags can also be used to create collections, which are essentially views or subsets of the search index. Thus we could have a collection just for videos.
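Collections are set up in the admin console. For an ad hoc subset at query time, the GSA search protocol also has, as far as I know, a requiredfields parameter that restricts results to documents carrying particular meta tags. This Python sketch assumes that syntax ("name" requires the tag to exist, "name:value" requires that value, "." joins conditions with AND); the "type" tag in the example is hypothetical, and the syntax should be verified against the search protocol reference for your version.

```python
from urllib.parse import urlencode


def subset_query_url(base: str, q: str, required: dict,
                     frontend: str = "test_fe") -> str:
    """Build a search URL restricted to results with the given meta tags.

    `required` maps a meta tag name to a required value, or to None when
    the tag only has to be present. Assumed requiredfields syntax: '.'
    means AND, so values containing '.' or ':' are not handled here.
    """
    conditions = [name if value is None else f"{name}:{value}"
                  for name, value in required.items()]
    params = {
        "q": q,
        "client": frontend,
        "proxystylesheet": frontend,
        "requiredfields": ".".join(conditions),
    }
    return f"{base}?{urlencode(params)}"
```

For instance, subset_query_url("http://www.mysite.ch/search", "cilmi", {"thumbnail": None}) asks only for results that have a thumbnail meta tag, which is a rough stand-in for a video view.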
The Google Search Appliance is supplied with a default front end called, unsurprisingly, default_frontend. On my Swiss version of the system this is found under the Frontaux (Front Ends) menu item. The default front end looks pretty much like the good old Google search engine; the tabs are actually links to Google's online search engines. In fact, the only real difference you'll notice is the line saying it is powered by the Google Search Appliance.
The default front-end XSLT is around 4,000 lines long. Parsing a stylesheet of this size takes a lot of compute resources, so once it has been loaded the search appliance caches it. As a result, the changes you make will not be visible straight away in the test center or the search interface; the GSA seems to cache the stylesheet for about a day. However, you can force a reload using the proxyreload query parameter:
http://www.mysite.ch/search?client=test_fe&proxystylesheet=test_fe&proxyreload=1
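While you are iterating on a stylesheet it is handy to generate these test URLs rather than typing them. This small Python helper builds the URL shown above from the front end name, using only the client, proxystylesheet, q and proxyreload parameters that appear in this post; the host and the example query are placeholders.

```python
from urllib.parse import urlencode


def search_url(host: str, frontend: str, query: str = "",
               reload_stylesheet: bool = False) -> str:
    """Build a GSA search URL that renders results with `frontend`'s XSLT."""
    params = {"client": frontend, "proxystylesheet": frontend}
    if query:
        params["q"] = query
    if reload_stylesheet:
        # Ask the appliance to re-read the XSLT instead of using its cached copy.
        params["proxyreload"] = "1"
    return f"http://{host}/search?{urlencode(params)}"
```

Keep proxyreload out of the URLs your production search form sends: forcing a reload on every request would defeat the stylesheet cache described above.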
The Fly Shop example uses an iframe to display the search results inside the form page. I'm not a big fan of iframes, but they are quick and dirty.
<form method="GET" target="ResultFrame" action="http://www.mysite.ch/search">
However, if you click on a result link, the target page opens inside the iframe, which is not necessarily what you want. To replace the search form with the target page, go to the XSLT, find the following code that writes the opening anchor tag for each result link, and insert target="_parent" just before the href attribute so that it reads:
<xsl:if test="$link">
  <xsl:text disable-output-escaping='yes'>&lt;a target="_parent" href="</xsl:text>