Differences

This shows you the differences between two versions of the page.

tech:search:google-search-appliance [2010/03/01 13:05]
davidof
tech:search:google-search-appliance [2010/06/01 10:50] (current)
davidof
Line 3: Line 3:
===== Introduction ===== ===== Introduction =====
-I'm currently working on implementing search for Swiss Television and Radio. Swiss Television currently uses a Lucene based solution but after comparison with standard Google search we decided that redeveloping the Lucene solution would be more costly than opting for Google Search Appliance (GSA).+I've recently completed a project for Swiss Television and Radio. Swiss Television currently uses a Lucene based solution but after comparison with standard Google search we decided that redeveloping the Lucene solution would be more costly than opting for Google Search Appliance (GSA).
===== Set-up ===== ===== Set-up =====
-The GSA is delivered as a snazzy yellow rack-mounted server that looks like a big piece of Swiss cheese. It should be connected to a UPS (uninterrupteble power supply). Initial setup can be done via a laptop and a crossover network cable. You will want to configure your local DNS, SMTP (mail) server and NTP (time) server. You can also configure an HTTP, Ping and Traceroute (ICMP) target for diagnosing network errors. In short, the basic tests you would need to analyze network connectivity. In my case the set-up was performed by our SysAdmin. The interface is multilingual. We had a French edition and most, but not all, of the text was in French.+The GSA is delivered as a snazzy yellow rack-mounted server that looks like a big piece of Swiss cheese. It should be connected to a UPS (uninterruptible power supply). Initial setup can be done via a laptop and a crossover network cable. You will want to configure your local DNS, SMTP (mail) server and NTP (time) server. You can also configure an HTTP, Ping and Traceroute (ICMP) target for diagnosing network errors. In short, the basic tests you would need to analyze network connectivity. In my case the set-up was performed by our SysAdmin. The interface is multilingual. We had a French edition and most, but not all, of the text was in French.
==== User Accounts ==== ==== User Accounts ====
Line 29: Line 29:
Swiss TV runs a big news room with constant updates of current events in their eScenic CMS. One of the requirements was timely updates of the GSA index. One way of doing this is via a connector, a custom bit of code that links the CMS to the GSA. eScenic doesn't (yet) have a connector; however, the GSA can work with Web feeds, such as the RSS feed from the CMS. Swiss TV runs a big news room with constant updates of current events in their eScenic CMS. One of the requirements was timely updates of the GSA index. One way of doing this is via a connector, a custom bit of code that links the CMS to the GSA. eScenic doesn't (yet) have a connector; however, the GSA can work with Web feeds, such as the RSS feed from the CMS.
 +====== Multimedia ======
 +
 +You may want to include images, video or audio in your web search. Standard Google search will include an image where it finds a video embedded in a web page as a hint to searchers that the page includes video content. It seems that the Google indexer has been modified to search for links to video files (.flv, .avi, .mp4, etc.) and to extract an image frame to use as a thumbnail.
 +
 +The Google Search Appliance's indexer does not do this (as of version 6.2). This is probably the right design choice, since video indexing would add to the workload of what is essentially an enterprise search appliance. There is a workaround using metadata. MTV.com, which uses the Google Search Appliance, is an example of how to do this.
 +
 +Searching MTV.com for, say, cilmi shows a results page where some of the results have thumbnails. If you click through to these pages and look at the source you will see a lot of meta tags, for example the number of views and a link to the thumbnail:
 +
 +<code xml>
 +<meta name="mtvn_views" content="1,102"/>
 +<meta name="thumbnail" content="/shared/promoimages/bands/c/cilmi_gabriella/sweet_about_me/140x105.jpg"/>
 +</code>
 +
 +It is possible to extract this meta information in the front end's XSLT. You can show all the meta tags in a page by changing the show_meta_tags variable to 1 (true).
 +
 +<xsl:variable name="show_meta_tags">1</xsl:variable>
 +
 +The following block of code loops through all the meta tags (MT elements) in the page looking for the name "thumbnail" with a non-empty content value (@V). It then generates an HTML anchor element with the image as the clickable element.
 +<code xml>
 +<xsl:for-each select="MT">
 +    <xsl:if test="@N='thumbnail' and @V!=''">
 +        <a href="{$protocol}://{$escaped_url}">
 +            <img align="left" height="60" width="80" src="{@V}"/>
 +        </a>
 +    </xsl:if>
 +</xsl:for-each>
 +</code>
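 +
 +A minimal sketch along the same lines, assuming the page also carries the mtvn_views meta tag shown earlier and that the search request asks the appliance to return meta tags with each result (this is believed to be what the getfields query parameter does, but treat that as an assumption and check the search protocol reference). The "views" class name is hypothetical and not part of the default stylesheet.
 +<code xml>
 +<!-- Hypothetical addition: print the view count for results that carry an mtvn_views meta tag -->
 +<xsl:for-each select="MT[@N='mtvn_views' and @V!='']">
 +    <span class="views"><xsl:value-of select="@V"/> views</span>
 +</xsl:for-each>
 +</code>
 +
 +The predicate on MT does the same filtering as the xsl:if in the thumbnail block, just in a single select expression.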
 +
 +Meta tags can also be used to create collections, which are essentially views or subsets of the search index. Thus we could have a collection concerned only with videos.
 +
 +===== Collections =====
 +
 +
 +====== Front Ends ======
 +
 +The Google Search Appliance is supplied with a default front end called, not surprisingly, default_frontend. On my Swiss version of the system this is found under the Frontaux (Front Ends) menu item. The default front end looks pretty much like the good old Google search engine. The tabs are actually links to Google's online search engines. In fact the only real difference you'll notice is that it is powered by the Google Search Appliance.
 +
 +====== Google Search Appliance Problems ======
 +
 +==== Frontend XSLT not getting refreshed? ====
 +
 +The default front-end XSLT is around 4000 lines. Parsing it and generating a results page takes a lot of compute resources, so once this is done it gets cached by the search appliance. As a result, changes you make will not be visible straight away in the test center or search interface. The GSA seems to cache for about a day. However, you can force a reload using the proxyreload query parameter:
 +
 +http://www.mysite.ch/search?client=test_fe&proxystylesheet=test_fe&proxyreload=1
 +
 +===== Breaking out of the iFrame =====
 +
 +The Fly Shop example uses an iFrame to redirect the results of the search to the form page. I'm not a big fan of iFrames but they are quick and dirty.
 +
 +<code html>
 +<form method="GET" target="ResultFrame" action="http://www.mysite.ch/search">
 +</code>
 +
 +However if you click on a link the page appears within the iFrame, which is not necessarily what you want. To replace the search form with the target page, go to the XSLT and find the code that writes the opening anchor tag, shown here with the change already applied:
 +
 +<code xml>
 +<xsl:if test="$link">
 +  <xsl:text disable-output-escaping='yes'>&lt;a target="_parent" href="</xsl:text>
 +</code>
 +
 +The change is the target="_parent" inserted just before the href attribute.
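 +
 +An alternative sketch for cases where you would rather not edit each link: HTML's base element can set a default target for every link in the results page. Where the default stylesheet writes the head section is not covered here, so the placement below is an assumption rather than something taken from the GSA stylesheet.
 +<code xml>
 +<!-- Assumed placement: inside the <head> that the front-end XSLT generates -->
 +<head>
 +  <!-- Links in this page open in the parent window instead of the iFrame -->
 +  <base target="_parent"/>
 +</head>
 +</code>
 +
 +Note that base also sets the default target for forms, so any search box inside the results page would submit to the parent window as well.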
 +
 +===== References =====
 +
 +http://code.google.com/apis/searchappliance/documentation/62/admin_searchexp/adv_customization.html
tech/search/google-search-appliance.1267448715.txt.gz · Last modified: 2010/03/01 13:05 by davidof