Free Your Embedded Data With SearchMonkey
Published on June 3, 2008
Arguing for web standards and semantically clean, rich websites is an uphill battle. For years we had to deal with browsers that forced us to mess around with HTML just to display a document in several columns, and the visual outcome was much more important than the structure.
The only way we were able to “sell” the idea of structured markup was with SEO and accessibility concerns. Yes, another argument was ease of redesign—the CSS Zen Garden was a great resource for that—but how many times did you really have to redesign a site without changing the system it runs on or the HTML structure?
In any case, it is tricky to make the business decision makers give a hoot about the structure of the HTML, especially after spending a lot of money on a CMS to avoid dealing with it (well, that is the CMS sales pitch—the reality looks a bit different).
Connecting the dots of good markup and SEO
SEO sells to a certain degree, and people love to see their site stand out in search result pages. What if there were a system that would allow you to do that, based on information that is already in your documents? Well, now there is, and it is called SearchMonkey.
SearchMonkey is a developer tool by Yahoo! that lets you change how search results matching a particular URL pattern are displayed. You can use the information stored in the Yahoo! index and enhance it with information stored in the page’s HTML, your own data API, or microformats in the document.
Showing off Facebook data in search results
Let’s take a look at an example, providing a preview of Facebook applications inside the search result page of a Yahoo! search. You can see the difference in the following screenshot:
To get started, go to the SearchMonkey developer tool (you’ll need to be signed in to Yahoo!). You’ll see the introduction screen:
You can either create a new application, start from a sample application, or import one of your own. Let’s create a new one. Clicking the link will take you to the application dashboard:
In the application dashboard you choose what your application will be: an enhanced result, or an ‘infobar’. The latter is less obtrusive in the results page and needs to be expanded by the user. Let’s enter “Facebook Apps” as the name of the application and choose an infobar. Add a description, pick a category, and (if you want to) upload an icon for your app. Make sure to read and tick the “Terms of Service”, and we are ready to go to the next step.
Next, you need to define the trigger URL for your application. In the case of a Facebook application this will be *.facebook.com/apps/application.php/*.
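How SearchMonkey evaluates a trigger pattern internally is not covered here, so the following is only a sketch of one plausible reading: the pattern is treated as a shell-style wildcard and compared against the host and path of a URL. The function name `matches_trigger` and the sample URLs are made up for illustration.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Trigger pattern from the article.
FACEBOOK_APPS = "*.facebook.com/apps/application.php/*"


def matches_trigger(url: str, pattern: str) -> bool:
    """Return True if the URL (without its scheme) fits the wildcard pattern.

    Assumption: '*' matches any run of characters, including '/'.
    The real SearchMonkey matcher may behave differently.
    """
    parts = urlsplit(url)
    target = parts.netloc.lower() + parts.path
    if parts.query:
        target += "?" + parts.query
    return fnmatchcase(target, pattern)


if __name__ == "__main__":
    # Hypothetical URLs, only there to exercise the pattern.
    for url in (
        "http://www.facebook.com/apps/application.php/12345",
        "http://www.facebook.com/profile.php",
        "http://facebook.com/apps/application.php/12345",
    ):
        print(url, matches_trigger(url, FACEBOOK_APPS))
```

Under this reading the leading `*.` requires a subdomain, so a bare `facebook.com` URL would not trigger the application. That is worth checking against the URLs the tool actually reports as reachable.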
Below that are a few text fields and an “Autofind URLs” button. Click the button and SearchMonkey will try to find matching URLs and ping them to check whether they are available.
When they are available, a green “Reachable URL” appears next to each field and you can hit the “Save & Refresh” button below. In the carousel following the button you will see a preview of the results, including your new application. Make sure that each result displays correctly. You can also hit the “Input” and “Output” links to check the data retrieved from the different URLs.
All in all it looks like this:
On the URLs step, make sure to test your trigger URLs and check the output.
In the next step you can choose which data source to use for your application. The yahoo:index data is what Yahoo! knows about the URL from its search index. If you click the link entitled “Contains 16 data fields” you’ll see all the available information:
This is already a lot, but not enough—I want to show a thumbnail and description of the Facebook application as well. Therefore we need to actually start not with an application, but with a data service.
Not enough data? Create a custom data service
As the search index does not hold this information, we need to create a custom data service. In the case of Facebook, we can write an XSLT stylesheet to scrape the information from the web page. Click the “Custom Data Service” link to get to the Data Service Wizard.
The first few steps are the same for data services and applications. You need to provide a name and description for your data service. Choose the “Page” option, and click “Next Step”. You define the data URLs, which are once again *.facebook.com/apps/application.php/*. The next step gets you to where it gets exciting: the Data Extraction section.
On the Data Extraction screen you can write an XSLT to get any information out of the HTML document. Analyzing the source of Facebook’s application pages, we can write a short XSLT document to grab the description and the thumbnail of the application. The resulting XML should comply with SearchMonkey’s DataRSS specs:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="/">
    <adjunctcontainer>
      <adjunct id="smid:{$smid}" version="1.0">
        <item rel="rel:Agent">
          <meta property="dc:title">
            <xsl:value-of select="//div[@class='fbpage_title clearfix']/div/a" />
          </meta>
          <meta property="dc:description">
            <xsl:value-of select="//div[@class='app_description']" />
          </meta>
        </item>
        <xsl:variable name="image-src">
          <xsl:value-of select="//div[@id='profileimage']/img/@src" />
        </xsl:variable>
        <item rel="rel:Photo" resource="{$image-src}"></item>
      </adjunct>
    </adjunctcontainer>
  </xsl:template>
</xsl:stylesheet>
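To see what the three XPath expressions in the stylesheet pick out, it can help to run equivalent lookups outside the tool. The following Python sketch does that against a tiny, invented XHTML fragment. The markup in `SAMPLE_PAGE`, the function names, and the returned dictionary are assumptions for illustration, not Facebook's real page source or SearchMonkey's output format.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed stand-in for an application page. Only the class and
# id values are taken from the XPath expressions in the stylesheet.
SAMPLE_PAGE = """
<html><body>
  <div class="fbpage_title clearfix"><div><a href="#">Example App</a></div></div>
  <div id="profileimage"><img src="http://example.com/thumb.jpg" alt=""/></div>
  <div class="app_description">A made-up description.</div>
</body></html>
"""


def first_text(root, path):
    """Text of the first node matching path, or '' if nothing matches.

    Roughly what xsl:value-of returns, except that surrounding
    whitespace is stripped here for readability.
    """
    node = root.find(path)
    return "" if node is None else "".join(node.itertext()).strip()


def extract(page):
    """Collect title, description, and thumbnail URL from the page."""
    root = ET.fromstring(page)
    img = root.find(".//div[@id='profileimage']/img")
    return {
        "dc:title": first_text(root, ".//div[@class='fbpage_title clearfix']/div/a"),
        "dc:description": first_text(root, ".//div[@class='app_description']"),
        "rel:Photo": "" if img is None else img.get("src", ""),
    }


if __name__ == "__main__":
    print(extract(SAMPLE_PAGE))
```

Note that the class predicate is an exact string comparison: a page that lists the classes in a different order, or adds a third class, would no longer match, both here and in the XSLT.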
We paste our XSLT into the large text box and hit the “Save & Refresh” button to generate a sample output for each URL in the carousel:
Check that everything works for all the demo URLs, and then go to the next step to confirm your new data service. Click the “Create a new Presentation Application” link to create an application for this data service.
Jumping to the “Data Services” step now shows that the information from our data service is available, in addition to the yahoo:index information:
On to the next step, and we can finally change the appearance of our application. SearchMonkey uses a single PHP snippet to allow you to change the application’s look-and-feel. This snippet allows for a secure subset of PHP, so you can use string manipulation, loops, and conditions, but not set cookies or load other files.
On the right is the data that is available from your data sources, displayed as a tree menu. Simply highlight the SMDEFAULT in the code you want to replace and click on the information in the tree you want to replace it with.
For our application, we will:
- Collapse the yahoo:index data and expand the smid:ChJ data
- Replace the SMDEFAULT of $ret['title'] with the dc:title data
- Replace the SMDEFAULT of $ret['summary'] with dc:description
- Replace the SMDEFAULT of $ret['image']['src'] with @resource
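The replacements listed above amount to filling three slots of the result with values from the data service and leaving everything else at its placeholder. As a rough mental model, here is the same idea in Python. The function `build_result` and the flat `data` dictionary are hypothetical; in the tool itself the values are wired up through the tree menu.

```python
SMDEFAULT = ""  # the placeholder is an empty string in the appearance code


def build_result(data):
    """Fill title, summary, and image source; keep placeholders otherwise."""
    ret = {
        "title": SMDEFAULT,
        "summary": SMDEFAULT,
        "image": {"src": SMDEFAULT},
    }
    ret["title"] = data.get("dc:title", SMDEFAULT)
    ret["summary"] = data.get("dc:description", SMDEFAULT)
    ret["image"]["src"] = data.get("@resource", SMDEFAULT)
    return ret


if __name__ == "__main__":
    # Invented sample values standing in for the data service output.
    print(build_result({
        "dc:title": "Example App",
        "dc:description": "A made-up description.",
        "@resource": "http://example.com/thumb.jpg",
    }))
```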
Hit “Save & Refresh”—now you can expand and collapse your application bar in the preview pane to display the Facebook application description and thumbnail:
When you are happy, click “Next Step” and it’s done. You can check the app with the “Try” link, or start sharing it immediately.
(When you try your new application, you will be prompted to save your “My Apps” section of Yahoo!. You can either return to the developer tool, or try your app on a Yahoo! Search.)
Microformats and RDF
If you were insightful enough to support microformats or you are using a publishing tool that supports embedded RDF, you can save yourself the step of setting up a data service.
As an example, let’s use the BBC’s schedules.
Getting BBC schedules into search results
Start a new app named “BBC Schedules” and select ‘infobar’ as the format. The trigger URL pattern is *.bbc.co.uk/programmes/*. Once you have pinged the demo URLs and found them all to be working, you’ll see in the data services step that SearchMonkey found both the hCalendar and hCard data embedded in the BBC pages:
public static function getOutput() {
    $ret = array();
    define("SMDEFAULT", "");

    // Title comes from the hCalendar event summary
    $ret['title'] = Data::get('com.yahoo.uf.hcalendar/rel:Event/vcal:summary');
    $ret['summary'] = SMDEFAULT;

    // Fixed logo image for every result
    $ret['image']['src'] = "http://www.creativelondon.org.uk/upload/img_100/BBC_Logo.jpg";
    $ret['image']['alt'] = SMDEFAULT;
    $ret['image']['title'] = SMDEFAULT;
    $ret['image']['allowResize'] = true;

    // Links are left at their defaults
    $ret['links'][0]['text'] = SMDEFAULT;
    $ret['links'][0]['href'] = SMDEFAULT;
    $ret['links'][1]['text'] = SMDEFAULT;
    $ret['links'][1]['href'] = SMDEFAULT;
    $ret['links'][2]['text'] = SMDEFAULT;
    $ret['links'][2]['href'] = SMDEFAULT;

    // Key/value pairs: show name, start and end time, channel
    $ret['dict'][0]['key'] = "Show";
    $ret['dict'][0]['value'] = Data::get('com.yahoo.uf.hcalendar/rel:Event/vcal:summary');
    $ret['dict'][1]['key'] = "Starts";
    $ret['dict'][1]['value'] = date('l j F g.ia', strtotime(Data::get('com.yahoo.uf.hcalendar/rel:Event/vcal:dtstart')));
    $ret['dict'][2]['key'] = "Ends";
    $ret['dict'][2]['value'] = date('l j F g.ia', strtotime(Data::get('com.yahoo.uf.hcalendar/rel:Event/vcal:dtend')));
    $ret['dict'][3]['key'] = "Channel";
    $ret['dict'][3]['value'] = Data::get('com.yahoo.uf.hcalendar/rel:Event/vcal:location');

    $ret['infobar']['summary'] = SMDEFAULT;
    $ret['infobar']['blob'] = SMDEFAULT;
    return $ret;
}
If you use this as your appearance code, every result from the BBC programmes section will have all of this information in it:
Information stored in hCalendar shown inside search results might save visitors one click—they know when their show starts.
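The “Starts” and “Ends” values in the appearance code run the hCalendar timestamps through PHP's date() with the format string 'l j F g.ia': full weekday name, day of the month without a leading zero, full month name, 12-hour clock without a leading zero, a literal dot, minutes, and lowercase am/pm. To show what that produces, here is an equivalent in Python. The function name and the sample timestamp are made up, and the sketch assumes the timestamp is in an ISO 8601 form that `datetime.fromisoformat` accepts and that the default English locale is in use.

```python
from datetime import datetime


def format_like_php(dt: datetime) -> str:
    """Mimic PHP date('l j F g.ia') for a datetime."""
    hour12 = dt.hour % 12 or 12          # 0 and 12 both display as 12
    suffix = "am" if dt.hour < 12 else "pm"
    weekday = dt.strftime("%A")           # full weekday name
    month = dt.strftime("%B")             # full month name
    return f"{weekday} {dt.day} {month} {hour12}.{dt.minute:02d}{suffix}"


if __name__ == "__main__":
    # Hypothetical dtstart value for illustration.
    start = datetime.fromisoformat("2008-06-03T19:30:00")
    print(format_like_php(start))
```

For the sample timestamp this prints `Tuesday 3 June 7.30pm`.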
What does all of this mean to you?
You might now ask how SearchMonkey can affect your life as a web developer. The answer is: as much as you want it to. SearchMonkey is coming and Yahoo! will allow all of their hundreds of millions of end users to add applications to make their search experience more interesting and find the information they are looking for faster. You can be part of that, and create some of those applications.
It also means that you have one more argument for adding microformats to your pages or offering a clean and easy to grasp document structure. People will be able to take information embedded in your documents and display it in search result pages—something every client would love search engines to do.
Check out the SearchMonkey information page to get started or just dive into developing your first monkey.
Related Topics: XHTML, Web Standards, Search Engine Optimization (SEO)
Christian Heilmann is a contributing writer for Digital Web Magazine. Most of his publications are written on the underground travelling through London, as there is simply nothing else to do there. One day he’d like to hand out his own book to others trapped there.