RDF For The Rest Of Us
Published on July 30, 2007
You have a website full of information, and you want to make it easier for people to reuse it—but what format should you publish it in?
You’ve looked at various XML schemas and microformats, but none of them really describes all the information that you want to publish. Should you invent a new format? It’s tempting, but if you could use an existing standard, your data would be immediately interoperable with other data published in the same way; people could reuse the tools, parsers and stylesheets built around that standard. But unfortunately, nothing seems to really fit your needs.
This is where RDF comes in.
What is RDF?
RDF stands for Resource Description Framework. A resource is simply a thing: a person, a book, a keyboard, a blog post, a fish tank, an idea, or anything else that can be described. RDF is a framework that uses the architecture of the web to describe a resource. Just as HTML allows you to link your document to other documents on the web, RDF lets you link a resource to other resources on the web.
RDF for Web Publishers
RDF is different. It’s not a format; it’s a framework for describing data. What RDF does is to take a step back; instead of giving you a fixed set of terms to label your data with (such as HTML’s elements or a microformat’s class names), RDF provides you with a framework in which you can mix-and-match terms from existing vocabularies—or invent your own—in whatever combination best suits your particular content.
If you find a vocabulary that can be used to describe some part of your data, then great, that data can be interoperable; you still have the freedom to describe the rest of your data with terms from as many other vocabularies as you need. If you can’t find any terms to describe what you need, well, you can just write them yourself.
For publishing your data, RDF offers you great flexibility and interoperability. What this interoperability means is not just that your data will work with existing tools, or that it can work alongside other data in the same format, but that your data can actually connect with other data across the web. This is what the Semantic Web (the web of data) is: just as HTML lets you connect your page to other pages on the web, RDF lets you link your data to other data on the web.
Fantastic. But it’s only worth publishing if people will use it—so why would people want your data in RDF?
RDF for Web Developers
There are two things about RDF that make it really nice for web developers to use.
All RDF Data has the Same Shape
What is an RDF Triple?
- Subject represents the thing (resource) that the statement is about, and is always a Uniform Resource Identifier (URI).
- Predicate is the name of a property of the resource (such as the name of a database record’s field), and is always a URI.
- Object is the value of that property, and can be a URI or a literal (text, numbers, dates, etc.).
Each triple is saying: this resource has this property, and the value of that property is either a literal or another resource (represented by its URI).
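As a rough illustration of that shape, a triple can be modelled as a three-item tuple. This is only a sketch in Python (a language this article does not otherwise use); the example.org URIs, the Triple class and the values helper are made up for illustration, while the FOAF namespace is the one used later in this article.

```python
from typing import NamedTuple

FOAF = "http://xmlns.com/foaf/0.1/"


class Triple(NamedTuple):
    subject: str    # URI of the resource the statement is about
    predicate: str  # URI naming a property of that resource
    obj: str        # the property's value: a URI or a literal


# Hypothetical subject URI, used only for this example.
person = "http://example.org/staff/alice#person"

triples = [
    Triple(person, FOAF + "name", "Alice Example"),            # literal object
    Triple(person, FOAF + "homepage", "http://example.org/"),  # URI object
]


def values(triples, subject, predicate):
    """Return every object recorded for one subject/predicate pair."""
    return [t.obj for t in triples if t.subject == subject and t.predicate == predicate]


print(values(triples, person, FOAF + "name"))
```

Because every statement has the same three slots, the same values function works for any data set, whatever it describes.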
What is so good about this? It’s a little like the Ruby on Rails mantra “Convention over Configuration”; when you have only one structure, you don’t have to think about it any more—you don’t have to change your code and your database schema to accommodate a different kind of data. You can import any RDF, and it will “just work”.
RDF uses URIs
Having data that uses the same basic structure, however, isn’t enough to make it interoperable. XML documents often use id attributes, and relational databases use primary keys as unique identifiers. But outside of their own systems, these identifiers are meaningless. We could both have WordPress blogs, but the primary key 23 in your articles table identifies a different article in your database than it does in mine. RDF, on the other hand, uses URIs, which are the primary keys of a much bigger system: the world-wide web.
So we have a decentralized, fairly democratic means of uniquely identifying stuff. Anyone with some web space can create a URI to represent something (our RDF resource—see above) in a way that can be understood right across the web.
There doesn’t need to be a web page at your URI, but it’s good practice to have a human- and/or machine-readable description of what the URI represents at the other end. It is the nature of the web that anyone can say anything about anything, so you can’t control what other people say about your URI, but by publishing a description at your URI, you can at least have the final word on what you intended it to mean.
By making all data the same shape, and making the things the data describes uniquely identifiable across the web, RDF makes your code and your data more portable, more interoperable, and more useful.
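To make that concrete, here is a small sketch of combining data from two independent sources. It assumes each source has already been parsed into plain (subject, predicate, object) tuples; the example.org URI and the two "sites" are hypothetical, and the Dublin Core namespace is the one used later in this article.

```python
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical URI for an article that two different sites describe.
ARTICLE = "http://example.org/articles/23#article"

site_a = {(ARTICLE, DC + "title", "An Example Article")}
site_b = {(ARTICLE, DC + "date", "2007-07-30")}

# Same shape plus shared URIs means merging is just a set union.
merged = site_a | site_b


def describe(triples, subject):
    """Collect the properties of one subject (keeps one value per property)."""
    return {p: o for s, p, o in triples if s == subject}


print(describe(merged, ARTICLE))
```

Had the two sites used local primary keys instead of a shared URI, there would be no way to tell that both statements are about the same article.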
How to use it
Publishing data in your HTML with eRDF
I said before that RDF can be written in lots of different ways; one of those ways is eRDF. It’s a simple way of embedding RDF statements into HTML by using attributes such as href in a special way.
To write eRDF-enhanced HTML, first we add a profile attribute to the opening head tag.
profile is one of HTML’s lesser-known attributes. In it, you can specify a URI (or a space-separated list of URIs). User-agents can use these URIs to determine how to extract data from the document. In our case, we are using it to say that this document can be parsed as eRDF.
Next, we declare any vocabularies we might want to use, using link elements in the head of the document. We will use the FOAF and Dublin Core vocabularies. FOAF is a popular vocabulary for describing people and organizations; Dublin Core is a vocabulary for metadata, providing terms such as title and date:
<link rel="schema.dc" href="http://purl.org/dc/elements/1.1/" />
<link rel="schema.foaf" href="http://xmlns.com/foaf/0.1/" />
The rel attribute tells the eRDF parser that we will refer to the FOAF vocabulary using the prefix foaf, and to the Dublin Core vocabulary using the prefix dc. We will then be able to use class names and rel and rev attributes to describe data in the document using terms from these vocabularies. These ‘eRDF-style’ attributes take the form prefix-term; if you’re a programmer, you can think of the prefix as a constant that represents the URI of the vocabulary. We could give the FOAF vocabulary any prefix we like, but it seems sensible to give it the prefix foaf. (Note that the href attribute of any vocabulary link you use must end in either a slash or a pound sign [#].)
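If it helps to see the "prefix as a constant" idea in code, here is a minimal sketch of how a parser might expand a prefix-term name into a full URI. The VOCABULARIES table and the expand function are my own illustrative names, not part of any eRDF library; the two namespace URIs are the ones declared in the link elements above.

```python
# Prefixes as declared by the schema.* link elements.
VOCABULARIES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(name, vocabularies=VOCABULARIES):
    """Turn 'prefix-term' into a full URI; return None for anything else."""
    prefix, sep, term = name.partition("-")
    if not sep or not term or prefix not in vocabularies:
        return None  # not an eRDF-style name, or an undeclared prefix
    return vocabularies[prefix] + term


print(expand("foaf-name"))
print(expand("dc-title"))
print(expand("vcard"))
```

This also shows why the vocabulary URI has to end in a slash or a #: the term is simply appended to it.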
Now we can describe the information on the page using the terms in those vocabularies.
For our example, we’ll mark up Digital Web’s contributor profile template:
<div class="vcard -foaf-Person" id="person">
  <!-- We'll add details here in a moment -->
</div>
The first thing to note is that we’ve given an id to the div. This is important. Remember I said that RDF uses URIs to talk about things? Well, by giving this div a fragment identifier, we are creating a URI for it. If the page’s URL is http://www.digital-web.com/about/staff/nick_finck, our profile’s URI is now http://www.digital-web.com/about/staff/nick_finck#person.
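The same URI can be derived programmatically. This sketch uses Python’s standard urllib.parse module; the uri_for_id helper is just an illustrative name.

```python
from urllib.parse import urldefrag, urljoin

page_url = "http://www.digital-web.com/about/staff/nick_finck"


def uri_for_id(page_url, element_id):
    """Build the URI that an element's id gives it within a page."""
    return urljoin(page_url, "#" + element_id)


person_uri = uri_for_id(page_url, "person")
print(person_uri)

# Going the other way: split a URI back into page URL and fragment.
page, fragment = urldefrag(person_uri)
print(page, fragment)
```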
In eRDF syntax, the fact that the class name, in this case -foaf-Person, begins with a hyphen indicates that this class name is describing the type of thing that our #person is. What this means is that the thing represented by this div (#person) is a person (as defined in the FOAF vocabulary).
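In triple form, a type statement like this is normally written with the rdf:type property. The sketch below assumes that is how an eRDF parser would record it; type_triples is a hypothetical helper, not a real library function.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"


def type_triples(subject, class_attr, vocabularies):
    """Return an rdf:type triple for each '-prefix-Term' class name."""
    triples = []
    for token in class_attr.split():
        if not token.startswith("-"):
            continue  # ordinary class name (or a property), not a type
        prefix, sep, term = token[1:].partition("-")
        if sep and term and prefix in vocabularies:
            triples.append((subject, RDF_TYPE, vocabularies[prefix] + term))
    return triples


person = "http://www.digital-web.com/about/staff/nick_finck#person"
print(type_triples(person, "vcard -foaf-Person", {"foaf": FOAF}))
```

Note that vcard is skipped, which is why the microformat class name and the eRDF class name can sit side by side.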
You may also have noticed the vcard class name. Digital Web already marks up its profiles with the hCard microformat, but eRDF can coexist quite peacefully with microformats. The vcard class name has no effect on the eRDF, or vice-versa. Here we’re using hCard to make it possible to transform the data to vCard, and eRDF to be able to extract the pure information.
OK, let’s fill in some of the details:
<div class="vcard -foaf-Person" id="person">
  <h1 class="foaf-name fn n">Nick Finck</h1>
  <img src="/images/profiles/nick_finck.jpg" alt="Nick Finck" class="foaf-img photo" />
  <!-- we'll add some more details in a moment -->
</div>
The foaf-name class name means that the contents of that element (in this case, the h1 element) is the name (as defined by the FOAF vocabulary) of div#person, because it is the nearest ancestor with an id.
How To Read eRDF
In eRDF, where the element has an eRDF-style class name (prefix-term), the nearest ancestor with an id is the subject, the class name is the predicate, and the object is either:
- The src attribute, if it exists, or
- The current element if it has an id, or
- The title attribute, if it exists, or
- The text contents of the current element.
<img src="/images/profiles/nick_finck.jpg" class="foaf-img" /> means: “div#person is depicted by the image at /images/profiles/nick_finck.jpg”
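The order of precedence in that list can be sketched as a small function. This is only an illustration of the rules as stated in this article, not code from an eRDF parser: resolve_object is a hypothetical name, attributes are passed as a plain dict, and relative URLs are returned as written rather than resolved against the page.

```python
def resolve_object(attrs, text, page_url=""):
    """Pick the object of a class-based eRDF statement, in the listed order."""
    if "src" in attrs:
        return attrs["src"]                  # 1. the src attribute
    if "id" in attrs:
        return page_url + "#" + attrs["id"]  # 2. the element itself, via its id
    if "title" in attrs:
        return attrs["title"]                # 3. the title attribute
    return " ".join(text.split())            # 4. the text contents, whitespace-normalized


print(resolve_object({"src": "/images/profiles/nick_finck.jpg", "class": "foaf-img"}, ""))
print(resolve_object({"class": "foaf-name fn n"}, "Nick Finck"))
```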
Right, so let’s add some more markup:
<div class="vcard -foaf-Person" id="person">
  <h1 class="foaf-name fn n">Nick Finck</h1>
  <img src="/images/profiles/nick_finck.jpg" alt="Nick Finck" class="foaf-img photo" />
  <dl>
    <dt>Birthplace:</dt>
    <dd class="cv-birthPlace">
      Portland, Oregon, United States
    </dd>
    <dt>Personal web site/Portfolio:</dt>
    <dd>
      <a href="http://www.nickfinck.com/" class="url" rel="foaf-homepage" rev="foaf-maker">
        NickFinck.com
      </a>
    </dd>
  </dl>
  <!-- we'll add publications in a moment -->
</div>
Have a look at the Personal web site/Portfolio definition. Here, we’re not using any special eRDF class name, but we’ve got eRDF-style rel and rev attributes. The HTML spec says rel defines the relationship that this document has with the URL being linked to, whereas rev defines the reverse of that: the relationship the URL being linked to has with this document.
In RDF-triple terminology, where we have an eRDF-style rel:
- The subject is the nearest ancestor with an id (in this case, div#person)
- The predicate is the rel attribute value (in this case, foaf-homepage)
- The object is the href (in this case, http://www.nickfinck.com/)
And, where we have a rev, the opposite is true:
- The subject is the href
- The predicate is the rev attribute value (in this case, foaf-maker)
- The object is the nearest ancestor with an id
So what we are saying is that our person (represented by div#person) has a homepage, which is http://www.nickfinck.com/, and that this homepage was made by (maker) our div#person.
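Written out as triples, the one link produces two statements that point in opposite directions. The sketch below is illustrative only: link_triples and expand are hypothetical helpers, and the subject URI assumes the page URL given earlier in this article.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
VOCABULARIES = {"foaf": FOAF}


def expand(name, vocabularies):
    """Turn 'prefix-term' into a full URI, or None if it is not eRDF-style."""
    prefix, sep, term = name.partition("-")
    if sep and term and prefix in vocabularies:
        return vocabularies[prefix] + term
    return None


def link_triples(subject, href, rel="", rev="", vocabularies=VOCABULARIES):
    """rel: subject -> href.  rev: href -> subject."""
    triples = []
    for name in rel.split():
        predicate = expand(name, vocabularies)
        if predicate:
            triples.append((subject, predicate, href))
    for name in rev.split():
        predicate = expand(name, vocabularies)
        if predicate:
            triples.append((href, predicate, subject))
    return triples


person = "http://www.digital-web.com/about/staff/nick_finck#person"
homepage = "http://www.nickfinck.com/"
triples = link_triples(person, homepage, rel="foaf-homepage", rev="foaf-maker")
for triple in triples:
    print(triple)
```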
Now have a look at the birthplace definition. FOAF doesn’t have a term for birthplace, so we need another vocabulary. I searched for ‘birthplace’ on SchemaWeb and found the CV vocabulary for describing résumé information. I checked out the definition and it fits with how I want to use it, so if I add <link rel="schema.cv" href="http://captsolo.net/semweb/resume/cv.rdfs#"/> to the <head> of my document, I can use terms from the CV vocabulary.
Let’s move on to the publications section. Still within our div#person, it looks like this:
<h2>Digital Web Articles</h2>
<ol>
  <li class="foaf-made -bib-Article" id="dw-articles-1">
    <h2 class="dc-title">
      <a href="/articles/industry_transformation/">
        The Transformation of an Industry
      </a>
    </h2>
    <p class="date">
      Published on <span class="dc-date">May 13, 2004</span>
    </p>
  </li>
  <!-- lots more articles with the same markup -->
</ol>
Again, if we want to describe something, like an article, we need to give it a URI, which means giving it an id in our HTML. And we need another vocabulary to give us terms to describe our article. Having had a look on SchemaWeb again, I’ve opted for a BibTeX vocabulary and added <link rel="schema.bib" href="http://purl.org/net/nknouf/ns/bibtex#"/> to the document’s head.
Take a look at the li. What we are saying here is that #person made #dw-articles-1 (foaf-made), and that #dw-articles-1 is an article (-bib-Article). We’ve also used the Dublin Core vocabulary (dc) to give a publication date and title to the article. Note that dc-title and dc-date are properties of #dw-articles-1, not #person, because now #dw-articles-1 is the closest ancestor with an id.
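To see how the "nearest ancestor with an id" rule picks a subject, here is a rough sketch using Python’s standard xml.etree.ElementTree on a cut-down, well-formed version of the markup. It is an illustration only: nearest_id is a hypothetical helper, the snippet is simplified from the examples above, and I am assuming an element’s own id does not count as its ancestor.

```python
import xml.etree.ElementTree as ET

SNIPPET = """
<div id="person">
  <h1 class="foaf-name">Nick Finck</h1>
  <ol>
    <li class="foaf-made" id="dw-articles-1">
      <h2 class="dc-title">The Transformation of an Industry</h2>
    </li>
  </ol>
</div>
"""

root = ET.fromstring(SNIPPET.strip())

# ElementTree has no parent pointers, so build a child -> parent map.
parent_of = {child: parent for parent in root.iter() for child in parent}


def nearest_id(element):
    """Return the id of the closest ancestor that has one (not the element itself)."""
    node = parent_of.get(element)
    while node is not None:
        if "id" in node.attrib:
            return node.attrib["id"]
        node = parent_of.get(node)
    return None


# Map each eRDF class name in the snippet to the id of its subject.
subjects = {el.get("class"): nearest_id(el) for el in root.iter() if el.get("class")}
print(subjects)
```

The title ends up attached to the article rather than the person, because the li sits between the h2 and the outer div.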
You now know how to mark up a profile and simple bibliography. But more than that, you can use the same techniques to descriptively mark up anything you want (people, places, products, abstract concepts, chemical structures, bulletin boards, historical events, news items – anything!) You just need to find—or create—the vocabulary terms you need.
As you start to take things further, you might want to refer to the eRDF specification, which explains how to write eRDF in a little more detail.
Congratulations—your semantic HTML, whatever it may contain, can now be easily parsed into RDF.
Using RDF: a simple RDF mash-up with Exhibit
Making a mashup is simple:
- Copy and paste (and, if you want, customize) this template
- Edit the <link rel="exhibit/data" /> elements to point to the data you want to use, anywhere on the web.
- Load up the page in your browser. That’s it.
You can find further instructions on making Exhibits at the Simile Wiki.
RDF is the core language of the Semantic Web. What this means is that, by publishing your data as RDF on the web, by using terms from existing vocabularies where you can, and by linking to other data, you can increase the usefulness of both the data you link to, and your own. There is an increasing amount of RDF data available on the web; some sources are listed on the Linking Open Data wiki.
Even without the wider Semantic Web, however, RDF’s flexibility and uniform structure make it an extremely useful form in which to have your data. You don’t need to wait for everybody else to do it too; you can use it today.
- Search for Vocabularies on SchemaWeb
- Write your own vocabularies with Protégé or SWOOP
- Find RDF libraries and Toolkits for your favorite programming language
- Have a look at a very big list of RDF and RDF-related tools presented as an Exhibit
- SIMILE’s tools
- OpenLink RDF Browser is an AJAX-powered RDF browser
- Open Linked Datasets
- List of Datasets compiled by SIMILE
- Bob DuCharme’s List of RDF Datasets
- DBpedia – Wikipedia as a queryable RDF database
- eRDF – the eRDF specification, with examples and explanations
- Wiki page about eRDF – links to tools, examples and tutorials for eRDF
- RDFa is the W3C’s official RDF-in-HTML syntax. Very similar to eRDF, it offers more syntactic power at the cost of greater complexity and requiring new attributes
- Microformats are conventions for semantically representing certain types of information in (X)HTML—this information can also be transformed into RDF via GRDDL
- GRDDL is a specification describing how documents can declare an XSLT stylesheet (or other transformation script) with which RDF can be extracted from them
- Get Semantic is a community for discussing and promoting a variety of approaches to semantic HTML
- How to Publish Linked Data on the Web
Keith Alexander is a data-hungry markup junkie who likes semantic web technologies, day-dreaming, and chasing the elusive bigfoot.