Data Seed

Not Dead
June 29, 2009, 3:07 pm
Filed under: Uncategorized

Thanks to day job pressures and a series of BSODs at home forcing me to re-install everything from scratch, my plan to actually do something with what I’ve been learning has been placed on hold.  In the meantime here’s Berners-Lee writing about how we can get open government data working.  The short version:

  • Pick something easy
  • Just get on with it
  • Figure out the hard stuff later

Tab You Latter
June 23, 2009, 1:46 pm
Filed under: Uncategorized

Over the last couple of days I’ve had some time to think, but not much time to do.  The first thing I’ve been thinking about is how to test the redirection of requests based on request type for my theoretical new site.  The way this is meant to work is that you publish your data with a URI that looks like:

http://[my site]/resource/[ID of the item]

If someone makes a request to this URI, the server inspects what type of content the request is asking for (via its Accept header).  If it is an “application/rdf+xml” request then it is redirected to:

http://[my site]/data/[ID of the item]

If it is a request of type “text/html” then it is redirected to:

http://[my site]/page/[ID of the item]

This allows you to publish the data with one common URI, but provide different views on the data depending on how it is requested. There are other schemes out there, but this one fits in nicely with the URI rewriting functionality of the .Net MVC framework so it’s what I’m planning to run with.
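
The redirect decision described above can be sketched in a few lines.  This is only a rough illustration in Python, not the .Net MVC routing I’m planning to use.  The names `VIEWS` and `redirect_target` are made up for the example, and falling back to the HTML page when nothing recognised is requested is an assumption of mine.  A real server would then send the result as an HTTP redirect (303 See Other is the status usually suggested for this pattern).

```python
from typing import Optional

# Hypothetical mapping from requested media type to the URI segment
# used in this post ("data" for RDF, "page" for HTML).
VIEWS = {
    "application/rdf+xml": "data",
    "text/html": "page",
}


def redirect_target(path: str, accept: str) -> Optional[str]:
    """Return the path to redirect to, or None if path is not a /resource/ URI."""
    prefix = "/resource/"
    if not path.startswith(prefix):
        return None
    item_id = path[len(prefix):]
    # Walk the Accept header in the order given; q-values are ignored
    # to keep the sketch short.
    for part in accept.split(","):
        media_type = part.split(";")[0].strip().lower()
        if media_type in VIEWS:
            return f"/{VIEWS[media_type]}/{item_id}"
    # Assumption: fall back to the HTML page for anything unrecognised.
    return f"/page/{item_id}"
```

Proper content negotiation would also honour the q-values in the Accept header.  The sketch simply takes the first type it recognises.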

One thing I’m not sure about: if I make an HTML request to the RDF-specific URI (or vice versa), should I be courteous and redirect the request to where I think it should go, or at this stage are we past the point of hand holding?  I’m probably going to go with the former unless I find out something different.

As well as thinking about these things I was thinking about how to test it.  I’ve previously mentioned the data browsers Disco and Tabulator.  Disco has been my favourite because I found it easy to work with.  It’s very straightforward: go to the site, give it a URI, and it does the rest.  The drawback is that it only works with URIs that are public, and while I’m still testing the basics of my application I don’t want it out there for the whole world to see.  So I gave Tabulator another go and I don’t see how I got it wrong before.  It too is very straightforward.  It’s a Firefox add-in (yet another one…) so it can work with anything your browser can see.  Need to test if your redirection is working OK?  Simply go to Tools -> Data Browser -> Make Firefox request RDF and toggle as required.
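
The same check could also be scripted instead of toggled by hand in the browser.  Below is a rough Python sketch that sends the same URI with two different Accept headers and looks at where each one is redirected, without following the redirect.  `RedirectHandler` is a throwaway stand-in for the real site so that the example runs on its own.  It, `fetch_redirect`, `run_check` and the item ID 42 are all invented for illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectHandler(BaseHTTPRequestHandler):
    """Stand-in for the real site: redirects /resource/<id> based on Accept."""

    def do_GET(self):
        accept = self.headers.get("Accept", "")
        item_id = self.path.rsplit("/", 1)[-1]
        view = "data" if "application/rdf+xml" in accept else "page"
        self.send_response(303)
        self.send_header("Location", f"/{view}/{item_id}")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the console quiet while testing


def fetch_redirect(host, port, path, accept):
    """Issue one GET without following redirects; return (status, Location)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request("GET", path, headers={"Accept": accept})
        response = conn.getresponse()
        response.read()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def run_check():
    """Start the stand-in server locally and request one URI two ways."""
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        rdf = fetch_redirect("127.0.0.1", port, "/resource/42", "application/rdf+xml")
        html = fetch_redirect("127.0.0.1", port, "/resource/42", "text/html")
        return rdf, html
    finally:
        server.shutdown()
        server.server_close()
```

Pointed at a real development server instead of the stand-in, `fetch_redirect` would be the only part needed.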

While I was perusing the web I also came across this great introduction to linked data.  It’s written from the standpoint of someone working with library data (I told you they love this stuff) which colours the examples but it does a good job of laying down the foundations.

Keeping the faith
June 20, 2009, 1:29 pm
Filed under: Uncategorized

Today I have two days’ worth of research to sum up that covers two different areas.  One is general semantic web stuff, the other is practical stuff for publishing data in .Net.  I’ll start with the higher level stuff.  I found this interesting page from 2003 written up by Tim Bray, who was involved with coming up with the RDF specification.  It’s made me realise that this stuff is not a recent occurrence: RDF has been around for 10 years at least (which makes it positively ancient in our little web world).  In fact it’s been around long enough that my research has dug up several sites dedicated to the semantic web/linked data/RDF that were maintained for years but have now been abandoned, as the authors obviously didn’t see enough of this stuff actually happening.

I might just be flushed with the enthusiasm of someone who’s new to this, but I think that I’ve picked the right time to get on board with this stuff.  I feel like the word is getting out and people are listening.  Semantic technologies are getting more mainstream press, from the PwC technology forecast to TBL being brought in to help show the UK government how to do open data.  This is in turn getting more developers interested.  This will hopefully lead to more data sets being available (this is the area I’m hoping to contribute to) and once there’s enough useful data someone will build that killer app that will drag everyone in.  The killer app isn’t here yet so it’s still a “build it and they will come” mentality.  I’m happy with that for now.

One of the better wikis on this stuff that I’ve found so far has sections on tools and ontologies, which look promising even if their entry on the resume ontology is remarkably sparse for some reason.

If you’re looking for something on the semantic web seems like a good place to start.  In order to get your site spidered you can just ping them and they’ll do the rest.  As they’re still building up their index they’re more than happy for you to ping them as much as you want.  A project for a future date would be to compare this with Yahoo’s Search Monkey.

After digging up all of this really interesting information, I thought my brain would explode, so I’d take a break and think about how I could actually do some of this stuff.  I thought I’d start with looking at what others have done in the .Net space.  It was pretty sparse pickings.  I did find two things.  The first is a C# library for dealing with RDF triples, which may or may not be useful to me as I won’t have an RDF data store and I’m expecting the writing out of RDF to be fairly trivial.  This library is one of the projects that has fallen by the wayside thanks to lack of take up after 4 years of work.  The second is an interesting project built on top of this library, LINQ to RDF, which allows you to easily create SPARQL queries (if you like using LINQ; I’m an old school SQL guy, so it’s not really my thing).

Still on the practical trip, I started to discuss with one of my colleagues how we can implement some of the recommended features of linked data, like cool URIs and content negotiation, given that ASP.Net webform apps don’t really lend themselves to this sort of architecture.  For cool URIs we have a choice between using the new URL rewriting feature in IIS7 and using the (also reasonably new) .Net MVC framework.  I know neither of these ideas is particularly novel to those of you used to the LAMP stack, but it can take a while to convince MS that something is a good idea if it’s not their own.  In our case we’ll probably go with MVC for two reasons.

  1. We’re developers.  If we use MVC then all of the rewriting is within our control rather than in the hands of the person who’s maintaining IIS.
  2. I haven’t had a reason to do anything with MVC and I’m always looking for something new to learn.

So once I’ve gotten past “Hello World” my plan is to:

  1. Publish some dummy data in RDF format
  2. Publish some dummy data in RDFa format
  3. Work on some content negotiation routing so I can display the above data based on the appropriate request
  4. Publish some real data

I’d like to pretend that I won’t go trawling the net for even more articles to read while I’m doing this, but I know that won’t happen.  I’ll keep you all up to date with my progress on both.

New Discoveries
June 17, 2009, 4:15 pm
Filed under: Uncategorized

Thanks to some long running unit tests I’ve been able to make some progress in my research.  It seems that every time I find a new resource it throws up two more resources I have to dig into.  I’m surprised it has taken me so long to find The Semantic Web Gang podcast, as it seems to bring together all of the people I knew were involved in the semantic web before I took a particular interest in it.  It has head techos who are involved in Yahoo’s Search Monkey, OpenCalais and the hosts, Talis, plus others.  It covers general semantic web topics including linked data.  The most interesting thing that it’s brought to my attention so far is VoCamp, which is something similar to a BarCamp but focused on working out ontologies for linked data for whatever your area of interest happens to be.  I’m going to have to look closer at the ontology for whisky.

In my last post I mentioned that I hadn’t found an ontology for resumes, which I thought was a bit strange.  It turns out that the reason I hadn’t found one was that I hadn’t looked very hard.  It seems that Uldis Bojars did his 2002 master’s thesis on this in Latvia.  While his thesis is written in Latvian and not immediately useful to me, it seems that he has carried on his interest in this topic since completing his research.  He describes how the FOAF profile could be extended to include resume information.  In his use cases he mentions that having a centralised hierarchical list of skills would be very useful for searching for people, but he notes that this is unlikely to happen.  I wonder if he’s seen O*Net?  It’s not hierarchical like he hoped, but it is maintained.  He’ll be a good contact once I don’t feel like a complete n00b at this stuff.

First Steps
June 15, 2009, 10:49 pm
Filed under: Uncategorized

So I’ve spent some time today following up on the links I didn’t get a chance to finish reading through yesterday.  The first one is simply titled “How to Publish Linked Data on the Web”.  This sounds like exactly what I’m after.  Sadly I haven’t had a chance to finish reading it yet as I keep getting distracted by the interesting links in the article.

The most practical link I’ve found so far is for the data browser Disco.  I have previously tried Tabulator but it confused me when it didn’t seem to be doing what I expected it to do (which is different than it not working).  Disco on the other hand makes perfect sense.  It is a little unusual in that it is simply a web page that you provide a linked data URI to and it goes and gets the data and then renders it in HTML for us measly humans to digest.  I need to have more of a play with this and try and follow the data down the rabbit hole.

Before I start trying to create a new ontology I would like to create a Friend of a Friend (FOAF) file.  The biggest drawback for me is similar to the drawback Twitter has for me: none of my friends are living in this world yet.  Oh well, I guess if I create it they will come.  There is also the minor matter of where to host it and what namespace to put it in.  It’s a big decision, as it’s not something you really want to be changing.  Given how much practical use I’d have for it, is it worth registering a domain name for?  That’s a matter for another day.
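
To give an idea of how small a FOAF file can be, here is a rough Python sketch that writes out a minimal one as RDF/XML.  The two namespace URIs are the standard RDF and FOAF ones.  The function name `build_foaf` and the example.org URIs in the usage note are placeholders, since where the file will actually live is exactly the decision I haven’t made yet.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF_NS = "http://xmlns.com/foaf/0.1/"


def build_foaf(person_uri: str, name: str, homepage: str) -> str:
    """Return a minimal FOAF description of one person as an RDF/XML string."""
    # Register prefixes so the output uses rdf: and foaf: instead of ns0:, ns1:.
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("foaf", FOAF_NS)

    root = ET.Element(f"{{{RDF_NS}}}RDF")
    person = ET.SubElement(
        root, f"{{{FOAF_NS}}}Person", {f"{{{RDF_NS}}}about": person_uri}
    )
    ET.SubElement(person, f"{{{FOAF_NS}}}name").text = name
    ET.SubElement(
        person, f"{{{FOAF_NS}}}homepage", {f"{{{RDF_NS}}}resource": homepage}
    )
    return ET.tostring(root, encoding="unicode")
```

Calling `build_foaf("http://example.org/people/steve#me", "Steve", "http://example.org/")` gives a single foaf:Person with a name and a homepage.  Friends would be added later as foaf:knows links, once there is someone to link to.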

Given that I’ve already mentioned the linked data key word ontology I should talk a little about that before I get onto the next link.  Ontologies are important and I can see myself spending way too much time on this.  In the linked data context an ontology is how you describe your world.  If you follow any of the links I’ve provided so far, they’ll explain ontologies and triples much better than I can.  It’s interesting that a couple of disciplines have picked this up and run with it; life scientists and librarians seem to be ahead of the game, for example.  So far I haven’t seen anyone in the recruitment space publish anything.  Who knows, maybe I’ll be the first?  Anyway, the thing that brought on this rambling paragraph is Dublin Core, which comes up very frequently in anything I’ve read about ontologies.  From my quick perusal it seems to cover mostly things to do with documents, so important for linked data in general, maybe not so relevant to my particular case.  Either way, it needs closer inspection.

So from a practical point of view, where am I at right now?

  • I’ll need to build two views of our data, one as some pretty HTML probably marked up as hResume and another expressing the same information as linked data.  I’m not super familiar with hResume, but it strikes me as being woefully inexpressive.  Well I guess that’s why we have linked data.
  • I’ll have to come up with a decent URI namespace for all of this data to live in.
  • The how to publish stuff seems very straightforward if you have an existing web infrastructure, from what I can see.  Coming up with nice names for everything (URIs and ontologies) is what gives people the sleepless nights.

Planting the seed
June 14, 2009, 5:57 pm
Filed under: Uncategorized

Hello world.  I’m Steve.  I’d like to say that I’ll be your host, but that’s not really what I see this blog as being about.  What I want this blog to be about is a story of me learning about what I see as the most exciting technology at the moment, Linked Data.  For a start, does it deserve the capital letters I’ve just given it?  I don’t know.  There’s lots I don’t know.

So I’ll start at the beginning with what I know so far.  It all started when I recently watched the talk given by Tim Berners-Lee at TED about linked data.  If you want to know what all of this is about, Sir Tim can explain in 15 minutes better than I ever could.  While I am a techo, it does have to be something special for me to get excited.  This talk got me excited.  So I started looking into things a bit more, which led me to the linked data site.  This has some more practical presentations and I’m working through them when I can (finding time to watch one hour presentations is a little tricky).  Through the linked data site I also found the AI3 blog which is written by someone who is actually doing this stuff.

This feels like a good primer, but I really need to be doing this stuff in order for me to truly get it.  So I’m hoping that I can convince my company that I can take this on as a 20% project.  Seeing as I don’t work for Google, I might be lucky to get away with ~10%, but we’ll see how we go.  I think I might have a good chance: we’re sitting on top of recruitment data and there have been noises in the past about how we can open it up to the world, and I think this is it.  If I do manage to sell the idea, this blog and its associated research will be updated much more frequently; otherwise it will end up being something I squeeze into my “spare” time.

Anyway, that’s enough for now.  If you have somehow stumbled across this blog and would like to point me in the direction of good resources, I’m open to anything.