Imagine a visitor who enters his website URL into a textbox and, when he clicks the submit button, you are able to retrieve all kinds of information about him: his name, company info, online profiles, interests and so on, all from just a URL. It’s actually pretty easy if the website links to FOAF, APML or SIOC documents.

What you have to do is download the HTML from the website and look for <link> elements in the head that match FOAF, APML or SIOC type links. Then retrieve the URLs of those documents from the href attributes and load each one into an XML document. Now you can use XPath to find all the information you need.

Here is what a FOAF link element looks like:

<link type="application/rdf+xml" rel="meta" title="FOAF" href="http://example.com/foaf.xml" />

SIOC and APML links use the same attributes in the same way, so we can use the title attribute to figure out which kind of document a link points to. All we need is a method that uses regular expressions to retrieve the document URLs from the HTML.

The code

This is a method that finds all the semantic links of a certain type in an HTML string.

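using System;
using System.Collections.ObjectModel;
using System.Text.RegularExpressions;

// Matches a single <link> element whose title attribute equals the requested type.
private const string PATTERN = "<link(\\s[^>]*title=\"{0}\"[^>]*)>";

// Captures the href value; [^\"]* stops at the closing quote instead of running on into later attributes.
private static readonly Regex HREF = new Regex("href=\"([^\"]*)\"", RegexOptions.IgnoreCase | RegexOptions.Compiled);

// Captures the contents of the <head> element so that the body is ignored.
private static readonly Regex HEAD = new Regex("<head\\b[^>]*>(.*?)</head>", RegexOptions.IgnoreCase | RegexOptions.Singleline | RegexOptions.Compiled);

/// <summary>
/// Finds semantic links in a given HTML document.
/// </summary>
/// <param name="type">The type of link. Could be foaf, apml or sioc.</param>
/// <param name="html">The HTML to look through.</param>
/// <returns>The absolute URLs of all matching links found in the head.</returns>
private static Collection<Uri> FindLinks(string type, string html)
{
  Collection<Uri> urls = new Collection<Uri>();

  Match head = HEAD.Match(html);
  if (!head.Success)
    return urls;

  // Match each <link> element separately so that every link of the given type is found.
  string pattern = string.Format(PATTERN, Regex.Escape(type));
  foreach (Match match in Regex.Matches(head.Groups[1].Value, pattern, RegexOptions.IgnoreCase))
  {
    Match hrefMatch = HREF.Match(match.Groups[1].Value);
    Uri url;

    // Groups.Count is the same whether or not the match succeeded, so test Success instead.
    if (hrefMatch.Success && Uri.TryCreate(hrefMatch.Groups[1].Value, UriKind.Absolute, out url))
    {
      urls.Add(url);
    }
  }

  return urls;
}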

Example

To find all the FOAF links in a page you can write something like this:

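// Download the page the visitor entered and look for FOAF links in it.
using (WebClient client = new WebClient())
{
  string html = client.DownloadString(txtUrl.Text);
  Collection<Uri> col = FindLinks("foaf", html);

  foreach (Uri url in col)
  {
    // Load each linked document and write its XML to the page, HTML-encoded.
    XmlDocument doc = new XmlDocument();
    doc.Load(url.ToString());
    Response.Write(Server.HtmlEncode(doc.OuterXml));
  }
}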

If you want to search for APML or SIOC then just replace “foaf” with either “apml” or “sioc” in the method parameter. You might also want to take a look at my experimental FOAF parser class.
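The example above only writes out the raw XML. As a rough sketch of the XPath step mentioned earlier, something like the following could pull the names out of a loaded FOAF document. The class name FoafReader, the method GetNames and the sample document are made up for illustration, and it assumes the common layout where foaf:name sits directly inside a foaf:Person element; real FOAF files vary, so treat it as a starting point rather than a complete parser.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper, not part of the original code.
public static class FoafReader
{
    // Namespace URIs defined by the RDF and FOAF specifications.
    private const string RdfNs = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    private const string FoafNs = "http://xmlns.com/foaf/0.1/";

    // Returns the text of every foaf:name element found directly inside a foaf:Person.
    public static List<string> GetNames(XmlDocument doc)
    {
        // XPath needs a prefix-to-namespace mapping to resolve qualified names.
        XmlNamespaceManager ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("foaf", FoafNs);

        List<string> names = new List<string>();
        foreach (XmlNode node in doc.SelectNodes("//foaf:Person/foaf:name", ns))
        {
            names.Add(node.InnerText.Trim());
        }
        return names;
    }

    public static void Main()
    {
        // Made-up sample document; a real one would come from doc.Load(url.ToString()).
        string xml =
            "<rdf:RDF xmlns:rdf=\"" + RdfNs + "\" xmlns:foaf=\"" + FoafNs + "\">" +
            "<foaf:Person><foaf:name>Jane Doe</foaf:name></foaf:Person>" +
            "</rdf:RDF>";

        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        foreach (string name in GetNames(doc))
        {
            Console.WriteLine(name);
        }
    }
}
```

The namespace manager is the part that is easy to forget: without it, an XPath expression using the foaf prefix cannot be resolved against the document.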

I was tag-team wrestled by Keyvan Nayyari and Janko today. They wanted me to take up the challenge of writing about my programming history. Since they are two seriously cool dudes I decided to play along.

How old were you when you started programming?

Eighteen years young.

How did you get started in programming?

Red and white wine. That was my business ten years ago. I ran a small wine import business during college and my wines were so good I drank most of them myself. That’s when I knew I had talent. So I started programming.

What was your first language?

VB 5 or 6 - I don’t remember exactly.

What was the first real program you wrote?

The first version of my Prison Bitch Name Generator, written in VB 6, revolutionized modern English forever. There's an online version of it here made by someone else.

What languages have you used since?

VB.NET, C#, Java, PHP, ActionScript, Lingo (this is a weird language) and all the web-oriented scripting and markup languages.

What was your first professional programming gig?

The Prison Bitch Name Generator never took off commercially so I had to look for other venues. I started a web design business like 3 billion other people did during the IT bubble. My success was limited but I did manage to build about 50 websites and win a design award with one of them (I wasn't the designer but took full credit like the gentleman I am). The first website must have been for a small Norwegian pharmaceutical company located in Oslo if I remember correctly.

If you knew then what you know now, would you have started programming?

Definitely yes. It’s the most gratifying, creative and challenging thing and it makes me very happy every day.

What is the one thing you would tell new developers?

Rule #1. Buy the most expensive pair of Ray-Bans you can find. You probably look dorky like the rest of us programmers, but with a pair of Ray-Bans you look like a rock star. Don’t fall into the trap of thinking that any pair of shades will do no matter the price, and take pride in wearing them 24/7/365.

Rule #2. When a girl asks what you do for a living, lie to her. Here are some good job titles I've had great success with over the years.

  • Pet detective (girls like animals for some reason)
  • Organic chef (girls like organic food for some reason)
  • Hybrid car designer (girls like the environment for some reason)
  • Bestselling novelist (girls like to read for some reason)

What’s the most fun you’ve ever had … programming?

That’s without a doubt when I learned about the semantic web and the process of teaching myself how to implement the various technologies in ASP.NET. It only became more interesting when I learned how to consume, aggregate and do cool things with semantic technologies on the web.

And with those words I’d like to pass the torch to James Avery, Al Nyveldt and John Dyer.