Semantic Wiki, MediaWiki and Blog, defined and ideas

The idea of the Semantic Blog has been circulating for some time. It is coming closer to reality thanks to the combined efforts of Wikipedia and its contributors. The idea starts with the Semantic Wiki.

From Wikipedia’s entry on a Semantic Wiki.

A Semantic Wiki is a Wiki that has an underlying model of the knowledge described in its pages. Regular wikis have structured text (in this article: Introduction, Example, …) and untyped hyperlinks (the links in this article). Semantic Wikis allow the ability to capture or identify further information about the pages (metadata) and their relations. Usually this knowledge model is available in a formal language, so that machines can (at least partially) process it. The technologies developed by the Semantic Web community build the basis for reasoning about the knowledge model.

In particular, machines can calculate new facts (e.g. relations between pages) from the facts represented in the knowledge model.
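As a rough illustration of what "calculating new facts" can mean, here is a minimal Python sketch. The two rules and the "located in" relation are assumptions made up for this example; they are not taken from the Wikipedia entry or from any particular reasoner.

```python
# Facts are (subject, predicate, object) triples.
facts = {
    ("Berlin", "capital of", "Germany"),
    ("Germany", "located in", "Europe"),
}

def infer(facts):
    """Apply two illustrative rules repeatedly until no new facts appear."""
    known = set(facts)
    while True:
        new = set()
        for s, p, o in known:
            # Rule 1 (assumed): a capital is located in its country.
            if p == "capital of":
                new.add((s, "located in", o))
        for s1, p1, o1 in known:
            for s2, p2, o2 in known:
                # Rule 2 (assumed): "located in" is transitive.
                if p1 == p2 == "located in" and o1 == s2:
                    new.add((s1, "located in", o2))
        if new <= known:
            return known
        known |= new

# Facts that were never written down by a user but follow from the rules.
derived = infer(facts) - facts
print(sorted(derived))
```

Neither derived fact appears on any page; both follow from what users typed plus the rules.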

This leads to the logical idea of a Semantic MediaWiki.

From Wikipedia’s entry on the Semantic MediaWiki.

The Semantic MediaWiki is an extension to the MediaWiki software (which also runs Wikipedia), which allows every user to make information more accessible to machines, which in turn makes it easier for humans to search or further use this information.

Semantic MediaWiki offers two means to make information about a page more explicit:

  • Typed links and
  • Typed attributes (of a page).

As an example, the page Berlin might say that it is the capital of Germany. In the Semantic MediaWiki, users can type the link, thus making the relation (capital of) between Berlin and Germany explicit.

On the page Berlin, the syntax would be

... [[capital of::Germany]] ...

resulting in the semantic statement “Berlin” “capital of” “Germany”.

On the page Berlin, it might say the population is 3,993,933. In the Semantic MediaWiki, users can make this information explicit by writing

... the population is [[population:=3,993,933]] ...

resulting in the semantic statement “Berlin” “has population” “3993933”.
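To make the mapping from markup to statements concrete, here is a small Python sketch that pulls both kinds of annotation out of page text. It only handles the two forms quoted above; the regular expression, the function name `extract_statements`, and the choice to prefix attribute names with "has" (mirroring the population example) are assumptions for illustration, not Semantic MediaWiki's actual parser.

```python
import re

# Matches [[name::value]] (typed link) and [[name:=value]] (typed attribute).
# Plain links such as [[Germany]] are ignored because they have no separator.
ANNOTATION = re.compile(r"\[\[([^\[\]:]+)(::|:=)([^\[\]]+)\]\]")

def extract_statements(page_title, wikitext):
    """Return (subject, predicate, object) triples found in the page text."""
    statements = []
    for name, sep, value in ANNOTATION.findall(wikitext):
        name, value = name.strip(), value.strip()
        if sep == "::":
            # Typed link: the relation name is used as the predicate.
            statements.append((page_title, name, value))
        else:
            # Typed attribute: drop thousands separators, as in the example.
            statements.append((page_title, "has " + name, value.replace(",", "")))
    return statements

text = ("Berlin is the [[capital of::Germany]]; "
        "the population is [[population:=3,993,933]]. See also [[Germany]].")
for statement in extract_statements("Berlin", text):
    print(statement)
```

The page title supplies the subject of every statement, which is why the annotations themselves only need to name the relation and its value.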

From Marc Fawzi’s Evolving Trends blog, the Semantic Blog.

As concluded in my previous post, there’s an exponential growth in the amount of user-generated content (videos, blogs, photos, P2P content, etc).

This enormous amount of free content available today is just too much for the current “dumb search” technology used to access it.

Thus, I believe that content is now a commodity and the next layer of value is all about “Intelligent Findability.”

Take my blog for example, it’s less than 60 days old, and I’ve never blogged before, but as of today it already has ~500 RSS daily subscribers (and growing), with a noticeable increase after the iPod post I made 3 days ago, 6,281 incoming links (according to MSN) and ~70,000 page views in total so far (mostly due to the Wikipedia 3.0 post.) That goes to show how one person can generate and spread so much content. I know bloggers who generate ten times the content I generate with good spreadability.

However, in this particular instance, I don’t want a Semantic Wiki. I want a Semantic Blog. So I guess I’m going to have to call in some favors and have one developed. If you’re working on a Semantic Blog application, drop me a line and I’ll be happy to promote it to the readers.

A Semantic Blog would be the next logical step in combining AI research and the Semantic Web. It would be an intelligent tool for bloggers. There are already some blogs that verge on this idea, such as The Undersigned, which uses a script to read your search terms when you arrive at the blog from Google. It then shows posts from the blog that are directly relevant to that search, presented as topics related to your original query. If you have a hosted blog, you can implement this idea with some coding. More information is available on The Undersigned blog.

From Rogue Semiotics, The Semantic blog. A short essay about what it is and how it will come to be.

It seems that the semantic web movement is being played out in microcosm in the world of blogging. Blogs, like all html, can of course already carry metadata. Search engines use this metadata to help decide the relevancy of pages. Some blog listings, like Blogwise, take this “give me a list of things you’re going to talk about” approach familiar from html metadata.

One of the problems is that the metadata is unstructured. If you want to find blogs covering film, for instance, Blogwise provides keyword listings. But they are created by users, which means you have to think of every possible synonym – in this case, movies, film, films, filmes, and so on. This problem can be solved by instructing the search engine to understand synonyms, but it never really overcomes the fact that this approach will never give us any proper information about how material on different sites interrelates.
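The synonym workaround mentioned above can be sketched in a few lines of Python. The synonym table, the blog names, and the function names are all made up for this example; nothing here reflects how Blogwise actually works.

```python
# Hypothetical synonym table mapping user-supplied keywords to one canonical tag.
SYNONYMS = {"movies": "film", "movie": "film", "films": "film", "filmes": "film"}

def canonical(keyword):
    """Normalise a keyword, falling back to its lower-cased form."""
    key = keyword.strip().lower()
    return SYNONYMS.get(key, key)

def find_blogs(listings, query):
    """Return blogs whose keywords match the query after normalisation."""
    wanted = canonical(query)
    return sorted(
        blog for blog, keywords in listings.items()
        if wanted in {canonical(k) for k in keywords}
    )

# Hypothetical keyword listings, as users might have entered them.
listings = {
    "blog-a.example": ["Movies", "music"],
    "blog-b.example": ["filmes"],
    "blog-c.example": ["cooking"],
}
print(find_blogs(listings, "film"))
```

This finds the film blogs whichever spelling their authors chose, but it also shows the limitation: the result is still a flat list, with no information about how the material on those sites relates.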

Another problem is that, by their nature, blogs allow people to post about whatever’s on their mind. And very often this isn’t what they thought they were going to talk about. Much better than asking people to predict what they think will be written is to look at what they actually have written.

This is where some of the blogging community has been doing some really cunning work. A lot of it gets filtered through hubs like Ben Hammersley’s site. Hang around these places for long and you’ll notice that a great deal of the activity is centred on links in, links out, comments and those pesky, semi-functional Movable Type-specific Trackbacks.

What these things have all got in common is that they relate one site to another. If someone with a blog leaves a comment on your blog, odds are they’ll add in their URL. Now you have some useful information. One URL (or the person behind it) reads another URL. This information can be collated. Then, if you’re a clever bod like Mark Pilgrim, you can use this data to play terrific games like Recommended Reading based on what you already link to in your blog.
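One simple way to turn collated "who links to whom" data into reading suggestions is to score unseen sites by how much their linkers overlap with you. The Python sketch below is an assumption about how such a game could work, with made-up site names; it is not a description of Mark Pilgrim's Recommended Reading.

```python
from collections import Counter

def recommend(links, me, top=3):
    """Suggest sites linked by blogs that share outgoing links with `me`."""
    mine = links[me]
    scores = Counter()
    for other, theirs in links.items():
        if other == me:
            continue
        overlap = len(mine & theirs)  # how many links we have in common
        if overlap == 0:
            continue
        # Credit every site they link to that I do not link to already.
        for site in theirs - mine - {me}:
            scores[site] += overlap
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [site for site, _ in ranked[:top]]

# Hypothetical link data: each blog mapped to the set of sites it links to.
links = {
    "me.example": {"a.example", "b.example"},
    "x.example": {"a.example", "b.example", "c.example"},
    "y.example": {"a.example", "d.example"},
    "z.example": {"e.example"},
}
print(recommend(links, "me.example"))
```

Here `c.example` ranks above `d.example` because the blog linking to it shares two links with `me.example` rather than one, and `e.example` is never suggested because its only linker has nothing in common with `me.example`.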

Now, indirectly, Amazon are affecting the semantic blog with Alexa, which “gives you access to thousands of user reviews and ratings of web sites, plus site statistics and related links”. According to Alexa: “Now, when you search, you can let the experience of other web surfers guide you on your trek through the internet”. Well, at the very least, what Alexa means is that every site has its very own Amazon-style page, even if mine lazily implies that my site visitors are all keenly purchasing The Literature of Roguery in Seventeenth- And Eighteenth-Century Russia. Who was that then?

In the end, in my opinion, we’ll end up with consistent semantic metadata for blogs, for the simple reason that more and more functionality will appear that takes advantage of it. Bloggers, as a group, are by definition active, keen on explaining themselves and their place in the world, keen on defining their relationships to others (not just other bloggers), and constantly pointing at things in which they’re interested. This makes them prime candidates for semantic functionality.

Not that the initial attempts to enforce strictly defined metadata onto blogs look to have been wildly successful. The current thinking seems to centre on semantic matching of categories, but watch this space.

