Tuesday 3 July 2007

Using SemWeb with MySQL

Getting SemWeb to work with MySQL is pretty much the same as for SQLite (see the last post). Again, using the news.rdf file from Mozilla, the command line to enter data into the MySQL database now becomes:

rdfstorage news.rdf --out "mysql:rdf:database=news;username=test"

This assumes that a new database (schema) called "news" has been created in MySQL and that a user "test" exists that doesn't require a password (both these things can be done using the MySQL Administrator UI).

Initially running this command line in the installed version of the SemWeb\bin directory results in the following error being generated:

System.IO.FileNotFoundException: Could not load file or assembly 'MySql.Data, Version=1.0.8.0, Culture=neutral, PublicKeyToken=c5687fc88969c44d' or one of its dependencies. The system cannot find the file specified.


This error means that the MySQL.Data.dll is missing, so the first step is to get this from MySQL ADO.Net Data Provider 5.0.7

After placing this DLL in the bin directory and running the command line again, the following error will appear:

System.IO.FileLoadException: Could not load file or assembly 'MySql.Data, Version=1.0.8.0, Culture=neutral, PublicKeyToken=c5687fc88969c44d' or one of its dependencies. The located assembly's manifest definition does not match the assembly reference. (Exception from HRESULT: 0x80131040)


In this case the important part is the "manifest definition does not match the assembly reference" bit - as happened with the SQLite DLL, this MySQL DLL is newer than the version that was used to build the SemWeb.MySQLStore.dll and so the 2 DLLs are incompatible. We need to rebuild this SemWeb DLL.


The steps to do this are as follows:

  • Open the SemWeb solution in Visual Studio
  • Add a new project of type "Class Library" and called "SemWeb.MySQLStore"
  • Delete the default "Class1.cs" that gets added to the project
  • Choose to add an existing item to the project and pick "MySQLStore.cs" from the SemWeb\src directory
  • Add a reference to the "MySQL.Data.dll " that should already be in the SemWeb\bin directory
  • Add a reference to the SemWeb project
  • In the project properties, set the output path to the SemWeb\bin directory

These are the exact same steps that were required for the SQLiteStore DLL (although obviously referencing different files). However, building the MySQLStore DLL now results in the following compiler error:

The type or namespace name 'MySqlConnection' could not be found (are you missing a using directive or an assembly reference?)

To fix this error the following needs to be added to the top of the MySQLStore.cs file:

using MySql.Data.MySqlClient;


So the complete set of using directives at the top of the file should now appear as:


using System;
using System.Collections;
using MySql.Data.MySqlClient;
using System.Data;

The project should now rebuild successfully and the new SemWeb.MySQLStore.dll should now appear in the SemWeb\bin directory.

Unfortunately this still isn't the end of the process - running the command line again results in the following error:

MySql.Data.MySqlClient.MySqlException: Table 'news.rdf_statements' doesn't exist

The DLL is now ok, its just that the database that its creating has a few problems; inspecting the database in MySQL Administrator shows that the "rdf_literals" and "rdf_entities" tables have been created but the "rdf_statements" table hasn't been made.

Setting the SEMWEB_DEBUG_SQL environment variable (type "set SEMWEB_DEBUG_SQL=1" at the command prompt) and then running the command line again generates some debug information (alternatively put a break point on the "
CreateTable" function in the SQLStore.cs file and run the program in the debugger, after first setting up the command line arguments in the project properties). The error generated is:

System.FormatException: Input string was not in a correct format.

This is caused by the SemWeb database "Open" function (from MySQLStore.cs) querying the version of the database and then trying to set this into the .NET Version class; however, the version string returned from my version of MySQL is "5.0.41-community-nt" and this doesn't match with the expected .NET version string format, which is expecting only integers seperated by a period character ('.') - hence an error is thrown on the first call to "Open" and so the creation of the first database table ("rdf_statements") fails.

To fix this I've just put a try-catch around the setting of the version string, so basically I just ignore when the version can't be set.




private void Open()
{
if (connection != null)
return;
MySqlConnection c = new MySqlConnection(connectionString);
c.Open();
connection = c; // only set field if open was successful

using (IDataReader reader = RunReader(@"show variables like 'version'"))
{
reader.Read();
try
{
version = new Version(reader.GetString(1));
}
catch(Exception e)
{
if (Debug) Console.Error.WriteLine(e.Message);
}
}
}


Finally, rebuilding the SemWeb DLL and re-running the command line successfully adds the RDF to the database:

news.rdf 0m0s, 81 statements, 1728 st/sec
Total Time: 0m2s, 81 statements, 30 st/sec

Monday 2 July 2007

Using SemWeb with SQLite

As mentioned in the previous post I've had a few problems getting SemWeb, the C# RDF library, to persist data to a database. Here I describe the steps I've had to follow to get things working.

I've been using Visual Studio on Windows - I suspect that these problems may not exist when using Mono on Linux.

I'm using the following versions:


SQLite

Following the approach described in the SemWeb documentation, I first tried to set up a database connection to a SQLite database.

Following the SemWeb documentation I performed the following steps:

  • Placed the "sqlite3.dll" (from the SQLite download) into my windows System32 directory
  • Added the "Mono.Data.SqliteClient.dll" from the the Mono install directory "Mono-1.2.4\lib\mono\2.0" into my SemWeb\bin directory
  • Downloaded the sample RDF file from http://www.mozilla.org/news.rdf
  • Entered the specified command line at a DOS prompt in the SemWeb\bin directory:
    rdfstorage.exe news.rdf --out "sqlite:rdf:Uri=file:news.sqlite;version=3"
  • Pressed return and stood well back.

I was greeted with the following:

System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.IO.FileLoadException: The located assembly's manifest definition with the name 'Mono.Data.SqliteClient' does not match the assembly reference.

Basically my Mono SQLite adapter was newer than the version that the SemWeb.SqliteStore.dll had been built with.

Unfortunately the SemWeb Visual Studio solution doesn't come with a SemWeb.SqliteStore project, so the following steps are required:

  • Open the SemWeb solution in Visual Studio
  • Add a new project of type "Class Library" and called "SemWeb.SqliteStore"
  • Delete the default "Class1.cs" that gets added to the project
  • Choose to add an existing item to the project and pick "SQLiteStore.cs" from the SemWeb\src directory
  • Add a reference to the "Mono.Data.SqliteClient.dll" that should already be in the SemWeb\bin directory
  • Add a reference to the SemWeb project
  • In the project properties, set the output path to the SemWeb\bin directory

I also rebuilt the SemWeb and RDFStorage projects (after setting their output directories to the bin directory). Once built I re-ran the command line and this time obtained the expected result:


news.rdf 0m0s, 81 statements, 1036 st/sec
Total Time: 0m1s, 81 statements, 42 st/sec



Next time I'll describe the steps I've had to follow to get SemWeb up and running with MySQL.

Topic Maps, OWL and RDF

It's been quite a while since my last post. This has mostly been due to spending a large amount of time playing with some Topic Map (the Ontopia Omnigator) and OWL (Protege) software. Both of these are excellent applications and I was especially impressed by the ability of OWL to form advanced object relationships.

However, at the end of all this research, I think that neither Topic Maps nor OWL are suitable for my purposes; yes, both can perform complex classifications and describe advanced relationships between classes but, for my purposes of creating a heterogeneous database, I only require simple classifications and relationships. Therefore using either of these technologies would probably be like taking the proverbial hammer to a nut. For me, a stronger argument against using these technologies is that, at this moment in time, neither is freely available for C# - the language that I'd like to work with.

So my requirements are a technology that can describe simple classifications and relationships and that is freely available in C#. The one application that seems to match these requirements is Joshua Tauberer's SemWeb - an RDF library for C#. The RDF and RDFS support provided by this package can perform categorisation and describe the relationships between objects to a level suitable for my requirements. An added bonus is that this library is still under active development - version 1.0 was only released on the 10th June 2007 (unlike the other C# RDF library Drive - the web page of which had actually expired the last time I looked).

This doesn't mean to say that SemWeb is perfect - like most things I've encountered on this journey, using SemWeb is not as straightforward as it could be. The main reason for this is that SemWeb has been developed on Mono, as opposed to Microsoft's Visual Studio that I want to use. As a result of this the supplied Visual Studio solutions are incomplete (I needed to refer to the makefiles to create the required projects) and the package currently doesn't support SQL Server, the Express version of which I'd been using for some other areas of this project (such as user administration). I've therefore now switched databases to use MySQL and am now struggling to add and retrieve some RDF data from this. I'll give more details of this struggle in the next post.