Sunday, 27 July 2008

blogo ergo sum

After being kicked again and again (which hurts, the guy is a black belt in karate) by Dodji who wanted me to blog, here is a first post.

Album Artist support

Over the last few days, I've been trying to get back into Rhythmbox development to scratch a few itches of my own.
First, I looked at how Rhythmbox could handle compilations, i.e. albums containing tracks by different artists. Currently, if an album has 12 different artists, all 12 appear separately in the artist list, which can quickly create a big mess. I wrote a basic patch that makes it possible to set an "album artist" for such albums and to show that in the artist list instead of the individual artists. I had to experiment a bit with various approaches, but in the end the patch is surprisingly small.


Song UIDs

Then, I wanted Rhythmbox to be able to provide UIDs for the songs in its database. What I call a UID is an identifier that is unique to a song and that can be generated by looking only at the song data. This can be useful for various things: iPod (or whatever your portable media player of choice is) synchronization, associating user data (rating, play count, ...) with a song in a way that persists even if the user does a mv of the file from the shell, ... After doing that work, I learnt that Charlotte had been looking for such a feature for her nice Rhythmbox SoC project, which was good news :)

To generate that UID, I chose to hash the song title, artist and album (read from the tags of the song) together with the first 8 kB of data of the file (this hashing scheme was heavily inspired by what Amarok does). I'm not sure yet whether this is the best way to uniquely identify a song, but we'll only know after people try to use it. Before you ask, I thought about using MusicBrainz/MusicDNS acoustic fingerprints, but as far as I know, none of those fingerprints can be generated using free software end to end: there's always some closed source web service that must be queried to turn a few parameters generated by analyzing the song audio data into a fingerprint.
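To make the scheme concrete, here is a minimal Python sketch of the idea (the actual patch is C code inside Rhythmbox's metadata helper, and this is not it). The function name song_uid, the choice of SHA-1, the NUL separator between fields and the order of the fields are all assumptions made for illustration. The tag values are passed in by the caller rather than read from the file.

```python
import hashlib

# First 8 kB of the file, as described in the post.
HEADER_BYTES = 8 * 1024


def song_uid(path, title, artist, album):
    """Return a hex digest built from the tags and the first 8 kB of the file.

    Hypothetical helper: the hash function (SHA-1) and the field
    separator are assumptions, not what Rhythmbox or Amarok really use.
    """
    digest = hashlib.sha1()
    for field in (title, artist, album):
        digest.update(field.encode("utf-8"))
        # Separator so that ("ab", "c") and ("a", "bc") hash differently.
        digest.update(b"\0")
    with open(path, "rb") as f:
        digest.update(f.read(HEADER_BYTES))
    return digest.hexdigest()
```

Since the path is never hashed, moving or renaming the file leaves the UID unchanged. On the other hand, in this sketch, editing the title, artist or album tag gives a different UID.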

Once again, this feature was straightforward to implement.

The main difficulty was debugging the UID generation. Metadata reading (where I chose to add the UID generation) is done by a separate process which communicates with Rhythmbox over D-Bus. Reading metadata basically amounts to feeding random data to the tag reading library, so it's really hard to guarantee the library won't crash or hang in some corner case. Running it in an external process keeps Rhythmbox itself from crashing or hanging if that happens during metadata reading.

But this external process also makes debugging harder: it's short-lived, spawned on demand and runs in the background (i.e. it's not possible to print stuff to stderr or stdout). So moch's help was really welcome: he explained to me how to run that metadata helper process by hand and tell Rhythmbox to use it. It's really simple. First, you can (optionally) increase ATTENTION_SPAN in metadata/rb-metadata-dbus-service.c so that the helper stays alive longer (by default, it dies after 30 seconds of inactivity).
Then, you can run rhythmbox-metadata in nemiver (or in your favourite debugger), which will output a line like:

unix:abstract=/tmp/dbus-vXSVpHsnpL,guid=ba4e19b37904dba3bb1fc2214889d478

If you now set the RB_DBUS_METADATA_ADDRESS environment variable to that value before running Rhythmbox, then Rhythmbox will use the metadata helper you just launched in your debugger. Now all that is left to do is debugging!
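If you'd rather script that last step than export the variable by hand, something along these lines should work. It is only a sketch: the helper names rhythmbox_env and launch_rhythmbox are made up for the example, and it assumes the rhythmbox binary is in your PATH. The only thing taken from Rhythmbox itself is the RB_DBUS_METADATA_ADDRESS variable name.

```python
import os
import subprocess

# Environment variable read by Rhythmbox (named in the post).
ADDRESS_VAR = "RB_DBUS_METADATA_ADDRESS"


def rhythmbox_env(helper_address, base_env=None):
    """Return a copy of the environment pointing at the running helper.

    helper_address is the line printed by rhythmbox-metadata;
    surrounding whitespace (e.g. a trailing newline) is stripped.
    """
    env = dict(os.environ if base_env is None else base_env)
    env[ADDRESS_VAR] = helper_address.strip()
    return env


def launch_rhythmbox(helper_address, binary="rhythmbox"):
    """Start Rhythmbox so that it talks to the helper under the debugger.

    Assumption: `binary` can be found in PATH.
    """
    return subprocess.Popen([binary], env=rhythmbox_env(helper_address))
```

Paste the address printed by the helper into launch_rhythmbox() and Rhythmbox will start with the variable set for that process only, leaving your shell environment untouched.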

The result of this work can be found in the uid branch of my Rhythmbox git tree. It still needs some polishing, but the basics should already be working (including automatically updating your database to add UIDs when you first run it).

4 comments:

Jonathan said…

Actually, you don't need to change ATTENTION_SPAN - the '--external' argument to rhythmbox-metadata disables the timeout. I was just about to add something like that to the code and I found it was already in there.

Anonymous said…

If you're looking for a unique ID then why not use MusicBrainz IDs and fall back to your own implementation only if those are not found?

They have unique IDs for song tracks (e.g. http://musicbrainz.org/track/8717d821-88ef-4873-bd91-32a0b72ba415.html shows that 8717d821-88ef-4873-bd91-32a0b72ba415 is the MBID for Helter Skelter, on the White Album by the Beatles).

These are entirely separate from the audio fingerprinting, which is only used (optionally) to connect an actual audio file to its MBID.

For random files acquired off the internet you can do a lookup based on the track's ID3 data rather than audio fingerprinting, but that's probably a good time to just give up and fall back to your own scheme. If users tag their files via MusicBrainz then it'll seamlessly upgrade the experience.

It's easier if the user puts in and rips a physical CD, as you can get and store all the tracks' MBIDs via the CD-TOC lookup you'd be doing anyway.

Finally it's a standard that people might already be using and would perhaps let you swap playlists without having to exchange audio files (assuming both ends had the correct files)

Christophe said…

Rhythmbox already stores the track musicbrainz ID if it could be found in the file tags or if it ripped the file from a CD.

Choosing to use the MBID and fall back to my homemade track ID would be up to the code that needs a UID for a track (synchronization to/from a media player, anyone? ;)

As a side note, a plugin generating track fingerprints and getting tags automatically from musicbrainz might be quite easy to do and would rock if someone is reading that and is looking for some cool hack to do ;)

Pascal said…

I just saw that the fingerprinting now used by MusicBrainz is free software: http://code.google.com/p/musicip-libofa/