[okfn-discuss] Fw: [Geodata] discoverability and the wiki
Aaron Straup Cope
asc at spum.org
Tue Oct 9 16:26:47 BST 2007
Me and my big mouth...
I have checked out a copy of trunk, for CKAN. I will poke at it when I
can during the month, though I doubt I will be able to give it the
machine tag love before November.
Rufus Pollock wrote:
> Aaron Straup Cope wrote:
>> One of the principal motivations behind using SMW for the wine site
>> (grape.spum.org) was laziness.
>> At the time (I was washing the dishes) I briefly considered it as an
>> opportunity to play with Rails and/or Django and then quickly decided
>> I couldn't be bothered with setting up databases and managing
>> dependencies; both of which quickly descend into the tedium of
>> managing relationships and input validation. Or :
> Well said -- though the danger with such mods of wikis (and I speak with
> a little experience of messing around with MoinMoin -- and MW to a much
> lesser extent -- when thinking about e.g. CKAN) is that eventually you
> are using them as a web-app development toolkit which is *not* what
> they were really designed for. However the point is taken that one wants
> to get moving quickly.
>> The decision was also influenced by an ongoing struggle about how to
>> bridge the gap (read : chasm) between people and machines for
>> "storing" recipes; a problem that is fantastically harder than it
>> seems. Or :
> Yes! There is a fundamental trade-off as you put it:
> "walking the line between making it easy enough for people to bother
> putting data in to a system and still useful enough to make it worth the
> trouble of getting it out."
>> When I finally started poking around how to do stuff in MW the one
>> thing that stunned me was, in fact, how complicated many of the
>> articles were.
>> Like anything else, it had developed its own language of
>> specialization in the same way that people have adapted their practice
>> (and expectation) for things like tagging in delicious. Or any wiki,
>> for that matter. Or :
> To repeat my point earlier in more lapidary form:
> "When you start using a swiss army knife to build a house both the
> house and the swiss army knife suffer"
>> Whether or not a registry of geodata lends itself to that kind of
>> practice remains, of course, an open question.
>> At this point, it is probably also worth pointing out that I am
>> intimately involved with the "machine tags" work at Flickr so if my
>> biases aren't already clear let there be no doubt :-)
>> The thing about machine tags is that they are RDF by any measure. The
>> key difference being : You don't worry about namespaces unless you
>> want to. In a controlled environment, like the SMW, though you can
>> simply set up a registry of known prefixes and let the computrons sort
>> it out.
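[Editorial sketch, not part of the original message.] To make the "registry of known prefixes" idea concrete, here is a minimal Python sketch. It assumes the usual namespace:predicate=value shape for a machine tag; the regular expression, the KNOWN_PREFIXES table and the function names are hypothetical and are not taken from Flickr, SMW or CKAN code.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Hypothetical registry of known prefixes (illustrative entries only).
KNOWN_PREFIXES = {
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Assumed syntax: namespace:predicate=value
MACHINE_TAG = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.*)$")


class MachineTag(NamedTuple):
    namespace: str
    predicate: str
    value: str


def parse_machine_tag(raw: str) -> Optional[MachineTag]:
    """Return a MachineTag, or None if the string is just a plain tag."""
    match = MACHINE_TAG.match(raw.strip())
    if match is None:
        return None
    return MachineTag(*match.groups())


def expand(tag: MachineTag) -> Tuple[str, str]:
    """Expand the prefix to a full URI when the registry knows it;
    otherwise keep the short namespace:predicate form untouched."""
    base = KNOWN_PREFIXES.get(tag.namespace)
    predicate = base + tag.predicate if base else f"{tag.namespace}:{tag.predicate}"
    return (predicate, tag.value)
```

A tag with a registered prefix expands to a full predicate URI, an unregistered prefix is kept as typed, and something like horse=yes (no namespace) is simply treated as a plain tag.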
> Sure. But I am not sure that 'namespace' issues are the big one.
> Ultimately mapping from a well defined domain object in code to RDF or
> to anything else (json/xml ...) isn't that hard. What is usually hard
> (or perhaps time-consuming) is getting a good domain model and having
> the good user interface (including getting good performance -- e.g.
> because of the versioned nature of the domain model in CKAN loading
> certain pages (fortunately not that important ones at present) takes a
> while -- I've also noticed that e.g. del.icio.us has started to get
> quite unresponsive. These sorts of things mean people 'just leave').
>> So, perhaps one approach would be to simply update the CKAN to (I am
>> happy to submit patches once I've looked at the code and my mother
>> isn't visiting... ;-) store machine tags to allow for chunks of
>> arbitrary domain-specific metadata, per Rufus' comment.
> This I think *is* indeed a neat way to go.
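[Editorial sketch, not part of the original message.] One way to picture "store machine tags per package" is the small in-memory Python sketch below. The MachineTagStore class, its methods and the package names are hypothetical and do not reflect CKAN's actual domain model; lookup is exact match with optional wildcards, which is the easy part and says nothing about relevance.

```python
from collections import defaultdict


class MachineTagStore:
    """Toy store: each package holds a set of (namespace, predicate, value)."""

    def __init__(self):
        self._by_package = defaultdict(set)

    def add(self, package, namespace, predicate, value):
        self._by_package[package].add((namespace, predicate, value))

    def find(self, namespace=None, predicate=None, value=None):
        """Return sorted package names with at least one matching triple.
        A None argument acts as a wildcard for that position."""
        wanted = (namespace, predicate, value)
        hits = []
        for package, triples in self._by_package.items():
            if any(
                all(w is None or w == t for w, t in zip(wanted, triple))
                for triple in triples
            ):
                hits.append(package)
        return sorted(hits)


store = MachineTagStore()
store.add("osm-planet", "geo", "coverage", "world")      # hypothetical package
store.add("uk-postcodes", "geo", "coverage", "uk")       # hypothetical package
store.add("uk-postcodes", "license", "name", "unknown")
```

Here store.find(namespace="geo") returns both packages, while store.find(predicate="coverage", value="uk") narrows to one. Ranking the hits by relevance is deliberately left out.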
>> This is, in fact, really easy until you get to the search part. Or :
>> And there's the rub. The search part -- not only finding, but finding
>> relevant answers -- is always going to be the hard part because
>> implicit in the "problem statement" (or solution) is that someone has
>> managed to write the Do What I Mean engine.
> But that leads back to the fundamental trade-off:
> "More structure means harder for people to enter (so less of it) but
> easier to find stuff and join it together in interesting ways"
> "Less structure (dare I mention 'horse=yes'!) means easier to enter data
> but harder to find and join it together"
> Depending on where your constraints are you go one way or the other
> (e.g. if you have a bunch of librarians who will religiously use all the
> metadata fields then go for structure but if you are hoping people will
> just drop in off the 'net and do it you better make it damn easy to get
> stuff in there).
>> The RDF weirdos like to believe that TBL's magic layer cake of
>> trust+proof+logic is the answer, which is madness. The Google people
>> like to believe that their special "We're smarter than you" sauce is
>> the answer, which is hubris. Social networking sites like to believe
>> that your contacts have the answer, which is wishful thinking.
>> Meanwhile the CPAN is probably the only tool that has ever managed to
>> gracefully ("gracefully") dance around the problem; although often at
>> the expense of needing to install half the Internet just to add
>> support for plain text sprockets...
>> Which is a very long way of saying : I don't think that there's really
>> a need to worry about "random" yet.
>> It will be messy, for sure, but I tend to think it is more important
>> to let people add data quickly and easily than it is to try to imagine
>> how it will sort itself out in the end.
> That's my feeling, though the kicker here is that one might want some
> structure in order to have nice interfaces that let people add stuff
> more easily. e.g. you might want to only show the geodata related stuff
> on geodata package pages rather than the other 3000 tags people have
> used for other types of material but maybe even this is too much!
>> The sorting out is important but that's always going to be subject to
>> both the magic (computers) and conventions (humans) of the day.
>> By which I mean to say : horse=yes!
> By which you mean that for this kind of stuff the data people actually enter
> is the constraining factor and we can work on getting info back later. I
> basically agree and that is to some extent why CKAN is the way it is (no
> text in RDF in a wiki stuff which you so poetically described as
> stabbing yourself in the eyeballs ...).