[ckan-discuss] Multiple package schemas
richard at cyganiak.de
Thu Oct 7 16:57:22 BST 2010
On 7 Oct 2010, at 09:23, David Read wrote:
> Richard's guidance doesn't contradict any of our core field guidance,
> apart from in these cases:
> * he gives more specific instructions for a couple of the resource
> fields. The format field has suggested values like
> "application/rdf+xml" which is in fact two pieces of data - the
> purpose of the download (e.g. the application, an example, meta-info,
> download_page) and the format itself. These would be better in
> separate columns.
Rufus has stated at some point that the content of the format field
should be an Internet Media Type, and he encouraged the use of
made-up “pseudo-types” like “api/search”. So I blame the idea on him ;-)
I agree that having a “format” field (with media type as value where
possible) and a separate “purpose” or “type” field with values such as
“Download”, “Example”, “Schema”, “Documentation”, “API” would be good.
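To make the split concrete, here is a minimal sketch of a resource record with separate format and purpose fields. The class and field names are hypothetical and are not CKAN's actual model; the purpose values are simply the ones listed above.

```python
from dataclasses import dataclass

# Purpose values taken from the list suggested above.
PURPOSES = {"Download", "Example", "Schema", "Documentation", "API"}


@dataclass
class Resource:
    """Hypothetical resource record; not CKAN's actual model."""
    url: str
    format: str   # an Internet Media Type where possible
    purpose: str  # must be one of PURPOSES

    def __post_init__(self):
        # Reject free-form purposes so clients can rely on the value.
        if self.purpose not in PURPOSES:
            raise ValueError(f"unknown purpose: {self.purpose!r}")


# Illustrative URL only.
dump = Resource("http://example.org/dump.rdf", "application/rdf+xml", "Download")
```

With the two concerns in separate fields, a client could filter on purpose and then dispatch on format without parsing a combined value.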
> * he suggests adding a number of tags according to the properties of
> the package. I think these would be better stored as extra fields,
Again, things like the “format-rdf” tag were already widely used on
CKAN before we started, so again I don't accept the blame ;-)
> I think he (and others) have chosen tags over extra fields, because
> they are easier to browse/search on CKAN.
That's not the main reason. I think the main reasons for choosing tags
over custom fields are:
1. Tags are more “lightweight”. Coining a new custom field can be a
bit scary, because it feels like we might perhaps be “polluting” the
space of field names. Tags are free-form, so there is less concern
about coining new ones.
2. There is no way (as far as I can see) to check if a given custom
field name has already been used elsewhere, so if I use a “format” or
“topic” custom field I don't know if I'm stepping on someone else's toes.
3. Working with custom fields is quite awkward because of the three-
fields limitation in the form.
4. Custom fields are single-value, so you can't say "format"="dc"
I'm not sure what this implies for the design of CKAN, just sharing
> If we resolved these two points, I think the LOD use case would
> suggest a schema that just describes extra fields.
Not quite. I think that some things can't be solved with just extra fields:
1. Pre-defined values for the format field of resources. This is very
important. This field is the basis for any kind of automated access to
the data package; free-form text just doesn't cut it. Some of the
formats that are commonly used in the LOD realm are virtually unknown
elsewhere, so the values would have to be per-schema I think.
2. Positioning of custom fields. The most obviously missing field is
“author homepage”. You wouldn't believe how many LOD packages have a
homepage URL stuck behind the author name, or in the email field.
Having an “author homepage” custom field half a screen down from the
“author name/email” fields doesn't feel like it would solve this; the
custom field would have to be located close to the name/email fields.
These are the biggies I think. Everything else could perhaps be done
via extra fields.
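For point 1, a rough sketch of what per-schema pre-defined format values might look like. The schema name "lod", the lookup table, and the function are all assumptions made for illustration, not an existing CKAN API; "text/turtle" is included only as another example of a media type that is common in the LOD realm.

```python
# Hypothetical mapping from schema name to the format values it allows.
ALLOWED_FORMATS = {
    "lod": {"application/rdf+xml", "text/turtle", "api/search"},
}


def is_valid_format(schema: str, value: str) -> bool:
    """Check a resource format against the schema's pre-defined values.

    Schemas without an entry impose no restriction, so free-form text
    is still accepted for them.
    """
    allowed = ALLOWED_FORMATS.get(schema)
    if allowed is None:
        return True
    return value in allowed
```

Under these assumptions, is_valid_format("lod", "application/rdf+xml") is accepted, while a free-form value such as "RDF/XML" is rejected for the same schema.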
> On 6 October 2010 22:53, Tim McNamara <paperless at timmcnamara.co.nz>
>> On 7 October 2010 06:58, Richard Cyganiak <richard at cyganiak.de>
>>> On 6 Oct 2010, at 18:17, David Read wrote:
>>>> Excellent point. Yes, maybe we want a 'schema' to merely define
>>>> specific 'extra' fields, with their validation and later their
>>>> display. Then you could have a package having several 'schemas'
>>>> simply. The core package fields then wouldn't be affected by any of
>>> But 'schemas' still might want to modify the behaviour of some of
>>> the core
>>> - add a note underneath the field
>>> - provide a selection of choices for the resource format field
>>> - provide a number of checkboxes to add specific tags with special
>>> - ...
>> Would this level of flexibility be desirable? It may make things very
>> difficult to build applications on the basis of CKAN's packages if
>> they have
>> different structures. I prefer the idea of a common set of
>> information that
>> is fixed with possible extensions. I think there should be a strong
>> community push to keep to the common set unless there are
>> compelling reasons
>> (necessity) to add an extension.