[ckan-dev] Storing/searching/displaying XML resources
Salman.Haq at neustar.biz
Tue May 8 04:26:11 BST 2012
Also, one more question:
Why use Solr to power site search and ElasticSearch as a document store
when either could fulfill both purposes?
On 5/7/12 11:10 PM, "Haq, Salman" <Salman.Haq at neustar.biz> wrote:
>On 5/7/12 7:01 PM, "Rufus Pollock" <rufus.pollock at okfn.org> wrote:
>>On 7 May 2012 17:17, Haq, Salman <Salman.Haq at neustar.biz> wrote:
>>> I have a special use case where I want to store XML resources. For each
>>> resource, I want to display a custom view that allows the user to
>>> the data in the resource. This is very similar to what
>>> does except in my use case, the resource has a specialized view.
>>> Would the community recommend that I enhance ckanext-datastorer or use
>>> a template for a new custom extension? I am leaning towards the latter.
>>I think the latter may be easier -- though medium-term we may want to
>>find a way where one can plug in specialist importers to the
>>ckanext-datastorer depending on the incoming type of data (and perhaps
>>some other info).
>Yes, I think a way for plugins (I use that term loosely) to register with
>ckanext-datastorer for specific file types would be a good way to go.
>>> Also, how does ckanext-datastorer store the parsed data? It doesn't seem
>>> to have any special models for storing tabular data in the main
>>> db. Does it rely primarily on ElasticSearch as the backing store? Does
>>Yes, it uses the CKAN DataStore backed by ElasticSearch rather than
>Just out of curiosity, are there any ckanext's that have their own data
>>> mean that I will have to convert my XML documents into JSON documents
>>> then store them via the data API?
>>That would be the natural approach if it were possible.
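As a rough illustration of the XML-to-JSON conversion step discussed above, the sketch below flattens a small XML document into one JSON-ready dict per table. The XML layout (database, table and column elements with name, rows, type and key attributes) is invented for the example and is not the poster's actual schema. How the resulting documents would then be sent to the data API is deliberately left out.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample; the real metadata schema is not shown in the thread.
SAMPLE_XML = """
<database name="sales">
  <table name="orders" rows="1200">
    <column name="id" type="integer" key="primary"/>
    <column name="total" type="numeric"/>
  </table>
</database>
""".strip()


def tables_to_documents(xml_text: str) -> list:
    """Return one JSON-serialisable dict per <table> element."""
    root = ET.fromstring(xml_text)
    documents = []
    for table in root.findall("table"):
        documents.append({
            "database": root.get("name"),
            "table": table.get("name"),
            "row_count": int(table.get("rows", "0")),
            # Copy each column's attributes into a plain dict.
            "columns": [dict(col.attrib) for col in table.findall("column")],
        })
    return documents


documents = tables_to_documents(SAMPLE_XML)
payload = json.dumps(documents, indent=2)
```

Keeping one document per table is only one possible choice; a single nested document per database would work equally well if the store handles nested fields.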
>>> Also, from the docs and source code, I still can't figure out what
>>> ckanext-archiver does and how it relates to ckanext-datastorer. They
>>> seem to share some common code.
>>Archiver archives resources: i.e. it looks for resources with remote
>>urls and stores a copy of that data into the FileStore (i.e. it
>>*archives* it). The DataStorer instead processes the data and puts it
>>in the DataStore.
>Makes sense now.
>>> To elaborate more on my use case, the XML document actually represents
>>> metadata about a database (eg: tables, columns, keys, row counts, etc).
>>> way to think of the extension is as a 'metadatastorer'. The resources
>>> be in XML format, or in the future, additional formats may be supported
>>> different types of stores (eg: NoSQL dbs, etc)
>>Understood. I note we've also been thinking quite a bit about how to
>>specify metadata for datasets. In the simplest case we use the mapping
>>metadata in ElasticSearch to store info about fields (type, format
>>etc). We're also thinking about using JSON-LD contexts more heavily
>>for this purpose (see )
>That would be good. I guess a 'resource' will become a tuple of 'metadata'
>What are your thoughts about 'Single Point Of Truth'?
>It seems a resource could have multiple representations: as a file, as a JSON
>object in ES, as a graph in some triple store, etc. Borrowing from DVCS,
>these related but separate representations resemble branches. Do people
>have thoughts about how this would be handled in the API and the UI?
>>> ckan-dev mailing list
>>> ckan-dev at lists.okfn.org
>>Co-Founder, Open Knowledge Foundation
>>Promoting Open Knowledge in a Digital Age
>>http://www.okfn.org/ - http://blog.okfn.org/