[ckan-discuss] CKAN is slowwwwww
cgueret at few.vu.nl
Mon Oct 4 12:56:18 BST 2010
On 10/02/2010 05:11 PM, Rufus Pollock wrote:
> Dear Christophe,
> To follow up David's earlier comments:
> * It will probably be *much* more efficient to use the dedicated
> resource search api:
> This query returned in 390ms :) and immediately tells you there are
> 169 resources with format 'api/sparql' (note some of these may be the
> same url since a resource is associated to a specific package). The
> following query:
> Gives you the full list of resources with package ids and using those
> you can retrieve each package for further analysis.
That's indeed better. Next time, I will have a closer look at the API
before implementing some naive approach :-P
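
For the record, a minimal sketch of the one-query approach in Python. The query URLs themselves are not preserved in this message, so the base URL, the `/api/search/resource` path and the `results` / `package_id` keys in the response are assumptions; only the format value 'api/sparql' and the `all_fields=1` parameter come from the thread.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical values: the real query URL is not preserved in this thread.
BASE_URL = "http://ckan.example.org"
SEARCH_PATH = "/api/search/resource"  # assumed endpoint path


def build_resource_search_url(base_url, fmt, all_fields=True):
    """Build a resource search URL that filters on the resource format."""
    params = [("format", fmt)]
    if all_fields:
        # all_fields=1 is the parameter mentioned in the thread
        params.append(("all_fields", "1"))
    return base_url.rstrip("/") + SEARCH_PATH + "?" + urlencode(params)


def package_ids(response):
    """Return the distinct package ids found in a decoded search response.

    Assumes each result is a dict carrying a 'package_id' key.
    """
    results = response.get("results", [])
    return sorted({r["package_id"] for r in results if "package_id" in r})


def search_sparql_resources(base_url=BASE_URL):
    """Run the query once and decode the JSON reply (needs network access)."""
    with urlopen(build_resource_search_url(base_url, "api/sparql")) as reply:
        return json.load(reply)
```

Deduplicating on the package id matters because, as noted above, several resources may point at the same URL while belonging to different packages.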
> * API slowness is something we will be looking into (in particular
> better cache configuration). That said, you are iterating through
> every item in the repository :) With more than 1500 packages at 1s a
> package you are looking at around 30m, at 2s a package 1h, at 4s a
> dataset 2h ... (I note that, on what may be a slow wifi connection,
> loading google front page or flickr takes between 1-3s). For this kind
> of bulk analysis it may be worth reinstating our daily json dumps of
> the entire db.
Right, but 4s a package is still a bit slow. The daily dump and some
optimisation of the API speed would be nice.
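
The estimate quoted above is easy to check. A small sketch, assuming exactly 1500 packages fetched one after the other (the function name is only illustrative):

```python
def crawl_minutes(packages, seconds_per_package):
    """Total time in minutes to fetch every package sequentially."""
    return packages * seconds_per_package / 60


for seconds in (1, 2, 4):
    # 1500 packages: 25, 50 and 100 minutes respectively
    print(seconds, "s/package ->", crawl_minutes(1500, seconds), "min")
```

With a little more than 1500 packages this is in line with the rounded figures of about 30m, 1h and 2h.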
> 2010/9/30 David Read <david.read at okfn.org>:
>> Yes it shouldn't be this slow doing 1500 queries. We've suffered
>> performance problems in the past 24 hours and this is probably
>> related. Having said that, I've opened a ticket to take a proper look
>> at this:
>> This particular problem sounds like a job for the 'resource search'
>> feature, which achieves what you want in one query, taking under a
>> and you could add &all_fields=1 to get all the package properties to process.
>> I'm afraid this is a new feature so has not been put into the ckanclient
>> yet, but should not be too hard to add in, as package search is almost
>> identical. Do write back to the list to let us know how you get on and
>> if you want any more help.
>> 2010/9/30 Christophe Guéret <cgueret at few.vu.nl>:
>>> I've made a small script (attached to this mail) using the python CKAN API
>>> to browse the content of CKAN in search for SPARQL end points.
>>> Everything works fine apart from the fact that this script takes at least 2h
>>> to run! I was hoping that it would take no more than a few seconds, or maybe
>>> a minute or so. But not hours ;-)
>>> Is it normal that CKAN is so slow to browse?
>>> Dr. Christophe Guéret (cgueret at few.vu.nl)
>>> Postdoc working on SOKS (http://www.few.vu.nl/soks)
>>> Knowledge Representation & Reasoning Group
>>> Computational Intelligence Group
>>> Department of Computer Science, AI
>>> VU University Amsterdam
>>> ckan-discuss mailing list
>>> ckan-discuss at lists.okfn.org
Dr. Christophe Guéret (cgueret at few.vu.nl)
Postdoc working on SOKS (http://www.few.vu.nl/soks)
Knowledge Representation & Reasoning Group
Computational Intelligence Group
Department of Computer Science, AI
VU University Amsterdam