[open-science] How should we publish survey/tabular data? A Panton Paper?
Maloney, Christopher (NIH/NLM/NCBI) [C]
maloneyc at ncbi.nlm.nih.gov
Mon Jul 18 15:39:29 BST 2011
[Sorry, I had meant to send this to the list, but just realized that I only replied to Peter directly. So, resending ...]
I just posted a comment on your blog post as well. The article is downloadable as XML, in the NLM Journal Article Tag Suite (JATS) format: http://jats.nlm.nih.gov/. On our PMC version of the article, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3126798/, the floating table view lets you cut and paste the table text: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3126798/table/pone-0021101-t006/.
If you open the XML file, you will see that the table data is coded as XHTML tables. That is better than images, but still not an ideal format for transmitting data: the markup records only rows and cells, so all semantic content is lost. A better format for the data in table 6, in particular, would be an XML format that captures the semantics of survey questions and answers, for example the dbGaP DTD described in this article: http://conferences.idealliance.org/extreme/html/2007/Tryka01/EML2007Tryka01.html. There are probably other formats that would be even better.
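To make the point about lost semantics concrete, here is a minimal sketch (Python, standard library only) of pulling an XHTML-style table out of a JATS table-wrap element. The fragment and its numbers are invented for illustration and are not the contents of table 6; the code also assumes un-namespaced tr/th/td elements. What comes out is only rows of strings. The question/answer structure has to be re-imposed by hand, under a hypothetical column layout, which is exactly the information a semantic survey format would carry explicitly.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical JATS fragment: a <table-wrap> holding an XHTML-style table.
# Caption and values are made up; they are NOT taken from the PLoS ONE article.
JATS_FRAGMENT = """
<table-wrap id="t1">
  <caption><p>Example survey responses</p></caption>
  <table>
    <thead>
      <tr><th>Question</th><th>Yes</th><th>No</th></tr>
    </thead>
    <tbody>
      <tr><td>Do you share your data?</td><td>12</td><td>30</td></tr>
      <tr><td>Do you reuse others' data?</td><td>25</td><td>17</td></tr>
    </tbody>
  </table>
</table-wrap>
"""


def table_rows(fragment: str) -> list[list[str]]:
    """Return every table row as a list of cell strings, header row first.

    Assumes cells are un-namespaced <th>/<td> children of <tr>.
    """
    root = ET.fromstring(fragment.strip())
    rows = []
    for tr in root.iter("tr"):  # document order: thead rows, then tbody rows
        cells = ["".join(cell.itertext()).strip()
                 for cell in tr if cell.tag in ("th", "td")]
        rows.append(cells)
    return rows


def to_survey_records(rows: list[list[str]]) -> list[dict]:
    """Re-impose question/answer meaning that the XHTML markup does not carry.

    Hypothetical layout: column 0 is the question text, the remaining
    columns are answer options whose cells hold integer response counts.
    """
    header, *body = rows
    records = []
    for row in body:
        question, *counts = row
        answers = {option: int(n) for option, n in zip(header[1:], counts)}
        records.append({"question": question, "answers": answers})
    return records


if __name__ == "__main__":
    rows = table_rows(JATS_FRAGMENT)
    print(rows)  # plain strings: nothing says which cell is a question
    print(json.dumps(to_survey_records(rows), indent=2))
```

The second function is where the guessing happens: nothing in the XHTML says that "12" is a count of "Yes" responses, so any consumer has to hard-code that assumption.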
IMO, it would be nice if technical article data could be captured in useful semantic forms, but it is not easy. One huge stumbling block (again, my opinion) is that JATS is a fairly rigid XML format with no easy extension mechanism. That happens to be the topic of a talk I am giving in a couple of weeks at Balisage; the paper is here: http://jatspan.sourceforge.net/Balisage2011Paper/Bal2011malo0713.html. But even if that problem were solved, there would still be huge hurdles in getting the authors and curators of the data to put it into these forms.
Building 45, 5AN.24D-22
From: Peter Murray-Rust [mailto:pm286 at cam.ac.uk]
Sent: Saturday, July 16, 2011 5:32 PM
Subject: [open-science] How should we publish survey/tabular data? A Panton Paper?
I have just written a blog post critical of a paper in PLoS ONE.
You'll understand when you've read it (it only takes half a minute). It presents survey data as TIFFs (argh!)
It makes me think we should consider a Panton Paper on how to publish SIMPLE data: some simple points for authors, editors and reviewers to be aware of. Nothing comprehensive. But since so many tables are published, can we at least get them fairly standardized?
Reader in Molecular Informatics
Unilever Centre, Dept. of Chemistry
University of Cambridge
CB2 1EW, UK