[annotator-dev] Annotating a pdf rendered using pdfjs with annotator

Kristof Csillag csillag at hypothes.is
Tue Aug 12 06:05:51 UTC 2014

Dear María,

We at the Hypothes.is project have done something very similar: we can
now annotate PDF documents rendered by PDF.js, in both Firefox and Chrome.

Since Hypothes.is is built on top of Annotator, this involved modifying
Annotator to support PDF.js. This is not trivial, for several reasons.
One is that the current default XPath-range-based anchoring won't be
stable on the output of PDF.js, because the exact HTML structure the PDF
is rendered into might change between releases. Another is that the whole
document is not rendered when the PDF is loaded, only the first few pages;
the rest is rendered as the reader progresses. This means that annotations
can't be anchored to the HTML text, because it's not available initially.

The solution was to anchor the annotations directly to the text content
of the PDF, not to the HTML version.
To do this, we need a different kind of anchor specification. We have
two of these:

TextPositionSelector, which stores the start and end positions as
character offsets counted from the beginning of the document,
and TextQuoteSelector, which stores the quote together with a prefix and
a suffix: the 32-character segments right before and right after the
quote, so that we can find it more easily.
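
To make the two selector types concrete, here is a minimal sketch in
Python (used only for illustration; it is not code from our fork). The
field names (start, end, exact, prefix, suffix) and the make_selectors
helper are assumptions made for this example.

```python
from dataclasses import dataclass

CONTEXT_LEN = 32  # prefix/suffix length described above


@dataclass
class TextPositionSelector:
    start: int  # offset of the first selected character
    end: int    # offset just past the last selected character


@dataclass
class TextQuoteSelector:
    exact: str   # the quoted text itself
    prefix: str  # up to CONTEXT_LEN characters right before the quote
    suffix: str  # up to CONTEXT_LEN characters right after the quote


def make_selectors(text, start, end):
    """Build both selectors for the selection text[start:end].

    `text` is the plain text extracted from the whole document.
    """
    if not 0 <= start < end <= len(text):
        raise ValueError("selection out of range")
    position = TextPositionSelector(start, end)
    quote = TextQuoteSelector(
        exact=text[start:end],
        prefix=text[max(0, start - CONTEXT_LEN):start],
        suffix=text[end:end + CONTEXT_LEN],
    )
    return position, quote
```

Both selectors would then be saved alongside the annotation, so that
either one can be used to find the text again later.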

We store these selectors with the annotations. When anchoring, we first
extract the text from the PDF document (using PDF.js's native APIs), and
then search that text to find where the annotations should go. (We use
fuzzy search, to accommodate small differences resulting from formatting
changes, etc.)
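
The lookup order described above could be sketched roughly as follows.
This is not the matcher we actually use: the sliding-window comparison
with Python's difflib, the anchor function name and the 0.8 threshold are
all stand-ins chosen for illustration, and the approach is far too slow
for long documents.

```python
from difflib import SequenceMatcher


def anchor(text, start, end, exact, threshold=0.8):
    """Return (start, end) of the best match for `exact` in `text`, or None."""
    if not exact:
        return None
    # 1. Cheap check: the stored offsets may still be valid.
    if text[start:end] == exact:
        return start, end
    # 2. Exact search; prefer the occurrence closest to the stored offset.
    hits = []
    i = text.find(exact)
    while i != -1:
        hits.append(i)
        i = text.find(exact, i + 1)
    if hits:
        best = min(hits, key=lambda h: abs(h - start))
        return best, best + len(exact)
    # 3. Fuzzy fallback: slide a window of the quote's length over the
    #    text and keep the most similar window, if it is similar enough.
    n = len(exact)
    best_ratio, best_pos = 0.0, -1
    for i in range(max(1, len(text) - n + 1)):
        ratio = SequenceMatcher(None, text[i:i + n], exact).ratio()
        if ratio > best_ratio:
            best_ratio, best_pos = ratio, i
    if best_ratio >= threshold:
        return best_pos, min(len(text), best_pos + n)
    return None
```

A stored prefix and suffix could be used in the same way to choose
between several equally good candidates; that step is left out here.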

Once we have these anchors, we have to wait until the relevant HTML pages
are actually rendered before creating the highlights for them.
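
One way to picture this deferred highlighting is a small bookkeeping
structure like the one below. It is only a sketch: the PendingHighlights
class, its method names, and the idea of being told each page's text
length up front are assumptions for this example, and the actual drawing
of highlights is replaced by a placeholder that just records them.

```python
from bisect import bisect_right


class PendingHighlights:
    """Hold anchors until the page they fall on has been rendered."""

    def __init__(self, page_lengths):
        # page_starts[i] is the document-wide offset where page i begins.
        self.page_starts = []
        offset = 0
        for length in page_lengths:
            self.page_starts.append(offset)
            offset += length
        self.rendered = set()
        self.waiting = {}      # page index -> list of (start, end) anchors
        self.highlighted = []  # (page index, start, end) drawn so far

    def page_of(self, offset):
        """Index of the page containing the given character offset."""
        return bisect_right(self.page_starts, offset) - 1

    def add_anchor(self, start, end):
        # Simplification: an anchor is filed under the page it starts on.
        page = self.page_of(start)
        if page in self.rendered:
            self._highlight(page, start, end)
        else:
            self.waiting.setdefault(page, []).append((start, end))

    def on_page_rendered(self, page):
        """To be called when the viewer reports that a page was rendered."""
        self.rendered.add(page)
        for start, end in self.waiting.pop(page, []):
            self._highlight(page, start, end)

    def _highlight(self, page, start, end):
        # Placeholder: a real integration would wrap the page's rendered
        # text here; this sketch only records what would be drawn.
        self.highlighted.append((page, start, end))
```

The sketch ignores anchors that span two pages and pages that are
discarded again after the reader scrolls away; a real integration has to
handle both.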

Our solution works; the only problem is that we are currently stuck with
a fork of Annotator.
We really want to contribute everything back, but that hasn't happened yet.

Hopefully it will, soon after Annotator 2.0 is released.
Meanwhile, you can take a look at our fork
<https://github.com/hypothesis/annotator/tree/typed-packaging>, but
*we really, really, really don't encourage using or building on that*,
because it is going to be slaughtered, hopefully soon, when we migrate
to Annotator 2.0.

Also, our fork is really old, and uses a slightly different data model,
and has a slightly different feature set than current upstream Annotator.

Again, we are working to sort this out, but it has not happened yet.

Or you can check out Hypothes.is, which is going to stay here, even as
we change Annotator versions.

Using FF, deploying the bookmarklet from here
<https://hypothes.is/alpha> should work on any HTML or PDF document, and
so you should be able to annotate PDF documents.

(The current version of the Chrome extension has PDF.js disabled,
because it would always redirect PDF files from Chrome's native PDF
engine to PDF.js, even when annotations are not needed - which is not
what all users want.)

Please let me know if that helps.


On 2014-08-12 01:40, María Teresa Chávez wrote:
> Hello, 
> I am wondering if it is possible to do the following using annotator,
> and if you could give me advice on where to start from, as I am new to
> annotator and js. 
> Embed the PDF using pdf.js (done). Overlay some annotations. The
> annotations follow a simple JSON format with three fields:
> {
>   "anchor": "some text to anchor to",
>   "text": "the annotation text",
>   "type": "flag"
> }
> Anchor is the text used to position the annotation, text is some text
> to show, and type is the kind of annotation. Display the annotations
> on top of the pdf. There should be an "Add Annotation" button, which
> lets users add annotations to selected text. 
