in which he developed a framework for the detection of quantity expressions in STEM documents.
### Accessibility and License
The content of this dataset is licensed to [SIGMathLing members](/member/) for research
and tool development purposes.
Access is restricted to [SIGMathLing members](/member/) under the
[SIGMathLing Non-Disclosure-Agreement](/nda/) because, for most [arXiv](http://arxiv.org)
articles, the right of distribution was given (or assumed) only to arXiv itself.
### Contents
* `Annotations.zip`: All quantity expressions detected by the spotter in a format suitable for the [Kwarc Annotation Tool (KAT)](https://github.com/kwarc/kat).
* `Documents.zip`: The documents in which quantity expressions were searched. These are modified arXMLiv documents in which each word is wrapped by a `<span>`. This was required by KAT to annotate words.
* `Harvest.zip`: Data for math web search.
* `screen-reader-documents.zip`: The documents prepared in a way that enables screen readers to read out units ("two kilometers" instead of "two k m" for "2km").
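
The word-level `<span>` wrapping described for `Documents.zip` can be pictured with a small sketch. It uses only Python's standard `html.parser` module and collects the text of each outermost `<span>` together with its `id`. The sample markup, the `id` values, and the class name `SpanWordCollector` are hypothetical and chosen for illustration; the actual documents may use different attributes and will likely contain other spans besides word wrappers.

```python
from html.parser import HTMLParser


class SpanWordCollector(HTMLParser):
    """Collect (id, text) for every outermost <span> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0           # nesting depth inside <span> elements
        self.words = []          # collected (span id, text) pairs
        self._current_id = None  # id of the outermost open span
        self._buffer = []        # text fragments seen inside that span

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            if self.depth == 0:
                self._current_id = dict(attrs).get("id")
                self._buffer = []
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "span" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                text = "".join(self._buffer).strip()
                if text:
                    self.words.append((self._current_id, text))

    def handle_data(self, data):
        if self.depth > 0:
            self._buffer.append(data)


# Hypothetical snippet; ids and markup in the real documents may differ.
sample = (
    '<p><span id="w1">The</span> <span id="w2">distance</span> '
    '<span id="w3">is</span> <span id="w4">2</span><span id="w5">km</span>.</p>'
)

collector = SpanWordCollector()
collector.feed(sample)
collector.close()
words = collector.words
for span_id, text in words:
    print(span_id, text)
```

Because every word carries its own element, an annotation tool can refer to a word (or a range of words, such as the number and unit of a quantity expression) by element rather than by character offset.
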
### Remarks on Annotation Format
The annotations are stored as RDF in a way suitable for the [Kwarc Annotation Tool (KAT)](https://github.com/kwarc/kat).
For more information on KAT and its annotation format, see the [CICM 2014](https://gl.kwarc.info/KAT/papers/blob/master/cicm14/paper.pdf)
and [CICM 2016](https://gl.kwarc.info/KAT/papers/blob/master/cicm16/paper.pdf) papers.
The dataset can be downloaded from [this repository](https://gl.kwarc.info/SIGMathLing/quantity-expressions) (only for [SIGMathLing members](/member/)).
### Evaluation
According to the thesis, a manual validation of 50 randomly selected documents containing a total of 646 quantity expressions yielded the following values: