diff --git a/resources/grounding-dataset.md b/resources/grounding-dataset.md
index 4daaa569ab0423f3af862af0044233072b82243f..aa80e244b2b31ef6d1a378b273d09039721f850b 100644
--- a/resources/grounding-dataset.md
+++ b/resources/grounding-dataset.md
@@ -5,8 +5,8 @@ title: Dataset for Grounding of Formulae
 
 ### Basic Information
 
-* Author: Takuto Asakura, André Greiner-Petter, Akiko Aizawa, and Yusuke Miyao
-* Updated: 2021-04-01
+* Author: Takuto Asakura, Yusuke Miyao, and Akiko Aizawa
+* Updated: 2022-01-20
 
 ### Accessibility and License
 
@@ -20,19 +20,12 @@ itself.
 
 ### Description
 
-This is the project to create a dataset for grounding of formulae.
-
-As a trial work, this dataset consists of an annotated long paper (20 pages in
-PDF):
-
-* Simeone, O.: A Very Brief Introduction to Machine Learning with Applications
-to Communication Systems. IEEE Transactions on Cognitive Communications and
-Networking 4(4) (2018)
-
-The original XHTML file of the paper was taken from the [arXMLiv:08.2018
-dataset](/resources/arxmliv-dataset-082018/), and we manually annotated all
-937 identifiers (i.e., `<mi>` tags) in the document to the corresponding
-mathematical objects (meanings).
+This dataset is a ground truth of formula grounding annotation data for 15
+scientific papers. More specifically, a total of 12,352 math identifiers were
+annotated with their referring mathematical concepts, explicitly indicating
+coreference relations within each article. A total of 938 text spans, called
+grounding sources, that were used as the basis for human grounding were
+labeled.
 
 The annotation is performed with our open-source annotation tool
 [MioGatto](https://github.com/wtsnjp/MioGatto). The tool is also suitable for
@@ -40,5 +33,5 @@ viewing the data. Please refer to its documentation for the details.
 
 ### Download
 
-[Download link](https://gl.kwarc.info/SIGMathLing/grounding-dataset-v1)
+[Download link](https://gl.kwarc.info/SIGMathLing/grounding-dataset)
 ([SIGMathLing members](/member/) only)