From 7212be242a9d7d81a764d918e65fd4b37cc81d9b Mon Sep 17 00:00:00 2001
From: Takuto ASAKURA <wtsnjp@gmail.com>
Date: Wed, 18 Mar 2020 00:45:33 +0900
Subject: [PATCH] add grounding-dataset-v1

---
 resources/grounding-dataset-v1.md | 40 +++++++++++++++++++++++++++++++
 resources/index.md                |  1 +
 2 files changed, 41 insertions(+)
 create mode 100644 resources/grounding-dataset-v1.md

diff --git a/resources/grounding-dataset-v1.md b/resources/grounding-dataset-v1.md
new file mode 100644
index 0000000..91abf81
--- /dev/null
+++ b/resources/grounding-dataset-v1.md
@@ -0,0 +1,40 @@
+---
+layout: page
+title: Dataset for Grounding of Formulae, Version 1
+---
+
+### Basic Information
+
+* Authors: Takuto Asakura, André Greiner-Petter, Akiko Aizawa, and Yusuke Miyao
+* Release date: 2020-03-18
+
+### Accessibility and License
+
+The content of this dataset is licensed to [SIGMathLing members](/member/) for
+research and tool development purposes.
+
+Access is restricted to [SIGMathLing members](/member/) under the [SIGMathLing
+Non-Disclosure Agreement](/nda/), because for most [arXiv](http://arxiv.org)
+articles the right of distribution was only given (or assumed) to arXiv
+itself.
+
+### Description
+
+This is the first public release of the dataset for grounding of formulae.
+
+As a trial, this dataset consists of a single annotated long paper (20 pages
+in PDF):
+
+* Simeone, O.: A Very Brief Introduction to Machine Learning with Applications
+to Communication Systems. IEEE Transactions on Cognitive Communications and
+Networking 4(4) (2018)
+
+The original XHTML file of the paper was taken from the [arXMLiv:08.2018
+dataset](/resources/arxmliv-dataset-082018/), and we manually annotated each
+of its 937 identifiers (i.e., `<mi>` tags) with the mathematical object
+(meaning) it refers to.
+
+### Download
+
+[Download link](https://gl.kwarc.info/SIGMathLing/dataset-grounding-v1)
+([SIGMathLing members](/member/) only)
diff --git a/resources/index.md b/resources/index.md
index 7b6baf9..eac938f 100644
--- a/resources/index.md
+++ b/resources/index.md
@@ -11,6 +11,7 @@ title: SIGMathLing - Datasets and Resources
  1. [quantity expressions](/resources/quantity-expressions)
  1. [arXMLiv word embeddings, 08.2017 release](/resources/arxmliv-embeddings-082017)
  1. [arXMLiv corpus, 08.2017 release](/resources/arxmliv-dataset-082017/)
+ 1. [Dataset for Grounding of Formulae, v1](/resources/grounding-dataset-v1)
 
 ## Resources hosted externally
  1. [ACL-math-annotation](http://www-al.nii.ac.jp/acl-math-annotation/)
-- 
GitLab