---
layout: page
title: Dataset for Grounding of Formulae
---

### Basic Information

* Authors: Takuto Asakura, André Greiner-Petter, Akiko Aizawa, and Yusuke Miyao
* Updated: 2021-04-01

### Accessibility and License

The content of this dataset is licensed to [SIGMathLing members](/member/) for
research and tool development purposes.

Access is restricted to [SIGMathLing members](/member/) under the [SIGMathLing
Non-Disclosure-Agreement](/nda/) because, for most [arXiv](http://arxiv.org)
articles, the right of distribution was given (or assumed) only to arXiv
itself.

### Description

This is a project to create a dataset for the grounding of formulae.

As a trial, this dataset consists of a single annotated long paper (20 pages
in PDF):

* Simeone, O.: A Very Brief Introduction to Machine Learning with Applications
to Communication Systems. IEEE Transactions on Cognitive Communications and
Networking 4(4) (2018)

The original XHTML file of the paper was taken from the [arXMLiv:08.2018
dataset](/resources/arxmliv-dataset-082018/), and we manually annotated all
937 identifiers (i.e., `<mi>` elements) in the document with their
corresponding mathematical objects (meanings).
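
To make "identifiers" concrete, here is a minimal sketch of how the `<mi>` elements of an XHTML document could be listed with Python's standard library. It is only an illustration and is not part of the dataset or of the annotation tool: the function name `collect_identifiers` and the inline `SAMPLE` document are hypothetical, and it assumes the file is well-formed XML with the `<mi>` elements in the standard MathML namespace. Real files may need a more lenient parser.

```python
import xml.etree.ElementTree as ET

# Standard MathML namespace; assumed to apply to the <mi> elements.
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def collect_identifiers(xhtml_text):
    """Return the text of every MathML <mi> element, in document order."""
    root = ET.fromstring(xhtml_text)
    return [(mi.text or "") for mi in root.iter(f"{{{MATHML_NS}}}mi")]


# Hypothetical miniature document standing in for the real paper.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml">
<body><p>
<math xmlns="http://www.w3.org/1998/Math/MathML">
<mi>y</mi><mo>=</mo><mi>f</mi><mo>(</mo><mi>x</mi><mo>)</mo>
</math>
</p></body></html>"""

identifiers = collect_identifiers(SAMPLE)
print(len(identifiers), identifiers)
```

Each string in the returned list corresponds to one identifier occurrence, which is the unit that receives an annotation in this dataset.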

The annotation was performed with our open-source annotation tool
[MioGatto](https://github.com/wtsnjp/MioGatto). The tool is also suitable for
viewing the data. Please refer to its documentation for details.

### Download

[Download link](https://gl.kwarc.info/SIGMathLing/grounding-dataset-v1)
([SIGMathLing members](/member/) only)