SIGMathLing
website
Commit d5d99003 authored Jul 22, 2021 by Luis
first draft of argot resource page
parent b90a6790
Pipeline #3585 passed with stage in 1 minute and 13 seconds
resources/argot-dataset-2020.md
0 → 100644
---
layout: page
title: ArGoT 2021 - arXiv Glossary of Terms
---
### Release
- This page documents: ArGoT 2021 (latest)
### Contents
- 5,023 compressed XML files using the arXiv's naming convention.
- 881,301 articles.
- 800 ZIP archives, in arXiv's Year-Month `yymm` naming scheme.
- The XML sources total `500 MB` packaged, and `2.1 TB` unpacked.
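
As a reading aid for the `yymm` scheme, here is a minimal Python sketch that turns a four-digit prefix into a calendar year and month. The function name `yymm_to_year_month` is hypothetical, and the century rule (years `91` to `99` belong to the 1990s, everything else to the 2000s) is an assumption based on arXiv having started in 1991; it is not taken from the dataset itself.

```python
def yymm_to_year_month(yymm):
    """Convert an arXiv-style 'yymm' string into a (year, month) tuple.

    Assumption: two-digit years 91-99 mean 1991-1999, and 00-90 mean 2000-2090.
    """
    if len(yymm) != 4 or not yymm.isdigit():
        raise ValueError(f"expected four digits, got {yymm!r}")
    yy, mm = int(yymm[:2]), int(yymm[2:])
    if not 1 <= mm <= 12:
        raise ValueError(f"month out of range in {yymm!r}")
    year = 1900 + yy if yy >= 91 else 2000 + yy
    return year, mm


if __name__ == "__main__":
    print(yymm_to_year_month("1407"))
```

Sorting archive names by the returned tuple gives chronological order, which a plain string sort does not (for example, `9108` would sort after `1407`).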
### Download
- [Download link](https://gl.kwarc.info/SIGMathLing/dataset-argot-2021)
- [SIGMathLing members](/member/) only. Joining is free and mostly a legal checkmark on our end - all researchers welcome!
### Description
This is the first public release of the ArGoT dataset generated by the [Formal Abstracts](https://formalabstracts.github.io/) research group.
ArGoT is a dataset of term-definition pairs automatically extracted from mathematical papers on the arXiv.
It consists of XML files with the following tags and attributes:
- article: arXiv article entry
  - name: link to the article in the arXiv
  - num: number of paragraphs in the article
- definition: a paragraph labeled as a definition by the ML classifier
  - index: paragraph number inside the article
- dfndum: the term (definiendum) found in the statement of the definition.
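
To make the tag layout concrete, the following is a minimal Python sketch that reads one entry with the standard library's `xml.etree.ElementTree` and collects term-definition pairs. The sample string copies the entry shown in the Appendix; the `stmnt` tag appears in that example but not in the list above, and the function name `extract_pairs` is hypothetical. Whether the real files wrap several `article` elements in a common root is not stated here, so the sketch uses `iter()`, which handles both cases.

```python
import xml.etree.ElementTree as ET

# Sample entry copied from the Appendix of this page.
SAMPLE = """
<article name="1407_005/1407.2218/1407.2218.xml" num="89">
  <definition index="51">
    <stmnt>
      Assume _inline_math_. We define the following space-time
      norm if _inline_math_ is a time interval _display_math_
    </stmnt>
    <dfndum>space-time norm</dfndum>
  </definition>
</article>
"""


def extract_pairs(xml_text):
    """Return (article name, paragraph index, term, statement) tuples."""
    root = ET.fromstring(xml_text)
    pairs = []
    # iter() also matches the root itself, so a bare <article> works too.
    for article in root.iter("article"):
        for definition in article.iter("definition"):
            index = int(definition.get("index"))
            # Collapse line breaks and indentation inside the statement.
            statement = " ".join(definition.findtext("stmnt", default="").split())
            for dfndum in definition.iter("dfndum"):
                term = " ".join((dfndum.text or "").split())
                pairs.append((article.get("name"), index, term, statement))
    return pairs


if __name__ == "__main__":
    for name, index, term, statement in extract_pairs(SAMPLE):
        print(f"{term!r} defined in paragraph {index} of {name}")
```

The inner loop over `dfndum` allows for a definition that names more than one term; that possibility is an assumption, since the single example on this page has exactly one.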
Two independently extracted versions of the dataset are provided:
- NN: Neural network approach using a combination of LSTM for classification and LSTM-CRF for sequence tagging.
- SGD: Stochastic Gradient Descent for classification and ChunkParser for named entity recognition.
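
Since the two versions are extracted independently, one natural first check is how much their term lists agree. The sketch below is only an illustration of that idea: the function names and the sample term lists are hypothetical, and the normalization (lowercasing and collapsing whitespace) is an assumption rather than something the dataset prescribes.

```python
def normalize(term):
    """Lowercase a term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def term_overlap(nn_terms, sgd_terms):
    """Return the shared normalized terms and their Jaccard similarity."""
    nn = {normalize(t) for t in nn_terms}
    sgd = {normalize(t) for t in sgd_terms}
    shared = nn & sgd
    union = nn | sgd
    jaccard = len(shared) / len(union) if union else 0.0
    return shared, jaccard


if __name__ == "__main__":
    # Hypothetical term lists, not taken from the dataset.
    nn_terms = ["Space-time norm", "abelian group"]
    sgd_terms = ["space-time  norm", "Banach space"]
    shared, jaccard = term_overlap(nn_terms, sgd_terms)
    print(sorted(shared), round(jaccard, 3))
```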
### Citing this Resource
#### pure bibTeX
```
@MISC{SML:argot:2021,
  author = {Luis Berlioz},
  title = {ArGoT:2021 dataset, arXiv Glossary of Terms},
  howpublished = {hosted at \url{https://sigmathling.kwarc.info/resources/argot-dataset-2021/}},
  note = {SIGMathLing -- Special Interest Group on Math Linguistics},
  year = {2021}
}
```
#### bibTeX for the bibLaTeX package (preferred)
```
@online{SML:argot:2021,
  author = {Luis Berlioz},
  title = {argot:2021 dataset, an automatically extracted glossary of mathematical terms from the arXiv},
  url = {https://sigmathling.kwarc.info/resources/argot-dataset-2021/},
  note = {SIGMathLing -- Special Interest Group on Math Linguistics},
  year = {2021}
}
```
#### EndNote
```
%0 Generic
%T argot:2021 dataset, an automatically extracted glossary of mathematical terms from the arXiv
%A Berlioz, Luis
%D 2021
%I hosted at https://sigmathling.kwarc.info/resources/argot-dataset-2021/
%F SML:argot:2021b
%O SIGMathLing – Special Interest Group on Math Linguistics
```
### Accessibility and License
The content of this Dataset is licensed to [SIGMathLing members](/member/) for research and tool development purposes.

Access is restricted to [SIGMathLing members](/member/) under the [SIGMathLing Non-Disclosure-Agreement](/nda/), since for most [arXiv](http://arxiv.org) articles the right of distribution was only given (or assumed) to arXiv itself.
### Generated via
- [LaTeXML 0.8.5](https://github.com/brucemiller/LaTeXML/releases/tag/v0.8.5),
- [latexml-plugin-argot 1.1](docker-singularity classifier)
### About
Part of the [Formal Abstracts](https://formalabstracts.github.io/) research group. Author: Luis Berlioz
### Appendix
**Example XML entry:**

```xml
<article name="1407_005/1407.2218/1407.2218.xml" num="89">
  <definition index="51">
    <stmnt>
      Assume _inline_math_. We define the following space-time
      norm if _inline_math_ is a time interval _display_math_
    </stmnt>
    <dfndum>space-time norm</dfndum>
  </definition>
</article>
```
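
The `name` attribute in the example above ends in a file named after the article's arXiv identifier. As a minimal sketch, assuming that every `name` follows the same `<archive>/<id>/<id>.xml` layout as this single example, the identifier and an abstract-page URL can be recovered as follows. The function names are hypothetical, and old-style identifiers containing a slash (such as `math/0301001`) may be stored differently and are not handled.

```python
from pathlib import PurePosixPath


def arxiv_id_from_name(name):
    """Take the file name from the path and drop its '.xml' suffix."""
    return PurePosixPath(name).stem


def abs_url_from_name(name):
    """Build the arXiv abstract-page URL for the article."""
    return "https://arxiv.org/abs/" + arxiv_id_from_name(name)


if __name__ == "__main__":
    name = "1407_005/1407.2218/1407.2218.xml"
    print(arxiv_id_from_name(name))
    print(abs_url_from_name(name))
```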