Commit 2646770a authored Jan 24, 2018 by Michael Kohlhase: "two data sets actually" (parent 5a3f2844)

_posts/2018-01-24-dataset.md
---
layout: post
title: First Data Sets (1.1 Million scientific HTML5 documents from arXiv and token models)
---
SIGMathLing has published its first data sets. They also act as templates for future data sets. The content of these data sets is licensed to [SIGMathLing members](/member/) for research and tool development purposes subject to the [SIGMathLing Non-Disclosure-Agreement](/nda/).
This collection of 1.1 Million HTML5 documents …
… the [KWARC](https://kwarc.info/) research group. It was created by converting t…
[LaTeXML](https://github.com/brucemiller/LaTeXML) using the [CorTeX corpus management system](https://github.com/dginev/CorTeX).
Details can be found on the [SIGMathLing Resource page](/resources/arxmliv/).
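The sketch below is a rough illustration of working with the HTML5 documents. It pulls the TeX source of each formula out of a page. It assumes that formulas appear as MathML `<math>` elements that carry their original TeX in an `alttext` attribute, which LaTeXML output commonly does. Check this against the actual files before relying on it. `FormulaExtractor` and `extract_formulas` are hypothetical helper names, not part of any published tool.

```python
from html.parser import HTMLParser


class FormulaExtractor(HTMLParser):
    """Collect the TeX source of each <math> element.

    Assumption (hypothetical helper): formulas are MathML <math> elements
    carrying their original TeX in an `alttext` attribute.
    """

    def __init__(self) -> None:
        super().__init__()
        self.formulas: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "math":
            alttext = dict(attrs).get("alttext")
            if alttext is not None:
                self.formulas.append(alttext)


def extract_formulas(html: str) -> list[str]:
    """Return the alttext of every <math> element, in document order."""
    parser = FormulaExtractor()
    parser.feed(html)
    parser.close()
    return parser.formulas


sample = '<p>Let <math alttext="x^{2}" display="inline"><mi>x</mi></math> be given.</p>'
print(extract_formulas(sample))  # ['x^{2}']
```

Using the standard library's `HTMLParser` keeps the sketch dependency-free. For the full collection, a real HTML5/XML parser would be more robust.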
The token models are generated from this document collection via the [LLaMaPuN](https://github.com/KWARC/llamapun/releases/tag/0.1) and [GloVe](https://github.com/stanfordnlp/GloVe/tree/765074642a6544e47849bb85d8dc2e11e44c2922) libraries.
Details can be found on the [SIGMathLing Resource page](/resources/).
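The release format of the token models is not stated here. The sketch below assumes the vectors come in GloVe's plain-text output format, with one token per line followed by its space-separated components. It shows how such a file could be loaded and compared with cosine similarity. The tokens and numbers in the example are made up for illustration, and `load_glove_vectors` and `cosine_similarity` are hypothetical helpers.

```python
import math
from typing import Dict, Iterable, List, Sequence


def load_glove_vectors(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe-style text vectors: `token v1 v2 ... vn` per line.

    Assumption: the token models use this plain-text layout.
    """
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        token, values = parts[0], parts[1:]
        vectors[token] = [float(v) for v in values]
    return vectors


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between u and v (0.0 if either is a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Made-up 3-dimensional vectors, standing in for lines read from a model file.
sample = [
    "theorem 0.9 0.1 0.0\n",
    "lemma 0.8 0.2 0.0\n",
    "banana 0.0 0.1 0.9\n",
]
vecs = load_glove_vectors(sample)
print(cosine_similarity(vecs["theorem"], vecs["lemma"]))   # close to 1
print(cosine_similarity(vecs["theorem"], vecs["banana"]))  # close to 0
```

For a real model file, pass an open file handle (e.g. `open(path, encoding="utf-8")`) instead of the sample list. For large vocabularies, storing the vectors in a NumPy matrix would make similarity queries much faster.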