Commit b2df67ad authored by Deyan Ginev

some omissions in updating from 2018 embedding page

parent 84be8ca3
Pipeline #1732 passed
@@ -20,16 +20,10 @@ Access is restricted to [SIGMathLing members](/member/) under the
 articles, the right of distribution was only given (or assumed) to arXiv itself.
 ### Contents
-- An 11.5 billion token model for the arXMLiv 08.2019 dataset, including subformula lexemes
+- An 15.2 billion token model for the arXMLiv 08.2019 dataset, including subformula lexemes
 - `token_model.zip`
 - 300 dimensional GloVe word embeddings for the arXMLiv 08.2019 dataset
-- `glove.arxmliv.11B.300d.zip` and `vocab.arxmliv.zip`
-- 300d GloVe word embeddings for individual subsets
-- `glove.subsets.zip`
-- Embeddings and vocabulary with math lexemes omitted
-- `glove.arxmliv.nomath.11B.300d.zip` and `vocab.arxmliv.nomath.zip`
-- added on July 20, 2019
-- used as a control when evaluating the contribution of formula lexemes
+- `glove.arxmliv.15B.300d.zip` and `vocab.arxmliv.zip`
 - the main arXMLiv dataset is available separately [here](/resources/arxmliv-dataset-082019/)
 #### Token Model Statistics
@@ -65,7 +59,7 @@ Please cite the main dataset when using the word embeddings, as they are generat
 ([SIGMathLing members](/member/) only)
 ### Generated via
-- [llamapun 0.2.0](https://github.com/KWARC/llamapun/releases/tag/0.2.0),
+- [llamapun 0.3.3](https://github.com/KWARC/llamapun/releases/tag/0.3.3),
 - [GloVe 1.2, 2019](https://github.com/stanfordnlp/GloVe/tree/07d59d5e6584e27ec758080bba8b51fce30f69d8)
 ### Generation Parameters
......