
Update embeddings description with "nomath" controls

Merged: Deyan Ginev requested to merge `update-arxmliv-embeddings` into `master`
@@ -22,10 +22,11 @@ articles, the right of distribution was only given (or assumed) to arXiv itself.
- An 11.5 billion token model for the arXMLiv 08.2018 dataset, including subformula lexemes
- `token_model.zip`
- 300 dimensional GloVe word embeddings for the arXMLiv 08.2018 dataset
- `glove.arxmliv.5B.300d.zip` and `vocab.arxmliv.zip`
- `glove.arxmliv.11B.300d.zip` and `vocab.arxmliv.zip`
- 300d GloVe word embeddings for individual subsets
- `glove.subsets.zip`
- Embeddings and vocabulary with math lexemes omitted: `glove.arxmliv.nomath.11B.300d.zip` and `vocab.arxmliv.nomath.zip`
- Embeddings and vocabulary with math lexemes omitted
- `glove.arxmliv.nomath.11B.300d.zip` and `vocab.arxmliv.nomath.zip`
- added on July 20, 2019
- used as a control when evaluating the contribution of formula lexemes
- the main arXMLiv dataset is available separately [here](/resources/arxmliv-dataset-082018/)
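
A minimal sketch of how the `nomath` embeddings might serve as a control, assuming the archives unpack to the standard GloVe text format (one token per line followed by space-separated floats); the file layout inside the zips is not stated in this description. The helper names and the toy tokens below, including the `italic_x` lexeme, are hypothetical and only illustrate the comparison.

```python
import math
from typing import Dict, Iterable, List


def parse_glove(lines: Iterable[str]) -> Dict[str, List[float]]:
    """Parse GloVe text-format lines into a token -> vector mapping."""
    vectors: Dict[str, List[float]] = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def only_in_first(first: Dict[str, List[float]],
                  second: Dict[str, List[float]]) -> List[str]:
    """Tokens present in `first` but missing from `second`, sorted."""
    return sorted(set(first) - set(second))


# Toy stand-ins for the full and nomath models (hypothetical values).
full = parse_glove(["theorem 1.0 0.0", "lemma 0.8 0.6", "italic_x 0.0 1.0"])
nomath = parse_glove(["theorem 1.0 0.0", "lemma 0.6 0.8"])

# Tokens that exist only when math lexemes are kept.
math_only = only_in_first(full, nomath)

# The same word pair, scored under each model.
sim_full = cosine(full["theorem"], full["lemma"])
sim_nomath = cosine(nomath["theorem"], nomath["lemma"])
```

With real data, the lines would come from the unpacked embedding files, for example `parse_glove(open(path, encoding="utf-8"))`, where `path` points at whichever text file the archive contains.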