some omissions in updating from 2018 embedding page

Commit b2df67ad authored Sep 19, 2019 by Deyan Ginev
--- a/resources/arxmliv-embeddings-082019.md
+++ b/resources/arxmliv-embeddings-082019.md
@@ -20,16 +20,10 @@ Access is restricted to  [SIGMathLing members](/member/) under the
 articles, the right of distribution was only given (or assumed) to arXiv itself.

 ### Contents
-  - An 11.5 billion token model for the arXMLiv 08.2019 dataset, including subformula lexemes
+  - An 15.2 billion token model for the arXMLiv 08.2019 dataset, including subformula lexemes
    - `token_model.zip`
  - 300 dimensional GloVe word embeddings for the arXMLiv 08.2019 dataset
-    - `glove.arxmliv.11B.300d.zip` and `vocab.arxmliv.zip`
-  - 300d GloVe word embeddings for individual subsets
-    - `glove.subsets.zip`
-  - Embeddings and vocabulary with math lexemes omitted
-    - `glove.arxmliv.nomath.11B.300d.zip` and `vocab.arxmliv.nomath.zip`
-    - added on July 20, 2019
-    - used as a control when evaluating the contribution of formula lexemes
+    - `glove.arxmliv.15B.300d.zip` and `vocab.arxmliv.zip`
  - the main arXMLiv dataset is available separately [here](/resources/arxmliv-dataset-082019/)

 #### Token Model Statistics
@@ -65,7 +59,7 @@ Please cite the main dataset when using the word embeddings, as they are generat
  ([SIGMathLing members](/member/) only)

 ### Generated via
- - [llamapun 0.2.0](https://github.com/KWARC/llamapun/releases/tag/0.2.0),
+ - [llamapun 0.3.3](https://github.com/KWARC/llamapun/releases/tag/0.3.3),
 - [GloVe 1.2, 2019](https://github.com/stanfordnlp/GloVe/tree/07d59d5e6584e27ec758080bba8b51fce30f69d8)

 ### Generation Parameters