From b2df67ad533823ad1e2f3278d7c2e9dc44616cbe Mon Sep 17 00:00:00 2001
From: Deyan Ginev <deyan.ginev@gmail.com>
Date: Thu, 19 Sep 2019 09:25:12 -0400
Subject: [PATCH] some omissions in updating from 2018 embedding page

---
 resources/arxmliv-embeddings-082019.md | 12 +++---------
 1 file changed, 3 insertions(+), 9 deletions(-)

diff --git a/resources/arxmliv-embeddings-082019.md b/resources/arxmliv-embeddings-082019.md
index f068cca..d959fe2 100644
--- a/resources/arxmliv-embeddings-082019.md
+++ b/resources/arxmliv-embeddings-082019.md
@@ -20,16 +20,10 @@ Access is restricted to  [SIGMathLing members](/member/) under the
 articles, the right of distribution was only given (or assumed) to arXiv itself.
 
 ### Contents
-  - An 11.5 billion token model for the arXMLiv 08.2019 dataset, including subformula lexemes
+  - A 15.2 billion token model for the arXMLiv 08.2019 dataset, including subformula lexemes
     - `token_model.zip`
   - 300 dimensional GloVe word embeddings for the arXMLiv 08.2019 dataset
-    - `glove.arxmliv.11B.300d.zip` and `vocab.arxmliv.zip`
-  - 300d GloVe word embeddings for individual subsets
-    - `glove.subsets.zip`
-  - Embeddings and vocabulary with math lexemes omitted
-    - `glove.arxmliv.nomath.11B.300d.zip` and `vocab.arxmliv.nomath.zip`
-    - added on July 20, 2019
-    - used as a control when evaluating the contribution of formula lexemes
+    - `glove.arxmliv.15B.300d.zip` and `vocab.arxmliv.zip`
   - the main arXMLiv dataset is available separately [here](/resources/arxmliv-dataset-082019/)
 
 #### Token Model Statistics
@@ -65,7 +59,7 @@ Please cite the main dataset when using the word embeddings, as they are generat
   ([SIGMathLing members](/member/) only)
 
 ### Generated via
- - [llamapun 0.2.0](https://github.com/KWARC/llamapun/releases/tag/0.2.0),
+ - [llamapun 0.3.3](https://github.com/KWARC/llamapun/releases/tag/0.3.3),
  - [GloVe 1.2, 2019](https://github.com/stanfordnlp/GloVe/tree/07d59d5e6584e27ec758080bba8b51fce30f69d8)
 
 ### Generation Parameters
-- 