From 2646770a7a3c5671612b7d37a5a1d0cbb08e5c80 Mon Sep 17 00:00:00 2001
From: Michael Kohlhase <michael.kohlhase@fau.de>
Date: Wed, 24 Jan 2018 20:06:17 +0100
Subject: [PATCH] two data sets actually

---
 _posts/2018-01-24-dataset.md | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/_posts/2018-01-24-dataset.md b/_posts/2018-01-24-dataset.md
index 43ebcab..f23ef47 100644
--- a/_posts/2018-01-24-dataset.md
+++ b/_posts/2018-01-24-dataset.md
@@ -1,9 +1,9 @@
 ---
 layout: post
-title: First Data Set (1.1 Million scientific HTML5 documents from arXiv)
+title: First Data Sets (1.1 Million scientific HTML5 documents from arXiv and token models)
 ---
-SIGMathLing has published a first data set, which also acts as a template for future data
-sets. The content of this data set is licensed to [SIGMathLing members](/member/) for research
+SIGMathLing has published the first data sets. They also act as templates for future data
+sets. The content of these data sets is licensed to [SIGMathLing members](/member/) for research
 and tool development purposes subject to the [SIGMathLing Non-Disclosure-Agreement](/nda/).
 
 This collection of 1.1 Million HTML5 documents
@@ -13,6 +13,11 @@ the [KWARC](https://kwarc.info/) research group.  It was created by converting t
 [LaTeXML](https://github.com/brucemiller/LaTeXML) using the
 [CorTeX corpus management system](https://github.com/dginev/CorTeX).
 
-Details can be found on the [SIGMathLing Resource page](/resources/arxmliv/).
+The token models are generated from this document collection via the
+[LLaMaPuN](https://github.com/KWARC/llamapun/releases/tag/0.1) and
+[GloVe](https://github.com/stanfordnlp/GloVe/tree/765074642a6544e47849bb85d8dc2e11e44c2922)
+libraries.
+
+Details can be found on the [SIGMathLing Resource page](/resources/).
 
 
-- 
GitLab