diff --git a/_posts/2018-01-24-dataset.md b/_posts/2018-01-24-dataset.md
index 43ebcab5d98eb61918ec160d119f1425e730196a..f23ef4764bb8fb0bbd952d069d4b22a7aed0a5e3 100644
--- a/_posts/2018-01-24-dataset.md
+++ b/_posts/2018-01-24-dataset.md
@@ -1,9 +1,9 @@
 ---
 layout: post
-title: First Data Set (1.1 Million scientific HTML5 documents from arXiv)
+title: First Data Sets (1.1 Million scientific HTML5 documents from arXiv and token models)
 ---
-SIGMathLing has published a first data set, which also acts as a template for future data
-sets. The content of this data set is licensed to [SIGMathLing members](/member/) for research
+SIGMathLing has published its first data sets. They also act as templates for future data
+sets. The content of these data sets is licensed to [SIGMathLing members](/member/) for research
 and tool development purposes subject to the [SIGMathLing Non-Disclosure-Agreement](/nda/).
 
 This collection of 1.1 Million HTML5 documents
@@ -13,6 +13,11 @@ the [KWARC](https://kwarc.info/) research group.  It was created by converting t
 [LaTeXML](https://github.com/brucemiller/LaTeXML) using the
 [CorTeX corpus management system](https://github.com/dginev/CorTeX).
 
-Details can be found on the [SIGMathLing Resource page](/resources/arxmliv/).
+The token models are generated from this document collection via the
+[LLaMaPuN](https://github.com/KWARC/llamapun/releases/tag/0.1) and
+[GloVe](https://github.com/stanfordnlp/GloVe/tree/765074642a6544e47849bb85d8dc2e11e44c2922)
+libraries.
+
+Details can be found on the [SIGMathLing Resource page](/resources/).