From cec32f4eda2b336d818b69fa2cc6011c3e695e7c Mon Sep 17 00:00:00 2001
From: Michael Kohlhase <michael.kohlhase@fau.de>
Date: Fri, 5 Jan 2018 11:30:03 +0100
Subject: [PATCH] new

---
 _posts/2018-01-08-dataset.md | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
 create mode 100644 _posts/2018-01-08-dataset.md

diff --git a/_posts/2018-01-08-dataset.md b/_posts/2018-01-08-dataset.md
new file mode 100644
index 0000000..8721994
--- /dev/null
+++ b/_posts/2018-01-08-dataset.md
@@ -0,0 +1,18 @@
+---
+layout: post
+title: First Data Set on SIGMathLing
+---
+SIGMathLing has published its first data set, which also serves as a template for future data
+sets. The content of this data set is licensed to [SIGMathLing members](/member/) for research
+and tool development purposes, subject to the [SIGMathLing Non-Disclosure Agreement](/nda/).
+
+This collection of 1.1 million HTML5 documents
+has been developed as part of the [arXMLiv](https://kwarc.info/systems/arXMLiv/) project at
+the [KWARC](https://kwarc.info/) research group. It was created by converting the
+[arXiv collection of scientific preprints up to August 2017](http://arxiv.org) via
+[LaTeXML](https://github.com/brucemiller/LaTeXML) using the
+[CorTeX corpus management system](https://github.com/dginev/CorTeX).
+
+Details can be found on the [SIGMathLing Resource page](/resources/arxmliv/).
+
+
-- 
GitLab