From cec32f4eda2b336d818b69fa2cc6011c3e695e7c Mon Sep 17 00:00:00 2001
From: Michael Kohlhase <michael.kohlhase@fau.de>
Date: Fri, 5 Jan 2018 11:30:03 +0100
Subject: [PATCH] new

---
 _posts/2018-01-08-dataset.md | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
 create mode 100644 _posts/2018-01-08-dataset.md

diff --git a/_posts/2018-01-08-dataset.md b/_posts/2018-01-08-dataset.md
new file mode 100644
index 0000000..8721994
--- /dev/null
+++ b/_posts/2018-01-08-dataset.md
@@ -0,0 +1,18 @@
+---
+layout: post
+title: First Data Set on SIGMathLing
+---
+SIGMathLing has published a first data set, which also acts as a template for future data
+sets. The content of this data set is licensed to [SIGMathLing members](/member/) for research
+and tool development purposes subject to the [SIGMathLing Non-Disclosure-Agreement](/nda/).
+
+This collection of 1.1 million HTML5 documents
+has been developed as part of the [arXMLiv](https://kwarc.info/systems/arXMLiv/) project at
+the [KWARC](https://kwarc.info/) research group. It was created by converting the
+[arXiv collection of scientific preprints up to August 2017](http://arxiv.org) via
+[LaTeXML](https://github.com/brucemiller/LaTeXML) using the
+[CorTeX corpus management system](https://github.com/dginev/CorTeX).
+
+Details can be found on the [SIGMathLing Resource page](/resources/arxmliv/).
+
+
-- 
GitLab