diff --git a/_posts/2018-01-08-dataset.md b/_posts/2018-01-08-dataset.md
new file mode 100644
index 0000000000000000000000000000000000000000..87219945035b26f2936bf52c7813fa46e9b7b051
--- /dev/null
+++ b/_posts/2018-01-08-dataset.md
@@ -0,0 +1,18 @@
+---
+layout: post
+title: First Data Set on SIGMathLing
+---
+SIGMathLing has published a first data set, which also acts as a template for future data
+sets. The content of this data set is licensed to [SIGMathLing members](/member/) for research
+and tool development purposes subject to the [SIGMathLing Non-Disclosure-Agreement](/nda/).
+
+This collection of 1.1 Million HTML5 documents
+has been developed as part of the [arXMLiv](https://kwarc.info/systems/arXMLiv/) project at
+the [KWARC](https://kwarc.info/) research group. It was created by converting the
+[arXiv collection of scientific preprints until August 2017](http://arxiv.org) via
+[LaTeXML](https://github.com/brucemiller/LaTeXML) using the
+[CorTeX corpus management system](https://github.com/dginev/CorTeX).
+
+Details can be found on the [SIGMathLing Resource page](/resources/arxmliv/).
+
+