title: First Data Set on SIGMathLing
SIGMathLing has published its first data set, which also serves as a template for future data
sets. The content of this data set is licensed to [SIGMathLing members](/member/) for research
and tool development purposes, subject to the [SIGMathLing Non-Disclosure-Agreement](/nda/).
This collection of 1.1 million HTML5 documents
has been developed as part of the [arXMLiv]( project at
the [KWARC]( research group. It was created by converting the
[arXiv collection of scientific preprints up to August 2017]( via
[LaTeXML]( using the
[CorTeX corpus management system](
Details can be found on the [SIGMathLing Resource page](/resources/arxmliv/).
