title: First Data Set (1.1 million scientific HTML5 documents from arXiv)
SIGMathLing has published its first data set, which also serves as a template for future data
sets. The content of this data set is licensed to [SIGMathLing members](/member/) for research
