An HTML dataset of arXiv.org

### Current release

- 08.2017

### License

TODO: Official SIGMathLing license link

### Generated by

- [LaTeXML 0.8.2](https://github.com/brucemiller/LaTeXML/releases/tag/v0.8.2)
- [CorTeX 0.2](https://github.com/dginev/CorTeX/releases/tag/0.2.0)

### Details

- Size: `todo-add-size` GB archived
- MD5: `todo-add-hash` arXMLiv_08_2017.zip
- Contents:
  - 1,088,375 HTML5 documents
  - By conversion severity: 112,088 `no_problem`, 574,642 `warning`, 401,645 `error`

### Description

This is the first public release of the arXMLiv dataset generated by the [KWARC](https://kwarc.info/) research group. Its intended redistribution is confined to the scope of the SIGMathLing interest group, and access is members-only.

We welcome community feedback on data quality, the need for auxiliary resources (e.g. figures, token models), representation issues, and organization and archival best practices.

The next release is planned for mid-2018 and will include an up-to-date arXiv dataset and incorporate community feedback. We anticipate annual dataset releases going forward.

### Citing this Resource

TODO: Bibtex

### Download

[Download link (password-protected)](https://gl.kwarc.info/SIGMathLing/dataset-arXMLiv-08-2017)
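After downloading, the archive can be checked against the MD5 checksum listed under Details. The following is a minimal sketch using only Python's standard `hashlib` module. The function names `md5_of_file` and `verify_archive` are hypothetical helpers chosen for illustration, not part of any published tooling. Note that the checksum in this README is still a placeholder (`todo-add-hash`), so the expected value must be filled in once it is published.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks.

    Chunked reading keeps memory use flat even for a multi-GB archive.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: str, expected_md5: str) -> bool:
    """Compare a file's MD5 against an expected hex digest (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

For example, `verify_archive("arXMLiv_08_2017.zip", "<published MD5>")` returns `True` only if the downloaded file matches. Standard command-line tools such as `md5sum` perform the same check.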