An HTML dataset of arXiv.org
### Current release
- 08.2017
### License
TODO: Official SIGMathLing license link
### Generated by
- [LaTeXML 0.8.2](https://github.com/brucemiller/LaTeXML/releases/tag/v0.8.2)
- [CorTeX 0.2](https://github.com/dginev/CorTeX/releases/tag/0.2.0)
### Details:
- Size: `todo-add-size` GB archived
- MD5: `todo-add-hash` arXMLiv_08_2017.zip
- Contents:
  - 1,088,375 HTML5 documents
  - By conversion severity: 112,088 `no_problem`, 574,642 `warning`, 401,645 `error`
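The severity counts above add up to the stated document total. As a quick illustration, the following sketch recomputes the total and derives each severity's share of the dataset. The variable names are illustrative only and are not part of any released tooling.

```python
# Per-severity document counts, copied from the release details above.
severity_counts = {
    "no_problem": 112_088,
    "warning": 574_642,
    "error": 401_645,
}

# The three severity classes should account for every document in the release.
total_documents = sum(severity_counts.values())

# Fraction of the dataset that falls into each severity class.
severity_shares = {
    name: count / total_documents for name, count in severity_counts.items()
}

for name, share in severity_shares.items():
    print(f"{name}: {severity_counts[name]:,} documents ({share:.1%})")
```

Depending on the task, it may make sense to restrict experiments to the `no_problem` and `warning` subsets, since `error` documents are more likely to contain conversion artifacts.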
### Description:
This is the first public release of the arXMLiv dataset generated by the [KWARC](https://kwarc.info/) research group. Redistribution is confined to the [SIGMathLing] interest group, and access is members-only.
We welcome community feedback on data quality, the need for auxiliary resources (e.g. figures, token models), representation issues, and best practices for organization and archival.
The next release is planned for mid-2018, with an up-to-date arXiv dataset and community feedback incorporated. We anticipate annual dataset releases going forward.
### Citing this Resource
TODO: Bibtex
### Download
[Download link (password-protected)](https://gl.kwarc.info/SIGMathLing/dataset-arXMLiv-08-2017)
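
After downloading, the archive can be checked against the MD5 hash listed under Details. The sketch below is one way to do this with Python's standard `hashlib` module. The function names are hypothetical, and the expected hash has to be taken from the published value, which this page does not yet state.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, expected_hex):
    """Compare a file's MD5 with a published hex digest, ignoring case and whitespace."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_md5("arXMLiv_08_2017.zip", published_hash)` returns `True` only if the download is byte-identical to the archive that was hashed, where `published_hash` stands for the MD5 string from the Details section.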