An HTML dataset of arXiv.org

### Current release
 - 08.2017

### License
 TODO: Official SIGMathLing license link

### Generated by
 - [LaTeXML 0.8.2](https://github.com/brucemiller/LaTeXML/releases/tag/v0.8.2)
 - [CorTeX 0.2](https://github.com/dginev/CorTeX/releases/tag/0.2.0)

### Details
 - Size: `todo-add-size` GB archived
 - MD5: `todo-add-hash` (`arXMLiv_08_2017.zip`)
 - Contents:
   - 1,088,375 HTML5 documents
   - By conversion severity: 112,088 `no_problem`, 574,642 `warning`, 401,645 `error`
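
Once the MD5 value above is published, a downloaded archive can be checked against it. The following is a minimal sketch, not an official tool: the function names `md5_of_file` and `matches_published_md5` are hypothetical, and the archive is assumed to sit in the working directory under the file name listed above.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks.

    Chunked reading keeps memory use flat, which matters for a
    multi-gigabyte archive.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, published_hex):
    """Compare a file's MD5 with a published value, ignoring case and
    surrounding whitespace in the published string."""
    return md5_of_file(path) == published_hex.strip().lower()
```

A typical call would be `matches_published_md5("arXMLiv_08_2017.zip", published_value)`, where `published_value` is the hash from the list above once it replaces the placeholder. MD5 is suitable here only as a check against corrupted or truncated downloads, not as a security guarantee.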
 
### Description

  This is the first public release of the arXMLiv dataset, generated by the [KWARC](https://kwarc.info/) research group. Redistribution is confined to the [SIGMathLing] interest group, and access is members-only.
  We welcome community feedback on data quality, the need for auxiliary resources (e.g. figures, token models), representation issues, and organization and archival best practices.

  The next release is planned for mid-2018 and will incorporate an up-to-date arXiv dataset and community feedback. We anticipate annual dataset releases going forward.

### Citing this Resource

 TODO: Bibtex

### Download
  [Download link (password-protected)](https://gl.kwarc.info/SIGMathLing/dataset-arXMLiv-08-2017)