    Deyan Ginev committed
    An HTML dataset of arXiv.org
    
    ### Current release
     - 08.2017
    
    ### License
     TODO: Official SIGMathLing license link
    
    ### Generated by
     - [LaTeXML 0.8.2](https://github.com/brucemiller/LaTeXML/releases/tag/v0.8.2)
     - [CorTeX 0.2](https://github.com/dginev/CorTeX/releases/tag/0.2.0)
    
    ### Details
     - Size: `todo-add-size` GB archived
     - MD5: `todo-add-hash`  arXMLiv_08_2017.zip
     - Contents:
       - 1,088,375 HTML5 documents
       - By conversion severity: 112,088 `no_problem`, 574,642 `warning`, 401,645 `error`
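
The severity breakdown above can be cross-checked with a few lines of Python. This is only an illustrative sketch: the counts are copied from the list above, while the names `severity_counts`, `total_documents`, and `severity_shares` are hypothetical and not part of any dataset tooling.

```python
# Counts copied from the "Contents" list above; the variable names are illustrative only.
severity_counts = {
    "no_problem": 112_088,
    "warning": 574_642,
    "error": 401_645,
}
total_documents = 1_088_375


def severity_shares(counts):
    """Return each severity class as a percentage of all documents."""
    total = sum(counts.values())
    return {name: 100.0 * count / total for name, count in counts.items()}


# The three severity classes should account for every document in the release.
assert sum(severity_counts.values()) == total_documents

for name, share in severity_shares(severity_counts).items():
    print(f"{name}: {share:.1f}%")
```

The three classes sum exactly to the stated document total. Roughly 10.3% of documents converted with `no_problem`, 52.8% with `warning`, and 36.9% with `error`.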
     
    ### Description
    
      This is the first public release of the arXMLiv dataset, generated by the [KWARC](https://kwarc.info/) research group. Redistribution is confined to the [SIGMathLing] interest group, and access is members-only.
      We welcome community feedback on data quality, the need for auxiliary resources (e.g. figures, token models), representation issues, and organization and archival best practices.
    
      The next release is planned for mid-2018, with an up-to-date arXiv dataset and community feedback incorporated. We anticipate annual dataset releases going forward.
    
    ### Citing this Resource
    
     TODO: Bibtex
    
    ### Download
      [Download link (password-protected)](https://gl.kwarc.info/SIGMathLing/dataset-arXMLiv-08-2017)
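
Once the archive is downloaded, its integrity can be checked against the MD5 value listed under Details. The following is a minimal sketch using only the Python standard library; the helper names `md5_of_file` and `matches_published_md5` are hypothetical. The published hash is still a placeholder in this README, so it has to be supplied by the user.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks
    so that a multi-gigabyte archive never has to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, published_md5):
    """Compare a file's MD5 with a published value, ignoring case and whitespace."""
    return md5_of_file(path) == published_md5.strip().lower()


# Example call (the hash argument is whatever value the Details section lists):
# matches_published_md5("arXMLiv_08_2017.zip", "<MD5 from the Details section>")
```

A mismatch usually points to an incomplete or corrupted download, in which case fetching the archive again is the simplest fix.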