Commit 7b7cadda authored by Deyan Ginev

updating arxmliv dataset page first draft with no_problem details

parent 7f961f50
An HTML dataset of arXiv.org
# An HTML5 dataset for arXiv.org
Part of the [arXMLiv](https://kwarc.info/systems/arXMLiv/) project at the [KWARC](https://kwarc.info/) research group
### Current release
- 08.2017
@@ -6,17 +8,21 @@ An HTML dataset of arXiv.org
### License
TODO: Official SIGMathLing license link
### Generated by
### Generated via
- [LaTeXML 0.8.2](https://github.com/brucemiller/LaTeXML/releases/tag/v0.8.2),
- [CorTeX 0.2](https://github.com/dginev/CorTeX/releases/tag/0.2.0)
### Details:
- Size: `todo-add-size`GB archived,
- MD5: `todo-add-hash` arXMLiv_08_2017.zip
- Contents:
- 1,088,375 HTML5 documents
- By conversion severity: 112,088 `no_problem`, 574,642 `warning`, 401,645 `error`
### Contents
- 1,088,370 HTML5 documents
- Three separate archive bundles separated by LaTeXML conversion severity
| subset | MD5 | number of documents | size archived | size unpacked |
| --- | --- | --- | --- | --- |
| arXMLiv_08_2017_no_problem.zip | `036945755c7cc75ea1577cf04ca4fead` | 112,088 | 5 GB | 37 GB |
| arXMLiv_08_2017_warning.zip | md5 | 574,638 | | 595 GB |
| arXMLiv_08_2017_error.zip | md5 | 401,644 | | 421 GB |
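
The MD5 column is there so a downloaded bundle can be checked before unpacking. A minimal Python sketch of that check follows, using only the standard library. The helper names `md5_of_file` and `matches_published_md5` and the local file path are assumptions made for illustration; only the `no_problem` checksum is published in the table so far.

```python
import hashlib
from pathlib import Path

# Checksum listed in the table above for arXMLiv_08_2017_no_problem.zip.
EXPECTED_MD5 = "036945755c7cc75ea1577cf04ca4fead"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so a multi-gigabyte archive is never loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, expected=EXPECTED_MD5):
    """Compare the file's digest with the published one (case-insensitive)."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Hypothetical location of the downloaded archive; adjust as needed.
    archive = Path("arXMLiv_08_2017_no_problem.zip")
    if archive.exists():
        print("checksum OK" if matches_published_md5(archive) else "checksum MISMATCH")
    else:
        print(f"{archive} not found")
```

Reading in chunks keeps memory use flat regardless of archive size, which matters for the larger bundles.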
### Description:
This is a first public release of the arXMLiv dataset generated by the [KWARC](https://kwarc.info/) research group. Its intended redistribution is confined to the scope of the [SIGMathLing] interest group, and access is members-only.
@@ -30,3 +36,4 @@ An HTML dataset of arXiv.org
### Download
[Download link (password-protected)](https://gl.kwarc.info/SIGMathLing/dataset-arXMLiv-08-2017)
\ No newline at end of file