Commit 7b7cadda authored by Deyan Ginev

updating arxmliv dataset page first draft with no_problem details

parent 7f961f50
1 merge request: !1 Document 08.2017 arxlmiv dataset release
-An HTML dataset of arXiv.org
+# An HTML5 dataset for arXiv.org
+Part of the [arXMLiv](https://kwarc.info/systems/arXMLiv/) project at the [KWARC](https://kwarc.info/) research group
 ### Current release
 - 08.2017
@@ -6,17 +8,21 @@ An HTML dataset of arXiv.org
 ### License
 TODO: Official SIGMathLing license link
-### Generated by
+### Generated via
 - [LaTeXML 0.8.2](https://github.com/brucemiller/LaTeXML/releases/tag/v0.8.2),
 - [CorTeX 0.2](https://github.com/dginev/CorTeX/releases/tag/0.2.0)
-### Details:
-- Size: `todo-add-size`GB archived,
-- MD5: `todo-add-hash` arXMLiv_08_2017.zip
-- Contents:
-  - 1,088,375 HTML5 documents
-  - By conversion severity: 112,088 `no_problem`, 574,642 `warning`, 401,645 `error`
+### Contents
+- 1,088,370 HTML5 documents
+- Three separate archive bundles separated by LaTeXML conversion severity
+| subset | MD5 | number of documents | size archived | size unpacked |
+| --- | --- | --- | --- | --- |
+| arXMLiv_08_2017_no_problem.zip | `036945755c7cc75ea1577cf04ca4fead` | 112,088 | 5 GB | 37 GB |
+| arXMLiv_08_2017_warning.zip | md5 | 574,638 | | 595 GB |
+| arXMLiv_08_2017_error.zip | md5 | 401,644 | | 421 GB |
 ### Description:
 This is a first public release of the arXMLiv dataset generated by the [KWARC](https://kwarc.info/) research group. Its intended redistribution is confined to the scope of the [SIGMathLing] interest group, and access is members-only.
@@ -30,3 +36,4 @@ An HTML dataset of arXiv.org
 ### Download
 [Download link (password-protected)](https://gl.kwarc.info/SIGMathLing/dataset-arXMLiv-08-2017)
\ No newline at end of file
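
The table in this commit lists an MD5 hash for `arXMLiv_08_2017_no_problem.zip`, so a downloaded copy can be checked against it. The following is a minimal sketch of such a check, not part of the dataset tooling: the function names `md5_of_file` and `verify_archive` and the `EXPECTED_MD5` mapping are hypothetical, and only the one hash that the table actually states is included (the `warning` and `error` rows still carry an `md5` placeholder).

```python
import hashlib
from pathlib import Path

# Hash copied from the table in this commit. The warning and error bundles
# have no published hash yet, so they are deliberately left out.
EXPECTED_MD5 = {
    "arXMLiv_08_2017_no_problem.zip": "036945755c7cc75ea1577cf04ca4fead",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks.

    Chunked reading keeps memory use flat for multi-gigabyte archives.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path):
    """Compare a downloaded archive against the hash listed for its file name.

    Raises KeyError when no hash is known for that file name.
    """
    name = Path(path).name
    if name not in EXPECTED_MD5:
        raise KeyError(f"no published MD5 for {name}")
    return md5_of_file(path) == EXPECTED_MD5[name]
```

Calling `verify_archive("arXMLiv_08_2017_no_problem.zip")` on a local download returns `True` only when the file matches the listed hash. MD5 is used here because it is what the page publishes; it guards against corrupted transfers, not against deliberate tampering.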