From 87593b42ae11575fdee1c1bc517dc68027990237 Mon Sep 17 00:00:00 2001
From: Michael Kohlhase <michael.kohlhase@fau.de>
Date: Wed, 10 Jan 2018 07:15:29 +0100
Subject: [PATCH] adding subset id and subset explanation text.

---
 resources/arxmliv.md | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/resources/arxmliv.md b/resources/arxmliv.md
index df18aff..7228e04 100644
--- a/resources/arxmliv.md
+++ b/resources/arxmliv.md
@@ -25,11 +25,11 @@ articles, the right of distribution was only given (or assumed) to arXiv itself.
 - 1,088,370 HTML5 documents
 - Three separate archive bundles separated by LaTeXML conversion severity

-| subset | MD5 | number of documents | size archived | size unpacked |
+| subset ID | file name | MD5 | number of documents | size archived | size unpacked |
 | --- | --- | --- | --- | --- |
-| arXMLiv_08_2017_no_problem.zip | `036945755c7cc75ea1577cf04ca4fead` | 112,088 | 5 GB | 37 GB |
-| arXMLiv_08_2017_warning.zip | `c0d5c1baf626225b48264510ac4c6bd5` | 574,638 | 71 GB | 595 GB |
-| arXMLiv_08_2017_error.zip | `2f4e60b993d85d30523b064c19e45733` | 401,644 | 50 GB | 421 GB |
+| no_problems| arXMLiv_08_2017_no_problem.zip | `036945755c7cc75ea1577cf04ca4fead` | 112,088 | 5 GB | 37 GB |
+| warning| arXMLiv_08_2017_warning.zip | `c0d5c1baf626225b48264510ac4c6bd5` | 574,638 | 71 GB | 595 GB |
+| error| arXMLiv_08_2017_error.zip | `2f4e60b993d85d30523b064c19e45733` | 401,644 | 50 GB | 421 GB |

 ### Description

@@ -52,7 +52,10 @@ A following release is planned for mid-2018, with an up-to-date arXiv dataset an
 The dataset should be referenced in all academic publications that present results
 obtained with its help. The reference should contain the identifier `arXMLiv:08.2017` in
 the title, the author, year, a reference to SIGMathLing, and the URL of the resource
-description page. For convenience, we supply some records for bibTeX and EndNote below.
+description page. For convenience, we supply some records for bibTeX and EndNote below. To
+cite a particular part of the dataset use the subset identifiers in the citation; e.g. `
+\cite[no_problem subset]{arXMLiv:08.2017}` or just explain it in the text using the
+concrete identifier.

 #### pure bibTeX
 ```
-- 
GitLab