From 87593b42ae11575fdee1c1bc517dc68027990237 Mon Sep 17 00:00:00 2001
From: Michael Kohlhase <michael.kohlhase@fau.de>
Date: Wed, 10 Jan 2018 07:15:29 +0100
Subject: [PATCH] adding subset id and subset explanation text.

---
 resources/arxmliv.md | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/resources/arxmliv.md b/resources/arxmliv.md
index df18aff..7228e04 100644
--- a/resources/arxmliv.md
+++ b/resources/arxmliv.md
@@ -25,11 +25,11 @@ articles, the right of distribution was only given (or assumed) to arXiv itself.
   - 1,088,370 HTML5 documents
   - Three separate archive bundles separated by LaTeXML conversion severity
 
-| subset                         | MD5                                | number of documents | size archived | size unpacked |
+| subset ID | file name | MD5                                | number of documents | size archived | size unpacked |
 | ---                            | ---                                | ---                 | ---           | ---           |
-| arXMLiv_08_2017_no_problem.zip | `036945755c7cc75ea1577cf04ca4fead` | 112,088             | 5 GB          | 37 GB         |
-| arXMLiv_08_2017_warning.zip    | `c0d5c1baf626225b48264510ac4c6bd5` | 574,638             | 71 GB         | 595 GB        | 
-| arXMLiv_08_2017_error.zip      | `2f4e60b993d85d30523b064c19e45733` | 401,644             | 50 GB         | 421 GB        |
+| no_problem | arXMLiv_08_2017_no_problem.zip | `036945755c7cc75ea1577cf04ca4fead` | 112,088             | 5 GB          | 37 GB         |
+| warning | arXMLiv_08_2017_warning.zip    | `c0d5c1baf626225b48264510ac4c6bd5` | 574,638             | 71 GB         | 595 GB        |
+| error | arXMLiv_08_2017_error.zip      | `2f4e60b993d85d30523b064c19e45733` | 401,644             | 50 GB         | 421 GB        |
 
 
 ### Description
@@ -52,7 +52,10 @@ A following release is planned for mid-2018, with an up-to-date arXiv dataset an
 The dataset should be referenced in all academic publications that present results
 obtained with its help. The reference should contain the identifier `arXMLiv:08.2017` in
 the title, the author, year, a reference to SIGMathLing, and the URL of the resource
-description page. For convenience, we supply some records for bibTeX and EndNote below. 
+description page. For convenience, we supply some records for bibTeX and EndNote below. To
+cite a particular part of the dataset, use the subset identifier in the citation, e.g.
+`\cite[no_problem subset]{arXMLiv:08.2017}`, or name the subset in the text using its
+concrete identifier.
 
 #### pure bibTeX
 ```
-- 
GitLab