From 7f961f50655a398c737d643d9a87ec0e623cf994 Mon Sep 17 00:00:00 2001
From: Deyan Ginev <d.ginev@jacobs-university.de>
Date: Tue, 2 Jan 2018 21:39:36 -0500
Subject: [PATCH] mock page stub for arxmliv

---
 arxmliv.md   | 32 ++++++++++++++++++++++++++++++++
 resources.md |  4 +++-
 2 files changed, 35 insertions(+), 1 deletion(-)
 create mode 100644 arxmliv.md

diff --git a/arxmliv.md b/arxmliv.md
new file mode 100644
index 0000000..3185898
--- /dev/null
+++ b/arxmliv.md
@@ -0,0 +1,32 @@
+An HTML dataset of arXiv.org
+
+### Current release
+ - 08.2017
+
+### License
+ TODO: Official SIGMathLing license link
+
+### Generated by
+ - [LaTeXML 0.8.2](https://github.com/brucemiller/LaTeXML/releases/tag/v0.8.2),
+ - [CorTeX 0.2](https://github.com/dginev/CorTeX/releases/tag/0.2.0)
+
+### Details:
+ - Size: `todo-add-size`GB archived,
+ - MD5: `todo-add-hash` arXMLiv_08_2017.zip
+ - Contents:
+    - 1,088,375 HTML5 documents
+    - By conversion severity: 112,088 `no_problem`, 574,642 `warning`, 401,645 `error`
+
+### Description:
+
+ This is a first public release of the arXMLiv dataset generated by the [KWARC](https://kwarc.info/) research group. Its intended redistribution is confined to the scope of the [SIGMathLing] interest group, and access is members-only.
+ We welcome community feedback on all of: data quality, need for auxiliary resources (e.g. figures, token models), representation issues, as well as organization and archival best practices.
+
+ Next release is planned for mid-2018, with an up-to-date arXiv dataset and community feedback incorporated. We anticipate annual dataset releases going forward.
+
+### Citing this Resource
+
+ TODO: Bibtex
+
+### Download
+ [Download link (password-protected)](https://gl.kwarc.info/SIGMathLing/dataset-arXMLiv-08-2017)
diff --git a/resources.md b/resources.md
index 355e658..473d08a 100644
--- a/resources.md
+++ b/resources.md
@@ -3,4 +3,6 @@ layout: page
 title: SIGMathLing - Datasets and Resources
 ---
 
-none yet, but see the [plan](/technical/)
+ 1. [arXMLiv corpus, 08.2017 release](/arxmliv/)
+
+Additional resources are en route, see the [plan](/technical/) for details.
-- 
GitLab