From 8a1043e0afeab10202008dc82020b57f7925b652 Mon Sep 17 00:00:00 2001
From: Michael Kohlhase <michael.kohlhase@fau.de>
Date: Tue, 3 Apr 2018 13:25:50 +0200
Subject: [PATCH] more

---
 systems/arXMLiv.md | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/systems/arXMLiv.md b/systems/arXMLiv.md
index 97d2ce1..3a24db5 100644
--- a/systems/arXMLiv.md
+++ b/systems/arXMLiv.md
@@ -9,8 +9,27 @@ people:
     - mkohlhase
     - dginev
 
-website: http://cortex.mathweb.info
+website: http://cortex.mathweb.org
 repository: https://github.com/dginev/CorTeX
 ---
 
-The [Cornell e-print arXiv](http://arxiv.org) contains one of the largest corpora of scientific literature in the world. Unfortunately, its contents are locked up in the TeX/LaTeX format, which makes it nearly useless for knowledge management techniques. We translate it to XML to have a basis for uncovering it's structural semantics.
+The [Cornell e-print arXiv](http://arxiv.org) contains one of the largest corpora of
+scientific literature in the world. Unfortunately, its contents are locked up in the
+TeX/LaTeX format, which makes it nearly useless for knowledge management techniques. We
+translate it to XML and "HTML5 with [MathML](http://www.w3.org/TR/MathML/)" via
+[LaTeXML](https://dlmf.nist.gov/LaTeXML/) to have a basis for uncovering its structural
+semantics (see the [LLaMaPuN](/systems/llamapun/) project for details).
+
+The actual corpus processing (and distribution to hundreds of worker machines) is
+performed by the [CorTeX](https://github.com/dginev/cortex) system; see the system
+state/results: [old but complete](http://cortex.mathweb.org/corpus/arXMLiv),
+[new system in Erlangen](https://corpora.mathweb.org/corpus/arxiv_1712/tex_to_html).
+
+Applications of this include the mathematical search engine [MathWebSearch](/systems/mws/)
+(live [demo on the arXMLiv data set](http://arxivsearch.mathweb.org)).
+
+Unfortunately, we cannot re-distribute the results of the transformation freely, due to
+arXiv licensing policies. Therefore we have created the Special Interest Group for Math
+Linguistics ([SIGMathLing](http://SIGMathLing.kwarc.info)), which can distribute the data
+sets under an [NDA](https://sigmathling.kwarc.info/nda/) to
+[SIGMathLing members](https://sigmathling.kwarc.info/member/).
-- 
GitLab