Commit 8a1043e0 authored by Michael Kohlhase

more

parent 51d32c19
@@ -9,8 +9,27 @@ people:
- mkohlhase
- dginev
website: http://cortex.mathweb.info
website: http://cortex.mathweb.org
repository: https://github.com/dginev/CorTeX
---
The [Cornell e-print arXiv](http://arxiv.org) contains one of the largest corpora of scientific literature in the world. Unfortunately, its contents are locked up in the TeX/LaTeX format, which makes it nearly useless for knowledge management techniques. We translate it to XML to have a basis for uncovering its structural semantics.
The [Cornell e-print arXiv](http://arxiv.org) contains one of the largest corpora of
scientific literature in the world. Unfortunately, its contents are locked up in the
TeX/LaTeX format, which makes it nearly useless for knowledge management techniques. We
translate it to XML and "HTML5 with [MathML](http://www.w3.org/TR/MathML/)" via
[LaTeXML](https://dlmf.nist.gov/LaTeXML/) to have a basis for uncovering its structural
semantics (see the [LLaMaPuN](/systems/llamapun/) project for details).
The actual corpus processing (and distribution to hundreds of worker machines) is
performed by the [CorTeX](https://github.com/dginev/cortex) system; see the system
state/results: [old but complete](http://cortex.mathweb.org/corpus/arXMLiv),
[new system in Erlangen](https://corpora.mathweb.org/corpus/arxiv_1712/tex_to_html).
Applications of this include a mathematical search engine, [MathWebSearch](/systems/mws/)
(live [demo on the arXMLiv data set](http://arxivsearch.mathweb.org)).
Unfortunately, we cannot re-distribute the results of the transformation freely, due to
arXiv licensing policies. Therefore we have created the Special Interest Group for Math
Linguistics ([SIGMathLing](http://SIGMathLing.kwarc.info)), which can distribute the data
sets under an [NDA](https://sigmathling.kwarc.info/nda/) to
[SIGMathLing members](https://sigmathling.kwarc.info/member/).