---
layout: system
title: arXMLiv
teaser: Translating the arXiv to XML/HTML5
start_date: '2006'
people:
- mkohlhase
- dginev
website: http://cortex.mathweb.info
repository: https://github.com/dginev/CorTeX
---

The [Cornell e-print arXiv](http://arxiv.org) contains one of the largest corpora of scientific literature in the world. Unfortunately, its contents are locked up in the TeX/LaTeX format, which makes them nearly inaccessible to knowledge management techniques. We translate the corpus to XML to provide a basis for uncovering its structural semantics.