From 104f0887be804902229787c36d7c0707a293ce47 Mon Sep 17 00:00:00 2001
From: jfschaefer <jfschaefer@outlook.com>
Date: Wed, 4 Apr 2018 10:12:49 +0200
Subject: [PATCH] extended llamapun system description

---
 systems/llamapun.md | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/systems/llamapun.md b/systems/llamapun.md
index c150bda..22ee76e 100644
--- a/systems/llamapun.md
+++ b/systems/llamapun.md
@@ -19,10 +19,21 @@ repository: https://github.com/KWARC/LLaMaPUn/
 publink: http://kwarc.github.io/bibs/llamapun
 ---
 
-The LaMaPUn project investigates the structure and meaning of scientific/technical
-documents and builds tools for extracting semantic representations from them that can be
+The [LLaMaPUn library](https://github.com/KWARC/LLaMaPUn/) is a
+[Rust](https://www.rust-lang.org) library that provides a wide range of processing
+tools for natural language and mathematics.
+It can be used to investigate the structure and meaning of scientific/technical
+documents and to build tools for extracting semantic representations from them that can be
 used to enhance access to and interaction with document corpora.
 
-The LLaMaPUn library consists of a wide range of processing tools for natural language and
-mathematics. 
+In particular, the LLaMaPUn library is used on the
+[arXMLiv](https://www.kwarc.info/projects/arXMLiv/) data set, which is a translation
+of the [arXiv](https://arxiv.org/) corpus to "HTML5 with [MathML](https://www.w3.org/TR/MathML/)".
 
+Some of the library's features are:
+ * Plaintext generation with many options (Unicode normalization, word stemming, custom handling of e.g. `math` nodes, ...)
+ * Word and sentence tokenization
+ * Support for standard NLP tools (token models for GloVe, POS tagging with SENNA, ...)
+ * Mapping between plaintext offsets and HTML nodes (using the DNM data structure)
+
+For a more complete overview, take a look at the [README file](https://github.com/KWARC/llamapun/blob/master/README.md).
-- 
GitLab