The Mathematical Grammatical Framework (MGF)
---
MGF uses [GF](https://www.grammaticalframework.org/) to parse mathematical documents.
The project is still at an early stage: the current implementation is merely a proof of concept and subject to constant change.
Currently, translation between English and German is supported for a small number of examples.
Furthermore, the sentences can be formalized in a logical representation.
Issues and discussions can be found at <https://gl.kwarc.info/smglom/GF/issues>.
What follows is an overview of our current approach.

Running the Examples
===
A number of examples can be found in `examples.gfs`.
You can run them directly with
```bash
cat examples.gfs | gf --run
```
Alternatively, you can run them individually or test your own input.
After starting the GF command line, you first need to import the grammars:
```
i src/MgfEng.gf
i src/MgfGer.gf
i src/MgfLog.gf
```
Note that the German grammar (`MgfGer`) takes quite a while to get linked.
If you don't need it, feel free to skip it.
Afterwards, you can, for example, translate a sentence from English to German in the following way:
```
parse -lang=Eng "a unital quasigroup is called a loop" | l -lang=Ger
```
A logical representation can be generated with the language `Log`, i.e. by linearizing with `l -lang=Log` instead of `l -lang=Ger`.
The tree of a parsed sentence can be visualized by piping the parse result into `vt`:
```
parse -lang=Eng "a unital quasigroup is called a loop" | vt -view=zathura
```
Note that a png file is generated in the case of a single parse tree (which might require a different viewer).
The script `examples.gfs` collects commands of this kind:
```
-- import grammars. For the German grammar, linking can take quite a while.
i src/MgfEng.gf
i src/MgfGer.gf
i src/MgfLog.gf
-- translate prime definition to German
p -lang=Eng "a positive integer $ mi( n ) $ is called prime , iff there is no integer $ mrow( mn( 1 ) mo( < ) mi( m ) mo( < ) mi( n ) ) $ such that $ mrow( mi( m ) mo( | ) mi( n ) ) $" | l -lang=Ger
-- translate prime definition to logical representation
p -lang=Eng "a positive integer $ mi( n ) $ is called prime , iff there is no integer $ mrow( mn( 1 ) mo( < ) mi( m ) mo( < ) mi( n ) ) $ such that $ mrow( mi( m ) mo( | ) mi( n ) ) $" | l -lang=Log
-- parse tree of loop definition (pipe it into vp for visualization)
p -lang=Eng "a unital quasigroup is called a loop"
-- translate a statement from German to English
p -lang=Ger "es gibt eine gerade ganze Zahl" | l -lang=Eng
-- further examples
parse "a positive integer $ mi( n ) $ is called prime iff there is no integer $ mrow( mn( 1 ) mo( < ) mi( m ) mo( < ) mi( n ) ) $ such that $ mrow( mi( m ) mo( | ) mi( n ) ) $" | l -lang=Log
parse "a set is called empty iff it is the empty set"
parse "an alphabet $ mi( A ) $ is a finite set"
```

Preprocessing
===
Consider the following example sentence:
```
A positive integer $n$ is called prime, iff there is no integer $1 < m < n$ such that $m \mid n$.
```
This sentence (written in LaTeX) gets transformed into an HTML representation using [LaTeXML](https://github.com/brucemiller/LaTeXML).
We use our [LLaMaPUn library](https://github.com/kwarc/llamapun) to process this HTML representation and generate a (space-separated) token stream.
The formula representation is based directly on the corresponding Presentation MathML.
For the example sentence, this gives us the following token stream:
```
a positive integer $ mi( n ) $ is called prime , iff there is no integer $ mrow( mn( 1 ) mo( < ) mi( m ) mo( < ) mi( n ) ) $ such that $ mrow( mi( m ) mo( | ) mi( n ) ) $
```
The LLaMaPUn library allows us to map offsets in this string back to the nodes of the HTML representation.

Formula parsing
===
So far, the coverage for parsing formulae is tiny.
As an example,
```
mrow( mi( m ) mo( | ) mi( n ) )
```
matches one of the existing formula rules.
The current `MathCN` concept is unsatisfactory, and more work on declarations is needed (<https://gl.kwarc.info/smglom/GF/issues/2>).

Language/English parsing
===
The central categories on the language side are `Statement`, `Definition`, and `MObj`.
A `Statement` can be a formula (as described above) or a language statement.
An `MObj` is a "mathematical object", such as an "integer".
Often, identifiers are introduced in an apposition (as in "a positive integer $n$").
This can be done using the following rule:
```
appo_mobj : MObj -> MathCN -> MObj;
```
As mentioned above, more work is needed on `MathCN`, but also on `MObj` and declarations in general (<https://gl.kwarc.info/smglom/GF/issues/2>).
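
To make the token stream format from the Preprocessing section more concrete, here is a minimal Python sketch of how a Presentation MathML tree could be serialized into tokens such as `mrow( mi( m ) mo( | ) mi( n ) )`, and how string offsets could be recorded so that they can be mapped back to the originating nodes. This is not the LLaMaPUn API: the function names (`serialize`, `formula_tokens`, `build_stream`) are hypothetical, the sketch uses only Python's standard `xml.etree.ElementTree`, and it assumes the words of the sentence are already lower-cased and split.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop an XML namespace such as '{http://www.w3.org/1998/Math/MathML}'."""
    return tag.rsplit("}", 1)[-1]


def serialize(node):
    """Serialize one Presentation MathML node as 'name( ... )' (hypothetical helper)."""
    children = list(node)
    if children:
        inner = " ".join(serialize(child) for child in children)
    else:
        inner = (node.text or "").strip()
    return f"{local_name(node.tag)}( {inner} )"


def formula_tokens(math):
    """Serialize the children of a <math> element, wrapped in '$' tokens."""
    return "$ " + " ".join(serialize(child) for child in math) + " $"


def build_stream(parts):
    """Join words and formulae with spaces and record (start, end, part) offsets."""
    chunks, spans, pos = [], [], 0
    for part in parts:
        text = part if isinstance(part, str) else formula_tokens(part)
        spans.append((pos, pos + len(text), part))
        chunks.append(text)
        pos += len(text) + 1  # account for the separating space
    return " ".join(chunks), spans


divides = ET.fromstring("<math><mrow><mi>m</mi><mo>|</mo><mi>n</mi></mrow></math>")
stream, spans = build_stream(["such", "that", divides])
print(stream)  # such that $ mrow( mi( m ) mo( | ) mi( n ) ) $
```

Each entry of `spans` ties a range of the token stream to the word or formula node it came from, which is the kind of lookup needed to relate a parse result back to the HTML document.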