The Mathematical Grammatical Framework (MGF)
---

MGF uses [GF](https://www.grammaticalframework.org/) to parse mathematical documents.
The project is still at an early stage: the current implementation is merely a proof of concept and subject to constant change.
Issues and discussions can be found at <https://gl.kwarc.info/smglom/GF/issues>.

Currently, translation between English and German is supported for a small number of examples.
Furthermore, the sentences can be formalized in a logical representation.
The sections below give an overview of our current approach.

Running the Examples
===

A number of examples can be found in `examples.gfs`.
You can run them directly with

```bash
cat examples.gfs | gf --run
```

Alternatively, you can run them individually or test your own input.
After starting the GF command line, you first need to import the grammars:

```
i src/MgfEng.gf
i src/MgfGer.gf
i src/MgfLog.gf
```

Note that the German grammar (`MgfGer`) takes quite a while to link.
If you don't need it, feel free to skip it.

Afterwards, you can, for example, translate a sentence from English to German in the following way:

```
parse -lang=Eng "a unital quasigroup is called a loop" | l -lang=Ger
```

A logical representation can be generated in the same way by linearizing to the language `Log` (`l -lang=Log`).
The resulting tree can be displayed with `vt`, here using `zathura` as the viewer:

```
parse -lang=Eng "a unital quasigroup is called a loop" | vt -view=zathura
```

Note that a png file is generated in the case of a single parse tree (which might require a different viewer).

Formula parsing
===

So far, the coverage for parsing formulae is tiny.
As an example, the following token sequence matches one of the current rules:

```
mrow( mi( m ) mo( | ) mi( n ) )
```

The current `MathCN` concept is unsatisfactory, and more work is needed on declarations (<https://gl.kwarc.info/smglom/GF/issues/2>).

Language/English parsing
===

The central categories on the language side are `Statement`, `Definition`, and `MObj`.
A `Statement` can be a formula (as described above) or a language statement.
An `MObj` is a "mathematical object", such as an "integer".
Often, identifiers are introduced in an apposition (as in "a positive integer $n$").
This can be done using the following rule:

```
appo_mobj : MObj -> MathCN -> MObj;
```

As mentioned above, more work is needed on `MathCN`, but also on `MObj` and declarations in general (<https://gl.kwarc.info/smglom/GF/issues/2>).

Preprocessing
===

Consider the following example sentence:

```
A positive integer $n$ is called prime, iff there is no integer $1 < m < n$ such that $m \mid n$.
```

This sentence (written in LaTeX) is transformed into an HTML representation using [LaTeXML](https://github.com/brucemiller/LaTeXML).
We use our [LLaMaPUn library](https://github.com/kwarc/llamapun) to process this HTML representation and generate a space-separated token stream.
The formula representation is based directly on the corresponding presentation MathML.
For the example sentence, this gives us the following token stream:

```
a positive integer $ mi( n ) $ is called prime , iff there is no integer $ mrow( mn( 1 ) mo( < ) mi( m ) mo( < ) mi( n ) ) $ such that $ mrow( mi( m ) mo( | ) mi( n ) ) $
```
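To make the token format concrete, here is a minimal Python sketch that flattens a presentation-MathML fragment into a token string of the shape shown above. It is an illustration only and not the LLaMaPUn implementation: the function names `local_name`, `mathml_tokens`, and `formula_to_stream` are hypothetical. It assumes that every element has either child elements or plain text, and that a `<math>` wrapper with a single child is dropped.

```python
import xml.etree.ElementTree as ET
from typing import List


def local_name(tag: str) -> str:
    """Drop an XML namespace prefix such as '{http://www.w3.org/1998/Math/MathML}'."""
    return tag.rsplit("}", 1)[-1]


def mathml_tokens(node: ET.Element) -> List[str]:
    """Flatten one element into the tokens 'name(', its content, and ')'."""
    tokens = [local_name(node.tag) + "("]
    children = list(node)
    if children:
        # Inner node (e.g. mrow): recurse into the children in document order.
        for child in children:
            tokens.extend(mathml_tokens(child))
    elif node.text and node.text.strip():
        # Leaf node (e.g. mi, mn, mo): emit its text as a single token.
        tokens.append(node.text.strip())
    tokens.append(")")
    return tokens


def formula_to_stream(mathml: str) -> str:
    """Serialize a MathML string as '$ ... $' with space-separated tokens."""
    root = ET.fromstring(mathml)
    # Assumption: the outer <math> wrapper is not part of the token stream.
    if local_name(root.tag) == "math" and len(root) == 1:
        root = root[0]
    return "$ " + " ".join(mathml_tokens(root)) + " $"
```

For instance, `formula_to_stream("<math><mrow><mi>m</mi><mo>|</mo><mi>n</mi></mrow></math>")` yields the last formula of the token stream above.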

The LLaMaPUn library allows us to map offsets in the token stream back to the nodes of the HTML representation.
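The idea behind such an offset mapping can be sketched in a few lines of Python. This is not the LLaMaPUn API: the names `build_stream` and `node_at` and the node identifiers are made up for illustration. The sketch assumes that tokens contain no spaces and that each token is tagged with the identifier of the HTML node it came from.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def build_stream(tokens: List[Tuple[str, str]]) -> Tuple[str, List[int], List[str]]:
    """Join (token, node_id) pairs with spaces and record each token's start offset."""
    parts: List[str] = []
    starts: List[int] = []
    nodes: List[str] = []
    pos = 0
    for text, node_id in tokens:
        starts.append(pos)
        nodes.append(node_id)
        parts.append(text)
        pos += len(text) + 1  # +1 for the separating space
    return " ".join(parts), starts, nodes


def node_at(offset: int, stream: str, starts: List[int], nodes: List[str]) -> Optional[str]:
    """Return the node id of the token covering `offset`, or None for separators."""
    if not 0 <= offset < len(stream) or stream[offset] == " ":
        return None
    # The covering token is the last one whose start offset is <= offset.
    return nodes[bisect_right(starts, offset) - 1]
```

Because the start offsets are sorted, each lookup is a binary search, so mapping a parse result back to the document stays cheap even for long token streams.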

-- import grammars. For the German grammar, linking can take quite a while.

i src/MgfEng.gf

i src/MgfGer.gf

i src/MgfLog.gf

parse "a positive integer $ mi( n ) $ is called prime iff there is no integer $ mrow( mn( 1 ) mo( < ) mi( m ) mo( < ) mi( n ) ) $ such that $ mrow( mi( m ) mo( | ) mi( n ) ) $" | l -lang=Log

parse "a set is called empty iff it is the empty set"

parse "an alphabet $ mi( A ) $ is a finite set"

-- translate prime definition to German

p -lang=Eng "a positive integer $ mi( n ) $ is called prime , iff there is no integer $ mrow( mn( 1 ) mo( < ) mi( m ) mo( < ) mi( n ) ) $ such that $ mrow( mi( m ) mo( | ) mi( n ) ) $" | l -lang=Ger

-- translate prime definition to logical representation

p -lang=Eng "a positive integer $ mi( n ) $ is called prime , iff there is no integer $ mrow( mn( 1 ) mo( < ) mi( m ) mo( < ) mi( n ) ) $ such that $ mrow( mi( m ) mo( | ) mi( n ) ) $" | l -lang=Log

-- parse tree of loop definition (pipe it into vp for visualization)

p -lang=Eng "a unital quasigroup is called a loop"

-- translate a statement from German to English

p -lang=Ger "es gibt eine gerade ganze Zahl" | l -lang=Eng
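Scripts like the one above can also be generated and run from another program. The following Python sketch builds such a script and feeds it to `gf --run` on standard input, as in `cat examples.gfs | gf --run`. The helper names (`translate_command`, `build_script`, `run_gf`) are hypothetical, and the sketch assumes that `gf` is on the `PATH` and that sentences contain no double quotes.

```python
import subprocess
from typing import Iterable, List


def translate_command(sentence: str, source: str = "Eng", target: str = "Ger") -> str:
    """Build a 'parse | linearize' command in the style of examples.gfs."""
    if '"' in sentence:
        # Assumption: quoting rules for embedded quotes are not handled here.
        raise ValueError("sentence must not contain double quotes")
    return f'p -lang={source} "{sentence}" | l -lang={target}'


def build_script(grammars: Iterable[str], commands: Iterable[str]) -> str:
    """Emit the import lines first, then the commands, one per line."""
    lines: List[str] = [f"i {path}" for path in grammars]
    lines.extend(commands)
    return "\n".join(lines) + "\n"


def run_gf(script: str, gf_binary: str = "gf") -> str:
    """Pipe the script into `gf --run` and return its standard output."""
    completed = subprocess.run(
        [gf_binary, "--run"],
        input=script,
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout
```

With GF installed, `print(run_gf(build_script(["src/MgfEng.gf", "src/MgfGer.gf"], [translate_command("a unital quasigroup is called a loop")])))` should run the English-to-German translation from the examples.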