Commit b182a6ce authored by jfschaefer's avatar jfschaefer
Browse files

started cleaning up for paper

parent ed49de98
We want to use GF in order to parse mathematical documents.
Here is an overview of our current approach.
Note that the current implementation is merely a proof of concept and subject to constant change. Issues and discussions can also be found at <https://gl.kwarc.info/smglom/GF/issues>.
The Mathematical Grammatical Framework (MGF)
---
MGF uses [GF](https://www.grammaticalframework.org/) in order to parse mathematical documents.
The project is still at an early stage.
Currently, the translation between English and German is supported for a small number of examples.
Furthermore, the sentences can be formalized in a logical representation.
Preprocessing
Running the Examples
===
Consider the following example sentence:
A number of examples can be found in `examples.gfs`.
You can run them directly with
```
A positive integer $n$ is called prime iff there is no integer $1 < m < n$ such that $m \mid n$.
```bash
cat examples.gfs | gf --run
```
This sentence (written in LaTeX) gets transformed into an html representation using LaTeXML.
We use our LLaMaPUn library to process this html representation and generate a (space-separated) token stream.
The formula representation is based directly on the corresponding presentation-MathML.
For the example sentence, this gives us the following token stream:
Alternatively, you can run them individually/test your own input.
After starting the GF command line, you first need to import the grammars:
```
a positive integer $ mi( n ) $ is called prime iff there is no integer $ mrow( mn( 1 ) mo( < ) mi( m ) mo( < ) mi( n ) ) $ such that $ mrow( mi( m ) mo( | ) mi( n ) ) $```
i src/MgfEng.gf
i src/MgfGer.gf
i src/MgfLog.gf
```
The LLaMaPUn library allows us to map offsets in this string back to the nodes of the HTML representation.
Formula parsing
===
So far, the coverage for parsing formulae is tiny.
As an example,
Note that the German grammar (`MgfGer`) takes quite a while to get linked.
If you don't need it, feel free to skip it.
Afterwards, you can for example translate a sentence from English to German in the following way:
```
mrow( mi( m ) mo( | ) mi( n ) )
parse -lang=Eng "a unital quasigroup is called a loop" | l -lang=Ger
```
matches the rule
A logical representation can be generated with the language `Log`:
```
apply_bin_rel : FBinRelation -> FExpression -> FExpression -> FStatement;
parse -lang=Eng "a unital quasigroup is called a loop" | l -lang=Log
```
The prefix "F" indicates that the categories act on the formula level.
It's also possible to view parse trees with the commands `vt` and `vp`:
A complete formula can be one of the following two categories: A `Statement` or a `MathCN`.
The `Statement` corresponds to a clause (`Cl`). The example above ($m \mid n$) would be an example for that.
A `MathCN` on the other hand corresponds to a noun.
In "A positive integer $n$", the $n$ would be such a `MathCN`.
```
parse -lang=Eng "a unital quasigroup is called a loop" | vt
```
Similarly, in "there is no integer $1 < m < n$", the formula also is a `MathCN`. It matches the rule:
it opens the parse tree with `open`. If this does not work, you can manually
specify a command to open the graphs. E.g. to open it with the
pdf viewer `zathura`, you need to call
```
fexpr_fbinrel_fcid_fbinrel_fexpr_to_mathcn : -- introduces one identifier (like in $1 < n < 10$)
FExpression -> FBinRelation -> FComplexIdentifier -> FBinRelation -> FExpression -> MathCN;
parse -lang=Eng "a unital quasigroup is called a loop" | vt -view=zathura
```
The `MathCN` concept is not good and we need to do more work on declarations (<https://gl.kwarc.info/smglom/GF/issues/2>).
Note that a png file is generated in the case of a single parse tree (which might require a different viewer).
Language/English parsing
Preprocessing
===
The central categories on the language side are: `Statement`, `Definition`, `MObj`.
A `Statement` can be a formula (as described above) or a language statement.
An `MObj` is a "mathematical object", such as an "integer".
Often, identifiers are introduced in an apposition (as in "a positive integer $n$").
This can be done using the following rule:
Consider the following example sentence:
```
appo_mobj : MObj -> MathCN -> MObj;
A positive integer $n$ is called prime, iff there is no integer $1 < m < n$ such that $m \mid n$.
```
As mentioned above, more work is needed on `MathCN`, but also `MObj` and declarations in general (<https://gl.kwarc.info/smglom/GF/issues/2>).
This sentence (written in LaTeX) gets transformed into an html representation using [LaTeXML](https://github.com/brucemiller/LaTeXML).
We use our [LLaMaPUn library](https://github.com/kwarc/llamapun) to process this html representation and generate a (space-separated) token stream.
The formula representation is based directly on the corresponding presentation-MathML.
For the example sentence, this gives us the following token stream:
```
a positive integer $ mi( n ) $ is called prime , iff there is no integer $ mrow( mn( 1 ) mo( < ) mi( m ) mo( < ) mi( n ) ) $ such that $ mrow( mi( m ) mo( | ) mi( n ) ) $
```
The LLaMaPUn library allows us to map offsets in this string back to the nodes of the HTML representation.
Lexica
......
i gf/MNlpEng.gf
i gf/MNlpLog.gf
parse "a positive integer $ mi( n ) $ is called prime iff there is no integer $ mrow( mn( 1 ) mo( < ) mi( m ) mo( < ) mi( n ) ) $ such that $ mrow( mi( m ) mo( | ) mi( n ) ) $" | l -lang=Log
parse "a set is called empty iff it is the empty set"
parse "an alphabet $ mi( A ) $ is a finite set"
-- import grammars. For the German grammar, linking can take quite a while.
i src/MgfEng.gf
i src/MgfGer.gf
i src/MgfLog.gf
-- translate prime definition to German
p -lang=Eng "a positive integer $ mi( n ) $ is called prime , iff there is no integer $ mrow( mn( 1 ) mo( < ) mi( m ) mo( < ) mi( n ) ) $ such that $ mrow( mi( m ) mo( | ) mi( n ) ) $" | l -lang=Ger
-- translate prime definition to logical representation
p -lang=Eng "a positive integer $ mi( n ) $ is called prime , iff there is no integer $ mrow( mn( 1 ) mo( < ) mi( m ) mo( < ) mi( n ) ) $ such that $ mrow( mi( m ) mo( | ) mi( n ) ) $" | l -lang=Log
-- parse tree of loop definition (pipe it into vp for visualization)
p -lang=Eng "a unital quasigroup is called a loop"
-- translate a statement from German to English
p -lang=Ger "es gibt eine gerade ganze Zahl" | l -lang=Eng
concrete MNlpEng of MNlp = DLexiconEng, DGrammarEng, FGrammarEng, FLexiconEng ** {
}
concrete MNlpGer of MNlp = DLexiconGer, DGrammarGer, FGrammarGer, FLexiconGer ** {
}
concrete MNlpLog of MNlp = DLexiconLog, DGrammarLog, FGrammarLog, FLexiconLog ** {
}
......@@ -40,8 +40,8 @@ concrete DGrammarEng of DGrammar = MCatsEng, GrammarEng, ExtraEng ** open Syntax
iff_definition defi condition = lin S { s = defi.s ++ ("," | "") ++ "iff" ++ condition.s }; -- better solution for this?
-- def_defmobj_defmobj a b = mkCl a b;
def_mobj_is_mobj definiendum id definiens =
mkS (mkCl (DetCN aSg_Det (mobjToCN definiens id)) (mobjToCN definiendum)) -- a unitary quasigroup is a loop
| mkS (mkCl (DetCN aSg_Det (mobjToCN definiens id)) -- a unitary quasigroup is called a loop
-- mkS (mkCl (DetCN aSg_Det (mobjToCN definiens id)) (mobjToCN definiendum)) -- a unitary quasigroup is a loop
mkS (mkCl (DetCN aSg_Det (mobjToCN definiens id)) -- a unitary quasigroup is called a loop
(PastPartAP (mkVPSlash call_V3 (DetCN aSg_Det (mobjToCN definiendum)))));
-- mkS (mkCl (DetCN {- mkNP is for some reason ambiguous -}
-- aSg_Det (appo_mobj definiendum representative)) definiens);
......
concrete DGrammarGer of DGrammar = MCatsGer, GrammarGer** open SyntaxGer, ParadigmsGer, UtilsGer, ResGer, ParamX, Prelude in {
oper
call_V2 : V2 = mkV2 (ParadigmsGer.mkV "heißen");
call_V2A : V2A = mkV2A (ParadigmsGer.mkV "heißen");
call_V3 : V3 = mkV3 (ParadigmsGer.mkV "heißen");
exist_V2 : V2 = mkV2 (ParadigmsGer.mkV ("geben" | "existieren"));
......@@ -29,10 +30,10 @@ concrete DGrammarGer of DGrammar = MCatsGer, GrammarGer** open SyntaxGer, Paradi
prop);
-- (mkVP (mkVPSlash call_V2A prop))); -- TODO: FIX THIS!!!
iff_definition defi condition = lin S { s = \\order => (defi.s!order) ++ "genau dann , wenn" ++ condition.s!Sub }; -- better solution for this?
def_mobj_is_mobj definiendum id definiens =
mkS (mkCl (DetCN aSg_Det (mobjToCN definiens id)) (mobjToCN definiendum)) -- a unitary quasigroup is a loop
| mkS (mkCl (DetCN aSg_Det (mobjToCN definiens id)) -- a unitary quasigroup is called a loop
(PastPartAP (mkVPSlash call_V3 (DetCN aSg_Det (mobjToCN definiendum)))));
-- def_mobj_is_mobj definiendum id definiens =
-- mkS (mkCl (DetCN aSg_Det (mobjToCN definiens id)) (mobjToCN definiendum)) -- a unitary quasigroup is a loop
-- mkS (mkCl (DetCN aSg_Det (mobjToCN definiens id)) -- a unitary quasigroup is called a loop
-- (mkVp call_V2 (DetCN aSg_Det (mobjToCN definiendum))));
utter_statement s = mkUtt s;
utter_declaration s = mkUtt s;
......
......@@ -4,7 +4,7 @@ concrete DLexiconGer of DLexicon = MCatsGer ** open SyntaxGer, ParadigmsGer, Res
-- set_MObj = mkCN (mkN "set");
-- alphabet_MObj = mkCN (mkN "alphabet");
quasigroup_MObj = mkMObj (ParadigmsGer.mkN "Quasigruppe");
loop_MObj = mkMObj (ParadigmsGer.mkN "Loop");
loop_MObj = mkMObj (ParadigmsGer.mkN "Loop" Fem);
integer_MObj = mkMObj (ParadigmsGer.mkN "ganze Zahl" Fem);
finite_MObjProp = mkAP (ParadigmsGer.mkA "endlich");
......
Supports Markdown
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment