Commit b182a6ce by jfschaefer

### started cleaning up for paper

parent ed49de98
 We want to use GF in order to parse mathematical documents. The Mathematical Grammatical Framework (MGF) Here is an overview of our current approach. --- Note that the current implementation is merely a proof of concept and subject to constant change. Issues and discussions can also be found at . MGF uses [GF](https://www.grammaticalframework.org/) in order to parse mathematical documents. The project is still at an early stage. Currently, the translation between English and German is supported for a small number of examples. Furthermore, the sentences can be formalized in a logical representation. Preprocessing Running the Examples === === Consider the following example sentence: A number of examples can be found in `examples.gfs`. You can run them directly with ``` ```bash A positive integer \$n\$ is called prime iff there is no integer \$1 < m < n\$ such that \$m \mid n\$. cat examples.gfs | gf --run ``` ``` This sentence (written in LaTeX) gets transformed into an html representation using LaTeXML. Alternatively, you can run them individually/test your own input. We use our LLaMaPUn library to process this html representation and generate a (space-separated) token stream. After starting the GF command line, you first need to import the grammars: The formula representation is based directly on the corresponding presentation-MathML. For the example sentence, this gives us the following token stream: ``` ``` a positive integer \$ mi( n ) \$ is called prime iff there is no integer \$ mrow( mn( 1 ) mo( < ) mi( m ) mo( < ) mi( n ) ) \$ such that \$ mrow( mi( m ) mo( | ) mi( n ) ) \$``` i src/MgfEng.gf i src/MgfGer.gf i src/MgfLog.gf ``` ``` The LLaMaPUn library allows us to map offsets in this string back to the nodes of the HTML representation. Note that the German grammar (`MgfGer`) takes quite a while to get linked. If you don't need it, feel free to skip it. Afterwards, you can for example translate a sentence from English to German in the following way: Formula parsing === So far, the coverage for parsing formulae is tiny. As an example, ``` ``` mrow( mi( m ) mo( | ) mi( n ) ) parse -lang=Eng "a unital quasigroup is called a loop" | l -lang=Ger ``` ``` matches the rule A logical representation can be generated with the language `Log`: ``` ``` apply_bin_rel : FBinRelation -> FExpression -> FExpression -> FStatement; parse -lang=Eng "a unital quasigroup is called a loop" | l -lang=Log ``` ``` The prefix "F" indicates that the categories act on the formula level. It's also possible to view parse trees with the commands `vt` and `vp`: A complete formula can be one of the following two categories: A `Statement` or a `MathCN`. ``` The `Statement` corresponds to a clause (`Cl`). The example above (\$m \mid n\$) would be an example for that. parse -lang=Eng "a unital quasigroup is called a loop" | vt A `MathCN` on the other hand corresponds to a noun. ``` In "A positive integer \$n\$", the \$n\$ would be such a `MathCN`. Similarly, in "there is no integer \$1 < m < n\$", the formula also is a `MathCN`. It matches the rule: it opens the parse tree with `open`. If this does not work, you can manually specify a command to open the graphs. E.g. to open it with the pdf viewer `zathura`, you need to call ``` ``` fexpr_fbinrel_fcid_fbinrel_fexpr_to_mathcn : -- introduces one identifier (like in \$1 < n < 10\$) parse -lang=Eng "a unital quasigroup is called a loop" | vt -view=zathura FExpression -> FBinRelation -> FComplexIdentifier -> FBinRelation -> FExpression -> MathCN; ``` ``` The `MathCN` concept is not good and we need to do more work on declarations (). Note that a png file is generated in the case of a single parse tree (which might require a different viewer). Language/English parsing Preprocessing === === The central categories on the language side are: `Statement`, `Definition`, `MObj`. Consider the following example sentence: A `Statement` can be a formula (as described above) or a language statement. An `MObj` is a "mathematical object", such as an "integer". Often, identifiers are introduced in an apposition (as in "a positive integer \$n\$"). This can be done using the following rule: ``` ``` appo_mobj : MObj -> MathCN -> MObj; A positive integer \$n\$ is called prime, iff there is no integer \$1 < m < n\$ such that \$m \mid n\$. ``` ``` As mentioned above, more work is needed on `MathCN`, but also `MObj` and declarations in general (). This sentence (written in LaTeX) gets transformed into an html representation using [LaTeXML](https://github.com/brucemiller/LaTeXML). We use our [LLaMaPUn library](https://github.com/kwarc/llamapun) to process this html representation and generate a (space-separated) token stream. The formula representation is based directly on the corresponding presentation-MathML. For the example sentence, this gives us the following token stream: ``` a positive integer \$ mi( n ) \$ is called prime , iff there is no integer \$ mrow( mn( 1 ) mo( < ) mi( m ) mo( < ) mi( n ) ) \$ such that \$ mrow( mi( m ) mo( | ) mi( n ) ) \$ ``` The LLaMaPUn library allows us to map offsets in this string back to the nodes of the HTML representation. Lexica Lexica ... ...
 i gf/MNlpEng.gf -- import grammars. For the German grammar, linking can take quite a while. i gf/MNlpLog.gf i src/MgfEng.gf parse "a positive integer \$ mi( n ) \$ is called prime iff there is no integer \$ mrow( mn( 1 ) mo( < ) mi( m ) mo( < ) mi( n ) ) \$ such that \$ mrow( mi( m ) mo( | ) mi( n ) ) \$" | l -lang=Log i src/MgfGer.gf parse "a set is called empty iff it is the empty set" i src/MgfLog.gf parse "an alphabet \$ mi( A ) \$ is a finite set" -- translate prime definition to German p -lang=Eng "a positive integer \$ mi( n ) \$ is called prime , iff there is no integer \$ mrow( mn( 1 ) mo( < ) mi( m ) mo( < ) mi( n ) ) \$ such that \$ mrow( mi( m ) mo( | ) mi( n ) ) \$" | l -lang=Ger -- translate prime definition to logical representation p -lang=Eng "a positive integer \$ mi( n ) \$ is called prime , iff there is no integer \$ mrow( mn( 1 ) mo( < ) mi( m ) mo( < ) mi( n ) ) \$ such that \$ mrow( mi( m ) mo( | ) mi( n ) ) \$" | l -lang=Log -- parse tree of loop definition (pipe it into vp for visualization) p -lang=Eng "a unital quasigroup is called a loop" -- translate a statement from German to English p -lang=Ger "es gibt eine gerade ganze Zahl" | l -lang=Eng
 concrete MNlpEng of MNlp = DLexiconEng, DGrammarEng, FGrammarEng, FLexiconEng ** { }
 concrete MNlpGer of MNlp = DLexiconGer, DGrammarGer, FGrammarGer, FLexiconGer ** { }
 concrete MNlpLog of MNlp = DLexiconLog, DGrammarLog, FGrammarLog, FLexiconLog ** { }
File moved
 ... @@ -40,8 +40,8 @@ concrete DGrammarEng of DGrammar = MCatsEng, GrammarEng, ExtraEng ** open Syntax ... @@ -40,8 +40,8 @@ concrete DGrammarEng of DGrammar = MCatsEng, GrammarEng, ExtraEng ** open Syntax iff_definition defi condition = lin S { s = defi.s ++ ("," | "") ++ "iff" ++ condition.s }; -- better solution for this? iff_definition defi condition = lin S { s = defi.s ++ ("," | "") ++ "iff" ++ condition.s }; -- better solution for this? -- def_defmobj_defmobj a b = mkCl a b; -- def_defmobj_defmobj a b = mkCl a b; def_mobj_is_mobj definiendum id definiens = def_mobj_is_mobj definiendum id definiens = mkS (mkCl (DetCN aSg_Det (mobjToCN definiens id)) (mobjToCN definiendum)) -- a unitary quasigroup is a loop -- mkS (mkCl (DetCN aSg_Det (mobjToCN definiens id)) (mobjToCN definiendum)) -- a unitary quasigroup is a loop | mkS (mkCl (DetCN aSg_Det (mobjToCN definiens id)) -- a unitary quasigroup is called a loop mkS (mkCl (DetCN aSg_Det (mobjToCN definiens id)) -- a unitary quasigroup is called a loop (PastPartAP (mkVPSlash call_V3 (DetCN aSg_Det (mobjToCN definiendum))))); (PastPartAP (mkVPSlash call_V3 (DetCN aSg_Det (mobjToCN definiendum))))); -- mkS (mkCl (DetCN {- mkNP is for some reason ambiguous -} -- mkS (mkCl (DetCN {- mkNP is for some reason ambiguous -} -- aSg_Det (appo_mobj definiendum representative)) definiens); -- aSg_Det (appo_mobj definiendum representative)) definiens); ... ...
File moved
File moved
File moved
 ... @@ -4,7 +4,7 @@ concrete DLexiconGer of DLexicon = MCatsGer ** open SyntaxGer, ParadigmsGer, Res ... @@ -4,7 +4,7 @@ concrete DLexiconGer of DLexicon = MCatsGer ** open SyntaxGer, ParadigmsGer, Res -- set_MObj = mkCN (mkN "set"); -- set_MObj = mkCN (mkN "set"); -- alphabet_MObj = mkCN (mkN "alphabet"); -- alphabet_MObj = mkCN (mkN "alphabet"); quasigroup_MObj = mkMObj (ParadigmsGer.mkN "Quasigruppe"); quasigroup_MObj = mkMObj (ParadigmsGer.mkN "Quasigruppe"); loop_MObj = mkMObj (ParadigmsGer.mkN "Loop"); loop_MObj = mkMObj (ParadigmsGer.mkN "Loop" Fem); integer_MObj = mkMObj (ParadigmsGer.mkN "ganze Zahl" Fem); integer_MObj = mkMObj (ParadigmsGer.mkN "ganze Zahl" Fem); finite_MObjProp = mkAP (ParadigmsGer.mkA "endlich"); finite_MObjProp = mkAP (ParadigmsGer.mkA "endlich"); ... ...
File moved
File moved
File moved
File moved
File moved
Supports Markdown
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!