Skip to content
Snippets Groups Projects
Commit e1aaa761 authored by m-iancu's avatar m-iancu
Browse files

moved omdoc-semantics paper from svn

parent a5b4ec8c
Branches
No related tags found
No related merge requests found
Showing
with 56333 additions and 0 deletions
omdoc-semantics/macros
omdoc-semantics/tikz
\ No newline at end of file
\svnInfo $Id: activdocs.tex 166 2013-11-29 19:08:20Z kohlhase $
\svnKeyword $HeadURL: https://svn.kwarc.info/repos/miancu/omdoc-semantics/activdocs.tex $
In the \emph{active documents paradigm} \cite{adp}, we postulated that documents naturally
come in cross-referenced document and content structures, where both sides combine to make
the documents active.Therefore, we envision documents stored in a \emph{document commons}
referring to each-other and to formal definitions stored in a \emph{content
commons}. Then, using a \emph{document player} both humans and machines can actively
interact with the documents.
\begin{figure}
\begin{center}
\includegraphics[scale=0.6]{img/adp.png}
\end{center}
\caption{Active Documents Paradigm}
\end{figure}
The \emph{virtual documents} that arise from the narration, content and cross-references
are what we call flexiformal documents. In the active documents paradigm the document
player is responsible for constructing, on demand, flexiformal documents from the document
and content commons as well as constructing or updating the document and content commons
from flexiformal documents.
%%% Local Variables:
%%% mode: LaTeX
%%% TeX-master: "paper"
%%% End:
% LocalWords: activdocs.tex emph adp includegraphics img
\svnInfo $Id: conc.tex 196 2014-07-31 13:28:23Z miancu $
\svnKeyword $HeadURL: https://svn.kwarc.info/repos/miancu/omdoc-semantics/conc.tex $
In this article we have identified requirements for target representation formats for
semantics extraction from mathematical languages: We carefully identified a set of
challenging linguistic/semantic phenomena and discuss how they can be modeled in the
\omdoc format. We show that the three-level content markup architecture fits well with the
distribution of linguistic phenomena on the phrase, discourse, and context levels. In
related study~\cite{Kohlhase:dmesmgm14}, we have shown that \omdoc/\mmt can be used as a
data model for lexical resources for multilingual mathematical terminologies; there the
theory graph model was used to account for terminological relations (hypernymy, metonymy,
antonymy, etc.).
From this exercise, we can conclude that the (flat sets of) \emph{first-order formulae},
which have been the ``default'' target format for both the foundations of mathematics and
semantics construction in computational linguistics are \emph{not an adequate target
language for formalization/semantics extraction}. In summary, we notice that we need
\begin{quote}\sf
\begin{compactenum}[\em i\/\rm:]
\item\label{req:ref} a referential theory of meaning (by pointing to symbol definitions)
\item\label{req:three} three levels of representation (objects/statements/theories)
\item\label{req:parallel} parallel markup (mix formal/informal recursively at any level)
\item\label{req:opaque} flexiformality: to allow opaque content (presentation/natural
language)
\item\label{req:pluralism} pluralism at all levels (object/logic/foundation/meta-logic)
\item\label{req:underspec} underspecification of symbol meaning
\end{compactenum}
\end{quote}
The work in \omdoc is a first step into the direction of designing a target format for the
meaning of mathematical documents and knowledge in the large, but further research and
format design efforts have to be undertaken for the last two aspects. We are
currently working on an improved language design that takes conditions \emph{i} to
\emph{vi} above as design goals and integrates \mmt into \omdoc as a logical core. Note
that conditions \emph{v} and \emph{vi} are inter-dependent, pluralism allows to use
logical systems that integrate methods for underspecification, and we can see pluralism at
a step towards underspecification at the logical level: In the semantics of natural
language, we should not have to commit to particular details of the logical foundation; we
are just interested in ``using'' the logic for representation.
Finally, we observe that this article concentrates on establishing (criteria for) a target
representation format for the semantics of CML and presents \omdoc as a relatively
complete first approximation. The question of how to construct adequate \omdoc
representations from mathematical documents is still largely unsolved.
\paragraph{Acknowledgements} The work in this article has been partially supported by the
Leibniz association under grant SAW-2012-FIZ\_KA-2 and the German Research Foundation
(DFG) under grants KO 2428/9-1 and KO 2428/13-1. The authors gratefully acknowledge the
discussions with Deyan Ginev, Andrea Kohlhase, Manfred Pinkal, Florian Rabe, and Magdalena
Wolska which led to a much better understanding of the concepts presented in this article.
%%% Local Variables:
%%% mode: LaTeX
%%% TeX-master: "paper"
%%% End:
% LocalWords: conc.tex ednote todolist formalization oldpart compactitem omdoc rm mmt
% LocalWords: miancu emph compactenum CoquandHuet tcoc88 BertotCasteran Aarne underspec
% LocalWords: coersions Ranta's Ranta ttilm94 Ranta Pinkal Rabe Wolska dmesmgm14
%% examples.sty 2.0 1994/02/28
%% For use with LaTeX Version 2.09 dated 13 jun 1989
%% compatible as package with LaTeX2e
%% (tested on the second beta-release: febr. 1993 )
%%
%% ==========================================================================
%%
%% Copyright (C) 1994 Martin van den Berg
%%
%% Author : M. H. v.d. Berg (vdberg@alf.let.uva.nl)
%% ILLC/Department of Computational Linguistics
%% Faculty of Arts
%% University of Amsterdam
%%
%% Date : 28 febr 1994
%% Version: 2.0
%%
%% (== Original 1.0 version: Author: M. H. v.d. Berg, Date: 24 march 1990 ==)
%%
%% This package is distributed in the hope that it will be useful,
%% but WITHOUT ANY WARRANTY; without even the implied warranty of
%% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
%%
%% This is FREEWARE.
%% You are free to give this file to other users.
%% You are NOT ALLOWED to change it.
%% You are allowed to make a copy under another name and change the copy,
%% provided you mention the original copyright owner
%% and add an explanation of your changes.
%% You are NOT ALLOWED to take money for the distribution or use of either
%% this file or a changed version, except for a nominal charge for copying
%% etc.
%%
\def\filedate{1994/02/28}
\def\fileversion{2.0}
\def\filename{example.sty}
%
%
% M 1 A 2 R 3 T 4 I 5 N 6 B 7
%234567890123456789012345678901234567890123456789012345678901234567890123456789
%
% following LaTeX2e recommendation all internal commands are preceded by `EX@'.
%
% exit when this file was already visited:
\@ifundefined{EX@genericex@mple}{}{\endinput}
%
% to get compatibility with NTG-styles
% we use their vskip-names to get hindenting the same everywhere.
% If it is already defined, we won't change it:
%
\@ifundefined{unitindent}%
{\newdimen\unitindent\settowidth{\unitindent}{(99)\space a.\space }}%
{}%
%
% \ef is the font used for examples:
%
\@ifundefined{ef}{\let\ef\it}{}
%
% Two ways to typeset the three kind of different labels are predefined
%
\newcommand{\defaultalphex}{%
\def\examplemakelabel{\rm(\theequation)\hfill}
\def\alphexmakelabel{\rm\thealphex.\hfill}
\def\nalphexmakelabel{\rm\thealphex.\hfill}
}
\newcommand{\bracketedalphex}{%
\def\examplemakelabel{\rm(\theequation)\hfill}
\def\alphexmakelabel{\rm(\thealphex)\hfill}
\def\nalphexmakelabel{\rm(\thealphex)\hfill}
}
\defaultalphex
%
% Four ways of vskipping
% we use def's rather than skip registers to sync them with their values.
% \examplesep \alphexsep \alphexlinesep
%
\newcommand{\examplesep}{\abovedisplayskip}
\newcommand{\defaultalphskip}{\def\alphexsep{\jot}\def\alphexlinesep{\jot}}
\newcommand{\widealphskip}
{\def\alphexsep{\abovedisplayskip}\def\alphexlinesep{\abovedisplayskip}}
\newcommand{\narrowalphskip}{\def\alphexsep{\jot}\def\alphexlinesep{0pt}}
\defaultalphskip
%
% hskipping is also regulated using def's for similar reasons.
%
\newcommand{\alphexskip}{.5\unitindent}
\newcommand{\exampleskip}{\unitindent}
\newcommand{\emptyexskip}{0pt}
\newcommand{\nalphexskip}{\nalphexskipreg}\newskip\nalphexskipreg
% skip is defined dynamically in [c]nalphex[*] ^
%
% the alphabetical counter
%
\newcounter{alphex}\setcounter{alphex}{0}\def\thealphex{\alph{alphex}}
%
% ==========================================================================
%% P A R T 1, environments: example[*] and [c][n]alphex[*]
% ======= = = = ====== =
% ==========================================================================
%
% all environments for examples use a generic style.
% This takes the following arguments:
% 1: the label format that is used
% 2: the leftmargin and labelwidth when environment is not embedded
% 3: the leftmargin when environment is embedded
% 4: the name of the counter (ignored when empty)
% 5: when not empty: environment resets counter, when empty; no reset.
% the actual value is ignored
% 6: labelwidth for embedded environment, when empty, this is leftmargin.
%
% the default is as follows
% A is \exampleskip, def'd to the distance used for the example labels
% B is \nalphexskip, def'd to the distance calculated for the [c]nalpex labels
% C is \alphexskip, def'd to the distance used for the inner [c]alpex labels
%
% when used as outer environment
% |------A----->
% (1) This is how the line looks for `example'
% a. This is how the line looks for `[c][n]alphex'
% This is how the line looks for `example*'
% This is how the line looks for `[c][n]alphex*'
%
% when used as inner environment
% <------A-----|
% (1)<----B----|This is how the line looks for `example'
% a. This is how the line looks for `[c]nalphex'
% a. This is how the line looks for `[c]alphex'
% |--C-->
% This is how the line looks for `example*'
% This is how the line looks for `[c]nalphex*'
% This is how the line looks for `[c]alphex*'
%
% locally, numbering can be by disabled using the command
% \nonumber as the first token on the line.
% =========
%
%% \begin{example}: numbered examples.
% ===============
\newenvironment{example}{\begin{EX@genericex@mple}%
{\examplemakelabel}{\exampleskip}{\emptyexskip}{equation}{}{\exampleskip}}%
{\end{EX@genericex@mple}\global\@ignoretrue}%
%
%% \begin{example*}: not numbered examples.
% ================
\newenvironment{example*}{\begin{EX@genericex@mple}%
{\hfill}{\exampleskip}{\emptyexskip}{}{}{\exampleskip}}%
{\end{EX@genericex@mple}\global\@ignoretrue}
%
%% \begin{alphex}: numbered examples.
% ===============
\newenvironment{alphex}{\begin{EX@genericex@mple}%
{\alphexmakelabel}{\exampleskip}{\alphexskip}{alphex}{new series}{}}%
{\end{EX@genericex@mple}\global\@ignoretrue}
%
%% \begin{calphex}: numbered examples, continuing from previous.
% ===============
\newenvironment{calphex}{\begin{EX@genericex@mple}%
{\alphexmakelabel}{\exampleskip}{\alphexskip}{alphex}{}{}}%
{\end{EX@genericex@mple}\global\@ignoretrue}
%
%% \begin{alphex*}: non numbered examples.
% ===============
\newenvironment{alphex*}{\begin{EX@genericex@mple}%
{\hfill}{\exampleskip}{\alphexskip}{}{new series}{}}%
{\end{EX@genericex@mple}\global\@ignoretrue}
%
%% \begin{nalphex}: as alphex, but left-indent rather than right.
% ===============
\newenvironment{nalphex}{{\settowidth{\skip0}{\examplemakelabel}%
\def\@temp{\skip1=}\expandafter\@temp\exampleskip%
\addtolength{\skip1}{-\skip0}\global\nalphexskipreg\skip1}%
\begin{EX@genericex@mple}{\nalphexmakelabel}{\exampleskip}{\emptyexskip}%
{alphex}{new series}{\nalphexskip}}%
{\end{EX@genericex@mple}\global\@ignoretrue}
%
%% \begin{cnalphex}: as calphex, but left-indent rather than right.
% ===============
\newenvironment{cnalphex}{{\settowidth{\skip0}{\examplemakelabel}%
\def\@temp{\skip1=}\expandafter\@temp\exampleskip%
\addtolength{\skip1}{-\skip0}\global\nalphexskipreg\skip1}%
\begin{EX@genericex@mple}{\nalphexmakelabel}{\exampleskip}{\emptyexskip}%
{alphex}{}{\nalphexskip}}%
{\end{EX@genericex@mple}\global\@ignoretrue}
%
%% \begin{nalphex*}: is of course the same as nalphex, but no label.
% ===============
%
% the following are environments that exist `logically'
% but are equivalent to one of the others.
%
{\catcode`\*=11
\global\let\nalphex*\example*\global\let\endnalphex*\endexample*
\global\let\calphex*\alphex*\global\let\endcalphex*\endalphex*
\global\let\cnalphex*\nalphex*\global\let\endcnalphex*\endnalphex*
\global\let\ncalphex*\cnalphex*\global\let\endncalphex*\endcnalphex*
\global\let\ncalphex\cnalphex\global\let\endncalphex\endcnalphex}
%
%% private commands
%
\newif\ifEX@outerex\EX@outerextrue
\def\EX@ifempty#1#2#3{\edef\@temp{#1}\ifx\@temp\@empty#2\else#3\fi}
%
%
\newenvironment{EX@genericex@mple}[6]%
{\ef\def\\{\EX@xstarbr}%
\begin{list}{{#1}}%
{\def\@empty{}% defined by LaTeX, but you never know...
\ifEX@outerex % == outer ==
\topsep\examplesep \itemsep\alphexsep
\leftmargin#2 \labelwidth\leftmargin
\else % == inner ==
\topsep\alphexsep \itemsep\alphexlinesep
\leftmargin#3 \EX@ifempty{#6}{\labelwidth\leftmargin}{\labelwidth#6}
\fi % == ===== ==
\partopsep0pt \parsep0pt \labelsep=0pt%
\EX@ifempty{#4}%
{\@nmbrlistfalse}%
{\EX@ifempty{#5}{\@nmbrlisttrue\def\@listctr{#4}}% equals \usecounter
{\usecounter{#4}}}% \& no counter reset
\EX@outerexfalse% From now on we are in inner environment
}%
\EX@nonumitem}%
{\end{list}\global\@ignoretrue}
%
% The \\*[] cascade to allow all the usual options; in the usual way.
% (first tests for *, then for [...])
%
\@ifundefined{@eqpen}{\newcount\@eqpen}{}%again, LaTeX should define \@eqpen.
\newcommand{\EX@xstarbr}{\unskip\@ifstar%
{\global\@eqpen\@M\EX@ystarbr}%
{\global\@eqpen\interdisplaylinepenalty \EX@ystarbr}}
\newcommand{\EX@ystarbr}{\@ifnextchar [{\EX@zstarbr}{\EX@zstarbr[\z@]}}
\def\EX@zstarbr[#1]{\penalty\@eqpen\vskip#1\EX@nonumitem}
%
% then we define an \item that checks whether the next command is \nonumber,
% and if so suppress numbering. It isn't everything, but it works.
%
\def\EX@nonumitem{\expandafter\EX@uxnonumitem\EX@ident}% gobble all spaces
\def\EX@uxnonumitem{\bgroup\def\nonumber{\nonumber}%
\def\@tempb{\nonumber}\futurelet\@tempa\EX@nextn@n}%
\def\EX@nextn@n{\ifx\@tempa\@tempb\gdef\@tempa{\item[]}%
\else\gdef\@tempa{\item\relax}\fi\egroup\@tempa}
\long\def\EX@ident#1{#1}% as \@iden in \LaTeX but \long, this doesn't help,
% but at least suppresses an confusing error message.
%
% ==========================================================================
%% P A R T 2, alphabetical labels with primes
%
% ==========================================================================
%
% print counter as letter a--z with optional primes
% called with argument number n,
% the letter is 'a' + ((n - 1) mod 26), with ((n - 1) div 26) primes.
%
% \alph, \Alph
% This redefines \alph (\Alph) to give primes when counter >26
% the character printed for n is the one the original \alph (\Alph) would
% print for ((n - 1) mod 26 + 1),
% the number of primes is the result ((n - 1) div 26) of the division.
%
% three commands relate to the alphex counter, one to the example counter.
% \addprime adds 26 to the counter, which leaves the character alone,
% ========= but prints one extra prime.
% \subtrprime subtracts 26 from the counter,
% =========== which leaves the character alone, but prints one less prime.
% calling \subtrprime for the counter < 26 is not an error, but has no effect.
%
% \lastalph returns he last value of the [c]alphex counter,
% =========
% \lastref returns the last value of the counter used by example,
% ======== equation and eqnarray.
% use: (\lastref a--\lastalph) would result in something like (73a--e).
%
% \lastalph and \lastref the latter two have an optional argument
% that is first subtracted from the counter.
%
%
\newcommand{\addprime}{\global\advance \c@alphex by 26\relax}
\newcommand{\subtrprime}
{\ifnum\c@alphex>26\global\advance \c@alphex by -26\relax\fi}
\newcommand{\resetalphex}
{\@ifnextchar[{\EX@reset@lphex}{\global\c@alphex=0\relax}}
%
\newcommand\oldstylealph{\let\alph\@@alph\let\Alph\@@Alph}
\newcommand\newstylealph{\let\alph\EX@alph@i\let\Alph\EX@Alph@i}
%
\newcommand{\lastref}{\@ifnextchar[{\EX@lastref}{\arabic{equation}}}
\newcommand{\lastalph}{\@ifnextchar[{\EX@lastalph}{\alph{alphex}}}
%
%% private commands
%
\newcount\c@EX@aa@cnt\c@EX@aa@cnt=0
%
\def\EX@lastref[#1]{\c@EX@aa@cnt=#1\c@EX@aa@cnt=-\c@EX@aa@cnt
\advance \c@EX@aa@cnt by \c@equation \arabic{EX@aa@cnt}}
\def\EX@lastalph[#1]{\c@EX@aa@cnt=#1\c@EX@aa@cnt=-\c@EX@aa@cnt
\advance \c@EX@aa@cnt by \c@alphex \alph{EX@aa@cnt}}
%
\newcommand{\EX@alph@i}[1]{\EX@alph{\@nameuse{c@#1}}}
\newcommand{\EX@Alph@i}[1]{\EX@Alph{\@nameuse{c@#1}}}
\let\@@alph\alph\let\@@Alph\Alph
\newstylealph
%
%
\def\EX@reset@lphex[#1]{\global\c@alphex=#1\relax}
%
\newcommand{\EX@alph}[1]%
{\ifcase#1\or a\or b\or c\or d\else\EX@ialph{#1}\fi}
\newcommand{\EX@ialph}[1]%
{\ifcase #1\or \or \or \or \or e\or f\or g\or h\or i\or j\or
k\or l\or m\or n\or o\or p\or q\or r\or s\or t\or u\or v\or
w\or x\or y\or z\else\EX@acc{#1}{}\EX@alph\fi
}
%
\newcommand{\EX@Alph}[1]%
{\ifcase#1\or A\or B\or C\or D\else\EX@iAlph{#1}\fi}
\newcommand{\EX@iAlph}[1]%
{\ifcase #1\or \or \or \or \or E\or F\or G\or H\or I\or J\or
K\or L\or M\or N\or O\or P\or Q\or R\or S\or T\or U\or V\or
W\or X\or Y\or Z\else\EX@acc{#1}{}\EX@Alph\fi
}
%
\newcommand{\EX@acc}[3]%
{\ifnum#1<27%
{\setbox0\hbox{#3{#1}}\hbox to \wd0{$\hbox{\box0}#2$\hss}}%
\else
\c@EX@aa@cnt=#1\advance \c@EX@aa@cnt by -26\relax
\EX@acc\c@EX@aa@cnt{#2'}{#3}%
\fi
}
\endinput
1234567890123456789012345678901234567890123456789012345678901234567890123456789
1 2 3 4 5 6 7
omdoc-semantics/img/adp.png

31.9 KiB

#FIG 3.2 Produced by xfig version 3.2.5b
Landscape
Center
Metric
A4
100.00
Single
-2
1200 2
0 32 #f7df00
0 33 #f7df00
5 1 0 2 0 7 100 0 -1 0.000 0 0 0 1 6322.500 5122.500 4650 2250 6225 1800 8070 2295
2 1 2.00 120.00 120.00
5 1 0 2 0 7 100 0 -1 0.000 0 0 0 1 6063.000 5085.000 3030 5715 3030 4455 3480 3375
2 1 2.00 120.00 120.00
5 1 0 2 0 7 100 0 -1 0.000 0 1 1 0 6386.662 5038.569 3525 6975 4290 7785 5190 8280
2 1 2.00 120.00 120.00
5 1 0 2 0 7 100 0 -1 0.000 0 0 0 1 6535.000 5393.750 9015 7110 8430 7740 7665 8190
2 1 2.00 120.00 120.00
5 1 0 2 0 7 100 0 -1 0.000 0 0 0 1 6456.136 5125.227 9195 3420 9645 4635 9600 5850
2 1 2.00 120.00 120.00
1 1 0 2 0 7 100 0 20 0.000 1 0.0000 3210 6345 1350 630 3210 6345 4560 6345
1 1 0 2 0 7 100 0 20 0.000 1 0.0000 8745 2835 1350 630 8745 2835 10095 2835
1 1 0 2 0 7 100 0 20 0.000 1 0.0000 9375 6480 1350 630 9375 6480 10725 6480
1 1 0 2 0 7 100 0 20 0.000 1 0.0000 6450 8460 1350 630 6450 8460 7800 8460
1 1 0 2 0 7 100 0 20 0.000 1 0.0000 3930 2790 1350 630 3930 2790 5280 2790
3 2 0 10 32 7 101 0 -1 0.000 0 1 0 14
2 1 10.00 300.00 300.00
5730 5670 5505 6075 6405 6795 7710 5940 7755 3465 6000 2520
2805 3510 2445 6075 4155 8100 8205 8595 10365 6705 10680 2475
8430 810 5010 540
0.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000
-1.000 -1.000 -1.000 -1.000 -1.000 0.000
3 2 0 10 32 7 101 0 -1 0.000 0 1 0 12
2 1 10.00 300.00 300.00
6990 5670 7530 3870 5550 2925 2940 4725 4335 7425 7890 7965
10050 5085 9375 1845 4695 990 1095 2340 510 6120 1050 8550
0.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000
-1.000 -1.000 -1.000 0.000
3 2 0 10 32 7 101 0 -1 0.000 0 1 0 16
2 1 10.00 300.00 300.00
5955 4455 5235 4455 4695 5220 5730 6930 7710 6615 8340 5130
8475 3330 6585 1890 3885 2295 1860 4095 1770 6480 3210 8280
7215 9315 10185 8370 11625 5670 12300 1530
0.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000
-1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 0.000
3 2 0 10 31 7 101 0 -1 0.000 0 1 0 16
2 1 10.00 300.00 300.00
6855 4500 7215 3915 5730 3465 3930 4680 4200 6255 6000 7515
8070 7155 9195 4860 8610 2250 5325 1350 2040 2340 1005 4590
1770 8190 5010 9675 8790 9585 10995 9135
0.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000
-1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 0.000
4 0 18 99 0 18 28 0.0000 4 465 2115 2985 2700 Compute/\001
4 0 18 99 0 18 26 0.0000 4 420 2310 2985 3150 Experiment\001
4 0 18 99 0 18 28 0.0000 4 465 1830 8475 6390 Specify/\001
4 0 18 99 0 18 26 0.0000 4 330 1980 8475 6840 Formalize\001
4 0 18 99 0 18 36 0.0000 4 450 1665 8025 3015 Prove\001
4 0 18 99 0 18 36 0.0000 4 450 2550 2175 6570 Visualize\001
4 0 18 99 0 18 30 0.0000 4 480 2565 5280 8595 Conjecture\001
4 1 4 100 0 23 24 0.0000 4 165 540 6300 5670 Spiral\001
4 0 14 50 0 19 48 0.0000 4 600 2010 0 9090 Com-\001
4 0 14 50 0 19 48 0.0000 4 600 4170 45 9765 munication\001
4 0 14 50 0 19 48 0.0000 4 600 4215 720 720 Publication\001
4 0 14 50 0 19 48 0.0000 4 780 3435 10665 8775 Teaching\001
4 0 14 50 0 19 48 0.0000 4 780 4260 9180 810 Application\001
4 1 4 100 0 23 24 0.0000 4 135 270 6390 4590 The\001
4 1 4 100 0 23 24 0.0000 4 165 900 6300 5085 Creativity\001
File added
This diff is collapsed.
\svnInfo $Id: intro.tex 195 2014-07-31 13:11:55Z miancu $
\svnKeyword $HeadURL: https://svn.kwarc.info/repos/miancu/omdoc-semantics/intro.tex $
Mathematical knowledge is rich in content, sophisticated in structure and technical in
presentation. Its conservation, dissemination, and utilization constitutes a challenge
for the community and an attractive area of inquiry. Moreover, mathematics plays a
fundamental role in Science, Technology, and Engineering (STEM). Therefore, the
methodology for representing mathematical documents may be applicable, more generally, to
STEM documents: Effectively, we view mathematics as a test tube for STEM.
\begin{wrapfigure}r{7cm}\vspace*{-1em}
\includegraphics[width=7cm]{img/creativity-spiral}
\caption{The mathematical Creativity Spiral}\label{fig:buchberger}
\end{wrapfigure}
How mathematics will look in the $21^{st}$ century remains an open question, but we
conjecture that it will become a computer-supported activity, whose major components --
visualized in the form of a closed loop driving a creativity spiral by Bruno Buchberger in
1995 -- are supported by systems that synergize by exchanging representations of
mathematical knowledge.
Currently, most mathematical knowledge is laid down in the form of documents ranging from
research articles over engineering whitepapers to math textbooks. Even though these are
often encoded electronically -- as digitizations of printed material, or born-digital in
the form of PDF generated from {\LaTeX} (predominantly in math research), but also in
other formats (in education and engineering) -- they are usually only consumed by human
readers. If we want to raise this treasure chest of knowledge and use it to support
``doing mathematics'' by computer (see Figure~\ref{fig:buchberger}), we have to recover
the knowledge in these documents in a form that computers can act on. This
knowledge-recovery process is essentially the syntactic/semantic analysis phase of natural
language processing instantiated to mathematical documents. This analysis process is
traditionally viewed as a transformation from utterances/documents into a knowledge
representation, traditionally some logical system that allows to model inference
processes, e.g. first-order logic.
In this paper we focus, not on the translation process from mathematical natural language
into knowledge representations, but on the target format, namely the
representations of the resulting knowledge. At the first glance, first-order logic seems a
plausible target representation format for mathematical knowledge, after all, it is
(together with axiomatic set theory~\cite{Bernays:ast91}) the generally accepted
theoretical foundation of mathematics.
However, it is important to note that the transformation process and target format
mutually constrain each other. For instance, if we use first-order logic as a target
format, then we cannot (realistically) have a compositional treatment of verb phrases and
determiners. Richard Montague solved this problem by extending the target format to a
higher-order logic based on $\lambda$-calculus~\cite{Montague:tptoqi73}. At the
inter-sentential level, we see that anaphoric references cannot be treated properly in
static logics (like first-order and higher-order logic), which has led to the adoption of
dynamic logics in computational linguistics. Stepping up one more
level, we need still other representational devices -- e.g. discourse relations -- to
account for text/document/collection coherence (see e.g.~\cite[chapter
21]{JurMar:salp09}). Note that the levels of representation interact with each other, for
instance it is important to know in which dialogue turn we are to bind references or even
disambiguate the word meanings. Therefore, the semantics construction and analysis
algorithms profit from integrated knowledge representation across the levels.
But all of the above concentrate on ``general natural language'', which in practice almost
exclusively means non-STEM documents. Our experience with STEM documents and mathematical
ones in particular have shown that while they retain most of the linguistic complexities
(see also~\cite{Baur:susmt99,Wolska:PHD}), mathematical documents have their very own
peculiarities that mean traditional natural language processing tools are not directly
applicable (see~\cite{Wolska:bpacspm12}). On the other hand, mathematical documents are
(often) more rigorous and have more explicit representations of references and discourse
structure. Therefore it makes sense to develop representation formats that take these into
account. These formats can then act as targets for semantic analysis processes and in that
guide the development of algorithms; to facilitate this is the aim our article.
Specifically, we derive a set of requirements (Section~\ref{sec:req}) that a
representation format for STEM documents needs to satisfy from a set of phenomena peculiar
to (natural) mathematical language (Section~\ref{sec:pheno}; building on and
extending~\cite{Wolska:PHD}). In Section~\ref{sec:omdoc} we propose the \omdoc format as
one way to satisfy (many of) these and relate it to other approaches in
Section~\ref{sec:relwork}. Section~\ref{sec:conc} concludes the paper.
%%% Local Variables:
%%% mode: LaTeX
%%% TeX-master: "paper"
%%% End:
% LocalWords: utilization wrapfigure vspace includegraphics buchberger ednote
% LocalWords: visualized synergize pheno omdoc omdoc relwork conc Bernays img
% LocalWords: digitizations tptoqi73 KamRey fdtl93 JurMar salp09 susmt99 st
% LocalWords: bpacspm12 whitepapers ast91 Baur Wolska miancu
This diff is collapsed.
% add bibtex entries here
%%%% MMT/LF macros %%%%
\newcommand{\scoreplus}[1]{#1:+}
\newcommand{\scorezero}[1]{#1:0}
\newcommand{\scoreminus}[1]{#1:-}
%%%%%%%%
\newcommand{\theory}{\mathbf{theory}}
\newcommand{\view}{\mathbf{view}}
\newcommand{\meta}{\mathbf{meta}}
\renewcommand{\include}{\mathbf{include}}
\newcommand{\type}{\mathtt{type}}
\newcommand{\lfarr}{\arr}
\newcommand{\tp}{\mathit{tp}}
\newcommand{\tm}{\mathit{tm}}
\newcommand{\nat}{\mathit{nat}}
\newcommand{\zero}{\mathit{zero}}
\renewcommand{\succ}{\mathit{succ}}
\newcommand{\plus}{\mathit{plus}}
\newcommand{\llist}{\mathit{list}}
\newcommand{\nil}{\mathit{nil}}
\newcommand{\cons}{\mathit{cons}}
\newcommand{\conc}[2]{\mathit{append} #1 #2}
%%%%%% OpenMath Macros %%%%%
\newcommand{\oma}[1]{\mathsf{OMA}(#1)}
\newcommand{\omi}[1]{\mathsf{OMI}(#1)}
\newcommand{\oms}[1]{#1}
\newcommand{\omss}[1]{\mathsf{OMS}(#1)}
\newcommand{\omf}[1]{\mathsf{OMF}(#1)}
\newcommand{\omv}[1]{\mathsf{OMV}(#1)}
\newcommand{\ombind}[3]{\mathsf{OMBIND}(#1,#2,#3)}
\def\om#1{\fbox{\ensuremath{#1}}}
%%%%%%%%%%%%%%%%%%%%%%%%
\newcommand{\uom}{{\scshape{UOM}}\xspace}
\newcommand{\latin}{{\scshape{LATIN}}\xspace}
\newcommand{\omdocv}[1]{{\scshape{OMDoc}}#1\xspace}
\newcommand{\latex}{{\scshape{\LaTeX}}\xspace}
\newcommand{\latexml}{{\scshape{{\LaTeX}ML}}\xspace}
\newcommand{\defemph}[1]{{\bf#1}}
\newcommand{\bnfbr}[1]{{\color{red}(}#1{\color{red})}}
\renewcommand{\arr}{\rightarrow}
\newcommand{\pfarr}{\rightharpoonup}
\newcommand{\impacts}{\looparrowright}
%\newcommand{\depends}{\looparrowleft}
\newcommand{\tgraph}{\mathcal{G}}
\newcommand{\dec}{Dec}
\newcommand{\mdec}{Mod}
\newcommand{\sdec}{Sym}
\newcommand{\sass}{Ass}
\newcommand{\term}{\omega}
\newcommand{\context}{X}
%\newcommand{\tsyn}{\mathit{Syntax}}
\newcommand{\ac}[1]{\mathcal{A}_{#1}}
\newcommand{\vc}[1]{\mathcal{V}_{#1}}
\newcommand{\scc}[1]{\mathcal{S}_{#1}}
\newcommand{\keyw}[1]{\textsf{#1}}
\newcommand{\nonterminal}[1]{\texttt{#1}}
\newcommand{\module}{\nonterminal{Theory}}
\newcommand{\statement}{\nonterminal{Statement}}
\newcommand{\object}{\nonterminal{Type}}
\newcommand{\notation}{\nonterminal{Notation}}
\newcommand{\amk}{\mathtt{C}}
\newcommand{\dmk}{\mathtt{D}}
\newcommand{\qmk}{\mathtt{S}}
\newcommand{\st}[1]{\ov{#1}}
%% this should be the font used in listings
\newcommand{\ttt}[1]{\texttt{#1}}
\newcommand{\sbt}{\hspace{0.2em}\,\begin{picture}(-1,1)(-1,-3)\circle*{3}\end{picture}\hspace{0.2em}\ }
\newcommand{\iden}{\mathrm{Id}}
\newcommand{\idenmod}{M}
\newcommand{\identhy}{T}
\newcommand{\idenview}{v}
\newcommand{\idensym}{C}
\newcommand{\lname}{c}
\newcommand{\idencomp}{o}
\newcommand{\compdef}{\mathtt{def}}
\newcommand{\comptype}{\mathtt{tp}}
\newcommand{\component}[2]{{#2}^{#1}}
% \newcommand{\add}{\mathcal{A}}
% \newcommand{\delete}{\mathcal{D}}
% \newcommand{\rename}{\mathcal{R}}
% \newcommand{\update}{\mathcal{U}}
%
% \newcommand{\boxempty}{\boxed{\cdot}}
% \newcommand{\boxfull}[1]{\boxed{#1}}
%
% \renewcommand{\r}{\mathcal{R}}
% \newcommand{\rc}{\mathcal{RC}}
% \newcommand{\fs}{\mathcal{FS}}
% \renewcommand{\t}{\mathcal{T}}
% \renewcommand{\n}{\mathcal{N}}
% \newcommand{\p}{\mathcal{P}}
% \renewcommand{\d}{\mathcal{D}}
% \newcommand{\f}{\mathcal{F}}
\lstset{mathescape,
numbers=none,
numbersep=2pt, % how far the line-numbers are from the code
numberstyle=\scriptsize\color{gray}, % the style that is used for the line-numbers
basicstyle=\sffamily,
columns=fullflexible,
aboveskip=0pt,belowskip=0pt}
%%% Local Variables:
%%% mode: LaTeX
%%% TeX-master: "paper"
%%% End:
\RequirePackage{listings}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Listings
%
% we define our own dialects for omdoc in listings.sty
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\lstdefinelanguage{DTD}%
{morekeywords={ATTLIST,ELEMENT,EMPTY,ANY,ID,%
IDREF,IDREFS,ENTITY,ENTITIES,NMTOKEN,NMTOKENS,NOTATION,%
INCLUDE,IGNORE,SYSTEM,PUBLIC,NDATA,%
PCDATA,REQUIRED,IMPLIED,FIXED},%%% preceded by #
alsoother=$,%
alsoletter=:,%
showstringspaces=false,
sensitive=true}
\lstdefinelanguage{RNC}%
{morekeywords={default,namespace,=,start,attribute,include,element,notallowed},
alsoother=$,%
alsoletter=:,%
showstringspaces=false,
sensitive=true}
\lstdefinelanguage{DublinCore}[]{XML}%
{morekeywords={Contributor,Creator,Title,Subject,Description,Publisher,Type,Format,Source,Language,Relation,Rights,Date,Identifier},
sensitive=true}
\lstdefinelanguage{OpenMath}[]{XML}%
{morekeywords={OMS,OMV,OMI,OMB,OMSTR,OMF,OMA,OMBIND,OMBVAR,OME,OMATTR,OMATP,OMR,OMOBJ,OMFOREIGN},
sensitive=true}
\lstdefinelanguage[om]{OpenMath}[]{XML}{keywordsprefix={om:},sensitive=true}
\lstdefinelanguage{omCD}[]{XML}%
{morekeywords={CD,CDBase,CDName,CDURL,CDDate,CDReviewDate,CDStatus,CDVersion,CDRevision,CDUses,
CDDefinition,Name,Description,Role,CMP,FMP},
sensitive=true}
\lstdefinelanguage{MathML}[]{XML}%
{morekeywords={math,semantics,annotation-xml,annotation,share,
maction,mspace,
msub,msup,msubsup,
mrow,mo,mi,mn,mfrac,mfenced,mtext,
apply,bind,bvar,ci,cn,sep,csymbol,
condition,domainofapplication,lowlimit,uplimit,degree,
interval,inverse,lambda,compose,ident,domain,codomain,image,
piecewise, piece, otherwise,
quotient,factorial,divide,max,min,minus,plus,power,rem,times, root,gcd,lcm,
and,or,xor,not,implies,forall,exists,
abs,conjugate,arg,real,imaginary,floor,ceiling,
sin,cos,tan,sec,csc,cot,sinh,cosh,tanh,sech,csch,coth,
arcsin,arccos,arctan,arccosh,arccot,arccoth,arccsc,arccsch,arcsec,arcsech,arcsinh,arctanh,
eq,neq,gt,lt,geq,leq,equivalent,approx,factorof,
int,diff,partialdiff,divergence,grad,curl,laplacian,
set,list,union,intersect,in,notin,subset,prsubset,notsubset,notprsubset,setdiff,card,cartesianproduct,
sum,product,limit,tendsto,exp,ln,log,mean,sdev,variance,median,mode,moment,momentabout,
vector,matrix,matrixrow,determinant,transpose,selector,vectorproduct,scalarproduct,outerproduct,
integers,reals,rationals,naturalnumbers,complexes,primes,
exponentiale,imaginaryi,notanumber,true,false,emptyset,pi,eulergamma,infinity,
mstack,msgroup,msrow,msline,mscarries,mscarry,mlongdiv,
msqrt,mroot,mstyle,merror,mpadded,mphantom,menclose,
munder,mover,munderover,mmultiscripts,mprescripts,
mtable,mlabeledtr,mtr,mtd,
maligngroup,malignmark,
reln,fn,declare},
sensitive=true}
\lstdefinelanguage[m]{MathML}[]{XML}{keywordsprefix={m:},sensitive=true}
\lstdefinelanguage[codeml]{XML}[]{XML}%
{morekeywords={cpg,cpb,cpo,cpi,cpbr,cptype,cpt,cpd,cpc,cpstyle,apply,bind,bvar,ccv,ccb,ccsym,ccdef,symbol,type,semantics,pcode,rawcode,code},
morekeywords=[2]{id,xlink:xref,xlink:type,xlink:role,open,close,separator,type,xml:lang,name,cd,format,href,breaks},
sensitive=true}
\lstdefinelanguage[1.2]{OMDoc}[]{XML}%
{morekeywords={omdoc,catalogue,loc,ignore,with,ref,omgroup,adt,sortdef,insort,constructor,
recognizer,argument,destructor,morphism,inclusion,theory-inclusion,decomposition,
axiom-inclusion,path-just,obligation,omlet,private,code,input,output,effect,data,
metadata,extradata,proof,proofobject,metacomment,derive,conclude,hypothesis,
method,premise,presentation,use,omstyle,xslt,style,element,attribute,text,
value-of,recurse,exercise,hint,solution,mc,choice,answer,CMP,p,omtext,term,phrase,symbol,
commonname,type,FMP,assumption,conclusion,symbol,axiom,definition,requation,pattern,
value,measure,ordering,assertion,alternative,example,theory,imports,
narrative,tableofcontents,authorindex,content,pcontext,pc,
map,separator,operator,lbrack,rbrack,
OMS,OMV,OMI,OMB,OMSTR,OMF,OMA,OMBIND,OMBVAR,OME,OMATTR,OMATP,OMR,OMOBJ,OMFOREIGN,
om:OMS,om:OMV,om:OMI,om:OMB,om:OMSTR,om:OMF,om:OMA,om:OMBIND,om:OMBVAR,
om:OME,om:OMATTR,om:OMATP,om:OMR,om:OMOBJ,om:OMFOREIGN,
dc:contributor,dc:creator,dc:title,dc:subject,dc:description,dc:publisher,dc:type,
dc:format,dc:source,dc:language,dc:relation,dc:rights,dc:date,dc:identifier,
cc:license,cc:permissions,cc:prohibitions,cc:requirements},
sensitive=true}
\lstdefinelanguage[1.3]{OMDoc}[1.2]{OMDoc}%
{morekeywords={meta,link,resource,notation,prototype,expr,rendering,render}}
\lstdefinelanguage[1.6]{OMDoc}[]{XML}%
{morekeywords={omdoc,catalogue,loc,ignore,with,ref,omgroup,adt,sortdef,insort,constructor,
recognizer,argument,destructor,omlet,private,code,input,output,effect,data,
metadata,extradata,proof,proofobject,metacomment,derive,conclude,hypothesis,
method,premise,presentation,use,omstyle,xslt,style,element,attribute,text,
value-of,recurse,exercise,hint,solution,mc,choice,answer,CMP,p,omtext,term,phrase,object,
commonname,type,FMP,assumption,conclusion,symbol,axiom,definition,requation,pattern,
value,measure,ordering,assertion,alternative,example,theory,imports,
narrative,tableofcontents,authorindex,content,pcontext,pc,
map,separator,operator,lbrack,rbrack,
OMS,OMV,OMI,OMB,OMSTR,OMF,OMA,OMBIND,OMBVAR,OME,OMATTR,OMATP,OMOBJ,OMFOREIGN,
om:OMS,om:OMV,om:OMI,om:OMB,om:OMSTR,om:OMF,om:OMA,om:OMBIND,om:OMBVAR,
om:OME,om:OMATTR,om:OMATP,om:OMR,om:OMOBJ,om:OMFOREIGN,
dc:contributor,dc:creator,dc:title,dc:subject,dc:description,dc:publisher,dc:type,
dc:format,dc:source,dc:language,dc:relation,dc:rights,dc:date,dc:identifier,
cc:license,cc:permissions,cc:prohibitions,cc:requirements,
% now the new stuff
link,meta,resource,
view,structure,maps,metamorphism,constant,conass,definiens,
notation,prototype,expr,rendering,render},
sensitive=true}
\lstdefinelanguage{XMLSchema}[]{XML}{keywordsprefix={xs:},sensitive=true}
\lstdefinelanguage{sTeX}[LaTeX]{TeX}
{morekeywords={module,importmodule,importOMDocmodule,metalanguage,
symdef,defpath,keydef,
assertion,axiom,definition,omtext,definiendum}}
\lstset{language=[1.2]OMDoc,showstringspaces=false,basicstyle=\sf}
% LocalWords: morekeywords ATTLIST IDREF IDREFS NMTOKEN NMTOKENS NDATA PCDATA
% LocalWords: alsoother alsoletter showstringspaces RNC notallowed XSLT xsl ci
% LocalWords: DublinCore OMV OMI OMSTR OMF OMA OMBIND OMBVAR OME OMATTR OMATP
% LocalWords: omCD CDName CDURL CDReviewDate CDStatus CDVersion CDRevision eq
% LocalWords: CDUses CDDefinition forall bvar mrow codeml cpg cpb cpo cpi cpbr
% LocalWords: cptype cpt cpc cpstyle ccv ccb ccsym ccdef pcode rawcode xlink
% LocalWords: lang cd omdoc loc omgroup adt sortdef insort destructor omlet mc
% LocalWords: metadata extradata proofobject metacomment omstyle xslt recurse
% LocalWords: omtext commonname requation XMLSchema keywordsprefix xs xslt sty
% LocalWords: basicstyle DTD OpenMath OMS OMB OMOBJ CMP FMP MathML mo mi xml
% LocalWords: cpd xref href catalogue ref inline namespace tableofcontents pc
% LocalWords: authorindex pcontext lbrack rbrack mn cn sep csymbol lowlimit lt
% LocalWords: domainofapplication uplimit ident gcd lcm arg csc sinh tanh sech
% LocalWords: csch coth arcsin arccos arctan arccosh arccot arccoth arccsc neq
% LocalWords: arccsch arcsec arcsech arcsinh arctanh geq leq factorof notin ln
% LocalWords: partialdiff laplacian prsubset notsubset notprsubset setdiff fn
% LocalWords: cartesianproduct tendsto sdev momentabout matrixrow outerproduct
% LocalWords: vectorproduct scalarproduct naturalnumbers exponentiale emptyset
% LocalWords: imaginaryi notanumber eulergamma reln conass expr maction gt exp
% LocalWords: lstdefinelanguage approx OMFOREIGN importOMDocmodule symdef def
% LocalWords: defpath keydef indextt tt mylstref lst mylstsref mylstlref
% LocalWords: lstdefinestyle normalsize lstinline ttfamily snippetin lstset
% LocalWords: scriptsize mspace
This diff is collapsed.
File added
%% Size : 20 pages
%% Deadline : Sept 30th 2013 (Monday)
\documentclass{article}
\usepackage{a4wide}\pagestyle{plain}
% we use sans-serif for examples
\usepackage{lmodern}
\usepackage{amssymb}
\def\ef{\itshape\sffamily}
\def\nlex#1{``{\ef#1}''}
\usepackage{example} % linguistic examples
\def\tup#1{\lbrack#1\rbrack}
\newcommand\cn[1]{\ensuremath{\mathsf{#1}}}
\makeatletter
\newcounter{phenomenon}
\newcommand\phenomenon[2][]{\smallskip\noindent\textbf{P\stepcounter{phenomenon}\arabic{phenomenon}: #2 }%
\def\@test{#1}\ifx\@test\@empty\else%ate
\expandafter\edef\csname label@#1\endcsname{P\thephenomenon}
\expandafter\edef\csname title@#1\endcsname{#2}
\fi\ignorespaces}
\newcommand\phenref[1]{\textbf{\@ifundefined{label@#1}{\message{undefined phenomenon #1}??}{\@nameuse{label@#1}}}}
\newcommand\phenTref[1]{\@ifundefined{title@#1}{\message{undefined phenomenon title #1}??}{\@nameuse{title@#1}}}
\newcounter{requirement}
\newcommand\requirement[2][]{\smallskip\noindent\textbf{R\stepcounter{requirement}\arabic{requirement}: #2 }%
\def\@test{#1}\ifx\@test\@empty\else%
\expandafter\edef\csname label@#1\endcsname{R\therequirement}\fi
\expandafter\edef\csname title@#1\endcsname{#2}
\ignorespaces}
\newcommand\reqref[1]{\textbf{\@ifundefined{label@#1}{\message{undefined requirement #1}??}{\@nameuse{label@#1}}}}
\newcommand\reqTref[1]{\@ifundefined{title@#1}{\message{undefined requirement title #1}??}{\@nameuse{title@#1}}}
\makeatother
\def\tmeta#1{\ensuremath{\langle\kern-.2em\langle\text{\rm #1}\rangle\kern-.2em\rangle}}
%\usepackage{url}
\usepackage{wrapfig}
\usepackage{amsmath,amssymb}
\usepackage{xspace}
\usepackage{lstomdoc}
\lstset{language=[1.6]OMDoc,basicstyle=\sf}
\usepackage{paralist}
\usepackage{wrapfig}
%\usepackage{xparse} % custom argument syntax
%\usepackage{stmaryrd}
\usepackage{tikz}
\usepackage{tikzinput}
\usepackage{rotating}
\usetikzlibrary{arrows}
\usetikzlibrary{mmt}
%\usepackage{stmaryrd} % semantic brackets
%\usepackage{xkeyval}
%\usepackage{multicol}
%\usepackage{arydshln} % dashed lines in tables
%\usepacakge{tabularx} % for columns that stretch
\usepackage{xspace}
\usepackage[eso-foot,today, final]{svninfo}
\svnInfo $Id: paper.tex 197 2014-09-16 21:00:52Z miancu $
\svnKeyword $HeadURL: https://svn.kwarc.info/repos/miancu/omdoc-semantics/paper.tex $
\setlength{\hfuzz}{3pt} \hbadness=10001
\setcounter{tocdepth}{2} % for pdf bookmarks
\def\bA{\mathbf{A}}
\usepackage[bookmarks,linkcolor=red,citecolor=blue,urlcolor=gray,colorlinks,breaklinks,bookmarksopen,bookmarksnumbered]{hyperref}
\usepackage{multirow}
\usepackage{hhline}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% local macros and configurations
\usepackage[hide]{ed}
\usepackage{macros/basics}
\usepackage{local}
\title{Co-Representing Structure and Meaning of Mathematical Documents}
\author{Michael Kohlhase and Mihnea Iancu\\
Computer Science, Jacobs University Bremen, Germany\\
\url{http://kwarc.info/}}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\begin{document}
\maketitle
\begin{abstract}
The digital revolution for documents in science, technology, engineering, and
mathematics has largely been restricted to simplifying document access for human
readers, but has not fulfilled the promise of computer-supported knowledge
appropriation.
We claim that this is largely a consequence of the lack of explicitly represented
document semantics that computers could act upon. Automated semantics-extraction that
could remedy this shortcoming still has two bottlenecks: the lack of language processing
algorithms and -- less commonly recognized -- that of comprehensive representation
formats that can adequately model the functional structure of documents and that of the
knowledge conveyed by them.
In this paper we analyze phenomena of common mathematical language and derive
requirements for representation formats for mathematical documents. We show how these
requirements can be met by the OMDoc (Open Mathematical Documents) format and compare it
to related approaches.
\end{abstract}
% \tableofcontents
% \newpage
\section{Introduction} \label{sec:intro}
\input{intro}
\section{Phenomena of Mathematical Vernacular} \label{sec:pheno}
\input{phenomena}
\section{Requirements for a target language for semantics construction analysis} \label{sec:req}
\input{requirements}
% \section{Active Documents}\label{sec:activdocs}
% \input{activdocs}
\section{OMDoc}\label{sec:omdoc}
\input{omdoc}
\section{Related Work} \label{sec:relwork}
\input{relwork}
\section{Conclusion}\label{sec:conc}
\input{conc}
\bibliographystyle{alpha}
%\bibliography{macros/bib/rabe,macros/bib/systems,macros/bib/pub_rabe,macros/bib/historical,kbib/kwarc,local}
%\bibliography{preamble,kwarcpubs,extpubs,kwarccrossrefs,extcrossrefs,local}
\bibliography{kwarc,local}
%\bibliography{kbib/preamble,kbib/kwarcpubs,kbib/extpubs,kbib/kwarccrossrefs,kbib/extcrossrefs,local}
\end{document}
%%% Local Variables:
%%% mode: LaTeX
%%% TeX-master: t
%%% End:
% LocalWords: maketitle ednote tableofcontents newpage pheno activdocs omdoc kwarcpubs
% LocalWords: relwork conc kbib kwarc specializations recognized analyze extpubs
% LocalWords: kwarccrossrefs extcrossrefs
This diff is collapsed.
theory group_face = {
U : type.
o : U -> U -> U.
e : G.
i : G -> G.
/ : G -> G -> G.
o_assoc : _.
e_ax : _.
/_ax1 : _.
/_ax2 : _.
}
theory group_1 = {
G : type.
o : G -> G -> G.
e : G.
inv : G -> G.
assoc : ( a o b) o c = a o (b o c).
e_ax : a o e = a.
inv_ax : a o (inv a) = e.
}
theory slash1 = {
%include group_1.
/ : G -> G -> G = [a][b] a o (inv b).
}
theory group_2 = {
G : type.
/ : G -> G -> G.
/_ax_1 : a / a = b / b.
/_ax_2 : a / (b / b) = a.
/_ax_3 : (a / a) / (b / c) = c / b.
/_ax_4 : (a / c) / (b / c) = a / b.
}
theory circ_i_2 = {
%include group_2.
e : G = a / a.
o : G -> G -> G = [a][b] a / (e / b).
inv : G -> G = [a] e / a.
}
view v2 : group2 -> slash1 {
_.
}
view v1 : group1 -> circ_i_2 = {
_.
}
\svnInfo $Id: relwork.tex 197 2014-09-16 21:00:52Z miancu $
\svnKeyword $HeadURL: https://svn.kwarc.info/repos/miancu/omdoc-semantics/relwork.tex $
We have seen that \omdoc can adequately represent many of the phenomena presented in
Section~\ref{sec:pheno}, let us compare it with others in the space of representation
formats for mathematical knowledge; Figure~\ref{fig:relwork-overview} gives an overview.
\begin{figure}[ht]
\def\pp{\textbf{+}}\def\pm{\textbf{--}}\def\po{0}
\begin{center}
\begin{tabular}{|l|c|c|c|c|c|}\hline\small%|
& \rotatebox{60}{\bf{FOL}} & \rotatebox{60}{\bf{Naproche}} & \rotatebox{60}{\bf{WTT}} & \rotatebox{60}{\bf{MathLang}} & \rotatebox{60}{\bf{\omdoc}} \\\hline\hline
\phenref{formling} \phenTref{formling} & \pp & \pp & \pp & \pp & \pp \\\hline
\phenref{formroles} \phenTref{formroles} & \pp & \pp &\pp & \pp & \pp \\\hline
\phenref{notverb} \phenTref{notverb} & \pm & \pm & \pm & \pp & \pp \\\hline
\phenref{referential} \phenTref{referential} &\pp & \pp &\pp & \pp & \pp \\\hline
\phenref{naming} \phenTref{naming} &\pp & \pp &\pp & \pp & \pp \\\hline
\phenref{formamb} \phenTref{formamb} & \pm & \po &\pp & \pp & \pp \\\hline
\phenref{elision} \phenTref{elision} & \pm & \pm & \pm & \pp & \pp \\\hline
\phenref{statements} \phenTref{statements} & \pm & \pp &\pp & \pp & \pp \\\hline
\phenref{inline} \phenTref{inline} & \pm & \pm & \pm & \pp & \pp \\\hline
\phenref{objcontext} \phenTref{objcontext} & \pm & \pm & \po & \po & \pp \\\hline
\phenref{framing} \phenTref{framing} & \pm & \pm & \pm & \pm & \pp \\\hline
\phenref{recaps} \phenTref{recaps} & \pm & \pm & \pm & \pm & \pp \\\hline\hline
\reqref{flexiform} \reqTref{flexiform} & \pm & \po & \po & \pp & \pp \\\hline
\reqref{pluralism} \reqTref{pluralism} & \pm & \pm & \po & \po & \pp \\\hline
\reqref{underspec} \reqTref{underspec} & \pm & \pm & \pm & \pm & \pm \\\hline
\end{tabular}
\end{center}
\caption{Features of Mathematical Knowledge Representation Languages }\label{fig:relwork-overview}
\end{figure}
\paragraph{First-Order Logic}
On the surface, first-order logic is an appealing target format for translating
mathematical documents since, together with axiomatic set theory \cite{Bernays:ast91}, it
forms the most widely accepted foundation of mathematics. However, due to its first-order
nature and limitations on handling some natural language phenomena, (e.g. anaphora) other
variants, such as higher-order or dynamic logics~\cite{harel84} were identified as more
appealing.
In general, FOL and related logics are good at representing formulae
(\scoreplus{\phenref{formling}}, \scoreplus{\phenref{formroles}}), and allow for named
declarations (\scoreplus{\phenref{naming}}) and references
(\scoreplus{\phenref{referential}}). However, it is not realistically possible to
represent notations and verbalizations (\scoreminus{\phenref{notverb}}) as well as to
handle ambiguity (\scoreminus{\phenref{formamb}}) or elision
(\scoreminus{\phenref{elision}}).
Moreover, while there are many ways to extend FOL with a more or less complex module
system, modularity is not typically a part of FOL's standard formulation. Therefore, one
cannot adequately represent the high level discourse (\scoreminus{\phenref{statements}},
\scoreminus{\phenref{inline}}) and context (\scoreminus{\phenref{objcontext}},
\scoreminus{\phenref{framing}}, \scoreminus{\phenref{recaps}}) structure of mathematical
documents (e.g. theorems, definitions, proofs, theories, etc. and the relations between
them). As a direct consequence, the phenomena from the discourse and context level
discussed in Section~\ref{sec:pheno} cannot be captured in plain FOL or related logics.
\paragraph{Naproche}
The Naproche project~\cite{CramerFKKSV09} starts from the common mathematical language to develop a \emph{controlled natural language} (CNL) for mathematical texts
(\scorezero{\phenref{flexiform}}) and a proof checking software to formally check texts
written in this language. Therefore, the Naproche CNL focuses on proofs to permit adequate
proof checking by the Naproche system and doesn't necessarily aim to cover all phenomena
from section~\ref{sec:pheno}. Still, it is interesting to evaluate it with respect to
these requirements.
At the phrase level, the Naproche CNL includes a language for formulae which can be
integrated within natural language phrases (\scoreplus{\phenref{formling}},
\scoreplus{\phenref{formroles}}) and include references
(\scoreplus{\phenref{referential}}) and declarations
(\scoreplus{\phenref{naming}}). However, there is no explicit support for notations and
verbalizations (\scoreminus{\phenref{notverb}}) or elisions
(\scoreminus{\phenref{elision}}) and only limited coverage of ambiguity (as one would
expect from a controlled language; \scorezero{\phenref{formamb}})
At the discourse level, Naproche's CNL offers explicit markup for \emph{axioms},
\emph{definitions}, \emph{lemmas} and \emph{theorems} as well as \emph{proofs} with
\emph{case distinctions} and \emph{assumptions} (\scoreplus{\phenref{statements}}) but not
for inline statements (\scoreminus{\phenref{inline}}). Proofs are represented using
\emph{proof representation structures} (PRS) which are adapted from discourse
representation structures \cite{KamRey:fdtl93}.
The context level, as defined in~\ref{sec:pheno}, is missing as there is no explicit markup
for documents, theories or groups of statements (\scoreminus{\phenref{objcontext}},
\scoreminus{\phenref{framing}}, \scoreminus{\phenref{recaps}}).
\paragraph{Weak Type Theory}
WTT (Weak Type Theory)~\cite{KamNed:arbflm04} is a refinement of de Bruijn's Mathematical
Vernacular (\scorezero{\phenref{flexiform}}) and is designed to act as an intermediary
between common mathematical language and formal mathematics based on various logics
(\scorezero{\phenref{pluralism}}).
At the \emph{phrase level}, WTT has primitives inspired by common mathematical language:
terms, sets, nouns and adjectives (\scoreplus{\phenref{formling}},
\scoreplus{\phenref{formroles}}). At the statement level, WTT has \emph{statements} and
\emph{definitions} (\scoreplus{\phenref{statements}}) but doesn't distinguish, for
instance, examples or lemmas and does not allow for inline statements
(\scoreminus{\phenref{inline}}). \emph{Books} in WTT are the only context level element,
roughly representing ordered collections of statements (\scorezero{\phenref{objcontext}}).
Therefore, the context level of WTT is limited in expressivity, for instance with respect
to framing (\scoreminus{\phenref{framing}}) or complex knowledge sharing via parametric
imports (\scoreminus{\phenref{recaps}}). %-mathematical theories
As WTT is designed to be structurally close to the grammar of natural language it is
suitable as a first step in the formalization process. Consequently, translating a CML
text into WTT is significantly easier than fully formalizing it \cite{wtt-Jojgov05,
wtt-comp-miz}. However, WTT is relatively weak and limited in terms of capturing the
semantics of mathematical texts (especially at the context level) and further
formalization can require significant effort \cite{wtt-Jojgov05, wtt-form-math}.
\paragraph{MathLang}
Roughly based on WTT, MathLang \cite{KWZ:CmtiM} aims at reaching a compromise between expressivity and formality and thus
provide an interface language between mathematicians and computers (\scoreplus{\phenref{flexiform}}) as well as a framework to make the links with existing formal proof systems (\scorezero{\phenref{pluralism}}).
With respect to semantization and knowledge representation MathLang adopts a paradigm of annotating (fragments of) texts with their grammatical or semantic category.
Its design distinguishes three different levels of annotation granularity (called ``aspects'' in MathLang).
\begin{itemize}
\item The Core Grammar Aspect (CGa) is a kind of weak type system directly based on WTT
(\scoreplus{\phenref{formling}}, \scoreplus{\phenref{formroles}},
\scoreplus{\phenref{referential}}, \scoreplus{\phenref{naming}},
\scoreplus{\phenref{formamb}}) \cite{KamNed:arbflm04} and de Bruijn's mathematical
vernacular \cite{DB:MathVernacular87}. The types themselves are motivated by both
mathematical and grammatical considerations, including e.g. ``noun'', ``adjective'',
``set'', ``definition'' or ``context''.
\item The Text and Symbol Aspect (TSa) enables establishing the association between
textual presentation and mathematical meaning. Although still annotation based, the
effort to weave together presentation-oriented and content-oriented representations is
reminiscent of MathML's parallel markup. A notion of \emph{souring} rules (named in
relation to syntactic \emph{sugaring}) is introduced to permit better control over the
structure of mathematical texts. As a result it allows representing notations
(\scoreplus{\phenref{notverb}}) and elisions (\scoreplus{\phenref{elision}}).
Together, CGa and TSa mostly correspond to the \omdoc object level although some CGa
categories such as ``definition'', ``declaration'' and ``statement'' straddle the border
to the statement level.
\item The Document Rhetorical Aspect (DRa) allows labeling fragments of text and
establishing relations between them. For instance a text fragment can be labeled as a
``theorem'' and another as a ``proof'' for it. This corresponds roughly to the statement
level in \omdoc allowing both top-level and inline statements
(\scoreplus{\phenref{statements}}, \scoreplus{\phenref{inline}}).
\end{itemize}
Just like WTT, which it extends, MathLang lacks at the context level where it only has
``documents'' as ordered collections of statements (\scorezero{\phenref{objcontext}}) and
no support for framing (\scoreminus{\phenref{framing}}) or recaps
(\scoreminus{\phenref{recaps}}).
%statements can be further structured by using a notation of \emph{local scoping}
%In fact, the annotation paradigm itself is limiting as it binds, for each item, the
% semantic information to the syntactic representation. At the object level, this is
% solved with the TSa's souring rules
\paragraph{Homogenous Pluralism} With the \omdoc format we have presented a representation
that takes the concept of heterogeneous pluralism to the extreme, but there are also
logics in the homogenous tradition that might be worth considering as bases for
representation formats for mathematical linguistics: Type theories with dependent record
types (e.g. the calculus of constructions (CoC~\cite{CoquandHuet:tcoc88}) of the Coq
system~\cite{BertotCasteran:coq04}) can represent many of the structures at the theory
level in a logic-internal way by encoding what we think of as theories as record types and
views as type coersions. However, it is still unclear whether the CoC or
its derivatives can cope with dynamic approaches like DRT or DPL, though Aarne Ranta's
work~\cite{Ranta:ttilm94,Ranta:GF04} on GF might also be of relevance here.
%%% Local Variables:
%%% mode: LaTeX
%%% TeX-master: "paper"
%%% End:
% LocalWords: relwork.tex miancu KamNed arbflm04 Bruijn's emph wtt-Jojgov05 pp textbf po
% LocalWords: wtt-comp-miz wtt-form-math CmtiM CGa TSa omdoc DRa pheno ednote textbf
% LocalWords: Naproche CramerFKKSV09 KampReyle hline rotatebox rotatebox Bernays ast91
% LocalWords: rotatebox rotatebox rotatebox phenref formling phenTref formling Harel
% LocalWords: formroles formroles notverb notverb formamb formamb objcontext scoreplus
% LocalWords: objcontext reqref flexiform reqTref flexiform underspec scoreplus tcxoc88
% LocalWords: underspec relwork-overview scoreminus scorezero CoquandHuet BertotCasteran
% LocalWords: Aarne Ranta's Ranta ttilm94 Ranta KamRey fdtl93 tcoc88
This diff is collapsed.
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please register or to comment