Skip to content
Snippets Groups Projects
Commit 83684a0a authored by m-iancu's avatar m-iancu
Browse files

initial version

parent 97f3774f
No related branches found
No related tags found
No related merge requests found
Showing
with 1039 additions and 0 deletions
We have presented theory graph design patterns for document structures at the theory level
in \omdoc2. These design patterns allow us to account for documents that recap or recount
mathematical knowledge already marked up in other documents.
In the future we want to look into the communication-enabling partial isomorphisms
postulated in Section~\ref{sec:course} and see whether~\cite{KohRabSac:fvip11} is directly
applicable.
\ednote{Interesting theoretical, practical, and social/organizational problems $\leadsto$
OAFF}
\ No newline at end of file
A central idea behind the OMDoc/OpenMath data model is that all symbols have a home theory
(the document that introduced the concept; essentially the content dictionary). In the
context of flexiformal\footnote{i.e. formalized to flexible depths; see~\cite{KohKoh:tfndc11} and
~\cite{Kohlhase:tffm13} for a discussion of this notion.} digital libraries, this information design creates a tension
between usability (which demand standardized content dictionaries) and
flexibility/locality.
\ednote{Expand, discuss the alternatives, discuss multiple theories and
realms~\cite{CarFarKoh:tr13}.}
\newcommand\mthy[2]{\begin{tabular}{l}\textsf{#1}\\\hline #2\end{tabular}}
\newcommand\cn[1]{\ensuremath{\mathsf{#1}}}
\def\tup#1{\langle #1\rangle}
%systems
\newcommand{\mws}{{\scshape{MathWebSearch}}\xspace}
\newcommand{\mmt}{{\scshape{Mmt}}\xspace}
\newcommand{\latin}{{\scshape{LATIN}}\xspace}
\newcommand{\openmath}{{\scshape{OpenMath}}\xspace}
\newcommand{\latexml}{{\scshape{LaTeXML}}\xspace}
\newcommand{\omdoc}{{\scshape{OMDoc}}\xspace}
\newcommand{\sTeX}{{\scshape{sTeX}}\xspace}
%BNF grammar
\newcommand{\alt}{\;{\color{red}|}\;}
\newcommand{\bbc}{{\color{red}::=}}
\newcommand{\rep}{{\color{red}\ast}}
\newcommand{\opt}[1]{{\color{red}[}#1{\color{red}]}}
@Comment{$ biblatex control file $}
@Comment{$ biblatex version 2.4 $}
Do not modify this file!
This is an auxiliary file used by the 'biblatex' package.
This file may safely be deleted. It will be recreated as
required.
@Control{biblatex-control,
options = {2.4:0:0:1:0:0:1:1:0:1:0:0:12:1:3:1:79:+},
}
File added
\documentclass{llncs}
\usepackage{url,amstext,amssymb,textcmds}
\usepackage{tikz,standalone}
\usepackage{labeledquote,xspace}
\usetikzlibrary{shapes,arrows}
\usetikzlibrary{mmt}
\usepackage{modules,presentation,paralist}
\def\gray{\textcolor{gray}}
\usepackage[show]{ed}
\usepackage{paralist}
\usepackage{listings}
%\usepackage{hyperref}
%\hypersetup{linkcolor=red,citecolor=blue,urlcolor=gray,colorlinks,breaklinks,bookmarksopen,bookmarksnumbered}
%\usepackage[eso-foot,today]{svninfo}
\usepackage[final]{svninfo}
\svnInfo $Id: paper.tex 1222 2014-01-23 08:15:46Z kohlhase $
\svnKeyword $HeadURL: https://svn.mathweb.org/repos/mws/doc/papers/mir12/paper.tex $
\usepackage[linkcolor=black]{hyperref}
\usepackage[hyperref=auto,style=alphabetic,citestyle=alphabetic,isbn=false,backend=bibtex]{biblatex}
\bibliography{pub_rabe,systems,kwarcpubs,extpubs,kwarccrossrefs,extcrossrefs}
\usepackage{local}
\title{Theory Level Structures in Informal Mathematics}
\author{Michael Kohlhase, Mihnea Iancu\\
}
\institute{Computer Science, Jacobs University, Bremen, Germany \\ \email{initial.last@jacobs-university.de}}
\newcommand\tikzinput[2][]{\input{#2}}
\def\qvar#1{\fbox{#1}}
\def\llquote#1{\textlangle\kern-.2em\textlangle #1\textrangle\kern-.2em\textrangle}
\def\flatsearch{\textsc{FlatSearch}\xspace}
\begin{document}
\maketitle
\begin{abstract}
Representation formats based on theory graphs are appealing for formalized mathematics
as they provide logic-compatible modularity and foster reuse. Like its formal counterpart, informal mathematics
is also fundamentally modular, as documents routinely refer to other documents and rely on concepts defined elsewhere.
However, while related, the relations used in informal mathematics do not directly correspond to basic theory graph
notions such as imports or theory morphisms.
In this paper we study some common use patterns from informal mathematical
documents and discuss how to represent them as theory-level structures in theory graphs.
We use this as a start for developing a notion of \emph{flexiformal} theory graphs which can
adequately represent content of flexible formality.
\end{abstract}
\section{Introduction}\label{sec:intro}
\input{intro}
\section{Preliminaries}\label{sec:prel}
\input{prel}
\section{Theory Level Phenomena in Informal Mathematics}\label{sec:pheno}
\input{phenomena}
\section{Theory Patterns in MMT/OMDoc Theory Graphs}\label{sec:patterns}
\input{patterns}
\section{Services/Authoring Workflows}
\input{workflows}
\section{Conclusion and Future Work}\label{sec:concl}
\input{conc}
%\bibliographystyle{plain}
\printbibliography
\end{document}
\ednote{tu discuss how to represent the examples in \ref{sec:pheno} using theory graphs}
We introduce \ednote{MI@MK is this true? I need to check} the notion of \emph{theory patterns} which we define as
any connected subgraph of a theory graph.
Adding an equivalent definition corresponds to a realm extension, where the face is fixed, and the view from the face to the current
theory is postulated (in informal math). Mathematical papers re-introduce concepts with the implied assumption of semantic equivalence in the context of the paper.
I.e. every statement in the paper holds for the original definition too, but not necessarily the converse.
\begin{figure}[ht]\centering
\begin{tikzpicture}
\node[thy] (p) at (-2,0) {Paper};
\node[thy] (l) at (2,0) {Cited Paper};
\node[thy] (d) at (2,-1) {$\cdots$};
\node[thy] (b) at (2,-2) {$\bot$};
\node[thy] (r) at (0,2) {Realm Face};
\draw[view] (r) to node[above] {$\cn{v_1}$} (p);
\draw[view] (r) to node[above] {$\cn{v_2}$} (l);
\draw[include] (l) to (p);
\draw[conservative] (b) to (d);
\draw[conservative] (d) to (l);
\draw (-1,-2.5) -- (3,-2.5) -- (3,2.5) node[above] {Realm} -- (-1,2.5) -- (-1,-2.5);
\draw (1,-2.5) -- (3,-2.5) -- (3,0.5) node[above] {Pillar} -- (1,0.5) -- (1,-2.5);
\end{tikzpicture}
\end{figure}
\begin{oldpart}{from bluenote}
We will progress by going through the examples from the introduction. \ednote{MK@MK: say
something about what digital libraries should look like: a theory for every variant
highly modularized, narratively enhanced overview for salient theories that are often
used, see Carette/Farmer high-level theories, \cite{CarFarKoh:tr13}}
\begin{example}[A Course grounded in a Formal Library]\label{sec:course}
Take for instance a course which introduces (naive) set theory informally, but grounds
itself in a formal, modular definition. Then we have the situation in
Figure~\ref{fig:slides-library}. On the right hand side, we have a careful introduction in
the form of a modular theory graph starting at a theory \cn{ZFset} that introduces
membership relation and the axioms of existence, extensionality, and separation and
defines the set constructor $\{\cdot |\cdot\}$ from these axioms. On the left we have a
theory \cn{SET} that ``adopts'' the symbols $\in$ and $\{\cdot |\cdot\}$ via a partial
inclusion $\cn{a_1}$ from \cn{ZFset} to \cn{SET} but ``defines'' them by alluding the
intuitions of the students. Note that a partial inclusion always gives rise to a view in
the opposite direction, here the view \cn{v_1} from \cn{SET} to \cn{set}. Note that we
cannot discharge the proof obligations in \cn{v_1}, since the definition of the set
constructor $\{\cdot |\cdot\}$ is opaque -- i.e. given as natural language, which is not
subject to formal methods; see~\cite{Kohlhase:fsgo13} for a discussion. We should think
of \cn{v_1} as a ``definitional view'' that gives meaning to the opaque parts in \cn{SET}:
the proof obgligations have to be met in order for the diagram to commute (which is an
invariant we want to maintain).
We call this situation where the symbols of a theory are imported via a partial inclusion
and their meaning is specified via a view that is a partial inverse an \textbf{adoption}
and the morphism pair an \textbf{adoption} morphism.
\begin{figure}[ht]\centering
\begin{tikzpicture}[yscale=1.7]
\node at (1,2.6) {\large Library};
\node at (-4.5,2.6) {\large Course};
\node[thy] (zfset) at (1,-.2) {\mthy{ZFset}{$\in$, \cn{Ex}, \cn{Ext}, \cn{Sep}; $\{\cdot |\cdot\}$}};
\node[thy] (zfunion) at (-.2,1) {\mthy{ZFunion}{\cn{Un}, $\cup$}};
\node[thy] (zfpow) at (2.2,1) {\mthy{ZFpow}{\cn{Pow}, $\mathcal{P}$}};
\node at (1.1,1) {\ldots};
\node[thy] (zfop) at (1,2) {ZFops};
\draw[include] (zfset) -- (zfunion);
\draw[include] (zfset) -- (zfpow);
\draw[include] (zfunion) -- (zfop);
\draw[include] (zfpow) -- (zfop);
\node[thy] (set) at (-4.5,-.2) {\mthy{SET}{$\in$, $\{\cdot |\cdot\}$}};
\node[thy] (sop) at (-4.5,2) {\mthy{SETOPS}{$\cup$,\ldots,$\mathcal{P}$}};
\draw[view] (set) to[out=-5,in=185] node[below] {$\cn{v_1}$} (zfset);
\draw[pinclude] (zfset) to[out=175,in=5] node[above] {$\cn{a_1}\colon\in,\{\cdot |\cdot\}$}(set);
\draw[view] (sop) to[out=-5,in=185] node[below] {$\cn{v_2}: \text{incl}\; \cn{v_1}$} (zfop);
\draw[pinclude] (zfop) to[out=175,in=5] node[above] {$\cn{a_2}\colon\cup,\ldots,\mathcal{P},\text{incl}\; \cn{a_1}$} (sop);
\draw[include] (set) -- (sop);
\end{tikzpicture}
\caption{A course grounded in a modular Library}\label{fig:slides-library}
\end{figure}
Let us now continue looking at the example in Figure~\ref{fig:slides-library}: The
informal course materials go on in style, introducing the set operations ranging from set
union to the power set in one go in the theory \cn{SETOPS}. On the library side, we
introduce the set theory axioms one by one, derive the respective operators from them, and
at the end collect all the material in the theory \cn{ZFops}. A this point, we can justify
the theory \cn{SETOPS} via a partial inclusion of the symbols $\cup$, \ldots,
$\mathcal{P}$ from \cn{ZFops}, which gives rise to another definitional view \cn{v2} to
\cn{ZFops}. As \mmt allows assignments to structures, we can simply assign the inclusion
from \cn{SET} to \cn{SETOPS} to the (unique) inclusion from \cn{ZFset} to \cn{ZFops} (as
indicated by the ``assignment'' include $\cn{v_1}$ on $\cn{v_1}$).
We observe that the two theory graphs are self-contained: the course materials can be
understood without knowing about the library; in particular, the membership relation used
in the definition of the union operator in \cn{SETOPS} is from theory \cn{SET}.
This self-containedness is important intra-course didactics. But it also has the problem
that the courses become insular; how are students going to communicate with mathematicians
who have learned their maths from other courses? Here is where the views \cn{v_i} come
in. Say the other mathematicians have course theories \cn{\overline{SET}} and
\cn{\overline{SETOPS}} with views \cn{\overline{v_1}} and \cn{\overline{v_2}} into the
same library, then the views \cn{v_1} and \cn{\overline{v_1}} induce a partial
isomorphisms between \cn{SET} and \cn{\overline{SET}} in the sense
of~\cite{KohRabSac:fvip11} (and correspondingly between \cn{SETOPS} and
\cn{\overline{SETOPS}}) that justify communication.
\end{example}
\subsection{Accelerated Turing Machines}
The case of the mathematical paper case study mentioned above is more difficult, since we
do not have a direct generalization relation as in the case above. In this case, the
definition of the accelerated Turing machine involves a concrete step size ($2^{-n}$),
whereas the definition it recaps allows arbitrary sequences of step sizes as long as their
sum remains finite. Thus we have the situation in Figure~\ref{fig:atm}. Theory \cn{ATM}
contains the (opaque) sentence (\ref{lq:atm}), but there cannot be a view from \cn{ATM} to
\cn{atm} as that is more general. But we do have a view to \cn{atm(2^{-n})}, which
naturally arises in treatments of accelerated Turing machines as an example.
\begin{figure}[ht]\centering
\begin{tikzpicture}[scale=1.3]
\node at (-.5,2.7) {\large Paper};
\node at (2.5,2.7) {\large Literature};
\node[thy] (ATMp) at (-.5,2) {\cn{ATMhalt}};
\node[thy] (catmp) at (2,2) {\cn{atm(2^{-n})halt}};
\node[thy] (ATM) at (-.5,1) {\cn{ATM}};
\node[thy] (atm) at (3.5,0) {\cn{atm}};
\node[thy] (catm) at (2,1) {\cn{atm(2^{-n})}};
\node[thy,dotted] (atmp) at (3.5,1) {\cn{atmhalt}};
\draw[view] (ATM) -- node[above] {\cn{v_1}} (catm);
\draw[view] (ATMp) -- node[above] {\cn{v_2}} (catmp);
\draw[include] (atm) -- (catm);
\draw[include] (ATM) -- (ATMp);
\draw[include] (catm) -- (catmp);
\draw[include,dotted] (atm) -- (atmp);
\draw[include,dotted] (atmp) -- (catmp);
\end{tikzpicture}
\caption{Definitional View for ATM}\label{fig:atm}
\end{figure}
The more important aspect is that the contribution of~\cite{CalStai:natm09} (depicted here
as theory \cn{ATMhalt} as it concerns undecidability of the accelerated halting problem
by accelerated Turing machines) can be justified via a view into a theory
\cn{atm(2^{-n})halt}, which can be systematically constructed from \cn{ATMhalt} modulo
\cn{v_1} as pushouts along inclusions always exist in \mmt. In the particular example, we
can do even better: as the proof in theory \cn{ATMhalt} does not use any properties of the
step size sequence $2^{-n}$, there is a theory \cn{atmhalt} that generalizes
\cn{atm(2^{-n})halt} (shown dotted in Figure~\ref{fig:atm}). Note that this construction
is automatic and needs human intervention -- but one that the authors expect their readers
will readily do, otherwise they would not have restricted themselves to the concrete
sequence.
\end{oldpart}
\ No newline at end of file
\ednote{to discuss and categorize examples}
\begin{oldpart}{Examples list}
\begin{itemize}
\item http://arxiv.org/pdf/1502.01042.pdf uses http://arxiv.org/pdf/math/0401301.pdf with alternative definition
(similar to ATM case)
\item http://arxiv.org/pdf/1502.01212.pdf
\item http://arxiv.org/pdf/1502.00978.pdf
\item http://arxiv.org/pdf/1502.02591.pdf
\item http://arxiv.org/pdf/1502.02308.pdf
\item http://arxiv.org/pdf/1502.02413.pdf
\end{itemize}
\end{oldpart}
\subsection{Recapitulation in Papers}
A common pattern occurs in scientific papers which often have a preliminaries section to
recapitulate the central notions and fix notations. In its simplest incarnation this can be seen
as an import. However, in general, it can have a more complex behavior (e.g. see example \ref{ex:atm}).
\begin{example}\label{ex:atm}
For instance, \cite{CalStai:natm09} contains the sentence
\begin{labeledquote}\em\label{lq:atm}
An accelerated Turing machine (sometimes called Zeno machine) is a Turing machine that
takes $2^{-n}$ units of time (say seconds) to perform its $n$\textsuperscript{th} step;
we assume that steps are in some sense identical except for the time taken for their
execution.
\end{labeledquote}
This is a telegraphic version of the full definition, which is given in the literature.
Actually~\cite{CalStai:natm09} continues with an overview of the literature, citing no
less than 12 papers, which address the topic of accelerated Turing machines. One of these
supposedly contains the formal definition, which involves generalizing Turing machines to
timed ones, introducing computational time structures, and singling out accelerating ones,
e.g. using (\ref{lq:atm-formal}).
\begin{labeledquote}\label{lq:atm-formal}\em
\textbf{Definition 1.3}: An \textbf{accelerated Turing machine} is a \underline{Turing
machine} $M=\langle X,\Gamma,S,s_o,\Box,\delta\rangle$ working with with a
\underline{computational time structure} $T= \tup{\{t_i\}_i,<,+}$ with
$T\subseteq\mathbb{Q_{+}}$ ($\mathbb{Q_{+}}$ is the set of non-negative rationals) such
that $\sum_{i\in \mathbb{N}} t_i <\infty$.
\end{labeledquote}
\end{example}
\begin{oldpart}{from bluenote, not sure what to do with this -- see ednote}
In \cite{CalStaiKoh:natms12} we are undertaking the experiment to
flexiformalize \cite{CalStai:natm09}, and in
this we have to disambiguate the concept of an accelerated Turing machine.
\ednote{MI@MK: couldn't find the paper referenced, link seems broken, couldn't find repo
on strawberry}
\end{oldpart}
In the example above the papers local definition of an ``accelerated Turing machine'' (ATM)
is slightly different from the one referenced in the literature.
Specifically, it's a special case where the computation time structure is $2^{-n}$.
Therefore, treating it as an import is not adequate as the two concepts have clearly distinguishable semantics.
We have two choices for grounding a concept $c$ that occurs in recap :
\begin{compactenum}
\item either we find the ``real definition'' from the literature and use this as the ``origin'' or
\item we consider the introductory sentence in the recap (e.g. \ref{lq:atm} for ATMs in example \ref{ex:atm})
as a definition and use this as the ``origin''.
\end{compactenum}
Which one is better depends on the situation.
If the notion used in the paper is exactly the same as in the literature, we
should probably choose 1. and just import the respective theory\footnote{Assuming that the
paper in question has already been flexiformalized} so that we can reference it. In this case
the local ``definition'' (e.g. \ref{lq:atm} in example \ref{ex:ex:atm})of $c$ becomes a ``verbalization'' \ednote{introduce verbalizations in preliminaries}
of the original definition in the literature.
The advantage of this choice is that the statements about $c$ made in the paper
are directly compatible with any others that also reference this definition of $c$. This
has enormous advantages for knowledge management and knowledge access.
If the local notion is in any way different (such as in the ATM example above) than
the one in the literature (e.g. a generalization or a special case), we are forced into the second possibility.
However, while two concepts are different, they are still related -- after all, there was a reason to consider
the technical differences so small that they can be neglected.
Therefore, we should establish formally the relation between the two variants and their containing theories.
We call this relation an \emph{adoption} as the local notion adopts (part of) its meaning from the literature.
In section \ref{sec:repr} we discuss how adoptions can be represented in theory graphs using imports and theory morphisms.
\ednote{MK@MK: more, e.g. about the distinction of notations and
differing concepts.}
\subsection{Slides and Background Materials}
A similar situation obtain with slides and background materials (lecture notes, books,
encyclopedias), where the slides often have telegraphic versions of the real statements,
which verbalize more rigorous definition.
The problem of all of these materials is that they are of a certain ``reference
character'': For encyclopedias this is obvious as their primary purpose is to serve as
reference works, even if they are not the documents that initially introduce the concepts
in question. Textbooks and lecture notes are natural references as they are the places
where students first encounter the concepts. \ednote{MK: The solution for this is ADP and
transclusion}
\ednote{to introduce basic OMDoc notions (docs, theories, declarations, verbalizations, notation definitions, URIs), realms}
\ No newline at end of file
\begin{oldpart}{from bluenote}
\subsection{\protect\sTeX}
Now let us consider how we would efficiently write such theory graphs in practice. We
develop a practical \sTeX-based solution for the course materials example from
Section~\ref{sec:course}. Let us assume that we already have a \sTeX representation of the
modular library on the right of Figure~\ref{fig:slides-library}, where the modules
\cn{ZF*} introduce semantic macros for all the relevant symbols. When we develop a new
course, we would like to take advantage of the existing semantic macro definitions since
\begin{inparaenum}[\em i\rm)]
\item they represent a sizable investment, and
\item we automatically ensure consistent notation by using them.
\end{inparaenum}
For the {\LaTeX} workflow this is very simple, we can just use the
\lstinline|\importmodule| macro to import the module \cn{ZFset} into the new module
\cn{SET}. Similarly, we can import module \cn{ZFops} into \cn{SETOPS}.
But the generated OMDoc representations would be wrong: we would generate structures in
the opposite direction instead of the views \cn{v_i}. Therefore we extend \sTeX modules
package with a new macro \lstinline|\adoptmodule|, which is synonymous with
\lstinline|\importmodule| for {\LaTeX} formatting, but whose \latexml binding generates
the appropriate view \emph{and} generates the symbols with home theory \cn{SET} instead of
their native \cn{ZFset} we say that the symbols are \textbf{adopted} in module
\cn{SET}. Note that the situation is a bit subtle here; to see how let us consider
recapping theory \cn{ZFops} in \cn{SETOPS}. Here we only want to adopt the symbols into
\cn{SETOPS} that have not already been imported from \cn{SET} (after being adopted there
from \cn{ZFset}). This behavior ensures that the course theories are self-contained and
are interpreted by the intended views.\ednote{MK: \sTeX needs to generate the adopted
notation definitions as well. Maybe we should use \omdoc references here to save space.}
But there is another subtlety: views are not allowed inside theories (the \omdoc
equivalent of \sTeX modules), so we have to wait for the end of the module and deposit the
view there.
With \lstinline|\adoptmodule| we can also handle the accelerate Turing machines paper.
\subsection{Services}
\begin{todolist}{expand}
\item auto-generation of recap material from content commons
\item need a community process of moving ``original theory'' towards encyclopedias.
\end{todolist}
\end{oldpart}
We have presented an extension of a formula search search engine to modular formal
libraries. In our application case, most of the services that are needed for this
extension:
\begin{inparaenum}[\em i)]
\item compositional naming of induced statements,
\item flattening,
\item transformation of formulae to content MathML, and
\item presentation of arbitrary objects
\end{inparaenum}
already came with the {\mmt} system. Any other system (e.g. the Isabelle
system~\cite{Nipkow-Paulson-Wenzel:2002} for the Isabelle library~\cite{IsabelleKB:on})
would have to implement similar functionality. Our experience with querying shows that the
statements induced by views are by far more useful than those induced by inclusions,
because (as in our graph in Figure~\ref{fig:elal-graph}) the interest is in inducing
statements from the abstract part to concrete structures. As a consequence, {\flatsearch}
becomes more and more useful as the graph acquires more and more views.
Currently, our {\flatsearch} is restricted to fully formal modular libraries, since flattening is
only understood for those. With a notion of flattening of flexiformal representations
(representations of mathematical knowledge at flexible levels of formality;
see~\cite{KohKoh:tfndc11}) we could extend {\flatsearch} to modular flexiformalizations of
traditional mathematical documents. Indeed Laubner's study of the first 35 pages of
Bourbaki's Algebra mentioned in the introduction revealed an underlying theory graph with
51 theory nodes and 107 theory morphisms of which 12 were views, but 63 had non-trivial
assignments. Especially the last number shows the value of the {\flatsearch} method.
Indeed one of the problems readers face with the Bourbaki
books (which are otherwise well-liked for their structured approach) is that particular
mathematical structures and objects can only be understood, if one already knows all the
material they depend on. One author even said that
\begin{quote}\em
Bourbaki was a dinosaur, the head too far away from the tail. Explaining: [\ldots] You
could say ``Dieudonn\'e what is the result about so and so?'' and he would go to the
shelf and take down the book and open it to the right page. After Dieudonn\'e retired no
one was able to do this. So Bourbaki lost awareness of his own
body\rm\hfill\cite{Richter:nb:on}
\end{quote}
A flexiformalization of the Bourbaki books together with an extension of MMT that can deal
with flattening of informal texts would go a long way to alleviate these problems.
\flatsearch queries are currently restricted to pure unification queries, for a more
expressive user experience, we plan to embed it into a more comprehensive mathematical
query language, most probably starting with~\cite{Rabe:qlfml12}.
Finally note that our query method is similar in spirit to ``Semantic Web Queries'', where
queries for RDF triples that are induced by a background ontology from explicitly
represented RDF facts are possible. The inferencing involved in this corresponds to the
flattening step in \flatsearch. However, DL-reasoners like Racer~\cite{HarMoe03:racer}
answer queries by performing inferences on the fly, and do not have the efficiency of a
search index at their disposal, and can only deal with relatively small A-boxes as a
result. Current high-efficiency triple stores that do employ (database) indexing
techniques only support a very restricted subset of ontologies (as a typical example,
Virtuoso~\cite{OpenLinkVirtuoso:webpage} supports transitive closures of selected
relations, but not a lot more).
%%% Local Variables:
%%% mode: LaTeX
%%% TeX-master: "all"
%%% End:
% LocalWords: miancu inparaenum Nipkow-Paulson-Wenzel elal-graph KohKoh ednote
% LocalWords: tfndc11 Laubner's flatsearch ldots Dieudonn rm hfill qlfml12 mmt
% LocalWords: conc.tex
flatsearch/flatsearch.png

20.7 KiB

\ednote{MI@MK : I tried to reorganize this and intertwine comments but it might need rewriting}
In the last two decades the way we access information has been completely transformed by
search engines. If we want to know the birthdate of Barack Obama, the first hit of a Google
query for ``obama birthdate'' leads us to a wikipedia page that has the information in the
first line of text: August 4. 1961. But not all information can be directly queried: to
the best of our knowledge there is no standard search engine that can directly answer the
question \emph{who was US president when Barack Obama was born}, because the answer
combines two pieces of information, Obamas birthdate and the fact that John F. Kennedy was
in office in January 1961 until his assassination in November 1963. In fact, the question
breaks one of the basic assumptions of search engines: i.e. that it is enough to return
links to relevant web pages, which have the answer (as part of the text). Indeed, there is
apparently no page on the wide web that explicitly answers our question (and if there
were, we could easily find a question whose answer isn't represented). But of course, the
answer is induced by knowledge explicit on the web. If we call the space of induced
answers the \textbf{knowledge space}, then we would like to (but currently cannot) search
the knowledge space induced by the web. Humans naturally build up a knowledge space from
the facts they learn, and can answer queries based on it, but are -- compared to search
engines -- severely limited in the breadth of information they can process.
In this paper we explore the notion of \emph{knowledge space} in formal theory graphs and
discuss how to access or generate induced statements (i.e. statements that are in the knowledge
space but not in the original theory graph). We then use it as a basis for building math-literate \cite{math-literate}
MKM services.
We note that
\begin{compactenum}
\item if the representation framework of the modular library
provides a systematic naming scheme for induced statements and
\item if the library system implements
a flattening operation that generates some or all induced statements,
\end{compactenum}
then we can use the flattened theory graph as a basis for semantic services, enabling them to work on the knowledge space.
Concretely, we use the {\mmt} \cite{MMT} language and system, which has a well developed URI scheme that already can
refer to induced statements. \ednote{maybe partially for realms}
We discuss induced statements and implement flattening for {\mmt} and apply it to the LATIN \cite{latin} library, which is a highly
modular atlas of logics.
Then, we use the {\mws}~\cite{ProKoh:mwssofse12} formula search engine on the flattened theory graph to provide a method for searching the mathematical knowledge space of them
LATIN library.
This paper is organized as follows. In section \ref{sec:relwork} we briefly review related work, then in section \ref{sec:prel} we briefly introduce
{\mws}, {\mmt} and the {\latin} atlas. In section \ref{sec:libraries} we discuss the problem of generating and accessing induced
statements in {\mmt}. Then, in section \ref{sec:searching} we describe how {\mws} can be leveraged to search for induced statements.
Finally, in section \ref{sec:concl} we discuss future work and conclude.
\section{Related Work}
\ednote{not sure where this fits, was in intro but it felt too long}
There are search engines that attempt to search the knowledge space. For instance DBPedia
extracts facts from Wikipedia, represents them as RDF triples, and can answer queries that
can be expressed by chaining relations that occur as predicates of the triples; see
e.g. ~\cite{AuLeh:whilic07} for details. Even though the RDF formalization of Wikipedia
facts as well as the expressivity of queries is rather limited, it can answer (a suitable
version of) the Obama president question above.
But the space of mathematical knowledge cannot suitably be expressed in RDF triples, since
it contains mathematical formulae (see~\cite{Lange:PhD,} for a discussion), makes
non-trivial use of quantifications and introduces concepts and vocabulary
dynamically. A question that cannot be answered by mathematical search engines is the situation
in example \ref{ex:intro}.
\begin{example}\label{ex:intro}
We have identified a structure $(\mathbb{S},\sharp)$ as an
associative, unital, idempotent magma, and we are interested whether $\sharp$ is
commutative. To find out, we would like to search for $\fbox{x}\sharp \fbox{y} = \fbox{y}
\sharp \fbox{x}$ and find this as an instance of the fact that idempotent monoids are
Abelian, which we proved in our Algebra 101 course, even though the statement of
the theorem does not involve the operator $\sharp$.
\end{example}
In comparison with DBPedia,
\begin{inparaenum}[a)]
\item we start out from fully formal knowledge, so that the need for (automated)
formalization is alleviated, and
\item we only allow for a controlled regime of inducing knowledge items in the knowledge
space: inheritance of statements via theory morphisms.
\end{inparaenum}
We will see that in the context of mathematical knowledge, the second restriction is very
natural (if the modularity framework is sufficiently strong), and the first restriction
can be alleviated by automated semantization methods that are researched independently of
the work presented in this paper.
This paper picks up ideas introduced in~\cite{Laubner:utgmm07} under the first author's
supervision. Bastian Laubner's system was based on a partial formalization of Bourbaki's
Algebra~\cite{Bourbaki:a74} in the OMDoc format~\cite{Kohlhase:OMDoc1.2}, which supported
modular formalization in theory graphs, but did not provide naming conventions for induced
statements and did not support flattening. In the time since 2007, we have developed the
{\mmt} format~\cite{RabKoh:WSMSML11} and system~\cite{Rabe:MMT}, which does, and what is
more, we now have a {\mmt} theory graph of more than 800 modules: the {\latin} Logic
Atlas~\cite{CodHorKoh:palai11}. As the atlas grows, it becomes more and more difficult to
read the atlas, since most of the theories are specified by inclusion reference to other
theories, and the actual statements are only reached by long and confusing walks through
the graph. This problem can be solved by technology: incremental, in-place flattening in
the {\mmt} front-end. At the same time, it becomes more and more difficult to avoid
duplication and maintain a maximally shared structure, for this we need a search system
that takes induced statements (these are the ones we want to share) into account.
Intuitively, the knowledge space of a mathematical library is formed of all the statements that can be proved in it so it
is not possible to (easily) mechanically generate the entire knowledge space.
However, in the context of theory graphs a substantial amount of interesting knowledge can be generated
by leveraging relations between theories (such as imports or theory morphisms).
To better understand the situation in modular libraries, consider the theory graph in
Figure~\ref{fig:elal-graph}. The right side of the graph introduces the elementary
algebraic hierarchy building up algebraic structures step by step up to rings; the left
side contains a construction of the integers. In this graph, the nodes are theories
\footnote{We have left out the quantifiers for the variables $x$, $y$,
and $z$ from the axioms to reduce visual complexity. The always range over the
respective base set. Furthermore, all axioms are named; but we only state the names we
actually use in the examples.},
the solid edges are imports and the dashed edges are views.
\begin{figure}[ht]\centering
\tikzinput{tikz/mmt/elalg-nat-graph}
\caption{A {\mmt} Graph for Elementary Algebra}\label{fig:elal-graph}
\end{figure}
In {\mmt} theory graphs, theory morphisms can carry a name, and inherited constants can be
disambiguated via the inclusion that induced them. An application of this is in the
definition of the ring theory, which inherits all of its operators (and their axioms) via
the two inclusions $\mathsf{m}$ (for the multiplicative operations) and $\mathsf{a}$ (for
the additive operations). To complete the ring we only need to add the two distributivity
axioms in the inherited operators $\mathsf{m}\kern-.1em/\kern-.2em\circ$ and
$\mathsf{a}\kern-.1em/\kern-.2em\circ$.
Furthermore, morphisms can carry an assignment which maps symbols and axioms from the
source theory to terms in the target theory. We see this in the view $\mathsf{e}$ from
\textsf{Monoid} to \textsf{NatArith}, which assigns $\mathbf{N}$ to the base set $G$,
multiplication ($\cdot$) to $\circ$ and the number $1$ to the unit $e$. The document
theory morphism property, $\mathsf{e}$ also contains proofs for all \textsf{Monoid} axioms
in \textsf{NatArith}.
It is a special feature of {\mmt} that assignments can also map morphisms into the source
theory to morphisms into the target theory. We use this to specify the morphism
$\mathsf{c}$ modularly (in particular, this allows to re-use the proofs from $\mathsf{e}$
and $\mathsf{c}$).
Note that already in this small graph, there are a lot of induced statements. For
instance, the associativity axiom is inherited in seven times (via inclusions; twice into
\textsf{Ring}) and induced four times (via views; twice each into \textsf{NatArith} and
\textsf{IntArith}). All in all, we have more than an hundred induced statements from the
axioms alone. If we assume just 5 theorems proven per theory (a rather conservative
estimation), then we obtain a number of induced statements that is an order of magnitude
higher.
Another crucial ingredient of {\mmt} for the endeavor of searching for induced statements
is the fact that, as discussed in section \ref{sec:mmt}, {\mmt} supplies names (called
{\mmt} URIs) for all induced statements. For example, if we take $u$ to be the URI of the
theory graph in Figure~\ref{fig:elal-graph}, the statement $\forall x,y,z :
\mathbb{Z}.(x+y)+z = x+(y+z)$ induced by the view $\mathsf{c}$ in \textsf{IntArith} has
the {\mmt} URI $u$?\textsf{IntArith}?\textsf{c}/\textsf{g}/\textsf{assoc}. Note that given
a theory graph, the {\mmt} system can dereference valid {\mmt} URIs and return the induced
statements, even though they are not explicitly represented in the graph.
\paragraph{Generating Induced Statements}
We already hinted in section \ref{sec:intro} that generating the induced statements is a \emph{theory graph operation}
i.e. it takes a theory graph as input and returns a different one, specifically the flattened theory graph.
An \emph{enriching theory graph operation} is an operation that adds statements or theories to the original theory graph.
Effectively enriching theory graph operations generate induced statements and thus make a part of the knowledge space explicit
so that it can be used by applications.
Theory graph flattening is an \emph{enriching theory graph operation}.
Following the algebraic structures example above we can formally define the process of flattening a theory (or
a theory graph).
\begin{definition}
Given a theory graph $\gamma$ the flattening of a theory $T$ in $\gamma$ is a theory $\overline{T}$ with the same URI as $T$ containing:
\begin{itemize}
\item all symbol declarations that are in $T$.
\item all symbol declarations that are imported into $T$.
\item for every view $v : S \rightarrow T$ the projection of every $S$-based declaration
over view $v$. Here, by $S$-based declaration we refer to the declarations in $S$ and in
theories that import $S$.
\end{itemize}
\end{definition}
The URIs of the induced declarations are based on the definition of {\mmt} URIs from
section \ref{sec:mmt} (see also the example above) and permit recovering the origin of
induced declarations. Specifically, symbol declarations from $T$ or imported in $T$ by
the meta or an include preserve their name while symbol declarations imported in $T$ by a
structure or induced by a view are additionally qualified with the name of that structure
or view.
\begin{definition}
The flattening of theory graph $\gamma$ is a theory graph $\overline{\gamma}$ with the
same URI as $\gamma$ containing the following module declarations :
\begin{itemize}
\item for every theory $T$ in $\gamma$ the theory $\overline{T}$
\item every view $v : S \rightarrow T$ in $\gamma$
\end{itemize}
\end{definition}
Note that, since theory flattening preserves theory URIs and doesn't add new axioms (only
new theorems) any view from $S$ to $T$ is also a view from $\overline{S}$ to
$\overline{T}$ so the views in $\overline{\gamma}$ are still valid.
An important observation is that the flattening operation for theory graphs is not
idempotent and flattening can be applied repeatedly to a theory graph to generate new
induced symbol declarations. Effectively this generates symbol declarations induced not
only by one view but by a chain (composition) of views. More precisely, a theory graph
flattened $n$ times will all contain declarations induced by view chains of length at most
$n$. Clearly, if there is a view cycle (e.g. $v_1 : S \rightarrow T$ and $v_2 : T
\rightarrow S$) this process can continue indefinitely yielding ever bigger content
libraries.
\paragraph{Induced statements in the {\latin} library}
\ednote{MI@MI: regenerate statistics}
We implemented library flattening as described above in {\mmt} and tested it on the
{\latin} library. The flattening (once) of the {\latin} library increases the number of
declarations from 2310 to 58847 (a factor of 25.4) and the total size of the library from
123.9 MB to 1.8 GB (a factor of 14.8). As expected, the multiplication factor depends on
the level of modularity of the library. For instance, the highly modular math sub-library
containing mainly algebraic structures increases from 2.3 MB to 79 MB thus having a
multiplication factor of 34.3, more than double the library average. The size of the MWS
harvests also increases considerably, from 25.2 MB to 539.0 MB.
%%% Local Variables:
%%% mode: LaTeX
%%% TeX-master: "all"
%%% End:
% LocalWords: miancu elal-graph tikzinput tikz mmt mathsf mathsf circ textsf
% LocalWords: textsf NatArith mathbf cdot IntArith rightarrow forall mathbb
% LocalWords: assoc
%systems
\newcommand{\mws}{{\scshape{MathWebSearch}}\xspace}
\newcommand{\mmt}{{\scshape{Mmt}}\xspace}
\newcommand{\latin}{{\scshape{LATIN}}\xspace}
\newcommand{\openmath}{{\scshape{OpenMath}}\xspace}
\newcommand{\omdoc}{{\scshape{OMDoc}}\xspace}
%BNF grammar
\newcommand{\alt}{\;{\color{red}|}\;}
\newcommand{\bbc}{{\color{red}::=}}
\newcommand{\rep}{{\color{red}\ast}}
\newcommand{\opt}[1]{{\color{red}[}#1{\color{red}]}}
@Comment{$ biblatex control file $}
@Comment{$ biblatex version 2.4 $}
Do not modify this file!
This is an auxiliary file used by the 'biblatex' package.
This file may safely be deleted. It will be recreated as
required.
@Control{biblatex-control,
options = {2.4:0:0:1:0:0:1:1:0:1:0:0:12:1:3:1:79:+},
}
File added
\documentclass{llncs}
\usepackage{url,amstext,amssymb,textcmds}
\usepackage{tikz,standalone}
\usepackage{labeledquote,xspace}
\usetikzlibrary{shapes,arrows}
\usepackage{modules,presentation,paralist}
\def\gray{\textcolor{gray}}
\usepackage[show]{ed}
\usepackage{paralist}
%\usepackage{hyperref}
%\hypersetup{linkcolor=red,citecolor=blue,urlcolor=gray,colorlinks,breaklinks,bookmarksopen,bookmarksnumbered}
%\usepackage[eso-foot,today]{svninfo}
\usepackage[final]{svninfo}
\svnInfo $Id: paper.tex 1222 2014-01-23 08:15:46Z kohlhase $
\svnKeyword $HeadURL: https://svn.mathweb.org/repos/mws/doc/papers/mir12/paper.tex $
\usepackage[linkcolor=black]{hyperref}
\usepackage[hyperref=auto,style=alphabetic,citestyle=alphabetic,isbn=false,backend=bibtex]{biblatex}
\bibliography{pub_rabe,systems,kwarcpubs,extpubs,kwarccrossrefs,extcrossrefs}
\usepackage{local}
\title{Math Literate Knowledge Management}
\author{Michael Kohlhase, Mihnea Iancu\\
}
\institute{Computer Science, Jacobs University, Bremen, Germany \\ \email{initial.last@jacobs-university.de}}
\newcommand\tikzinput[2][]{\input{#2}}
\def\qvar#1{\fbox{#1}}
\def\llquote#1{\textlangle\kern-.2em\textlangle #1\textrangle\kern-.2em\textrangle}
\def\flatsearch{\textsc{FlatSearch}\xspace}
\begin{document}
\maketitle
\begin{abstract}
Theory graphs enable modular representations of mathematical knowledge
that capture the theory-level semantics in a concise way.
However, this means that a substantial amount of knowledge can be induced
from the theory graph without having to be explicitly represented in it.
This is good for knowledge management as the number of induced
(i.e. not explicitly represented) statements can grow exponentially in
the explicitly represented ones. But this bad for accessibility, since conventional
(computer supported) access methods only work on explicitly represented materials.
In this paper we discuss how induced knowledge arises in highly modular,
theory-graph based, mathematical libraries and establish how to access
it to make it available for applications. We present the {\flatsearch}
extension of {\mws} system and show this access method in action.
\end{abstract}
\section{Introduction}\label{sec:intro}
\input{intro}
\section{Preliminaries}\label{sec:prel}
\input{prel}
\section{Induced Statements in Theory Graphs}\label{sec:libraries}
\input{libs}
\section{Accessing Induced Statements}\label{sec:searching}
\input{search}
\section{Conclusion and Future Work}\label{sec:concl}
\input{conc}
%\bibliographystyle{plain}
\printbibliography
\end{document}
\begin{oldpart}{from old version, should introduce realms and de-emphasize {\mws}}
\subsection{{\mws}}\label{sec:mws}
{\mws} (MWS)\cite{KohPro:MWSmanual} is an open-source, open-format, content-oriented search engine for mathematical expressions.
%We will only give a brief overview of {\mws} here and refer to for details.
The MWS system consists of three components:
\begin{itemize}
\item A \emph{crawler subsystem} collects data from a mathematical knowledge repository
and converts the mathematical expressions into \emph{{\mws} harvests}. The harvests are
files containing mathematical expressions encoded in a MWS-specific XML-based format
(see \cite{KohPro:MWSmanual} for details).
\item The \emph{core system} builds the search index from MWS harvests and processes
search queries.
\item The \emph{RESTful interface} provides a public HTTP API for interacting with the
core system (i.e. submitting queries and receiving results).
\end{itemize}
Therefore, with respect to {\mws}, our {\flatsearch} approach involves providing a MWS crawler that generates
MWS harvests from flattened {\mmt} libraries and a MWS frontent for submitting queries that will interact with the MWS RESTful interface.
\subsection{MMT}\label{sec:mmt}
{\mmt} \cite{RK:mmt:10} is a generic, formal module system for mathematical knowledge. The {\mmt} language is
designed to be applicable to a large collection of declarative formal base languages and
all {\mmt} notions are fully abstract in the choice of the base language.
%\ednote{rephrase}
We will only give a brief introduction to {\mmt} here and then discuss the concepts using
examples in section \ref{sec:libraries}. We refer to \cite{RK:mmt:10} for further details.
The central notion is that of a \textbf{theory graph} containing \emph{theories} and
\emph{views} (morphisms between theories). Note that {\mmt} libraries actually contain
(possibly nested) collections of documents which in turn contain modules but we omit this
here for simplicity and focus on theory graphs instead.
Theories $S$ are formed from a set of typed symbols and axioms describing their
properties. Views $v : S \rightarrow T$ are morphisms between theories which map axioms
of the source theory ($S$) to theorems of the target theory ($T$). This property ensures
that all theorems of the source theory induce theorems of the target theory and induces a
homomorphic translation from $S$-terms to $T$-terms by replacing every occurrence of an
$S$-axiom with is corresponding $T$-theorem.
In addition to views, the module level structure in {\mmt} theory graphs is given by
theory inheritance. The most general kind of inheritance in {\mmt} is represented by
\emph{structures} which are (possibly) partial named imports (and defined using theory
morphisms). \emph{Includes} are trivial structures which are unnamed and total while
\emph{metas} are distinguished includes representing the meta-theory (each theory can have
at most one meta-theory).
Every {\mmt} declaration is identified by a canonical, globally unique URI. Theories and
views can be referenced relative to the URI $U$ of the theory graph that contains them by
$U$?\llquote{theory-name} and $U$?\llquote{view-name}, respectively.
Symbol declarations can be referenced relative to the URI of their containing theory $T$
by $T$?\llquote{symbol-name}, where \llquote{symbol-name} is of the form
\llquote{$m_1$}/\ldots\llquote{$m_n$}/\llquote{const}, where the $m_i$ are the names of
morphisms inducing the symbol \llquote{const} into $T$. Similarly, assignment
declarations can be referenced relative to the URI of their containing view $v$ by
$v$?\llquote{symbol-name}.
% Intuitively, a theory morphism formalizes a translation between two formal languages.
% For example, the inclusion from the theory of semigroups to the theory of monoids (which
% extends the former with two declarations for the unit element and the neutrality axiom)
% can be formalized as a theory morphism. More complex examples are the G\"odel-Gentzen
% negative translation from classical to intuitionistic logic or the interpretation of
% higher-order logic in set theory.
% Every {\mmt} declaration is identified by a canonical, globally unique URI. In
% particular, the URIs of symbol and assignment declarations are of the form $\identhy?c$
% and $\idenview?c$.
% {\mmt} symbol declarations subsume most semantically relevant statements in declarative
% mathematical languages including function and predicate symbols, type and universe
% symbols, and --- using the Curry-Howard correspondence --- axioms, theorems, and
% inference rules. Their syntax and semantics is determined by the foundation, in which
% {\mmt} is parametric. In particular, the validity of a theory graph is defined relative
% to a type system provided by the \textbf{foundation}.
The \mmt system provides an API to the {\mmt} data structures described above and the \mmt
implementation \cite{project:mmt,RK:mmt:10} provides a Scala-based \cite{scala} open
source implementation of the {\mmt} API. The \mmt system can also generate MWS harvests
from {\mmt} libraries so, from the perspective of {\mws}, it can be seen as a MWS
crawler. Therefore, with respect to {\mmt} our {\flatsearch} approach involves generating
the flattened version on a library and then exporting it as MWS harvests.
\subsection{{\latin}}\label{sec:latin}
The \latin project (\textbf{L}ogic \textbf{At}las and \textbf{In}tegrator)
\cite{project:latin,LATIN:url} aims at developing tools for interfacing logics and proof
systems. It focuses on developing a knowledge representation framework that is
foundationally unconstrained and thus allows for the representation of most meta-theoretic
foundations of mathematical knowledge in the same format.
The \latin logic graph is built as an {\mmt} library and already contains work related to
first-order logic (TPTP \cite{tptp}, CASL \cite{CaslTCS03,CASL-RM} and Mizar
\cite{mizar,UrbanBan:pem06}), higher-order logic (PVS \cite{OwRu92}, Isabelle/HOL
\cite{Nipkow-Paulson-Wenzel:2002} and HasCASL \cite{SchMos:htisdfp02}), logical frameworks
(LF \cite{Pfenning91} and Isabelle \cite{isabelle-isar}) and mathematics (set theory,
natural and rational numbers, algebraic structures and others). Furthermore, whenever
possible, these representations are structured and related using logic morphisms.
The \latin library currently contains 449 theories and 382 views between them. The
theories themselves contain a total of 2310 symbol declarations and 1072 direct imports
(including metas) resulting in an average of 5.14 declarations and 2.39 direct imports per
theory. This amounts to a size of 123.9 MB in its native OMDoc format and 25.2 MB when
converted to MWS harvests.
\end{oldpart}
%Due to it's size, diversity and high level of modularity the \latin logic graph .
%%% Local Variables:
%%% mode: LaTeX
%%% TeX-master: "all"
%%% End:
% LocalWords: miancu mws mws KohPro MWSmanual emph mmt ednote textbf mapsto
% LocalWords: rightarrow llquote ldots odel-Gentzen identhy idenview ogic tptp
% LocalWords: tegrator CaslTCS03 pem06 Nipkow-Paulson-Wenzel htisdfp02
% LocalWords: Pfenning91 isabelle-isar
To search for induced statements, we use our {\mws} system~\cite{ProKoh:mwssofse12},
which indexes formula-URL pairs and provides a web interface querying the formula index
via unification. This can be used for
\begin{compactdesc}
\item[Instance Search] e.g. to find all instance of associativity we can issue the query
$\forall x,y,z : \qvar{S}.(x\qvar{op} y) \qvar{op} z = x \qvar{op} (y\qvar{op} z)$,
where the $\qvar{-}$ are query variables that can be instantiated in the query. In the
library from Figure~\ref{fig:elal-graph} we would find the commutativity axiom
\textsf{SemiGrp}/\textsf{assoc}, its directly inherited versions in \textsf{Monoid}, to
\textsf{Ring} and in particular the version
$u$?\textsf{IntArith}?\textsf{c}/\textsf{g}/\textsf{assoc}.
\item[Applicable Theorem Search] where universal variables in the index can be
instantiated as well; this was introduced for a non-modular formal libraries
in~\cite{IKRU:mizar:11}. Here we could search for $3+4=\qvar{R}$ and find the induced
statement $u$?\textsf{IntArith}?\textsf{c}/\textsf{comm} with the substitution $R\mapsto
4+3$, which allows the user to instantiate the query and obtain the equation $3+4=4+3$
together with the justification $u$?\textsf{IntArith}?\textsf{c}/\textsf{comm} that can
directly be used in a proof.
\end{compactdesc}
The implementation of a web service that conducts such searches is very simple: instead of
harvesting formulae directly from a formal digital library directly as
in~\cite{IKRU:mizar:11}, we flatten the library first, and then harvest
formulae. Conveniently,{\mmt} flattening gives the included constants and axioms local names
that are syntactically identical to the respective symbol paths in {\mmt} URIs, so that the
generation of {\mmt} URIs for the formula harvests is trivial.
The only non-trivial part of the implementation is the search human-oriented front-end,
i.e. the input of search queries and the presentation of search results. The {\latin}
atlas is written in an extension of the TWELF encoding~\cite{RS:twelfmod:09} of
LF~\cite{HarperEtAl:affdl93}, so it is natural to use an extension of LF notation with
query variables for input. Therefore, we use the {\mmt} notation language and
interpretation service described in \cite{IR:ui:12} to transform LF-style input into
{\mmt} objects and subsequently to MWS queries.
The presentation of the {\mmt} URIs requires some work as well: while the {\mmt} system
can directly dereference the {\mmt} URI and thus be used to present the induced statement,
humans want a justification that is more understandable than a {\mmt} URI. Fortunately,
this can be generated from the {\mmt} URI by a simple template-based algorithm.
Let us consider the search result
$u$?\textsf{IntArith}?\textsf{c}/\textsf{g}/\textsf{assoc} from the instantiation search
above, where we take $u$ to be \verb|http://cds.omdoc.org/cds/elal|. The first step is to
localize the result in the theory $u$?\textsf{IntArith} with the sentence
\begin{labeledquote}\label{lq:theory}\sf
Induced statement $\forall x,y,z : \mathbb{Z}.(x+y)+z = x+(y+z)$ found in
\underline{\texttt{http://cds.omdoc.org/cds/elal?IntArith}} (\underline{subst},
\underline{justification}).
\end{labeledquote}
Here the underlined fragments carries hyperlinks, the second pointing to the
justification:
\begin{labeledquote}\sf\label{lq:view}
\underline{\textsf{IntArith}} is a \underline{\textsf{CGroup}} if we interpret $\circ$
as + and $G$ as $\mathbb{Z}$.
\end{labeledquote}
which can be directly from the information associated to the morphism $\mathsf{c}$ in the
{\mmt} URI. Then we skip over $\mathsf{g}$, since its assignment is trivial and generate the
sentence.
\begin{labeledquote}\sf\label{lq:inclusions}
\underline{\textsf{CGroup}}s are \underline{\textsf{SemiGrp}}s \underline{by
construction}
\end{labeledquote}
and finally we ground the explanation by the sentence
\begin{labeledquote}\sf\label{lq:ground}
In \underline{\textsf{SemiGrp}}s we have the axiom \underline{$\mathsf{assoc}: \forall
x,y,z : G . (x\circ y)\circ z=x\circ (y\circ z)$}
\end{labeledquote}
The sentences (\ref{lq:theory}) to (\ref{lq:ground}) can be generated from templates,
since the {\mmt} system gives access to the necessary information: source and target theory
as well as the assignment $\psi'$ for (\ref{lq:view}), the fact that the path from
\textsf{SemiGrp} to \textsf{CGroup} only consists of inclusion that triggers the template
for (\ref{lq:inclusions}) and the original formulation of the axiom $\mathsf{assoc}$.
\begin{figure}
\begin{center}
\includegraphics[width=\textwidth]{flatsearch}
\end{center}
\caption{The {\flatsearch} Web Interface for {\latin}}\label{fig:flatsearch}
\end{figure}
Note that we make use of another peculiarity of the {\mmt} system in this explanation: all
constants in the theory graph carry notation declarations~\cite{KMR:NoLMD08}, which can be
used to generate human-readable presentations of arbitrary formal objects in the
graph.
%%% Local Variables:
%%% mode: LaTeX
%%% TeX-master: "paper"
%%% End:
% LocalWords: miancu mws ProKoh mwsofse compactdesc forall qvar qvar qvar qvar
% LocalWords: qvar elal-graph textsf SemiGrp textsf assoc textsf IntArith comm
% LocalWords: mapsto mmt affdl93 ednote labeledquote lq mathbb texttt subst
% LocalWords: circ mathsf
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Please register or to comment