initial version

83684a0a · m-iancu · 97f3774f · 83684a0a · 83684a0a · 83684a0a
Commit 83684a0a authored Feb 10, 2015 by m-iancu
--- a/adoptions/conc.tex
+++ b/adoptions/conc.tex
+We have presented theory graph design patterns for document structures at the theory level
+in \omdoc2. These design patterns allow us to account for documents that recap or recount
+mathematical knowledge already marked up in other documents. 
+
+In the future we want to look into the communication-enabling partial isomorphisms
+postulated in Section~\ref{sec:course} and see whether~\cite{KohRabSac:fvip11} is directly
+applicable.
+
+\ednote{Interesting theoretical, practical, and social/organizational problems $\leadsto$
+  OAFF}
\ No newline at end of file
--- a/adoptions/intro.tex
+++ b/adoptions/intro.tex
+A central idea behind the OMDoc/OpenMath data model is that all symbols have a home theory
+(the document that introduced the concept; essentially the content dictionary). In the
+context of flexiformal\footnote{i.e. formalized to flexible depths; see~\cite{KohKoh:tfndc11} and
+  ~\cite{Kohlhase:tffm13} for a discussion of this notion.} digital libraries, this information design creates a tension
+between usability (which demand standardized content dictionaries) and
+flexibility/locality.
+\ednote{Expand, discuss the alternatives, discuss multiple theories and
+  realms~\cite{CarFarKoh:tr13}.}
+  
--- a/adoptions/local.sty
+++ b/adoptions/local.sty
+\newcommand\mthy[2]{\begin{tabular}{l}\textsf{#1}\\\hline #2\end{tabular}}
+\newcommand\cn[1]{\ensuremath{\mathsf{#1}}}
+\def\tup#1{\langle #1\rangle}
+%systems
+\newcommand{\mws}{{\scshape{MathWebSearch}}\xspace}
+\newcommand{\mmt}{{\scshape{Mmt}}\xspace}
+\newcommand{\latin}{{\scshape{LATIN}}\xspace}
+\newcommand{\openmath}{{\scshape{OpenMath}}\xspace}
+\newcommand{\latexml}{{\scshape{LaTeXML}}\xspace}
+\newcommand{\omdoc}{{\scshape{OMDoc}}\xspace}
+\newcommand{\sTeX}{{\scshape{sTeX}}\xspace}
+
+
+%BNF grammar
+\newcommand{\alt}{\;{\color{red}|}\;}
+\newcommand{\bbc}{{\color{red}::=}}
+\newcommand{\rep}{{\color{red}\ast}}
+\newcommand{\opt}[1]{{\color{red}[}#1{\color{red}]}}
+
--- a/adoptions/paper-blx.bib
+++ b/adoptions/paper-blx.bib
+@Comment{$ biblatex control file $}
+@Comment{$ biblatex version 2.4 $}
+Do not modify this file!
+
+This is an auxiliary file used by the 'biblatex' package.
+This file may safely be deleted. It will be recreated as
+required.
+
+@Control{biblatex-control,
+  options = {2.4:0:0:1:0:0:1:1:0:1:0:0:12:1:3:1:79:+},
+}
--- a/adoptions/paper.pdf
+++ b/adoptions/paper.pdf
--- a/adoptions/paper.tex
+++ b/adoptions/paper.tex
+\documentclass{llncs}
+\usepackage{url,amstext,amssymb,textcmds}
+\usepackage{tikz,standalone}
+\usepackage{labeledquote,xspace}
+\usetikzlibrary{shapes,arrows}
+\usetikzlibrary{mmt}
+\usepackage{modules,presentation,paralist}
+\def\gray{\textcolor{gray}}
+\usepackage[show]{ed}
+\usepackage{paralist}
+\usepackage{listings}
+
+%\usepackage{hyperref}
+%\hypersetup{linkcolor=red,citecolor=blue,urlcolor=gray,colorlinks,breaklinks,bookmarksopen,bookmarksnumbered}
+
+
+%\usepackage[eso-foot,today]{svninfo}
+\usepackage[final]{svninfo}
+\svnInfo $Id: paper.tex 1222 2014-01-23 08:15:46Z kohlhase $
+\svnKeyword $HeadURL: https://svn.mathweb.org/repos/mws/doc/papers/mir12/paper.tex $
+\usepackage[linkcolor=black]{hyperref}
+
+\usepackage[hyperref=auto,style=alphabetic,citestyle=alphabetic,isbn=false,backend=bibtex]{biblatex}
+\bibliography{pub_rabe,systems,kwarcpubs,extpubs,kwarccrossrefs,extcrossrefs}
+\usepackage{local}
+
+
+\title{Theory Level Structures in Informal Mathematics}
+\author{Michael Kohlhase, Mihnea Iancu\\
+}
+
+\institute{Computer Science, Jacobs University, Bremen, Germany \\ \email{initial.last@jacobs-university.de}}
+\newcommand\tikzinput[2][]{\input{#2}}
+\def\qvar#1{\fbox{#1}}
+\def\llquote#1{\textlangle\kern-.2em\textlangle #1\textrangle\kern-.2em\textrangle}
+\def\flatsearch{\textsc{FlatSearch}\xspace}
+
+
+\begin{document}
+
+\maketitle
+
+\begin{abstract}
+Representation formats based on theory graphs are appealing for formalized mathematics 
+as they provide logic-compatible modularity and foster reuse. Like its formal counterpart, informal mathematics 
+is also fundamentally modular, as documents routinely refer to other documents and rely on concepts defined elsewhere. 
+However, while related, the relations used in informal mathematics do not directly correspond to basic theory graph 
+notions such as imports or theory morphisms.
+
+In this paper we study some common use patterns from informal mathematical 
+documents and discuss how to represent them as theory-level structures in theory graphs.
+We use this as a start for developing a notion of \emph{flexiformal} theory graphs which can 
+adequately represent content of flexible formality.
+\end{abstract}
+
+
+
+\section{Introduction}\label{sec:intro}
+\input{intro}
+
+\section{Preliminaries}\label{sec:prel}
+\input{prel}
+
+\section{Theory Level Phenomena in Informal Mathematics}\label{sec:pheno}
+\input{phenomena}
+
+\section{Theory Patterns in MMT/OMDoc Theory Graphs}\label{sec:patterns}
+\input{patterns}
+
+\section{Services/Authoring Workflows}
+\input{workflows}
+
+\section{Conclusion and Future Work}\label{sec:concl}
+\input{conc}
+
+
+
+%\bibliographystyle{plain}
+
+\printbibliography
+\end{document}
--- a/adoptions/patterns.tex
+++ b/adoptions/patterns.tex
+\ednote{tu discuss how to represent the examples in \ref{sec:pheno} using theory graphs}
+We introduce \ednote{MI@MK is this true? I need to check} the notion of \emph{theory patterns} which we define as 
+any connected subgraph of a theory graph. 
+
+Adding an equivalent definition corresponds to a realm extension, where the face is fixed, and the view from the face to the current 
+theory is postulated (in informal math). Mathematical papers re-introduce concepts with the implied assumption of semantic equivalence in the context of the paper.
+I.e. every statement in the paper holds for the original definition too, but not necessarily the converse.
+
+
+\begin{figure}[ht]\centering
+ \begin{tikzpicture}
+  \node[thy] (p) at (-2,0) {Paper};
+  \node[thy] (l) at (2,0) {Cited Paper};
+  \node[thy] (d) at (2,-1) {$\cdots$};  
+  \node[thy] (b) at (2,-2) {$\bot$};
+  
+  \node[thy] (r) at (0,2) {Realm Face};
+  
+  \draw[view] (r) to node[above] {$\cn{v_1}$} (p);
+  \draw[view] (r) to node[above] {$\cn{v_2}$} (l);
+  \draw[include] (l) to (p);
+  \draw[conservative] (b) to (d);
+  \draw[conservative] (d) to (l);
+  
+  \draw (-1,-2.5) -- (3,-2.5) -- (3,2.5) node[above] {Realm} -- (-1,2.5) -- (-1,-2.5);
+  \draw (1,-2.5) -- (3,-2.5) -- (3,0.5) node[above] {Pillar} -- (1,0.5) -- (1,-2.5);
+  
+
+ \end{tikzpicture}
+\end{figure}
+
+
+\begin{oldpart}{from bluenote}
+
+We will progress by going through the examples from the introduction. \ednote{MK@MK: say
+  something about what digital libraries should look like: a theory for every variant
+  highly modularized, narratively enhanced overview for salient theories that are often
+  used, see Carette/Farmer high-level theories, \cite{CarFarKoh:tr13}}
+
+\begin{example}[A Course grounded in a Formal Library]\label{sec:course}
+Take for instance a course which introduces (naive) set theory informally, but grounds
+itself in a formal, modular definition. Then we have the situation in
+Figure~\ref{fig:slides-library}. On the right hand side, we have a careful introduction in
+the form of a modular theory graph starting at a theory \cn{ZFset} that introduces
+membership relation and the axioms of existence, extensionality, and separation and
+defines the set constructor $\{\cdot |\cdot\}$ from these axioms. On the left we have a
+theory \cn{SET} that ``adopts'' the symbols $\in$ and $\{\cdot |\cdot\}$ via a partial
+inclusion $\cn{a_1}$ from \cn{ZFset} to \cn{SET} but ``defines'' them by alluding the
+intuitions of the students. Note that a partial inclusion always gives rise to a view in
+the opposite direction, here the view \cn{v_1} from \cn{SET} to \cn{set}. Note that we
+cannot discharge the proof obligations in \cn{v_1}, since the definition of the set
+constructor $\{\cdot |\cdot\}$ is opaque -- i.e. given as natural language, which is not
+subject to formal methods; see~\cite{Kohlhase:fsgo13} for a discussion.  We should think
+of \cn{v_1} as a ``definitional view'' that gives meaning to the opaque parts in \cn{SET}:
+the proof obgligations have to be met in order for the diagram to commute (which is an
+invariant we want to maintain).
+
+We call this situation where the symbols of a theory are imported via a partial inclusion
+and their meaning is specified via a view that is a partial inverse an \textbf{adoption}
+and the morphism pair an \textbf{adoption} morphism. 
+
+\begin{figure}[ht]\centering
+\begin{tikzpicture}[yscale=1.7]
+  \node at (1,2.6) {\large Library};
+  \node at (-4.5,2.6) {\large Course};
+  \node[thy] (zfset) at (1,-.2) {\mthy{ZFset}{$\in$, \cn{Ex}, \cn{Ext}, \cn{Sep}; $\{\cdot |\cdot\}$}};
+  \node[thy] (zfunion) at (-.2,1) {\mthy{ZFunion}{\cn{Un}, $\cup$}};
+  \node[thy] (zfpow) at (2.2,1) {\mthy{ZFpow}{\cn{Pow}, $\mathcal{P}$}};
+  \node  at (1.1,1) {\ldots};
+  \node[thy] (zfop) at (1,2) {ZFops};
+  \draw[include] (zfset) -- (zfunion);
+  \draw[include] (zfset) --  (zfpow);
+  \draw[include] (zfunion) -- (zfop);
+  \draw[include] (zfpow) -- (zfop);
+  \node[thy] (set) at (-4.5,-.2) {\mthy{SET}{$\in$, $\{\cdot |\cdot\}$}};
+  \node[thy] (sop) at (-4.5,2) {\mthy{SETOPS}{$\cup$,\ldots,$\mathcal{P}$}};
+  \draw[view] (set) to[out=-5,in=185] node[below] {$\cn{v_1}$} (zfset);
+  \draw[pinclude] (zfset) to[out=175,in=5] node[above] {$\cn{a_1}\colon\in,\{\cdot |\cdot\}$}(set);
+  \draw[view] (sop) to[out=-5,in=185] node[below] {$\cn{v_2}: \text{incl}\; \cn{v_1}$} (zfop);
+  \draw[pinclude] (zfop) to[out=175,in=5] node[above] {$\cn{a_2}\colon\cup,\ldots,\mathcal{P},\text{incl}\; \cn{a_1}$} (sop);
+  \draw[include] (set) -- (sop);
+\end{tikzpicture}
+\caption{A course grounded in a modular Library}\label{fig:slides-library}
+\end{figure}
+
+Let us now continue looking at the example in Figure~\ref{fig:slides-library}: The
+informal course materials go on in style, introducing the set operations ranging from set
+union to the power set in one go in the theory \cn{SETOPS}. On the library side, we
+introduce the set theory axioms one by one, derive the respective operators from them, and
+at the end collect all the material in the theory \cn{ZFops}. A this point, we can justify
+the theory \cn{SETOPS} via a partial inclusion of the symbols $\cup$, \ldots,
+$\mathcal{P}$ from \cn{ZFops}, which gives rise to another definitional view \cn{v2} to
+\cn{ZFops}. As \mmt allows assignments to structures, we can simply assign the inclusion
+from \cn{SET} to \cn{SETOPS} to the (unique) inclusion from \cn{ZFset} to \cn{ZFops} (as
+indicated by the ``assignment'' include $\cn{v_1}$ on $\cn{v_1}$).
+
+We observe that the two theory graphs are self-contained: the course materials can be
+understood without knowing about the library; in particular, the membership relation used
+in the definition of the union operator in \cn{SETOPS} is from theory \cn{SET}.
+
+This self-containedness is important intra-course didactics. But it also has the problem
+that the courses become insular; how are students going to communicate with mathematicians
+who have learned their maths from other courses?  Here is where the views \cn{v_i} come
+in. Say the other mathematicians have course theories \cn{\overline{SET}} and
+\cn{\overline{SETOPS}} with views \cn{\overline{v_1}} and \cn{\overline{v_2}} into the
+same library, then the views \cn{v_1} and \cn{\overline{v_1}} induce a partial
+isomorphisms between \cn{SET} and \cn{\overline{SET}} in the sense
+of~\cite{KohRabSac:fvip11} (and correspondingly between \cn{SETOPS} and
+\cn{\overline{SETOPS}}) that justify communication.
+\end{example}
+
+
+\subsection{Accelerated Turing Machines}
+
+The case of the mathematical paper case study mentioned above is more difficult, since we
+do not have a direct generalization relation as in the case above. In this case, the
+definition of the accelerated Turing machine involves a concrete step size ($2^{-n}$),
+whereas the definition it recaps allows arbitrary sequences of step sizes as long as their
+sum remains finite. Thus we have the situation in Figure~\ref{fig:atm}. Theory \cn{ATM}
+contains the (opaque) sentence (\ref{lq:atm}), but there cannot be a view from \cn{ATM} to
+\cn{atm} as that is more general. But we do have a view to \cn{atm(2^{-n})}, which
+naturally arises in treatments of accelerated Turing machines as an example.
+
+\begin{figure}[ht]\centering
+  \begin{tikzpicture}[scale=1.3]
+    \node at (-.5,2.7) {\large Paper};
+    \node at (2.5,2.7) {\large Literature};
+    \node[thy] (ATMp) at (-.5,2) {\cn{ATMhalt}};
+    \node[thy] (catmp) at (2,2) {\cn{atm(2^{-n})halt}};
+    \node[thy] (ATM) at (-.5,1) {\cn{ATM}};
+    \node[thy] (atm) at (3.5,0) {\cn{atm}};
+    \node[thy] (catm) at (2,1) {\cn{atm(2^{-n})}};
+    \node[thy,dotted] (atmp) at (3.5,1) {\cn{atmhalt}};
+    \draw[view] (ATM) -- node[above] {\cn{v_1}} (catm); 
+    \draw[view] (ATMp) -- node[above] {\cn{v_2}} (catmp); 
+    \draw[include] (atm) -- (catm);
+    \draw[include] (ATM) -- (ATMp);
+    \draw[include] (catm) -- (catmp);
+    \draw[include,dotted] (atm) -- (atmp);
+    \draw[include,dotted] (atmp) -- (catmp);
+  \end{tikzpicture}
+  \caption{Definitional View for ATM}\label{fig:atm}
+\end{figure}
+
+The more important aspect is that the contribution of~\cite{CalStai:natm09} (depicted here
+as theory \cn{ATMhalt} as it concerns undecidability of the accelerated halting problem
+by accelerated Turing machines) can be justified via a view into a theory
+\cn{atm(2^{-n})halt}, which can be systematically constructed from \cn{ATMhalt} modulo
+\cn{v_1} as pushouts along inclusions always exist in \mmt. In the particular example, we
+can do even better: as the proof in theory \cn{ATMhalt} does not use any properties of the
+step size sequence $2^{-n}$, there is a theory \cn{atmhalt} that generalizes
+\cn{atm(2^{-n})halt} (shown dotted in Figure~\ref{fig:atm}). Note that this construction
+is automatic and needs human intervention -- but one that the authors expect their readers
+will readily do, otherwise they would not have restricted themselves to the concrete
+sequence.
+\end{oldpart}
\ No newline at end of file
--- a/adoptions/phenomena.tex
+++ b/adoptions/phenomena.tex
+\ednote{to discuss and categorize examples}
+
+\begin{oldpart}{Examples list}
+\begin{itemize}
+\item http://arxiv.org/pdf/1502.01042.pdf uses http://arxiv.org/pdf/math/0401301.pdf with alternative definition
+(similar to ATM case)
+\item http://arxiv.org/pdf/1502.01212.pdf
+\item http://arxiv.org/pdf/1502.00978.pdf 
+\item http://arxiv.org/pdf/1502.02591.pdf
+\item http://arxiv.org/pdf/1502.02308.pdf
+\item http://arxiv.org/pdf/1502.02413.pdf
+\end{itemize}
+\end{oldpart}
+
+\subsection{Recapitulation in Papers}
+
+A common pattern occurs in scientific papers which often have a preliminaries section to
+recapitulate the central notions and fix notations. In its simplest incarnation this can be seen
+as an import. However, in general, it can have a more complex behavior (e.g. see example \ref{ex:atm}).
+
+\begin{example}\label{ex:atm}
+For instance, \cite{CalStai:natm09} contains the sentence
+\begin{labeledquote}\em\label{lq:atm}
+  An accelerated Turing machine (sometimes called Zeno machine) is a Turing machine that
+  takes $2^{-n}$ units of time (say seconds) to perform its $n$\textsuperscript{th} step;
+  we assume that steps are in some sense identical except for the time taken for their
+  execution.
+\end{labeledquote}
+This is a telegraphic version of the full definition, which is given in the literature.
+Actually~\cite{CalStai:natm09} continues with an overview of the literature, citing no
+less than 12 papers, which address the topic of accelerated Turing machines.  One of these
+supposedly contains the formal definition, which involves generalizing Turing machines to
+timed ones, introducing computational time structures, and singling out accelerating ones,
+e.g. using (\ref{lq:atm-formal}).
+
+\begin{labeledquote}\label{lq:atm-formal}\em
+  \textbf{Definition 1.3}: An \textbf{accelerated Turing machine} is a \underline{Turing
+    machine} $M=\langle X,\Gamma,S,s_o,\Box,\delta\rangle$ working with with a
+  \underline{computational time structure} $T= \tup{\{t_i\}_i,<,+}$ with
+  $T\subseteq\mathbb{Q_{+}}$ ($\mathbb{Q_{+}}$ is the set of non-negative rationals) such
+  that $\sum_{i\in \mathbb{N}} t_i <\infty$.
+\end{labeledquote}
+\end{example}
+
+\begin{oldpart}{from bluenote, not sure what to do with this -- see ednote}
+In \cite{CalStaiKoh:natms12} we are undertaking the experiment to
+flexiformalize \cite{CalStai:natm09}, and in
+this we have to disambiguate the concept of an accelerated Turing machine. 
+\ednote{MI@MK: couldn't find the paper referenced, link seems broken, couldn't find repo
+on strawberry}
+\end{oldpart}
+
+In the example above the papers local definition of an ``accelerated Turing machine'' (ATM) 
+is slightly different from the one referenced in the literature.
+Specifically, it's a special case where the computation time structure is $2^{-n}$. 
+Therefore, treating it as an import is not adequate as the two concepts have clearly distinguishable semantics.
+
+We have two choices for grounding a concept $c$ that occurs in recap :
+\begin{compactenum}
+\item either we find the ``real definition'' from the literature and use this as the ``origin'' or
+\item we consider the introductory sentence in the recap (e.g. \ref{lq:atm} for ATMs in example \ref{ex:atm}) 
+as a definition and use this as the ``origin''.
+\end{compactenum}
+Which one is better depends on the situation.
+If the notion used in the paper is exactly the same as in the literature, we
+should probably choose 1. and just import the respective theory\footnote{Assuming that the
+paper in question has already been flexiformalized} so that we can reference it. In this case 
+the local ``definition'' (e.g. \ref{lq:atm} in example \ref{ex:ex:atm})of $c$ becomes a ``verbalization'' \ednote{introduce verbalizations in preliminaries}
+of the original definition in the literature. 
+
+The advantage of this choice is that the statements about $c$ made in the paper
+are directly compatible with any others that also reference this definition of $c$. This
+has enormous advantages for knowledge management and knowledge access.
+
+If the local notion is in any way different (such as in the ATM example above) than 
+the one in the literature (e.g. a generalization or a special case), we are forced into the second possibility. 
+However, while two concepts are different, they are still related -- after all, there was a reason to consider 
+the technical differences so small that they can be neglected. 
+Therefore, we should establish formally the relation between the two variants and their containing theories.
+We call this relation an \emph{adoption} as the local notion adopts (part of) its meaning from the literature.
+In section \ref{sec:repr} we discuss how adoptions can be represented in theory graphs using imports and theory morphisms. 
+
+\ednote{MK@MK: more, e.g. about the distinction of notations and
+  differing concepts.}
+
+\subsection{Slides and Background Materials}
+
+A similar situation obtain with slides and background materials (lecture notes, books,
+encyclopedias), where the slides often have telegraphic versions of the real statements,
+which verbalize more rigorous definition. 
+
+The problem of all of these materials is that they are of a certain ``reference
+character'': For encyclopedias this is obvious as their primary purpose is to serve as
+reference works, even if they are not the documents that initially introduce the concepts
+in question. Textbooks and lecture notes are natural references as they are the places
+where students first encounter the concepts.  \ednote{MK: The solution for this is ADP and
+  transclusion}
--- a/adoptions/prel.tex
+++ b/adoptions/prel.tex
+\ednote{to introduce basic OMDoc notions (docs, theories, declarations, verbalizations, notation definitions, URIs), realms}
\ No newline at end of file
--- a/adoptions/workflows.tex
+++ b/adoptions/workflows.tex
+\begin{oldpart}{from bluenote}
+\subsection{\protect\sTeX}
+Now let us consider how we would efficiently write such theory graphs in practice. We
+develop a practical \sTeX-based solution for the course materials example from
+Section~\ref{sec:course}. Let us assume that we already have a \sTeX representation of the
+modular library on the right of Figure~\ref{fig:slides-library}, where the modules
+\cn{ZF*} introduce semantic macros for all the relevant symbols. When we develop a new
+course, we would like to take advantage of the existing semantic macro definitions since 
+\begin{inparaenum}[\em i\rm)] 
+\item they represent a sizable investment, and 
+\item we automatically ensure consistent notation by using them.
+\end{inparaenum}
+
+For the {\LaTeX} workflow this is very simple, we can just use the
+\lstinline|\importmodule| macro to import the module \cn{ZFset} into the new module
+\cn{SET}. Similarly, we can import module \cn{ZFops} into \cn{SETOPS}.
+
+But the generated OMDoc representations would be wrong: we would generate structures in
+the opposite direction instead of the views \cn{v_i}. Therefore we extend \sTeX modules
+package with a new macro \lstinline|\adoptmodule|, which is synonymous with
+\lstinline|\importmodule| for {\LaTeX} formatting, but whose \latexml binding generates
+the appropriate view \emph{and} generates the symbols with home theory \cn{SET} instead of
+their native \cn{ZFset} we say that the symbols are \textbf{adopted} in module
+\cn{SET}. Note that the situation is a bit subtle here; to see how let us consider
+recapping theory \cn{ZFops} in \cn{SETOPS}. Here we only want to adopt the symbols into
+\cn{SETOPS} that have not already been imported from \cn{SET} (after being adopted there
+from \cn{ZFset}). This behavior ensures that the course theories are self-contained and
+are interpreted by the intended views.\ednote{MK: \sTeX needs to generate the adopted
+  notation definitions as well. Maybe we should use \omdoc references here to save space.}
+
+But there is another subtlety: views are not allowed inside theories (the \omdoc
+equivalent of \sTeX modules), so we have to wait for the end of the module and deposit the
+view there.
+
+With \lstinline|\adoptmodule| we can also handle the accelerate Turing machines paper. 
+
+\subsection{Services}
+\begin{todolist}{expand}
+  \item auto-generation of recap material from content commons
+  \item need a community process of moving ``original theory'' towards encyclopedias.
+\end{todolist}
+\end{oldpart}
--- a/flatsearch/conc.tex
+++ b/flatsearch/conc.tex
+
+We have presented an extension of a formula search search engine to modular formal
+libraries. In our application case, most of the services that are needed for this
+extension:
+\begin{inparaenum}[\em i)]
+\item compositional naming of induced statements,
+\item flattening,
+\item transformation of formulae to content MathML, and 
+\item presentation of arbitrary objects
+\end{inparaenum}
+already came with the {\mmt} system. Any other system (e.g. the Isabelle
+system~\cite{Nipkow-Paulson-Wenzel:2002} for the Isabelle library~\cite{IsabelleKB:on})
+would have to implement similar functionality. Our experience with querying shows that the
+statements induced by views are by far more useful than those induced by inclusions,
+because (as in our graph in Figure~\ref{fig:elal-graph}) the interest is in inducing
+statements from the abstract part to concrete structures. As a consequence, {\flatsearch} 
+becomes more and more useful as the graph acquires more and more views.
+
+Currently, our {\flatsearch} is restricted to fully formal modular libraries, since flattening is
+only understood for those. With a notion of flattening of flexiformal representations
+(representations of mathematical knowledge at flexible levels of formality;
+see~\cite{KohKoh:tfndc11}) we could extend {\flatsearch} to modular flexiformalizations of
+traditional mathematical documents. Indeed Laubner's study of the first 35 pages of
+Bourbaki's Algebra mentioned in the introduction revealed an underlying theory graph with
+51 theory nodes and 107 theory morphisms of which 12 were views, but 63 had non-trivial
+assignments. Especially the last number shows the value of the {\flatsearch} method. 
+Indeed one of the problems readers face with the Bourbaki
+books (which are otherwise well-liked for their structured approach) is that particular
+mathematical structures and objects can only be understood, if one already knows all the
+material they depend on. One author even said that
+\begin{quote}\em
+  Bourbaki was a dinosaur, the head too far away from the tail. Explaining: [\ldots] You
+  could say ``Dieudonn\'e what is the result about so and so?'' and he would go to the
+  shelf and take down the book and open it to the right page. After Dieudonn\'e retired no
+  one was able to do this. So Bourbaki lost awareness of his own
+  body\rm\hfill\cite{Richter:nb:on}
+\end{quote}
+A flexiformalization of the Bourbaki books together with an extension of MMT that can deal
+with flattening of informal texts would go a long way to alleviate these problems.
+
+\flatsearch queries are currently restricted to pure unification queries, for a more
+expressive user experience, we plan to embed it into a more comprehensive mathematical
+query language, most probably starting with~\cite{Rabe:qlfml12}. 
+
+Finally note that our query method is similar in spirit to ``Semantic Web Queries'', where
+queries for RDF triples that are induced by a background ontology from explicitly
+represented RDF facts are possible. The inferencing involved in this corresponds to the
+flattening step in \flatsearch. However, DL-reasoners like Racer~\cite{HarMoe03:racer}
+answer queries by performing inferences on the fly, and do not have the efficiency of a
+search index at their disposal, and can only deal with relatively small A-boxes as a
+result. Current high-efficiency triple stores that do employ (database) indexing
+techniques only support a very restricted subset of ontologies (as a typical example,
+Virtuoso~\cite{OpenLinkVirtuoso:webpage} supports transitive closures of selected
+relations, but not a lot more).
+
+%%% Local Variables: 
+%%% mode: LaTeX
+%%% TeX-master: "all"
+%%% End: 
+
+% LocalWords:  miancu inparaenum Nipkow-Paulson-Wenzel elal-graph KohKoh ednote
+% LocalWords:  tfndc11 Laubner's flatsearch ldots Dieudonn rm hfill qlfml12 mmt
+% LocalWords:  conc.tex
--- a/flatsearch/flatsearch.png
+++ b/flatsearch/flatsearch.png
--- a/flatsearch/intro.tex
+++ b/flatsearch/intro.tex
+\ednote{MI@MK : I tried to reorganize this and intertwine comments but it might need rewriting}
+
+In the last two decades the way we access information has been completely transformed by
+search engines. If we want to know the birthdate of Barack Obama, the first hit of a Google
+query for ``obama birthdate'' leads us to a wikipedia page that has the information in the
+first line of text: August 4. 1961. But not all information can be directly queried: to
+the best of our knowledge there is no standard search engine that can directly answer the
+question \emph{who was US president when Barack Obama was born}, because the answer
+combines two pieces of information, Obamas birthdate and the fact that John F. Kennedy was
+in office in January 1961 until his assassination in November 1963. In fact, the question
+breaks one of the basic assumptions of search engines: i.e. that it is enough to return
+links to relevant web pages, which have the answer (as part of the text). Indeed, there is
+apparently no page on the wide web that explicitly answers our question (and if there
+were, we could easily find a question whose answer isn't represented). But of course, the
+answer is induced by knowledge explicit on the web. If we call the space of induced
+answers the \textbf{knowledge space}, then we would like to (but currently cannot) search
+the knowledge space induced by the web. Humans naturally build up a knowledge space from
+the facts they learn, and can answer queries based on it, but are -- compared to search
+engines -- severely limited in the breadth of information they can process.
+
+
+In this paper we explore the notion of \emph{knowledge space} in formal theory graphs and 
+discuss how to access or generate induced statements (i.e. statements that are in the knowledge 
+space but not in the original theory graph). We then use it as a basis for building math-literate \cite{math-literate}
+MKM services.
+
+We note that 
+\begin{compactenum} 
+\item if the representation framework of the modular library
+provides a systematic naming scheme for induced statements and
+\item if the library system implements 
+a flattening operation that generates some or all induced statements,        
+\end{compactenum}
+then we can use the flattened theory graph as a basis for semantic services, enabling them to work on the knowledge space.
+
+Concretely, we use the {\mmt} \cite{MMT} language and system, which has a well developed URI scheme that already can 
+refer to induced statements. \ednote{maybe partially for realms}
+We discuss induced statements and implement flattening for {\mmt} and apply it to the LATIN \cite{latin} library, which is a highly
+modular atlas of logics.
+Then, we use the {\mws}~\cite{ProKoh:mwssofse12} formula search engine on the flattened theory graph to provide a method for searching the mathematical knowledge space of them
+LATIN library.
+
+This paper is organized as follows. In section \ref{sec:relwork} we briefly review related work, then in section \ref{sec:prel} we briefly introduce 
+{\mws}, {\mmt} and the {\latin} atlas. In section \ref{sec:libraries} we discuss the problem of generating and accessing induced
+statements in {\mmt}. Then, in section \ref{sec:searching} we describe how {\mws} can be leveraged to search for induced statements.
+Finally, in section \ref{sec:concl} we discuss future work and conclude.
+
+
+\section{Related Work}
+\ednote{not sure where this fits, was in intro but it felt too long}
+There are search engines that attempt to search the knowledge space. For instance DBPedia
+extracts facts from Wikipedia, represents them as RDF triples, and can answer queries that
+can be expressed by chaining relations that occur as predicates of the triples; see
+e.g. ~\cite{AuLeh:whilic07} for details. Even though the RDF formalization of Wikipedia
+facts as well as the expressivity of queries is rather limited, it can answer (a suitable
+version of) the Obama president question above. 
+
+But the space of mathematical knowledge cannot suitably be expressed in RDF triples, since
+it contains mathematical formulae (see~\cite{Lange:PhD,} for a discussion), makes
+non-trivial use of quantifications and introduces concepts and vocabulary
+dynamically. A question that cannot be answered by mathematical search engines is the situation
+in example \ref{ex:intro}.
+\begin{example}\label{ex:intro}
+We have identified a structure $(\mathbb{S},\sharp)$ as an
+associative, unital, idempotent magma, and we are interested whether $\sharp$ is
+commutative. To find out, we would like to search for $\fbox{x}\sharp \fbox{y} = \fbox{y}
+\sharp \fbox{x}$ and find this as an instance of the fact that idempotent monoids are
+Abelian, which we proved in our Algebra 101 course, even though the statement of
+the theorem does not involve the operator $\sharp$.
+\end{example}
+
+In comparison with DBPedia,
+\begin{inparaenum}[a)] 
+\item we start out from fully formal knowledge, so that the need for (automated)
+  formalization  is alleviated, and
+\item we only allow for a controlled regime of inducing knowledge items in the knowledge
+  space: inheritance of statements via theory morphisms. 
+\end{inparaenum}
+We will see that in the context of mathematical knowledge, the second restriction is very
+natural (if the modularity framework is sufficiently strong), and the first restriction
+can be alleviated by automated semantization methods that are researched independently of
+the work presented in this paper. 
+
+This paper picks up ideas introduced in~\cite{Laubner:utgmm07} under the first author's
+supervision. Bastian Laubner's system was based on a partial formalization of Bourbaki's
+Algebra~\cite{Bourbaki:a74} in the OMDoc format~\cite{Kohlhase:OMDoc1.2}, which supported
+modular formalization in theory graphs, but did not provide naming conventions for induced
+statements and did not support flattening.  In the time since 2007, we have developed the
+{\mmt} format~\cite{RabKoh:WSMSML11} and system~\cite{Rabe:MMT}, which does, and what is
+more, we now have a {\mmt} theory graph of more than 800 modules: the {\latin} Logic
+Atlas~\cite{CodHorKoh:palai11}. As the atlas grows, it becomes more and more difficult to
+read the atlas, since most of the theories are specified by inclusion reference to other
+theories, and the actual statements are only reached by long and confusing walks through
+the graph. This problem can be solved by technology: incremental, in-place flattening in
+the {\mmt} front-end. At the same time, it becomes more and more difficult to avoid
+duplication and maintain a maximally shared structure, for this we need a search system
+that takes induced statements (these are the ones we want to share) into account.
--- a/flatsearch/libs.tex
+++ b/flatsearch/libs.tex
+Intuitively, the knowledge space of a mathematical library is formed of all the statements that can be proved in it so it
+is not possible to (easily) mechanically generate the entire knowledge space. 
+However, in the context of theory graphs a substantial amount of interesting knowledge can be generated
+by leveraging relations between theories (such as imports or theory morphisms).
+
+To better understand the situation in modular libraries, consider the theory graph in
+Figure~\ref{fig:elal-graph}. The right side of the graph introduces the elementary
+algebraic hierarchy building up algebraic structures step by step up to rings; the left
+side contains a construction of the integers. In this graph, the nodes are theories
+\footnote{We have left out the quantifiers for the variables $x$, $y$,
+  and $z$ from the axioms to reduce visual complexity. The always range over the
+  respective base set. Furthermore, all axioms are named; but we only state the names we
+  actually use in the examples.},
+the solid edges are imports and the dashed edges are views.
+\begin{figure}[ht]\centering
+\tikzinput{tikz/mmt/elalg-nat-graph}
+\caption{A {\mmt} Graph for Elementary Algebra}\label{fig:elal-graph}
+\end{figure}
+
+In {\mmt} theory graphs, theory morphisms can carry a name, and inherited constants can be
+disambiguated via the inclusion that induced them. An application of this is in the
+definition of the ring theory, which inherits all of its operators (and their axioms) via
+the two inclusions $\mathsf{m}$ (for the multiplicative operations) and $\mathsf{a}$ (for
+the additive operations). To complete the ring we only need to add the two distributivity
+axioms in the inherited operators $\mathsf{m}\kern-.1em/\kern-.2em\circ$ and
+$\mathsf{a}\kern-.1em/\kern-.2em\circ$.
+
+Furthermore, morphisms can carry an assignment which maps symbols and axioms from the
+source theory to terms in the target theory. We see this in the view $\mathsf{e}$ from
+\textsf{Monoid} to \textsf{NatArith}, which assigns $\mathbf{N}$ to the base set $G$,
+multiplication ($\cdot$) to $\circ$ and the number $1$ to the unit $e$. The document
+theory morphism property, $\mathsf{e}$ also contains proofs for all \textsf{Monoid} axioms
+in \textsf{NatArith}. 
+
+It is a special feature of {\mmt} that assignments can also map morphisms into the source
+theory to morphisms into the target theory. We use this to specify the morphism
+$\mathsf{c}$ modularly (in particular, this allows to re-use the proofs from $\mathsf{e}$
+and $\mathsf{c}$).
+
+Note that already in this small graph, there are a lot of induced statements. For
+instance, the associativity axiom is inherited in seven times (via inclusions; twice into
+\textsf{Ring}) and induced four times (via views; twice each into \textsf{NatArith} and
+\textsf{IntArith}). All in all, we have more than an hundred induced statements from the
+axioms alone. If we assume just 5 theorems proven per theory (a rather conservative
+estimation), then we obtain a number of induced statements that is an order of magnitude
+higher.
+
+
+Another crucial ingredient of {\mmt} for the endeavor of searching for induced statements
+is the fact that, as discussed in section \ref{sec:mmt}, {\mmt} supplies names (called
+{\mmt} URIs) for all induced statements.  For example, if we take $u$ to be the URI of the
+theory graph in Figure~\ref{fig:elal-graph}, the statement $\forall x,y,z :
+\mathbb{Z}.(x+y)+z = x+(y+z)$ induced by the view $\mathsf{c}$ in \textsf{IntArith} has
+the {\mmt} URI $u$?\textsf{IntArith}?\textsf{c}/\textsf{g}/\textsf{assoc}. Note that given
+a theory graph, the {\mmt} system can dereference valid {\mmt} URIs and return the induced
+statements, even though they are not explicitly represented in the graph.
+
+\paragraph{Generating Induced Statements}
+
+We already hinted in section \ref{sec:intro} that generating the induced statements is a \emph{theory graph operation}
+i.e. it takes a theory graph as input and returns a different one, specifically the flattened theory graph. 
+
+An \emph{enriching theory graph operation} is an operation that adds statements or theories to the original theory graph. 
+Effectively enriching theory graph operations generate induced statements and thus make a part of the knowledge space explicit 
+so that it can be used by applications. 
+Theory graph flattening is an \emph{enriching theory graph operation}.
+
+Following the algebraic structures example above we can formally define the process of flattening a theory (or
+a theory graph).
+\begin{definition}
+Given a theory graph $\gamma$ the flattening of a theory $T$ in $\gamma$ is a theory $\overline{T}$ with the same URI as $T$ containing:
+\begin{itemize}
+\item all symbol declarations that are in $T$.
+\item all symbol declarations that are imported into $T$.
+\item for every view $v : S \rightarrow T$ the projection of every $S$-based declaration
+  over view $v$. Here, by $S$-based declaration we refer to the declarations in $S$ and in
+  theories that import $S$.
+\end{itemize}
+\end{definition}
+
+The URIs of the induced declarations are based on the definition of {\mmt} URIs from
+section \ref{sec:mmt} (see also the example above) and permit recovering the origin of
+induced declarations.  Specifically, symbol declarations from $T$ or imported in $T$ by
+the meta or an include preserve their name while symbol declarations imported in $T$ by a
+structure or induced by a view are additionally qualified with the name of that structure
+or view.
+
+
+\begin{definition}
+  The flattening of theory graph $\gamma$ is a theory graph $\overline{\gamma}$ with the
+  same URI as $\gamma$ containing the following module declarations :
+ \begin{itemize}
+  \item for every theory $T$ in $\gamma$ the theory $\overline{T}$ 
+  \item every view $v : S \rightarrow T$ in $\gamma$
+ \end{itemize}
+\end{definition}
+
+Note that, since theory flattening preserves theory URIs and doesn't add new axioms (only
+new theorems) any view from $S$ to $T$ is also a view from $\overline{S}$ to
+$\overline{T}$ so the views in $\overline{\gamma}$ are still valid.
+
+
+An important observation is that the flattening operation for theory graphs is not
+idempotent and flattening can be applied repeatedly to a theory graph to generate new
+induced symbol declarations. Effectively this generates symbol declarations induced not
+only by one view but by a chain (composition) of views.  More precisely, a theory graph
+flattened $n$ times will all contain declarations induced by view chains of length at most
+$n$.  Clearly, if there is a view cycle (e.g. $v_1 : S \rightarrow T$ and $v_2 : T
+\rightarrow S$) this process can continue indefinitely yielding ever bigger content
+libraries.
+
+
+\paragraph{Induced statements in the {\latin} library}
+\ednote{MI@MI: regenerate statistics}
+We implemented library flattening as described above in {\mmt} and tested it on the
+{\latin} library.  The flattening (once) of the {\latin} library increases the number of
+declarations from 2310 to 58847 (a factor of 25.4) and the total size of the library from
+123.9 MB to 1.8 GB (a factor of 14.8).  As expected, the multiplication factor depends on
+the level of modularity of the library. For instance, the highly modular math sub-library
+containing mainly algebraic structures increases from 2.3 MB to 79 MB thus having a
+multiplication factor of 34.3, more than double the library average.  The size of the MWS
+harvests also increases considerably, from 25.2 MB to 539.0 MB.
+
+%%% Local Variables: 
+%%% mode: LaTeX
+%%% TeX-master: "all"
+%%% End: 
+
+% LocalWords:  miancu elal-graph tikzinput tikz mmt mathsf mathsf circ textsf
+% LocalWords:  textsf NatArith mathbf cdot IntArith rightarrow forall mathbb
+% LocalWords:  assoc
--- a/flatsearch/local.sty
+++ b/flatsearch/local.sty
+%systems
+\newcommand{\mws}{{\scshape{MathWebSearch}}\xspace}
+\newcommand{\mmt}{{\scshape{Mmt}}\xspace}
+\newcommand{\latin}{{\scshape{LATIN}}\xspace}
+\newcommand{\openmath}{{\scshape{OpenMath}}\xspace}
+\newcommand{\omdoc}{{\scshape{OMDoc}}\xspace}
+
+%BNF grammar
+\newcommand{\alt}{\;{\color{red}|}\;}
+\newcommand{\bbc}{{\color{red}::=}}
+\newcommand{\rep}{{\color{red}\ast}}
+\newcommand{\opt}[1]{{\color{red}[}#1{\color{red}]}}
+
--- a/flatsearch/paper-blx.bib
+++ b/flatsearch/paper-blx.bib
+@Comment{$ biblatex control file $}
+@Comment{$ biblatex version 2.4 $}
+Do not modify this file!
+
+This is an auxiliary file used by the 'biblatex' package.
+This file may safely be deleted. It will be recreated as
+required.
+
+@Control{biblatex-control,
+  options = {2.4:0:0:1:0:0:1:1:0:1:0:0:12:1:3:1:79:+},
+}
--- a/flatsearch/paper.pdf
+++ b/flatsearch/paper.pdf
--- a/flatsearch/paper.tex
+++ b/flatsearch/paper.tex
+\documentclass{llncs}
+\usepackage{url,amstext,amssymb,textcmds}
+\usepackage{tikz,standalone}
+\usepackage{labeledquote,xspace}
+\usetikzlibrary{shapes,arrows}
+\usepackage{modules,presentation,paralist}
+\def\gray{\textcolor{gray}}
+\usepackage[show]{ed}
+\usepackage{paralist}
+
+%\usepackage{hyperref}
+%\hypersetup{linkcolor=red,citecolor=blue,urlcolor=gray,colorlinks,breaklinks,bookmarksopen,bookmarksnumbered}
+
+
+%\usepackage[eso-foot,today]{svninfo}
+\usepackage[final]{svninfo}
+\svnInfo $Id: paper.tex 1222 2014-01-23 08:15:46Z kohlhase $
+\svnKeyword $HeadURL: https://svn.mathweb.org/repos/mws/doc/papers/mir12/paper.tex $
+\usepackage[linkcolor=black]{hyperref}
+
+\usepackage[hyperref=auto,style=alphabetic,citestyle=alphabetic,isbn=false,backend=bibtex]{biblatex}
+\bibliography{pub_rabe,systems,kwarcpubs,extpubs,kwarccrossrefs,extcrossrefs}
+\usepackage{local}
+
+
+\title{Math Literate Knowledge Management}
+\author{Michael Kohlhase, Mihnea Iancu\\
+}
+
+\institute{Computer Science, Jacobs University, Bremen, Germany \\ \email{initial.last@jacobs-university.de}}
+\newcommand\tikzinput[2][]{\input{#2}}
+\def\qvar#1{\fbox{#1}}
+\def\llquote#1{\textlangle\kern-.2em\textlangle #1\textrangle\kern-.2em\textrangle}
+\def\flatsearch{\textsc{FlatSearch}\xspace}
+
+
+\begin{document}
+
+\maketitle
+
+\begin{abstract}
+  
+  Theory graphs enable modular representations of mathematical knowledge 
+  that capture the theory-level semantics in a concise way.
+  However, this means that a substantial amount of knowledge can be induced 
+  from the theory graph without having to be explicitly represented in it.
+  This is good for knowledge management as the number of induced 
+  (i.e. not explicitly represented) statements can grow exponentially in
+  the explicitly represented ones. But this bad for accessibility, since conventional
+  (computer supported) access methods only work on explicitly represented materials.
+  
+  In this paper we discuss how induced knowledge arises in highly modular, 
+  theory-graph based, mathematical libraries and establish how to access 
+  it to make it available for applications. We present the {\flatsearch} 
+  extension of {\mws} system and show this access method in action.
+  
+\end{abstract}
+
+
+\section{Introduction}\label{sec:intro}
+\input{intro}
+
+\section{Preliminaries}\label{sec:prel}
+\input{prel}
+
+\section{Induced Statements in Theory Graphs}\label{sec:libraries}
+\input{libs}
+
+\section{Accessing Induced Statements}\label{sec:searching}
+\input{search}
+
+\section{Conclusion and Future Work}\label{sec:concl}
+\input{conc}
+
+%\bibliographystyle{plain}
+
+\printbibliography
+\end{document}
--- a/flatsearch/prel.tex
+++ b/flatsearch/prel.tex
+\begin{oldpart}{from old version, should introduce realms and de-emphasize {\mws}}
+\subsection{{\mws}}\label{sec:mws}
+{\mws} (MWS)\cite{KohPro:MWSmanual} is an open-source, open-format, content-oriented search engine for mathematical expressions.
+%We will only give a brief overview of {\mws} here and refer to  for details.
+
+The MWS system consists of three components: 
+\begin{itemize}
+\item A \emph{crawler subsystem} collects data from a mathematical knowledge repository
+  and converts the mathematical expressions into \emph{{\mws} harvests}. The harvests are
+  files containing mathematical expressions encoded in a MWS-specific XML-based format
+  (see \cite{KohPro:MWSmanual} for details).
+\item The \emph{core system} builds the search index from MWS harvests and processes
+  search queries.
+\item The \emph{RESTful interface} provides a public HTTP API for interacting with the
+  core system (i.e. submitting queries and receiving results).
+\end{itemize}
+
+Therefore, with respect to {\mws}, our {\flatsearch} approach involves providing a MWS crawler that generates 
+MWS harvests from flattened {\mmt} libraries and a MWS frontent for submitting queries that will interact with the MWS RESTful interface.
+
+\subsection{MMT}\label{sec:mmt}
+{\mmt} \cite{RK:mmt:10} is a generic, formal module system for mathematical knowledge. The {\mmt} language is
+designed to be applicable to a large collection of declarative formal base languages and
+all {\mmt} notions are fully abstract in the choice of the base language.
+%\ednote{rephrase}
+
+We will only give a brief introduction to {\mmt} here and then discuss the concepts using
+examples in section \ref{sec:libraries}. We refer to \cite{RK:mmt:10} for further details.
+
+The central notion is that of a \textbf{theory graph} containing \emph{theories} and
+\emph{views} (morphisms between theories).  Note that {\mmt} libraries actually contain
+(possibly nested) collections of documents which in turn contain modules but we omit this
+here for simplicity and focus on theory graphs instead.
+
+Theories $S$ are formed from a set of typed symbols and axioms describing their
+properties.  Views $v : S \rightarrow T$ are morphisms between theories which map axioms
+of the source theory ($S$) to theorems of the target theory ($T$).  This property ensures
+that all theorems of the source theory induce theorems of the target theory and induces a
+homomorphic translation from $S$-terms to $T$-terms by replacing every occurrence of an
+$S$-axiom with is corresponding $T$-theorem.
+
+
+In addition to views, the module level structure in {\mmt} theory graphs is given by
+theory inheritance. The most general kind of inheritance in {\mmt} is represented by
+\emph{structures} which are (possibly) partial named imports (and defined using theory
+morphisms). \emph{Includes} are trivial structures which are unnamed and total while
+\emph{metas} are distinguished includes representing the meta-theory (each theory can have
+at most one meta-theory).
+
+Every {\mmt} declaration is identified by a canonical, globally unique URI.  Theories and
+views can be referenced relative to the URI $U$ of the theory graph that contains them by
+$U$?\llquote{theory-name} and $U$?\llquote{view-name}, respectively.
+
+Symbol declarations can be referenced relative to the URI of their containing theory $T$
+by $T$?\llquote{symbol-name}, where \llquote{symbol-name} is of the form
+\llquote{$m_1$}/\ldots\llquote{$m_n$}/\llquote{const}, where the $m_i$ are the names of
+morphisms inducing the symbol \llquote{const} into $T$.  Similarly, assignment
+declarations can be referenced relative to the URI of their containing view $v$ by
+$v$?\llquote{symbol-name}.
+
+
+% Intuitively, a theory morphism formalizes a translation between two formal languages.
+% For example, the inclusion from the theory of semigroups to the theory of monoids (which
+% extends the former with two declarations for the unit element and the neutrality axiom)
+% can be formalized as a theory morphism.  More complex examples are the G\"odel-Gentzen
+% negative translation from classical to intuitionistic logic or the interpretation of
+% higher-order logic in set theory.
+
+
+% Every {\mmt} declaration is identified by a canonical, globally unique URI. In
+% particular, the URIs of symbol and assignment declarations are of the form $\identhy?c$
+% and $\idenview?c$.
+
+% {\mmt} symbol declarations subsume most semantically relevant statements in declarative
+% mathematical languages including function and predicate symbols, type and universe
+% symbols, and --- using the Curry-Howard correspondence --- axioms, theorems, and
+% inference rules. Their syntax and semantics is determined by the foundation, in which
+% {\mmt} is parametric. In particular, the validity of a theory graph is defined relative
+% to a type system provided by the \textbf{foundation}.
+
+
+
+The \mmt system provides an API to the {\mmt} data structures described above and the \mmt
+implementation \cite{project:mmt,RK:mmt:10} provides a Scala-based \cite{scala} open
+source implementation of the {\mmt} API. The \mmt system can also generate MWS harvests
+from {\mmt} libraries so, from the perspective of {\mws}, it can be seen as a MWS
+crawler. Therefore, with respect to {\mmt} our {\flatsearch} approach involves generating
+the flattened version on a library and then exporting it as MWS harvests.
+
+
+\subsection{{\latin}}\label{sec:latin}
+The \latin project (\textbf{L}ogic \textbf{At}las and \textbf{In}tegrator)
+\cite{project:latin,LATIN:url} aims at developing tools for interfacing logics and proof
+systems. It focuses on developing a knowledge representation framework that is
+foundationally unconstrained and thus allows for the representation of most meta-theoretic
+foundations of mathematical knowledge in the same format.
+
+The \latin logic graph is built as an {\mmt} library and already contains work related to
+first-order logic (TPTP \cite{tptp}, CASL \cite{CaslTCS03,CASL-RM} and Mizar
+\cite{mizar,UrbanBan:pem06}), higher-order logic (PVS \cite{OwRu92}, Isabelle/HOL
+\cite{Nipkow-Paulson-Wenzel:2002} and HasCASL \cite{SchMos:htisdfp02}), logical frameworks
+(LF \cite{Pfenning91} and Isabelle \cite{isabelle-isar}) and mathematics (set theory,
+natural and rational numbers, algebraic structures and others). Furthermore, whenever
+possible, these representations are structured and related using logic morphisms.
+
+The \latin library currently contains 449 theories and 382 views between them.  The
+theories themselves contain a total of 2310 symbol declarations and 1072 direct imports
+(including metas) resulting in an average of 5.14 declarations and 2.39 direct imports per
+theory.  This amounts to a size of 123.9 MB in its native OMDoc format and 25.2 MB when
+converted to MWS harvests.
+\end{oldpart}
+
+%Due to it's size, diversity and high level of modularity the \latin logic graph .
+ 
+
+%%% Local Variables: 
+%%% mode: LaTeX
+%%% TeX-master: "all"
+%%% End: 
+
+% LocalWords:  miancu mws mws KohPro MWSmanual emph mmt ednote textbf mapsto
+% LocalWords:  rightarrow llquote ldots odel-Gentzen identhy idenview ogic tptp
+% LocalWords:  tegrator CaslTCS03 pem06 Nipkow-Paulson-Wenzel htisdfp02
+% LocalWords:  Pfenning91 isabelle-isar
--- a/flatsearch/search.tex
+++ b/flatsearch/search.tex
+
+To search for induced statements, we use our {\mws} system~\cite{ProKoh:mwssofse12},
+which indexes formula-URL pairs and provides a web interface querying the formula index
+via unification. This can be used for 
+\begin{compactdesc}
+\item[Instance Search] e.g. to find all instance of associativity we can issue the query
+  $\forall x,y,z : \qvar{S}.(x\qvar{op} y) \qvar{op} z = x \qvar{op} (y\qvar{op} z)$,
+  where the $\qvar{-}$ are query variables that can be instantiated in the query. In the
+  library from Figure~\ref{fig:elal-graph} we would find the commutativity axiom
+  \textsf{SemiGrp}/\textsf{assoc}, its directly inherited versions in \textsf{Monoid}, to
+  \textsf{Ring} and in particular the version
+  $u$?\textsf{IntArith}?\textsf{c}/\textsf{g}/\textsf{assoc}.
+\item[Applicable Theorem Search] where universal variables in the index can be
+  instantiated as well; this was introduced for a non-modular formal libraries
+  in~\cite{IKRU:mizar:11}. Here we could search for $3+4=\qvar{R}$ and find the induced
+  statement $u$?\textsf{IntArith}?\textsf{c}/\textsf{comm} with the substitution $R\mapsto
+  4+3$, which allows the user to instantiate the query and obtain the equation $3+4=4+3$
+  together with the justification $u$?\textsf{IntArith}?\textsf{c}/\textsf{comm} that can
+  directly be used in a proof.
+\end{compactdesc}
+The implementation of a web service that conducts such searches is very simple: instead of
+harvesting formulae directly from a formal digital library directly as
+in~\cite{IKRU:mizar:11}, we flatten the library first, and then harvest
+formulae. Conveniently,{\mmt} flattening gives the included constants and axioms local names
+that are syntactically identical to the respective symbol paths in {\mmt} URIs, so that the
+generation of {\mmt} URIs for the formula harvests is trivial.
+
+The only non-trivial part of the implementation is the search human-oriented front-end,
+i.e. the input of search queries and the presentation of search results. The {\latin}
+atlas is written in an extension of the TWELF encoding~\cite{RS:twelfmod:09} of
+LF~\cite{HarperEtAl:affdl93}, so it is natural to use an extension of LF notation with
+query variables for input. Therefore, we use the {\mmt} notation language and
+interpretation service described in \cite{IR:ui:12} to transform LF-style input into
+{\mmt} objects and subsequently to MWS queries.
+
+
+The presentation of the {\mmt} URIs requires some work as well: while the {\mmt} system
+can directly dereference the {\mmt} URI and thus be used to present the induced statement,
+humans want a justification that is more understandable than a {\mmt} URI. Fortunately,
+this can be generated from the {\mmt} URI by a simple template-based algorithm.
+
+Let us consider the search result
+$u$?\textsf{IntArith}?\textsf{c}/\textsf{g}/\textsf{assoc} from the instantiation search
+above, where we take $u$ to be \verb|http://cds.omdoc.org/cds/elal|. The first step is to
+localize the result in the theory $u$?\textsf{IntArith} with the sentence
+\begin{labeledquote}\label{lq:theory}\sf
+  Induced statement $\forall x,y,z : \mathbb{Z}.(x+y)+z = x+(y+z)$ found in
+  \underline{\texttt{http://cds.omdoc.org/cds/elal?IntArith}} (\underline{subst},
+  \underline{justification}).
+\end{labeledquote}
+Here the underlined fragments carries hyperlinks, the second pointing to the
+justification:
+\begin{labeledquote}\sf\label{lq:view}
+  \underline{\textsf{IntArith}} is a \underline{\textsf{CGroup}} if we interpret $\circ$
+  as + and $G$ as $\mathbb{Z}$.
+\end{labeledquote}
+which can be directly from the information associated to the morphism $\mathsf{c}$ in the
+{\mmt} URI. Then we skip over $\mathsf{g}$, since its assignment is trivial and generate the
+sentence.
+\begin{labeledquote}\sf\label{lq:inclusions}
+  \underline{\textsf{CGroup}}s are \underline{\textsf{SemiGrp}}s \underline{by
+    construction}
+\end{labeledquote}
+and finally we ground the explanation by  the sentence
+\begin{labeledquote}\sf\label{lq:ground}
+  In \underline{\textsf{SemiGrp}}s we have the axiom \underline{$\mathsf{assoc}: \forall
+    x,y,z : G . (x\circ y)\circ z=x\circ (y\circ z)$}
+\end{labeledquote}
+The sentences (\ref{lq:theory}) to (\ref{lq:ground}) can be generated from templates,
+since the {\mmt} system gives access to the necessary information: source and target theory
+as well as the assignment $\psi'$ for (\ref{lq:view}), the fact that the path from
+\textsf{SemiGrp} to \textsf{CGroup} only consists of inclusion that triggers the template
+for (\ref{lq:inclusions}) and the original formulation of the axiom $\mathsf{assoc}$.
+
+\begin{figure}
+\begin{center}
+\includegraphics[width=\textwidth]{flatsearch}
+\end{center}
+\caption{The {\flatsearch} Web Interface for {\latin}}\label{fig:flatsearch}
+\end{figure}
+
+Note that we make use of another peculiarity of the {\mmt} system in this explanation: all
+constants in the theory graph carry notation declarations~\cite{KMR:NoLMD08}, which can be
+used to generate human-readable presentations of arbitrary formal objects in the
+graph.
+
+
+%%% Local Variables: 
+%%% mode: LaTeX
+%%% TeX-master: "paper"
+%%% End: 
+
+% LocalWords:  miancu mws ProKoh mwsofse compactdesc forall qvar qvar qvar qvar
+% LocalWords:  qvar elal-graph textsf SemiGrp textsf assoc textsf IntArith comm
+% LocalWords:  mapsto mmt affdl93 ednote labeledquote lq mathbb texttt subst
+% LocalWords:  circ mathsf