
\begin{newpart}{DM}

We have seen how a viewfinder can be used for theory \emph{discovery} and for finding constants with specific desired properties, but many other potential use cases are imaginable. The main challenges they pose lie less in the algorithm or software design than in the user interface.

The theory discovery use case described in Sec. \ref{sec:usecase} is mostly desirable in a setting where a user is actively writing or editing a theory, so the integration in jEdit is sensible. However, the across-library use case in Sec. \ref{sec:pvs} would already be much more useful in a theory exploration setting, such as when browsing available archives on MathHub \ednote{cite} or in the graph viewer integrated in \mmt\ednote{cite}. Additional specialized user interfaces would enable or improve the following use cases:

\end{newpart}

\begin{itemize}


\item \textbf{Model-/Countermodel Finding:} If the codomain of a view is a theory representing a specific model, the view would tell Jane that those are \emph{examples} of her abstract theory.

Furthermore, partial views -- especially those that are total on some included theory -- could

be insightful \emph{counterexamples}.


\item \textbf{Library Refactoring:} Given that the Viewfinder looks for \emph{partial} views, we can use it to find natural extensions of a starting theory. Imagine Jane removing the last of her axioms for ``beautiful sets'' -- the remaining axioms (disregarding the finitude of her sets) would allow her to find e.g. both Matroids and \emph{Ideals}, which would suggest refactoring her library accordingly.


If such a view is total on her theory, the theory could even be refactored as an extension of the codomain, which would allow her to use all theorems and definitions therein.


\item \textbf{Theory Generalization:} If we additionally consider views into and out of the theories found, this can make theory discovery even more attractive. For example, a view from a theory of vector spaces into matroids could additionally inform Jane that her beautiful sets, being matroids, form a generalization of the notion of linear independence in linear algebra.


\item \textbf{Folklore-based Conjecturing:} If we were to keep track of our transformations during preprocessing and normalization, we could use the found views to translate terms both into the codomain as well as back from there into our starting theory. This would allow e.g. for discovering and importing theorems and useful definitions from some other library.


A useful interface might specifically prioritize views into theories on top of which many theorems and definitions have been discovered.


\end{itemize}

For some of these use cases it would be advantageous to look for views \emph{into} our working theory instead.



Note that even though the algorithm is in principle symmetric, some aspects often depend on the direction -- e.g. how we preprocess the theories, which constants we use as starting points or how we treat and evaluate the resulting (partial) views (see Sections \ref{sec:algparams} and \ref{sec:preproc}).

\caption{Theory Classification for beautiful sets}\label{fig:theory-classification-ex}

\end{figure}

\end{oldpart}

\paragraph{Related Work}

Existing systems have so far only worked with explicitly given theory morphisms, e.g., in IMPS \cite{imps} or Isabelle \cite{isabelle}.


The algorithm tries to match two symbols by unifying their types.

This is made efficient by separating the term into a hashed representation of its abstract syntax tree (which serves as a fast plausibility check for pre-selecting matching candidates) and the list of symbol occurrences in the term, into which the algorithm recurses.
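This separation can be sketched as follows; the sketch is an illustrative simplification with hypothetical names, not the actual MMT implementation:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sym:            # reference to a constant
    name: str

@dataclass(frozen=True)
class App:            # application of a head term to a list of arguments
    head: object
    args: tuple

def encode(term):
    """Split a term into a hashable structural skeleton -- with every
    symbol reference replaced by a numbered slot -- and the ordered
    list of symbol occurrences that were cut out."""
    symbols = []
    def go(t):
        if isinstance(t, Sym):
            symbols.append(t.name)
            return ("slot", len(symbols) - 1)
        if isinstance(t, App):
            return ("app", go(t.head), tuple(go(a) for a in t.args))
        return t  # variables etc. stay in the skeleton
    return go(term), symbols

# plus(x, one) and times(x, one) have equal skeletons, so they are cheap
# pre-selected candidates; only then are the symbol lists compared.
s1, c1 = encode(App(Sym("plus"),  ("x", Sym("one"))))
s2, c2 = encode(App(Sym("times"), ("x", Sym("one"))))
assert s1 == s2 and c1 != c2
```

The skeleton serves as the fast, hashable plausibility check; the symbol list is what the algorithm then recurses into.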


Secondly, we apply this view finder in two concrete case studies: In the first, we start with an abstract theory and try to figure out if it already exists in the same library. In the second example, we write down a simple theory of commutative operators in one language to find all commutative operators in another library based on a different language.

\paragraph{Overview}

In Section~\ref{sec:prelim}, we revise the basics of MMT and, as examples, the representations of the PVS and HOL Light libraries.

We present a method for finding morphisms between formal theories, both within as well


as across libraries based on different logical foundations.

% These morphisms can yield both (more or less formal) \emph{alignments} between individual symbols as well as truth-preserving morphisms between whole theories.

As they induce new theorems in the target theory for any theorem of the source theory, theory morphisms are high-value elements of

a modular formal library. Usually, theory morphisms are manually encoded, but this

practice requires authors who are familiar with source and target theories at the same

time, which limits the scalability of the manual approach.


\section{Finding Theory Morphisms}\label{sec:viewfinder}

\input{viewfinder}


\section{Implementation \& Use Case}\label{sec:usecase}

\input{usecase}


\section{Low-hanging Fruit: Currently Unrealized Applications}


This property makes theory morphisms so valuable for structuring, refactoring, and integrating large corpora.

MMT achieves language-independence through the use of \textbf{meta-theories}: every MMT-theory may designate a previously defined theory as its meta-theory.


For example, when we represent the HOL Light library in MMT, we first write a theory $L$ for the logical primitives of HOL Light.

Then each theory in the HOL Light library is represented as a theory with $L$ as its meta-theory.

In fact, we usually go one step further: $L$ itself is a theory, whose meta-theory is a logical framework such as LF.

That allows $L$ to concisely define the syntax and inference system of HOL Light.
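This meta-theory chain can be sketched as follows (the \texttt{Theory} record and all names are ours, purely illustrative):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Theory:
    """A theory optionally designates a previously defined meta-theory."""
    name: str
    meta: Optional["Theory"] = None

LF       = Theory("LF")                  # logical framework at the top
L        = Theory("HOLLight", meta=LF)   # syntax/inference system of HOL Light
bool_thy = Theory("bool", meta=L)        # a theory from the HOL Light library

def meta_chain(t):
    """Follow the meta-theory chain up to the logical framework."""
    out = []
    while t is not None:
        out.append(t.name)
        t = t.meta
    return out

assert meta_chain(bool_thy) == ["bool", "HOLLight", "LF"]
```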


& Theory $\Sigma$& Morphism $\sigma:\Sigma\to\Sigma'$\\

\hline

set of & typed constant declarations $c:E$& assignments $c\mapsto E'$\\


$\Sigma$-expressions $E$& formed from $M$- and $\Sigma$-constants & mapped to $\Sigma'$-expressions\\

\hline

\end{tabular}

\end{center}


\item$a_i$ is an argument of $o$

\end{compactitem}

The bound variable context may be empty, and we write $\oma{o}{\vec{a}}$ instead of $\ombind{o}{\cdot}{\vec{a}}$.


\begin{newpart}{DM}

For example, the expression $\lambda x,y:\mathbb N.\;(x +\cn{one})\cdot(y+\cn{one})$ would be written as $\ombind{\lambda}{x:\mathbb N,y : \mathbb N}{\oma{\cdot}{\oma{+}{x,\cn{one}},\oma{+}{y,\cn{one}}}}$ instead.
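These two forms can be sketched as a small encoding; the constructor names mirror the notation above, but the representation is purely illustrative:

```python
def OMBIND(o, ctx, args):
    """Binding application: o binds the variable context ctx in args."""
    return (o, tuple(ctx), tuple(args))

def OMA(o, args):
    """Plain application o(a1, ..., an): binding application with empty context."""
    return OMBIND(o, [], args)

# lambda x,y : Nat . (x + one) * (y + one), as in the example above:
expr = OMBIND("lambda",
              [("x", "Nat"), ("y", "Nat")],
              [OMA("times", [OMA("plus", ["x", "one"]),
                             OMA("plus", ["y", "one"])])])
```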

%For example, the second axiom ``Every subset of a beautiful set is beautiful'' (i.e. the term $\forall s,t : \cn{set\ }X.\;\cn{beautiful}(s)\wedge t \subseteq s \Rightarrow \cn{beautiful}(t)$) would be written as

Finally, we remark on a few additional features of the MMT language that are important for large-scale case studies but not critical for understanding the basic intuitions and results.

MMT provides a module system that allows theories to instantiate and import each other. The module system is conservative: every theory can be elaborated into one that only declares constants.

MMT constants may carry an optional definiens, in which case we write $c:E=e$.

Defined constants can be eliminated by definition expansion.
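Definition expansion can be sketched as follows; the term representation and names are ours, and we assume definitions are acyclic:

```python
def expand(term, definiens):
    """Replace every defined constant by its definiens, recursively,
    until only undefined constants remain (assumes acyclic definitions).
    A term is a constant/variable name or a (head, args) pair."""
    if isinstance(term, str):                     # constant or variable
        if term in definiens:
            return expand(definiens[term], definiens)
        return term
    head, args = term                             # application node
    return (expand(head, definiens),
            tuple(expand(a, definiens) for a in args))

# with  two : Nat = plus(one, one)  we get:
defs = {"two": ("plus", ("one", "one"))}
assert expand(("times", ("two", "x")), defs) == \
       ("times", (("plus", ("one", "one")), "x"))
```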

%\end{center}

%\end{example


%\begin{oldpart}{FR: replaced with the above}

%For the purposes of this paper, we will work with the (only slightly simplified) grammar given in Figure \ref{fig:mmtgrammar}.

% \item Theories have a module name, an optional \emph{meta-theory} and a body consisting of \emph{includes} of other theories

% and a list of \emph{constant declarations}.

% \item Constant declarations have a name and two optional term components; a \emph{type} ($[:t]$), and a \emph{definition} ($[=t]$).

% \item Views $V : T_1 \to T_2$ have a module name, a domain theory $T_1$, a codomain theory $T_2$ and a body consisting of assignments

% $C = t$.

% \item Terms are either

% \begin{itemize}

% \item variables $x$,

% \item symbol references $T?C$ (referencing the constant $C$ in theory $T$),

% \item applications $\oma{f}{a_1,\ldots,a_n}$ of a term $f$ to a list of arguments $a_1,\ldots,a_n$ or

% \item binding application $\ombind{f}{x_1[:t_1][=d_1],\ldots,x_n[:t_n][=d_n]}{b}$, where $f$ \emph{binds} the variables

% $x_1,\ldots,x_n$ in the body $b$ (representing binders such as quantifiers, lambdas, dependent type constructors etc.).

% \end{itemize}

%\end{itemize}

%The term components of a constant in a theory $T$ may only contain symbol references to constants declared previously in $T$, or that are declared in some theory $T'$ (recursively) included in $T$ (or its meta-theory, which we consider an \emph{include} as well).

%We can eliminate all includes in a theory $T$ by simply copying over the constant declarations in the included theories; we call this process \emph{flattening}. We will often and without loss of generality assume a theory to be \emph{flat} for convenience.

%

%An assignment in a view $V:T_1\to T_2$ is syntactically well-formed if for any assignment $C=t$ contained, $C$ is a constant declared in the flattened domain $T_1$ and $t$ is a syntactically well-formed term in the codomain $T_2$. We call a view \emph{total} if all \emph{undefined} constants in the domain have a corresponding assignment and \emph{partial} otherwise.

%

%\end{oldpart}

\subsection{Proof Assistant Libraries in MMT}\label{sec:oaf}


\caption{The Theory of Matroids in the MitM Library}\label{fig:use:target}

\end{figure}


We have so far assumed one fixed meta-theory for all theories involved; we will now discuss the situation when looking for views between theories in different libraries (and built on different foundations).

Obviously, various differences in available foundational primitives and library-specific best practices and idiosyncrasies can prevent the algorithm from finding desired matches. There are two approaches to increasing the number of results in these cases:

\begin{itemize}

\item In many instances, the translation between two foundations is too complex to be discovered purely syntactically. In these cases we can provide arbitrary translations between theories, which are applied before computing the encoding.\ednote{Mention/cite alignment-translation paper}

\item We can do additional transformations before preprocessing theories, such as normalizing expressions, eliminating higher-order abstract syntax encodings or encoding-related redundant information (such as the type of a typed equality, which in the presence of subtyping can be different from the types of both sides of an equation), or elaborating abbreviations/definitions.

\end{itemize}

When elaborating definitions, it is important to consider that this may also reduce the number of results, if both theories use similar abbreviations for complex terms, or the same concept is declared axiomatically in one theory, but definitionally in the other. For that reason, we can allow \textbf{several abstract syntax trees for the same constant}, such as one with definitions expanded and one ``as is''.

Similarly, certain idiosyncrasies -- such as PVS's common usage of theory parameters -- call for not just matching symbol references, but also variables or possibly even complex expressions. To handle these situations, we additionally allow for \textbf{holes} in the constant lists of an abstract syntax tree, which may be unified with any other symbol or hole, but are not recursed into. The subterms that are to be considered holes can be marked as such during preprocessing.
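A minimal sketch of the hole mechanism, under our simplifying assumption that two terms' skeletons have already been found equal and only their symbol lists remain to be aligned:

```python
HOLE = object()   # marks a subterm cut out during preprocessing

def match(syms1, syms2):
    """Try to build a consistent symbol-to-symbol assignment; None on clash.
    Holes unify with any symbol (or hole) and are not recursed into."""
    assignment = {}
    for a, b in zip(syms1, syms2):
        if a is HOLE or b is HOLE:
            continue                      # holes match anything
        if assignment.setdefault(a, b) != b:
            return None                   # inconsistent assignment
    return assignment

# the hole absorbs e.g. a PVS theory parameter in the second position:
assert match(["plus", HOLE, "plus"], ["add", "T", "add"]) == {"plus": "add"}
assert match(["plus", "plus"], ["add", "mul"]) is None
```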

\subsection{Normalization}\label{sec:preproc}

The common logical framework used for all the libraries at our disposal -- namely LF and extensions thereof -- makes it easy to systematically normalize theories built on various logical foundations. We currently use the following approaches to preprocessing theories:

\begin{itemize}

\item Free variables in a term, often occurrences of theory parameters as e.g. used extensively in the PVS system, are replaced by holes.

\item For foundations that use product types, we curry function types $(A_1\times\ldots A_n)\to B$ to $A_1\to\ldots\to A_n\to B$. We treat lambda-expressions and applications accordingly.

\item Higher-order abstract syntax encodings are eliminated by raising atomic types, function types, applications and lambdas to the level of the logical framework. This eliminates (redundant) implicit arguments that only occur due to their formalization in the logical framework.

This has the advantage that possible differences between the types of the relevant subterms and implicit type arguments (e.g. in the presence of subtyping) do not negatively affect viewfinding.

\item We use the Curry-Howard correspondence to transform axioms and theorems of the form $\vdash(P\Rightarrow Q)$ to function types $\vdash P \to\vdash Q$. Analogously, we transform judgments of the form $\vdash\forall x : A.\;P$ to $\prod_{x:A}\vdash P$.

\item For classical logics, we afterwards rewrite all logical connectives using their usual definitions using negation and conjunction only. Double negations are eliminated.

\item Typed equalities are transformed to untyped ones, again getting rid of the redundant type argument of the equality.

\item The arguments of conjunctions and equalities are reordered (currently only by their number of subterms).

\end{itemize}
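To illustrate, the currying step above can be sketched on a toy tuple encoding of types (tags and names are ours; the corresponding treatment of lambdas and applications is omitted):

```python
def curry(ty):
    """Rewrite (A1 x ... x An) -> B into A1 -> ... -> An -> B, recursively."""
    if isinstance(ty, tuple) and ty[0] == "arrow":
        _, dom, cod = ty
        cod = curry(cod)
        if isinstance(dom, tuple) and dom[0] == "product":
            for a in reversed(dom[1:]):           # fold factors right-to-left
                cod = ("arrow", curry(a), cod)
            return cod
        return ("arrow", curry(dom), cod)
    return ty

# (A x B) -> C   becomes   A -> B -> C
assert curry(("arrow", ("product", "A", "B"), "C")) == \
       ("arrow", "A", ("arrow", "B", "C"))
```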

\subsection{Implementation}\label{sec:pvs}

\paragraph{} Using the above normalization methods, we can, for example, write down a theory for a commutative binary operator using the Math-in-the-Middle foundation, while targeting e.g. the PVS Prelude library -- allowing us to find all commutative operators, as in Figure \ref{fig:use:pvs}.

\begin{figure}[ht]\centering

\fbox{\includegraphics[width=\textwidth]{pvs}}

\caption{Searching for Commutative Operators in PVS}\label{fig:use:pvs}

\end{figure}


\ednote{8 results for NASA, but NASA doesn't work in jEdit because of limited memory}

This example also hints at a way to iteratively improve the results of the viewfinder: since we can find properties like commutativity and associativity, we can use them to inform a better normalization of the theory. This in turn would potentially allow for finding more views.