The theory discovery use case described in Sec. \ref{sec:usecase} is mostly desirable in a setting where a user is actively writing or editing a theory, so the integration in jEdit is sensible. However, the across-library use case in Sec. \ref{sec:pvs} already would be a lot more useful in a theory exploration setting, such as when browsing available archives on MathHub~\cite{mathhub} or in the graph viewer integrated in \mmt ~\cite{RupKohMue:fitgv17}. Additional specialized user interfaces would enable or improve the following use cases:

\begin{itemize}

\item\textbf{Model-/Countermodel Finding:} If the codomain of a view is a theory representing a specific model, it would tell her that those are \emph{examples} of her abstract theory.

Furthermore, partial views -- especially those that are total on some included theory -- could be insightful \emph{counterexamples}.

\item\textbf{Library Refactoring:} Given that the view finder looks for \emph{partial} views, we can use it to find natural extensions of a starting theory. Imagine Jane removing the last of her axioms for ``beautiful sets'' -- the other axioms (disregarding finitude of her sets) would allow her to find e.g. both Matroids and \emph{Ideals}, which would suggest that she refactor her library accordingly.

Additionally, \emph{surjective} partial views would inform her that her theory would probably better be refactored as an extension of the codomain, which would allow her to use all theorems and definitions therein.

\item\textbf{Theory Generalization:} If we additionally consider views into and out of the theories found, this can make theory discovery even more attractive. For example, a view from a theory of vector spaces into matroids could additionally inform Jane that her beautiful sets, being matroids, form a generalization of the notion of linear independence in linear algebra.

\item\textbf{Folklore-based Conjecture:} If we were to keep track of our transformations during preprocessing and normalization, we could use the found views for translating both into the codomain as well as back from there into our starting theory.

This would allow for e.g. discovering and importing theorems and useful definitions from some other library -- which on the basis of our encodings can be done directly by the view finder.

A useful interface might specifically prioritize views into theories on top of which there are many theorems and definitions that have been discovered.

\end{itemize}

For some of these use cases it would be advantageous to look for views \emph{into} our working theory instead.

Note that even though the algorithm is in principle symmetric, some aspects often depend on the direction -- e.g. how we preprocess the theories, which constants we use as starting points or how we aggregate and evaluate the resulting (partial) views (see Sections \ref{sec:algparams}, \ref{sec:normalizeintra} and \ref{sec:normalizeinter}).

We present a general MKM utility that given a \MMT theory and an \MMT library $\cL$ finds

partial and total views into $\cL$.

Such a view finder can be used to drive various MKM applications ranging from theory classification to library merging and refactoring.

We have presented the first and last of these and shown that they are feasible. For the applications discussed but unrealized in this paper, we mainly need to determine the right application context and user interface.

\paragraph{Future Work}

The current view finder is already efficient enough for the limited libraries we used for testing.

To increase efficiency, we plan to explore term indexing techniques~\cite{Graf:ti96} that support $1:n$ and even $n:m$ matching and unification queries.

The latter will be important for the library refactoring and merging applications which look for all possible (partial and total) views in one or between two libraries.

As such library-scale operations will have to be run together with theory flattening to a fixed point and re-run upon every addition to the library, it will be important to integrate them with the \MMT build system and change management processes~\cite{am:doceng10,iancu:msc}.

\paragraph{Related Work}

Existing systems have so far only worked with explicitly given views, e.g., in IMPS \cite{imps} or Isabelle \cite{isabelle}.

Automatically and systematically searching for new views was first undertaken in \cite{NorKoh:efnrsmk07} in 2006.

However, at that time no large corpora of formalized mathematics were available in standardized formats that would have allowed easily testing the ideas in large corpora.

This situation has changed since then as multiple such exports have become available.

In particular, we have developed the MMT language \cite{RK:mmt:10} and the concrete syntax of the OMDoc XML format \cite{omdoc} as a uniform representation language for such corpora.

And we have translated multiple proof assistant libraries into this format, including the ones of PVS in \cite{KMOR:pvs:17} and HOL Light in \cite{KR:hollight:14}.

Building on these developments, we are now able, for the first time, to apply generic methods --- i.e., methods that work at the MMT level --- to search for views in these libraries.

While inspired by the ideas of \cite{NorKoh:efnrsmk07}, our design and implementation are completely novel.

In particular, the theory makes use of the rigorous language-independent definitions of \emph{theory} and \emph{view} provided by MMT, and the practical implementation makes use of the MMT system, which provides high-level APIs for these concepts.

\cite{hol_isahol_matching} applies techniques related to ours to a related problem.

Instead of views inside a single corpus, they use machine learning to find similar constants in two different corpora.

Their results can roughly be seen as a single partial view from one corpus to the other.

\paragraph{Approach and Contribution}

Our contribution is twofold. Firstly, we present the design and implementation of a

generic view finder that works with arbitrary corpora represented in MMT. The

algorithm tries to match two symbols by unifying their types. This is made efficient by

separating the term into a hashed representation of its abstract syntax tree (which serves

as a fast plausibility check for pre-selecting matching candidates) and the list of symbol

occurrences in the term, into which the algorithm recurses.
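The separation described above can be sketched as follows. This is a minimal illustration with hypothetical data structures (constants as strings, applications as nested tuples), not the actual MMT implementation: a term is split into a hashed skeleton that serves as the fast plausibility check, and the list of constant occurrences into which the search recurses.

```python
# Minimal sketch of the preprocessing described above; the term encoding
# (nested tuples, constants as strings) is hypothetical, not the MMT API.

def skeleton(term, symbols):
    """Replace each constant occurrence by a placeholder C_i,
    collecting the occurrences in left-to-right order."""
    if isinstance(term, str):               # a constant occurrence
        symbols.append(term)
        return ("C", len(symbols))
    op, args = term                         # an application node
    return (op, tuple(skeleton(a, symbols) for a in args))

def encode(term):
    """Split a term into (hash, abstract syntax tree, symbol list)."""
    symbols = []
    tree = skeleton(term, symbols)
    return hash(tree), tree, symbols

# Two constants are matching candidates only if their hashes agree;
# the algorithm then recurses into the symbol lists.
h1, t1, s1 = encode(("forall", ("beautiful", ("and", ("x", "y")))))
h2, t2, s2 = encode(("forall", ("finite", ("and", ("x", "y")))))
assert h1 == h2 and t1 == t2            # plausible match
candidates = list(zip(s1, s2))          # pairs of constants to match up
```

The hash comparison is cheap and eliminates almost all candidate pairs before any recursion happens.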

Secondly, we apply this view finder in two concrete case studies: In the first, we start with an abstract theory and try to figure out if it already exists in the same library -- the use case mentioned above. In the second example, we write down a simple theory of commutative operators in one language to find all commutative operators in another library based on a different language.

\paragraph{Overview}

In Section~\ref{sec:prelim}, we review the basics of MMT and, as examples, the representations of the PVS and HOL Light libraries.

Intuitively, \mmt is a declarative language for theories and views over an arbitrary object language.

Its treatment of object languages is abstract enough to subsume most practically relevant logics and type theories.

Fig.~\ref{fig:mmt} gives an overview of the fundamental MMT concepts.

...

...

In the simplest case, \textbf{theories} $\Sigma$ are lists of \textbf{constant declarations} $c:E$.

Naturally, $E$ must be subject to some type system (which MMT is also parametric in), but the details of this are not critical for our purposes here.

We say that $\Sigma'$ includes $\Sigma$ if it contains every constant declaration of $\Sigma$.

Correspondingly, a \textbf{view} $\sigma:\Sigma\to\Sigma'$ is a list of \textbf{assignments} $c\mapsto e'$ of $\Sigma'$-expressions $e'$ to $\Sigma$-constants $c$.

To be well-typed, $\sigma$ must preserve typing, i.e., we must have $\vdash_{\Sigma'}e':\ov{\sigma}(E)$.

Here $\ov{\sigma}$ is the homomorphic extension of $\sigma$, i.e., the map of $\Sigma$-expressions to $\Sigma'$-expressions that substitutes every occurrence of a $\Sigma$-constant with the $\Sigma'$-expression assigned by $\sigma$.

We call $\sigma$ \textbf{simple} if the expressions $e'$ are always $\Sigma'$-\emph{constants} rather than complex expressions.

The type-preservation condition for an assignment $c\mapsto c'$ reduces to $\ov{\sigma}(E)=E'$ where $E$ and $E'$ are the types of $c$ and $c'$.

We call $\sigma$ \textbf{partial} if it does not contain an assignment for every $\Sigma$-constant.

A partial view from $\Sigma$ to $\Sigma'$ is the same as a total view from some theory included by $\Sigma$ to $\Sigma'$.
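As a small illustration, applying the homomorphic extension $\ov{\sigma}$ of a simple view amounts to a substitution on constants. The term encoding below is hypothetical (constants as strings, applications as tuples), not MMT's:

```python
# Sketch of applying (the homomorphic extension of) a simple view:
# substitute every constant with its assignment. The encoding is
# illustrative, not the MMT data structures.

def apply_view(sigma, term):
    if isinstance(term, str):             # a constant
        return sigma.get(term, term)      # partial view: unmapped constants stay
    op, args = term                       # an application node
    return (op, tuple(apply_view(sigma, a) for a in args))

sigma = {"beautiful": "finite", "set": "powerset"}
e = ("forall", ("set", ("apply", ("beautiful", "x"))))
result = apply_view(sigma, e)
# result == ("forall", ("powerset", ("apply", ("finite", "x"))))
```

Because the sketch leaves unmapped constants untouched, the same function applies both total and partial views.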

Importantly, we can then show generally at the MMT-level that if $\sigma$ is well-typed, then $\ov{\sigma}$ preserves all $\Sigma$-judgments.

In particular, if we represent proofs as typed terms, views preserve the theoremhood of propositions.

This property makes views so valuable for structuring, refactoring, and integrating large corpora.

MMT achieves language-independence through the use of \textbf{meta-theories}: every MMT-theory may designate a previously defined theory as its meta-theory.

For example, when we represent the HOL Light library in MMT, we first write a theory $L$ for the logical primitives of HOL Light.

...

...

\multicolumn{3}{|c|}{meta-theory: a fixed theory $M$}\\

%\hline

\hline

& Theory $\Sigma$ & View $\sigma:\Sigma\to\Sigma'$\\

\hline

set of & typed constant declarations $c:E$& assignments $c\mapsto E'$\\

$\Sigma$-expressions $E$& formed from $M$- and $\Sigma$-constants & mapped to $\Sigma'$ expressions \\

\end{itemize}

The normalizations mentioned in Section \ref{sec:normalizeintra} already suggest equating the involved logical primitives (such as logical connectives) via a meta-morphism.

Foundation-specific normalizations specifically for finding views \emph{across} libraries are to our knowledge an as-yet unexplored field of investigation. Every formal system has certain unique idiosyncrasies, best practices or widely used features; finding an ideal normalization method is a correspondingly difficult domain-specific problem.

We will discuss some of our findings specifically regarding the PVS library as a case study.

...

...

PVS~\cite{pvs} is a proof assistant under active development, based on a higher-order logic.

In practice, theory parameters are often used in PVS for the signature of an abstract theory. For example, the theory of groups \cn{group\_def} in the NASA library has three theory parameters $(\cn T,\ast,\cn{one})$ for the signature, and includes the theory \cn{monoid\_def} with the same parameters; the axioms for a group are then formalized as a predicate on the theory parameters.

Given that the same practice is used in few other systems (if any), searching for views without treating theory parameters in some way will not give us any useful results on these theories. We offer three approaches to handling these situations:

\begin{enumerate}

\item\emph{Simple treatment:} We can interpret references to theory parameters as free variables and turn them into holes. Includes of parametric theories with arguments are turned into simple includes.

\item\emph{Covariant treatment:} We introduce new constants for the theory parameters and replace occurrences of the parameters by constant references. Includes with parameters are again replaced by normal includes.

...

...

We have tried the first two approaches regarding theory parameters -- i.e. the simple and covariant treatments.

\caption{Results of Inter- and Intra-Library View Finding in the PVS NASA Library}\label{fig:pvsresults}

\end{figure}

Most of the results in the simple MitM$\to$NASA case are artefacts of the theory parameter treatments -- in fact only two of the 17 results are meaningful (to operations on sets and the theory of number fields). In the covariant case, the additional requirements on each simple view lead to fuller (one total) and fewer spurious views.

With a theory from the NASA library as domain, the results are already too many to be properly evaluated by hand.

As an additional use case, we can write down a theory for a commutative binary operator using the MitM foundation, while targeting the PVS Prelude library -- allowing us to find all commutative operators, as in Figure \ref{fig:use:pvs} (using the simple approach to theory parameters).

...

...

\fbox{\includegraphics[width=\textwidth]{pvs}}

\caption{Searching for Commutative Operators in PVS}\label{fig:use:pvs}

\end{figure}

This example also hints at a way to iteratively improve the results of the view finder: since we can find properties like commutativity and associativity, we can use the results to inform a better normalization of the theory by exploiting these properties. This in turn would potentially allow for finding more views.

Let $C$ be a corpus of theories with the same fixed meta-theory $M$.

We call the problem of finding views between theories of $C$ the \defemph{view finding problem} and an algorithm that solves it a \defemph{view finder}.

Note that a view finder is sufficient to solve the theory classification use case from the introduction:

Jane provides an $M$-theory $Q$ of beautiful sets; the view finder computes all (total) views from $Q$ into $C$ and returns presentations of the target theories and the assignments made by the views.

\paragraph{Efficiency Considerations}

The cost of this problem quickly explodes.

First of all, it is advisable to restrict attention to simple views.

Eventually we want to search for arbitrary views as well.

But that problem is massively harder because it subsumes theorem proving: a view from $\Sigma$ to $\Sigma'$ maps $\Sigma$-axioms to $\Sigma'$-proofs, i.e., searching for a view requires searching for proofs.

Secondly, if $C$ has $n$ theories, we have $n^2$ pairs of theories between which to search.

(It is exactly $n^2$ because the direction matters, and even views from a theory to itself are interesting.)

Moreover, for two theories with $m$ and $n$ constants, there are $n^m$ possible simple views.

(It is exactly $n^m$ because views may map different constants to the same one.)

Thus, we can in principle enumerate and check all possible simple views in $C$.

But for large $C$, it quickly becomes important to do so in an efficient way that eliminates ill-typed or uninteresting views early on.

Thirdly, it is desirable to search for \emph{partial} views as well.

In fact, identifying refactoring potential is only possible if we find partial views: then we can refactor the involved theories in a way that yields a total view.

Moreover, many proof assistant libraries do not follow the little theories paradigm or do not employ any theory-like structuring mechanism at all.

These can only be represented as a single huge theory, in which case we have to search for partial views from this theory to itself.

While partial views can be reduced to and then checked like total ones, searching for partial views makes the number of possible views that must be checked much larger.

Finally, even for a simple view, checking reduces to a set of equality constraints, namely the constraints $\vdash_{\Sigma'}\ov{\sigma}(E)=E'$ for the type-preservation condition.

Depending on $M$, this equality judgment may be undecidable and require theorem proving.

\paragraph{Algorithm Overview}

A central motivation for our algorithm is that equality in $M$ can be soundly approximated very efficiently by using a normalization function on $M$-expressions.

This has the additional benefit that relatively little meta-theory-specific knowledge is needed, and all such knowledge is encapsulated in a single well-understood function.

This way we can implement view search generically for arbitrary $M$.

Our algorithm consists of two steps.

First, we preprocess all constant declarations in $C$ with the goal of moving as much intelligence as possible into a step whose cost is linear in the size of $C$.

Then, we perform the view search on the optimized data structures produced by the first step.

\subsection{Preprocessing}

...

...

Normalization involves two steps.

Firstly, \textbf{MMT-level} normalization performs generic transformations that do not depend on the meta-theory $M$.

These include elaboration of structured theories and definition expansion, which we mentioned in Sect.~\ref{sec:prelim}.

Importantly, we do not fully eliminate defined constant declarations $c:E=e$ from a theory $\Sigma$: instead, we replace them with primitive constants $c:E$ and replace every occurrence of $c$ in other declarations with $e$.

If $\Sigma$ is the domain theory, we can simply ignore $c:E$ (because views do not have to provide an assignment to defined constants).

But if $\Sigma$ is the codomain theory, retaining $c:E$ increases the number of views we can find.

Secondly, \textbf{meta-theory-level} normalization applies an $M$-specific normalization function.

In general, we assume this normalization to be given as a black box.

...

...

\item Equalities are normalized such that the left hand side has a smaller abstract syntax tree.

\end{compactenum}

Several of the normalization steps above make use of a total order on abstract syntax trees $(t,s)$.

We omit the details and only remark that we try to avoid using the names of the constants in $s$ in the definition of the order --- otherwise, declarations that could be matched by a view would be normalized differently.

Even when breaking ties requires comparing two constants, we can first try to recursively compare the syntax trees of their types.

%In addition to the above optimizations in the algorithm, we can normalize the theories themselves in various ways to improve the number and quality of the morphisms found. To accommodate these, we add two additional features to the algorithm:

...

...

We define \textbf{abstract syntax trees} as pairs $(t,s)$ where $t$ is subject to a fixed grammar with placeholder symbols $C_{Nat}$

(where $Nat$ is a non-terminal for natural numbers) and $s$ is a list of constant names.

We obtain an abstract syntax tree from an MMT expression $E$ by (i) switching to de-Bruijn representation of bound variables and (ii) replacing all occurrences of constants with $C_i$ in such a way that every $C_i$ refers to the $i$-th element of $s$.

Abstract syntax trees have the nice property that they commute with the application of simple views $\sigma$:

If $(t,s)$ represents $E$, then $\sigma(E)$ is represented by $(t,s')$ where $s'$ arises from $s$ by replacing every constant with its $\sigma$-assignment.

The above does not completely specify $t$ and $s$ yet, and there are several possible canonical choices among the abstract syntax trees representing the same expression.

The trade-off is subtle because we want to make it easy to both identify and check views later on.

We call $(t,s)$ the \textbf{long} abstract syntax tree for $E$ if $C_i$ replaces the $i$-th occurrence of a constant in $E$ when $E$ is read in left-to-right order.

In particular, the long tree does not merge duplicate occurrences of the same constant into the same number.

The \textbf{short} abstract syntax tree for $E$ arises from the long one by removing all duplicates from $s$ and replacing the $C_i$ accordingly.
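The two choices can be illustrated concretely. As before, the encoding is hypothetical (constants as strings, applications as tuples), not the actual MMT representation:

```python
# Sketch of long vs. short abstract syntax trees for the same expression.

def long_tree(term, s):
    """Long tree: C_i replaces the i-th constant occurrence,
    read in left-to-right order; duplicates are not merged."""
    if isinstance(term, str):
        s.append(term)
        return ("C", len(s))
    op, args = term
    return (op, tuple(long_tree(a, s) for a in args))

def shorten(tree, s):
    """Short tree: remove duplicates from s and relabel the C_i."""
    short_s = list(dict.fromkeys(s))                  # dedup, keep order
    index = {c: i + 1 for i, c in enumerate(short_s)}
    def relabel(t):
        if t[0] == "C":
            return ("C", index[s[t[1] - 1]])
        op, args = t
        return (op, tuple(relabel(a) for a in args))
    return relabel(tree), short_s

s = []
t = long_tree(("plus", ("x", "x")), s)
# long:  (("plus", (("C",1), ("C",2))), ["x", "x"])
# short: (("plus", (("C",1), ("C",1))), ["x"])
```

Note how the short tree maps both occurrences of `x` to the same placeholder, while the long tree keeps them distinct.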

...

...

\end{example}

For our algorithm, we pick the \emph{long} abstract syntax tree, which may appear surprising.

The reason is that shortness is not preserved when applying a simple view: whenever a view maps two different constants to the same constant, the resulting tree is not short anymore.

Longness, on the other hand, is preserved.

The disadvantage that long trees take more time to traverse is outweighed by the advantage that we never have to renormalize the trees.

...

...

Consider two constants $c:E$ and $c':E'$ preprocessed into long abstract syntax trees $(t,s)$ and $(t',s')$.

It is now straightforward to show the following Lemma:

\begin{lemma}

The assignment $c\mapsto c'$ is well-typed in a view $\sigma$ if $t=t'$ (in which case $s$ and $s'$ must have the same length $l$) and $\sigma$ also contains $s_i\mapsto s'_i$ for $i=1,\ldots,l$.

\end{lemma}

Of course, the condition about $s_i\mapsto s'_i$ may be redundant if $s$ contains duplicates; but because $s$ has to be traversed anyway, it is cheap to skip all duplicates. We call the set of assignments $s_i\mapsto s'_i$ the \textbf{prerequisites} of $c\mapsto c'$.

This lemma is the center of our search algorithm:

\begin{lemma}[Core Algorithm]

Consider two constant declarations $c$ and $c'$ in theories $\Sigma$ and $\Sigma'$.

We define a view by starting with $\sigma=c\mapsto c'$ and recursively adding all prerequisites to $\sigma$ until

\begin{compactitem}

\item either the recursion terminates

\item or $\sigma$ contains two different assignments for the same constant, in which case we fail.

\end{compactitem}

If the above algorithm succeeds, then $\sigma$ is a well-typed partial simple view from $\Sigma$ to $\Sigma'$.

\end{lemma}
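The core algorithm can be sketched as follows. The `trees` table mapping each constant to its precomputed long tree and symbol list is an illustrative stand-in for the preprocessed data, not the actual MMT implementation:

```python
# Sketch of the core algorithm: start from a candidate pair c -> c',
# recursively add all prerequisites, and fail on conflicting assignments.

def prerequisites(c, cp, trees):
    """The assignments s_i -> s'_i if the trees of c and c' agree, else None."""
    (t, s), (tp, sp) = trees[c], trees[cp]
    return list(zip(s, sp)) if t == tp else None

def core(c, cp, trees):
    sigma, todo = {}, [(c, cp)]
    while todo:
        a, b = todo.pop()
        if sigma.get(a, b) != b:        # two different assignments: fail
            return None
        if a in sigma:                  # duplicate, already processed
            continue
        sigma[a] = b
        prereqs = prerequisites(a, b, trees)
        if prereqs is None:             # skeletons do not match: fail
            return None
        todo.extend(prereqs)
    return sigma                        # a well-typed partial simple view

trees = {
    "ax":        (("forall", (("C", 1),)), ["beautiful"]),
    "ax'":       (("forall", (("C", 1),)), ["finite"]),
    "beautiful": (("type", ()), []),
    "finite":    (("type", ()), []),
}
view = core("ax", "ax'", trees)
# view == {"ax": "ax'", "beautiful": "finite"}
```

On success, the returned assignment set is exactly the candidate pair together with all of its (transitive) prerequisites.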

\begin{example}\label{ex:beautifinite} Consider two constants $c,c'$ with types $\forall x:\cn{set},y:\cn{set}.\;\cn{beautiful}(x)\wedge y \subseteq x \Rightarrow\cn{beautiful}(y)$ and $\forall x:\cn{powerset},y:\cn{powerset}.\;\cn{finite}(x)\wedge y \subseteq x \Rightarrow\cn{finite}(y)$. Their syntax trees are

...

...

Since $t=t'$, we set $c\mapsto c'$ and compare $s$ with $s'$, meaning we check (ignoring duplicates) that $\forall\mapsto\forall$, $\cn{set}\mapsto\cn{powerset}$, $\Rightarrow\mapsto\Rightarrow$, $\wedge\mapsto\wedge$, $\cn{beautiful}\mapsto\cn{finite}$ and $\subseteq\mapsto\subseteq$ are all valid.

\end{example}

To find all views from $\Sigma$ to $\Sigma'$, we first run the core algorithm on every pair of $\Sigma$-constants and $\Sigma'$-constants.

This alone usually does not yield large views.

For example, consider the typical case where theories contain some symbol declarations and some axioms, in which the symbols occur.

Then the core algorithm will only find views that map at most $1$ axiom.

Depending on what we intend to do with the results, we might prefer to consider them individually (e.g. to yield \emph{alignments} in the sense of \cite{KKMR:alignments:16}). But we can also use these small views as building blocks to construct larger, possibly total ones:

\begin{lemma}[Amalgamating Views]

We call two partial views \textbf{compatible} if they agree on all constants for which both provide an assignment.

The union of compatible well-typed views is again well-typed.

\end{lemma}
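Amalgamation itself is straightforward; the following Python sketch shows the compatibility check and merge under the same invented representation of views as dictionaries (not MMT's actual data structures).

```python
# Hedged sketch of view amalgamation: two partial views are compatible
# if they agree on all constants for which both provide an assignment;
# the union of compatible views is again well-typed.

def amalgamate(sigma1, sigma2):
    """Merge two compatible partial views; return None if they clash."""
    for c in sigma1.keys() & sigma2.keys():
        if sigma1[c] != sigma2[c]:
            return None            # incompatible: disagree on c
    return {**sigma1, **sigma2}
```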

\begin{example}

Consider the partial view from Example \ref{ex:beautifinite} and imagine a second partial view for the axioms

$\cn{beautiful}(\emptyset)$ and $\cn{finite}(\emptyset)$. The former has the requirements listed in Example \ref{ex:beautifinite}.

The latter requires only $\cn{set}\mapsto\cn{powerset}$ and $\emptyset\mapsto\emptyset$. Since both views agree on all assignments, we can merge them into one view mapping both axioms and all requirements of both.

\end{example}

\subsection{Optimizations}\label{sec:algparams}

...

...

Therefore, we allow for storing the preprocessing results to disk and reloading them in a later run.

\paragraph{Fixing the Meta-Theory}

We improve the preprocessing in a way that exploits the common meta-theory, which is meant to be fixed by every view.

All we have to do is, when building the abstract syntax trees $(t,s)$, to retain all references to constants of the meta-theory in $t$ instead of replacing them with numbers.

With this change, $s$ will never contain meta-theory constants, and the core algorithm will only find views that fix all meta-theory constants.

Because $s$ is much shorter now, the view search is much faster.
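The following Python sketch illustrates this preprocessing step. The nested-tuple term representation and the choice of meta-theory constants are invented for illustration; in MMT the meta-theory is determined by the library.

```python
# Hedged sketch of preprocessing with a fixed meta-theory: meta-theory
# constants stay verbatim in the skeleton t, all other constants are
# replaced by numbered placeholders and collected in s.

META = {'forall', 'and', 'implies'}   # assumed meta-theory constants

def abstract(term, s):
    """Turn a term into its skeleton, appending non-meta constants to s."""
    if isinstance(term, str):
        if term in META:
            return term          # kept in t: every view must fix it
        s.append(term)
        return len(s) - 1        # domain constant: numbered placeholder
    return tuple(abstract(sub, s) for sub in term)

def preprocess(term):
    s = []
    t = abstract(term, s)
    return t, s
```

Because the meta-theory constants no longer appear in $s$, the core algorithm only has to compare the (much shorter) lists of domain constants.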

It is worth pointing out that the meta-theory is not always as fixed as one might think.

Often we want to treat certain constants as part of the meta-theory because they are defined early on in the library and then used widely.

...

...

The core algorithm starts with an assignment $c\mapsto c'$ and then recurses into the constants that occur in the declarations of $c$ and $c'$.

This occurs-in relation typically splits the constants into layers.

A typical theory declares types, which then occur in the declarations of function symbols, which then occur in axioms.

Because views that only map type and function symbols are rarely interesting (because they do not allow transporting non-trivial theorems), we always start with assignments where $c$ is an axiom.

%\emph{The length of $s$ in the short abstract syntax tree:} By choosing a minimal length for $s$, we can guarantee that only views will be produced that relate a minimal number of distinct constants.

\paragraph{Exploiting Theory Structure}

Libraries are usually highly structured using imports between theories.

If $\Sigma$ is imported into $\Sigma'$, then the set of partial views out of $\Sigma'$ is a superset of the set of partial views out of $\Sigma$.

If implemented naively, that would yield a quadratic blow-up in the number of views to consider.

Instead, when running our algorithm on an entire library, we only consider views between theories that are not imported into other theories.

In an additional postprocessing phase, the domain and codomain of each found partial view $\sigma$ are adjusted to the minimal theories that make $\sigma$ well-typed.
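The selection of search candidates can be sketched as follows. This is a hedged Python illustration assuming the import relation is given as a dictionary; the function names and theory names are ours, not MMT's.

```python
# Hedged sketch of the import-structure optimization: only run the
# view finder between theories that are not imported into any other
# theory, avoiding the quadratic blow-up from imported subtheories.

def maximal_theories(imports):
    """imports maps each theory to the set of theories it imports.
    Returns the theories that no other theory imports."""
    imported = set().union(*imports.values()) if imports else set()
    return [t for t in imports if t not in imported]

def search_pairs(imports):
    """All (domain, codomain) pairs the finder actually considers."""
    tops = maximal_theories(imports)
    return [(d, c) for d in tops for c in tops if d != c]
```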

\caption{``Beautiful Sets'' in MMT Surface Syntax and View Finder Results}\label{fig:use:source}

\end{figure}

...

...

Right-clicking anywhere within the theory allows Jane to select \cn{MMT}$\to$\cn{Find}\cn{Views\ to...}$\to$\cn{MitM/smglom}.

The latter menu offers a choice of known libraries in which the view finder should look for codomain theories; \cn{MitM/smglom} is the Math-in-the-Middle library that we have developed \cite{ODK:mitm:16} to describe the common knowledge used in various CICM systems.

After choosing \cn{MitM/smglom}, the view finder finds two views (within less than one second) and shows them (bottom of Figure~\ref{fig:use:source}).

The first of these ($\cn{View0}$) has a theory for matroids as its codomain, which is given in Figure \ref{fig:use:target}.

Inspecting that theory and the assignments in the view, we see that it indeed represents the well-known correspondence between beautiful sets and matroids.

%The latter uses predefined propositions in its axioms and uses a type \cn{coll} for the collection of sets, while the former has the statements of the axioms directly in the theory and uses a predicate \cn{beautiful}.