Let $C$ be a corpus of theories with the same fixed meta-theory $M$.

We call the problem of finding theory morphisms between theories of $C$ the \defemph{view finding problem} and an algorithm that solves it a \defemph{view finder}.

\end{definition}

Note that a view finder is sufficient to solve the theory classification use case from the introduction:

Jane provides an $M$-theory $Q$ of beautiful sets; the view finder computes all (total) views from $Q$ into theories of $C$ and returns the presentations of the target theories together with the assignments made by the views.

\ednote{FR@DM: standardize whether we call it morphism or view}

\paragraph{Efficiency Considerations}

The cost of solving this problem explodes quickly.

First of all, it is advisable to restrict attention to simple morphisms.

Eventually we want to search for arbitrary morphisms as well.

...

...

While partial morphisms can be reduced to and then checked like total ones, searching for them is considerably harder.

Finally, even for a simple theory morphism, checking well-typedness reduces to a set of equality constraints, namely the constraints $\vdash_{\Sigma'}\ov{\sigma}(E)=E'$ arising from the type-preservation condition.

Depending on $M$, this equality judgment may be undecidable and require theorem proving.

\paragraph{Algorithm Overview}

A central motivation for our algorithm is that equality in $M$ can be soundly approximated very efficiently by using a normalization function on $M$-expressions.

This has the additional benefit that relatively little meta-theory-specific knowledge is needed, and all such knowledge is encapsulated in a single well-understood function.

Thus, we can implement theory morphism search generically for arbitrary $M$.
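To illustrate, the following Python sketch (purely illustrative, not the MMT implementation) shows how a normalization function induces a sound but incomplete equality test. The toy normalizer for $+$, which flattens nested applications and sorts arguments, is a hypothetical stand-in for an $M$-specific normalization function:

```python
# Equality modulo a meta-theory M is soundly approximated by comparing
# normal forms: if the normal forms agree, the expressions are provably
# equal; if not, the test simply answers "unknown".

def make_equality_check(norm):
    """Return a sound but incomplete equality test induced by `norm`."""
    def equal(e1, e2):
        return norm(e1) == norm(e2)
    return equal

# Hypothetical toy normalizer: flatten nested '+' applications and sort
# the arguments, approximating associativity and commutativity of '+'.
def toy_norm(expr):
    if isinstance(expr, tuple) and expr[0] == '+':
        args = []
        for a in expr[1:]:
            na = toy_norm(a)
            if isinstance(na, tuple) and na[0] == '+':
                args.extend(na[1:])   # flatten nested '+'
            else:
                args.append(na)
        return ('+',) + tuple(sorted(args, key=repr))
    return expr

equal = make_equality_check(toy_norm)
```

Here `equal` identifies terms that differ only by associativity and commutativity of `+`, while never equating terms that are not provably equal under these laws.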

...

...

The preprocessing phase computes for every constant declaration $c:E$ a normal form of $E$ and then the abstract syntax tree of that normal form.

Both steps are described below.

\paragraph{Normalization}

Normalization involves two steps:

\begin{compactenum}

\item MMT-level normalization performs generic transformations that do not depend on the meta-theory $M$.

These include elaboration of structured theories and definition expansion, which we mentioned in Sect.~\ref{sec:prelim}.

\item Meta-theory-level normalization applies an $M$-specific normalization function, which we assume as a black box for now and discuss further in Sect.~\ref{sec:preproc}.

\end{compactenum}

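The two-step pipeline can be sketched as follows. In this hypothetical illustration, `expand_definitions` stands in for the generic MMT-level step (elaboration of structured theories is omitted), and the meta-theory-level step is passed in as a black box:

```python
# Sketch of the two-step normalization pipeline (illustrative only).
# Terms are nested tuples over strings; `defs` maps defined constants
# to their definientia.

def expand_definitions(expr, defs):
    """Recursively replace defined constants by their definientia."""
    if isinstance(expr, str):
        return expand_definitions(defs[expr], defs) if expr in defs else expr
    if isinstance(expr, tuple):
        return tuple(expand_definitions(a, defs) for a in expr)
    return expr

def normalize(expr, defs, meta_norm):
    # Step 1: MMT-level normalization, independent of the meta-theory M.
    expr = expand_definitions(expr, defs)
    # Step 2: M-specific normalization function, a black box per M.
    return meta_norm(expr)
```

Keeping the $M$-specific step as a parameter is what makes the rest of the algorithm generic in $M$.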

\paragraph{Abstract Syntax Trees}

We define \textbf{abstract syntax trees} as pairs $(t,s)$ where $t$ is subject to the grammar

...

...

We call $(t,s)$ the \textbf{long} abstract syntax tree for $E$ if $C_i$ replaces the $i$-th occurrence of a constant in $E$ and $s_i$ is the name of that constant.

In particular, the long tree does not merge duplicate occurrences of the same constant into the same number.

The \textbf{short} abstract syntax tree for $E$ arises from the long one by removing all duplicates from $s$ and replacing the $C_i$ accordingly.
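The two notions can be made concrete with a small Python sketch (an illustration, not the actual data structures), which computes long and short trees for terms given as nested tuples; binders are omitted, and variable names must be supplied explicitly:

```python
# Sketch of long and short abstract syntax trees (illustrative only).
# Terms are nested tuples over strings; every string not listed in
# `variables` counts as a constant occurrence. C_i is the integer i.

def long_ast(expr, variables=()):
    """Replace the i-th constant occurrence by the number i; return (t, s)."""
    s = []
    def go(e):
        if isinstance(e, str):
            if e in variables:
                return e              # variables are kept verbatim
            s.append(e)
            return len(s)             # C_i for the i-th occurrence
        return tuple(go(a) for a in e)
    return go(expr), s

def short_ast(expr, variables=()):
    """Merge duplicates in s: one number per distinct constant."""
    t_long, s_long = long_ast(expr, variables)
    s = list(dict.fromkeys(s_long))   # deduplicate, preserving order
    renumber = {i + 1: s.index(c) + 1 for i, c in enumerate(s_long)}
    def go(t):
        if isinstance(t, int):
            return renumber[t]
        return tuple(go(a) for a in t) if isinstance(t, tuple) else t
    return go(t_long), s
```

On the body of the running example (with `mul`/`plus`/`one` for $\cdot$/$+$/$\cn{one}$ and the binder dropped), the long tree numbers all five constant occurrences separately, while the short tree reuses one number per distinct constant.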

\begin{newpart}{DM}

\begin{example}

Consider again the term $\lambda x,y:\mathbb N.\;(x +\cn{one})\cdot(y+\cn{one})$ with internal representation $\ombind{\lambda}{x:\mathbb N,y : \mathbb N}{\oma{\cdot}{\oma{+}{x,\cn{one}},\oma{+}{y,\cn{one}}}}$.

The \emph{short} abstract syntax tree and list of constants associated with this term are
\[ t=\ombind{C_1}{x:C_2,\,y:C_2}{\oma{C_3}{\oma{C_4}{x,C_5},\oma{C_4}{y,C_5}}} \qquad s=(\lambda,\;\mathbb N,\;\cdot,\;+,\;\cn{one}). \]

\end{example}

\end{newpart}

For our algorithm, we pick the \emph{long} abstract syntax tree, which may appear surprising.

The reason is that shortness is not preserved when applying a simple morphism: whenever a morphism maps two different constants to the same constant, the resulting tree is not short anymore.

Longness, on the other hand, is preserved.

The disadvantage that long trees take more time to traverse is outweighed by the advantage that we never have to renormalize the trees.

...

...

The assignment $c\mapsto c'$ is well-typed in a morphism $\sigma$ if $t=t'$ (in particular, $s$ and $s'$ have the same length $l$) and $\sigma$ also contains $s_i\mapsto s'_i$ for $i=1,\ldots,l$.

\end{lemma}

Of course, the condition about $s_i\mapsto s'_i$ may be redundant if $s$ contains duplicates; but because $s$ has to be traversed anyway, it is cheap to skip all duplicates. We call the set of assignments $s_i\mapsto s'_i$ the \textbf{prerequisites} of $c\mapsto c'$.

This lemma is the center of our search algorithm:

\begin{lemma}[Core Algorithm]

Consider two constant declarations $c$ and $c'$ in theories $\Sigma$ and $\Sigma'$, respectively.

We define a morphism by starting with $\sigma=c\mapsto c'$ and recursively adding all prerequisites to $\sigma$ until

\begin{compactitem}

\item either the process terminates, in which case we have found a morphism,

\item or $\sigma$ contains two different assignments for the same constant (a contradiction), in which case we fail.


\end{compactitem}

If the process terminates without a contradiction, then $\sigma$ is a well-typed partial simple morphism from $\Sigma$ to $\Sigma'$.

\end{lemma}

\begin{newpart}{DM}

The general procedure for finding views between two theories $T_1$ and $T_2$ is now as follows:

\begin{enumerate}

\item Compute the long abstract syntax trees for all constants in $T_1$ and $T_2$.

\item Find the set $P$ of all pairs $(c,c')$ with $c\in T_1$, $c'\in T_2$ and long abstract syntax trees $(t,s)$ and $(t',s')$ such that $t=t'$.

\item For each pair in $P$, execute the core algorithm, yielding a set of partial morphisms.

\end{enumerate}

\end{newpart}
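The core algorithm can be sketched in Python as follows (an illustration only; `asts1` and `asts2` are assumed to map each constant of $\Sigma$ resp. $\Sigma'$ to the long tree $(t,s)$ of its normalized type):

```python
# Sketch of the core algorithm: grow a morphism from a starting
# assignment c ↦ c' by recursively adding prerequisites, failing as
# soon as two different assignments for the same constant arise.

def core_algorithm(c, c_prime, asts1, asts2):
    """Return a partial morphism as a dict, or None on a contradiction."""
    sigma = {}
    todo = [(c, c_prime)]
    while todo:
        a, b = todo.pop()
        if a in sigma:
            if sigma[a] != b:
                return None           # conflicting assignments: fail
            continue                  # duplicate prerequisite: skip
        t1, s1 = asts1[a]
        t2, s2 = asts2[b]
        if t1 != t2:                  # trees must agree for well-typedness
            return None
        sigma[a] = b
        todo.extend(zip(s1, s2))      # add prerequisites s_i ↦ s'_i
    return sigma
```

Since $t_1=t_2$ forces $s_1$ and $s_2$ to have the same length, `zip` pairs the prerequisite lists exactly.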

% \ednote{continue description of algorithm: how do we pick $c\mapsto c'$, and so on}

To find all morphisms from $\Sigma$ to $\Sigma'$, we first run the core algorithm on every pair of $\Sigma$-constants and $\Sigma'$-constants.

This usually does not yield large morphisms yet.

For example, consider the typical case where theories contain some symbol declarations and some axioms in which the symbols occur.

Then the core algorithm will only find morphisms that map at most one axiom.


But we can use these small morphisms as building blocks to construct larger, possibly total ones:

\begin{lemma}[Amalgamating Morphisms]

We call two partial morphisms \textbf{compatible} if they agree on all constants for which both provide an assignment.

The union of compatible well-typed morphisms is again well-typed.

\end{lemma}
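Representing partial morphisms as finite maps, compatibility and amalgamation can be sketched as follows (a greedy illustration, not necessarily how the actual implementation combines morphisms):

```python
# Sketch: amalgamating compatible partial morphisms, each represented
# as a dict from domain constants to codomain constants.

def compatible(m1, m2):
    """True iff m1 and m2 agree on all constants assigned by both."""
    return all(m2[c] == v for c, v in m1.items() if c in m2)

def amalgamate(morphisms):
    """Greedily union pairwise-compatible morphisms into larger ones."""
    result = []
    for m in morphisms:
        for r in result:
            if compatible(r, m):
                r.update(m)           # union of compatible morphisms
                break
        else:
            result.append(dict(m))    # start a new amalgam
    return result
```

By the lemma, every union formed here is again well-typed, so the amalgams are candidates for total morphisms.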

\paragraph{Extending the abstract syntax tree} By the very nature of the approach described in Section \ref{sec:oaf}, many symbols will be common to the domain and codomain of a given view finding problem. Most importantly, assuming a common meta-theory, we certainly do not want to reassign its symbols.

\ednote{@DM: example here}

\subsection{Optimizations}\label{sec:algparams}

The above presentation is intentionally simple to convey the general idea.

In the sequel, we describe several improvements that we have implemented to make the algorithm scalable.

\paragraph{Choice of Meta-Theory and Meta-Morphisms}

The above description treats meta-theory constants like any other constant.

As such, it also finds morphisms that do not fix the meta-theory.

Our implementation extends the algorithm so that a meta-theory can be fixed.

All we have to do is to retain, when building the abstract syntax trees, all references to constants of the meta-theory instead of replacing them with numbers.
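Concretely, the abstract syntax tree construction needs only a small change, sketched here on nested-tuple terms (illustrative; `meta` is the assumed set of fixed meta-theory constants):

```python
# Sketch: fixing the meta-theory during AST construction. Constants in
# the set `meta` are kept as themselves instead of being replaced by
# numbers, so any match must map them identically.

def long_ast_fixed(expr, meta, variables=()):
    """Long abstract syntax tree with meta-theory constants kept in t."""
    s = []
    def go(e):
        if isinstance(e, str):
            if e in variables or e in meta:
                return e              # meta-theory constants stay in t
            s.append(e)
            return len(s)             # C_i for the i-th occurrence
        return tuple(go(a) for a in e)
    return go(expr), s
```

Because the meta-theory constants now occur literally in $t$, two trees only match if they use the meta-theory in exactly the same way.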

%Firstly, consider a library of theories in logic $L$ in which we want to find intra-library morphisms.

%At first glance, the meta-theory should consist of $L$ and the logical framework used to define $L$ (if any).

%Therefore, our algorithm allows fixing a meta-theory:

%

%But often the library contains certain theories that represent concrete mathematical objects, e.g., the real numbers and constants for their operations.

%Most of the time, we want to fix those constants as well instead of finding morphisms that, e.g., map real addition to any other commutative binary operator anywhere in the library.

%But sometimes we do not want to fix them, e.g., if there are multiple axiomatizations of the real numbers that we want to refactor.

%

%Secondly, consider two libraries in logics $L$ and $L'$ between which we want to find inter-library morphisms.

%Here it makes sense to use only the logical framework used to define the two logics as the meta-theory.

%Moreover, if we have already found a partial morphism $m$ that maps, e.g., the propositional connectives of $L$ to those of $L'$, we can use $m$ as a \textbf{meta-morphism}.

%In that case, we are only looking for morphisms that extend $m$.

%

%That can be useful because it lets us use the same preprocessing result even if we flexibly change the meta-theory.

%Changing the meta-theory happens more often than one might think, and we present two examples.

Therefore, we consider the \emph{common} symbols as a fixed part of the abstract syntax tree of a constant. The symbols will hence be encoded in the component $t$ instead of the list $s$. This will not only exclude spurious matches, but also reduce the number of plausible matches and consequently speed up the algorithm.


\paragraph{Picking starting pairs} Note that the algorithm, if executed in its general form, will still find many spurious matches. The reason is that (for example) all atomic types match each other, as do all binary operators, etc. Most of these results will not be interesting. Furthermore, since the algorithm needs to recurse into the lists $s$, many potential matches will be checked repeatedly. Both problems can be reduced drastically by selecting specific pairs of encodings as \emph{starting pairs} for the algorithm, so that the majority of matching constants will only be considered if the algorithm runs into them during recursion. Potentially useful starting points are: