@@ -34,4 +34,4 @@ The theory discovery use case described in Sec. \ref{sec:usecase} is mostly desi

\end{itemize}

For some of these use cases it would be advantageous to look for views \emph{into} our working theory instead.

Note that even though the algorithm is in principle symmetric, some aspects often depend on the direction -- e.g. how we preprocess the theories, which constants we use as starting points or how we treat and evaluate the resulting (partial) views (see Sections \ref{sec:algparams} and \ref{sec:preproc}).

\ No newline at end of file

Note that even though the algorithm is in principle symmetric, some aspects often depend on the direction -- e.g. how we preprocess the theories, which constants we use as starting points or how we treat and evaluate the resulting (partial) views (see Sections \ref{sec:algparams}, \ref{sec:normalizeintra} and \ref{sec:normalizeinter}).

@@ -60,7 +60,7 @@ Complex expressions are of the form $\ombind{o}{x_1:t_1,\ldots,x_m:t_m}{a_1,\ldo

\end{compactitem}

The bound variable context may be empty, and we write $\oma{o}{\vec{a}}$ instead of $\ombind{o}{\cdot}{\vec{a}}$.

\begin{newpart}{DM}

For example, the expression$\lambda x,y:\mathbb N.\;(x +\cn{one})\cdot(y+\cn{one})$ would be written as $\ombind{\lambda}{x:\mathbb N,y : \mathbb N}{\oma{\cdot}{\oma{+}{x,\cn{one}},\oma{+}{y,\cn{one}}}}$ instead.

For example, the axiom$\forall x:\cn{set},y:\cn{set}.\;\cn{beautiful}(x)\wedge y \subseteq x \Rightarrow\cn{beautiful}(y)$ would be written as $\ombind{\forall}{x:\cn{set},y:\cn{set}}{\oma{\Rightarrow}{\oma{\wedge}{\oma{\cn{beautiful}}{x},\oma{\subseteq}{y,x}},\oma{\cn{beautiful}}{y}}}$ instead.

%For example, the second axiom ``Every subset of a beautiful set is beautiful'' (i.e. the term $\forall s,t : \cn{set\ }X.\;\cn{beautiful}(s)\wedge t \subseteq s \Rightarrow \cn{beautiful}(t)$) would be written as

When elaborating definitions, it is important to consider that this may also reduce the number of results, if both theories use similar abbreviations for complex terms, or the same concept is declared axiomatically in one theory, but definitionally in the other. For that reason, we can allow \textbf{several abstract syntax trees for the same constant}, such as one with definitions expanded and one ``as is''.

Similarly, certain idiosyncracies -- such as PVS's common usage of theory parameters -- call for not just matching symbol references, but also variables or possibly even complex expressions. To handle these situations, we additionally allow for \textbf{holes} in the constant lists of an abstract syntax tree, which may be unified with any other symbol or hole, but are not recursed into. The subterms that are to be considered holes can be marked as such during preprocessing.

\begin{itemize}

\item For foundations that use product types, we curry function types $(A_1\times\ldots A_n)\to B$ to $A_1\to\ldots\to A_n\to B$. We treat lambda-expressions and applications accordingly.

For example: \[f =\lambda(n,m) : \mathbb N \times\mathbb N .\; n + m\text{ becomes } f =\lambda n : \mathbb N.\;\lambda m : \mathbb N .\; n + m\]\[f(\langle a, b\rangle)\text{ becomes }f(a,b).\]

\item Since judgments -- in our imports -- correspond to constants of type $\vdash P$ for some proposition $P$, we can use the curry-howard correspondence to equate the type $\vdash(P \Rightarrow Q)$ with the function type $(\vdash P)\to(\vdash Q)$, as well as the judgment $\vdash\forall x : A.\;P$ with the function type $(x:A)\to\vdash P$.

Since both styles of formalization are more or less preferred in different systems, we replace each occurence of the former by the latter.

\item For classical logics, we rewrite all remaining logical connectives using their usual definitions on the basis of negation and conjunction only. Double negations are eliminated.

\item Typed Equalities are transformed to untyped ones; again getting rid of the redundant type argument of the equality.

For example, the typed equality $\oma{\cn{equals}}{\mathbb N,2,1+1}$ becomes $\oma{\cn{equals}}{2,1+1}$, allowing to match the same equation on e.g. the real numbers.

\item The arguments of conjunctions and equalities are ordered (currently only by their number of subterms).

\end{itemize}

\end{newpart}

\subsection{Implementation}\label{sec:usecase}

The Viewfinder algorithm is implemented in the MMT system and exposed within the jEdit-IDE, allowing us to realize the use case

stated in the introduction. A screenshot of Jane's theory of beautiful sets is given in Figure \ref{fig:use:source}; it

is based on the (basic higher-order logic) foundation of the Math-in-the-Middle library (\cite{ODK:mitm:16} developed natively in MMT.

...

...

@@ -16,26 +37,22 @@ Right-clicking anywhere within the theory allows Jane to select \cn{MMT} $\to$ \

\caption{The Theory of Matroids in the MitM Library}\label{fig:use:target}

We have so far assumed one fixed meta-theory for all theories involved; we will now discuss the situation when looking for views between theories in different libraries (and built on different foundations).

Obviously, various differences in available foundational primitives and library-specific best practices and idiosyncracies can prevent the algorithm from finding desired matches. There are two approaches to increasing the number of results in these cases:

\begin{itemize}

\item In many instances, the translations between two foundations is too complex to be discovered purely syntactically. In these cases we can provide arbitrary translations between theories, which are applied before computing the abstract syntax trees (e.g. alignment-based translations as in \cite{MRLR:alignments:17})

\item We can do additional transformations before preprocessing theories, auch as normalizing expressions, eliminating higher-order abstract syntax encodings or encoding-related redundant information (such as the type of a typed equality, which in the presence of subtyping can be different from the types of both sides of an equation), or elaborating abbreviations/definitions.

\item We can do additional transformations before preprocessing theories, such as eliminating higher-order abstract syntax encodings or encoding-related redundant information (such as the type of a typed equality, which in the presence of subtyping can be different from the types of both sides of an equation), or elaborating abbreviations/definitions.

\end{itemize}

When elaborating definitions, it is important to consider that this may also reduce the number of results, if both theories use similar abbreviations for complex terms, or the same concept is declared axiomatically in one theory, but definitionally in the other. For that reason, we can allow \textbf{several abstract syntax trees for the same constant}, such as one with definitions expanded and one ``as is''.

Similarly, certain idiosyncracies -- such as PVS's common usage of theory parameters -- call for not just matching symbol references, but also variables or possibly even complex expressions. To handle these situations, we additionally allow for \textbf{holes} in the constant lists of an abstract syntax tree, which may be unified with any other symbol or hole, but are not recursed into. The subterms that are to be considered holes can be marked as such during preprocessing.

\subsection{Normalization}\label{sec:preproc}

\subsection{Normalization in PVS}\label{sec:normalizeinter}

The common logical framework used for all the libraries at our disposal -- namely LF and extensions thereof -- makes it easy to systematically normalize theories built on various logical foundations. We currently use the following approaches to preprocessing theories:

\begin{itemize}

\item Free variables in a term, often occurences of theory parameters as e.g. used extensively in the PVS system, are replaced by holes.

\itemFor foundations that use product types, we curry function types $(A_1\times\ldots A_n)\to B$ to $A_1\to\ldots\to A_n\to B$. We treat lambda-expressions and applications accordingly.

\itemWe curry function types $(A_1\times\ldots A_n)\to B$ to $A_1\to\ldots\to A_n\to B$. We treat lambda-expressions and applications accordingly.

For example: \[f =\lambda(n,m) : \mathbb N \times\mathbb N .\; n + m\text{ becomes } f =\lambda n : \mathbb N.\;\lambda m : \mathbb N .\; n + m\]\[f(\langle a, b\rangle)\text{ becomes }f(a,b).\]

\item Higher-order abstract syntax encodings are eliminated by raising atomic types, function types, applications and lambdas to the level of the logical framework. This eliminates (redundant) implicit arguments that only occur due to their formalization in the logical framework.

...

...

@@ -45,14 +62,6 @@ The common logical framework used for all the libraries at our disposal -- name

Hence every application of a function in pvs -- as \mmt-term: $\oma{\cn{pvsapply}}{A,B,f,a}$ -- has two additional implicit arguments, which we eliminate by replacing this expression by $\oma{f}{a}$.

This prevents false negatives due to mismatching type arguments in the presence of subtyping.

\item Since judgments -- in our imports -- correspond to constants of type $\vdash P$ for some proposition $P$, we can use the curry-howard correspondence to equate the type $\vdash(P \Rightarrow Q)$ with the function type $(\vdash P)\to(\vdash Q)$, as well as the judgment $\vdash\forall x : A.\;P$ with the function type $(x:A)\to\vdash P$.

Since both styles of formalization are more or less preferred in different systems, we replace each occurence of the former by the latter.

\item For classical logics, we rewrite all remaining logical connectives using their usual definitions on the basis of negation and conjunction only. Double negations are eliminated.

\item Typed Equalities are transformed to untyped ones; again getting rid of the redundant type argument of the equality.

For example, the typed equality $\oma{\cn{equals}}{\mathbb N,2,1+1}$ becomes $\oma{\cn{equals}}{2,1+1}$, allowing to match the same equation on e.g. the real numbers.

\item The arguments of conjunctions and equalities are ordered (currently only by their number of subterms).

\end{itemize}

\subsection{Implementation}\label{sec:pvs}

...

...

@@ -64,6 +73,4 @@ The common logical framework used for all the libraries at our disposal -- name

\end{figure}

\ednote{8 results for NASA, but NASA doesn't work in jEdit because of limited memory}

This example also hints at a way to iteratively improve the results of the viewfinder: since we can find properties like commutativity and associativity, we can use the results to in turn inform a better normalization of the theory by exploiting these properties. This in turn would potentially allow for finding more views.

\end{newpart}

\ No newline at end of file

This example also hints at a way to iteratively improve the results of the viewfinder: since we can find properties like commutativity and associativity, we can use the results to in turn inform a better normalization of the theory by exploiting these properties. This in turn would potentially allow for finding more views.

@@ -2,8 +2,7 @@ Let $C$ be a corpus of theories with the same fixed meta-theory $M$.

We call the problem of finding theory morphisms between theories of $C$ the \defemph{view finding problem} and an algorithm that solves it a \defemph{view finder}.

Note that a view finder is sufficient to solve the theory classification use case from the introduction:

Jane provides a $M$-theory $Q$ of beautiful sets, the view finder computes all (total) views from $Q$ into $C$, and returns presentations of target theories and the assignments made by the views.

\ednote{FR@DM: standardize whether we call it morphism or view}

Jane provides a $M$-theory $Q$ of beautiful sets, the view finder computes all (total) views from $Q$ into $C$, and returns presentations of target theories and the assignments made by the morphisms.

\paragraph{Efficiency Considerations}

The cost of this problem quickly explodes.

...

...

@@ -65,13 +64,15 @@ In particular, the long tree does not merge duplicate occurrences of the same co

The \textbf{short} abstract syntax tree for $E$ arises from the long one by removing all duplicates from $s$ and replacing the $C_i$ accordingly.

\begin{example}

Consider again the term $\lambda x,y:\mathbb N.\;(x +\cn{one})\cdot(y+\cn{one})$ with internal representation $\ombind{\lambda}{x:\mathbb N,y : \mathbb N}{\oma{\cdot}{\oma{+}{x,\cn{one}},\oma{+}{y,\cn{one}}}}$.

Consider again the axiom $\forall x:\cn{set},y:\cn{set}.\;\cn{beautiful}(x)\wedge y \subseteq x \Rightarrow\cn{beautiful}(y)$ with internal representation \[\ombind{\forall}{x:\cn{set},y:\cn{set}}{\oma{\Rightarrow}{\oma{\wedge}{\oma{\cn{beautiful}}{x},\oma{\subseteq}{y,x}},\oma{\cn{beautiful}}{y}}}.\]

The \emph{short} syntax tree and list of constants associated with this term would be:

For our algorithm, we pick the \emph{long} abstract syntax tree, which may appear surprising.

The reason is that shortness is not preserved when applying a simple morphism: whenever a morphism maps two different constants to the same constant, the resulting tree is not short anymore.

...

...

@@ -116,6 +117,13 @@ We define a morphism by starting with $\sigma=c\mapsto c'$ and recursively addin

If the above algorithm succeeds, then $\sigma$ is a well-typed partial simple morphism from $\Sigma$ to $\Sigma'$.

\end{lemma}

\begin{example}\label{ex:beautifinite} Consider two constants $c,c'$ with types $\forall x:\cn{set},y:\cn{set}.\;\cn{beautiful}(x)\wedge y \subseteq x \Rightarrow\cn{beautiful}(y)$ and $\forall x:\cn{powerset},y:\cn{powerset}.\;\cn{finite}(x)\wedge y \subseteq x \Rightarrow\cn{finite}(y)$. Their syntax trees are

Since $t=t'$, we set $c\mapsto c'$ and compare $s$ with $s'$, meaning we check (ignoring duplicates) that $\forall\mapsto\forall$, $\cn{set}\mapsto\cn{powerset}$, $\Rightarrow\mapsto\Rightarrow$, $\wedge\mapsto\wedge$, $\cn{beautiful}\mapsto\cn{finite}$ and $\subseteq\mapsto\subseteq$ are all valid.

\end{example}

To find all morphisms from $\Sigma$ to $\Sigma'$, we first run the core algorithm on every pair of $\Sigma$-constants and $\Sigma'$-constants.

This usually does not yield big morphisms yet.

For example, consider the typical case where theories contain some symbol declarations and some axioms, in which the symbols occur.

...

...

@@ -127,9 +135,12 @@ We call two partial morphisms \textbf{compatible} if they agree on all constants

The union of compatible well-typed morphisms is again well-typed.

\end{lemma}

\ednote{@DM: example here}

\begin{example}

Consider the partial morphism from Example \ref{ex:beautifinite} and imagine a second partial morphism for the axioms

$\cn{beautiful}(\emptyset)$ and $\cn{finite}(\emptyset)$. The former has the requirements

The latter requires only $\cn{set}\mapsto\cn{powerset}$ and $\emptyset\mapsto\emptyset$. Since both morphisms agree on all assignments, we can merge them into one morphism mapping both axioms and all requirements of both.