@@ -10,6 +10,7 @@ To increase efficiency, we plan to explore term indexing techniques~\cite{Graf:t

The latter will be important for the library refactoring and merging applications which look for all possible (partial and total) views in one or between two libraries.

As such library-scale operations will have to be run together with theory flattening to a fixed point and re-run upon every addition to the library, it will be important to integrate them with the \MMT build system and change management processes~\cite{am:doceng10,iancu:msc}.

\ifshort

\paragraph{Enabled Applications}

Our work enables a number of advanced applications.

Maybe surprisingly, a major bottleneck here concerns less the algorithm or software design

...

...

@@ -28,6 +29,7 @@ Note that we would need to keep book on our transformations during preprocessing

A useful interface might specifically prioritize views into theories on top of which there are many theorems and definitions that have been discovered.

\end{compactitem}

Note that even though the algorithm is in principle symmetric, some aspects often depend on the direction --- e.g. how we preprocess the theories, which constants we use as starting points or how we aggregate and evaluate the resulting (partial) views (see Sections \ref{sec:algparams}, \ref{sec:normalizeintra} and \ref{sec:normalizeinter}).

\else\fi

\paragraph{Acknowledgments}

The authors gratefully acknowledge financial support from the OpenDreamKit Horizon 2020

@@ -55,8 +55,6 @@ Complex expressions are of the form $\ombind{o}{x_1:t_1,\ldots,x_m:t_m}{a_1,\ldo

The bound variable context may be empty, and we write $\oma{o}{\vec{a}}$ instead of $\ombind{o}{\cdot}{\vec{a}}$.

For example, the axiom $\forall x:\cn{set},y:\cn{set}.\;\cn{beautiful}(x)\wedge y

\subseteq x \Rightarrow\cn{beautiful}(y)$ would instead be written as \[\ombind{\forall}{x:\cn{set},y:\cn{set}}{\oma{\Rightarrow}{\oma{\wedge}{\oma{\cn{beautiful}}{x},\oma{\subseteq}{y,x}},\oma{\cn{beautiful}}{y}}}\]

%For example, the second axiom ``Every subset of a beautiful set is beautiful'' (i.e. the term $\forall s,t : \cn{set\ }X.\;\cn{beautiful}(s)\wedge t \subseteq s \Rightarrow \cn{beautiful}(t)$) would be written as

Finally, we remark on a few additional features of the MMT language that are important for large-scale case studies but not critical to understand the basic intuitions of results.

MMT provides a module system that allows theories to instantiate and import each other. The module system is conservative: every theory can be \emph{elaborated} into one that only declares constants.

...

...

@@ -85,7 +83,7 @@ Defined constants can be eliminated by definition expansion.

%\hline

%\end{tabular}

%\end{center}

%\end{example

%\end{example}

%\begin{oldpart}{FR: replaced with the above}

%For the purposes of this paper, we will work with the (only slightly simplified) grammar given in Figure \ref{fig:mmtgrammar}.

...

...

@@ -131,61 +129,61 @@ Defined constants can be eliminated by definition expansion.

%

%\end{oldpart}

%\subsection{Proof Assistant Libraries in MMT}\label{sec:oaf}

%

%As part of the OAF project~\cite{OAFproject:on}, we have imported several proof assistant libraries into the MMT system. To motivate some of the design choices made in this paper, we will outline the general procedure behind these imports.

%

%\paragraph{} First, we formalize the core logical foundation of the system. We do so by using the logical framework LF~\cite{lf} (at its core a dependently-typed lambda calculus) and various extensions thereof, which are implemented in and supported by the MMT system. In LF, we can formalize the foundational primitives using the usual judgments-as-types and higher-order abstract syntax encodings -- hence theorems and axioms are represented as constants with a type $\vdash P$ for some proposition $P$, and primitive constructs like lambdas are formalized as LF-functions taking LF-lambda-expressions -- which serve as a general encoding of any variable binders -- as arguments.

%

%The resulting formalizations are then used as meta-theory for imports of the libraries of the system under consideration. This results in a theory graph as in Figure \ref{fig:oaf}.

%

%\begin{figure}[ht]\centering

% \begin{tikzpicture}

% \node (MMT) at (2,2.5) {MMT};

%

% \draw[fill=orange!40] (2,1) ellipse (1.5cm and .6cm);

% \node[color=orange] at (-3.3,1) {Logical Frameworks};

% \node (L) at (1,1) {LF};

% \node (Lx) at (3,1) {LF+X};

% \draw[arrow](MMT) -- (L);

% \draw[arrow](MMT) -- (Lx);

% \draw[arrow](L) -- (Lx);

%

% \draw[fill=red!40] (2,-.5) ellipse (3.2cm and .6cm);

% \node[color=red] at (-3.3,-.5) {Foundations};

% \node at (2,-.7) {\ldots};

%

% \draw[fill=blue!40] (0,-2.25) ellipse (1.9cm and .8cm);

%

% \node (H) at (0,-.5) {HOL Light};

% \node[color=blue!80] at (-3.5,-2) {HOL Light library};

% \node (B) at (-1,-2) {Bool};

% \node (A) at (1,-2) {Arith};

% \node (E) at (0,-2.5) {\ldots};

% \draw[arrow](L) -- (H);

% \draw[arrow](H) -- (B);

% \draw[arrow](H) -- (A);

% \draw[arrow](B) -- (A);

%

% \draw[fill=olive!40] (4,-2.25) ellipse (1.9cm and .8cm);

%

% \node (M) at (4,-.5) {PVS};

% \node[color=olive] at (-3.3,-2.5) {PVS library};

% \node (B') at (3,-2) {Booleans};

% \node (A') at (5,-2) {Reals};

% \node (E') at (4,-2.5) {\ldots};

%

% \node (A) at (1,-2) {Arith};

% \node (E) at (0,-2.5) {\ldots};

% \draw[arrow](Lx) -- (M);

% \draw[arrow](M) -- (B');

% \draw[arrow](M) -- (A');

% \draw[arrow](B') -- (A');

% \end{tikzpicture}

% \caption{A (Simplified) Theory Graph for the OAF Project}\label{fig:oaf}

%\end{figure}

\ifshort\else

\subsection{Proof Assistant Libraries in MMT}\label{sec:oaf}

As part of the OAF project~\cite{OAFproject:on}, we have imported several proof assistant libraries into the MMT system. To motivate some of the design choices made in this paper, we will outline the general procedure behind these imports.

\paragraph{} First, we formalize the core logical foundation of the system. We do so by using the logical framework LF~\cite{lf} (at its core a dependently-typed lambda calculus) and various extensions thereof, which are implemented in and supported by the MMT system. In LF, we can formalize the foundational primitives using the usual judgments-as-types and higher-order abstract syntax encodings -- hence theorems and axioms are represented as constants with a type $\vdash P$ for some proposition $P$, and primitive constructs like lambdas are formalized as LF-functions taking LF-lambda-expressions -- which serve as a general encoding of any variable binders -- as arguments.

The resulting formalizations are then used as meta-theory for imports of the libraries of the system under consideration. This results in a theory graph as in Figure \ref{fig:oaf}.

\begin{figure}[ht]\centering

\begin{tikzpicture}

\node (MMT) at (2,2.5) {MMT};

\draw[fill=orange!40] (2,1) ellipse (1.5cm and .6cm);

\node[color=orange] at (-3.3,1) {Logical Frameworks};

\node (L) at (1,1) {LF};

\node (Lx) at (3,1) {LF+X};

\draw[arrow](MMT) -- (L);

\draw[arrow](MMT) -- (Lx);

\draw[arrow](L) -- (Lx);

\draw[fill=red!40] (2,-.5) ellipse (3.2cm and .6cm);

\node[color=red] at (-3.3,-.5) {Foundations};

\node at (2,-.7) {\ldots};

\draw[fill=blue!40] (0,-2.25) ellipse (1.9cm and .8cm);

\node (H) at (0,-.5) {HOL Light};

\node[color=blue!80] at (-3.5,-2) {HOL Light library};

\node (B) at (-1,-2) {Bool};

\node (A) at (1,-2) {Arith};

\node (E) at (0,-2.5) {\ldots};

\draw[arrow](L) -- (H);

\draw[arrow](H) -- (B);

\draw[arrow](H) -- (A);

\draw[arrow](B) -- (A);

\draw[fill=olive!40] (4,-2.25) ellipse (1.9cm and .8cm);

\node (M) at (4,-.5) {PVS};

\node[color=olive] at (-3.3,-2.5) {PVS library};

\node (B') at (3,-2) {Booleans};

\node (A') at (5,-2) {Reals};

\node (E') at (4,-2.5) {\ldots};

\node (A) at (1,-2) {Arith};

\node (E) at (0,-2.5) {\ldots};

\draw[arrow](Lx) -- (M);

\draw[arrow](M) -- (B');

\draw[arrow](M) -- (A');

\draw[arrow](B') -- (A');

\end{tikzpicture}

\caption{A (Simplified) Theory Graph for the OAF Project}\label{fig:oaf}

@@ -12,16 +12,6 @@ PVS~\cite{pvs} is a proof assistant under active development based on a higher-o

In addition to the \emph{Prelude} library, which contains the most common domains of mathematical discourse and is shipped with PVS itself, there is a large library of formal mathematics developed and maintained by NASA~\cite{PVSlibraries:on}.

In \cite{KMOR:pvs:17}, we represent PVS as a meta-theory in MMT and implemented a translator that transforms both libraries into MMT format. We use a meta-view that embeds MitM's higher-order logic into PVS's higher-order logic and make sure that we normalize PVS-formulas in the same way as MitM-formulas.

%\begin{oldpart}{MK: this sounds like a left-over from the time we had a PVS->HOL-Light use case. I think this can just be eft out. or reduce to the last sentence and leave out the paragrpah}

%\paragraph{Formula Normalization}

%Naturally advanced features of the PVS type system such as predicate subtyping, record types, inductive types will not be part of views from MitM, which does not have corresponding features.

%Therefore, our view finder can mostly ignore them.

%

%Instead, we only have to normalize PVS formulas to the extent that they use logical features that correspond to those of MitM.

%This is particular the basic higher-order logic.

%Thus, we use a meta-view that embeds MitM's higher-order logic into PVS's higher-order logic and make sure that we normalize PVS-formulas in the same way as MitM-formulas.

%\end{oldpart}

\paragraph{Theory Structure Normalization}

PVS's complex and prevalently used parametric theories critically affect view finding because they affect the structure of theories.

For example, the theory of groups \cn{group\_def} in the NASA library has three theory parameters $(\cn T,\ast,\cn{one})$ for the signature of groups, and includes the theory \cn{monoid\_def} with the same parameters, and then declares the axioms for a group in terms of these parameters.

@@ -60,22 +60,6 @@ Above multiple normalization steps make use of a total order on abstract syntax

We omit the details and only remark that we try to avoid using the names of constants in the definition of the order --- otherwise, declarations that could be matched by a view would be normalized differently.

Even when breaking ties between requires comparing two constants, we can first try to recursively compare the syntax trees of their types.

%In addition to the above optimizations in the algorithm, we can normalize the theories themselves in various ways to improve the number and quality of the morphisms found. To accommodate these, we add two additional features to the algorithm:

%\begin{enumerate}

% \item Some normalizations -- such as elaborating definitions -- may also reduce the number of results, for example if both theories use similar abbreviations for complex terms, or the same concept is declared axiomatically in one theory, but definitionally in the other. For that reason, we can allow \textbf{several abstract syntax trees for the same constant}, such as one with definitions expanded and one ``as is''.

% \item Similarly, certain idiosyncrasies (such as PVS's common usage of theory parameters, see Section \ref{sec:normalizeinter}) call for not just matching symbol references, but also variables or possibly even complex expressions. To handle these situations, we additionally allow for \textbf{holes} in the constant lists of an abstract syntax tree. These can be unified with any other symbol or hole by the core algorithm, but are not recursed into and do not count as a requirement. The subterms that are to be considered holes can be marked as such during preprocessing or normalizing.

%\end{enumerate}

%

%The common logical framework LF makes it easy to systematically normalize theories built on various logical foundations. On the basis of the above features, we currently use the following approaches to normalizing theories:

%\begin{itemize}

% \item Free variables in a term are replaced by holes.

% \item Since judgments -- in our imports -- correspond to constants of type $\vdash P$ for some proposition $P$, we can use the Curry/Howard correspondence to equate the type $\vdash (P \Rightarrow Q)$ with the function type $(\vdash P) \to (\vdash Q)$, as well as the judgment $\vdash \forall x : A.\;P$ with the function type $(x:A) \to \vdash P$. We arbitrarily normalize to the former.

% \item For classical logics, we rewrite all remaining logical connectives using their usual definitions on the basis of negation and conjunction only. Double negations are eliminated.

% \item Typed equalities are transformed to untyped ones; again getting rid of the redundant type argument of the equality. For example, $\oma{\cn{equals}}{\mathbb N,2,1+1}$ becomes $\oma{\cn{equals}}{2,1+1}$, allowing to match the same equation on e.g. $\mathbb{R}$.

% \item The arguments of conjunctions and equalities are ordered (currently only by their number of subterms).

%\end{itemize}

\paragraph{Abstract Syntax Trees}

We define \textbf{abstract syntax trees} as pairs $(t,s)$ where $t$ is subject to the grammar

@@ -107,23 +91,6 @@ The reason is that shortness is not preserved when applying a simple view: whene

Length, on the other hand, is preserved.

The disadvantage that long trees take more time to traverse is outweighed by the advantage that we never have to renormalize the trees.

%\begin{oldpart}{FR: replaced with the above}

%

%\subsection{The General Algorithm} The aim is to find typing-preserving morphisms between theories, i.e. given a constant $C:t$ in a theory $T_1$, we want to find a view $V:T_1\to T_2$ such that if $V(C)=C'$ and $C':t'\in T_2$, we have $V(t)=t'$. Correspondingly, we need to consider the types of the constants in our two theories, which we assume to be flat.

%

%To run a proper unification algorithm is in our situation infeasible, since the flat version of a theory can become prohibitively large (and obviously finding two unifiable types in two theories is a search problem quadratic in the number of constants). To solve that problem, we first preprocess our theories such that pre-selecting plausibly ``unifiable'' constants becomes as fast as possible.

%

%\paragraph{} We do so by first transforming each constant $C$ in a theory to an \emph{encoding} $(\cn{h}(C),\cn{syms}(C))$ the following way:

%

%Consider the syntax tree of the type $t$ of a constant $C$. We first systematically replace the leaves by an abstract representation, yielding a data structure $\cn{tree}(C)$. We can eliminate variables by replacing them by their De Bruijn index, and symbol references by enumerating them and storing the symbol's names in a list $\cn{syms}(C)$.

%

%As a result, we get a pair $(\cn{tree}(C),\cn{syms}(C))$. Now an assignment $V(C)=D$ is valid, iff $\cn{tree}(C)=\cn{tree}(D)$, the lists $\cn{syms}(C)$ and $\cn{syms}(D)$ have the same length and their pairwise assignments $V(\cn{syms}(C)_i)=\cn{syms}(D)_i$ are all valid.

%

%Furthermore, since we only need $\cn{tree}(C)$ for an equality check, we can immediately replace $\cn{tree}(C)$ by an integer hashcode $\cn{h}(C)$.

%

%Now given a constant $C\in T_1$, we can find valid matches in $T_2$ by computing the encodings for each constant in $T_1$ and $T_2$, taking the list of constants $C'\in T_2$ with $\cn{h}(C')=\cn{h}(C)$ and recursing into their respective lists of symbols.

%\end{oldpart}

\subsection{Search}

Consider two constants $c:E$ and $c':E'$, where $E$ and $E'$ are preprocessed into long abstract syntax trees $(t,s)$ and $(t',s')$.

...

...

@@ -144,6 +111,9 @@ We define a view by starting with $\sigma=c\mapsto c'$ and recursively adding al

If the above algorithm succeeds, then $\sigma$ is a well-typed partial simple view from $\Sigma$ to $\Sigma'$.

\end{lemma}

\ifshort\else\begin{proof}

\end{proof}\fi

\begin{example}\label{ex:beautifinite} Consider two constants $c$ and $c'$ with types $\forall x:\cn{set},y:\cn{set}.\;\cn{beau}\-\cn{tiful}(x)\wedge y \subseteq x \Rightarrow\cn{beautiful}(y)$ and $\forall x:\cn{powerset},y:\cn{powerset}.\;\cn{finite}(x)\wedge y \subseteq x \Rightarrow\cn{finite}(y)$. Their syntax trees are

@@ -163,6 +133,9 @@ We call two partial views \textbf{compatible} if they agree on all constants for

The union of compatible well-typed views is again well-typed.

\end{lemma}

\ifshort\else\begin{proof}

\end{proof}\fi

\begin{example}

Consider the partial view from Example \ref{ex:beautifinite} and imagine a second partial view for the axioms

$\cn{beautiful}(\emptyset)$ and $\cn{finite}(\emptyset)$. The former has the requirements

...

...

@@ -206,6 +179,21 @@ If implemented naively, that would yield a quadratic blow-up in the number of vi

Instead, when running our algorithm on an entire library, we only consider views between theories that are not imported into other theories.

In an additional postprocessing phase, the domain and codomain of each found partial view $\sigma$ are adjusted to the minimal theories that make $\sigma$ well-typed.

%In addition to the above optimizations in the algorithm, we can normalize the theories themselves in various ways to improve the number and quality of the morphisms found. To accommodate these, we add two additional features to the algorithm:

%\begin{enumerate}

% \item Some normalizations -- such as elaborating definitions -- may also reduce the number of results, for example if both theories use similar abbreviations for complex terms, or the same concept is declared axiomatically in one theory, but definitionally in the other. For that reason, we can allow \textbf{several abstract syntax trees for the same constant}, such as one with definitions expanded and one ``as is''.

% \item Similarly, certain idiosyncrasies (such as PVS's common usage of theory parameters, see Section \ref{sec:normalizeinter}) call for not just matching symbol references, but also variables or possibly even complex expressions. To handle these situations, we additionally allow for \textbf{holes} in the constant lists of an abstract syntax tree. These can be unified with any other symbol or hole by the core algorithm, but are not recursed into and do not count as a requirement. The subterms that are to be considered holes can be marked as such during preprocessing or normalizing.

%\end{enumerate}

%

%The (for our purposes assumed) common logical framework LF makes it easy to systematically normalize theories built on various logical foundations. On the basis of the above features, we currently use the following approaches to normalizing theories:

%\begin{itemize}

% \item Free variables in a term are replaced by holes.

% \item Since judgments -- in our imports -- correspond to constants of type $\vdash P$ for some proposition $P$, we can use the Curry/Howard correspondence to equate the type $\vdash (P \Rightarrow Q)$ with the function type $(\vdash P) \to (\vdash Q)$, as well as the judgment $\vdash \forall x : A.\;P$ with the function type $(x:A) \to \vdash P$. We arbitrarily normalize to the former.

% \item For classical logics, we rewrite all remaining logical connectives using their usual definitions on the basis of negation and conjunction only. Double negations are eliminated.

% \item Typed equalities are transformed to untyped ones; again getting rid of the redundant type argument of the equality. For example, $\oma{\cn{equals}}{\mathbb N,2,1+1}$ becomes $\oma{\cn{equals}}{2,1+1}$, allowing to match the same equation on e.g. $\mathbb{R}$.

% \item The arguments of conjunctions and equalities are ordered (currently only by their number of subterms).

\caption{``Beautiful Sets'' in MMT Surface Syntax}\label{fig:use:source}

...

...

@@ -230,9 +218,11 @@ The latter menu offers a choice of known libraries in which the view finder shou

\noindent After choosing \cn{MitM/smglom}, the view finder finds two views (within less than one second) and shows them (Figure~\ref{fig:use:results}).

The first of these ($\cn{View1}$) has a theory for matroids as its codomain, which is given in Figure \ref{fig:use:target}.

Inspecting that theory and the assignments in the view, we see that it indeed represents the well-known correspondence between beautiful sets and matroids.

%The latter uses predefined propositions in its axioms and uses a type \cn{coll} for the collection of sets, while the former has the statements of the axioms directly in the theory and uses a predicate \cn{beautiful}.

%Additionally, the implication that beautiful sets (or sets in a matroid) are finite is stated as a logical formula in the former, while the latter uses the Curry/Howard correspondence.

\ifshort\else

The latter uses predefined propositions in its axioms and uses a type \cn{coll} for the collection of sets, while the former has the statements of the axioms directly in the theory and uses a predicate \cn{beautiful} -- since (typed) sets are defined as predicates, definition expansion is required for matching.

Additionally, the implication that beautiful sets (or sets in a matroid) are finite is stated as a logical formula in the former, while the latter uses the Curry/Howard correspondence.