\begin{newpart}{moved}
\subsection{Normalization}\label{sec:normalizeintra}
In addition to the above optimizations in the algorithm, we can normalize the theories themselves in various ways to improve the number and quality of the morphisms found. To accommodate these, we add two additional features to the algorithm:
\begin{enumerate}
\item Some normalizations -- such as elaborating definitions -- may also reduce the number of results, for example if both theories use similar abbreviations for complex terms, or the same concept is declared axiomatically in one theory, but definitionally in the other. For that reason, we can allow \textbf{several abstract syntax trees for the same constant}, such as one with definitions expanded and one ``as is''.
\item Similarly, certain idiosyncrasies (such as PVS's common usage of theory parameters, see Section \ref{sec:normalizeinter}) call for not just matching symbol references, but also variables or possibly even complex expressions. To handle these situations, we additionally allow for \textbf{holes} in the constant lists of an abstract syntax tree, which may be unified with any other symbol or hole by the core algorithm, but are not recursed into and do not count as a requirement. The subterms that are to be considered holes can be marked as such during preprocessing or normalizing (see the sketch after this list).
\end{enumerate}
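To make these two features concrete, the following Scala sketch shows one possible representation of abstract syntax trees supporting both multiple trees per constant and holes. All names here are hypothetical and deliberately simplified; the actual data structures in the implementation may differ.
\begin{verbatim}
// Minimal sketch of abstract syntax trees with holes; all names are
// hypothetical and simplified compared to the actual implementation.
sealed trait Term
case class Sym(index: Int)           extends Term // reference into the constant list
case class App(fun: Term, arg: Term) extends Term
case class Lam(tp: Term, body: Term) extends Term
case object Hole extends Term // unifies with any symbol or hole, not recursed into

// A constant may carry several trees, e.g. one "as is" and one with
// its definition expanded.
case class Encoding(constants: Vector[String], trees: List[Term])
\end{verbatim}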
The common logical framework used for all the libraries at our disposal -- namely LF and extensions thereof -- makes it easy to systematically normalize theories built on various logical foundations. On the basis of the above features, we currently use the following approaches to normalizing theories:
\begin{itemize}
\item Free variables in a term are replaced by holes.
\item For foundations that use product types, we curry function types $(A_1 \times\ldots\times A_n)\to B$ to $A_1 \to \ldots \to A_n\to B$. We treat lambda-expressions and applications accordingly.
For example: \[f = \lambda (n,m) : \mathbb N \times \mathbb N .\; n + m\text{ becomes } f = \lambda n : \mathbb N.\; \lambda m : \mathbb N .\; n + m\] \[f(\langle a, b\rangle)\text{ becomes }f(a,b).\]
Right-clicking anywhere within the theory allows Jane to select \cn{MMT} $\to$ \cn{Find\ Views\ to...} $\to$ \cn{MitM/smglom} (the main Math-in-the-Middle library), telling her (within less than one second) that two views have been found, the most promising of which points to the theory
\cn{matroid\_theory} (see Figure \ref{fig:use:target}) in the library.
Note that the latter makes use of predefined propositions in its axioms and uses a type \cn{coll} for the collection of sets, while the former has the statements of the axioms directly in the theory and uses a predicate \cn{beautiful}. Additionally, the implication that beautiful sets (or sets in a matroid) are finite is stated as a logical formula in the former, while the latter uses the Curry-Howard correspondence.
\begin{figure}[ht]\centering
\fbox{\includegraphics[width=0.6\textwidth]{matroids}}
\caption{The Theory of Matroids in the MitM Library}\label{fig:use:target}
\section{Inter-Library Viewfinding}\label{sec:across}
We have so far assumed one fixed meta-theory for all theories involved; we will now discuss the situation when looking for morphisms between theories in different libraries (and built on different foundations).
Obviously, various differences in available foundational primitives and library-specific best practices and idiosyncrasies can prevent the algorithm from finding desired matches. There are two approaches to increasing the number of results in these cases:
\begin{itemize}
\item In many instances, the translation between two foundations is too complex to be discovered purely syntactically. In these cases we can provide arbitrary meta-morphisms between theories -- especially (but not exclusively) on the meta-theories -- which are applied before computing the abstract syntax trees (e.g. alignment-based translations as in \cite{MRLR:alignments:17}).
\item We can apply additional foundation-specific normalizations before preprocessing theories, such as eliminating higher-order abstract syntax encodings or implicit arguments, or elaborating specific abbreviations/definitions.
\end{itemize}
\begin{newpart}{DM}
The normalizations mentioned in Section \ref{sec:normalizeintra} already suggest equating the involved logical primitives (such as logical connectives) via a meta-morphism.
Foundation-specific normalizations specifically for finding morphisms \emph{across} libraries are to our knowledge an as yet unexplored field of investigation. Every formal system has certain unique idiosyncrasies, best practices or widely used features; finding an ideal normalization method is a correspondingly difficult domain-specific problem.
We will discuss some of our findings specifically regarding the PVS library as a case study.
\subsection{Normalization in PVS}\label{sec:normalizeinter}
PVS~\cite{pvs} is a proof assistant under active development, based on a higher-order logic with predicate subtyping and various convenience features such as record types, update expressions and inductive datatypes. In addition to the \emph{Prelude} library, which contains the most common domains of mathematical discourse and is shipped with PVS itself, there is a large library of formal mathematics developed and maintained by NASA~\cite{nasa}.
\paragraph{} While features like subtyping and records are interesting challenges, we will concentrate on one specific idiosyncrasy in PVS -- its prevalent use of \emph{theory parameters}.
In practice, theory parameters are often used in PVS for the signature of an abstract theory. For example, the theory of groups \cn{group\_def} in the NASA library has three theory parameters $(\cn T,\ast,\cn{one})$ for the signature, and includes the theory \cn{monoid\_def} with the same parameters; the axioms for a group are then formalized as a predicate on the theory parameters.
Given that the same practice is used in few other systems (if any), finding views without treating theory parameters in some way will not give us any useful results on these theories. We offer three approaches to handling these situations:
\begin{enumerate}
\item We can simply treat references to theory parameters as free variables and turn them into holes. Includes of parametric theories with arguments are turned into simple includes.
\item \emph{Covariant treatment:} We introduce new constants for the theory parameters and replace occurrences of the parameters by constant references. Includes with parameters are again replaced by normal includes (both this and the contravariant treatment are sketched after this list).
In the above mentioned theory \cn{group\_def}, we would hence add three new constants \cn T, $\ast$ and \cn{one} with their corresponding types.
\item \emph{Contravariant treatment:} Theory parameters are eliminated by binding them as arguments to \emph{each constant in the theory}. References to the treated constants are replaced by applications of the symbols to the parameters of the original include.
In the above mentioned theory \cn{group\_def}, we would change e.g. the unary predicate \cn{inverse\_exists?} with type $T\to\cn{bool}$ to a function with type $(T : \cn{pvstype})\to(\ast : T \to T \to T)\to (\cn{one}:T)\to(T\to\cn{bool})$.
An include of $\cn{group\_def}(S,\circ,e)$ in some other theory, e.g. \cn{group}, would be replaced by a simple include, but occurrences of \cn{inverse\_exists?} in \cn{group} would be replaced by $\oma{\cn{inverse\_exists?}}{S,\circ,e}$.
\end{enumerate}
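As an illustration, the following Scala sketch shows the covariant and contravariant treatments on a deliberately simplified data model. All names here are hypothetical, and types are represented as plain strings rather than proper terms.
\begin{verbatim}
// Simplified, hypothetical model: a theory with parameters and declarations.
case class Constant(name: String, tp: String)
case class Theory(params: List[Constant], decls: List[Constant])

// Covariant: parameters become ordinary constants of the theory, so
// group_def(T, *, one) simply gains three new declarations.
def covariant(thy: Theory): Theory =
  Theory(Nil, thy.params ::: thy.decls)

// Contravariant: every declaration is abstracted over the parameters,
// e.g. inverse_exists? : T -> bool becomes
// (T : pvstype) -> (* : T -> T -> T) -> (one : T) -> (T -> bool).
def contravariant(thy: Theory): Theory =
  Theory(Nil, thy.decls.map { c =>
    val prefix = thy.params.map(p => s"(${p.name} : ${p.tp}) -> ").mkString
    Constant(c.name, prefix + c.tp)
  })
\end{verbatim}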
The first approach is the most straightforward, but will lead to many false positives and negatives.
We conjecture that the second approach is the most useful for inter-library search, since it most closely corresponds to formalizations of abstract theories in other systems. A problem here is that the newly introduced constants are not passed on to includes without additional transformations.
The third approach would turn every occurrence of e.g. group-related symbols into function applications, which is a rather rare practice in most other systems. However, since this treatment of theory parameters comes closest to the semantics of the parameters, we conjecture that it is the most useful approach for intra-library viewfinding between PVS theories.
\paragraph{} In addition to the theory-parameter-related normalizations above, we use the following preprocessing techniques:
\begin{itemize}
\item Free variables in a term -- often occurrences of theory parameters, as used extensively in the PVS system -- are replaced by holes (see the sketch after this list).
\item We curry function types $(A_1 \times\ldots\times A_n)\to B$ to $A_1 \to \ldots \to A_n\to B$. We treat lambda-expressions and applications accordingly.
For example: \[f = \lambda (n,m) : \mathbb N \times \mathbb N .\; n + m\text{ becomes } f = \lambda n : \mathbb N.\; \lambda m : \mathbb N .\; n + m\] \[f(\langle a, b\rangle)\text{ becomes }f(a,b).\]
This prevents false negatives due to mismatching type arguments in the presence of subtyping.
\end{itemize}
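A minimal sketch of the first two passes, again in Scala on a hypothetical simplified term language:
\begin{verbatim}
// Hypothetical simplified term language for illustrating normalization.
sealed trait Term
case class Free(name: String)            extends Term // free variable
case class Prod(left: Term, right: Term) extends Term // product type
case class Arrow(dom: Term, cod: Term)   extends Term // function type
case object Hole                         extends Term

// Free variables become holes; (A x B) -> C is curried to A -> B -> C.
def normalize(t: Term): Term = t match {
  case Free(_)                => Hole
  case Arrow(Prod(a, b), cod) => normalize(Arrow(a, Arrow(b, cod)))
  case Arrow(dom, cod)        => Arrow(normalize(dom), normalize(cod))
  case Prod(a, b)             => Prod(normalize(a), normalize(b))
  case _                      => t
}
\end{verbatim}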
\end{newpart}
\subsection{Implementation}\label{sec:pvs}
\paragraph{} Using the above normalization methods, we can, as an example, write down a theory for a commutative binary operator using the Math-in-the-Middle foundation while targeting e.g. the PVS Prelude library -- allowing us to find all commutative operators, as in Figure \ref{fig:use:pvs}.
This usually does not yield big morphisms yet.
For example, consider the typical case where theories contain some symbol declarations and some axioms, in which the symbols occur.
Then the core algorithm will only find morphisms that map at most $1$ axiom.
Depending on what we intend to do with the results, we might prefer to consider them individually (e.g. to yield \emph{alignments} in the sense of \cite{KKMR:alignments:16}). But we can also use these small morphisms as building blocks to construct larger, possibly total ones:
\begin{lemma}[Amalgamating Morphisms]
We call two partial morphisms \textbf{compatible} if they agree on all constants for which both provide an assignment.
The symbols will hence be encoded in the component $t$ instead of the list $s$.
\end{itemize}
\paragraph{Picking starting theories:} If we try to find views between whole collections of theories, we can obviously disregard all theories that are already included in some other theory in our collections, since we work with a normalized (and dependency-closed) version of a theory. Consequently, by only using maximal theories we do not find any fewer views but speed up the search significantly.
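A sketch of this filter, assuming a hypothetical helper that yields the transitively included theories:
\begin{verbatim}
// Keep only maximal theories: drop any theory that is transitively
// included in another one. 'includes' is a hypothetical helper.
def maximal(theories: Set[String], includes: String => Set[String]): Set[String] =
  theories.filter(t => !theories.exists(s => s != t && includes(s).contains(t)))
\end{verbatim}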
\begin{newpart}{DM}
\paragraph{Morphism aggregation:} The morphisms found by the algorithm are always induced by a single assignment $c\mapsto c'$. Depending on what we intend to do with the results, we might prefer to consider them individually (e.g. to yield \emph{alignments} in the sense of \cite{KKMR:alignments:16}), or aggregate them into (ideally total) views with varying degrees of modularity, by merging compatible morphisms -- where two morphisms $v_1,v_2$ are compatible if there are no assignments $(c\mapsto c_1)\in v_1$ and $(c\mapsto c_2)\in v_2$ with $c_1\neq c_2$ (see the sketch below).
\end{newpart}
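A greedy version of this aggregation could look as follows -- a sketch under the simplifying assumption that a morphism is just a map between constant names:
\begin{verbatim}
// Morphisms as plain maps from source to target constants (simplified).
type Morphism = Map[String, String]

def compatible(v1: Morphism, v2: Morphism): Boolean =
  v1.keySet.intersect(v2.keySet).forall(c => v1(c) == v2(c))

// Greedily merge each morphism into the first compatible aggregate.
def aggregate(ms: List[Morphism]): List[Morphism] =
  ms.foldLeft(List.empty[Morphism]) { (acc, m) =>
    acc.indexWhere(compatible(_, m)) match {
      case -1 => m :: acc
      case i  => acc.updated(i, acc(i) ++ m)
    }
  }
\end{verbatim}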
\paragraph{Storing Encodings:} Finally, for computing the encodings of a theory we only need to know the symbols to be fixed in the component $t$ of an abstract syntax tree, for which only the meta-theories of the theories are relevant. They also determine the specific preprocessings and translations we likely want to use. Since only a small number of meta-theories is relevant in practice, we can store and retrieve the encodings for the most important situations. Since computing the encodings (as well as sorting the theories in a library by their dependencies) is the most expensive part of the algorithm, this -- once computed and stored -- makes the viewfinding process itself rather efficient.
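For instance, such a store could be a simple memoization layer keyed by meta-theory and theory. This is a hypothetical sketch; the real encoding type is richer than a list of numbers.
\begin{verbatim}
import scala.collection.mutable

// Memoize encodings per (meta-theory, theory) pair; 'encode' stands in
// for the expensive encoding computation and is a hypothetical parameter.
class EncodingCache(encode: (String, String) => List[Long]) {
  private val store = mutable.Map.empty[(String, String), List[Long]]
  def apply(meta: String, thy: String): List[Long] =
    store.getOrElseUpdate((meta, thy), encode(meta, thy))
}
\end{verbatim}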