Commit da2b72b8 authored by Michael Kohlhase's avatar Michael Kohlhase

shortened the intro, spellcheck

parent c6bb38e6
@@ -39,6 +39,9 @@ We have seen how a viewfinder can be used for theory \emph{discovery}. But with
\end{itemize}
%%% Local Variables:
%%% mode: latex
%%% eval: (visual-line-mode) (set-fill-column 5000)
%%% TeX-master: "paper"
%%% End:
% LocalWords: emph specialized generalization transfomations normalization prioritize
% LocalWords: sec:usecase
@@ -8,7 +8,7 @@ classification}.
The basic use case is the following: Jane, a mathematician, becomes interested in a class
of mathematical objects, say -- as a didactic example -- something she initially calls
``beautiful subsets'' of a base set $\cB$ (or just ``beautiful over $\cB$''). These have
the following properties:
the following properties $Q$:
\begin{compactenum}
\item the empty set is beautiful over $\cB$
\item every subset of a beautiful set is beautiful over $\cB$
@@ -16,52 +16,50 @@ the following properties:
there is an $x\in A\backslash B$, such that $B\cup\{x\}$ is beautiful over $\cB$.
\end{compactenum}
To see what is known about beautiful sets, she types these three conditions into a theory
classifier, which finds all theories in a library $\cL$ that match these conditions (after
a suitable renaming). In our example -- see Figure~\ref{fig:theory-classification-ex} --
Jane learns that her ``beautiful sets'' correspond to the well-known structure of
matroids~\cite{wikipedia:matroid}, so she can directly apply matroid theory to her problems.
In extended use cases, a theory classifier finds theories that share significant structure
with $Q$, so that we can formalize $Q$ modularly with minimal effort. Say Jane was
interested in ``dazzling subsets'', i.e., beautiful subsets that obey a fourth condition;
then she could just contribute a theory that extends \textsf{matroid} by a formalization
of the fourth condition -- and maybe rethink the name.
% \item use existing views in $\cL$ (compositions of views are views) to give Jane more
% information about her beautiful subsets, e.g. that matroids (and thus beautiful sets)
% form a generalization of the notion of linear independence from linear algebra.
% \end{compactitem}
\begin{figure}[ht]\centering\lstset{aboveskip=0pt,belowskip=0pt}
\begin{tabular}{|p{5cm}|p{5cm}|}\hline
$Q$ & Result \\\hline
\begin{lstlisting}[mathescape]
theory query : F?MitM
include ?set
Base : {A} set A
beautiful {A} set A $\rightarrow$ prop
empty : $\vdash$ beautiful $\emptyset$
subset-closed: $\vdash$ {}
\end{lstlisting}
&
\begin{lstlisting}[mathescape]
theory matroid : F?MitM
include ?baseset
independent {A} set A $\rightarrow$
empty: $\vdash$
hereditary : $\vdash$
augmentation: $\vdash$
\end{lstlisting}
\\\hline
\end{tabular}
\caption{Theory Classification for beautiful sets}\label{fig:theory-classification-ex}
\end{figure}
We reduce the theory classification problem to that of finding theory morphisms (views)
between \MMT theories: given a query theory $Q$, the viewfinder computes all (total) views
from $Q$ into a library $\cL$ and returns presentations of the target theories and the
assignments made by the views.
\paragraph{Related Work}
Existing systems have so far only worked with explicitly given theory morphisms, e.g., in IMPS \cite{imps} or Isabelle \cite{isabelle}.
@@ -76,16 +74,17 @@ Building on these developments, we are now able, for the first time, to apply ge
While inspired by the ideas of \cite{NorKoh:efnrsmk07}, our design and implementation are completely novel.
In particular, the theory makes use of the rigorous language-independent definitions of \emph{theory} and \emph{theory morphism} provided by MMT, and the practical implementation makes use of the MMT system, which provides high-level APIs for these concepts.
\cite{cezary+thibault-paper} applies techniques related to ours to a related problem.
Instead of finding theory morphisms inside a single corpus, they use machine learning to find similar constants in two different corpora.
Their results can roughly be seen as a single partial morphism from one corpus to the other.
\paragraph{Approach and Contribution}
Our contribution is twofold. Firstly, we present the design and implementation of a
generic theory morphism finder that works with arbitrary corpora represented in MMT. The
algorithm tries to match two symbols by unifying their types. This is made efficient by
separating each term into a hashed representation of its abstract syntax tree (which serves
as a fast plausibility check for pre-selecting matching candidates) and the list of symbol
occurrences in the term, into which the algorithm recurses.
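To make the two-component encoding concrete, here is a minimal, self-contained sketch in Python; the tuple-based term representation and the names `encode` and `plausible` are invented for illustration and are not the MMT implementation:

```python
# Toy encoding of a term as (h, syms): h hashes the shape of the abstract
# syntax tree with all symbol occurrences blanked out; syms records those
# occurrences in traversal order. Terms are nested tuples like
# ('app', ('sym', 'plus'), ('sym', 'zero')).

def encode(term):
    syms = []

    def shape(t):
        if t[0] == 'sym':
            syms.append(t[1])
            return ('sym',)  # blank out the symbol name
        return (t[0],) + tuple(shape(c) for c in t[1:])

    return hash(shape(term)), syms

def plausible(t1, t2):
    # Fast pre-selection: constants whose blanked trees differ can never
    # match, so we only recurse into the symbol lists when hashes agree.
    return encode(t1)[0] == encode(t2)[0]
```

Two constants with the same blanked-out tree shape (and hence the same hash) become matching candidates; only for those does the finder recurse into the symbol lists.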
Secondly, we apply this view finder in two concrete case studies. \ednote{add 1-2 sentences for each case study}
@@ -94,6 +93,11 @@ In Section~\ref{sec:prelim}, we revise the basics of MMT and the representations
%%% Local Variables:
%%% mode: latex
%%% eval: (visual-line-mode) (set-fill-column 5000)
%%% TeX-master: "paper"
%%% End:
% LocalWords: defemph compactenum textsf formalization vivews generalization centering
% LocalWords: compactitem lstset aboveskip 0pt,belowskip hline lstlisting mathescape
% LocalWords: rightarrow vdash emptyset baseset NorKoh:efnrsmk07 formalized omdoc emph
% LocalWords: standardized emph thibault-paper ednote
@@ -122,7 +122,7 @@ Open Archive for Formalizations (KO 2428/13-1).
%%% Local Variables:
%%% mode: latex
%%% eval: (visual-line-mode) (set-fill-column 5000)
%%% TeX-master: t
%%% End:
@@ -133,4 +133,4 @@ Open Archive for Formalizations (KO 2428/13-1).
% LocalWords: tableofcontents defemph compactenum emptyset newpart compactitem lstset
% LocalWords: centering aboveskip 0pt,belowskip hline lstlisting mathescape rightarrow
% LocalWords: vdash baseset sec:concl subsubsection Formalizations emph sec:usecase
% LocalWords: usecase Unrealized
@@ -192,6 +192,11 @@ The resulting formalizations are then used as meta-theory for imports of the lib
%%% Local Variables:
%%% mode: latex
%%% eval: (visual-line-mode) (set-fill-column 5000)
%%% TeX-master: "paper"
%%% End:
% LocalWords: fig:mmt textbf textbf mapsto vdash_ emph hline ctabular ldots,x_m:t_m vec
% LocalWords: ldots,a_n compactitem cdot vec ednote ex:lamsyn urtheory mathcal f,t
% LocalWords: x:A;t oldpart fig:mmtgrammar centering vspace mdframed multirow ldots,x_n
% LocalWords: OAFproject:on vdash formalized formalizations tikzpicture ldots Arith
@@ -22,4 +22,5 @@ Right-clicking anywhere within the theory allows Jane to select \cn{MMT} $\to$ \
\fbox{\includegraphics[width=\textwidth]{pvs}}
\caption{Searching for Commutative Operators in PVS}\label{fig:use:pvs}
\end{figure}
\ednote{8 results for NASA, but NASA doesn't work in jEdit because of limited memory}
% LocalWords: ednote centering fbox includegraphics textwidth beautysource smglom pvs
@@ -106,7 +106,7 @@ We can pick any two constant declarations $c$ and $c'$ and build a morphism by s
\subsection{Improvements and Parameters of the Algorithm}
\paragraph{Extending the abstract syntax tree} By the very nature of the approach described in Section \ref{sec:oaf}, many symbols will be common to domain and codomain of a given view-finding problem: Since all of the libraries available are ultimately based on (extensions of) LF, we never want to reassign the symbols provided by the LF theory. Furthermore, if we want to find views between theories that share a common meta-theory (e.g. theories within the same library), we most certainly do not want to reassign the symbols in that meta-theory.
In these situations, we can consider the \emph{common} symbols as a fixed part of the abstract syntax tree of a constant. The symbols will hence be encoded in the component $\cn{h}(C)$ instead of the list $\cn{syms}(C)$. This will not only exclude spurious matches, but also reduce the number of plausible matches and consequently speed up the algorithm.
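The effect of fixing common symbols can be sketched with a toy encoder (hypothetical names and meta-theory symbols, not the MMT data structures): fixed symbols stay inside the hashed shape and must match literally, while all other symbols remain assignable.

```python
# Hypothetical meta-theory symbols that are never reassigned.
FIXED = {'lf.Pi', 'lf.arrow', 'lf.type'}

def encode_fixed(term, fixed=FIXED):
    """Symbols in `fixed` are kept inside the hashed component h(C)
    instead of the assignable list syms(C)."""
    syms = []

    def shape(t):
        if t[0] == 'sym':
            if t[1] in fixed:
                return ('sym', t[1])  # fixed: becomes part of the hash
            syms.append(t[1])
            return ('sym',)  # assignable: blanked out
        return (t[0],) + tuple(shape(c) for c in t[1:])

    return hash(shape(term)), syms
```

Terms that differ only in a fixed symbol now hash differently, so spurious candidates are ruled out before any recursion takes place.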
@@ -119,26 +119,26 @@ In these situations, we can consider the \emph{common} symbols as a fixed part o
\paragraph{Picking starting theories:} If we try to find views between whole collections of theories, we can obviously disregard all theories that are included in some other theory in our collections, since we need to work with the flat theories anyway. Consequently, by only using maximal theories we do not find any fewer views but speed up the search significantly.
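The maximality filter amounts to discarding every theory that occurs in some other theory's include set; a toy sketch (invented names, with the flattened include relation given as a dictionary):

```python
def maximal_theories(includes):
    """Return the theories not included in any other theory.
    `includes` maps each theory name to the (transitively flattened)
    set of theories it includes."""
    included_somewhere = set().union(*includes.values()) if includes else set()
    return {t for t in includes if t not in included_somewhere}
```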
\paragraph{Preprocessing and Translations}
Especially when looking for views between theories in different libraries (and built on different foundations), various differences in available foundational primitives and library-specific best practices and idiosyncrasies can prevent the algorithm from finding desired matches. There are two approaches to increasing the number of results in these cases:
\begin{itemize}
\item We can preprocess theories, by e.g. normalizing expressions, eliminating higher-order abstract syntax encodings or encoding-related redundant information (such as the type of a typed equality, which in the presence of subtyping can be different from the types of both sides of an equation), or elaborating abbreviations/definitions.
\item In many instances, the translation between two foundations is too complex to be discovered purely syntactically. In these cases we can provide arbitrary translations between theories, which are applied before computing the encoding.\ednote{Mention/cite alignment-translation paper}
\end{itemize}
When elaborating definitions, it is important to consider that this may also reduce the number of results, if both theories use similar abbreviations for complex terms, or the same concept is declared axiomatically in one theory, but definitionally in the other. For that reason, we can allow several encodings for the same constant during preprocessing, such as one with definitions expanded and one ``as is''.
Similarly, certain idiosyncrasies -- such as PVS's common usage of theory parameters -- call for not just matching symbol references, but also variables or possibly even complex expressions. To handle these situations, we additionally allow for ``holes'' in the lists $\cn{syms}(C)$, which may be unified with any other symbol or hole, but are not recursed into. The subterms that are to be considered holes can be marked as such during preprocessing.
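A possible reading of the hole mechanism in code (a toy sketch with an invented `HOLE` marker and a greedy consistency check; the actual unification in MMT is more involved):

```python
HOLE = object()  # wildcard entry in a symbol list; never recursed into

def compatible(syms1, syms2):
    """Check whether two symbol lists admit a consistent renaming,
    treating holes as matching anything."""
    if len(syms1) != len(syms2):
        return False
    assignment = {}
    for a, b in zip(syms1, syms2):
        if a is HOLE or b is HOLE:
            continue  # holes unify with any symbol or hole
        if assignment.setdefault(a, b) != b:
            return False  # same symbol mapped to two different targets
    return True
```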
\paragraph{Storing Encodings} Finally, for computing the encodings of a theory we only need to know the symbols to be fixed in the component $\cn{h}(C)$ of an encoding, for which only the meta-theories of the theories are relevant. They also determine the specific preprocessings and translations we are likely to use. Since only a small number of meta-theories is relevant in practice, we can store and retrieve the encodings for the most important situations. Since computing the encodings (as well as sorting the theories in a library by their dependencies) is the most expensive part of the algorithm, this -- once computed and stored -- makes the view-finding process itself rather efficient.
\subsection{Preprocessing and Normalization}
The common logical framework used for all the libraries at our disposal makes it easy to systematically normalize theories built on various logical foundations. We currently use the following approaches to preprocessing theories:
\begin{itemize}
\item Free variables in a term, often occurrences of theory parameters as e.g. used extensively in the PVS system, are replaced by holes.
\item For foundations that use product types, we curry function types $(A_1 \times\ldots A_n)\to B$ to $A_1 \to \ldots \to A_n\to B$. We treat lambda-expressions and applications accordingly.
\item Higher-order abstract syntax encodings are eliminated by raising atomic types, function types, applications and lambdas to the level of the logical framework. This eliminates (redundant) implicit arguments that only occur due to their formalization in the logical framework.
This has the advantage that possible differences between the types of the relevant subterms and implicit type arguments (e.g. in the presence of subtyping) do not negatively affect view-finding.
\item We use the Curry-Howard correspondence to transform axioms and theorems of the form $\vdash (P\Rightarrow Q)$ to function types $\vdash P \to \vdash Q$. Analogously, we transform judgments of the form $\vdash \forall x : A.\;P$ to $\prod_{x:A}\vdash P$.
\item For classical logics, we afterwards rewrite all logical connectives using their usual definitions using negation and conjunction only. Double negations are eliminated.
\item Typed equalities are transformed into untyped ones, again getting rid of the redundant type argument of the equality.
\item The arguments of conjunctions and equalities are reordered (currently only by their number of subterms).
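As an illustration of the currying step, here is a toy normalizer over a tuple-based type representation (invented for this sketch, not the MMT preprocessor): $(A_1 \times\ldots\times A_n)\to B$ becomes $A_1 \to \ldots \to A_n\to B$.

```python
def curry(ty):
    """Rewrite ('fun', ('prod', A1, ..., An), B) into nested
    ('fun', A1, ('fun', A2, ...)) form, recursing into subtypes."""
    if ty[0] == 'fun':
        dom, cod = ty[1], curry(ty[2])
        if dom[0] == 'prod':
            for arg in reversed(dom[1:]):
                cod = ('fun', curry(arg), cod)
            return cod
        return ('fun', curry(dom), cod)
    return ty
```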
@@ -146,6 +146,11 @@ The common logical framework used for all the libraries at our disposal makes it
%%% Local Variables:
%%% mode: latex
%%% eval: (visual-line-mode) (set-fill-column 5000)
%%% TeX-master: "paper"
%%% End:
% LocalWords: defemph defemph emph vdash_ normalization optimized compactenum textbf th
% LocalWords: t,s ednote renormalize oldpart syms mapsto mapsto ldots,l compactitem
% LocalWords: normalizing subtyping subterms preprocessings ldots ldots formalization
% LocalWords: vdash Rightarrow vdash vdash vdash forall vdash