Commit a5b4ec8c authored by Michael Kohlhase

copying over more papers from SVN

parent b24487bb
\section{Conclusion and Further Work}\label{sec:conclusion}
We have demonstrated that a markup language for the {\emph{content}} of physics can be
designed by extending the content and context markup format {\omdoc} with a
representational infrastructure for the principal objects of physics: observables,
systems, and experiments. The resulting language {\physml} is able to capture the logical
and operational structure specific to physics, which differentiates this field from others. The
extension presented in this paper is part of the ongoing enterprise to extend the {\omdoc}
format to the {\bf{STEM}} fields ({\underline{S}}ciences, {\underline{T}}echnology,
{\underline{E}}ngineering and {\underline{M}}athematics).
The next step is to evaluate the language by marking up a larger body of knowledge in
physics in {\physml}. We have started work on the technically ubiquitous and basic field
of thermostatics. This should give us a clear indication of whether {\physml} is adequate for
all of physics, or pinpoint the necessary changes to the language design. We are
seeking an international collaboration on the further development of {\physml},
including experts from theoretical and applied physics and related fields, in particular
mathematics and chemistry.
New and powerful services can be implemented once the scientific content can be
semantically encoded, retrieved, and reused digitally. In physics, these include the
search for other experiments on the same observables, dimension and algebraic checking of
mathematical equations, mapping to other mathematical representations of the same
theoretical physical expression, etc.
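As a small illustration of what a dimension check might involve (this is our own
example with a formula and notation chosen for illustration only, not an excerpt from
a {\physml} encoding), consider the kinetic energy $E=\frac{1}{2}mv^2$. Writing
$\mathsf{M}$, $\mathsf{L}$, and $\mathsf{T}$ for the dimensions of mass, length, and
time, a checker would reduce both sides to products of base dimensions and compare them:
\[
  [E] = [m]\,[v]^2 = \mathsf{M}\,(\mathsf{L}\,\mathsf{T}^{-1})^2
      = \mathsf{M}\,\mathsf{L}^2\,\mathsf{T}^{-2},
\]
which agrees with the dimension of work as force times distance,
$(\mathsf{M}\,\mathsf{L}\,\mathsf{T}^{-2})\,\mathsf{L}$. A mistyped equation such as
$E=mv$ would be rejected, since its right-hand side has dimension
$\mathsf{M}\,\mathsf{L}\,\mathsf{T}^{-1}$.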
The approach taken here, namely analyzing the operational and logical practices of a
scientific discipline and mapping them to field-specific modules that extend the semantic
markup language {\omdoc}, will allow semantic content markup to spread to other scientific fields.
As authors increasingly make use of markup languages, and retrieval engines
follow suit by offering intelligent search algorithms that exploit these markup
languages, users will gain effective tools to increase the reach of their scientific work,
with the {\emph{content}}, not just the text, of the work of others at their fingertips.
%%% Local Variables:
%%% mode: LaTeX
%%% TeX-master: "mkm06"
%%% End:
% LocalWords: mkm
\section{Introduction}\label{sec:intro}
The distribution of information and services over the Internet has changed all aspects
of life, and science is not an exception. We anticipate that the systems currently
investigated in the community will eventually change scientific practice and that they
will have a strong societal impact, provided that they can inter-operate to cover the
whole work-flow of scientific research, education and application.
\begin{wrapfigure}{r}{6.5cm}\vspace{-.6cm}
\includegraphics[width=6.5cm]{sci-method}\vspace{-.3cm}
\caption{The Scientific Method}\label{fig:nw-Methode}\vspace{-.6cm}
\end{wrapfigure}
To further this vision we need to develop, implement, and provide semantic-based and
context-aware techniques for acquiring, organizing, processing, sharing and using
knowledge in science.
Our starting point is the view of the {\emph{scientific method}} as a spiral (see
Fig.~\ref{fig:nw-Methode}), where we focus on physics. In this view,
scientific research in physics moves in a spiral trajectory from original ideas to results
and even applications. Ideas pass through the processes of observation of natural
processes, then of concept formulation to describe these. These allow scientists to
express initial theories about (quantitative laws of nature governing) them, which are
then explored (what are the consequences of the model assumptions) leading to predictions
about processes that can be verified or falsified (to a certain degree) experimentally.
These experiments usually lead to new observations, starting the next round in the spiral
until a quantitative (mathematically formulated) \textit{theory} predicting exclusively
correct results from experiments is formulated. Observables in physics have to be
chosen such that they can be physically measured; their algebraic counterparts
are then candidates for the building blocks of a theory. The semantics of mathematics as
such is more confined: it searches for logically correct sets of rules.
At the moment, most of the steps in Fig.~\ref{fig:nw-Methode} are separately supported by
software systems, e.g. literature searches in {\googlescholar} or {\wikipedia}, theory
exploration in computer algebra systems like {\mathematica}, and experiments in simulation
systems. But the systems are, by and large, not able to inter-operate since they use
differing data formats, make differing model assumptions, and are bound to an implicitly
given context that is only documented in publications about the systems. For instance,
copy-and-paste from {\googlescholar} or {\wikipedia} to {\mathematica} or a simulation
system is impossible because of this format problem. Moreover, where possible, copy and
paste can be very dangerous, since computer algebra systems make differing assumptions about
the code libraries that
the simulation systems are based on.\footnote{A simple example in which the lack of
  explicit context led to a very expensive failure was the September 1999 loss of a \$125
  million Mars orbiter, which crashed on Mars. The cause was that NASA used metric units
  in its specifications, whereas the Lockheed Martin engineers misinterpreted the data,
  assuming they were given in Imperial units of measurement.}
Our aim here is to arrive at a content markup format for physics. Early concept
discussions and visions~\cite{Hilf:texdocc,PML:web,Hilf:guestrow,Hilf:p05} have not led to
a realization in terms of an encoding, since the problem was attacked from the ground up.
In this paper we will build the bridge from vision to a usable markup language by
extending the {\omdoc} ({\underline{O}pen} {\underline{M}athematical}
{\underline{Doc}uments}) format~\cite{Kohlhase:omdoc1.2} by an infrastructure for
(physical) systems, observables and experiments and call this new module and the extended
system {\physml} (Physics Markup Language). Since we can now share all the infrastructure
--- in particular the theory and statement levels --- with mathematics, the language
design for {\physml} becomes feasible.
%%% Local Variables:
%%% mode: stex
%%% TeX-master: "mkm06"
%%% End:
% LocalWords: cience ech logy ngineering athematical uments ciences echnology
% LocalWords: athematics stex mkm
Title: Capturing the Content of Physics
Abstract:
Today's scientific documents are {\emph{machine-readable}}; therefore we can publish them
on the web, send them in e-mails, and search for words in them via Google. However, we
cannot search for a relevant experiment, check dimensions in equations, or change units or
coordinate systems in an exposition. For this we would have to make the documents also
{\emph{machine-understandable}} by capturing the content of the embedded knowledge.
To facilitate this, we propose to realize a content markup language PhysML by extending
the OMDoc format (Open Mathematical Documents), which was initially developed as a
content-markup format for mathematical documents, by an infrastructure for physics,
concentrating on
{\emph{observables}}, {\emph{systems}}, and {\emph{experiments}}. The semantic information
embedded in OMDoc documents has for instance been used by eLearning systems to automate
user-adaption of course materials or for semantic search for mathematical formulae. OMDoc
marks up knowledge on three levels:
\begin{description}
\item[Object Level] it uses OpenMath and content MathML for objects represented as
mathematical formulae;
\item[Statement Level] OMDoc provides original markup primitives that allow specifying the
  semantic structure and interdependencies of theorems, axioms, definitions, and proofs;
\item[Context Level] statements are grouped into mathematical theories, whose structure
  can be expressed by a rich set of theory morphisms.
\end{description}
Our extension only changes the statement level; the object and context levels stay the
same: they model the general ``scientific method''. Thus the extended three-level approach
to knowledge representation can be used as an open basis for true eScience.
% LocalWords: Google PhysML