Commit d461f401 authored by Ulrich

evaluation as csv

parent b3d5e764
Document,astro-ph0001117,astro-ph0008271,astro-ph0011211,astro-ph0012434,astro-ph0102308,cond-mat0001199,cond-mat0002133,cond-mat0003070,cond-mat0007243,cond-mat0009225,cond-mat0011175,cond-mat0102209,hep-ex0006009,hep-lat0007038,hep-lat0008012,hep-ph0010254,hep-th0011187,math0101174,physics0007034,physics0011001
correct QEs (TP),51,9,47,5,33,19,3,0,2,5,1,8,33,19,11,11,0,0,5,5
missed QEs (FN),7,5,6,0,3,0,0,0,0,0,0,0,0,2,1,4,0,0,0,0
false QEs (FP),3,3,2,1,5,1,0,5,1,1,2,0,0,31,2,0,1,3,0,2
,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,
Total Documents ,=COUNTA(B1:Z1),,,,,,,,,,,,,,,,,,,
Total TP,=SUM(B2:Z2),,,,,,,,,,,,,,,,,,,
Total FN,=SUM(B3:Z3),,,,,,,,,,,,,,,,,,,
Total FP,=SUM(B4:Z4),,,,,,,,,,,,,,,,,,,
Total QEs,=B10+B11,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,
Precision,=B10/(B10+B12),,,,,,,,,,,,,,,,,,,
Recall,=B10/(B10 + B11),,,,,,,,,,,,,,,,,,,
F-Score,=2*B15*B16/(B15 + B16),,,,,,,,,,,,,,,,,,,
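As a cross-check of the spreadsheet formulas above, the following minimal Python sketch recomputes the totals, precision, recall, and F-score from the three per-document rows, mirroring the cells `B10` to `B17`. The row values are copied from the CSV; the function name `evaluate` is hypothetical and not part of the original tooling.

```python
# Per-document counts copied from the CSV rows above (same column order).
TP = [51, 9, 47, 5, 33, 19, 3, 0, 2, 5, 1, 8, 33, 19, 11, 11, 0, 0, 5, 5]
FN = [7, 5, 6, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 2, 1, 4, 0, 0, 0, 0]
FP = [3, 3, 2, 1, 5, 1, 0, 5, 1, 1, 2, 0, 0, 31, 2, 0, 1, 3, 0, 2]


def evaluate(tp, fn, fp):
    """Hypothetical helper mirroring the spreadsheet: sum each row, then score."""
    total_tp, total_fn, total_fp = sum(tp), sum(fn), sum(fp)  # B10, B11, B12
    total_qes = total_tp + total_fn                            # B13 = B10 + B11
    precision = total_tp / (total_tp + total_fp)               # B15
    recall = total_tp / (total_tp + total_fn)                  # B16
    f_score = 2 * precision * recall / (precision + recall)    # B17
    return {
        "documents": len(tp),  # B9, COUNTA over the header row
        "tp": total_tp,
        "fn": total_fn,
        "fp": total_fp,
        "qes": total_qes,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
    }


if __name__ == "__main__":
    for key, value in evaluate(TP, FN, FP).items():
        print(f"{key}: {value:.3f}" if isinstance(value, float) else f"{key}: {value}")
```

Running it reproduces the totals 267, 28, and 63 for 20 documents, and a precision, recall, and F-score of about 0.809, 0.905, and 0.854.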
Number of documents: 20
Total:
QEs detected correctly: 267
missed QEs: 28
false QEs: 63
astro-ph0001117:
QEs detected correctly: 51
missed QEs: 7
false QEs: 3
astro-ph0008271:
QEs detected correctly: 9
missed QEs: 5
false QEs: 3
astro-ph0011211:
QEs detected correctly: 47
missed QEs: 6
false QEs: 2
astro-ph0012434:
QEs detected correctly: 5
missed QEs: 0
false QEs: 1
astro-ph0102308:
QEs detected correctly: 33
missed QEs: 3
false QEs: 5
cond-mat0001199:
QEs detected correctly: 19
missed QEs: 0
false QEs: 1
cond-mat0002133:
QEs detected correctly: 3
missed QEs: 0
false QEs: 0
cond-mat0003070:
QEs detected correctly: 0
missed QEs: 0
false QEs: 5
cond-mat0007243:
QEs detected correctly: 2
missed QEs: 0
false QEs: 1
cond-mat0009225:
QEs detected correctly: 5
missed QEs: 0
false QEs: 1
cond-mat0011175:
QEs detected correctly: 1
missed QEs: 0
false QEs: 2
cond-mat0102209:
QEs detected correctly: 8
missed QEs: 0
false QEs: 0
hep-ex0006009:
QEs detected correctly: 33
missed QEs: 0
false QEs: 0
hep-lat0007038:
QEs detected correctly: 19
missed QEs: 2
false QEs: 31
hep-lat0008012:
QEs detected correctly: 11
missed QEs: 1
false QEs: 2
hep-ph0010254:
QEs detected correctly: 11
missed QEs: 4
false QEs: 0
hep-th0011187:
QEs detected correctly: 0
missed QEs: 0
false QEs: 1
math0101174:
QEs detected correctly: 0
missed QEs: 0
false QEs: 3
physics0007034:
QEs detected correctly: 5
missed QEs: 0
false QEs: 0
physics0011001:
QEs detected correctly: 5
missed QEs: 0
false QEs: 2
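The plain-text report above starts with the document count and a `Total:` block, followed by one block per document. The minimal Python sketch below parses that layout and checks that the totals equal the per-document sums. It assumes the line format is exactly as shown (a section name ending in a colon, followed by the three count lines). The helpers `parse_report` and `check_totals` are hypothetical, and `SAMPLE` is an illustrative two-document excerpt built from two of the entries above, with its `Total` block recomputed for those two documents only.

```python
import re

# Maps the labels used in the report to short keys.
LABELS = {
    "QEs detected correctly": "tp",
    "missed QEs": "fn",
    "false QEs": "fp",
}
COUNT_LINE = re.compile(r"^\s*(QEs detected correctly|missed QEs|false QEs):\s*(\d+)\s*$")
DOC_COUNT_LINE = re.compile(r"^\s*Number of documents:\s*(\d+)\s*$")
HEADER_LINE = re.compile(r"^\s*(\S[^:]*):\s*$")  # e.g. "Total:" or "astro-ph0001117:"


def parse_report(text):
    """Return (declared document count, {section: {"tp": .., "fn": .., "fp": ..}})."""
    declared = None
    sections = {}
    current = None
    for line in text.splitlines():
        if not line.strip():
            continue
        m = DOC_COUNT_LINE.match(line)
        if m:
            declared = int(m.group(1))
            continue
        m = COUNT_LINE.match(line)
        if m:
            if current is None:
                raise ValueError(f"count line outside a section: {line!r}")
            sections[current][LABELS[m.group(1)]] = int(m.group(2))
            continue
        m = HEADER_LINE.match(line)
        if m:
            current = m.group(1)
            sections[current] = {}
            continue
        raise ValueError(f"unrecognised line: {line!r}")
    return declared, sections


def check_totals(declared, sections):
    """True if the document count matches and "Total" equals the per-document sums."""
    docs = {name: c for name, c in sections.items() if name != "Total"}
    summed = {key: sum(c[key] for c in docs.values()) for key in ("tp", "fn", "fp")}
    return declared == len(docs) and summed == sections["Total"]


SAMPLE = """Number of documents: 2
Total:
QEs detected correctly: 56
missed QEs: 7
false QEs: 4
astro-ph0001117:
QEs detected correctly: 51
missed QEs: 7
false QEs: 3
astro-ph0012434:
QEs detected correctly: 5
missed QEs: 0
false QEs: 1
"""

if __name__ == "__main__":
    declared, sections = parse_report(SAMPLE)
    print(check_totals(declared, sections))  # True for this excerpt
```

Fed the full report above, the same check should confirm that the `Total` block (267, 28, 63) is the sum over all 20 document blocks.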
@@ -547,18 +547,26 @@ at the end:
\ednote{Discuss with Michael}
\subsubsection{Quantitative Evaluation}
\ednote{do this}
\begin{itemize}
\item how many documents were handled successfully?
\item annotate samples in KAT and calculate F-score
\end{itemize}
The implementation was tested on a set of about 35000 documents and successfully processed nearly all of them;
fatal errors occurred in only about 150 documents. The overall runtime was about 80 hours on nine 2.00 GHz cores of an Intel Xeon E5-2650
processor, which is equivalent to about 74 seconds per document on a single core. This figure includes preprocessing as well as the spotting
and scoring of quantity expressions. Note that the scoring step runs Schäfer's declaration spotter, which accounts for
a significant share of the runtime.
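Taking the approximate figures above at face value, and assuming that the 80 hours are wall-clock time with all nine cores busy, the single-core runtime per document and the share of documents processed without fatal errors work out roughly as follows:
\[
  \frac{80\,\mathrm{h} \times 3600\,\mathrm{s/h} \times 9}{35000} \approx 74\,\mathrm{s},
  \qquad
  \frac{35000 - 150}{35000} \approx 99.6\,\%.
\]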
The author evaluated the quality of the implementation by manually validating 20 randomly selected documents.
They contain 295 quantity expressions, of which 267 were correctly recognized (true positives).
We regard the detection of a quantity expression as correct when its highest-scored candidate meaning reflects the correct
meaning of the expression. The remaining 28 quantity expressions were not detected (false negatives), and in 63 cases expressions were
marked as quantity expressions although they are not (false positives).
In this setup, true negatives would be non-quantity expressions that were correctly left undetected.
However, these cannot be counted meaningfully, so the evaluation relies on precision, recall, and F-score, none of which depend on true negatives.
The evaluation yields a precision of $267 \,/\, (267 + 63) \approx 81\%$ and a recall of
$267 \,/\, (267 + 28) \approx 91\%$, which gives an F-score of about 85\%.
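Writing $P$ for precision, $R$ for recall, and $\mathit{TP}$, $\mathit{FP}$, $\mathit{FN}$ for the numbers of true positives, false positives, and false negatives, the F-score is the harmonic mean of precision and recall. It can be expressed directly in terms of the counts, which with the numbers above gives:
\[
  F = \frac{2PR}{P + R}
    = \frac{2\,\mathit{TP}}{2\,\mathit{TP} + \mathit{FP} + \mathit{FN}}
    = \frac{2 \cdot 267}{2 \cdot 267 + 63 + 28}
    = \frac{534}{625} \approx 85.4\,\%.
\]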
\subsubsection{Qualitative Evaluation}
\ednote{Describe some errors and their causes here}
\begin{itemize}
\item problems in larger formulas (astro-ph9211002), caused both by my implementation and by MathML
\item
\item abbreviations (MC)
\end{itemize}