Discussion:
[sword-devel] [jsword-devel] Method to find if BibleBook is contained in a Book
DM Smith
2014-03-26 14:49:32 UTC
John,

Putting this up on sword-devel, since that is a more appropriate location for the discussion to continue. This is really not about JSword, but rather about module making.

The nature of osis2mod is to retain all markup except <verse> and </verse> (or their milestoned equivalents). This means that the markup for a chapter is put in the module's storage for that chapter and noted in the index. In the case of the chapter given below, it is split into two parts, verse 0 and verse 1.
Verse 0 will get the preamble of the chapter:
<chapter osisID="EpJer.1">
Verse 1 will get:
</chapter>
(These will have been transformed into their milestoned versions.)

Also, verses 2 to 72 will be "linked" to verse 1, meaning that in the index they are given the same location as verse 1.

So, verse 0 has the chapter start content and verses 1 to 72 have the chapter end content.
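
The layout described above can be pictured with a small model. This is only a sketch: IndexEntry and LinkedVerseDemo are hypothetical names, the offsets and the "gen1" milestone IDs are invented for illustration, and none of this is the actual SWORD index format or osis2mod code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of one index slot: where a verse's data starts and how long it is. */
final class IndexEntry {
    final long offset;
    final int length;

    IndexEntry(long offset, int length) {
        this.offset = offset;
        this.length = length;
    }
}

final class LinkedVerseDemo {
    /**
     * Builds index slots for verse 0 through lastVerse of a chapter whose
     * verses carry no text, following the description in the message above.
     */
    static List<IndexEntry> indexForEmptyChapter(int lastVerse) {
        String chapterStart = "<chapter sID=\"gen1\" osisID=\"EpJer.1\"/>"; // illustrative markup
        String chapterEnd = "<chapter eID=\"gen1\" osisID=\"EpJer.1\"/>";   // illustrative markup

        List<IndexEntry> index = new ArrayList<>();
        // Verse 0 holds the chapter start markup.
        index.add(new IndexEntry(0, chapterStart.length()));
        // Verse 1 holds the chapter end markup; verses 2..lastVerse are
        // "linked", so they reuse verse 1's offset and length.
        IndexEntry verse1 = new IndexEntry(chapterStart.length(), chapterEnd.length());
        for (int verse = 1; verse <= lastVerse; verse++) {
            index.add(verse1);
        }
        return index;
    }

    /** Counts slots with non-zero length, which is all an index-only check can see. */
    static int countNonEmpty(List<IndexEntry> index) {
        int count = 0;
        for (IndexEntry entry : index) {
            if (entry.length > 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<IndexEntry> index = indexForEmptyChapter(72);
        System.out.println(countNonEmpty(index) + " of " + index.size() + " slots are non-empty");
    }
}
```

In this model every slot reports a non-zero length even though no verse carries any text, which matches the behavior Martin observed.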

Also, osis2mod does not complain if a verse is missing. Never has, never will. It does "complain" of a verse being present that is not in the versification. Always has, always will.

That emptyvss indicates that all verses are present means exactly that: All verses are present. This is not good if the module is in fact incomplete.

That JSword indicates that these "empty" verses are present means that they have non-zero length in the module.

JSword is graceful in handling this. It determines that the module has content for the verse by examining the index. What Martin is trying to do is find out which books, chapters and verses should be displayed to users in pick lists. The only way this can be done at this time, by either SWORD or JSword with the module in question, is to render each verse and determine that it renders nothing. This is far too expensive an operation to consider.

The only way to efficiently determine scope is to examine the index for each verse and see if the length is 0. The Scope entry in the conf has been ruled out. It would have been computed using the reverse logic of emptyvss. Go through the v11n from first verse to last and rather than noting what is missing, note what is present.
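
A minimal sketch of that "note what is present" pass, assuming the index has already been read into an array of per-verse lengths in v11n order. ScopeFromIndex and presentRanges are hypothetical names, not SWORD or JSword API, and the ranges are plain ordinals rather than verse references.

```java
import java.util.ArrayList;
import java.util.List;

final class ScopeFromIndex {
    /**
     * Walks verse ordinals in v11n order and returns inclusive [first, last]
     * ranges of ordinals whose index length is non-zero.
     */
    static List<int[]> presentRanges(int[] lengths) {
        List<int[]> ranges = new ArrayList<>();
        int start = -1; // -1 means "not currently inside a present run"
        for (int i = 0; i < lengths.length; i++) {
            boolean present = lengths[i] > 0;
            if (present && start < 0) {
                start = i;                            // a run of present verses begins
            } else if (!present && start >= 0) {
                ranges.add(new int[] {start, i - 1}); // the run ended at the previous ordinal
                start = -1;
            }
        }
        if (start >= 0) {
            ranges.add(new int[] {start, lengths.length - 1}); // run extends to the last verse
        }
        return ranges;
    }

    public static void main(String[] args) {
        int[] lengths = {0, 12, 40, 0, 0, 7}; // invented sample lengths
        for (int[] range : presentRanges(lengths)) {
            System.out.println(range[0] + "-" + range[1]);
        }
    }
}
```

This only gives a useful answer when absent verses really have zero length. In a module where chapter markup has been stored in otherwise empty verses, every ordinal would be reported as present.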

Today, most of our frontends display pick lists based on the v11n, not on the module content. This has long been confusing to end users of modules that don't contain every verse in the v11n.

In my view, this is a module problem. It is far easier and faster to rebuild and redistribute a module. We can tell a user to upgrade to the most recent version of a module far more easily than we can make and release a code change and have them get a new version of the program. When the change is a workaround for something that shouldn't be in a module, I think we should avoid it. For example, the NET Bible has some bugs that should be fixed. But instead we have some special code that is essentially: if module is NET then fix such-and-so when it occurs.

Together in His Service,
DM Smith
On Mar 25, 2014, at 11:43 PM, John Austin <gpl.programs.info at gmail.com> wrote:
There has been a lot of discussion about how missing material in a v11n should be treated (the discussion of the meaning and use of Scope was part of that). Tools such as osis2mod generated warnings whenever OSIS files lacked any part of the chosen v11n. The Scope conf param was, for a time at least, the recommended method of describing what part of a v11n was covered by a module. For these reasons, many existing modules (IBT alone has at least 26 such modules) are currently encoded so as to encompass the entire v11n, returning empty-string verse content for all verses in the v11n that are not included in the module, and using the .conf Scope param to define exactly what is present in the module.
So even though current module making best practice may be different, it would be good for JSword to be graceful with modules that are encoded somewhat differently if at all possible, at least for a time. There are many modules out there, old and new, which don't contain the complete v11n, so determining book coverage is important.
-John
Those verses exist since they are defined in the OSIS input file to
osis2mod. Osis2mod retains everything in its input. This is a well
documented behavior of osis2mod.
The end chapter markup will be put in the last verse that is in the
chapter, which might be verse 0.
They should use xslt to strip empty verses, chapters and books out of
their file into an intermediate file and give that as input to osis2mod.
Alternatively they can use <!-- ... --> to comment out huge swaths of
the input file.
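
The stripping step suggested above could be sketched as follows. Note the substitution: this uses the Java DOM API rather than XSLT, it only removes milestoned verse pairs (a <verse sID="..."/> immediately followed by the matching <verse eID="..."/>), and it does not go on to remove chapters or books left empty. EmptyVerseStripper is a hypothetical helper, not part of SWORD or JSword.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class EmptyVerseStripper {
    /** Parses an XML string; wraps checked exceptions to keep the sketch short. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Returns the next sibling that is not whitespace-only text, or null if none. */
    static Node nextMeaningfulSibling(Node node) {
        Node next = node.getNextSibling();
        while (next != null && next.getNodeType() == Node.TEXT_NODE
                && next.getTextContent().trim().isEmpty()) {
            next = next.getNextSibling();
        }
        return next;
    }

    /** Removes verse start/end milestone pairs with nothing between them; returns pairs removed. */
    static int stripEmptyVerses(Document doc) {
        NodeList verses = doc.getElementsByTagName("verse");
        // Copy the start milestones first: the NodeList is live and shrinks as nodes are removed.
        List<Element> starts = new ArrayList<>();
        for (int i = 0; i < verses.getLength(); i++) {
            Element verse = (Element) verses.item(i);
            if (verse.hasAttribute("sID")) {
                starts.add(verse);
            }
        }
        int removed = 0;
        for (Element start : starts) {
            Node next = nextMeaningfulSibling(start);
            if (next instanceof Element
                    && "verse".equals(next.getNodeName())
                    && start.getAttribute("sID").equals(((Element) next).getAttribute("eID"))) {
                Node parent = start.getParentNode();
                parent.removeChild(next);
                parent.removeChild(start);
                removed++;
            }
        }
        return removed;
    }
}
```

The cleaned document would then be serialized to an intermediate file and given to osis2mod in place of the original.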
-- DM
On Mar 25, 2014, at 7:48 AM, Martin Denham <mjdenham at gmail.com> wrote:
IBT have just passed me more information regarding their handling of
empty verses to help clarify if this is an IBT module issue or not.
Here are examples of how IBT's OSIS source defines empty verses in
<div type="x-Synodal-non-canonical"><div type="book"
osisID="EpJer"><chapter osisID="EpJer.1"><verse sID="EpJer.1.1-72"
osisID="EpJer.1.1 EpJer.1.2 EpJer.1.3 EpJer.1.4 EpJer.1.5
EpJer.1.6 EpJer.1.7 EpJer.1.8 EpJer.1.9 EpJer.1.10 EpJer.1.11
EpJer.1.12 EpJer.1.13 EpJer.1.14 EpJer.1.15 EpJer.1.16 EpJer.1.17
EpJer.1.18 EpJer.1.19 EpJer.1.20 EpJer.1.21 EpJer.1.22 EpJer.1.23
EpJer.1.24 EpJer.1.25 EpJer.1.26 EpJer.1.27 EpJer.1.28 EpJer.1.29
EpJer.1.30 EpJer.1.31 EpJer.1.32 EpJer.1.33 EpJer.1.34 EpJer.1.35
EpJer.1.36 EpJer.1.37 EpJer.1.38 EpJer.1.39 EpJer.1.40 EpJer.1.41
EpJer.1.42 EpJer.1.43 EpJer.1.44 EpJer.1.45 EpJer.1.46 EpJer.1.47
EpJer.1.48 EpJer.1.49 EpJer.1.50 EpJer.1.51 EpJer.1.52 EpJer.1.53
EpJer.1.54 EpJer.1.55 EpJer.1.56 EpJer.1.57 EpJer.1.58 EpJer.1.59
EpJer.1.60 EpJer.1.61 EpJer.1.62 EpJer.1.63 EpJer.1.64 EpJer.1.65
EpJer.1.66 EpJer.1.67 EpJer.1.68 EpJer.1.69 EpJer.1.70 EpJer.1.71
EpJer.1.72"/><verse eID="EpJer.1.1-72"/></chapter></div></div>
I'm not sure how osis2mod handles all this when importing to the
module, but it works perfectly without warnings or errors. Also,
when the resulting module is passed to the "emptyvss" tool, it
passes this test without warnings.
On 25 March 2014 11:38, Martin Denham <mjdenham at gmail.com> wrote:
I am having problems getting a list of BibleBooks contained in
some AV modules which we know do not contain certain books. I
can't work out if the problem is with JSword, the modules, or
osis2mod.
1. book.contains(nonExistingVerse) returns TRUE
2. book.getRawText(nonExistingVerse) returns <chapter end tag>
Here is a simple test to show the problem using KAZ which has
SwordBook kaz = (SwordBook) Books.installed().getBook("KAZ");
Verse esd11Verse = new Verse(kaz.getVersification(), BibleBook.ESD1, 1, 1);
System.out.println(kaz.contains(esd11Verse)); // prints: true
// kaz.getRawText(esd11Verse) returns: <chapter eID="gen7" osisID="1Esd.1"/>
Verse esd12Verse = new Verse(kaz.getVersification(), BibleBook.ESD1, 1, 2);
System.out.println(kaz.contains(esd12Verse)); // prints: true
// kaz.getRawText(esd12Verse) returns: <chapter eID="gen7" osisID="1Esd.1"/>
So how does "<chapter eID="gen7" osisID="1Esd.1"/>" get into verse
content unexpectedly?
1. a module problem; but IBT say they do not add empty verse slots
2. Sword osis2mod issue
3. JSword issue: why is JSword returning a chapter end tag
instead of verse content
Any ideas what might cause this problem?
Thanks
Martin
On 11 March 2014 12:15, DM Smith <dmsmith at crosswire.org> wrote:
We haven't pushed this down into JSword. So far it is the
responsibility of the front-end. Chris B has made it efficient
to ask a Book whether it contains a Verse.
Essentially, when it comes to asking a module if it has
meaningful content, you want containsAny(Key verses, boolean
includeIntros) and containsAny(Key verses) { return
containsAny(verses, false); }
I think it should ignore verse 0 by default. If it doesn't
have verse content, then does the content really mean something?
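
The pair of methods proposed above might look like the following. This is a sketch under stated assumptions rather than JSword code: VerseRef stands in for JSword's Verse, a plain list stands in for Key, and the hasContent predicate is a hypothetical hook for whatever per-verse check the module supports.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical verse reference; not JSword's Verse class. */
final class VerseRef {
    final String book;
    final int chapter;
    final int verse;

    VerseRef(String book, int chapter, int verse) {
        this.book = book;
        this.chapter = chapter;
        this.verse = verse;
    }
}

final class ContainsAnyDemo {
    /**
     * Returns true if any of the given verses has content according to hasContent.
     * Verse 0 (introductory material) is considered only when includeIntros is true.
     */
    static boolean containsAny(List<VerseRef> verses, boolean includeIntros,
                               Predicate<VerseRef> hasContent) {
        for (VerseRef ref : verses) {
            if (ref.verse == 0 && !includeIntros) {
                continue; // skip intro slots unless explicitly requested
            }
            if (hasContent.test(ref)) {
                return true;
            }
        }
        return false;
    }

    /** Convenience overload that ignores verse 0 by default, as suggested above. */
    static boolean containsAny(List<VerseRef> verses, Predicate<VerseRef> hasContent) {
        return containsAny(verses, false, hasContent);
    }

    public static void main(String[] args) {
        List<VerseRef> chapter = Arrays.asList(
                new VerseRef("EpJer", 1, 0), new VerseRef("EpJer", 1, 1));
        // Pretend only the intro slot (verse 0) holds anything.
        Predicate<VerseRef> onlyIntro = ref -> ref.verse == 0;
        System.out.println(containsAny(chapter, onlyIntro));
        System.out.println(containsAny(chapter, true, onlyIntro));
    }
}
```

Ignoring verse 0 by default means a chapter that holds only its start markup is reported as empty. It does not help with the case in this thread, where the chapter end markup lands in verses 1 and up.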
As you have noted contains(Key) is confusing. There are a few
places where it means containsAny. Usually it means
containsAll. The name, contains, was chosen early as we derived
from a container class where the argument was an element of
the container. That is, contains is supposed to mean
isMemberOf. Later we changed the inheritance as it wasn't an
"is a" relationship.
But we need to be careful of not introducing more confusion.
By the way, the list serve was holding mail for a few days.
In Him,
DM
On Mar 8, 2014, at 5:26 PM, Martin Denham <mjdenham at gmail.com> wrote:
Is there an efficient way to find if a BibleBook is
contained in a Book (Bible or commentary) using JSword?
I recall this subject being discussed but can't recall the
outcome.
Thanks
Martin
John Austin
2014-03-27 10:03:03 UTC
DM and Martin,
I've been wanting to re-build IBT's modules for a while. But there are
several dozen to rebuild. Because of the amount of effort involved, I
want to do it in a way which will work and will be well supported for
years (hopefully) to come. I've looked at this and am encouraged that it
might be doable now. IBT's new modules will require SWORD 1.7+ (because
most will be SynodalProt versified for starters). I believe most front
ends which are supporting IBT's repo are already using 1.7, so it
shouldn't bother many users I hope.

To this end I want to start another sword-devel thread to test the
waters before deciding to jump into re-builds.
-John