normalize-scape.mod.xsl

Halloechen!

Post by Vitaly Ostanin
I looked into xsl/normalize-scape.mod.xsl.
The template name="scape" is really big... and it will get even
bigger. I don't know another way to parse a string multiple times
(and to parse the result of that parsing).
What do you think about creating an XML file of symbols and
replacements? Such a file would be easy to contribute to and
maintain, and the XSLT could be generated from it.
LaTeX doesn't support Unicode characters by their numbers, so
each character needs to be translated into valid LaTeX.

Do you mean something like
<http://xml.coverpages.org/unicodeRahtz19981008.xml>?

I think the best solution is an external tool. XSLT is not for
everything, especially because such a program is small, easy to
write in a portable way, and can substitute elegantly depending on
context.
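
A minimal sketch of what such an external tool could look like, assuming it runs over plain text before LaTeX sees it. The table LATEX_MAP and the function to_latex are hypothetical names, and the handful of entries is only illustrative; nothing here is taken from DB2LaTeX itself, and it does not attempt context-dependent substitution.

```python
# Hypothetical mapping table: a few characters and their LaTeX replacements.
LATEX_MAP = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "\u00e9": r"\'{e}",  # e with acute accent
    "\u2014": "---",     # long dash
}


def to_latex(text, mapping=LATEX_MAP):
    """Replace every mapped character in one pass; keep all others as-is."""
    return "".join(mapping.get(ch, ch) for ch in text)


if __name__ == "__main__":
    print(to_latex("50% caf\u00e9 & R_1"))
```

Working character by character in a single pass means that the backslashes and braces produced by one replacement are never escaped again by a later one, which is the "parsing the result of parsing" problem mentioned above.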

Tschoe,
Torsten.
--
Torsten Bronger, aquisgrana, europa vetus

-------------------------------------------------------
This SF.Net email sponsored by: Free pre-built ASP.NET sites including
Data Reports, E-commerce, Portals, and Forums are available now.
Download today and enter to win an XBOX or Visual Studio .NET.
http://aspnet.click-url.com/go/psa00100006ave/direct;at.asp_061203_01/01

Vitaly Ostanin

2003-07-01 14:18:23 UTC

On Tue, 01 Jul 2003 15:57:53 +0200

Post by Torsten Bronger
Halloechen!

Post by Vitaly Ostanin
I looked into xsl/normalize-scape.mod.xsl.
The template name="scape" is really big... and it will get even
bigger. I don't know another way to parse a string multiple times
(and to parse the result of that parsing).
What do you think about creating an XML file of symbols and
replacements? Such a file would be easy to contribute to and
maintain, and the XSLT could be generated from it.
LaTeX doesn't support Unicode characters by their numbers, so
each character needs to be translated into valid LaTeX.

Do you mean something like
<http://xml.coverpages.org/unicodeRahtz19981008.xml>?

Thanks, that's cool! I'll take a look at it.

Post by Torsten Bronger
I think the best solution is an external tool.

I have already modified normalize-scape.mod.xsl to use
latex.mapping.xml, and it works.

scape.xsl, with the modified template name="scape", is attached.

<skipped/>

--
Regards, Vyt
mailto: ***@vzljot.ru
JID: ***@vzljot.ru

James Devenish

2003-07-02 09:51:21 UTC

Post by Torsten Bronger
Do you mean something like
<http://xml.coverpages.org/unicodeRahtz19981008.xml>?

If anyone knows how to use this and would like to write notes
about how it can be used with DB2LaTeX, feel free ;-)

Post by Torsten Bronger
What do you think about creating an XML file of symbols and
replacements? Such a file would be easy to contribute to and
maintain, and the XSLT could be generated from it.

[...]

Post by Torsten Bronger
I have already modified normalize-scape.mod.xsl to use
latex.mapping.xml, and it works.

I, too, would like to have had this. But DocBook XSL stylesheets, in
general, are slow enough already. The problem with using a recursive
template is that it can easily increase processing time by a factor of
five. Yet it only benefits developers. So I dropped the idea.

However, thanks to your prompting, perhaps we can come to a compromise:
we will still use the long, monolithic "scape" template but it will be
generated from a mapping file (not hand-coded).
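
One way such a generator could work, as a rough sketch only. It assumes a mapping file made of <mapping key="..." text="..."/> elements (the text attribute is a guess) and assumes the stylesheet provides a named template string-replace taking string, from and to parameters. The function generate_scape and the variable names s1, s2, ... are hypothetical.

```python
import xml.etree.ElementTree as ET

XSL = "http://www.w3.org/1999/XSL/Transform"
ET.register_namespace("xsl", XSL)


def q(name):
    """Return the namespace-qualified tag for an XSLT element name."""
    return "{%s}%s" % (XSL, name)


def generate_scape(mapping_xml):
    """Emit a flat "scape" template: one string-replace call per mapping."""
    pairs = [(m.get("key"), m.get("text"))
             for m in ET.fromstring(mapping_xml).iter("mapping")]
    template = ET.Element(q("template"), {"name": "scape"})
    ET.SubElement(template, q("param"), {"name": "string"})
    current = "$string"
    for i, (key, text) in enumerate(pairs, start=1):
        # Each step stores its result in a variable that feeds the next step.
        var = ET.SubElement(template, q("variable"), {"name": "s%d" % i})
        call = ET.SubElement(var, q("call-template"),
                             {"name": "string-replace"})
        ET.SubElement(call, q("with-param"),
                      {"name": "string", "select": current})
        ET.SubElement(call, q("with-param"), {"name": "from"}).text = key
        ET.SubElement(call, q("with-param"), {"name": "to"}).text = text
        current = "$s%d" % i
    ET.SubElement(template, q("value-of"), {"select": current})
    return ET.tostring(template, encoding="unicode")


# Assumed mapping format; only the element name and @key appear in the thread.
SAMPLE = """<latex>
  <mapping key="&amp;" text="\\&amp;"/>
  <mapping key="%" text="\\%"/>
</latex>"""

if __name__ == "__main__":
    print(generate_scape(SAMPLE))
```

The generated template stays monolithic at run time, so the mapping file is only read when the stylesheet is built. Because the replacements run one after another, the order of entries in the mapping file would matter (a backslash rule, for instance, would have to come first).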

Post by Torsten Bronger
LaTeX doesn't support Unicode characters by their numbers,
so each character needs to be translated into valid LaTeX.

I haven't found that to be possible (but I'm not an XSLT expert). If you
have any idea how to do this portably in XSLT without using extensions,
I would really love to know. If you have a method that relies on
commonly-available extensions, we could include that as an option. Our
current approach is to say "we can't do this with XSLT, so we'll do it
with LaTeX".

For DB2LaTeX, there are three graceful options built in (though none
is enabled by default). The test_entities folder (which should probably
have been named test_characters) demonstrates this. The current options
are:

- Do nothing to handle Unicode characters. This is the default. You
will get LaTeX error messages and the output won't be correct.
- Enable output escaping and handle some 'essential' English-language
characters. For unrecognised characters, spell out the character
codes in the text (to alert the reader). This is the best way of
providing support for the bulk of English-language documents. "Odd"
characters will appear in a way that proof-readers can recognise. The
example files for this are test_entities/catcode.*
- Enable output escaping, use the LaTeX 'unicode' package, but keep the
output encoding in a Latin-alphabet character set. This is for
Latin-alphabet users. For them, it may be preferable to use an ISO
Latin output encoding and have the 'babel' package handle Latin
characters. Other characters, if present, will be intercepted and
passed to the 'unicode' package. The example files for this are
test_entities/ucs.*
- Use Unicode characters directly. E.g. <xsl:output encoding="utf-8"/>.
This allows fullest use of the DocBook localisations as-is (though
you will need to install the 'unicode' LaTeX package). This option is
intended for documents where the incidence of non-Latin characters is
high. The example files for this are test_entities/utf-8.*
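
To make the second option concrete, here is a small illustration of the behaviour a proof-reader would see. It is not the DB2LaTeX mechanism (which, as described above, is handled on the LaTeX side); the ESSENTIAL table, the function visible_fallback and the [U+XXXX] notation are all assumptions made for this sketch.

```python
# Hypothetical set of "essential" English-language characters.
ESSENTIAL = {
    "\u2019": "'",    # right single quotation mark
    "\u201c": "``",   # left double quotation mark
    "\u201d": "''",   # right double quotation mark
    "\u2013": "--",   # en dash
}


def visible_fallback(text):
    """Map essential characters; spell out the code of anything else non-ASCII."""
    out = []
    for ch in text:
        if ord(ch) < 128:
            out.append(ch)                      # plain ASCII passes through
        elif ch in ESSENTIAL:
            out.append(ESSENTIAL[ch])           # recognised character
        else:
            out.append("[U+%04X]" % ord(ch))    # unrecognised: make it visible
    return "".join(out)


if __name__ == "__main__":
    print(visible_fallback("it\u2019s 5\u20ac"))
```

The point of the visible marker is that an unsupported character shows up in the typeset text instead of disappearing or aborting the LaTeX run.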

See also (incomplete documentation):
$latex.entities <http://db2latex.sourceforge.net/reference/rn45re81.html>
$latex.inputenc <http://db2latex.sourceforge.net/reference/rn45re81.html>
$latex.use.ucs <http://db2latex.sourceforge.net/reference/rn45re81.html>
$latex.ucs.options <http://db2latex.sourceforge.net/reference/rn45re101.html>
$latex.babel.language <http://db2latex.sourceforge.net/reference/rn45re102.html>

James.

Vitaly Ostanin

2003-07-02 11:03:49 UTC

On Wed, 2 Jul 2003 17:51:21 +0800

on Tue, Jul 01, 2003 at 03:57:53PM +0200, Torsten Bronger

Post by Torsten Bronger
Do you mean something like
<http://xml.coverpages.org/unicodeRahtz19981008.xml>?

If anyone knows how to use this and would like to write notes
about how it can be used with DB2LaTeX, feel free ;-)

1. Check the license of this file and its terms of use.

2. Transform this file into a form like latex.mapping.xml (replace
the Unicode numbers with their entities). Easy.

3. Include the result of the transformation in latex.mapping.xml
and use it alongside the other replacements for special LaTeX
symbols.

Post by Torsten Bronger
What do you think about creating an XML file of symbols and
replacements? Such a file would be easy to contribute to and
maintain, and the XSLT could be generated from it.

[...]

Post by Torsten Bronger
I have already modified normalize-scape.mod.xsl to use
latex.mapping.xml, and it works.

You're right, the modified stylesheet is slow, but XSLT is not meant
for speed.

Yet it only benefits developers. So I dropped the idea.
However, thanks to your prompting, perhaps we can come to a
compromise: we will still use the long, monolithic "scape"
template but it will be generated from a mapping file (not
hand-coded).

I'm not sure that is the right way.

Post by Torsten Bronger
LaTeX doesn't support Unicode characters by their numbers,
so each character needs to be translated into valid LaTeX.

I haven't found that to be possible (but I'm not an XSLT
expert).

It's easily done with a character mapping, without any
extensions. And a basis for it already exists:
http://xml.coverpages.org/unicodeRahtz19981008.xml

If you have any idea how to do this portably in XSLT
without using extensions, I would really love to know. If you
have a method that relies on commonly-available extensions, we
could include that as an option. Our current approach is to say
"we can't do this with XSLT, so we'll do it with LaTeX".

It's not right.

<skipped/>

--
Regards, Vyt
mailto: ***@vzljot.ru
JID: ***@vzljot.ru

James Devenish

2003-07-02 13:10:18 UTC

Hi Vitaly,

New ideas are welcome -- and it would be great for us to improve
language support -- but each idea needs to be assessed for its
practical value.

Post by James Devenish
I, too, would like to have had this. But DocBook XSL
stylesheets, in general, are slow enough already. The problem
with using a recursive template is that it can easily increase
processing time by a factor of five.

You're right, the modified stylesheet is slow, but XSLT is not meant
for speed.

It's not for slowness, either! :)

Post by James Devenish
we will still use the long, monolithic "scape"
template but it will be generated from a mapping file (not
hand-coded).

I'm not sure that is the right way.

From what I have seen, it is the most practical way so far.

Post by Vitaly Ostanin
LaTeX doesn't support Unicode characters by their numbers,
so each character needs to be translated into valid LaTeX.

I haven't found that to be possible (but I'm not an XSLT
expert).

It's easily done with a character mapping, without any
extensions.

I don't believe you! If you can find someone who has demonstrated that
it is practical (or can explain how it could be done) that would help
us find a new solution for DB2LaTeX. As far as I can see, you're
suggesting that we use substring(...) to iterate over every character in
text() nodes and then do lookups in a 65000-element mapping document
(most characters will require LaTeX packages to be loaded -- so there is
always a need for a LaTeX-based solution). It simply isn't practical
(time, space, software support) to do that.

Vitaly Ostanin

2003-07-02 13:59:28 UTC

On Wed, 2 Jul 2003 21:10:18 +0800

Post by James Devenish
Hi Vitaly,
New ideas are welcome -- and it would be great for us to
improve language support -- but each idea needs to be assessed
for its practical value.

Post by James Devenish
I, too, would like to have had this. But DocBook XSL
stylesheets, in general, are slow enough already. The
problem with using a recursive template is that it can
easily increase processing time by a factor of five.

You're right, the modified stylesheet is slow, but XSLT is not meant
for speed.

It's not for slowness, either! :)

I have optimized my version of template name="scape" (attached).
It now uses the key() function from XSLT.

Top of my normalize-scape.mod.xsl:
<xsl:key name="entity" match="mapping" use="@key"/>
<xsl:variable name="latex.mapping.vyt"
select="document('latex.mapping.xml')"/>

The timings are now (tested with xsltproc --timing --repeat):

original db2latex
Applying stylesheet 20 times took 15469 ms

vyt first (scape.xsl)
Applying stylesheet 20 times took 88864 ms
Saving result took 1 ms

vyt second (scape2.xsl)
Applying stylesheet 20 times took 34364 ms
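
For comparison, the three totals above can be reduced to a per-run time and a slowdown factor relative to the original stylesheet. This is only arithmetic on the figures quoted in this message; the names RUNS, TOTALS_MS and summarize are made up for the sketch.

```python
RUNS = 20  # each measurement applied the stylesheet 20 times

# Total times in milliseconds, copied from the xsltproc output above.
TOTALS_MS = {
    "original db2latex": 15469,
    "vyt first (scape.xsl)": 88864,
    "vyt second (scape2.xsl)": 34364,
}


def summarize(totals, runs, baseline_name):
    """Return {name: (ms per run, factor relative to the baseline)}."""
    base = totals[baseline_name]
    return {name: (total / runs, total / base)
            for name, total in totals.items()}


if __name__ == "__main__":
    for name, (per_run, factor) in summarize(
            TOTALS_MS, RUNS, "original db2latex").items():
        print("%s: %.1f ms per run, %.2fx the original"
              % (name, per_run, factor))
```

On these numbers the first mapping-based version is roughly 5.7 times slower than the original, in line with the "factor of five" estimate earlier in the thread, while the key()-based version is roughly 2.2 times slower.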

Post by James Devenish
we will still use the long, monolithic "scape"
template but it will be generated from a mapping file (not
hand-coded).

I'm not sure that is the right way.
From what I have seen, it is the most practical way so far.

Maybe.

Post by Vitaly Ostanin
LaTeX doesn't support Unicode characters by their
numbers, so each character needs to be translated into
valid LaTeX.

I haven't found that to be possible (but I'm not an XSLT
expert).

It's easily done with a character mapping, without any
extensions.

I don't believe you! If you can find someone who has
demonstrated that it is practical (or can explain how it could
be done) that would help us find a new solution for DB2LaTeX.

I'll try :)

Post by James Devenish
As far as I can see, you're suggesting that we use
substring(...) to iterate over every character in text() nodes
and then do lookups in a 65000-element mapping document (most
characters will require LaTeX packages to be loaded -- so there
is always a need for a LaTeX-based solution). It simply isn't
practical (time, space, software support) to do that.

You can split the whole mapping base by language (by code-point
range) and include only the ranges that are needed.
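
A rough sketch of that splitting idea, assuming the mapping base is available as a character-to-LaTeX table. The block names, the sample entries and the function select_mapping are hypothetical; the ranges are the Unicode blocks Latin-1 Supplement, Greek and Coptic, and Cyrillic.

```python
# Code-point ranges (inclusive) for a few Unicode blocks.
BLOCKS = {
    "latin-1": (0x0080, 0x00FF),
    "greek": (0x0370, 0x03FF),
    "cyrillic": (0x0400, 0x04FF),
}

# Illustrative mapping base; a real one would have thousands of entries.
FULL_MAPPING = {
    "\u00e9": r"\'{e}",
    "\u03b1": r"$\alpha$",
    "\u0436": r"\cyrzh{}",
}


def select_mapping(mapping, wanted):
    """Keep only the entries whose code point lies in a requested block."""
    ranges = [BLOCKS[name] for name in wanted]
    return {ch: tex for ch, tex in mapping.items()
            if any(lo <= ord(ch) <= hi for lo, hi in ranges)}


if __name__ == "__main__":
    print(select_mapping(FULL_MAPPING, ["latin-1", "cyrillic"]))
```

A document that needs only Cyrillic would then load a table a fraction of the size of the full 65000-element one.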

BTW, you could offer two alternative variants: one with the
monolithic "scape", and one that does the replacements from a
mapping base (kept separate from latex.mapping.xml).

<skipped/>

--
Regards, Vyt
mailto: ***@vzljot.ru
JID: ***@vzljot.ru

Torsten Bronger

2003-07-02 14:33:03 UTC

Halloechen!

[...] As far as I can see, you're suggesting that we use
substring(...) to iterate over every character in text() nodes and
then do lookups in a 65000-element mapping document (most
characters will require LaTeX packages to be loaded -- so there is
always a need for a LaTeX-based solution). It simply isn't
practical (time, space, software support) to do that.

Do you mean something like this:

http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/xsltml/xsltml/entities.xsl?rev=1.13&content-type=text/vnd.viewcvs-markup

However I fully agree that this doesn't seem to be a very wise thing
to do.

Tschoe,
Torsten.
--
Torsten Bronger, aquisgrana, europa vetus

Torsten Bronger

2003-07-02 14:27:28 UTC

Halloechen!

Post by James Devenish
[...]
For DB2LaTeX, there are three graceful options built in (though none
is enabled by default). The test_entities folder (which should probably
have been named test_characters) demonstrates this. The current options
[...]
- Use Unicode characters directly. E.g. <xsl:output encoding="utf-8"/>.
This allows fullest use of the DocBook localisations as-is (though
you will need to install the 'unicode' LaTeX package). This option is
intended for documents where the incidence of non-Latin characters is
high. The example files for this are test_entities/utf-8.*

This sounds perfect. Then what is the disadvantage of this option?
Why isn't it used always?

BTW, recently the LaTeX3 project team introduced the new inputenc
option utf-8 (or utf8?) for testing.

Tschoe,
Torsten.
--
Torsten Bronger, aquisgrana, europa vetus

ben

2003-07-02 18:50:56 UTC