TeXML, the XML vocabulary for TeX

Halloechen!

Post by Oleg Paraschenko
I think that you can use TeXML to some extent in your project.
| Example of TeXML to TeX translation
|
| TeXML:
| <cmd name="documentclass">
| <opt>12pt</opt>
| <parm>letter</parm>
| </cmd>
|
| TeX:
| \documentclass[12pt]{letter}
One of the main benefits of using TeXML is the automatic translation
of TeX special symbols.
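
To make the translation above concrete, here is a minimal sketch in Python. It is not the TeXML processor itself: the function names (escape_tex, cmd_to_tex) and the escape table are assumptions chosen for illustration, and only the <cmd>, <opt> and <parm> elements from the example are handled.

```python
import xml.etree.ElementTree as ET

# Characters that TeX treats specially in text, each with one common escape.
# This table is an illustrative assumption, not the table TeXML uses.
TEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "{": r"\{",
    "}": r"\}",
    "$": r"\$",
    "&": r"\&",
    "#": r"\#",
    "%": r"\%",
    "_": r"\_",
    "^": r"\^{}",
    "~": r"\~{}",
}


def escape_tex(text):
    """Escape TeX special characters one character at a time."""
    return "".join(TEX_SPECIALS.get(ch, ch) for ch in text)


def cmd_to_tex(xml_source):
    """Serialize a single <cmd> element with <opt>/<parm> children."""
    cmd = ET.fromstring(xml_source)
    parts = ["\\" + cmd.get("name")]
    for child in cmd:
        body = escape_tex(child.text or "")
        if child.tag == "opt":
            parts.append("[" + body + "]")   # optional argument
        elif child.tag == "parm":
            parts.append("{" + body + "}")   # mandatory argument
    return "".join(parts)


example = '<cmd name="documentclass"><opt>12pt</opt><parm>letter</parm></cmd>'
print(cmd_to_tex(example))  # \documentclass[12pt]{letter}
```

Because the escaping happens when the text is serialized, the stylesheet that produces the XML never has to know which characters are special in TeX.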

Interesting, but how is it implemented? In XSLT, or a scripting
language, or what? How fast is it (I'm not prepared to accept a
further significant drop in speed)?

How are different \usepackage[???]{inputenc}'s dealt with?

Tschoe,
Torsten.
--
Torsten Bronger, aquisgrana, europa vetus

-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click

Oleg Paraschenko

2004-03-25 12:26:13 UTC

Hi!

On Thu, 25 Mar 2004 12:42:49 +0100

Post by Torsten Bronger
Halloechen!

...

Post by Oleg Paraschenko
One of the main benefits of using TeXML is the automatic
translation of TeX special symbols.

Interesting, but how is it implemented? In XSLT, or a scripting
language, or what?

It is implemented in the Python scripting language. It uses only core
Python modules (the expat XML parser, the Unicode database and a few others),
so it should work on any recent system. The mapping from Unicode characters
to LaTeX commands is taken from a file that accompanies the MathML specification
(http://www.w3.org/Math/characters/unicode.xml (note: 1.5 MB)).

Post by Torsten Bronger
How fast is it (I'm not prepared to accept a
further significant drop in speed)?

It is hard to say exactly, but I think it is fast. In any case,
it should be faster than processing of specials by XSLT.

Post by Torsten Bronger
How are different \usepackage[???]{inputenc}'s dealt with?

The processor does not know about \usepackage; it only translates
characters. It is the job of the XSLT stylesheet to insert the \usepackage
command into the output, if required.

The user can specify an output encoding, and the processor attempts to
produce the best possible translation for it. For example, for the letter ß,
if the output encoding is ascii, the processor outputs "\ss "; if the output
encoding is latin1, the processor outputs "ß". In the latter case the correct
header would be \usepackage[latin1]{inputenc}, but it is not the processor's
job to create this header.
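
The encoding-dependent behaviour can be sketched roughly as follows. This is an assumption about how such a fallback might look, not the actual TeXML code: the table TEXT_COMMANDS is a tiny hypothetical excerpt, and translate_char and translate are made-up names.

```python
# Hypothetical excerpt of a Unicode-to-LaTeX table; the real processor
# derives its table from the MathML unicode.xml file.
TEXT_COMMANDS = {
    "\u00df": "\\ss ",     # LATIN SMALL LETTER SHARP S
    "\u03b1": "\\alpha ",  # GREEK SMALL LETTER ALPHA
}


def translate_char(ch, encoding):
    """Keep ch if the output encoding can represent it, else use a LaTeX command."""
    try:
        ch.encode(encoding)
        return ch
    except UnicodeEncodeError:
        pass
    if ch in TEXT_COMMANDS:
        return TEXT_COMMANDS[ch]
    raise ValueError("no LaTeX replacement known for %r" % ch)


def translate(text, encoding):
    """Translate a whole string for the given output encoding."""
    return "".join(translate_char(ch, encoding) for ch in text)


print(translate("Stra\u00dfe", "ascii"))   # Stra\ss e
print(translate("Stra\u00dfe", "latin1"))  # Straße
```

The sketch only decides between the raw character and a command; escaping of ASCII characters that are special in TeX would be a separate step.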

Post by Torsten Bronger
Tschoe,
Torsten.
--
Torsten Bronger, aquisgrana, europa vetus

Bye!

--
Oleg


Torsten Bronger

2004-03-25 12:43:51 UTC

Halloechen!

Post by Oleg Paraschenko
On Thu, 25 Mar 2004 12:42:49 +0100

[...] Interesting, but how is it implemented? In XSLT, or a
scripting language, or what?

It is implemented in the Python scripting language.

I don't know Python. How easily can this be installed on a Windows
system?

Post by Oleg Paraschenko
It uses only core Python modules (the expat XML parser, the Unicode
database and a few others), so it should work on any recent
system. The mapping from Unicode characters to LaTeX commands is taken
from a file that accompanies the MathML specification
(http://www.w3.org/Math/characters/unicode.xml (note: 1.5 MB)).

And is it mode-aware? Does an alpha become \alpha in formulae and a
Greek letter elsewhere? What about ligatures like "--"? Is this an
en-dash or two hyphens? What about typographic things like thin
spaces, soft hyphens, zero-width non-joiner and "break permitted
here"? How much of Unicode is covered so far?

How fast is it (I'm not prepared to accept a further significant
drop in speed)?

It is hard to say exactly, but I think it is fast. In any case,
it should be faster than processing of specials by XSLT.

Okay; I asked because using it would mean translating
XML-->XML-->text instead of XML-->text-->filter-->text, where
"filter" is *very* fast. But faster than XSLT may be enough.

How are different \usepackage[???]{inputenc}'s dealt with?

The processor does not know about \usepackage; it only translates
characters. It is the job of the XSLT stylesheet to insert the \usepackage
command into the output, if required.

So I always have to include things like wasy, pifont, textcomp etc?
Wouldn't be a problem, I just need a complete list.

Post by Oleg Paraschenko
The user can specify an output encoding, and the processor attempts to
produce the best possible translation for it.

Sounds nice. Are you aware of the very new utf-8 support that was added to
the LaTeX core two months ago? How well does it work?

Tschoe,
Torsten.
--
Torsten Bronger, aquisgrana, europa vetus


Oleg A. Paraschenko

2004-03-25 15:17:05 UTC

Hi!

On Thu, 25 Mar 2004 13:43:51 +0100

Post by Torsten Bronger
Halloechen!

Post by Oleg Paraschenko
On Thu, 25 Mar 2004 12:42:49 +0100

[...] Interesting, but how is it implemented? In XSLT, or a
scripting language, or what?

It is implemented in the Python scripting language.

I don't know Python. How easily can this be installed on a Windows
system?

It should not be a problem. You can download Python from
http://www.python.org/download/ , install it and run scripts from
the command line. For example:

d:\python23\python.exe texml.py -e ascii test.xml test.tex

And is it mode-aware?

Yes, it is mode-aware. It knows text and math.

Post by Torsten Bronger
Does an alpha become \alpha in formulae and a
Greek letter elsewhere?

I tested and found that in both modes the result is "\alpha ". (Or the
letter alpha itself if the output is in a Greek encoding. I consider this OK
because I see very little difference between "$\alpha $" and "$a$".)

Post by Torsten Bronger
What about ligatures like "--"? Is this an
en-dash or two hyphens?

Since ligatures in TeX are a property of fonts, not of the document, and
since the TeXML processor can't guess which font will be used, the processor
ignores ligatures altogether. As a result, "--" in TeXML is translated into
"--" in TeX, which is interpreted as an en-dash. At the time of development
I considered this correct behaviour. Now I'm changing my mind and adding
handling of "--" and "---" to the list of bugs. In any case, I don't plan to
break ligatures like "fi", "fl" etc.
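
One possible way to handle the dash case, shown only as a sketch and not as the fix that went into TeXML, is to put an empty group between adjacent hyphens. This is a common TeX idiom for suppressing a ligature; the function name break_dash_ligatures is made up for this example.

```python
import re


def break_dash_ligatures(text):
    """Insert {} after every hyphen that is directly followed by another hyphen."""
    return re.sub(r"-(?=-)", "-{}", text)


print(break_dash_ligatures("a--b"))    # a-{}-b
print(break_dash_ligatures("a---b"))   # a-{}-{}-b
print(break_dash_ligatures("a-b"))     # a-b
```

With this, two hyphens in the XML source stay two hyphens in the typeset output, and a real en-dash has to be written as the Unicode character U+2013.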

Post by Torsten Bronger
What about typographic things like thin
spaces, soft hyphens, zero-width non-joiner and "break permitted
here"? How much of Unicode is covered so far?

There are two translation tables, one for text mode and another for
math mode. There are 2361 symbols for text mode and 195 symbols for math
mode (math mode falls back to the text-mode table if a symbol is not found).

For the typographic characters mentioned, here is a test:

| TeXML:
|
| <TeXML>α<math>α</math>
| thin space: [ ]
| soft hyphens: []
| zero-width non-joiner: [‌] oops here ...
| break permitted here: [] ... and here
| </TeXML>
|
| TeX:
| \alpha $\alpha $
| thin space: [\hspace{0.167em}]
| soft hyphens: [\-]
| zero-width non-joiner: [‌] oops here ...
| break permitted here: [] ... and here

As we see, not all characters are mapped. If this is an issue, then it is
an issue for the maintainers of the Unicode map of the MathML specification.
Once they acknowledge and fix a problem, the TeXML processor will be
updated as well.
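
The two-table lookup with fallback can be sketched like this. The text-mode entries are taken from the test output above; the math-mode entry and the names TEXT_TABLE, MATH_TABLE and lookup are assumptions for illustration only.

```python
# Text-mode entries as seen in the test output above.
TEXT_TABLE = {
    "\u03b1": "\\alpha ",           # alpha
    "\u2009": "\\hspace{0.167em}",  # thin space
    "\u00ad": "\\-",                # soft hyphen
}

# Hypothetical math-mode entry; the real table has 195 symbols.
MATH_TABLE = {
    "\u2264": "\\leq ",
}


def lookup(ch, mode):
    """Look ch up in the table for mode; math falls back to the text table."""
    if mode == "math" and ch in MATH_TABLE:
        return MATH_TABLE[ch]
    # Unmapped characters pass through unchanged, as the
    # zero-width non-joiner does in the test above.
    return TEXT_TABLE.get(ch, ch)


print(lookup("\u03b1", "text"))  # \alpha
print(lookup("\u03b1", "math"))  # \alpha  (found via the text table)
print(lookup("\u2264", "math"))  # \leq
```

Passing unmapped characters through unchanged matches what the test shows, but it also means that gaps in the table only become visible when LaTeX processes the output.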

How fast is it (I'm not prepared to accept a further significant
drop in speed)?

It is hard to say exactly, but I think it is fast. In any case,
it should be faster than processing of specials by XSLT.

Okay; I asked because using it would mean translating
XML-->XML-->text instead of XML-->text-->filter-->text, where
"filter" is *very* fast. But faster than XSLT may be enough.

How are different \usepackage[???]{inputenc}'s dealt with?

The processor does not know about \usepackage; it only translates
characters. It is the job of the XSLT stylesheet to insert the \usepackage
command into the output, if required.

So I always have to include things like wasy, pifont, textcomp etc?
Wouldn't be a problem, I just need a complete list.

Maybe I don't understand the question well, so please repeat it if I
haven't answered it. The TeXML processor does not add anything. So
(hypothetically), if the processor generates "\alpha", and using "\alpha" in
a TeX document requires a package "greekfont", you will probably get an error
from LaTeX. I have no good solution yet.