Status of project, and where to next

Discussion:

Lauri Watts

2005-04-21 09:24:48 UTC

Hi all,

KDE (http://www.kde.org) would like to use the db2latex stylesheets for pdf
processing of our documentation repository (http://docs.kde.org). There are
some enormous advantages to this method of pdf generation over any of the
others for us, and the list of open issues we have for them is actually
pretty small.

What we'd like to know is, what is the status of the project as it is now? I
don't see any activity in the CVS repository for 8 months (and that was the
first for some time), there are several open issues in the tracker, and some
useful looking patches in the mailing list.

Are the original developers still around? Would the offer of some new blood
help inspire you to come back to us? We'd certainly love to see you
back in action, and if you are simply burned out or feeling a lack of
appreciation, please trust me, you are very appreciated and just tell us what
we can do to help out. If you've moved on, or the issue is lack of time,
would you be willing to add CVS access for some people, to allow some forward
movement, or at least fix up the few bugs that have turned up with xsltproc
now being much stricter in its processing? Or would you be ok with someone
taking over the project entirely in some form (whether here, with another
name, or in another location?)

How about other readers out there? I'm fairly certain we are not alone in
not wanting to see this work bitrot. Is there anyone else (especially those
of you who have already posted patches) who is interested in helping to keep
this project alive?

We (KDE) also have developers already with XSLT and TeX skills that can help
out, but we're generally in the business of developing (and documenting)
desktop software, so if someone more qualified wants to step up and run with
this, I have a really nice big repository of test documents to work on.

To summarise: what I'd most like to see is for the original developers to pop
up and get back to work (because hey, they already know all this code better
than anyone else, and I like what they've done so far). If that's not
possible, I'd really like their blessing to put together a working group of
some kind to continue development.

Regards,

--
Lauri Watts
KDE Documentation: http://docs.kde.org
KDE on FreeBSD: http://freebsd.kde.org

Nikolai Prokoschenko

2005-04-22 07:44:55 UTC

Post by Lauri Watts
What we'd like to know is, what is the status of the project as it is now?

I'm not a developer of DB2LaTeX-XSL, but I'm very interested in it, as I'm
trying to get the Debian Project's documentation (at the moment only the
Debian Installer Manual) to look clean and nice. I've found that
db2latex-xsl is rather good for our goals, but it should be overhauled in
some places because these are not yet DocBook-compliant; it also fails
the DocBook Test Suite miserably. I've managed to patch it to support the
morerows attribute of tables to the extent we needed, but it's still not
really good.
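
For readers unfamiliar with the problem: in DocBook's CALS tables an entry
with morerows="n" also occupies its column in the next n rows, and those rows
simply omit the covered cell, so a converter has to track which columns are
still occupied. A minimal Python sketch of that bookkeeping follows. It is not
the patch mentioned above; the name place_entries and the (text, morerows)
input format are made up for illustration, and column spans are ignored.

```python
from typing import List, Optional, Tuple


def place_entries(rows: List[List[Tuple[str, int]]],
                  ncols: int) -> List[List[Optional[str]]]:
    """Return a grid with each entry at its real column.

    Each input row lists only the entries present in the source, as
    (text, morerows) pairs. Cells covered by a span from above, and
    cells missing from the source, are None in the result.
    """
    remaining = [0] * ncols  # per column: rows still covered by a span from above
    grid: List[List[Optional[str]]] = []
    for row in rows:
        line: List[Optional[str]] = [None] * ncols
        entries = iter(row)
        for col in range(ncols):
            if remaining[col] > 0:
                remaining[col] -= 1  # column is taken by a spanning entry
                continue
            entry = next(entries, None)
            if entry is not None:
                text, morerows = entry
                line[col] = text
                remaining[col] = morerows
        grid.append(line)
    return grid
```

With the grid in hand, emitting the LaTeX for the spanned cells becomes a
separate step.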

Another problem is that the documentation we have comes in several
languages and also several scripts. The TeX/LaTeX support for these
differs significantly and sometimes DB2LaTeX is not flexible enough to
support all necessary steps. For example, I couldn't generate proper
Japanese without converting from UTF-8 to legacy Japanese (using only free
fonts).

So my opinion is that a lot of work needs to be done, especially in
ensuring flexibility and DocBook compliance. Another field is ensuring TeX
works the way we want it to (Unicode). If we get clearance from the
developers, we might need to set up automated processing of a test suite so
that we can see which areas need work.

I'm lurking on this mailing list; if anything big happens, I'd be glad to
help. I do not consider myself capable of managing the project alone, but
I'd like to see it evolve into a general documentation typesetting
solution, as opposed to FOP and JadeTeX (which is plain ugly and
inflexible).
--
Nikolai Prokoschenko
***@prokoschenko.de / Jabber: ***@jabber.org

nico

2005-04-22 17:48:05 UTC

Oops, I forgot the list.

------- Forwarded message -------
From: nico <***@libertysurf.fr>
To: "Nikolai Prokoschenko" <***@prokoschenko.de>
Subject: Re: [DB2LaTeX-devel] Status of project, and where to next
Date: Fri, 22 Apr 2005 19:44:44 +0200

On Fri, 22 Apr 2005 09:44:55 +0200, Nikolai Prokoschenko

Post by Nikolai Prokoschenko

Post by Lauri Watts
What we'd like to know is, what is the status of the project as it is now?

Hello,

I wrote a db2latex clone (http://dblatex.sf.net) some time ago that
started from the db2latex stylesheets. Maybe some of the rewritten
stylesheets could be reused, especially those that render graphics.
Anyway, this thread is a good opportunity to list what is missing, what
should change, etc. What is the DocBook test suite you're talking about?
Is it the 'testdocs' entry on the DocBook SourceForge site? Besides, I
am still interested in using your XSL table implementation.

Post by Nikolai Prokoschenko
Another problem is that the documentation we have comes in several
languages and also several scripts. The TeX/LaTeX support for these
differs significantly and sometimes DB2LaTeX is not flexible enough to
support all necessary steps. For example, I couldn't generate proper
Japanese without converting from UTF-8 to legacy Japanese (using only free
fonts).
So my opinion is that a lot of work needs to be done, especially in
ensuring flexibility and Docbook-compliance.

If you have suggestions for improving flexibility, or even a hack, please
post them to the list. You can send all your wishes, if only to gauge the
amount of work to do.

Post by Nikolai Prokoschenko
Another field is ensuring TeX
works the way we want it to (Unicode). If we get clearance from the
developers, we might need to setup an automated test suite processing so
that we can see which areas need work.

Good idea.

Bye,
BG

Nikolai Prokoschenko

2005-04-23 22:15:42 UTC

What is the docbook test suite you're talking about? Is it the
'testdocs' entry under the docbook sourceforge site?

Exactly. It hasn't been "released" in a long time, but Norman Walsh seems
to update the CVS repository on a regular basis. The current version is for
DocBook 4.4. I think that this test suite is comprehensive enough to find
most "big" mistakes; at least I could verify that my table morerows code
badly needs rewriting.

Besides, I am still interested in using your XSL table implementation.

It was rather a hack, just so we could use db2latex-xsl at all for the Debian
project. Generally, I think the table support should be rewritten.

Post by Nikolai Prokoschenko
So my opinion is that a lot of work needs to be done, especially in
ensuring flexibility and Docbook-compliance.

If you have suggestions to improve flexibility, or even some hack,
please tell it on the list. You can send all your wishes, at least to
see the amount of work to do.

I don't think I've explored the code deeply enough to give qualified
reasons, but I didn't like the idea of "mapping", for example. I've got
the impression that it just replaces certain elements with certain
LaTeX macros, which is not quite right, at least not without an obvious
way to override this _consistently_ throughout the code.

But please note that I haven't read the code extensively and I cannot say
I understand all the background for this or that decision.

Post by Nikolai Prokoschenko
Another field is ensuring TeX works the way we want it to (Unicode). If
we get clearance from the developers, we might need to setup an
automated test suite processing so that we can see which areas need
work.

Good idea.

This is actually something I can set up rather quickly. I just need some
place to host it.
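
A minimal sketch of what such automated processing could look like, in
Python. It assumes nothing about the real toolchain: the convert callable,
the fake_convert stand-in and the file names are hypothetical, and a real
harness would invoke xsltproc and LaTeX on the testdocs files instead.

```python
from pathlib import Path
from typing import Callable, Dict, Iterable


def run_suite(documents: Iterable[Path],
              convert: Callable[[Path], None]) -> Dict[str, str]:
    """Run convert on every document and collect failures.

    Returns a mapping from file name to error message; documents
    that convert without raising are not listed.
    """
    failures: Dict[str, str] = {}
    for doc in sorted(documents):
        try:
            convert(doc)
        except Exception as exc:  # a failing document must not stop the run
            failures[doc.name] = f"{type(exc).__name__}: {exc}"
    return failures


def fake_convert(doc: Path) -> None:
    """Hypothetical stand-in for the real stylesheet + LaTeX pipeline."""
    if "table" in doc.name:
        raise ValueError("morerows not supported")


if __name__ == "__main__":
    docs = [Path("para.001.xml"), Path("table.003.xml")]
    for name, error in run_suite(docs, fake_convert).items():
        print(f"FAIL {name}: {error}")
```

Publishing the resulting failure list after each commit would show which
areas need work.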

Another item on the agenda: I know you used Perl for post-processing. I'm
not quite sure that's sane, but I also can't tell you why we
shouldn't be doing it. I would like to have a pure XSL solution; otherwise
we could just take Perl and scrap XSLT. I know XSLT is a PITA (I've spent
several days on the rather trivial task of adding morerows ;)), but if we
choose that way, we should stick to it.

Considering the structure of the project: a wiki might be good to store
all DocBook definitions, rendering proposals and possibly algorithms
for translating them to LaTeX, and of course some discussion.

I will hopefully be looking into db2latex-xsl's code soon, so I can tell
what I would like to see changed.
--
Nikolai Prokoschenko
***@prokoschenko.de / Jabber: ***@jabber.org

nico

2005-04-27 23:24:50 UTC

On Sun, 24 Apr 2005 00:15:42 +0200, Nikolai Prokoschenko

Post by Nikolai Prokoschenko

What is the docbook test suite you're talking about? Is it the
'testdocs' entry under the docbook sourceforge site?

Ok.

Post by Nikolai Prokoschenko

Post by Nikolai Prokoschenko
So my opinion is that a lot of work needs to be done, especially in
ensuring flexibility and Docbook-compliance.

If you have suggestions to improve flexibility, or even some hack,
please tell it on the list. You can send all your wishes, at least to
see the amount of work to do.

I don't think I've explored the code deeply enough to give qualified
reasons, but I didn't like the idea of "mapping", for example. I've got
the impression that it just replaces certain elements with certain
LaTeX-macros, which is not quite right, at least not without an obvious
way to overload this _consistently_ throughout the code.
But please note that I haven't read the code extensively and I cannot say
I understand all the background for this or that decision.

I agree. I think it comes from the language translation code in N.
Walsh's XSL stylesheets, which maps how to say "chapter" in Spanish, for
instance. It is well suited to that original purpose, but not really to
translating structural things into others. In dblatex I don't use it heavily.

Post by Nikolai Prokoschenko

Post by Nikolai Prokoschenko
Another field is ensuring TeX works the way we want it to (Unicode). If
we get clearance from the developers, we might need to setup an
automated test suite processing so that we can see which areas need
work.

Good idea.

Well, the Perl processing is used for three purposes:
- for fast systematic string replacement, e.g. a "\" is changed to a "\\"
to make LaTeX happy. Doing this task with XSL is quite possible but really
slow. I wanted something much faster because I need to process books
containing more than 200 pages. That said, maybe EXSLT functions now
exist that can do this far better. Is that the case?
- "to do the things I cannot do with XSL". I confess my poor XSL
programming skills on this point. Currently it's for table support.
- to do some compilation work once the LaTeX file is available. Typically
it detects figure formats and converts them to the one the TeX compiler
expects. For example, it converts EPS figures to PDF when pdflatex is
used.
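
The string replacement in the first point can be sketched as follows. This is
an illustration in Python, not the Perl code dblatex uses; the replacement
table is an assumption based on the usual LaTeX special characters, and the
name escape_latex is made up.

```python
# Assumed replacement table for LaTeX special characters (illustrative only).
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}

# str.translate works character by character in a single pass, so the
# braces and backslashes introduced by a replacement are never re-escaped.
_TABLE = str.maketrans(LATEX_SPECIALS)


def escape_latex(text: str) -> str:
    """Escape LaTeX special characters in plain text."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(escape_latex("50% of C:\\tmp & a_b"))
```

The single pass matters: chaining one replace call per character would escape
the output of earlier replacements a second time.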

So I would say that:
- Having all the XML translation done by XSL would be the best (and
cleanest) thing, provided that the first two points are really possible to
do with XSL. I am confident about this. At least the string replacement is
not a functional problem (XSL can do this) even if it is still a performance
issue. Besides, David Hedley's contribution removes the need for the table
code I had to write in Perl.
- The last Perl processing step seems useful to me (I don't have to care
how to compile the file) but is not part of the XML translation. It's
only a convenient way to do things afterwards.
- Perl post-processing can become completely optional.

Post by Nikolai Prokoschenko
Considering the structure of the project: a wiki might be good to store
all Docbook definitions, rendering proposals and possibly an algorithms
for translating that to LaTeX and of course some discussion.
I will hopefully be looking into db2latex-xsl's code soon, so I can tell
what I would like to see changed.

Even if dblatex is different from db2latex in some aspects, I'm sure that
we can converge on a common set of well-defined core XSL stylesheets. And
maybe a wiki is a good way to do the necessary exchanges, but I don't know
how one can be set up. At least the mailing list is a good place to start
exchanging ;-)

Regards,

BG

Torsten Bronger

2005-04-28 05:37:53 UTC

Hallöchen!

Post by nico
[...]
- for fast systematic string replacement, e.g. a "\" is change to
a "\\" to make latex happy. Doing this task with XSL is quite
possible but really slow. I wanted something really faster because
I need to drive books containing more than 200 pages. This said,
maybe that EXSL functions now exist that can do this far
better. Is it the case?

I don't know EXSLT well enough, but while I too had once thought that
it's good to have everything in XSLT, I quickly realised that
there's always the right tool for the right purpose, and doing
everything in XSLT means that you do it sub-optimally in one way or
the other.

So I wrote tbrplent (in C++) which replaces UTF-8 sequences with
LaTeX commands. It works well. The idea is that the XSLT
stylesheet deploys delimiters. Every text node is enclosed by them,
every formula, and every text-within-formula. So the replacements
can fit in their context. My XSLT typically contains:

  <xsl:variable name="start-delimiter" select="''"/>
  <xsl:variable name="end-delimiter" select="''"/>

  <xsl:template match="text()" priority="0">
    <xsl:value-of select="$start-delimiter"/>
    <xsl:value-of select="."/>
    <xsl:value-of select="$end-delimiter"/>
  </xsl:template>

Unfortunately, my ambitions have grown since then, and I plan a
re-write. I needed three pairs of delimiters, but I want to replace
them by one pair plus numerical parameter, in order to have as many
"modes" as I need. For example, there must be a special mode for
headings, because in PDF bookmarks some characters must be written
differently.
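
A rough Python sketch of the "one pair of delimiters plus numerical
parameter" idea. Everything concrete here is an assumption made for
illustration: the private-use delimiter characters, the "mode number and
colon" prefix, the tiny replacement tables and the name replace_entities.
It is not how tbrplent is implemented.

```python
import re

# Hypothetical delimiters: private-use characters unlikely to occur in text.
START, END = "\uE000", "\uE001"

# One replacement table per mode (assumed numbering: 0 = text, 1 = math).
TABLES = {
    0: {"ü": r'\"u', "α": r"$\alpha$"},
    1: {"α": r"\alpha{}"},
}

# A delimited region looks like: START <mode digits> ":" <text> END
_REGION = re.compile(f"{START}(\\d+):(.*?){END}", re.DOTALL)


def replace_entities(stream: str) -> str:
    """Replace characters inside delimited regions using the region's mode table.

    Text outside the delimiters (LaTeX markup written by the stylesheet)
    passes through unchanged; the delimiters themselves are removed.
    """
    def _one(match):
        table = TABLES.get(int(match.group(1)), {})
        return "".join(table.get(ch, ch) for ch in match.group(2))

    return _REGION.sub(_one, stream)


if __name__ == "__main__":
    sample = f"{START}0:Grün α{END} ${START}1:α{END}$"
    print(replace_entities(sample))
```

Adding a mode for headings would then only mean adding another table, for
example one that avoids macros PDF bookmarks cannot display.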

tbrplent was written for my tbook project ("TBook RePLace ENTities"),
but I now use it for texi2latex, too.

If you are interested, we can try a re-implementation (under a new
name) together. It's not a big thing after all. The replacement
tables exist already, they just need to be expanded to the new
modes. And we have to decide on a language. I planned to use
Python, but probably most of you are Perl hackers.

C++ would produce a small executable program, which is especially
pleasing for a Windows distribution. Since no regexes are involved,
it should be as easy to program and as fast as in Perl. So I vote
for C++.

Tschö,
Torsten.
--
Torsten Bronger, aquisgrana, europa vetus

nico

2005-04-28 21:16:30 UTC

On Thu, 28 Apr 2005 07:37:53 +0200, Torsten Bronger

Post by Torsten Bronger
Hallöchen!

Salut ! ;-)

Post by Torsten Bronger

That's indeed why I use the Perl processing.

Post by Torsten Bronger
So I wrote tbrplent (in C++) which replaces UTF-8 sequences with
LaTeX commands. It works well. The idea is that the XSLT
stylesheet deploys delimiters. Every text node is enclosed by them,
every formula, and every text-within-formula. So the replacements

That's exactly how I do it.

Post by Torsten Bronger
<xsl:variable name="start-delimiter" select="''"/>
<xsl:variable name="end-delimiter" select="''"/>
<xsl:template match="text()" priority="0">
<xsl:value-of select="$start-delimiter"/>
<xsl:value-of select="."/>
<xsl:value-of select="$end-delimiter"/>
</xsl:template>
Unfortunately, my ambitions have grown since then, and I plan a
re-write. I needed three pairs of delimiters, but I want to replace
them by one pair plus numerical parameter, in order to have as many
"modes" as I need. For example, there must be a special mode for
headings, because in PDF bookmarks some characters must be written
differently.

I understand.

Post by Torsten Bronger
tbrplent was written for my tbook project ("TBook RePLace ENTities"),
but I now use it for texi2latex, too.
If you are interested, we can try a re-implementation (under a new
name) together. It's not a big thing after all. The replacement
tables exist already, they just need to be expanded to the new
modes. And we have to decide on a language. I planned to use
Python, but probably most of you are Perl hackers.

Yes, it's interesting. Personally I would vote for Python: powerful, clean
and maintainable code, well suited to text processing, compilable if raw
performance is needed, and available on almost every platform.

Post by Torsten Bronger
C++ would produce a small executable program, which is especially
pleasing for a Windows distribution. Since no regexes are involved,
it should be as easy to program and as fast as in Perl. So I vote
for C++.
Tschö,
Torsten.

Regards,

BG

Nikolai Prokoschenko

2005-04-23 22:47:01 UTC

Post by nico
If you have suggestions to improve flexibility, or even some hack, please
tell it on the list. You can send all your wishes, at least to see the
amount of work to do.

Some other items:

- I've looked into the mappings - it seems that this is just a simplified
way of mass-defining templates. I think to ensure flexibility each
element should get a separate template in XSL.

- There should be some general considerations on naming the templates. The
current system seems rather chaotic to me, especially with all the
"generate.*" templates.

- Connected to the item above: we might need some generic XSL function
library. For example, I needed to build up some kind of array structure
to be able to implement morerows. There will certainly be a whole lot of
other helper functions that we'll need all over the code, especially
for the complex things. Maybe something like this exists already?
(http://sourceforge.net/projects/xsltsl/ and
http://fxsl.sourceforge.net/ have some material)

- We'll also need some prototyping language for XSLT - all the <xsl:if>s
are just nasty. I've seen something like that on the internet, but I do
not have a link right now. It plainly lets you write if a=b {} and
converts it to <xsl:if test="a=b"></xsl:if>. Maybe, if we find something
really working and useful, we might switch to that format and generate
XSLT only automatically?

- General file structure. How about an xsl-file for each Docbook element
or something like it?
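
To make the prototyping-language item concrete, here is a toy converter in
Python for the "if a=b {}" notation. It is only a sketch under simplifying
assumptions: it handles nothing but if blocks, the body must not contain
braces of its own (so no attribute value templates), and the names to_xslt
and _rewrite are made up. It is not the tool seen on the internet.

```python
import re
from xml.sax.saxutils import escape

# An innermost "if <condition> { <body> }" block, i.e. a body without braces.
# The lookbehind stops the pattern from matching the "if" of an already
# generated "<xsl:if ...>" element on later passes.
_IF_BLOCK = re.compile(r"(?<![\w:])if\s+([^{}]+?)\s*\{([^{}]*)\}")


def _rewrite(match):
    # The condition ends up in an attribute, so XML-escape it.
    condition = escape(match.group(1).strip(), {'"': "&quot;"})
    body = match.group(2).strip()
    return f'<xsl:if test="{condition}">{body}</xsl:if>'


def to_xslt(source: str) -> str:
    """Rewrite nested if blocks from the inside out until nothing changes."""
    previous = None
    while previous != source:
        previous = source
        source = _IF_BLOCK.sub(_rewrite, source)
    return source


if __name__ == "__main__":
    print(to_xslt("if a=b { x if c<3 { y } }"))
```

A usable tool would need a real parser, but even this shows that generating
the XSLT from a terser notation is mostly a mechanical transformation.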
--
Nikolai Prokoschenko
***@prokoschenko.de / Jabber: ***@jabber.org

Lauri Watts

2005-04-22 19:55:23 UTC

Post by Nikolai Prokoschenko
I'm lurking in this mailing list, if anything big happens, I'd be glad to
help. I do not consider myself capable of managing the project alone, but
I'd like to see it evolve into a general documentation typesetting
solution, as opposed to FOP and JadeTeX (which is plain ugly and
non-flexible).

I think we're in precisely the same situation here. I also think nico's reply
further down the thread may be the answer to both our problems :)

Regards,

--
Lauri Watts
KDE Documentation: http://docs.kde.org
KDE on FreeBSD: http://freebsd.kde.org