Tuesday, July 28, 2009

Translation priority

A while back, Roger and I talked about the translation priority used in the wml files and how this could be carried over to the po files. The idea is that Pootle will sort the files by translation priority, giving new translators an idea of where to begin.

I have started to work on a solution for this. The plan is to read the translation priority from the wml file and include this in the filename for the po file. Take bridges.wml as an example. This file has the translation priority "1-high", so the resulting po will be named 1-high.bridges.po.
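
A minimal sketch of the naming idea, in Python for illustration only (this is not the actual script). It assumes the priority is stored on a line of the form "# Translation-Priority: 1-high"; that line format, the regular expression and the function names are my assumptions for the example.

```python
import re
from pathlib import Path

# Assumed metadata line, e.g. "# Translation-Priority: 1-high".
# The exact syntax used in the wml files is an assumption here.
PRIORITY_RE = re.compile(r"^#\s*Translation-Priority:\s*(\S+)\s*$", re.MULTILINE)


def read_priority(wml_text):
    """Return the translation priority found in the wml text, or None."""
    match = PRIORITY_RE.search(wml_text)
    return match.group(1) if match else None


def po_name(wml_filename, wml_text):
    """Build the po filename, prefixed with the priority when one is found."""
    stem = Path(wml_filename).stem
    priority = read_priority(wml_text)
    if priority is None:
        # No priority found: fall back to the plain name.
        return f"{stem}.po"
    return f"{priority}.{stem}.po"
```

With a bridges.wml that carries the priority "1-high", po_name returns "1-high.bridges.po", which matches the example above.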

Then there are the small details that need to be taken care of. What if the script is looking for 2-medium.bridges.po, but only 1-high.bridges.po exists? This can be the case if someone decides to change the translation priority for that file. Or what if the file simply hasn't been converted to po yet?
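
One way to handle both cases could look like the sketch below (Python, illustration only). The function names and the three status strings are made up for the example. It uses a plain rename; for files under svn the rename would more likely have to go through svn itself.

```python
from pathlib import Path


def find_existing_po(po_dir, stem):
    """Return po files for `stem`, whatever priority prefix they carry."""
    po_dir = Path(po_dir)
    candidates = sorted(po_dir.glob(f"*.{stem}.po"))
    plain = po_dir / f"{stem}.po"
    if plain.exists():
        candidates.append(plain)
    return candidates


def resolve_po(po_dir, stem, priority):
    """Return (path, status) for the po file that should carry `priority`.

    status is "exists", "renamed" (the priority changed) or "missing"
    (the file has not been converted to po yet).
    """
    wanted = Path(po_dir) / f"{priority}.{stem}.po"
    if wanted.exists():
        return wanted, "exists"
    existing = find_existing_po(po_dir, stem)
    if existing:
        # The priority changed: keep the translations, only change the name.
        existing[0].rename(wanted)
        return wanted, "renamed"
    return wanted, "missing"
```

The "missing" case is the signal to create the po file from scratch; the other two leave existing translations untouched.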

Monday, July 27, 2009

Testing of wml2po and po2wml

I have tested wml2po and po2wml again, and like last time I made sure that Pootle was poked at the right time, that I could translate files using Pootle and commit without problems.

I'm glad to say that everything worked just as I expected; wml2po converted all of the files to po, and po2wml converted the translated files back to wml again.

Sunday, July 26, 2009

Different directory structure

The website directory in svn has the following structure..
website/en/*.wml
website/de/*.wml
website/docs/en/*.wml
website/docs/de/*.wml
.. and I always thought that having the same structure in the pootle project directory would be possible..
translation/projects/website/en/*.po
translation/projects/website/de/*.po
translation/projects/website/docs/en/*.po
translation/projects/website/docs/de/*.po
.. but yesterday I found out that pootle does not accept any directory, directly under the project directory, other than a language directory. So I had to come up with a different directory structure. This is the result:
translation/projects/en/*.po
translation/projects/de/*.po
translation/projects/en/docs/*.po
translation/projects/de/docs/*.po
The structure of the website directory in svn remains the same.
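
The mapping between the two layouts can be sketched like this (Python, illustration only). The function name and the default root arguments are assumptions; the roots simply mirror the paths listed above.

```python
from pathlib import PurePosixPath


def po_path_for(wml_path, website_root="website",
                projects_root="translation/projects"):
    """Map website/[subdirs/]<lang>/<file>.wml to
    <projects_root>/<lang>/[subdirs/]<file>.po, so that the language
    directory always ends up directly under the project directory."""
    rel = PurePosixPath(wml_path).relative_to(website_root)
    # The last directory before the filename is the language directory.
    *subdirs, lang, filename = rel.parts
    stem = PurePosixPath(filename).stem
    return str(PurePosixPath(projects_root, lang, *subdirs, f"{stem}.po"))
```

For example, website/docs/de/foo.wml maps to translation/projects/de/docs/foo.po, while website/en/foo.wml maps to translation/projects/en/foo.po.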

So, the plan for tomorrow is to go through the 7 steps again and make sure the whole process works nicely.

Tuesday, July 21, 2009

Todo for po2wml

Use both po4a-gettextize and po4a-updatepo
If a .po file does not exist, use po4a-gettextize to create it, set the right encoding and charset as well as the correct copyright. Nicolas François said that he would add this functionality to po4a-gettextize. However, if the .po file does exist, simply use po4a-updatepo to update if the .wml has changed.
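
A minimal sketch of that decision (Python, illustration only). It only builds the command line and does not run anything; the function name is made up, and setting the encoding, charset and copyright is left out.

```python
from pathlib import Path


def po4a_command(wml_path, po_path, fmt="wml"):
    """Pick po4a-gettextize for a new .po and po4a-updatepo for an existing one.

    Returns the command as a list, ready to be handed to subprocess.run().
    """
    if Path(po_path).exists():
        tool = "po4a-updatepo"
    else:
        tool = "po4a-gettextize"
    # -f is the format module, -m the master document, -p the po file.
    return [tool, "-f", fmt, "-m", str(wml_path), "-p", str(po_path)]
```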

Use the current .wml if that is more complete than the .po
The nice thing with po4a-translate is that it will only write the translated .wml if 80% (the default value) or more has been translated. The not-so-nice thing with po4a-translate is that it will delete the .wml if less than 80% has been translated.

My suggestion for dealing with this is the following: use po4a-translate to write the file $lang-$file.wml. If that file was actually written, rename the file to $file.wml.
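
The suggestion could be sketched as follows (Python, illustration only). The function name and the `run` parameter are assumptions; `run` is only there so the command can be swapped out when trying the logic without po4a installed.

```python
import os
import subprocess
from pathlib import Path


def translate_keep_old(master, po, lang, out_dir, run=subprocess.run):
    """Write the translation to <lang>-<file>.wml first and only replace
    <file>.wml if po4a-translate actually produced a file.

    Returns True if <file>.wml was replaced, False if the old one was kept.
    """
    out_dir = Path(out_dir)
    name = Path(master).name
    tmp = out_dir / f"{lang}-{name}"
    final = out_dir / name
    run(["po4a-translate", "-f", "wml", "-m", str(master),
         "-p", str(po), "-l", str(tmp)], check=False)
    if tmp.exists():
        # Enough strings were translated: the new file replaces the old one.
        os.replace(tmp, final)
        return True
    # Below the threshold nothing was written, so the current .wml stays.
    return False
```

Because po4a-translate only ever touches the temporary name, a less complete .po can no longer delete a .wml that is already in place.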

Document it
This isn't just for po2wml, but I should write a README explaining the basics.

Thursday, July 16, 2009

Issues with po4a

One of the first things I noticed when I began converting files with po4a was that po4a would sometimes exit with an error, even if the file was valid wml. This is a problem for files like download.wml where a special link-format is used to make sure every page shows the same version number (and points to the latest release).

Also, when converting a file from .wml to .po for the first time, a header will be added containing the following information:

# SOME DESCRIPTIVE TITLE
# Copyright (C) YEAR Free Software Foundation, Inc.
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.

It would be nice to have a header that doesn't assign copyright to FSF (or not have a header at all). I have talked to the maintainer of po4a who said he would take a look at these two issues during his two weeks at Debconf.

I have also written two bug reports:
#537236: please add support for valid wml
#537245: modify header in .po

My current todo list

For those who would like to know what I am working on right now, here is my current todo list (also known as list-of-issues-that-need-to-be-sorted-out). That said, progress is a little bit slow these days as I'm in bed with the flu (again).

Converting files
I am working on converting the translated .wml files to .po files. The files are being uploaded to translation/trunk/tools/gsoc09/po/.

I have run into a little problem that I'm not sure how to deal with. When converting a .wml to a .po, including the translated .wml, po4a will sometimes exit with one (or both) of the following errors:

        1. Original has less strings than the translation
        2. Structure disparity between original and translated files

There are two types of translations that cause such problems. The first one is where the translator has simply skipped a paragraph or two and moved on to the next. This type of translation shouldn't be too hard to fix and convert. In most cases, anyways.

The second type is where the translator has written his or her own, translated version of the page, and not a direct, translated copy. This type of translation can be hard to fix unless you know the language.

I guess that one possible solution would be to ask the people on the or-talk list for help, or to contact previous translators directly.

No translated .wml in svn
Not having translated .wml in svn sounds like a good idea. After all, one can just generate the .wml from the translated .po when needed. However, we do need to have a few of the translated .wml files in svn. When I say "a few" I mean the translated versions of the donate-page as well as the download-pages.

The reason for this is that these files can not be converted to po (for now, at least). I suggest that we keep those files in svn and have po2wml generate the rest when we need them, i.e. before the website is built and published.

A small note-to-self would be to fix po2wml so it no longer adds and commits files.

Monday, July 13, 2009

Bug fixed in po4a

Last week I ran into a problem where po4a-translate did not seem to recognize new, translated strings. This turned out to be a bug in the wml module and it has now been fixed in the po4a repository. This email to the po4a-list explains the bug and the quick fix.

Friday, July 10, 2009

The website, po2wml and po4a-translate

I had to deal with a few things at the beginning of the day, but since then I have:

Built the website
I managed to build the website with my own, translated files and everything looks ok. I haven't published it simply because I am a bit unsure how I can put it on www.xcde.net/torproject using the publish-script.

Fixed po2wml
I noticed that, when running, po2wml would also convert empty .po files. To deal with this, I included an if statement that checks for a certain type of comments (of the form "#."). Those comments say something about the following string, if you're translating a part of a list or a paragraph. So, that way the script knows whether or not to convert the file. No comments of the form "#." means no content.
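
The check itself is tiny. A sketch in Python (illustration only; the function name is made up):

```python
def po_has_content(po_text):
    """Return True if the .po text contains at least one "#." comment.

    A file without any "#." comments is treated as empty and skipped.
    """
    return any(line.startswith("#.") for line in po_text.splitlines())
```

Note that this only looks at the start of each line, so a "#." inside a msgid or msgstr does not count.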

Looked at po4a-translate
The script, po2wml, is using po4a-translate to convert the translated .po files back to .wml files. It seems like po4a-translate has a problem with recognizing new, translated strings. I copied en/30seconds.wml and de/30seconds.po into a new folder, translated 27 of the 32 strings in 30seconds.po and ran po4a-translate.

I got the following output:
We found translations for 84.37% (27 from 32) of strings.

Then I decided to translate the rest of the .po. I did that, ran po4a-translate again and got the exact same output. It says that only 27 of the 32 strings have been translated, but that is not correct.

I'm not really sure why it's doing this, if I'm doing something wrong or if there's a bug somewhere. I have sent an email to the po4a-devel mailing list and hopefully someone will reply before too long (that or I'll find someone on irc and give them a friendly poke).

Thursday, July 9, 2009

So far, so good

Today has been the day of testing code, fixing minor bugs and testing some more. I had 9 items on my list and I have managed to do 8 of them. The items are pretty much the same as the ones described in the previous post. I figured I'd do the last item tomorrow and then go through the list one more time.

Step 1:
Set up a repository with a website module and a translation module.

Step 2:
Put wml files in the english directory in the website module.

Step 3:
Make sure pootle reads the translation module.

Step 4:
Run wml2po to convert the wml files to po files and put them in the translation module.

While converting and updating the files, I noticed that files were being committed to the repository even though they had not changed. Or so I thought. It turns out that a few of the comments in the files would change every time I ran the script. The solution was to modify one of the regular expressions in the script to exclude a certain type of comments as well.
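
A sketch of the idea in Python (illustration only). Which lines changed on every run is not spelled out here, so the pattern below is purely a stand-in: it assumes the volatile lines are the POT-Creation-Date and PO-Revision-Date header fields. The names are made up as well.

```python
import re

# Hypothetical pattern for lines that change on every run.
VOLATILE = re.compile(r'^"(POT-Creation-Date|PO-Revision-Date):')


def stable_lines(po_text):
    """Return the lines of a .po file without the volatile ones."""
    return [line for line in po_text.splitlines() if not VOLATILE.match(line)]


def really_changed(old_text, new_text):
    """Compare two versions of a .po file, ignoring volatile lines."""
    return stable_lines(old_text) != stable_lines(new_text)
```

Only when really_changed returns True is there anything worth committing.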

Step 5:
Restart pootle so it can detect the new files.

Step 6:
Translate a few strings using the pootle interface.

Step 7:
Commit using the pootle interface.

After a little back and forth I managed to give my user the right permissions. Translating and committing files using the pootle interface went without any further problems.

Step 8:
Run po2wml to convert the po files back to translated wml files.

I noticed that the script would convert the english po files back to english wml files. This certainly doesn't make much sense, and so the problem was fixed with an if statement.
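
In Python the same check could look like this (illustration only; the function name and the SOURCE_LANG constant are made up, and the directory passed in is assumed to contain one directory per language):

```python
from pathlib import Path

SOURCE_LANG = "en"  # the english files are the source, not a translation


def po_files_to_convert(lang_root):
    """Yield every .po file below lang_root, skipping the english directory."""
    for lang_dir in sorted(Path(lang_root).iterdir()):
        if not lang_dir.is_dir() or lang_dir.name == SOURCE_LANG:
            continue
        yield from sorted(lang_dir.rglob("*.po"))
```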

Step 9:
Build and deploy the website.

Building the website is the one thing I haven't done today, and it is the first thing I'll do tomorrow.

Step by step

We now have a way to convert from .wml to .po and back to .wml again, but what exactly is the process from start to finish?

Here's how it will work, step by step, from a new .wml, to a translated .po, to .wml again, to a part of the website.

Step 1:
I have written foo.wml, a new .wml file. I commit this to the repository.

Step 2:
I need to convert foo.wml to foo.po. I run wml2po.

Step 3:
I now have foo.po in the repository as well, but pootle doesn't know about it. Pootle needs to be poked to learn about new files. Restart pootle.

Step 4:
Pootle now knows about foo.po and people can begin to translate.

Step 5:
It is soon time to build the website and I want to include translated pages of foo.wml. People forget to click on the 'commit'-button in the pootle interface when they are done translating foo.po. I log on to the pootle admin interface and force pootle to commit foo.po for all languages.

Step 6:
All work in pootle has been committed and I can update the translated documents, for example website/de/foo.wml. Run po2wml to convert translation/projects/website/de/foo.po to website/de/foo.wml and commit.

Step 7:
I build the website and deploy it.

Wednesday, July 8, 2009

Old code

I have uploaded the code that I have previously written to show that I have been hacking away on stuff, and also what I have been hacking away on. Most of the scripts there should work, but I can't promise anything.

Anyways, they are there for everyone to look at and maybe also learn from (kids, don't use PATH as a variable name). Also, some people might like to see how the ideas and the code have evolved over time.

The code is in the repository, translation/trunk/tools/gsoc09/other.

Tuesday, July 7, 2009

wml2po and po2wml

Four days ago I asked, on the tor-gsoc mailing list, if we could split the contents of website/ into po/ and wml/. Roger Dingledine said that it would be possible, but it would be even better to keep the wml files in the website module and the po files in the translation module.

He also said that "we can give pootle or other services read-write access to the "translation" module, but they can treat the "website" module as read-only. The process of building the website can then treat the "translation" module as read-only".

After a little bit of testing I came up with a good solution. So, in my case, I can have the wml files in
"/home/runa/tor/website"
and the resulting po files in
"/home/runa/tor/translation/projects/website"

I also wrote the second script, po2wml, that will take the translated po files and convert them back to wml files.

The updated version of wml2po has been added to the repository, together with po2wml.

Monday, July 6, 2009

In SVN: wml2po

Some say it's better late than never, I say it's something I should have done a little bit earlier. It took some time before I managed to get on the right track and find a solution that everyone seemed happy with. Hopefully, this is it. The script, wml2po, is now in the tor svn repository.

Sunday, July 5, 2009

Almost ready: wml2po

The script, wml2po, is almost ready. It assumes that the contents of website/ are split into wml/ and po/. The resulting po files are now being put where they belong, i.e.:


  • website/wml/en/index.wml to website/po/$language/index.po

  • website/wml/press/en/index.wml to website/po/press/$language/index.po



Also, the hash of each po file is generated before and after the file has been updated. The only thing left is to make sure that the script has a proper lockfile.
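
The before-and-after comparison can be sketched like this (Python, illustration only). The choice of SHA-256 and the function names are assumptions; `update` stands for whatever step rewrites the file.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file, or None if it does not exist."""
    path = Path(path)
    if not path.exists():
        return None
    return hashlib.sha256(path.read_bytes()).hexdigest()


def update_and_check(path, update):
    """Run update(path) and report whether the file content changed."""
    before = file_digest(path)
    update(path)
    return file_digest(path) != before
```

A file for which update_and_check returns False has not changed and does not need to be committed.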

Friday, July 3, 2009

My progress so far

I just sent an email to the tor-gsoc list asking if it is possible to split the contents of website/ into po/ and wml/. Doing so will make things a whole lot easier when it comes to figuring out where to put the resulting po files.

I also sent an email to my mentor telling him about my progress so far. What I wrote is pretty much the same as I have previously written in this blog:

Over the past few weeks I have written a few different scripts. While they have worked, they have not been ideal solutions to the problem I am trying to solve. If you read the blog post from yesterday, July 2nd, you'll see that I've gotten useful pointers from Peter and that is what I have been working on today.

What's left to do is:

For wml2po:

  • Make sure that the wml and po files are put where they belong.

  • Generate hash values for each file before and after converting. Compare these two values and revert if they are the same (i.e. nothing has changed).



For po2wml:

  • The code will be pretty much the same as for wml2po and I don't think writing this script will be a problem.



Converting from wml to po, including translations:

  • I have already converted the wml files in the repository (including their translations) to po files. I'm just not sure where I should upload these. What do you think?

  • I also think that uploading those po files won't do much good until we have a script that can keep them updated automatically.



According to my timeline I should be done with the coding by the end of next week, and then start to write documentation (that is, document the code properly and write a readme). I am confident that I'll be able to stick to that plan. Let me know if you have any questions.

Thursday, July 2, 2009

A few bumps in the road

Me and my code have been hitting a few bumps in the road over the past couple of weeks, either because the solution I came up with simply wasn't the best solution out there or because I misunderstood something. Peter Palfrader has been really nice and given me a few pointers, and I am confident that I am on the right track.

So, the new approach will be to use a proper lockfile (and not svn lock), comparing files with hash values, converting all of the wml files, reverting the ones that haven't changed since last time and committing the ones that have.
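
A lockfile of this kind can be sketched in a few lines of Python (illustration only; the function name is made up). The file is created with O_EXCL, so a second run fails right away instead of working on the same files.

```python
import os
from contextlib import contextmanager


@contextmanager
def lockfile(path):
    """Hold an exclusive lock file for the duration of the with-block.

    Raises FileExistsError if the lock file is already there, i.e. if
    another run is in progress (or a previous run died without cleaning up).
    """
    # O_CREAT | O_EXCL makes "check and create" a single atomic step.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.unlink(path)
```

The whole convert, compare, revert and commit cycle would then sit inside a single "with lockfile(...)" block. A stale lock left behind by a crashed run has to be removed by hand in this version.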

There is also one more thing on the todo-list, other than writing a script that will keep all the files updated automatically, and that is to convert all the wml files to po files. This can be done in just a few minutes (given that all the files can be converted without problems). I figure I'll upload these to the repository when:

1. The fancy, automatic wml-to-po-and-back-script is done.
2. I know where to put the po files.

I also plan on writing a longer progress report tomorrow and email this to my mentor, Jacob Appelbaum, as well as post it here.