Monday, August 24, 2009

The final changes

GSoC is finished and I have been told that I have passed :)

Changes done before the deadline include:

1. The scripts have been moved to website/ and changed so that you don't have to configure them with the location of the files.

2. I added a line in the Makefile so that you can run "make translations" to write the translated documents before building and pushing the website.

3. I wrote a HOWTO that gives a quick introduction to pot, po, wml and Pootle, and explains how to convert existing translations, update current translations and write the translated documents.

4. I updated the README with a checklist to go through if something goes wrong.

Monday, August 3, 2009

Logging in wml2po

To keep track of the files that, for one reason or another, could not be processed by wml2po, the script writes the filenames to a logfile. I am not sure whether logging will be of any use in po2wml. I mean, if a file has been converted to pot, there shouldn't be a problem with converting it back to wml.
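A minimal sketch of that kind of logging. The real wml2po code isn't shown here, so the convert callable and the log line format are assumptions: any exception raised by convert counts as "could not be processed".

```python
def convert_all(wml_files, convert, logfile):
    """Convert every file, logging the names of those that fail.

    convert is a hypothetical callable that raises an exception when a
    file cannot be processed.
    """
    failed = []
    for wml in wml_files:
        try:
            convert(wml)
        except Exception as exc:  # keep going with the remaining files
            failed.append((wml, exc))
    # Append, so the log keeps failures from earlier runs as well.
    with open(logfile, "a", encoding="utf-8") as log:
        for wml, exc in failed:
            log.write(f"{wml}: {exc}\n")
    return [wml for wml, _ in failed]
```

Collecting failures instead of stopping at the first one means a single awkward page (like download.wml) doesn't block the rest of the conversion.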

Time to work on the documentation.

Sunday, August 2, 2009

Changelog: wml2po

A few things have happened since I last updated my blog. The biggest change is that wml2po no longer converts to .po, but to .pot, and it doesn't convert for all available languages, just English.

A way to handle translation priority was added to the code last week. The script checks the translation priority of the .wml to determine the filename of the .pot. If the .wml doesn't have a translation priority set, the script assumes that the file doesn't need to be translated.
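A rough sketch of the priority lookup. It assumes the priority sits in a comment line of the form "# Translation-Priority: 1-high" in the .wml; that line format is an assumption, not taken from the script itself.

```python
import re
from pathlib import Path

# Assumed format of the metadata line, e.g.  # Translation-Priority: 1-high
PRIORITY_RE = re.compile(r"^#\s*Translation-Priority:\s*(\S+)", re.MULTILINE)


def pot_name(wml_path, wml_text):
    """Return e.g. '1-high.bridges.pot', or None if no priority is set."""
    match = PRIORITY_RE.search(wml_text)
    if match is None:
        return None  # no priority: the page is assumed not to need translation
    stem = Path(wml_path).stem  # 'bridges.wml' -> 'bridges'
    return f"{match.group(1)}.{stem}.pot"
```

Returning None rather than raising lets the caller simply skip pages without a priority.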

Tuesday, July 28, 2009

Translation priority

A while back, Roger and I talked about the translation priority used in the wml files and how this could be carried over to the po files. The idea is that Pootle will sort the files by translation priority, giving new translators an idea of where to begin.

I have started to work on a solution for this. The plan is to read the translation priority from the wml file and include this in the filename for the po file. Take bridges.wml as an example. This file has the translation priority "1-high", so the resulting po will be named 1-high.bridges.po.

Then there are the small details that need to be taken care of. What if the script is looking for 2-medium.bridges.po, but only 1-high.bridges.po exists? This can happen if someone changes the translation priority for that file. Or what if the file simply hasn't been converted to po yet?
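One way to handle both cases, sketched in Python. It assumes every .po lives in one directory under the name "priority.page.po" and that a priority never contains a dot; locate_po is a hypothetical helper, not part of the actual script.

```python
from pathlib import Path


def locate_po(po_dir, priority, name):
    """Return (path, existed) for '<priority>.<name>.po'.

    If the page exists under another priority prefix, that file is renamed
    to the wanted name. existed is False when no .po exists at all, so the
    caller knows it still has to create one.
    """
    po_dir = Path(po_dir)
    wanted = po_dir / f"{priority}.{name}.po"
    if wanted.exists():
        return wanted, True
    # Look for the same page under a different priority prefix.
    for candidate in po_dir.glob(f"*.{name}.po"):
        prefix = candidate.name[: -len(f".{name}.po")]
        if "." not in prefix:  # skip e.g. '1-high.my.bridges.po' for 'bridges'
            candidate.rename(wanted)
            return wanted, True
    return wanted, False
```

Renaming keeps the existing translations attached to the page when only its priority changed.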

Monday, July 27, 2009

Testing of wml2po and po2wml

I have tested wml2po and po2wml again, and like last time I made sure that Pootle was poked at the right time, that I could translate files using Pootle and commit without problems.

I'm glad to say that everything worked just as I expected; wml2po converted all of the files to po, and po2wml converted the translated files back to wml again.

Sunday, July 26, 2009

Different directory structure

The website directory in svn has the following structure..
website/en/*.wml
website/de/*.wml
website/docs/en/*.wml
website/docs/de/*.wml
.. and I always thought that having the same structure in the pootle project directory would be possible..
translation/projects/website/en/*.po
translation/projects/website/de/*.po
translation/projects/website/docs/en/*.po
translation/projects/website/docs/de/*.po
.. but yesterday I found out that Pootle does not accept any directory directly under the project directory other than a language directory. So I had to come up with a different directory structure. This is the result:
translation/projects/en/*.po
translation/projects/de/*.po
translation/projects/en/docs/*.po
translation/projects/de/docs/*.po
The structure of the website directory in svn remains the same.
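The mapping from the svn layout to the Pootle layout fits in a small path function. This is a sketch that assumes the language directory is always the one directly containing the .wml, as in the examples above:

```python
from pathlib import PurePosixPath


def po_path_for(wml_rel):
    """Map a path relative to website/ to one relative to translation/projects/.

    'en/index.wml'    -> 'en/index.po'
    'docs/de/faq.wml' -> 'de/docs/faq.po'
    """
    *subdirs, lang, filename = PurePosixPath(wml_rel).parts
    po_name = PurePosixPath(filename).with_suffix(".po").name
    # The language directory moves to the front, as Pootle requires.
    return str(PurePosixPath(lang, *subdirs, po_name))
```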

So, the plan for tomorrow is to go through the 7 steps again and make sure the whole process works nicely.

Tuesday, July 21, 2009

Todo for po2wml

Use both po4a-gettextize and po4a-updatepo
If a .po file does not exist, use po4a-gettextize to create it, then set the right encoding and charset as well as the correct copyright. Nicolas François said that he would add this functionality to po4a-gettextize. However, if the .po file does exist, simply use po4a-updatepo to update it when the .wml has changed.
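A sketch of that decision. It only builds the command line (-f for the format, -m for the master document, -p for the po file); it does not set the encoding or copyright, and the function name is hypothetical.

```python
from pathlib import Path


def po4a_command(wml, po):
    """Build (not run) the po4a command for one file.

    A missing .po is created with po4a-gettextize; an existing one is
    refreshed with po4a-updatepo so earlier translations are kept.
    """
    tool = "po4a-updatepo" if Path(po).exists() else "po4a-gettextize"
    return [tool, "-f", "wml", "-m", str(wml), "-p", str(po)]
```

The returned list can then be handed to subprocess.run(cmd, check=True).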

Use the current .wml if that is more complete than the .po
The nice thing about po4a-translate is that it will only write the translated .wml if 80% (the default value) or more of it has been translated. The not-so-nice thing is that it will delete the .wml if less than 80% has been translated.

My suggestion for dealing with this is the following: use po4a-translate to write the file $lang-$file.wml. If that file was actually written, rename the file to $file.wml.
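The rename trick might look like this. The sketch assumes po4a-translate's usual options (-l for the localized output; the -k threshold is left at its 80% default) and does not rely on the exit status. The run parameter only exists so the function can be tried without po4a installed.

```python
import subprocess
from pathlib import Path


def translate(master, po, out_dir, lang, name, run=subprocess.run):
    """Replace <out_dir>/<name>.wml only if po4a-translate wrote output."""
    out_dir = Path(out_dir)
    tmp = out_dir / f"{lang}-{name}.wml"
    final = out_dir / f"{name}.wml"
    tmp.unlink(missing_ok=True)  # don't mistake an old temp file for new output
    run(["po4a-translate", "-f", "wml", "-m", str(master), "-p", str(po),
         "-l", str(tmp)])
    if tmp.exists():
        tmp.replace(final)  # enough was translated: overwrite the old .wml
        return True
    return False  # below the threshold: the existing .wml stays as it was
```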

Document it
This isn't just for po2wml, but I should write a README explaining the basics.

Thursday, July 16, 2009

Issues with po4a

One of the first things I noticed when I began converting files with po4a was that po4a would sometimes exit with an error, even if the file was valid wml. This is a problem for files like download.wml, where a special link format is used to make sure every page shows the same version number (and points to the latest release).

Also, when converting a file from .wml to .po for the first time, a header will be added containing the following information:

# SOME DESCRIPTIVE TITLE
# Copyright (C) YEAR Free Software Foundation, Inc.
# This file is distributed under the same license as the PACKAGE package.
# FIRST AUTHOR <EMAIL@ADDRESS>, YEAR.

It would be nice to have a header that doesn't assign copyright to FSF (or not have a header at all). I have talked to the maintainer of po4a who said he would take a look at these two issues during his two weeks at Debconf.
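Until po4a supports this, the placeholder lines could be stripped after conversion. A sketch, assuming the header looks exactly like the four lines above; replace_header and the replacement text are hypothetical.

```python
import re

# The placeholder comment lines gettext-style tools put at the top of a new .po.
PLACEHOLDER_RE = re.compile(
    r"^# (SOME DESCRIPTIVE TITLE|Copyright \(C\) YEAR .*|"
    r"This file is distributed under the same license as the PACKAGE package\.|"
    r"FIRST AUTHOR .*, YEAR\.)\n",
    re.MULTILINE,
)


def replace_header(po_text, new_header):
    """Drop the placeholder lines and prepend new_header instead."""
    return new_header + PLACEHOLDER_RE.sub("", po_text)
```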

I have also written two bug reports:
#537236: please add support for valid wml
#537245: modify header in .po

My current todo list

For those who would like to know what I am working on right now, here is my current todo list (also known as list-of-issues-that-need-to-be-sorted-out). That said, progress is a little bit slow these days as I'm in bed with the flu (again).

Converting files
I am working on converting the translated .wml files to .po files. The files are being uploaded to translation/trunk/tools/gsoc09/po/.

I have run into a little problem that I'm not sure how to deal with. When converting a .wml to a .po, including the translated .wml, po4a will sometimes exit with one or both of the following errors:

        1. Original has less strings than the translation
        2. Structure disparity between original and translated files

There are two types of translations that cause such problems. The first is where the translator has simply skipped a paragraph or two and moved on to the next. This type of translation shouldn't be too hard to fix and convert, in most cases anyway.

The second type is where the translator has written his or her own, translated version of the page, and not a direct, translated copy. This type of translation can be hard to fix unless you know the language.

I guess that one possible solution would be to ask the people on the or-talk list for help, or to contact previous translators directly.

No translated .wml in svn
Not having translated .wml in svn sounds like a good idea. After all, one can just generate the .wml from the translated .po when needed. However, we do need to have a few of the translated .wml files in svn. When I say "a few" I mean the translated versions of the donate-page as well as the download-pages.

The reason for this is that these files cannot be converted to po (for now, at least). I suggest that we keep those files in svn and have po2wml generate the rest when we need them, i.e. before the website is built and published.

A small note-to-self would be to fix po2wml so it no longer adds and commits files.

Monday, July 13, 2009

Bug fixed in po4a

Last week I ran into a problem where po4a-translate seemed to have trouble recognizing new, translated strings. This turned out to be a bug in the wml module, and it has now been fixed in the po4a repository. This email to the po4a-list explains the bug and the quick fix.

Friday, July 10, 2009

The website, po2wml and po4a-translate

I had to deal with a few things at the beginning of the day, but since then I have:

Built the website
I managed to build the website with my own, translated files and everything looks ok. I haven't published it simply because I am a bit unsure how I can put it on www.xcde.net/torproject using the publish-script.

Fixed po2wml
I noticed that po2wml would also convert empty .po files. To deal with this, I added an if statement that checks for a certain type of comment (of the form "#."). These comments describe the string that follows, for example whether you're translating part of a list or a paragraph. That way the script knows whether or not to convert the file: no comments of the form "#." means no content.
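The check itself can be very small. A sketch (the function name is hypothetical):

```python
def has_content(po_text):
    """True if the .po has at least one extracted comment line ('#.').

    po4a writes a '#.' comment above each string taken from the .wml, so a
    file without any is treated as empty and skipped.
    """
    return any(line.startswith("#.") for line in po_text.splitlines())
```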

Looked at po4a-translate
The script, po2wml, is using po4a-translate to convert the translated .po files back to .wml files. It seems like po4a-translate has a problem with recognizing new, translated strings. I copied en/30seconds.wml and de/30seconds.po into a new folder, translated 27 of the 32 strings in 30seconds.po and ran po4a-translate.

I got the following output:
We found translations for 84.37% (27 from 32) of strings.

Then I decided to translate the rest of the .po. I did that, ran po4a-translate again and got the exact same output. It says that only 27 of the 32 strings have been translated, but that is not correct.

I'm not really sure why it's doing this, if I'm doing something wrong or if there's a bug somewhere. I have sent an email to the po4a-devel mailing list and hopefully someone will reply in not too long (that or I'll find someone on irc and give them a friendly poke).

Thursday, July 9, 2009

So far, so good

Today has been the day of testing code, fixing minor bugs and testing some more. I had 9 items on my list and I have managed to do 8 of them. The items are pretty much the same as the ones described in the previous post. I figured I'd do the last item tomorrow and then go through the list one more time.

Step 1:
Set up a repository with a website module and a translation module.

Step 2:
Put wml files in the english directory in the website module.

Step 3:
Make sure pootle reads the translation module.

Step 4:
Run wml2po to convert the wml files to po files and put them in the translation module.

While converting and updating the files, I noticed that files were being committed to the repository even though they had not changed. Or so I thought. It turns out that a few of the comments in the files would change every time I ran the script. The solution was to modify one of the regular expressions in the script to exclude a certain type of comments as well.
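A sketch of that kind of filter. The post doesn't say which comments changed on every run, so the pattern below is a placeholder assumption (it treats the POT-Creation-Date header line as the volatile one); substitute whatever lines actually change between runs.

```python
import re

# Placeholder assumption: lines that change on every run without any real change.
VOLATILE_RE = re.compile(r'^"POT-Creation-Date: .*\\n"$')


def significant_lines(po_text, volatile=VOLATILE_RE):
    """Return the lines of a .po file, minus the volatile ones."""
    return [line for line in po_text.splitlines() if not volatile.match(line)]


def really_changed(old_text, new_text, volatile=VOLATILE_RE):
    """True only if the texts differ in something other than volatile lines."""
    return significant_lines(old_text, volatile) != significant_lines(new_text, volatile)
```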

Step 5:
Restart pootle so it can detect the new files.

Step 6:
Translate a few strings using the pootle interface.

Step 7:
Commit using the pootle interface.

After a little back and forth I managed to give my user the right permissions. Translating and committing files using the pootle interface went without any further problems.

Step 8:
Run po2wml to convert the po files back to translated wml files.

I noticed that the script would convert the english po files back to english wml files. This certainly doesn't make much sense, and so the problem was fixed with an if statement.

Step 9:
Build and deploy the website.

Building the website is the one thing I haven't done today, and it is the first thing I'll do tomorrow.

Step by step

We now have a way to convert from .wml to .po and back to .wml again, but what exactly is the process from start to finish?

Here's how it will work, step by step, from a new .wml, to a translated .po, to .wml again, to a part of the website.

Step 1:
I have written foo.wml, a new .wml file. I commit this to the repository.

Step 2:
I need to convert foo.wml to foo.po. I run wml2po.

Step 3:
I now have foo.po in the repository as well, but pootle doesn't know about it. Pootle needs to be poked to learn about new files. Restart pootle.

Step 4:
Pootle now knows about foo.po and people can begin to translate.

Step 5:
It is soon time to build the website and I want to include translated pages of foo.wml. People forget to click on the 'commit'-button in the pootle interface when they are done translating foo.po. I log on to the pootle admin interface and force pootle to commit foo.po for all languages.

Step 6:
All work in pootle has been committed and I can update the translated documents, for example website/de/foo.wml. Run po2wml to convert translation/projects/website/de/foo.po to website/de/foo.wml and commit.

Step 7:
I build the website and deploy it.

Wednesday, July 8, 2009

Old code

I have uploaded the code that I have previously written to show that I have been hacking away on stuff, and also what I have been hacking away on. Most of the scripts there should work, but I can't promise anything.

Anyway, they are there for everyone to look at and maybe also learn from (kids, don't use PATH as a variable name). Also, some people might like to see how the ideas and the code have evolved over time.

The code is in the repository, translation/trunk/tools/gsoc09/other.

Tuesday, July 7, 2009

wml2po and po2wml

Four days ago I asked, on the tor-gsoc mailing list, if we could split the contents of website/ into po/ and wml/. Roger Dingledine said that it would be possible, but it would be even better to keep the wml files in the website module and the po files in the translation module.

He also said that "we can give pootle or other services read-write access to the "translation" module, but they can treat the "website" module as read-only. The process of building the website can then treat the "translation" module as read-only".

After a little bit of testing I came up with a good solution. So, in my case, I can have the wml files in
"/home/runa/tor/website"
and the resulting po files in
"/home/runa/tor/translation/projects/website"

I also wrote the second script, po2wml, that will take the translated po files and convert them back to wml files.

The updated version of wml2po has been added to the repository, together with po2wml.

Monday, July 6, 2009

In SVN: wml2po

Some say it's better late than never; I say it's something I should have done a little bit earlier. It took some time before I managed to get on the right track and find a solution that everyone seemed happy with. Hopefully, this is it. The script, wml2po, is now in the tor svn repository.

Sunday, July 5, 2009

Almost ready: wml2po

The script, wml2po, is almost ready. It assumes that the contents of website/ are split into wml/ and po/. The resulting po files are now being put where they belong, i.e.:


  • website/wml/en/index.wml to website/po/$language/index.po

  • website/wml/press/en/index.wml to website/po/press/$language/index.po



Also, the hash of each po file is generated before and after the file has been updated. The only thing left is to make sure that the script has a proper lockfile.
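The before/after hashing could look like this sketch. SHA-1 from Python's hashlib is an assumed choice (any digest would do), and update stands in for the real po4a call:

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """SHA-1 of a file's bytes, or None if the file does not exist yet."""
    path = Path(path)
    if not path.exists():
        return None
    return hashlib.sha1(path.read_bytes()).hexdigest()


def update_and_check(po_path, update):
    """Run update (a hypothetical callable) and report whether the file changed."""
    before = file_digest(po_path)
    update()
    return file_digest(po_path) != before
```

When update_and_check returns False, the file can be reverted (for example with svn revert) instead of being committed.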

Friday, July 3, 2009

My progress so far

I just sent an email to the tor-gsoc list asking if it is possible to split the contents of website/ into po/ and wml/. Doing so will make things a whole lot easier when it comes to figuring out where to put the resulting po files.

I also sent an email to my mentor telling him about my progress so far. What I wrote is pretty much the same as I have previously written in this blog:

Over the past few weeks I have written a few different scripts. While they have worked, they have not been ideal solutions to the problem I am trying to solve. If you read the blog post from yesterday, July 2nd, you'll see that I've gotten useful pointers from Peter, and that is what I have been working on today.

What's left to do is:

For wml2po:

  • Make sure that the wml and po files are put where they belong.

  • Generate hash values for each file before and after converting. Compare these two values and revert if they are the same (i.e. nothing has changed).



For po2wml:

  • The code will be pretty much the same as for wml2po and I don't think writing this script will be a problem.



Converting from wml to po, including translations:

  • I have already converted the wml files in the repository (including their translations) to po files. I'm just not sure where I should upload these. What do you think?

  • I also think that uploading those po files won't do much good until we have a script that can keep them updated automatically.



According to my timeline I should be done with the coding by the end of next week, and then start to write documentation (that is, document the code properly and write a readme). I am confident that I'll be able to stick to that plan. Let me know if you have any questions.

Thursday, July 2, 2009

A few bumps in the road

My code and I have been hitting a few bumps in the road over the past couple of weeks, either because the solution I came up with simply wasn't the best one out there or because I misunderstood something. Peter Palfrader has been really nice and given me a few pointers, and I am confident that I am on the right track.

So, the new approach will be to use a proper lockfile (and not svn lock), comparing files with hash values, converting all of the wml files, reverting the ones that haven't changed since last time and committing the ones that have.
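A minimal lockfile sketch, assuming a local filesystem where creating a file with O_CREAT | O_EXCL is atomic. The lock path is hypothetical, and stale locks left behind by a crashed run are not handled here.

```python
import os
import sys


def acquire_lock(path):
    """Create path atomically; return False if another run already holds it."""
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(f"{os.getpid()}\n")  # record which process holds the lock
    return True


def release_lock(path):
    os.remove(path)


def main(lock_path="/tmp/wml2po.lock"):  # hypothetical location
    if not acquire_lock(lock_path):
        sys.exit(0)  # another run is busy; the next invocation will catch up
    try:
        pass  # svn up, convert, revert unchanged files, commit
    finally:
        release_lock(lock_path)
```

Exiting quietly when the lock is taken fits a script that can be run any number of times: whatever one run skips, the next one picks up.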

There is also one more thing on the todo list, other than writing a script that will keep all the files updated automatically, and that is to convert all the wml files to po files. This can be done in just a few minutes (given that all the files can be converted without problems). I figure I'll upload these to the repository when:

1. The fancy, automatic wml-to-po-and-back-script is done.
2. I know where to put the po files.

I also plan on writing a longer progress report tomorrow and email this to my mentor, Jacob Appelbaum, as well as post it here.

Tuesday, June 30, 2009

I spoke too soon

Yesterday wasn't much fun. I was in bed with the flu, and so Monday passed and no code was uploaded to the repository. I am feeling a bit better today and have had a look at the code I wrote last week. I also noticed that my todo list was missing an important item. Item number 4, from the previous post, says:

I need to make sure that svn lock is used at the right places, and that the script also adds and commits the files before it's finished.

In addition to that, I also need to think about the small details when it comes to locking and unlocking. What should happen if the lock can't be acquired? What should happen when svn up results in conflicts or the commit doesn't work? To me, exiting and letting the next invocation handle it seems like a simple way of doing it.

Hopefully tomorrow will be a more productive day.

Wednesday, June 24, 2009

New and improved code

Today was spent rewriting old code into something much better. The code that takes new and updated wml files and converts/updates the po files is done. I will be leaving for Oslo tomorrow, but I will continue working on Monday when I'm back in London. What remains is:

1. The script needs to know what languages are supported by Pootle. For now, this is done by reading a simple text file. In the future, this could be done by getting a list of directories in /var/lib/pootle/pootle (given that this directory still exists on the server where Pootle is set up).

2. The script needs to look at the output of svn up twice: once to figure out which wml files are added/updated, and once to figure out which po files are updated. To deal with this, I plan on introducing a file, svnup.tmp, that the script will generate, read from and then delete.

3. I need to add the code that will take the updated po files and convert them back to wml. The code will not be much different from the code I wrote for the svn hooks, so this should be quick and easy.

4. I need to make sure that svn lock is used at the right places, and that the script also adds and commits the files before it's finished.
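Item 2 comes down to reading the saved svn up output (svnup.tmp) twice with different filters. A sketch, assuming the common svn up line format of one or two status letters, whitespace, then the path; changed_files is a hypothetical helper.

```python
def changed_files(svn_up_output, suffix):
    """Return paths that svn up reports as added, updated or merged.

    Summary lines such as 'Updated to revision 42.' are skipped.
    """
    changed = []
    for line in svn_up_output.splitlines():
        fields = line.split(None, 1)
        if len(fields) != 2:
            continue
        status, path = fields
        # A status is one or two letters; words like 'Updated' or 'At' are not.
        if len(status) > 2 or not set(status) <= set("ADUCG"):
            continue
        if status[0] in "AUG" and path.endswith(suffix):
            changed.append(path)
    return changed
```

Calling it once with ".wml" and once with ".po" on the same saved output gives the two lists the script needs.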

Thoroughly documented and fully working code will, hopefully, be committed to the tor svn repository on Monday.

Tuesday, June 23, 2009

If at first you don't succeed

I decided to look into po4a and its configuration file, hoping that it would be a better solution than svn hooks. Turns out it was a dead end. Using the configuration file will only help keep the translated documents updated (for example, the Norwegian version of gsoc.wml), but it doesn't do much for the pot and po files.

I have, however, received good feedback on the email I sent to the tor-gsoc mailing list last week, asking for suggestions and ideas regarding my solution with svn hooks. One of the suggestions was to have a script that one can run any number of times, and if the input hasn't changed it won't do anything.

So, my new and improved plan is to take the hooks that I have, clean up the code and, instead of checking for new files in a revision/transaction using svnlook, checkout/update a working copy of the HEAD and find out what files I need to update from svn up.

Some questions have been raised as to what should be done with Pootle. At the moment, Pootle needs to be restarted to learn about new files. The good news is that this will be improved in the next version, where one can tell Pootle to rescan for files from the admin interface. A possible, but not pretty, solution could be to have a cron job that runs the script and restarts Pootle once a day.

Thursday, June 18, 2009

The best solution?

So, I merged the first and second hook into one hook that will deal with both new and updated wml files. That hook will work like this:

* Bob commits a new wml file; let's call it tor.wml
* The SVN hook takes tor.wml and does three things:
- Converts to tor.po
- Sets the right encoding and charset in tor.po
- Commits tor.po
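Setting the encoding and charset could be a plain header substitution. A sketch, assuming the new .po carries the usual gettext placeholder "charset=CHARSET" and possibly "Content-Transfer-Encoding: ENCODING"; UTF-8 is an assumed choice.

```python
def fix_header(po_text, charset="UTF-8"):
    """Fill in the charset (and, if present, the encoding) placeholders."""
    text = po_text.replace("charset=CHARSET", f"charset={charset}", 1)
    # Only has an effect if the template uses an ENCODING placeholder.
    return text.replace("Content-Transfer-Encoding: ENCODING",
                        "Content-Transfer-Encoding: 8bit", 1)
```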

The "problem" with this solution is that the hook alters a transaction. When Bob thinks he's committing only one file (tor.wml), he is actually committing two files (tor.wml and tor.po). It may not be an ideal solution, but it works. That said, suggestions and ideas are always welcome.

Wednesday, June 17, 2009

More on SVN hooks

I managed to figure out the problem with the second and third hook, so everything is working just fine now. At the moment I have the following hooks (using the po4a framework):

1. New wml: using po4a-gettextize to convert the wml file to po
2. Updated wml: using po4a-updatepo to update the po files
3. Updated po: using po4a-translate to convert back to wml
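The three hooks map committed files to po4a programs roughly like this. This is a hypothetical helper, not the hook code itself:

```python
def po4a_tool(path, is_new):
    """Pick the po4a program for a committed file.

    new .wml     -> po4a-gettextize (create the .po)
    changed .wml -> po4a-updatepo   (refresh the existing .po files)
    changed .po  -> po4a-translate  (write the translated .wml)
    Anything else needs no action and gives None.
    """
    if path.endswith(".wml"):
        return "po4a-gettextize" if is_new else "po4a-updatepo"
    if path.endswith(".po") and not is_new:
        return "po4a-translate"
    return None
```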

Earlier I noticed that using po4a-updatepo will work in the case of the first hook as well, so I plan on merging the first and second hook tomorrow.

Monday, June 15, 2009

SVN hooks

I want to make sure that the wml and po files stay updated all the time, so I am writing svn hooks to take care of that. I have identified three cases where it would be useful to have pre-commit svn hooks:

1. If someone commits a new wml file: convert to po, copy to language directories and commit

2. If someone commits an updated wml file: update all the po files and commit

3. If someone commits an updated po file: convert back to wml, copy to language directories and commit

I finished the first hook last week, but I am having a few problems with the second and third one. I'm guessing it's something really simple and trivial.

Friday, May 29, 2009

Week 1: Pootle is up and running

With exams on Monday, Tuesday and Wednesday, the first week of GSoC went by rather quickly. My plan for this week was to get Pootle up and running, as well as look at the po4a configuration file and understand how it works.

I have set up Pootle on pootle.xcde.net. I have also converted some files, and the result can be seen on pootle.xcde.net/projects/website.

The plan for next week is to take a closer look at the files that I didn't manage to convert in this round.

Thursday, May 21, 2009

Open Tech 2009 in London

Open Tech 2009 is an informal, low cost, one-day conference on slightly different approaches to technology, democracy and community.

When: Saturday 4th July 2009
Where: Central London
Cost: £5 on the door.

More information on the Open Tech 2009 website.

Wednesday, May 6, 2009

Tor 0.2.0.34 now in hardy and intrepid

The latest stable release of Tor, version 0.2.0.34, has been backported to Ubuntu hardy and intrepid. Those of you running either hardy or intrepid should upgrade to get the latest version. Also, the bug report on bugs.launchpad.net explains why Tor was taken out of jaunty.

Monday, May 4, 2009

Preparing the files

I decided to get a little head start on my project by preparing a few files so that they can later be converted with po4a. When converting, po4a will make a string for each paragraph (or list item), and therefore it is important that all of the tags are closed. In the next round I'll find a way to work around a few special links (such as "<package-rpm4-stable-sha1>") and symbols (such as € and £).
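A quick way to spot unclosed tags before running po4a, sketched with Python's built-in html.parser. It treats the .wml as HTML, so special wml tags like package-rpm4-stable-sha1 would be reported as unclosed too; the void-element list is an assumption about which tags never need closing.

```python
from html.parser import HTMLParser

# HTML elements that never take a closing tag.
VOID = {"br", "hr", "img", "input", "meta", "link", "area", "base", "col", "param"}


class TagChecker(HTMLParser):
    """Collect tags that are opened but never closed (and stray closers)."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # <br/>-style tags close themselves

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Anything opened after this tag and still open was never closed.
            while self.stack[-1] != tag:
                self.problems.append(self.stack.pop())
            self.stack.pop()
        else:
            self.problems.append("/" + tag)  # closing tag without an opener


def unclosed_tags(markup):
    checker = TagChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems + checker.stack
```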

Wednesday, April 22, 2009

An introduction

First of all I'd just like to say that I am really excited about working for the Tor project in this year's Google Summer of Code.

My name is Runa Sandvik and I am a student at the Norwegian University of Science and Technology (NTNU). This summer I will be working on the translation wiki for the Tor website - making it possible to translate the website via Pootle. At the moment the Tor project uses wml for the website and Pootle to handle translations for various projects such as Torbutton and Vidalia. To make it easier for translators, translating the website should be possible via Pootle. I intend to use the po4a framework to convert the .wml files to .po files (and back) so that they can be handled by Pootle.

I will try to keep this blog updated with news about my progress during the summer. Feel free to comment here or join #tor on OFTC (irc.oftc.net) should you want to discuss this project or ask any questions.