Language Learning: Japanese, Part 1 - Parallel Texts

Note: This is Part 1 of the Language Learning: Japanese series; please refer to the link for other related tutorials.

The Lines-Replace Method

This entry will teach you how to make parallel texts quickly and efficiently.

A quick explanation: The main purpose and use of this method is to generate, in raw text format, a list of sentences separated by newlines. (The ‘\n’ character.)

Spreadsheet apps are able to import them with greater ease as such.

What You Will Have In Your Hands

Here’s the end result of this tutorial:

Complete parallel text of Natsume Souseki's Ten Nights of Dreams

You’ll be making that yourself, with a bit of guidance from yours truly.

Content and Copyright Issues

The above is a parallel text snippet of 夢十夜 (Yume Jyuu Ya) - Ten Nights of Dreams, as penned by Natsume Souseki, the distinguished Japanese author.

I would recommend anyone who’s interested in Japanese to start off with Ten Nights of Dreams: the horror-fantasy theme the story plays with makes it a fantastic and fascinating introduction to the world of Japanese literature. (It does not use complex grammar either, so beginners should be able to cope well.)

For this tutorial, I have chosen to use Natsume Souseki, as the copyright for his works has expired. (So I’ll be safe from any legal repercussions.) However, the translations I’m using were obtained in their entirety from No-Sword; full credit for them goes to their author. (I hope he won’t mind, as I’m not making any profit out of this exercise.)

So, visit these sites to get what you need for this tutorial: the original Japanese version of Ten Nights of Dreams. (The site, Aozora, is something like a Japanese version of Project Gutenberg.) Get the translations from No-Sword too.

You may also download the example parallel text if you like. (It’s in OpenDocument format though.)

Now, read on.

The (Twelve) Steps

My example will use the story from the First Night in its entirety. I’ve taken a lot of screen captures; so click on the links for a visual guide if you’re confused as to what you should do.

The steps outlined below may seem complicated, but really, they should take only 5 minutes of your time to complete.

1. Select and copy (shortcut key: CTRL-C) the Japanese text.

2. Fire up Notepad2. Before pasting, make sure that Notepad2 is using the UTF-8 file encoding! (Click File -> Encoding -> UTF-8.) Then paste what you’ve copied (shortcut key: CTRL-V).

3. (EXPLANATION: Sentences in Japanese usually end with the ‘。’ symbol. Therefore, we will use that marker to decide where to break the lines, so that each sentence ends up on a line of its own.) Open up Replace Text (shortcut key: CTRL-H), and insert the following into the empty fields:


Search String: 。
Replace With: 。\n

Make sure you don’t forget to tick the “Transform backslashes” box. Then click Replace All.

4. Now you’ve got a list of sentences, aligned nicely. However, it’s a bit messy: some sentences have blank lines in between. Not a problem. Select all text (Shortcut key: CTRL-A), then click Edit -> Block -> Remove Lines. (Shortcut key: ALT-R.)

5. Once you’ve done that, the sentences should look much cleaner. Now, select everything in the text file (shortcut key: CTRL-A), and copy it (shortcut key: CTRL-C).

6. Fire up OpenOffice Calc. Paste what you have (shortcut key: CTRL-V) into the first cell (first column, first row). A new window should open, asking you to configure the character set and separator options. Unicode, the default character set, should work fine; as for the separator, leave it on “Tab”. Press OK.

7. You now have a spreadsheet with perfectly aligned sentences, but with an ugly and jagged appearance. What we need is some formatting magic. But first, enable the use of specific fonts for Asian languages. (Click Options -> Language Settings; in the ‘Enhanced language support’ area, tick the ‘Enabled for Asian languages’ box. Then hit OK.)

8. Now press F11. In the Styles and Formatting window, right-click ‘Default’, then click ‘Modify’. Click the tab called Fonts. Change the font under ‘Asian text font’ to something you like. (I have a bit of a preference for Meiryo myself, and would highly recommend it.) Next, click on the ‘Alignment’ tab. Select ‘Top’ from the Vertical drop-down list, and under ‘Properties’, tick the ‘wrap text automatically’ box.

9. Voila! You now have something that resembles a parallel text. But we’re only half-done.

10. Repeat steps 1-9, but use the English translation this time around. (Here are the translations from No-Sword, if you missed the link above.) Also, take note that you should use the ordinary full stop, ‘.’, instead of ‘。’.

So:


Search String: .
Replace With: .\n

When you paste the data into the spreadsheet, take care to do so in the 2nd column.

11. At this point, you should have something that looks like this: a dual-parallel text. We’re nearly done. Now use the shortcut keys ALT-Left Arrow/ALT-Right Arrow to adjust the width of the columns for both the English and Japanese language cells. (Alternatively, you could also drag the column headers.)

12. Select all cells (shortcut key: CTRL-A). Then, click ‘Format’ -> Default Formatting.

Done! All that’s left is for you to beautify the spreadsheet, just the way you like it.
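If you would rather script steps 3 and 4 than click through Notepad2, the same idea can be sketched in a few lines of Python. This is only a minimal sketch, not a replacement for the steps above: the function name `split_sentences` and the sample string are made up for illustration, and the sample is not taken from the actual story.

```python
def split_sentences(text, marker="。"):
    """Mimic steps 3 and 4: break the line after every marker, then drop blank lines."""
    # Step 3: the equivalent of replacing '。' with '。\n'.
    broken = text.replace(marker, marker + "\n")
    # Step 4: keep only the lines that contain something besides whitespace.
    return [line for line in broken.splitlines() if line.strip()]


# Made-up sample text with a blank line in the middle (an assumption for the demo).
sample = "こんな夢を見た。\n\n女は目を閉じた。もう朝だった。"

for sentence in split_sentences(sample):
    print(sentence)
```

For the English text of step 10, the same function could be called with `marker="."`; each line after the first will then start with a leftover space, just as it does in Notepad2.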

Side Notes

The parallel sentences may not be nicely aligned against each other sometimes. Manual inspection and correction may be necessary.
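A quick way to spot such misalignment is to compare the sentence counts of the two columns before pasting. The following is a minimal Python sketch under my own assumptions: the helper names `pair_columns` and `to_tsv` are hypothetical, the sample sentences are made up, and the output is tab-separated only because that matches the “Tab” separator chosen in step 6.

```python
import csv
import io
from itertools import zip_longest


def pair_columns(japanese, english):
    """Pair sentences row by row; the shorter list is padded with empty cells."""
    return list(zip_longest(japanese, english, fillvalue=""))


def to_tsv(rows):
    """Render the rows as tab-separated text, one row per line."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up sample columns; the English side has one sentence too many.
japanese = ["こんな夢を見た。", "もう朝だった。"]
english = ["I had a dream.", "It was morning.", "An extra sentence."]

if len(japanese) != len(english):
    print(f"Sentence counts differ ({len(japanese)} vs {len(english)}); check the rows by hand.")
print(to_tsv(pair_columns(japanese, english)))
```

A matching count does not prove the rows line up, of course; it only catches the obvious cases, so a read-through is still worth the time.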

The line-replacement method I’ve been using, as outlined above, may seem fairly obvious to some. But it would surprise you how many do not employ it, thus wasting valuable time and energy making parallel-texts manually.

Lastly, if I’ve helped you out, do let me know. And, if you’ve any tips or suggestions, please do share. Look forward to the next tutorial, which will be concerned with making kanji lists.



Date: April 7th, 2008

Comments:

1. If you wish to comment, please write something that surpasses your very thoughtful "LOL". If that makes it too difficult for you to properly comment, don't.
2. I will reply in kind. If you're nice, polite and civil when commenting, I will reply in a similar manner. But if your comments are immature, stupid or hasty (to name a few examples), I will delete any or all of them as soon as I get to them.
3. I don't delete comments unless they are clearly derogatory or off-topic. Feel free to share your thoughts.
4. This site employs spam protection, in the form of Akismet. Don't post anything that reeks of spam; otherwise, your comment won't see the light of day, and I probably won't know it existed either.

1 Matt
April 10th, 2008 / 9:16 pm

Hey, thanks for the shout-out. No problem at all with being used for the example like this. (Quite the opposite!)

I cringe when I read this translation now, mind you…

Thanks for the (official) permission, Matt!

And hey, please don’t cringe! Your translation, being a bit more literal, is a lot easier to understand. For example, compare the following original sentence, 「静かな水が動いて写る影を乱すように、流れ出したと思ったら、女の目がぱちりと閉じた」 against the English translations:

————-

No-Sword: “Still water welled there, blurring the reflection as it began to move, and then her eyes closed tight.”

Breaking Into Japanese Literature: “It melted away rather as a shadow in a pond breaks up when the water is disturbed. At the very instant this thought occurred to me, the woman’s eyes snapped shut.”

————

Both the elements of grammatical structure and vocabulary are kept somewhat simple in your translation. That makes it a better and more useful choice for us learners.

2 Fazleena
April 12th, 2008 / 4:56 pm

This may not be the right place to ask, but I keep getting stuck at step 3!

It says “0 occurrences of the specified text have been replaced.”

Can you please help me?

Don’t worry, this is the right place to ask.

First, check to see whether you’ve correctly used ‘。’ in the Search String field, and ‘。\n’ in the Replace With field. Also, make sure the ‘Transform backslashes’ box is ticked. Then click ‘Replace All’.

If you’re confused, click on the link in Step 3 for a screenshot.

3 Chris
April 18th, 2008 / 3:21 pm

Hmmm - notepad 2 will not display the Japanese font, but the original notepad will…

Make sure Notepad2’s text-encoding setting is in UTF-8!

4 sven
April 19th, 2008 / 6:39 pm

i seem to have a problem. i can’t stop a new line being formed in notepad2 when the sentence has Mr. or Mrs. it recognises these as new sentences throwing my text all out. Any way round this.

5 sven
April 19th, 2008 / 6:48 pm

scrap that last comment i’ve just replaced all Mr. and Mrs. with Mr, and Mrs, so it no longer recognises them as new sentences
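For anyone who prefers not to edit the text itself, the same workaround can be scripted. This is a minimal Python sketch, not something Notepad2 does for you: the abbreviation list and the name `split_english` are assumptions made for illustration, and the list would need extending for other texts.

```python
import re

# Hypothetical list of abbreviations whose period does not end a sentence.
ABBREVIATIONS = ("Mr", "Mrs", "Ms", "Dr")

# Match a period only when it is NOT directly preceded by one of the abbreviations.
_GUARDS = "".join(rf"(?<!\b{re.escape(abbr)})" for abbr in ABBREVIATIONS)
SENTENCE_END = re.compile(_GUARDS + r"\.")


def split_english(text):
    """Break the line after each sentence-ending period and drop blank lines."""
    broken = SENTENCE_END.sub(".\n", text)
    return [line.strip() for line in broken.splitlines() if line.strip()]


for sentence in split_english("Mr. Smith came in. Mrs. Jones left."):
    print(sentence)
```

Unlike the replace-with-a-comma trick, this leaves ‘Mr.’ and ‘Mrs.’ intact in the finished parallel text.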

6 Johan Fänge
April 30th, 2008 / 4:22 am

I don’t know how much the text has to be beautified, but shouldn’t it be valuable to treasure paragraphs? After all, whereas individual sentences may not match up, at least paragraphs should, right? This should be more valuable for longer texts too.

So maybe rather than deleting the extra lines it would be better to normalize them. E.g. end of paragraph -> “\n\n” and middle-of-paragraph-newline -> “”.
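That normalization could look roughly like the Python sketch below. It rests on one assumption that may not hold for every source: that paragraphs are separated by at least one blank line. The name `normalize_paragraphs` is made up for the example.

```python
import re


def normalize_paragraphs(text):
    """End of paragraph -> exactly one blank line; newline inside a paragraph -> removed."""
    # Assumption: one or more blank lines mark a paragraph boundary.
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Join the lines of each paragraph with nothing in between.
    joined = ["".join(paragraph.splitlines()) for paragraph in paragraphs]
    return "\n\n".join(joined)


# Made-up sample: two paragraphs, each broken over two lines.
sample = "あい\nう。\n\n\nえ。\nお。"
print(normalize_paragraphs(sample))
```

Joining with an empty string suits Japanese, which has no spaces between words; for the English column the lines would need to be joined with a space instead.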


