EBook Text Formatter

EBook Text Formatter is a small, handy tool for cleaning up text files before you copy them to your favorite eBook reader, which makes them more pleasant to read. It addresses several issues:

  • Paragraph breaks. Many websites provide texts that are preformatted for a specific screen width. When an eReader such as the Kindle or Sony PRS-505 reflows such a file, lines that are too long to fit the width of the screen have one or two words wrapped to the next line. This looks ugly and makes the book much less readable. Sometimes such files also have extra spaces added to align the right edge, and these don’t look nice on eReaders either. EBook Text Formatter restores paragraphs by joining all lines of text that don’t begin with whitespace, while preserving empty lines. It also collapses each run of consecutive whitespace into a single space. Although simple, this produces nice results in almost all cases.
  • HTML markup. Some files on websites like lib.ru contain HTML markup. EBook Text Formatter can replace all HTML tags with whitespace, which is then collapsed into a single space as described above.
  • Character encoding. Sometimes text files are not in the encoding that the eReader expects, so you see garbage instead of text. For Russian characters the Kindle expects codepage 1251, while the Sony PRS-505 expects UTF-8. EBook Text Formatter can change the character encoding to the one your eBook reader expects.
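
To make the paragraph and HTML rules above concrete, here is a minimal Python sketch of the same idea. It is not the tool's actual source (the tool is a .NET program); the function names `strip_html` and `reflow` and the simple tag pattern are assumptions made for illustration, and the indent of 3 spaces mirrors the paragraphOffset default described under Configuration.

```python
import re


def strip_html(text):
    # Replace every HTML tag with a single space; the extra spaces are
    # collapsed later by reflow(). The pattern is deliberately naive.
    return re.sub(r"<[^>]*>", " ", text)


def reflow(text, paragraph_offset=3):
    # Join lines that do not begin with whitespace onto the current
    # paragraph, keep empty lines, and collapse runs of whitespace.
    out = []
    current = []  # lines of the paragraph being assembled

    def flush():
        if current:
            joined = re.sub(r"\s+", " ", " ".join(current)).strip()
            out.append(" " * paragraph_offset + joined)
            current.clear()

    for line in text.splitlines():
        if not line.strip():
            # Empty line: finish the paragraph and preserve the blank line.
            flush()
            out.append("")
        elif line[0].isspace():
            # Leading whitespace marks the start of a new paragraph.
            flush()
            current.append(line)
        else:
            # Continuation of the current paragraph.
            current.append(line)
    flush()
    return "\n".join(out)
```

For example, `reflow("   First line of\nparagraph   one.")` yields a single indented line. In this sketch a tag at the very start of a line becomes a leading space and therefore starts a new paragraph; how the real tool orders the two steps is not stated here.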

Usage:

  1. Download EBook Text Formatter and extract it to a directory of your choosing on your hard drive. It may be a good idea to put it into a directory referenced by the PATH environment variable so that you can run it from any directory.
  2. You will need the .NET Framework 3.5 runtime to run this program. It can be downloaded from the Microsoft website and installed free of charge.
  3. Open a command prompt by pressing Win-R, typing cmd and pressing ‘Enter’.
  4. Run EBookTextFormatter.exe with the following arguments: inFile [outFile] [inEncoding] [outEncoding]
    1. inFile – required. Specifies either the local path to the source file that you would like to clean up or the HTTP URL from which the file can be downloaded. Please note that if the file on the Internet requires some kind of authentication (either by password or cookie), the download will most likely fail.
    2. outFile – required if inFile is a local path; specifies the path where the resulting file is stored. If the input file is being downloaded from the Internet, you can skip this parameter and it will default to the file name on the website. Specifying ‘-’ (dash) as outFile causes EBook Text Formatter to write the resulting text to the console.
    3. inEncoding – either a codepage number (e.g. 1251) or utf8. Specifies the encoding of the input file. If omitted, the encoding is chosen based on the config file (see below).
    4. outEncoding – either a codepage number (e.g. 1251) or utf8. Specifies the encoding of the output file. For Russian text it should be 1251 for the Kindle and utf8 for the Sony eBook reader. If omitted, the encoding is chosen based on the config file (see below).
  5. Please note that you can’t skip a parameter that comes before one you specify. For example, if you would like to use a non-default input encoding, you must also specify outFile (even if you are happy with its default value).
  6. The output file is created, and you can copy it to your eBook reader if it’s not already there.
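
The inEncoding and outEncoding arguments in step 4 amount to decoding the input bytes with one encoding and re-encoding the text with another. Below is a rough Python sketch of that conversion, not the tool's own code. The names `codec_name` and `convert` are hypothetical, and mapping a codepage number to Python's `cp<number>` codec is an assumption that works for common codepages such as 1251 but not for every Windows codepage.

```python
def codec_name(encoding):
    # Accept 'utf8' or a codepage number such as 1251 or '1251',
    # mirroring the values the command line arguments take.
    value = str(encoding).strip().lower()
    if value in ("utf8", "utf-8"):
        return "utf-8"
    # Assumption: the codepage has a Python codec called cp<number>.
    return "cp" + value


def convert(data, in_encoding, out_encoding):
    # Decode the raw bytes, then encode the text for the target device.
    text = data.decode(codec_name(in_encoding))
    return text.encode(codec_name(out_encoding))
```

With this sketch, `convert(raw_bytes, 1251, "utf8")` would turn a codepage 1251 file into UTF-8, the direction suggested above for a Sony reader.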

Configuration: Besides EBookTextFormatter.exe there is also an EBookTextFormatter.exe.config file. It is an XML file that can be edited with any text editor (e.g. notepad.exe) and contains the following useful parameters:

  1. inEncoding – default value for the input file encoding. Either a codepage number or ‘utf8’. See usage 4.3 above.
  2. outEncoding – default value for the output file encoding. Either a codepage number or ‘utf8’. See usage 4.4 above. Normally you should set it according to the device that you have. For Russian text use 1251 for the Kindle and utf8 for the Sony PRS-500 and PRS-505.
  3. paragraphOffset – number of spaces to insert before the first line of each paragraph. The default value is 3.
  4. stripHtml – either True or False. If set to True, EBook Text Formatter will strip all HTML tags from the input file.
  5. overrideWebEncoding – either True or False. Normally the web server tells the client which encoding a particular file is in. However, this information is not always correct, and sometimes it is missing. Setting this parameter to True causes EBook Text Formatter to ignore the encoding reported by the web server and use the encoding from either the command line (usage 4.3) or the config file (configuration 1).
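
The way these settings interact when choosing the input encoding can be sketched in Python as follows. This is only an illustration of the precedence described above, not the tool's code. The function name `pick_input_encoding` is hypothetical, and falling back to the command line or config value when the server reports no encoding is an assumption.

```python
def pick_input_encoding(server_encoding, cli_encoding, config_encoding,
                        override_web_encoding):
    # Trust the web server unless overrideWebEncoding is True
    # or the server did not report an encoding at all (assumed fallback).
    if server_encoding and not override_web_encoding:
        return server_encoding
    # The command line argument wins over the config file default.
    return cli_encoding or config_encoding
```

Under this sketch, a server that wrongly reports an encoding is only bypassed when overrideWebEncoding is True.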

Example:

  1. Plug in your Kindle.
  2. Assuming it is mounted as drive K:, type K: in the command prompt window.
  3. Type cd \documents
  4. Type EBookTextFormatter.exe http://lib.ru/STRUGACKIE/wolny.txt
  5. Assuming that you copied EBookTextFormatter.exe to a location referenced by the PATH environment variable, that you didn’t change the default config file, and of course that you have the Unicode font hack installed, you should now see a readable version of “Волны гасят ветер”, listed as wolny, in your book list, ready to read on your Kindle once you unplug the USB cable.

I’ve been using this script personally for a while. It is not as polished as a true software product should be, but it’s usable. Let me know if you have problems using it or would like a feature added, and I’ll see what I can do. The source code is supplied with the executable file, and you are free to use it in any way you like. If you make changes to it, please let me know and I’ll try to incorporate them into the version that is available here on BlogKindle.com.

11 thoughts on “EBook Text Formatter”

  1. Hello: I have a problem with my Kindle 2. I have recently purchased a new iMac (OS -x) computer which runs just fine. I switched from a Dell with Windows XP. Problem is that I live in the Republic of Panama and have to download kindle books (I use the MOBI format) onto the computer and then load them to the Kindle using the USB cable. NO problem on the Dell. However, I tried it yesterday on the Mac and got an error message about no relevant file being available. Thought Kindle downloaded plug and play software as needed or “relevant”. Question is how (and where) do I get whatever I need for the Mac to download the kindle files. Really like kindle but I am hung up with about ten books on the Mac and no way to get them over the the Kindle. ANY HELP OR SUGGESTIONS WILL BE GREATLY APPRECIATED. By the way I am a 71 year old retired guy in Panama, but am pretty computer literate from my previous work life.
    Thanks again for your assistance.
    Marvin Turl

  2. I have a latest Kindle DX which arrived today with serial number starts with B005. I tried to install the font hack for DX posted here but it did not work. When will it be available? I can’t wait to use it.

  3. Marvin, I’ve been a Mac OS X user for many years and recently (1 mo. ago) purchased a Kindle 2 and love it. No problems whatsoever using with my iMac. Have you downloaded the Kindle for Mac application from Amazon? That is what you need, my friend.

    Stephen

  4. On Windows 7 does not work even your example. It gives a message:LibRuFormatter stop working. When trying to convert Russian file which on computer to another file which is on computer also, it makes a new file but impossible to read. Please, correct it and put exact command for converting file from computer to computer.

  5. January 16, 2011

    From time to time I work with legally blind LSAT Prep students who are in need of enlarged print. I was excited to hear that Kindle provides SIX FONT SIZES OF ENLARGEMENT. My student needs 28 point font. Is this size, or near, available in KINDLE?

    Also, I’m attempting to find out if any LSAT Prep Tests book is available for my legally blind students, such as LSAC LSAT Prep Tests books. I have written to Law School Admission Council for an answer to my question.

    I appreciate any assistance I can get for the legally blind student!

    Jose Rodriguez, Counselor & Pre-Law Advisor
    Cell-504-957-7062

  6. I have trouble converting lib.ru texts with the tool. The problem is when the source text is in KOI-8 (like this one, for example – http://lib.ru/CULTURE/STANISLAWSKIJ/akter.txt). I’ve tried different settings for the inEncoding parameter in the config file, but either got error, or garbled output. I know this could easily be fixed, if the tool knows to read the file from lib.ru in KOI-8, then output to utf8, how to configure it to do so?

  7. Mac user here. I copy the text of lib.ru books and paste it to TextWrangler.app. Then I save the resulting file with a .txt extension. This works quite well, except that some of the formatting gets screwed up. Mainly this affects dialogue, which is formatted differently in Russian books than in English ones (for example: – Yes, – she said instead of “Yes”, she said).

    Too bad this tool does not work on Macs.

  8. Hi, I uploaded ruby script that converts lib.ru html pages to Kindle readable txt format.

    You can find it here:
    http://codeviewer.org/view/code:2506

    Just save html page from your browser and provide file path to the script. For ex, if you saved script as lib-ru-converter:

    ./lib-ru-converter ./some-book.txt.html

  9. I have a new Kindle Fire . Is there anyway I can get a wifi connection with -out a computer connection at my home? Say a tv Dish Sat. connection. Thanks CBump
