Friday, May 25, 2012

Typewriter Text to Your Computer: How to Get Good Results

Now that I've told you all where you can go to find decent OCR software, I'll cover how to get good results with it. I discovered these methods while cleaning up my own OCRed (look, Mom, a made-up word) chapters, which I had written on my little S&C Skyriter. I wrote some of them when I first got it and was still trying to keep the spools from sticking, so the text was a tad iffy. My later drafts look much better, since my spools no longer stick and I've gotten much better at typing on my machine. So here are a few things you can try while writing on your own machines that can help you get good results from OCR software.

  • Keep the draft clean. Since OCR software reads the shapes it can see on the paper, it's best not to make a mistake and then fix it by backing up over the word to cross it out. When you're in the flow, it's tempting to hit the backspace or move the carriage so you can cross out the words with a handy ///, XXX, or even a horizontal line. The software can read these in many strange ways, and it gets tedious to go through and get rid of all the funny combinations. I suggest using white-out tape to cover your mistakes. I find it quite handy, and it keeps my manuscript clean for the software. Make sure you cover the whole mistake, or you may get the odd colon or period in the middle of a word.
  • Have crisp, clean letters. This is a little harder to achieve with a typewriter, so I suggest using any means necessary. You want the crispest, cleanest letters you can manage, or you might get words that come out like this: ducL:. It was supposed to be duck, but the software didn't read it that way. My periods are also often mistaken for commas, and my I's for ones. That's why I find it best to make my letters come out as sharp as possible. Currently my letters are a tad gummed up, so I might get better results once I clean them. Using an easy-to-read typeface might help too. The g on my machine is a bit funny, so I get a lot of words that look like this: hugGed.
  • Use a good-quality scanner. This also helps keep your letters sharp. I have a good one, so I don't have this problem. If you have an old one, invest in a new one or borrow a friend's. Many printers do double, triple, or quadruple duty these days, so a good scanner shouldn't be hard to come by.
  • If you're converting to a Word doc, try to spell everything right. I know that as writers we should try to do this anyway, but it doesn't always happen. Good spelling means fewer red and green squiggly lines, which makes finding and fixing the OCR errors much easier.
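If you'd rather not hunt for every one of these by eye, a short script can flag some of the usual suspects before you start proofreading. Below is a rough sketch in Python. It isn't part of OnlineOCR, Word, or any other program mentioned here; `flag_ocr_errors` and its patterns are hypothetical, based only on the kinds of mistakes described above (a capital letter in the middle of a word like hugGed or ducL:, a digit where a letter should be, punctuation stuck between letters). Expect some false alarms, such as names like McDonald or words like 1st.

```python
import re

# Rough patterns for common OCR misreads of typewriter text.
# These are assumptions based on the errors described in this post,
# not rules taken from any OCR program; expect some false alarms.
SUSPICIOUS = [
    (re.compile(r"[a-z][A-Z]"), "capital letter inside a word"),
    (re.compile(r"[A-Za-z][0-9]|[0-9][A-Za-z]"), "digit mixed with letters"),
    (re.compile(r"[A-Za-z][:;.,][A-Za-z]"), "punctuation between letters"),
]


def flag_ocr_errors(text):
    """Hypothetical helper: return (word, reason) pairs for suspicious words."""
    flagged = []
    for word in text.split():
        for pattern, reason in SUSPICIOUS:
            if pattern.search(word):
                flagged.append((word, reason))
                break  # report only the first reason that matches
    return flagged


if __name__ == "__main__":
    sample = "The ducL: hugGed me. 1t was fine."
    for word, reason in flag_ocr_errors(sample):
        print(f"{word}: {reason}")
```

Run it on the text you copied out of the OCR result, and it prints each suspicious word with the reason it was flagged, so you know where to look first. It won't catch everything (a period read as a comma at the end of a sentence looks perfectly normal to it), so a read-through is still needed.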
I hope this helps any writer who is crazy enough to write on a typewriter. I find that preventative measures make everything in life easier.

For my first blog on using OCR software: Typewriter Text to Your Computer

Saturday, May 19, 2012

Typewriter Text to Your Computer

Hi, there. I know that it's been a while. It appears that I've been writing these things monthly. I guess my writing has kept me busier than I thought.

On that note, since I've been using a typewriter to do my rewrites, I've been looking for a way to convert scanned PDFs of my chapters into text. I noticed that Adobe offered that option for roughly $20 a year and looked into it. I searched for reviews because I needed to know whether it could convert text typed on a typewriter, and according to them, Adobe's PDF-to-Word conversion couldn't handle it. Why? Apparently their OCR software couldn't read the inconsistencies that come with typewriter text.

For those who don't know, OCR is the alphabet-soup term for optical character recognition software. It's a program that reads the characters in a text and tries to turn them into the closest representation possible. From my research, I discovered that most versions of the software don't handle inconsistent characters well, and none of them can handle handwriting. It was frustrating. Then I found this article that listed 10 programs with free options, five online and five for the desktop. It pretty much did the work for me.

I experimented a bit with the Google Docs option, but my PDF files were all too big. So I decided to try the one the article recommended: OnlineOCR. Calling it a godsend would be a little over the top, because the free service is somewhat limited. It only does 5 pages an hour for guest users, so I suggest registering if you're going to use it. Registered users get 20 credits to start (one credit per page) and can convert PDF files, an option you don't have as a guest. Additional credits can be purchased or earned through their Bonus Program.

I took advantage of the 20 credits and had one of my PDFs converted. It took a minute for the file to upload to the site, but then the conversion was relatively quick. (Although this could depend on your internet speed and your computer.) Even though it wasn't entirely accurate, OnlineOCR did a pretty good job. I'll still have to go through and fix the little problems like wrong letters, missing words, and formatting, but it beats transcribing my work by hand. While not perfect, the service saved me quite a bit of time and effort that I can put back into my writing. (And it gave me a good laugh. It's like reading auto-correct texts.)

So there you are: my first advice blog for people who can't afford one of those fancy USB Typewriters or don't have the skills to do their own soldering, or for those who have a monster collection of clack-clack machines and still transcribe their own writing. I'm sure many of you have looked into this kind of software and been unhappy with it. So far, I have not been disappointed.

Now back to my novel.

Update: If you're a blogger and you're happy with the service OnlineOCR provides, they'll reward you with credits for writing a review on your blog. Just make sure you send them an email with the link; that part isn't posted on the site, and I had to discover it myself. It's worth it if you have a mound of pages. (Added 5/21/12)