Picawk is a program that uses OCR (Optical Character Recognition) and image processing to annotate and edit text contained in images. It's like Microsoft Word for images.
Large Document Image Demo
Picawk was conceived at the end of May 2010. A friend wanted to post screenshots of his desktop to his blog. Because he was posting anonymously, he wanted to keep his username from showing up in the terminal screenshots. So he opened each screenshot in Photoshop and blurred his username by hand. Opening each screenshot, going through the blurring procedure and saving the processed image seemed tedious to me.
So I thought of automating what my friend wanted with OCR and image processing technology, and I went on to do just that: build a tool that lets someone easily blur certain keywords in an image, with support for batch mode.
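The idea can be sketched in a few lines of Python. This is only an illustration, not Picawk's actual code. It assumes an OCR engine has already returned each recognized word together with its bounding box, and it treats the image as a plain grid of grayscale values. The names WordBox, boxes_to_blur, box_blur_region and redact are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class WordBox:
    """One OCR result: recognized text plus its pixel bounding box (hypothetical structure)."""
    text: str
    left: int
    top: int
    width: int
    height: int


def boxes_to_blur(words, keywords):
    """Return the boxes whose text matches any keyword, ignoring case."""
    wanted = {k.lower() for k in keywords}
    return [w for w in words if w.text.lower() in wanted]


def box_blur_region(pixels, box, radius=1):
    """Blur one rectangular region of a grayscale image in place.

    pixels is a list of rows, each a list of ints in 0..255.
    Each pixel inside the box becomes the mean of its neighbours
    within the given radius, clipped to the image bounds.
    """
    height, width = len(pixels), len(pixels[0])
    source = [row[:] for row in pixels]  # read from a copy so results do not feed back
    for y in range(max(box.top, 0), min(box.top + box.height, height)):
        for x in range(max(box.left, 0), min(box.left + box.width, width)):
            total = count = 0
            for ny in range(max(y - radius, 0), min(y + radius + 1, height)):
                for nx in range(max(x - radius, 0), min(x + radius + 1, width)):
                    total += source[ny][nx]
                    count += 1
            pixels[y][x] = total // count


def redact(pixels, words, keywords, radius=1):
    """Blur every keyword occurrence and return how many boxes were blurred."""
    matches = boxes_to_blur(words, keywords)
    for box in matches:
        box_blur_region(pixels, box, radius)
    return len(matches)
```

Under these assumptions, batch mode amounts to running redact on every image in a folder with the same keyword list.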
The tool evolved, and several features and filters were integrated. Picawk takes its name from pic (as in picture) and awk, the common UNIX text processing utility. I was processing text within a picture, so that seemed an ideal name for the project.
Taken from the Wikipedia article on awk:
"AWK is a language for processing files of text. A file is treated as a sequence of records, and by default each line is a record. Each line is broken up into a sequence of fields, so we can think of the first word in a line as the first field, the second word as the second field, and so on. An AWK program is of [sic] a sequence of pattern-action statements. AWK reads the input a line at a time. A line is scanned for each pattern in the program, and for each pattern that matches, the associated action is executed." - Alfred V. Aho
Later on, I incorporated text editing capabilities into picawk (again within an image), wanting to create what one might describe as "Microsoft Word for images". Initial names included textmagik and TIP (Text Image Processor). I realized that picawk could prove valuable for improving productivity and automating business tasks.
Its logo is a peacock's feather, a beautiful image full of vivid colors that reflects the creator's interest in vision.
Why the need for picawk
According to Wikipedia, word processing was one of the earliest applications for the personal computer in office productivity. Back then, the primary format of documents was text. Nowadays, with the advent of social networks, mobile devices and ubiquitous internet, we produce more images than ever.
Modern technology and the cameras on the devices we use daily have made it possible to produce, process, store and transmit document images efficiently. In an attempt to move toward the paperless office, large quantities of printed documents are digitized and stored as images in databases. The popularity and importance of document images as an information source are evident. As a matter of fact, many organizations currently use and depend on document image databases.
Millions of digital documents are constantly transmitted from one point to another over the internet. The most common format of these digital documents is text, in which characters are represented by machine-readable codes. The majority of computer-generated documents are in this format. On the other hand, to make billions of volumes of traditional and legacy documents available and accessible on the internet, they are scanned and converted to digital images using digitization equipment.
In addition, many people share pictures through their mobile phones on a daily basis. These pictures contain a lot of text and information. Sometimes the sender wants a friend or partner to focus on a specific word or snippet of the document, for example the name of the client on a legal document. This creates a need for annotation.
A tool to address those needs hasn't been developed yet. Picawk lets you annotate and edit text contained in images while preserving the formatting of the original document. As noted in the description, it's like Microsoft Word for images.