Sunday, May 06, 2012

Search and Replace on multiple files

Search & Replace is a common task in data processing environments. Sooner or later you have to process your data to replace or add a word, a piece of syntax, or even multiple lines of text across several different resources.

If the task can be fully automated, meaning there is a well-defined algorithm that transforms a resource A into A' based only on the content of A, then you will look for methods and tools that support this kind of operation.

Methods

Regular expressions are a powerful way not only to find common patterns in text-based resources but also to replace or extend existing content.
Compared to simple phrase-based patterns, they can express most imaginable rules and serve as the basis for the required transformation.
But regular expressions come at a high cost in complexity. It is easy to define rules that produce "false positives", meaning matches that you did not intend.
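To make the false-positive problem concrete, here is a minimal sketch in Python using the standard `re` module. The sample sentence is made up for illustration. It shows how a pattern that looks obvious also matches inside longer words, and how word boundaries narrow it down.

```python
import re

# Made-up sample text; only the first word should be replaced.
text = "foo, foobar and food are three different words."

# Naive pattern: matches "foo" anywhere, including inside longer words.
naive = re.sub(r"foo", "delicious", text)

# Word boundaries (\b) restrict the match to the standalone word "foo".
strict = re.sub(r"\bfoo\b", "delicious", text)

print(naive)   # delicious, deliciousbar and deliciousd are three different words.
print(strict)  # delicious, foobar and food are three different words.
```

The naive rule produces two false positives ("foobar" and "food"); the stricter rule touches only the intended word.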

Tools

Doing Search & Replace in the file system on multiple resources (files) is easy for IT people using Linux tools like grep, ...
On Windows you can also install these tools and make them a powerful foundation for this kind of operation (see http://gnuwin32.sourceforge.net/packages/grep.htm).
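For readers who prefer a script over command-line tools, the same idea can be sketched in a few lines of Python. This is only an illustration, not a replacement for the tools above: the function name `replace_in_files` and its parameters are hypothetical, and it assumes all files share one known text encoding.

```python
import re
from pathlib import Path

def replace_in_files(root, glob, pattern, replacement, encoding="utf-8"):
    """Apply a regex replacement to every file under root matching glob.

    Returns a dict mapping each changed file path to its replacement count.
    Hypothetical helper for illustration; files are rewritten in place.
    """
    regex = re.compile(pattern)
    changed = {}
    # rglob walks the directory tree recursively.
    for path in sorted(Path(root).rglob(glob)):
        if not path.is_file():
            continue
        original = path.read_text(encoding=encoding)
        # subn returns the new text and the number of replacements made.
        updated, count = regex.subn(replacement, original)
        if count:
            path.write_text(updated, encoding=encoding)
            changed[str(path)] = count
    return changed
```

A call such as `replace_in_files("docs", "*.txt", r"\bfoo\b", "delicious")` would rewrite all matching `.txt` files below a (hypothetical) `docs` folder. Since the files are overwritten, run something like this only on a copy or on data under version control.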

TextCrawler

But not everyone wants to become an IT expert simply to replace the term "foo" with "delicious". On Windows you can use TextCrawler for this. It is one of the best UI-based tools I'm aware of.

It provides
  • simple phrase-based operations ("Replace phrase A with B") on multiple files
  • more complex regular-expression-based operations
  • and, in addition, a fuzzy search for more advanced searches
It also supports Unicode characters in search and replace patterns and can process Unicode-encoded files (UTF-8, UTF-16).

To avoid false positives you can
  • preview the hits before actually performing the replace operation
  • use a dedicated regular expression tester to see exactly what matches and what it will be replaced with
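The preview idea is not specific to any one tool. As a rough sketch in Python (the helper `preview_hits` and the sample text are hypothetical, and this says nothing about how TextCrawler works internally), a dry run can list every line that would change without modifying anything:

```python
import re

def preview_hits(text, pattern, replacement):
    """Return (line_number, before, after) for each line the pattern would change.

    Works line by line, so it only suits patterns that do not span lines.
    """
    regex = re.compile(pattern)
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        after = regex.sub(replacement, line)
        if after != line:
            hits.append((number, line, after))
    return hits

# Made-up sample: line 2 contains "food" and must stay untouched.
sample = "foo is tasty\nfood is not affected\nmore foo here"
for number, before, after in preview_hits(sample, r"\bfoo\b", "delicious"):
    print(f"{number}: {before!r} -> {after!r}")
```

Reviewing such a list before writing any file is a cheap way to spot false positives.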
Search and Replace is something you have to consider harmful, but if you have to do it on a Windows desktop, this is a tool I can recommend.
