Sunday, May 06, 2012

Not Open Source but Free CCMS (2)

The XML-based CCMS "Calenco XML CMS", which I mentioned earlier, is still available (see also my post "Open Source CCMS").

There is one more system: "SiberSafe DITA CMS", an existing DITA-based CCMS, has recently been offered without any license cost. Read the EULA carefully, but in case you need something to play with....

Neither is open source. Their goal is not open, shared development. They are simply aiming to lower the barrier to entry for customers.

What you see in both cases is that the company driving the implementation wants to get in touch with you, and both companies offer additional features at dedicated license costs.

I personally expect more products in this domain to follow the same approach. Why?

The specific domain of "technical documentation" is pretty small, and there are many different small companies out there providing specific products to support it.

Even in huge installations the number of licenses required to support the users dealing with technical information isn't very large; this means the opportunity to sell a large number of licenses is limited. In addition, most of the available tools are similar to each other, with individual advantages but no structural differences.
This means this business model does not really scale, and the effort required to sell each license is high.

On the other hand, having a tool does not improve your information process and therefore does not add any business value to your organization. At best it supports your process with automated tasks. But first of all you need optimized methods and processes (an information process) before any tool can assist you as well as possible.

This means the future is not to create and develop products that look like today's CCMS systems available on the market. The future is either to create information-process-driven products where technical information is just one use case, OR to focus on integration services that get the value out of existing information.

What are the limitations of today's CCMS systems? And what will more future-oriented designs look like? More to come in future blog posts....

Search and Replace on multiple files

Search & Replace is a common task in data processing environments. You cannot avoid processing your data to replace or add a word, a piece of syntax or even multiple lines of text in several different resources.

If the task can be fully automated, meaning there is a unique algorithm that transforms a resource A to A' based on the content of A, then you will look for available methods and tools that support this kind of operation.
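As a minimal sketch of such an automated transformation, the following Python code applies one function to every matching file below a directory. The names transform and replace_in_files, the "*.txt" file pattern and the foo/delicious replacement are assumptions chosen purely for illustration.

```python
from pathlib import Path


def transform(text: str) -> str:
    """Hypothetical transformation A -> A' that depends only on the content of A."""
    return text.replace("foo", "delicious")


def replace_in_files(root: Path, pattern: str = "*.txt") -> list:
    """Apply transform() to every matching file below root.

    Returns the list of files whose content actually changed.
    """
    changed = []
    for path in sorted(root.rglob(pattern)):
        original = path.read_text(encoding="utf-8")
        updated = transform(original)
        if updated != original:
            # Only rewrite files that really differ, so untouched files
            # keep their modification time.
            path.write_text(updated, encoding="utf-8")
            changed.append(path)
    return changed
```

Calling replace_in_files(Path("docs")) would rewrite the affected files in place, so it is worth running it on a copy of the data first.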

Methods

Regular expressions are a very powerful way to express rules, not only for finding common patterns in text-based resources but also as a good foundation for replacing or extending existing content.
Compared to simple phrase-based patterns, almost any imaginable rule can be expressed and used as the source for the required transformation.
But regular expressions come at a high cost in complexity. It is very easy to define rules that result in "false positives", meaning matches that you didn't want to match.
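A small Python sketch of such a false positive, using the standard re module. The sample text is made up for illustration; the word-boundary anchor \b is one common way to narrow a rule, not the only one.

```python
import re

text = "foo, food and foobar"

# Naive rule: also matches "foo" inside longer words (false positives).
naive = re.sub(r"foo", "delicious", text)
# naive == "delicious, deliciousd and deliciousbar"

# Stricter rule: \b restricts the match to the whole word "foo".
strict = re.sub(r"\bfoo\b", "delicious", text)
# strict == "delicious, food and foobar"
```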

Tools

Doing Search & Replace in the file system on multiple resources (files) is easy for IT people using Linux tools like grep, ....
On Windows you can also install those tools and make them a powerful foundation for this kind of operation (see http://gnuwin32.sourceforge.net/packages/grep.htm).

TextCrawler

But not everyone wants to become an IT expert simply to replace the term "foo" with "delicious". On Windows you can use TextCrawler for this. It is one of the best UI-based tools I'm aware of.

It provides
  • simple phrase-based operation "Replace phrase A with B" on multiple files
  • more complex regular expression based operation 
  • and in addition a fuzzy search operation for more advanced search operations
It also supports the use of Unicode characters to search and replace and the processing of files encoded in Unicode (utf-8, utf-16).

To avoid false positives you can
  • preview the hits before actually performing the replace operation
  • use a dedicated regular expression tester to see what exactly matches and what it will be replaced with
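The preview idea also works outside a UI tool. The following Python sketch is not how TextCrawler does it; preview_replace is a hypothetical helper that only reports what a rule would change, without modifying anything.

```python
import re


def preview_replace(text: str, pattern: str, replacement: str) -> list:
    """Return (line_number, before, after) for every line the rule would change."""
    regex = re.compile(pattern)
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        changed = regex.sub(replacement, line)
        if changed != line:
            hits.append((number, line, changed))
    return hits


sample = "foo bar\nfood\nno match"
for number, before, after in preview_replace(sample, r"\bfoo\b", "delicious"):
    print(f"line {number}: {before!r} -> {after!r}")
# prints: line 1: 'foo bar' -> 'delicious bar'
```

Reviewing such a list before writing any file is a cheap way to catch false positives.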
Search and Replace is something you have to consider harmful, but in case you have to do it on a Windows desktop, this tool is something I can recommend.

Concurrency: low-level design still matters

Today's design is very focused on the application level: using optimized operations for a given software service.

This means you try to create simple, atomic operations which can be called from your business process. Each service can be distributed and scaled using mainstream deployment patterns.

So far so good. But here is what you might see once you do this: running one thread on a single machine gives you the predictable performance you have to achieve, while with 8 concurrent threads each single operation takes much longer to execute.
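A rough way to observe this effect is to measure the mean time per operation with 1 and with 8 threads. The Python sketch below shows only the measurement approach; operation and mean_latency are hypothetical names, and the workload is an arbitrary CPU-bound loop. Note the assumption: in CPython the threads mostly contend for the global interpreter lock rather than for memory, so the cause of the slowdown differs from the hardware-level contention discussed here, and the numbers depend entirely on the machine.

```python
import threading
import time


def operation(n: int = 50_000) -> int:
    """Stand-in for one simple, atomic service operation (CPU-bound loop)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def mean_latency(thread_count: int, repeats: int = 5) -> float:
    """Mean wall-clock seconds per operation() call with thread_count threads active."""
    samples = []
    lock = threading.Lock()

    def worker() -> None:
        for _ in range(repeats):
            start = time.perf_counter()
            operation()
            elapsed = time.perf_counter() - start
            with lock:  # protect the shared list of samples
                samples.append(elapsed)

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sum(samples) / len(samples)


if __name__ == "__main__":
    for count in (1, 8):
        print(f"{count} thread(s): {mean_latency(count):.6f} s per operation")
```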

Have you seen this behavior in one of your applications too? Then you are probably facing concurrency issues, and once you have eliminated all application-related issues you become aware that even today hardware-related optimization is something you have to take care of. Really?

In my daily work I see and know some applications that don't scale very well on a single machine: they are very basic in terms of their application-related algorithms, but they use algorithm patterns that cause memory contention....

Thus you still have to understand the low-level architecture and the ways to optimize the basic algorithms in your code.

Lock-Free Algorithm

Have a look at "Lock-Free Algorithm" to get a very good overview of how such things still affect the concurrency behavior of your application. You should also read "Beginners guide-concurrency" from Trisha Gee and Michael Barker.

You also get hints and estimates of how virtualization might affect your performance.

Summary

Choosing the right hardware still matters in operational scenarios where concurrency is used to scale your application AND scalability is a core success factor of the application.