Monday, December 28, 2009

Software Globalization

Software Globalization is always a big deal and to be honest i do not know many software applications without more or less serious shortcomings in this area. Globalization consists of Internationalization (preparing the application so that it can be adapted to different locales) and Localization (translation of content for a specific locale).

you have to deal with locale specific
  • messages
  • dates
  • numbers and currencies
  • replacements
  • conversions
  • encodings
  • sorting / collation
  • text boundaries
  • layouts
  • .....
most of them seem to be easy for 95% of the use-cases but getting 100% done is tricky, at least as long as you do not have the corresponding and suitable software infrastructure.

the best open source library in this domain I'm aware of is ICU. as always, a software tool does not solve the problem itself, it just gives you a solid base once you understand what needs to be done ;-)
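
to make the sorting / collation item a bit more concrete, here is a minimal python sketch (standard library only, it does not use ICU). it only shows why plain code point order is not enough for the 100% case. the folding key is a simplification assumed for illustration and no replacement for real locale aware collation as ICU provides it. all function names are made up for this example.

```python
import unicodedata


def naive_sort(words):
    # plain code point order: uppercase sorts before lowercase,
    # and "\u00c4" (A with diaeresis) sorts after "z"
    return sorted(words)


def folded_key(word):
    # decompose accented characters, drop the combining marks and
    # compare case-insensitively; the original word breaks ties
    decomposed = unicodedata.normalize("NFD", word)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (base.casefold(), word)


def folded_sort(words):
    return sorted(words, key=folded_key)


# "\u00c4pfel" is the German word "Äpfel"
words = ["zebra", "\u00c4pfel", "apfel", "Birne"]
print(naive_sort(words))
print(folded_sort(words))
```

the naive order puts the umlaut word behind "zebra", the folded order puts it next to "apfel". which of the two is "right" still depends on the locale, and that is exactly the part a library like ICU takes care of.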


Sunday, June 14, 2009

who has locked my account?

if you have to identify on which windows client / server a given account is currently logged on, the sysinternals utility "psloggedon" is your friend.

a simple "psloggedon account" identifies all computers the given account is currently logged on to (as long as they are part of the same domain you start the command from).

this tool is part of the huge sysinternals collection of useful tools and command line helpers which help to answer or identify common maintenance issues, like

  • does my application close all used handles?
    Process Explorer
  • which resources (filesystem, registry, ...) does my application access?
    Process Monitor
  • capture of debug output (traces using OutputDebugString, DbgPrint) without using a debugger
  • .....
and all utilities are free. the team creating and maintaining those utilities is now part of Microsoft. up to now this has not had any negative effect on the utilities they provide, beside the fact that there is no source code available anymore, which prevents you from seeing how certain information can be / must be retrieved to be on the safe side.

Friday, June 12, 2009

transform pdf to word / excel

online conversion from pdf to word / excel is available. you have to upload your pdf and will receive the converted word / excel file as long as the result is smaller than 12 MB. the result looks good for most use-cases. of course you have to do manual rework depending on what you intend to do with the result, but content, images and layout are converted and accessible for subsequent tasks.

i like that kind of advertising - an easy to use subset of the functionality is available for free and if i like the feature and want more enterprise features i will probably pay the price.

unix tools for windows

useful unix tools (e.g. grep / find) for windows are available here
the advantage compared to cygwin: you can use them right from the windows command line without any additional layer.

makes usage of the command line much easier....

Sunday, June 07, 2009

(re)use of content

(re)use of content is a common topic in almost every domain that has to deal with content. the rationale is obvious and easy: using existing content is faster and cheaper than creating new content.

there are two major requirements to (re)use content:
  • you need a business object the content belongs to
    otherwise it is impossible to identify existing content and determine if this content is worth using
  • you need a well defined information type the content belongs to
    only if you know what type of content you have to create can you identify whether it may already exist
based on those two trivial requirements you can define the rule for when (re)use is possible:

(re)use is possible if the existing content was created for the same business object and for the same information type as the new content you have to create.

to extend this scenario you can derive your content from existing content if either the business object or the information type is also derived from that of the content you intend to use.

deriving a test case from the corresponding use case is obvious and valid as long as the business object both belong to is the same. if a certain engine exists in three variants, each derived from a common building block, the corresponding content is obviously also valid to use and derive from.
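
the rule above can be sketched in a few lines of python. this is only a minimal sketch under assumptions: the classes Node and Content, the function reuse_mode and the single-parent "derived from" chain are hypothetical names and simplifications made up for this example, not taken from any existing system. the case where both dimensions are derived at once is treated as "derive" too, which goes slightly beyond the wording above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """a business object or an information type; parent models 'derived from'"""
    name: str
    parent: Optional["Node"] = None

    def derives_from(self, other: "Node") -> bool:
        # walk up the parent chain; a node counts as derived from itself
        node = self
        while node is not None:
            if node == other:
                return True
            node = node.parent
        return False


@dataclass(frozen=True)
class Content:
    business_object: Node
    information_type: Node


def reuse_mode(existing: Content, needed: Content) -> str:
    """return 'reuse', 'derive' or 'none' for a pair of content items"""
    same_object = existing.business_object == needed.business_object
    same_type = existing.information_type == needed.information_type
    if same_object and same_type:
        return "reuse"
    object_ok = needed.business_object.derives_from(existing.business_object)
    type_ok = needed.information_type.derives_from(existing.information_type)
    if object_ok and type_ok:
        return "derive"
    return "none"


engine = Node("engine")
variant = Node("engine variant A", parent=engine)
use_case = Node("use case")
test_case = Node("test case", parent=use_case)

print(reuse_mode(Content(engine, use_case), Content(engine, test_case)))
```

the important point is that both dimensions have to be known for every piece of content, otherwise the check cannot even be asked.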

as you see, usage of content requires both knowledge of and a connection to the business and to the content itself.
because of this many (re)use scenarios fail in the real world (look into your own domain -- are you satisfied with your content (re)use?)

if you look at the two dimensions that enable usage of content you might understand that in many cases the usage of content between companies might be higher than the usage of content within one company. in many cases competitors deal with similar business objects and information types, while different departments within one company might not.

this means that if the information types are not critical for a certain domain, cross domain usage of content becomes a possibility once the information itself is interchangeable (that's a topic of its own).

reuse of translations is one of the most obvious scenarios for cross company usage. because translation becomes more and more a cost center but is also a business driver, sharing common translation memories between companies working in the same domain is of course thinkable. content already translated for nokia might be used by sony as well, the same is true for BMW and Daimler.

the "Language Data Exchange Portal" shows how this might work. each member provides content and funding and can benefit from the complete pool of data available. the better the content is classified, the better the results for each company......more to come?

Tuesday, June 02, 2009

Google Wave: backbone for real information collaboration

while Microsoft is still working on getting search on existing information done, Google tries to re-invent the creation, usage and collaboration of information with the upcoming "Google Wave".

having a deeper look into the already available information, the most interesting part of the design is that the complete architecture is based on hosted information transformation done by a group of human and automated participants. the overall architecture looks clean and the provided demo looks promising.

this kind of infrastructure has the capability to be used in all kinds of information centric workflows, especially those happening in the "cloud": creation and further development of engineering artifacts, from customer requirements to user stories to design documents and testing artifacts, up to the usage of the same information in the technical docs created for them. such a workflow always requires an information object centric architecture and in addition collaboration feature sets.

we have to see if Wave will succeed, which means whether the developer community contributes and uses the existing extension points. if so, i'm looking forward to seeing what can be done with this promising infrastructure.....

in my point of view the future of information moves away from the document paradigm and ends up in a more message oriented paradigm, where a group of people works on structured / semi-structured information objects (messages / topics) and assembles them in a final stage of each business workflow into a document, web-page, calculation sheet, ..... this means a document is "just one output for one audience" and describes just one usage of the information at a certain time.

in those days search is still essential, but just as one way to navigate through the specific information pool and only if the hits are relevant for further usage (try to search for "Microsoft" and see if the provided hits are relevant enough for you.....)


CMIS resources

a nice summary of background material about CMIS can be found here:
especially the list of available providers and consumers is useful for those who try to further investigate this new standard effort. sample code says more than a thousand words....

Thursday, May 21, 2009

Open Source TMS: GlobalSight

a few days ago i stumbled over GlobalSight, an open source TMS solution formerly known as Ambassador.

the marketplace for TMS products is dominated by a few commercial vendors, most of which don't make big money with those products.


the translation / localization domain requires strong domain knowledge which isn't widespread. especially the combination of IT knowledge and linguistic, translation and localization knowledge is hard to find.

in addition translation / localization today still means to process input files in pretty heterogeneous and in many cases proprietary file formats (e.g. HTML, Word, Framemaker, Quark, Excel, XML, software resources, .....), translate them and create output files using localized content and layout. that is one of the most appalling jobs you can think of as a developer.

in general XML can reduce the requirements for a TMS but the majority of users are still not using XML based content creation and therefore are not able to use XML as data format for localization.

what does this mean for the success of the solution GlobalSight?

to be honest i do not expect a huge developer community to dive into further development of or even contribution to this solution, because of the reasons mentioned above. only if additional companies (like those that are part of the mentioned Steering Committee) provide development resources to this project can this be more than "just another way to herald the end of life of an unsuccessful product"
different views?

Monday, April 27, 2009

cmis: Apache Chemistry

good news on the CMIS front. Apache is incubating a new project called "Chemistry".

project description

Apache Chemistry is a generic Java language implementation of the upcoming OASIS CMIS specification.


Apache Chemistry is an effort to provide a Java (and possibly others, like JavaScript) implementation of an upcoming CMIS specification, consisting of a high-level API for developers wanting to manipulate documents, a low-level SPI close to the CMIS protocol for developers wanting to implement a client or a server, and default implementations for all of the above. Chemistry aims to cover both the AtomPub and SOAP bindings defined by the CMIS specifications.

if this project succeeds there might be an open source CMIS reference implementation which is not coupled to an existing CMS. it therefore has the chance on the one hand to provide a good starting point for vendor specific integration work and on the other hand, as a reference, to make the specification "real" and touchable.


win32 memory management

i believe that one year from now 64bit operating systems will be mainstream in server infrastructure. by the way, currently there are still 32bit windows servers / software out there. therefore it's time to understand the limitations you leave behind using a 64bit OS. so don't miss

EXtensions...xml processing

you are always faced with "missing features" if you are working with existing standards.

two reasons for that:
  1. the feature is specific to your domain or solution
  2. the feature is common but for some reasons (and there might be good ones) didn't make it into the existing standard
for the second case common EXtensions might be the area you are looking for. in the xml domain the following projects try to define "standard extensions". this means extending an existing standard outside the scope of the standard process itself, but without losing the portability of applications using the extensions.

beside EXSLT, which is well-known and has already produced a sufficient amount of output, all other EX communities are brand new and therefore haven't provided any output so far.

watch and contribute......

Thursday, April 23, 2009

Programming is Fun But Shipping is Your Job

a pretty old but always valid post on a common shortcoming of developers.
one of the most critical parts of each software project is to get the last 2 % of the job done. regardless of which development process you choose (waterfall, agile, rup, ...), you always have to keep the team focused until the stakeholders of the project are really happy (and not only the developers creating the assets).

identify content: powerful but tricky

regular expressions and xpath are two approaches to identify a matching subset of content within a given amount of content for further usage. the first one is based on plain text, the second one on xml.

both are powerful but, if you are not using them on a day by day basis, tricky and error prone. error prone means that you have to avoid all edge cases where a given expression shouldn't match but does or, the other way around, where a given expression should match but doesn't.

if you work in the XML related world you often need both approaches, e.g. when using XSLT or XQuery.
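
a small python sketch of both approaches and of one typical edge case each. it uses only the standard library (re and xml.etree.ElementTree, which supports just a limited xpath subset). the sample data and the function names are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET

TEXT = "order 4711 shipped, order 0815 pending, reorder 9999 ignored"

# the \b word boundary keeps "reorder 9999" from matching; without it
# the expression would match more than intended
ORDER_ID = re.compile(r"\border (\d+)")


def order_ids(text):
    return ORDER_ID.findall(text)


XML = """<orders>
  <order id="4711" status="shipped"/>
  <order id="0815" status="pending"/>
  <archive><order id="9999" status="shipped"/></archive>
</orders>"""


def shipped_ids(xml_text, path="./order[@status='shipped']"):
    # "./order" selects direct children only; ".//order" would also
    # pick up the archived order
    root = ET.fromstring(xml_text)
    return [order.get("id") for order in root.findall(path)]


print(order_ids(TEXT))
print(shipped_ids(XML))
print(shipped_ids(XML, ".//order[@status='shipped']"))
```

in both cases a single character ("\b" or one extra "/") decides whether the expression matches too much, which is why a tool that shows the matches on the fly is so helpful.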

there are a few commercial IDEs which help to develop the right expression for the required use case, but there are two tools which i often use, dedicated to helping you deal with either regular expressions or xpath


powerful standalone tool to create and verify xpath expressions. support for xpath 2.0 and most features you ask for in this context. this tool is based on the well-known Saxon for xpath 2.0 and the .net subsystem for xpath 1.0



powerful standalone tool to create and verify regular expressions. with built-in regex analyser and on-the-fly validation, code generation (for .net and vb only), ....


Monday, April 13, 2009

... too long.

if you read

i would say, too long. because i really do not see any statement of goals and expected improvements in the mentioned list.

last statement is at least strange:
If you add up the weeks in brackets you will see that it comes to 26 weeks which is six months. This is purely an estimate and project length really depends on the workload of the Project Team as well as the amount of data involved.
if only those two constraints are responsible for the project length i do not expect any measurable outcome beside "making IT people and consultants happy"....

that's also a good example for "IT centric projects likely to fail"

save money means save money

the following blog post from Mark Woeppel, "Is Your Continuous Improvement Organization a Profit Center?", summarizes one of the major misunderstandings of continuous improvement in many companies. not too much to add, beside that i can agree with the suggestions within the post based on personal experience.

the same statements must be made for any (IT) project claiming to improve existing processes. while working with the stakeholders to identify the scope, the requirements and, more important, the rationale for those, i often hear the statement that a particular requirement / process change / feature improves the process and therefore saves money.

after the project is finished and in operation, the same number of people with at least the same budget work on the "improved process" and the upfront investment is already gone. based on a simple calculation, the company just loses money with such a project (even if the cause - see below - might be different).
that leads to continuous and increasing management distrust in changes labeled as "money savers", and subsequent projects are more likely to be rejected by senior management. there are two typical causes:

  1. the project scope or the requirements simply weren't target oriented. means they do not really help to improve the overall process, they more or less lead to "doing the same in a different way".
    that is a common project failure and can be avoided if project / requirement / system management skills are improved or, more simply, by doing a better job. the project team fails to address the goals.
  2. other process parameters are changed as well. for example the process throughput increases, additional process outputs are created or budget / people are now assigned to other activities within the same department to resolve bottlenecks in those areas, ....
the second one is the more important one, because it means that the change has an impact but the company isn't able to express / see this in terms of cash. means budget is simply moved from one service item to others.

how to avoid this?

divide the individual rationales for a project into measurable and independent topics and for each of them assign a responsible person from senior management. means if a certain set of requirements claims to save money, the budget schedule for the period the result of the project is in operation must be reduced. if on the other hand upcoming new demands must be addressed, those must be put into the budget schedule as dedicated positions.

sounds trivial, but believe me, that does not happen very often in real world situations. the reason for this is simple -- it is pretty hard work and sometimes not easy to quantify the saved money. this not only helps to get the project approved and provides traceability for senior management, it also tries to get to the bottom of the stated effect (which sometimes simply is not right, which leads to the effect described in reason 1).


the different goals and results can be traced and measured, and if a project or a few parts of an improvement project intend to save money, you have to save money, which means you must have a reduced budget if the project succeeds.


saving money must mean saved money at the end. all other "statements" and explanations are not valid, especially in today's business context....

cash is king

"cash is king" is a principle that many companies have to follow today. this simply means all decisions must aim to increase cash flow.

for all activities and departments of a company with no direct impact on the core business process this means that they are forced to immediately reduce their budget to a minimum / essential value.

HR, IT, TechDoc, .... are examples for affected units.

the current situation is not made for mid or long term investments in the mentioned business areas which "only" improve long term operation. what you need is short term benefit without too much investment.

good opportunity for outsourcing, isn't it? i expect a significant number of companies to try to reduce their supporting activities through outsourced services, even if the services don't fit 100% to their specific needs, as long as the missing parts aren't essential for daily operation.

what do you need for service outsourcing:
  1. service function must be fully defined (in technical context this means that the interface must be fully defined)
  2. service availability must be defined
  3. service security must be defined
    =>what happens with the data maintained by a different organization?
  4. service termination must be defined
    =>what happens if the service is moved in-house again or moved to a different service provider?
  5. price must be defined
    =>pay per use / per time period / ?
having this in mind i expect increasing demand for ASP / PaaS / SaaS / PRaaS business models and for companies that can provide such services in certain domains.

provide IT infrastructure (software & hardware) for daily operation for a dedicated amount of money.
=>IT assets based

provide a dedicated platform to implement and host an application for a dedicated amount of money.
=>development asset based

provide a dedicated set of software features for a dedicated amount of money.
=>software application / function based

provide a dedicated Process / Business function incl. the corresponding IT assets for a dedicated amount of money.
=>business process based

the amount of semantics involved increases from top to bottom. therefore the potential value but also the potential risk for the requesting company increases. today's knowledge and IT infrastructure make usage of those business models possible not only for big enterprise companies but also for mid and small size ones.

but beside a few other points you still have to trust in trust if you want to go in this direction.....

Sunday, April 12, 2009

model a DITA compliant model

DITA today is more and more adopted at least in the techdoc domain. the success and adoption rate is based on two major advantages over other existing standards:
  • adaptability
    the DITA data model can be adapted based on defined rules to specific needs and domains. this concept is called specialization
  • modularization
    content creation is not based on the document paradigm but on the module paradigm. authors no longer create documents, they create topics which represent some artifact of the system they describe.
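
to illustrate specialization: in DITA every element carries a class attribute that lists its ancestry as module/element tokens, e.g. "- topic/body concept/conbody " for the body of a concept topic. a processor that does not know the specialized element can fall back to an ancestor it knows. the following python sketch parses such a value. the function names are made up for this example and the check is deliberately minimal, it is not a validation of a model against the DITA specification.

```python
def class_chain(class_attr):
    """split a DITA class attribute value into (module, element) pairs"""
    tokens = class_attr.split()
    # structural specializations start with "-", domain ones with "+"
    if not tokens or tokens[0] not in ("-", "+"):
        raise ValueError("class value must start with '-' or '+'")
    chain = []
    for token in tokens[1:]:
        module, sep, element = token.partition("/")
        if not sep or not module or not element:
            raise ValueError(f"malformed class token: {token!r}")
        chain.append((module, element))
    if not chain:
        raise ValueError("class value has no module/element tokens")
    return chain


def is_specialization_of(class_attr, module, element):
    # an element counts as a specialization of each of its ancestors
    # (and, trivially, of itself)
    return (module, element) in class_chain(class_attr)


conbody = "- topic/body concept/conbody "
print(class_chain(conbody))
print(is_specialization_of(conbody, "topic", "body"))
```

a customized model that gets these values wrong still looks fine in an editor, but generic processing based on the class attribute silently stops working.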
problem for an information architect

since i started working with DITA i reviewed several data models from coworkers, from customers and last but not least created by myself. All of them claimed to be DITA compliant.
Using them with tools that are also DITA compliant often fails for certain features or in some cases completely. there are two reasons for that:

  1. the created models weren't valid according to the released specification
    creating a compliant DITA model (DTD or W3C Schema) requires knowing all rules and requirements the DITA specification provides, without having any tool support that goes beyond the support for creating regular DTDs and W3C Schemas. this means the information architect is left alone with the underlying complexity.

    i do not know exactly, but based on the data models i'm faced with i expect that at least 50% of the customized DITA models out there are not compliant with the DITA spec.

    a few years ago i already posted this here:
  2. the used tools require additional, tool specific configuration or semantics or simply don't implement a feature according to the specification
the second reason is getting better over time and today you find a bunch of tools which are production ready even for enterprise usage.

the first reason is still not resolved.

you might argue, why not use the DITA model out of the box?
  • if you have specific requirements in your business process and you require additional semantics to support those
  • if you have to simplify content creation for the authors and users to get a better and more consistent content outcome.
    note: i consider DITA subsetting / configuration as just another way of the generic concept of specialization
  • if you introduce a new business domain / taxonomy into your content maintenance strategy.
issues if your model doesn't really conform to the specification?
  • interoperability is no longer guaranteed
    this is mainly a problem for the enterprise and in most cases not at the beginning of the usage of DITA
    but once you run into this problem you are forced to fix all content created against your model or adapt the processing chain your content is delivered to
  • process chain does not work
    you add new tools to your environment and certain features don't work. again you can fix all your existing content, the data model and the tools, or adapt the tool with certain workarounds
this week i stumbled over the DITAworks modeling module, which is the first tool I'm aware of that addresses the modeling use-case in the DITA domain.

i have not yet verified to what extent this tool supports the modeling / validation process, how much manual work is still involved and how complete the implementation currently is. because this tool is still beta and brand new i expect much outstanding work, but i hope this tool opens the door to speeding up DITA related development work and, the more important part of the story, improving the quality of DITA related information models.

i know that the amount of money / number of customers to be found in this domain is pretty small and the complexity of the problem isn't small either, therefore i do not expect too many competitors in the near future, but i might be wrong.

time will show....

Tuesday, March 31, 2009

content & service composition: small and simple showcase

if you want to see and learn how easily an information aggregation use case incl. the corresponding presentation can be solved, take a look at "Make dashboards with XQuery".

this sample is all about composition, from a content and a service (functional) point of view.

most of the concepts required in the field of information processing are involved. even if the implementation has drawbacks and limitations in several points, you see how information centric requirements can be solved.

Sunday, March 29, 2009

just another standard to manage content

CMIS is another attempt to provide a vendor independent content exchange api. major ECM vendors like MS, Oracle, IBM and many others have joined the corresponding OASIS TC to release the first version of this new standard.

a good overview of the current state can be found in "CMIS meeting notes".

i like the idea of having a commonly accepted and widely adopted standard to maintain resources and all related concepts.

major advantages
  • decoupling of client implementation dealing with standard related concepts from server implementation
  • usage of several server implementations through single client implementation
  • adaptation of a common and rich infrastructure to specific needs
this leads to the core vision i have in mind talking about such a standard.


the core infrastructure of content management is provided by the regular OS client infrastructure and specific needs are implemented on top of this core layer. a wide range of server side infrastructure can be used and integrated without adding IT complexity.


we already heard about this vision, that's what WebDAV claims to provide....
why not work harder on WebDAV instead of re-inventing the wheel? looking at the world of WebDAV shows the issues in this area:
  • many incomplete implementations out there
  • integration into regular OS infrastructure is half-assed and error prone
  • WebDAV is implemented as an additional API with limited maturity and attention from CMS vendors.
  • the DeltaV extension is only supported by very few and not widely used implementations
  • ....
of course, the WebDAV standard is still not feature complete. especially a search interface and the concept of typed links are not yet provided by the WebDAV standard. but adding these as additional extensions is not a big deal.

the major issue with WebDAV is simply the lack of robust and complete implementations. the main reason: it is easier to define a standard than to fully implement that standard and resolve existing issues within it.

the major members of the CMIS TC are the same as those working on the WebDAV standard. i personally hope that the success and sustainability of implementations will improve. otherwise this is just another standard no one takes care of once it is 80% complete.

i do not compare both approaches here because this is another story, but in terms of simplicity WebDAV is currently in pole position. more to come......

Friday, March 27, 2009

Windows Unlimited?

i often have to deal with application server design and deployments based on 32bit Windows server OS.

if you have to maintain more than one application server on a single host and for some reason need more virtual address space for one application server, you have to deal with the /3GB switch the enterprise server OS provides.

the expected load -> requests to each running application server -> number of threads on each application server must be defined carefully and requires more in-depth OS know-how than I'm willing to have.
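
a rough back-of-the-envelope calculation in python shows why the switch matters for thread counts. on 32bit Windows each process has 4 GB of virtual address space, split by default into 2 GB user and 2 GB kernel space; with /3GB the split becomes 3 GB / 1 GB for applications built as large address aware. the heap size and the 1 MB stack reservation per thread used below are assumptions for illustration, real limits depend on many more factors.

```python
GIB = 2 ** 30
MIB = 2 ** 20
ADDRESS_SPACE = 2 ** 32  # 4 GiB of virtual address space on a 32bit OS


def split(user_gib):
    """return (user bytes, kernel bytes) for a given user mode share"""
    user = user_gib * GIB
    return user, ADDRESS_SPACE - user


def max_threads(user_bytes, other_bytes, stack_bytes):
    """upper bound on threads if only stack reservations limit them"""
    return max(0, (user_bytes - other_bytes) // stack_bytes)


# assumed numbers: 1.5 GiB for heap, code and everything else,
# 1 MiB reserved stack per thread
other = 3 * GIB // 2
stack = 1 * MIB

for label, user_gib in (("default", 2), ("/3GB", 3)):
    user, kernel = split(user_gib)
    print(label, user // GIB, "GiB user,", kernel // GIB, "GiB kernel,",
          max_threads(user, other, stack), "threads at most")
```

the extra user space is taken away from the kernel half, which is exactly where limits like the paged and nonpaged pool come into play.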

you have to understand how the OS deals with system resources and in this context i stumbled over a good blog post
Pushing the Limits of Windows: Paged and Nonpaged Pool
which describes the limits of Windows pretty clearly and understandably even for guys like me ;-)

i have not often read such comprehensive and focused information on this particular topic. thanks Mark for that.....

Wednesday, March 18, 2009

open usage of sequence of data points

a lot of important data out there is simply "time series" data, means a sequence of one or more data points changing over time.

a service to share and use such data is Timetric. currently the number of users and of useful time series is small, but in general such a platform can publish common time series from many different domains.


who takes care that the shared data is correct and therefore valuable to use? all services based on public contribution and usage are faced with the same issue. do you trust the data you see? do you trust wikipedia? in general you should not. you have to double check at least 2 different sources before you use the provided data.

in addition once you double checked your data you have to make sure that the quality of data is guaranteed over time. that is much more difficult.


if the data is mission critical you should not use it before validating it. in case of non static data you have to validate the data each time it changes. this means that you either need more than one data source as a service, not based on the same underlying source, or you have to look for and buy commercial services that take care of the provided data, or you request the service from the organization that owns / collects the data. each of those solutions requires special handling for the particular domain.
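
the "more than one data source" option can be sketched like this in python. it is a minimal sketch under assumptions: both series are given as plain dicts from timestamp to value, the sources are truly independent, and a relative tolerance of 1% is acceptable. the function name and the sample numbers are made up.

```python
def cross_check(series_a, series_b, rel_tol=0.01):
    """compare two time series given as {timestamp: value} dicts.

    returns (agreed, disputed, unverified):
      agreed     - points both sources confirm, with the mean value
      disputed   - points where the sources differ beyond the tolerance
      unverified - timestamps only one source provides
    """
    agreed, disputed, unverified = {}, {}, []
    for ts in sorted(set(series_a) | set(series_b)):
        if ts not in series_a or ts not in series_b:
            unverified.append(ts)
            continue
        a, b = series_a[ts], series_b[ts]
        scale = max(abs(a), abs(b))
        if abs(a - b) <= rel_tol * scale:
            agreed[ts] = (a + b) / 2
        else:
            disputed[ts] = (a, b)
    return agreed, disputed, unverified


source_a = {"2009-01": 100.0, "2009-02": 110.0, "2009-03": 120.0}
source_b = {"2009-01": 100.5, "2009-02": 150.0}

agreed, disputed, unverified = cross_check(source_a, source_b)
print(agreed, disputed, unverified)
```

because the data is not static, such a check has to run every time one of the sources changes, not just once.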


the availability of public data services is promising but i currently do not see an available model to trust in. therefore usage is pretty limited, only for some kind of "outline view"

Sunday, March 15, 2009

data processing workflows

if you are faced with a data processing workflow which requires processing / transforming a huge amount of data in a limited amount of time, this can end up in pretty complex implementations. even if you have enough hardware to do the job you need an infrastructure that makes use of the hardware.

using hardware for a limited time isn't a big issue today. the cloud infrastructure out there (e.g. Amazon ec2) is perfect if you have to process a huge amount of data in a limited amount of time for a limited duration. you are able to scale the usage of the required hardware for the time it is required and just pay for the required duration.

now you also need a ready to use software infrastructure to implement the processing workflow. MapReduce is a software infrastructure for this kind of problem.

Apache hadoop implements MapReduce but it lacks ease of use, which means it is a pretty low level infrastructure, and of course it lacks higher level workflows, which are not defined by MapReduce.

Cascading closes the gap. the MapReduce patterns are applied and used based on "stream processing". it is not too complicated: within one day i was able to create a simple application which converts 20 TB of svg data to jpg, doing some transformation in between, using batik and 20 concurrent hardware nodes.
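
for those who have not seen the MapReduce pattern yet, here is a minimal single process sketch in python. it is not the hadoop or Cascading api and it is not the conversion job described above. all function names and the sample job (total bytes per file extension) are made up for illustration. a real infrastructure runs the map and reduce phases distributed over many nodes.

```python
from collections import defaultdict


def map_phase(records, mapper):
    # each input record may emit any number of (key, value) pairs
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    # group all emitted values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    # collapse the values of each key into one result
    return {key: reducer(key, values) for key, values in groups.items()}


def map_reduce(records, mapper, reducer):
    return reduce_phase(shuffle(map_phase(records, mapper)), reducer)


# hypothetical job: total size per file extension
def by_extension(record):
    name, size = record
    yield name.rsplit(".", 1)[-1].lower(), size


def total(key, values):
    return sum(values)


files = [("a.svg", 10), ("b.SVG", 5), ("c.jpg", 7)]
print(map_reduce(files, by_extension, total))
```

the mapper and the reducer do not share any state, and that is what allows an infrastructure to spread them over as many nodes as you are willing to pay for.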

for some use cases the power of cloud is easy to tell....

eat your own dog food

if you are in a position to hire a consultant you would ask which one to choose. in each business domain the number of consultants is huge.

two things you should consider (after looking into the slides they present):

  • don't hire someone who leaves before the suggestions are implemented. so each project that tries to involve external consultants should make sure that the involved people are not only responsible for providing some kind of "best practices" but also for implementing them in the particular case they are hired for
  • don't hire someone who does not eat his own dog food. means ask your potential consultant how comparable cases are solved by his own company or in his or her own daily business.
    each problem can be traced down to some analogous problem which requires the same approaches as the one you are faced with. if you talk about the daily work and solutions with your consultant before hiring him you get a good feeling whether the external person really knows what he is talking about
it's often not the slides which are important, it's more often the individual experience and knowledge that make the difference between good and excellent help in your particular problem domain.

Sunday, March 01, 2009

Information Dynamics

i stumbled over a dissertation which focuses on "Information, Their Effects and Management in Supply Chains". unfortunately the paper is only available in German. a quotation from the summary:
I therefore suggest focusing SCM initiatives on information processing and information efficiency in order to enhance overall system behaviour and efficiency.

take the time and read the paper, it provides an interesting view on the effect and impact of information in SCM.

ODF and OOXML: interoperability issues

as already mentioned here, using a standard does not guarantee vendor / implementation independence.

the following paper "Lost in Translation: Interoperability Issues for Open Standards -- ODF and OOXML as Examples" shows that complexity is one major show stopper in this area.


OOXML and ODF try to define office documents. this means that content, content structure and layout (and application semantics) must be standardized.

because the complexity of both standards is high, only a limited number of vendors are able to implement 100% coverage of the standard, and even if they try they are not able to prevent errors in the implementation.

lessons learned?

standards should carefully consider the good old "Everything should be made as simple as possible, but no simpler." (see principle). in our context this means that those standards need well-defined, atomic levels of conformance, which make it possible for each vendor to implement a well-defined and complete subset if the complete set is not possible or not useful for a particular application.

Sunday, February 22, 2009

IT centric projects likely to fail

during the last two years many projects around trends like "social networking" / Web 2.0 came to IT departments. the IT guys read stories from well known vendors like Microsoft saying that collaboration and information sharing are among the most important trends for the next years.

they take this input, run to their sponsors (business departments) and ask them if they have trouble maintaining information in their daily business, and surprise, surprise, they receive a "yes, we have problems".


most of today's problems are caused by an insufficient information lifecycle. the core business assets are more or less maintained, but the information supporting and guiding them is still not really under control. project teams are not able to share a common view of project / work related information, and information gets lost from one human interface to the next. supporting documents are not findable even if they exist somewhere.....

IT tries to fix this

they hire a few consultants and train them in installing the product of interest, which provides wiki, blog, chat and document management functionality. if they are smart enough they ask the business department for their requirements, and then they try to set up a pilot using their product of choice.

great, everything works. after a few days the departments start to use the good new world....the solution gets adapted and released.....

one year later, looking back, and surprise, surprise: the problems are still the same, just in another layout.

this story happened several times during the last year. look at most of the MS Sharepoint related projects out there -- most of them are a great experience for the IT / consultants and developers but of little or zero benefit for the business.


it's simple and everybody knows the answer. IT systems don't solve a problem and, on the other hand, are not the cause of the problem. the problem is caused and must be solved within the business process itself. IT systems can only provide sufficient support for certain steps in the process if the process and the connected processes are healthy themselves.


if business people claim they are always faced with outdated information, the causes are manifold. there might be no process to update the corresponding information at a certain step in the process. there might be no time to update the known information source, there might be no information about which information sources are relevant for other people in the team, there might be redundant information sources and the wrong one is used.......

sounds trivial and obvious but....
why are we still faced with IT centric projects?

  • IT people are happy to develop new solutions
  • business people are happy to find someone guilty for existing problems.
  • IT people don't like to identify the real cause; they are mainly focused on the solution.
  • business people cannot imagine what a certain IT solution means for their daily business.
  • IT people don't understand the CAUSE, they only understand the PROBLEM itself (expressed in "requirements")

and therefore both parties believe they solve an existing problem, but they simply implement a solution that decants the problem into another IT solution.


never try to solve an existing problem related to information management by introducing an IT solution first. make the business describe and solve their problems using the existing tool chain and try to identify the CAUSE of the problem. once this is done and works successfully, the areas an IT solution can support are easy to identify, and it is much easier to identify which IT solution is the right one to choose....


i don't say that the mentioned product MS Sharepoint is good or bad, but it is a product IT tends to play with and therefore a good example many working people are faced with....

Saturday, February 21, 2009

classified as INTJ

stumbled over Typealyzer which tries to determine the Myers Briggs Type Indicator. i immediately tried it out using my own blog

voilà: this blog is classified as INTJ

does this fit? well, it at least fits the content of my blog.

how does it work?

they use a free engine for text classification based on content similarity. they trained their classifier with a lot of reference material created by persons known to be classified as one of the different types.
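just to illustrate what "classification based on content similarity" can mean in principle, here is a toy sketch using bag-of-words vectors and cosine similarity. this is an assumption about the general technique, not a description of the engine Typealyzer uses; the reference texts and the helper names (`vectorize`, `cosine_similarity`, `classify`) are invented for this example.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a text into a bag-of-words vector (word -> count)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, references):
    """Return the label whose reference text is most similar to `text`."""
    vector = vectorize(text)
    return max(
        references,
        key=lambda label: cosine_similarity(vector, vectorize(references[label])),
    )


if __name__ == "__main__":
    # hypothetical training material, one reference text per type
    references = {
        "INTJ": "plan system architecture strategy analysis",
        "ESFP": "party friends fun music dance",
    }
    print(classify("my blog about system architecture and analysis", references))
```

a real classifier would of course be trained on far more reference material per type and would weight terms instead of just counting them.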

identifying psychological preferences based on blog entries is another great experiment in using available information to draw conclusions.

Monday, February 16, 2009

maps in svg

if you search for maps in svg you can have a look at blank maps from wikipedia:

svg is pretty smart if you require automatic creation of dynamic content like charts and visual reports based on dynamic data.
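as a small illustration of generating such content from data, the following sketch builds a bar chart as an svg document using only the Python standard library. the function name `bar_chart`, its parameters and the fill colour are assumptions made for this example; a real report would add axes, labels and styling.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def bar_chart(values, bar_width=30, gap=10, height=100):
    """Return an svg document (as a string) with one bar per value."""
    # serialize with a default namespace instead of an "ns0:" prefix
    ET.register_namespace("", SVG_NS)
    peak = max(values, default=0) or 1  # avoid division by zero
    width = len(values) * (bar_width + gap)
    svg = ET.Element(
        f"{{{SVG_NS}}}svg", {"width": str(width), "height": str(height)}
    )
    for index, value in enumerate(values):
        bar_height = round(height * value / peak)
        ET.SubElement(
            svg,
            f"{{{SVG_NS}}}rect",
            {
                "x": str(index * (bar_width + gap)),
                "y": str(height - bar_height),  # svg y axis points down
                "width": str(bar_width),
                "height": str(bar_height),
                "fill": "steelblue",
            },
        )
    return ET.tostring(svg, encoding="unicode")


if __name__ == "__main__":
    print(bar_chart([3, 7, 5]))
```

because the result is plain xml, the same data can be re-rendered whenever it changes, and the svg can then be passed on to a converter if another format is needed.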

by the way, the svg standard still has many limitations in terms of interoperability and tool support. but in case svg cannot be published as-is, Apache Batik is there to transform the svg source into the format you require (pdf, eps, png, tif).

Sunday, February 15, 2009

RDF support using yahoo's BOSS

yahoo added rdf/microformat support to their public search api called BOSS.
to understand how it works read

this makes semantic sources, if available, usable in searches, and therefore BOSS might be a real alternative to what the bg (big google) provides......

change is today's baseline

stumbled over the following short video sequence which shows the pace of change

hold on and pay attention. thanks to

for pointing me to this link.

open source search engine: apache lucene & solr

Apache Lucene is one of the most interesting search APIs in the open source marketplace. it provides a powerful and pluggable interface to add a full text feature set to many java based applications.
but Lucene is "only" an api, not an application.

Apache Solr, on the other hand, is an enterprise search server based on Apache Lucene. the feature set behind that solution is worth a look: Solr introduces features some commercial systems are still missing. for example, if you use the standard enterprise search provided by MS within their Sharepoint product, you wonder how few features are available there.....(maybe adding FAST to the next generation of Sharepoint will change that, but that's another story).

for a few months now, professional services for the mentioned open source products have been available from Lucid Imagination. support and integration work for open source products makes them more useful for the enterprise.

if you have to provide advanced "full text" features, double check the open source community before you invest in commercial variations....

Sunday, February 08, 2009

Generate DITA Java API reference documentation using DITADoclet and DITA API specialization

one example that shows a step towards closing the gap between the developers who create the artifacts and the outgoing, enriched documentation created from their source, without the need for additional redundancy.
it is not too far away from the approach introduced by javadoc / doxygen, except that the data model behind this approach has much more value for integrating additional information than the classical approach provides.

not everything is done yet by the "out-of-the-box" solution provided here:
but you can imagine the possibilities of this kind of approach:
  • integration of additional information provided by marketing or tech-doc groups or service groups without any media break in information usage
  • usage of other deployment processes for api documentation available within the company's infrastructure
  • .....

improve your full text search experience

Cloudlet, a free firefox extension, improves your google search experience and provides "cloud" based access to additional keywords which might help you to access the relevant information faster. there is no magic behind that (and the results are not perfect in all cases), it just uses the information provided by google itself.....full text search at its best.

link: Project Euler

this project
Project Euler is a series of challenging mathematical/computer programming problems that will require more than just mathematical insights to solve. Although mathematics will help you arrive at elegant and efficient methods, the use of a computer and programming skills will be required to solve most problems.

The motivation for starting Project Euler, and its continuation, is to provide a platform for the inquiring mind to delve into unfamiliar areas and learn new concepts in a fun and recreational context.

has been out there for a while, but i just stumbled over it. check it out.....

Sunday, January 25, 2009

Rocks Into Gold - Helping Programmers THRIVE through the Credit Crunch

a summary of "Rocks Into Gold" by Clarke Ching can be downloaded at

just read.....

Pipelines and SOA -> pipelines are more than that

the new book "Software Pipelines and SOA" shows the advantages of the "software pipeline" approach from a very technical and partial point of view.

the pattern itself has many more benefits, and performance scalability is only one of them (and in many cases not the most important one). the principle of processing a flow of information in atomic, semantically closed steps is a pattern which unix itself was built on.
if you have ever seen how easily even complex processes can be created from pretty simple and atomic executables, you can imagine what pipelining means in terms of software architecture (see also

unix (and nowadays many other similar infrastructures) uses an abstract stream of data passed through the pipeline, and therefore the flow of information carries little semantics; it is only an abstract stream of data. the usage of pipelines can be improved if more common semantics are part of the overall pipeline, meaning each step / operation has more information about what is expected to flow through than just data.
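the unix idea of chaining small, atomic steps can be sketched in a few lines of Python. the steps below are loosely modelled on `grep`, `cut` and `sort -u`, but they are simplified stand-ins written for this example, not reimplementations of the real tools; `pipeline` is a hypothetical helper that connects them.

```python
from functools import reduce


def grep(pattern):
    """Step that keeps only the lines containing `pattern`."""
    def step(lines):
        return (line for line in lines if pattern in line)
    return step


def cut(field, sep=" "):
    """Step that extracts one separator-delimited field per line."""
    def step(lines):
        return (line.split(sep)[field] for line in lines)
    return step


def sort_unique():
    """Step that sorts the stream and drops duplicates."""
    def step(lines):
        return iter(sorted(set(lines)))
    return step


def pipeline(*steps):
    """Compose steps so that each one consumes the previous one's output."""
    def run(lines):
        return reduce(lambda stream, step: step(stream), steps, lines)
    return run


if __name__ == "__main__":
    log = ["GET /a 200", "GET /b 404", "POST /a 200", "GET /a 200"]
    requested = pipeline(grep("GET"), cut(1), sort_unique())
    print(list(requested(log)))
```

each step knows nothing about its neighbours; it only agrees on "a stream of lines", which is exactly the weak, abstract contract described above.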

XML pipelining is intended to use XML as the information flow. this means that data processing / transformation needs fewer basic byte stream operations. processing can be done with declarative languages like xpath, xquery, xslt, ...., which again reduces the complexity of information access and transformation.

xml itself can express an endless amount of user data using different data models. a pipeline defined for a subset of data models can again reduce the complexity and therefore improve the benefit of a particular pipeline infrastructure.
you can imagine pipelines dedicated to transforming content created against a DITA data model into an endless number of distribution formats. the semantics of a particular pipeline step, which can be used in several pipes, are much richer than on the "general purpose XML" level.

if we look at the second term, "SOA", in the mentioned book title, we have to distinguish:
  1. pipelining to orchestrate dedicated services (macro pipelining)
    because each business process can be expressed using the pipeline paradigm, it is suggested to implement SOA orchestration using the pipeline pattern.

    therefore you have to define the sequence / control flow of services and the corresponding message transformation.
    languages like BPEL provide a model to express this kind of pipeline.

    this layer often requires persistence of pipeline state because execution of such processes can take between hours and years.
    this layer often requires human steps, meaning that the complete pipeline cannot be executed by a machine without human interaction. languages like BPEL4People are extensions to cover this standard requirement.

    there are many frameworks out there trying to provide an easy-to-start infrastructure. by the way, the effort and complexity of current implementations must still not be underestimated.
  2. pipelining to solve one or more dedicated business steps (micro pipelining)
    within each business process a given amount of data conforming to specification A must be transformed into data conforming to specification B.
    e.g. extracting order data from an ERP system and adding sums for particular product groups before the result is rendered to HTML for later display to the person in charge.

    those operations can of course also be defined as a sequence of steps in which the input data is transformed into output data. those definitions are mainly derived from business rules.

    this layer does not require human interaction or persistence and therefore can be implemented on fully automated frameworks. using XML as data backbone results in "xml pipelining", which combines most of the advantages required for "micro pipelining".

    languages like xproc, xpl, .... and corresponding implementations can be used in this area.

    in general a micro pipeline transforms one input of a macro pipeline step into the corresponding output(s)
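the order example from the micro pipelining item can be sketched as three small xml-in / xml-out steps. a real setup would probably express this declaratively in xproc or xslt; the Python below is only a stand-in to show the shape, and the element and attribute names (`orders`, `order`, `group`, `amount`, `summary`) as well as the sample data are invented for this illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# hypothetical ERP export, conforming to "specification A"
ORDERS_XML = """
<orders>
  <order group="tools" amount="10.50"/>
  <order group="tools" amount="4.50"/>
  <order group="paint" amount="7.00"/>
</orders>
"""


def extract(xml_text):
    """Step 1: parse the order data into an element tree."""
    return ET.fromstring(xml_text)


def sum_by_group(orders):
    """Step 2: add up the order amounts per product group."""
    totals = defaultdict(float)
    for order in orders.iter("order"):
        totals[order.get("group")] += float(order.get("amount"))
    summary = ET.Element("summary")
    for group in sorted(totals):
        ET.SubElement(summary, "group", name=group, total=f"{totals[group]:.2f}")
    return summary


def render_html(summary):
    """Step 3: render the summary as an HTML table ("specification B")."""
    table = ET.Element("table")
    for group in summary.iter("group"):
        row = ET.SubElement(table, "tr")
        ET.SubElement(row, "td").text = group.get("name")
        ET.SubElement(row, "td").text = group.get("total")
    return ET.tostring(table, encoding="unicode")


def micro_pipeline(xml_text):
    """Run the three steps in sequence, each consuming the previous output."""
    return render_html(sum_by_group(extract(xml_text)))


if __name__ == "__main__":
    print(micro_pipeline(ORDERS_XML))
```

every step takes and returns xml, so a step like `sum_by_group` could be reused in another pipe that renders to a different output format.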


pipelining is one of the most powerful paradigms we have for today's common IT problems. but this pattern is neither new nor magic; it's more a "back to the roots....".

Webinar: Building RESTful Services with XQuery and XRX

O'Reilly Webinar "Building RESTful Services with XQuery and XRX" happens on 28.01.09 (10:00 a.m. PST).
if you want to know how the mentioned technologies can work together.....

Sunday, January 18, 2009

SOA Patterns

a site to share / read / contribute SOA design patterns. whenever you are involved in SOA implementation projects it is worth taking a short look at:

Wednesday, January 14, 2009

RIA for information deployment

during the last couple of years the requirements for (software) applications have been changing in many ways:
  • from application to solution
    => customer is able to configure a solution based on provided services and corresponding orchestration and configuration
  • from static deployment to dynamic deployment
    => customer is able to update a purchased component via online connection. new solution features can be added, and the configuration can be changed on demand.
  • from function oriented usage to process oriented usage
    => the application functions are embedded in business tasks reflecting the business processes of different user groups. different user groups are therefore faced with different application behavior.
not all of the mentioned "trends" are incorporated in each application today, but more and more companies provide a subset of the mentioned approaches in their products / solutions. if products change into solutions that can be adapted to achieve the most value for one particular customer or customer domain, the corresponding information (online help, user documentation, service information, faq, ....) must follow this approach as well.

this in particular means that the information must be designed to provide
  • size on demand
    each information product can be configured to fit one particular product installation. product configuration can change over time
  • update on demand
    information updates can be provided as fast as possible using "online channels". on the other hand content must be available without any "online channel" available.
  • workflow related content mapping
    information must not only map to a particular function of the product, as already done with e.g. context sensitive online help. the information must be mapped to and aligned with the workflow / business process in which the customer intends to use the product
if you are faced with such products and you have to contribute any kind of information, you know that these changes affect the way you have to create / maintain and deploy the corresponding information products.

if we look into the information deployment process (other parts of the information lifecycle will be covered in subsequent blog posts), one of the most interesting answers to this question is the usage of RIA frameworks for that purpose.

the most promising application for information deployment is "Adobe AIR".
it offers more than the other RIA platforms can provide today, but that does not mean that they will not be able to serve as a deployment platform for information products in the future (e.g. "Google Gears" has similar goals to Adobe AIR)

the next step would be a gap analysis of which features are missing in Adobe AIR and which business goals are therefore not fulfilled by the currently available platform. i will do this during the next few weeks...see what the results are.

Tuesday, January 13, 2009

just using xml provides Interoperability?

"The Anatomy of Interoperability" is one of the best articles summarizing the issues behind the well known and often promised term "interoperability".

one domain often faced with this term is the world of xml and related "standards". lots of them are out there, some of them really stable and useful and even interoperable in themselves (e.g. xml 1.0, xslt 1.0, xpath 1.0).

by the way, just using xml does not guarantee interoperability for your data. this is only achieved if application behavior is addressed by a related standard. xml related standards that try to achieve this (e.g. svg) often fail or are difficult to use because they are missing essential features the specific user domain requires, and the corresponding tool vendors / application providers add them in a tool specific way. or the standard is too complex to implement a 100% compliant application (e.g. xlink).

DITA, for example, a new OASIS standard / information architecture to maintain mainly techdoc related information, faces those issues more and more. this standard has customization in mind, meaning specialization to specific needs is part of the design, but there are of course still limitations and there are good reasons for those limitations in general.

the initial standard was not feature complete (meaning essential requirements were missing from the user's point of view), and therefore vendors / consultants / end users add specific non-compliant features for their specific needs, which often results in missing the goal of interoperability.

why is DITA still successful?

to understand this you have two things to consider:
  • keep in mind that just using xml does not achieve your interoperability goals without additional effort
  • keep in mind that fully interoperable data is not always what you need. regular business cases often work well with an interoperable subset or a predefined transformation on demand.
what makes xml worth looking at is the ability to access / transform the tagged data without requiring a vendor specific api (although you often need vendor specific know-how to understand the application semantics), and with a foreseeable effort.

and that is the key feature if you think about organization specific information models.

Sunday, January 11, 2009

additional semantic for existing information sources

how information enrichment works in the real world is shown by two public use-cases based on different technologies:
  • freebase
    "Freebase is an open, shared database that contains structured information on millions of topics in hundreds of categories. This information is compiled from open datasets like Wikipedia, MusicBrainz, the Securities and Exchange Commission, and the CIA World Fact Book, as well as contributions from our user community."

    this means that applications like freebase use already existing information and try to add semantics to it by combining and extracting information and context or, in this case, by letting users add semantics without modifying the source of the information.

    the tool thinkbase uses freebase to provide a visual graph of information and the corresponding link dependencies.
  • MarkMail uses an xml database (Mark Logic) as backbone and builds the application on XQuery only

    the semantics come from information aggregation and combination. in this application no additional user interaction is possible
both examples are very useful and show real world applications which you might transfer into your own information landscape.....

XBRL: a language for the electronic communication of business and financial data

good overview can be found here:

Saturday, January 10, 2009

XRX gets more attention

XRX shows that more and more information is represented in xml today. that trend will continue because more and more processes are seen today as what they always were: "information driven". more and more traditional "unstructured" formats are now represented in xml, and more and more business value can be extracted from those formats (OOXML, Open Document, ....).

on the other hand more and more companies start to create certain types of information (user documentation, online help, service information) using a semantically richer information architecture as provided by dita.

that opens up the way for databases with a native xml read / maintain and search feature set. they are able to provide additional value for already existing information that was created without knowledge of its future use.

a good summary of technologies in this area is provided by Kurt Cagle
"Analysis 2009: XForms and XML-enabled clients gain traction with XQuery databases"

Wiki: solves collaboration & information sharing?

i often hear statements that using a wiki platform will solve our company's problems in collaboration and information sharing.
based on my personal experience most of the wiki projects seen in reality fail silently, meaning they start with more or less enthusiasm but end up as either
  • content silos with outdated, hard to find information chunks
  • an unused part of the company's intranet / IT infrastructure
  • a platform driven by only a handful of contributors and users
a recently published article shows the reasons and rationale for that behavior.

by the way, there are wiki projects out there (internet -> wikipedia, intranet) which are successful.

what makes them successful?

from my personal point of view, each successful "information process" requires at least
  • definition of a common information lifecycle
    - who has to create which kind of information?
    - which criteria must be fulfilled to define an information object as usable?
    - which kind of subject matter expert must be involved for which kind of information?
  • and a common information taxonomy
    - what kind of information must be maintained
    - what kind of common classification do we use
    - best practices for structuring the information
  • and people who create, maintain and use the information
    - training is required
    - advantages and usage of information must be part of a common understanding => people must see a personal benefit in using and maintaining the information
depending on the spirit and purpose of the wiki within an organisation those guidelines must be more or less detailed, but to some extent they must be available and somehow trained / reviewed.

the most successful wiki project, Wikipedia, provides the mentioned guidelines, all in an open and collaborative way (

one thing that does not work is to set up a wiki platform and post a link to all potential users without any additional hard work.

always remember: providing information is nothing more and nothing less than hard work. the more value a piece of information must provide, the more hard work is required to create it.

Tuesday, January 06, 2009

directory of services in the cloud

a directory of available APIs in the cloud is available here:

the list shows two things
  1. there are a lot of services out there, many of which can be used free of charge


  2. the stability of usage is a huge problem
    - a few listed services have moved or been removed completely
    - a few service definitions change often without providing sufficient version management

Buzzwording continues?

the IT industry is very innovative in creating and consuming buzzwords. from year to year at least one new trend along with one or more buzzwords is created.

the main reason for that success is the corresponding visibility and, based on that, the opportunity to get budget. the main characteristic of such terms is that there is no formal definition of what the essence of such a term really is, but on the other hand everybody seems to have a clear and complete understanding of the term / buzzword.

the second characteristic of such terms is that a common trend is associated with them.

and last but not least the life cycles of such trends are pretty similar: approx. 1/2 year until everybody is aware of it (through publications, blog posts, articles), 1 year of highest awareness incl. associated investments, and at the end the trend is replaced by the next one.

that looks pretty similar to the fashion industry, and from my point of view there are not too many structural differences between a new fashion trend and an IT trend.

just a small list of buzzwords from the last few years:

why is it possible to make money with those trends? because all of them promise to solve real existing problems in real industry. if we take the trends mentioned above, their main focus is to

  • provide consistent access to required information at the right time at the right place
  • get rid of increasing IT complexity
  • get rid of proprietary vendor driven information silos
  • reduce the total cost of ownership for hosting the available information within a company
  • improve collaboration between different business groups
  • improve adaptability to changing business requirements
  • ....
if we take the essence of the mentioned buzzwords and collect them again, then the following is really promising
  • usage of dedicated and well defined services for business automation
  • pay for usage of a defined service level instead of paying for hardware / software and corresponding maintenance (what really matters is the service that automates a certain business step)
  • architecture that adapts fast and in a controlled way to changes of business requirements (changed SLA) and not to changed IT requirements
  • .....
the trends above of course drive the creation of standards, software and services that make those requirements easier to fulfill, but it is still hard work and more than just using those nice buzzwords.....

Saturday, January 03, 2009

xml processing in TecDoc industry

based on my previous talk at the German tekom conference, the main rationale and corresponding issues behind xml / xml processing are laid out here. the text was originally written in German and is given below in English translation. slides are available here.

Why should one deal with pipeline languages, in particular XML based pipeline languages, in the field of technical documentation?

Two theses as justification

Thesis 1 – value of information

The value of information units increases with the number of processes applied to them.

If a company creates and delivers printed instructions for use in only one language for product groups that differ strongly from each other, the information they contain is relatively easy to create and manage, but the value of the information for the company is very low. The importance and value of the information increase with every additional user of the information (additional online help, language variants, product variants, use of the information in the product interface, ....).

To maximize the value, the number of uses of a piece of information must therefore be maximized within the constraints of the specific use case. Every use of information is based on establishing a process for using it (creating an instructional text in German, reusing dedicated information modules of one language, creating a variant within an existing information module, publishing an online help, ....). Since every process increases the complexity of the overall process, the effort across the whole information lifecycle rises with every additional process, i.e. with every additional use of the information.

Thesis 2 – processes on information

Processes on information units are for the most part identical within a company and even across companies. Conversely, this means that the technical documentation industry is occupied with the effects of "marginal" differences. The differences lie essentially in the different information sources (type, storage, format, ....) and in the information products to be delivered (company specific style guides, formats to be delivered, ....).

The multitude of individual and specific processes is largely due to the lack of decomposition of the processes and the lack of overarching standardization of process components.

The information components required for each customer document must be identified and provided based on variable input parameters and finally assembled. The customer document is then enriched, i.e. it receives one or more indexes with defined requirements, a glossary, a TOC, etc. Finally it is converted to HTML, PDF or other formats. A further breakdown of these sub-steps leads, for each of them, to a large share of identical steps and a small number of specific steps.

Key to success

To maximize the value of one's information sustainably, the processes must be consistently decomposed into their atomic components, making maximum use of existing process components (and the knowledge underlying them). In this way the effort and complexity of using information units can be kept low in relation to the value.

SMILA (SeMantic Information Logistics Architecture)

"SMILA (SeMantic Information Logistics Architecture) is an extensible framework
for building search solutions to access unstructured information in the enterprise.
Besides providing essential infrastructure components and services, SMILA also delivers
ready-to-use add-on components, like connectors to most relevant data sources."

initiated by the German company empolis, this project seems promising in solving a common problem when dealing with today's information overflow:

  • identification of and access to information relevant for a given business task / process
  • integration of "unstructured" information into the corresponding business process
the standards used are of course complex and not really in common use in many organizations right now, but that might change in the mid term....