Monday, April 27, 2009

cmis: Apache Chemistry

good news on the CMIS front: Apache is incubating a new project called "Chemistry".

project description

Apache Chemistry is a generic Java language implementation of the upcoming OASIS CMIS specification.


Apache Chemistry is an effort to provide a Java (and possibly others, like JavaScript) implementation of an upcoming CMIS specification, consisting of a high-level API for developers wanting to manipulate documents, a low-level SPI close to the CMIS protocol for developers wanting to implement a client or a server, and default implementations for all of the above. Chemistry aims to cover both the AtomPub and SOAP bindings defined by the CMIS specifications.

if this project succeeds, there will be an open source CMIS reference implementation that is not coupled to an existing CMS. on the one hand, it could be a good starting point for vendor-specific integration work; on the other hand, a reference implementation makes the specification "real" and tangible.

see also:

win32 memory management

i believe that one year from now 64-bit operating systems will be mainstream in server infrastructure. still, there are plenty of 32-bit windows servers and applications out there today, so it's time to understand the limitations you leave behind when moving to a 64-bit OS. so don't miss
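
to put rough numbers on those limitations, here is a small sketch of the address space arithmetic. it assumes the well-known 32-bit windows defaults (4 GiB of virtual address space per process, split 2 GiB user / 2 GiB kernel, or 3 GiB user with the /3GB boot option for large-address-aware programs); the 64-bit figure is the theoretical maximum only, and real 64-bit operating systems expose far less. the names are made up for illustration.

```python
import ctypes

GIB = 2 ** 30  # bytes per gibibyte


def address_space_bytes(pointer_bits: int) -> int:
    """Theoretical number of bytes addressable with a pointer of the given width."""
    return 2 ** pointer_bits


# 32-bit Windows: 4 GiB of virtual address space per process, split by default
# into 2 GiB for user mode and 2 GiB reserved for the kernel.
WIN32_TOTAL = address_space_bytes(32)
WIN32_USER_DEFAULT = WIN32_TOTAL // 2
# With the /3GB boot option, large-address-aware programs get 3 GiB of user space.
WIN32_USER_3GB = 3 * GIB

# Pointer width of the Python interpreter that runs this script (32 or 64).
interpreter_bits = ctypes.sizeof(ctypes.c_void_p) * 8

print(f"32-bit address space     : {WIN32_TOTAL // GIB} GiB")
print(f"default user-mode portion: {WIN32_USER_DEFAULT // GIB} GiB")
print(f"user-mode with /3GB      : {WIN32_USER_3GB // GIB} GiB")
print(f"64-bit theoretical space : {address_space_bytes(64) // GIB} GiB")
print(f"this interpreter is {interpreter_bits}-bit")
```

a single process on a default 32-bit server never sees more than 2 GiB of user address space, no matter how much RAM is installed.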

EXtensions...xml processing

you are always faced with "missing features" when you work with existing standards.

there are two reasons for that:
  1. the feature is specific to your domain or solution
  2. the feature is common but, for some reason (and there might be good ones), didn't make it into the existing standard
for the second case, common EXtensions might be the area you are looking for. in the xml domain, the following projects try to define "standard extensions": they extend an existing standard outside the standardization process itself, but without losing the portability of applications that use the extensions.

besides EXSLT, which is well known and has already produced a substantial amount of output, all other EX communities are brand new and therefore haven't produced any output so far.

watch and contribute......

Thursday, April 23, 2009

Programming is Fun But Shipping is Your Job

a pretty old but always valid post on a common shortcoming of developers.
one of the most critical parts of every software project is getting the last 2% of the job done. regardless of which development process you choose (waterfall, agile, rup, ...), you always have to keep the team focused until the stakeholders of the project are really happy (and not only the developers creating the assets).

identify content: powerful but tricky

regular expressions and xpath are two approaches to identifying a matching subset within a given body of content for further use. the first operates on plain text, the second on xml.

both are powerful, but if you don't use them on a day-to-day basis, they are tricky and error-prone. error-prone means you have to handle all the edge cases: a given expression matches where it shouldn't, or, the other way around, it should match but doesn't.
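
a small illustration of both kinds of edge case, sketched with python's standard `re` module and an ISO-style date pattern chosen only as an example: the naive pattern matches impossible dates and dates buried inside longer digit runs; the stricter one restricts month and day ranges and adds word boundaries. even the strict version still accepts 2009-02-31, which shows how quickly such expressions need a second check.

```python
import re

# Naive pattern for ISO-style dates such as "2009-04-27".
NAIVE_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")

# Stricter sketch: month 01-12, day 01-31, and word boundaries so the pattern
# cannot match inside a longer run of digits. It still accepts impossible
# dates like 2009-02-31; a calendar check would be needed for that.
STRICT_DATE = re.compile(r"\b\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])\b")

samples = [
    "posted 2009-04-27",  # should match
    "build 2009-13-45",   # should not match: month 13, day 45
    "id 12009-04-271",    # should not match: date buried in longer digits
]

for text in samples:
    naive = bool(NAIVE_DATE.search(text))
    strict = bool(STRICT_DATE.search(text))
    print(f"{text!r:24} naive={naive!s:5} strict={strict}")
```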

if you work in the XML world, you often need both approaches, e.g. when using XSLT or XQuery.
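
as a rough sketch of the combination: in XSLT 2.0 or XQuery you would typically put a regex inside an xpath predicate, e.g. `//topic/version[matches(., '^\d+\.\d+\.\d+$')]`. the python version below splits the same idea into two steps using only the standard library (ElementTree supports just a limited xpath subset, so the regex filter runs in python). the sample document and element names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

DOC = """
<topics>
  <topic id="t1"><title>Install on Windows</title><version>2.1.0</version></topic>
  <topic id="t2"><title>Install on Linux</title><version>2.1-beta</version></topic>
  <topic id="t3"><title>Release notes</title><version>3.0.2</version></topic>
</topics>
"""

# A plain "major.minor.patch" version string, nothing else.
STABLE_VERSION = re.compile(r"\d+\.\d+\.\d+")

root = ET.fromstring(DOC)

# Step 1 (xpath): structurally select every topic's version element.
versions = root.findall(".//topic/version")

# Step 2 (regex): keep only versions whose whole text matches the pattern.
stable = [v.text for v in versions if STABLE_VERSION.fullmatch(v.text or "")]
print(stable)
```

the xpath step knows the structure but not the text format; the regex step knows the text format but not the structure. each one alone would either over- or under-match.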

there are a few commercial IDEs that help you develop the right expression for a given use case, but there are two dedicated tools i use often, one for regular expressions and one for xpath:


a powerful standalone tool to create and verify xpath expressions, with support for xpath 2.0 and most features you would ask for in this context. it is based on the well-known Saxon engine for xpath 2.0 and on the .net subsystem for xpath 1.0.



a powerful standalone tool to create and verify regular expressions, with a built-in regex analyzer, on-the-fly validation, code generation (for .net and vb only), ...


Monday, April 13, 2009

... too long.

if you read

i would say: too long, because i really cannot find any statement of goals or expected improvements in the mentioned list.

the last statement is strange, to say the least:
If you add up the weeks in brackets you will see that it comes to 26 weeks which is six months. This is purely an estimate and project length really depends on the workload of the Project Team as well as the amount of data involved.
if only those two constraints determine the project length, i do not expect any measurable outcome besides "making IT people and consultants happy"....

that's also a good example of "IT centric projects likely to fail"

save money means save money

the following blog post from Mark Woeppel, "Is Your Continuous Improvement Organization a Profit Center?", summarizes one of the major misunderstandings of continuous improvement in many companies. not much to add, except that, based on personal experience, i agree with the suggestions in the post.

the same must be said for any (IT) project that claims to improve existing processes. while working with the stakeholders to identify the scope, the requirements and, more importantly, the rationale behind them, i often hear that a particular requirement / process change / feature improves the process and therefore saves money.

after the project is finished and in operation, the same number of people with at least the same budget are working on the "improved process", and the upfront investment is already gone. by simple calculation, the company has just lost money with such a project (even if the cause, see below, might be different).
that leads to continuous and growing management distrust of changes labeled as "money savers", and subsequent projects are more likely to be rejected by senior management. there are two typical causes:

  1. the project scope or requirements simply weren't goal-oriented, meaning they did not really improve the overall process but more or less led to "doing the same thing in a different way".
    that is a common project failure and can be avoided by improving project / requirements / system management skills, or more simply by doing a better job. the project team failed to address the goals.
  2. other process parameters changed as well. for example, process throughput increased, additional process outputs are created, or budget / people are now assigned to other activities within the same department to resolve bottlenecks in those areas, ....
the second one is the more important one, because it means the change has an impact but the company isn't able to express or see it in terms of cash: budget is simply moved from one service item to another.

how to avoid this?

divide the individual rationales for a project into measurable and independent topics and assign each of them a responsible person from senior management. if a certain set of requirements claims to save money, the budget schedule for the period in which the project result is in operation must be reduced accordingly. if, on the other hand, new demands must be addressed, they must be put into the budget schedule as dedicated positions.

sounds trivial, but believe me, it does not happen very often in real-world situations. the reason is simple: it is pretty hard work, and it is sometimes not easy to define the money saved. this approach not only helps to get a project agreed and provides traceability for senior management, it also tries to get to the bottom of the stated effect (which sometimes is simply not right, leading to the effect described in reason 1).


the different goals and results can then be traced and measured, and if a project, or some parts of an improvement project, intends to save money, you have to save money: you must have a reduced budget if the project succeeds.
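
a minimal sketch of such a ledger, assuming one record per measurable rationale with an accountable owner and the yearly budget of the affected cost item before and after the project. the class, field names and all figures are hypothetical. it flags every "saves money" claim whose budget did not go down, and it computes the simple calculation from above: realized savings over the operating period minus the upfront investment. new demands are tracked as their own positions and never counted as savings.

```python
from dataclasses import dataclass


@dataclass
class Rationale:
    """One measurable, independent project rationale with an accountable owner."""
    topic: str
    owner: str             # responsible person from senior management
    claims_savings: bool   # does this rationale claim to save money?
    budget_before: float   # yearly budget of the affected item before the project
    budget_after: float    # yearly budget of the same item while the result is in operation

    def savings(self) -> float:
        return self.budget_before - self.budget_after


def unrealized_savings(rationales):
    """Rationales that claim to save money but whose budget did not go down."""
    return [r for r in rationales if r.claims_savings and r.savings() <= 0]


def net_result(rationales, upfront_investment, years):
    """Realized yearly savings over the operating period minus the investment."""
    realized = sum(r.savings() for r in rationales if r.claims_savings)
    return realized * years - upfront_investment


# hypothetical figures, for illustration only
plan = [
    Rationale("faster review cycle", "head of techdoc", True, 500_000, 500_000),
    Rationale("fewer translation errors", "head of localization", True, 300_000, 260_000),
    # a new demand: a dedicated budget position, not a saving
    Rationale("support new product line", "head of product", False, 0, 120_000),
]

for r in unrealized_savings(plan):
    print(f"claimed but not realized: {r.topic} (owner: {r.owner})")
print("net result after 3 years:", net_result(plan, upfront_investment=200_000, years=3))
```

with these made-up numbers, the project claimed savings twice but realized only one, and it ends up with a net loss.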


"save money" must mean saved money in the end. all other "statements" and explanations are not valid, especially in today's business context....

cash is king

"cash is king" is a principle many companies have to follow today. this simply means every decision must aim to increase cash flow.

for all activities and departments of a company with no direct impact on the core business process, this means they are forced to reduce their budget immediately to a minimum / essential level.

HR, IT, TechDoc, .... are examples of affected units.

the current situation does not allow mid- or long-term investments in these business areas that "only" improve long-term operation. what you need is short-term benefit without too much investment.

a good opportunity for outsourcing, isn't it? i expect a significant number of companies to reduce their supporting activities by outsourcing services, even if the services don't fit their specific needs 100%, as long as the missing parts aren't essential for daily operation.

what do you need for service outsourcing?
  1. the service function must be fully defined (in a technical context, this means the interface must be fully defined)
  2. service availability must be defined
  3. service security must be defined
    =>what happens with the data maintained by a different organization?
  4. service termination must be defined
    =>what happens if the service is moved in-house again or to a different service provider?
  5. price must be defined
    =>pay per use / per time period / ?
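
the five points above work well as a checklist. a minimal sketch, with hypothetical field names and sample values: an agreement record with one field per point, plus a helper that lists which points are still undefined before anyone signs.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class ServiceAgreement:
    """One field per checklist item; field names are illustrative only."""
    function: Optional[str] = None      # 1. fully defined service function / interface
    availability: Optional[str] = None  # 2. e.g. "99.5% monthly"
    security: Optional[str] = None      # 3. who may access data held by the provider?
    termination: Optional[str] = None   # 4. exit path: back in-house or to another provider
    price: Optional[str] = None         # 5. pay per use / per time period / ...


def missing_items(agreement: ServiceAgreement) -> List[str]:
    """Names of checklist items that are still undefined (None or blank)."""
    return [f.name for f in fields(agreement)
            if not (getattr(agreement, f.name) or "").strip()]


draft = ServiceAgreement(
    function="document conversion via a documented interface",
    availability="99.5% monthly",
    price="per converted page",
)
print(missing_items(draft))
```

for this draft, security and termination are still open, which are exactly the two points that are easiest to forget and most expensive to fix later.
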
with this in mind, i expect increasing demand for ASP / PaaS / SaaS / PRaaS business models and for companies that can provide such services in certain domains.

provide IT infrastructure (software & hardware) for daily operation for an agreed amount of money.
=>IT assets based

provide a dedicated platform to implement and host an application for an agreed amount of money.
=>development assets based

provide a defined set of software features for an agreed amount of money.
=>software application / function based

provide a defined process / business function, incl. the corresponding IT assets, for an agreed amount of money.
=>business process based

the amount of semantics involved increases from top to bottom, and with it both the potential value and the potential risk for the requesting company. today's knowledge and IT infrastructure make these business models usable not only for big enterprises but also for small and mid-size companies.

but besides a few other points, you still have to trust in trust if you want to go in this direction.....

Sunday, April 12, 2009

modeling a DITA-compliant model

DITA is more and more widely adopted today, at least in the techdoc domain. its success and adoption rate are based on two major advantages over other existing standards:
  • adaptability
    the DITA data model can be adapted to specific needs and domains following defined rules. this concept is called specialization.
  • modularization
    content creation is based not on the document paradigm but on the module paradigm. authors no longer create documents; they create topics, each of which represents an artifact of the system they describe.
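
to make specialization a bit more concrete: DITA records each element's ancestry in its @class attribute (e.g. `- topic/topic concept/concept `), which lets a processor that only knows the base types handle a specialized element as its ancestor; DITA Open Toolkit stylesheets typically test `contains(@class, ' topic/p ')` rather than the element name. the python sketch below mimics that fallback with the standard library. the `warnpara` element and the `mywarn-d` domain are hypothetical, and the @class values are written inline, although in practice they come as defaults from the DTD or schema.

```python
import xml.etree.ElementTree as ET

# A small concept topic. Each @class lists the element's ancestry, from the
# most general type to the most specific one ("-" structural, "+" domain).
DOC = """
<concept id="c1" class="- topic/topic concept/concept ">
  <title class="- topic/title ">Pumps</title>
  <conbody class="- topic/body concept/conbody ">
    <p class="- topic/p ">Plain paragraph.</p>
    <warnpara class="+ topic/p mywarn-d/warnpara ">Specialized paragraph.</warnpara>
  </conbody>
</concept>
"""


def is_a(elem, base_type: str) -> bool:
    """True if elem is base_type or a specialization of it, per its @class tokens."""
    tokens = (elem.get("class") or "").split()
    return base_type in tokens[1:]  # tokens[0] is the "-" or "+" marker


root = ET.fromstring(DOC)
# A processor that has never heard of <warnpara> still treats it as a paragraph.
paragraphs = [e.text for e in root.iter() if is_a(e, "topic/p")]
print(paragraphs)
```

this generic fallback only works if every @class value in the model is correct, which is why a non-compliant model breaks tools in surprising places.
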
problem for an information architect

since i started working with DITA, i have reviewed several data models from coworkers and customers and, last but not least, created some myself. all of them claimed to be DITA compliant.
using them with tools that are also DITA compliant often fails for certain features, or in some cases completely. i see two reasons for this:

  1. the created models weren't valid according to the released specification
    creating a compliant DITA model (DTD or W3C Schema) requires knowing all the rules and requirements of the DITA specification, and without any tool support this goes far beyond what is needed to create regular DTDs and W3C Schemas. this means the information architect is left alone with the underlying complexity.

    i do not know exactly, but based on the data models i'm faced with, i expect that at least 50% of the customized DITA models out there are not compliant with the DITA spec.

    a few years ago i already posted about this here:
  2. the tools used require additional, tool-specific configuration or semantics, or simply don't implement a feature according to the specification
the second problem is getting better over time, and today you find a number of tools that are production-ready even for enterprise use.

the first one is still not resolved.
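
a minimal sketch of the kind of tool support that is missing: a checker for a few syntactic rules of the @class attribute (a leading `- ` or `+ `, a trailing space, `module/element` tokens starting from the topic or map base module, and a last token that names the element itself). the function name and the sample document are hypothetical, and real compliance checking covers far more (content models, the @domains attribute, consistency between a specialization and its ancestor), so treat this as an illustration, not a validator.

```python
import re
import xml.etree.ElementTree as ET

TOKEN = re.compile(r"[\w.-]+/[\w.-]+")  # module/element


def class_problems(elem):
    """Return a list of rule violations for one element's @class value."""
    value = elem.get("class")
    if value is None:
        return ["missing @class"]
    problems = []
    parts = value.split()
    if parts and parts[0] in ("-", "+"):
        parts = parts[1:]
    else:
        problems.append("must start with '- ' (structural) or '+ ' (domain)")
    if not value.endswith(" "):
        problems.append("must end with a trailing space")
    if not parts or not all(TOKEN.fullmatch(t) for t in parts):
        problems.append("tokens must have the form module/element")
        return problems
    if parts[0].split("/")[0] not in ("topic", "map"):
        problems.append("first token must come from the topic or map base module")
    if parts[-1].split("/")[1] != elem.tag:
        problems.append(f"last token must name the element itself ({elem.tag})")
    return problems


# Deliberately broken sample: <conbody> and <note> have bad @class values.
DOC = """
<concept id="c1" class="- topic/topic concept/concept ">
  <title class="- topic/title ">Pumps</title>
  <conbody class="- concept/conbody">
    <p class="- topic/p ">ok</p>
    <note class="- topic/p ">wrong last token</note>
  </conbody>
</concept>
"""

root = ET.fromstring(DOC)
report = {elem.tag: class_problems(elem) for elem in root.iter()}
for tag, problems in report.items():
    for problem in problems:
        print(f"<{tag}>: {problem}")
```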

you might argue: why not use the DITA model out of the box? a customized model is still needed in cases like these:
  • if you have specific requirements in your business process and need additional semantics to support them
  • if you have to simplify content creation for authors and users to get better and more consistent content.
    note: i consider DITA subsetting / configuration just another form of the generic concept of specialization
  • if you introduce a new business domain / taxonomy into your content maintenance strategy.
what are the issues if your model doesn't really conform to the specification?
  • interoperability is no longer guaranteed
    this is mainly a problem for the enterprise, and in most cases not at the beginning of DITA adoption
    but once you run into this problem, you are forced to fix all content created against your model or to adapt the processing chain your content is delivered to
  • the process chain does not work
    you add new tools to your environment and certain features don't work. again, you can fix all your existing content, the data model and the tools, or adapt the tools with certain workarounds
this week i stumbled over the DITAworks modeling module, which is the first tool i'm aware of that addresses the modeling use case in the DITA domain.

i haven't yet verified to what extent this tool supports the modeling / validation process, how much manual work is still involved and how complete the implementation currently is. because the tool is brand new and still in beta, i expect a lot of outstanding work, but i hope it opens the door to speeding up DITA-related development work and, the more important part of the story, to improving the quality of DITA-related information models.

i know that the amount of money / number of customers to be found in this domain is pretty small and the problem is far from simple, therefore i do not expect too many competitors in the near future, but i might be wrong.

time will show....