Sunday, January 25, 2009

Rocks Into Gold - Helping Programmers THRIVE through the Credit Crunch

a summary of "Rocks Into Gold" by Clarke Ching can be downloaded at

just read.....

Pipelines and SOA -> pipelines are more than that

the new book "Software Pipelines and SOA" shows the advantages of the "software pipeline" approach from a very technical and partial point of view.

the pattern itself has many more benefits; performance scalability is only one of them (and in many cases not the most important one). the principle of processing a flow of information in atomic, semantically closed steps is a pattern that unix itself was built on.
if you have ever seen how easily even complex processes can be created from pretty simple, atomic executables, you can imagine what pipelining means in terms of software architecture (see also

unix (and nowadays many similar infrastructures) passes an abstract stream of data through the pipeline, so the flow of information carries little semantics: it is only a stream of bytes. pipelines become more useful if more common semantics are part of the overall pipeline, meaning each step / operation knows more about what is expected to flow through it than just "data".
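a minimal sketch of the unix style described above, written in Python: each step is a small, atomic function over a stream of lines, and a pipeline is just the composition of such steps. the step names (`grep`, `cut`, `sort_unique`) and the sample log lines are made up for illustration; they only mimic the idea of the unix tools, not their full behavior.

```python
from functools import reduce
from typing import Callable, Iterable, Iterator

# A step consumes a stream of lines and produces a new stream of lines.
Step = Callable[[Iterable[str]], Iterator[str]]

def grep(pattern: str) -> Step:
    """Keep only the lines that contain `pattern`."""
    def step(lines: Iterable[str]) -> Iterator[str]:
        return (line for line in lines if pattern in line)
    return step

def cut(field: int, sep: str = " ") -> Step:
    """Keep one field (zero-based) of every line."""
    def step(lines: Iterable[str]) -> Iterator[str]:
        return (line.split(sep)[field] for line in lines)
    return step

def sort_unique() -> Step:
    """Sort the stream and drop duplicates."""
    def step(lines: Iterable[str]) -> Iterator[str]:
        return iter(sorted(set(lines)))
    return step

def pipeline(*steps: Step) -> Step:
    """Chain steps so the output of one becomes the input of the next."""
    def run(lines: Iterable[str]) -> Iterator[str]:
        return reduce(lambda stream, step: step(stream), steps, lines)
    return run

# Hypothetical sample data, roughly shaped like a web server log.
log = [
    "GET /index.html 200",
    "GET /missing 404",
    "POST /order 200",
    "GET /missing 404",
]

failed_paths = pipeline(grep("404"), cut(1), sort_unique())
result = list(failed_paths(log))
print(result)  # ['/missing']
```

note that every step only sees anonymous text lines. that is exactly the "less semantic" flow mentioned above: `cut(1)` has to trust that the second field is a path.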

XML pipelining is intended to use XML as the information flow. this means data processing / transformation needs fewer basic byte stream operations. processing can be done in declarative languages like xpath, xquery, xslt, .... which again reduces the complexity of information access and transformation.
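to illustrate the difference to byte stream processing, here is a small sketch using Python's standard `xml.etree.ElementTree` module, which supports a limited XPath subset. the `<manual>` / `<topic>` vocabulary and the `audience` attribute are assumptions made up for this example, not part of any standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document; element and attribute names are illustrative only.
SOURCE = """<manual>
  <topic audience="admin"><title>Install</title></topic>
  <topic audience="user"><title>Getting started</title></topic>
  <topic audience="admin"><title>Backup</title></topic>
</manual>"""

def select_titles(xml_text, audience):
    """Return the titles of all topics written for the given audience."""
    root = ET.fromstring(xml_text)
    # The path expression states WHAT is wanted; no manual string parsing is needed.
    path = f"./topic[@audience='{audience}']/title"
    return [title.text for title in root.findall(path)]

admin_titles = select_titles(SOURCE, "admin")
print(admin_titles)  # ['Install', 'Backup']
```

a full XPath, XQuery or XSLT processor can express far more than this subset, but the principle is the same: the step addresses information by structure, not by byte offsets.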

xml itself can express an endless amount of user data using different data models. a pipeline defined for a subset of data models can again reduce the complexity and therefore increase the benefit of a particular pipeline infrastructure.
you can imagine pipelines dedicated to transforming content created against a DITA data model into an endless number of distribution formats. the semantics of a particular pipeline step that can be used in several such pipes are much richer than on "general purpose XML" level.

if we look at the second term, "SOA", in the mentioned book title we have to distinguish:
  1. pipelining to orchestrate dedicated services (macro pipelining)
    because each business process can be expressed using the pipeline paradigm, it is suggested to implement SOA orchestration using the pipeline pattern.

    therefore you have to define the sequence / control flow of services and the corresponding message transformations.
    languages like BPEL provide a model to express this kind of pipeline.

    this layer often requires persistence of pipeline state because execution of such processes can take anywhere between hours and years.
    this layer often requires human steps, meaning that not the complete pipeline can be executed by a machine without human interaction. languages like BPEL4People are extensions that cover this standard requirement.

    there are many frameworks out there trying to provide an easy-to-start infrastructure. still, the usage and complexity of current implementations must not be underestimated.
  2. pipelining to solve one or more dedicated business steps (micro pipelining)
    within each business process a given amount of data conforming to specification A must be transformed into data conforming to specification B.
    e.g. extracting order data from an ERP system and adding sums for particular product groups before the result is rendered to HTML for later display to the person in charge.

    those operations can of course also be defined as a sequence of steps in which the input data is transformed into output data. those definitions are mainly derived from business rules.

    this layer does not require human interaction or persistence and can therefore be implemented on fully automated frameworks. using XML as the data backbone results in "xml pipelining", which combines most of the advantages required for "micro pipelining".

    languages like xproc, xpl, .... and corresponding implementations can be used in this area.

    in general a micro pipeline transforms one input of a macro pipeline step into the corresponding output(s).
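the order example from the micro pipelining item can be sketched as three small XML steps. this is only an illustration in plain Python with `xml.etree.ElementTree`, not XProc or XPL; the `<orders>` / `<order>` format, the attribute names and the sample amounts are invented for the sketch.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical export from an ERP system ("specification A").
ORDERS = """<orders>
  <order group="tools" amount="10.50"/>
  <order group="paint" amount="4.00"/>
  <order group="tools" amount="2.25"/>
</orders>"""

def parse(xml_text):
    """Step 1: turn the raw export into an element tree."""
    return ET.fromstring(xml_text)

def add_group_sums(orders):
    """Step 2: compute one total per product group ("specification B")."""
    totals = defaultdict(float)
    for order in orders.findall("order"):
        totals[order.get("group")] += float(order.get("amount"))
    summary = ET.Element("summary")
    for group in sorted(totals):
        ET.SubElement(summary, "group", name=group, total=f"{totals[group]:.2f}")
    return summary

def render_html(summary):
    """Step 3: render the summary as an HTML table for display."""
    table = ET.Element("table")
    for group in summary.findall("group"):
        row = ET.SubElement(table, "tr")
        ET.SubElement(row, "td").text = group.get("name")
        ET.SubElement(row, "td").text = group.get("total")
    return ET.tostring(table, encoding="unicode")

# The micro pipeline: every step takes XML in and hands XML on.
report = render_html(add_group_sums(parse(ORDERS)))
print(report)
```

each step could be replaced or reused on its own, e.g. a different renderer for the same summary. in a macro pipeline this whole chain would appear as a single service step.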


pipelining is one of the most powerful paradigms we have for today's common IT problems. but this pattern is neither new nor magic; it is more a "back to the roots....".

Webinar: Building RESTful Services with XQuery and XRX

the O'Reilly webinar "Building RESTful Services with XQuery and XRX" takes place on 28.01.09 (10:00 a.m. PST).
if you want to know how the mentioned technologies can work together.....

Sunday, January 18, 2009

SOA Patterns

a site to share / read / contribute SOA design patterns. whenever you are involved in SOA implementation projects it is worth taking a short look at:

Wednesday, January 14, 2009

RIA for information deployment

during the last couple of years the requirements for (software) applications have been changing in many ways:
  • from application to solution
    => the customer is able to configure a solution based on provided services and the corresponding orchestration and configuration
  • from static deployment to dynamic deployment
    => the customer is able to update a purchased component via an online connection. new solution features can be added and the configuration can be changed on demand.
  • from function oriented usage to process oriented usage
    => the application functions are embedded in business tasks reflecting the business processes of different user groups. different user groups therefore face different application behavior.
not all of the mentioned "trends" are incorporated in every application today, but more and more companies provide a subset of the mentioned approaches in their products / solutions. if products turn into solutions that can be adapted to achieve the most value for one particular customer or customer domain, the corresponding information (online help, user documentation, service information, faq, ....) must follow this approach as well.

this in particular means that the information must be designed to provide
  • size on demand
    each information product can be configured to fit one particular product installation. the product configuration can change over time.
  • update on demand
    information updates can be provided as fast as possible using "online channels". on the other hand content must remain available when no "online channel" is available.
  • workflow related content mapping
    information must not only map to a particular function of the product, as already done with e.g. context sensitive online help. the information must be mapped to and aligned with the workflow / business process in which the customer intends to use the product.
if you are faced with such products and have to contribute any kind of information, you know that those changes affect the way you create, maintain and deploy the corresponding information products.

if we look at the information deployment process (other parts of the information lifecycle will be covered in subsequent blog posts), one of the most interesting answers to these requirements is the usage of RIA frameworks for that purpose.

the most promising platform for information deployment is "Adobe AIR".
it offers more than the other RIA platforms can provide today, but that does not mean they cannot become a deployment platform for information products in the future (e.g. "Google Gears" has goals similar to Adobe AIR).

the next step would be a gap analysis: which features are missing in Adobe AIR and which business goals therefore cannot be fulfilled on the currently available platform. i will do this during the next few weeks...see what the results are.

Tuesday, January 13, 2009

just using xml provides Interoperability?

"The Anatomy of Interoperability" is one of the best articles summarizing the issues behind the well known and often promised term "interoperability".

one domain often confronted with this term is the world of xml and related "standards". there are lots of them out there; some of them are really stable and useful and are even interoperable themselves (e.g. xml 1.0, xslt 1.0, xpath 1.0).

however, just using xml does not guarantee interoperability for your data. that is only available if application behavior is addressed by a related standard. xml related standards that try to achieve this (e.g. svg) often fail, or they are difficult to use because they are missing essential features the specific user domain requires, and the corresponding tool vendors / application providers add them in a tool specific way. or the standard is too complex to implement a 100% compliant application (e.g. xlink).

DITA for example, a new OASIS standard / information architecture mainly for maintaining techdoc related information, faces those issues more and more. this standard has customization in mind, meaning specialization to specific needs is part of the design, but there are of course still limitations and there are good reasons for those limitations in general.

the initial standard was not feature complete (meaning essential requirements were missing from the users' point of view), and therefore vendors / consultants / end users are adding non-compliant features for their specific needs, which often results in missing the goal of interoperability.

why is DITA still successful?

to understand this you have to consider two things:
  • keep in mind that just using xml does not achieve your interoperability goals without additional effort
  • keep in mind that fully interoperable data is not always what you need. regular business cases often work well with an interoperable subset or a predefined transformation on demand.
what makes xml worth looking at is the ability to access / transform the tagged data without requiring a vendor specific api (though you often need vendor specific know-how to understand the application semantics) and with a foreseeable effort.

and that is the key feature if you think about organization specific information models.

Sunday, January 11, 2009

additional semantic for existing information sources

how information enrichment works in the real world, shown by two public use cases based on different technologies:
  • freebase
    "Freebase is an open, shared database that contains structured information on millions of topics in hundreds of categories. This information is compiled from open datasets like Wikipedia, MusicBrainz, the Securities and Exchange Commission, and the CIA World Fact Book, as well as contributions from our user community."

    this means that applications like freebase take already existing information and try to add semantics to it by combining and extracting information and context or, in this case, by letting users add semantics without modifying the source of the information.

    the tool thinkbase uses freebase to provide a visual graph of information and the corresponding link dependencies.
  • MarkMail, using an xml database (Mark Logic) as backbone and building the application purely on XQuery

    the semantics come from information aggregation and combination. in this application no additional user interaction is possible.
both examples are very useful and show real world applications which you might transfer to your own information landscape.....

XBRL: a language for the electronic communication of business and financial data

a good overview can be found here:

Saturday, January 10, 2009

XRX gets more attention

XRX shows that more and more information is represented in xml today. that trend will continue because more and more processes are now seen as what they always were: "information driven". more and more traditionally "unstructured" formats are now represented in xml, and more and more business value can be extracted from those formats (OOXML, Open Document, ....).

on the other hand more and more companies are starting to create certain types of information (user documentation, online help, service information) using semantically richer information architectures such as dita.

that paves the way for databases with native xml read / maintain and search features. they are able to provide additional value from already existing information that was created without knowledge of its future use.

a good summary of technologies in this area is provided by Kurt Cagle:
"Analysis 2009: XForms and XML-enabled clients gain traction with XQuery databases"

Wiki: solves collaboration & information sharing?

i often hear statements that using a wiki platform will solve our company's problems in collaboration and information sharing.
based on my personal experience most wiki projects fail silently in practice, meaning they start with more or less enthusiasm but end up as either
  • content silos with outdated, hard-to-find information chunks
  • an unused part of the company's intranet / IT infrastructure
  • something driven by only a handful of contributors and users
a recently published article shows the reasons and rationale for that behavior.

still, there are wiki projects out there (internet -> wikipedia, intranet) which are successful.

what makes them successful?

in my personal point of view, each successful "information process" requires at least
  • a definition of a common information lifecycle
    - who has to create which kind of information?
    - which criteria must be fulfilled to define an information object as usable?
    - which kind of subject matter expert must be involved for which kind of information?
  • and a common information taxonomy
    - what kind of information must be maintained
    - what kind of common classification do we use
    - best practices for structuring the information
  • and people who create, maintain and use the information
    - training is required
    - advantages and usage of the information must be part of a common understanding => people must see a personal benefit in using and maintaining the information
depending on the spirit and purpose of the wiki within an organisation those guidelines must be more or less detailed, but to whatever extent they must be available and somehow trained / reviewed.

the most successful wiki project, Wikipedia, provides the mentioned guidelines in an open and collaborative way (

one thing that does not work is to set up a wiki platform and post a link to all potential users without any additional hard work.

always remember: providing information is nothing more but also nothing less than hard work. the more value a piece of information must provide, the more hard work is required to create it.

Tuesday, January 06, 2009

directory of services in the cloud

a directory of available APIs in the cloud is available here:

the list shows two things
  1. there are lots of services out there, and many of them can be used free of charge


  2. the stability of usage is a huge problem
    - a few listed services have moved or been removed completely
    - a few service definitions change often without sufficient version management

Buzzwording continues?

the IT industry is very innovative in creating and consuming buzzwords. from year to year at least one new trend, along with one or more buzzwords, is created.

the main reason for that success is the corresponding visibility and, based on that, the opportunity to get budget. the main characteristic of such terms is that there is no formal definition of what the essence of the term really is, while on the other hand everybody seems to have a clear and complete understanding of the term / buzzword.

the second characteristic of such terms is that a common trend is associated with them.

and last but not least the life cycles of such trends are pretty similar: approx. 1/2 year until everybody is aware of it (through publications, blog posts, articles), 1 year of highest awareness incl. associated investments, and at the end the trend is replaced by the next one.

that looks pretty similar to the fashion industry, and in my point of view there are not too many structural differences between a new fashion trend and an IT trend.

just a small list of buzzwords from the last few years:

why is it possible to make money with those trends? because all of them promise to solve real, existing problems in real industry. if we take the trends mentioned above, their main focus is to

  • provide consistent access to required information at the right time in the right place
  • get rid of increasing IT complexity
  • get rid of proprietary vendor driven information silos
  • reduce the total cost of ownership for hosting the available information within a company
  • improve collaboration between different business groups
  • improve adaptability to changing business requirements
  • ....
if we take the essence of the mentioned buzzwords and collect them again, then the following is really promising
  • usage of dedicated and well defined services for business automation
  • paying for usage of a defined service level instead of paying for hardware / software and the corresponding maintenance (what really matters is the service that automates a certain business step)
  • an architecture that adapts quickly and in a controlled way to changing business requirements (changed SLA) and not to changed IT requirements
  • .....
the trends above of course drive the creation of standards, software and services that make those requirements easier to fulfill, but it is still hard work and takes more than just using those nice buzzwords.....

Saturday, January 03, 2009

xml processing in TecDoc industry

based on my previous talk at the German tekom conference, the main rationale and corresponding issues behind xml / xml processing are outlined here. the content was originally written in German. slides are available here.

Why should anyone working in technical documentation care about pipeline languages, and XML based pipeline languages in particular?

Two theses as justification

Thesis 1: value of information

The value of information units increases with the number of processes applied to them.

If a company creates and delivers paper instructions for use for very different product groups in only one language, the information they contain is relatively easy to create and manage, but the value of that information to the company is very low. The importance and value of the information grows with every additional user of the information (additional online help, language variants, product variants, use of the information in the product interface....).

To maximize value, the number of users of a piece of information must therefore be maximized within the constraints of the specific use case. Every use of information rests on establishing a process for that use (writing an instructional text in German, reusing dedicated information modules of one language, creating a variant within an existing information module, publishing an online help, ....). Because every process adds to the complexity of the overall process, the effort across the whole information lifecycle rises with every additional process, i.e. with every additional use of the information.

Thesis 2: processes on information

Processes on information units are for the most part identical within a company and even across companies. Conversely, this means that the technical documentation industry is busy dealing with the effects of "marginal" differences. The differences lie mainly in different information sources (type, storage, format, ....) and in the information products to be delivered (company specific style guides, delivery formats, ....).

The multitude of individual and specific processes is largely due to the lack of process decomposition and the lack of overarching standardization of process components.

The information components needed for each customer document must be identified by means of variable input parameters, provided, and finally assembled. The customer document is then enriched, i.e. it receives one or more indexes with defined requirements, a glossary, a TOC, etc. Finally it is converted to HTML, PDF or other formats. Breaking these sub-steps down further yields, for each of them, a large share of identical steps and a small number of specific ones.
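The publishing steps just described (select by input parameters, assemble, enrich, convert) can be sketched as a chain of small functions. This is only an illustration under stated assumptions: the topic records, their fields (`product`, `lang`) and all function names are hypothetical, and a real toolchain would work on XML sources rather than Python dictionaries.

```python
from html import escape

# Hypothetical information units; field names are assumptions for this sketch.
TOPICS = [
    {"title": "Installation", "product": "A", "lang": "en", "body": "Run the installer."},
    {"title": "Installation", "product": "B", "lang": "en", "body": "Unpack the archive."},
    {"title": "Safety", "product": "A", "lang": "en", "body": "Unplug before service."},
]

def select(topics, product, lang):
    """Identify the units needed for one customer document via input parameters."""
    return [t for t in topics if t["product"] == product and t["lang"] == lang]

def assemble(topics):
    """Combine the selected units into one document structure."""
    return {"topics": topics}

def add_toc(doc):
    """Enrich the document; here only with a table of contents."""
    return {**doc, "toc": [t["title"] for t in doc["topics"]]}

def to_html(doc):
    """Convert the enriched document into one delivery format."""
    toc = "".join(f"<li>{escape(title)}</li>" for title in doc["toc"])
    body = "".join(
        f"<h2>{escape(t['title'])}</h2><p>{escape(t['body'])}</p>"
        for t in doc["topics"]
    )
    return f"<ul>{toc}</ul>{body}"

manual = to_html(add_toc(assemble(select(TOPICS, product="A", lang="en"))))
print(manual)
```

Only `select` and `to_html` depend on the specific sources and delivery formats; `assemble` and `add_toc` are the kind of steps that could be shared across documents and even across companies.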

Key to success

To maximize the value of your information sustainably, you have to consistently decompose the processes into their atomic parts and thereby make maximum use of existing process components (and of the knowledge behind them). That keeps the effort and complexity of using information units low in relation to the value.

SMILA (SeMantic Information Logistics Architecture)

"SMILA (SeMantic Information Logistics Architecture) is an extensible framework
for building search solutions to access unstructured information in the enterprise.
Besides providing essential infrastructure components and services, SMILA also delivers
ready-to-use add-on components, like connectors to most relevant data sources."

initiated by the German company empolis, this project seems promising for solving a common problem in dealing with today's information overload:

  • identification of and access to information relevant for a given business task / process
  • integration of "unstructured" information into the corresponding business process
the standards used are of course complex and not really commonly used in many organizations right now, but that might change in the mid term....