Tuesday, March 31, 2009

content & service composition: small and simple showcase

if you want to see and learn how easily an information aggregation use case, including the corresponding presentation, can be solved, take a look at "Make dashboards with XQuery".

this sample is all about composition, from a content as well as a service (functional) point of view.

most of the concepts required in the field of information processing are involved. even if the implementation has drawbacks and limitations in several areas, you can see how information-centric requirements can be solved.

Sunday, March 29, 2009

just another standard to manage content

CMIS is another attempt to provide a vendor-independent content exchange API. major ECM vendors like Microsoft, Oracle, IBM and many others have joined the corresponding OASIS TC to release the first version of this new standard.

a good overview of the current state can be found in "CMIS meeting notes".

i like the idea of having a commonly accepted and widely adopted standard to maintain resources and all related concepts.

major advantages
  • decoupling of the client implementation, which deals with standard-related concepts, from the server implementation
    => decoupling
  • usage of several server implementations through a single client implementation
    => integration
  • adaptation of a common and rich infrastructure for specific needs
    => enrichment
this leads to the core vision i have in mind when talking about such a standard.

Vision


the core infrastructure of content management is provided by the regular OS client infrastructure, and specific needs are implemented on top of that core layer. a wide range of server-side infrastructures can be used and integrated without adding IT complexity.
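to make the decoupling point a bit more tangible, here is a minimal, hypothetical sketch in Java: the same generic client code fetches the CMIS AtomPub service document from any compliant repository, and only the endpoint URL changes. the URLs are placeholders, and the AtomPub binding is just one of the bindings the draft defines.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CmisServiceDocumentCheck {

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();

        // placeholder endpoints -- swap in any CMIS-compliant repository
        String[] repositories = {
            "http://ecm-vendor-a.example.com/cmis/atom",
            "http://ecm-vendor-b.example.com/repository/cmis"
        };

        for (String endpoint : repositories) {
            HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(endpoint))
                .header("Accept", "application/atomsvc+xml") // AtomPub service document
                .GET()
                .build();

            HttpResponse<String> response =
                client.send(request, HttpResponse.BodyHandlers.ofString());

            // the service document lists the repository's collections (root folder,
            // query collection, ...) in a vendor-independent way
            System.out.println(endpoint + " -> HTTP " + response.statusCode());
        }
    }
}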

WebDAV

we have already heard about this vision; that's what WebDAV claims to provide....
why not work harder on WebDAV instead of re-inventing the wheel? a look at the world of WebDAV shows the issues in this area:
  • many incomplete implementations out there
  • integration into regular OS infrastructure is half-assed and error prone
  • WebDAV is implemented as an additional API with limited maturity and gets little attention from CMS vendors
  • the DeltaV extension is only supported by a few, not widely used implementations
  • ....
of course, the WebDAV standard is still not feature complete. especially a search interface and the concept of typed links are not yet provided by the core WebDAV standard. but adding these as additional extensions is not a big deal (see also http://www.webdav.org/specs/rfc5323.html).

the major issue with WebDAV is simply the lack of robust and complete implementations. the main reason: it is easier to define a standard than to fully implement that standard and resolve the issues that exist within it.
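to underline how small the client side of plain WebDAV actually is, here is a minimal, hypothetical sketch of a PROPFIND listing of a collection; the URL and the requested properties are placeholders, and authentication is left out.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class WebDavListing {

    public static void main(String[] args) throws Exception {
        // ask only for a few well-known DAV properties
        String body =
            "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
          + "<D:propfind xmlns:D=\"DAV:\">"
          + "<D:prop><D:displayname/><D:getlastmodified/><D:resourcetype/></D:prop>"
          + "</D:propfind>";

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("https://dav.example.com/docs/")) // placeholder collection
            .header("Depth", "1")                             // collection plus direct members
            .header("Content-Type", "application/xml")
            .method("PROPFIND", HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());

        // a compliant server answers with a 207 multi-status document,
        // one <D:response> element per resource
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}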

the major members of the CMIS TC are the same ones working on the WebDAV standard. i personally hope that the success and sustainability of the implementations will be better this time. otherwise we end up with just another standard no one takes care of once it is 80% complete.

i will not compare the two approaches here because that is another story, but in terms of simplicity WebDAV is currently in pole position. more to come......

Friday, March 27, 2009

Windows Unlimited?

i often have to deal with application server designs and deployments based on a 32-bit Windows server OS.

if you have to maintain more than one application server on a single host and for some reason need more virtual address space for one of them, you have to deal with the /3GB switch the enterprise server OS provides.
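just for reference, a hypothetical boot.ini entry with the switch enabled could look like the following; the ARC path and the /USERVA value are placeholders and have to match the actual machine:

[operating systems]
multi(0)disk(0)rdisk(0)partition(1)\WINDOWS="Windows Server 2003, Enterprise" /fastdetect /3GB /USERVA=3030

keep in mind that an application server executable only benefits from the larger user-mode address space if it is linked with the /LARGEADDRESSAWARE flag.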

the chain of expected load -> requests to each running application server -> number of threads per application server must be defined carefully, and that requires more in-depth OS knowledge than I'm willing to have.

you have to understand how the OS deals with system resources, and in this context i stumbled over a good blog post,
Pushing the Limits of Windows: Paged and Nonpaged Pool,
which describes the limits of Windows in a pretty clear and understandable way, even for guys like me ;-)

i have rarely read such comprehensive and focused information on this particular topic. thanks Mark for that.....

Wednesday, March 18, 2009

open usage of sequences of data points

a lot of important data out there is simple "time series" data, meaning a sequence of one or more data points that change over time.

a service to share and use such data is Timetric. currently the number of users and useful time series is small, but in general such a platform can host common time series from many different domains.

problem

who makes sure that the shared data is correct and therefore valuable to use? all services based on public contribution and usage face the same issue. do you trust the data you see? do you trust wikipedia? in general you should not. you have to cross-check at least two different sources before you use the provided data.

in addition, once you have cross-checked your data you have to make sure that its quality is guaranteed over time. that is much more difficult.

solution?

if the data is mission critical you should not use it before validating it. in the case of non-static data you have to validate it each time it changes. this means that you either need more than one service that is not based on the same underlying data source, or you have to look for and buy a commercial service that takes care of the provided data, or you request the service from the organization that owns / collects the data. each of these solutions requires special handling for the particular domain.
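to make the cross-check idea concrete, a minimal, hypothetical sketch in Java: two independently sourced series are compared point by point and every value that deviates beyond a relative tolerance is flagged for manual review. the data structures, sample values and the tolerance are purely illustrative.

import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SeriesCrossCheck {

    /** returns the dates where the two series disagree by more than the given relative tolerance. */
    static List<LocalDate> deviations(Map<LocalDate, Double> sourceA,
                                      Map<LocalDate, Double> sourceB,
                                      double relativeTolerance) {
        List<LocalDate> suspicious = new ArrayList<>();
        for (Map.Entry<LocalDate, Double> point : sourceA.entrySet()) {
            Double other = sourceB.get(point.getKey());
            if (other == null) {
                suspicious.add(point.getKey()); // missing in the second source
                continue;
            }
            double reference = Math.max(Math.abs(point.getValue()), Math.abs(other));
            if (reference > 0 && Math.abs(point.getValue() - other) / reference > relativeTolerance) {
                suspicious.add(point.getKey()); // values differ too much -> review manually
            }
        }
        return suspicious;
    }

    public static void main(String[] args) {
        Map<LocalDate, Double> a = new TreeMap<>();
        Map<LocalDate, Double> b = new TreeMap<>();
        a.put(LocalDate.of(2009, 1, 1), 100.0);
        b.put(LocalDate.of(2009, 1, 1), 100.5);  // within 5% tolerance
        a.put(LocalDate.of(2009, 2, 1), 100.0);
        b.put(LocalDate.of(2009, 2, 1), 140.0);  // flagged
        System.out.println(deviations(a, b, 0.05)); // -> [2009-02-01]
    }
}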

summary

the availability of public data services is promising, but i currently do not see an available trust model. therefore usage is pretty limited to some kind of "outline view".

Sunday, March 15, 2009

data processing workflows

if you are faced with a data processing workflow which requires you to process / transform a huge amount of data in a limited amount of time, this can end up in pretty complex implementations. even if you have enough hardware to do the job, you still need an infrastructure that makes use of that hardware.

using hardware for a limited time isn't a big issue today. the cloud infrastructure out there (e.g. Amazon EC2) is perfect if you have to process a huge amount of data within a limited duration. you are able to scale the hardware to the time it is required and just pay for that duration.

now you also need a ready-to-use software infrastructure to implement the processing workflow. MapReduce is a programming model for exactly this kind of problem.

Apache Hadoop implements MapReduce, but it lacks ease of use: it is a pretty low-level infrastructure and of course does not provide the higher-level workflow concepts which are not defined by MapReduce itself.

Cascading closes that gap. based on "stream processing", the MapReduce pattern is applied under the hood. it is not too complicated: within one day i was able to create a simple application which converts 20 TB of SVG data to JPG, doing some transformations in between, using Batik and 20 concurrent hardware nodes.
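for illustration, a minimal, hypothetical sketch of the map step behind such a job, written against the plain Hadoop API rather than Cascading's pipe assembly: each input line is assumed to hold the HDFS path of one SVG file, and the mapper transcodes it to JPEG with Batik. paths, job wiring and error handling are illustrative only, not the original implementation.

import java.io.IOException;

import org.apache.batik.transcoder.TranscoderInput;
import org.apache.batik.transcoder.TranscoderOutput;
import org.apache.batik.transcoder.image.JPEGTranscoder;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SvgToJpegMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        FileSystem fs = FileSystem.get(context.getConfiguration());

        Path svgPath = new Path(value.toString());
        Path jpgPath = new Path(svgPath.toString().replaceAll("\\.svg$", ".jpg"));

        JPEGTranscoder transcoder = new JPEGTranscoder();
        transcoder.addTranscodingHint(JPEGTranscoder.KEY_QUALITY, 0.9f);

        try (FSDataInputStream in = fs.open(svgPath);
             FSDataOutputStream out = fs.create(jpgPath, true)) {
            transcoder.transcode(new TranscoderInput(in), new TranscoderOutput(out));
        } catch (Exception e) {
            context.getCounter("svg2jpg", "failed").increment(1); // skip broken files
            return;
        }
        context.write(new Text(svgPath.toString()), NullWritable.get());
    }
}

the map phase is embarrassingly parallel, so adding EC2 nodes scales the throughput almost linearly; Cascading mainly adds the higher-level plumbing (taps, pipes, flows) around steps like this.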

for some use cases the power of the cloud is easy to see....

eat your own dog food

if you are in a position to hire a consultant, which one should you choose? in every business domain the number of consultants is huge.

two things you should consider (after looking at the slides they present):

  • don't hire someone who leaves before the suggestions are implemented. every project that involves external consultants should make sure that the involved people are not only responsible for providing some kind of "best practices" but also for putting them to use in the particular case they are hired for
  • don't hire someone who does not eat his own dog food. that means: ask your potential consultant how comparable cases are solved within his own company or in his or her own daily business.
    each problem can be traced back to some analogous problem which requires the same approaches you are faced with. if you talk about daily work and solutions with your consultant before hiring him, you get a good feeling for whether the external person really knows what he is talking about
it's often not the slides which are important; it's the individual experience and knowledge that make the difference between good and excellent help in your particular problem domain.


Sunday, March 01, 2009

Information Dynamics

i stumbled over a dissertation which focuses on "Information, Their Effects and Management in Supply Chains". unfortunately the paper is only available in German. a quotation from the summary:
I therefore suggest focusing SCM initiatives on information processing and information efficiency in order to enhance overall system behaviour and efficiency.

take the time to read the paper; it provides an interesting view on the effect and impact of information in SCM.

ODF and OOXML: interoperability issues

as already mentioned here, using a standard does not guarantee vendor / implementation independence.

the following paper "Lost in Translation: Interoperability Issues for Open Standards -- ODF and OOXML as Examples" shows that complexity is one major show stopper in this area.

why?

OOXML and ODF try to define office documents; this means that content, content structure and layout (and application semantics) must be standardized.

because the complexity of both standards is high, only a limited number of vendors are able to implement 100% of the standard, and even if they try, they are not able to prevent implementation errors.

lessons learned?

standards should carefully consider the good old principle "Everything should be made as simple as possible, but no simpler." (see http://en.wikiquote.org/wiki/Albert_Einstein). in our context this means that such standards need well-defined, atomic conformance levels which make it possible for each vendor to implement a defined and complete subset when the complete set is not feasible or useful for a particular application.