Monday, November 07, 2011

Data-Driven Documents

Data-Driven Documents. What's that?
D3.js is a small, free JavaScript library for manipulating documents based on data.
You have a data source that requires visualization, and a chart library is not enough for that purpose. This little library helps you develop the visualization using the DOM and data-driven transformations.

A good example of what can be done is available here: http://www.visualizing.org/full-screen/16266
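
The basic idea of deriving document elements from data can be sketched without D3.js. The following Python snippet is not D3.js and does not use its API; it only illustrates the principle of creating one document node per data item. The function name bars_from_data, the sizes and the sample values are assumptions made for illustration, and the values are assumed to be positive.

```python
import xml.etree.ElementTree as ET

def bars_from_data(values, bar_width=20, gap=5, height=100):
    """Build an SVG element that contains one <rect> per data value."""
    top = max(values)  # assumes at least one positive value
    svg = ET.Element("svg", {
        "xmlns": "http://www.w3.org/2000/svg",
        "width": str(len(values) * (bar_width + gap)),
        "height": str(height),
    })
    for index, value in enumerate(values):
        # Scale each bar relative to the largest value.
        bar_height = round(height * value / top)
        ET.SubElement(svg, "rect", {
            "x": str(index * (bar_width + gap)),
            "y": str(height - bar_height),
            "width": str(bar_width),
            "height": str(bar_height),
        })
    return svg

svg = bars_from_data([4, 8, 15, 16])
print(ET.tostring(svg, encoding="unicode"))
```

Changing the list of values changes the document; that is the whole point of the data-driven approach.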

Monday, October 24, 2011

Reactive documents

I stumbled upon Tangle, a JavaScript library that simplifies the creation of reactive documents.

Whenever you have to explain scenarios / alternatives in a sensible manner, a reactive document is one possible way to do so. Changing a parameter and directly seeing the impact on all dependent parts of the information / document is a very effective way to teach certain problem / solution scenarios.

This library makes the creation very simple (from a technical point of view), so you are able to concentrate on the content scenario, which is of course the harder part.
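
To make the principle a bit more concrete, here is a minimal sketch in Python. It is not Tangle and does not reflect its API; the class ReactiveDocument, its methods and the numbers in the example are hypothetical. It only shows the core idea: dependent values are recomputed and the text is rendered again whenever a parameter changes.

```python
class ReactiveDocument:
    """Holds parameters and re-renders a text template whenever one changes."""

    def __init__(self, template, derived, **params):
        self.template = template
        self.derived = derived        # name -> function(params) for dependent values
        self.params = dict(params)

    def update(self, name, value):
        """Change one parameter and return the freshly rendered text."""
        self.params[name] = value
        return self.render()

    def render(self):
        values = dict(self.params)
        for name, compute in self.derived.items():
            values[name] = compute(self.params)
        return self.template.format(**values)

doc = ReactiveDocument(
    "If you eat {cookies} cookies, you consume {calories} calories.",
    derived={"calories": lambda p: p["cookies"] * 50},  # 50 is an assumed value
    cookies=3,
)
print(doc.render())
print(doc.update("cookies", 4))
```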


XQuery - rarely used, but ramp-up gets easier

A very nice overview of what XQuery is, how it relates to other XML-based standards and why it is still very rarely used can be found here: http://grtjn.blogspot.com/2011/10/xquery-novelties-revisited.html

Once you want to dive a bit deeper into XQuery and get a feeling for whether the approach is sufficient for your use case, you should try out BaseX, an open source XML database. Its main advantage for tryouts, which is not available in alternative solutions, is a lightweight but very useful UI front end for the content stored in the database.

You can add XML documents, try out queries and see how / what they match and, last but not least, get a clear view of how information in an XML store looks (from a conceptual point of view). All of this comes with an easy install and low ramp-up costs.
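
To give a feeling for what "trying out a query and seeing what it matches" means, here is a small sketch in Python. Python's standard library has no XQuery engine, so the limited XPath support of ElementTree is used as a stand-in; this is not BaseX and not XQuery itself. The sample document and its titles are made up. A roughly comparable XQuery expression is given in a comment.

```python
import xml.etree.ElementTree as ET

# Made-up sample content, standing in for documents stored in an XML database.
LIBRARY = """
<library>
  <book year="2010"><title>XSLT Basics</title></book>
  <book year="2011"><title>XQuery in Practice</title></book>
  <book year="2011"><title>XML Databases</title></book>
</library>
"""

root = ET.fromstring(LIBRARY)

# Roughly comparable XQuery:
#   for $b in /library/book[@year = "2011"] return $b/title/text()
titles = [title.text for title in root.findall("./book[@year='2011']/title")]
print(titles)
```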

Alternatives (commercial and open source) are collected here: http://trent-intovalue.blogspot.com/2010/08/xquery-design-patterns.html

Tuesday, September 13, 2011

Valuable Information

Today I stumbled upon the following Twitter post:

Every human intervention in a business process introduces a 4% chance of error. - B. Beims  

Sounds interesting and relevant in the context I'm working in. Then I tried to verify the source and basis for this statement:

  • Using Google to search for the statement
  • Using Google to search for the author
  • Looking for a second / third source for this statement
To be honest, I wasn't able to verify what I had to verify and therefore cannot use any bit of this information. So I take this statement as a trigger for this blog post - better than nothing.

That is an example of today's most common topic:
  • more and more "characters" are accessible and flowing around the world, posted, re-posted and extended like "Chinese whispers"
  • less and less of the accessible "information" (as a percentage of the total amount of available "information") is relevant or valid
  • shortened / context-less "information" does not lead to a chain of information usable by humans
That is just a fact and a reality everyone has to deal with. To improve your personal ability to turn "characters" into "information" you still have to go the hard way:

Don't use or post information which is not verified by at least
  • a second, independent source
    or
  • personal verification
    or
  • background information which gives you enough context to trace the information
If you do not have time for this kind of verification, just leave the "characters" as they are and mark them as irrelevant for you. This should make your personal information chain much cleaner and help you to separate relevant from irrelevant information.

Don't forget: It is never cheap to gather valuable information. It never was and never will be.

Sunday, September 04, 2011

Kevin Slavin: How algorithms shape our world



http://www.ted.com/talks/kevin_slavin_how_algorithms_shape_our_world.html

Writing code does not, in most cases, mean that you can control the usage and implications of the results.....

HTML5 and XML

HTML5 will be the main syntax for the Internet in the next few years and will replace HTML 4.01, which is the most frequently used today. The main driver of this shift was Google, and HTML5 has now been adopted by all major browser / OS vendors and organizations.

The main advantage of HTML5 is the number of new built-in features, which reflect most of today's common requirements for web-based applications.

Why has XHTML 1.0/1.1 failed so far? Mainly because it was much too strict for the web community - the web, like the world, is not a perfect place, and therefore HTML5 fits into this world much better than the XML-based approach of XHTML can.

Does this mean XML has lost on the web and does not make sense at all? No, there is still space for XML-based standards beside HTML5:
The major advantage of using XML to express content on the web is that it is much easier to integrate the resulting content into XML processing chains using regular XML transformation tool chains.

  • Easier reuse of content for different channels using XSLT / XQuery
  • Retrieve content as XHTML and extract only the dedicated parts (views) required for different use cases
  • Store and request the content using XQuery-based infrastructure
The drawback is that today only the newest browsers support the MIME type "application/xhtml+xml". Therefore, for a while Polyglot XHTML might be a good way to deliver to the masses and keep the processing use cases doable.
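
The second point in the list above, extracting only a dedicated part of a page, can be sketched with a regular XML parser as long as the page is well-formed (polyglot) XHTML. This is a minimal Python sketch; the page content, the id values and the function name extract_view are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed XHTML page.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Product page</title></head>
  <body>
    <div id="navigation"><a href="/">Home</a></div>
    <div id="summary"><p>Short product summary.</p></div>
  </body>
</html>"""

def extract_view(xhtml_text, element_id):
    """Return the first element with the given id, or None if there is none."""
    root = ET.fromstring(xhtml_text)
    for element in root.iter():
        if element.get("id") == element_id:
            return element
    return None

summary = extract_view(PAGE, "summary")
summary_text = "".join(summary.itertext()).strip()
print(summary_text)
```

With tag-soup HTML the call to ET.fromstring would simply fail, which is exactly why the XML-based delivery keeps such processing use cases doable.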

A good summary of Polyglot XHTML and related XML based alternatives for HTML5 can be found here: http://www.xmlplease.com/xhtml/xhtml5polyglot/

Thursday, September 01, 2011

analyse and process DTDs

Working with DTDs is still a common task for XML (/SGML) driven use cases. Knowing this, it is quite amazing that there is no well-known DTD visualization tool available to support this task.

The good old "Near&Far Designer" has been gone for many years, and the source is probably lost somewhere inside Open Text (the company bought Microstar Software Ltd in 1999). This tool is still in use by many organizations that have to deal with SGML DTDs (e.g. in the military or aircraft industry).

DTD documentation


There are a few open source scripts, available free of charge, that convert a DTD into HTML pages for documentation purposes.
There is one tool out there that supports graphical visualization, documentation and a few functions to report key figures of the given DTD:

TreeVision (http://www.ovidius.com/meta/download/treevision.html) from the German company Ovidius. The tool is available free of charge and provides a very effective way to analyze XML / SGML DTDs.


Convert to XML Schema alternatives


If you have to process the content of a DTD for specific use cases, like analyzing the model based on custom rules, the easiest way is to convert the DTD to RELAX NG (XML syntax) or the W3C Schema language. Both are based on XML and can therefore be processed using regular XML-based tools.

The best tool to support this is trang (http://www.thaiopensource.com/relaxng/trang.html; the source is hosted on http://code.google.com/p/jing-trang/), initially created by James Clark. Compared to commercial alternatives the result is very predictable and, for many use cases, as good as possible.
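
Once the model is available in the RELAX NG XML syntax, a custom analysis is just regular XML processing. The following Python sketch lists all element names declared in a schema. The schema is a small hand-written sample and not actual trang output, and the function name element_names is an assumption.

```python
import xml.etree.ElementTree as ET

RNG = "{http://relaxng.org/ns/structure/1.0}"

# Hand-written sample schema in RELAX NG XML syntax (not real trang output).
SCHEMA = """<grammar xmlns="http://relaxng.org/ns/structure/1.0">
  <start><ref name="doc"/></start>
  <define name="doc">
    <element name="doc">
      <oneOrMore><ref name="section"/></oneOrMore>
    </element>
  </define>
  <define name="section">
    <element name="section">
      <element name="title"><text/></element>
      <zeroOrMore><ref name="para"/></zeroOrMore>
    </element>
  </define>
  <define name="para">
    <element name="para"><text/></element>
  </define>
</grammar>"""

def element_names(schema_text):
    """Return the sorted names of all elements declared in the schema."""
    root = ET.fromstring(schema_text)
    names = {el.get("name") for el in root.iter(RNG + "element") if el.get("name")}
    return sorted(names)

names = element_names(SCHEMA)
print(names)
```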

DTDs will still exist for many years, simply because of the many legacy applications created around them. The amount of support is limited, but it still exists....


Monday, August 01, 2011

lost in email threads....

One of the most time-consuming daily tasks is to identify the email required for the current task in mind.

You know that you already received an email on a particular topic and you want to reference it, you require the technical details for a certain topic, ....

The tags and full-text search of today's email clients are quite a sufficient help to get this kind of task done. But once you find a particular email, it is almost always part of a thread going back and forth, and getting what you want requires the context of the email you found. Getting the complete context of an email thread isn't trivial, even with the thread functions of the common email tools. Most of them are limited in what they show, in particular for long-running threads:
  • you lose the message context around the identified message because the threading function rearranges the way your inbox is displayed
  • you don't have an easy-to-use overview of what really happened: the timing of each mail in the thread and the corresponding sender
  • you do not have easy navigation without losing the context
A few weeks ago I became aware of ThreadVis, an add-on for the Thunderbird email client.

Pretty cool: it provides a visual graph of the email thread based on the currently selected email, with different colors for different senders, length indication for time durations and direct popup help for the content of each item.


You see where you are, what came before and after, and who the sender was. Even emails of the thread that you did not receive are visible. There is easy navigation between the emails and a popup preview for each of the thread items.
Voilà, what else could you want? Of course there are things that can be improved, but the basic idea and implementation are worth taking a look at....
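
The structure behind such a visualization can be sketched in a few lines. The following Python snippet is not how ThreadVis is implemented (I have not looked at its code); it only assumes that each reply carries an In-Reply-To header pointing to the Message-ID of its parent. The messages, addresses and function names are made up.

```python
from collections import defaultdict
from email import message_from_string

# Made-up messages, reduced to the headers needed for threading.
RAW_MESSAGES = [
    "Message-ID: <1@example.org>\nFrom: anna@example.org\n\nDraft attached.",
    "Message-ID: <2@example.org>\nIn-Reply-To: <1@example.org>\nFrom: ben@example.org\n\nLooks good.",
    "Message-ID: <3@example.org>\nIn-Reply-To: <1@example.org>\nFrom: carl@example.org\n\nOne question.",
    "Message-ID: <4@example.org>\nIn-Reply-To: <3@example.org>\nFrom: anna@example.org\n\nAnswer inline.",
]

def build_thread(raw_messages):
    """Map each message id to its sender and each parent id to its replies."""
    senders = {}
    children = defaultdict(list)
    for raw in raw_messages:
        msg = message_from_string(raw)
        msg_id = msg["Message-ID"]
        senders[msg_id] = msg["From"]
        # Messages without In-Reply-To are stored under the key None (thread roots).
        children[msg.get("In-Reply-To")].append(msg_id)
    return senders, children

def render(senders, children, parent=None, depth=0):
    """Return one indented line per message, replies below their parent."""
    lines = []
    for msg_id in children.get(parent, []):
        lines.append("  " * depth + senders[msg_id])
        lines.extend(render(senders, children, msg_id, depth + 1))
    return lines

senders, children = build_thread(RAW_MESSAGES)
thread_lines = render(senders, children)
print("\n".join(thread_lines))
```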

Monday, November 01, 2010

Open Source TMS: opentm2 / Open Source Localization framework: okapitools

Open standards in the world of localization are becoming more and more mainstream: XLIFF as the main data model to carry localized information between the different automated and manual steps within the localization process, TMX as a format to exchange translation memories, ... The full list of standards can be found here: http://www.opentag.com/okapi/wiki/index.php?title=Open_Standards
This enables more and more open source or freely available tools to support certain steps in the localization process without dealing with proprietary and complex formats and conventions.
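
Because XLIFF is plain XML, even a small script can take over a step in such a process. Here is a minimal Python sketch that lists the translation units of an XLIFF 1.2 file which still lack a target. The file content, the ids and the function name untranslated_units are made up for illustration.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Made-up XLIFF 1.2 sample with one translated and one untranslated unit.
XLIFF_SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="ui.properties" source-language="en" target-language="de" datatype="plaintext">
    <body>
      <trans-unit id="save"><source>Save</source><target>Speichern</target></trans-unit>
      <trans-unit id="cancel"><source>Cancel</source></trans-unit>
    </body>
  </file>
</xliff>"""

def untranslated_units(xliff_text):
    """Return (id, source text) for every trans-unit without a non-empty target."""
    root = ET.fromstring(xliff_text)
    missing = []
    for unit in root.iterfind(".//x:trans-unit", XLIFF_NS):
        target = unit.find("x:target", XLIFF_NS)
        if target is None or not (target.text or "").strip():
            source = unit.findtext("x:source", namespaces=XLIFF_NS)
            missing.append((unit.get("id"), source))
    return missing

missing_units = untranslated_units(XLIFF_SAMPLE)
print(missing_units)
```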
A few weeks ago the new project "opentm2" published its first stable release. Based on IBM TranslationManager, this infrastructure claims:
OpenTM2 provides an open platform for managing translation related activities with enterprise level scalability and quality. It serves as an open yet comprehensive localization tool that provides that integration platform. Ultimately, the goal is to create a cost-efficient and high-quality localization deliverable.
Promising, and it goes in the same direction as GlobalSight, mentioned in a previous post: http://trent-intovalue.blogspot.com/2009/05/open-source-tms-globalsight.html
In addition, infrastructure that helps to automate common localization tasks and to deal with the mentioned standards is available as open source. The "Okapi framework" project provides a pipeline and several useful steps for localization processes.
It also offers useful tools, e.g. to test text segmentation rules based on SRX, and provides a Java-based infrastructure which can be embedded in your own application. This makes the framework something you should look at if you work in localization.

Hopefully the story continues.....

Link: Image Search Tools

A short post about a useful list of image search tools, posted here: "7 Image Search Tools That Will Change Your Life". With the help of such tools you might be able to find images and not only search for them.....

DITA - Beyond OT

DITA-OT (Open Toolkit) is the reference implementation to transform DITA source into various output formats. The reference implementation is open source and maintained by a growing community.

The OT uses Apache Ant as its pipeline infrastructure. That's OK in general, but there are several shortcomings once you have to integrate or extend the OT implementation for enterprise use cases (see e.g. the discussions "Pipeline refactoring" and "Diving Into Performance Improvements").
Because DITA-OT is "only" a reference implementation, might there be other implementations out there that already use a better approach, because they started development when the limitations of the OT implementation were already known?
Yes, there are two I'm aware of, with different key aspects:
  • XMLmind DITA Converter (see http://www.xmlmind.com/ditac/what_is_ditac.html)

    Key Aspects: Easier use, integration and improved output.
    Reality: Bad and monolithic design with a hard-coded Java-based pipeline and a small user community. No real advantage compared to the existing DITA-OT.
  • DITA XProc Pipelines (see https://community.emc.com/docs/DOC-8740)

    Key Aspects: greater flexibility, extensibility, portability, performance.
    Reality:
    The implementation is based on XProc (XML Pipeline Language). The design is a step in the right direction and shows the much higher scalability (functional and non-functional) of this approach. For example, if you want to use extended semantics for validation, you can add ISO Schematron based rules using the existing "ISO Schematron schema for DITA" stylesheet and the built-in "p:validate-with-schematron" step in XProc, and add it to the existing pipeline.

    This implementation isn't perfect: it mainly uses XProc markup, which makes the code long and hard to read / maintain. Using the right language for each step, which is one advantage of XML pipelining, isn't consistently applied here. Based on the currently available XProc engines (http://tests.xproc.org/results/), this implementation is not yet ready for enterprise use, but this will change in the near future as more real-world examples become available and are used in production.
The XProc-based approach is promising, and maybe in the future the DITA-OT development will also switch over to real XML pipelining. This would make any kind of integration and extension much easier than it is today.....
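
Why a real pipeline makes extension easier can be shown with a toy example. The following Python sketch is neither XProc nor part of DITA-OT; all step names are hypothetical, and the simple title check only stands in for a Schematron validation step. The point is that an additional step is added by changing the list of steps, not the steps themselves.

```python
import xml.etree.ElementTree as ET

def parse_topic(text):
    """Step 1: turn the source text into an XML tree."""
    return ET.fromstring(text)

def require_title(topic):
    """Optional step: a simple rule check, standing in for Schematron validation."""
    if topic.find("title") is None:
        raise ValueError("topic has no title")
    return topic

def to_html(topic):
    """Last step: a very reduced transformation to an HTML fragment."""
    title = topic.findtext("title")
    paragraphs = "".join(f"<p>{p.text}</p>" for p in topic.iter("p"))
    return f"<h1>{title}</h1>{paragraphs}"

def run_pipeline(steps, data):
    """Feed the output of each step into the next one."""
    for step in steps:
        data = step(data)
    return data

base_pipeline = [parse_topic, to_html]
# Extending the pipeline means inserting a step; existing steps stay untouched.
validated_pipeline = [parse_topic, require_title, to_html]

TOPIC = '<topic id="intro"><title>Intro</title><body><p>Hello.</p></body></topic>'
html = run_pipeline(validated_pipeline, TOPIC)
print(html)
```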