TRENT: 2010

Monday, November 01, 2010

Open Source TMS: opentm2 / Open Source Localization framework: okapitools

Open standards in the world of localization are becoming more and more mainstream: XLIFF as the main data model for carrying localizable information between the different automated and manual steps of the localization process, TMX as a format for exchanging translation memories, ... The full list of standards can be found here: http://www.opentag.com/okapi/wiki/index.php?title=Open_Standards.
This enables more and more open source or freely available tools to support certain steps in the localization process without dealing with proprietary and complex formats and conventions.
A few weeks ago the new project "opentm2" published its first stable release. Based on IBM TranslationManager, the project claims:

OpenTM2 provides an open platform for managing translation related activities with enterprise level scalability and quality. It serves as an open yet comprehensive localization tool that provides that integration platform. Ultimately, the goal is to create a cost-efficient and high-quality localization deliverable.

Promising, and it goes in the same direction as GlobalSight, mentioned in a previous post: http://trent-intovalue.blogspot.com/2009/05/open-source-tms-globalsight.html.
In addition, infrastructure that helps to automate common localization tasks and to deal with the mentioned standards is available as open source. The project "Okapi framework" provides a pipeline and several useful steps for localization processes, such as

reading several input formats
http://www.opentag.com/okapi/wiki/index.php?title=Filters
common steps in the localization process
http://www.opentag.com/okapi/wiki/index.php?title=Steps
connections to common translation resources (like a TMS)
http://www.opentag.com/okapi/wiki/index.php?title=Connectors

In addition, useful tools (e.g. for testing text segmentation rules based on SRX) and a Java-based infrastructure that can be embedded in your own application make this framework something you should look at if you work in or for the localization process.
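To give an idea of what testing SRX segmentation rules involves, here is a minimal Python sketch of SRX-style rule evaluation. It is not Okapi's (Java) API; the `Rule` class and `segment` function are hypothetical names. It assumes the core SRX idea of ordered break / no-break rules, each with a "before break" and an "after break" regex, where the first rule matching a position decides. Language maps and cascading are left out.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """One SRX-style rule (hypothetical structure)."""
    is_break: bool  # True = break rule, False = no-break (exception) rule
    before: str     # regex that must match the text ending at the position
    after: str      # regex that must match the text starting at the position


def segment(text: str, rules: list[Rule]) -> list[str]:
    """Split text into segments; at each position the first matching rule decides."""
    compiled = [
        # Anchor the "before" regex at the candidate position with \Z.
        (r.is_break, re.compile("(?:" + r.before + r")\Z"), re.compile(r.after))
        for r in rules
    ]
    segments, start = [], 0
    for pos in range(1, len(text)):
        for is_break, before, after in compiled:
            if before.search(text[:pos]) and after.match(text, pos):
                if is_break:
                    segments.append(text[start:pos])
                    start = pos
                break  # first matching rule wins, whether break or no-break
    segments.append(text[start:])
    return segments


rules = [
    Rule(False, r"\bDr\.", r"\s"),  # exception: no break after "Dr."
    Rule(True, r"[.?!]", r"\s"),    # break after sentence punctuation + space
]
print(segment("Dr. Smith arrived. He sat down.", rules))
# ['Dr. Smith arrived.', ' He sat down.']
```

The order of the rules matters: putting the "Dr." exception before the generic break rule is what prevents a split after the abbreviation.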

Hopefully the story continues.....

Link: Image Search Tools

A short post pointing to a useful list of image search tools: "7 Image Search Tools That Will Change Your Life". With the help of such tools you might be able to actually find images and not only search for them.....

DITA - Beyond OT

DITA-OT (Open Toolkit) is the reference implementation for transforming DITA source into various output formats. The reference implementation is open source and maintained by a growing community.

The OT uses Apache Ant as its pipeline infrastructure. That's OK in general, but there are several shortcomings once you have to integrate or extend the OT implementation for enterprise use cases (see e.g. the discussions "Pipeline refactoring" and "Diving Into Performance Improvements").
Because DITA-OT is "only" a reference implementation, might there be other implementations out there that already use a better approach, since they started development when the limitations of the OT implementation were already known?
Yes, there are two I'm aware of, with different key aspects:

XMLmind DITA Converter (see http://www.xmlmind.com/ditac/what_is_ditac.html)

Key Aspects: Easier use, integration and improved output.
Reality: Poor, monolithic design with a hard-coded Java-based pipeline and a small user community. No real advantage compared to the existing DITA-OT.
DITA XProc Pipelines (see https://community.emc.com/docs/DOC-8740)

Key Aspects: greater flexibility, extensibility, portability, performance.
Reality:
Implementation based on XProc (XML Pipeline Language). The design is a step in the right direction and shows the much higher scalability (functional and non-functional) of this approach. For example, if you want to use extended semantics for validation, you can write ISO Schematron rules based on the existing "ISO Schematron schema for DITA" stylesheet, apply them with the built-in XProc step "p:validate-with-schematron", and add that step to the existing pipeline.

This implementation isn't perfect. It mainly uses XProc markup for the implementation, which makes the code long and hard to read and maintain. Using the right language for each step, one advantage of XML pipelining, isn't applied consistently in this implementation. Based on the currently available XProc engines (http://tests.xproc.org/results/), this implementation is not yet ready for the enterprise, but this will change in the near future as more real-world examples become available and are used in production.

The XProc-based approach is promising, and maybe in the future DITA-OT development will also switch over to real XML pipelining. This would make any kind of integration and extension much easier than it is today.....

Tuesday, September 21, 2010

Usage of Wolfram|Alpha

Wolfram|Alpha launched roughly 1.5 years ago with a lot of noise in the IT press. Their goal to make "systematic knowledge immediately computable and accessible to everyone" (see http://www.wolframalpha.com/about.html) was promising from the beginning.

Since then they have made huge progress, and their engine is a real help for everyone using a computer on a daily basis.

Every day you have to answer questions that require specific knowledge or algorithms. In many cases you can find specific websites that help answer those questions. But having one source that answers those questions and combines the results across as many domains as Wolfram|Alpha does is unique on the Net (AFAIK).

Samples

I have to offer 10,000 USD, how much is this in EUR?
Simply type e.g. "convert 10000 USD to EUR" - voilà.
I want to arrange a telco with someone in Shanghai at a given date / time. What time is that for them?
Simply type e.g. "12.10.2010 16:00 in Shanghai" - voilà.
What is the human-readable date / time for a given Unix timestamp?
Simply type e.g. "unix timestamp 130464000 to cet" - voilà.
Info about a given website
Simply type e.g. "www.orbeon.com"
What the heck was port 541 used for?
Type "port 541"
...you see what I mean?
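For comparison, the Unix timestamp query from the samples above can also be reproduced offline. A minimal Python sketch using only the standard library; treating CET as a fixed UTC+1 offset is a simplifying assumption (it ignores daylight saving time, CEST), and `unix_to_cet` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Assumption: CET as a fixed UTC+1 offset; a real time-zone database would
# also switch to CEST (UTC+2) in summer.
CET = timezone(timedelta(hours=1), "CET")


def unix_to_cet(timestamp: int) -> datetime:
    """Convert seconds since 1970-01-01 00:00 UTC to a CET datetime."""
    return datetime.fromtimestamp(timestamp, tz=CET)


print(unix_to_cet(130464000).isoformat())  # 1974-02-19T01:00:00+01:00
```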

If you want to reuse or share a certain query, you can create a so-called widget and bookmark it, embed it into your website, .....

To convert any given Unix timestamp to a given named time zone (the specific query shown above), I created the following widget (it took me 10 min. of my limited time ;-)

Limitation

There are still lots of limitations, e.g. for many knowledge domains only US-based sources are available. The combination of knowledge is also still limited; a fancy query like "weather the day Jimmy Hendrix died" does not work in many scenarios right now. But they are getting better each day....

The end of PowerPoint....

My previous post (see http://trent-intovalue.blogspot.com/2010/09/end-of-documents.html) showed that a linear sequence of information is not a good match for expressing complex technical content.
That is also true for presentations, which are often supported by PowerPoint slides. If you have ever tried to introduce a complex topic and interact with the audience, you know that the right information is always a slide away.

Why do all office applications re-create PowerPoint instead of exploring more flexible concepts? I'm not really sure. A few days ago I stumbled upon Prezi, which tries to go a different route. The content is organized in a tree of topics, and navigation is much more focused on the context, the audience and the concrete situation.

Sample:

Why should you move beyond slides? on Prezi

This application is no silver bullet and far from perfect, but it at least shows that different concepts for software-based presentations are possible.

The end of documents....

Assumption

If you have to describe any technical subject, you are aware that knowledge is hard to express as a straight linear sequence of information topics. In most cases you have to structure the information into information topics and semantic connections between them => a network of information topics.

Nothing new and a trivial statement, you might say. All semantic concepts are built on this principle.

Yes, but why are most technical subjects still expressed in documents or slide shows?
Those media formats are linear by design. The reader has to follow the one and only linear flow defined by the author of the document. In the best case the author is able to find one of the adequate linear paths through the network of information, and the reader is therefore able to understand the described subject. But even in this case, getting the whole picture, identifying ways to extend the provided information, embedding it into a different subject etc. isn't possible, or at least requires re-constructing the information tree in your mind.

Ask yourself why you are using

word processing software to define project information (requirements, design specification, test specification, ....)
PowerPoint to introduce a particular problem domain
word processing software to record the results of a workshop (also known as workshop minutes)
....

The answer is simple: because we simply got used to it, and there are no mainstream alternative media formats out there that can be used without at least one significant constraint (effort to implement and train, difficult to share, ....). The complete office suites still remain rooted in the old linear concepts. Even new players in this business adopt this paradigm (e.g. Google Docs).

Alternatives?

I'm pretty sure that in the future documents will be replaced by applications that provide a way to describe subjects as short topics and make it easy to connect those topics with semantic links (e.g. depends on, contains, .....). A document in this scenario is just one path through the network of information for one particular use case. This kind of application can replace today's word processing software without losing any important feature.
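A minimal Python sketch of that idea, assuming a hypothetical set of topics and typed links: one linear "document" is derived from the network by taking the wanted topics plus their prerequisites and ordering them so that every topic follows the topics it depends on. The ordering uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+); `reading_path` is an illustrative name, not an existing tool.

```python
from graphlib import TopologicalSorter

# Hypothetical topic network: each link is (source, relation, target).
LINKS = [
    ("install", "depends on", "requirements"),
    ("configure", "depends on", "install"),
    ("troubleshoot", "depends on", "configure"),
    ("overview", "contains", "install"),
    ("overview", "contains", "configure"),
]


def reading_path(links, wanted):
    """Derive one linear path: the wanted topics plus all their prerequisites,
    each topic placed after the topics it depends on.

    Only "depends on" links constrain the order in this sketch; "contains"
    links stay part of the network but are not followed here.
    """
    deps = {}
    stack = list(wanted)
    while stack:  # collect wanted topics and their transitive prerequisites
        topic = stack.pop()
        if topic in deps:
            continue
        deps[topic] = {t for s, rel, t in links if s == topic and rel == "depends on"}
        stack.extend(deps[topic])
    return list(TopologicalSorter(deps).static_order())


print(reading_path(LINKS, ["troubleshoot"]))
# ['requirements', 'install', 'configure', 'troubleshoot']
```

A different use case ("I only want to configure") simply asks for a different path through the same network, without copying any topic.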

In the domain of technical writing, topic-based authoring (e.g. using the DITA information architecture) is becoming more and more popular. The main use case there is to (re-)use information as much as possible to reduce creation and maintenance costs. From my point of view that is "only" an important side effect of having the content defined in a much more usable form: not a linear sequence of information reflecting one group of authors' view, but a more or less complete set of information topics linked together. Today's results are still linear documents of some format (PDF, online help formats, ....), but that only corroborates the belief that linear documents are mainstream.

Open Issue

The usability of topic-based authoring isn't sufficient today. It is more or less a hand-crafted creation of enriched information. To go mainstream, usability is the most important factor. Creating and linking information must be at least as easy as using mind mapping tools (e.g. FreeMind, MindManager) combined with an easy-to-use structured topic content editor (tools like Xopus or XMAX go in this direction).

More to come? Let's see....

Tuesday, August 17, 2010

Open Source CCMS

Stumbled upon "Calenco XML CMS", an open source (AGPL) CCMS (for further info about the different kinds of CMS domains, see my post: http://trent-intovalue.blogspot.com/2010/03/stumbled-upon-microsoft-sharepoint-cms.html).
From the list of features published on their website it looks promising. It seems to be the first CCMS application available as open source. I haven't evaluated the application so far, but I definitely will. Basic features, of course, but if such an application gets a strong user and developer community, the business case for the small CCMS vendors might become tricky. We'll see...
It currently supports DITA 1.1 and DocBook 4 and 5. How much of each feature set is covered is also a matter for the evaluation.

Wednesday, August 11, 2010

usage of open source

Working in the software development domain with a focus on XML processing, you have a lot of major, stable open source infrastructure to use.

But how do you decide whether to use a certain project for your specific needs? You have to consider the following basic questions:

What is the goal of the project, and does it match the goal of your usage?
If you are not able to answer the question, or if the two goals don't match, the roadmaps are likely to diverge => you cannot participate in improvements and in some cases are no longer able to upgrade to newer versions.
How big is the user community?
The more people use the project, the more use cases are implemented and can be considered stable.
Is your use case similar to that of a significant group of the user community?
Same rationale as in points one and two.
How many active developers are working on the project?
The activity multiplied by the number of developers, divided by the size of the project, gives you an impression of how mature the current code is and how sustainable development will be.

Note: big is not a value in itself. A small but focused developer community is sometimes a better choice, as long as their interest in the project stays stable.
How active is the development?
Same rationale as above.
Does the license fit my use case or company policy?
And of course the most important one: does the project fit my system requirements?
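The developer-activity heuristic from the list (activity multiplied by the number of developers, divided by the size of the project) can be written down as a tiny Python sketch. The inputs (commits per month as "activity", lines of code as "size") and the names `ProjectStats` and `maturity_indicator` are hypothetical choices for illustration; the resulting number only makes sense for comparing candidate projects with each other.

```python
from dataclasses import dataclass


@dataclass
class ProjectStats:
    # Hypothetical inputs, e.g. collected from the project website or Ohloh.
    commits_per_month: float  # "activity"
    active_developers: int    # "amount of developers"
    lines_of_code: int        # "size of the project"


def maturity_indicator(p: ProjectStats) -> float:
    """activity * developers / size, with size measured in thousands of lines.

    Only meaningful for comparing candidate projects with each other.
    """
    if p.lines_of_code <= 0:
        raise ValueError("project size must be positive")
    return p.commits_per_month * p.active_developers / (p.lines_of_code / 1000)


small = ProjectStats(commits_per_month=40, active_developers=3, lines_of_code=20_000)
large = ProjectStats(commits_per_month=200, active_developers=25, lines_of_code=500_000)
print(maturity_indicator(small), maturity_indicator(large))  # 6.0 10.0
```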

How do you get this information? You can investigate the project's website and talk with the project community and the developers. And you might use Ohloh (http://www.ohloh.net/), a community for open source developers that provides information about open source projects, their code and their community. A great source for open source....

XQuery Design Patterns

A nice summary of XQuery design patterns (see http://patterns.28msec.com/). This site also contains a link to an online XQuery engine based on Zorba (see http://try.zorba-xquery.com/).

The real power of XQuery comes with a database hosting the data for dynamic retrieval and processing. Today there are several databases out there supporting XQuery processing, some even as open source.

Open Source

eXist (http://exist.sourceforge.net/)
XQuery sandbox here http://demo.exist-db.org/exist/sandbox/sandbox.xql
sedna (http://modis.ispras.ru/sedna/)
XQuery sandbox here http://wikixmldb.org/xq/
BaseX (http://www.inf.uni-konstanz.de/dbis/basex/)
XQuery sandbox here http://phobos101.inf.uni-konstanz.de/basex/demo
brand-new TNTBase (http://tntbase.org/)
XQuery sandbox here http://alpha.tntbase.mathweb.org:8080/tntbase/ores/Try.html
uses Subversion and Berkeley DB XML. This implementation adds more XML processing infrastructure and aims to be an XML pipelining infrastructure that includes an XML database.
Oracle Berkeley DB XML (http://www.oracle.com/us/products/database/berkeley-db/index-066571.html)
There is a commercial version available as well.

Commercial

The big 3 relational DB vendors (MS, Oracle, IBM) provide XQuery support. This approach relies on relational storage of XML fragments and is therefore only suitable for certain use cases.
MarkLogic (http://www.marklogic.com/)
the most powerful XML DB currently available, from my point of view.
no online sandbox available, but a free developer edition can be downloaded from http://developer.marklogic.com/products
....

Monday, May 17, 2010

Why, What and How

If you want to define an RFQ (Request for Quotation) / RFP (Request for Proposal), or you have to answer such requests, you are forced to define or understand the requirements for a particular product or solution.

There are lots of good sources showing how to write and handle requirements. But often they miss one important fact that makes the creation, prioritization and understanding of requirements much easier: the Why instead of the What.

Rationale

First set the baseline for effectiveness. (Why)
Then define what you need to be effective. (What)
And finally define the efficiency. (How)

What is the rationale, the reason for a particular requirement? If you try to understand the Why, or if you are forced to define the Why, the resulting requirements (and the way they are used) are much more valid than without doing so.

The "What" does not tell anyone the real intention of a solution. The Why provides the motivation and the business value, and therefore the baseline for effectiveness that decides whether a requirement has the right to exist.

ToDo

Before you start to define any requirement, try to define the major "Whys" you intend to address. Keep in mind that there must be no requirement that cannot be derived from those high-level Whys.
Derive each use case / requirement from one of the major Why areas and add a specific rationale for the individual use case / requirement.
Classify / prioritize the requirements based on the rationale.
Derive the "How". In this step the statement of efficiency is the key to selecting the right solution.
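A minimal Python sketch of the first three ToDo steps, with hypothetical names and sample data: each requirement records the Why it is derived from plus its own rationale, a simple check flags any requirement that cannot be traced back to a defined Why or lacks a rationale, and prioritization is just sorting by the priority assigned from the rationale.

```python
from dataclasses import dataclass


@dataclass
class Requirement:
    rid: str
    what: str
    why_id: str     # the high-level "Why" this requirement is derived from
    rationale: str  # specific reason for this individual requirement
    priority: int   # 1 = highest, assigned based on the rationale


def check_traceability(whys: dict[str, str], reqs: list[Requirement]) -> list[str]:
    """List requirements that cannot be derived from a defined Why or lack a rationale."""
    problems = []
    for r in reqs:
        if r.why_id not in whys:
            problems.append(f"{r.rid}: no matching Why '{r.why_id}'")
        if not r.rationale.strip():
            problems.append(f"{r.rid}: missing rationale")
    return problems


# Hypothetical sample data.
whys = {"W1": "Reduce localization costs"}
reqs = [
    Requirement("R1", "Reuse approved translations", "W1",
                "Avoid paying twice for the same text", priority=1),
    Requirement("R2", "Support dark mode", "W9", "", priority=3),
]
print(check_traceability(whys, reqs))
# ["R2: no matching Why 'W9'", 'R2: missing rationale']
print([r.rid for r in sorted(reqs, key=lambda r: r.priority)])  # ['R1', 'R2']
```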

Result

The defined requirements are much easier to understand, and alternatives can be identified and qualified much more easily.
Prioritization can be coordinated with management much more easily, because the consequences of a decision for the "Why", and therefore for the company / department goals, are always clear, even if someone does not have detailed knowledge of the subject matter.

=> ensures effectiveness
The definition of the system requirements (the How) can focus on efficiency.
=> ensures efficiency

To see the same idea from a sales / marketing point of view, watch: Start with Why: How Great Leaders Inspire Everyone to Take Action.

Sunday, May 16, 2010

facebook - openbook

If you have a Facebook account or know people who have one, try http://youropenbook.org/. I'm always amazed at the data people commit to companies like Facebook.

Try it out as long as it is available.

XSLT Version 2.1

The XSLT Version 2.1 draft has been published. As mentioned in the draft, the main goal is to make streaming transformations easier for implementors.

Streaming makes life easier whenever you have to process huge input streams of information: it reduces the memory footprint and delivers already processed chunks to the next consumer before the complete stream has been processed.
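The XSLT 2.1 streaming features themselves are not shown here. As an illustration of the streaming principle only, a minimal Python sketch using the standard library's `xml.etree.ElementTree.iterparse`: each record is handed to the consumer as soon as it has been parsed, and the already processed part of the tree is discarded. The element name `record` and the helper `stream_records` are hypothetical, and the sketch assumes records are direct children of the root element.

```python
import io
import xml.etree.ElementTree as ET


def stream_records(source, tag="record"):
    """Yield the text of each <tag> element as soon as it has been parsed.

    Assumes the records are direct children of the root element; clearing the
    root after each record keeps memory use flat however large the input is.
    """
    events = ET.iterparse(source, events=("start", "end"))
    _, root = next(events)  # the first event is the start of the root element
    for event, elem in events:
        if event == "end" and elem.tag == tag:
            yield elem.text  # hand the processed chunk to the next consumer
            root.clear()     # drop everything parsed so far


data = io.BytesIO(b"<records><record>a</record><record>b</record></records>")
for text in stream_records(data):
    print(text)  # the consumer handles one record at a time
```

Because `stream_records` is a generator, the consumer pulls records one by one instead of waiting for a complete in-memory tree.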

On the other hand, XSLT 2.0, which has been a recommendation for 3 years, has so far not been widely adopted by implementors of XSLT engines (Saxon, AltovaXML and a few commercial ones).

Why? The main reason is the complexity compared to the corresponding benefit. Implementing an XSLT engine that passes the 2.0 conformance tests is expensive and complex compared to the benefit for most users / use cases. As in any commercial product, each new feature / requirement should be weighed against the value and the cost / effort it brings into the standard.

The interesting question would be: how do you measure the benefits of a feature? How do you interview the "users" of a standard? There is no trivial answer, and that might be the main reason why standards tend to become overly complex.

Monday, April 19, 2010

CMIS beta implementations

A good overview of existing CMIS implementations (client and server) can be found here:

http://www-10.lotus.com/ldd/lqwiki.nsf/dx/11122008094143AMWEBK95.htm

Because CMIS version 1.0 is not finally approved yet (see also http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=cmis), all of those implementations can be considered "beta". Hopefully the standard will fare better than WebDAV, where the initial hype never turned into substantial support in major, publicly available products and infrastructure components.

Time will tell.....

Monday, March 08, 2010

CMS of what kind?

Stumbled upon the "Microsoft SharePoint: The CMS Killer" blog post, a personal view of how MS SharePoint fits ECMS use cases.
The conclusions might be wrong or right, depending on what your understanding of CMS is.

Problem

There is no common understanding of the term CMS, not even of the derived term ECMS.

Cause

CMS means Content Management System. Based on this definition it is an application (system) to manage content. That's trivial, but what does content really mean? Content is anything and everything, and most of it can be managed within IT systems as well.

Illustration

General categories of content maintained by IT applications:

Structured content
Content maintained and structured as a collection of data (e.g. order or offer data). Content in this context means a collection of records with a given structure.
This type of data is typically maintained with applications called ERP or other systems of this kind (e.g. ALM systems).
Unstructured content
Content maintained within documents. The content within a document is not addressable outside the application the document was created with (its semantics only make sense in one specific usage scenario).
This type of data is typically maintained with applications called DMS or Web-CMS (maintaining HTML) systems.

The spread of XML introduced a third category of content:

Semi-structured content
Content used to create documents (information products) of some kind. The content within the document is addressable outside the application the document was created with.
OR
Content maintained as system independent instance of data (e.g. order or offer data).
In both cases XML is the common format today.
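To make "addressable outside the application" concrete, a minimal Python sketch with a hypothetical DITA-like topic (element names, ids and text are illustrative only): any XML-aware tool, here the standard library's `xml.etree.ElementTree`, can pick out a single fragment by its id, independently of the editor that created the topic.

```python
import xml.etree.ElementTree as ET

# Hypothetical DITA-like topic; element names, ids and text are illustrative.
TOPIC = """<topic id="install">
  <title>Installing the product</title>
  <body>
    <p id="prereq">Check the system requirements first.</p>
    <p id="steps">Run the installer and follow the wizard.</p>
  </body>
</topic>"""

root = ET.fromstring(TOPIC)
# Address one fragment by its id, e.g. to reuse it in another publication.
prereq = root.find(".//p[@id='prereq']")
print(prereq.text)  # Check the system requirements first.
```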

Each CMS has a history in one of the mentioned areas and has its specific strengths and weaknesses for usage in different domains.

The corresponding consultants know about this issue and try to create domain-specific names for specific usages of systems:

DBMS
Main goal is to maintain relational data used by dedicated applications on top.
DMS
Main goal is to maintain documents
Web-CMS
Main goal is to maintain intranet / extranet / internet sites
ECMS
Enterprise-CMS
Main goal is to provide enterprise ready workflow and records management on top of DMS feature set
CCMS
Component-CMS
Main goal is to maintain content stored in XML for single-source publishing
?
Anything I missed; of course there are plenty of buzzwords / domains out there describing mixed-scenario usage.

Because none of the mentioned definitions are "formally approved", each vendor uses the term that gets the most customer attention in its target domain and tries to fulfill use cases from other domains as well. Because each CMS has a particular implementation history, its usage is limited in most cases, even if the vendor claims to cover most of them.

Examples

A C-CMS vendor might support XML usage and publishing very well but not scale if enterprise workflow or records management is required.
An ECMS vendor has enterprise BPM support built in but lacks sophisticated XML semantics and functions.
A Web-CMS makes creating your Internet presence easy but lacks reuse of the same content for printed documents.
.....

Summary

This reflects the current state of content management. I expect that in the next few years new and maybe existing systems will move into the semi-structured content area, and some of them might succeed. They might reach the final goal that content can be created, maintained, re-purposed and published for different user communities from one single source.

Until that stage is reached... coming back to the initial "Microsoft SharePoint: The CMS Killer" statement: ask the author what kind of content use case he has in mind, and you can validate the statement.

Sunday, March 07, 2010

we live for our customers....

....really?

Do you work with a customer-centric mindset?
Do you work in a customer-centric organization?

Your first, quick answer might be: yes, of course. My company does everything to make the customer happy; that's where our revenue comes from. Of course.....

That was my first thought as well.

Then I looked at several websites, including my own company's. You find service descriptions, white papers, success stories, awards, everything focused on promoting the company's own portfolio of services / products.

Have a look at your company's website. Does it look different?

To understand the customer means to understand the challenges, needs and questions they have. You have to answer the question: what is the business value you can bring to your customers, instead of leaving them alone to select a specific product or service you offer?

What is the reason for that?

It is much harder to define and understand the problems your customers have, and how you can provide business value for them in their own terms and definitions. Showing what you did and do, and letting the potential customer decide whether that might help them, is much easier.

By the way, you will be much more successful if you at least try to think through the problems your potential customers have to solve....

top most dangerous programming errors

The list "Top 25 Most Dangerous Programming Errors" makes clear that the software domain still has much to learn. Are you involved in software development projects, either as a project manager, software architect or software developer? If so, are you sure that the mentioned errors do not happen in your daily work, or in the work you are responsible for?

I think there is still much work to do until software development can be called a "proven engineering discipline".

The more applications are hosted in the cloud, the more mission-critical such issues will become for the target group of the application.

By the way, the list is a good reminder of low-level non-functional requirements and topics for peer review....
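As a concrete peer-review example, SQL injection (CWE-89) is one of the entries on that list. A minimal Python sketch using the standard library's `sqlite3` module and a hypothetical `users` table contrasts building SQL by string formatting with a parameterized query; the function names are illustrative.

```python
import sqlite3


def find_user_unsafe(conn, name):
    # Vulnerable: user input is pasted into the SQL text (SQL injection).
    return conn.execute(f"SELECT name FROM users WHERE name = '{name}'").fetchall()


def find_user_safe(conn, name):
    # Parameterized query: the driver passes `name` as data, never as SQL.
    return conn.execute("SELECT name FROM users WHERE name = ?", (name,)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])

attack = "x' OR '1'='1"
print(find_user_unsafe(conn, attack))  # returns every row of the table
print(find_user_safe(conn, attack))    # []
```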

Sunday, January 17, 2010

DITA DTD: tools to support specialization

If you are new to DITA and want to create a custom DTD configuration, with or without custom specialization, this tool is a good starting point.

You may ask why. Even if you do not want to introduce custom semantics into your data model, you should at least define which domains are part of your configuration. For a summary of why to go this way, read http://drmacros-xml-rants.blogspot.com/2007/04/dita-standard-practice-always-make.html.

The "DITA DTD Generator" is available as an online version. The source code can be found here: http://code.google.com/p/dita-generator/.

If you have to support custom semantics (which is common in enterprise usage of DITA), you might use the "DITA Visual Specialization Manager" or read the "DITA Specialization Tutorial".

As always, using the right tool is only the baseline. The more advanced task is to identify what you need based on your business case.

Negate expressions using regex

A not-so-rare question when using regex is "match all strings that don't contain the word foo". That isn't something regex is made for. However, searching the web shows that the expression

(?:(?!REGEX).)*

where REGEX must be replaced with the expression to be negated. For example, ^(?:(?!(foo1|foo2)).)*$ matches only strings that contain neither foo1 nor foo2. The anchors ^ and $ matter: without them the pattern also finds an (empty) match inside any string.

More details and background can be found here: http://www.perlmonks.org/?node_id=588315#588368
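A minimal Python sketch of the expression above, using the standard `re` module; `contains_none` is a hypothetical helper name. `re.fullmatch` plays the role of the ^ and $ anchors, `re.escape` keeps the words literal, and `re.DOTALL` is an added assumption so that the dot also crosses line breaks.

```python
import re


def contains_none(text: str, words: list[str]) -> bool:
    """True if text contains none of the given words.

    At every position the negative lookahead (?!...) checks that none of the
    words starts there before the dot consumes one character; fullmatch
    requires this to hold across the whole string.
    """
    alternatives = "|".join(map(re.escape, words))
    pattern = re.compile(rf"(?:(?!{alternatives}).)*", re.DOTALL)
    return pattern.fullmatch(text) is not None


print(contains_none("bar baz", ["foo1", "foo2"]))   # True
print(contains_none("a foo2 b", ["foo1", "foo2"]))  # False
```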

Saturday, January 09, 2010

utilities, utilities

Daily work on Windows is pretty unproductive without so-called utilities: small tools and extensions that increase efficiency.

Hanselman's Ultimate Tools List provides a good summary of utilities for Windows. I have used many of them for years as well.

Friday, January 01, 2010

Agile development means valuable end-user assistance

Agile development methods have become more and more popular during the last few years. The software domain in particular tries to speed up development cycles and deliver new features on customer demand. The most common method today is Scrum.
There are many good books, articles, forums and blogs out there describing the goals and advantages of diving into agile software development. But few or none of them cover the creation of end-user assistance, like online help for the different user groups, and the additional required information that must be shipped with the product.

#1 principle of the Agile Manifesto

Our highest priority is to satisfy the customer
through early and continuous delivery
of valuable software.

"Valuable software" of course means software the customer is able to use; therefore end-user documentation is an essential part of the software development process.

That means you have to change your process:

no more writing documents / additional information; instead
write valuable information that is assigned to, or an integral part of, the product component it belongs to
a document is one potential output, if customer value can be achieved with this type of output
no more prioritizing, testing, deploying and measuring customer feedback for the application and the user documentation as if they were separate applications; instead
while working on a particular feature, prioritize, test and deploy all valuable user information, and work together with the customer to get feedback on the complete application (incl. user documentation).

Sounds simple, but as you know, it isn't simple at all. Many companies currently work around the mentioned issues or just ignore the resulting consequences.

By the way, one of the few blog posts I found: "Writing End-User Documentation in an Agile Development Environment".

What is your experience with, and solution to, this challenge?

Do you discuss end-user documentation requirements during sprint planning / review...?
Do you prioritize end-user documentation along with the corresponding user story?
Do you test your end-user documentation while testing your application?
....?
Do you localize user documentation while localizing your application?
Do you use dedicated professional writers or other team members for creating end-user information products?
....?

predictions for 2010

Same procedure as every year, but as Yogi Berra said:

It’s tough to make predictions, especially about the future.

So I'll stop trying to add one more to the currently 5,130,000 predictions according to Google, or 12,600,000 according to Bing.

In this sense --

The ultimate function of prophecy is not to tell the future, but to make it. Your successful past will block your visions of the future.

Joel A. Barker