Archive for September, 2007

Kettle also coming out with a new RC

September 28, 2007

This is the RC week for open source ETL! A friend of mine forwarded me this email he got from Pentaho (I need to sign up for their mailing list; I never got around to it). Again, I'm not sure when I will have time to look at this RC, but new versions are always good. It shows that Pentaho, like Talend, continues to invest in its product.

Dear friends,

Even though we had our work cut out for us the last couple of months, there
was no sign of a slowdown the last couple of weeks. In fact, a couple of
long standing items on my TODO/WANTED list finally got in:

– The debugger with breakpoints, pause/resume
– Remote execution of jobs and transformations (using job entries)

At the same time, a series of bugs got fixed too, ranging in severity from
cosmetic to blocking.

There is always room for improvement, but it looks like we’ll have to go for a
feature freeze (RC1) sooner or later anyway. Let’s do it sooner rather than
later.
We had a chat internally at Pentaho and I thought that next Monday, October
1st would be a great day to kick RC1 out of the door.

I hope you will take the opportunity with me to do final testing and bug
fixing to make RC1 as stable as possible. For the next 4 to 8 weeks we’ll be
focusing on documentation and testing to ensure that 3.0.0 is as good as we
can humanly make it. If all goes as expected those efforts should bring us
an RC2 on October 29th and a release November 19th.

All the best,

Matt

Talend asked me to beta test their RC

September 26, 2007

A couple of weeks ago I signed up for Talend’s beta tester newsletter. Yesterday I got the following email from them. I’m not sure when I will have time to try this new version, but it seems to address some of the points discussed lately.

Here is the email:

Dear Talend Community Member,

We are proud to inform you that Talend Open Studio 2.2.0 Release Candidate is now available. This version contains all the features of Talend Open Studio 2.2.0 and we need you to track all problems that might exist in your Open Source data integration tool, before its release.

What’s new in this version?

The numerous new features of Talend Open Studio 2.2.0 include:
– enhancement of the management of contexts (GUI, new tContextDump component)
– export jobs as Java Web Services
– graphical expression builder

Talend Open Studio is now based on the latest Eclipse version (3.3), so you can benefit from all the improvements of this new framework (including support for Windows Vista).

We have also integrated new components:

Java:
– Support for more databases: AS/400 connector, generic JDBC connector
– Slowly Changing Dimensions for MySQL, Oracle, Ingres, MS SQL, DB2, Sybase (support for types 1, 2 & 3, support for Surrogate Keys, etc.)
– Support for stored procedures in Oracle, MS SQL, Ingres, MySQL, DB2
– Connection sharing for Oracle and PostgreSQL
– Support for LDIF/LDAP
– “Wait for file” and “Wait for SQL Data” to start a job upon the appearance of a file or of certain records in a table
– Flow merge and split (tUnite and tReplicate)
– Support for SCP

Perl:
– Multiple substitutions, simple and complex (tReplace)
– Connection sharing for Oracle and PostgreSQL
– Lookup with multiple matches
– “Wait for file” and “Wait for SQL Data” to start a job upon the appearance of a file or of certain records in a table
– Flow data metering
– File touch
– Flow merge and split (tUnite and tReplicate)
– Support for SCP

Performance of complex jobs has been significantly improved by passing data structures as references. Check out this scenario to see the performance enhancement: http://www.talendforge.org/wiki/doku.php?id=performances:scenario_3.

Please download and test this Release Candidate, read the documentation, go through the tutorials, chat with us on the Forum, suggest new features and report bugs on our Bugtracker, and check out our technical documentation on the Wiki…

Joining the Talend community is the best way to influence the progress of your preferred data integration solution!

The download is available at http://www.talend.com/download.php.
The community tools (Forum, Bugtracker, Changed Log, Wiki, Subversion, Trac, Flash tutorials…) are available at http://www.talendforge.org.

Thanks again for your support and your involvement!
Best regards,

The Talend Team

Managing Slowly Changing Dimensions

September 11, 2007

Back in July, Talend announced support for something new: Slowly Changing Dimensions (SCDs). I guess they were playing catch-up, because as far as I know Kettle has supported SCDs for a while. Never mind; I thought I would give it a try and compare how well both tools support SCDs.

Bottom line: both tools make SCD management super easy. Congratulations, guys: you made a pretty difficult concept easy to implement. Clearly, Talend’s implementation is still young; it is missing some features, such as surrogate keys or the ability to specify the end date. Kettle has more thorough functional coverage.
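For readers new to the concept, here is a minimal Type 2 sketch in Python. It is not how Kettle or Talend implement SCDs; it only illustrates the two features mentioned above, surrogate keys and end dates. The `DimRow` layout and the `apply_scd2` function are hypothetical, and the sketch assumes each source snapshot arrives as a dict keyed by natural (business) key.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DimRow:
    surrogate_key: int        # technical key owned by the warehouse (hypothetical layout)
    natural_key: str          # business key coming from the source system
    attributes: dict          # tracked attributes, e.g. {"city": "Boston"}
    valid_from: date          # start date of this version
    valid_to: Optional[date]  # end date; None marks the current version


def apply_scd2(dimension: list[DimRow], source: dict[str, dict], load_date: date) -> list[DimRow]:
    """Apply one source snapshot using Type 2 rules; mutates and returns `dimension`."""
    next_key = max((r.surrogate_key for r in dimension), default=0) + 1
    # Index only the current (open-ended) version of each member.
    current = {r.natural_key: r for r in dimension if r.valid_to is None}
    for nk, attrs in source.items():
        row = current.get(nk)
        if row is None:
            # New member: insert its first version with a fresh surrogate key.
            dimension.append(DimRow(next_key, nk, dict(attrs), load_date, None))
            next_key += 1
        elif row.attributes != attrs:
            # Changed member: set the end date on the old version, insert a new one.
            row.valid_to = load_date
            dimension.append(DimRow(next_key, nk, dict(attrs), load_date, None))
            next_key += 1
        # Unchanged member: nothing to do.
    return dimension
```

Each change produces a new row with a new surrogate key, so fact rows loaded earlier keep pointing at the version that was valid when they were loaded.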

Something that’s missing from both tools, however: Type 3 SCDs. OK, I’ll grant you this: in my years of consulting, I have never had to implement a Type 3 SCD. But it would still be good to have, just in case you need it 🙂
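For comparison, a Type 3 SCD keeps a limited amount of history in extra columns on the same row instead of adding rows. The sketch below assumes a hypothetical customer dimension with a single "region" attribute and one slot of history; the names `CustomerDim` and `apply_scd3` are illustrative only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CustomerDim:
    surrogate_key: int
    natural_key: str
    current_region: str
    previous_region: Optional[str] = None     # Type 3: one slot of history
    region_changed_on: Optional[date] = None  # when the last change happened


def apply_scd3(row: CustomerDim, new_region: str, load_date: date) -> None:
    """Type 3: move the current value to the 'previous' column, then overwrite it."""
    if new_region != row.current_region:
        row.previous_region = row.current_region
        row.current_region = new_region
        row.region_changed_on = load_date
```

The surrogate key never changes and only the most recent prior value survives, which is why Type 3 is rarely the right fit.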

From the performance standpoint, though, Talend clearly makes up for its functional gaps. I ran a test with 25,000 source records. When creating the dimension, TOS (Talend Open Studio) went through the process in 8.7 seconds, but it took Kettle 675 seconds! Updating the dimension, a much more resource-consuming process, took TOS 512 seconds and Kettle 1,323 seconds.

Which tells me something else: no vendor can claim to always be 50 or 100 times faster than others! Performance comparisons depend heavily on which test you run. In my case, TOS is 78 times faster than Kettle in the first test, but only 2.6 times faster in the second one.
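The ratios above follow directly from the reported timings; a few lines of Python reproduce them (the dictionary layout and the `speedup` helper are just for illustration).

```python
# Timings in seconds from the 25,000-record test described above.
timings = {
    "create dimension": {"TOS": 8.7, "Kettle": 675.0},
    "update dimension": {"TOS": 512.0, "Kettle": 1323.0},
}


def speedup(baseline: float, candidate: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline / candidate


ratios = {test: round(speedup(t["Kettle"], t["TOS"]), 1) for test, t in timings.items()}
for test, ratio in ratios.items():
    print(f"{test}: TOS is {ratio}x faster than Kettle")
```

The first ratio comes out at 77.6, which rounds to the 78 quoted above; the second is 2.6.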