I am currently in the process of building a data store which, to a large extent, will be populated from input in XML format. To do this I create transformations using the XMLInput step. This works fine when my XML document (or the repeating part of it) is just one level deep and there is a known number of subordinate elements, such as this:

<repeatingElement>
  <subElement1>value1</subElement1>
  <subElement2>value2</subElement2>
  <subElement3>value3</subElement3>
</repeatingElement>

However, when there are several levels under the repeating element and the number of such elements can vary, I am running into problems. Assume this XML input structure:

<repeatingElement>
  <subElement1>value1</subElement1>
  <subCompositeElement1>
    <subSubElement1>svalue1</subSubElement1>
    <subSubElement2>svalue2</subSubElement2>
    <subSubElement3>svalue3</subSubElement3>
  </subCompositeElement1>
  <subCompositeElement2>
    <subSubElement1>svalue1</subSubElement1>
    <subSubElement2>svalue2</subSubElement2>
    <subSubElement3>svalue3</subSubElement3>
  </subCompositeElement2>
  ...
  <subCompositeElementn>
    <subSubElement1>svalue1</subSubElement1>
    <subSubElement2>svalue2</subSubElement2>
    <subSubElement3>svalue3</subSubElement3>
  </subCompositeElementn>
  <subElement3>value3</subElement3>
</repeatingElement>

In this case, I want to insert records into a table, using the subElement1 value and the values in each of the subCompositeElements to make up one row of output. If the number of subCompositeElements were fixed, it would be possible to have the input stream enumerate all of them in the same row and then try to transform them into one row per subCompositeElement. However, since the number of such elements is variable, I cannot use this method, nor does it seem that I can define the subCompositeElement as my repeating element, since I need the subElement1 value in my output too.
I have been trying to solve this by merging multiple XML Inputs, one with the repeatingElement as my repeating element and one with the subCompositeElement as my repeating element, but with no luck. There is no element in the subCompositeElement that can be used to join it to its parent repeatingElement; the subCompositeElements belong to a particular repeatingElement only by virtue of their position in the XML document. Does anyone have any examples or hints on how to read this kind of complex XML file, where the input needs to come from several levels in it? Thanks in advance for any insight!
Well, I usually pre-process those kinds of files with Perl.
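The pre-processing idea can be sketched in Python instead of Perl (a swapped-in language, not the poster's actual script). It assumes the element names from the question, a hypothetical <root> wrapper around the repeating elements, and that composite elements are recognized by the tag prefix "subCompositeElement". Each composite element becomes one row that also carries its parent's subElement1 and subElement3 values, and missing optional sub-elements become empty strings so that every row has the same columns.

```python
import xml.etree.ElementTree as ET

# Fixed column layout: every output row has exactly these keys.
COLUMNS = ["subElement1", "subElement3", "composite",
           "subSubElement1", "subSubElement2", "subSubElement3"]


def flatten(xml_text):
    """Return one dict per subCompositeElement*, copying parent values into each."""
    root = ET.fromstring(xml_text)
    rows = []
    # Assumption: <repeatingElement> may appear anywhere below the (hypothetical) root.
    for rep in root.iter("repeatingElement"):
        # Parent-level values shared by every child row; "" when absent.
        parent = {
            "subElement1": rep.findtext("subElement1", default=""),
            "subElement3": rep.findtext("subElement3", default=""),
        }
        for child in rep:
            # Assumption: composite children are identified by their tag prefix.
            if not child.tag.startswith("subCompositeElement"):
                continue
            row = dict(parent)
            row["composite"] = child.tag
            for name in ("subSubElement1", "subSubElement2", "subSubElement3"):
                row[name] = child.findtext(name, default="")
            rows.append(row)
    return rows


if __name__ == "__main__":
    sample = """<root>
      <repeatingElement>
        <subElement1>value1</subElement1>
        <subCompositeElement1>
          <subSubElement1>a1</subSubElement1>
          <subSubElement2>a2</subSubElement2>
          <subSubElement3>a3</subSubElement3>
        </subCompositeElement1>
        <subCompositeElement2>
          <subSubElement1>b1</subSubElement1>
        </subCompositeElement2>
        <subElement3>value3</subElement3>
      </repeatingElement>
    </root>"""
    for r in flatten(sample):
        print([r[c] for c in COLUMNS])
```

The join problem from the question disappears because the parent values are copied into each child row while the document is walked, so the position in the tree never has to be reconstructed afterwards.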
The kind of XML input Kettle currently supports is "limited". The limitation comes mostly from the fact that you have to be able to make rows from the input, and those rows always have to be of the same format. Regards, Sven
Is there a way to include the external pre-processing as a step in the Spoon transformation (for example, using Java to call XSLT transformations that reshape the file), or do you typically run it as a completely separate process (executing your Perl scripts, that is) before the resulting flattened XML files become visible to your Kettle transformations?
In a job you can run an XSLT transformation. You could also do some pre-processing from a cmd job entry. Personally, I usually do it in a separate pre-formatting step, so the ETL is not even aware it's touching XML files, just plain ASCII text files. Same with multi-level XML output. I have to say it has already saved me a couple of serious headaches.
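For the plain-text route, a minimal sketch, assuming the nested layout from the question: it writes the flattened rows as CSV so that a text-file input step reads them instead of the XML Input step. The function name, its parameters, and the rule that any element with children counts as a nested group are illustrative assumptions; only the Python standard library is used.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_csv(xml_text, repeating_tag, parent_fields, child_fields):
    """Flatten nested XML into CSV text with one fixed header row (hypothetical helper)."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    # A fixed header keeps every row the same shape for the ETL tool.
    writer.writerow(parent_fields + child_fields)
    root = ET.fromstring(xml_text)
    for rep in root.iter(repeating_tag):
        parent_values = [rep.findtext(f, default="") for f in parent_fields]
        # Assumption: any child that itself has children is a nested group.
        for group in rep:
            if len(group) == 0:
                continue  # leaf element: already captured via parent_fields
            child_values = [group.findtext(f, default="") for f in child_fields]
            writer.writerow(parent_values + child_values)
    return out.getvalue()


if __name__ == "__main__":
    doc = ("<r><rep><k>1</k><g><x>a</x><y>b</y></g>"
           "<g><x>c</x></g></rep></r>")
    print(xml_to_csv(doc, "rep", ["k"], ["x", "y"]), end="")
```

The output of such a pre-formatting step is ordinary delimited text, which matches the point above: the transformation itself never has to deal with the nesting.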
Thanks for the tip. I was already going down the XSLT path; the process that provides Kettle with the XML files can do XSLT transforms too, so my fallback plan is to have it do that, so that Kettle will only ever see flattened XML files. Thanks again!
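XSLT itself is not part of the Python standard library, so this sketch is not an XSLT stylesheet; it produces the same kind of flattened XML with ElementTree instead. It assumes a hypothetical output layout of <rows><row>…</row></rows>, with one <row> per nested group and the parent's leaf values copied into each row, so that the XML Input step only needs "row" as its single repeating element.

```python
import xml.etree.ElementTree as ET


def to_flat_xml(xml_text, repeating_tag, parent_fields):
    """Emit one <row> per nested group (hypothetical layout), copying parent fields."""
    src = ET.fromstring(xml_text)
    rows = ET.Element("rows")
    for rep in src.iter(repeating_tag):
        # Parent-level leaf values shared by every row built from this element.
        parent = [(f, rep.findtext(f, default="")) for f in parent_fields]
        for group in rep:
            if len(group) == 0:
                continue  # leaf element: handled through parent_fields
            row = ET.SubElement(rows, "row")
            for name, value in parent:
                ET.SubElement(row, name).text = value
            # Record which group the row came from (e.g. subCompositeElement2).
            ET.SubElement(row, "group").text = group.tag
            # Pull the group's own leaf children up one level into the row.
            for leaf in group:
                ET.SubElement(row, leaf.tag).text = (leaf.text or "").strip()
    return ET.tostring(rows, encoding="unicode")


if __name__ == "__main__":
    doc = "<r><rep><k>1</k><g1><x>a</x></g1><g2><x>b</x></g2></rep></r>"
    print(to_flat_xml(doc, "rep", ["k"]))
```

Keeping a "group" column preserves the information that was previously carried only by element position, which is what made the direct join impossible.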
Are there plans afoot for version 3.0 to support the ingestion of more complex XML using the Input XML component? We have 18,000 XML files with many levels of nested hierarchies (basically huge volumes of data per file) which need to be loaded on a daily basis, and this would sure make life easier.
Not that I'm aware of right now. The problem is that the files that can be processed are limited by the fact that you need to put rows on the output stream, and these rows need to be of the same format. Not that it's an excuse, but all of the other ETL tools have the same problem... XML is easy to generate but problematic to process when it is deeply nested with optional components. Regards, Sven
There is one simple piece of functionality that could make a lot of XML processing easier: atm we can only.
Related article:
http://forums.pentaho.org/showthread.php?t=56629