CSV Reader
The CSV Read transform takes Character Separated Value (CSV) files as the data-source.
This transform can parse a wide variety of CSV data including Hierarchical Data, tab delimited and unicode files.
Example
A typical CSV file, shown below, has a fixed number of fields which are separated by commas. The inherent fixed number of fields means that however many columns the file has, every record must have that number of columns, otherwise an error will occur. These can contain an empty value, but would be held between the two Separating Characters.
Reader > File Layout
The file layout tab is where the file and hierarchical dataset options are specified.
Transform Id
The unique user-defined name for the transform.
Data Source
The controller type is defined here. This is the source from where the CSV data will be read, which can be Email, File or Http.
See Input/Output Controllers (IO Controller) for more information.
Controller Options
This is the expandable options section beneath the Data Source drop-down. These options change according to the data source selected.
See Input/Output Controllers (IO Controller) for more information.
Field Delimiter
The character(s) used to separate each of the fields.
The delimiter may use 1 or more characters.
To specify tab delimited files, enter \t.
Header Rows
The fixed number of rows at the start of the file which are filled with content that is not data.
Since the CSV is a text file, the most likely use of header rows will be field headers.
Footer Rows
The fixed number of rows at the end of the file also filled with content other than data. All content within the footer rows will be ignored when reading the data.
Mapping Style
- By Position
- Fields will be mapped according to their position.
- By Field Heading
- Fields will be mapped according to their headings. Available only when headings are detected within the file.
Hierarchical Data
A hierarchical dataset can be generated from a CSV file, if the records within the file are identified as having different transaction types (i.e. headers and details).
For the hierarchy to be created:
- There must be a field within the data that identifies the type of record (i.e. header/details/sub-detail etc.)
- The field identifying the transaction type must be located in the same field position for all records.
Example
The screenshot below illustrates a file with a hierarchical structure.
The first field identifies the transaction type for all records. In this example, “1” denotes a sales order header, and “2” denotes an order detail/line item.
Hierarchical Dataset Options
To configure a hierarchical dataset, expand Hierarchical Dataset Options.
Hierarchical Dataset
Keyed Fields
The key field is used to construct a hierarchical dataset. For this reason, keys are assigned either on import, if the dataset is already hierarchised, or in the hierarchy transform.
By giving a record a numerical value, the parent-child relationships are defined. Child records are inserted into the dataset when its keys match those of its parent’s.
When setting the keys, each child transaction must have at least one more key defined than its parent.
These are defined in the field mapping tab of the Hierarchy transform.
Ordered Data
On import, the structure and relationship IMan identifies between fields relies on the order of the data within the file.
Record Type Field Id
This dropdown provides the ability to select the field position identifying the transaction type. In the screenshot example above this is the first field which in this case is named 'LineType'.
This drop down is enabled only when the hierarchical dataset checkbox is selected.
Hierarchy Detection Results
Pressing the Refresh button will cause IMan to parse the CSV file(s) per the IO Controller settings to detect the different transaction types. In the example above, two transaction types are detected: 1 and 2.
Reader > Field Mapping
Transaction Identifier
If a Hierarchical Data dataset, each Transaction detected in the CSV file will be listed in this drop-down; if it is a flat dataset, only one Id will appear here. Selecting an Id from this list will open it for editing.
Changing the selected Id on this drop-down will save the currently active Id and open the newly selected transaction into the field mapping grid.
Transaction Id
The Transaction Id can be edited here, if the user chooses.
To see changes made to this Id, it is necessary to close off the setup window, and load it again from the Transform Setup screen.
Parent Id (Hierarchical Datasets Only)
For flat datasets, or when the topmost parent is currently selected from the Transaction Identifier, the same transaction Id will appear in this drop-down. This is because a flat dataset has no parent-child relationships, and a topmost parent can have no parents itself. For other transaction Id's, the available parents will appear here. The appropriate Parent Id is selected in order to build the correct hierarchical relationships between the imported records.
Position
The position of the field in the file. The position of a field is fixed and cannot be edited.
Field Name
The unique name of the field.
Import
When set to True, the field is included in the resultant dataset.
When set to False, the field will be dropped from the dataset.
Type
The data Field Types of the field.
Key (Hierarchical Datasets Only)
The key applies to Keyed Hierarchy datasets only because it is used to insert fields into the correct place within the dataset depending on their relationship to other records.
The Key fields of the record being inserted must match the parent records keys.
Each child transaction type must have at least one more key defined than its parent transaction type. This is due to the nature of the structure being built.
Populating the Field Mapping Grid
If valid values have been entered into the File Path, Name and Delimiter fields and there are no errors after pressing the refresh button, the fields from the CSV file will populate the grid on the Field Mapping tab. When this does occur, the file has been successfully read to extract the transaction types and fields.
File Parsing Process
In conjunction with options specified on the Options tab, the read transform engine uses the following inbuilt properties to read a CSV file:
Line Delimiter
The transform engine auto detects the line delimiter.
The delimiter can be any of the following:
Name |
Symbol |
Unicode Value |
---|---|---|
Windows |
CRLF |
U+000D & U+000A |
Unix |
LF |
U+000D |
Mac (Up to Os 9) |
CR |
U+000A |
Form Feed |
FF |
U+000C |
Next Line |
NEL |
U+0085 |
Paragraph Separator |
PS |
U+2029 |
Line Separator |
LS |
U+2028 |
Text Qualifier
The read transform automatically detects and removes text qualifiers (“ or ‘) around fields.