How to parse XLSX, CSV files in ORACLE APEX?
PARSE Function
This is the main parser function. It allows to parse XML, XLSX, CSV or JSON files and returns a generic table of the following structure:Structure of parser return
LINE_NUMBER, COL001, COL002, COL003, COL004 ... COL300
We have line_number and 300 other columns, having said that we can read 300 columns from the above four file formats.
Parser Function parameters.
FUNCTION parse (p_content IN BLOB,
p_file_name IN VARCHAR2 DEFAULT NULL,
p_file_type IN t_file_type DEFAULT NULL,
p_file_profile IN CLOB DEFAULT NULL,
p_detect_data_types IN VARCHAR2 DEFAULT 'Y',
p_decimal_char IN VARCHAR2 DEFAULT NULL,
p_xlsx_sheet_name IN VARCHAR2 DEFAULT NULL,
p_row_selector IN VARCHAR2 DEFAULT NULL,
p_csv_row_delimiter IN VARCHAR2 DEFAULT lf,
p_csv_col_delimiter IN VARCHAR2 DEFAULT NULL,
p_csv_enclosed IN VARCHAR2 DEFAULT '"',
p_skip_rows IN PLS_INTEGER DEFAULT 0,
p_add_headers_row IN VARCHAR2 DEFAULT 'N',
p_file_charset IN VARCHAR2 DEFAULT 'AL32UTF8',
p_max_rows IN NUMBER DEFAULT NULL,
p_return_rows IN NUMBER DEFAULT NULL,
p_store_profile_to_collection IN VARCHAR2 DEFAULT NULL)
RETURN wwv_flow_t_parser_table PIPELINED;
Parameter
|
Description
|
P_CONTENT
|
The file content to be parsed as a BLOB
|
P_FILE_NAME
|
The name of the file; only used to derive the file type.
Either P_FILE_NAME, P_FILE_TYPE or P_FILE_PROFILE must
be passed in.
|
P_FILE_TYPE
|
The type of the file to be parsed. Use this to explicitly pass
the file type in.
Either P_FILE_NAME, P_FILE_TYPE or P_FILE_PROFILE must
be passed in.
|
P_FILE_PROFILE
|
File profile to be used for parsing. The file profile might have
been computed in a previous PARSE() invocation. If passed in again,
the function will skip some profile detection logic and use the passed in
profile - in order to improve performance.
|
P_DETECT_DATA_TYPES
|
Whether to detect data types (NUMBER, DATE, TIMESTAMP) during
parsing. If set to 'Y', the function will compute the file profile and
also add data type information to it. If set to 'N', no data types will
be detected and all columns will be VARCHAR2. Default is 'Y'.
|
P_DECIMAL_CHAR
|
Use this decimal character when trying to
detect NUMBER data types. If not specified,the procedure will
auto-detect the decimal character.
|
P_XLSX_SHEET_NAME
|
For XLSX workbooks. The name of the worksheet to parse. If
omitted, the function uses the first worksheet found.
|
P_ROW_SELECTOR
|
For JSON and XML files. Pointer to the array / list of rows
within the JSON or XML file. If omitted, the function will:
·
For XML files: Use "/*/*"
(first tag under the root tag) as the row selector.
·
For JSON files: Look for a JSON
array and use the first array found.
|
For XML files: Use "/*/*" (first tag under the root
tag) as the row selector.
|
|
For JSON files: Look for a JSON array and use the first array
found.
|
|
P_CSV_ROW_DELIMITER
|
Override the default row delimiter for CSV parsing.
|
P_CSV_COL_DELIMITER
|
Use a specific CSV column delimiter. If omitted, the function
will detect the column delimiter based on the first row contents.
|
P_CSV_ENCLOSED
|
Override the default enclosure character for CSV parsing.
|
P_SKIP_ROWS
|
Skip the first N rows when parsing.
|
P_ADD_HEADERS_ROW
|
For XML, JSON: Emit the column headers (tag, attr names) as the
first row.
|
P_FILE_CHARSET
|
File encoding, if not UTF-8 (AL32UTF8).
|
P_MAX_ROWS
|
Stop parsing after P_MAX_ROWS have been returned.
|
P_RETURN_ROWS
|
Amount of rows to return. This is useful when the parser shall
to parse more rows (for data type detection), than it is supposed to return.
When the specified amount of rows have been emitted, the function will
continue parsing (and refining the detected data types) until P_MAX_ROWS has
been reached, or until the ROWNUM < x clause of the SQL query
kicks in and stops execution.
|
P_STORE_PROFILE_TO_COLLECTION
|
Store the File profile which has been computed during parse into
a collection. The collection will be cleared, if it exists. Only be used for
computed profiles.
|
Examples:
CREATE TABLE loader_files
(
id NUMBER,
file_content BLOB
);
How to parse XLSX?
Insert excel file, Let's take a workbook with two worksheets.Worksheet 1
Line No
|
ITEMS
|
Store
|
1
|
Apple
|
New Delhi
|
2
|
Shampoo
|
Noida
|
3
|
Milk
|
Gurugram
|
4
|
Veggies
|
Pune
|
Worksheet 2
Item
|
Qty
|
10001
|
Ab1
|
10002
|
Ab2
|
10003
|
Ab3
|
10004
|
Ab4
|
10005
|
Ab5
|
10006
|
Ab6
|
10007
|
Ab7
|
10008
|
Ab8
|
10009
|
Ab9
|
10010
|
Ab10
|
10011
|
Ab11
|
10012
|
Ab12
|
10013
|
Ab13
|
10014
|
Ab14
|
10015
|
Ab15
|
10016
|
Ab16
|
10017
|
Ab17
|
10018
|
Ab18
|
10019
|
Ab19
|
10020
|
Ab20
|
10021
|
Ab21
|
SELECT line_number,
col001,
col002,
col003
FROM TABLE (apex_data_parser.parse (p_content => (SELECT file_content
FROM loader_files
WHERE id = 1),
p_file_name => 'data.xlsx',
p_xlsx_sheet_name => 'sheet1.xml',
p_skip_rows => 0));
Output
LINE_NUMBER
|
COL001
|
COL002
|
COL003
|
1
|
Line No
|
ITEMS
|
Store
|
2
|
1
|
Apple
|
New Delhi
|
3
|
2
|
Shampoo
|
Noida
|
4
|
3
|
Milk
|
Gurugram
|
5
|
4
|
Veggies
|
Pune
|
To skip the headers in the file, set p_skip_rows to 1.
SELECT line_number,
col001,
col002,
col003
FROM TABLE (apex_data_parser.parse (p_content => (SELECT file_content
FROM loader_files
WHERE id = 1),
p_file_name => 'data.xlsx',
p_xlsx_sheet_name => 'sheet1.xml',
p_skip_rows => 1));
LINE_NUMBER
|
COL001
|
COL002
|
COL003
|
1
|
1
|
Apple
|
New Delhi
|
2
|
2
|
Shampoo
|
Noida
|
3
|
3
|
Milk
|
Gurugram
|
4
|
4
|
Veggies
|
Pune
|
Data from the Second Sheet.
SELECT line_number,Output
col001,
col002,
col003
FROM TABLE (apex_data_parser.parse (p_content => (SELECT file_content
FROM loader_files
WHERE id = 1),
p_file_name => 'data.xlsx',
p_xlsx_sheet_name => 'sheet2.xml',
p_skip_rows => 1));
LINE_NUMBER
|
COL001
|
COL002
|
COL003
|
1
|
Item
|
Qty
|
|
2
|
10001
|
Ab1
|
|
3
|
10002
|
Ab2
|
|
.
|
.
|
.
|
.
|
.
|
.
|
.
|
.
|
20
|
10019
|
Ab19
|
|
21
|
10020
|
Ab20
|
|
22
|
10021
|
Ab21
|
How to get the Column List from Worksheet?
SELECT *
FROM TABLE (
apex_data_parser.get_columns (apex_data_parser.discover (
p_content => (SELECT file_content
FROM loader_files
WHERE id = 1),
p_file_name => 'large.xlsx')));
How to get the list of worksheets?
SELECT *
FROM TABLE (apex_data_parser.get_xlsx_worksheets (
p_content => (SELECT file_content
FROM loader_files
WHERE id = 1)));
SHEET_SEQUENCE
|
SHEET_DISPLAY_NAME
|
SHEET_FILE_NAME
|
SHEET_PATH
|
1
|
A
|
sheet1.xml
|
worksheets/sheet1.xml
|
2
|
B
|
sheet2.xml
|
worksheets/sheet2.xml
|
How to parse CSV files?
Files data
SELECT *
FROM TABLE (apex_data_parser.parse (p_content => (SELECT file_content
FROM loader_files
WHERE id = 2),
p_file_name => 'data.csv'));
LINE_NUMBER
|
COL001
|
COL002
|
COL003
|
1
|
CO
|
SO
|
ASHISH
|
2
|
1
|
2
|
Sahay
|
No comments:
Post a Comment
Please do not add any spam links or abusive comments.