Formex 4 Parser¶
This module can parse the tables (TBL elements) of a Formex 4 file.
The TBL element is used to mark up a Formex table, which actually contains text structured
in columns with related data.
A table usually contains the following information:
- an optional title (TITLE),
- one or more structured text blocks (GR.SEQ), which mark up optional explanatory information about the table content and are located between the title of the table and the table itself,
- optionally, a group of notes called in the table (GR.NOTES),
- the corpus of the table (CORPUS).
When building the internal table object, this builder will:
- interpret the title (TITLE) and the structured text blocks (GR.SEQ) as rows. The nature attribute of each row will be “title” and “text-block” respectively.
- interpret the group of notes (GR.NOTES) as a row of nature “footer”.
- interpret the corpus of the table (CORPUS) as the body of the table. The nature attribute of each row will be “body”.
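The mapping above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the parser's implementation: the sample table, the SINGLE_ROW_NATURES dictionary and the row_natures function are hypothetical names, and the rows are listed in document order, which may differ from the order used in the internal table object.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; real Formex 4 tables carry more attributes and content.
SAMPLE = """\
<TBL NO.SEQ="0001" COLS="2">
  <TITLE><TI><P>Table title</P></TI></TITLE>
  <GR.SEQ><P>Explanatory text</P></GR.SEQ>
  <GR.NOTES><NOTE NOTE.ID="N0001"><P>Table note</P></NOTE></GR.NOTES>
  <CORPUS>
    <ROW><CELL COL="1">A</CELL><CELL COL="2">B</CELL></ROW>
    <ROW><CELL COL="1">C</CELL><CELL COL="2">D</CELL></ROW>
  </CORPUS>
</TBL>
"""

# Elements that become a single row, and the nature given to that row.
SINGLE_ROW_NATURES = {"TITLE": "title", "GR.SEQ": "text-block", "GR.NOTES": "footer"}


def row_natures(fmx_tbl):
    """Return the nature of each row, in document order (illustration only)."""
    natures = []
    for child in fmx_tbl:
        if child.tag in SINGLE_ROW_NATURES:
            natures.append(SINGLE_ROW_NATURES[child.tag])
        elif child.tag == "CORPUS":
            # Every ROW of the corpus is a row of nature "body".
            natures.extend("body" for _ in child.iter("ROW"))
    return natures


natures = row_natures(ET.fromstring(SAMPLE))
print(natures)
```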
Note
Since the Formex table structure is not suitable for typesetting/page layout, this parser can
also parse CALS-like attributes (for instance frame, cols, colsep,
rowsep, …) and CALS-like elements (for instance colspec). These attributes and
elements may be added with the Formex 4 builder;
see FormexBuilder.
New in version 0.5.0.
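As a rough sketch of how namespaced CALS-like attributes can be read from a Formex element, the example below uses the standard library. The namespace URI, the cals_qname helper and the sample element are assumptions made for illustration; they are not taken from this module, and cals_qname is not the module's get_cals_qname.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, used only for this illustration.
CALS_NS = "http://example.com/cals"


def cals_qname(name, ns=CALS_NS):
    """Build a qualified name in Clark notation, or a bare name if ns is None."""
    return "{" + ns + "}" + name if ns else name


fmx_tbl = ET.fromstring(
    '<TBL xmlns:cals="http://example.com/cals" NO.SEQ="0001" COLS="2" '
    'cals:frame="all" cals:colsep="1" cals:rowsep="1"/>'
)

# Keep only the CALS-like attributes that are actually present.
cals_attrs = {
    name: fmx_tbl.get(cals_qname(name))
    for name in ("frame", "colsep", "rowsep")
    if fmx_tbl.get(cals_qname(name)) is not None
}
print(cals_attrs)
```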
- benker.parsers.formex.ElementType¶
alias of lxml.etree._Element
- class benker.parsers.formex.FormexParser(builder, formex_ns=None, cals_ns=None, embed_gr_notes=False, **options)¶
Bases: benker.parsers.base_parser.BaseParser
Formex 4 tables parser
- contains_ie(fmx_node)¶
- get_cals_qname(name)¶
- get_formex_qname(name)¶
- parse_cals_row_styles(fmx_elem)¶
Parse the row styles
- Parameters
fmx_elem (ElementType) – Formex element: ROW, TI.BLK, STI.BLK or GR.NOTES.
- Returns
CSS-like styles
Changed in version 0.5.1: The “vertical-align” style is built from the @cals:valign attribute.
- parse_fmx_cell(fmx_cell)¶
Parse a CELL element.
- Parameters
fmx_cell (ElementType) – table cell
Changed in version 0.5.2: Add support for the @cals:cellstyle attribute (extension). This attribute is required for two-way conversion of Formex tables to CALS and vice versa. If the CELL/@TYPE and the ROW/@TYPE are different, we add a specific “cellstyle” style. This style will keep the CELL/@TYPE value.
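The “cellstyle” rule can be pictured with a small standalone function. This is a sketch of the rule as described, not the parser's code: the cell_styles function and the sample row are hypothetical, and it assumes that a CELL without @TYPE inherits the type of its ROW.

```python
import xml.etree.ElementTree as ET


def cell_styles(fmx_row, fmx_cell):
    """Return the extra styles of a cell (illustration only).

    Assumption: a CELL without @TYPE inherits the type of its ROW.
    """
    row_type = fmx_row.get("TYPE")
    cell_type = fmx_cell.get("TYPE", row_type)
    styles = {}
    if cell_type != row_type:
        # Keep the CELL/@TYPE value so that it can be restored later.
        styles["cellstyle"] = cell_type
    return styles


fmx_row = ET.fromstring(
    '<ROW TYPE="HEADER">'
    '<CELL COL="1">Name</CELL>'
    '<CELL COL="2" TYPE="NORMAL">Value</CELL>'
    "</ROW>"
)
all_styles = [cell_styles(fmx_row, cell) for cell in fmx_row.iter("CELL")]
print(all_styles)
```

Only the second cell gets a “cellstyle” entry, because its type differs from the type of the row.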
- parse_fmx_colspec(cals_colspec)¶
Parse a CALS-like colspec element.
For instance:
<colspec colname="c1" colnum="1" colsep="1" rowsep="1" colwidth="30mm" align="center"/>
- Parameters
cals_colspec (ElementType) – CALS-like colspec element.
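To make the colspec example concrete, here is a minimal sketch that reads its attributes with the standard library. The read_colspec function and the shape of the returned dictionary are hypothetical; the styles actually produced by parse_fmx_colspec are not documented here, and the sketch assumes the colnum attribute is present.

```python
import xml.etree.ElementTree as ET

cals_colspec = ET.fromstring(
    '<colspec colname="c1" colnum="1" colsep="1" rowsep="1" '
    'colwidth="30mm" align="center"/>'
)


def read_colspec(elem):
    """Collect the column position and CALS-like attributes (illustration only)."""
    info = {"colnum": int(elem.get("colnum")), "colname": elem.get("colname")}
    # Hypothetical representation: separators as booleans, the rest kept as text.
    for name in ("colsep", "rowsep"):
        if elem.get(name) is not None:
            info[name] = elem.get(name) == "1"
    for name in ("colwidth", "align"):
        if elem.get(name) is not None:
            info[name] = elem.get(name)
    return info


colspec_info = read_colspec(cals_colspec)
print(colspec_info)
```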
- parse_fmx_corpus(fmx_corpus)¶
- parse_fmx_row(fmx_row)¶
Parse a ROW element which contains CELL elements.
This element may be in a BLK.
- Parameters
fmx_row (ElementType) – table row
- parse_fmx_sti_blk(fmx_sti_blk)¶
Parse a STI.BLK element, treating it as a row with a single cell.
For instance:
<STI.BLK COL.START="1" COL.END="1">
  <P>STI.BLK title</P>
</STI.BLK>
- Parameters
fmx_sti_blk (ElementType) – subtitle of the BLK.
- parse_fmx_ti_blk(fmx_ti_blk)¶
Parse a TI.BLK element, treating it as a row with a single cell.
For instance:
<TI.BLK COL.START="1" COL.END="2">
  <P><HT TYPE="BOLD">TI.BLK title</HT></P>
</TI.BLK>
- Parameters
fmx_ti_blk (ElementType) – title of the BLK.
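Treating a TI.BLK or STI.BLK as a single-cell row amounts to turning COL.START and COL.END into a position and a width. The sketch below illustrates that arithmetic; the single_cell_row function, the keys of the returned dictionary and the defaults used when an attribute is missing are assumptions, not the parser's behaviour.

```python
import xml.etree.ElementTree as ET

fmx_ti_blk = ET.fromstring(
    '<TI.BLK COL.START="1" COL.END="2">'
    '<P><HT TYPE="BOLD">TI.BLK title</HT></P>'
    "</TI.BLK>"
)


def single_cell_row(elem):
    """Describe the single cell built from a TI.BLK or STI.BLK (illustration only)."""
    # Assumed defaults: start at column 1, end at the start column.
    start = int(elem.get("COL.START", "1"))
    end = int(elem.get("COL.END", str(start)))
    return {
        "x": start,                  # first column covered by the cell
        "width": end - start + 1,    # number of columns spanned
        "text": "".join(elem.itertext()).strip(),
    }


cell = single_cell_row(fmx_ti_blk)
print(cell)
```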
- parse_gr_notes(fmx_gr_notes)¶
Parse a GR.NOTES element, treating it as a row with a single cell.
For instance:
<GR.NOTES>
  <TITLE>
    <TI>
      <P>GR.NOTES Title</P>
    </TI>
  </TITLE>
  <NOTE NOTE.ID="N0001">
    <P>Table note</P>
  </NOTE>
</GR.NOTES>
- Parameters
fmx_gr_notes (ElementType) – group of notes called in the table (GR.NOTES)
Changed in version 0.5.1: GR.NOTES elements can be embedded if the embed_gr_notes option is True.
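For readers unfamiliar with the GR.NOTES structure, the following sketch walks the example above with the standard library and collects the title and the notes. It only shows how the element is laid out; the variable names are hypothetical and this is not how parse_gr_notes builds its footer row.

```python
import xml.etree.ElementTree as ET

fmx_gr_notes = ET.fromstring(
    "<GR.NOTES>"
    "<TITLE><TI><P>GR.NOTES Title</P></TI></TITLE>"
    '<NOTE NOTE.ID="N0001"><P>Table note</P></NOTE>'
    "</GR.NOTES>"
)

# Text of the optional title, or None when the group has no TITLE.
fmx_title = fmx_gr_notes.find("TITLE")
notes_title = "".join(fmx_title.itertext()).strip() if fmx_title is not None else None

# Text of each note, keyed by its NOTE.ID attribute.
notes = {
    note.get("NOTE.ID"): "".join(note.itertext()).strip()
    for note in fmx_gr_notes.iter("NOTE")
}
print(notes_title, notes)
```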
- parse_table(fmx_corpus)¶
Convert a <CORPUS> Formex element into a table object.
- Parameters
fmx_corpus (ElementType) – Formex element.
- Return type
ElementType
- Returns
Table.
- parse_tbl_styles(fmx_tbl)¶
Parse a TBL element and extract the styles.
- Parameters
fmx_tbl (ElementType) – table
- Returns
dictionary of styles and nature
- setup_table(styles=None, nature=None)¶
- transform_tables(tree)¶