Formex 4 Parser
This module can parse the tables (TBL elements) of a Formex 4 file.
The TBL element is used to mark up a Formex table, which contains text structured in columns with related data.
A table usually contains the following information:
- an optional title (TITLE),
- one or more structured text blocks (GR.SEQ) in order to mark up optional explanatory information about the table content, located between the title of the table and the table itself,
- optionally a group of notes called in the table (GR.NOTES),
- the corpus of the table (CORPUS).
When building the internal table object, the parser will:
- interpret the title (TITLE) and the structured text blocks (GR.SEQ) as rows. The nature attribute of each row will be “title” and “text-block” respectively.
- interpret the group of notes (GR.NOTES) as a row of nature “footer”.
- interpret the corpus of the table (CORPUS) as the body of the table. The nature attribute of each row will be “body”.
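The mapping above can be sketched with the standard library alone. This is an illustration, not benker's implementation: the helper `row_natures` and the lookup table `NATURE_BY_TAG` are hypothetical names, and only the element names and nature values are taken from the description above.

```python
import xml.etree.ElementTree as ET

# Nature assigned to each known child of TBL, as described above.
NATURE_BY_TAG = {
    "TITLE": "title",
    "GR.SEQ": "text-block",
    "GR.NOTES": "footer",
    "CORPUS": "body",
}


def row_natures(tbl):
    """Return (tag, nature) pairs for the known children of a TBL element.

    Hypothetical helper: unknown children are simply skipped.
    """
    return [
        (child.tag, NATURE_BY_TAG[child.tag])
        for child in tbl
        if child.tag in NATURE_BY_TAG
    ]


tbl = ET.fromstring("<TBL><TITLE/><GR.SEQ/><GR.NOTES/><CORPUS/></TBL>")
for tag, nature in row_natures(tbl):
    print(tag, "->", nature)
```

In the real parser each CORPUS row gets the “body” nature individually; the sketch only shows which nature is associated with which part of the table.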
Note
Since the Formex table structure is not suitable for typesetting/page layout, this parser is also able to parse CALS-like attributes (for instance frame, cols, colsep, rowsep, …) and CALS-like elements (for instance colspec). These attributes and elements may be added with the Formex 4 builder, see FormexBuilder.
New in version 0.5.0.
- benker.parsers.formex.ElementType
alias of lxml.etree._Element
- class benker.parsers.formex.FormexParser(builder, formex_ns=None, cals_ns=None, embed_gr_notes=False, **options)
Bases: benker.parsers.base_parser.BaseParser
Formex 4 tables parser
- contains_ie(fmx_node)
- get_cals_qname(name)
- get_formex_qname(name)
- parse_cals_row_styles(fmx_elem)
Parse the row styles.
- Parameters
fmx_elem (ElementType) – Formex element: ROW, TI.BLK, STI.BLK or GR.NOTES.
- Returns
CSS-like styles
Changed in version 0.5.1: The “vertical-align” style is built from the @cals:valign attribute.
- parse_fmx_cell(fmx_cell)
Parse a CELL element.
- Parameters
fmx_cell (ElementType) – table cell
Changed in version 0.5.2: Add support for the @cals:cellstyle attribute (extension). This attribute is required for two-way conversion of Formex tables to CALS and vice versa. If the CELL/@TYPE and the ROW/@TYPE are different, we add a specific “cellstyle” style. This style will keep the CELL/@TYPE value.
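The “cellstyle” rule can be illustrated as follows. This is a minimal sketch, not the library's code: the helper `cell_styles` is hypothetical, and it assumes that a CELL without a TYPE attribute inherits the type of its ROW, which the text above does not state.

```python
import xml.etree.ElementTree as ET


def cell_styles(row, cell):
    """Return the styles of a cell according to the rule above (hypothetical helper)."""
    styles = {}
    row_type = row.get("TYPE")
    # Assumption: a cell without TYPE inherits the type of its row.
    cell_type = cell.get("TYPE", row_type)
    # Only record the cell type when it differs from the row type.
    if cell_type != row_type:
        styles["cellstyle"] = cell_type
    return styles


row = ET.fromstring(
    '<ROW TYPE="NORMAL"><CELL COL="1" TYPE="HEADER"/><CELL COL="2"/></ROW>'
)
print([cell_styles(row, cell) for cell in row])
```

Keeping the CELL/@TYPE value only when it differs avoids repeating the row type on every cell, and gives a reverse conversion enough information to restore the original attribute.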
- parse_fmx_colspec(cals_colspec)
Parse a CALS-like colspec element.
For instance:
<colspec colname="c1" colnum="1" colsep="1" rowsep="1" colwidth="30mm" align="center"/>
- Parameters
cals_colspec (ElementType) – CALS-like colspec element.
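To show what kind of information such an element carries, the sketch below reads the attributes of the colspec example with the standard library. The helpers `read_colspec` and `_flag` are hypothetical, and the returned dictionary layout is an assumption made for illustration; it is not the structure benker builds internally.

```python
import xml.etree.ElementTree as ET


def _flag(value):
    """Convert a CALS "0"/"1" flag to a bool, keeping None when the attribute is absent."""
    return None if value is None else value == "1"


def read_colspec(elem):
    """Collect the attributes of a colspec element into a dict (hypothetical helper)."""
    colnum = elem.get("colnum")
    return {
        "colname": elem.get("colname"),
        "colnum": int(colnum) if colnum is not None else None,
        "colsep": _flag(elem.get("colsep")),
        "rowsep": _flag(elem.get("rowsep")),
        "colwidth": elem.get("colwidth"),
        "align": elem.get("align"),
    }


colspec = ET.fromstring(
    '<colspec colname="c1" colnum="1" colsep="1" rowsep="1" '
    'colwidth="30mm" align="center"/>'
)
print(read_colspec(colspec))
```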
- parse_fmx_corpus(fmx_corpus)
- parse_fmx_row(fmx_row)
Parse a ROW element which contains CELL elements.
This element may be in a BLK.
- Parameters
fmx_row (ElementType) – table row
- parse_fmx_sti_blk(fmx_sti_blk)
Parse a STI.BLK element, considering it like a row of a single cell.
For instance:
<STI.BLK COL.START="1" COL.END="1">
  <P>STI.BLK title</P>
</STI.BLK>
- Parameters
fmx_sti_blk (ElementType) – subtitle of the BLK.
- parse_fmx_ti_blk(fmx_ti_blk)
Parse a TI.BLK element, considering it like a row of a single cell.
For instance:
<TI.BLK COL.START="1" COL.END="2">
  <P><HT TYPE="BOLD">TI.BLK title</HT></P>
</TI.BLK>
- Parameters
fmx_ti_blk (ElementType) – title of the BLK.
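Treating a TI.BLK or STI.BLK as a row of a single cell means that the cell spans the columns from COL.START to COL.END. The sketch below illustrates that idea under stated assumptions: the helper `blk_title_cell` is hypothetical, and the defaults used when the attributes are missing (start at column 1, end at the start column) are chosen for the example only.

```python
import xml.etree.ElementTree as ET


def blk_title_cell(elem):
    """Describe the single cell built from a TI.BLK or STI.BLK (hypothetical helper).

    Returns a tuple (first_column, width, text).
    """
    # Assumed defaults: start at column 1, end at the start column.
    start = int(elem.get("COL.START", "1"))
    end = int(elem.get("COL.END", start))
    text = "".join(elem.itertext()).strip()
    return start, end - start + 1, text


ti_blk = ET.fromstring(
    '<TI.BLK COL.START="1" COL.END="2">'
    ' <P><HT TYPE="BOLD">TI.BLK title</HT></P> '
    "</TI.BLK>"
)
print(blk_title_cell(ti_blk))
```

With the TI.BLK example above, the cell starts at column 1 and is two columns wide.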
- parse_gr_notes(fmx_gr_notes)
Parse a GR.NOTES element, considering it like a row of a single cell.
For instance:
<GR.NOTES>
  <TITLE>
    <TI>
      <P>GR.NOTES Title</P>
    </TI>
  </TITLE>
  <NOTE NOTE.ID="N0001">
    <P>Table note</P>
  </NOTE>
</GR.NOTES>
- Parameters
fmx_gr_notes (ElementType) – group of notes called in the table (GR.NOTES)
Changed in version 0.5.1: GR.NOTES elements can be embedded if the embed_gr_notes option is True.
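To see what the single footer cell would contain for the GR.NOTES example, the sketch below lists the notes by identifier using the standard library. The helper `collect_notes` is hypothetical and does not reflect how benker stores the notes or how the embed_gr_notes option is implemented.

```python
import xml.etree.ElementTree as ET


def collect_notes(gr_notes):
    """Map each NOTE.ID to the text of its NOTE element (hypothetical helper)."""
    return {
        note.get("NOTE.ID"): "".join(note.itertext()).strip()
        for note in gr_notes.iter("NOTE")
    }


gr_notes = ET.fromstring(
    """
    <GR.NOTES>
      <TITLE>
        <TI>
          <P>GR.NOTES Title</P>
        </TI>
      </TITLE>
      <NOTE NOTE.ID="N0001">
        <P>Table note</P>
      </NOTE>
    </GR.NOTES>
    """
)
print(collect_notes(gr_notes))
```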
- parse_table(fmx_corpus)
Convert a <CORPUS> Formex element into a table object.
- Parameters
fmx_corpus (ElementType) – Formex element.
- Return type
ElementType
- Returns
Table.
- parse_tbl_styles(fmx_tbl)
Parse a TBL element and extract the styles.
- Parameters
fmx_tbl (ElementType) – table
- Returns
dictionary of styles and nature
- setup_table(styles=None, nature=None)
- transform_tables(tree)