Formex 4 Parser

This module can parse the tables (TBL elements) of a Formex 4 file.

The TBL element marks up a Formex table, that is, text structured in columns of related data.

A table usually contains the following information:

  • an optional title (TITLE),
  • one or more structured text blocks (GR.SEQ) in order to mark up optional explanatory information about the table content, located between the title of the table and the table itself,
  • optionally a group of notes called in the table (GR.NOTES),
  • the corpus of the table (CORPUS).

When building the internal table object, this builder will:

  • interpret the title (TITLE) and the structured text blocks (GR.SEQ) as rows. The nature attribute of each row will be “title” and “text-block” respectively.
  • interpret the group of notes (GR.NOTES) as a row of nature “footer”.
  • interpret the corpus of the table (CORPUS) as the body of the table. The nature attribute of each row will be “body”.
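
The mapping above can be illustrated with a minimal sketch. It does not use the parser itself: it relies only on the standard library, and the names NATURE_BY_TAG and row_natures are hypothetical helpers introduced for this example. The natures are reported in document order, which is not necessarily the order of the rows in the table object built by the real parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup: Formex element name -> row nature (as listed above).
NATURE_BY_TAG = {
    "TITLE": "title",
    "GR.SEQ": "text-block",
    "GR.NOTES": "footer",
}


def row_natures(tbl):
    """Return the nature of each row a TBL element would produce."""
    natures = []
    for child in tbl:
        if child.tag == "CORPUS":
            # Every ROW of the corpus becomes a row of nature "body".
            natures.extend("body" for _ in child.iter("ROW"))
        elif child.tag in NATURE_BY_TAG:
            natures.append(NATURE_BY_TAG[child.tag])
    return natures


SAMPLE = """\
<TBL NO.SEQ="0001" COLS="2">
  <TITLE><TI><P>Title</P></TI></TITLE>
  <GR.SEQ><P>Explanation</P></GR.SEQ>
  <GR.NOTES><NOTE NOTE.ID="N0001"><P>Table note</P></NOTE></GR.NOTES>
  <CORPUS>
    <ROW><CELL COL="1">A</CELL><CELL COL="2">B</CELL></ROW>
    <ROW><CELL COL="1">C</CELL><CELL COL="2">D</CELL></ROW>
  </CORPUS>
</TBL>
"""

natures = row_natures(ET.fromstring(SAMPLE))
print(natures)
```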

Note

Since the Formex table structure is not suitable for typesetting/page layout, this parser is also able to parse CALS-like attributes (for instance frame, cols, colsep, rowsep, …) and CALS-like elements (for instance colspec). These attributes and elements may be added with the Formex 4 builder, see FormexBuilder.

New in version 0.5.0.

benker.parsers.formex.ElementType

alias of lxml.etree._Element

class benker.parsers.formex.FormexParser(builder, formex_ns=None, cals_ns=None, embed_gr_notes=False, **options)

Bases: benker.parsers.base_parser.BaseParser

Formex 4 tables parser

contains_ie(fmx_node)
get_cals_qname(name)
get_formex_qname(name)
parse_cals_row_styles(fmx_elem)

Parse the row styles

Parameters:fmx_elem (ElementType) – Formex element: ROW, TI.BLK, STI.BLK or GR.NOTES.
Returns:CSS-like styles

Changed in version 0.5.1: The “vertical-align” style is built from the @cals:valign attribute.
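
As an illustration of the “vertical-align” style, here is a minimal sketch that reads a namespaced valign attribute from a ROW element. It is not the parser's implementation: the namespace URI, the row_styles helper and the pass-through of the attribute value are all assumptions made for the example (the real parser may translate the value).

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, used only for this example.
CALS_NS = "http://example.com/cals"


def row_styles(elem, cals_ns=CALS_NS):
    """Build a CSS-like style dictionary from the @cals:valign attribute."""
    styles = {}
    # ElementTree exposes namespaced attributes as "{uri}local-name".
    valign = elem.get("{%s}valign" % cals_ns)
    if valign is not None:
        # Assumption: the value is copied as-is.
        styles["vertical-align"] = valign
    return styles


row = ET.fromstring(
    '<ROW xmlns:cals="http://example.com/cals" cals:valign="middle">'
    '<CELL COL="1">A</CELL>'
    "</ROW>"
)
styles = row_styles(row)
print(styles)
```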

parse_fmx_cell(fmx_cell)

Parse a CELL element.

Parameters:fmx_cell (ElementType) – table cell
parse_fmx_colspec(cals_colspec)

Parse a CALS-like colspec element.

For instance:

<colspec
  colname="c1"
  colnum="1"
  colsep="1"
  rowsep="1"
  colwidth="30mm"
  align="center"/>
Parameters:cals_colspec (ElementType) – CALS-like colspec element.
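
The colspec above can be read with a few lines of standard-library code. This is only a sketch of the kind of information the element carries: read_colspec is a hypothetical helper, and the type conversions (integer column number, boolean separators) are assumptions, not the parser's documented behaviour.

```python
import xml.etree.ElementTree as ET


def read_colspec(colspec):
    """Collect the attributes of a CALS-like colspec into a plain dict."""
    attrs = colspec.attrib
    return {
        "colname": attrs.get("colname"),
        "colnum": int(attrs["colnum"]) if "colnum" in attrs else None,
        # Simplification: a missing separator attribute is treated as "0".
        "colsep": attrs.get("colsep") == "1",
        "rowsep": attrs.get("rowsep") == "1",
        "colwidth": attrs.get("colwidth"),
        "align": attrs.get("align"),
    }


colspec = ET.fromstring(
    '<colspec colname="c1" colnum="1" colsep="1" rowsep="1" '
    'colwidth="30mm" align="center"/>'
)
info = read_colspec(colspec)
print(info)
```
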
parse_fmx_corpus(fmx_corpus)
parse_fmx_row(fmx_row)

Parse a ROW element which contains CELL elements.

This element may be in a BLK.

Parameters:fmx_row (ElementType) – table row
parse_fmx_sti_blk(fmx_sti_blk)

Parse a STI.BLK element, considering it like a row of a single cell.

For instance:

<STI.BLK COL.START="1" COL.END="1">
  <P>STI.BLK title</P>
</STI.BLK>
Parameters:fmx_sti_blk (ElementType) – subtitle of the BLK.
parse_fmx_ti_blk(fmx_ti_blk)

Parse a TI.BLK element, considering it like a row of a single cell.

For instance:

<TI.BLK COL.START="1" COL.END="2">
  <P><HT TYPE="BOLD">TI.BLK title</HT></P>
</TI.BLK>
Parameters:fmx_ti_blk (ElementType) – title of the BLK.
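
Since TI.BLK and STI.BLK are each treated as a row of a single cell, the COL.START and COL.END attributes plausibly determine how many columns that cell covers. The sketch below shows this arithmetic; blk_title_span is a hypothetical helper, and the default values used when an attribute is missing are assumptions.

```python
import xml.etree.ElementTree as ET


def blk_title_span(elem):
    """Return (first column, number of columns) for a TI.BLK or STI.BLK."""
    # Assumption: a missing COL.START means the first column,
    # and a missing COL.END means a single-column cell.
    start = int(elem.get("COL.START", "1"))
    end = int(elem.get("COL.END", str(start)))
    return start, end - start + 1


ti_blk = ET.fromstring(
    '<TI.BLK COL.START="1" COL.END="2">'
    '<P><HT TYPE="BOLD">TI.BLK title</HT></P>'
    "</TI.BLK>"
)
sti_blk = ET.fromstring(
    '<STI.BLK COL.START="1" COL.END="1"><P>STI.BLK title</P></STI.BLK>'
)

ti_span = blk_title_span(ti_blk)
sti_span = blk_title_span(sti_blk)
print(ti_span, sti_span)
```
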
parse_gr_notes(fmx_gr_notes)

Parse a GR.NOTES element, considering it like a row of a single cell.

For instance:

<GR.NOTES>
  <TITLE>
    <TI>
      <P>GR.NOTES Title</P>
    </TI>
  </TITLE>
  <NOTE NOTE.ID="N0001">
    <P>Table note</P>
  </NOTE>
</GR.NOTES>
Parameters:fmx_gr_notes (ElementType) – group of notes called in the table (GR.NOTES)

Changed in version 0.5.1: GR.NOTES elements can be embedded if the embed_gr_notes option is True.

parse_table(fmx_corpus)

Convert a <CORPUS> Formex element into a table object.

Parameters:fmx_corpus (ElementType) – Formex element.
Return type:ElementType
Returns:Table.
parse_tbl_styles(fmx_tbl)

Parse a TBL element and extract the styles.

Parameters:fmx_tbl (ElementType) – table
Returns:dictionary of styles and nature
setup_table(styles=None, nature=None)
transform_tables(tree)