Formex 4 Parser

This module can parse the tables (TBL elements) of a Formex 4 file.

The TBL element marks up a Formex table, that is, text structured in columns of related data.

A table usually contains the following information:

  • an optional title (TITLE),

  • one or more structured text blocks (GR.SEQ) that mark up optional explanatory information about the table content, located between the title of the table and the table itself,

  • optionally a group of notes called in the table (GR.NOTES),

  • the corpus of the table (CORPUS).

When building the internal table object, this builder will:

  • interpret the title (TITLE) and the structured text blocks (GR.SEQ) as rows. The nature attribute of each row will be “title” and “text-block” respectively.

  • interpret the group of notes (GR.NOTES) as a row of nature “footer”.

  • interpret the corpus of the table (CORPUS) as the body of the table. The nature attribute of each row will be “body”.
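
The mapping described in this list can be sketched with the standard library alone. This is an illustration, not benker's implementation: row_natures and NATURES are hypothetical names, and the assumption that every ROW found under CORPUS yields one “body” row is mine. Rows are returned in document order.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from a TBL child element to the nature of the row it produces.
NATURES = {"TITLE": "title", "GR.SEQ": "text-block", "GR.NOTES": "footer"}


def row_natures(tbl):
    """Return (tag, nature) pairs for the rows a TBL element would produce."""
    rows = []
    for child in tbl:
        if child.tag in NATURES:
            rows.append((child.tag, NATURES[child.tag]))
        elif child.tag == "CORPUS":
            # Assumption: each ROW found under CORPUS becomes one "body" row.
            rows.extend((row.tag, "body") for row in child.iter("ROW"))
    return rows


tbl = ET.fromstring(
    "<TBL><TITLE><TI><P>Title</P></TI></TITLE>"
    "<GR.NOTES><NOTE NOTE.ID='N0001'><P>Note</P></NOTE></GR.NOTES>"
    "<CORPUS><ROW><CELL COL='1'>A</CELL></ROW><ROW><CELL COL='1'>B</CELL></ROW></CORPUS>"
    "</TBL>"
)
print(row_natures(tbl))
```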

Note

Since the Formex table structure is not suitable for typesetting/page layout, this parser is also able to parse CALS-like attributes (for instance frame, cols, colsep, rowsep, …) and CALS-like elements (for instance colspec). These attributes and elements may be added with the Formex 4 builder, see FormexBuilder.

New in version 0.5.0.

benker.parsers.formex.ElementType

alias of lxml.etree._Element

class benker.parsers.formex.FormexParser(builder, formex_ns=None, cals_ns=None, embed_gr_notes=False, **options)

Bases: benker.parsers.base_parser.BaseParser

Formex 4 tables parser

contains_ie(fmx_node)
get_cals_qname(name)
get_formex_qname(name)
parse_cals_row_styles(fmx_elem)

Parse the row styles.

Parameters

fmx_elem (ElementType) – Formex element: ROW, TI.BLK, STI.BLK or GR.NOTES.

Returns

CSS-like styles

Changed in version 0.5.1: The “vertical-align” style is built from the @cals:valign attribute.
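
As a rough illustration of the 0.5.1 change, the sketch below reads a namespaced valign attribute and turns it into a “vertical-align” style. It is not the library's code: row_vertical_align is a hypothetical helper, the namespace URI is a placeholder (in practice the value given as cals_ns would apply), and copying the attribute value unchanged is an assumption.

```python
import xml.etree.ElementTree as ET

# Assumption: placeholder namespace URI, standing in for the parser's cals_ns.
CALS_NS = "http://example.com/cals"


def row_vertical_align(fmx_elem, cals_ns=CALS_NS):
    """Return {'vertical-align': value} if the element has a cals:valign attribute."""
    # ElementTree exposes namespaced attributes as "{uri}localname".
    valign = fmx_elem.get("{%s}valign" % cals_ns)
    return {"vertical-align": valign} if valign else {}


row = ET.fromstring(
    '<ROW xmlns:cals="http://example.com/cals" cals:valign="middle">'
    '<CELL COL="1">A</CELL></ROW>'
)
print(row_vertical_align(row))
```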

parse_fmx_cell(fmx_cell)

Parse a CELL element.

Parameters

fmx_cell (ElementType) – table cell

Changed in version 0.5.2: Add support for the @cals:cellstyle attribute (extension). This attribute is required for two-way conversion of Formex tables to CALS and vice versa. If the CELL/@TYPE and the ROW/@TYPE are different, we add a specific “cellstyle” style. This style will keep the CELL/@TYPE value.
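
The comparison between CELL/@TYPE and ROW/@TYPE can be sketched as follows. This is a minimal illustration rather than the parser's actual code: cellstyle_for is a hypothetical helper, and the defaults (a ROW without @TYPE counts as "NORMAL", a CELL without @TYPE inherits its row's type) are assumptions.

```python
import xml.etree.ElementTree as ET


def cellstyle_for(fmx_row, fmx_cell):
    """Return {'cellstyle': CELL/@TYPE} when it differs from ROW/@TYPE, else {}."""
    # Assumptions: a ROW without @TYPE counts as "NORMAL",
    # and a CELL without @TYPE inherits the type of its row.
    row_type = fmx_row.get("TYPE", "NORMAL")
    cell_type = fmx_cell.get("TYPE", row_type)
    return {"cellstyle": cell_type} if cell_type != row_type else {}


row = ET.fromstring(
    '<ROW TYPE="HEADER">'
    '<CELL COL="1">Name</CELL>'
    '<CELL COL="2" TYPE="NORMAL">n/a</CELL>'
    "</ROW>"
)
styles = [cellstyle_for(row, cell) for cell in row.findall("CELL")]
print(styles)
```

Only the second cell gets a “cellstyle” entry, because its type differs from the type of its row.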

parse_fmx_colspec(cals_colspec)

Parse a CALS-like colspec element.

For instance:

<colspec
  colname="c1"
  colnum="1"
  colsep="1"
  rowsep="1"
  colwidth="30mm"
  align="center"/>
Parameters

cals_colspec (ElementType) – CALS-like colspec element.
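
To make the shape of the data concrete, here is a small sketch that collects the attributes of the colspec shown above into a dictionary. read_colspec and _flag are hypothetical helpers, not part of benker, and reading colsep/rowsep as "1" meaning “rule drawn” is a simplifying assumption.

```python
import xml.etree.ElementTree as ET


def _flag(value):
    """Map a colsep/rowsep value to True/False, or None when the attribute is absent."""
    # Assumption: "1" means a rule is drawn, anything else means no rule.
    return None if value is None else value == "1"


def read_colspec(cals_colspec):
    """Collect the attributes of a CALS-like colspec into a plain dictionary."""
    attrs = cals_colspec.attrib
    return {
        "colname": attrs.get("colname"),
        "colnum": int(attrs["colnum"]) if "colnum" in attrs else None,
        "colsep": _flag(attrs.get("colsep")),
        "rowsep": _flag(attrs.get("rowsep")),
        "colwidth": attrs.get("colwidth"),
        "align": attrs.get("align"),
    }


colspec = ET.fromstring(
    '<colspec colname="c1" colnum="1" colsep="1" rowsep="1" '
    'colwidth="30mm" align="center"/>'
)
spec = read_colspec(colspec)
print(spec)
```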

parse_fmx_corpus(fmx_corpus)
parse_fmx_row(fmx_row)

Parse a ROW element which contains CELL elements.

This element may be in a BLK.

Parameters

fmx_row (ElementType) – table row

parse_fmx_sti_blk(fmx_sti_blk)

Parse a STI.BLK element, treating it as a row with a single cell.

For instance:

<STI.BLK COL.START="1" COL.END="1">
  <P>STI.BLK title</P>
</STI.BLK>
Parameters

fmx_sti_blk (ElementType) – subtitle of the BLK.

parse_fmx_ti_blk(fmx_ti_blk)

Parse a TI.BLK element, treating it as a row with a single cell.

For instance:

<TI.BLK COL.START="1" COL.END="2">
  <P><HT TYPE="BOLD">TI.BLK title</HT></P>
</TI.BLK>
Parameters

fmx_ti_blk (ElementType) – title of the BLK.
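
One way to picture “a row of a single cell” is to derive the cell's starting column and width from COL.START and COL.END. The sketch below does only that. blk_title_span is a hypothetical helper, and treating both attributes as 1-based, inclusive column numbers is an assumption.

```python
import xml.etree.ElementTree as ET


def blk_title_span(fmx_ti_blk):
    """Return (first_column, width) for a TI.BLK or STI.BLK element."""
    # Assumption: COL.START and COL.END are 1-based, inclusive column numbers.
    col_start = int(fmx_ti_blk.get("COL.START"))
    col_end = int(fmx_ti_blk.get("COL.END"))
    return col_start, col_end - col_start + 1


ti_blk = ET.fromstring(
    '<TI.BLK COL.START="1" COL.END="2">'
    '<P><HT TYPE="BOLD">TI.BLK title</HT></P>'
    "</TI.BLK>"
)
print(blk_title_span(ti_blk))
# The text content of the single cell, with inline markup flattened.
print("".join(ti_blk.itertext()))
```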

parse_gr_notes(fmx_gr_notes)

Parse a GR.NOTES element, treating it as a row with a single cell.

For instance:

<GR.NOTES>
  <TITLE>
    <TI>
      <P>GR.NOTES Title</P>
    </TI>
  </TITLE>
  <NOTE NOTE.ID="N0001">
    <P>Table note</P>
  </NOTE>
</GR.NOTES>
Parameters

fmx_gr_notes (ElementType) – group of notes called in the table (GR.NOTES)

Changed in version 0.5.1: GR.NOTES elements can be embedded if the embed_gr_notes option is True.

parse_table(fmx_corpus)

Convert a <CORPUS> Formex element into a table object.

Parameters

fmx_corpus (ElementType) – Formex element.

Return type

ElementType

Returns

Table.

parse_tbl_styles(fmx_tbl)

Parse a TBL element and extract the styles.

Parameters

fmx_tbl (ElementType) – table

Returns

dictionary of styles and nature

setup_table(styles=None, nature=None)
transform_tables(tree)