Formex 4 Builder

This module can construct a Formex 4 table from an instance of type Table.

Formex describes the format for the exchange of data between the Publication Office and its contractors. In particular, it defines the logical markup for documents which are published in the different series of the Official Journal of the European Union.

This builder allows you to convert Word document tables into Formex 4 tables using the Formex 4 schema (formex-05.59-20170418.xd).

Specifications and examples:

  • The Formex 4 documentation and schema are available online from the Publications Office: Formex Version 4.

  • An example of a Formex 4 table is available in the Schema documentation: TBL

Changed in version 0.5.0: Refactoring (rename “Formex4” to “Formex”):

  • the class Formex4Builder is renamed to FormexBuilder.

benker.builders.formex.ElementTreeType

alias of lxml.etree._ElementTree

benker.builders.formex.ElementType

alias of lxml.etree._Element

class benker.builders.formex.FormexBuilder(detect_titles=False, use_cals=False, cals_ns='https://lib.benker.com/schemas/cals.xsd', cals_prefix='cals', width_unit='mm', **options)

Bases: benker.builders.base_builder.BaseBuilder

Formex 4 builder used to convert tables into TBL elements according to the TBL schema.

build_cell(row_elem, cell, row)

Build the Formex 4 <CELL> element.

Formex 4 attributes:

  • @COL The mandatory COL attribute is used to specify in which column the cell is located.

  • @COLSPAN When a cell in a row ‘A’ must be linked to a group of cells in the same row, the first CELL element of this group has to provide the COLSPAN attribute. The value of the COLSPAN attribute is the number of cells in the group. The COL attribute of the first cell indicates the number of the first column in the group.

    The use of the COLSPAN attribute is only allowed to relate the value of a cell in several columns within the same row. Its value must be at least equal to ‘2’.

  • @ROWSPAN When a single cell ‘A’ is linked to the cells located just below it in the following rows, the CELL element of this single cell must provide the ROWSPAN attribute. The value of the ROWSPAN attribute is equal to the number of cells in the group. The CELL element relating to the single cell must be placed within the first ROW element in the group. The ROW elements corresponding to the other rows in the group must not contain any CELL elements for the column containing the single cell ‘A’.

    The use of the ROWSPAN attribute is only authorised to relate the value of a cell in several rows. Its value must be at least equal to ‘2’.

  • @ACCH If the group of related cells is physically delimited by a horizontal brace, this symbol must be marked up using the ACCH attribute.

  • @ACCV If the group of related cells is physically delimited by a vertical brace, this symbol must be marked up using the ACCV attribute.

  • @TYPE The TYPE attribute of the CELL element is used to indicate locally the type of contents of the cells. It overrides the value of the TYPE attribute defined for the row (ROW) which contains the cell.
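The COL, COLSPAN, ROWSPAN and TYPE rules above can be sketched with Python's standard xml.etree.ElementTree (used here instead of lxml so the example has no dependencies). The make_cell helper is hypothetical and is not part of the benker API; it only illustrates the markup these rules describe.

```python
import xml.etree.ElementTree as ET


def make_cell(row_elem, col, colspan=1, rowspan=1, cell_type=None):
    """Append a Formex CELL to row_elem (hypothetical helper, not benker's API)."""
    if colspan < 1 or rowspan < 1:
        raise ValueError("spans must be positive")
    cell = ET.SubElement(row_elem, "CELL", COL=str(col))
    # COLSPAN/ROWSPAN are only written for actual spans: their value must be >= 2.
    if colspan >= 2:
        cell.set("COLSPAN", str(colspan))
    if rowspan >= 2:
        cell.set("ROWSPAN", str(rowspan))
    # A local TYPE overrides the TYPE of the enclosing ROW.
    if cell_type is not None:
        cell.set("TYPE", cell_type)
    return cell


corpus = ET.Element("CORPUS")
row1 = ET.SubElement(corpus, "ROW")
make_cell(row1, col=1, rowspan=2)  # single cell 'A' spanning rows 1 and 2
make_cell(row1, col=2, colspan=2)  # spans columns 2 and 3
row2 = ET.SubElement(corpus, "ROW")
# No CELL for column 1 here: it is covered by the ROWSPAN of cell 'A'.
make_cell(row2, col=2)
make_cell(row2, col=3)
print(ET.tostring(corpus, encoding="unicode"))
```

Note how the second ROW starts at COL="2": the column of cell ‘A’ is skipped, as required by the ROWSPAN rule.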

Parameters

Changed in version 0.4.4: Modification of the Formex4 builder to better deal with empty cells (management of <IE/> tags).

Changed in version 0.5.0: Add support for CALS-like elements and attributes. Add support for bgcolor (table background color).

Changed in version 0.5.1: Preserve processing instructions in cell content.

Changed in version 0.5.2: Add support for the @cals:cellstyle attribute (extension). This attribute is required for two-way conversion between Formex and CALS tables. If CELL/@TYPE differs from ROW/@TYPE, a specific “cellstyle” style is added to keep the CELL/@TYPE value.
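The cellstyle rule can be sketched as a pair of functions, one per conversion direction. This is a minimal sketch under two assumptions: the style value is the CELL/@TYPE value itself, and the attribute lives in the default cals_ns namespace of FormexBuilder. The function names add_cellstyle and restore_cell_types are hypothetical.

```python
import xml.etree.ElementTree as ET

# Default value of the FormexBuilder cals_ns option.
CALS_NS = "https://lib.benker.com/schemas/cals.xsd"
CELLSTYLE = "{%s}cellstyle" % CALS_NS


def add_cellstyle(row_elem):
    """Formex to CALS: record CELL/@TYPE when it differs from ROW/@TYPE."""
    row_type = row_elem.get("TYPE", "NORMAL")
    for cell in row_elem.findall("CELL"):
        cell_type = cell.get("TYPE", "NORMAL")  # NORMAL is the default TYPE
        if cell_type != row_type:
            # Assumption: the style simply stores the CELL/@TYPE value.
            cell.set(CELLSTYLE, cell_type)


def restore_cell_types(row_elem):
    """CALS to Formex: rebuild CELL/@TYPE from the style, else from the row."""
    row_type = row_elem.get("TYPE", "NORMAL")
    for cell in row_elem.findall("CELL"):
        cell.set("TYPE", cell.attrib.pop(CELLSTYLE, row_type))


row = ET.Element("ROW", TYPE="HEADER")
ET.SubElement(row, "CELL", COL="1", TYPE="HEADER")
ET.SubElement(row, "CELL", COL="2", TYPE="NOTCOL")
add_cellstyle(row)  # only the NOTCOL cell receives a cellstyle
```

Only cells whose type differs from the row need the extra attribute, which keeps the round trip lossless without annotating every cell.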

build_colspec(group_elem, col)

Build the CALS <colspec> element (only if use_cals is True).

CALS attributes:

  • @colnum is the column number.

  • @colname is the column name. Its format is “c{col_pos}”.

  • @colwidth width of the column (with its unit). The unit is defined by the width_unit option.

  • @align horizontal alignment of table entry content. Possible values are: “left”, “right”, “center”, “justify” (“char” is not supported).

  • @colsep column separators (vertical ruling). Possible values are “0” or “1”.

  • @rowsep row separators (horizontal ruling). Possible values are “0” or “1”.

Note

The @colnum attribute (column number) is not generated because this value is usually implied and can be deduced from the @colname attribute.
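The colspec attributes listed above can be sketched as follows, assuming the width is a number expressed in width_unit. In line with the note, @colnum is not generated. Namespace qualification is left out for brevity (the real builder may emit these names under the cals prefix), and make_colspec is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

ALIGN_VALUES = {"left", "right", "center", "justify"}  # "char" is not supported


def make_colspec(group_elem, col_pos, width=None, width_unit="mm",
                 align=None, colsep=None, rowsep=None):
    """Append a CALS <colspec> (hypothetical helper mirroring the rules above)."""
    attrs = {"colname": "c{col_pos}".format(col_pos=col_pos)}
    if width is not None:
        # e.g. 30 and "mm" give "30mm"; 12.5 gives "12.5mm"
        attrs["colwidth"] = "{0:g}{1}".format(width, width_unit)
    if align is not None:
        if align not in ALIGN_VALUES:
            raise ValueError("unsupported align: %r" % align)
        attrs["align"] = align
    # Rulings are boolean flags serialized as "0" or "1".
    for name, value in (("colsep", colsep), ("rowsep", rowsep)):
        if value is not None:
            attrs[name] = "1" if value else "0"
    return ET.SubElement(group_elem, "colspec", attrs)


group = ET.Element("TBL")
make_colspec(group, 1, width=30, align="left", colsep=True, rowsep=False)
```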

Parameters

Changed in version 0.5.0: Add support for CALS-like elements and attributes.

Changed in version 0.5.1: Add support for CALS-like attributes: @colnum, @align, @colsep, and @rowsep.

build_corpus(tbl_elem, table)

Build the Formex 4 <CORPUS> element.

Parameters

Changed in version 0.5.1: If the detect_titles option is enabled, a title is generated when the first row contains a single cell with centered text.
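The detect_titles heuristic (a first row made of a single cell with centered text) can be sketched as below. SimpleCell and the "align" style key are hypothetical stand-ins for benker's cell objects and their styles.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleCell:
    """Hypothetical stand-in for a benker cell: content plus CSS-like styles."""
    content: str
    styles: dict = field(default_factory=dict)


def looks_like_title(first_row):
    """Return True if the row holds exactly one cell whose text is centered."""
    if len(first_row) != 1:
        return False
    return first_row[0].styles.get("align") == "center"


title_row = [SimpleCell("Table IV", {"align": "center"})]
data_row = [SimpleCell("A", {"align": "center"}), SimpleCell("B")]
```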

Changed in version 0.5.1: Add support for the @width CALS-like attribute (table width).

build_row(corpus_elem, row)

Build the Formex 4 <ROW> element.

Formex 4 attributes:

  • @TYPE The TYPE attribute indicates the specific role of the row in the table. The allowed values are:

    • ALIAS: if the row contains aliases. Such references may be used when the table spans several pages of a publication. The references are associated with column headers on the first page and are repeated on subsequent pages.

    • HEADER: if the row contains cells which may be considered as a column header. This generally occurs for the first row of a table.

    • NORMAL: if most of the cells of the row contain ‘simple’ or ‘normal’ data. This is the default value.

    • NOTCOL: if the cells of the row contain units of measure relating to subsequent rows.

    • TOTAL: if the row contains data which could be considered as ‘totals’.

    The TYPE attribute is also available on cells (CELL), where it overrides the value defined for the row. Because ‘NORMAL’ is the default value of CELL/@TYPE, the TYPE attribute must be repeated on each cell of a row that has a specific type; otherwise the default value would override the row type (see the first row of the example in the Formex documentation).
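A minimal sketch of this rule: when a row has a non-default type, the same TYPE is copied onto each of its cells. The make_row helper is hypothetical and uses the standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

ROW_TYPES = {"ALIAS", "HEADER", "NORMAL", "NOTCOL", "TOTAL"}


def make_row(corpus_elem, row_type="NORMAL", ncols=0):
    """Append a ROW whose cells repeat the row's TYPE (hypothetical helper)."""
    if row_type not in ROW_TYPES:
        raise ValueError("unknown ROW/@TYPE: %r" % row_type)
    row = ET.SubElement(corpus_elem, "ROW", TYPE=row_type)
    for col in range(1, ncols + 1):
        cell = ET.SubElement(row, "CELL", COL=str(col))
        # CELL/@TYPE defaults to NORMAL, so a non-NORMAL row type must be
        # repeated on every cell, otherwise the default would override it.
        if row_type != "NORMAL":
            cell.set("TYPE", row_type)
    return row


corpus = ET.Element("CORPUS")
header = make_row(corpus, "HEADER", ncols=3)
body = make_row(corpus, "NORMAL", ncols=2)
```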

Parameters

Changed in version 0.5.0: Add support for CALS-like elements and attributes.

Changed in version 0.5.1: The @cals:valign attribute is built from the “vertical-align” style.

build_tbl(table)

Build the Formex 4 <TBL> element.

Formex 4 attributes:

  • @NO.SEQ This mandatory attribute provides a sequence number to the table. This number represents the order in which the table appears in the document.

  • @CLASS The CLASS attribute is mandatory and is used to specify the type of data contained in the table. The allowed values are:

    • GEN: if the table contains general data (default value),

    • SCHEDULE: if it is a schedule,

    • RECAP: if it is a synoptic table.

    The last two values are only used for documents related to the general budget.

  • @COLS This mandatory attribute provides the actual number of columns of the table.

  • @PAGE.SIZE The PAGE.SIZE attribute takes one of these values:

    • DOUBLE.LANDSCAPE: table on two A4 pages forming an A3 landscape page,

    • DOUBLE.PORTRAIT: table on two A4 pages forming an A3 portrait page,

    • SINGLE.LANDSCAPE: table on a single A4 page in landscape,

    • SINGLE.PORTRAIT: table on a single A4 page in portrait (default).
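The TBL attributes described above can be sketched with xml.etree.ElementTree. Because the attribute names contain dots, they are passed as a dictionary rather than as keyword arguments. The make_tbl helper is hypothetical, and formatting NO.SEQ as a four-digit, zero-padded number is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

TBL_CLASSES = {"GEN", "SCHEDULE", "RECAP"}
PAGE_SIZES = {"DOUBLE.LANDSCAPE", "DOUBLE.PORTRAIT",
              "SINGLE.LANDSCAPE", "SINGLE.PORTRAIT"}


def make_tbl(no_seq, cols, tbl_class="GEN", page_size="SINGLE.PORTRAIT"):
    """Create a <TBL> with its documented attributes (hypothetical helper)."""
    if tbl_class not in TBL_CLASSES:
        raise ValueError("invalid CLASS: %r" % tbl_class)
    if page_size not in PAGE_SIZES:
        raise ValueError("invalid PAGE.SIZE: %r" % page_size)
    # Dotted attribute names cannot be Python keywords, hence the dict.
    return ET.Element("TBL", {
        "NO.SEQ": "{0:04d}".format(no_seq),  # assumption: 4-digit zero-padded
        "CLASS": tbl_class,
        "COLS": str(cols),
        "PAGE.SIZE": page_size,
    })


tbl = make_tbl(1, 3)  # defaults: CLASS="GEN", PAGE.SIZE="SINGLE.PORTRAIT"
```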

Parameters

table (benker.table.Table) – Table

Returns

The newly-created <TBL> element.

Changed in version 0.5.0: Add support for CALS-like elements and attributes. Add support for bgcolor (table background color).

build_title(tbl_elem, row)

Build the table title using the <TITLE> element.

For instance:

<TITLE>
  <TI>
    <P>Table IV</P>
  </TI>
</TITLE>

Parameters
  • tbl_elem (ElementType) – Parent element: <TBL>.

  • row (benker.table.RowView) – The row which contains the title.
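The <TITLE> structure shown above can be produced as follows. make_title is a hypothetical helper, and inserting TITLE as the first child of TBL is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET


def make_title(tbl_elem, paragraphs):
    """Insert <TITLE><TI><P>...</P></TI></TITLE> into tbl_elem (sketch)."""
    title = ET.Element("TITLE")
    ti = ET.SubElement(title, "TI")
    for text in paragraphs:
        ET.SubElement(ti, "P").text = text
    # Assumption: TITLE precedes the other children of TBL.
    tbl_elem.insert(0, title)
    return title


tbl = ET.Element("TBL")
ET.SubElement(tbl, "CORPUS")
make_title(tbl, ["Table IV"])
print(ET.tostring(tbl, encoding="unicode"))
```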

Changed in version 0.4.4: Modification of the Formex4 builder to better deal with empty cells (management of <IE/> tags).

cleanup_tbl_in_tbl(fmx_root)

Clean up the TBL elements when they are direct children of another TBL.

Parameters

fmx_root (ElementType) – The result tree which contains the TBL elements to remove.

drop_superfluous_attrs(fmx_root)

Drop superfluous CALS-like attributes at the end of the Formex building.

  • @cals:namest and @cals:nameend are defined by @COLSPAN

  • @cals:morerows is defined by @ROWSPAN

  • @cals:rowstyle is defined by ROW/@TYPE, GR.NOTES, TI.BLK or STI.BLK.

Parameters

fmx_root (ElementType) – Root element of the Formex file.

New in version 0.5.1.
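A minimal sketch of this cleanup step. It assumes the step runs after the COLSPAN/ROWSPAN attributes and the GR.NOTES and BLK elements have been produced, so the listed CALS-like attributes can be removed unconditionally. The function name is hypothetical, and the namespace is the default cals_ns value of FormexBuilder.

```python
import xml.etree.ElementTree as ET

CALS_NS = "https://lib.benker.com/schemas/cals.xsd"  # default cals_ns option


def cals(name):
    """Return the Clark-notation name of a CALS-like attribute."""
    return "{%s}%s" % (CALS_NS, name)


def drop_redundant_cals_attrs(fmx_root):
    """Remove CALS-like attributes already expressed by Formex markup (sketch)."""
    for cell in fmx_root.iter("CELL"):
        # Spans are carried by @COLSPAN and @ROWSPAN.
        for name in ("namest", "nameend", "morerows"):
            cell.attrib.pop(cals(name), None)
    for row in fmx_root.iter("ROW"):
        # The row role is carried by ROW/@TYPE, GR.NOTES, TI.BLK or STI.BLK.
        row.attrib.pop(cals("rowstyle"), None)


root = ET.Element("TBL")
row = ET.SubElement(root, "ROW", {cals("rowstyle"): "header"})
cell = ET.SubElement(row, "CELL", {"COL": "1", "COLSPAN": "2",
                                   cals("namest"): "c1", cals("nameend"): "c2"})
drop_redundant_cals_attrs(root)
```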

extract_gr_notes(fmx_root)

Extract GR.NOTES from the table footers.

This function moves or creates a GR.NOTES just before the CORPUS.

Parameters

fmx_root (ElementType) – The result tree with GR.NOTES.

Changed in version 0.5.1: If the ROW contains a GR.NOTES, we move it before the CORPUS, else we create it.
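The "move" half of this step might look like the sketch below. It only covers the case where a GR.NOTES already exists somewhere inside the CORPUS; the creation case is not sketched, and the row that held the notes is left in place because the text does not say what happens to it. move_gr_notes is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def move_gr_notes(tbl_elem):
    """Move the first GR.NOTES found in the CORPUS just before the CORPUS."""
    corpus = tbl_elem.find("CORPUS")
    if corpus is None:
        return None
    # ElementTree has no getparent(), so scan parents and check their children.
    for parent in corpus.iter():
        for child in list(parent):
            if child.tag == "GR.NOTES":
                parent.remove(child)
                tbl_elem.insert(list(tbl_elem).index(corpus), child)
                return child  # stop before the modified tree is iterated again
    return None


tbl = ET.Element("TBL")
corpus = ET.SubElement(tbl, "CORPUS")
cell = ET.SubElement(ET.SubElement(corpus, "ROW"), "CELL", COL="1")
ET.SubElement(cell, "GR.NOTES")
moved = move_gr_notes(tbl)
```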

finalize_tree(tree)

Finalize the resulting tree structure:

  • Calculate the @NO.SEQ values: sequence number of each table;

  • Clean up the TBL elements when they are direct children of another TBL;

  • Extract GR.NOTES from the table footers;

  • Group ROW elements by BLK based on the @cals:rowstyle attribute (CALS extension).

Parameters

tree (ElementTreeType) – The resulting tree.

Changed in version 0.5.1: Drop superfluous CALS-like attributes at the end of the Formex building.

generate_table_tree(table)

Build the XML table from the Table instance.

Parameters

table (benker.table.Table) – Table

Returns

Table tree

get_cals_qname(name)

get_formex_qname(name)

insert_blk(fmx_root)

Group ROW elements by BLK based on the @cals:rowstyle attribute (CALS extension).

Parameters

fmx_root (ElementType) – The result tree which contains the CORPUS/ROW elements.

property ns_map

setup_table(table)

update_no_seq(fmx_root)

Calculate the @NO.SEQ values: sequence number of each table.

Parameters

fmx_root (ElementType) – The result tree which contains the TBL elements to update.
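Numbering the tables amounts to walking the TBL elements in document order. In the sketch below, number_tables is a hypothetical name, and the four-digit, zero-padded format is an assumption.

```python
import xml.etree.ElementTree as ET


def number_tables(fmx_root):
    """Set @NO.SEQ on every TBL in document order (1-based)."""
    # iter() walks the tree depth-first, i.e. in document order.
    for index, tbl in enumerate(fmx_root.iter("TBL"), start=1):
        tbl.set("NO.SEQ", "{0:04d}".format(index))  # assumption: 4 digits


doc = ET.Element("DOC")
for _ in range(3):
    ET.SubElement(doc, "TBL")
number_tables(doc)
```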

benker.builders.formex.ProcessingInstructionType

alias of lxml.etree._ProcessingInstruction

class benker.builders.formex.RowInfo(tag, type, level)

Bases: tuple

level

Alias for field number 2

tag

Alias for field number 0

type

Alias for field number 1
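The "Alias for field number N" entries are the docstrings that Python's collections.namedtuple generates for each field, which suggests RowInfo is a named tuple. An equivalent definition is sketched below; the field values in the example are hypothetical, since guess_row_info is not documented here.

```python
from collections import namedtuple

# Field order matches the documented aliases: tag=0, type=1, level=2.
RowInfo = namedtuple("RowInfo", ["tag", "type", "level"])

info = RowInfo(tag="ROW", type="HEADER", level=1)  # hypothetical values
```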

benker.builders.formex.guess_row_info(rowstyle)

benker.builders.formex.revision_mark(name, attrs)