Office Open XML parser

This module can parse Office Open XML (OOXML) tables.

Specifications:

class benker.parsers.ooxml.OoxmlParser(builder, styles_path=None, **options)

Bases: benker.parsers.base_parser.BaseParser

Office Open XML parser.

parse_grid_col(w_grid_col)

Parse a <w:gridCol> element.

See: Table Grid/Column Definition.

Parameters: w_grid_col (etree._Element) – Table grid column element.
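
As an illustration of what a <w:gridCol> carries, here is a minimal sketch that reads column widths from a <w:tblGrid>. It is not benker's implementation: it uses the standard library's ElementTree rather than lxml, and the names w_name, grid_col_widths_pt and SAMPLE are hypothetical. It assumes the w:w attribute holds the width in twentieths of a point, as WordprocessingML defines it.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def w_name(local_name):
    """Hypothetical helper: build a Clark-notation name in the 'w' namespace."""
    return "{%s}%s" % (W_NS, local_name)


def grid_col_widths_pt(w_tbl_grid):
    """Return the width of each <w:gridCol> in points."""
    widths = []
    for w_grid_col in w_tbl_grid.findall(w_name("gridCol")):
        # w:w holds the column width in twentieths of a point.
        twips = int(w_grid_col.get(w_name("w"), "0"))
        widths.append(twips / 20)
    return widths


SAMPLE = (
    '<w:tblGrid xmlns:w="%s">'
    '<w:gridCol w:w="2880"/>'
    '<w:gridCol w:w="1440"/>'
    '</w:tblGrid>' % W_NS
)

print(grid_col_widths_pt(ET.fromstring(SAMPLE)))  # [144.0, 72.0]
```

The sample grid declares two columns, 2 inches and 1 inch wide.
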

parse_table(w_tbl)

Convert an Office Open XML <w:tbl> into a CALS <table>.

Parameters: w_tbl (etree._Element) – Office Open XML element.
Return type: etree._Element
Returns: CALS element.

parse_tbl(w_tbl)

Parse a <w:tbl> element.

See: Table Properties.

Parameters: w_tbl (etree._Element) – Table element.

Changed in version 0.4.0: The section width and height are now stored in the ‘x-sect-size’ table style (units in ‘pt’).

parse_tc(w_tc)

Parse a <w:tc> element.

See: Table Cell Properties.

Parameters: w_tc (etree._Element) – Table cell element.

Changed in version 0.5.1: XML indentation between cell paragraphs is ignored.

parse_tr(w_tr)

Parse a <w:tr> element.

See: Table Row Properties.

Parameters: w_tr (etree._Element) – Table row element.
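
The <w:tbl>, <w:tr> and <w:tc> elements handled by the methods above nest as table, rows and cells. The sketch below walks that structure with the standard library's ElementTree. It is only an assumption-laden illustration, not the parser's own logic: read_rows and SAMPLE are hypothetical names, and only the column span (<w:gridSpan>) and the text runs are read.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

SAMPLE = """\
<w:tbl xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:tr>
    <w:tc>
      <w:tcPr><w:gridSpan w:val="2"/></w:tcPr>
      <w:p><w:r><w:t>Header</w:t></w:r></w:p>
    </w:tc>
  </w:tr>
  <w:tr>
    <w:tc><w:p><w:r><w:t>A</w:t></w:r></w:p></w:tc>
    <w:tc><w:p><w:r><w:t>B</w:t></w:r></w:p></w:tc>
  </w:tr>
</w:tbl>
"""


def read_rows(w_tbl):
    """Return one list per <w:tr>, holding a (text, column_span) pair per <w:tc>."""
    rows = []
    for w_tr in w_tbl.findall("w:tr", NS):
        cells = []
        for w_tc in w_tr.findall("w:tc", NS):
            # <w:gridSpan w:val="n"/> gives the number of grid columns covered.
            grid_span = w_tc.find("w:tcPr/w:gridSpan", NS)
            span = int(grid_span.get("{%s}val" % W_NS)) if grid_span is not None else 1
            # Only <w:t> runs carry text; whitespace used to indent the XML is skipped.
            text = "".join(w_t.text or "" for w_t in w_tc.iter("{%s}t" % W_NS))
            cells.append((text, span))
        rows.append(cells)
    return rows


ROWS = read_rows(ET.fromstring(SAMPLE))
print(ROWS)  # [[('Header', 2)], [('A', 1), ('B', 1)]]
```

Collecting text from <w:t> elements only means the indentation between cell paragraphs never reaches the output, which is the behaviour noted for version 0.5.1.
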

transform_tables(tree)

benker.parsers.ooxml.value_of(element, xpath, *, namespaces={'w': 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'}, default=None)

Take the first value of an XPath evaluation.

Parameters:
  • element (etree._Element) – Root element used to evaluate the xpath expression.
  • xpath (str) – XPath expression. It is evaluated using the namespaces mapping.
  • namespaces (dict[str, str]) – Namespace map to use for the xpath evaluation.
  • default – default value used if the xpath evaluation returns no result.
Returns: The first result, or the default value.
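
The behaviour described above can be approximated with lxml's xpath() method. The sketch below is a hypothetical stand-in named first_value, written from the description alone; it is not the source of value_of and may differ from it in edge cases. It assumes lxml is installed.

```python
from lxml import etree

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def first_value(element, xpath, *, namespaces=NS, default=None):
    """Hypothetical stand-in: first XPath result, or ``default`` when there is none."""
    result = element.xpath(xpath, namespaces=namespaces)
    if isinstance(result, list):
        # Node-set expressions return a list, possibly empty.
        return result[0] if result else default
    # Expressions such as string(...) or count(...) return a scalar.
    return result


W_TBL = etree.fromstring(
    '<w:tbl xmlns:w="%s">'
    '<w:tblPr><w:tblStyle w:val="TableGrid"/></w:tblPr>'
    '</w:tbl>' % W_NS
)

style = first_value(W_TBL, "w:tblPr/w:tblStyle/@w:val")
width = first_value(W_TBL, "w:tblPr/w:tblW/@w:w", default="auto")
print(style, width)  # TableGrid auto
```

The second call shows the fallback: the sample table has no <w:tblW>, so the default is returned.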

benker.parsers.ooxml.w(name)