New Python Markdown extension cell_row_span
Background
I wrote this extension when I discovered that the existing spantables extension was broken on Python Markdown version 3 and also didn't work under Python 2. I filed a bug report with the author, and this was his reply:
This extension is a modified version of the original tables extension. I see that the tables extension was heavily modified. Does the latest tables extension work for you? If so, then I might solve this problem by taking that version and re-adding the span functionality.
The phrase "I might solve this problem" is ambiguous: does it mean he will do the work, or is it shorthand for "If I were you, I would solve this problem …"?
In addition, as the author noted, the spantables extension is a modified version of the original `tables` source. While that worked, it's not really the best way to go about it: a modified copy falls behind whenever the original changes, which is what happened when the tables extension was heavily reworked. I solved the problem by subclassing the `tables` extension instead. I let the `TableProcessor` do its job (creating the ElementTree structure for the table), then my code runs to modify that structure to span cells and rows according to the user's Markdown text.
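The "build first, edit the tree afterwards" approach can be illustrated with nothing but the standard library's `xml.etree.ElementTree`. In this minimal sketch, `build_row` is a hypothetical stand-in for what the `TableProcessor` produces and `apply_colspan` is a hypothetical helper playing the part of the post-processing step; neither is Markdown's own code. As in the extension, covered cells are only collected during the scan and removed at the end, so cell indices keep matching the columns of the original Markdown row.

```python
import xml.etree.ElementTree as etree


def build_row(texts):
    """Hypothetical stand-in for the table processor: one <td> per cell."""
    tr = etree.Element('tr')
    for text in texts:
        etree.SubElement(tr, 'td').text = text
    return tr


def apply_colspan(tr, first, count, to_remove):
    """Widen tr[first] over `count` cells and schedule the covered cells for removal."""
    tr[first].set('colspan', str(count))
    for i in range(first + 1, first + count):
        to_remove.append((tr, tr[i]))


row = build_row(['a', '', 'c'])
pending = []
apply_colspan(row, 0, 2, pending)

# Remove only after every span has been computed, so indices stay stable
for parent, cell in pending:
    parent.remove(cell)

print(etree.tostring(row, encoding='unicode'))
# prints <tr><td colspan="2">a</td><td>c</td></tr>
```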
Here’s the documentation for the extension.
Summary
Adds spanning for rows and cells in tables.
Syntax
Example:
~~~
| Column 1 | Col 2 | Big row span |
|:-----------------------:|-------| -------------- |
| r1_c1 spans two cols || One large cell |
| r2_c1 spans two rows | r2_c2 | |
|_^ _| r3_c2 | |
| ______ | r4_c2 |_ _|
~~~
The example renders as:
~~~
.--------------------------------------------------.
|        Column 1         | Col 2 |  Big row span  |
|---------------------------------+----------------|
|      r1_c1 spans two cols       |                |
|---------------------------------|                |
|  r2_c1 spans two rows   | r2_c2 |                |
|                         |-------| One large cell |
|                         | r3_c2 |                |
|-------------------------+-------|                |
|          ____           | r4_c2 |                |
`--------------------------------------------------'
~~~
To span cells across multiple columns, end them with two or more consecutive vertical bars. The cells to the left are merged together, as many cells as there are bars. In the example above, there are two bars at the end of cell 2 on row 1, so the two cells to the left of them (numbers 1 and 2) are merged.
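The column-span rule can be modelled as a small pure function working on the raw row text. This is a sketch, not the extension's code: `column_spans` is a hypothetical helper that, like the extension, treats a cell with nothing at all between two bars (or only `~~`) as a continuation of the nearest non-empty cell to its left. A cell containing a space, such as `| |`, is an ordinary empty cell and is not merged.

```python
import re


def column_spans(row):
    """Return (text, colspan) pairs for one Markdown table row.

    Hypothetical helper: a cell that is exactly '' or '~~' between two bars
    is merged into the nearest preceding cell, mirroring the extension's check.
    """
    body = re.sub(r'^ *\|', '', row)  # drop the leading bar
    spans = []
    for cell in re.findall(r'(.*?)\|', body):  # text before each following bar
        if cell in ('', '~~') and spans:
            text, span = spans[-1]
            spans[-1] = (text, span + 1)
        else:
            spans.append((cell.strip(), 1))
    return spans


print(column_spans('| r1_c1 spans two cols || One large cell |'))
# [('r1_c1 spans two cols', 2), ('One large cell', 1)]
```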
To span cells across rows, fill the cell on the last row with at least two underscores, one at the start and the other at the end of its content, and no characters other than spaces, underscores, `^` or `=`. This is referred to as the marker. The cell with the marker and all the empty cells above it, up to the first non-empty cell, will be made into a single cell with the content of that non-empty cell. See column 3 ("Big row span") in the example.
By default the contents are vertically aligned in the middle of the cell. To align to the top, include at least one `^` character in the marker between the two underscores; for example, `|_^^^_|` or simply `|_^ _|`. See row 2 in column 1 of the example, which is merged with row 3 and aligned at the top. To align to the bottom, use at least one `=` character between the underscores; for example, `|_ = _|`. Including both `^` and `=` in a marker raises a `ValueError` exception.
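The vertical-alignment rule reduces to a three-way choice plus one error case. This hypothetical `marker_valign` helper mirrors the check in the extension's `_update_rowspan_attrib` method, using plain substring tests instead of the `RE_valign_top` and `RE_valign_bottom` patterns (equivalent here, since each pattern matches a single literal character).

```python
def marker_valign(marker):
    """Return 'top', 'middle' or 'bottom' for a row span marker such as '_^ _'."""
    top = '^' in marker
    bottom = '=' in marker
    if top and bottom:
        raise ValueError('Cannot use both ^ (top) and = (bottom) in a row span marker')
    if top:
        return 'top'
    if bottom:
        return 'bottom'
    return 'middle'  # the default when neither character is present


for m in ('_^ _', '_ = _', '_ _'):
    print(m, marker_valign(m))
```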
Note: If this extension finds a cell that starts and ends with an underscore and contains no characters other than spaces, underscores, `^` or `=`, it assumes it's a row span marker and attempts to process it. If you need a cell that looks like a marker (generally one with only underscores in it), add the text `&nbsp;` as well; this extension won't process it as a row span marker, and the `&nbsp;` renders as a space.
Bug in Markdown 2.6
Python Markdown 2.6 does not process the following table correctly:
~~~
| Column 1 | Column 2 | Column 3 | Column 4 |
| -------- | -------- | -------- | -------- |
| r1,c1 | r1,c2 | r1,c3 | r1,c4 |
| r2,c1 || r2,c3 | r2,c4 |
~~~
The table should be rendered as follows:
~~~
.-------------------------------------------.
| Column 1 | Column 2 | Column 3 | Column 4 |
|----------+----------+----------+----------|
| r1,c1    | r1,c2    | r1,c3    | r1,c4    |
|----------+----------+----------+----------|
| r2,c1               | r2,c3    | r2,c4    |
`-------------------------------------------'
~~~
Instead it comes out as:
~~~
.-------------------------------------------.
| Column 1 | Column 2 | Column 3 | Column 4 |
|----------+----------+----------+----------|
| r1,c1    | r1,c2    | r1,c3    | r1,c4    |
|----------+----------+----------+----------|
| r2,c1               | r2,c4    |          |
`-------------------------------------------'
~~~
The bug is in the tables extension, not this one. If you're having problems getting a table with column spans to work correctly in Markdown 2.6, try replacing `||` with `|~~|`, as follows:
~~~
| Column 1 | Column 2 | Column 3 | Column 4 |
| -------- | -------- | -------- | -------- |
| r1,c1 | r1,c2 | r1,c3 | r1,c4 |
| r2,c1 |~~| r2,c3 | r2,c4 |
~~~
The tables extension processes the above correctly, and this extension recognizes a cell containing only `~~` as an empty cell. (I chose `~~` because I can't think of a reason anyone would use that in a cell.) If you want to use something different, change `~~` to another value in the following line in the code:

```python
RE_empty_cell = re.compile(r'\s*(~~)?\s*$')
```

The same `~~` also appears in `RE_adjacent_bars` and in the `m.group(1) == '~~'` comparison inside `_update_colspan_attrib`; change it there too, or column spans written with the new value will not be merged.

Keep in mind that many characters have special meaning in regular expressions. If you use any of the following characters in the expression, precede them with a backslash (`\`) to avoid problems:

`. ^ $ * + ? { } [ ] \ | ( )`
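If you do switch to another value, Python's `re.escape` can add the backslashes for you. The sketch below is only an illustration: `empty_cell_patterns` and `protect_empty_cells` are hypothetical helpers, not part of the extension. The first builds the two patterns that mention `~~` (the empty-cell test and the adjacent-bars test) from an arbitrary marker string; the second automates the Markdown 2.6 workaround by rewriting `||` in a row as `|marker|`. It assumes rows contain no escaped pipes and no `||` inside code spans.

```python
import re


def empty_cell_patterns(marker):
    """Build the empty-cell and adjacent-bars patterns for a custom marker."""
    m = re.escape(marker)  # escape any regex metacharacters in the marker
    empty_cell = re.compile(r'\s*(%s)?\s*$' % m)
    adjacent_bars = re.compile(r'\|(%s)?\|' % m)
    return empty_cell, adjacent_bars


def protect_empty_cells(row, marker='~~'):
    """Rewrite every '||' in a table row as '|<marker>|'."""
    # The lookahead leaves the second bar in place, so '|||' becomes
    # '|~~|~~|'; a function replacement avoids re.sub treating any
    # backslashes in the marker as escapes
    return re.sub(r'\|(?=\|)', lambda _: '|' + marker, row)


print(protect_empty_cells('| r2,c1 || r2,c3 | r2,c4 |'))
# | r2,c1 |~~| r2,c3 | r2,c4 |
```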
Usage
See Extensions for general extension usage. Use `cell_row_span` as the name of the extension. You must include the `tables` extension before this one; otherwise this extension silently does nothing.
This extension does not accept any special configuration options.
See https://python-markdown.github.io/extensions/tables/ for documentation on the tables extension.
License: BSD
Python code
```python
import re

from markdown.extensions import Extension
from markdown.extensions.tables import TableProcessor


class CellRowSpanExtension(Extension):
    """ Table Cell and Row Span extension """

    def extendMarkdown(self, md, md_globals=None):
        """ Replace the TableProcessor with an instance of CellRowSpanBlockProcessor """
        # md_globals is optional: Markdown 2.x passes it, while Markdown 3.x
        # calls extendMarkdown(md) on its own
        if 'table' in md.parser.blockprocessors:
            md.parser.blockprocessors['table'] = CellRowSpanBlockProcessor(md.parser)


class CellRowSpanBlockProcessor(TableProcessor):
    """ Subclass the TableProcessor """
    table_count = 0

    RE_adjacent_bars = re.compile(r'\|(~~)?\|')
    RE_remove_lead_pipe = re.compile(r'^ *\|')
    # ... Colonel Mustard in the Library? ;)
    RE_bar_to_bar = re.compile(r'(.*?)\|')
    RE_row_span_marker = re.compile(r'^_[_^= ]*_$')
    RE_valign_top = re.compile(r'\^')
    RE_valign_bottom = re.compile(r'=')
    RE_empty_cell = re.compile(r'\s*(~~)?\s*$')

    # No test() method needed; the superclass provides it

    def _update_colspan_attrib(self, text, tr_index, tr, td_remove):
        """ Update 'colspan' attributes in 'td' entries """
        text = self.RE_remove_lead_pipe.sub('', text)  # Remove leading '|' from text
        td_last_active_index = 0
        for td_index, m in enumerate(self.RE_bar_to_bar.finditer(text)):
            if len(m.group(1)) == 0 or m.group(1) == '~~':
                # We found an adjacent cell: widen the last non-empty cell
                try:
                    td = tr[td_last_active_index]  # cell that absorbs the span
                    td_covered = tr[td_index]      # cell being merged away
                except IndexError:
                    row_content = ''
                    for i, cell in enumerate(tr):
                        row_content += '    Cell %i: %s\n' % (i + 1, cell.text or 'Empty')
                    raise IndexError(
                        'Cannot merge cell beyond end of row '
                        "(one too many '|' characters in row?)\n"
                        'Check row %i of table %i in your document.\n'
                        'Row contents:\n%s' % (tr_index + 1, self.table_count, row_content)
                    )
                span = int(td.get('colspan', '1'))
                td.set('colspan', str(span + 1))
                td_remove.append((tr, td_covered))
            else:
                td_last_active_index = td_index

    def _update_rowspan_attrib(self, tbody, tr_index, td_index, td_remove):
        """ Update 'rowspan' attributes in 'td' entries """
        # Look for '^' (vertical align top) or '=' (bottom) in the marker
        marker = tbody[tr_index][td_index].text
        v_align = 'middle'
        if self.RE_valign_top.search(marker):
            v_align = 'top'
        if self.RE_valign_bottom.search(marker):
            if v_align == 'top':
                raise ValueError(
                    'Cannot use both ^ (top) and = (bottom) codes in a row span '
                    'marker\nCheck row %i, column %i in table %i in your '
                    'document' % (tr_index + 1, td_index + 1, self.table_count)
                )
            v_align = 'bottom'

        # Starting from the current row, go up the rows and schedule cells for
        # removal until we hit a non-empty cell (or the start of the table)
        row_num = tr_index
        while row_num >= 0:
            td = tbody[row_num][td_index]
            if row_num == tr_index or self.RE_empty_cell.match(td.text):
                td_remove.append((tbody[row_num], td))
            else:
                break
            row_num -= 1

        if row_num < 0:
            # Every cell up to the top of the table was empty: keep the top
            # cell as the spanning cell instead of deleting it
            row_num = 0
            td_remove.remove((tbody[0], tbody[0][td_index]))

        # Update the rowspan and valign attributes on the spanning cell
        td = tbody[row_num][td_index]
        td.set('rowspan', str(tr_index - row_num + 1))
        td.set('valign', v_align)

    def run(self, parent, blocks):
        self.table_count += 1
        # Save the block for later inspection
        rows = blocks[0].split('\n')

        # Let the TableProcessor do its work
        super(CellRowSpanBlockProcessor, self).run(parent, blocks)

        # Scan the original table text for adjacent columns and spanned rows
        td_remove = []  # List of (tr, td) pairs to be removed
        tbody = parent[-1].find('tbody')
        for tr_index, tr in enumerate(tbody):
            # Check for adjacent columns; rows[0] and rows[1] are the header
            # and the separator line
            if self.RE_adjacent_bars.search(rows[tr_index + 2]):
                self._update_colspan_attrib(rows[tr_index + 2], tr_index, tr, td_remove)

            # Check for spanned rows
            for td_index, td in enumerate(tr):
                if self.RE_row_span_marker.match(td.text):
                    self._update_rowspan_attrib(tbody, tr_index, td_index, td_remove)

        # Remove unneeded td elements (a cell may have been scheduled twice)
        for tr, td in td_remove:
            if td in tr:
                tr.remove(td)


def makeExtension(*args, **kwargs):
    return CellRowSpanExtension(*args, **kwargs)
```