Markdown extension notes; new extension h1h2_uplinks
A decent home for my extensions
This week I figured out how to put my Python Markdown extensions into a directory
of my choosing instead of forcing them to be in
/usr/lib/python#.#/site-packages/markdown/extensions.
First, I couldn’t use a directory path containing markdown/extensions (my initial attempt was /home/brian/projects/python/markdown/extensions).
All attempts to import a module failed:
```
[brian@sparrow ~]$ export PYTHONPATH='/home/brian/projects/python'
[brian@sparrow ~]$ python3
>>> import markdown
>>> import markdown.extensions.auc_headers
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'markdown.extensions.auc_headers'
```
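After trial and error, I determined that a package directory found via PYTHONPATH (such as my /home/brian/projects/python/markdown) is only used for imports if it contains a file named __init__.py. That has a major side effect, though: the directory then shadows the real markdown package that would otherwise be found further along sys.path, so the installed markdown modules become inaccessible.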
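To make the shadowing concrete, here is a small self-contained sketch. It builds a throwaway package called demo_pkg (a hypothetical stand-in for markdown) in two temporary directories: mine plays the role of /home/brian/projects/python and site plays site-packages. As with PYTHONPATH, mine is placed ahead of site on sys.path, and the sketch checks which submodules import with and without an __init__.py in mine/demo_pkg.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def forget(pkg: str = "demo_pkg") -> None:
    """Drop `pkg` and its submodules from sys.modules and clear finder caches."""
    for name in [m for m in sys.modules if m == pkg or m.startswith(pkg + ".")]:
        del sys.modules[name]
    importlib.invalidate_caches()


def can_import(name: str) -> bool:
    """Import `name` from scratch; True on success, False on ImportError."""
    forget()
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


def demo() -> dict:
    """Report (core importable?, my_ext importable?) without and with __init__.py."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        mine = Path(tmp, "mine")  # stands in for /home/brian/projects/python
        site = Path(tmp, "site")  # stands in for site-packages
        (mine / "demo_pkg").mkdir(parents=True)
        (mine / "demo_pkg" / "my_ext.py").write_text("NAME = 'my_ext'\n")
        (site / "demo_pkg").mkdir(parents=True)
        (site / "demo_pkg" / "__init__.py").write_text("")
        (site / "demo_pkg" / "core.py").write_text("NAME = 'core'\n")

        # Like a PYTHONPATH entry, "mine" is searched before "site".
        sys.path[:0] = [str(mine), str(site)]
        try:
            results["without_init"] = (can_import("demo_pkg.core"),
                                       can_import("demo_pkg.my_ext"))
            (mine / "demo_pkg" / "__init__.py").write_text("")
            results["with_init"] = (can_import("demo_pkg.core"),
                                    can_import("demo_pkg.my_ext"))
        finally:
            sys.path.remove(str(mine))
            sys.path.remove(str(site))
            forget()
    return results


print(demo())
# → {'without_init': (True, False), 'with_init': (False, True)}
```

Without __init__.py, Python 3 treats mine/demo_pkg as a possible namespace-package portion and keeps searching, so the regular package in site wins and my_ext is invisible (the ImportError above). With __init__.py, mine/demo_pkg wins outright and the "installed" core module disappears.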
In the end I set up /home/brian/projects/python/markdown_extensions. Note that the directory is markdown_extensions, as opposed to extensions being a subdirectory under markdown. I could have used pretty much anything other than projects/markdown/, such as projects/md/xtn; the only requirement is that I shouldn’t replicate the name of an existing module directory.
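One way to check that a candidate top-level name isn’t already taken is to ask Python’s import system. This sketch uses importlib.util.find_spec, which returns None when nothing by that name can be found on sys.path; the helper name name_is_free is just illustrative. Run it before putting your own directory on PYTHONPATH, otherwise it will find your own directory and report the name as taken.

```python
import importlib.util


def name_is_free(name: str) -> bool:
    """True if no top-level module or package called `name` is found on sys.path."""
    # A plain directory without __init__.py also counts as "taken", since
    # find_spec returns a namespace-package spec for it.
    return importlib.util.find_spec(name) is None


for candidate in ("markdown_extensions", "markdown", "json"):
    print(candidate, "free" if name_is_free(candidate) else "taken")
```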
The next piece was making a minor change to the extensions themselves. Instead of importing markdown modules in this fashion:
```python
from __future__ import absolute_import
from __future__ import unicode_literals
from . import Extension
from ..preprocessors import Preprocessor
from ..blockprocessors import BlockProcessor
from ..util import etree
import re
```
I do the following instead:
```python
from __future__ import absolute_import
from __future__ import unicode_literals
from markdown import Extension
from markdown.preprocessors import Preprocessor
from markdown.blockprocessors import BlockProcessor
from markdown.util import etree
import re
```
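The reason the relative form breaks is that a relative import is resolved against the package the importing module lives in. importlib.util.resolve_name performs that same resolution, so this sketch shows what the stock relative names turn into for a module in markdown.extensions versus one in markdown_extensions. The list RELATIVE_NAMES and the helper resolve_all are illustrative only.

```python
import importlib.util

# The relative module names used by the stock extensions' imports.
RELATIVE_NAMES = [".", "..preprocessors", "..blockprocessors", "..util"]


def resolve_all(package: str) -> dict:
    """Resolve each relative name as if imported from a module inside `package`."""
    resolved = {}
    for rel in RELATIVE_NAMES:
        try:
            resolved[rel] = importlib.util.resolve_name(rel, package)
        except (ImportError, ValueError) as exc:  # older Pythons raise ValueError
            resolved[rel] = "error: {}".format(exc)
    return resolved


print(resolve_all("markdown.extensions"))  # the stock location
print(resolve_all("markdown_extensions"))  # my directory
```

From markdown.extensions, "..preprocessors" resolves to markdown.preprocessors as intended. From markdown_extensions, "." points at markdown_extensions itself (which has no Extension class) and every ".." name fails with "attempted relative import beyond top-level package", which is why the absolute imports are needed.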
The final part was updating jmd to determine the correct path to use for the extensions:
```bash
# Track down the path to the markdown extensions
for PY_PATH in $(echo -e "import sys\nfor p in sys.path: print(p)" | python3)
do
    [ -d "$PY_PATH/markdown/extensions" ] && MARKDOWN_EXT_PATH="$PY_PATH/markdown/extensions"
done
[ "$MARKDOWN_EXT_PATH" ] || die "Unable to locate path to markdown/extensions"

# Append extensions to MARKDOWN_PY:
#   "markdown.extensions" is in $MARKDOWN_EXT_PATH
#   "markdown_extensions" is in /home/brian/projects/python
#   "extra" includes abbr, attr_list, def_list, fenced_code, footnotes, tables, smart_strong
#   "headerid" is not in the list because "toc" does this work now
export PYTHONPATH="/home/brian/projects/python"
for EXTENSION in extra meta sane_lists smarty toc urilfy \
                 gentoc_remove auc_headers autoxref h1h2_uplinks
do
    [ -f "$MARKDOWN_EXT_PATH/$EXTENSION.py" ] && X="markdown.extensions" || X="markdown_extensions"
    MARKDOWN_PY="$MARKDOWN_PY -x $X.$EXTENSION"
done
```
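The same "stock or mine?" decision can be made from Python by asking the import system whether markdown.extensions.&lt;name&gt; exists, instead of walking sys.path and testing for a .py file. This is a sketch of an alternative, not what jmd does; the function qualify and its parameters are hypothetical, and it only builds the same `-x package.name` pairs that the loop above appends to MARKDOWN_PY.

```python
import importlib.util


def qualify(extension: str,
            stock_pkg: str = "markdown.extensions",
            local_pkg: str = "markdown_extensions") -> str:
    """Prefix `extension` with stock_pkg if that package ships it, else local_pkg."""
    try:
        # find_spec imports the parent package(s), then looks for the submodule.
        in_stock = importlib.util.find_spec(
            "{}.{}".format(stock_pkg, extension)) is not None
    except ModuleNotFoundError:  # stock_pkg itself is not installed
        in_stock = False
    return "{}.{}".format(stock_pkg if in_stock else local_pkg, extension)


EXTENSIONS = ["extra", "meta", "sane_lists", "smarty", "toc", "urilfy",
              "gentoc_remove", "auc_headers", "autoxref", "h1h2_uplinks"]
args = []
for ext in EXTENSIONS:
    args += ["-x", qualify(ext)]
print(" ".join(args))
```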
New extension: h1h2_uplinks
From the module:
On every H1 and H2 header in a document, add a pair of links: one with the label “TOC” that goes to the Table of Contents entry for the header, and another with the label “Top” that goes to the top of the file. These links make it easier to navigate a large document, and are especially useful on touch screen devices.
This works by first adding id attributes to entries in the table of contents, to serve as the targets for the TOC links mentioned above. Since I can’t tell which entries are <h1> and <h2> (well, I could, but I would have to do two passes over the entire file), I simply added IDs to all of them:
```html
<div id="toc" class="toc">
<ul>
<li><a id="toc-0001" href="#week-of-october-22-28">Week of October 22-28</a><ul>
<li><a id="toc-0002" href="#markdown-extensions">Markdown extensions</a><ul>
<li><a id="toc-0003" href="#a-decent-home-for-my-extensions">A decent home for my extensions</a></li>
</ul>
</li>
<li><a id="toc-0004" href="#new-extension-h1h2_uplinks">New extension: h1h2_uplinks</a></li>
<li><a id="toc-0005" href="#pygments">Pygments</a></li>
</ul>
</li>
<li><a id="toc-0006" href="#week-of-october-15-21">Week of October 15-21</a><ul>
<li><a id="toc-0007" href="#python-markdown">Python Markdown</a></li>
<li><a id="toc-0008" href="#python-markdown-extensions">Python Markdown extensions</a><ul>
<li><a id="toc-0009" href="#gentoc_remove">gentoc_remove</a></li>
<li><a id="toc-0010" href="#auc_headers-arbitrary-underline-character-headers">auc_headers (Arbitrary Underline Character headers)</a></li>
<li><a id="toc-0011" href="#autoxref">autoxref</a></li>
<li><a id="toc-0012" href="#autoxref-extension">autoxref extension</a></li>
</ul>
</li>
</ul>
</li>
<!-- (485 lines skipped) -->
<li><a id="toc-0332" href="#titanitechcom-prank-successfailure-list">Titanitech.com prank success/failure list</a></li>
<li><a id="toc-0333" href="#computer-assistance-client-list">Computer Assistance: Client List</a></li>
<li><a id="toc-0334" href="#current-medications">Current medications</a></li>
</ul>
</li>
</ul>
</div>
```
The above is output from markdown, but it’s horribly formatted. Not that it matters: HTML is processed as a stream and doesn’t need whitespace and line breaks. Those are useful for humans, not computers. Here’s the same text after being run through tidy -i --wrap:
```html
<h2 id="table-of-contents">Table of Contents</h2>
<div id="toc" class="toc">
  <ul>
    <li>
      <a id="toc-0001" href="#week-of-october-22-28" name="toc-0001">Week of October 22-28</a>
      <ul>
        <li>
          <a id="toc-0002" href="#markdown-extensions" name="toc-0002">Markdown extensions</a>
          <ul>
            <li><a id="toc-0003" href="#a-decent-home-for-my-extensions" name="toc-0003">A decent home for my extensions</a></li>
          </ul>
        </li>
        <li><a id="toc-0004" href="#new-extension-h1h2_uplinks" name="toc-0004">New extension: h1h2_uplinks</a></li>
        <li><a id="toc-0005" href="#pygments" name="toc-0005">Pygments</a></li>
      </ul>
    </li>
    <li>
      <a id="toc-0006" href="#week-of-october-15-21" name="toc-0006">Week of October 15-21</a>
      <ul>
        <li><a id="toc-0007" href="#python-markdown" name="toc-0007">Python Markdown</a></li>
        <li>
          <a id="toc-0008" href="#python-markdown-extensions" name="toc-0008">Python Markdown extensions</a>
          <ul>
            <li><a id="toc-0009" href="#gentoc_remove" name="toc-0009">gentoc_remove</a></li>
            <li><a id="toc-0010" href="#auc_headers-arbitrary-underline-character-headers" name="toc-0010">auc_headers (Arbitrary Underline Character headers)</a></li>
            <li><a id="toc-0011" href="#autoxref" name="toc-0011">autoxref</a></li>
            <li><a id="toc-0012" href="#autoxref-extension" name="toc-0012">autoxref extension</a></li>
          </ul>
        </li>
      </ul>
    </li>
    <!-- (653 lines skipped) -->
        <li><a id="toc-0332" href="#titanitechcom-prank-successfailure-list" name="toc-0332">Titanitech.com prank success/failure list</a></li>
        <li><a id="toc-0333" href="#computer-assistance-client-list" name="toc-0333">Computer Assistance: Client List</a></li>
        <li><a id="toc-0334" href="#current-medications" name="toc-0334">Current medications</a></li>
      </ul>
    </li>
  </ul>
</div>
```
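If tidy isn’t installed, Python’s standard library can do a rough version of the same re-indentation. This is a sketch of an alternative to tidy, assuming the fragment is well-formed XML with a single root element (markdown’s XHTML-style output usually is, but named HTML entities such as &amp;nbsp; would make ET.fromstring fail); ET.indent needs Python 3.9 or later, and the helper name pretty is illustrative.

```python
import xml.etree.ElementTree as ET


def pretty(fragment: str, space: str = "  ") -> str:
    """Re-indent a well-formed XHTML fragment (one root element) for human readers."""
    root = ET.fromstring(fragment)
    ET.indent(root, space=space)  # adds newline/indent whitespace between elements
    return ET.tostring(root, encoding="unicode")


toc = ('<div id="toc" class="toc"><ul><li><a id="toc-0001" href="#week-of-october-22-28">'
       'Week of October 22-28</a><ul><li><a id="toc-0002" href="#markdown-extensions">'
       'Markdown extensions</a></li></ul></li></ul></div>')
print(pretty(toc))
```

Unlike tidy, this leaves leaf elements such as the a links on a single line and does not add name attributes.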
The next part was updating the <h1> and <h2> headings. Output from markdown usually looks like this:
```html
<h1 id="week-of-october-22-28">Week of October 22-28</h1>
<h2 id="markdown-extensions">Markdown extensions</h2>
```
The h1h2_uplinks extension updates them to:
```html
<div class="h1 header">
    <h1 id="week-of-october-22-28">Week of October 22-28</h1>
    <div class="up-links"><a href="#toc-0001">TOC</a> | <a href="#toc">Top</a></div>
</div>
<div class="h2 header">
    <h2 id="markdown-extensions">Markdown extensions</h2>
    <div class="up-links"><a href="#toc-0002">TOC</a> | <a href="#toc">Top</a></div>
</div>
```
The final piece is some CSS to position the up-links div:
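```css
/* The wrapper div carries class "header"; it must be positioned so the
   absolutely positioned up-links are placed relative to it */
div.header {
    position: relative;
}
div.up-links {
    text-align: right;
    position: absolute;
    bottom: 2px;
    right: 0px;
}
div.h1 div.up-links {
    bottom: 25%;
    right: 5px;
}
div.up-links a {
    color: grey;
}
```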
Here’s the extension’s code. The first part is the extension object itself, which has a property called h1h2_id that’s used to stash information between the calls to the tree processor and the post-processor:
```python
#!/usr/bin/python3
from markdown.extensions import Extension


class H1H2uplinksExtension(Extension):
    """ Uplinks Extension. """

    def extendMarkdown(self, md, md_globals):
        """ Add pieces to Markdown. """
        md.registerExtension(self)

        # h1h2_id maps slugified H1/H2 headers (eg 'this-is-a-header') to their
        # IDs (eg, 'toc-####'). We need to stash the dict somewhere that's
        # accessible from both the tree processor and the post-processor, so
        # we make it a property of the Extension object.
        self.h1h2_id = dict()

        ## Add the tree processor to the list (Python-Markdown 2.x add() API)
        md.treeprocessors.add(
            "h1h2_uplinks", H1H2uplinksTreeprocessor(self), "<inline"
        )
        # Insert the post-processor after inserting raw HTML
        md.postprocessors.add(
            "h1h2_uplinks", H1H2uplinksPostprocessor(self), ">raw_html"
        )


def makeExtension(*args, **kwargs):
    """ Called by markdown when the extension is loaded by module name. """
    return H1H2uplinksExtension(*args, **kwargs)
```
The tree processor identifies the H1 and H2 headers and records them in the h1h2_id dictionary, mapping each header’s slug to an ID of the form toc-####:
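```python
#!/usr/bin/python3
from markdown.extensions.toc import slugify
from markdown.treeprocessors import Treeprocessor


class H1H2uplinksTreeprocessor(Treeprocessor):
    """
    Track down the H1 and H2 headers, assign them IDs (e.g. 'toc-0001') and
    store the header/ID mappings in the shared 'h1h2_id' dict
    """
    call_num = 0

    def __init__(self, extobj):
        super().__init__()
        # Share the dict created by the Extension object
        self.h1h2_id = extobj.h1h2_id

    def run(self, doc):
        self.call_num = self.call_num + 1
        h1h2_counter = 0
        # Only the root's direct children are examined, which is where
        # markdown puts ordinary top-level headers
        for elem in doc:
            if elem.tag in ('h1', 'h2'):
                h1h2_counter = h1h2_counter + 1
                target = "toc-{:04d}".format(h1h2_counter)
                self.h1h2_id[slugify(str(elem.text), '-')] = target
        # Returning None tells markdown to keep using the existing tree
```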
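To see what ends up in h1h2_id, here is a standalone sketch of the same loop run over a small ElementTree. It does not import markdown: slugify below is a stand-in written to approximate Python-Markdown’s toc slugify (an assumption, not the library’s code), chosen because it reproduces the slugs visible in the table of contents above. The function name map_h1h2 is hypothetical. Following the tree processor’s code, only h1 and h2 elements are counted.

```python
import re
import unicodedata
import xml.etree.ElementTree as ET


def slugify(value: str, separator: str) -> str:
    """Stand-in approximating Python-Markdown's toc slugify (not its actual code)."""
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    value = re.sub(r"[^\w\s-]", "", value).strip().lower()
    return re.sub(r"[{}\s]+".format(re.escape(separator)), separator, value)


def map_h1h2(doc: ET.Element) -> dict:
    """Number the top-level h1/h2 elements in order, keyed by slug."""
    h1h2_id = {}
    counter = 0
    for elem in doc:
        if elem.tag in ("h1", "h2"):
            counter += 1
            h1h2_id[slugify(str(elem.text), "-")] = "toc-{:04d}".format(counter)
    return h1h2_id


doc = ET.fromstring(
    "<div><h1>Week of October 22-28</h1><h2>Markdown extensions</h2>"
    "<h3>A decent home for my extensions</h3><h2>New extension: h1h2_uplinks</h2></div>"
)
print(map_h1h2(doc))
# → {'week-of-october-22-28': 'toc-0001', 'markdown-extensions': 'toc-0002',
#    'new-extension-h1h2_uplinks': 'toc-0003'}
```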
Finally, the post-processor adds id attributes to the H1 and H2 entries in the table of contents, and wraps the H1 and H2 headers in the main body of the file in <div> elements that hold the up-links:
```python
#!/usr/bin/python3
import re

from markdown.postprocessors import Postprocessor


class H1H2uplinksPostprocessor(Postprocessor):
    """ Add "id=" attributes to H1 and H2 level entries in the Table of Contents """
    call_num = 0

    def __init__(self, extobj):
        super().__init__()
        # Share the dict created by the Extension object
        self.h1h2_id = extobj.h1h2_id

    def run(self, text):
        """
        Add 'id' attributes to links in the Table of Contents

        Parameter:
        * text: the HTML stream generated by markdown

        Return: Updated text

        Note: this method is actually called twice: once when the Table of
        Contents is generated, and again when the document is complete.
        This method runs only on the second call; ergo, if the 'toc'
        extension is not loaded, this extension won't do anything.
        """
        self.call_num = self.call_num + 1
        if self.call_num == 1:
            return text

        # Locate the <div> holding the Table of Contents; allow other
        # attributes such as id="toc" alongside class="toc"
        match = re.search(r'(?P<toc><div[^>]*class="toc"[^>]*>.*?</div>)',
                          text, re.DOTALL)
        if not match:
            return text
        start, end = match.span('toc')

        RE_A = re.compile(r'(?P<P1> *(?:<li>)<a )(?:id="toc-[0-9]{4}" )?'
                          r'(?P<P2>href="#(?P<id>.*?)">.*</a>.*)')
        RE_H1_H2 = re.compile(r' *<(?P<h1_h2>h[12]) id="(?P<id>.*?)"')

        new_doc = []

        # Phase 1: add 'id' attributes to items in the Table of Contents.
        for line in match.group('toc').split("\n"):
            # If line is '<li><a href="..."> ... </a>', add 'id="toc-####"' to it
            m = RE_A.match(line)
            if m and m.group('id') in self.h1h2_id:
                line = '{}id="{}" {}'.format(
                    m.group('P1'), self.h1h2_id[m.group('id')], m.group('P2')
                )
            # Add this line to the new TOC code
            new_doc.append(line)

        # Phase 2: add 'up-links' div to h1 and h2 elements in the HTML.
        # rest[0] is whatever follows </div> on the same line (usually ''),
        # so it is glued back onto the TOC's last line unchanged.
        rest = text[end:].split('\n')
        new_doc[-1] += rest[0]
        for line in rest[1:]:
            header = RE_H1_H2.match(line)
            if header and header.group('id') in self.h1h2_id:
                new_doc.append('<div class="{} header">'.format(header.group('h1_h2')))
                new_doc.append('    ' + line)
                new_doc.append('    <div class="up-links"><a href="#{}">TOC</a> | '
                               '<a href="#top">Top</a></div>'
                               .format(self.h1h2_id[header.group('id')]))
                new_doc.append('</div>')
            else:
                new_doc.append(line)

        # Slice at the exact match boundaries so no character before the
        # TOC is dropped (text[0:start-1] misbehaves when start is 0)
        return text[:start] + '\n'.join(new_doc)
```
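To see what the post-processor’s two regular expressions do, here is a standalone sketch that applies the same patterns, one line at a time, to lines like those in the sample output earlier. The mapping H1H2_ID and the function names tag_toc_line and wrap_header_line are hypothetical; they just pull the two loop bodies out of run() so they can be tried without markdown installed.

```python
import re

# The two patterns from the post-processor.
RE_A = re.compile(r'(?P<P1> *(?:<li>)<a )(?:id="toc-[0-9]{4}" )?'
                  r'(?P<P2>href="#(?P<id>.*?)">.*</a>.*)')
RE_H1_H2 = re.compile(r' *<(?P<h1_h2>h[12]) id="(?P<id>.*?)"')

# Hypothetical mapping of the kind the tree processor builds.
H1H2_ID = {"week-of-october-22-28": "toc-0001", "markdown-extensions": "toc-0002"}


def tag_toc_line(line: str) -> str:
    """Phase 1: give a TOC link an id (replacing any old one) if it targets a known H1/H2."""
    m = RE_A.match(line)
    if m and m.group("id") in H1H2_ID:
        return '{}id="{}" {}'.format(m.group("P1"), H1H2_ID[m.group("id")], m.group("P2"))
    return line


def wrap_header_line(line: str) -> list:
    """Phase 2: wrap a known H1/H2 line in a div together with its up-links."""
    m = RE_H1_H2.match(line)
    if not (m and m.group("id") in H1H2_ID):
        return [line]
    return ['<div class="{} header">'.format(m.group("h1_h2")),
            "    " + line,
            '    <div class="up-links"><a href="#{}">TOC</a> | '
            '<a href="#top">Top</a></div>'.format(H1H2_ID[m.group("id")]),
            "</div>"]


print(tag_toc_line('<li><a href="#markdown-extensions">Markdown extensions</a><ul>'))
# → <li><a id="toc-0002" href="#markdown-extensions">Markdown extensions</a><ul>
print("\n".join(wrap_header_line('<h2 id="markdown-extensions">Markdown extensions</h2>')))
```

Because both patterns are anchored with match() and expect one tag per line, this approach depends on markdown emitting each li and each header on its own line, as it does in the raw TOC output shown earlier.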