Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GFM re-hrefs the href #8

Closed
rasoolsomji opened this issue Aug 30, 2020 · 9 comments · Fixed by #11
Closed

GFM re-hrefs the href #8

rasoolsomji opened this issue Aug 30, 2020 · 9 comments · Fixed by #11

Comments

@rasoolsomji
Copy link

Clumsy title but bear with me.

Using the 'blog' scaffold, you run the docs through wikidoc.py content_markdown(), which first (because compose() runs right-to-left) makes wikilinks, and then converts markdown to HTML (using GFM). But because you have already made your wikilinks nicely formatted in HTML, GFM ends up re-hrefing your href, for example:

from markdown import markdown as md
from mdx_gfm import GithubFlavoredMarkdownExtension
string = 'Here comes a link <a href="https://example.org" class="wikilink">link text</a>'
md(string, extensions=(GithubFlavoredMarkdownExtension(), ))
>>>'<p>Here comes a link <a href="<a href="https://example.org">https://example.org</a>" class="wikilink">link text</a></p>'

However this doesn't happen on transclude links:

string = '<aside>Here comes a link <a href="https://example.org" class="wikilink">link text</a></aside>'
md(string, extensions=(GithubFlavoredMarkdownExtension(), ))
>>>'<aside>Here comes a link <a href="https://example.org" class="wikilink">link text</a></aside>'
@gordonbrander
Copy link
Owner

Hmm, thanks. That seems like a rather yikes bug in GithubFlavoredMarkdownExtension? Inline HTML is valid markdown, so GFM shouldn't be doing that. I'll investigate.

@SinaKhalili
Copy link

Also ran into this and made a quick naive plugin that goes over and removes the double hrefs. Posting incase it helps.

def remove_double_hrefs(docs):
    for doc in docs:
        yield put(
            Doc.content, 
            doc,
            re.sub(r'<a href=\"http.+?<a href=\"(http.*?)\">\1</a>\"', r'<a href="\1"', doc.content)
        )

Just make sure to put it after the content_markdown plugin.

@twilightfades0
Copy link

Hey, does anyone know if there's a "proper" fix for this yet? I'm using the current HEAD version (f55fc63) following the directions for obsidian-lettersmith here and I'm seeing what looks like this issue. Any recommendations?

Here's a repository with my (stub) site that's giving problems: twilightfades0/digital_garden (currently on rev 01da77e5).

Thanks!

@tijptjik
Copy link

tijptjik commented Jun 10, 2021

@twilightfades0 - changing the order in which the markdown is processed solved the issue for me ( without resorting to @SinaKhalili's plugin)

The change sadly is to the content_markdown function in wikidoc.py, so you need to monkey-patch the code, or fork the library.

changing this:

    return compose(
        markdowntools.content,
        content_wikilinks(
            base_url,
            link_template,
            nolink_template,
            transclude_template
        ),
        annotate_links,
        summary_markdown
    )

to this

return compose(
            content_wikilinks(
                    base_url,
                    link_template,
                    nolink_template,
                    transclude_template
            ),
            markdowntools.content,
            annotate_links,
            summary_markdown
    )

@gordonbrander
Copy link
Owner

gordonbrander commented Jun 10, 2021

@tijptjik Thank you, I'll test your PR! I thought I had tried this and run into a different ordering issue, but my memory may be faulty here.

In the meantime, anyone that wants to can compose their own function, rather than having to monkey-patch the core function. One of my design goals for Lettersmith is to compose all the features from smaller "lego bricks" so that others can disassemble and rebuild/extend the features.

@tijptjik
Copy link

Thank you, @gordonbrander - for your response and your excellent tool!

lettersmith_py happily gobbles up a variety of markdown sources which I haven't tested against, so caveat mergor!

It solved the issue for Obsidian.md sources, but indeed could introduce issues in other places!

Indeed, the lego-like compositionality is really nice - I've used it to add support for image-links in the Wiki syntax which Obsidian also uses (e.g. ![[Pasted Image 2000.png]] --> <img src="/images/Pasted%20Image 2000.png">).

Thanks again for being so responsive!

@gordonbrander
Copy link
Owner

gordonbrander commented Jun 10, 2021

Copying comment from #10 (comment)

Tested locally. Unfortunately, #10 breaks transclusion. The reason is that Markdown runs first, so it wraps the wikilinks in a <p> tag. This is Markdown behaving as expected. We should run any to-HTML transformations BEFORE running markdown, since Markdown is supposed to leave HTML alone. The issue is that the GFM library has a bug where it does not leave HTML alone.

Will investigate another fix.

@gordonbrander
Copy link
Owner

gordonbrander commented Jun 10, 2021

The bug is in the py-gfm library, and it looks like this library may be abandonware soon zopieux/py-gfm#28. Investigate https://github.com/Zopieux/pycmarkgfm as possible replacement.

gordonbrander added a commit that referenced this issue Sep 26, 2021
Fixes #8.

GFM doesn't ignore inline HTML when rendering, and will produce bugs,
such as re-hrefing the href in a link tag.

GFM has essentially been abandoned. This PR replaces it with commonmark,
which implements Common Markdown (https://commonmark.org/).

Pros:

- Commonmark is maintained by readthedocs, so is a good bet in terms of
code durability.
- Edge-cases in CommonMark are well-specified.
- The library generates an AST, which we could use for enhancements
  later.

Cons:

- No auto-linking of bare URLs. You need to wrap URLs in <brackets>
- No br for soft linebreaks. you will need to add two spaces at the end
  of the line to create a BR.
@gordonbrander
Copy link
Owner

Just a heads-up that PR #11 fixes this by replacing GFM with commonmark. Commonmark is a well-specified flavor of markdown, and the library seems to be well-mainted by ReadTheDocs.io.

Note that commonmark does not have some of Github-flavored Markdown's enhancements, such as auto-linking bare URLs, and treating line breaks as BRs. If you use these features of GitHub-flavored markdown, upgrading will be a breaking change.

However, I feel that CommonMark provides a number of benefits, foremost that it is well-specified, and well-maintained. The PR linked above goes into the pros and cons.

See https://commonmark.org/ for more on the (minor) differences.

gordonbrander added a commit that referenced this issue Oct 1, 2021
Fixes #8.

GFM doesn't ignore inline HTML when rendering, and will produce bugs,
such as re-hrefing the href in a link tag.

GFM has essentially been abandoned. This PR replaces it with commonmark,
which implements Common Markdown (https://commonmark.org/).

Pros:

- Commonmark is maintained by readthedocs, so is a good bet in terms of
code durability.
- Edge-cases in CommonMark are well-specified.
- The library generates an AST, which we could use for enhancements
  later.

Cons:

- No auto-linking of bare URLs. You need to wrap URLs in <brackets>
- No br for soft linebreaks. you will need to add two spaces at the end
  of the line to create a BR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants