Problem with links from docx/odt to markdown #2689

Closed
Pullusb opened this Issue · 4 comments

2 participants

@Pullusb

linkmarkdown.docx
Converting this document to Markdown inserts an empty-text link containing a "\n" before each link, and splits the link unevenly when the link text is a sentence.
The link text is also made italic, which I don't want.
Examples:

the dvorak keyboard (where the link is 'dvorak')
the[
](link)[*dvorak*](link)

the movie "le nom des gens". (where the link is ' "le nom des gens" ' )
The movie[
](link)[*"*](link)[*L*](link)[*enom des gens"*](link)

Note: this docx file was downloaded from a google doc.

@jkr
Collaborator

Weird -- they're actually different links in the docx XML file. I can collapse them, but that would only work if they go to the same place. I guess adjacent links to the same place should be collapsed in general. I wonder if this is something about how Word handles French? Anyway, thanks for reporting this -- I'll let you know as soon as I figure out the best way to proceed.
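
For illustration only, here is a minimal Python sketch of the collapsing idea described above: adjacent links that point to the same target are merged, and whitespace at the edges of the merged label is moved outside the link. This is not pandoc's actual implementation (the docx reader is written in Haskell); `Text`, `Link`, and `collapse_links` are hypothetical names invented for this sketch, and italics handling is left out.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Text:
    """Plain run of text (hypothetical stand-in for an inline element)."""
    content: str


@dataclass
class Link:
    """Link with a plain-text label and a target URL (hypothetical)."""
    label: str
    target: str


Inline = Union[Text, Link]


def collapse_links(inlines: List[Inline]) -> List[Inline]:
    # Pass 1: merge runs of adjacent links that share the same target.
    merged: List[Inline] = []
    for item in inlines:
        prev = merged[-1] if merged else None
        if (isinstance(item, Link) and isinstance(prev, Link)
                and prev.target == item.target):
            merged[-1] = Link(prev.label + item.label, prev.target)
        else:
            merged.append(item)

    # Pass 2: move leading/trailing whitespace out of each link label.
    result: List[Inline] = []
    for item in merged:
        if not isinstance(item, Link):
            result.append(item)
            continue
        core = item.label.strip()
        has_lead = item.label != item.label.lstrip()
        has_trail = item.label != item.label.rstrip()
        if has_lead:
            result.append(Text(" "))
        if core:
            result.append(Link(core, item.target))
            if has_trail:
                result.append(Text(" "))
    return result


# Fragmented input modelled on the report: four links to one target.
fragments = [
    Text("The movie"),
    Link("\n", "link"),
    Link('"', "link"),
    Link("L", "link"),
    Link('e nom des gens"', "link"),
]
collapsed = collapse_links(fragments)
```

Links with different targets are left untouched, which matches the caveat that collapsing only makes sense when the neighbours go to the same place.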

@Pullusb

Actually, when I write a Word document or a Google Doc with links directly, the file is clean and pandoc processes the links correctly.
I know why the file is faulty.
It was originally copied and pasted from a text whose links had already been generated from Markdown (on my Ghost blog). It seems that copying this text creates multiple links where there is supposed to be only one. Maybe the Markdown-to-text conversion of the Ghost platform isn't clean...

@jkr
Collaborator

Well, the docx reader should be robust against these things anyway, since documents come from all sorts of sources. I have a fix almost ready to push -- just fixing up a couple of function names. Is it okay if I use part of the file above as a test case?

@Pullusb

No problem, feel free to use it.

@jkr added a commit that closed this issue
Docx reader: Add a "Link" modifier to Reducible
We want to make sure that links have their spaces removed, and are
appropriately smushed together.

This closes #2689
2ee7752
@jkr closed this in 2ee7752
@c-forster pushed a commit to c-forster/pandoc that referenced this issue
28f31a1