Very slow performance with some markdown options #2730
wch
commented
+++ Winston Chang [Feb 19 16 09:37 ]:
> fast for all of those settings. So there's something about that
> particular content that slows it down.
Does it contain `<` characters?
Thanks for the excellent, detailed bug report. I think this commit fixes the problem (I tested on your files). But let me know if it doesn't.
When converting some files from Markdown to HTML, performance can be very slow, depending on the markdown variant and options selected. The time grows exponentially, as shown in the graph below.
For this example, I have a very basic input -- it's just raw HTML with some JSON content embedded in a `<script>` tag. (We're using markdown as an input format because sometimes the HTML is intermingled with markdown. But in this example, the actual content is just HTML.)
index.html:
This is paired with a minimal template file:
And it's run through pandoc with:
The problem is that, with the content I have (the `"blah blah blah"` is replaced with a bunch of R code in a string), pandoc is extremely slow. Here's a graph of time, with 50KB, 100KB, and 150KB of text in the `<script></script>` tags, with various flavors of markdown. Note the log y scale:
For `markdown_strict`, the time for 50KB is 0.37 seconds; for 100KB, it's 3.1 seconds; and for 150KB, it's 25.5 seconds. If the input is a megabyte in size, the conversion time with this exponential growth rate would be about 30,000,000,000,000,000 seconds. My actual data is over two megabytes, so there would be many more zeros on there. :)
In the graph, I've also compared it to `markdown` and `commonmark`, which are much faster, as well as `markdown-markdown_in_html_blocks` and `markdown+markdown_attribute`, which are just as slow as `markdown_strict`. I would have expected the `markdown-markdown_in_html_blocks` and `markdown+markdown_attribute` options to be faster than `markdown`, but the opposite appears to be true.
The example input files are in https://github.com/wch/pandoc-hang, with a subdirectory for each input file size. For example, the 100KB input file is in:
https://github.com/wch/pandoc-hang/tree/master/simplified-100kb
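As a rough check on the extrapolation for `markdown_strict`, here is a minimal sketch that fits an exponential to the three timings reported above and projects it to a one-megabyte input. The helper names (`fit_exponential`, `predict_seconds`) are made up for illustration, and 1 MB is assumed to mean 1000 KB. With only three points the result is very sensitive to the fitted growth factor, so only the order of magnitude is meaningful; this fit comes out within roughly an order of magnitude of the figure quoted above.

```python
import math

# Timings reported above for markdown_strict: input size in KB -> seconds.
timings = {50: 0.37, 100: 3.1, 150: 25.5}


def fit_exponential(points):
    """Least-squares fit of log(t) = a + b * size_kb; returns (a, b)."""
    xs = list(points)
    ys = [math.log(points[x]) for x in xs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = mean_y - b * mean_x
    return a, b


def predict_seconds(size_kb, a, b):
    """Projected conversion time under the fitted exponential model."""
    return math.exp(a + b * size_kb)


a, b = fit_exponential(timings)
growth_per_50kb = math.exp(50 * b)          # multiplier for each extra 50 KB
estimate_1mb = predict_seconds(1000, a, b)  # assumes 1 MB = 1000 KB

print(f"growth factor per 50 KB: {growth_per_50kb:.1f}")
print(f"estimated seconds for 1 MB: {estimate_1mb:.1e}")
```

The time multiplies by a roughly constant factor for each additional 50KB, which is what distinguishes exponential growth from polynomial growth in these measurements.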
I also tried changing the specific content in the `<script>` tags, and that makes a big difference in speed. In my use case, it's R code in a string, but when I replace it with just blank spaces, the conversion is fast for all of those settings. So there's something about that particular content that slows it down.
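A comparison like this can be scripted. The following is a minimal sketch, not the setup used for the measurements above: `make_input` and `time_command` are hypothetical helpers, the JSON shape inside the `<script>` tag is invented for illustration (it is not the original index.html), and the command being timed is a placeholder that only reads standard input.

```python
import subprocess
import sys
import time


def make_input(size_bytes, filler="x"):
    """Build HTML with a <script> tag holding size_bytes of repeated filler.

    The JSON shape is hypothetical, not the original index.html.
    """
    repeats = size_bytes // len(filler) + 1
    payload = (filler * repeats)[:size_bytes]
    return '<script type="application/json">{"code": "' + payload + '"}</script>\n'


def time_command(cmd, stdin_text):
    """Run cmd with stdin_text on standard input; return elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, input=stdin_text, text=True, capture_output=True, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder command that just consumes stdin; swap in the real converter.
    cmd = [sys.executable, "-c", "import sys; sys.stdin.read()"]
    # Compare blank-space filler against code-like filler containing '<'.
    for filler in (" ", "x <- c(1, 2); "):
        for size in (50_000, 100_000, 150_000):
            elapsed = time_command(cmd, make_input(size, filler))
            print(f"filler={filler!r} size={size} elapsed={elapsed:.3f}s")
```

With pandoc installed, `cmd` could be replaced by something like `["pandoc", "-f", "markdown_strict", "-t", "html"]`. That invocation is an assumption for illustration and is not the exact command from this report, which also used a template file.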