
Script Development for Converting Markdown and LaTeX to Images

Published: at 11:54 PM

A few days ago, I wrote a QQ bot for fun and connected it to a third-party AI API. LiteLoaderQQNT has plugins that can render Markdown and LaTeX, but most people don't have them installed, so formatted replies show up as raw markup. I wondered whether I could write a script that converts text containing Markdown and LaTeX into an image instead, and this post describes how I did it.

Implementation Idea

After some research, I settled on the following three steps:

  1. Convert the generated text into an HTML page;
  2. Use MathJax to render the LaTeX in the page;
  3. Open the page in a headless browser, take a screenshot, and save it.

JavaScript Script: Webpage Screenshot

First, write a Node.js script that uses Puppeteer to open a given HTML file in headless Chrome, wait for MathJax to finish, and save a full-page screenshot. The code is as follows:

const puppeteer = require('puppeteer');
const path = require('path');
const fs = require('fs');
const { pathToFileURL } = require('url');

// Get input and output paths from command line arguments
const args = process.argv.slice(2);
const inputPath = args[0];
const outputPath = args[1];

// Check if arguments are provided
if (!inputPath || !outputPath) {
  console.error('Usage: node screenshot.js <input path> <output path>');
  process.exit(1);
}

// Resolve both paths against the current working directory
const resolvedInputPath = path.resolve(inputPath);
const resolvedOutputPath = path.resolve(outputPath);

// Check if input file exists
if (!fs.existsSync(resolvedInputPath)) {
  console.error('Input file not found:', resolvedInputPath);
  process.exit(1);
}

(async () => {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    // Fixed width; fullPage below extends the height to fit the content
    await page.setViewport({ width: 1200, height: 100 });

    // pathToFileURL builds a valid file:// URL on Windows as well as Unix
    await page.goto(pathToFileURL(resolvedInputPath).href, { waitUntil: 'networkidle0' });

    // Wait for MathJax 2.x to finish typesetting, at most 2 seconds.
    // MathJax.Hub.Queue runs the callback after all queued typesetting is done.
    const mathJaxDone = await page.evaluate(() => new Promise((resolve) => {
      setTimeout(() => resolve(false), 2000);
      if (window.MathJax && window.MathJax.Hub) {
        window.MathJax.Hub.Queue(() => resolve(true));
      } else {
        resolve(false);
      }
    }));
    if (!mathJaxDone) {
      console.log('MathJax not fully loaded, but continue to generate image...');
    }

    // Take a full-page screenshot and save it to the output path
    await page.screenshot({ path: resolvedOutputPath, fullPage: true });
    console.log('Screenshot saved:', resolvedOutputPath);
  } finally {
    // Always close the browser, even if a step above throws
    await browser.close();
  }
})().catch((err) => {
  // A non-zero exit code lets the Python caller (check=True) detect the failure
  console.error(err);
  process.exit(1);
});

Save the above code as screenshot.js.

Environment Setup

The script depends on Node.js. After installing Node.js, install the Puppeteer package in the same folder as screenshot.js with the following command (Puppeteer also downloads a compatible build of Chrome during installation):

npm install puppeteer
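Because screenshot.js is launched from Python later on, it can help to confirm the Node.js setup from Python before the first render. The sketch below is a hypothetical helper (check_node_setup is not part of the original scripts); it assumes `node` is on PATH and that Puppeteer was installed into script_dir, and relies on `node -e` resolving packages relative to its working directory.

```python
import os
import shutil
import subprocess


def check_node_setup(script_dir="."):
    """Hypothetical pre-flight check: True if `node` is on PATH and can
    resolve the puppeteer package from script_dir, otherwise False."""
    node = shutil.which("node")
    if node is None:
        return False  # Node.js is not installed or not on PATH
    try:
        # `node -e` resolves require() relative to its working directory,
        # so run it where node_modules/puppeteer is expected to be
        subprocess.run(
            [node, "-e", "require.resolve('puppeteer')"],
            cwd=os.path.abspath(script_dir),
            check=True,
            capture_output=True,
            timeout=30,
        )
    except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError):
        # Non-zero exit (module not found), hang, or a bad directory
        return False
    return True
```

Called from the folder that holds screenshot.js, check_node_setup() should return True after `npm install puppeteer`; False usually means Node.js is missing or Puppeteer was installed in a different folder.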

Python Script: Markdown to HTML

Create a new Python program, md2img.py, in the same folder. It converts the Markdown text to an HTML file and then calls screenshot.js to turn that file into an image:

import os
import subprocess
import uuid

import markdown
from markdown.extensions import Extension
from markdown.treeprocessors import Treeprocessor

# Folder containing this file; screenshot.js is expected to be next to it
SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))


# MathJax: turn <span class="math"> elements into <script type="math/tex">,
# which MathJax 2 typesets directly. Plain Python-Markdown never emits such
# spans, so this only matters if another extension produces them; the bracket
# placeholders below are what actually keep the LaTeX delimiters intact.
class MathJaxExtension(Extension):
    def extendMarkdown(self, md):
        md.treeprocessors.register(MathJaxProcessor(md), 'mathjax', 175)


class MathJaxProcessor(Treeprocessor):
    def run(self, root):
        for element in root.iter():
            if element.tag == 'span' and 'math' in element.attrib.get('class', '').split():
                element.tag = 'script'
                element.attrib['type'] = 'math/tex'


# \[ \] \( \) are Markdown escape sequences, so Python-Markdown would turn them
# into plain brackets and MathJax would never see its delimiters. Swap them for
# placeholders before conversion and put them back afterwards. Note that the
# LaTeX between the delimiters is still parsed as Markdown (e.g. \\ becomes \).
def replace_brackets(input_string):
    input_string = input_string.replace('\\[', '?gzl?')
    input_string = input_string.replace('\\]', '?gzr?')
    input_string = input_string.replace('\\(', '?gxl?')
    input_string = input_string.replace('\\)', '?gxr?')
    return input_string


def convert_brackets(input_string):
    input_string = input_string.replace('?gzl?', '\\[')
    input_string = input_string.replace('?gzr?', '\\]')
    input_string = input_string.replace('?gxl?', '\\(')
    input_string = input_string.replace('?gxr?', '\\)')
    return input_string


def convert_markdown_to_html_with_latex(md_text):
    # fenced_code emits <pre><code class="language-xxx">, which highlight.js
    # recognizes; codehilite is left out so it does not compete with highlight.js
    md = markdown.Markdown(extensions=['fenced_code', MathJaxExtension()])
    html = convert_brackets(md.convert(md_text))
    return f"""<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8">
    <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-MML-AM_CHTML"></script>
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.8.0/styles/default.min.css">
    <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.8.0/highlight.min.js"></script>
    <script>hljs.highlightAll();</script>
    <style>body {{ font-size: 30px; }}</style>
</head>
<body>
{html}
</body>
</html>
"""


def markdown_to_html(md_text, html_out_path):
    html_output = convert_markdown_to_html_with_latex(replace_brackets(md_text))
    with open(html_out_path, 'w', encoding='utf-8') as f:
        f.write(html_output)
    print(f"HTML written to {html_out_path}")


def html_to_image(html_path, img_path):
    html_path = os.path.abspath(html_path)
    img_path = os.path.abspath(img_path)

    if not os.path.exists(html_path):
        raise FileNotFoundError(f"HTML file not found: {html_path}")

    # Absolute path to screenshot.js, so this works from any working directory
    node_command = ["node", os.path.join(SCRIPT_DIR, "screenshot.js"), html_path, img_path]

    try:
        result = subprocess.run(
            node_command,
            check=True,
            text=True,
            capture_output=True
        )
        print("screenshot.js output:", result.stdout)
    except subprocess.CalledProcessError as e:
        print("screenshot.js failed:", e.stderr)
        raise

    if not os.path.exists(img_path):
        raise FileNotFoundError(f"Image generation failed: {img_path}")

    print(f"Image generated successfully: {img_path}")


def markdown_to_image(md_text, img_out_path):
    tmp_dir = os.path.join(os.getcwd(), 'tmp', 'cache')
    os.makedirs(tmp_dir, exist_ok=True)
    # A random name avoids collisions when two images are rendered in the same second
    html_file_path = os.path.join(tmp_dir, f"temp_{uuid.uuid4().hex}.html")
    markdown_to_html(md_text, html_file_path)
    try:
        html_to_image(html_file_path, img_out_path)
    except Exception as e:
        print(f"Image generation failed: {e}")
        raise
    finally:
        # Remove the temporary HTML file whether or not rendering succeeded
        if os.path.exists(html_file_path):
            os.remove(html_file_path)
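One limitation of the bracket placeholders (replace_brackets and its inverse) is that only the delimiters are protected: the LaTeX between them still passes through Markdown, so `\\` in a matrix becomes `\`, and `_` or `*` can be misread as emphasis. A possible alternative, sketched here under my own assumptions rather than taken from the original script, is to stash each whole math span before conversion and put it back afterwards. The names protect_math, restore_math and the MATHTOKEN placeholder format are all hypothetical.

```python
import html
import re

# Display math \[...\] or $$...$$, and inline math \(...\); non-greedy, may span lines.
# Single-dollar $...$ is not matched, because MathJax 2 does not treat it as
# an inline delimiter by default.
MATH_RE = re.compile(r"\\\[.*?\\\]|\$\$.*?\$\$|\\\(.*?\\\)", re.DOTALL)
# Hypothetical placeholder: letters and digits only, which Markdown leaves alone
TOKEN_RE = re.compile(r"MATHTOKEN(\d+)END")


def protect_math(text):
    """Replace every math span with a numbered token; return (new_text, stash)."""
    stash = []

    def store(match):
        stash.append(match.group(0))
        return f"MATHTOKEN{len(stash) - 1}END"

    return MATH_RE.sub(store, text), stash


def restore_math(rendered_html, stash):
    """Put the original LaTeX back, HTML-escaped so that < and & survive parsing."""
    return TOKEN_RE.sub(
        lambda m: html.escape(stash[int(m.group(1))], quote=False), rendered_html
    )
```

To try it, call protect_math on md_text instead of replace_brackets, pass the returned text to md.convert, and call restore_math on the resulting HTML instead of restoring the brackets. Escaping on the way back is safe because MathJax reads the text content of the page, where `&lt;` is already decoded to `<`.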

Environment Setup

The above Python program requires the markdown package. Install it with the following command:

pip install markdown

Usage Example

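Create a new test.py in the same folder and call md2img.markdown_to_image(md_text, img_out_path) in it, where md_text is the text containing Markdown and LaTeX and img_out_path is the path of the image to write (for example output.png). The temporary HTML file is written to tmp/cache under the current working directory and deleted once the screenshot has been taken.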
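A minimal test.py might look like the following. The sample text and the render_sample helper are illustrative, not from the original post; the sketch assumes md2img.py, screenshot.js and the node_modules folder with Puppeteer all sit in the current folder, and it writes math with `\( \)` and `\[ \]`, the delimiters md2img.py protects.

```python
# test.py: usage sketch (SAMPLE_MD and render_sample are hypothetical names)

# Inline math uses \( \), display math uses \[ \]
SAMPLE_MD = r"""# Quadratic formula

The roots of \(ax^2 + bx + c = 0\) are

\[
x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
\]

- **Markdown** formatting is rendered as well
"""


def render_sample(img_out_path="output.png"):
    """Render SAMPLE_MD to a PNG through md2img (needs Node.js and Puppeteer)."""
    import md2img  # imported here so this file loads even where md2img is absent

    md2img.markdown_to_image(SAMPLE_MD, img_out_path)
```

Calling render_sample(), for example from a `__main__` guard or from the bot's message handler, should produce output.png in the current folder.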

