Writing a Script to Convert Markdown and LaTeX to Images

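A few days ago I wrote a QQ bot for fun and hooked it up to a third-party AI API. LiteLoaderQQNT has plugins that can render Markdown and LaTeX in messages, but most people don't have them installed. So I wondered whether I could write a script that turns text containing Markdown and LaTeX into an image, and this post is the result.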

Approach

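After some reading, I found that it can be done in three steps:

1. Convert the generated text into an HTML page;
2. Render the LaTeX in that page with MathJax;
3. Take a screenshot of the page and save it.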

JavaScript script: taking the screenshot

First, write a JS script that takes a screenshot of a given HTML page:

const puppeteer = require('puppeteer');
const path = require('path');
const fs = require('fs');
const { pathToFileURL } = require('url');

// Read the input and output paths from the command-line arguments
const args = process.argv.slice(2);
const inputPath = args[0];
const outputPath = args[1];

// Make sure both arguments were provided
if (!inputPath || !outputPath) {
  console.error('Usage: node screenshot.js <input path> <output path>');
  process.exit(1);
}

// Resolve both paths to absolute paths
// (relative paths are interpreted relative to this script's folder)
const resolvedInputPath = path.resolve(__dirname, inputPath);
const resolvedOutputPath = path.resolve(__dirname, outputPath);

// Make sure the input file exists
if (!fs.existsSync(resolvedInputPath)) {
  console.error('Input file not found:', resolvedInputPath);
  process.exit(1);
}

(async () => {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.setViewport({ width: 1200, height: 100 });

    // Open the page; pathToFileURL builds a valid file:// URL on Windows as well
    await page.goto(pathToFileURL(resolvedInputPath).href, { waitUntil: 'networkidle0' });

    // Wait up to 2 seconds for MathJax to typeset at least one formula.
    // A page without formulas simply hits the timeout and we carry on.
    try {
      await page.waitForFunction(
        'window.MathJax && window.MathJax.Hub && window.MathJax.Hub.getAllJax().length > 0',
        { timeout: 2000 }
      );
    } catch (e) {
      console.log('MathJax did not finish loading; taking the screenshot anyway...');
    }

    // Take a full-page screenshot and save it to the output path
    await page.screenshot({ path: resolvedOutputPath, fullPage: true });

    console.log('Screenshot saved:', resolvedOutputPath);
  } finally {
    // Always close the browser, even if something above failed
    await browser.close();
  }
})();

Save the code above as screenshot.js.

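Setting up dependencies

The script above runs on Node.js, so first make sure Node.js is installed and that node is available on your PATH. Then run the following command in the same folder to install the Puppeteer package: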

npm install puppeteer
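
Since the Python side will later launch node as a subprocess, it can help to confirm up front that node is actually reachable. The following is only a small pre-flight sketch; the helper names find_node and node_version are mine and not part of the scripts in this post.

```python
import shutil
import subprocess


def find_node():
    """Return the path of the `node` executable, or None if it is not on PATH."""
    return shutil.which("node")


def node_version(node_path):
    """Ask Node.js for its version string by running `node --version`."""
    result = subprocess.run(
        [node_path, "--version"], check=True, text=True, capture_output=True
    )
    return result.stdout.strip()


node_path = find_node()
if node_path is None:
    print("node was not found on PATH; install Node.js or fix PATH first")
else:
    print("Using", node_path, node_version(node_path))
```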

Python script: Markdown to HTML

In the same folder, create a Python file named md2img.py that converts Markdown and LaTeX into an image.

import markdown
from markdown.extensions import Extension
from markdown.treeprocessors import Treeprocessor

# Markdown extension that rewrites <span class="math"> elements into MathJax <script> tags
class MathJaxExtension(Extension):
    def extendMarkdown(self, md):
        md.treeprocessors.register(MathJaxProcessor(md), 'mathjax', 175)

class MathJaxProcessor(Treeprocessor):
    def run(self, root):
        for element in root.iter():
            if element.tag == 'span' and 'math' in element.attrib.get('class', ''):
                # MathJax 2 typesets the contents of <script type="math/tex">
                element.tag = 'script'
                element.attrib['type'] = 'math/tex'

# Python-Markdown treats "\[", "\]", "\(" and "\)" as backslash escapes and drops
# the backslash, so swap the LaTeX delimiters for placeholders before converting.
def replace_brackets(input_string):
    input_string = input_string.replace(r'\[', '?gzl?')
    input_string = input_string.replace(r'\]', '?gzr?')
    input_string = input_string.replace(r'\(', '?gxl?')
    input_string = input_string.replace(r'\)', '?gxr?')
    return input_string

# Put the LaTeX delimiters back into the generated HTML so MathJax can find them.
def convert_brackets(input_string):
    input_string = input_string.replace('?gzl?', r'\[')
    input_string = input_string.replace('?gzr?', r'\]')
    input_string = input_string.replace('?gxl?', r'\(')
    input_string = input_string.replace('?gxr?', r'\)')
    return input_string

def convert_markdown_to_html_with_latex(md_text):
    md = markdown.Markdown(extensions=['codehilite', 'fenced_code', MathJaxExtension()])
    html = md.convert(md_text)
    html = convert_brackets(html)
    return f"""
    <html>
    <head>
        <meta charset="UTF-8">
        <script type="text/javascript" src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-MML-AM_CHTML"></script>
        <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.8.0/styles/default.min.css">
        <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.8.0/highlight.min.js"></script>
        <script>hljs.highlightAll();</script>
        <style>body {{ font-size: 30px; }}</style>
    </head>
    <body>
        {html}
    </body>
    </html>
    """

def markdown_to_html(md_text, html_out_path):
    html_output = convert_markdown_to_html_with_latex(replace_brackets(md_text))
    with open(html_out_path, 'w', encoding='utf-8') as f:
        f.write(html_output)
    print(f"HTML written to: {html_out_path}")

def html_to_image(html_path, img_path):
    import subprocess
    import os

    html_path = os.path.abspath(html_path)
    img_path = os.path.abspath(img_path)

    if not os.path.exists(html_path):
        raise FileNotFoundError(f"HTML file not found: {html_path}")

    # Locate screenshot.js next to this module so the call works from any working directory
    script_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), "screenshot.js")
    node_command = ["node", script_path, html_path, img_path]

    try:
        result = subprocess.run(
            node_command,
            check=True,
            text=True,
            capture_output=True
        )
        print("screenshot.js output:", result.stdout)
    except subprocess.CalledProcessError as e:
        print("screenshot.js failed:", e.stderr)
        raise

    # Node exited cleanly, but double-check that the image really exists
    if not os.path.exists(img_path):
        raise FileNotFoundError(f"Image was not generated: {img_path}")

    print(f"Image generated: {img_path}")

import os
import time

def markdown_to_image(md_text, img_out_path):
    # Keep the intermediate HTML in ./tmp/cache under the current working directory
    tmp_dir = os.path.join(os.getcwd(), 'tmp', 'cache')
    os.makedirs(tmp_dir, exist_ok=True)
    timestamp = int(time.time())
    html_file_path = os.path.join(tmp_dir, f"temp_{timestamp}.html")
    markdown_to_html(md_text, html_file_path)
    try:
        # html_to_image prints its own success message
        html_to_image(html_file_path, img_out_path)
    except Exception as e:
        print(f"Failed to generate image: {e}")
        raise
    finally:
        # Remove the temporary HTML file whether or not the screenshot succeeded
        if os.path.exists(html_file_path):
            os.remove(html_file_path)
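
One thing to watch in markdown_to_image: the temporary file name is based on a timestamp with one-second resolution, so two calls within the same second would write to the same HTML file. If the bot ever handles several messages at once, a possible alternative is to let the standard library pick a unique name. This is only a sketch; write_temp_html is a hypothetical helper, not part of md2img.py.

```python
import os
import tempfile


def write_temp_html(html_text, tmp_dir):
    """Write html_text to a uniquely named .html file in tmp_dir and return its path."""
    os.makedirs(tmp_dir, exist_ok=True)
    # mkstemp creates the file with a random name that is not yet in use
    fd, path = tempfile.mkstemp(prefix="temp_", suffix=".html", dir=tmp_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(html_text)
    return path


# Two calls in quick succession still get two different files
demo_dir = tempfile.mkdtemp()
first = write_temp_html("<p>a</p>", demo_dir)
second = write_temp_html("<p>b</p>", demo_dir)
print(first != second)
```

The returned path could then be passed to html_to_image and removed in the finally block, just like the timestamp-based file.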

Setting up dependencies

The Python script above needs the markdown package. Install it with:

pip install markdown
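
A note on why replace_brackets exists at all: the markdown package treats a backslash followed by a bracket or parenthesis as an escape sequence and removes the backslash, which would destroy the \( \) and \[ \] delimiters MathJax looks for. Swapping them for placeholder tokens before conversion and restoring them afterwards avoids that. The sketch below replays this round trip with the standard library only; the table-driven protect and restore helpers are my own illustration, using the same tokens as md2img.py.

```python
# Placeholder tokens copied from replace_brackets in md2img.py
PLACEHOLDERS = {
    r"\[": "?gzl?",
    r"\]": "?gzr?",
    r"\(": "?gxl?",
    r"\)": "?gxr?",
}


def protect(text):
    """Replace every LaTeX delimiter with its placeholder token."""
    for delimiter, token in PLACEHOLDERS.items():
        text = text.replace(delimiter, token)
    return text


def restore(text):
    """Replace every placeholder token with the original LaTeX delimiter."""
    for delimiter, token in PLACEHOLDERS.items():
        text = text.replace(token, delimiter)
    return text


source = r"Inline \(a^2 + b^2 = c^2\) and display \[E = mc^2\]"
protected = protect(source)

# No backslash is left for the Markdown converter to swallow
print(protected)
# Restoring the tokens gives back the original text
print(restore(protected) == source)
```

Note that this only protects the delimiters. Other backslash sequences inside a formula, such as a LaTeX line break written as \\, still pass through the Markdown converter and may be altered.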

Usage example

Create a test.py and call md2img.markdown_to_image(md_text, img_out_path) in it to turn Markdown and LaTeX into an image.

Parameters:
- md_text: the text containing Markdown and LaTeX;
- img_out_path: the path where the image is written.
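
For reference, a test.py along these lines should work, assuming md2img.py and screenshot.js sit in the same folder and the script is run from there. The sample text and the output file name output.png are made up for the demo.

```python
import importlib.util

# Sample reply mixing Markdown with inline and display LaTeX (made up for this demo)
SAMPLE_TEXT = r"""
# Quadratic formula

The roots of \(ax^2 + bx + c = 0\) are:

\[ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \]

- **Bold** and `inline code` go through the Markdown renderer.
"""

if importlib.util.find_spec("md2img") is None:
    # md2img.py is not importable from here, so there is nothing to render with
    print("md2img.py was not found; run this from the folder that contains it")
else:
    import md2img

    # Renders SAMPLE_TEXT to output.png via the HTML + Puppeteer pipeline
    md2img.markdown_to_image(SAMPLE_TEXT, "output.png")
```

The raw string (the r prefix) keeps Python itself from interpreting the backslashes in the LaTeX.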