A Detailed Guide to Reading and Exporting Word Documents Purely in the Front End | 极客日志



By 城市逃兵 · Published 2026/4/6 · Updated 2026/5/2388K views

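Overview

This solution supports importing and exporting Word documents (.docx), converting between the editor's content and the Office document format in both directions. The overall architecture is:

Word .docx file
↓ (import)
parsed by the mammoth library
↓
HTML
↓
Tiptap editor
↓
JSON content
↓ (export)
generated by the docx library
↓
Word .docx file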

Core dependencies

Library	Version	Purpose
mammoth	1.11.0	Word import: converts .docx to HTML
docx	9.1.0	Word export: converts JSON to .docx
markdown-it	14.1.0	Markdown import

Word document import

File locations

API route: src/app/api/import/route.ts
Core logic: src/lib/server/importDocument.ts

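1. Import flow

User selects a file → upload via FormData → file validation → file type?
  .docx → parsed by mammoth
  .md → parsed by markdown-it
  .txt → parsed as plain text
→ generate HTML → style cleanup → return to the front end → rendered in Tiptap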
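The "file type?" branch in the flow can be sketched as a small lookup on the file extension. The article does not show how the route actually picks a parser, so `detectImportFormat` and `ImportFormat` below are hypothetical names used only for illustration.

```typescript
// Hypothetical helper: maps a file name to one of the three import formats
// named in the flow. A Map is used so that extensions such as "constructor"
// cannot accidentally hit inherited object properties.
type ImportFormat = 'docx' | 'markdown' | 'text';

const EXTENSION_TO_FORMAT = new Map<string, ImportFormat>([
  ['docx', 'docx'],
  ['md', 'markdown'],
  ['txt', 'text'],
]);

function detectImportFormat(filename: string): ImportFormat | null {
  const dot = filename.lastIndexOf('.');
  // No dot, or a trailing dot, means there is no usable extension.
  if (dot < 0 || dot === filename.length - 1) {
    return null;
  }
  const extension = filename.slice(dot + 1).toLowerCase();
  return EXTENSION_TO_FORMAT.get(extension) ?? null;
}
```

A route handler could reject the upload when this returns `null` and otherwise dispatch to the matching parser. Checking only the extension is a simplification; a stricter validator might also inspect the MIME type or file signature.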

2. API endpoint

Endpoint: POST /api/import

Request format: multipart/form-data

{
  file: File // the uploaded file object
}

Response format:

{
  success: true,
  html: string, // converted HTML
  format: 'docx' | 'markdown' | 'text', // original format
  warnings?: string[], // warnings, if any
  filename:  
}
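On the client side, calling this endpoint might look like the sketch below. It assumes only what is listed above: a `file` form field and a JSON response. `buildImportForm`, `importDocument` and the `ImportResponse` interface are illustrative names, and `filename` is left out of the interface because its type is not shown in the article.

```typescript
// Assumed response shape, limited to the fields whose types appear above.
interface ImportResponse {
  success: boolean;
  html: string;
  format: 'docx' | 'markdown' | 'text';
  warnings?: string[];
}

// Wraps the file in multipart/form-data under the "file" field.
function buildImportForm(file: Blob, filename: string): FormData {
  const form = new FormData();
  form.append('file', file, filename);
  return form;
}

async function importDocument(file: Blob, filename: string): Promise<ImportResponse> {
  // Content-Type is deliberately not set: fetch adds the multipart boundary itself.
  const response = await fetch('/api/import', {
    method: 'POST',
    body: buildImportForm(file, filename),
  });
  if (!response.ok) {
    throw new Error(`Import failed with status ${response.status}`);
  }
  return (await response.json()) as ImportResponse;
}
```

The returned `html` would then be handed to the editor. How errors are reported by the route is not described in the article, so the sketch only checks the HTTP status.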


3. Word document parsing (mammoth)

Core function: docxToHtml

import mammoth from 'mammoth';

async function docxToHtml(buffer: ArrayBuffer): Promise<ImportedDocumentResult> {
  const nodeBuffer = Buffer.from(buffer);
  // convertToHtml takes the input and the options as two separate arguments
  const { value, messages } = await mammoth.convertToHtml(
    { buffer: nodeBuffer },
    {
      // Style map: maps Word styles to HTML tags
      // (mammoth's documented syntax puts style names in single quotes)
      styleMap: [
        "p[style-name='Heading 1'] => h1:fresh",
        "p[style-name='Heading 2'] => h2:fresh",
        "p[style-name='Heading 3'] => h3:fresh",
        "p[style-name='Heading 4'] => h4:fresh",
      ],
      // Image handling: convert to inline base64 images
      convertImage: mammoth.images.inline(async (image) => {
        const base64 = await image.read('base64');
        return {
          src: `data:${image.contentType};base64,${base64}`,
        };
      }),
    }
  );
  // Clean up styles (remove text-indent and similar)
  const sanitized = removeInlineTextIndentStyles(value.trim()) || '<p></p>';
  // Collect warning messages
  const warnings = messages?.filter((message) => message.type === 'warning').map((message) => message.message);
  return {
    format: 'docx',
    html: sanitized,
    warnings: warnings && warnings.length > 0 ? warnings : undefined,
  };
}

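Style mapping strategy

Word style	HTML tag	Notes
Heading 1	`<h1>`	Level-1 heading; :fresh forces a new element to be created
Heading 2	`<h2>`	Level-2 heading
Heading 3	`<h3>`	Level-3 heading
Heading 4	`<h4>`	Level-4 heading
Normal (default)	`<p>`	Ordinary paragraph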
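Because the heading rows follow one pattern, the style map could also be generated rather than written out by hand. This is only a sketch: `buildHeadingStyleMap` is a hypothetical helper, not part of mammoth, and it assumes the Word styles are named "Heading N" as in the table.

```typescript
// Hypothetical helper: builds mammoth style-map entries for Heading 1..maxLevel.
// The level is clamped to 0..6 because HTML only defines h1 through h6.
function buildHeadingStyleMap(maxLevel: number): string[] {
  const levels = Math.min(Math.max(Math.trunc(maxLevel), 0), 6);
  return Array.from({ length: levels }, (_unused, index) => {
    const level = index + 1;
    return `p[style-name='Heading ${level}'] => h${level}:fresh`;
  });
}
```

Calling `buildHeadingStyleMap(4)` yields four entries equivalent to the mappings in the table, which could be passed as the `styleMap` option.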

Image handling

convertImage: mammoth.images.inline(async (image) => {
  const base64 = await image.read('base64');
  return {
    src: `data:${image.contentType};base64,${base64}`,
  };
})
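The `src` value built in the handler is a standard data URI. As a minimal sketch of what that string looks like for raw bytes, the hypothetical `toDataUri` helper below does the same concatenation using Node's `Buffer`; it is not part of mammoth.

```typescript
// Hypothetical helper: encodes raw bytes as a base64 data URI,
// mirroring the string the image handler assembles.
function toDataUri(contentType: string, bytes: Uint8Array): string {
  const base64 = Buffer.from(bytes).toString('base64');
  return `data:${contentType};base64,${base64}`;
}
```

Base64 grows the payload by roughly a third, so inlining keeps the HTML self-contained at the cost of larger documents when a file contains many or large images.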

4. Style cleanup

The removeInlineTextIndentStyles function

function removeInlineTextIndentStyles(html: string): string {
  // Match every style="..." or style='...' attribute and rebuild it without text-indent
  return html.replace(/(style=)(['"])([^'"]*)(\2)/gi, (_match, prefix: string, quote: string, styles: string) => {
    const filtered = styles
      .split(';')
      .map((item) => item.trim())
      .filter((item) => item.length > 0 && !/^text-indent\s*:/i.test(item));
    // If nothing is left, drop the whole style attribute
    if (filtered.length === 0) {
      return '';
    }
    return `${prefix}${quote}${filtered.join('; ')}${quote}`;
  });
}

<!-- Before cleanup -->
<p style="text-indent: 2em;font-size: 14px;">段落内容</p>
<!-- After cleanup (declarations are re-joined with "; ", so there is no trailing semicolon) -->
<p style="font-size: 14px">段落内容</p>
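If more properties than text-indent ever need to be removed, the same idea generalizes to a list of property names. The sketch below is an assumption about how that could look, not the article's code: `stripInlineStyleProps` is a hypothetical name, and the regex also consumes the whitespace before `style=` so that an emptied attribute does not leave a stray space in the tag.

```typescript
// Hypothetical generalization: removes the listed CSS properties from inline styles.
function stripInlineStyleProps(html: string, props: string[]): string {
  const blocked = new Set(props.map((prop) => prop.trim().toLowerCase()));
  // \s before "style" avoids matching attributes such as data-style.
  return html.replace(/\sstyle=(['"])(.*?)\1/gi, (_match, quote: string, styles: string) => {
    const kept = styles
      .split(';')
      .map((item) => item.trim())
      .filter((item) => {
        if (item.length === 0) return false;
        const colon = item.indexOf(':');
        const name = (colon < 0 ? item : item.slice(0, colon)).trim().toLowerCase();
        return !blocked.has(name);
      });
    // Drop the attribute entirely when no declarations remain.
    return kept.length === 0 ? '' : ` style=${quote}${kept.join('; ')}${quote}`;
  });
}
```

Like any regex-based cleanup, this splits on every semicolon, so values that themselves contain semicolons (for example a `url(data:...)` value) would need a real CSS parser.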

5. Plain-text and Markdown import

Plain-text import

function textToHtml(buffer: ArrayBuffer): ImportedDocumentResult {
  const text = normalizeTextContent(buffer).trim();
  // Detect paragraph separators (blank lines, i.e. two or more newlines)
  const hasDoubleBreak = /\n{2,}/.test(text);
  // Without blank lines, treat every line as its own paragraph
  const rawBlocks = hasDoubleBreak ? text.split(/\n{2,}/) : text.split(/\n/);
  const paragraphs = rawBlocks
    .map((block) => block.replace(/\n+/g, '\n').trim())
    .filter(Boolean)
    .map((paragraph) => {
      if (!hasDoubleBreak) {
        return `<p>${escapeHtml(paragraph)}</p>`;
      }
      // Single newlines inside a paragraph become <br>
      const lines = paragraph.split('\n').map((line) => escapeHtml(line));
      return `<p>${lines.join('<br>')}</p>`;
    });
  return {
    format: 'text',
    html: paragraphs.join('\n'),
  };
}
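`textToHtml` relies on two helpers, `normalizeTextContent` and `escapeHtml`, whose bodies are not shown in the article. The versions below are assumptions about what they plausibly do (UTF-8 decoding with newline normalization, and escaping of the five HTML-sensitive characters), offered only so the function above can be read end to end.

```typescript
// Assumed implementation: escapes characters that are significant in HTML.
// "&" must be replaced first so later entities are not double-escaped.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Assumed implementation: decodes the upload as UTF-8 and unifies line endings.
// TextDecoder strips a leading byte order mark by default.
function normalizeTextContent(buffer: ArrayBuffer | Uint8Array): string {
  const text = new TextDecoder('utf-8').decode(buffer);
  return text.replace(/\r\n?/g, '\n');
}
```

Normalizing `\r\n` and bare `\r` to `\n` matters here because the paragraph detection in `textToHtml` only looks for `\n`. Files in other encodings (for example GBK) would need explicit detection, which this sketch does not attempt.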

