Convert CHM to PDF: Quick Guide for Windows and Mac

Preserve Formatting When Converting CHM to PDF

Converting CHM (Compiled HTML Help) files to PDF is common when you need portable, printable documentation. The main challenge is preserving the original formatting (layout, fonts, images, table of contents, links, and code blocks) so that the PDF stays readable and faithful to the source. This guide covers practical methods and settings that maximize formatting fidelity on Windows, macOS, and Linux.

1. Choose the right conversion approach

  • Use a dedicated CHM-to-PDF tool when you want the least manual work and the best built-in handling of CHM structure (TOC, indexes, anchors).
  • Convert via extraction + HTML-to-PDF for more control over CSS, fonts, and layout; best when CHM contains complex styling.
  • Print-to-PDF from a CHM viewer for a quick result; quality depends on the viewer’s rendering.

2. Tools and why they matter

  • Dedicated converters (e.g., specialized CHM utilities): preserve TOC and internal links well and often offer batch processing.
  • HTML extraction + headless browser (Chromium/Puppeteer, wkhtmltopdf): gives fine control over CSS, page size, margins, and font embedding. Ideal for advanced formatting preservation.
  • CHM viewers with print support (the built-in Windows HTML Help viewer hh.exe, SumatraPDF, xCHM): convenient, but they may not embed fonts or render advanced CSS accurately.
  • Command-line utilities (chm2pdf, chmsee + print or libchm-based tools): scriptable for consistent output across many files.

3. Extract CHM contents when possible (recommended for best fidelity)

  1. Extract HTML, images, CSS, and other assets from the CHM with a tool such as 7-Zip, the built-in Windows command hh.exe -decompile, or the chmlib utilities (for example, extract_chmLib).
  2. Open the extracted HTML in a modern browser to inspect rendering.
  3. Fix or add a print stylesheet (print.css) to:
    • Set page size and margins
    • Force font-family and embed web fonts if needed
    • Ensure images scale correctly (max-width: 100%)
    • Remove UI-only elements and unnecessary scripts
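
The stylesheet step above can be sketched in Python. This is a minimal sketch, not a definitive implementation: the function names, the default page size, margin, and font, and the .navbar/.toolbar selectors are all assumptions for illustration, so adjust them to whatever the extracted HTML actually contains.

```python
import re


def build_print_css(page_size="A4", margin="20mm", font_family="Georgia, serif"):
    """Return a print stylesheet covering page size, fonts, images, and UI elements."""
    return f"""\
@page {{ size: {page_size}; margin: {margin}; }}
body {{ font-family: {font_family}; }}
img {{ max-width: 100%; height: auto; }}
/* Hypothetical selectors for UI-only elements; replace with the real ones. */
.navbar, .toolbar {{ display: none; }}
"""


def link_print_css(html, href="print.css"):
    """Insert a print-only <link> before </head>, unless it is already present."""
    if f'href="{href}"' in html:
        return html
    tag = f'<link rel="stylesheet" media="print" href="{href}">\n'
    # CHM-era HTML often uses uppercase tags, so match case-insensitively.
    new_html, count = re.subn(
        r"</head>", lambda m: tag + m.group(0), html, count=1, flags=re.IGNORECASE
    )
    # Pages without a <head> get the link prepended as a fallback.
    return new_html if count else tag + html
```

Write the result of build_print_css() to print.css next to the extracted pages, then pass each page through link_print_css(). Because the link uses media="print", the renderer must be told to use print media for the rules to apply.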

4. Convert HTML to PDF with control over layout

  • Use a headless browser or renderer:
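    • Puppeteer/Chromium (options of page.pdf()):
      • Set printBackground to true so background colors and images are rendered
      • Define paper size and margins with format (or width/height) and margin
      • Set displayHeaderFooter to true and supply headerTemplate/footerTemplate for page numbers and titles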
    • wkhtmltopdf:
      • Use --enable-local-file-access for local assets
      • Set --print-media-type to apply print CSS
  • Embed fonts: include web fonts in CSS with local paths or base64 to ensure PDFs display correctly on other systems.
  • Preserve images: keep original resolution; scale in CSS rather than downsampling prematurely.
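
As an illustration of the wkhtmltopdf settings above, the following Python sketch builds the command line and runs it. The function names and the default page size and margin are assumptions; the flags are wkhtmltopdf's own options. It assumes wkhtmltopdf is installed and on PATH.

```python
import shutil
import subprocess


def wkhtmltopdf_command(html_files, output_pdf, page_size="A4", margin="20mm"):
    """Build the argument list; pages are rendered in the order given."""
    cmd = [
        "wkhtmltopdf",
        "--enable-local-file-access",  # let pages load local CSS, images, fonts
        "--print-media-type",          # apply @media print rules
        "--page-size", page_size,
    ]
    for side in ("top", "right", "bottom", "left"):
        cmd += [f"--margin-{side}", margin]
    # Input pages come first, the output file is always the last argument.
    return cmd + [str(p) for p in html_files] + [str(output_pdf)]


def render(html_files, output_pdf, **options):
    """Run wkhtmltopdf and raise if it is missing or exits with an error."""
    if shutil.which("wkhtmltopdf") is None:
        raise RuntimeError("wkhtmltopdf not found on PATH")
    subprocess.run(wkhtmltopdf_command(html_files, output_pdf, **options), check=True)
```

Passing all pages of one CHM in a single call produces one PDF, which tends to be simpler than rendering each page separately and merging afterwards.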

5. Keep Table of Contents and internal links

  • If using a dedicated CHM converter, enable options to convert CHM TOC into PDF bookmarks.
  • For HTML-to-PDF, build PDF bookmarks from the HTML headings (h1–h6), which wkhtmltopdf does by default, or use a tool that maps the CHM’s TOC file (.hhc) to PDF bookmarks.
  • Ensure anchor links are preserved by using a renderer that supports internal linking (Chromium-based tools do).
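
If you go the extraction route, the TOC usually comes out as a .hhc file: nested UL/LI lists whose entries are OBJECT elements of type text/sitemap with Name and Local parameters. The sketch below reads that structure into a flat list of (level, title, target) tuples that a bookmark step could consume. The class and function names are assumptions, and real .hhc files can be messier than this handles.

```python
from html.parser import HTMLParser


class HhcParser(HTMLParser):
    """Collect (level, title, target) tuples from a CHM .hhc table of contents."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # current <ul> nesting level
        self.entries = []
        self._params = None   # params of the sitemap object being read, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul":
            self.depth += 1
        elif tag == "object" and (attrs.get("type") or "").lower() == "text/sitemap":
            self._params = {}
        elif tag == "param" and self._params is not None:
            name = (attrs.get("name") or "").lower()
            self._params[name] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "ul":
            self.depth = max(0, self.depth - 1)
        elif tag == "object" and self._params is not None:
            title = self._params.get("name")
            # Objects outside any <ul> hold global properties, not entries.
            if title and self.depth > 0:
                self.entries.append((self.depth, title, self._params.get("local", "")))
            self._params = None


def parse_hhc(text):
    """Return the TOC entries of an .hhc document in reading order."""
    parser = HhcParser()
    parser.feed(text)
    parser.close()
    return parser.entries
```

The level value mirrors the nesting in the CHM viewer's tree, so it can map directly onto bookmark depth, and the target (including any #anchor) tells you which page and heading the bookmark should point to.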

6. Handle code blocks, tables, and special formatting

  • Add print-specific CSS:
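    • Use a monospace font-family for code blocks (pre, code)
    • Apply white-space: pre-wrap and overflow-wrap: anywhere so long lines wrap instead of running off the page
    • Use table { border-collapse: collapse; } together with tr { break-inside: avoid; page-break-inside: avoid; } so rows are not split across pages; avoid putting the rule on the whole table when the table is longer than one page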
  • For large tables, consider converting wide tables to landscape pages or reducing font size in print CSS.
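
To decide which pages need the landscape or smaller-font treatment, it can help to find the wide tables first. Here is a rough Python sketch that counts columns per table in an extracted page. The names and the threshold of six columns are arbitrary assumptions, and the count looks only at td/th cells and colspan, not at actual rendered width.

```python
from html.parser import HTMLParser


class TableWidthScanner(HTMLParser):
    """Record the maximum column count of every table in a page."""

    def __init__(self):
        super().__init__()
        self._stack = []   # one [max_cols, current_row_cols] pair per open table
        self.widths = []   # max column count of each table, in closing order

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._stack.append([0, 0])
        elif self._stack:
            current = self._stack[-1]
            if tag == "tr":
                current[1] = 0
            elif tag in ("td", "th"):
                span = dict(attrs).get("colspan") or "1"
                current[1] += int(span) if span.isdigit() else 1
                current[0] = max(current[0], current[1])

    def handle_endtag(self, tag):
        if tag == "table" and self._stack:
            self.widths.append(self._stack.pop()[0])


def wide_tables(html, max_cols=6):
    """Return the column counts of tables wider than max_cols."""
    scanner = TableWidthScanner()
    scanner.feed(html)
    scanner.close()
    return [width for width in scanner.widths if width > max_cols]
```

Pages for which wide_tables() returns a non-empty list are candidates for a landscape page size or a reduced font size in the print CSS.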

7. Batch conversion and automation tips

  • Script the extraction and conversion pipeline:
    • Extract CHM → apply print CSS → render to PDF with Puppeteer or wkhtmltopdf → merge PDFs and add bookmarks
  • Use consistent page size and fonts across the batch to maintain uniform appearance.

8. Common pitfalls and fixes

  • Missing fonts in PDF: embed fonts or use standard fonts in print CSS.
  • Broken images: ensure relative paths remain valid or use absolute/local file access during rendering.
  • Loss of hyperlinks: choose a renderer that preserves anchors and enable link conversion options.
  • TOC/bookmarks missing: use converters that explicitly map CHM TOC to PDF bookmarks or generate bookmarks from headings.

9. Quick step-by-step for best results (recommended)

  1. Extract CHM contents.
  2. Add/adjust a print.css enforcing fonts, page size, margins, and image rules.
  3. Use Puppeteer/Chromium to render PDFs with print backgrounds, headers/footers, and, where your version supports it, outline (bookmark) generation.
  4. Verify a sample PDF, fix CSS or assets, then batch-process remaining files.
  5. Optionally merge PDFs and add a combined TOC using a PDF tool.

10. Final checks before distribution

  • Verify that fonts are embedded and that text is selectable and searchable (not rasterized into images).
  • Check bookmarks, internal links, and external link behavior.
  • Test on several PDF viewers (Adobe Reader, Preview, browser PDF viewers) to ensure consistency.

Following these steps will maximize formatting fidelity when converting CHM to PDF while keeping documentation usable and polished.
