How does TeXSmith work?
TeXSmith ingests Markdown (.md), HTML (.html), YAML (.yaml), and BibTeX (.bib), then runs them through a conversion pipeline to produce LaTeX or a finished PDF.
Templates define the layout and expose slots that get filled with content from your sources. Templates also rely on fragments: extra layers that add features such as a bibliography, glossary, fonts, page geometry, or other typesetting options.
Internal pipeline
- Collect and classify inputs: The CLI and `ConversionService` accept Markdown/HTML documents, optional front matter YAML, and bibliography files. `split_inputs` peels off `.bib`/`.bibtex`, treats a lone YAML file as the only document when needed, and normalises any provided front matter. When documents share front matter, it is deep-merged into each `Document`, with `press.*` metadata validated up front to avoid surprises later.
- Normalise documents to HTML: `Document.from_markdown` runs Python-Markdown with the bundled extensions (smallcaps, texlogos, index, Mermaid, raw LaTeX fences, etc.), extracts front matter, and caches the resulting HTML. `Document.from_html` can either keep the whole file or extract a selector (`article.md-content__inner` by default). Heading strategies are decided here (keep, drop, or promote the first heading into `press.title`), numbering defaults are resolved, and slot directives declared in front matter (`press.slot.*`) are seeded into the document slot mapping.
- Bind the template and attributes: `build_binder_context` resolves which template runtime to use (`TemplateBinding`) and which slots exist. Template attributes declared in `manifest.toml` (`TemplateAttributeSpec`) are merged in a strict order: template defaults → fragment defaults → front matter (`press.*` or direct fields) → CLI/session overrides. Attribute ownership is enforced so two fragments or the template cannot claim the same attribute. Before anything renders, mustache placeholders in HTML, front matter, and template overrides are expanded against the merged context so later stages see concrete values.
- Split content into slots: Slot requests come from CLI `--slot`, front matter, or defaults. `extract_slot_fragments` walks the HTML to find the requested headings/IDs, pulls those sections out, and assigns them to template slots (abstract, mainmatter, appendix, etc.). Base heading levels and offsets are computed per slot so sectioning commands line up with the template's depth configuration. Any missing selectors produce warnings and the remainder of the document flows into the default slot.
- Prime context, fragments, and attributes: The binder context prepares runtime defaults: language, code engine/style, callout definitions, diagram backend, emoji mode, and bibliography map. Active fragments are resolved (from template extras or explicit overrides) and each fragment injects its context defaults plus owned attributes. Fragments are small, declarative building blocks (`fragment.toml` or a Python `BaseFragment`) that emit pieces into specific slots (package/input/inline). Examples: `ts-geometry`, `ts-fonts`, `ts-bibliography`, `ts-index`, glossary, code. A fragment may skip rendering via `should_render` (for example, bibliography and index fragments only activate when citations or index entries exist).
- Resolve partials: LaTeX output is assembled from Jinja partials (one per Markdown/HTML construct). The precedence is explicit: template overrides (`manifest.toml` `latex.template.override`) → fragment partials → core defaults in `src/texsmith/adapters/latex/partials`. Both templates and fragments can declare `required_partials`; missing providers abort with a `TemplateError`. TeXSmith tracks which provider owns each partial so diagnostics clearly name the culprit.
- Render HTML fragments to LaTeX: Each slot fragment is rendered through `LaTeXRenderer`, with fallback converters registered when external tools are unavailable. Runtime data (base_level, numbered flag, drop_title, bibliography map, partial providers, language, diagram backend) flows to every handler. The `DocumentState` accumulates headings, citations, index terms, script usage/fallback font summaries, glossaries, snippets, callouts, and asset references so later stages can emit the right packages and backmatter.
- Fonts and script matching: As text is rendered, the script detector (`texsmith.fonts.scripts`) scans moving arguments (headings, captions, index entries) and wraps non-Latin runs in dedicated LaTeX macros. A cached fallback index built from Noto coverage data chooses per-script font families and emits both font-switching commands and summary stats. The `--fonts-info` flag surfaces the detected scripts, chosen families, and counts after the run.
- Bibliography and index resolution: Bibliography data comes from `.bib` files plus optional inline front matter entries (including DOI lookups with caching). Only cited keys are written to a generated `texsmith-bibliography.bib`, keeping outputs lean. Citations recorded in `DocumentState` trigger the `ts-bibliography` fragment, which injects package setup and backmatter hooks. Index terms collected by the Markdown extension set `has_index`/`index_terms`, enabling the `ts-index` fragment to load `imakeidx` helpers and drop the `\printindex` block into the `fragment_backmatter` slot.
- Template wrap and emission: Slot outputs are merged back into the template entrypoint via `wrap_template_document`, alongside template/fragment attributes, required assets, and optional manifest/debug artefacts. When running under `TemplateSession`, the fragments are also materialised as `.tex` files so templates can `\input{}` or `\usepackage{}` them. The resulting `TemplateRenderResult` carries the main `.tex` path, per-fragment outputs, bibliography path (if any), selected template engine, and shell-escape requirement, ready for `texsmith pdf`/Tectonic to produce the final PDF.
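
The layered attribute precedence from the binding step (template defaults → fragment defaults → front matter → CLI/session overrides), together with the ownership rule, can be pictured with a small standalone sketch. This is not TeXSmith's code: `deep_merge`, `resolve_attributes`, `claim_owners`, and the attribute names `paper` and `fonts` are hypothetical and chosen only for illustration. The actual implementation may well differ.

```python
from copy import deepcopy
from typing import Any, Mapping


def deep_merge(base: Mapping[str, Any], override: Mapping[str, Any]) -> dict[str, Any]:
    """Return a new dict in which `override` wins; nested mappings merge recursively."""
    merged = deepcopy(dict(base))
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve_attributes(*layers: Mapping[str, Any]) -> dict[str, Any]:
    """Fold layers from lowest to highest precedence (later layers win)."""
    resolved: dict[str, Any] = {}
    for layer in layers:
        resolved = deep_merge(resolved, layer)
    return resolved


def claim_owners(claims: Mapping[str, list[str]]) -> dict[str, str]:
    """Map each attribute to a single owner; reject a second claim on the same name."""
    owners: dict[str, str] = {}
    for provider, attributes in claims.items():
        for attribute in attributes:
            if attribute in owners:
                raise ValueError(
                    f"attribute {attribute!r} is claimed by both "
                    f"{owners[attribute]!r} and {provider!r}"
                )
            owners[attribute] = provider
    return owners


# Hypothetical layers, listed from lowest to highest precedence.
template_defaults = {"paper": "a4", "fonts": {"main": "serif", "size": 10}}
fragment_defaults = {"fonts": {"size": 11}}
front_matter = {"paper": "letter"}
cli_overrides = {"fonts": {"main": "sans"}}

attributes = resolve_attributes(
    template_defaults, fragment_defaults, front_matter, cli_overrides
)
owners = claim_owners({"template": ["paper"], "ts-fonts": ["fonts"]})
```

In this sketch `attributes` ends up as `{"paper": "letter", "fonts": {"main": "sans", "size": 11}}`: each later layer overrides only the keys it sets, and nested mappings are merged rather than replaced. A second provider claiming `fonts` would raise an error before any rendering starts, which mirrors the up-front ownership check described above.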