File Formats

Falara supports 10 file formats across 8 format families. Each format has a dedicated processor that extracts translatable text, preserves structure and formatting, and reassembles the translated output.

Supported Formats

| Format | Extensions | Description |
| --- | --- | --- |
| HTML | `.html`, `.htm` | Markup is extracted and replaced with typed placeholders; reassembled after translation |
| Markdown | `.md`, `.markdown` | Document structure (headings, lists, code blocks) is preserved; only prose is translated |
| Word | `.docx` | Paragraphs and table cells are extracted; character-level formatting (bold, italic) is preserved |
| Excel | `.xlsx` | Individual cells are translated; formulas and cell references are skipped |
| PowerPoint | `.pptx` | Text boxes and speaker notes are extracted and translated |
| XLIFF 1.2 | `.xlf`, `.xliff` | Standard exchange format; `<trans-unit>` source/target pairs are translated |
| XLIFF 2.0 | `.xlf`, `.xliff` | Newer XLIFF standard; `<segment>` elements are translated |
| Eurotext-XLIFF | `.xlf` | Proprietary Eurotext format with extended metadata and CDATA-embedded HTML |
| JSON | `.json` | Key-value pairs; values are translated, keys and structure are preserved |
| PDF | `.pdf` | Text is extracted from the document and translated; original layout is not reconstructed |

Limits

| Limit | Value |
| --- | --- |
| Single file upload | 50 MB |
| Batch total | 50 MB |
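
A client can check both limits before uploading anything. The following is a minimal sketch, not part of the Falara API: `check_upload` is a hypothetical helper, and it assumes that "50 MB" means 50 × 1,000,000 bytes, since the table does not say whether decimal or binary megabytes are meant.

```python
# Assumption: "50 MB" is 50 * 1,000,000 bytes; adjust if the service counts in MiB.
MAX_FILE_BYTES = 50 * 1_000_000
MAX_BATCH_BYTES = 50 * 1_000_000


def check_upload(file_sizes: dict[str, int]) -> list[str]:
    """Return human-readable limit violations; an empty list means the batch fits."""
    problems = []
    for name, size in file_sizes.items():
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: {size} bytes exceeds the single-file limit")
    total = sum(file_sizes.values())
    if total > MAX_BATCH_BYTES:
        problems.append(f"batch: {total} bytes exceeds the batch limit")
    return problems
```

Because the batch limit equals the single-file limit, two files of 30 MB each pass the per-file check but fail the batch check.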

How File Processing Works

1. Upload: the file is received and its MIME type is validated against the declared extension.
2. Extraction: the format processor extracts translatable segments; non-translatable content (code, formulas, metadata) is preserved as-is.
3. Translation: segments pass through the multi-agent pipeline.
4. Injection: translated segments are written back into the original document structure.
5. Download: the reassembled file is served via `GET /v1/jobs/{job_id}/download`.

The injection step happens on-demand at download time, not during the pipeline. The original file structure is never permanently modified.
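
The separation between stored translations and on-demand injection can be sketched as follows. This is an illustrative toy model, not Falara's implementation: the `Job` class, the line-based extraction rule, and all function names are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Job:
    """Hypothetical job record: the untouched original plus translated segments."""
    original: str                       # stored exactly as uploaded
    segments: list[str]                 # translatable text found during extraction
    translations: dict[str, str] = field(default_factory=dict)


def extract(original: str) -> Job:
    # Toy rule: non-empty lines are segments, indented lines count as code and are skipped.
    segments = [ln for ln in original.splitlines()
                if ln.strip() and not ln.startswith("    ")]
    return Job(original=original, segments=segments)


def translate(job: Job, translate_segment: Callable[[str], str]) -> None:
    # Stand-in for the translation pipeline: it only records results.
    for seg in job.segments:
        job.translations[seg] = translate_segment(seg)


def download(job: Job) -> str:
    # Injection happens here, at download time, reading from the stored original.
    lines = [job.translations.get(ln, ln) for ln in job.original.splitlines()]
    return "\n".join(lines)
```

In this model `download` can be called any number of times, and `job.original` stays byte-for-byte what was uploaded.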

XLIFF Handling

For XLIFF files, the processor reads `<source>` elements and writes translations back into `<target>` elements. Existing `<target>` content is replaced.
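
For XLIFF 1.2, the read-source, write-target behavior might look like the sketch below. It uses Python's standard `xml.etree.ElementTree` and is an approximation under stated assumptions: `fill_targets` and `translate_segment` are hypothetical names, and inline markup inside `<source>` is flattened to plain text, which a real processor would not do.

```python
import xml.etree.ElementTree as ET

NS = "urn:oasis:names:tc:xliff:document:1.2"  # XLIFF 1.2 namespace
ET.register_namespace("", NS)                 # keep the default namespace on output


def fill_targets(xliff_text: str, translate_segment) -> str:
    """Write a translation of each <source> into its sibling <target>."""
    root = ET.fromstring(xliff_text)
    for unit in root.iter(f"{{{NS}}}trans-unit"):
        source = unit.find(f"{{{NS}}}source")
        if source is None:
            continue
        target = unit.find(f"{{{NS}}}target")
        if target is None:
            # No <target> yet: create one directly after <source>.
            target = ET.Element(f"{{{NS}}}target")
            unit.insert(list(unit).index(source) + 1, target)
        else:
            # Existing content is replaced, so drop any inline child elements.
            for child in list(target):
                target.remove(child)
        # Simplification: inline tags in <source> are flattened to their text.
        target.text = translate_segment("".join(source.itertext()))
    return ET.tostring(root, encoding="unicode")
```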

Eurotext-XLIFF files preserve:

- CDATA wrappers on text content
- HTML markup embedded inside CDATA (block tags, attribute order)
- Proprietary metadata attributes

JSON Handling

For JSON files, only string values are translated. Keys, numbers, booleans, and structural elements remain unchanged. Nested objects and arrays are traversed recursively.

```json
{
  "title": "Willkommen",
  "items": [
    { "label": "Startseite", "url": "/home" }
  ]
}
```

In this example, the values `"Willkommen"` and `"Startseite"` are translated. The keys `"title"`, `"items"`, `"label"`, and `"url"` are preserved, as is the path value `"/home"`.
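
The recursive traversal can be sketched in a few lines of Python. The rule that leaves `"/home"` untouched is not documented here, so `looks_like_path` is purely an assumption for illustration, as are the function names and the glossary-based stand-in translator.

```python
import json


def looks_like_path(value: str) -> bool:
    # Assumption: strings starting with "/" or containing "://" are paths or URLs.
    return value.startswith("/") or "://" in value


def translate_values(node, translate_segment):
    """Return a copy of `node` with translatable string values replaced."""
    if isinstance(node, dict):
        # Keys are kept as-is; only values are visited.
        return {key: translate_values(val, translate_segment)
                for key, val in node.items()}
    if isinstance(node, list):
        return [translate_values(item, translate_segment) for item in node]
    if isinstance(node, str) and not looks_like_path(node):
        return translate_segment(node)
    return node  # numbers, booleans, null, and path-like strings pass through


# Usage with a tiny glossary standing in for the real translation step.
glossary = {"Willkommen": "Welcome", "Startseite": "Home"}
doc = json.loads('{"title": "Willkommen", "items": [{"label": "Startseite", "url": "/home"}]}')
translated = translate_values(doc, lambda text: glossary.get(text, text))
print(json.dumps(translated, ensure_ascii=False, indent=2))
```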

PDF Handling

For PDF files, text content is extracted and translated. The original PDF layout is not reconstructed; the translated output is delivered as extracted text segments.

Format Detection

Format is detected from the file extension, then validated against the actual MIME type of the uploaded file. A mismatch returns `415 Unsupported Media Type`.
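
A rough sketch of the extension-then-MIME check is shown below. The mapping uses the commonly registered media types for each extension, but which types Falara actually accepts is an assumption, as are the `validate_upload` name, the `200` success value, and the choice to return `415` for unknown extensions. Real MIME sniffers may also report broader types (for example `text/plain` for Markdown), which a production check would need to allow.

```python
from pathlib import Path

# Assumption: one accepted media type per extension; the real service may accept more.
EXPECTED_MIME = {
    ".html": "text/html",
    ".htm": "text/html",
    ".md": "text/markdown",
    ".markdown": "text/markdown",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".xlsx": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    ".pptx": "application/vnd.openxmlformats-officedocument.presentationml.presentation",
    ".xlf": "application/xliff+xml",
    ".xliff": "application/xliff+xml",
    ".json": "application/json",
    ".pdf": "application/pdf",
}


def validate_upload(filename: str, detected_mime: str) -> int:
    """Return an HTTP-style status: 200 if extension and MIME type agree, else 415."""
    extension = Path(filename).suffix.lower()
    expected = EXPECTED_MIME.get(extension)
    if expected is None or detected_mime != expected:
        return 415  # Unsupported Media Type
    return 200
```

For instance, a file named `report.pdf` whose content is detected as `text/html` would be rejected with `415` under this sketch.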