File Formats

Falara supports 10 file formats across 8 format families. Each format has a dedicated processor that extracts translatable text, preserves structure and formatting, and reassembles the translated output.


Supported Formats

| Format | Extensions | Description |
|---|---|---|
| HTML | .html, .htm | Markup is extracted and replaced with typed placeholders; reassembled after translation |
| Markdown | .md, .markdown | Document structure (headings, lists, code blocks) is preserved; only prose is translated |
| Word | .docx | Paragraphs and table cells are extracted; character-level formatting (bold, italic) is preserved |
| Excel | .xlsx | Individual cells are translated; formulas and cell references are skipped |
| PowerPoint | .pptx | Text boxes and speaker notes are extracted and translated |
| XLIFF 1.2 | .xlf, .xliff | Standard exchange format; <trans-unit> source/target pairs are translated |
| XLIFF 2.0 | .xlf, .xliff | Newer XLIFF standard; <segment> elements are translated |
| Eurotext-XLIFF | .xlf | Proprietary Eurotext format with extended metadata and CDATA-embedded HTML |
| JSON | .json | Key-value pairs; values are translated, keys and structure are preserved |
| PDF | .pdf | Text is extracted from the document and translated; original layout is not reconstructed |

Limits

| Limit | Value |
|---|---|
| Single file upload | 50 MB |
| Batch total | 50 MB |
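
A client can check both limits before uploading. The following is a minimal sketch, not Falara's own validation: the function name `check_batch` is hypothetical, and it assumes 1 MB means 1024 * 1024 bytes, which the limits above do not specify.

```python
# Assumption: 1 MB = 1024 * 1024 bytes (the documentation does not say).
MB = 1024 * 1024
MAX_FILE_BYTES = 50 * MB
MAX_BATCH_BYTES = 50 * MB


def check_batch(sizes):
    """Return a list of limit violations for a mapping of file name -> size in bytes."""
    problems = []
    for name, size in sizes.items():
        if size > MAX_FILE_BYTES:
            problems.append(f"{name}: exceeds single-file limit")
    if sum(sizes.values()) > MAX_BATCH_BYTES:
        problems.append("batch: exceeds batch total limit")
    return problems
```

Because both limits are 50 MB, a batch of two 30 MB files passes the per-file check but fails the batch check.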

How File Processing Works

  1. Upload — File is received and MIME type is validated against the declared extension
  2. Extraction — Format processor extracts translatable segments; non-translatable content (code, formulas, metadata) is preserved as-is
  3. Translation — Segments pass through the multi-agent pipeline
  4. Injection — Translated segments are written back into the original document structure
  5. Download — Reassembled file is served via GET /v1/jobs/{job_id}/download

The injection step happens on demand at download time, not during the pipeline. The original file structure is never permanently modified.
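
The download step can be sketched as a request against the endpoint named in step 5. Only the path `/v1/jobs/{job_id}/download` comes from this page; the base URL, the bearer-token header, and the helper name `build_download_request` are assumptions for illustration.

```python
import urllib.request
from urllib.parse import quote

# Placeholder host: substitute the real API base URL.
BASE_URL = "https://api.example.com"


def build_download_request(job_id, api_key):
    """Build (but do not send) a GET request for a job's reassembled file."""
    url = f"{BASE_URL}/v1/jobs/{quote(job_id, safe='')}/download"
    # Assumption: bearer-token authentication; check the API's auth documentation.
    headers = {"Authorization": f"Bearer {api_key}"}
    return urllib.request.Request(url, headers=headers, method="GET")
```

Passing the returned request to `urllib.request.urlopen` would fetch the file. Since injection runs at download time, that call is the point at which the translated document is actually assembled.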


XLIFF Handling

For XLIFF files, the processor reads <source> elements and writes translations back into <target> elements. Existing <target> content is replaced.
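
The source-to-target behavior for XLIFF 1.2 can be illustrated with Python's standard `xml.etree.ElementTree`. This is a simplified sketch rather than Falara's processor: `fill_targets` and the `translate` callback are hypothetical, and inline markup inside `<source>` is flattened to plain text here.

```python
import xml.etree.ElementTree as ET

NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", NS)  # keep the default namespace when serializing


def fill_targets(xliff_text, translate):
    """Write translate(source text) into each trans-unit's <target>."""
    root = ET.fromstring(xliff_text)
    for unit in root.iter(f"{{{NS}}}trans-unit"):
        source = unit.find(f"{{{NS}}}source")
        if source is None:
            continue
        target = unit.find(f"{{{NS}}}target")
        if target is None:
            # No target yet: create one directly after <source>.
            target = ET.Element(f"{{{NS}}}target")
            unit.insert(list(unit).index(source) + 1, target)
        # Existing target content (text and child elements) is replaced.
        for child in list(target):
            target.remove(child)
        target.text = translate("".join(source.itertext()))
    return ET.tostring(root, encoding="unicode")
```

XLIFF 2.0 uses a different namespace and `<segment>` elements, so it would need its own variant of this function.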

Eurotext-XLIFF files preserve:

- CDATA wrappers on text content
- HTML markup embedded inside CDATA (block tags, attribute order)
- Proprietary metadata attributes


JSON Handling

For JSON files, only string values are translated. Keys, numbers, booleans, and structural elements remain unchanged. Nested objects and arrays are traversed recursively.

{
  "title": "Willkommen",
  "items": [
    { "label": "Startseite", "url": "/home" }
  ]
}

In this example, the values "Willkommen" and "Startseite" are translated. The keys "title", "items", "label", and "url" are preserved, as is the URL value "/home".
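
The recursive traversal can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the actual processor: `translate_values` and `looks_like_path` are hypothetical names, and the rule used to leave "/home" untouched (skip strings that start with "/" or contain "://") is a guess at how URL-like values might be recognized.

```python
def looks_like_path(value):
    """Assumed heuristic: treat paths and URLs as non-translatable."""
    return value.startswith("/") or "://" in value


def translate_values(node, translate):
    """Return a copy of node with translatable string values passed through translate()."""
    if isinstance(node, dict):
        # Keys are copied unchanged; only values are visited.
        return {key: translate_values(value, translate) for key, value in node.items()}
    if isinstance(node, list):
        return [translate_values(item, translate) for item in node]
    if isinstance(node, str) and not looks_like_path(node):
        return translate(node)
    # Numbers, booleans, null, and path-like strings pass through as-is.
    return node
```

Building a new structure instead of mutating the input mirrors the point that the original file is never modified.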


PDF Handling

For PDF files, text content is extracted and translated. The original PDF layout is not reconstructed; the translated output is delivered as extracted text segments.


Format Detection

Format is detected from the file extension, then validated against the actual MIME type of the uploaded file. A mismatch returns 415 Unsupported Media Type.
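
The extension-then-MIME check can be sketched as a lookup table. The MIME types listed are the standard ones for each extension, but which types Falara actually accepts is an assumption, as are the name `check_upload` and the choice to answer 415 for unknown extensions.

```python
from pathlib import PurePath

OOXML = "application/vnd.openxmlformats-officedocument"
XLIFF_TYPES = {"application/xliff+xml", "application/xml", "text/xml"}

# Assumed mapping of extension -> acceptable detected MIME types.
EXPECTED_MIME = {
    ".html": {"text/html"},
    ".htm": {"text/html"},
    ".md": {"text/markdown", "text/plain"},
    ".markdown": {"text/markdown", "text/plain"},
    ".docx": {f"{OOXML}.wordprocessingml.document"},
    ".xlsx": {f"{OOXML}.spreadsheetml.sheet"},
    ".pptx": {f"{OOXML}.presentationml.presentation"},
    ".xlf": XLIFF_TYPES,
    ".xliff": XLIFF_TYPES,
    ".json": {"application/json"},
    ".pdf": {"application/pdf"},
}


def check_upload(filename, detected_mime):
    """Return 415 on an unknown extension or MIME mismatch, otherwise None."""
    extension = PurePath(filename).suffix.lower()
    allowed = EXPECTED_MIME.get(extension)
    if allowed is None or detected_mime not in allowed:
        return 415  # Unsupported Media Type
    return None
```

For example, an HTML file renamed to `report.pdf` would be rejected because its detected type `text/html` is not in the set expected for `.pdf`. The extension alone cannot distinguish the three XLIFF variants, so that decision would have to be made from the file content.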