File Formats
Falara supports 10 file formats across 8 format families. Each format has a dedicated processor that extracts translatable text, preserves structure and formatting, and reassembles the translated output.
Supported Formats
| Format | Extensions | Description |
|---|---|---|
| HTML | .html, .htm | Markup is extracted and replaced with typed placeholders; reassembled after translation |
| Markdown | .md, .markdown | Document structure (headings, lists, code blocks) is preserved; only prose is translated |
| Word | .docx | Paragraphs and table cells are extracted; character-level formatting (bold, italic) is preserved |
| Excel | .xlsx | Individual cells are translated; formulas and cell references are skipped |
| PowerPoint | .pptx | Text boxes and speaker notes are extracted and translated |
| XLIFF 1.2 | .xlf, .xliff | Standard exchange format; <trans-unit> source/target pairs are translated |
| XLIFF 2.0 | .xlf, .xliff | Newer XLIFF standard; <segment> elements are translated |
| Eurotext-XLIFF | .xlf | Proprietary Eurotext format with extended metadata and CDATA-embedded HTML |
| JSON | .json | Key-value pairs; values are translated, keys and structure are preserved |
| PDF | .pdf | Text is extracted from the document and translated; original layout is not reconstructed |
Limits
| Limit | Value |
|---|---|
| Single file upload | 50 MB |
| Batch total | 50 MB |
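
A client can check these limits before uploading. The following is a minimal sketch, not Falara's own validation: the function name is hypothetical, and it assumes 1 MB means 1,048,576 bytes, which the table does not specify.

```python
# Assumption: 1 MB = 1024 * 1024 bytes; the documented limits do not say which unit is meant.
MAX_FILE_BYTES = 50 * 1024 * 1024
MAX_BATCH_BYTES = 50 * 1024 * 1024


def check_upload_sizes(sizes: list[int]) -> list[str]:
    """Return a list of problems; an empty list means the batch fits both limits."""
    problems = []
    for index, size in enumerate(sizes):
        # Each file is checked against the single-file limit.
        if size > MAX_FILE_BYTES:
            problems.append(f"file {index} exceeds the single-file limit")
    # The sum of all files is checked against the batch limit.
    if sum(sizes) > MAX_BATCH_BYTES:
        problems.append("batch exceeds the total limit")
    return problems
```

Because both limits are 50 MB, a batch can only contain a file of maximum size if that file is the only one in the batch.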
How File Processing Works
- Upload: File is received and MIME type is validated against the declared extension
- Extraction: Format processor extracts translatable segments; non-translatable content (code, formulas, metadata) is preserved as-is
- Translation: Segments pass through the multi-agent pipeline
- Injection: Translated segments are written back into the original document structure
- Download: Reassembled file is served via GET /v1/jobs/{job_id}/download
The injection step happens on demand at download time, not during the pipeline. The original file structure is never permanently modified.
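
The download step can be sketched from the client side as follows. Only the path GET /v1/jobs/{job_id}/download comes from this page; the base URL, the bearer-token header, and both function names are assumptions made for illustration and may not match the real API.

```python
import urllib.request
from urllib.parse import quote


def download_url(base_url: str, job_id: str) -> str:
    """Build the download URL for a job; base_url is a hypothetical API host."""
    return f"{base_url.rstrip('/')}/v1/jobs/{quote(job_id, safe='')}/download"


def download_job(base_url: str, job_id: str, token: str, dest_path: str) -> None:
    """Fetch the reassembled file and write it to dest_path."""
    request = urllib.request.Request(
        download_url(base_url, job_id),
        # Assumption: the service accepts a bearer token; check the API reference.
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response, open(dest_path, "wb") as out:
        out.write(response.read())
```

Since injection runs at download time, each call to the endpoint reassembles the file from the stored original and the translated segments.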
XLIFF Handling
For XLIFF files, the processor reads <source> elements and writes translations back into <target> elements. Existing <target> content is replaced.
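
The source-to-target behaviour can be illustrated for XLIFF 1.2 with Python's standard library. This is a simplified sketch rather than the actual processor: `fill_targets` and the `translate` callback are hypothetical names, and inline markup inside <source> is flattened to plain text.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def fill_targets(xliff_text: str, translate) -> str:
    """Write translate(source text) into each trans-unit's <target>."""
    ET.register_namespace("", XLIFF_NS)  # serialize without an ns0: prefix
    root = ET.fromstring(xliff_text)
    for unit in root.iter(f"{{{XLIFF_NS}}}trans-unit"):
        source = unit.find(f"{{{XLIFF_NS}}}source")
        if source is None:
            continue
        target = unit.find(f"{{{XLIFF_NS}}}target")
        if target is None:
            # Create a missing <target> directly after <source>.
            target = ET.Element(f"{{{XLIFF_NS}}}target")
            unit.insert(list(unit).index(source) + 1, target)
        else:
            # Existing target content is replaced, including child elements.
            for child in list(target):
                target.remove(child)
        # Simplification: inline elements in <source> are reduced to their text.
        target.text = translate("".join(source.itertext()))
    return ET.tostring(root, encoding="unicode")
```

The <source> elements are left untouched, so the file remains a valid source/target pair after translation.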
Eurotext-XLIFF files preserve:
- CDATA wrappers on text content
- HTML markup embedded inside CDATA (block tags, attribute order)
- Proprietary metadata attributes
JSON Handling
For JSON files, only string values are translated. Keys, numbers, booleans, and structural elements remain unchanged. Nested objects and arrays are traversed recursively.
For example, in a document with the keys "title", "items", "label", and "url", the values "Willkommen" and "Startseite" are translated, while the keys and the path value "/home" are preserved.
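
The recursive traversal can be sketched as below. The sample document is reconstructed from the names mentioned above and its exact shape is an assumption, as are the function name, the `translate` callback, and the `should_translate` rule that skips path-like strings; how Falara actually decides to leave "/home" untouched is not stated here.

```python
def translate_json(node, translate, should_translate, key=None):
    """Return a copy of node with translatable string values replaced."""
    if isinstance(node, dict):
        # Keys are kept as-is; only the values are visited.
        return {k: translate_json(v, translate, should_translate, k) for k, v in node.items()}
    if isinstance(node, list):
        # List items inherit the key of the list that contains them.
        return [translate_json(item, translate, should_translate, key) for item in node]
    if isinstance(node, str) and should_translate(key, node):
        return translate(node)
    # Numbers, booleans, null, and skipped strings pass through unchanged.
    return node


# Assumed sample input, rebuilt from the keys and values named in the text.
document = {"title": "Willkommen", "items": [{"label": "Startseite", "url": "/home"}]}

# Stand-in for the translation pipeline: a fixed German-to-English lookup.
glossary = {"Willkommen": "Welcome", "Startseite": "Home"}

translated = translate_json(
    document,
    translate=lambda text: glossary.get(text, text),
    # Hypothetical rule: treat strings that start with "/" as paths, not prose.
    should_translate=lambda key, value: not value.startswith("/"),
)
```

The function builds a new structure instead of mutating the input, which mirrors the point that the original file is never permanently modified.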
PDF Handling
For PDF files, text content is extracted and translated. The original PDF layout is not reconstructed; the translated output is delivered as extracted text segments.
Format Detection
Format is detected from the file extension, then validated against the actual MIME type of the uploaded file. A mismatch returns 415 Unsupported Media Type.
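
The extension-then-MIME check might look like the sketch below. The mapping is illustrative and covers only some formats; the set of MIME types the service accepts per extension is an assumption, as is the choice to answer 415 for unknown extensions. The detected MIME type is passed in, since content sniffing itself is out of scope here.

```python
from pathlib import PurePath

# Assumed mapping from extension to acceptable MIME types (not the service's real table).
EXPECTED_MIME = {
    ".html": {"text/html"},
    ".htm": {"text/html"},
    ".docx": {"application/vnd.openxmlformats-officedocument.wordprocessingml.document"},
    ".xlsx": {"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"},
    ".pptx": {"application/vnd.openxmlformats-officedocument.presentationml.presentation"},
    ".json": {"application/json"},
    ".pdf": {"application/pdf"},
}


def check_format(filename: str, detected_mime: str) -> int:
    """Return 200 if the extension and MIME type agree, otherwise 415."""
    extension = PurePath(filename).suffix.lower()
    allowed = EXPECTED_MIME.get(extension)
    # Unknown extensions and mismatched content both map to 415 in this sketch.
    if allowed is None or detected_mime not in allowed:
        return 415
    return 200
```

For instance, a file named report.pdf whose content is detected as text/html would be rejected under this check.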