Bulk convert .docx/.xlsx to open formats on Windows using LibreOffice CLI and PowerShell

2026-03-06

Automate reliable bulk conversion of .docx/.xlsx to ODT/ODS/PDF on Windows with LibreOffice headless and PowerShell — scripts, metadata tips, and production hardening.

If you're an IT admin or developer responsible for migrating thousands of Office files to open formats for archival, compliance, or cost reduction, you need a repeatable, automatable pipeline that preserves formatting and metadata. In 2026 the pressure to reduce vendor lock-in and to meet long-term preservation standards (PDF/A, ODF) is higher than ever. Here is a production-oriented approach using LibreOffice in headless mode driven by PowerShell.

Executive summary — most important first

Use LibreOffice's soffice.exe --headless CLI to convert DOCX/XLSX at scale, orchestrate batches and parallelism with PowerShell 7+, preserve file system timestamps from PowerShell, check document metadata with ExifTool (which can also copy properties into PDF outputs), and avoid common pitfalls like profile collisions and antivirus slowdowns. This article provides:

  • Practical PowerShell scripts (single-threaded, parallel, resumable)
  • Tips for preserving metadata and formatting
  • Production hardening advice: logging, retries, throttling, and validation
  • 2026-relevant notes on import filter improvements and archival formats

Why use LibreOffice headless in 2026?

By late 2025 and into 2026, the Document Foundation continued improving OOXML import filters and PDF/export fidelity. That makes LibreOffice a cost-effective, privacy-friendly conversion engine for mass migrations. Running soffice in --headless mode gives you a lightweight, scriptable backend without a GUI — ideal for Windows servers and automation pipelines.

Key benefits

  • Open-source, no licensing cost for mass processing
  • Better offline privacy compared to cloud conversions
  • Command-line control and filters for PDF/ODF/ODS export
  • Active improvements in OOXML compatibility in recent releases (2024–2026)

High-level workflow

  1. Scan repository for .docx/.xlsx files and build a job list
  2. Decide target format per type: .docx → .odt (or .pdf), .xlsx → .ods (or .pdf)
  3. Run LibreOffice headless with a unique UserInstallation per process to avoid profile collisions
  4. Post-process: copy filesystem timestamps, verify document metadata with ExifTool (and copy it into PDF outputs where needed), store logs and error reports
  5. Validate results and run spot checks
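
Step 2 of the workflow can be kept in one lookup table so the rest of the pipeline never hard-codes extensions. This is a minimal sketch: the helper name Get-TargetFormat, the $FormatMap variable, and the -Archival switch are illustrative names, not part of LibreOffice or PowerShell.

```powershell
# Hypothetical helper: map a source extension to a --convert-to format.
$FormatMap = @{
    '.docx' = @{ Editable = 'odt'; Archival = 'pdf' }
    '.xlsx' = @{ Editable = 'ods'; Archival = 'pdf' }
}

function Get-TargetFormat {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [switch] $Archival
    )
    $ext = [IO.Path]::GetExtension($Path).ToLowerInvariant()
    # Unsupported type: return $null so the caller can skip the file
    if (-not $FormatMap.ContainsKey($ext)) { return $null }
    if ($Archival) { $FormatMap[$ext].Archival } else { $FormatMap[$ext].Editable }
}

Get-TargetFormat -Path 'C:\Data\Documents\Report.DOCX'            # odt
Get-TargetFormat -Path 'C:\Data\Documents\Budget.xlsx' -Archival  # pdf
```

Returning $null for unknown extensions lets a scan skip stray files instead of sending them to soffice with the wrong format.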

Practical PowerShell recipes

Below are three recipes: a simple single-threaded starter, a PowerShell 7 parallel pipeline with logging and resume capabilities, and ExifTool commands for checking and copying document metadata.

Preparation (one-time)

  1. Install LibreOffice on Windows (standard installer). Locate soffice.exe, typically under C:\Program Files\LibreOffice\program\soffice.exe.
  2. Install PowerShell 7+ for ForEach-Object -Parallel. Download from Microsoft if you don't have it.
  3. Install ExifTool (recommended) to inspect document metadata and to copy it into PDF outputs: https://exiftool.org/
  4. Exclude conversion folders from real-time antivirus scanning to improve throughput.
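
Before the first run it can help to confirm the prerequisites from the list above in one pass. The sketch below assumes the default install path for soffice.exe; the check labels are illustrative.

```powershell
# Preflight sketch: report which prerequisites are present.
# The soffice path is the default install location; adjust it for your machine.
$soffice = 'C:\Program Files\LibreOffice\program\soffice.exe'

$checks = [ordered]@{
    'PowerShell 7+'     = $PSVersionTable.PSVersion.Major -ge 7
    'soffice.exe found' = Test-Path -LiteralPath $soffice -PathType Leaf
    'exiftool on PATH'  = [bool](Get-Command exiftool -ErrorAction SilentlyContinue)
}

# Print one line per check
$checks.GetEnumerator() | ForEach-Object {
    '{0,-20} {1}' -f $_.Key, $(if ($_.Value) { 'OK' } else { 'MISSING' })
}

# Warn once with the names of everything that failed
$missing = @($checks.GetEnumerator() | Where-Object { -not $_.Value })
if ($missing.Count -gt 0) {
    Write-Warning "Prerequisites missing: $($missing.Key -join ', ')"
}
```

The antivirus exclusion is not checked here because how to query it depends on the product in use.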

1) Minimal single-threaded converter (good for testing)

$soffice = 'C:\Program Files\LibreOffice\program\soffice.exe'
$sourceRoot = 'C:\Data\Documents'
$targetRoot = 'C:\Data\Documents-Converted'

Get-ChildItem -Path $sourceRoot -Recurse -File -Include *.docx,*.xlsx | ForEach-Object {
    # Mirror the source folder structure under the target root.
    # TrimEnd avoids a trailing backslash, which would escape the closing quote below.
    $relative = $_.FullName.Substring($sourceRoot.Length).TrimStart('\')
    $outDir = (Join-Path $targetRoot ([IO.Path]::GetDirectoryName($relative))).TrimEnd('\')
    New-Item -ItemType Directory -Path $outDir -Force | Out-Null

    # -Include limits the input to these two extensions
    $format = if ($_.Extension -ieq '.docx') { 'odt' } else { 'ods' }

    # Start-Process joins the arguments with spaces and does not quote them,
    # so paths are wrapped in double quotes. $args is an automatic variable, so use another name.
    $sofficeArgs = @('--headless', '--nologo', '--convert-to', $format, '--outdir', "`"$outDir`"", "`"$($_.FullName)`"")
    Write-Output "Converting: $($_.FullName) → $format"
    Start-Process -FilePath $soffice -ArgumentList $sofficeArgs -Wait -NoNewWindow
}

2) Production-ready PowerShell 7 pipeline (parallel, resume, metadata and timestamp preservation)

This script assumes PowerShell 7 and ExifTool installed and in PATH. It creates a job list file to allow resume, uses a unique LibreOffice user profile per process, and throttles parallelism to the number of CPU cores.

# CONFIGURATION
$soffice = 'C:\Program Files\LibreOffice\program\soffice.exe'
$sourceRoot = 'C:\Data\Documents'
$targetRoot = 'C:\Data\Documents-Converted'
$jobList = 'C:\Temp\convert-jobs.csv'
$logFile = 'C:\Temp\convert-log.csv'
$parallel = [System.Environment]::ProcessorCount  # adjust if you want fewer concurrent processes

# Build or load job list (CSV: Source, Target)
if (-not (Test-Path $jobList)) {
    $jobs = Get-ChildItem -Path $sourceRoot -Recurse -Include *.docx,*.xlsx | ForEach-Object {
        $rel = $_.FullName.Substring($sourceRoot.Length).TrimStart('\')
        $outDir = Join-Path $targetRoot ([IO.Path]::GetDirectoryName($rel))
        if ($_.Extension -ieq '.docx') { $ext = 'odt' } else { $ext = 'ods' }
        $target = Join-Path $outDir ([IO.Path]::GetFileNameWithoutExtension($_.Name) + '.' + $ext)
        [PSCustomObject]@{ Source = $_.FullName; Target = $target }
    }
    $jobs | Export-Csv -Path $jobList -NoTypeInformation
}

$tasks = Import-Csv -Path $jobList

# Create a log file if missing
if (-not (Test-Path $logFile)) { "Time,Source,Target,Result,Message" | Out-File -FilePath $logFile -Encoding utf8 }

# Convert in parallel. Each worker returns one log record; a single Export-Csv writes them,
# so concurrent workers never contend for the log file and fields are escaped correctly.
$tasks | ForEach-Object -Parallel {
    # -Parallel script blocks see outer variables only through $using:
    $soffice = $using:soffice
    $src = $_.Source
    $tgt = $_.Target
    $userProf = $null
    $result = 'OK'; $message = ''
    try {
        $outDir = Split-Path -Path $tgt -Parent
        New-Item -ItemType Directory -Path $outDir -Force | Out-Null

        # Skip if already up-to-date
        if ((Test-Path -LiteralPath $tgt) -and
            (Get-Item -LiteralPath $tgt).LastWriteTimeUtc -ge (Get-Item -LiteralPath $src).LastWriteTimeUtc) {
            $result = 'SKIP'; $message = 'TargetUpToDate'
        } else {
            # Unique temporary user profile for LibreOffice to avoid collisions
            $userProf = "C:/Temp/libreprofile_$([guid]::NewGuid())"
            New-Item -ItemType Directory -Path $userProf -Force | Out-Null
            $userProfUri = "file:///$userProf"

            # Choose format by extension
            $format = if ($src.ToLower().EndsWith('.docx')) { 'odt' } else { 'ods' }

            # Start-Process does not quote arguments, so wrap paths in double quotes
            $sofficeArgs = @('--headless', '--nologo', '--invisible', '--nocrashreport', '--nolockcheck',
                "--env:UserInstallation=$userProfUri", '--convert-to', $format, '--outdir', "`"$outDir`"", "`"$src`"")
            $proc = Start-Process -FilePath $soffice -ArgumentList $sofficeArgs -Wait -NoNewWindow -PassThru

            # Start-Process does not set $LASTEXITCODE; check the process object and the output file
            if ($proc.ExitCode -eq 0 -and (Test-Path -LiteralPath $tgt)) {
                # Preserve timestamps
                $s = Get-Item -LiteralPath $src
                $converted = Get-Item -LiteralPath $tgt
                $converted.CreationTime = $s.CreationTime
                $converted.LastWriteTime = $s.LastWriteTime

                # ExifTool can write PDF but not ODT/ODS, so only attempt a property copy for PDF targets
                if ($tgt -like '*.pdf' -and (Get-Command exiftool -ErrorAction SilentlyContinue)) {
                    exiftool -TagsFromFile $src -Title -Subject -Keywords -overwrite_original $tgt | Out-Null
                }
            } else {
                $result = 'ERROR'; $message = "ExitCode=$($proc.ExitCode)"
            }
        }
    } catch {
        $result = 'EXCEPTION'; $message = $_.Exception.Message
    } finally {
        # Best-effort cleanup of user profile folder ($userProf stays $null on the skip path)
        if ($userProf -and (Test-Path -LiteralPath $userProf)) {
            Remove-Item -LiteralPath $userProf -Recurse -Force -ErrorAction SilentlyContinue
        }
    }
    [PSCustomObject]@{ Time = (Get-Date -Format o); Source = $src; Target = $tgt; Result = $result; Message = $message }
} -ThrottleLimit $parallel | Export-Csv -Path $logFile -Append -NoTypeInformation -Encoding utf8

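3) Check and copy document properties (ExifTool examples)

ExifTool reads the internal document properties (title, author, keywords, custom properties) of DOCX, XLSX, ODT, and ODS files, but its supported-file-types list marks those containers as read-only. PDF is writable. In practice: for ODT/ODS outputs, rely on LibreOffice to carry the properties across and use ExifTool to verify them; for PDF outputs, ExifTool can also copy properties in as a post-step.

# List the properties of the source and the converted file for comparison
exiftool -a -G1 source.docx
exiftool -a -G1 converted.odt

# Copy core properties into a converted PDF. Tag names differ between containers
# (DOCX typically reports the author as Creator), so check the listing above first.
exiftool -TagsFromFile source.docx -Title -Subject -Keywords '-Author<Creator' -overwrite_original converted.pdf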
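
Whichever post-step is used, it helps to confirm which properties actually survived the conversion. The sketch below compares two property objects, such as those produced by exiftool -j <file> | ConvertFrom-Json. Compare-DocProperty is a hypothetical helper, and the default tag names are assumptions that should be checked against your own files.

```powershell
# Sketch: report which properties differ between a source and a converted file.
# Works on any two objects, e.g. the output of: exiftool -j <file> | ConvertFrom-Json
function Compare-DocProperty {
    param(
        [Parameter(Mandatory)] $Source,
        [Parameter(Mandatory)] $Converted,
        # Assumed tag names; list the real ones with: exiftool -a -G1 <file>
        [string[]] $Name = @('Title', 'Creator', 'Subject', 'Keywords')
    )
    foreach ($n in $Name) {
        # A missing property becomes an empty string, so "absent" and "blank" compare equal
        $before = [string]$Source.$n
        $after  = [string]$Converted.$n
        if ($before -cne $after) {
            [PSCustomObject]@{ Property = $n; Source = $before; Converted = $after }
        }
    }
}

# Example with stand-in objects (no ExifTool needed to try the logic):
$srcProps = [PSCustomObject]@{ Title = 'Q3 Report'; Creator = 'A. Author'; Keywords = 'finance' }
$outProps = [PSCustomObject]@{ Title = 'Q3 Report'; Creator = '';          Keywords = 'finance' }
Compare-DocProperty -Source $srcProps -Converted $outProps
```

In this example only the Creator row is reported, because the other listed properties match or are absent on both sides.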

Tips to preserve formatting and fidelity

  • Keep LibreOffice up to date: major OOXML filter fixes landed in releases around 2024–2026. Newer versions generally improve fidelity.
  • Prefer ODT/ODS before PDF when you intend post-editing. Converting DOCX → ODT tends to keep structure, but expect small differences. Convert to PDF for archival snapshots.
  • Embed fonts in PDFs to avoid rendering differences on other systems — use the writer_pdf_Export filter when producing PDFs.
  • Use PDF/A for long-term archiving (PDF/A-1/-2). LibreOffice exposes PDF export filter options; test the filter string you need in a few files to confirm the exact option syntax.
  • Spot-check complex files (tables, floating frames, tracked changes, macros). Some features (VBA macros, DOCM) are not fully supported and will be lost in conversion.

PDF export example with filter options

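To export to PDF with specific options, append the filter name and its options to --convert-to. The LibreOffice help documents a JSON form for these options in recent releases (7.4 onward); the syntax varies between versions, so test locally. SelectPdfVersion 1 requests PDF/A-1b. Use calc_pdf_Export instead of writer_pdf_Export for spreadsheets.

$filterJson = '{"SelectPdfVersion":{"type":"long","value":"1"},"EmbedStandardFonts":{"type":"boolean","value":"true"}}'
$convertTo = 'pdf:writer_pdf_Export:' + $filterJson.Replace('"', '\"')   # escape quotes for the command line
$pdfArgs = @('--headless', '--convert-to', "`"$convertTo`"", '--outdir', "`"$outDir`"", "`"$src`"")
Start-Process -FilePath $soffice -ArgumentList $pdfArgs -Wait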

Operational and performance considerations

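  • User profile collisions: concurrent soffice instances that share a user profile interfere with each other (a second invocation typically hands its work to the running instance or waits on its lock). Give each process its own temporary UserInstallation (shown above) to avoid lock issues and stubborn instances.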
  • Antivirus & I/O: Excluding your conversion directories from real-time scanning can dramatically increase throughput for large batches.
  • Disk I/O and temp directories: Prefer SSDs for the working directory and temp profiles. Monitor free space and clean up temporary profiles regularly.
  • Retries and idempotency: Make scripts idempotent — skip targets that are newer than the source and re-run failed items only. Use a job list CSV and log for easy resume.
  • Throttling: Don't exceed CPU and memory capacity — LibreOffice is memory-hungry for complex files; use a lower -ThrottleLimit for large documents.
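
To re-run failed items only, one option is to derive a retry list from the log. This is a sketch, assuming the log columns Time,Source,Target,Result,Message used in this article; Get-RetryJob and the retry file name are illustrative.

```powershell
# Sketch: build a retry job list from the conversion log.
# Assumes the log columns Time,Source,Target,Result,Message used in this article.
function Get-RetryJob {
    param(
        [Parameter(Mandatory)] [string] $LogPath,
        [string[]] $RetryOn = @('ERROR', 'EXCEPTION')
    )
    Import-Csv -LiteralPath $LogPath |
        Group-Object -Property Source |
        ForEach-Object {
            # Only the most recent entry per source decides whether it needs another run
            $last = $_.Group | Sort-Object { [datetime]$_.Time } | Select-Object -Last 1
            if ($RetryOn -contains $last.Result) {
                [PSCustomObject]@{ Source = $last.Source; Target = $last.Target }
            }
        }
}

# Usage (hypothetical retry file name; feed it to the pipeline as its job list):
# Get-RetryJob -LogPath 'C:\Temp\convert-log.csv' |
#     Export-Csv -Path 'C:\Temp\convert-jobs-retry.csv' -NoTypeInformation
```

Looking only at the latest entry per source means a file that failed once and later succeeded is not queued again.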

Troubleshooting common issues

soffice appears to hang or leave zombie processes

  • Ensure each process uses a unique --env:UserInstallation and that temporary profiles are cleaned up.
  • Run with --nocrashreport --nolockcheck and monitor logs. Use Task Manager or Get-Process to identify stuck soffice.exe instances.
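
A time limit per conversion keeps one stuck document from stalling a whole batch. The sketch below wraps Start-Process with the .NET Process.WaitForExit(milliseconds) call. Invoke-WithTimeout is a hypothetical helper name and the 300-second default is an arbitrary starting point. On Windows, soffice.exe usually launches a child soffice.bin, so a hung conversion may also need that child stopped; this sketch terminates only the process it started.

```powershell
# Sketch: run a process with a time limit and kill it if it overruns.
function Invoke-WithTimeout {
    param(
        [Parameter(Mandatory)] [string] $FilePath,
        [Parameter(Mandatory)] [string[]] $ArgumentList,
        [int] $TimeoutSeconds = 300
    )
    $p = Start-Process -FilePath $FilePath -ArgumentList $ArgumentList -PassThru -NoNewWindow
    # Touching Handle early keeps ExitCode readable after the process exits
    $null = $p.Handle
    if ($p.WaitForExit($TimeoutSeconds * 1000)) {
        return [PSCustomObject]@{ TimedOut = $false; ExitCode = $p.ExitCode }
    }
    # Still running after the limit: terminate it so the batch can move on
    Stop-Process -Id $p.Id -Force -ErrorAction SilentlyContinue
    [PSCustomObject]@{ TimedOut = $true; ExitCode = $null }
}

# Usage inside a conversion loop ($soffice and $sofficeArgs as built in the scripts above):
# $r = Invoke-WithTimeout -FilePath $soffice -ArgumentList $sofficeArgs -TimeoutSeconds 600
# if ($r.TimedOut) { Write-Warning 'Conversion exceeded the time limit' }
```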

Output files missing metadata

  • Use ExifTool to compare core properties before and after conversion; for PDF outputs it can also copy them across.
  • Some custom properties may be stored differently — export the property list and map values as needed with a script.

Formatting broken in converted ODT

  • Try a newer LibreOffice build or test alternate filters (sometimes direct PDF export preserves appearance better).
  • For large-scale migration, sample a representative set of documents (complex tables, track changes, mail merge) and validate before mass-run.
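
One way to draw a pilot sample is to take a few of the largest files of each type plus a random selection of the rest. Using file size as a rough stand-in for complexity is an assumption of this sketch, and Get-PilotSample is a hypothetical helper; documents with known tracked changes or mail merge fields still need to be added by hand.

```powershell
# Sketch: pick a pilot sample covering each file type, biased toward large files.
function Get-PilotSample {
    param(
        [Parameter(Mandatory)] [object[]] $File,   # objects with Extension, Length, FullName
        [int] $PerType = 25,
        [int] $Largest = 5
    )
    foreach ($g in ($File | Group-Object { $_.Extension.ToLowerInvariant() })) {
        # Always include the largest files of this type
        $big = @($g.Group | Sort-Object Length -Descending | Select-Object -First $Largest)
        $rest = @($g.Group | Where-Object { $big -notcontains $_ })
        $big
        # Then add a random selection from the remainder
        if ($rest.Count -gt 0) {
            $rest | Get-Random -Count ([Math]::Min($PerType, $rest.Count))
        }
    }
}

# Usage (the source root is the one from the configuration block):
# $all = Get-ChildItem -Path 'C:\Data\Documents' -Recurse -File -Include *.docx,*.xlsx
# Get-PilotSample -File $all -PerType 45 -Largest 5 | Select-Object FullName, Length
```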

Validation and quality assurance

Automated conversion isn't finished until you validate. A practical QA pipeline includes:

  • Automated file counts and size comparisons before/after
  • Random sampling of converted files with visual diffing (PDF render compare) and property checks
  • Checksums (Get-FileHash) and log reviews to ensure every job finished OK
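
The first check in that list can be scripted against the job list. The sketch below assumes a CSV with Source and Target columns, as built earlier in this article; Test-ConversionOutput is a hypothetical helper. It confirms only that each target exists and is non-empty, not that it renders correctly.

```powershell
# Sketch: confirm every job in the job list produced a non-empty target file.
function Test-ConversionOutput {
    param([Parameter(Mandatory)] [string] $JobListPath)   # CSV with Source,Target columns
    Import-Csv -LiteralPath $JobListPath | ForEach-Object {
        $status = if (-not (Test-Path -LiteralPath $_.Target -PathType Leaf)) {
            'MissingTarget'
        } elseif ((Get-Item -LiteralPath $_.Target).Length -eq 0) {
            'EmptyTarget'
        } else {
            'OK'
        }
        [PSCustomObject]@{ Source = $_.Source; Target = $_.Target; Status = $status }
    }
}

# Usage: summarize by status, then list the problems
# $report = Test-ConversionOutput -JobListPath 'C:\Temp\convert-jobs.csv'
# $report | Group-Object Status | Select-Object Name, Count
# $report | Where-Object Status -ne 'OK'
```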

Advanced: UNO scripting for fine-grained control

If the CLI doesn't expose the option you need (for example, custom metadata mapping, advanced filter settings, or preserving tracked changes in a particular way), use LibreOffice's UNO API from Python or Java. UNO gives you programmatic access to document models and export filters but requires more setup. Use CLI for most bulk jobs, and UNO for high-fidelity transformations of a smaller set.

In 2026 the focus in document conversion is on interoperability, archival standards, and AI-assisted validation. The Document Foundation has continued improving OOXML import filters; however, where fidelity is critical, consider creating both ODF and PDF exports so you have an editable open-format copy and a visually accurate archival copy. Expect further advances in 2026–2027, with conversion engines exposing richer filter options and automated layout-diff tools that flag rendering changes.

Quick checklist before you run a mass conversion

  • Install PowerShell 7+, LibreOffice, and ExifTool
  • Test conversion on samples across document complexity
  • Create a job list CSV for idempotency and resume support
  • Exclude conversion directories from AV, ensure SSD-backed temp space
  • Decide on target formats: ODT/ODS for editability, PDF/PDF-A for archival
  • Plan QA: logs, spot checks, and checksum validation

Actionable takeaways

  • Use unique LibreOffice profiles per process to avoid collisions when running parallel conversions.
  • Preserve file timestamps by copying CreationTime and LastWriteTime after conversion.
  • Verify document metadata with ExifTool after conversion, and use it to copy properties into PDF outputs where needed.
  • Throttle concurrency to your CPU/memory capacity — PowerShell 7's ForEach-Object -Parallel with -ThrottleLimit is ideal.
  • Validate with spot checks and automated logging; maintain a job list for resumability.

Conclusion & call-to-action

Bulk converting Office files with LibreOffice headless and PowerShell is a scalable and cost-effective solution when set up correctly. Use the examples and hardening tips above: build a job list, adopt parallel processing carefully, preserve timestamps, check metadata with ExifTool, and validate results before decommissioning originals.

Start small, measure fidelity on a representative sample, and iterate. The combo of LibreOffice headless + PowerShell gives you repeatable, auditable, and scriptable migrations that are production-ready in 2026.

Next steps: Clone these scripts into your environment, run a 100-file pilot, and review logs for formatting and metadata fidelity.
