Scribe Mutual EHI Export Format Specification

Version: ndjson-v1
Canonical URL: https://app.scribemutual.com/ehi-export-format/ndjson-v1
# b10 EHI Export Format Specification (`ndjson-v1`)

**Status:** Active (Phase 4 Certification Hardening)  
**Format Version:** `ndjson-v1`  
**Specification Revision:** `2026.04.26-r2`  
**Applies To:** ONC 170.315(b)(10) single-patient and population EHI export packages  
**Canonical Contract Companion:** `docs/b10-ehi-export-manifest-contract.md`

---

## Table of Contents

1. [Layperson Summary](#layperson-summary)
2. [Purpose and Audience](#purpose-and-audience)
3. [Normative Language](#normative-language)
4. [Authoritative ZIP Layout](#authoritative-zip-layout)
5. [Manifest Schema (`manifest.json`)](#manifest-schema-manifestjson)
6. [NDJSON Rules (FHIR R4)](#ndjson-rules-fhir-r4)
7. [Resource Types Included](#resource-types-included)
8. [Supplemental File Handling](#supplemental-file-handling)
9. [Checksums and Documentation URL Verification](#checksums-and-documentation-url-verification)
10. [Third-Party Parsing Procedure](#third-party-parsing-procedure)
11. [Worked Examples](#worked-examples)
12. [Versioning and Change Control](#versioning-and-change-control)
13. [Conformance Checklist](#conformance-checklist)
14. [Troubleshooting](#troubleshooting)

---

## Layperson Summary

This document is the external parsing guide for Scribe Mutual EHI exports.  
If a certifier or outside developer has only a ZIP file and this document, they should be able to:

- understand where every file must be located,
- detect whether the package has been tampered with or is malformed,
- parse clinical NDJSON safely,
- and detect mismatches between manifest metadata and actual package contents.

In plain terms: the ZIP is a shipment, `manifest.json` is the packing list, `checksums.sha256` is tamper detection, and `clinical.ndjson` files are the machine-readable patient records.

---

## Purpose and Audience

This specification is for:

- independent validators,
- integration teams consuming exports,
- and certification auditors validating ONC 170.315(b)(10) behavior.

This document defines package format only. It does not define API authentication, permissions, or job scheduling internals.

---

## Normative Language

The words below are normative:

- **MUST**: mandatory for conformance
- **MUST NOT**: prohibited
- **SHOULD**: recommended unless a documented reason exists
- **MAY**: optional

---

## Authoritative ZIP Layout

Every export MUST be a ZIP archive with ZIP-relative paths only.

### Always Required Package-Level Files

```text
manifest.json
export-format-url.txt
checksums.sha256
```

### Required Clinical Payload Pattern

```text
patients/{patientId}/clinical.ndjson
```

### Optional Supplemental Payload Patterns

```text
patients/{patientId}/documents/{filename}
patients/{patientId}/media/{filename}
interoperability/{patientId}/{filename}.hl7
```

Layout constraints:

- Paths MUST NOT be absolute.
- Paths MUST NOT contain `..` path traversal segments.
- Population exports MAY contain zero patients (empty valid population package), one patient, or many patients.
- Single-patient exports MUST contain exactly one patient clinical path.
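
The two path constraints above can be checked before any entry is read or extracted. The following is a minimal sketch, not a reference implementation: the function name `is_safe_zip_path` is hypothetical, and treating backslashes as separators and rejecting Windows drive prefixes are extra defensive assumptions that this specification does not itself state.

```python
def is_safe_zip_path(path: str) -> bool:
    """Return True if `path` is ZIP-relative and has no '..' segment."""
    # Assumption: treat backslashes as separators so "a\\..\\b" is also caught.
    normalized = path.replace("\\", "/")
    if not normalized or normalized.startswith("/"):
        return False  # empty or absolute path
    # Assumption: reject Windows drive prefixes such as "C:/...".
    if len(normalized) >= 2 and normalized[1] == ":":
        return False
    # Only a whole segment equal to ".." counts as traversal;
    # a filename such as "notes..v2.txt" remains acceptable.
    return ".." not in normalized.split("/")
```

A validator would typically apply this to every ZIP entry name and to every `manifest.files[].path` value.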

---

## Manifest Schema (`manifest.json`)

`manifest.json` MUST be valid JSON and MUST conform to the schema below.

### Top-Level Fields

| Field | Type | Required | Allowed Values / Rules | Meaning |
|---|---|---|---|---|
| `jobId` | string | yes | ULID | Export job identifier |
| `exportType` | string | yes | `single-patient` or `population` | Package export scope |
| `formatVersion` | string | yes | `ndjson-v1` | Package format contract version |
| `documentationUrl` | string | yes | Absolute HTTPS URL in non-dev environments | Public parser specification URL |
| `createdAt` | string | yes | ISO 8601 timestamp | Package creation time |
| `requestedBy` | string | yes | non-empty | Requesting user/service identifier |
| `patientCount` | integer | yes | `>= 0` | Number of distinct exported patients |
| `fileCount` | integer | yes | must equal total ZIP file entries (`>= 3`; can be exactly `3` for empty population) | Total ZIP file entries |
| `files` | array | yes | one item per ZIP file | Inventory of package contents |

### `files[]` Entry Fields

| Field | Type | Required | Allowed Values / Rules | Meaning |
|---|---|---|---|---|
| `path` | string | yes | ZIP-relative; no `..` | File path inside ZIP |
| `category` | string | yes | `clinical-ndjson`, `document-binary`, `media-binary`, `interoperability-raw`, `manifest`, `checksum`, `format-url` | File classification |
| `patientId` | string or null | yes | patient logical ID or `null` for package-level files | Subject ownership |
| `mediaType` | string | yes | MIME type | Content type |
| `sha256` | string or null | yes | 64-char lowercase hex or `null` for generated contract files | Integrity digest |
| `sizeBytes` | integer or null | yes | `>= 0` or `null` for generated contract files | File size |
| `recordCount` | integer or null | yes | NDJSON line count for NDJSON files; `null` for non-NDJSON | NDJSON record quantity |
| `sourceSubsystem` | string | yes | source domain label | Origin of data |

Contract note:

- The inventory entries for `manifest.json` and `checksums.sha256` MAY use `sha256: null` and `sizeBytes: null`, because these files cannot record their own checksum or size without a recursive dependency.
- For all non-generated payload files, `sha256` and `sizeBytes` MUST be populated.
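
As an illustration of the schema tables and the contract note, the sketch below checks top-level field types and the `sha256`/`sizeBytes` nullability rule. It is deliberately partial and rests on stated assumptions: the names `manifest_errors`, `TOP_LEVEL_TYPES`, and `NULLABLE_CATEGORIES` are hypothetical, only the `manifest` and `checksum` categories are treated as generated contract files, and ULID and ISO 8601 syntax are not validated.

```python
import re

TOP_LEVEL_TYPES = {
    "jobId": str, "exportType": str, "formatVersion": str,
    "documentationUrl": str, "createdAt": str, "requestedBy": str,
    "patientCount": int, "fileCount": int, "files": list,
}
# Assumption: only these categories count as "generated contract files".
NULLABLE_CATEGORIES = {"manifest", "checksum"}
SHA256_HEX = re.compile(r"[0-9a-f]{64}")


def manifest_errors(manifest: dict) -> list[str]:
    """Collect schema problems instead of stopping at the first one."""
    errors = []
    for field, expected in TOP_LEVEL_TYPES.items():
        value = manifest.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"{field}: missing or wrong type")
    if manifest.get("exportType") not in ("single-patient", "population"):
        errors.append("exportType: must be single-patient or population")
    if manifest.get("formatVersion") != "ndjson-v1":
        errors.append("formatVersion: must be ndjson-v1")
    files = manifest.get("files")
    if not isinstance(files, list):
        return errors
    if manifest.get("fileCount") != len(files):
        errors.append("fileCount: does not match number of files[] entries")
    for entry in files:
        path = entry.get("path")
        nullable = entry.get("category") in NULLABLE_CATEGORIES
        sha, size = entry.get("sha256"), entry.get("sizeBytes")
        if sha is None:
            if not nullable:
                errors.append(f"{path}: sha256 must be populated")
        elif not (isinstance(sha, str) and SHA256_HEX.fullmatch(sha)):
            errors.append(f"{path}: sha256 must be 64 lowercase hex chars")
        if size is None:
            if not nullable:
                errors.append(f"{path}: sizeBytes must be populated")
        elif isinstance(size, bool) or not isinstance(size, int) or size < 0:
            errors.append(f"{path}: sizeBytes must be an integer >= 0")
    return errors
```

Returning a list of messages rather than raising on the first failure lets an auditor see every schema problem in one pass.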

---

## NDJSON Rules (FHIR R4)

For every `patients/{patientId}/clinical.ndjson`:

- Content MUST be UTF-8.
- Each non-empty line MUST be a standalone JSON object.
- Each JSON object MUST contain `resourceType`.
- Payload MUST represent FHIR R4 resources (one resource per line).
- File SHOULD end with a trailing newline.
- The count of parsed (non-empty) lines MUST equal the `recordCount` of the `manifest.files[]` entry for that path.

Consumers MUST NOT assume array wrappers. This is line-delimited JSON, not a JSON list document.
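
The rules above can be sketched as a small line-by-line parser. This is an illustrative sketch only: the function name `parse_clinical_ndjson` is hypothetical, and skipping whitespace-only lines is an assumption drawn from the "non-empty line" wording. It checks structure, not FHIR R4 validity.

```python
import json


def parse_clinical_ndjson(raw: bytes) -> list[dict]:
    """Parse one clinical.ndjson payload into a list of resource objects."""
    text = raw.decode("utf-8")  # raises UnicodeDecodeError if not UTF-8
    resources = []
    # Split on "\n" only: str.splitlines() also breaks on characters such as
    # U+2028, which may legally appear inside a JSON string value.
    for line_no, line in enumerate(text.split("\n"), start=1):
        if not line.strip():
            continue  # blank lines (including the trailing one) are not records
        obj = json.loads(line)  # raises json.JSONDecodeError, a ValueError
        if not isinstance(obj, dict):
            raise ValueError(f"line {line_no}: expected a JSON object")
        if "resourceType" not in obj:
            raise ValueError(f"line {line_no}: missing resourceType")
        resources.append(obj)
    return resources
```

The length of the returned list is the value a validator would compare against the `recordCount` of the corresponding `manifest.files[]` entry.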

---

## Resource Types Included

The canonical FHIR resource types included by contract are:

- `Patient`
- `Encounter`
- `Condition`
- `Observation`
- `MedicationRequest`
- `AllergyIntolerance`
- `Procedure`
- `Immunization`
- `DiagnosticReport`
- `DocumentReference`
- `CareTeam`
- `Consent`
- `EpisodeOfCare`
- `List`
- `Media`
- `Provenance`
- `Communication`
- `Appointment`

Source of truth for the implementation list:

- `SM_backend/src/services/ehi/drsExportContract.js` (`CANONICAL_RESOURCES`)

Supplemental payload classes that may appear as file artifacts (not additional NDJSON resource types):

- clinical notes raw text
- CCDA XML
- Direct message attachments
- raw HL7 messages

---

## Supplemental File Handling

Supplemental files are optional and appear only when available for the patient.

### Documents

- Path: `patients/{patientId}/documents/{filename}`
- Typical media types: `application/pdf`, `text/plain`, `application/xml`, `application/octet-stream`

### Media

- Path: `patients/{patientId}/media/{filename}`
- Includes non-document binary payloads allowed by scope policy

### Raw HL7

- Path: `interoperability/{patientId}/{filename}.hl7`
- Preserved as interoperability artifacts for downstream reconciliation or traceability

All supplemental files MUST:

- appear in `manifest.files[]`,
- carry checksum rows in `checksums.sha256`,
- and follow the same ZIP path safety constraints.

---

## Checksums and Documentation URL Verification

### `checksums.sha256`

Each line MUST use this exact pattern (double-space separator):

```text
{sha256_hex}  {relative/path/to/file}
```

Verification procedure:

1. Parse each line into expected hash and path.
2. Recompute SHA-256 for each ZIP entry except `checksums.sha256`.
3. Compare actual digest to expected digest.
4. Any mismatch is non-conformant.
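
A minimal sketch of this procedure follows, assuming the package contents have already been read into memory. The names `parse_checksums` and `checksum_mismatches` are hypothetical, and treating a ZIP entry with no checksum row as a mismatch is an assumption about intent rather than a rule stated here.

```python
import hashlib
import re

# 64 lowercase hex chars, exactly two spaces, then a path that does not
# itself begin with whitespace.
CHECKSUM_LINE = re.compile(r"([0-9a-f]{64})  (\S.*)")


def parse_checksums(text: str) -> dict[str, str]:
    """Map each listed path to its expected SHA-256 hex digest."""
    expected = {}
    for line in text.splitlines():
        if not line:
            continue
        match = CHECKSUM_LINE.fullmatch(line)
        if match is None:
            raise ValueError(f"malformed checksum line: {line!r}")
        digest, path = match.groups()
        expected[path] = digest
    return expected


def checksum_mismatches(expected: dict[str, str],
                        contents: dict[str, bytes]) -> list[str]:
    """Return paths whose recomputed digest differs from the expected one.

    `contents` maps ZIP path -> raw bytes. `checksums.sha256` is skipped.
    Assumption: an entry with no checksum row is reported as a mismatch.
    """
    bad = []
    for path, data in contents.items():
        if path == "checksums.sha256":
            continue
        if hashlib.sha256(data).hexdigest() != expected.get(path):
            bad.append(path)
    return sorted(bad)
```

The line pattern is the same one that the GNU `sha256sum` tool emits in text mode, so existing tooling can often cross-check the file after extraction.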

### `export-format-url.txt` and `manifest.documentationUrl`

- `export-format-url.txt` MUST contain exactly one URL line.
- URL line MUST equal `manifest.documentationUrl`.
- In non-dev environments, `documentationUrl` MUST be absolute HTTPS and externally reachable.
- Implementations MAY expose multiple no-auth route aliases, but all aliases MUST serve byte-equivalent documentation content.
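
The first three rules can be checked offline, as in the sketch below. The function name `documentation_url_errors` and the `non_dev` flag are hypothetical. Ignoring surrounding whitespace and a trailing newline in `export-format-url.txt` is an assumption; a stricter validator might compare bytes exactly. Reachability is not tested here because it requires a network request.

```python
from urllib.parse import urlsplit


def documentation_url_errors(url_file_text: str, manifest_url: str,
                             non_dev: bool = True) -> list[str]:
    """Compare export-format-url.txt with manifest.documentationUrl."""
    errors = []
    # Assumption: blank lines and surrounding whitespace are ignored.
    lines = [ln.strip() for ln in url_file_text.splitlines() if ln.strip()]
    if len(lines) != 1:
        errors.append("export-format-url.txt must contain exactly one URL line")
    elif lines[0] != manifest_url:
        errors.append("export-format-url.txt differs from documentationUrl")
    if non_dev:
        parts = urlsplit(manifest_url)
        if parts.scheme != "https" or not parts.netloc:
            errors.append("documentationUrl must be an absolute HTTPS URL")
    return errors
```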

---

## Third-Party Parsing Procedure

A third-party parser SHOULD execute the following checks in this exact order:

1. Open ZIP; enumerate file entries.
2. Assert required package-level files exist.
3. Parse `manifest.json`; validate required fields and types.
4. Validate `manifest.formatVersion === "ndjson-v1"`.
5. Parse `checksums.sha256`; verify digest format and path mapping.
6. Verify checksums for all non-checksum files.
7. Validate path safety (no absolute paths, no traversal).
8. Parse each `clinical.ndjson` line; assert JSON parse success and `resourceType` presence.
9. Reconcile counts:
   - ZIP file count vs `manifest.fileCount`
   - distinct patient clinical paths vs `manifest.patientCount`
   - NDJSON line counts vs per-file `recordCount`
10. Validate URL parity:
    - `export-format-url.txt` equals `manifest.documentationUrl`
11. Optionally fetch `documentationUrl` to confirm third-party discoverability.

Any failed check means the package is non-conformant.
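
The opening steps of the procedure (1 to 4) and the file-count reconciliation from step 9 can be sketched with the Python standard library. This is a partial illustration, not the validator itself: the function name `preflight` is hypothetical, and excluding ZIP directory entries from the file count is an assumption.

```python
import io
import json
import zipfile

REQUIRED_FILES = ("manifest.json", "export-format-url.txt", "checksums.sha256")


def preflight(zip_bytes: bytes) -> list[str]:
    """Run the first structural checks on an in-memory export package."""
    errors = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Step 1: enumerate file entries.
        # Assumption: directory entries do not count as files.
        names = [i.filename for i in archive.infolist() if not i.is_dir()]
        # Step 2: required package-level files must exist.
        missing = [n for n in REQUIRED_FILES if n not in names]
        if missing:
            return [f"missing required file: {n}" for n in missing]
        # Step 3: manifest.json must parse as a JSON object.
        manifest = json.loads(archive.read("manifest.json"))
    if not isinstance(manifest, dict):
        return ["manifest.json must be a JSON object"]
    # Step 4: format version gate.
    if manifest.get("formatVersion") != "ndjson-v1":
        errors.append("formatVersion must be ndjson-v1")
    # Step 9 (first bullet): ZIP file count vs manifest.fileCount.
    if manifest.get("fileCount") != len(names):
        errors.append("fileCount does not match ZIP file entries")
    return errors
```

An empty list means these checks passed; the remaining steps (checksums, path safety, NDJSON parsing, URL parity) would still need to run before the package can be called conformant.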

---

## Worked Examples

### Example A: Minimal Valid Single-Patient Package

```text
manifest.json
export-format-url.txt
checksums.sha256
patients/patient-123/clinical.ndjson
```

### Example B: Multi-Patient Package With Supplemental Data

```text
manifest.json
export-format-url.txt
checksums.sha256
patients/patient-1/clinical.ndjson
patients/patient-1/documents/discharge-summary.pdf
patients/patient-2/clinical.ndjson
patients/patient-2/media/image-1.jpg
interoperability/patient-2/admission.hl7
```

Example multi-patient manifest facts:

- `exportType = "population"`
- `patientCount = 2`
- `fileCount = 8`
- `files[]` has one row per ZIP path above with matching `category`, `recordCount`, and checksum metadata.

### Example Invalid Pattern A: Manifest/File Drift

- `manifest.fileCount=8`, actual ZIP file entries=7  
Result: non-conformant (inventory mismatch).

### Example Invalid Pattern B: NDJSON Missing `resourceType`

- a parsed NDJSON line object has no `resourceType`  
Result: non-conformant (FHIR line invariant violation).

### Example Invalid Pattern C: URL Drift

- `manifest.documentationUrl` differs from `export-format-url.txt`  
Result: non-conformant (documentation pointer mismatch).

### Example Invalid Pattern D: Generated-File Metadata Drift

- the `manifest.files[]` entry with `category = manifest` has a non-null `sha256`, while the entry for `checksums.sha256` violates the generated-file nullability rules  
Result: non-conformant (contract nullability mismatch for generated inventory rows).

---

## Versioning and Change Control

### Format Version

- `formatVersion` is currently fixed at `ndjson-v1`.
- Any new format MUST use a new version label (for example `ndjson-v2`), and its documentation MUST retain parsing guidance for earlier versions.

### Specification Revision

- This document revision: `2026.04.26-r2`.
- Revision updates MAY clarify parsing guidance but MUST NOT silently change format behavior for the same `formatVersion`.

### Change Log

- `2026.04.26-r2`: Added full manifest schema tables, explicit canonical resource-type list, worked multi-patient example, and formal versioning/change-control section for certification closeout.

---

## Conformance Checklist

A package is conformant only when all checks pass:

- required files are present,
- manifest schema and inventory validate,
- checksums validate for package contents,
- NDJSON lines parse and include `resourceType`,
- patient/file/record counts reconcile,
- documentation URL fields match,
- path safety rules hold.

Layperson summary: a third party can trust and parse the package if file layout, metadata, checksums, and NDJSON structure all agree.

---

## Troubleshooting

### Symptom: checksum mismatch

- Rebuild package and regenerate checksums after all payload files are finalized.
- Confirm the ZIP was not modified post-generation.

### Symptom: NDJSON parse failure

- Ensure one JSON object per line.
- Ensure UTF-8 encoding and no multiline object formatting.

### Symptom: patient counts do not reconcile

- Verify each exported patient has exactly one `clinical.ndjson` path.
- Confirm `manifest.patientCount` matches distinct patient clinical paths.

### Symptom: parser cannot reproduce expected output

- Run the validator script against the package first.
- If ambiguity remains, treat this as spec debt and update this document before release.