Fauna Data Manager

This project is no longer in development and is not supported by Fauna.

The Fauna Data Manager (FDM) is a terminal application that performs migration, export, import, backup, and restore tasks for Fauna databases. For example:

  • Copying a Fauna database, including its documents, collections, indexes, functions, and roles, at any point in time, to another Fauna database (see the example invocation after this list).

  • Importing and updating documents from:

    • JSON or CSV files, in the local filesystem or in an AWS S3 bucket,

    • any JDBC-compliant SQL database, such as MySQL or PostgreSQL,

    • another Fauna database.

  • Exporting and backing up documents to a local filesystem as JSON files.

  • Simple ETL tasks, such as:

    • changing a field/column name and/or data type,

    • specifying a "primary key" field (to use as a reference),

    • specifying the import time,

    • ignoring fields,

    • applying a simple merge, replace, or ignore policy for specified schema types.

  • Copying an existing Fauna database into a new database, to establish initial schema and content, for testing or multi-tenant scenarios.
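
For instance, the database-to-database copy described in the first item might be invoked as follows. This is an illustrative sketch: it assumes the -source and -dest parameters accept key=<admin key> values, as described in the Parameters topic, and the key variables shown are placeholders:

  # Copy collections, indexes, functions, roles, and documents from
  # one Fauna database to another; both keys must be admin keys.
  fdm -source key=$SOURCE_ADMIN_KEY -dest key=$DEST_ADMIN_KEY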

For more information on the Fauna Data Manager, see the following topics:

  • Install the Fauna Data Manager: describes the requirements and installation procedure.

  • Parameters: describes all of the parameters that specify the source, destination, and formatting of document fields.

  • Configuration: describes the format of the Fauna Data Manager configuration file, fdm.props, which can be used to record settings for multiple Fauna Data Manager invocations.

  • Format transformations: describes the syntax of format transformations, which are used to rename fields, ignore fields, and/or change field types during processing.

  • Examples: presents a number of import, export, and document copying goals and how to invoke the Fauna Data Manager to achieve them.

Limitations

The Fauna Data Manager is currently in preview mode: we’d like you to try it, but you should not use it on production databases.

The current release of the Fauna Data Manager has the following limitations:

  • Document history is not processed. Only the most recent version of each document is exported or copied.

  • Child databases are not processed. To process a child database, run the Fauna Data Manager with an admin key for that child database.

  • Keys and tokens are not copied. Since the secret for a key or token is only provided on initial creation, it is not possible to recreate existing keys and tokens. You would need to create new keys and tokens in the target database.

  • GraphQL schema metadata is not fully processed. This means that if you import an exported database, or copy one Fauna database to another, you need to import an appropriate GraphQL schema into the target database in order to run GraphQL queries (see the example after this list).

  • Schema document processing is limited to 10,000 entries per type. If a source database contains more than 10,000 collections, indexes, functions, or roles, only the first 10,000 of each type are processed; the remainder are ignored.

  • When exporting a Fauna database to the local filesystem, only collections and their associated documents are exported. A copy of the schema documents describing collections, indexes, functions, and roles is copied to the file fauna_schema. Currently, that schema file cannot be used during import.

  • Fauna imposes collection-naming rules: collections cannot be named events, set, self, documents, or _. The Fauna Data Manager cannot rename collections during processing, so if your CSV, JSON, or JDBC sources have file or table names that use these reserved names, processing terminates with an error.

  • While the Fauna Data Manager works on Windows, only limited testing has been done on that platform, so you may experience unexpected platform-specific issues. We plan to expand Windows testing in future releases.
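
For the GraphQL limitation above, one way to import a schema into the target database is Fauna's GraphQL import endpoint. A minimal sketch, assuming a schema file named schema.gql and a placeholder secret for the target database:

  # Import a GraphQL schema so GraphQL queries work after a copy or import.
  curl -H "Authorization: Bearer $TARGET_DB_SECRET" \
       https://graphql.fauna.com/import \
       --data-binary "@schema.gql"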

Over time, we hope to remove many of these limitations and add new features. We'd love to hear whether the Fauna Data Manager is useful to you, whether you encounter problems, and especially any suggestions for improvement! Let us know in the #fdm channel in our Community Slack.

Export file format

When the Fauna Data Manager creates a backup of a Fauna database to a filesystem, it creates one file per collection in the source database, with each file named after its collection.

Each exported file contains one JSON document per line, one line for each document in the corresponding source collection.

One additional file is created, called fauna_schema. It too contains one JSON document per line, and these documents record the collections, indexes, functions, and roles within the source database.
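
Because each exported file is newline-delimited JSON, standard tools can inspect a backup directly. A small sketch, assuming an export directory named backup that contains a hypothetical users collection:

  ls backup                      # one file per collection, plus fauna_schema
  wc -l backup/users             # one line per document = document count
  head -n 1 backup/users | jq .  # pretty-print the first exported document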

Processing synopsis

The Fauna Data Manager operates using multiple threads to achieve the best throughput possible.

The loader thread scans the source for documents and populates a read queue, streaming in documents to process.

The main processing thread fetches a document from the read queue, applies any Format transformations that may be enabled, and sends the document to a write queue.

The write thread fetches documents from the write queue as quickly as it can, and sends those documents to the specified destination.

When the destination is a Fauna database, documents are created using the Insert function.
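
Insert creates a document event at an explicit reference, timestamp, and action, which is how source references and timestamps can be preserved. A minimal sketch using fauna-shell's eval command, with a hypothetical collection and values:

  # Create a document at a specific reference with an explicit timestamp.
  fauna eval 'Insert(Ref(Collection("users"), "1001"), Now(), "create", { data: { name: "Alice" } })'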

Source documents that contain references result in practically identical documents in the destination (history is not copied). When source documents contain no references and have no suitable ID field that can serve as a reference (via Format transformations), destination documents receive generated references; in that case, repeated Fauna Data Manager runs create multiple copies of the source documents.

Similarly, when a source document contains a timestamp, or a field that can be used as a timestamp (via Format transformations), the destination document uses the source document's timestamp. Source documents without timestamps receive the current timestamp at processing time.

If a source document has a reference but its timestamp differs from that of an existing destination document with the same reference, a new version of the document is created. This prevents overwriting any existing documents in the destination database.
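
When no reference can be established, repeated runs duplicate documents, as noted above. A stable source field can be mapped to the document reference via format transformations to prevent this. The flag and transformation syntax below is hypothetical; the Format transformations topic defines the actual syntax:

  # Hypothetical sketch: use the source "id" field as the document
  # reference so that re-runs update documents instead of duplicating them.
  fdm -source path=./data -dest key=$ADMIN_KEY -format "id->ref"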
