Sync

The Sync object represents the configuration and state for synchronizing a specific dataset from a Source to your local database. It acts as a builder that allows you to define how the data should be fetched, mapped, and stored.

While a Source represents the connection to an external system (the "where"), a Sync represents the specific data being transferred (the "what").

For Databases (SQL): A Source is the database connection. A Sync corresponds to a single table (e.g., users, orders). You will typically have one Source and multiple Syncs.

For Files (CSV): A Source is the file location. A Sync corresponds to the file's content. Since a CSV file typically contains only one dataset, there is usually a 1-to-1 relationship between the Source and the Sync.

Creating and Configuring a Sync

You create a Sync instance by calling the sync() method on a Source object. You must provide a dataset identifier, which tells the package which table or file to target.

// For a Database Source, the identifier is the table name
$sync = $source->sync('users');

// For a CSV Source, the identifier is the file path
$sync = $source->sync('/path/to/file.csv');

Once you have a Sync object, you can chain methods to configure it before calling run().

Sync Strategies

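The strategy determines how data is fetched and updated. Select one by passing its name to withStrategy(), for example $sync->withStrategy('full_refresh'). Strategies that need extra settings accept an options array as a second argument.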

Available strategies:

full_refresh (Default)

How it works: Truncates the local table and re-imports all rows from the source.

Best for: Small datasets, CSV files, or sources that don't track changes (no updated_at).

Pros: Simple, guarantees consistency.

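Cons: Slow for large datasets. Because every row is re-inserted on each run, locally generated IDs may change between runs, so avoid referencing them from other tables.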

watermark

How it works: Tracks a "watermark" column (usually updated_at or an auto-incrementing id) and fetches only rows whose value is greater than the highest value seen during the last successful sync.

Best for: Large, append-only logs or tables with a reliable updated_at timestamp.

Modes:

append_only: Inserts new rows only.

upsert: Updates existing rows based on the primary key.
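The watermark idea can be sketched in plain PHP. This is an illustration only, not the package's internals: the function names (rowsSinceWatermark, nextWatermark, applyBatch) are hypothetical, and rows are held in arrays rather than a database. Timestamps are assumed to be strings in a sortable format such as Y-m-d H:i:s.

```php
<?php
// Hypothetical helpers illustrating the watermark strategy; not part of the package.

/** Keep only rows whose watermark column is strictly greater than the stored value. */
function rowsSinceWatermark(array $sourceRows, string $column, ?string $lastWatermark): array
{
    if ($lastWatermark === null) {
        return $sourceRows; // first run: nothing synced yet, take everything
    }

    return array_values(array_filter(
        $sourceRows,
        fn (array $row): bool => strcmp($row[$column], $lastWatermark) > 0
    ));
}

/** Compute the watermark to store after a successful run. */
function nextWatermark(array $rows, string $column, ?string $current): ?string
{
    foreach ($rows as $row) {
        if ($current === null || strcmp($row[$column], $current) > 0) {
            $current = $row[$column];
        }
    }

    return $current;
}

/** Apply a batch to a local store according to the chosen mode. */
function applyBatch(array $local, array $rows, string $mode, string $primaryKey): array
{
    foreach ($rows as $row) {
        if ($mode === 'upsert') {
            $local[$row[$primaryKey]] = $row; // insert, or overwrite the row with this key
        } else {
            $local[] = $row; // append_only: always stored as a new row
        }
    }

    return $local;
}
```

In this sketch, append_only never touches existing rows, which is why it suits logs, while upsert needs a primary key to decide which local row to overwrite.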

content_hash

How it works: Calculates a SHA-256 hash of the row's content. It compares source hashes with local hashes to detect changes.

Best for: Sources that lack a reliable updated_at column but where you need to detect updates and deletions.

Features: Can detect deleted rows in the source (marking them as __is_deleted locally).
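A minimal sketch of hash-based change detection follows. It is not the package's implementation: rowHash and diffByContentHash are hypothetical names, and serializing the row with json_encode after sorting its keys is an assumption about how a stable hash could be produced.

```php
<?php
// Hypothetical helpers illustrating the content_hash strategy; not part of the package.

/** SHA-256 over a canonical serialization of the row. */
function rowHash(array $row): string
{
    ksort($row); // stable key order so identical content always hashes identically

    return hash('sha256', json_encode($row));
}

/**
 * Compare source rows with the hashes stored locally.
 *
 * @param array $sourceRows  source ID => row
 * @param array $localHashes source ID => previously stored hash
 */
function diffByContentHash(array $sourceRows, array $localHashes): array
{
    $changes = ['insert' => [], 'update' => [], 'delete' => []];

    foreach ($sourceRows as $id => $row) {
        if (!array_key_exists($id, $localHashes)) {
            $changes['insert'][] = $id;            // unknown locally
        } elseif ($localHashes[$id] !== rowHash($row)) {
            $changes['update'][] = $id;            // content changed
        }
    }

    foreach (array_keys($localHashes) as $id) {
        if (!array_key_exists($id, $sourceRows)) {
            $changes['delete'][] = $id;            // gone from source: flag __is_deleted
        }
    }

    return $changes;
}
```

Note that detecting deletions this way requires reading every row from the source on each run, which is the cost of not having a reliable updated_at column.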

id_diff

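How it works: Fetches a list of all IDs from the source and compares them with the local IDs. It then fetches full data only for the new IDs. IDs that exist locally but no longer appear in the source are flagged with __is_deleted (see Metadata Columns).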

Best for: APIs where fetching a list of IDs is cheap but fetching full records is expensive.
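The ID comparison itself is a pair of set differences. The sketch below assumes the ID lists are already in memory; diffIds is a hypothetical name, not a package function.

```php
<?php
// Hypothetical helper illustrating the id_diff strategy; not part of the package.

/**
 * Split IDs into those that need a full fetch and those that vanished from the source.
 */
function diffIds(array $sourceIds, array $localIds): array
{
    return [
        // Present in the source but not stored locally: fetch the full record.
        'to_fetch' => array_values(array_diff($sourceIds, $localIds)),
        // Stored locally but absent from the source: candidates for __is_deleted.
        'missing_in_source' => array_values(array_diff($localIds, $sourceIds)),
    ];
}
```

Because only IDs are compared, a record whose content changed while its ID stayed the same is not picked up in this sketch.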

Column Mapping

You can control which columns are imported and what they are named locally using mapColumns().

$sync->mapColumns([
    // Source Column => Local Column
    'id'         => 'remote_id',
    'first_name' => 'name',
    'email'      => 'email_address',

    // Set to null to exclude a column
    'password'   => null,
]);

If you do NOT call mapColumns(): The package defaults to a 1:1 mapping. All columns found in the source will be imported with their original names.
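To make the effect of a mapping concrete, here is a plain-PHP sketch of applying one to a single row. applyColumnMap is a hypothetical helper, not the package's code. It assumes that source columns missing from the map are dropped; check the package's actual behavior before relying on that.

```php
<?php
// Hypothetical helper illustrating column mapping; not part of the package.

/**
 * Rename and filter the columns of one source row.
 * A null map stands for "mapColumns() was never called" and yields a 1:1 copy.
 */
function applyColumnMap(array $row, ?array $map): array
{
    if ($map === null) {
        return $row; // default 1:1 mapping: original names, all columns
    }

    $mapped = [];
    foreach ($map as $sourceColumn => $localColumn) {
        if ($localColumn === null) {
            continue; // explicitly excluded column
        }
        if (array_key_exists($sourceColumn, $row)) {
            $mapped[$localColumn] = $row[$sourceColumn];
        }
    }

    return $mapped;
}
```

With the mapping shown above, a source row containing id, first_name, email, and password would be stored locally as remote_id, name, and email_address, with password left out.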

Defining the Destination Table

You can specify the name of the local table where data will be stored using $sync->toTable('my_local_users');

Automatic Table Naming (Recommended)

If you do not call toTable(), the package automatically generates a table name for you. This is the recommended approach because it handles Schema Versioning automatically.

Format: andach_{connector}_{source_name}_{dataset}_{version}

Example: andach_sql_legacy_erp_users_v1

Why use automatic naming? If the source schema changes (e.g., a new column is added), the package will automatically detect this drift.

It creates a new version (e.g., ..._v2).
It creates a new table for that version.
It runs the sync into the new table, leaving the old v1 table intact as a backup.
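The naming format and the version bump can be sketched as follows. The documentation does not say how drift is detected, so the column-list fingerprint used here is an assumption, and autoTableName, schemaFingerprint, and nextVersion are hypothetical names.

```php
<?php
// Hypothetical helpers illustrating automatic table naming; not part of the package.

/** Build a name in the documented format andach_{connector}_{source_name}_{dataset}_{version}. */
function autoTableName(string $connector, string $sourceName, string $dataset, int $version): string
{
    return sprintf('andach_%s_%s_%s_v%d', $connector, $sourceName, $dataset, $version);
}

/** Assumed drift check: hash the sorted list of column names. */
function schemaFingerprint(array $columns): string
{
    sort($columns); // column order should not count as a schema change

    return hash('sha256', implode(',', $columns));
}

/** Keep the version when the schema is unchanged, otherwise move to the next one. */
function nextVersion(int $currentVersion, array $knownColumns, array $sourceColumns): int
{
    return schemaFingerprint($knownColumns) === schemaFingerprint($sourceColumns)
        ? $currentVersion
        : $currentVersion + 1;
}
```

For example, if the users table gains a phone column, nextVersion moves from 1 to 2 and the sync would target andach_sql_legacy_erp_users_v2 while the v1 table stays in place.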

If you manually force a table name with toTable('my_table'), you are responsible for managing schema changes. If the source adds a column that isn't in your local table, the sync may fail.

Running the Sync

Finally, call run() to execute the synchronization. This method performs the extract and load process within a database transaction.

$source->sync('users')
    ->withStrategy('watermark', [
        'watermark_column' => 'updated_at',
        'mode'             => 'upsert',
        'primary_key'      => 'id',
    ])
    ->mapColumns([
        'id'    => 'remote_id',
        'name'  => 'full_name',
        'email' => 'email',
    ])
    ->run();

Metadata Columns

The package automatically adds several reserved metadata columns to your local tables to manage synchronization state and history. You should not use these names for your own mapped columns.

__id: The local primary key (BigInteger).
__source_id: The original ID from the source (nullable).
__content_hash: A hash of the row's content, used by the content_hash strategy.
__is_deleted: A boolean flag indicating if the row has been deleted in the source (used by content_hash and id_diff strategies).
__last_synced_at: The timestamp when the row was last synced/updated in the local table.
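Since these names are reserved, it can be useful to check a mapping for collisions before passing it to mapColumns(). The helper below is a hypothetical sketch, not something the package is documented to provide. The list of reserved names is copied from the section above.

```php
<?php
// Hypothetical pre-flight check for reserved metadata column names; not part of the package.

const RESERVED_COLUMNS = [
    '__id',
    '__source_id',
    '__content_hash',
    '__is_deleted',
    '__last_synced_at',
];

/**
 * Return the local column names in a mapping that collide with reserved names.
 *
 * @param array $map source column => local column (null means excluded)
 */
function reservedCollisions(array $map): array
{
    // Excluded columns (null targets) can never collide.
    $targets = array_filter($map, fn ($local): bool => $local !== null);

    return array_values(array_intersect($targets, RESERVED_COLUMNS));
}
```

An empty result means the mapping is safe with respect to the reserved names listed here.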
