Datasource sync types and statuses


Learn about the various Datasource sync types and statuses available in Onna


This is an intermediate/advanced topic.

# Datasource synchronization types

Onna Datasources have various attributes and properties, including owners, authentication credentials, and a sync type that controls how data is synchronized.

All cloud-based Datasources, such as Dropbox or Slack, have three sync types available:

  1. auto: continuous sync
    Auto-sync means that Onna performs a full sync first and then keeps the Datasource and Onna in mirrored sync. Any files deleted from the Datasource are deleted in Onna as well.
  2. one: one-time sync
    One-time is a one-way sync that collects information only once.
  3. arch: archive, like auto but without deletions
    Archive means that Onna performs a full sync first and then continuously adds any new files generated at the Datasource. This sync type does not delete files that are deleted from the Datasource.

Folder Datasources only support one-time syncs.
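The three sync types differ along two behaviors: whether the sync keeps running after the initial full sync, and whether deletions at the Datasource propagate to Onna. A minimal sketch modeling this distinction (the class and constant names are illustrative, not part of Onna's API):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncType:
    """Models the two behaviors that distinguish the sync types."""
    name: str
    continuous: bool         # keeps syncing after the first full sync?
    mirrors_deletions: bool  # are deletions at the source deleted in Onna too?


# auto: full sync first, then kept in mirrored sync (deletions propagate)
AUTO = SyncType("auto", continuous=True, mirrors_deletions=True)
# one: one-way sync that collects information only once
ONE = SyncType("one", continuous=False, mirrors_deletions=False)
# arch: like auto, but files deleted at the source are kept in Onna
ARCH = SyncType("arch", continuous=True, mirrors_deletions=False)
```

Note that `arch` is identical to `auto` except for the deletion behavior, which is exactly how the document describes it.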

# Datasource synchronization statuses

You can track the state of a Datasource's synchronization with the following status values:

  1. pending: sent to the processing pipeline. The Datasource is waiting to be picked up by the spyder manager.
  2. created: initial state after creation; not configured yet
  3. syncing: picked up by a spyder and in progress
  4. synced: the spyder finished
  5. error: the spyder errored and will not be retried
  6. failed: the spyder failed and will be retried
  7. paused: synchronization is paused
  8. resuming: synchronization is picking up from where it was paused
  9. invited: the Datasource has been shared with another user
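The key operational distinction in the list above is between `error` (terminal, not retried) and `failed` (retried). A small sketch of that rule (the function and mapping names are illustrative):

```python
# Whether the spyder manager retries a sync, by status (per the list above):
# "failed" is retried; "error" is terminal and will not be retried.
RETRIED_STATUSES = {"error": False, "failed": True}


def will_retry(status: str) -> bool:
    """Return True if a sync in this status will be retried.

    Statuses other than error/failed are not terminal, so retry logic
    does not apply to them; they report False here for simplicity.
    """
    return RETRIED_STATUSES.get(status, False)
```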

The Datasource status is dependent on the spyder and the spyder manager.

For example, a Datasource with an auto sync type is scheduled to run periodically, every 20 to 30 minutes. This job is controlled by the spyder manager, which schedules syncs and dynamically allocates the necessary resources to the spyder.
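Picking a delay inside a 20-to-30-minute window spreads syncs out so that Datasources don't all run at once. A sketch of how such a schedule might be computed (this is an illustration, not Onna's actual scheduler):

```python
import random


def next_sync_delay_seconds(rng=None):
    """Pick the delay until the next auto sync: 20 to 30 minutes.

    Randomizing within the window jitters the schedule so periodic
    syncs are spread out rather than clustered. Illustrative sketch
    only; the real spyder manager's policy may differ.
    """
    rng = rng or random.Random()
    return rng.randint(20 * 60, 30 * 60)
```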

# Endpoints for Datasource-specific tasks

You can call the endpoints listed below by appending them to the Datasource URL.

@frontsearch: performs a search operation using the Datasource as the context

@sendToCompute: sends a resource to the processing pipeline.

You can GET @sendToCompute to force a resource in a Datasource to enter the processing pipeline.
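Since every task endpoint is called by appending it to the Datasource URL, a small helper can build the target URL. A sketch (the example hostname and path are hypothetical):

```python
def endpoint_url(datasource_url: str, endpoint: str) -> str:
    """Append a task endpoint (e.g. "@sendToCompute") to a Datasource URL."""
    return datasource_url.rstrip("/") + "/" + endpoint


# A GET to this URL would force the resource into the processing pipeline.
url = endpoint_url("https://example.onna.com/api/my-datasource", "@sendToCompute")
```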


@sendToSpyder: sends the Datasource to a spyder by publishing a message to a RabbitMQ queue, which schedules a new spyder job.

This HTTP GET forces the Datasource to collect data.


@refreshServiceCredentials: tries to refresh credentials that are stored in your "wallet" and associated with this Datasource. This endpoint is called internally by the spyder when it detects that credentials are out of date.

This POST will try to refresh the credentials of a Datasource. Your auth token and specific headers are part of the payload:

  data=json.dumps({"X-Auth-Secret": ..., "Authorization": "Bearer ..."})
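The snippet above can be wrapped in a helper that assembles the JSON body. A sketch with placeholder credential values (the function name is illustrative; supply your real secret and token):

```python
import json


def refresh_payload(auth_secret: str, bearer_token: str) -> str:
    """Build the JSON body for @refreshServiceCredentials.

    The X-Auth-Secret header value and the bearer auth token travel in
    the payload itself. The arguments here are placeholders, not real
    credentials.
    """
    return json.dumps({
        "X-Auth-Secret": auth_secret,
        "Authorization": f"Bearer {bearer_token}",
    })
```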

@getAllChildren: gets all children of a Datasource.


Because Datasources can be large, the endpoint also lets you control the page size and page through the results with a cursor:

  data=json.dumps({"page_size": 7})
  data=json.dumps({"page_size": 7, "scroll": resp["cursor"]})
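The two request bodies above form a loop: the first page is requested with only a `page_size`, and each subsequent page passes back the cursor from the previous response. A sketch of that loop, using a stand-in `fetch` callable instead of a real authenticated request, and assuming the response holds `items` and a `cursor` that is absent on the last page (the response shape is an assumption):

```python
import json


def iter_children(fetch, page_size=7):
    """Page through @getAllChildren results using the scroll cursor.

    `fetch(body)` stands in for the authenticated call to the endpoint:
    it takes the JSON request body and returns the decoded response.
    """
    body = json.dumps({"page_size": page_size})
    while True:
        resp = fetch(body)
        yield from resp.get("items", [])
        cursor = resp.get("cursor")
        if not cursor:  # no cursor means this was the last page
            break
        body = json.dumps({"page_size": page_size, "scroll": cursor})
```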
Last Updated: 11/25/2020, 4:48:46 PM