
Note

This is an intermediate/advanced topic.

# Creating A Datasource In Onna With The API

# Requirements

  • Onna auth token and existing workspace
  • Cloud source auth

# Authenticate With Onna

Prior to creating the source, make sure that you're authenticated with your Onna instance.
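The examples below assume a `headers` dictionary carrying your Onna bearer token, along with a few instance-specific values used in the request URLs. A minimal sketch follows; every value shown is a placeholder that you must replace with your own:

```python
# Placeholder values -- replace with your own instance URL, container,
# account, workspace, and auth token.
base_url = "https://enterprise.onna.com"
container = "my-container"
account = "my-account"
workspace_uid = "my-workspace-uid"
token = "YOUR_ONNA_AUTH_TOKEN"

# Every request in this article sends this bearer token.
headers = {
    "Authorization": f"Bearer {token}",
    "Content-Type": "application/json",
}
```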

# Authenticate With Cloud Source

Authentication setup for enterprise sources differs from that of non-enterprise sources.

For the purposes of this article, it is assumed that you have already created your "authorized connection". Please review our support article for more information on authorized connections.

Every source in Onna follows a similar setup process, but the configuration parameters vary with the underlying software: a Dropbox source configuration, for example, may take different parameters than a Slack source configuration.

Info

The steps for a Slack configuration will be different if you choose to create a User-based “Custodian” collection or an archive.

This document is based on the custodian collection.

You will be selecting channels the user is a member of in selected workspaces and the user’s direct and multiparty messages.

Before creating a Datasource, you will need to obtain the appropriate wallet credentials. For this example, that is the OAuth token you receive when setting up the Slack Enterprise collection.

# Types Of Wallet Credentials

  • User wallets
  • Account wallets

# User Wallets

Each user gets a wallet folder under their personal folder /user@onna.com/wallet.

# Account Wallets

There is also a wallet at the account level for enterprise sources, /wallet, which only admin users have access to.

Only wallet_credentials are allowed to be added to a wallet, and you must be authenticated and have the proper permissions to add a credential to a wallet. Once the credential has been added to a wallet, the system will be able to use it to collect data from cloud sources without user intervention.

# Create A SlackEnterprise Datasource

The first step to create a Datasource is a POST command to the Workspace you would like the Datasource to be created in.

The body of the POST command:

```python
payload = {
  "@type": "SlackEDatasource",
  "sync_status": "created",
  "username": "demo@onna.com",
  "wallet_credentials": "abcdef123434c4470b51463bbc39b3be2",
  "data_types": [
    "resources"
  ],
  "title": "Slack Enterprise - Custodian Collection",
  "id": "slack-enterprise-custodian-collection"
}
```
```python
import json

import requests

resp = requests.post(
    f"{base_url}/api/{container}/{account}/workspaces/{workspace_uid}",
    headers=headers,  # This needs to show Onna auth bearer
    data=json.dumps(payload),
)
datasource_id = resp.json()["@name"]
```

The @type is SlackEDatasource because you are creating a Slack Enterprise Source.

The source starts out with a sync_status of created, which prevents it from showing up for users until you have fully configured it.

The username in the payload above is the corresponding username for the wallet_credentials.

The wallet_credentials value is the UUID of the credential that is available for your use.

The data_types section will be set to "resources" in an array because you want to obtain resources for this source, as opposed to identities.

The title is the title that users will see in the application.

The id determines the source's URL and needs to be URL compliant.
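If you derive the id from the title programmatically, a small slug helper keeps it URL compliant. This is a hypothetical convenience function, not part of the Onna API:

```python
import re

def slugify(title):
    """Lowercase the title and collapse runs of non-alphanumerics into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

slugify("Slack Enterprise - Custodian Collection")
# -> "slack-enterprise-custodian-collection"
```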

After the source is created with the above payload, you can then use the source’s endpoint to query for information that you may need to gather in order to configure the rest of the sync parameters.

# Retrieve Slack Workspaces Or Teams

The first data you will need to retrieve for Slack Enterprise is the list of Workspaces or Teams.

This endpoint is applicable to Slack only.

Make a POST to the Datasource's URL (which was saved to the datasource_id variable) + /@getTeams:

```python
resp = requests.post(
    f"{base_url}/api/{container}/{account}/workspaces/{workspace_uid}/{datasource_id}/@getTeams",
    headers=headers,  # This needs to show Onna auth bearer
)
```

This will return an array of objects in the form of:

```json
[
  {
    "id": "T12345678",
    "name": "Demo Workspace",
    "description": "Workspace 1 for Demos"
  },
  {
    "id": "T12345679",
    "name": "2nd Demo Workspace",
    "description": "Workspace 2 for Demos"
  }
]
```

For Slack Enterprise sources, you will also want to obtain the user accounts.

You call, with a GET, the Datasource URL + /@getEnterpriseUsers

```python
resp = requests.get(
    f"{base_url}/api/{container}/{account}/workspaces/{workspace_uid}/{datasource_id}/@getEnterpriseUsers",
    headers=headers,  # This needs to show Onna auth bearer
)
```

This will return a list of approximately 1,000 users, plus an offset value if the list is greater than 1,000 users.

```json
{
  "users": [
    {
      "id": "W014VSWN9GR",
      "name": "demo",
      "real_name": "demo",
      "teams": [
        "T12345678"
      ],
      "email": "demo@onna.com",
      "display_name": "Demo",
      "is_bot": false,
      "deleted": false
    },
    ...
    {
      "id": "W8WCRJAH0",
      "name": "demo.2",
      "real_name": "Demo 2",
      "teams": [
        "T12345679"
      ],
      "email": "demmo2@onna.com",
      "display_name": "demo.2",
      "is_bot": false,
      "deleted": false
    }
  ],
  "offset": "W8XEFQ7RV"
}
```

The offset value can then be used to query for more users:

Datasource URL + /@getEnterpriseUsers?offset=W8XEFQ7RV

```python
resp = requests.get(
    f"{base_url}/api/{container}/{account}/workspaces/{workspace_uid}/{datasource_id}/@getEnterpriseUsers?offset=W8XEFQ7RV",
    headers=headers,  # This needs to show Onna auth bearer
)
```

This will return the next batch of users, in the same format as above.

Keep retrieving users until you have found your desired user(s) or until the offset is null. In this example, you are interested in the user demo with id W014VSWN9GR.
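That pagination can be drained with a small loop. The helper below is a sketch: it takes a fetch_page callable that wraps the GET shown above (passing the offset as a query parameter when present) and returns the decoded JSON body.

```python
def collect_users(fetch_page):
    """Page through @getEnterpriseUsers until no offset is returned.

    fetch_page(offset) performs the GET (e.g. with requests, as above)
    and returns the decoded JSON body: {"users": [...], "offset": ...}.
    """
    users, offset = [], None
    while True:
        body = fetch_page(offset)
        users.extend(body.get("users", []))
        offset = body.get("offset")
        if not offset:  # a null or absent offset means this was the last page
            return users
```

In practice you would stop early once the custodian you are looking for appears, rather than collecting the whole directory.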

To sync all of the channels for the user, you can PATCH your source with the following information.

```python
payload = {
  "isEnterprise": False,
  "sync_status": "pending",
  "type_sync": "arch",
  "from_date": "2020-07-01T07:00:00.000Z",
  "include_channels": True,
  "include_groups": True,
  "include_org_channels": True,
  "include_org_groups": True,
  "use_custodian_collection": True,
  "sync_filters": {
    "user-W014VSWN9GR": {
      "all_selected": True,
      "selected": [],
      "filter_type": "user",
      "filter_name": "Demo User"
    },
    "dms": {
      "filter_type": "dms",
      "filter_name": "add-source-dialog.slack.threads-to-sync.direct-messages",
      "selected": [],
      "all_selected": True
    },
    "mpim": {
      "filter_type": "mpim",
      "filter_name": "add-source-dialog.slack.threads-to-sync.multiparty-messages",
      "selected": [],
      "all_selected": True
    },
    "workspace": {
      "all_selected": True,
      "excluded": [
        {
          "id": "T12345678"
        },
        {
          "id": "T12345679"
        }
      ],
      "filter_type": "workspace",
      "filter_name": "workspace"
    },
    "accounts": {
      "all_selected": False,
      "selected": [
        {
          "id": "W014VSWN9GR"
        }
      ],
      "excluded": []
    }
  }
}
```
```python
resp = requests.patch(
    f"{base_url}/api/{container}/{account}/workspaces/{workspace_uid}/{datasource_id}",
    headers=headers,  # This needs to show Onna auth bearer
    data=json.dumps(payload),
)
```

You then call @sendToSpyder?force=true on your Datasource to schedule this Datasource in the processing queue.

```python
resp = requests.get(
    f"{base_url}/api/{container}/{account}/workspaces/{workspace_uid}/{datasource_id}/@sendToSpyder?force=true",
    headers=headers,  # This needs to show Onna auth bearer
)
```

# Create A Dropbox Business Datasource

The first step to create a Datasource is a POST command to the Workspace you would like the Datasource to be created in.

The body of the POST command:

```python
payload = {
  "@type": "DropboxEDatasource",
  "sync_status": "created",
  "username": "gcdemobcn@gmail.com",
  "type_sync": "one",
  "from_date": "2020-08-01T07:00:00.000Z",
  "to_date": "2020-08-27T06:59:59.999Z",
  "wallet_credentials": "d9c638034b3a4f4bac80487a63c7c0a8",
  "data_types": [
    "resources"
  ],
  "title": "Dropbox Business",
  "id": "dropbox-business"
}
```
```python
resp = requests.post(
    f"{base_url}/api/{container}/{account}/workspaces/{workspace_uid}",
    headers=headers,  # This needs to show Onna auth bearer
    data=json.dumps(payload),
)
datasource_id = resp.json()["@name"]
```

In this case the @type is DropboxEDatasource because you're creating a Dropbox Business Source.

You start the source out with a sync_status of created.

This will prevent the source from showing up for users until you have fully configured it.

username is the corresponding username for the wallet_credentials.

The wallet_credentials value is the UUID of the credential that is available for your use.

The data_types section will be set to resources in an array because you want to obtain resources for this source, as opposed to identities.

The title is the title that users will see in the application.

The id determines the source's URL and needs to be URL compliant.

The from_date and to_date are optional fields setting the date boundaries for the synced content.
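The payloads above show that from_date and to_date use millisecond-precision ISO 8601 UTC timestamps. A small formatter, sketched below as a hypothetical helper rather than an Onna API function, produces that shape from a Python datetime:

```python
from datetime import datetime, timezone

def to_onna_ts(dt):
    """Render a datetime as an ISO 8601 UTC string with millisecond precision."""
    utc = dt.astimezone(timezone.utc)
    # strftime's %f gives microseconds; trim the last three digits for milliseconds.
    return utc.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"

to_onna_ts(datetime(2020, 8, 1, 7, 0, tzinfo=timezone.utc))
# -> "2020-08-01T07:00:00.000Z"
```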

After the source is created with those properties, you can then use the source’s endpoint to query for information that you may need to gather in order to configure the rest of the sync parameters.

The first data you will retrieve for Dropbox Business are the user accounts you wish to sync. You can either verify a list of user accounts by their email addresses or return the list for the entire organization.

To verify the emails:

```python
payload = {"users": ["gcdemobcn@gmail.com", "user2@onna.com", "juciara@onna.com", "user4@onna.com", "user3@onna.com", "user5@onna.com"]}

resp = requests.get(
    f"{base_url}/api/{container}/{account}/workspaces/{workspace_uid}/{datasource_id}/@getDropboxExistUsers",
    headers=headers,
    data=json.dumps(payload),
)
```

The response for emails that do not exist is null. For users that do exist, their user IDs and groups are returned:

```json
{
  "gcdemobcn@gmail.com": {
    "display_name": "Allison Keane",
    "account_id": "dbid:AAD8HDsVw440pp2awCYeRROGEyn5Xq17Y8Y",
    "team_member_id": "dbmid:AADbJxVwEFrMuyUrRycnQZj_qLWZonuJ9BI",
    "email": "gcdemobcn@gmail.com",
    "joined": "2017-08-21T13:13:59Z",
    "groups": [
      "g:d0f1dc136deb23000000000000000002",
      "g:d0f1dc136deb230000000000000003af",
      "g:d0f1dc136deb230000000000000914b6"
    ],
    "status": "active"
  },
  "user2@onna.com": null,
  "juciara@onna.com": {
    "display_name": "Juciara Nepomuceno De Souza",
    "account_id": "dbid:AAAID-u07I1UdsM39kJCfzJUY1f-3a6lblw",
    "team_member_id": "dbmid:AACytrTd7M1PSZFGqPCk2shRI8nJP8a8zco",
    "email": "juciara@onna.com",
    "joined": "2019-11-08T11:47:10Z",
    "groups": [
      "g:d0f1dc136deb23000000000000000002",
      "g:d0f1dc136deb230000000000000b37e1",
      "g:d0f1dc136deb230000000000000b3881"
    ],
    "status": "active"
  },
  "user4@onna.com": null,
  "user3@onna.com": null,
  "user5@onna.com": null
}
```

Retrieving the entire list of user accounts requires querying our asynchronous tasks API, which is beyond the scope of this article.

You can also choose to sync all folders for the users.

The final option is to choose whether you would like to include Paper files in your sync. This is set with the ignore_paper_docs field: setting it to true will exclude Paper docs from your sync, while setting it to false will include them.

The sync_filters are based on the user IDs. If you want to collect team_folder drives for the selected users, include an empty dictionary as the value for the team_folders filter.

This will sync the team folders for the selected user(s).

```python
payload = {
  "isEnterprise": False,
  "@type": "DropboxEDatasource",
  "sync_status": "pending",
  "username": "gcdemobcn@gmail.com",
  "type_sync": "one",
  "from_date": "2020-08-01T07:00:00.000Z",
  "to_date": "2020-08-27T06:59:59.999Z",
  "wallet_credentials": "d9c638034b3a4f4bac80487a63c7c0a8",
  "data_types": [
    "resources"
  ],
  "title": "Dropbox Business",
  "sync_filters": {
    "team_folders": {},
    "dbid:AAD8HDsVw440pp2awCYeRROGEyn5Xq17Y8Y": {
      "all_selected": True
    },
    "dbid:AAAID-u07I1UdsM39kJCfzJUY1f-3a6lblw": {
      "all_selected": True
    }
  },
  "ignore_paper_docs": False,
  "search_user": None,
  "search_query": None
}
```

With the above payload, you configure the Datasource to sync team_folders; the sync_status is also updated to pending so that the source appears in the web UI:

```python
resp = requests.patch(
    f"{base_url}/api/{container}/{account}/workspaces/{workspace_uid}/{datasource_id}",
    headers=headers,  # This needs to show Onna auth bearer
    data=json.dumps(payload),
)
```

After the Datasource is patched with the above payload, you can schedule it in the processing queue.

Calling the endpoint @sendToSpyder?force=true on your Datasource will schedule it in the processing queue.

```python
resp = requests.get(
    f"{base_url}/api/{container}/{account}/workspaces/{workspace_uid}/{datasource_id}/@sendToSpyder?force=true",
    headers=headers,  # This needs to show Onna auth bearer
)
```
Last Updated: 9/1/2020, 12:42:29 PM