Create a static Datasource

Learn how to create a static Datasource using the API


💡 6 min read

Learn how to create a Datasource in a workspace to organize your data, add a source, upload plain-text files, and retrieve and download resources.

You'll reuse the authentication headers when creating workspaces and Datasources.

You can always check the results of the API calls on your Onna account.

# Requirements

Make sure that you're authenticated with your Onna instance.

You'll also use Python's standard-library `cgi` module to parse a returned response header.
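The snippets in this guide assume a few variables carried over from the authentication step: `base_url`, `container`, `account`, a `jwt_token`, and a `headers` dict. As a minimal sketch (the URL, container, and account values here are placeholders — substitute your own), the setup might look like:

```python
# Sketch of the session variables the snippets below assume.
# All values here are placeholders -- use the ones from your own
# authentication step.
base_url = "https://enterprise.onna.com"  # your Onna instance
container = "container"                   # your container id
account = "account"                       # your account id
jwt_token = "YOUR_JWT_TOKEN"              # returned during authentication

# Standard JSON headers, reused by most calls in this guide
headers = {
    "Accept": "application/json",
    "Content-Type": "application/json",
    "Authorization": f"Bearer {jwt_token}",
}
```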

 





```python
import cgi
import io
import json
import os
import requests
```

# Interrogate the API

Let's find out which Datasource types can be added, and which ones are enabled. You can reuse the headers from the authentication script.

```python
resp = requests.get(
    f"{base_url}/api/{container}/{account}/@enabledDSTypes", headers=headers
)
sources = resp.json()
for source in sources:
    print(f"{source}")
```
```
ConfluenceDatasource
HangoutsDatasource
ZendeskDatasource
OnedriveDatasource
GithubDatasource
ImapDatasource
SalesforceDatasource
JiraDatasource
BridgeDatasource
DropboxDatasource
TwitterDatasource
SlackDatasource
S3Datasource
GDriveDatasource
StaticDatasource
BoxDatasource
QuipDatasource
MsTeamsDatasource
CrawlerDatasource
MGraphMailDatasource
WorkplaceDatasource
GMailDatasource
SharepointDatasource
```

You can check whether there's anything in the account yet by using the @data API:

```python
types = ["Workspace", "Rule", "Export"]
for item in types:
    resp = requests.get(
        f"{base_url}/api/{container}/{account}/@data?types={item}",
        headers=headers,
    )
    print(f"{item}: {resp.json()}")
```

If this is a new account, you may not have anything in there yet:

```
Workspace: {'updates': [], 'total': 0, 'deleted': [], 'cursor': None}
Rule: {'updates': [], 'total': 0, 'deleted': [], 'cursor': None}
Export: {'updates': [], 'total': 0, 'deleted': [], 'cursor': None}
```

# List Workspaces

```python
resp = requests.get(
    f"{base_url}/api/{container}/{account}/@workspaces", headers=headers
)
print(f"Workspaces: {resp.json()}")
```

If you haven't created any workspaces and none have been shared with you, the output will look like this:

```
Workspaces: []
```

Even if you do have some, go ahead and create a new one.

# Create a Workspace

```python
# create a Workspace
payload = {"@type": "Workspace", "title": "Legal"}
resp = requests.post(
    f"{base_url}/api/{container}/{account}/workspaces",
    headers=headers,
    data=json.dumps(payload),
)
workspace_uid = resp.json()["@uid"]
workspace_id = resp.json()["@id"]
print(f"workspace: {workspace_id}\n\n")
```

The UID is the unique identifier, generated internally.

The ID is also unique, but provides the URI of the object.
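As an illustration with hypothetical values (your API returns real ones), the two identifiers relate like this: the `@id` is a URI whose last path segment is the `@uid`, which is consistent with how the snippets below address the workspace by its UID.

```python
# Hypothetical example values -- the API returns real ones.
workspace_uid = "8c1f0a2b34de4f56a7b8c9d0e1f23a45"
workspace_id = (
    "https://enterprise.onna.com/api/container/account/"
    "workspaces/8c1f0a2b34de4f56a7b8c9d0e1f23a45"
)

# The @id is a URI; in this sketch its last path segment is the @uid
assert workspace_id.rsplit("/", 1)[-1] == workspace_uid
```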

# List Datasources

First, you'll list any existing Datasources.

You're going to save the Static Datasource that you create so it can be used in subsequent API calls.

```python
# List Datasources
canonical_ds_id = None
resp = requests.get(
    f"{base_url}/api/{container}/{account}/@data?types=_datasources_",
    headers=headers,
)
datasources = resp.json()
print("Existing datasources:")
for source in datasources:
    print(f"{source}")
```

Next, you'll create a static Datasource in the workspace that was just created and save its identifier into `canonical_ds_id`.

Tip

The API handles providing unique identifiers for all items; you can give two things the same name, and the system will provide unique UIDs for them.

# Create a static Datasource

You can use the variable canonical_ds_id to store the UID of the Datasource you'll create in this step. Notice that the Datasource is created in the workspace that you created above.

This is how you'll refer to the Datasource you've created.

```python
# add static datasource to workspace
payload = {
    "@type": "StaticDatasource",
    "title": "My Static Datasource",
    "type_sync": "auto",
}
resp = requests.post(
    f"{base_url}/api/{container}/{account}/workspaces/{workspace_uid}",
    headers=headers,
    data=json.dumps(payload),
)
canonical_ds_id = resp.json()["@name"]  # set it here
print(f"Static Datasource: {resp.json()['title']}")
```

# Upload a file

Uploading a file via the API requires two calls: a POST to create the file, and a PATCH to provide its content.

Optionally, you can force Onna to process the file via the @sendToCompute endpoint. This isn't usually necessary, as Onna automatically detects when a file has finished uploading and sends it to the processing engine.

Please download the files Mo_Chutu_of_Lismore.txt and A_Quiet_Place.txt and put them in the same directory you run your code from.

# Add content to the static Datasource
with open("Mo_Chutu_of_Lismore.txt", "rb") as read_file:
    fd = read_file.read()

# First POST the resource to our Datasource to create it, then PATCH it with content
payload = {
    "@type": "Resource",
    "@behaviors": ["onna.canonical.behaviors.metadata.IMetadata"],
    "title": "Mo_Chutu_of_Lismore.txt",
}
resp = requests.post(
    f"{base_url}/api/{container}/{account}/workspaces/{workspace_uid}/{canonical_ds_id}",
    headers=headers,
    data=json.dumps(payload),
)
upload_resource_id = resp.json()["@id"]  # full URL
upload_resource_name = resp.json()["@name"]
# PATCH
hdrs = {
    "Accept": "text/plain",
    "Authorization": "Bearer {}".format(jwt_token),
    "Content-Type": "application/octet-stream",
    "x-upload-size": str(os.path.getsize("Mo_Chutu_of_Lismore.txt")),
    "x-upload-filename": "Mo_Chutu_of_Lismore.txt",
}
resp = requests.patch(
    f"{base_url}/api/{container}/{account}/{workspace_uid}/{canonical_ds_id}/{upload_resource_name}/@upload/file",
    headers=hdrs,
    data=fd,
)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30

Repeat the upload to the Datasource with the file A_Quiet_Place.txt.
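Since the two-call upload pattern repeats for each file, you may want to wrap it in a helper. This is a minimal sketch mirroring the POST-then-PATCH calls above, assuming the same `base_url`, `container`, `account`, `headers`, `jwt_token`, `workspace_uid`, and `canonical_ds_id` variables; the function names are illustrative, not part of the Onna API.

```python
import json
import os


def build_upload_headers(path, jwt_token):
    """Headers for the content PATCH, mirroring the ones used above."""
    return {
        "Accept": "text/plain",
        "Authorization": f"Bearer {jwt_token}",
        "Content-Type": "application/octet-stream",
        "x-upload-size": str(os.path.getsize(path)),
        "x-upload-filename": os.path.basename(path),
    }


def upload_file(path, base_url, container, account,
                workspace_uid, datasource_id, headers, jwt_token):
    """Upload one local file to a static Datasource: POST, then PATCH."""
    import requests  # deferred so the helpers can be defined without it

    filename = os.path.basename(path)
    # 1. POST creates the Resource object in the Datasource
    payload = {
        "@type": "Resource",
        "@behaviors": ["onna.canonical.behaviors.metadata.IMetadata"],
        "title": filename,
    }
    resp = requests.post(
        f"{base_url}/api/{container}/{account}/workspaces/"
        f"{workspace_uid}/{datasource_id}",
        headers=headers,
        data=json.dumps(payload),
    )
    resource_name = resp.json()["@name"]

    # 2. PATCH streams the file content to the new resource
    with open(path, "rb") as f:
        requests.patch(
            f"{base_url}/api/{container}/{account}/workspaces/"
            f"{workspace_uid}/{datasource_id}/{resource_name}/@upload/file",
            headers=build_upload_headers(path, jwt_token),
            data=f,
        )
    return resource_name
```

With this in place, the second file becomes a one-liner: `upload_file("A_Quiet_Place.txt", base_url, container, account, workspace_uid, canonical_ds_id, headers, jwt_token)`.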

You can list the contents of the Datasource to see what resources it contains:

```python
resp = requests.get(
    f"{base_url}/api/{container}/{account}/workspaces/{workspace_uid}/{canonical_ds_id}/@getAllChildren",
    headers=headers,
)
resources = resp.json()["items"]
print(f"Resources in datasource: {resources}")
```

You can also see the size of the Datasource:

```python
resp = requests.get(
    f"{base_url}/api/{container}/{account}/workspaces/{workspace_uid}/{canonical_ds_id}/@size",
    headers=headers,
)
print(f"Datasource size: {resp.json()['bytes_processed']}")
```

By changing the context of the endpoint, you can also see the size of the workspace:

```python
resp = requests.get(
    f"{base_url}/api/{container}/{account}/workspaces/{workspace_uid}/@size",
    headers=headers,
)
print(f"Workspace size: {resp.json()['bytes_processed']}")
```

If the sizes seem too small, you can force Onna to process the file:

```python
resp = requests.post(
    f"{base_url}/api/{container}/{account}/workspaces/{workspace_uid}/{canonical_ds_id}/{upload_resource_name}/@sendToCompute",
    headers=headers,
)
print(resp.status_code)
```

# Retrieve resources from a Datasource

Now that you have a couple of files in your Datasource, you can use API calls to find the unique ID (UID) of each resource in a Datasource, issue a GET on an individual resource, and download resources.

You can reuse the headers set above.

You can get all the resources in a Datasource with the @getAllChildren endpoint.

```python
# Retrieving resources from a Datasource
resp = requests.get(
    f"{base_url}/api/{container}/{account}/workspaces/{workspace_uid}/{canonical_ds_id}/@getAllChildren",
    headers=headers,
)
resources = resp.json()["items"]
print(f"Resources in datasource: {resources}")
```

The full URLs of the resources are in the `items` array.

You'll save that array into the variable resources and then use the first one, saving it in the variable resource_id.

```python
# Save the first one for our next query
resource_id = resources[0]
print(f"Resource: {resource_id}")
```

Next, you'll GET the resource you saved in resource_id and then download it to the /tmp directory via the @download endpoint.


```python
# Get an individual resource
resp = requests.get(
    f"{base_url}/api/{container}/{account}/workspaces/{workspace_uid}/{canonical_ds_id}/{resource_id}",
    headers=headers,
)
print(f"Resource: {resp.json()}")

# Now, download this resource
resp = requests.get(
    f"{base_url}/api/{container}/{account}/workspaces/{workspace_uid}/{canonical_ds_id}/{resource_id}/@download/file",
    headers=headers,
    stream=True,
)
# Get the filename from the Content-Disposition header
fname = cgi.parse_header(resp.headers["content-disposition"])[1]["filename"]
# Write the file to /tmp/
fname = f"/tmp/{fname}"
with open(fname, "wb") as f:
    for chunk in resp.iter_content(1024):
        f.write(chunk)
```

You can generate a download URL for a particular resource that doesn't require an authorization header via the @generateUrl endpoint.

```python
resp = requests.get(
    f"{base_url}/api/{container}/{account}/workspaces/{workspace_uid}/{canonical_ds_id}/{resource_id}/@generateUrl",
    headers=headers,
)
print(resp.json())
```

Example response

"url": "https://enterprise.onna.com/{container}/{account}/workspaces/{workspace_uid}/{canonical_ds_id}/3cdd3e4b7a00433ab2f1a987f45031f8/7f08203db81c4ad991ec27e7b93ce0e9/@onnaDownload/File.pdf?token=841f28098d3442e1e137bbbad55183b5"}
1

The `url` key in the returned JSON provides a direct link to the resource.

Notice that there is a one-time use token in the URL, which allows a single access to this resource.
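If you need the token itself (for example, to log it), you can parse it out of the generated URL with Python's standard library. The URL below is a hypothetical example in the same shape as the response above:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical URL in the shape returned by @generateUrl
url = (
    "https://enterprise.onna.com/container/account/workspaces/"
    "ws-uid/ds-id/3cdd3e4b7a00433ab2f1a987f45031f8/"
    "7f08203db81c4ad991ec27e7b93ce0e9/@onnaDownload/File.pdf"
    "?token=841f28098d3442e1e137bbbad55183b5"
)

# Extract the one-time token from the query string
token = parse_qs(urlparse(url).query)["token"][0]
print(token)  # -> 841f28098d3442e1e137bbbad55183b5
```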

# Recap

You set up a workspace, added a static Datasource and uploaded two files to it.

You also downloaded one of these files after getting the contents of your Datasource.

You generated a download URL for a resource that does not require an authentication header.

In the next chapter, you'll set up a search, view the results, and set up a trigger that will send you an email whenever a specified term is encountered.

Last Updated: 11/23/2020, 11:55:13 AM