Google BigQuery
Introduction
This guide takes you through setting up Google BigQuery for use with Cyclr, as well as installing a Google BigQuery Connector.
You will need a Google Cloud Platform account.
Setup & Authentication
Overview
The Google BigQuery Connector supports 2 Authentication Types, which are explained in this guide:
OAuth 2.0 Authorization Code - requires user interaction to install the Connector.
Service Account - impersonates a domain account and requires less user interaction.
Remote Setup in Google Cloud Console - performed by Cyclr Partner
Depending on the Authentication Type you choose, the setup is slightly different after following the initial “common” setup:
Common Setup - Enabling the Google BigQuery API
Both Authentication Types require the following setup to be performed.
To access the Google BigQuery API endpoints, you need to enable the Google BigQuery API within the project in your workspace. Google's documentation on how to do this can be found here.
Then skip to the appropriate section below:
OAuth 2.0 Authorization Code Setup
To authenticate the Connector using the “OAuth 2.0 Authorization Code” Authentication Type, you need to create a web application inside your Google Cloud Platform account to get your OAuth 2.0 credentials:
From the Google Cloud Platform Credentials page, select CREATE CREDENTIALS > OAuth client ID. Create a project with the following settings:
Setting | Value |
---|---|
Application type | Web application |
Authorised redirect URIs | A URL containing your Cyclr Service Domain. This can be found under Settings > General Settings > Service Domain in your Cyclr console. For example, |
Make note of the Client ID and Client secret.
For more information, see Google's guide on how to create client credentials.
Service Account Setup
To authenticate the Connector using the “Service Account” Authentication Type, you need to do the following:
Enable the Google BigQuery API for a project within your workspace
Create a service account
Enable domain-wide delegation for the service account
Creating a Service Account
A service account needs to be created to allow admin-level access to Google BigQuery within your workspace. Google's documentation on how to do this can be found here. When creating the service account, set up the fields below as follows:
Grant this service account access to project: Set the service account role to
Owner
.Grant users access to this service account: Set the Service account admins role field to any users that will be administering the service account.
Make note of the OAuth 2 client ID and Email for your service account. You will need these to enable domain-wide delegation and to authorize your account with Cyclr. You can find this on the service accounts page.
When adding the service account key, set the Key type to JSON
, then open the downloaded .json key file and make note of the private_key field. You need this to authorize your account with Cyclr.
Enabling Domain-wide delegation for your Service Account
Domain-wide delegation needs to be enabled for your service account to allow it to access user data on behalf of users in your workspace. You need to be a "super administrator" of the workspace to configure this. Google's documentation on how to do this can be found here. When enabling domain-wide delegation, set up the fields below as follows:
Client ID: Set to the OAuth 2 client ID from the previous section.
OAuth scopes (comma-delimited): Add the following scope:
https://www.googleapis.com/auth/bigquery
for BigQuery access.
Partner Setup in Cyclr Console
Having performed the appropriate setup described above, go to your Cyclr Partner Console:
Go to Connectors > Application Connector Library.
Use the search box to locate the Google BigQuery Connector entry.
Select the Pencil button.
Select the Settings tab.
Enter the below values:
Value | Description |
---|---|
Client ID | The default client ID to use. |
Client Secret | The default client secret to use. |
Scopes | The scopes you want to use. Cyclr uses a default scope of |
Select Save Changes.
If you leave these values blank, they must be provided each time the Connector is installed.
Cyclr Connector Installation
Depending on the Authentication Type you’ve chosen, the installation of a Google BigQuery Connector is slightly different, as explained below.
You’ll also be able to provide a Project ID and a Dataset ID, but neither are required unless you wish to use the Methods in the Connector’s Tables Category. See the Obtaining Project ID and Dataset ID Values section below for details.
OAuth 2.0 Authorization Code Installation
You can provide the following values:
Value | Description |
---|---|
Project ID | The project ID of the project to use. |
Dataset ID | The dataset ID of the dataset to use. |
Service Account Installation
You can provide the following values:
Value | Description |
---|---|
Service Account Email | The email address of the service account. |
Service Account Administrator Email | The email address of the administrator of the service account. |
Service Account Private Key | The private key of the service account. |
Project ID | The project ID of the project to use. |
Dataset ID | The dataset ID of the dataset to use. |
Additional Information
Obtaining Project ID and Dataset ID Values
You will need a Project ID and a Dataset ID to use Methods in the Connector’s Tables Category.
Project ID
From the Google Cloud Platform dashboard:
Select the project name from drop down button in the navigation bar.
Make a note of the project ID listed under the ID heading. As an example, that might be:
myproject-389110
You can also find the project ID if you install the Google BigQuery Connector without a project ID or dataset ID set, then use the Utilities > Projects > List Projects Method.
After you this, you can re-authenticate the Connector and set the Project ID on it.
Dataset ID
From the Google BigQuery dashboard:
In the Explorer pane, select the arrow to the left of the project ID to expand it.
Select the arrow to the left of each dataset to view the tables within it and make a note of the dataset ID.
You can also find the dataset ID if you install the Google BigQuery Connector without a project ID or dataset ID set, then use the Utilities > Datasets > List Datasets Method.
After you this, you can re-authenticate the Connector and set the Dataset ID on it.
Create a Table Custom Object Category
You can create a Custom Object Category to access each table within a project and dataset as it's own method category. Create multiple Custom Object Categories to access multiple tables under a single connector installation.
You need to install the Google BigQuery connector with a Project ID and Dataset ID for table custom objects to function.
To create a table Custom Object, from the Edit Connector page of the Google BigQuery connector:
Under the Methods & Fields heading, select the Tables methods category.
Select the pink Copy this Category to create a Custom Category Object icon.
Select the Select object dropdown.
Select the table you want to create a Custom Object Category for.
Select Copy.
The methods within the new method category you created now target the selected table.
Create a Cycle to request a large set of data
Use the List New Table Data Incrementally or List Partial Table Data method to incrementally list data in a Cycle. You can use Generic Webhook Connector methods to allow the cycle to send a request to itself to continually run. This approach is useful for very large sets of data.
When you use the List Partial Table Data method, you need to track the last index of the table data returned in the current request and send it in the next request. To do this, add request fields for the Generic Webhook Connector then map the last index of the table data returned as additional steps, as explained below.
Prerequisites
Install the Google BigQuery connector.
Create a Table Custom Object that targets the required table.
Install the Generic Webhook Connector
To install the Generic Webhook Connector, navigate to the Utility Connectors page:
From the console, select Templates > Template Connectors > + Install New Utility.
From an account, select Connectors > + Install New Utility.
Select Install under the Generic Webhook Connector.
Update the Name field to a recognisable name.
Select Next.
Add request fields for the Generic Webhook Connector
You only need to do this step for the List Partial Table Data method.
To allow the next request to use the last index of the table data returned in the current request, you must add request fields to the Generic Webhook connector that track the index:
Navigate to the Edit Connector page for the Generic Webhook Connector:
From the console, select Templates > Template Connectors. Under the Installed Utility Connectors heading, select the Edit Connector icon next to the the Generic Webhook Connector.
From an account, select Connectors. Under the Installed Utility Connectors heading, select the Edit Connector icon next to the Generic Webhook Connector.
Add a request field to the POST method
Under the Methods and Fields heading, select HTTP Methods.
Select POST.
Under the Request Fields heading, select the Add Field icon.
Set the Field Location to
lastIndex
.Set the Display Name to
Last Index
.Set the Data Type to
Integer
.Select Create.
Add a request field to the Webhook method
Under the Methods and Fields heading, select Webhooks.
Select Webhook.
Under the Request Fields heading, select the Add Field icon.
Set the Field Location to
lastIndex
.Set the Display Name to
Last Index
.Set the Data Type to
Integer
.Select Create.
Create a Cycle
From the console:
Select Templates > Template Library.
Select Create New Template.
Enter a template name.
Select Create.
From an account:
Select Cycles.
Select Design New Cycle.
Enter a Cycle name.
Select Create.
Add methods to the Cycle
Add following methods to the Cycle:
Generic Webhook > HTTP Methods > POST
Generic Webhook > Webhooks > Webhook
Google BigQuery > Tables > List New Table Data Incrementally or List Partial Table Data
Tools > Delay: You can use the Delay tool to set the time between requests once all current table data has been retrieved.
Connect the methods
Connect the methods as follows:
In this example, add any additional Cycle methods between the true exit of the List Partial Table Data method and the POST method.
Configure a Cycle
Setup the POST to webhook loop
The POST method must target the Webhook method to allow the Cycle to send a request to itself to continually run:
Select the Step setup of the Webhook method.
Copy the webhook URL and close the window.
Select the Step setup of the POST method.
Select Select... next to the URL field.
Select Type a Value.
Enter the webhook URL into the text box and close the window.
Map the last index of the table data returned
This step is only required for the List Partial Table Data method.
Select the Step setup of the List Partial Table Data method.
Select Ignore next to the Start Index field.
Select Webhook.
Select Nothing Selected.
Select Last Index. This is the field added earlier in this process.
Close the window.
Select the Step setup icon of the POST method.
Select Ignore next to the Last Index field. This is the field added earlier in this process.
Select the List Partial Table Data method.
Select Nothing Selected.
Select Last Row Index.
Close the window.
Configure the Rows Per Execution setting
The rows per execution setting determines how many table rows are returned per Cycle execution. You need to manually set the rows per execution, which is dependant on the width of your table data. For example, a starting point of 10000
might be useful.
Select the Step setup of the List Partial Table Data or List New Table Data Incrementally methods.
Select Ignore next to the Rows Per Execution field.
Select Type a Value.
Enter the enter the rows per execution into the text box.
Close the window.
Configure the delay time
Use the delay step to set the time between cycle executions once the cycle retrieves all of the table data. When new table data is found, the cycle continues to run until it retrieves all table data and then reverts back to the delay.
For example, a delay duration of 12 hours makes the Cycle check for new table data once every 12 hours once it retrieves all of the table data.
Select the Step setup of the Delay method.
Set the duration and close the window.
Give dataset access to another user
You can give other users access to a dataset. For more information, see Google BigQuery's documentation on IAM access control.
Add access for another user from the Google BigQuery console:
From the Explorer pane, navigate and select the dataset.
From the dataset navigation menu bar, select Sharing > Permissions.
Select Add Principal.
Under the Add principals heading, in the New principals field, enter users, groups, domains, or service accounts to give access to.
Under the Assign roles heading, select the Select a role dropdown box to assign roles.
Select Save.
Insert table data delay
Because Google BigQuery needs to process and stream the data from Google’s internal servers to the table before you can access it, there might be a delay when you insert table data, before you can access the data.
If the insert call doesn't return an error, the data appears in your table within 24 hours.