Dokumentation

1Introduction

The Analytics feature helps merchants to quickly analyze their transactions within wallee and get insights on how their bussiness is doing. With structured access to their data they can predict how to adapt to customer needs. Reports can be customized by writing SQL queries. All data analysis is handled by the PrestoDB open source SQL query engine for big data provided by Amazon Athena.

The Analytics Schema describes the data structures of the data which is exported from the main wallee database and can be queried through the Analytics Query Execution Service.

2The Analytics Query Execution API

The Analytics Query Execution Service is used to manage Analytics queries. It allows to submit queries, fetch results in JSON or CSV formats, monitor query execution and cancel running queries.

For the full API reference see the Analytics Query Execution Service reference documentation.

2.1Example: List Customers by Accumulated Transaction Amounts

In this example a query shall be executed which lists customers ordered by the amount they spent in the merchants space. Using the Analytics Schema we come up with the following PrestoDB SQL code which calculates the sum of transaction amounts grouped by customer:

SELECT SUM(completedAmount) AS total, customerId
FROM transaction
GROUP BY customerId
ORDER BY total DESC;

2.2Submit the Query

A query can be submitted for execution using the Submit Query method (HTTP POST). The request body is expected to contain an Analytics Query structure in JSON format.

To submit our example query we send the following JSON:

{
    "accountId": 1,
    "externalId": "0d74f422-3d7e-4f8b-a7c4-10be0769e09c",
    "queryString": "SELECT SUM(completedAmount) AS total, customerId FROM transaction GROUP BY customerId ORDER BY total DESC"
}

The query shall be executed by the account with ID 1 (parameter accountId). The actual SQL string is specified in the queryString parameter. Notice, we don’t provide any spaces here as we want to include all spaces belonging to the given account. Read more about the Space and Account Restriction later.

We also need to specify either a maxCacheAge or an externalId property. In our example we use an externalId containing the randomly generated UUID 0d74f422-3d7e-4f8b-a7c4-10be0769e09c. See the section Query Submission Idempotency for a more detailed explanation of these parameters.

The response for a submission request will be a Query Execution structure in JSON format. It will contain the id of the execution (which is needed to reference this execution in subsequent requests) as well as detail information about the query and the current state of the query execution:

{
	"account": 1,
	"externalId": "0d74f422-3d7e-4f8b-a7c4-10be0769e09c",
	"failureReason": null,
	"id": 1052,
	"processingEndTime": null,
	"processingStartTime": "2021-06-15T06:09:07.746Z",
	"queryString": "SELECT SUM(completedAmount) AS total, customerId  FROM transaction  GROUP BY customerId  ORDER BY total DESC",
	"scannedBytes": 0,
	"spaces": [],
	"state": "PROCESSING"
}

2.3Monitor the Query State

A submitted query is processed asynchronously. Initially the query will be in PROCESSING state and no results will yet be available. After the query has been processed and completed the state will change to PROCESSED and the query results can be fetched.

If the query execution fails or is canceled before completion, the state will change to FAILED resp. CANCELED instead and no results will be returned. In the former case the failureReason string contains information about the cause of the failure.

The current state of a previously submitted query can be inquired using the Execution Status method (HTTP GET) with an id URL parameter which contains the id value of the Query Execution response returned by the Submit Query method (see Submit the Query).

The response of the Execution Status method will again be a Query Execution. In the case of our example the request URL will look like this:

/api/analytics-query/status?id=1052

After reaching the final PROCESSED state we will get a response like the following:

{
	"account": 1,
	"externalId": "0d74f422-3d7e-4f8b-a7c4-10be0769e09c",
	"failureReason": null,
	"id": 1052,
	"processingEndTime": "2021-06-15T06:09:09.561Z",
	"processingStartTime": "2021-06-15T06:09:07.746Z",
	"queryString": "SELECT SUM(completedAmount) AS total, customerId FROM transaction GROUP BY customerId ORDER BY total DESC",
	"scannedBytes": 311398,
	"spaces": [],
	"state": "PROCESSED"
}

2.4Cancel the Query Execution

A query execution in PROCESSING state can be canceled using Cancel Execution method (HTTP POST) containing a request body with the query execution ID of the Query Execution response returned by the Submit Query method (see Submit the Query). The request url will look like the following:

/api/analytics-query/cancel-execution?id=1052

Canceling a query will always return an empty response (unless there was an error). If the query execution to be canceled has already reached some final state (PROCESSED, FAILED or CANCELED) the canceling attempt will be silently ignored.

2.5Getting the Results

Once the Query Execution is in the PROCESSED state, using the Fetch Result service returns the result in a Query Result Batch. The HTTP GET request let’s you specify the URL parameter id to identify the Query Execution. Use maxRows to limit the number of rows in the result, e.g. 1 as shown in the example below. In addition, you can specify a timeout parameter for the maximum time in seconds to wait for the result if it is not yet available. Use 0 (the default) to return the result immediately.

/api/analytics-query-service--fetch-result?id=1052&maxRows=1

The response contains the JSON content of the result as rows which are described by the columns. Examine the below example response:

{
	"columns": [
		{
			"aliasName": "total",
			"columnName": "total",
			"description": null,
			"precision": 38,
			"referencedTable": null,
			"scale": 8,
			"tableName": null,
			"type": "decimal"
		},
		{
			"aliasName": "customerId",
			"columnName": "customerid",
			"description": null,
			"precision": 2147483647,
			"referencedTable": null,
			"scale": 0,
			"tableName": null,
			"type": "varchar"
		}
	],
	"nextToken": "QVNRNmZUY2lVQUg4dGQ1Nkltd3dFWjdIbFJ1L1RxL2ZNYUhmblhkV2ZnUlRBbTJpRlJ6WUsxSzVDblVnZEFjelF6Qm5wbU5uc05aam9aYld3NGVRdjIrbWtjYkpTdzZYcGc9PQ",
	"queryExecution": {
		"account": 2,
		"externalId": null,
		"failureReason": null,
		"id": 58,
		"processingEndTime": "2021-06-15T06:09:09.561Z",
		"processingStartTime": "2021-06-15T06:09:07.746Z",
		"queryString": "SELECT SUM(completedAmount) AS total, customerId FROM transaction GROUP BY customerId ORDER BY total DESC",
		"scannedBytes": 311398,
		"spaces": [],
		"state": "PROCESSED"
	},
	"rows": [
		[
			"8377.88000000",
			"30"
		]
	]
}

The result is paginated, meaning only a batch of rows is provided at once. This is configured with maxRows which are at maximum 1000. In order to receive the next batch, the nextToken can be used with Fetch Result to retrieve subsequent batches. The below example fetches the second row.

/api/client/analytics-query-service--fetch-result?id=1052&maxRows=1&nextToken=QVNRNmZUY2lVQUg4dGQ1Nkltd3dFWjdIbFJ1L1RxL2ZNYUhmblhkV2ZnUlRBbTJpRlJ6WUsxSzVDblVnZEFjelF6Qm5wbU5uc05aam9aYld3NGVRdjIrbWtjYkpTdzZYcGc9PQ

You can also download the query result in the CSV format by using the service Generate Download URL:

/api/analytics-query/generate-download-url?id=1052

The response body contains an Amazon S3 URL similar to the following:

https://...s3.eu-west-1.amazonaws.com/39779692-28b7-43fb-848c-21f8e748be81.csv?X-Amz-Security-Token=[...]&X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Date=20210614T083206Z&X-Amz-SignedHeaders=host&X-Amz-Expires=599&X-Amz-Credential=[...]&X-Amz-Signature=[...]

2.6Space and Account Restriction

A query must always be executed in the context of an authenticated account. The results of the query will be limited to the spaces which are associated with that account. A query may be limited to one space or multiple spaces but all spaces must belong to the executing account. Queries in spaces which do not belong to the executing account will fail with a permission error.

2.7Query Submission Idempotency

When submitting a query to Analytics you must specify either an External ID or a Maximal Cache Age to make the submission idempotent and prevent multiple submissions if multiple requests for the same query are received by the server. If both properties are specified in a request then the External ID will be preferred and the Maximal Cache Age ignored. If none is specified the request will fail with an error.

2.8External ID

The value of the externalId property is assigned by the client to a specific query to uniquely identify that query per account. If the server receives a query submission request for an account with an external ID which has already been used before for that account no new query will be submitted. Instead the query execution result of the original request with that ID will be returned. Note that the actual query in the request will be ignored in that case.

It is up to the client to generate the external IDs. Typical strategies are:

  • Generate a random UUID for each submission request (or some custom random ID with sufficient entropy to prevent collisions).

  • Keep a persistent counter which is atomically increased every time a query is submitted.

  • Store your queries in a database table with unique primary keys generated by the database and use these database keys for the external ID.

Note that the externalId cannot be used to later identify a previously submitted query!.

So you cannot use the external ID to request information about the status or fetch the results of a previously submitted query. Instead you will receive a query execution object as the response of a successfull submission (or resubmission) of a query. The ID of this execution object is used to identify the submitted query later on when requesting status information or fetching the result of the query.

2.9Maximal Cache Age

As an alternative to an External ID you can specify a maximal cache age (in minutes) using the maxCacheAge property (with a value of 1 or greater). When a query with a maximal cache age is submitted and an equivalent query (same query string, same executing account, same spaces) has been submitted not more than maxCacheAge minutes before, the query will not be resubmitted but instead the query execution result of the previously submitted query will be returned.

Note that the maximal cache age will be ignored if an external ID is specified in a request!