Datastore API

DKAN offers a Datastore API as a custom endpoint for the Drupal Services module.

To import and otherwise control individual resources in the Datastore, use the datastore functions in the Dataset REST API.

This API is designed to be as compatible as possible with the CKAN Datastore API.

Parameters

  • resource_id (mixed) – id (string) or ids (array) of the resource(s) to be searched against.
  • filters (mixed) – array or string of matching conditions to select
  • q (string) – fulltext search
  • offset (int) – offset this number of rows
  • limit (int) – maximum number of rows to return
  • fields (array or comma separated string) – fields to return (default: all fields in original order)
  • sort (string) – comma separated field names with ordering
  • join (array) – array of fields to join from multiple tables
  • group_by (array) – array of fields to group by

Note

If limit is not used in a query, 10 records will be returned by default. If limit is used, the API allows users to ask for up to 100 records. To get more than 100 records, the API must be used as a user with the “Perform unlimited index queries” permission.

Aggregation functions

  • sum (string) – field to compute the sum
  • avg (string) – field to compute the average
  • min (string) – field to compute the minimum
  • max (string) – field to compute the maximum
  • std (string) – field to compute the standard deviation
  • variance (string) – field to compute the variance
  • count (string) – field to compute the count

URL format

Parameters passed by URL share a common format:

param_name[resource_alias][field_name]=value,value1
  • param_name: the param you are using (e.g. offset)
  • resource_alias(optional): an alias to reference an specific resource in further params.
  • field_name(optional): a field name used by the param name.
  • value: a list of values divided by commas

Note that resource_alias and field_name arguments are optional and depend on what you want to query. For example, if you need to limit the number of records, you need to use the limit parameter. However, it doesn’t make sense to specify an alias or a field in such a case. You only need to provide the number of records you need to retrieve:

...&limit=5

There is one exception: Even when the sort parameter shares the above syntax, it also accepts an alternative format:

...&sort[field1]=desc

Multiple queries

Sometimes you want to do mutiple datastore queries in one network request (e.g., to feed a data dashboard). In that case you can post a JSON object to http://EXAMPLE.COM/api/action/datastore/search.json with all the queries to perform.

The request body should have a format similar to this:

Request body

{
  "my_query": {
    "resource_id": {
      "states": "d2142282-9838-4cca-972f-f1741410417b",
      "gold_prices":"d3c099c6-1340-4ee5-b030-8faf22b4b424"
    },
    "limit": 5
  },
  "my_query1": {
    "resource_id": {
      "gold_prices": "d3c099c6-1340-4ee5-b030-8faf22b4b424"
    },
    "limit": 5
  }
}

Response

{
  "my_query": {
    "help": "Search a datastore table. :param resource_id: id or alias of the data that is going to be selected.",
    "success": true,
    "result": {
      "fields": [
        {
          "id": "nombre",
          "type": "text"
        },
        {
          "id": "state_id",
          "type": "int"
        }
      ],
      "resource_id": {
        "states": "d2142282-9838-4cca-972f-f1741410417b",
        "gold_prices": "d3c099c6-1340-4ee5-b030-8faf22b4b424"
      },
      "limit": 1,
      "total": 5,
      "records": [
        {
          "nombre": "Alabama",
          "state_id": "1",
          "feeds*flatstore_entry*id": "1",
          "timestamp": "1466096874",
          "feeds*entity*id": "13"
        }
      ]
    }
  },
  "my_query1": {
    "help": "Search a datastore table. :param resource_id: id or alias of the data that is going to be selected.",
    "success": true,
    "result": {
      "fields": [
        {
          "id": "date",
          "type": "datetime"
        },
        {
          "id": "price",
          "type": "float"
        },
        {
          "id": "state_id",
          "type": "int"
        }
      ],
      "resource_id": {
        "gold_prices": "d3c099c6-1340-4ee5-b030-8faf22b4b424"
      },
      "limit": 1,
      "total": 748,
      "records": [
        {
          "date": "1950-01-01",
          "price": "34.73",
          "state_id": "1",
          "feeds*flatstore_entry*id": "1",
          "timestamp": "1466036208",
          "feeds*entity*id": "12"
        }
      ]
    }
  }
}

Response formats

Requests can be sent over HTTP. Data can be returned as JSON, XML, or JSONP. To retrieve data in a different format, change the extension in the url.

Instead of using this:

http://EXAMPLE.COM/api/action/datastore/search.json

Use this:

http://EXAMPLE.COM/api/action/datastore/search.xml

Or this:

http://EXAMPLE.COM/api/action/datastore/search.jsonp

Limitations

  • The q parameter doesn’t work in combination with the join parameter.
  • Filters don’t work with float (decimals) values

Examples

The following is a simple example with two resources that contain four records each. Note that the resource id would be a UUID not single digit number in real scenario.

Resource 1:

country population id timestamp
US 315,209,000 1 1359062329
CA 35,002,447 2 1359062329
AR 40,117,096 3 1359062329
JP 127,520,000 4 1359062329

Resource 2:

country squarekm id timestamp
US 9,629,091 1 1359062713
CA 9,984,670 2 1359062713
AR 2,780,400 3 1359062713
JP 377,930 4 1359062713

Simple query example

http://EXAMPLE.COM/api/dataset/search?resource_id=d3c099c6-1340-4ee5-b030-8faf22b4b424&filters[country]=AR,US&fields[]=country&fields[]=population,timestamp&sort[country]=asc

Returns the country, population, and timestamp fields for US and AR from dataset 1 sorting by the country in ascending order.

Joins

If you wish to query multiple tables, indicate the table as an array key in the following fields:

http://example.com/api/dataset/search?resource_id[pop]=d3c099c6-1340-4ee5-b030-8faf22b4b424&resource_id[size]=d3c099c6-1340-4ee5-b030-8faf22b4b424&filters[pop][country]=US,AR&join[pop]=country&join[size]=country

Returns the country, population, squarekm and id for “US” and “AR” from datasets 11 and 13.

Caching

GET and POST request are cached by Drupal. The params passed through the request are used to create a cache id to store the data to be retrieved on further requests.

Since Datastore API uses the Drupal cache system under the hood, the Datastore API cache will be cleared at the same time as the rest of the Drupal cache. This could be when the cache is wiped manually, or when the cache lifetime ends.

These options can be configured at admin/config/development/performance

Field Names

In order to get data for specific fields, you need to add the argument ‘fields’ as an array to your request:

http://example.com/api/dataset/search?resource_id=d3c099c6-1340-4ee5-b030-8faf22b4b424&fields[]=country

If you wish to get multiple fields, add the ‘fields[]’ parameter followed by the field name as many times as fields needed, for example:

http://example.com/api/dataset/search?resource_id=d3c099c6-1340-4ee5-b030-8faf22b4b424&fields[]=field_name_1&fields[]=field_name_2

Important Note: if the resource you are querying has a file with column names which contains spaces or capital letters (e.g. ‘School Name’), you should NOT specify the field in the request in that way, instead, it should be referenced as lower case with underscores instead of spaces (e.g. school_name).

Read more about changes to the Datastore after upgrading to DKAN 7.x-1.16 here

Filters

If you wish to filter data based on the value of a field, you’ll need to specify the ‘filters’ parameter in the request, it should be formatted like:

filters[field_name_1]=value,value2

When you need to specify filters for multiple fields, then you’ll just join them with &, like this:

filters[field_name_1]=value&filters[field_name_2]=value2

So, for example you could execute the query:

http://example.com/api/dataset/search?resource_id=d3c099c6-1340-4ee5-b030-8faf22b4b424&filters[country]=AR,US

That query will return the records in which the ‘country’ is set to US or AR.