Datastore API¶
DKAN offers a Datastore API as a custom endpoint for the Drupal Services module.
To import and otherwise control individual resources in the Datastore, use the datastore functions in the Dataset REST API.
This API is designed to be as compatible as possible with the CKAN Datastore API.
Parameters¶
- resource_id (mixed) – id (string) or ids (array) of the resource(s) to be searched against.
- filters (mixed) – array or string of matching conditions to select
- q (string) – fulltext search
- offset (int) – offset this number of rows
- limit (int) – maximum number of rows to return
- fields (array or comma separated string) – fields to return (default: all fields in original order)
- sort (string) – comma separated field names with ordering
- join (array) – array of fields to join from multiple tables
- group_by (array) – array of fields to group by
Note
If limit is not used in a query, 10 records will be returned by default. If limit is used, the API allows users to ask for up to 100 records. To get more than 100 records, the API must be used as a user with the “Perform unlimited index queries” permission.
Aggregation functions¶
- sum (string) – field to compute the sum
- avg (string) – field to compute the average
- min (string) – field to compute the minimum
- max (string) – field to compute the maximum
- std (string) – field to compute the standard deviation
- variance (string) – field to compute the variance
- count (string) – field to compute the count
URL format¶
Parameters passed by URL share a common format:
param_name[resource_alias][field_name]=value,value1
- param_name: the param you are using (e.g. offset)
- resource_alias(optional): an alias to reference an specific resource in further params.
- field_name(optional): a field name used by the param name.
- value: a list of values divided by commas
Note that resource_alias
and field_name
arguments are optional
and depend on what you want to query. For example, if you need to limit
the number of records, you need to use the limit parameter. However, it
doesn’t make sense to specify an alias or a field in such a case. You
only need to provide the number of records you need to retrieve:
...&limit=5
There is one exception: Even when the sort
parameter shares the
above syntax, it also accepts an alternative format:
...&sort[field1]=desc
Multiple queries¶
Sometimes you want to do mutiple datastore queries in one network request (e.g., to feed a data dashboard). In that case you can post a JSON object to http://EXAMPLE.COM/api/action/datastore/search.json with all the queries to perform.
The request body should have a format similar to this:
Request body¶
{
"my_query": {
"resource_id": {
"states": "d2142282-9838-4cca-972f-f1741410417b",
"gold_prices":"d3c099c6-1340-4ee5-b030-8faf22b4b424"
},
"limit": 5
},
"my_query1": {
"resource_id": {
"gold_prices": "d3c099c6-1340-4ee5-b030-8faf22b4b424"
},
"limit": 5
}
}
Response¶
{
"my_query": {
"help": "Search a datastore table. :param resource_id: id or alias of the data that is going to be selected.",
"success": true,
"result": {
"fields": [
{
"id": "nombre",
"type": "text"
},
{
"id": "state_id",
"type": "int"
}
],
"resource_id": {
"states": "d2142282-9838-4cca-972f-f1741410417b",
"gold_prices": "d3c099c6-1340-4ee5-b030-8faf22b4b424"
},
"limit": 1,
"total": 5,
"records": [
{
"nombre": "Alabama",
"state_id": "1",
"feeds*flatstore_entry*id": "1",
"timestamp": "1466096874",
"feeds*entity*id": "13"
}
]
}
},
"my_query1": {
"help": "Search a datastore table. :param resource_id: id or alias of the data that is going to be selected.",
"success": true,
"result": {
"fields": [
{
"id": "date",
"type": "datetime"
},
{
"id": "price",
"type": "float"
},
{
"id": "state_id",
"type": "int"
}
],
"resource_id": {
"gold_prices": "d3c099c6-1340-4ee5-b030-8faf22b4b424"
},
"limit": 1,
"total": 748,
"records": [
{
"date": "1950-01-01",
"price": "34.73",
"state_id": "1",
"feeds*flatstore_entry*id": "1",
"timestamp": "1466036208",
"feeds*entity*id": "12"
}
]
}
}
}
Response formats¶
Requests can be sent over HTTP. Data can be returned as JSON, XML, or JSONP. To retrieve data in a different format, change the extension in the url.
Instead of using this:
http://EXAMPLE.COM/api/action/datastore/search.json
Use this:
http://EXAMPLE.COM/api/action/datastore/search.xml
Or this:
http://EXAMPLE.COM/api/action/datastore/search.jsonp
Limitations¶
- The
q
parameter doesn’t work in combination with thejoin
parameter. - Filters don’t work with float (decimals) values
Examples¶
The following is a simple example with two resources that contain four
records each. Note that the resource id
would be a UUID not
single digit number in real scenario.
Resource 1:
country | population | id | timestamp |
---|---|---|---|
US | 315,209,000 | 1 | 1359062329 |
CA | 35,002,447 | 2 | 1359062329 |
AR | 40,117,096 | 3 | 1359062329 |
JP | 127,520,000 | 4 | 1359062329 |
Resource 2:
country | squarekm | id | timestamp |
---|---|---|---|
US | 9,629,091 | 1 | 1359062713 |
CA | 9,984,670 | 2 | 1359062713 |
AR | 2,780,400 | 3 | 1359062713 |
JP | 377,930 | 4 | 1359062713 |
Simple query example¶
http://EXAMPLE.COM/api/dataset/search?resource_id=d3c099c6-1340-4ee5-b030-8faf22b4b424&filters[country]=AR,US&fields[]=country&fields[]=population,timestamp&sort[country]=asc
Returns the country, population, and timestamp fields for US and AR from dataset 1 sorting by the country in ascending order.
Text Search¶
Requests with the ‘query’ argument will search the listed fields within the dataset:
http://example.com/api/dataset/search?resource_id=d3c099c6-1340-4ee5-b030-8faf22b4b424&&fields[]=country&fields[]=population&query=US
This will return the country and population from US.
Joins¶
If you wish to query multiple tables, indicate the table as an array key in the following fields:
http://example.com/api/dataset/search?resource_id[pop]=d3c099c6-1340-4ee5-b030-8faf22b4b424&resource_id[size]=d3c099c6-1340-4ee5-b030-8faf22b4b424&filters[pop][country]=US,AR&join[pop]=country&join[size]=country
Returns the country
, population
, squarekm
and id
for “US” and “AR” from
datasets 11 and 13.
Caching¶
GET and POST request are cached by Drupal. The params passed through the request are used to create a cache id to store the data to be retrieved on further requests.
Since Datastore API uses the Drupal cache system under the hood, the Datastore API cache will be cleared at the same time as the rest of the Drupal cache. This could be when the cache is wiped manually, or when the cache lifetime ends.
These options can be configured at
admin/config/development/performance
Field Names¶
In order to get data for specific fields, you need to add the argument ‘fields’ as an array to your request:
http://example.com/api/dataset/search?resource_id=d3c099c6-1340-4ee5-b030-8faf22b4b424&fields[]=country
If you wish to get multiple fields, add the ‘fields[]’ parameter followed by the field name as many times as fields needed, for example:
http://example.com/api/dataset/search?resource_id=d3c099c6-1340-4ee5-b030-8faf22b4b424&fields[]=field_name_1&fields[]=field_name_2
Important Note: if the resource you are querying has a file with column names which contains spaces or capital letters (e.g. ‘School Name’), you should NOT specify the field in the request in that way, instead, it should be referenced as lower case with underscores instead of spaces (e.g. school_name).
Read more about changes to the Datastore after upgrading to DKAN 7.x-1.16 here
Filters¶
If you wish to filter data based on the value of a field, you’ll need to specify the ‘filters’ parameter in the request, it should be formatted like:
filters[field_name_1]=value,value2
When you need to specify filters for multiple fields, then you’ll just join them with &, like this:
filters[field_name_1]=value&filters[field_name_2]=value2
So, for example you could execute the query:
http://example.com/api/dataset/search?resource_id=d3c099c6-1340-4ee5-b030-8faf22b4b424&filters[country]=AR,US
That query will return the records in which the ‘country’ is set to US or AR.