You can use templated filters to refer to dates by selecting the beginning and end dates in a date filter — {% date_start date_filter %} and {% date_end date_filter %}, respectively. This page will walk you through some use cases examples and the steps to accomplish them.
Syntax notes
The following syntax works works with most dialects, but certain dialects have specific use cases. Example:-
BigQuery allows greater control when working with table wildcard functions like
TABLE_DATE_RANGEandTABLE_QUERY, so using{% table_date_range prefix date_filter %}is insufficient for specifying date filters. -
Hadoop allows working with date-partitioned columns, no matter the type (
string,date) or format (YYYY-MM-DD) of the column.
Usage notes
-
When there is no value specified for
date_filter, both{% date_start date_filter %}and{% date_end date_filter %}will evaluate toNULL. -
In the case of an open-ended
date_filter(such asbefore 2016-01-01orafter 2016-01-01), then one of{% date_start date_filter %}or{% date_end date_filter %}filters will beNULL.
To make sure neither of these two cases result in invalid SQL, you can use IFNULL or COALESCE in the LookML.
Use case examples
Monthly partitioned columns (in BigQuery)
In some BigQuery datasets, tables are organized by month, and the table ID has the year and month combination as a suffix. An example of this is in the following dataset, which has many tables with names like pagecounts_201601, pagecounts_201602, pagecounts_201603.
Example 1: LookML that depends on always_filter
The following derived table uses TABLE_QUERY([dataset], [expr]) to get the right set of tables to query:
view: pagecounts {
derived_table: {
sql: SELECT * FROM
TABLE_QUERY([fh-bigquery:wikipedia],
"length(table_id) = 17 AND
table_id >= CONCAT( 'pagecounts_' , STRFTIME_UTC_USEC({% date_start date_filter %},'%Y%m') ) AND
table_id <= CONCAT('pagecounts_' , STRFTIME_UTC_USEC({% date_end date_filter %},'%Y%m') )";
)
;;
}
filter: date_filter {
type: date
}
}
Some notes on the code in the expression:
-
table_idrefers to the name of the table in the dataset. -
length(table_id) = 17makes sure it ignores the other tables with names likepagecounts_201407_en_top64k. -
STRFTIME_UTC_USEC({% date_start date_filter %},'%Y%m')will output just theYYYYmmpart of the beginning date.
NULL will be substituted for the date_filter parts. Getting around this requires an always_filter on the Explore:
explore: pagecounts {
always_filter: {
filters: [date_filter: "2 months ago"]
}
}
Note that this will still fail for filters for dates from before the earliest date in the dataset because {% date_start date_filter %} will evaluate to NULL.
Example 2: LookML that does not depend on always_filter
It is also possible to use COALESCE or IFNULL to encode a default set of tables to query. In the following example, the past two months are used:
-
The lower bound:
COALESCE({% date_start date_filter %},DATE_ADD(CURRENT_TIMESTAMP(),-2,'MONTH')) -
The upper bound:
COALESCE({% date_end date_filter %},CURRENT_TIMESTAMP())
view: pagecounts {
derived_table: {
sql: SELECT * FROM
TABLE_QUERY([fh-bigquery:wikipedia],
"length(table_id) = 17 AND
table_id >= CONCAT( 'pagecounts_'; , STRFTIME_UTC_USEC(COALESCE({% date_start date_filter %},DATE_ADD(CURRENT_TIMESTAMP(),-2,'MONTH')),'%Y%m') ) AND
table_id <= CONCAT( 'pagecounts_' , STRFTIME_UTC_USEC(COALESCE({% date_end date_filter %},CURRENT_TIMESTAMP()),'%Y%m') )"
)
;;
}
filter: date_filter {
type: date
}
}
Log files are in UTC when querying in American time zones (in BigQuery)
Sometimes your Looker log files are stored in UTC, even though you are querying in Eastern or Pacific time zones. This issue can cause a problem where the log files have already rolled to tomorrow's
date in the local timezone of the query, resulting in some missed data.
The solution is to add an extra day to the end date of the date filter, to make sure that if it is past midnight UTC, those log entries are picked up.
The following examples use the public [githubarchive:day] dataset, which has a daily partition of GitHub information.
Example 1: LookML that depends on always_filter
view: githubarchive {
derived_table: {
sql: SELECT * FROM
TABLE_DATE_RANGE([githubarchive:day.],
{% date_start date_filter %},
DATE_ADD({% date_end date_filter %},1,"DAY")
)
;;
}
filter: date_filter {
type: date
}
}
Because this SQL will fail if NULL is substituted for the dates, it is necessary to add an always_filter to the Explore:
explore: githubarchive {
always_filter: {
filters: [date_filter: "2 days ago"]
}
}
Example 2: LookML that does not depend on always_filter
In this example, the default date range is encoded in the LookML. Because COALESCE was returning an unknown type, I ultimately had to use IFNULL to make the SQL work.
-
The lower bound:
IFNULL({% date_start date_filter %},CURRENT_DATE()) -
The upper bound:
IFNULL({% date_end date_filter %},CURRENT_DATE())+ 1 day
view: githubarchive {
derived_table: {
sql: SELECT * FROM
TABLE_DATE_RANGE([githubarchive:day.],
IFNULL({% date_start date_filter %},CURRENT_TIMESTAMP()),
DATE_ADD(IFNULL({% date_end date_filter %},CURRENT_TIMESTAMP()),1,"DAY")
)
;;
}
filter: date_filter {
type: date
}
}
Trailing N-day window functions (in BigQuery)
When you perform certain analyses, you might want to aggregate data over a historical timeframe. To perform this operation in SQL, one will typically implement a window function that reaches back n number of rows for a table unique by date. However, there is a catch-22 when using a date-partitioned table — one must first dictate the set of tables that the query will run against, even as the query really needs extra historical tables for computation.
The solution: Allow the start date to be earlier than the dates provided in the date filter. Here is an example reaching back an additional week:
view: githubarchive {
derived_table: {
sql: SELECT y._date,
y.foo,
y.bar
FROM (
SELECT _date,
SUM(foo) OVER (ORDER BY _date RANGE BETWEEN x PRECEDING AND CURRENT ROW),
COUNT(DISTINCT(bar)) OVER (ORDER BY _date RANGE BETWEEN x PRECEDING AND CURRENT ROW)
FROM (
SELECT _date,
foo,
bar
FROM TABLE_DATE_RANGE([something:something_else.], DATE_ADD(IFNULL({% date_start date_filter %},CURRENT_TIMESTAMP()), -7, "DAY"), IFNULL({% date_end date_filter %},CURRENT_TIMESTAMP()))
) x
) y
WHERE {% condition date_filter %} y._date {% endcondition %};;
}
filter: date_filter {
type: date
}
}
The extra SELECT statement is needed because it is supplying a WHERE constraint to trim the resultset back down to the date range the user originally specified in the query.
Table partitioned by date via string with format 'YYYY-MM-DD' (in Presto)
It is a common pattern in Hadoop tables to use partitioned columns to speed up search times for columns that are commonly searched on, especially dates. The format of the date columns can be arbitrary, though YYYY-MM-DD and YYYYMMDD are most common. The type of the date column can be string, date, or number.
In this example, a Hive table table_part_by_yyyy_mm_dd has a partitioned column dt, a string formatted YYYY-MM-DD, that is being searched by Presto.
When the generator is first run, the LookML looks like this:
view: table_part_by_yyyy_mm_dd {
sql_table_name: hive.taxi. table_part_by_yyyy_mm_dd ;;
suggestions: no
dimension: dt {
type: string
sql: ${TABLE}.dt ;;
}
}
Some notes on the code in the expressions in both of the following examples:
-
The output of
date_startanddate_endistype: timestamp. -
date_format( <expr>, '%Y-%m-%d')is used to convert the timestamp to a string and to the right format. -
The
coalesceis to handle the case of NULLs if someone types in a filter likebefore 2010-01-01orafter 2012-12-31. -
This is Presto dialect code, so Hive will have some differences in the format string (
yyyy-MM-dd) anddate_formatcan't take a NULL value, so thecoalescewould have to move in there with some sort of default value.
Example 1: LookML that uses a common table expression to filter the table
This example uses a derived table to filter the table.
view: table_part_by_yyyy_mm_dd {
# sql_table_name: hive.taxi. table_part_by_yyyy_mm_dd
suggestions: no
derived_table: {
sql: SELECT * FROM hive.taxi. table_part_by_yyyy_mm_dd
WHERE ( coalesce( dt >= date_format({% date_start date_filter %}, '%Y-%m-%d'), TRUE) )
AND ( coalesce( dt <= date_format({% date_end date_filter %}, '%Y-%m-%d'), TRUE) )
;;
}
filter: date_filter {
type: date
}
dimension: dt {
type: string
sql: ${TABLE}.dt ;;
}
}
Usually, partitioned tables take too long for full table scans (and consume too many cluster resources), so it is a good idea to put a default filter on the Explore for this view as well:
explore: table_part_by_yyyy_mm_dd {
always_filter: {
filters: [date_filter: "2013-01"]
}
}
Example 2: LookML that filters directly in the predicate
This example does the predicate filtering directly on the table, without a subquery or common table expression.
view: table_part_by_yyyy_mm_dd {
sql_table_name: hive.taxi.table_part_by_yyyy_mm_dd ;;
filter: date_filter {
type: date
sql: ( coalesce( ${dt} >= date_format({% date_start date_filter %}, '%Y-%m-%d'), TRUE) )
AND ( coalesce( ${dt} <= date_format({% date_end date_filter %}, '%Y-%m-%'), TRUE) );;
}
dimension: dt {
type: string
sql: ${TABLE}.dt ;;
}
}
We can validate that the table partitions are actually being used by checking the output of EXPLAIN in SQL Runner for a query generated by this LookML (you can access to it by clicking to the SQL section in the Data tab of the Explore page), you will see something like this:
output[table_part_by_yyyy_mm_dd.count] => [count:bigint]
table_part_by_yyyy_mm_dd.count := count
TopN[500 by (count DESC_NULLS_LAST)] => [count:bigint]
Aggregate(FINAL) => [count:bigint]
count := "count"("count_4")
RemoteExchange[GATHER] => count_4:bigint
Aggregate(PARTIAL) => [count_4:bigint]
count_4 := "count"(*)
Filter[(COALESCE(("dt" >= CAST('2013-04-01' AS VARCHAR)), true) AND COALESCE(("dt" <= CAST('2016-08-01' AS VARCHAR)), true))] => [dt:varchar]
TableScan[hive:hive:taxi: table_part_by_yyyy_mm_dd, originalConstraint = (COALESCE(("dt" >= CAST('2013-04-01' AS VARCHAR)), true) AND COALESCE(("dt" <= CAST('2016-08-01' AS VARCHAR)), true))] => [dt:varchar]
LAYOUT: hive dt := HiveColumnHandle{clientId=hive, name=dt, hiveType=string, hiveColumnIndex=-1, partitionKey=true}
:: [[2013-04-01, 2013-12-31]]
The partitionKey=true along with the range of partition keys listed indicate that it is only scanning those partitioned columns.