SQRL Specification

This is the specification for SQRL, a declarative SQL query language developed at DataSQRL for describing data pipelines. SQRL stands for "Structured Query and Reaction Language" because it extends SQL with support for streaming data and the ability to react to data in realtime. In addition, SQRL adds a number of convenience features that make it development-friendly.

Table Type System

In SQRL, every table is assigned a specific type that influences how queries interact with the data, the semantic validity of those queries, and how data is processed by different engines.

SQRL recognizes several distinct table types, each with unique characteristics and use cases:

STREAM: Comprises a stream of immutable records, each identified by a synthetic primary key and timestamp. These tables are ideal for representing events or actions over time.
VERSIONED_STATE: Contains records with a natural primary key and a timestamp, tracking changes over time to each record, thereby creating a change-stream.
STATE: Similar to VERSIONED_STATE but without tracking the history of changes. Each record is uniquely identified by its natural primary key.
LOOKUP: Supports lookup operations using a primary key but does not allow further processing of the data.
RELATION: Represents relational data that lacks a primary key, timestamp, or explicit streaming semantics. It is used primarily for static relational data integration.
STATIC: Consists of data that does not change over time, such as constants, table functions, or nested data structures. This type is treated as universally valid across all time points.

These table types will be used throughout this specification to further describe the semantics of sql queries.

Functions

Functions in SQRL are designed to be engine-agnostic, ensuring that their implementation is consistent across different platforms and execution environments. This uniformity is crucial for maintaining the semantic integrity of functions when executed under various systems.

Characteristics of Functions

Engine Agnosticism: Functions are defined in a way that does not depend on the specifics of the underlying engine.
Semantic Consistency: Regardless of the engine used, function should preserve their semantic meaning.
Mixed Engine Support: While functions are designed to be widely supported, some may have mixed support depending on the engine's capabilities.
Nullability Awareness: Functions in SQRL retain nullability information. This feature is vital for correct schema generation downstream, ensuring that data integrity is maintained through the potential propagation of null values.
Time-Preserving Capabilities: A significant feature of SQRL functions is their ability to handle time-based data efficiently. Time-preserving functions can manipulate and interpret timestamps in a way that aligns with the temporal dynamics of data streams.

For example, a time-preserving function called 'endOfWeek' could be implemented to aggregate timestamps into time windows. Time windows are a means to divide time into discrete buckets and aggregate all stream records within each bucket to produce a new stream table that contains one row for each aggregate.

Users.spending := SELECT endOfWeek(p.time) AS week,
         sum(t.price) AS spend, sum(t.saving) AS saved
      FROM @.purchases p JOIN p.totals t
      GROUP BY week ORDER BY week DESC;

Import

IMPORT qualifiedName (AS? alias=identifier)?

An import in SQRL describes a table, function, or other sqrl script to be added to the schema. Import paths use the dot character . to separate path components.

IMPORT datasqrl.seedshop.Orders;

Imports are intended to act much like language dependencies. The SQRL specification does not describe how imports are resolved and is up to the implementation.

Imports can end with a * to import all items on that level of the qualified path.

IMPORT mypackage.*;

Imports can be aliased using the AS keyword. Imports that end with a * cannot be aliased.

IMPORT datasqrl.seedshop.Orders AS MyOrders;

Export

EXPORT table=tablePath TO sink=qualifiedName;

The EXPORT statement is an explicit sink to a data system, like a kafka topic or database table. Import paths and export sink paths should be resolved the same way.

EXPORT UserPromotion TO mysink.promotion;

Export statements are most commonly used to export data to an external system, but it could refer to other components such as console log print statements or to a log engine. It is up to the underlying implementation to determine what modules are available by default for import and export paths.

EXPORT UserPromotion TO print.promotion;
EXPORT UserPromotion TO log.promotion;

note

Exports do not describe the connector mapping when an optimizer splits the workload between multiple engines.

Create Table

CREATE TABLE MyTable(
  myCol bigint,
  myCol2 bigint
);

SQRL allows a 'create table' statement. Create table statements describe a table that the SQRL implementation provides storage for. The implementation can optionally bring additional fields, such as a primary key and timestamp.

Assignment operator

The assignment operator := is used to specify the structure and contents of the right-hand side to the left-hand side. This operation is akin to the 'CREATE VIEW' statement in conventional SQL.

Distinct

Products := DISTINCT Products ON id ORDER BY updated DESC;

Distinct statements in SQRL are designed to select the most recent version of each row based on a specified key, effectively implementing deduplication in streaming data or ensuring data uniqueness in database systems.

Queries

MyTable := SELECT * FROM Table;

Naming and Selecting Columns

In SQRL, all columns must be explicitly named.

// Invalid
MyTable := SELECT COUNT() FROM Table;

// Valid
MyTable := SELECT COUNT() AS cnt FROM Table;

Table Shadowing

Tables can be shadowed, meaning a new table can be created with the same name as an existing one.

MyTable := SELECT * FROM MyTable;

Hidden Tables and Columns

When the name of a table or column starts with the underscore character _, it is considered hidden. Hidden tables and columns are not exposed in the API or imported by other scripts.

_MyHiddenTable := SELECT * FROM MyTable WHERE ...;

Example of Hidden Column:

MyTable := SELECT id, _hiddenColumn FROM Table;

In this example, _hiddenColumn will not be exposed in the API.

Usage of Hidden Tables

Hidden tables are useful for intermediate calculations or data transformations that should not be accessible externally.

_tempData := SELECT * FROM MyTable WHERE condition;

In this example, _tempData is used for an internal operation and is not exposed.

Nested Query

Nested tables represent parent-child relationships and simplify aggregations by parent rows.

MyTable.query := SELECT * FROM x;

We can query a nested table globally, i.e., over all rows in the table, or locally, i.e., only the rows associated with a given parent row.

Global Aggregation

Order_totals := SELECT sum(total) as price, 
    sum(coalesce(discount, 0.0)) as saving FROM Orders.items;

In this example, the Order_totals table contains a single aggregate that sums up the total and discount over all items in all orders. The result is one global aggregation over all order items.

Local Aggregation

Orders.totals := SELECT sum(total) as price, 
    sum(coalesce(discount, 0.0)) as saving FROM @.items;

This statement aggregates all items for each order. The result is one local aggregate for each row in the Orders table.

Difference Between Global and Local Aggregation

The difference between the two statements lies in the FROM clause. The first statement references the Orders.items table globally, meaning it considers all rows in the Orders.items table without any specific parent context.

The second statement references the Orders.items table locally by accessing the items relationship column on Orders. This makes it a localized query, defining a new nested table totals under the Orders table. The query on the right-hand side of the statement is interpreted in the context of each row in the parent table. The at-sign @ is used to refer to the parent row in a localized query. Hence, @.items means "all items that are associated with the current order record through the items relationship".

Usage of Nested Table Definitions

Nested table definitions are a convenient way to express GROUP BY and WINDOW queries by grouping on the rows in the parent table. This allows for more intuitive and organized data aggregation, making it easier to manage complex data relationships and calculations.

Join types

SQRL provides additional join types outside of standard SQL:

Default join
Temporal join
Interval join

Default Join

A join without a qualifier is a default join. It is up to the implementation to decide the best join type given the conditions of the join.

DefaultJoinExample := SELECT * FROM TableA JOIN TableB ON TableA.id = TableB.id;

Inner Join

Inner joins are explicit inner joins. In stream processing contexts, this can mean maintaining the state on both sides of the join to allow proper semantics, which can be expensive.

InnerJoinExample := SELECT * FROM TableA INNER JOIN TableB ON TableA.id = TableB.id;

Left Join and Left Outer Join

In SQRL, a left join and a left outer join are distinct. Left joins can let the implementation decide the best join type. Left outer joins are explicit left joins.

LeftJoinExample := SELECT * FROM TableA LEFT JOIN TableB ON TableA.id = TableB.id;
LeftOuterJoinExample := SELECT * FROM TableA LEFT OUTER JOIN TableB ON TableA.id = TableB.id;

Temporal Join

SQRL supports temporal joins between stream and state tables when joining on the state table's key. Temporal joins use the row from the state table at the timestamp of the stream row.

TemporalJoinExample := 
  SELECT l.login_time, t.transaction_time, t.amount
  FROM Logins l 
  TEMPORAL JOIN Transactions t 
  ON l.account_id = t.account_id;

Interval Join

Interval joins are defined by specifying upper and lower time bounds. The INTERVAL JOIN condition specifies that a transaction must occur within a specified time frame after a login to be included in the join.

CustomerActivity := 
  SELECT l.login_time, t.transaction_time, t.amount
  FROM Logins l 
  INTERVAL JOIN Transactions t 
  ON l.account_id = t.account_id 
  AND t.transaction_time BETWEEN l.login_time AND l.login_time + INTERVAL '1' HOUR;

This correlates logins with transactions that happen shortly after, capturing a critical timeframe for activity analysis.

Example: Temporal Joins in E-commerce

Temporal joins are essential in SQRL as they define different semantics, joining stream tables with state tables. For example, in an e-commerce scenario, if the price changes on a product, you do not want to retroactively update already placed orders.

IMPORT ecommerce.Orders;  // is a stream
IMPORT ecommerce.Products;  // is a stream

VersionedProducts := DISTINCT Product ON productid;  // converts to a versioned state table

OrdersWithPrice := 
  SELECT * 
  FROM Orders
  JOIN VersionedProducts;  // join on stream and state becomes a temporal join

In this example, VersionedProducts becomes a versioned state table, and the join with Orders ensures that each order reflects the product price at the time of the order, not the current product price.

Expressions

Expressions in SQRL allow you to define new columns based on calculations or transformations of existing columns.

Products.weight_in_oz := weight_in_gram / 28.35;

Defining New Columns

This statement adds a new column weight_in_oz to the existing Products table, which converts the product weight to ounces. The column name is the last item in the table path.

Expression Constraints

Expressions cannot be shadowed. Once an expression is defined, it cannot be overridden or redefined in the same context.

Once a table is queried, new columns cannot be added. Tables become immutable once referenced in a query.

Nested Expressions and Window Functions

Nested expressions can evaluate window functions, allowing for calculations over a set of rows related to the current row.

MyRow.num := RANK();

In this example, MyRow.num is defined using the RANK() window function, which assigns a rank to each row within the partition of the dataset.

Relationships

Relationships in SQRL make data relationships explicit, simplify joins, and allow API consumers to navigate through the data efficiently.

Defining Relationships

A relationship is defined using a JOIN expression, interpreted for each row of the table on which the relationship is defined. The at-sign @ refers to each row.

Users.purchases := JOIN Orders ON Orders.customerid = @.id;

This statement defines a purchases column in the Users table, relating each user to their corresponding orders where customerid matches the user's id.

Benefits

Simplifies Joins: Avoids repetitive join statements.
Explicit Relationships: Makes data relationships clear and easy to follow

Using Relationships in Queries

Users.spending := 
  SELECT endOfWeek(p.time) AS week,
         sum(t.price) AS spend, 
         sum(t.saving) AS saved
  FROM @.purchases p 
  JOIN p.totals t
  GROUP BY week 
  ORDER BY week DESC;

This example defines a spending nested table under Users, aggregating order totals for all purchases of each user. The FROM @.purchases expands to FROM @ JOIN Orders p ON p.customerid = @.id.

Parameters

Join declarations and tables in SQRL support parameters, allowing for dynamic queries.

Parameters use the SQL variable syntax, the @ followed by the variable name. This is not to be confused with localized queries

MyTable.byId(@val: BIGINT) := JOIN Table t ON t.id = @.id AND @val > 50;
MyTableById(@id: STRING) := SELECT * FROM Table t WHERE t.id = @id;

Arguments

Arguments may be provided in any syntactic order and maintain identical semantic meaning.

Parameterized queries are useful when describing different views of a table.

ProductById(@ProductID: String) := 
  SELECT * FROM UniqueInventory 
  WHERE ProductID = @ProductID;

Table Paths

In SQRL, table paths are traversed like a graph, not as subschemas.

Users.spending := SELECT endOfWeek(p.time) AS week,
         sum(t.price) AS spend, sum(t.saving) AS saved
      FROM @.purchases p JOIN p.totals t
      GROUP BY week ORDER BY week DESC;

This statement defines a nested table spending underneath Users which aggregates over the nested order totals for all purchases of each user. Relationships used in FROM and JOIN are expanded to their original definition. That means, FROM @.purchases gets expanded to FROM @ JOIN Orders p ON p.customerid = @.id.

Comments

SQRL supports the use of comments within the code to provide hints, enhance readability, provide documentation, and explain the logic of complex queries or operations.

SQRL supports two types of comments:

Single-line Comments: These are initiated with double forward slashes (//). Everything following the // on the same line is considered part of the comment.

// This is a single-line comment explaining the next SQL command
IMPORT data.SalesRecords;

Multi-line Comments: These are enclosed between / and /. Everything within these markers is treated as a comment, regardless of the number of lines. T

/*
 * This is a multi-line comment.
 * It can span multiple lines and is often used to comment out
 * chunks of code or to provide detailed documentation.
 */
IMPORT data.StockAdjustments;

Hints

Hints are included within multi-line comments and are prefixed with a plus sign (+) followed by the hint type and optional parameters. These hints do not alter the SQL syntax but suggest how the underlying engine should treat the subsequent SQL commands. Hints are placed above assignment statements.

// The below hint suggests that the following query should be executed on the stream engine.
/* +exec(streams) */
MyTable := SELECT * FROM InventoryStream;

SQRL Specification

Table Type System​

Functions​

Import​

Export​

Create Table​

Assignment operator​

Distinct​

Queries​

Nested Query​

Join types​

Expressions​

Relationships​

Parameters​

Table Paths​

Comments​

Hints​