SQL functions

Function NameDescription
parse_sql()Interprets and analyzes SQL queries, making it easier to extract and understand SQL statements within datasets.
format_sql()Converts the data model produced by parse_sql() back into a SQL statement for validation or formatting purposes.

parse_sql()

Analyzes an SQL statement and constructs a data model, enabling insights into the SQL content within a dataset.

Limitations

  • It is mainly used for simple SQL queries. SQL statements like stored procedures, Windows functions, common table expressions (CTEs), recursive queries, advanced statistical functions, and special joins are not supported.

Arguments

NameTypeRequired or OptionalDescription
sql_statementstringRequiredThe SQL statement to analyze.

Returns

A dictionary representing the structured data model of the provided SQL statement. This model includes maps or slices that detail the various components of the SQL statement, such as tables, fields, conditions, etc.

Examples

Basic data retrieval

The SQL statement SELECT * FROM db retrieves all columns and rows from the table named db.

hn
| project parse_sql("select * from db")

Run in Playground

WHERE Clause

This example parses a SELECT statement with a WHERE clause, filtering customers by subscription_status.

hn
| project parse_sql("SELECT id, email FROM customers WHERE subscription_status = 'active'")

Run in Playground

JOIN operation

This example shows parsing an SQL statement that performs a JOIN operation between orders and customers tables to match orders with customer names.

hn
| project parse_sql("SELECT orders.id, customers.name FROM orders JOIN customers ON orders.customer_id = customers.id")

Run in Playground

GROUP BY Clause

In this example, the parse_sql() function is used to parse an SQL statement that aggregates order counts by product_id using the GROUP BY clause.

hn
| project parse_sql("SELECT product_id, COUNT(*) as order_count FROM orders GROUP BY product_id")

Run in Playground

Nested Queries

This example demonstrates parsing a nested SQL query, where the inner query selects user_id from orders based on purchase_date, and the outer query selects names from users based on those IDs.

hn
| project parse_sql("SELECT name FROM users WHERE id IN (SELECT user_id FROM orders WHERE purchase_date > '2022-01-01')")

Run in Playground

ORDER BY Clause

Here, the example shows how to parse an SQL statement that orders users by registration_date in descending order.

hn
| project parse_sql("SELECT name, registration_date FROM users ORDER BY registration_date DESC")

Run in Playground

Sorting users by registration data

This example demonstrates parsing an SQL statement that retrieves the name and registration_date of users from the users table, and orders the results by registration_date in descending order, showing how to sort data based on a specific column.

hn | extend parse_sql("SELECT name, registration_date FROM users ORDER BY registration_date DESC")

Run in Playground

Querying with index hints to use a specific index

This query hints at MySQL to use a specific index named index_name when executing the SELECT statement on the users table.

hn 
| project parse_sql("SELECT * FROM users USE INDEX (index_name) WHERE user_id = 101")

Run in Playground

Inserting data with ON DUPLICATE KEY UPDATE

This example showcases MySQL’s ability to handle duplicate key entries elegantly by updating the existing record if the insert operation encounters a duplicate key.

hn 
| project parse_sql("INSERT INTO settings (user_id, setting, value) VALUES (1, 'theme', 'dark') ON DUPLICATE KEY UPDATE value='dark'")

Run in Playground

Using JSON functions

This query demonstrates MySQL’s support for JSON data types and functions, extracting the age from a JSON object stored in the user_info column.

hn 
| project parse_sql("SELECT JSON_EXTRACT(user_info, '$.age') AS age FROM users WHERE user_id = 101")

Run in Playground

format_sql()

Transforms the data model output by parse_sql() back into a SQL statement. Useful for testing and ensuring that the parsing accurately retains the original structure and intent of the SQL statement.

Arguments

NameTypeRequired or OptionalDescription
parsed_sql_modeldictionaryRequiredThe structured data model output by parse_sql().

Returns

A string that represents the SQL statement reconstructed from the provided data model.

Examples

Reformatting a basic SELECT Query

After parsing a SQL statement, you can reformat it back to its original or a standard SQL format.

hn
| extend parsed = parse_sql("SELECT * FROM db")
| project formatted_sql = format_sql(parsed)

Run in Playground

Formatting SQL Queries

This example first parses a SQL statement to analyze its structure and then formats the parsed structure back into a SQL string using format_sql.

hn 
| extend parsed = parse_sql("SELECT name, registration_date FROM users ORDER BY registration_date DESC")
| project format_sql(parsed)

Run in Playground

Formatting a simple SELECT Statement

This example demonstrates parsing a straightforward SELECT statement that retrieves user IDs and usernames from an user_accounts table where the active status is 1. After parsing, it uses format_sql to convert the parsed data back into a SQL string.

hn 
| extend parsed = parse_sql("SELECT user_id, username FROM user_accounts WHERE active = 1")
| project formatted_sql = format_sql(parsed)

Run in Playground

Reformatting a complex query with JOINS

In this example, a more complex SQL statement involving an INNER JOIN between orders and customers tables is parsed. The query selects orders and customer names for orders placed after January 1, 2023. format_sql is then used to reformat the parsed structure into a SQL string.

hn 
| extend parsed = parse_sql("SELECT orders.order_id, customers.name FROM orders INNER JOIN customers ON orders.customer_id = customers.id WHERE orders.order_date > '2023-01-01'")
| project formatted_sql = format_sql(parsed)

Run in Playground

Using format_sql with aggregation functions

This example focuses on parsing an SQL statement that performs aggregation. It selects product IDs and counts of total sales from a sales table, grouping by product_id and having a condition on the count. After parsing, format_sql reformats the output into an SQL string.

hn 
| extend parsed = parse_sql("SELECT product_id, COUNT(*) as total_sales FROM sales GROUP BY product_id HAVING COUNT(*) > 100")
| project formatted_sql = format_sql(parsed)

Run in Playground