The union operator in Axiom Processing Language (APL) combines events from two or more datasets and returns the rows from each in a single result set. This simplifies querying and analysis when related data is spread across several datasets.

Importance of the union operator

The union operator matters because it lets a single query combine data from multiple sources. This is useful when datasets contain related or complementary data, and it simplifies queries that need a comprehensive view of data points collected from those sources.

Scenario: union of two datasets

To understand how the union operator works, consider these datasets:

Server requests

_time  status  method  trace_id
12:10  200     GET     1
12:15  200     POST    2
12:20  503     POST    3
12:25  200     POST    4

App logs

_time  trace_id  message
12:12  1         foo
12:21  3         bar
13:35  27        baz

Performing a union on Server requests and App logs results in a new dataset with all the rows from both datasets.

A union of requests and logs would produce the following result set:

_time  status  method  trace_id  message
12:10  200     GET     1
12:12                  1         foo
12:15  200     POST    2
12:20  503     POST    3
12:21                  3         bar
12:25  200     POST    4
13:35                  27        baz

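This result combines the rows of both datasets and merges types for overlapping fields. Fields that exist in only one dataset, such as message, are left empty in rows that come from the other dataset.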
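The behavior in this scenario can be modeled outside of APL. The following is a minimal Python sketch, not Axiom's implementation: the `union` function is a hypothetical helper, missing fields are represented as `None`, and the rows are sorted by `_time` only to mirror the result table above (this is an assumption of the sketch, not a statement about the ordering APL guarantees).

```python
# Rows copied from the "Server requests" and "App logs" example tables.
server_requests = [
    {"_time": "12:10", "status": 200, "method": "GET", "trace_id": 1},
    {"_time": "12:15", "status": 200, "method": "POST", "trace_id": 2},
    {"_time": "12:20", "status": 503, "method": "POST", "trace_id": 3},
    {"_time": "12:25", "status": 200, "method": "POST", "trace_id": 4},
]
app_logs = [
    {"_time": "12:12", "trace_id": 1, "message": "foo"},
    {"_time": "12:21", "trace_id": 3, "message": "bar"},
    {"_time": "13:35", "trace_id": 27, "message": "baz"},
]


def union(*datasets):
    """Hypothetical model of a union: keep every row from every dataset and
    give each row the full set of fields, using None where a field is absent."""
    rows = [row for dataset in datasets for row in dataset]

    # Collect field names in first-seen order so the output columns are stable.
    fields = []
    for row in rows:
        for name in row:
            if name not in fields:
                fields.append(name)

    combined = [{name: row.get(name) for name in fields} for row in rows]
    # Zero-padded HH:MM strings sort correctly as plain text.
    return sorted(combined, key=lambda row: row["_time"])


result = union(server_requests, app_logs)
for row in result:
    print(row)
```

Each printed row carries all five fields, and no rows are dropped or matched against each other: a union appends rows, it does not join them on trace_id.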

Let’s explore some examples of the union operator in action to better understand its practical applications:

Filtering and projecting specific data from combined log sources

This query combines GitHub pull request events and GitHub push events, filters for actions made by github-actions[bot], and displays key event details: time, repository, id, commits, and head.

['github-pull-request-event']
| union ['github-push-event']
| where actor == "github-actions[bot]"
| project _time, repo, ['id'], commits, head

Union with field removal

Combines the datasets sample-http-logs and github-push-event, then removes the content_type and commits fields from the combined result.

['sample-http-logs']
| union ['github-push-event']
| project-away content_type, commits

Filtering after union

Performs a union and then filters the resulting set to only include rows where the method is GET.

['sample-http-logs']
| union ['github-issues-event']
| where method == "GET"

Union with order by

After the union, the result is ordered by the type field.

['sample-http-logs']
| union hn
| order by type

Union with joint conditions

Performs a union and then filters the resulting dataset for rows where content_type contains the letter a and geo.city is Seattle.

['sample-http-logs']
| union ['github-pull-request-event']
| where content_type contains "a" and ['geo.city']  == "Seattle"

Union and counting unique values

After the union, the query calculates the number of unique geo.city and repo entries in the combined dataset.

['sample-http-logs']
| union ['github-push-event']
| summarize UniqueNames = dcount(['geo.city']), UniqueData = dcount(repo)

Benefits of the union operator

  • Combine logs from different system components to pinpoint common issues or failures.
  • Consolidate security events from various sources to identify patterns and threats.
  • Unify events from product touch points for a richer understanding of user behavior.
  • Keep related events in logically separate datasets while still searching across them conveniently.

Best practices of the union operator

To maximize the effectiveness of the union operator in APL, here are some best practices to consider:

  • Type compatibility: Before using the union operator, ensure that the fields being merged have compatible data types.

  • Field projection: Use project or project-away to include or exclude specific fields. This can improve performance and the clarity of your results, especially when you only need a subset of the available data.
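One way to act on the type compatibility advice is to inspect a small sample of rows from each dataset before writing the union. The following Python sketch is an illustration under assumptions, not part of APL or any Axiom tooling: `conflicting_fields` is a hypothetical helper, and the sample rows are invented so that trace_id is a number in one dataset and a string in the other.

```python
def conflicting_fields(left, right):
    """Hypothetical check: for fields present in both samples, report those
    whose observed value types differ between the two samples."""

    def types_by_field(rows):
        seen = {}
        for row in rows:
            for name, value in row.items():
                if value is not None:  # ignore empty values
                    seen.setdefault(name, set()).add(type(value).__name__)
        return seen

    left_types = types_by_field(left)
    right_types = types_by_field(right)
    shared = left_types.keys() & right_types.keys()
    return {
        name: (sorted(left_types[name]), sorted(right_types[name]))
        for name in shared
        if left_types[name] != right_types[name]
    }


# Invented sample rows: trace_id is an int on one side and a str on the other.
requests_sample = [{"_time": "12:10", "status": 200, "trace_id": 1}]
logs_sample = [{"_time": "12:12", "trace_id": "1", "message": "foo"}]

conflicts = conflicting_fields(requests_sample, logs_sample)
print(conflicts)
```

A field reported here is a candidate for conversion or exclusion before the union, so that the overlapping field holds one consistent type in the combined result.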

Ready to use the power of the union operator in your datasets? Start integrating these practices into your APL workflows today and transform your querying experience!