Advanced joining techniques: side joints, half-joints, anti-slip joints

Share

# Entry

INNER JOIN and LEFT JOIN support most SQL queries. A smaller class of problems requires different ones connection types: counting the results of a function returning a row-by-row set, filtering rows by existing in another table, and returning rows that do not match another table.

Three less popular connections handle them cleanly. LATERAL joins allow a subquery in a FROM clause to refer to columns from an earlier part of the same FROM clause. Partially combines returned rows where there is a match in another table, without duplicating those rows. The Anti Joins function returns rows that have no match.

Let’s look at how to put these patterns into practice.

Advanced joining techniques

# SIDE CONNECTIONS

A LATERAL subquery in a FROM clause can refer to columns from preceding tables in the same FROM clause. Without LATERAL, the subquery in FROM is evaluated independently and cannot see these columns.

This is most crucial when calling a function that returns a set (one that returns multiple rows as input). Set-returning functions can be called in a SELECT list, but to apply them row-by-row to a column from an external table inside a FROM clause, a LATERAL function is required.

Typical cases:

  • Vocation unnest() in an array column to get one row per array element
  • Vocation p_matches() With 'g' flag to extract every match in the row
  • Calculating top N score per group with correlated subquery in FROM
  • Splitting JSON arrays per line

// Example: Counting word occurrences

This is a Google question asks us to count how many times the words “bull” and “bear” appear in wa contents column. Matches cannot be case-sensitive, and substrings such as bullish or placental should be excluded.

Data: : google_file_store the table is:

filename contents
draft1.txt The stock exchange is forecasting a boom, which will make many investors elated.
draft2.txt The stock exchange predicts a boom, but analysts warn that we are waiting for a bear market.
final.txt The stock market predicts a boom… a bear market. As always, predicting the future of the market is uncertain…

Code: : p_matches() returns one row per match. To run it once in a row google_file_store and we will count all matches in the entire table, we will put it in the FROM clause of LATERAL. The m AND M there are anchors PostgreSQL word boundaries, which excludes the words “bullish” and “bullish”.

SELECT 'bull' AS word,
       COUNT(*) AS nentry
FROM google_file_store,
     LATERAL p_matches(LOWER(contents), 'm(bull)M', 'g')
UNION ALL
SELECT 'bear' AS word,
       COUNT(*) AS nentry
FROM google_file_store,
     LATERAL p_matches(LOWER(contents), 'm(bear)M', 'g');

// Exit

word entry
bull 3
bear 2

# Partially connects

A semi-join returns rows from the left table if there is at least one match in the right table, with each row from the left table appearing at most once. INNER JOIN duplicates the rows of the left table when there are multiple matches on the right. Semi-joining no.

Two SQL implementations:

  • WHERE EXISTS (SELECT 1 FROM ...)
  • WHERE col IN (SELECT col FROM ...)

EXISTS is a more general form because it supports multi-column join conditions and correlated subqueries without rewriting the query.

// Example: finding high-value customers

This is a question asks us to find customers who have placed at least one order over $100 and send us their customer ID and name.

Data: Previews online_store_customers AND online_store_orders: :

customer_id customer_name
1 Alice Johnson
2 Bob Smith
3 Charles Williams
10 Jack Anderson
order_id customer_id sum status
101 1 150 Paid Parking
102 1 200 Paid Parking
103 1 75 Paid Parking
115 9 450 Paid Parking

Code: The EXISTS subquery checks for each customer if there is any order over $100. SELECT 1 is a convention because EXISTS only cares about whether a row returns, not what is in it.

SELECT
    c.customer_id,
    c.customer_name
FROM online_store_customers c
WHERE EXISTS (
    SELECT 1
    FROM online_store_orders o
    WHERE o.customer_id = c.customer_id
      AND o.amount > 100
);

If we used INNER JOIN instead, customer 1 would appear twice in the result because the two orders matched. EXISTS returns client 1 once.

// Exit

customer_id customer_name
1 Alice Johnson
2 Bob Smith
3 Charles Williams
9 Iva Taylor

# Anti Joins

The anti-merge function returns rows from the left table that have no match in the right table. This is the reverse of a half-join.

Two SQL implementations:

  • LEFT JOIN ... WHERE right_table.col IS NULL
  • WHERE NOT EXISTS (SELECT 1 FROM ...)

Both give the same result. NOT EXISTS often creates a better query plan in contemporary versions of PostgreSQL and reads more directly. The LEFT JOIN + IS NULL pattern is older and useful when you also need right-hand columns for non-matching rows.

// Example: Free users without April connections

This is a question asks us to refund free users who did not make any calls in April 2020.

Data: Previews rc_calls AND rc_users: :

user_id connection_id connection_date
1218 0 2020-04-19 01:06:00
1554 1 2020-03-01 16:51:00
1857 2 2020-03-29 07:06:00
1525 3 2020-03-07 02:01:00
1910 39 2020-03-11 08:33:00
user_id status company_id
1218 free 1
1554 inactive 1
1857 free 2
1884 free 1

Code: The date filter is in the ON clause, not the WHERE clause. This distinction makes it an anti-combination. Placing a date filter in WHERE would remove the rows where the LEFT JOIN produced NULL values, collapsing them back to INNER JOIN. When the filter is enabled, free users without a qualifying April connection still create a row with NULL values ​​on the right, and the IS NULL check only retains those rows.

SELECT DISTINCT u.user_id
FROM rc_users u
LEFT JOIN rc_calls c
       ON u.user_id = c.user_id
      AND c.call_date BETWEEN '2020-04-01' AND '2020-04-30'
WHERE u.status="free"
  AND c.user_id IS NULL;

// Exit

# Application

Advanced joining techniques

These three joins solve cases where INNER JOIN and LEFT JOIN are awkward or wrong:

  • LATERAL is a way of calling functions that return a line-by-line set inside FROM.
  • EXISTS gives “match rows” without the duplication that INNER JOIN causes.
  • NOT EXISTS or LEFT JOIN + IS NULL gives neat “rows with no match”.

The formula to remember is tiny. When INNER JOIN duplicates unnecessary rows, exploit EXISTS. If you need rows that don’t match, exploit NOT EXISTS or LEFT JOIN + IS NULL. When a subquery in FROM needs to reference columns from an external table, add LATERAL.

Practice them live SQL interview questionsand the syntax becomes automatic.

Nate Rosidi is a data scientist and product strategist. He is also an adjunct professor of analytics and the founder of StrataScratch, a platform that helps data scientists prepare for job interviews using real interview questions from top companies. Nate writes about the latest career trends, gives interview advice, shares data science projects, and discusses everything related to SQL.

Latest Posts

More News