Postgres: Using Child Tables and Triggers

PostgreSQL provides for table inheritance which provides an extremely powerful tool to help break large tables into manageable subsets of data. Creating child tables using the inheritance functionality may be used to address performance by querying less data as each child table could split from the parent by a frequently used key thereby only involving a subset of data in a query. Child tables may also be physically located on different media, thus spreading the reads across several disks by locating each child table in a different partition and locating the partitions on different disks. But with all the power given to the developer or DBA comes some other complexities, one of which is cascading triggers.

To illustrate the issues involved with utilizing child tables, assume there is a parent table defined as:

create table parent
(
  some_data    text,
  date_of_data timestamp
);

The timestamp will be an important column to break up the data, for the assumed use case will be to query using the date to select some_data within a range of dates. It could also be expected that updates will occur on a date boundary such as a nightly insert process. This is where a problem may begin to impact the performance of the PostgreSQL database for with enough updates, PostgreSQL will kick off a vacuum process. Assuming that there is enough volume of data for this to be an issue, it is not unreasonable that a vacuum would execute for days at a time. This is just one example of a reason to break up the table, and a typical axis to break the data in a table such as parent would be upon the datestamp column date_of_data.

One simple approach could be to create a child table for each quarter (or month, etc.; however the by quarter approach will suffice to demonstrate the cascading triggers problem). Going with a table per quarter approach, four child tables must be created.

create table child_q1 () inherits (parent);
create table child_q2 () inherits (parent);
create table child_q3 () inherits (parent);
create table child_q4 () inherits (parent);

Each child table then corresponds to three months of data, but there is no way to enforce external inserts to abide by this concept of one table per quarter. To enforce that inserted data is placed into the expected table, a trigger may be created to insert any new rows into the desired child table.

First, define a trigger:

create or replace function parent_insert_trigger()
  returns trigger as
$BODY$
  declare quarter integer;
begin
  select extract(QUARTER from NEW.date_of_data::timestamp) into quarter;

  if (quarter = 1) then
     insert into child_q1 values (NEW.*);
  elseif (quarter = 2) then
     insert into child_q2 values (NEW.*);
  elseif (quarter = 3) then
     insert into child_q3 values (NEW.*);
  elseif (quarter = 4) then
     insert into child_q4 values (NEW.*);
  else
     raise exception 'Unexpected quarter of the year!';
  end if;
  return null;
end;
$BODY$
  language plpgsql;

After the trigger is defined, it must get connected to the parent table.

create trigger insert_parent_trigger
  before insert
  on parent
  for each row
  execute procedure parent_insert_trigger();

Once the trigger is hooked up with the parent table, an insert statement such as:

insert into parent (some_data, date_of_data) values ('This is some data', '2015-01-05 20:38:40');

Will add the new row into the child_q1 table and not the parent table as the insert into the parent table is redirected into child_q1 by the trigger. Note that queries on the parent table will appear to have the newly inserted record:

testdb=# select * from child_q1;
     some_data     |    date_of_data     
-------------------+---------------------
 This is some data | 2015-01-05 20:38:40
(1 row)
testdb=# select * from parent;
     some_data     |    date_of_data     
-------------------+---------------------
 This is some data | 2015-01-05 20:38:40
(1 row)

The row has not been duplicated but rather the use of inherits functionality allows queries on parent to reach into the child tables thus producing a result for a query on the parent. This allows for queries from external clients to be essentially blind to the fact that the data is actually stored in different physical tables.

With the function created and hooked up with the parent tables, inserts into parent get correctly split into child tables with no knowledge of the external client issuing the inserts that there are multiple physical tables behind parent actually containing the data. All appears to be well, until one day and enterprising developer attempts to "optimize"" the insert by writing directly to the child table - and nothing within the database schema prevents data with an incorrect data being placed into the wrong child table. Is it now possible to write directly into the child table, such as a statement like:

insert into child_q4 (some_data, date_of_data) values ('This is some data different data', '2015-01-05 20:38:40');

One the above insert statement is issued, the concept of a quarters' worth of data being stored in each child table is now broken, and benefits of the approach have been destroyed. It is necessary to guard against this unintentional destruction with an additional trigger on insert. Each of the child tables will require an insert trigger to enforce the distribution of data for each timespan.

create or replace function child_q1_insert_trigger()
  returns trigger as
$BODY$
  declare quarter integer;
begin
  select extract(QUARTER from NEW.date_of_data::timestamp) into quarter;

  if (quarter = 1) then
     insert into child_q1 values (NEW.*);
  else
     raise exception 'Expected Q1';
  end if;
  return null;
end;
$BODY$
  language plpgsql;

create or replace function child_q2_insert_trigger()
  returns trigger as
$BODY$
  declare quarter integer;
begin
  select extract(QUARTER from NEW.date_of_data::timestamp) into quarter;

  if (quarter = 2) then
     insert into child_q2 values (NEW.*);
  else
     raise exception 'Expected Q2';
  end if;
  return null;
end;
$BODY$
  language plpgsql;

create or replace function child_q3_insert_trigger()
  returns trigger as
$BODY$
  declare quarter integer;
begin
  select extract(QUARTER from NEW.date_of_data::timestamp) into quarter;

  if (quarter = 3) then
     insert into child_q3 values (NEW.*);
  else
     raise exception 'Expected Q3';
  end if;
  return null;
end;
$BODY$
  language plpgsql;

create or replace function child_q4_insert_trigger()
  returns trigger as
$BODY$
  declare quarter integer;
begin
  select extract(QUARTER from NEW.date_of_data::timestamp) into quarter;

  if (quarter = 4) then
     insert into child_q4 values (NEW.*);
  else
     raise exception 'Expected Q4';
  end if;
  return null;
end;
$BODY$
  language plpgsql;

And each trigger function needs to be connected with the appropriate child table.

create trigger insert_child_q1_trigger
  before insert
  on child_q1
  for each row
  execute procedure child_q1_insert_trigger();

create trigger insert_child_q2_trigger
  before insert
  on child_q2
  for each row
  execute procedure child_q2_insert_trigger();

create trigger insert_child_q3_trigger
  before insert
  on child_q3
  for each row
  execute procedure child_q2_insert_trigger();

create trigger insert_child_q4_trigger
  before insert
  on child_q4
  for each row
  execute procedure child_q4_insert_trigger();

At this point, the parent table redirects inserts into the correct child table per quarter, and each child table enforces that it is only accepting data where the date is within the quarter of data the child table contains. On the surface, it would appear that the problem is solved. Unfortunately, a new problem has arisen, for inserts into parent are now failing with:

insert into parent (some_data, date_of_data) values ('This is some data', '2015-01-15 08:42:16');

ERROR:  stack depth limit exceeded
HINT:  Increase the configuration parameter "max_stack_depth" (currently 2048kB), after ensuring
the platform's stack depth limit is adequate.
CONTEXT:  SQL statement "select extract(QUARTER from NEW.date_of_data::timestamp)"
PL/pgSQL function child_q1_insert_trigger() line 4 at SQL statement
SQL statement "insert into child_q1 values (NEW.*)"

The error of max_stack_depth exceeded is a little misleading if the hint is followed, for no amount of increase of the max_stack_depth parameter will solve this problem. What has happened is that the insert trigger connected to the parent table is invoking an insert upon the child_q1 table, which in turn invokes its insert trigger and since the insert into child_q1 is on a table inherited from parent, the original insert trigger on the parent table is again invoked. This cascading trigger execution is not by default handled by PostgreSQL, and it is up to the developer to ensure this problem does not occur.

The solution is simple, if a bit arcane. What is necessary is that the triggers only be invoked if they are directly invoked and thankfully PostgreSQL provides a function called pg_trigger_depth() for precisely this condition.

To solve, the triggers must first be unhooked from the tables:

drop trigger if exists insert_parent_trigger on parent;
drop trigger if exists insert_child_q1_trigger on child_q1;
drop trigger if exists insert_child_q2_trigger on child_q2;
drop trigger if exists insert_child_q3_trigger on child_q3;
drop trigger if exists insert_child_q4_trigger on child_q4;

And then the triggers must be added back, with the addition of pg_trigger_depth():

create trigger insert_parent_trigger
  before insert
  on parent
  for each row
  when (pg_trigger_depth() = 0)
  execute procedure parent_insert_trigger();

create trigger insert_child_q1_trigger
  before insert
  on child_q1
  for each row
  when (pg_trigger_depth() = 0)
  execute procedure child_q1_insert_trigger();

create trigger insert_child_q2_trigger
  before insert
  on child_q2
  for each row
  when (pg_trigger_depth() = 0)
  execute procedure child_q2_insert_trigger();

create trigger insert_child_q3_trigger
  before insert
  on child_q3
  for each row
  when (pg_trigger_depth() = 0)
  execute procedure child_q2_insert_trigger();

create trigger insert_child_q4_trigger
  before insert
  on child_q4
  for each row
  when (pg_trigger_depth() = 0)
  execute procedure child_q4_insert_trigger();

The pg_trigger_depth() function returns zero when the trigger is directly invoked, and thus stops in cascading between the two insert trigger functions. After installing the triggers using pg_trigger_depth(), the insert functions as expected and the enterprising developer has a bit of refactoring to do to honor the expected constraints of the child tables. Using the pg_trigger_depth() function solved the problem of the cascading triggers.