For example, a local table can inherit a foreign table:
-- Create a new table which inherits from the foreign table
# CREATE TABLE local_customers () INHERITS (remote.customers);-- Insert some data into it
# INSERT INTO local_customers VALUES (16, 'Bruce',$$Jo's Cupcakes$$, '2015-01-15', '2017-01-14', true, 'running', 'basic');-- And if we query the parent foreign table...
# SELECT tableoid::regclass, * FROM remote.customers; tableoid | id | name |company| registered_date | expiry_date | active | status | account_level ------------------+----+-------+---------------+-----------------+-------------+--------+---------+--------------- remote.customers | 1 | James | Hughson Corp | 2014-05-03 | 2016-05-02 | t | idle| premium local_customers | 16 | Bruce | Jo's Cupcakes | 2015-01-15 | 2017-01-14 | t | running | basic (2 rows)Or a foreign table can be made to inherit from a local table:
-- Create a new table that the foreign table will be a child of
# CREATE TABLE master_customers (LIKE remote.customers);-- Insert a new row into this table
# INSERT INTO master_customers VALUES (99, 'Jolly',$$Cineplanet$$, '2014-10-30', '2016-10-29', true, 'running', 'premium');-- Have the foreign table inherit from the new table
# ALTER TABLE remote.customers INHERIT master_customers;-- Let's have a look at the contents of the new table now
# SELECT tableoid::regclass, * FROM master_customers; tableoid | id | name |company| registered_date | expiry_date | active | status | account_level ------------------+----+-------+---------------+-----------------+-------------+--------+---------+--------------- master_customers | 99 | Jolly | Cineplanet| 2014-10-30 | 2016-10-29 | t | running | premium remote.customers | 1 | James | Hughson Corp | 2014-05-03 | 2016-05-02 | t | idle| premium local_customers | 16 | Bruce | Jo's Cupcakes | 2015-01-15 | 2017-01-14 | t | running | basic (3 rows)-- And the query plan...
# EXPLAIN ANALYSE SELECT tableoid::regclass, * FROM master_customers; QUERY PLAN--------------------------------------------------------------------------------------------------------------------------- Result (cost=0.00..140.80 rows=1012 width=145) (actual time=0.014..0.595 rows=3 loops=1)-> Append (cost=0.00..140.80 rows=1012 width=145) (actual time=0.012..0.591 rows=3 loops=1) -> Seq Scan on master_customers (cost=0.00..1.48 rows=48 width=145) (actual time=0.012..0.013 rows=1 loops=1) -> Foreign Scan on customers (cost=100.00..124.52 rows=484 width=145) (actual time=0.567..0.567 rows=1 loops=1) -> Seq Scan on local_customers (cost=0.00..14.80 rows=480 width=145) (actual time=0.007..0.008 rows=1 loops=1) Planning time: 0.256 ms Execution time: 1.040 ms (7 rows)ALLOW FOREIGN TABLES TO PARTICIPATE IN INHERITANCE:
Foreign tables can now be inheritance children, or parents. Much of thesystem was already ready for this, but we had to fix a few things ofcourse, mostly in the area of planner and executor handling of row locks. As side effects of this, allow foreign tables to have NOT VALID CHECKconstraints (and hence to accept ALTER ... VALIDATE CONSTRAINT), and toaccept ALTER SET STORAGE and ALTER SET WITH/WITHOUT OIDS. Continuing todisallow these things would've required bizarre and inconsistent specialcases in inheritance behavior. Since foreign tables don't enforce CHECKconstraints anyway, a NOT VALID one is a complete no-op, but that doesn'tmean we shouldn't allow it. And it's possible that some FDWs might haveuse for SET STORAGE or SET WITH OIDS, though doubtless they will be no-opsfor most. An additional change in support of this is that when a ModifyTable nodehas multiple target tables, they will all now be explicitly identifiedin EXPLAIN output, for example: Update on pt1 (cost=0.00..321.05 rows=3541 width=46) Update on pt1 Foreign Update on ft1 Foreign Update on ft2 Update on child3 -> Seq Scan on pt1 (cost=0.00..0.00 rows=1 width=46) -> Foreign Scan on ft1 (cost=100.00..148.03 rows=1170 width=46) -> Foreign Scan on ft2 (cost=100.00..148.03 rows=1170 width=46) -> Seq Scan on child3 (cost=0.00..25.00 rows=1200 width=46) This was done mainly to provide an unambiguous place to attach "Remote SQL"fields, but it is useful for inherited updates even when no foreign tablesare involved.
Initially I almost dismissed it. Another cool feature for hard-code nerds. But then it hit me. This is really big. Why? Read on …Let's see how to get it working, and what we can use it forI'll make myself 5 databases, named master and shard_1, shard_2, shard_3, and shard_4.Of course in real life scenario the databases would be on separate machines/instances, but this is just test environment, so it has to be enough.Now, in each of these databases, I'll make a poor-man sharding table for users:It creates the users table, but in each shard it will give different ids. In shard 1 it will be 1, 5, 9, …, in shard 2 it will be 2, 6, 10, … and so on.With the tables in place, let's make a master table and make all other inherit from it.Tedious. But it can be make more automatic.Now, with this in place I should be able to select, insert, update, and delete rows directly to shards:Immediately I can see a problem – my sequences on shards do not give the ids I thought they will. Reason is simple – ID is assigned on master. So, to make it work sensibly, I would have to make 4 sequences on master, and do the magic there. Given that values up to 8 are already taken, I'll have to change restart values too:And now, new inserts, will have ids assigned correctly:Better.Now, thanks to this new patch, I can scan them all together:Delete rows from them:and of course update them:Now, you have to notice that sharding in this way doesn't really make the queries faster – all foreign scans are done sequentially.But, assuming some a bit larger dataset:We can see that with fast queries it still makes sense.[master] $ explain analyze select * from users where id = 123124;QUERY PLAN———————————————————————————————————————-Append (cost=0.00..508.81 rows=29 width=36) (actual time=4.270..4.271 rows=1 loops=1)-> Seq Scan on users (cost=0.00..0.00 rows=1 width=36) (actual time=0.001..0.001 rows=0 loops=1)Filter: (id = 123124)-> Foreign Scan on users_shard_1 (cost=100.00..127.20 rows=7 width=36) (actual time=1.020..1.020 rows=0 loops=1)-> Foreign Scan on users_shard_2 (cost=100.00..127.20 rows=7 width=36) (actual time=1.247..1.247 rows=0 loops=1)-> Foreign Scan on users_shard_3 (cost=100.00..127.20 rows=7 width=36) (actual time=1.004..1.004 rows=0 loops=1)-> Foreign Scan on users_shard_4 (cost=100.00..127.20 rows=7 width=36) (actual time=0.993..0.994 rows=1 loops=1)Planning time: 0.349 msExecution time: 6.070 ms(9 rows)This doesn't look so good. But, PostgreSQL has built in partitioning. Which we can now use for some great effects:and with this in place:This is much nicer.Only two “tables" got scanned – users – empty parent of all shards, and the single shard that we needed.If I used partition/sharding key differently (using ranges for example) PostgreSQL could have be even smart enough to pick the right shard without guiding with (id % 4) = (123124 % 4).But this little thing doesn't really spoil the fun. Lo and behold it looks like we got sharding ( of course my choice of names for databases did spoil the fun a bit ) solution that will actually work.With some simple triggers on users() table (to redirect inserts to partitions/shards) we'll be golden.The only thing missing would be some way to run the foreign scans in parallel (when there is more than one). Maybe it will materialize one day, we'll see.
0 comments:
Post a Comment