-
Notifications
You must be signed in to change notification settings - Fork 7.7k
Open
Labels
Description
create table x engine = MergeTree ORDER BY tuple() AS SELECT intDiv(number,100000) as a, number as b FROM numbers(1000000000);
SET prefer_localhost_replica = 0;
CREATE TABLE x_dist AS x ENGINE = Distributed(test_cluster_two_shards_localhost, currentDatabase(), x);
CREATE VIEW x_dist_v AS SELECT * FROM x_dist;
-- direct query from distributed table
select a, sum(b) from x_dist group by a ORDER BY sum(b) DESC limit 1;
--Stats: 1 row in set. Elapsed: 5.816 sec. Processed 2.00 billion rows, 32.00 GB (343.89 million rows/s., 5.50 GB/s.)
-- query from view
select a, sum(b) from x_dist_v group by a ORDER BY sum(b) DESC limit 1;
-- Stats: 1 row in set. Elapsed: 17.093 sec. Processed 2.00 billion rows, 32.00 GB (117.01 million rows/s., 1.87 GB/s.)
The differernce is caused by different local queries.
In case of the trivial view it pushes down the innermost subquery (i.e. view):
... executeQuery: ... SELECT `x`.`a`, `x`.`b` FROM `default`.`x` (stage: Complete)
In case of direct query from the distributed table - it constructs the subquery from the main one:
... executeQuery: ... SELECT `a`, sum(`b`) FROM `default`.`x` GROUP BY `a` ORDER BY sum(`b`) DESC LIMIT 1 (stage: WithMergeableState)
orloffv