背景

这是一篇博客阅读笔记，原博客Late row lookups: InnoDB写于 2011 年，作者是一个叫 Quassnoi 的俄罗斯人。

Quassnoi 在 2009 年时写了一篇文章讲 MySQL Limit 的性能优化(MySQL ORDER BY / LIMIT performance: late row lookups)，后来有读者提了两个问题：

Is this workaround specific to MyISAM engine?

How does PostgreSQL handle this?

Quassnoi 写下这篇新博客，即为了回答上述两个问题。

正文

The questions concerns a certain workaround for MySQL LIMIT … OFFSET queries like this:

1
2
3
4


SELECT *
FROM mytable
ORDER BY id
LIMIT 10 OFFSET 10000;

which can be improved using a little rewrite:

1
2
3
4
5
6
7
8


SELECT m.*
FROM (SELECT id
      FROM mytable
      ORDER BY id
      LIMIT 10 OFFSET 10000) q
         JOIN mytable m
              ON m.id = q.id
ORDER BY m.id;

注意：作者之前的文章讨论的是这个方法对 MyISAM 是有效的；问题是，对于 InnoDB 和 PostgreSQL 呢？

PostgreSQL

The Answer

The second questions is easy: PostgreSQL won’t pull the fields from the table until it really needs them. If a query involving an ORDER BY along with LIMIT and OFFSET is optimized to use the index for the ORDER BY part, the table lookups won’t happen for the records skipped.

这句话是说，PostgreSQL 只会在需要的时候才回表查询。如果一个查询涉及 ORDER BY、LIMIT 和 OFFSET，那么可以先利用索引跳过不需要的记录，只对需要的记录进行进行回表。

这其实就是『Late Row Lookups』；与之相对的，MySQL 执行的是『Early Row Lookups』。

作者后面说虽然 PostgreSQL 的查询计划不会输出回表信息，但可以通过一个简单的测试进行验证。但具体怎么进行这个实验，作者没讲，下面试着做个补充。

验证(存疑)

建表详情

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15


-- 建表
CREATE TABLE page
(
    id      SERIAL PRIMARY KEY,
    name    VARCHAR(16)  DEFAULT NULL,
    content VARCHAR(255) DEFAULT NULL
);
-- 为 name 字段创建二级索引
CREATE INDEX idx_name ON page (name);

-- 初始化 600 万条数据
INSERT INTO page (name, content)
    (SELECT CONCAT('小瓦', s.id) AS name,
            CONCAT('xx', s.id)   AS content
     FROM GENERATE_SERIES(1, 6000010) AS s(id));

执行 SQL 一：直接查询

1
2
3
4
5
6
7


postgres=# EXPLAIN ANALYZE SELECT * FROM page ORDER BY id OFFSET 6000000 LIMIT 10;
                                                              QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=199830.43..199830.76 rows=10 width=26) (actual time=540.002..540.003 rows=10 loops=1)
 ->  Index Scan using page_pkey on page  (cost=0.43..199846.81 rows=6000492 width=26) (actual time=0.087..427.982 rows=6000010 loops=1)
Planning Time: 0.108 ms
Execution Time: 540.022 ms

执行 SQL 二：使用子查询

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11


postgres=# EXPLAIN ANALYZE SELECT t1.* FROM page t1, (SELECT id FROM page ORDER BY id OFFSET 6000000 LIMIT 10) t2 where t1.id=t2.id ORDER BY t1.id;
                                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop  (cost=155811.47..155895.90 rows=10 width=26) (actual time=368.952..368.960 rows=10 loops=1)
 ->  Limit  (cost=155811.04..155811.30 rows=10 width=4) (actual time=368.941..368.942 rows=10 loops=1)
       ->  Index Only Scan using page_pkey on page  (cost=0.43..155823.81 rows=6000492 width=4) (actual time=0.015..248.584 rows=6000010 loops=1)
             Heap Fetches: 50
 ->  Index Scan using page_pkey on page t1  (cost=0.43..8.45 rows=1 width=26) (actual time=0.001..0.001 rows=1 loops=10)
       Index Cond: (id = page.id)
Planning Time: 0.375 ms
Execution Time: 369.004 ms

执行 SQL 三：使用子查询

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11


postgres=# EXPLAIN ANALYZE SELECT t1.* FROM page t1 join (SELECT id FROM page ORDER BY id OFFSET 6000000 LIMIT 10) t2 ON t1.id=t2.id ORDER BY t1.id;
                                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop  (cost=155811.47..155895.90 rows=10 width=26) (actual time=362.645..362.652 rows=10 loops=1)
 ->  Limit  (cost=155811.04..155811.30 rows=10 width=4) (actual time=362.633..362.634 rows=10 loops=1)
       ->  Index Only Scan using page_pkey on page  (cost=0.43..155823.81 rows=6000492 width=4) (actual time=0.017..243.148 rows=6000010 loops=1)
             Heap Fetches: 50
 ->  Index Scan using page_pkey on page t1  (cost=0.43..8.45 rows=1 width=26) (actual time=0.001..0.001 rows=1 loops=10)
       Index Cond: (id = page.id)
Planning Time: 4.609 ms
Execution Time: 362.707 ms

三个查询的耗时为：`540.022 ms` vs `369.004 ms` vs `362.707 ms`。 > Tips: 使用 `EXPLAIN ANALYZE` 既可以获得查询计划，又能执行语句。

结论

上述实验可以证明该优化对于 PostgreSQL 同样可以有效。

InnoDB

为了回答这个问题，作者首先创建了一张表。

建表

建表详情

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51


# 创建一张内存表
CREATE TABLE filler
(
    id INT NOT NULL PRIMARY KEY AUTO_INCREMENT
) ENGINE = Memory;

# 创建 InnoDB 表
CREATE TABLE lookup
(
    id       INT  NOT NULL PRIMARY KEY,
    value    INT  NOT NULL,
    shorttxt TEXT NOT NULL,
    longtxt  TEXT NOT NULL
) ENGINE = InnoDB
  ROW_FORMAT = COMPACT;

# 为 value 字段创建索引
CREATE INDEX
    ix_lookup_value
    ON lookup (value);

#  创建存储计划
DELIMITER $$
CREATE PROCEDURE prc_filler(cnt INT)
BEGIN
    DECLARE _cnt INT;
    SET _cnt = 1;
    WHILE _cnt <= cnt
        DO
            INSERT
            INTO filler
            SELECT _cnt;
            SET _cnt = _cnt + 1;
        END WHILE;
END
$$

# 初始化内存表
DELIMITER ;
START TRANSACTION;
CALL prc_filler(100000);
COMMIT;

# 初始化 InnoDB 表
INSERT
INTO lookup
SELECT id,
       CEILING(RAND(20110211) * 1000000),
       RPAD('', CEILING(RAND(20110211 << 1) * 100), '*'),
       RPAD('', CEILING(8192 + RAND(20110211 << 1) * 100), '*')
FROM filler;

上面利用一张内存表和存储计划创建了一张 InnoDB 表 `lookup`：这张表包含一个加了索引的 `INT` 列，以及两个 `TEXT` 列，其中 shorttxt 存储短字符串(包含 1~100 个字符)，longtxt 存储长字符串(包含 8193~8293 个字符)。

通过主键索引查询 value 和 shottxt 两个字段时，是否使用子查询优化对耗时影响不大，略去不讨论。

通过主键索引查询 longtxt 列

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17


# Rewrite
SELECT LENGTH(l.longtxt)
FROM (SELECT id
      FROM lookup
      ORDER BY id
      LIMIT 10 OFFSET 90000) q
         JOIN lookup l
              ON l.id = q.id
ORDER BY q.id;
# 10 rows in set (0.133 sec)

# No rewrite
SELECT LENGTH(longtxt)
FROM lookup
ORDER BY id
LIMIT 10 OFFSET 90000;
# 10 rows in set (1.579 sec)

0.133 sec vs 1.579 sec。

Why such a difference?

The reason is that InnoDB, despite the fact it stores the data in the clustered index, is still able to move > some data out of the index. This is called external storage.

在 InnoDB 中，小于 768 字节的列会全部存储在页上，大于 768 字节则会分开存储。在上面的 lookup 表中，shottxt 列总是 on-page 存储的，而 longtxt 则是 off-page的。

因此，直接查询 longtxt 时，每扫描一条记录都要出发两次page lookups：第一次查聚簇索引，第二次查外部存储。这既会花费很多时间，也可能破坏 InnoDB 缓存，导致缓存命中率下降。

通过二级索引查询 shorttxt 列

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16


# Rewrite
SELECT LENGTH(l.shorttxt)
FROM (SELECT id, value
      FROM lookup
      ORDER BY value
      LIMIT 10 OFFSET 90000) q
         JOIN lookup l
              ON l.id = q.id
ORDER BY q.value;
# 10 rows in set (0.022 sec)

# No rewrite
SELECT LENGTH(shorttxt)
FROM lookup
ORDER BY value
LIMIT 10 OFFSET 90000;# 10 rows in set (0.209 sec)

0.022 sec vs 0.209 sec，耗时相差 10 倍。 InnoDB的二级索引查询有点类似MyISAM，都需要一次额外的回表查询去获取正在的数据。

上面第一个查询对于跳过的记录不会执行回表查询，因此速度是原来的 10 倍。它甚至比在主键索引上的查询(0.044 sec)还快，原因是二级索引包含的数据比主键索引更少，每页保存的数据更多，因此索引可能更加矮胖，扫描速度更快。

当然，二级索引上同样是Early Row Lookups。

结论

A trick used to avoid early row lookups for the LIMIT … OFFSET queries is useful on InnoDB tables too, though to different extent, depending on the ORDER BY condition and the columns involved:

It’s very useful on queries involving columns stored off-page (long TEXT, BLOB and VARCHAR columns)

It’s very useful on ORDER BY conditions served by secondary indexes

It’s quite useful on moderate sized columns (still stored on page) or CPU-intensive expressions

It’s almost useless on short columns without complex CPU-intensive processing

其它

PostgreSQL查询计划

除第一行以外每个->表示一个子动作
查询计划的阅读顺序都是从后往前
cost 由 .. 分割成两个数字，第一个数字表示启动成本，即返回第一行的成本；第二个数字表示返回所有数据的成本。
rows 表示返回行数
width 表示每行平均宽度
loops 表示索引扫描被执行过几次

Reference

Documentation: 15: 14.1. Using EXPLAIN - PostgreSQL

Late Row Lookups: InnoDB

博客『Late Row Lookups: InnoDB』的阅读笔记

背景

正文