Blog · Loji44AboutTAGSRSS🔍SEARCH

最近遇到一个MySQL分页+排序时,MySQL返回数据重复的问题,记录一下场景和自己当时的理解。

1. 问题(MySQL 5.7)

有一张数据表:业务线数据表 business_line 的内容如下:

mysql> SELECT * FROM business_line;
+----+---------------+------------+--------+---------------------+---------------------+
| id | buz_line_name | company_id | status | created_time        | updated_time        |
+----+---------------+------------+--------+---------------------+---------------------+
|  1 | 业务线1        |         10 |      0 | 2019-12-10 11:07:04 | 2019-12-12 10:47:04 |
|  2 | 业务线2        |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:21 |
|  3 | 业务线3        |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:22 |
|  4 | 业务线4        |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:23 |
|  5 | 业务线5        |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:24 |
|  6 | 业务线6        |         11 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:25 |
|  7 | 业务线7        |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:26 |
|  8 | 业务线8        |         11 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:27 |
|  9 | 业务线9        |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:28 |
| 10 | 业务线10       |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:30 |
| 11 | 业务线11       |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:34 |
| 12 | 业务线12       |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:37 |
+----+---------------+------------+--------+---------------------+---------------------+

created_time字段设计为DEFAULT CURRENT_TIMESTAMP,且由于表中的数据是使用INSERT语句批量一次性插入数据表中,所以每一行数据中的created_time字段值都是2019-12-10 11:07:04,这是前提。

我的需求是在前端页面使用分页 + 按created_time降序排序的查询规则来查询并展示这个表的数据,分页语句为:SELECT * FROM business_line ORDER BY created_time DESC LIMIT #{count} OFFSET #{offset};

现在查询第一页的数据(每页10条,offset为 0):

mysql> SELECT * FROM business_line ORDER BY created_time DESC LIMIT 10 OFFSET 0;
+----+---------------+------------+--------+---------------------+---------------------+
| id | buz_line_name | company_id | status | created_time        | updated_time        |
+----+---------------+------------+--------+---------------------+---------------------+
| 12 | 业务线12       |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:37 |
| 10 | 业务线10       |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:30 |
|  9 | 业务线9        |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:28 |
|  8 | 业务线8        |         11 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:27 |
|  7 | 业务线7        |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:26 |
|  6 | 业务线6        |         11 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:25 |
|  5 | 业务线5        |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:24 |
|  4 | 业务线4        |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:23 |
|  3 | 业务线3        |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:22 |
|  2 | 业务线2        |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:21 |
+----+---------------+------------+--------+---------------------+---------------------+

嗯 … 看着好像没啥问题。继续查询第二页数据(每页10条,offset为 10):

mysql> SELECT * FROM business_line ORDER BY created_time DESC LIMIT 10 OFFSET 10;
+----+---------------+------------+--------+---------------------+---------------------+
| id | buz_line_name | company_id | status | created_time        | updated_time        |
+----+---------------+------------+--------+---------------------+---------------------+
|  2 | 业务线2        |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:21 |
| 12 | 业务线12       |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:37 |
+----+---------------+------------+--------+---------------------+---------------------+

问题来了:第二页居然查出重复的数据,已知业务线2业务线12在第一页查询的时候已经查出来了。总共就两页数据,反倒是业务线1业务线11始终没有查出来!多么窒息的操作 …

2. 问题原因追踪

翻阅MySQL官网使用手册,在 8.2.1.17 LIMIT Query Optimization 一章中找到一些原因:

If multiple rows have identical values in the ORDER BY columns, the server is free to return those rows in any order, and may do so differently depending on the overall execution plan. In other words, the sort order of those rows is nondeterministic with respect to the nonordered columns.

这里意思是,如果 ORDER BY 所指定的排序字段中存在多行的值相同(就是上述遇到的情况),针对未指定排序的字段,MySQL返回的结果行数据的顺序是不确定的。既然返回的数据顺序不确定,那么在LIMIT取数据的时候,就有可能取到重复的数据,这就解释了我们遇到的情况为什么是这样的。可能MySQL就是这样设计的吧🤷‍♂️🤷‍♂️🤷‍♂️。

另一个影响数据顺序的因素是LIMIT子句,ORDER BY的时候,后面有LIMIT和没有LIMIT时返回数据的顺序是不一样的。

-- 注意:ORDER BY 后面没有跟 LIMIT 语句时返回的数据顺序:

mysql> SELECT * FROM business_line ORDER BY created_time;
+----+---------------+------------+--------+---------------------+---------------------+
| id | buz_line_name | company_id | status | created_time        | updated_time        |
+----+---------------+------------+--------+---------------------+---------------------+
|  1 | 业务线1        |         10 |      0 | 2019-12-10 11:07:04 | 2019-12-12 10:47:04 |
| 11 | 业务线11       |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:34 |
| 10 | 业务线10       |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:30 |
|  9 | 业务线9        |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:28 |
|  8 | 业务线8        |         11 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:27 |
|  7 | 业务线7        |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:26 |
|  6 | 业务线6        |         11 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:25 |
|  5 | 业务线5        |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:24 |
|  4 | 业务线4        |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:23 |
|  3 | 业务线3        |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:22 |
|  2 | 业务线2        |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:21 |
| 12 | 业务线12       |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:37 |
+----+---------------+------------+--------+---------------------+---------------------+

-- ORDER BY 后面跟 LIMIT 语句时返回的数据顺序:

mysql> SELECT * FROM business_line ORDER BY created_time LIMIT 5;
+----+---------------+------------+--------+---------------------+---------------------+
| id | buz_line_name | company_id | status | created_time        | updated_time        |
+----+---------------+------------+--------+---------------------+---------------------+
| 12 | 业务线12       |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:37 |
|  2 | 业务线2        |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:21 |
|  3 | 业务线3        |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:22 |
|  4 | 业务线4        |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:23 |
|  5 | 业务线5        |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:24 |
+----+---------------+------------+--------+---------------------+---------------------+

3. 问题解决方案

当指定的排序字段存在多行数据中的值都一样的情况,MySQL的排序可能就会包含不确定性,就像问题中描述的现象一样:分页返回了重复的数据。为了消除这种不确定性,保证返回的数据顺序是确定的,可以额外使用其他具有唯一性的字段作为排序字段,例如自增ID。

所以根据上述问题中的场景,既然指定的排序字段 created_time 已经不可避免地在每行数据中包含了相同的值,那需要再额外使用一个其值具备唯一性的字段自增ID来作为排序字段,得到的分页结果将会是跟预期一样正常了。

现在查询第一页的数据(每页10条,offset为 0),注意额外加了 id 作为排序字段:

mysql> SELECT * FROM business_line ORDER BY created_time DESC, id LIMIT 10 OFFSET 0;
+----+---------------+------------+--------+---------------------+---------------------+
| id | buz_line_name | company_id | status | created_time        | updated_time        |
+----+---------------+------------+--------+---------------------+---------------------+
|  1 | 业务线1        |         10 |      0 | 2019-12-10 11:07:04 | 2019-12-12 10:47:04 |
|  2 | 业务线2        |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:21 |
|  3 | 业务线3        |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:22 |
|  4 | 业务线4        |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:23 |
|  5 | 业务线5        |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:24 |
|  6 | 业务线6        |         11 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:25 |
|  7 | 业务线7        |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:26 |
|  8 | 业务线8        |         11 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:27 |
|  9 | 业务线9        |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:28 |
| 10 | 业务线10       |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:30 |
+----+---------------+------------+--------+---------------------+---------------------+

继续查询第二页数据(每页10条,offset为 10),注意额外加了 id 作为排序字段:

mysql> SELECT * FROM business_line ORDER BY created_time DESC, id LIMIT 10 OFFSET 10;
+----+---------------+------------+--------+---------------------+---------------------+
| id | buz_line_name | company_id | status | created_time        | updated_time        |
+----+---------------+------------+--------+---------------------+---------------------+
| 11 | 业务线11       |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:34 |
| 12 | 业务线12       |          1 |      0 | 2019-12-10 11:07:04 | 2019-12-12 11:41:37 |
+----+---------------+------------+--------+---------------------+---------------------+

至此,分页数据重复的问题得到解决。

ORDER BY created_time DESC, id 语句中,MySQL会优先使用 created_time 字段进行降序排序,当遇到 created_time 字段值一样时,会使用 id 进行升序排序(不指定则默认ASC),从而解决了 ORDER BY + LIMIT 组合使用时分页数据重复的问题了。

4. 后续补充

MySQL 8.0 + 的版本已经不存在上述的问题。现在以MySQL 8.0版本来做实验,现在表中数据如下:

mysql> SELECT * FROM business_line;
+----+---------------+------------+--------+---------------------+---------------------+
| id | buz_line_name | company_id | status | created_time        | updated_time        |
+----+---------------+------------+--------+---------------------+---------------------+
|  1 | 业务线1        |          0 |      0 | 2020-01-08 04:01:24 | 2020-01-08 04:01:24 |
|  2 | 业务线2        |          0 |      0 | 2020-01-08 04:01:24 | 2020-01-08 04:01:24 |
|  3 | 业务线3        |          0 |      0 | 2020-01-08 04:01:24 | 2020-01-08 04:01:24 |
|  4 | 业务线4        |          0 |      0 | 2020-01-08 04:01:24 | 2020-01-08 04:01:24 |
|  5 | 业务线5        |          0 |      0 | 2020-01-08 04:01:24 | 2020-01-08 04:01:24 |
|  6 | 业务线6        |          0 |      0 | 2020-01-08 04:01:24 | 2020-01-08 04:01:24 |
|  7 | 业务线7        |          0 |      0 | 2020-01-08 04:01:24 | 2020-01-08 04:01:24 |
|  8 | 业务线8        |          0 |      0 | 2020-01-08 04:01:24 | 2020-01-08 04:01:24 |
|  9 | 业务线9        |          0 |      0 | 2020-01-08 04:01:24 | 2020-01-08 04:01:24 |
| 10 | 业务线10       |          0 |      0 | 2020-01-08 04:01:24 | 2020-01-08 04:01:24 |
| 11 | 业务线11       |          0 |      0 | 2020-01-08 04:01:24 | 2020-01-08 04:01:24 |
| 12 | 业务线12       |          0 |      0 | 2020-01-08 04:01:24 | 2020-01-08 04:01:24 |
+----+---------------+------------+--------+---------------------+---------------------+

现在查询第一页的数据(每页10条,offset为 0):

mysql> SELECT * FROM business_line ORDER BY created_time DESC LIMIT 10 OFFSET 0;
+----+---------------+------------+--------+---------------------+---------------------+
| id | buz_line_name | company_id | status | created_time        | updated_time        |
+----+---------------+------------+--------+---------------------+---------------------+
|  1 | 业务线1        |          0 |      0 | 2020-01-08 04:01:24 | 2020-01-08 04:01:24 |
|  2 | 业务线2        |          0 |      0 | 2020-01-08 04:01:24 | 2020-01-08 04:01:24 |
|  3 | 业务线3        |          0 |      0 | 2020-01-08 04:01:24 | 2020-01-08 04:01:24 |
|  4 | 业务线4        |          0 |      0 | 2020-01-08 04:01:24 | 2020-01-08 04:01:24 |
|  5 | 业务线5        |          0 |      0 | 2020-01-08 04:01:24 | 2020-01-08 04:01:24 |
|  6 | 业务线6        |          0 |      0 | 2020-01-08 04:01:24 | 2020-01-08 04:01:24 |
|  7 | 业务线7        |          0 |      0 | 2020-01-08 04:01:24 | 2020-01-08 04:01:24 |
|  8 | 业务线8        |          0 |      0 | 2020-01-08 04:01:24 | 2020-01-08 04:01:24 |
|  9 | 业务线9        |          0 |      0 | 2020-01-08 04:01:24 | 2020-01-08 04:01:24 |
| 10 | 业务线10       |          0 |      0 | 2020-01-08 04:01:24 | 2020-01-08 04:01:24 |
+----+---------------+------------+--------+---------------------+---------------------+

再查询第二页的数据每页10条,offset为 10):

mysql> SELECT * FROM business_line ORDER BY created_time DESC LIMIT 10 OFFSET 10;
+----+---------------+------------+--------+---------------------+---------------------+
| id | buz_line_name | company_id | status | created_time        | updated_time        |
+----+---------------+------------+--------+---------------------+---------------------+
| 11 | 业务线11       |          0 |      0 | 2020-01-08 04:01:24 | 2020-01-08 04:01:24 |
| 12 | 业务线12       |          0 |      0 | 2020-01-08 04:01:24 | 2020-01-08 04:01:24 |
+----+---------------+------------+--------+---------------------+---------------------+

从结果上看,MySQL 8.0版本已经默认处理了这种情况,即如果排序字段值相同,则会再根据自增ID进行排序。

然而在官方的手册文档上,关于这部分的说明仍跟5.7版本一样:8.2.1.19 LIMIT Query Optimization,不知道为何,也许文档没来得及修改?