How to optimize a slow SELECT DISTINCT query across three tables (~40k rows) that returns only 22 results

Problem description

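So I have this query, written by someone else, that I'm trying to refactor. It pulls certain features/materials for an item (usually shoes). There are a lot of products, and therefore a lot of join-table entries, but only a handful of features available. I figure there must be a way to avoid going through the "big" items list just to get these features. I've also heard that DISTINCT is to be avoided, but I don't see a statement that could replace the DISTINCT option here. According to my logs, I'm getting slow results:

Query_time: 7  Lock_time: 0  Rows_sent: 32  Rows_examined: 5362862
Query_time: 8  Lock_time: 0  Rows_sent: 22  Rows_examined: 6581994

As the log shows, it sometimes takes 7 or 8 seconds and examines over 5 million rows each time. That may be partly due to other load at the same moment, because here is the same SELECT run directly on the database from the MySQL command line: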

mysql> SELECT DISTINCT features.FeatureId,features.Name
       FROM features,itemsfeatures,items
       WHERE items.FlagStatus != 'U'
         AND items.TypeId = '13'
         AND features.Type = 'Material'
         AND features.FeatureId = itemsfeatures.FeatureId
       ORDER BY features.Name;
+-----------+--------------------+
| FeatureId | Name               |
+-----------+--------------------+
|        40 | Alligator          |
|        41 | Burnished Calfskin |
|        42 | Calfskin           |
|        59 | Canvas             |
|        43 | Chromexcel         |
|        44 | Cordovan           |
|        57 | Cotton             |
|        45 | Crocodile          |
|        58 | Deerskin           |
|        61 | Eel                |
|        46 | Italian Leather    |
|        47 | Lizard             |
|        48 | Nappa              |
|        49 | NuBuck             |
|        50 | Ostrich            |
|        51 | Patent Leather     |
|        60 | Rubber             |
|        52 | Sharkskin          |
|        53 | Silk               |
|        54 | Suede              |
|        56 | Veal               |
|        55 | Woven              |
+-----------+--------------------+
22 rows in set (0.00 sec)

mysql> select count(*) from features;
+----------+
| count(*) |
+----------+
|      122 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from itemsfeatures;
+----------+
| count(*) |
+----------+
|    38569 |
+----------+
1 row in set (0.00 sec)

mysql> select count(*) from items;
+----------+
| count(*) |
+----------+
|     8656 |
+----------+
1 row in set (0.00 sec)

EXPLAIN SELECT DISTINCT features.FeatureId,features.Name FROM features,itemsfeatures,items WHERE items.FlagStatus != 'U' AND items.TypeId = '13' AND features.Type = 'Material' AND features.FeatureId = itemsfeatures.FeatureId ORDER BY features.Name;
+----+-------------+---------------+------+-------------------+-----------+---------+---------------------------------+------+----------------------------------------------+
| id | select_type | table         | type | possible_keys     | key       | key_len | ref                             | rows | Extra                                        |
+----+-------------+---------------+------+-------------------+-----------+---------+---------------------------------+------+----------------------------------------------+
|  1 | SIMPLE      | features      | ref  | PRIMARY,Type      | Type      | 33      | const                           |   21 | Using where; Using temporary; Using filesort |
|  1 | SIMPLE      | itemsfeatures | ref  | FeatureId         | FeatureId | 4       | sherman_live.features.FeatureId |  324 | Using index; distinct                        |
|  1 | SIMPLE      | items         | ALL  | TypeId,FlagStatus | NULL      | NULL    | NULL                            | 8656 | Using where; distinct; Using join buffer     |
+----+-------------+---------------+------+-------------------+-----------+---------+---------------------------------+------+----------------------------------------------+
3 rows in set (0.04 sec)

EDIT: For comparison, here are sample results without DISTINCT (with a LIMIT, because otherwise it hangs):

SELECT features.FeatureId,features.Name FROM features,itemsfeatures,items WHERE items.FlagStatus != 'U' AND items.TypeId = '13' AND features.Type = 'Material' AND features.FeatureId = itemsfeatures.FeatureId ORDER BY features.Name LIMIT 10;
+-----------+-----------+
| FeatureId | Name      |
+-----------+-----------+
|        40 | Alligator |
|        40 | Alligator |
|        40 | Alligator |
|        40 | Alligator |
|        40 | Alligator |
|        40 | Alligator |
|        40 | Alligator |
|        40 | Alligator |
|        40 | Alligator |
|        40 | Alligator |
+-----------+-----------+
10 rows in set (23.30 sec)

And here it is using GROUP BY instead of SELECT DISTINCT:

SELECT features.FeatureId,features.Name FROM features,itemsfeatures,items WHERE items.FlagStatus != 'U' AND items.TypeId = '13' AND features.Type = 'Material' AND features.FeatureId = itemsfeatures.FeatureId GROUP BY features.Name ORDER BY features.Name;
+-----------+--------------------+
| FeatureId | Name               |
+-----------+--------------------+
|        40 | Alligator          |
|        41 | Burnished Calfskin |
|        42 | Calfskin           |
|        59 | Canvas             |
|        43 | Chromexcel         |
|        44 | Cordovan           |
|        57 | Cotton             |
|        45 | Crocodile          |
|        58 | Deerskin           |
|        61 | Eel                |
|        46 | Italian Leather    |
|        47 | Lizard             |
|        48 | Nappa              |
|        49 | NuBuck             |
|        50 | Ostrich            |
|        51 | Patent Leather     |
|        60 | Rubber             |
|        52 | Sharkskin          |
|        53 | Silk               |
|        54 | Suede              |
|        56 | Veal               |
|        55 | Woven              |
+-----------+--------------------+
22 rows in set (13.28 sec)

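EDIT: Added a bounty... I'm trying to understand the general problem, beyond the slowness this particular query is prone to: how do you replace a bad SELECT DISTINCT query in general? Isn't GROUP BY usually the replacement for SELECT DISTINCT (although in this case it's not a complete solution, since it's still slow)?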

Solution

As Joe mentioned, a join condition does seem to be missing. Here is your current query:

SELECT DISTINCT 
        features.FeatureId,features.Name
FROM    features,itemsfeatures,items
WHERE   items.FlagStatus != 'U'
        AND items.TypeId = '13'
        AND features.Type = 'Material'
        AND features.FeatureId = itemsfeatures.FeatureId
ORDER BY features.Name

Here is the same query with explicit joins:

SELECT DISTINCT 
        features.FeatureId,features.Name
FROM    features INNER JOIN
        itemsfeatures on features.FeatureId = itemsfeatures.FeatureId CROSS JOIN
        items
WHERE   items.FlagStatus != 'U'
        AND items.TypeId = '13'
        AND features.Type = 'Material'
ORDER BY features.Name
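
The CROSS JOIN is what inflates the work. A minimal sketch, using Python's built-in sqlite3 module as a stand-in for MySQL with a handful of made-up rows (table and column names follow the question; `make_db` and all data are hypothetical), shows that without DISTINCT the result has one row per pair of (material link, qualifying item):

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Hypothetical in-memory copy of the three tables with a few made-up rows."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE features (FeatureId INTEGER PRIMARY KEY, Name TEXT, Type TEXT);
        CREATE TABLE items (ItemID INTEGER PRIMARY KEY, TypeId INTEGER, FlagStatus TEXT);
        CREATE TABLE itemsfeatures (ItemID INTEGER, FeatureId INTEGER);
        INSERT INTO features VALUES
            (40, 'Alligator', 'Material'), (41, 'Burnished Calfskin', 'Material'),
            (42, 'Calfskin', 'Material'), (100, 'Lace-up', 'Style');
        INSERT INTO items VALUES (1, 13, 'A'), (2, 13, 'U'), (3, 7, 'A'), (4, 13, 'A');
        INSERT INTO itemsfeatures VALUES (1, 40), (1, 42), (1, 100), (2, 41), (3, 41), (3, 100);
    """)
    return db


db = make_db()

# The question's query without DISTINCT: no condition ties itemsfeatures to items.
raw_rows = db.execute("""
    SELECT features.FeatureId, features.Name
    FROM features, itemsfeatures, items
    WHERE items.FlagStatus != 'U'
      AND items.TypeId = '13'
      AND features.Type = 'Material'
      AND features.FeatureId = itemsfeatures.FeatureId
""").fetchall()

# Count material links and qualifying items separately.
material_links = db.execute("""
    SELECT COUNT(*) FROM features JOIN itemsfeatures USING (FeatureId)
    WHERE features.Type = 'Material'
""").fetchone()[0]
qualifying_items = db.execute(
    "SELECT COUNT(*) FROM items WHERE FlagStatus != 'U' AND TypeId = '13'"
).fetchone()[0]

# The un-deduplicated result size is the product of the two counts: a cross join.
print(len(raw_rows), material_links, qualifying_items)  # 8 4 2
```

On the real tables every material link is likewise repeated once per qualifying item, which is why the LIMIT 10 output in the question is nothing but Alligator rows and why DISTINCT has millions of duplicates to throw away.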

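I can't be 100% sure, but it looks like removing every reference to the items table should give you exactly the same results, as long as at least one item has TypeId 13 and a FlagStatus other than 'U':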

SELECT DISTINCT 
        features.FeatureId,features.Name
FROM    features,itemsfeatures
WHERE   features.Type = 'Material'
        AND features.FeatureId = itemsfeatures.FeatureId
ORDER BY features.Name
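
Dropping items is only safe while at least one item has TypeId 13 and a FlagStatus other than 'U'; if none does, the cross join is empty and the original query returns nothing. A small sketch checks both cases, again with Python's sqlite3 standing in for MySQL and a hypothetical `make_db` with made-up rows:

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Hypothetical in-memory copy of the three tables with a few made-up rows."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE features (FeatureId INTEGER PRIMARY KEY, Name TEXT, Type TEXT);
        CREATE TABLE items (ItemID INTEGER PRIMARY KEY, TypeId INTEGER, FlagStatus TEXT);
        CREATE TABLE itemsfeatures (ItemID INTEGER, FeatureId INTEGER);
        INSERT INTO features VALUES
            (40, 'Alligator', 'Material'), (41, 'Burnished Calfskin', 'Material'),
            (42, 'Calfskin', 'Material'), (100, 'Lace-up', 'Style');
        INSERT INTO items VALUES (1, 13, 'A'), (2, 13, 'U'), (3, 7, 'A'), (4, 13, 'A');
        INSERT INTO itemsfeatures VALUES (1, 40), (1, 42), (1, 100), (2, 41), (3, 41), (3, 100);
    """)
    return db


ORIGINAL = """
    SELECT DISTINCT features.FeatureId, features.Name
    FROM features, itemsfeatures, items
    WHERE items.FlagStatus != 'U'
      AND items.TypeId = '13'
      AND features.Type = 'Material'
      AND features.FeatureId = itemsfeatures.FeatureId
    ORDER BY features.Name
"""

# The same query with every reference to items removed.
TWO_TABLE = """
    SELECT DISTINCT features.FeatureId, features.Name
    FROM features, itemsfeatures
    WHERE features.Type = 'Material'
      AND features.FeatureId = itemsfeatures.FeatureId
    ORDER BY features.Name
"""

db = make_db()
same_results = db.execute(ORIGINAL).fetchall() == db.execute(TWO_TABLE).fetchall()
print(same_results)  # True

# Flag every type-13 item as 'U': the cross join now has no item to pair with,
# so the original returns no rows while the two-table query is unaffected.
db.execute("UPDATE items SET FlagStatus = 'U' WHERE TypeId = 13")
after_original = db.execute(ORIGINAL).fetchall()
after_two_table = db.execute(TWO_TABLE).fetchall()
print(after_original, len(after_two_table))  # [] 3
```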

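The way the query is written, it looks like you want the list of materials for items with TypeId 13 and FlagStatus <> 'U'. If that's the case, the original query returns the wrong results: it simply returns every material linked to any item at all, whatever that item's TypeId or FlagStatus. So, as Joe said, add an inner join to items, and use explicit joins, because they make the intent clearer. I prefer GROUP BY, but DISTINCT does the same thing here.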

SELECT  features.FeatureId,features.Name
FROM    features INNER JOIN
        itemsfeatures on features.FeatureId = itemsfeatures.FeatureId INNER JOIN
        items on itemsfeatures.ItemID = items.ItemID
WHERE   items.FlagStatus != 'U'
        AND items.TypeId = '13'
        AND features.Type = 'Material'
GROUP BY features.FeatureId,features.Name
ORDER BY features.Name
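
A minimal sketch of the difference, with Python's sqlite3 standing in for MySQL and hypothetical rows from a made-up `make_db`: only item 1 is type 13, not flagged 'U', and linked to materials, so the corrected query should list just its materials (Alligator and Calfskin), while the original also lists Burnished Calfskin, which belongs only to non-qualifying items. The GROUP BY and DISTINCT spellings of the corrected query should agree:

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Hypothetical in-memory copy of the three tables with a few made-up rows."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE features (FeatureId INTEGER PRIMARY KEY, Name TEXT, Type TEXT);
        CREATE TABLE items (ItemID INTEGER PRIMARY KEY, TypeId INTEGER, FlagStatus TEXT);
        CREATE TABLE itemsfeatures (ItemID INTEGER, FeatureId INTEGER);
        INSERT INTO features VALUES
            (40, 'Alligator', 'Material'), (41, 'Burnished Calfskin', 'Material'),
            (42, 'Calfskin', 'Material'), (100, 'Lace-up', 'Style');
        INSERT INTO items VALUES (1, 13, 'A'), (2, 13, 'U'), (3, 7, 'A'), (4, 13, 'A');
        INSERT INTO itemsfeatures VALUES (1, 40), (1, 42), (1, 100), (2, 41), (3, 41), (3, 100);
    """)
    return db


FILTER = "items.FlagStatus != 'U' AND items.TypeId = '13' AND features.Type = 'Material'"

# Original: itemsfeatures is never joined to items.
ORIGINAL = f"""
    SELECT DISTINCT features.FeatureId, features.Name
    FROM features, itemsfeatures, items
    WHERE {FILTER} AND features.FeatureId = itemsfeatures.FeatureId
    ORDER BY features.Name
"""

# Corrected: itemsfeatures joined to items on ItemID, deduplicated two ways.
JOINS = """
    FROM features
    INNER JOIN itemsfeatures ON features.FeatureId = itemsfeatures.FeatureId
    INNER JOIN items ON itemsfeatures.ItemID = items.ItemID
"""
FIXED_GROUP_BY = f"""
    SELECT features.FeatureId, features.Name {JOINS}
    WHERE {FILTER}
    GROUP BY features.FeatureId, features.Name
    ORDER BY features.Name
"""
FIXED_DISTINCT = f"""
    SELECT DISTINCT features.FeatureId, features.Name {JOINS}
    WHERE {FILTER}
    ORDER BY features.Name
"""

db = make_db()
original = db.execute(ORIGINAL).fetchall()
fixed_group_by = db.execute(FIXED_GROUP_BY).fetchall()
fixed_distinct = db.execute(FIXED_DISTINCT).fetchall()

print(original)        # [(40, 'Alligator'), (41, 'Burnished Calfskin'), (42, 'Calfskin')]
print(fixed_group_by)  # [(40, 'Alligator'), (42, 'Calfskin')]
print(fixed_group_by == fixed_distinct)  # True
```

When every selected column is also in the GROUP BY list and there are no aggregates, GROUP BY and DISTINCT describe the same result set, so swapping one for the other is a matter of taste; the speed-up comes from the added join condition, not from the keyword.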

With correctness sorted out, now for speed. Create the following three indexes:

CREATE INDEX FeaturesIndex ON features (Type, FeatureId, Name);
CREATE INDEX ItemsFeaturesIndex ON itemsfeatures (FeatureId);
CREATE INDEX ItemsIndex ON items (TypeId, FlagStatus, ItemID);

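This should speed up both your current query and the ones I listed.

It looks like you're missing the JOIN condition that links itemsfeatures to items. That's more obvious if you write the query with explicit JOIN operations: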

SELECT DISTINCT f.FeatureId,f.Name
    FROM features f
        INNER JOIN itemsfeatures ifx
            ON f.FeatureID = ifx.FeatureID
        INNER JOIN items i
            ON ifx.ItemID = i.ItemID /* This is the part you're missing */
    WHERE i.FlagStatus != 'U'
        AND i.TypeId = '13'
        AND f.Type = 'Material'
    ORDER BY f.Name;
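
On the bounty question of replacing DISTINCT in general: when every selected column comes from features, the joins to itemsfeatures and items act only as a filter. That filter can also be written as a semi-join with EXISTS, which asks whether at least one matching item exists instead of producing duplicate rows and then removing them. This EXISTS rewrite is a swapped-in technique, not something the answers here propose. The sketch (Python's sqlite3 as a stand-in for MySQL; `make_db` and its rows are hypothetical) only checks that it returns the same rows as the JOIN + DISTINCT form:

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Hypothetical in-memory copy of the three tables with a few made-up rows."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE features (FeatureId INTEGER PRIMARY KEY, Name TEXT, Type TEXT);
        CREATE TABLE items (ItemID INTEGER PRIMARY KEY, TypeId INTEGER, FlagStatus TEXT);
        CREATE TABLE itemsfeatures (ItemID INTEGER, FeatureId INTEGER);
        INSERT INTO features VALUES
            (40, 'Alligator', 'Material'), (41, 'Burnished Calfskin', 'Material'),
            (42, 'Calfskin', 'Material'), (100, 'Lace-up', 'Style');
        INSERT INTO items VALUES (1, 13, 'A'), (2, 13, 'U'), (3, 7, 'A'), (4, 13, 'A');
        INSERT INTO itemsfeatures VALUES (1, 40), (1, 42), (1, 100), (2, 41), (3, 41), (3, 100);
    """)
    return db


# Joe's query: join everything, then let DISTINCT remove the duplicates.
JOIN_DISTINCT = """
    SELECT DISTINCT f.FeatureId, f.Name
    FROM features f
    INNER JOIN itemsfeatures ifx ON f.FeatureId = ifx.FeatureId
    INNER JOIN items i ON ifx.ItemID = i.ItemID
    WHERE i.FlagStatus != 'U' AND i.TypeId = '13' AND f.Type = 'Material'
    ORDER BY f.Name
"""

# Semi-join: each feature is emitted at most once, because EXISTS only tests
# whether a qualifying item is linked to it; no duplicates are ever produced.
SEMI_JOIN = """
    SELECT f.FeatureId, f.Name
    FROM features f
    WHERE f.Type = 'Material'
      AND EXISTS (
          SELECT 1
          FROM itemsfeatures ifx
          INNER JOIN items i ON i.ItemID = ifx.ItemID
          WHERE ifx.FeatureId = f.FeatureId
            AND i.FlagStatus != 'U'
            AND i.TypeId = '13'
      )
    ORDER BY f.Name
"""

db = make_db()
join_rows = db.execute(JOIN_DISTINCT).fetchall()
semi_join_rows = db.execute(SEMI_JOIN).fetchall()
print(semi_join_rows)  # [(40, 'Alligator'), (42, 'Calfskin')]
print(semi_join_rows == join_rows)  # True
```

Whether the EXISTS form is actually faster on MySQL depends on the server version and the available indexes (newer MySQL versions can execute EXISTS subqueries as semi-joins), so compare both with EXPLAIN on the real data before switching.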

I'm almost certain Joe's answer is correct. However, if you think Joe is wrong and you want the same results as the original query, only faster, use this query:

SELECT DISTINCT features.FeatureId,features.Name
    FROM features,itemsfeatures
    WHERE features.Type = 'Material'
        AND features.FeatureId = itemsfeatures.FeatureId
    ORDER BY features.Name;
