How to use Python NLP to extract keywords from a database table that match keywords in a search string

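Problem description

I have a database with a table named "Neon". I am trying to get all the keywords stored in the "Neon" table that also appear in a search string.

Example: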

Neon Records:

POLICY_NUM          DAYS_TO_BOUND   
0170254497              PL rating
0755698054              PL rating
1525668307              PL rating
1525668312              Air
1525668314              Java
1525668356              Sand
    

I have a search string

"Save the day by Sand and Java"

I would like to get a result like

['Sand':'1525668356','Java':'1525668314']

What I have done so far is connect to the database and extract the table data:

import pandas as pd
import logging
import config
from sqlalchemy import create_engine  # install mysql-connector and PyMySQL


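def db_connection():
    """
    Create a SQLAlchemy engine from the settings in config.database_config.

    :return: SQLAlchemy engine for the MySQL database
    """
    try:
        # SQLAlchemy URL format: dialect+driver://user:password@host/database
        engine = create_engine('mysql+pymysql://{0}:{1}@{2}/{3}'.format(
            config.database_config['user'],
            config.database_config['password'],
            config.database_config['host'],
            config.database_config['database']))
        return engine
    except Exception:
        # Log the traceback and re-raise so callers never receive None
        logging.exception('Could not create the database engine')
        raise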


def extract_table(query):
    """
    Run a SELECT query and return the result as a DataFrame.

    :param query: SQL SELECT statement
    :return: pandas DataFrame with the query result
    """

    engine = db_connection()
    sql_select_query = query
    details = pd.read_sql(sql_select_query, engine)
    return details

database_query = {
    'select_query_for_data': 'select * from Neon'
}
 

Please let me know your thoughts on this.

Solution

Is this what you are looking for?

DECLARE @DataSource TABLE
(
    [POLICY_NUM] VARCHAR(16),[DAYS_TO_BOUND] VARCHAR(16)
);

INSERT INTO @DataSource ([POLICY_NUM],[DAYS_TO_BOUND])
VALUES ('0170254497','PL Rating'),('0755698054','PL Rating'),('1525668307','PL Rating'),('1525668312','Air'),('1525668314','Java'),('1525668356','Sand');

DECLARE @DataString VARCHAR(4000) = 'Save the day by Sand and Java';
DECLARE @DataStringXML XML = '<a>' + REPLACE(@DataString,' ','</a><a>') + '</a>';

WITH DataSource ([rowID],[rowValue]) AS
(
    SELECT ROW_NUMBER() OVER (ORDER BY T.c ASC),T.c.value('.','VARCHAR(256)')
    FROM @DataStringXML.nodes('a') T(c)
)
SELECT '[' + STUFF
(
    (
        SELECT ',' + '''' + V.[rowValue] + ''':''' + D.[POLICY_NUM] + ''''
        FROM DataSource V
        INNER JOIN @DataSource D
            ON V.[rowValue] = D.[DAYS_TO_BOUND]
        ORDER BY V.[rowID]
        FOR XML PATH(''),TYPE
    ).value('.','VARCHAR(MAX)'),1,1,''
) + ']'

It produces:

['Sand':'1525668356','Java':'1525668314']
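
If you would rather do the matching on the Python side, the same idea (split the search string into words, then join them against DAYS_TO_BOUND) can be sketched in plain Python. This is only a minimal sketch under a few assumptions: match_keywords and neon_rows are hypothetical names introduced here, the rows are dictionaries keyed by the two column names from the question, and the comparison is an exact, case-sensitive match on single words, so a multi-word value such as "PL rating" never matches.

```python
def match_keywords(search_string, rows):
    """Map each word of search_string that equals a DAYS_TO_BOUND value to its POLICY_NUM."""
    # Build a lookup from keyword to policy number.
    # Assumption: if a keyword occurs in several rows, the last row wins.
    lookup = {row["DAYS_TO_BOUND"]: row["POLICY_NUM"] for row in rows}
    # Split on whitespace and keep matches in the order they appear in the search string.
    return {word: lookup[word] for word in search_string.split() if word in lookup}


# Sample rows copied from the Neon table in the question.
neon_rows = [
    {"POLICY_NUM": "0170254497", "DAYS_TO_BOUND": "PL rating"},
    {"POLICY_NUM": "0755698054", "DAYS_TO_BOUND": "PL rating"},
    {"POLICY_NUM": "1525668307", "DAYS_TO_BOUND": "PL rating"},
    {"POLICY_NUM": "1525668312", "DAYS_TO_BOUND": "Air"},
    {"POLICY_NUM": "1525668314", "DAYS_TO_BOUND": "Java"},
    {"POLICY_NUM": "1525668356", "DAYS_TO_BOUND": "Sand"},
]

result = match_keywords("Save the day by Sand and Java", neon_rows)
print(result)  # {'Sand': '1525668356', 'Java': '1525668314'}
```

With the code from the question, the rows could presumably come from extract_table(database_query['select_query_for_data']).to_dict('records'), which turns the DataFrame into a list of such dictionaries. Unlike the bracketed string built by the query above, this returns a regular Python dict.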