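Problem description
I am using python3 to analyze some data in a sqlite database file. I want to join all of the tables into one giant table in Python. I know a little about the Python commands for doing this, but the SQL statement is too complicated for me. I need help writing the SQL statement that will be executed against the database file. I would also like all of this data to be output as a pandas data frame.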
station: id, name, lat, long, dock_count, city, installation_date
status: station_id, bikes_available, docks_available, time
trip: id, duration, start_date, start_station_name, start_station_id, end_date, end_station_name, end_station_id, bike_id, subscription_type, zip_code
weather: date, max_temperature_f, mean_temperature_f, min_temperature_f, max_dew_point_f, mean_dew_point_f, min_dew_point_f, max_humidity, mean_humidity, min_humidity, max_sea_level_pressure_inches, mean_sea_level_pressure_inches, min_sea_level_pressure_inches, max_visibility_miles, mean_visibility_miles, min_visibility_miles, max_wind_Speed_mph, mean_wind_speed_mph, max_gust_speed_mph, precipitation_inches, cloud_cover, events, wind_dir_degrees, zip_code
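I want to join all of the tables into one giant table, then select 1000 trips with all of the joined data included. This means I need to know the foreign keys in the trip table, which are:
start_date, points to weather, status
start_station_id, points to station
end_date, points to status
end_station_id, points to station
The join I have in mind looks like this: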
select 1000 rows from trip join (
weather where trip.start_date = weather.date as startweather
) and join (
weather where trip.end_date = weather.date as endweather
) and join (
station where trip.start_station_id = station.id as startstation
) and join(
station where trip.end_station_id = station.id as endstation
) and join (
status where trip.start_station_id = station.status_id and trip.start_date = station.date as startstationstatus
) and join(
status where trip.end_station_id = station.status_id and trip.end_date = station.date as endstationstatus)
)
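Solution
I am posting an answer to this question because the query I ended up using shows off a number of different SQLite features. Here is the query I used: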
SELECT count(*)
FROM trip AS tr
INNER JOIN station AS startst ON startst.id = tr.start_station_id
INNER JOIN station AS endst ON endst.id = tr.end_station_id
INNER JOIN weather AS startwea ON startwea.date = substr(tr.start_date, 1, 9)
INNER JOIN status AS ststat
    ON trim(substr(ststat.time, 6, 2), "0") = substr(tr.start_date, 1, instr(tr.start_date, "/") - 1)
   AND trim(substr(ststat.time, 9, 2), "0") = substr(tr.start_date, instr(tr.start_date, "/") + 1, instr(substr(tr.start_date, instr(tr.start_date, "/") + 1), "/") - 1)
   AND substr(ststat.time, 1, 4) = substr(tr.start_date, instr(tr.start_date, " "), -4)
WHERE tr.id > 0 AND tr.id <= 7000000 AND tr.id % 100000 = 0
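To see what the string functions in the join conditions do, it can help to run them on a single pair of sample values. The sketch below assumes that trip.start_date looks like 8/29/2013 14:13 and that status.time looks like 2013/08/29 12:06:01. Both layouts are assumptions inferred from the substr offsets in the query, and the helper name split_dates is hypothetical. It uses ltrim instead of trim, because trim(x, '0') strips zeros from both ends and would turn 10 into 1.

```python
import sqlite3

# Assumed sample layouts (not confirmed by the post):
#   trip.start_date -> "M/D/YYYY H:MM"
#   status.time     -> "YYYY/MM/DD HH:MM:SS"
SPLIT_SQL = """
SELECT
    substr(:trip, 1, instr(:trip, '/') - 1)                      AS trip_month,
    substr(:trip,
           instr(:trip, '/') + 1,
           instr(substr(:trip, instr(:trip, '/') + 1), '/') - 1) AS trip_day,
    -- a negative length returns the characters *before* the given position
    substr(:trip, instr(:trip, ' '), -4)                         AS trip_year,
    ltrim(substr(:status, 6, 2), '0')                            AS status_month,
    ltrim(substr(:status, 9, 2), '0')                            AS status_day,
    substr(:status, 1, 4)                                        AS status_year
"""


def split_dates(trip_date, status_time):
    """Return the six date parts that the join conditions compare."""
    conn = sqlite3.connect(":memory:")
    try:
        params = {"trip": trip_date, "status": status_time}
        return conn.execute(SPLIT_SQL, params).fetchone()
    finally:
        conn.close()


print(split_dates("8/29/2013 14:13", "2013/08/29 12:06:01"))
```

If the three trip parts equal the three status parts for rows that fall on the same day, the join conditions line up; any mismatch points at the expression that needs adjusting.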
In the end it got this complicated because I had to join two tables on their date columns, and the date format is different in each column.
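Since the difficulty comes from mismatched date formats, one alternative is to normalize every date to YYYY-MM-DD and join on that value. The following is only a minimal sketch of that idea, not the query used above. It assumes the layouts M/D/YYYY H:MM for trip.start_date, M/D/YYYY for weather.date and YYYY/MM/DD HH:MM:SS for status.time, and it builds a tiny in-memory database with just the columns it needs. The function names to_iso_date and build_demo_db and the sample rows are hypothetical.

```python
import sqlite3
from datetime import datetime

# Assumed date layouts; adjust to whatever the real columns contain.
_FORMATS = ("%m/%d/%Y %H:%M", "%m/%d/%Y", "%Y/%m/%d %H:%M:%S")


def to_iso_date(text):
    """Convert a date string in any assumed layout to 'YYYY-MM-DD' (None if unparseable)."""
    if text is None:
        return None
    for fmt in _FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


JOIN_SQL = """
SELECT tr.id, startst.name AS start_name, endst.name AS end_name,
       wea.mean_temperature_f, ststat.bikes_available
FROM trip AS tr
JOIN station AS startst ON startst.id = tr.start_station_id
JOIN station AS endst   ON endst.id = tr.end_station_id
JOIN weather AS wea     ON to_iso_date(wea.date) = to_iso_date(tr.start_date)
JOIN status AS ststat   ON ststat.station_id = tr.start_station_id
                       AND to_iso_date(ststat.time) = to_iso_date(tr.start_date)
ORDER BY tr.id
LIMIT 1000
"""


def build_demo_db():
    """Create an in-memory database with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    # Expose the Python helper to SQL under the same name.
    conn.create_function("to_iso_date", 1, to_iso_date)
    conn.executescript("""
        CREATE TABLE station (id INTEGER, name TEXT);
        CREATE TABLE trip (id INTEGER, start_date TEXT,
                           start_station_id INTEGER, end_station_id INTEGER);
        CREATE TABLE weather (date TEXT, mean_temperature_f INTEGER);
        CREATE TABLE status (station_id INTEGER, bikes_available INTEGER, time TEXT);
        INSERT INTO station VALUES (1, 'A'), (2, 'B');
        INSERT INTO trip VALUES (10, '10/30/2013 9:05', 1, 2);
        INSERT INTO weather VALUES ('10/30/2013', 61), ('1/3/2013', 50);
        INSERT INTO status VALUES (1, 7, '2013/10/30 09:05:00'),
                                  (1, 4, '2013/01/03 09:05:00');
    """)
    return conn


conn = build_demo_db()
rows = conn.execute(JOIN_SQL).fetchall()
print(rows)
```

With pandas installed, the same result can be loaded into a data frame with pd.read_sql_query(JOIN_SQL, conn). Calling a Python function for every row is slow on large tables, so for the full database it may be worth storing the normalized date in an extra column first and joining on that column.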