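Problem description
I'm brand new here. After lurking in one forum after another, I decided to build this myself, but it turned out far too long. I would appreciate any contribution that offers a simpler solution or approach. I'll try to be as detailed as possible, so be prepared, this is going to be a long thread. Here we go:
Question: a client wants to know whether this can be done in SQL Server: what is the gender split for each of our programs, and how did it change over time during the last quarter of this year?
They also provided the following column names: ClientID, ClientName, Program, StartDate, Gender, Location.
Building this with a tool like Excel, Tableau or Power BI is not very complicated, but it did get me thinking about how to do it in SQL.
So, for brevity, I first decided to create two programs: Program_A and Program_B.
Anyway, on to the test. First I created a table called General and gave it some data (I apologize in advance for the long, repetitive code block):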
CREATE TABLE General (
    ClientID   int IDENTITY(1,1) NOT NULL,
    ClientName varchar(20) NOT NULL,
    Program    varchar(20) NOT NULL,
    StartDate  date,
    Gender     varchar(30) NULL,
    Location   varchar(30) NULL
)
INSERT [dbo].[General] ([ClientName],[Program],[StartDate],[Gender],[Location])
VALUES ('John Doe','Program_A','2020-10-01','Male','US')
INSERT [dbo].[Fellows] ([ClientName],[Location])
VALUES ('Chewbaka Girl','Program_B','Female','CA')
INSERT [dbo].[Fellows] ([ClientName],[Location])
VALUES ('Jane Doe','2020-12-01','UK')
INSERT [dbo].[Fellows] ([ClientName],[Location])
VALUES ( 'Carol Smith','2020-11-01',[Location])
VALUES ('Pedro Mostaza',[Location])
VALUES ('Jean Plurier',[Location])
VALUES ('Nicole Kiteman',[Location])
VALUES ('Sonia Cepeda',[Location])
VALUES ('Alejandra Moncayo',[Location])
VALUES ('Britanny Royce',[Location])
VALUES ('Arnold Lotfrey',[Location])
VALUES ('Richard Books',[Location])
VALUES ('Camero lovely',[Location])
VALUES ('Henry Lakes',[Location])
VALUES ('Cameron lovely',[Location])
VALUES ('Paula Mint',[Location])
VALUES ('Shirley Timer',[Location])
VALUES ('Andrew Rocks','CA')
Second, I wrote a small query using a CTE (WITH) and CASE to split the Gender column into two separate columns, Male and Female, and to turn their values into counts so that percentages can be added later:
WITH CTE AS (
    SELECT Program, StartDate,
           COUNT(CASE WHEN Gender = 'Male' THEN 1 END) AS Male,
           COUNT(CASE WHEN Gender = 'Female' THEN 1 END) AS Female,
           COUNT(CASE WHEN Gender = '' OR Gender IS NULL THEN 1 END) AS NotAssigned
    FROM General
    GROUP BY Program, StartDate
)
SELECT Program, StartDate, Male, Female, -- StartDate is kept because the PIVOT step needs it
       Male * 100.0 / NULLIF(Male + Female, 0) AS Male_Ratio, -- NULLIF avoids division by zero
       Female * 100.0 / NULLIF(Male + Female, 0) AS Female_Ratio
INTO Program_GenderBreakdown
FROM CTE;
Third, I pivoted the start dates into columns, one per month, to make the result easier to read:
SELECT * INTO Results
FROM (
    SELECT Program, StartDate, Male_Ratio AS Percentage, 'Male' AS Gender
    FROM Program_GenderBreakdown
) T -- alias for the derived table
PIVOT (
    SUM(Percentage)
    FOR StartDate IN ([2020-10-01], [2020-11-01], [2020-12-01])
) AS PvtMale
UNION ALL -- then combine the male and female pivots
-- query for female; it must list the same three dates so both sides of the UNION ALL have the same columns
SELECT * FROM (
    SELECT Program, StartDate, Female_Ratio AS Percentage, 'Female' AS Gender
    FROM Program_GenderBreakdown
) T
PIVOT (
    SUM(Percentage)
    FOR StartDate IN ([2020-10-01], [2020-11-01], [2020-12-01])
) AS PvtFemale
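Fourth, I used CAST to round the values to two decimal places (purely for looks), renamed the date columns to month names, and added month-over-month change columns:
SELECT Program, Gender,
       CAST([2020-10-01] AS DECIMAL(19,2)) AS October,
       CAST([2020-11-01] AS DECIMAL(19,2)) AS November,
       CAST([2020-12-01] AS DECIMAL(19,2)) AS December,
       -- month-over-month change; -1 is returned when the previous month is 0
       CASE WHEN [2020-10-01] = 0 THEN -1
            ELSE CAST(([2020-11-01] / [2020-10-01]) - 1 AS DECIMAL(19,2)) END AS [MoMOct-Nov],
       CASE WHEN [2020-11-01] = 0 THEN -1
            ELSE CAST(([2020-12-01] / [2020-11-01]) - 1 AS DECIMAL(19,2)) END AS [MoMNov-Dec]
FROM Results -- the table created by SELECT ... INTO in the pivot step
ORDER BY Program, Gender DESC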
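As a side note, the zero check in the month-over-month columns can also be expressed with NULLIF. This is only a sketch under assumptions: it reads from the Results table created by the pivot step, and it deliberately changes the behaviour by returning NULL instead of the -1 sentinel when the previous month is 0.

```sql
-- Sketch: month-over-month change with NULLIF instead of CASE.
-- Assumption: Results has the columns [2020-10-01], [2020-11-01], [2020-12-01].
-- Dividing by NULL yields NULL, so a previous month of 0 gives NULL, not -1.
SELECT Program, Gender,
       CAST([2020-11-01] / NULLIF([2020-10-01], 0) - 1 AS DECIMAL(19,2)) AS [MoMOct-Nov],
       CAST([2020-12-01] / NULLIF([2020-11-01], 0) - 1 AS DECIMAL(19,2)) AS [MoMNov-Dec]
FROM Results
ORDER BY Program, Gender DESC;
```

Whether NULL or -1 is the better marker depends on how the report is consumed; NULL has the advantage that it cannot be mistaken for a real 100% drop.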
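The result is not what I had in mind, because the table is laid out the opposite way from what I wanted, but so far the calculations work. The final result is as follows:
Conclusion: the calculations work, but how can I make the table look the way I originally intended, and is it even worth the effort? Thanks in advance for any help. And if you just want to drop by and give me feedback on this thread, that would be much appreciated too.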
Solution
You can use conditional aggregation to achieve this, as shown below:
SELECT CONCAT(Program, '_', DATENAME(month, StartDate)) AS ProgramMonth,
       -- share of each gender among all rows of the group, as a percentage with 2 decimals
       CONVERT(decimal(5,2), (COUNT(CASE WHEN Gender = 'Male' THEN 1 END) * 1.0 / COUNT(*)) * 100) AS MaleCount,
       CONVERT(decimal(5,2), (COUNT(CASE WHEN Gender = 'Female' THEN 1 END) * 1.0 / COUNT(*)) * 100) AS FemaleCount
FROM General
GROUP BY CONCAT(Program, '_', DATENAME(month, StartDate))
ProgramMonth | MaleCount | FemaleCount |
---|---|---|
Program_A_December | 50.00 | 50.00 |
Program_A_November | 0.00 | 100.00 |
Program_A_October | 66.67 | 33.33 |
Program_B_December | 0.00 | 100.00 |
Program_B_November | 75.00 | 25.00 |
Program_B_October | 0.00 | 100.00 |
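
If the change over time matters as much as the split itself, the same conditional aggregation can be combined with the LAG window function. The following is only a sketch, not part of the original answer. It assumes the General table from the question and SQL Server 2012 or later (for LAG and DATEFROMPARTS); the names Monthly, MonthStart, StartMonth, MalePct and MaleMoM are made up for illustration. Like the answer above, it divides by COUNT(*), so rows without a gender count toward the denominator, and it returns NULL rather than -1 when the previous month is 0 or missing.

```sql
-- Sketch: male share per program and month, plus month-over-month change.
WITH Monthly AS (
    SELECT Program,
           -- first day of the month, used for grouping and chronological sorting
           DATEFROMPARTS(YEAR(StartDate), MONTH(StartDate), 1) AS MonthStart,
           COUNT(CASE WHEN Gender = 'Male' THEN 1 END) * 100.0 / COUNT(*) AS MalePct
    FROM General
    WHERE StartDate >= '2020-10-01' AND StartDate < '2021-01-01' -- last quarter of 2020
    GROUP BY Program, DATEFROMPARTS(YEAR(StartDate), MONTH(StartDate), 1)
)
SELECT Program,
       DATENAME(month, MonthStart) AS StartMonth,
       CAST(MalePct AS DECIMAL(19,2)) AS MalePct,
       -- LAG reads the previous month's value within the same program
       CAST(MalePct / NULLIF(LAG(MalePct) OVER (PARTITION BY Program ORDER BY MonthStart), 0) - 1
            AS DECIMAL(19,2)) AS MaleMoM
FROM Monthly
ORDER BY Program, MonthStart;
```

Sorting by MonthStart rather than by the month name keeps October, November and December in calendar order. A female counterpart can be added by repeating the two expressions with Gender = 'Female'.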