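Aggregate data in R using group by and keep the non-NA value of another column

Problem description

I wonder if someone can help. I have a dataset in which each ID is a company that hired different numbers of people over time, so the same ID appears on several rows. We have an address for each ID, but it was not recorded on every row.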


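I want to group by ID and add one column showing the total number of people hired by each ID and one column showing that ID's address. When I do this, because some addresses are missing, R automatically picks the first row for each ID, which may be a row with a missing address. The result should instead keep the non-missing address for each ID. The data looks like this:

ID      Address        Number of hiring
1                      5
2       Montreal       2
3                      3
4       Helsinki       4
1       London         1
2                      3
3       Dubai          5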

I am trying to do this in R with dplyr.
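For reference, here is a minimal sketch of the behaviour described above. It assumes the aggregation took the first Address in each group with first(); the question does not show the actual attempt, so that call and the name `naive` are illustrative only. The data frame is rebuilt from the table, with empty strings standing in for the missing addresses.

```r
library(dplyr)

# Data rebuilt from the table above; "" marks a missing address.
df <- data.frame(
  ID = c(1L, 2L, 3L, 4L, 1L, 2L, 3L),
  Address = c("", "Montreal", "", "Helsinki", "London", "", "Dubai"),
  Number_of_hiring = c(5L, 2L, 3L, 4L, 1L, 3L, 5L),
  stringsAsFactors = FALSE
)

# Assumed attempt: keep whatever Address comes first in each group,
# whether or not it is blank.
naive <- df %>%
  group_by(ID) %>%
  summarise(
    Address = first(Address),
    total_hiring = sum(Number_of_hiring, na.rm = TRUE)
  )

naive
```

With this data, IDs 1 and 3 end up with a blank Address because their first row has none, even though a later row does.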

Solution

You can select the first non-empty Address for each ID:

library(dplyr)

df %>%
  group_by(ID) %>%
  summarise(
    Address = Address[Address != ''][1],
    total_hiring = sum(Number_of_hiring, na.rm = TRUE)
  )

#     ID Address  total_hiring
#  <int> <chr>           <int>
#1     1 London              6
#2     2 Montreal            5
#3     3 Dubai               8
#4     4 Helsinki            4
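
The title mentions NA, while the data here uses empty strings. If the real column contains actual NA values, the comparison `Address != ''` returns NA for those rows and the subset can yield NA as its first element. The following is a sketch of one way to cover both cases, not the only one; `df_na` is a small made-up data frame used purely for illustration.

```r
library(dplyr)

# Hypothetical toy data: one missing address stored as NA, one as "".
df_na <- data.frame(
  ID = c(1L, 1L, 2L, 2L),
  Address = c(NA, "London", "", "Montreal"),
  Number_of_hiring = c(5L, 1L, 2L, 3L),
  stringsAsFactors = FALSE
)

result <- df_na %>%
  group_by(ID) %>%
  summarise(
    # Drop both NA and "" before taking the first remaining value;
    # a group with no usable address yields NA.
    Address = Address[!is.na(Address) & Address != ""][1],
    total_hiring = sum(Number_of_hiring, na.rm = TRUE)
  )

result
```

On data that only uses empty strings, such as the data frame in this question, this gives the same result as the answer above.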

Data

df <- structure(list(
  ID = c(1L, 2L, 3L, 4L, 1L, 2L, 3L),
  Address = c("", "Montreal", "", "Helsinki", "London", "", "Dubai"),
  Number_of_hiring = c(5L, 2L, 3L, 4L, 1L, 3L, 5L)
), class = "data.frame", row.names = c(NA, -7L))