Is there a simple way to extract the first N words from a local macro in Stata that is delimited by commas or by comma plus space?

Problem description

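Given a local macro that holds a string of levels separated by a comma (","), by a comma plus a space (", "), or even by spaces alone (" "), is there a simple way to extract the first N levels (or words) of that local macro?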

The string looks like "12,123,1321,41", "12, 123, 1321, 41" or "12 123 1321 41".

Basically, I would be happy with a version of the macro function word # of string that works more or less like word 1/N of string. (See "Macro functions for parsing", pg 12 in Macro definition and manipulation.)

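For more context, I am working with the output of levelsof, local() sep(), so I can choose whichever separator is easiest to work with. I want to pass the resulting levels as arguments to the inlist() function. The following generally works, but inlist() accepts at most 250 arguments. That is why I would like to extract blocks of 250 words from the result of levelsof.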

sysuse auto,clear
levelsof mpg if trunk > 20,local(levels) sep(",")
list if inlist(mpg,`levels')

"Solution" so far

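I came up with a rather convoluted way to achieve this, but it does not look good, and I would like to know whether there is a simple built-in way to do it.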

sysuse auto,clear

levelsof mpg if trunk > 20, local(levels) sep(", ")
scalar number_of_words = 3
forvalues i = 1 (1) `=number_of_words' {
        local word_i = `i'
        local this_level : word `word_i' of `levels'
        local list_of_levels = "`list_of_levels'`this_level'" 
        
        di as text "loop: `i'"
        di as text "this level: `this_level'"
        di as text "list of levels so far: `list_of_levels'"
    }

di "`list_of_levels'"

// trim trailing comma
local trimmed_list_of_levels = substr( "`list_of_levels'",1,strlen( "`list_of_levels'" )-1) 

di "`trimmed_list_of_levels'"
list make mpg price trunk if inlist(mpg,`trimmed_list_of_levels')

Output

. sysuse auto,clear
(1978 Automobile Data)

. 
. levelsof mpg if trunk > 20, local(levels) sep(", ")
12, 15, 17, 18

. scalar number_of_words = 3

. forvalues i = 1 (1) `=number_of_words' {
  2.         local word_i = `i'
  3.         local this_level : word `word_i' of `levels'
  4.         local list_of_levels = "`list_of_levels'`this_level'" 
  5.         
.         di as text "loop: `i'"
  6.         di as text "this level: `this_level'"
  7.         di as text "list of levels so far: `list_of_levels'"
  8.     }
loop: 1
this level: 12,
list of levels so far: 12,
loop: 2
this level: 15,
list of levels so far: 12,15,
loop: 3
this level: 17,
list of levels so far: 12,15,17,

. 
. di "`list_of_levels'"
12,15,17,

. 
. // trim trailing comma
. local trimmed_list_of_levels = substr( "`list_of_levels'",1,strlen( "`list_of_levels'" )-1) 

. 
. di "`trimmed_list_of_levels'"
12,15,17

. list make mpg price trunk if inlist(mpg,`trimmed_list_of_levels')

     +------------------------------------------+
     | make                mpg    price   trunk |
     |------------------------------------------|
  2. | AMC Pacer            17    4,749      11 |
  5. | Buick Electra        15    7,827      20 |
 23. | Dodge St. Regis      17    6,342      21 |
 26. | Linc. Continental    12   11,497      22 |
 27. | Linc. Mark V         12   13,594      18 |
     |------------------------------------------|
 31. | Merc. Marquis        15    6,165      23 |
 53. | Audi 5000            17    9,690      15 |
 74. | Volvo 260            17   11,995      14 |
     +------------------------------------------+

Edits in response to comments.

Edit 01)

For example, the following does not work. It returns error 130, expression too long.

clear 

set obs 1000
gen id = _n 
gen x1 = rnormal()

sum * 
levelsof id if x1>0, local(levels) sep(", ")
sum * if inlist(id,`levels')

An example where this construct (levelsof + inlist) seems necessary

clear 

set obs 5000
gen id = round(_n/5)
gen x1 = rnormal()

sum * 
levelsof id if x1>2, local(levels) sep(", ")
sum * if x1>2 // if threshold is small enough,there will be too many values for inlist()
sum * if inlist(id,`levels')

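Solution

Building on your additional example, you can use egen with max() to create a flag that equals 1 for every observation of an id that has at least one case where x1 is above some threshold. For example: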

clear 
set seed 2021
set obs 5000
gen id = round(_n/5)
gen x1 = rnormal()

sum * 
levelsof id if x1>2,local(levels) sep(",")
sum * if x1>2 // if threshold is small enough,there will be too many values for inlist()
sum * if inlist(id,`levels')

//This will do the same thing
gen over_threshold = x1>2 
egen id_over_thresh = max(over_threshold),by(id)

sum * if id_over_thresh
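
If you still want to go through levelsof and inlist(), one possible approach is to take the words out of the macro in blocks and call inlist() once per block. The following is only a sketch under a few assumptions: levelsof is run with its default space separator so that each level is one word, the names in_levels, chunk and chunk_size are made up for illustration, and the block size of 100 is an arbitrary value chosen to stay well inside the 250-argument limit mentioned in the question.

```stata
clear
set seed 2021
set obs 5000
gen id = round(_n/5)
gen x1 = rnormal()

* Default separator is a space, so every level is one "word"
levelsof id if x1 > 0, local(levels)
local n : word count `levels'

* Assumption: 100 values per inlist() call (arbitrary, below the limit)
local chunk_size 100

gen byte in_levels = 0
forvalues start = 1(`chunk_size')`n' {
    local stop = min(`start' + `chunk_size' - 1, `n')
    * Reset the block, then collect words start..stop as ",a,b,c"
    local chunk
    forvalues i = `start'/`stop' {
        local w : word `i' of `levels'
        local chunk "`chunk',`w'"
    }
    * chunk begins with a comma, so it can follow id directly
    replace in_levels = 1 if inlist(id`chunk')
}

sum * if in_levels
```

Setting the upper bound of the inner loop to a fixed N (and skipping the outer loop) gives the "first N words" case from the question. For this particular task the egen flag above is shorter and does not depend on any argument limit.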

local-variables stata stata-macros string
