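Problem description
How do I match multiple controls? (That is, several controls/suppliers per case/demander, or a chosen matching ratio.)
I am using propensity score matching from the "FUZZY extension" (Data -> Propensity score Matching...
).
When I use the help button (?) in the dialog, this is what I get (first snippet):
FUZZY has several features not supported in this dialog, ... and matching multiple controls with each case.
... To display syntax help for FUZZY, run the following syntax: FUZZY /HELP.
When I run "FUZZY /HELP", there is no mention of how to set the number of controls or the ratio (second snippet).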
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<title>Propensity score Matching</title>
<style type="text/css">
<!--
H1 {font-weight:bold; color:#006699; font-size:125%; }
H2 {font-weight:bold; color:#006699; font-size:110%; }
TABLE {font-size:100%;}
/* paragraph tags */
.step {text-indent: -1.3em; margin-left:1.3em; margin-top: 0px;}
.menuselection {margin-left:10px}
.bullet {list-style-type: disc;margin-top:12px; margin-left:36px; text-indent:-1em; }
.codeblock {background-color: #ffffe6; display:block; margin-left:5px; padding:5px;}
/* inline tags */
.screen {font-weight:bold; color:#408080} /*** used refer to on-screen text ***/
.name {font-style: italic} /*** used to tag names,such as variable names,file names,and so forth ***/
.runinhead {font-weight: bold}
.superscript {vertical-align:super; font-size:80%}
.subscript {vertical-align:sub; font-size:80%}
-->
</style>
</head>
<body>
<h1>Propensity score Matching</h1>This procedure matches case
records with similar control records contained in a single
dataset. It first runs a logistic regression with the
case/control group variable as the dependent variable. Then it
selects a match for each case from the control group based on the
propensity score from the logistic regression. The score is an
estimate of the probability of membership in the case group.
<p>The procedure produces and activates a new dataset of cases
and matched controls.</p>
<p><span class="runinhead">Group Indicator</span> Select the
variable that defines whether a record is in the case or control
group. A variable value of one indicates a case, and a value of
zero indicates a control.</p>
<p><span class="runinhead">Predictors</span> Select the variables
to be used in the logistic regression to model case/control
membership.</p>
<p><span class="runinhead">Name for Propensity Variable</span>
Enter a name for the variable to hold the score (propensity) from
the logistic regression. This variable will be used for
matching.</p>
<p class="bullet">• The variable name must not already be in
use.</p>
<p><span class="runinhead">Match Tolerance</span> Specify the
tolerance for the score in matching cases and controls. A control
is eligible to match a case if the absolute value of the
difference in the propensity scores is less than or equal to this
value. A value of 0 means exact matches only while a value of 1
means any control would match any case. Smaller values produce
closer matches but may increase the number of unmatched
cases.</p>
<p class="bullet">• The output tables produced by this procedure
show the distribution of the number of eligible matches given the
tolerance and can be useful in choosing a tolerance value.</p>
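<p>As a rough illustration of the eligibility rule above (a Python sketch under stated assumptions, not the procedure's own code), the following counts, for each case, how many controls have a propensity score within the tolerance. The function name and the sample scores are hypothetical, and the sketch ignores the removal of controls already used when sampling without replacement.</p>

```python
from collections import Counter

def eligible_counts(case_scores, control_scores, tolerance):
    """For each case score, count controls with |difference| <= tolerance."""
    return [
        sum(1 for c in control_scores if abs(s - c) <= tolerance)
        for s in case_scores
    ]

# Hypothetical propensity scores, for illustration only.
cases = [0.20, 0.50, 0.90]
controls = [0.18, 0.22, 0.48, 0.54, 0.60]

counts = eligible_counts(cases, controls, tolerance=0.05)
# Distribution of pool sizes: how many cases have 0, 1, 2, ... eligible controls.
distribution = Counter(counts)
print(counts, dict(distribution))
```

<p>A case with a count of zero would go unmatched at this tolerance, which is the trade-off the tolerance setting controls.</p>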
<p><span class="runinhead">Case ID</span> Specify a case
identifier variable. The Case id value for the matched control
will be stored with each case record.</p>
<p><span class="runinhead">Match ID Variable Name</span> Specify
a name for the variable that will hold the id of the selected
control record. If a match cannot be produced, the variable value
will be system missing.</p>
<p class="bullet">• The variable name must not already be in
use.</p>
<p><span class="runinhead">Output Dataset Name</span> Specify a
name for the dataset to be created. The dataset will hold all the
cases and the controls that were matched with them.</p>
<p class="bullet">• The dataset name must not already be in
use.</p>
<p class="bullet">• The output is a dataset, not a permanent
file. You must save the dataset to create an actual file.</p>
<h2>Options</h2>
<p><span class="runinhead">Variable for Number of Eligible
Cases</span> Optionally, enter a variable name to record the size
of the control pool, i.e., the number of eligible cases, for each
case record. This is the number of controls that meet the
matching criteria after matches for previous case records have
been removed, if sampling without replacement.</p>
<p class="bullet">• The variable name must not already be in
use.</p>
<p><span class="runinhead">Sampling</span> Control cases can be
drawn with or without replacement.</p>
<p><span class="runinhead">Give priority to exact matches</span>
Check this box to cause the matching algorithm to try first for
an exact match before trying for a fuzzy match.</p>
<p><span class="runinhead">Maximize execution performance</span>
Check this box to maximize performance in selecting matches:
maximum speed and minimum memory usage. For large datasets this
box should be checked. When it is used, there is a slightly
greater chance that a case will not be matched with a
control.</p>
<p class="bullet">• You cannot select both exact-match priority and
maximized execution performance (which minimizes memory usage).</p>
<p><span class="runinhead">Randomize case order when drawing
matches</span> Check this to draw matches for the case records in
random order. Otherwise, the case records are processed in the
order in which they occur in the file. This could result in fewer
controls to select from later in the file. If the case records
have a systematic order, processing in file order could introduce
some bias.</p>
<p class="bullet">• This has no effect when maximize execution
performance is checked.</p>
<p><span class="runinhead">Random Number Seed</span> If you want
to be able to reproduce the matches exactly on a future run with
the same data, you can set a seed value for the random numbers
used when picking from a set at random. This seed is not related
to the general SPSS Statistics random number seed.</p>
<h2>Additional Features</h2>
<p>This dialog generates syntax for the FUZZY extension command
as well as some other code. FUZZY also provides more general ways
to do matching. To display syntax help for FUZZY, run the
following syntax:</p>
<p class="codeblock">FUZZY /HELP.</p>
<p>FUZZY has several features not supported in this dialog, including using separate case and control datasets as input, matching on a set of variables without the intermediate logistic
regression, and matching multiple controls with each case. Custom
functions written in Python can also be used for calculating the
fuzzy distance.</p>
<h2>Requirements</h2>This command requires the Python Essentials
and at least version 20 of IBM SPSS Statistics. For single
dataset usage, Statistics version 21 or a hot fix for version 20
may be required.
<p>It requires at least version 1.3.0 of the FUZZY extension
command. That is newer than the version shipped with Statistics
21.0.0.0. You can download the Python Essentials from the SPSS
Community website at www.ibm.com/developerworks/spssdevcentral.
The latest version of FUZZY can be found in the Extension
Commands collection.</p>
<hr>
<p style="font-size:80%;">
© Copyright IBM Corp. 1989, 2013</p>
</body></html>
body,td {
background-color: white;
font-size: 14px;
margin: 8px;
}
.Syntax {
border: thin solid blue;
padding: 8px;
-moz-box-sizing: border-box;
-webkit-box-sizing: border-box;
box-sizing: border-box;
background-color: #fef5ca;
color: #0000CD;
font-family: sans-serif,monospace;
}
.Syntax:before {
content: "Syntax:";
}
.example {
border: thin solid blue;
padding: 8px;
-moz-box-sizing: border-box;
-webkit-box-sizing: border-box;
box-sizing: border-box;
color: #0000CD;
background-color: #fef5ca;
font-family: sans-serif,monospace;
}
.example:before {
content: "Example:";
}
.examplenobefore {
border: thin solid blue;
padding: 8px;
-moz-box-sizing: border-box;
-webkit-box-sizing: border-box;
box-sizing: border-box;
color: #0000CD;
background-color: #fef5ca;
font-family: sans-serif,monospace;
}
table {text-align: left;
}
strong {
color:#000080;
color:#0000CD;
}
tt,code,pre {
font-family: sans-serif,monospace;
}
h1 {
font-size:2.0em;
background-image: url(IBMdialogicon.png);
background-repeat: no-repeat;
background-position: left;
padding-left: 24px;
}
h2 {
font-size:1.5em;
color: #0000CD;
padding-left: 8px;
background-color: #fef5ca;
max-width: 220px;
}
h3 {
font-size:1.5em;
}
h4 {
font-size:1.0em;
}
h5 {
font-size:0.9em;
}
h6 {
font-size:0.8em;
}
a:visited {
color: rgb(50%,0%,50%);
}
pre {
margin-top: 0;
border: 1px solid #ccc;
white-space: pre-wrap;
}
pre code {
display: block; padding: 0.0em;
}
code.r,code.cpp {
background-color: #fef5ca;
}
table,td,th {
border: none;
}
blockquote {
color:#666666;
margin:0;
padding-left: 1em;
border-left: 0.5em #EEE solid;
}
hr {
height: 0px;
border-bottom: none;
border-top-width: thin;
border-top-style: dotted;
border-top-color: #999999;
}
@media print {
* {
background: transparent !important;
color: black !important;
filter:none !important;
-ms-filter: none !important;
}
body {
font-size:12pt;
max-width:100%;
}
a,a:visited {
text-decoration: underline;
}
hr {
visibility: hidden;
page-break-before: always;
}
pre,blockquote {
padding-right: 1em;
page-break-inside: avoid;
}
tr,img {
page-break-inside: avoid;
}
img {
max-width: 100% !important;
}
@page :left {
margin: 15mm 20mm 15mm 10mm;
}
@page :right {
margin: 15mm 10mm 15mm 20mm;
}
p,h2,h3 {
orphans: 3; widows: 3;
}
h2,h3 {
page-break-after: avoid;
}
}
<!DOCTYPE html>
<!-- saved from url=(0014)about:internet -->
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<meta http-equiv="x-ua-compatible" content="IE=9" >
<link rel="stylesheet" type="text/css" href="extSyntax.css" />
<title>FUZZY Extension Command</title>
</head>
<body>
<h1>FUZZY Extension Command</h1>
<p>Match cases from two datasets based on one or more keys, drawing randomly from matching cases.</p>
<p>Note: the original version of this command was called CASECTRL. FUZZY
is a superset of that functionality.</p>
<div class="Syntax">
<p>FUZZY DEMANDERDS = <em>dataset</em> SUPPLIERDS = <em>dataset</em> BY = <em>keys</em><sup>*</sup>
SUPPLIERID = <em>variable</em> NEWDEMANDERIDVARS = <em>variables</em><br/>
GROUP = <em>variable</em><br/>
FUZZ = <em>matching tolerances</em><br/>
CUSTOMFUZZ = <em>user-written function for calculating match eligibility</em><br/>
EXACTPRIORITY = TRUE<sup>**</sup> or FALSE<br/>
COPYTODEMANDER = <em>supplier variable names</em><br/>
MATCHGROUPVAR = <em>variable</em> (default is “matchgroup”)<br/>
DRAWPOOLSIZE = <em>variable</em><br/>
DEMANDERID = <em>variable</em><br/>
DS3 = <em>dataset</em></p>
<p>/OPTIONS
SAMPLEWITHREPLACEMENT=TRUE or FALSE<sup>**</sup><br/>
MINIMIZEMEMORY = TRUE<sup>**</sup> or FALSE<br/>
SHUFFLE = TRUE or FALSE<sup>**</sup><br/>
SEED = <em>number</em></p>
<p>/OUTFILE LOGFILE="<em>filespec</em>" LOGACCESSMODE=OVERWRITE<sup>**</sup> or APPEND</p>
<p>/HELP.</p>
<p><sup>*</sup> required<br/>
<sup>**</sup> Default</p>
</div>
<p>FUZZY /HELP prints this output and does nothing else.</p>
<p>Example:</p>
<pre class="example"><code>FUZZY DEMANDERDS=demand SUPPLIERDS = supply BY=agegroup gender
SUPPLIERID = id NEWDEMANDERIDVARS=supplierId.
</code></pre>
<p>Example using a single input dataset (the active dataset):</p>
<pre class="example"><code>FUZZY by=x1 supplierid = id newdemanderidvars=sid group=group
drawpoolsize=drawpool.
</code></pre>
<pre class="example"><code>FUZZY DEMANDERDS=demander SUPPLIERDS=supplier
BY=origin cylinder SUPPLIERID=id
NEWDEMANDERIDVARS=matchedcaseid
COPYTODEMANDER=mpg randomnumber randomstring
DS3=dsextra DEMANDERID=demanderid.
</code></pre>
<p>FUZZY takes two datasets, a demander and a supplier, or a single dataset with a
group identification variable. It attempts to find a match for each
demander case from the supplier dataset based on the variables named in BY. If more than one
candidate matches, it picks randomly. No sorting of either dataset is required.</p>
<p>If using a single dataset, the <strong>DEMANDERDS</strong> and <strong>SUPPLIERDS</strong> keywords
can be omitted. The active dataset will be used.</p>
<p>By default, a match is defined by identical values for all the <strong>BY</strong> variables. A system-missing
value prevents a case from being matched. Fuzzy matching is also available for numeric variables.
Specify FUZZ=list-of-matching tolerances. There must be one fuzz value for each BY variable, listed in the same order as the BY variables.</p>
<p>A tolerance is the maximum difference in either direction that is allowed for a match. Thus, values of 1 and 2 would match if the tolerance is 1 or more, and a tolerance of zero means an
exact match on that variable. You must use 0 for any string variable. If <strong>FUZZ</strong> is used, rejection counts for each variable are shown in the output.</p>
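<p>The tolerance rule described above can be sketched in Python as follows. This is only an illustration of the stated rule, not FUZZY's internal code; the function name <code>within_tolerance</code> is hypothetical.</p>

```python
def within_tolerance(demander, supplier, fuzz):
    """Return True if every pair of BY values is within its tolerance.

    demander, supplier: lists of BY variable values for one case each.
    fuzz: one tolerance per BY variable, in the same order as BY.
    """
    if not (len(demander) == len(supplier) == len(fuzz)):
        raise ValueError("need one fuzz value per BY variable")
    for d, s, tol in zip(demander, supplier, fuzz):
        if d is None or s is None:
            return False  # a missing value prevents a match
        if isinstance(d, str) or isinstance(s, str):
            if d != s:  # strings can only match exactly (tolerance 0)
                return False
        elif abs(d - s) > tol:
            return False
    return True

# Values of 1 and 2 match with a tolerance of 1; the string must be identical.
print(within_tolerance([1, "m"], [2, "m"], [1, 0]))
```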
<p>By default, with fuzzy matching, an exact match is first tried, and then a fuzzy match is tried.
There is no attempt to get the closest fuzzy match, just a match within the tolerance.
<strong>EXACTPRIORITY</strong> = FALSE causes all suppliers within the fuzz range to be considered equally.
EXACTPRIORITY must be FALSE if MINIMIZEMEMORY is TRUE.</p>
<p>Using EXACTPRIORITY may introduce a subtle artifact that may need to be considered in
subsequent analysis. While it will generally produce closer matches, cases with variable values
where candidates are scarce will tend to get matches that differ more than where candidates
are abundant.</p>
<p><strong>CUSTOMFUZZ</strong> can be used to substitute a user-written calculation for the
built-in fuzzy calculation. It should specify a Python module name and function
as a quoted string, e.g. "mymodule.fuzzycalc". The function should return</p>
<ul>
<li>0 - no match</li>
<li>1 - fuzzy match</li>
<li>It could also return 2 (exact match).</li>
</ul>
<p>If the case comparison produced an exact match, this function is not called.
The function signature should be
<code>functionname(demander, supplier)</code><br/>
where demander and supplier are lists of the BY variable values for the
demander and supplier cases.
Note that the function needs to deal with missing values (None) and may
need to handle both string and numeric comparisons.
The FUZZ specification is ignored if a CUSTOMFUZZ function is used.</p>
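<p>A minimal sketch of such a function, following the signature and return codes described above. The matching rule itself (numeric values within 10% of each other, strings equal ignoring case and surrounding blanks) is an assumption chosen only for illustration, as is the module name.</p>

```python
# mymodule.py: hypothetical module; it would be referenced as "mymodule.fuzzycalc".

REL_TOL = 0.10  # assumed rule: numeric values may differ by up to 10%

def fuzzycalc(demander, supplier):
    """Return 0 (no match), 1 (fuzzy match), or 2 (exact match)."""
    exact = True
    for d, s in zip(demander, supplier):
        if d is None or s is None:
            return 0  # treat missing values as non-matching
        if d == s:
            continue  # identical values, check the next BY variable
        if isinstance(d, str) or isinstance(s, str):
            # compare strings ignoring case and surrounding blanks
            if str(d).strip().lower() != str(s).strip().lower():
                return 0
        elif abs(d - s) > REL_TOL * max(abs(d), abs(s)):
            return 0
        exact = False
    return 2 if exact else 1

print(fuzzycalc([100, "A"], [105, "a "]))
```

<p>Because the function is not called for exact matches, the return value 2 is included only for completeness.</p>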
<p>The <strong>DEMANDERDS</strong> and <strong>SUPPLIERDS</strong> values can be the same, indicating that all the data
is in the same dataset. In this case, <strong>GROUP</strong> must be used to distinguish the cases.
<strong>GROUP</strong> names a variable that indicates which are demander cases and which supplier ones.
A value of 1 indicates demander and 0 indicates supplier. Any other values, including missing, cause the case to be ignored.</p>
<p>This procedure builds some possibly large tables in memory, so it may not be appropriate for very
large datasets.</p>
<p>There are several output options.
The <strong>ID</strong> or IDs of matching cases are appended to the demander dataset variables. The number of
variables listed as <strong>NEWDEMANDERIDVARS</strong> determines how many matches are attempted. These variables
must not already exist in the demander dataset.</p>
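<p>Since the number of names listed in NEWDEMANDERIDVARS determines how many matches are attempted, a command requesting several matches per demander case can be assembled as in the Python sketch below. The helper name, the <code>matchid</code> prefix, and the variable names are hypothetical; the keywords are the ones shown in the syntax chart above.</p>

```python
def fuzzy_syntax(by, supplier_id, group, n_controls, prefix="matchid"):
    """Build a single-dataset FUZZY command asking for n_controls matches per case."""
    if n_controls < 1:
        raise ValueError("n_controls must be at least 1")
    # One new id variable per requested match: matchid1, matchid2, ...
    id_vars = " ".join(f"{prefix}{i}" for i in range(1, n_controls + 1))
    return (
        f"FUZZY BY={' '.join(by)} SUPPLIERID={supplier_id} "
        f"NEWDEMANDERIDVARS={id_vars} GROUP={group}."
    )

cmd = fuzzy_syntax(by=["x1"], supplier_id="id", group="group", n_controls=3)
print(cmd)
# Inside SPSS Statistics with the Python Essentials, the string could be run
# with spss.Submit(cmd); that call is not exercised in this sketch.
```

<p>The generated id variable names must not already exist in the dataset, as noted above.</p>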
<p>The variables in the supplier dataset that are listed in <strong>COPYTODEMANDER</strong> are copied to the
demander dataset as new variables or replacement values. For existing variables, the types
must agree. If no match is found, existing demander dataset values are not changed and new
variable values will be sysmis or blank.</p>
<p>COPYTODEMANDER cannot be used with GROUP.</p>
<p>Only one NEWDEMANDERIDVARS may be specified if COPYTODEMANDER is used.<br/>
None of the metadata such as variable and value labels is copied. Use APPLY DICTIONARY
to bring over variable properties.</p>
<p>If <strong>DS3</strong> is specified, a new dataset is created containing the cases in the supplier dataset actually
used for the matches. It will be the active dataset after the command is run.
(This implies that any unnamed dataset will be closed.)
It contains all the variables from the supplier dataset plus the <strong>MATCHGROUPVAR</strong>.
If DEMANDERID is specified, it also contains the ID variable from the demander dataset. These
variable names must all be unique.
DS3 cannot be used with GROUP.<br/>
DS3 is only a dataset: use the SAVE command to turn it into a file.</p>
<p><strong>DRAWPOOLSIZE</strong> can name a variable for the demander dataset that will record the
number of cases in the supplier dataset that are eligible to match the demander case.
This can be useful in identifying the types of cases in terms of BY variables where
the supplier pool is thin. The variable must not already exist in the demander dataset.</p>
<h2>OPTIONS</h2>
<p>By default, sampling from the supplier dataset is done without replacement. Specify
<strong>SAMPLEWITHREPLACEMENT</strong>=TRUE to sample with replacement.</p>
<p>By default, memory usage is minimized in picking supplier dataset candidate match cases
(all eligible cases have an equal probability of selection). This requires an extra data pass. If
the possible number of matching cases for a demander case is small or the supplier dataset is
not large, specifying
<strong>MINIMIZEMEMORY</strong>=FALSE may improve performance by eliminating the extra data pass. In the
case of 1-1 matching, this is recommended.
MINIMIZEMEMORY=TRUE cannot be combined with EXACTPRIORITY=TRUE.</p>
<p>When MINIMIZEMEMORY=TRUE, each supplier case is eligible for assignment at random
to just one demander case for which it is within tolerance. When FALSE, the case is
eligible for all cases within the tolerance, and one is picked at random from all of those.
This means that when TRUE, a usually small number of demander cases may go unmatched
when a match could have been found, but when FALSE, the matching tables can become
very large, especially in the PSM case.</p>
<p>By default, cases in the demander dataset are processed in case order. If there are insufficient
supplier cases, you may specify <strong>SHUFFLE</strong>=TRUE to process the demander cases in random order.
This ensures that earlier cases do not have an advantage over later ones in matching.<br/>
SHUFFLE increases the memory requirement and will take longer to execute.</p>
<p>Use <strong>SEED</strong>=number to set the random number generator to a known state for repeatability.</p>
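<p>To make the roles of sampling without replacement, SHUFFLE, and SEED concrete, here is a simplified Python sketch of a one-to-one greedy matcher on a single numeric key. It is an illustration under stated assumptions, not FUZZY's actual algorithm; the function name and the sample data are hypothetical.</p>

```python
import random

def greedy_match(demanders, suppliers, tolerance, shuffle=False, seed=None):
    """Match each demander to one unused supplier within tolerance.

    demanders, suppliers: dicts mapping case id -> numeric key value.
    Returns a dict mapping demander id -> supplier id, or None if unmatched.
    """
    rng = random.Random(seed)  # a fixed seed makes the draws repeatable
    order = list(demanders)
    if shuffle:
        rng.shuffle(order)  # process demander cases in random order
    available = dict(suppliers)  # used suppliers are removed (no replacement)
    matches = {}
    for d_id in order:
        pool = [s_id for s_id, value in available.items()
                if abs(value - demanders[d_id]) <= tolerance]
        if pool:
            pick = rng.choice(pool)  # random draw among eligible suppliers
            matches[d_id] = pick
            del available[pick]
        else:
            matches[d_id] = None
    return matches

# Hypothetical data: three demanders compete for two suppliers.
demanders = {"d1": 10, "d2": 11, "d3": 12}
suppliers = {"s1": 10, "s2": 12}
first = greedy_match(demanders, suppliers, tolerance=1, shuffle=True, seed=42)
second = greedy_match(demanders, suppliers, tolerance=1, shuffle=True, seed=42)
print(first == second)
```

<p>With too few suppliers, which demander goes unmatched depends on the processing order, which is why shuffling removes the advantage of cases that appear early in the file.</p>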
<h2>OUTFILE</h2>
<p>Matching can be quite time consuming. Progress can be logged to a file during the run.
Specify a file/path as <strong>LOGFILE</strong>="filespec" to record progress.
If <strong>LOGACCESSMODE</strong> is OVERWRITE, each run overwrites an existing file. APPEND
appends to the log file.
The log records the state the process has reached and, for some operations, the number of
operations completed at 1000 or 5000 case intervals.</p>
<p>© Copyright IBM Corp. 1989, 2014</p>
</body>
</html>