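SPSS Propensity Score Matching with multiple controls

Problem description

How do I match multiple controls? (That is, several controls/suppliers for each case/demander, or a chosen matching ratio.)

I am using "Propensity Score Matching" from the FUZZY extension (Data -> Propensity Score Matching...). When I click the help button (?) in the dialog, I get the following (first snippet):

FUZZY has several features not supported in this dialog, ... and matching multiple controls with each case.

... To display syntax help for FUZZY, run the following syntax: FUZZY /HELP.

When I run "FUZZY /HELP", there is no mention of how to set the number of controls or the matching ratio (second snippet).

Sorry for the snippets; the help pages are in local files...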

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"><html><head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

  <title>Propensity score Matching</title>
  <style type="text/css">

  <!-- 

  H1 {font-weight:bold; color:#006699; font-size:125%; }
  H2 {font-weight:bold; color:#006699; font-size:110%; }
  TABLE {font-size:100%;}

  /* paragraph tags */
  .step {text-indent: -1.3em; margin-left:1.3em; margin-top: 0px;}
  .menuselection {margin-left:10px}
  .bullet {list-style-type: disc;margin-top:12px; margin-left:36px; text-indent:-1em; }
  .codeblock {background-color: #ffffe6; display:block; margin-left:5px; padding:5px;}

  /* inline tags */
  .screen {font-weight:bold; color:#408080}                       /*** used to refer to on-screen text ***/
  .name {font-style: italic}                                                       /*** used to tag names, such as variable names, file names, and so forth ***/
  .runinhead {font-weight: bold} 
  .superscript {vertical-align:super; font-size:80%}
  .subscript {vertical-align:sub; font-size:80%}


  --> 
  </style>
</head>

<body>
  <h1>Propensity score Matching</h1>This procedure matches case
  records with similar control records contained in a single
  dataset. It first runs a logistic regression with the
  case/control group variable as the dependent variable. Then it
  selects a match for each case from the control group based on the
  propensity score from the logistic regression. The score is an
  estimate of the probability of membership in the case group.

  <p>The procedure produces and activates a new dataset of cases
  and matched controls.</p>

  <p><span class="runinhead">Group Indicator</span> Select the
  variable that defines whether a record is in the case or control
  group. A variable value of one indicates a case, and a value of
  zero indicates a control.</p>

  <p><span class="runinhead">Predictors</span> Select the variables
  to be used in the logistic regression to model case/control
  membership.</p>

  <p><span class="runinhead">Name for Propensity Variable</span>
  Enter a name for the variable to hold the score (propensity) from
  the logistic regression. This variable will be used for
  matching.</p>

  <p class="bullet">• The variable name must not already be in
  use.</p>

  <p><span class="runinhead">Match Tolerance</span> Specify the
  tolerance for the score in matching cases and controls. A control
  is eligible to match a case if the absolute value of the
  difference in the propensity scores is less than or equal to this
  value. A value of 0 means exact matches only while a value of 1
  means any control would match any case. Smaller values produce
  closer matches but may increase the number of unmatched
  cases.</p>

  <p class="bullet">• The output tables produced by this procedure
  show the distribution of the number of eligible matches given the
  tolerance and can be useful in choosing a tolerance value.</p>

  <p><span class="runinhead">Case ID</span> Specify a case
  identifier variable. The Case id value for the matched control
  will be stored with each case record.</p>

  <p><span class="runinhead">Match ID Variable Name</span> Specify
  a name for the variable that will hold the id of the selected
  control record. If a match cannot be produced, the variable value
  will be system missing.</p>

  <p class="bullet">• The variable name must not already be in
  use.</p>

  <p><span class="runinhead">Output Dataset Name</span> Specify a
  name for the dataset to be created. The dataset will hold all the
  cases and the controls that were matched with them.</p>

  <p class="bullet">• The dataset name must not already be in
  use.</p>

  <p class="bullet">• The output is a dataset, not a permanent
  file. You must save the dataset to create an actual file.</p>

  <h2>Options</h2>

  <p><span class="runinhead">Variable for Number of Eligible
  Cases</span> Optionally, enter a variable name to record the size
  of the control pool, i.e., the number of eligible cases, for each
  case record. This is the number of controls that meet the
  matching criteria after matches for previous case records have
  been removed, if sampling without replacement.</p>

  <p class="bullet">• The variable name must not already be in
  use.</p>

  <p><span class="runinhead">Sampling</span> Control cases can be
  drawn with or without replacement.</p>

  <p><span class="runinhead">Give priority to exact matches</span>
  Check this box to have the matching algorithm look for an exact
  match before trying a fuzzy match.</p>

  <p><span class="runinhead">Maximize execution performance</span>
  Check this box to maximize performance in selecting matches
  (maximum speed and minimum memory usage). For large datasets this
  box should be checked. When it is used, there is a slightly
  greater chance that a case will not be matched with a
  control.</p>

  <p class="bullet">• You cannot check both Give priority to exact
  matches and Maximize execution performance (minimize memory).</p>

  <p><span class="runinhead">Randomize case order when drawing
  matches</span> Check this to draw matches for the case records in
  random order. Otherwise, the case records are processed in the
  order in which they occur in the file. This could result in fewer
  controls to select from later in the file. If the case records
  have a systematic order, processing in file order could introduce
  some bias.</p>

  <p class="bullet">• This has no effect when maximize execution
  performance is checked.</p>

  <p><span class="runinhead">Random Number Seed</span> If you want
  to be able to reproduce the matches exactly on a future run with
  the same data, you can set a seed value for the random numbers
  used when picking from a set at random. This seed is not related
  to the general SPSS Statistics random number seed.</p>

  <h2>Additional Features</h2>

  <p>This dialog generates syntax for the FUZZY extension command
  as well as some other code. FUZZY also provides more general ways
  to do matching. To display syntax help for FUZZY, run the
  following syntax:</p>

  <p class="codeblock">FUZZY /HELP.</p>

  <p>FUZZY has several features not supported in this dialog, including using separate case and control datasets as input, matching on a set of variables without the intermediate logistic
  regression, and matching multiple controls with each case. Custom
  functions written in Python can also be used for calculating the
  fuzzy distance.</p>

  <h2>Requirements</h2>This command requires the Python Essentials
  and at least version 20 of IBM SPSS Statistics. For single
  dataset usage, Statistics version 21 or a hot fix for version 20
  may be required.

  <p>It requires at least version 1.3.0 of the FUZZY extension
  command. That is newer than the version shipped with Statistics
  21.0.0.0. You can download the Python Essentials from the SPSS
  Community website at www.ibm.com/developerworks/spssdevcentral.
  The latest version of FUZZY can be found in the Extension
  Commands collection.</p>
  <hr>

  <p style="font-size:80%;">
  © copyright IBM Corp. 1989, 2013</p>


</body></html>
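For reference, the matching step that the dialog help describes (one control per case, eligible when the absolute propensity-score difference is at most the tolerance, picked at random among eligible controls, without replacement by default) can be sketched in Python as below. This is a minimal illustration under assumptions, not the FUZZY extension's code: the propensity scores are taken as already computed (the logistic regression step is skipped), records are (id, score) tuples, and the function name match_one_to_one is hypothetical.

```python
import random


def match_one_to_one(cases, controls, tolerance, with_replacement=False,
                     shuffle=False, seed=None):
    """Pick at most one control per case whose score is within `tolerance`.

    cases, controls: lists of (record_id, propensity_score) tuples.
    Returns {case_id: control_id or None}; None stands in for the
    system-missing match ID of an unmatched case.
    Hypothetical sketch, not the FUZZY extension's actual algorithm.
    """
    rng = random.Random(seed)          # local RNG so a seed reproduces the draw
    order = list(cases)
    if shuffle:
        rng.shuffle(order)             # random case order, like the dialog option
    available = dict(controls)         # control_id -> score, still in the pool
    matches = {}
    for case_id, score in order:
        # Eligible controls: absolute score difference <= tolerance
        eligible = [cid for cid, s in available.items()
                    if abs(s - score) <= tolerance]
        if not eligible:
            matches[case_id] = None
            continue
        chosen = rng.choice(eligible)  # equal probability among eligible controls
        matches[case_id] = chosen
        if not with_replacement:
            del available[chosen]      # each control is used at most once
    return matches
```

For example, with cases [("c1", 0.30), ("c2", 0.31)], controls [("k1", 0.29), ("k2", 0.50)] and a tolerance of 0.05, c1 gets k1 and c2 stays unmatched (None), because k1 has already been used and k2 is too far away; with_replacement=True lets both cases take k1. This is the trade-off the help mentions: a smaller tolerance gives closer matches but more unmatched cases.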

body,td {
   background-color: white;
   font-size: 14px;
   margin: 8px;
}

.Syntax {
    border: thin solid blue;
    padding: 8px;
    -moz-box-sizing: border-box;
    -webkit-box-sizing: border-box;
    box-sizing: border-box;
    background-color: #fef5ca;
    color: #0000CD;
    font-family: sans-serif,monospace;
}
.Syntax:before {
    content: "Syntax:";
}

.example {
    border: thin solid blue;
    padding: 8px;
    -moz-box-sizing: border-box;
    -webkit-box-sizing: border-box;
    box-sizing: border-box;
    color: #0000CD;
    background-color: #fef5ca;
    font-family: sans-serif,monospace;
}
.example:before {
    content: "Example:";
}
.examplenobefore {
    border: thin solid blue;
    padding: 8px;
    -moz-box-sizing: border-box;
    -webkit-box-sizing: border-box;
    box-sizing: border-box;
    color: #0000CD;
    background-color: #fef5ca;
    font-family: sans-serif,monospace;
}
table {text-align: left;
}
strong {
    color:#000080;
    color:#0000CD;
}
tt,code,pre {
    font-family: sans-serif,monospace;
}

h1 { 
   font-size:2.0em;
    background-image: url(IBMdialogicon.png);
    background-repeat: no-repeat;
    background-position: left;
    padding-left: 24px;
}

h2 { 
   font-size:1.5em;
   color: #0000CD;
   padding-left: 8px;
   background-color: #fef5ca;
   max-width: 220px;
}

h3 { 
   font-size:1.5em; 
}

h4 { 
   font-size:1.0em; 
}

h5 { 
   font-size:0.9em; 
}

h6 { 
   font-size:0.8em; 
}

a:visited {
   color: rgb(50%,0%,50%);
}

pre {   
   margin-top: 0;
   border: 1px solid #ccc;
   white-space: pre-wrap;
}

pre code {
   display: block; padding: 0.0em;
}

code.r,code.cpp {
   background-color: #fef5ca;
}

table,td,th {
  border: none;
}

blockquote {
   color:#666666;
   margin:0;
   padding-left: 1em;
   border-left: 0.5em #EEE solid;
}

hr {
   height: 0px;
   border-bottom: none;
   border-top-width: thin;
   border-top-style: dotted;
   border-top-color: #999999;
}

@media print {
   * { 
      background: transparent !important; 
      color: black !important; 
      filter:none !important; 
      -ms-filter: none !important; 
   }

   body { 
      font-size:12pt; 
      max-width:100%; 
   }
       
   a,a:visited { 
      text-decoration: underline; 
   }

   hr { 
      visibility: hidden;
      page-break-before: always;
   }

   pre,blockquote { 
      padding-right: 1em; 
      page-break-inside: avoid; 
   }

   tr,img { 
      page-break-inside: avoid; 
   }

   img { 
      max-width: 100% !important; 
   }

   @page :left { 
      margin: 15mm 20mm 15mm 10mm; 
   }
     
   @page :right { 
      margin: 15mm 10mm 15mm 20mm; 
   }

   p,h2,h3 { 
      orphans: 3; widows: 3; 
   }

   h2,h3 { 
      page-break-after: avoid; 
   }
}
<!DOCTYPE html>
<!-- saved from url=(0014)about:internet -->
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
<meta http-equiv="x-ua-compatible" content="IE=9" >
<link rel="stylesheet" type="text/css" href="extSyntax.css" />
<title>FUZZY Extension Command</title>


</head>

<body>
<h1>FUZZY Extension Command</h1>

<p>Match cases from two datasets based on one or more keys drawing randomly from matching cases.</p>

<p>Note: the original version of this command was called CASECTRL.  FUZZY
is a superset of that functionality.</p>
<div class="Syntax">
<p>FUZZY DEMANDERDS = <em>dataset</em> SUPPLIERDS = <em>dataset</em> BY = <em>keys</em><sup>&#42;</sup>
SUPPLIERID = <em>variable</em> NEWDEMANDERIDVARS = <em>variables</em><br/>
GROUP = <em>variable</em><br/>
FUZZ = <em>matching tolerances</em><br/>
CUSTOMFUZZ = <em>user-written function for calculating match eligibility</em><br/>
EXACTPRIORITY = TRUE<sup>&#42;</sup> or FALSE<br/>
COPYTODEMANDER = <em>supplier variable names</em><br/>
MATCHGROUPVAR = <em>variable</em> (default is &ldquo;matchgroup&rdquo;)<br/>
DRAWPOOLSIZE = <em>variable</em><br/>
DEMANDERID = <em>variable</em><br/>
DS3 = <em>dataset</em></p>

<p>/OPTIONS
SAMPLEWITHREPLACEMENT=TRUE or FALSE<sup>&#42;&#42;</sup><br/>
MINIMIZEMEMORY = TRUE<sup>&#42;&#42;</sup> or FALSE<br/>
SHUFFLE = TRUE or FALSE<sup>&#42;&#42;</sup><br/>
SEED = <em>number</em></p>

<p>/OUTFILE LOGFILE=&ldquo;<em>filespec</em>&rdquo; LOGACCESSMODE=OVERWRITE<sup>&#42;&#42;</sup> or APPEND</p>

<p>/HELP.</p>

<p><sup>&#42;</sup> required<br/>
<sup>&#42;&#42;</sup> Default</p>
</div>
<p>FUZZY /HELP  prints this output and does nothing else.</p>

<p>Example:</p>

<pre class="example"><code>FUZZY DEMANDERDS=demand SUPPLIERDS = supply BY=agegroup gender
SUPPLIERID = id NEWDEMANDERIDVARS=supplierId.
</code></pre>

<p>Example using a single input dataset (the active dataset):</p>

<pre class="example"><code>FUZZY by=x1 supplierid = id newdemanderidvars=sid group=group
drawpoolsize=drawpool.
</code></pre>


<pre class="example"><code>FUZZY DEMANDERDS=demander SUPPLIERDS=supplier
BY=origin cylinder SUPPLIERID=id
NEWDEMANDERIDVARS=matchedcaseid
COPYTODEMANDER=mpg randomnumber randomstring
DS3=dsextra DEMANDERID=demanderid.
</code></pre>

<p>FUZZY takes either two datasets, a demander and a supplier, or a single dataset with a 
group identification variable.  It attempts to find a match for each
demander case from the supplier dataset based on the variables named in BY.  If more than one 
candidate matches, it picks randomly.  No sorting of either dataset is required.</p>

<p>If using a single dataset, the <strong>DEMANDERDS</strong> and <strong>SUPPLIERDS</strong> keywords
can be omitted.  The active dataset will be used.</p>

<p>By default, a match is defined by identical values for all the <strong>BY</strong> variables.  A system-missing
value prevents a case from being matched.  Fuzzy matching is also available for numeric variables.
Specify FUZZ followed by a list of matching tolerances.  There must be one fuzz value for each BY variable, listed in the same order as the BY variables.</p>

<p>A tolerance is the maximum difference in either direction that is allowed for a match.  Thus, values of 1 and 2 would match if the tolerance is 1 or more, and a tolerance of zero means an
exact match on that variable.  You must use 0 for any string variable.  If <strong>FUZZ</strong> is used, rejection counts for each variable are shown in the output.</p>

<p>By default, with fuzzy matching, an exact match is tried first, and then a fuzzy match is tried.
There is no attempt to get the closest fuzzy match,just a match within the tolerance.
<strong>EXACTPRIORITY</strong> = FALSE causes all suppliers within the fuzz range to be considered equally.
EXACTPRIORITY must be FALSE if MINIMIZEMEMORY is TRUE.</p>

<p>Using EXACTPRIORITY may introduce a subtle artifact that may need to be considered in
subsequent analysis.  While it will generally produce closer matches, cases with variable values 
where candidates are scarce will tend to get matches that differ more than where candidates
are abundant.</p>

<p><strong>CUSTOMFUZZ</strong> can be used to substitute a user-written calculation for the
built-in fuzzy calculation.  It should specify a Python module name and function
as a quoted string, e.g. &quot;mymodule.fuzzycalc&quot;.  The function should return</p>

<ul>
<li>0 - no match</li>
<li>1 - fuzzy match</li>
<li>2 - exact match (optional)</li>
</ul>

<p>If the case comparison produced an exact match, this function is not called.
The function signature should be
<code>functionname(demander,supplier)</code><br/>
where demander and supplier are lists of the BY variable values for the
demander and supplier cases.
Note that the function needs to deal with missing values (None) and may
need to handle both string and numeric comparisons.
The FUZZ subcommand is ignored if a customfuzz function is used.</p>

<p>The <strong>DEMANDERDS</strong> and <strong>SUPPLIERDS</strong> values can be the same, indicating that all the data
is in the same dataset.  In this case, <strong>GROUP</strong> must be used to distinguish the cases.
<strong>GROUP</strong> names a variable that indicates which cases are demanders and which are suppliers.
A value of 1 indicates demander and 0 indicates supplier.  Any other value, including missing, causes the case to be ignored.</p>

<p>This procedure builds some possibly large tables in memory, so it may not be appropriate for very
large datasets.</p>

<p>There are several output options.
The <strong>ID</strong> or IDs of matching cases are appended to the demander dataset variables.  The number of
variables listed as <strong>NEWDEMANDERIDVARS</strong> determines how many matches are attempted.  These variables
must not already exist in the demander dataset.</p>

<p>The variables in the supplier dataset that are listed in <strong>COPYTODEMANDER</strong> are copied to the 
demander dataset as new variables or replacement values.  For existing variables, the types 
must agree.  If no match is found, existing demander dataset values are not changed and new
variable values will be sysmis or blank.</p>

<p>COPYTODEMANDER cannot be used with GROUP.</p>

<p>Only one NEWDEMANDERIDVARS variable may be specified if COPYTODEMANDER is used.<br/>
None of the metadata, such as variable and value labels, is copied.  Use APPLY DICTIONARY 
to bring over variable properties.</p>

<p>If <strong>DS3</strong> is specified, a new dataset is created containing the cases in the supplier dataset actually 
used for the matches.  It will be the active dataset after the command is run
(this implies that any unnamed dataset will be closed).
It contains all the variables from the supplier dataset plus the <strong>MATCHGROUPVAR</strong>.
If DEMANDERID is specified, it also contains the ID variable from the demander dataset.  These 
variable names must all be unique.
DS3 is only a dataset: use the SAVE command to turn it into a file.<br/>
DS3 cannot be used with GROUP.</p>

<p><strong>DRAWPOOLSIZE</strong> can name a variable for the demander dataset that will record the
number of cases in the supplier dataset that are eligible to match the demander case.
This can be useful in identifying the types of cases in terms of BY variables where
the supplier pool is thin.  The variable must not already exist in the demander dataset.</p>

<h2>OPTIONS</h2>

<p>By default, sampling from the supplier dataset is done without replacement.  Specify
<strong>SAMPLEWITHREPLACEMENT</strong>=TRUE to sample with replacement.</p>

<p>By default, memory usage is minimized in picking supplier dataset candidate match cases
(all eligible cases have an equal probability of selection).  This requires an extra data pass.  If
the possible number of matching cases for a demander case is small or the supplier dataset is
not large, specifying
<strong>MINIMIZEMEMORY</strong>=FALSE may improve performance by eliminating the extra data pass.  In the
case of 1-1 matching, this is recommended.
MINIMIZEMEMORY=TRUE cannot be combined with EXACTPRIORITY=TRUE.</p>

<p>When MINIMIZEMEMORY=TRUE, each supplier case is eligible for assignment at random 
to just one demander case for which it is within tolerance.  When FALSE, the case is 
eligible for all demander cases within the tolerance, and one is picked at random from all of those.
This means that when TRUE, a usually small number of demander cases may go unmatched
when a match could have been found, but when FALSE, the matching tables can become
very large, especially in the PSM case.</p>

<p>By default, cases in the demander dataset are processed in case order.  If there are insufficient 
supplier cases, you may specify <strong>SHUFFLE</strong>=TRUE to process the demander cases in random order.
This ensures that earlier cases do not have an advantage over later ones in matching.<br/>
SHUFFLE increases the memory requirement and will take longer to execute.</p>

<p>Use <strong>SEED</strong>=number to set the random number generator to a known state for repeatability.</p>

<h2>OUTFILE</h2>

<p>Matching can be quite time consuming.  Progress can be logged to a file during the run.
Specify a file/path as <strong>LOGFILE</strong>=&ldquo;filespec&rdquo; to record progress.
If <strong>LOGACCESSMODE</strong> is OVERWRITE, each run overwrites an existing file.  APPEND
appends to the log file.
The log records the stage the process has reached and, for some operations, the number of operations
completed at 1000- or 5000-case intervals.</p>

<p>&copy; copyright IBM Corp. 1989, 2014</p>

</body>

</html>
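In the FUZZY help above, the relevant sentence is that the number of variables listed as NEWDEMANDERIDVARS determines how many matches are attempted. So listing several new ID variables (for example NEWDEMANDERIDVARS=match1 match2 match3, where the names are hypothetical) should request up to three controls per case; this is a syntax-only feature that the dialog does not offer. The Python sketch below illustrates the idea under assumptions and is not FUZZY's actual algorithm: my_fuzz follows the documented CUSTOMFUZZ contract (two lists of BY values in; 0, 1 or 2 out; missing values arrive as None), but its 0.05 tolerance rule is made up for illustration, and match_k, its arguments, and the choice to fill all slots for one demander before moving to the next are hypothetical.

```python
import random


def is_missing(values):
    """True if any BY value is missing (None), which blocks matching."""
    return any(v is None for v in values)


def my_fuzz(demander, supplier):
    """Hypothetical CUSTOMFUZZ-style function.

    demander, supplier: lists of BY-variable values for one case each.
    Returns 0 (no match), 1 (fuzzy match) or 2 (exact match).
    Assumed rule for illustration: numbers within 0.05, strings equal.
    """
    exact = True
    for d, s in zip(demander, supplier):
        if d is None or s is None:                 # missing values never match
            return 0
        if isinstance(d, str) or isinstance(s, str):
            if d != s:                             # strings: exact comparison only
                return 0
        elif abs(d - s) > 0.05:                    # numbers: outside tolerance
            return 0
        if d != s:
            exact = False
    return 2 if exact else 1


def match_k(demanders, suppliers, k, fuzz=my_fuzz, seed=None):
    """Draw up to k suppliers per demander, without replacement.

    demanders, suppliers: dicts of record id -> list of BY values.
    k plays the role of the number of NEWDEMANDERIDVARS names.
    Returns demander id -> list of k supplier ids, padded with None.
    """
    rng = random.Random(seed)
    pool = dict(suppliers)                         # suppliers still available
    result = {}
    for dem_id, dem_vals in demanders.items():
        chosen = []
        if not is_missing(dem_vals):
            for _ in range(k):
                exact, fuzzy = [], []
                for sup_id, sup_vals in pool.items():
                    if is_missing(sup_vals):
                        continue
                    if sup_vals == dem_vals:       # exact: fuzz function not called
                        exact.append(sup_id)
                        continue
                    code = fuzz(dem_vals, sup_vals)
                    if code == 2:
                        exact.append(sup_id)
                    elif code == 1:
                        fuzzy.append(sup_id)
                candidates = exact or fuzzy        # exact matches take priority
                if not candidates:
                    break                          # pool exhausted for this case
                pick = rng.choice(candidates)
                chosen.append(pick)
                del pool[pick]                     # sampling without replacement
        result[dem_id] = chosen + [None] * (k - len(chosen))
    return result
```

With k=2, a demander that has one exact and one fuzzy candidate receives both, while a demander with only one eligible supplier gets one ID and a None in the second slot, which mirrors the help's statement that unmatched ID variables are left system-missing. Note also that, per the help text, several NEWDEMANDERIDVARS cannot be combined with COPYTODEMANDER, which allows only one.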
