Download all PDFs on a web page with PowerShell

Problem description

I found the following code online to download all the PDFs on a web page:

$psPage = Invoke-WebRequest "https://www.pi.infn.it/~rizzo/ingegneria/appunti_fisII_ing_mecc/"
$urls = $psPage.ParsedHtml.getElementsByTagName("A") | ? {$_.href -like "*.pdf"} | Select-Object -ExpandProperty href

$urls | ForEach-Object {Invoke-WebRequest -Uri $_ -OutFile ($_ | Split-Path -Leaf)}

But PowerShell gives me these errors:

Invoke-WebRequest : The response content cannot be parsed because the Internet Explorer engine is not
available, or Internet Explorer's first-launch configuration is not complete. Specify the UseBasicParsing
parameter and try again.
At C:\Users\Raffaele\Desktop\Nuova cartella\a.ps1:1 char:11
+ $psPage = Invoke-WebRequest "https://www.pi.infn.it/~rizzo/ingegneria ...
+           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : NotImplemented: (:) [Invoke-WebRequest], NotSupportedException
    + FullyQualifiedErrorId : WebCmdletIEDomNotSupportedException,Microsoft.PowerShell.Commands.InvokeWebReq
   uestCommand

You cannot call a method on a null-valued expression.
At C:\Users\Raffaele\Desktop\Nuova cartella\a.ps1:2 char:1
+ $urls = $psPage.ParsedHtml.getElementsByTagName("A") | ? {$_.href -li ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull

Split-Path : Cannot bind argument to parameter 'Path' because it is null.
At C:\Users\Raffaele\Desktop\Nuova cartella\a.ps1:4 char:66
+ ... h-Object {Invoke-WebRequest -Uri $_ -OutFile ($_ | Split-Path -Leaf)}
+                                                        ~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidData: (:) [Split-Path], ParameterBindingValidationException
    + FullyQualifiedErrorId : ParameterArgumentValidationErrorNullNotAllowed,Microsoft.PowerShell.Commands.S
   plitPathCommand

Invoke-WebRequest : Cannot validate argument on parameter 'Uri'. The argument is null or empty. Provide an
argument that is not null or empty, and then try the command again.
At C:\Users\Raffaele\Desktop\Nuova cartella\a.ps1:4 char:48
+ $urls | ForEach-Object {Invoke-WebRequest -Uri $_ -OutFile ($_ | Spli ...
+                                                ~~
    + CategoryInfo          : InvalidData: (:) [Invoke-WebRequest], ParameterBindingValidationException
    + FullyQualifiedErrorId : ParameterArgumentValidationError,Microsoft.PowerShell.Commands.InvokeWebReques
   tCommand

How can I get around this? And why does it need the Internet Explorer engine at all?


EDIT: I tried modifying the code like this:

$site = "https://www.pi.infn.it/~rizzo/ingegneria/appunti_fisII_ing_mecc/"
$psPage = Invoke-WebRequest -Uri $site -UseBasicParsing
$urls = $psPage.ParsedHtml.getElementsByTagName("A")
$urls |  where {$_.pathname -like "*pdf"} | % {Invoke-WebRequest -Uri "$site$($_.pathname)" -OutFile $_.pathname }

The error is:

You cannot call a method on a null-valued expression.
At C:\Users\Raffaele\Desktop\Nuova cartella\a.ps1:3 char:1
+ $urls = $psPage.ParsedHtml.getElementsByTagName("A")
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull

EDIT 2: I tried modifying the code again. The new code:

$psPage = Invoke-WebRequest -Uri -UseBasicParsing "https://www.pi.infn.it/~rizzo/ingegneria/appunti_fisII_ing_mecc/"
$urls = $psPage.ParsedHtml.getElementsByTagName("A") | ? {$_.href -like "*.pdf"} | Select-Object -ExpandProperty href
$urls | ForEach-Object {Invoke-WebRequest -Uri $_ -OutFile ($_ | Split-Path -Leaf)}

Windows PowerShell gives me these errors:

Invoke-WebRequest : Missing an argument for parameter 'Uri'. Specify a parameter of type 'System.Uri' and
try again.
At C:\Users\Raffaele\Desktop\Nuova cartella\b.ps1:1 char:29
+ $psPage = Invoke-WebRequest -Uri -UseBasicParsing "https://www.pi.inf ...
+                             ~~~~
    + CategoryInfo          : InvalidArgument: (:) [Invoke-WebRequest], ParameterBindingException
    + FullyQualifiedErrorId : MissingArgument,Microsoft.PowerShell.Commands.InvokeWebRequestCommand

You cannot call a method on a null-valued expression.
At C:\Users\Raffaele\Desktop\Nuova cartella\b.ps1:2 char:1
+ $urls = $psPage.ParsedHtml.getElementsByTagName("A") | ? {$_.href -li ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [], RuntimeException
    + FullyQualifiedErrorId : InvokeMethodOnNull

Split-Path : Cannot bind argument to parameter 'Path' because it is null.
At C:\Users\Raffaele\Desktop\Nuova cartella\b.ps1:3 char:66
+ ... h-Object {Invoke-WebRequest -Uri $_ -OutFile ($_ | Split-Path -Leaf)}
+                                                        ~~~~~~~~~~~~~~~~
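    + CategoryInfo          : InvalidData: (:) [Split-Path], ParameterBindingValidationException
    + FullyQualifiedErrorId : ParameterArgumentValidationErrorNullNotAllowed,Microsoft.PowerShell.Commands.S
   plitPathCommand

Invoke-WebRequest : Cannot validate argument on parameter 'Uri'. The argument is null or empty. Provide an
argument that is not null or empty, and then try the command again.
At C:\Users\Raffaele\Desktop\Nuova cartella\b.ps1:3 char:48
+ $urls | ForEach-Object {Invoke-WebRequest -Uri $_ -OutFile ($_ | Spli ...
+                                                ~~
    + CategoryInfo          : InvalidData: (:) [Invoke-WebRequest], ParameterBindingValidationException
    + FullyQualifiedErrorId : ParameterArgumentValidationError,Microsoft.PowerShell.Commands.InvokeWebReques
   tCommand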

Solution

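Keeping your approach: first, you have to remove the "about:" prefix from each URL (replace it with an empty string), because the relative links read through ParsedHtml come back prefixed with it: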

$site = "https://www.pi.infn.it/~rizzo/ingegneria/appunti_fisII_ing_mecc/"
$psPage = Invoke-WebRequest $site
$urls = $psPage.ParsedHtml.getElementsByTagName("A") | ? {$_.href -like "*.pdf"} | Select-Object -ExpandProperty href | ForEach-Object {$_.replace("about:","")}
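A note on the .replace("about:","") call: that string method removes every occurrence of "about:", not only a leading one. A minimal sketch of a stricter variant, assuming you only want to drop the prefix at the start of each link, uses the regex -replace operator anchored with ^. The helper name Remove-AboutPrefix and the sample file names are hypothetical, not taken from the real page.

```powershell
# Hypothetical helper: strip a leading "about:" that the IE DOM adds to
# relative links. The regex is anchored with ^, so an "about:" appearing
# later in the string is left untouched.
function Remove-AboutPrefix {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [string] $Href
    )
    process {
        $Href -replace '^about:', ''
    }
}

# Made-up sample values shaped like what ParsedHtml returns for relative links.
$hrefs = 'about:lezione1.pdf', 'about:lezione2.pdf', 'https://example.org/x.pdf'
$clean = $hrefs | Remove-AboutPrefix
$clean
```

In the answer's pipeline, ForEach-Object {$_.replace("about:","")} could be swapped for | Remove-AboutPrefix with the same result on ordinary links.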

Second, you have to rebuild the full URL:

$urls | ForEach-Object {Invoke-WebRequest -Uri "$site$_" -OutFile $_ }
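Concatenating "$site$_" works here because $site ends with a slash and the links are bare file names. A minimal sketch of a more general alternative, assuming you may also meet hrefs with subfolders, "../" segments or absolute URLs, resolves each href against the page address with the .NET System.Uri(Uri, string) constructor, much as a browser does. Resolve-LinkUrl and the sample hrefs are hypothetical.

```powershell
# Hypothetical helper: resolve an href (relative or absolute) against the
# page address and return an absolute URL string.
function Resolve-LinkUrl {
    param(
        [Parameter(Mandatory)] [string] $BaseUrl,
        [Parameter(Mandatory)] [string] $Href
    )
    $base = [System.Uri]::new($BaseUrl)
    # Uri(Uri, string) resolves "a.pdf", "sub/a.pdf" and "../a.pdf" relative
    # to the base, and keeps an already absolute href as it is.
    [System.Uri]::new($base, $Href).AbsoluteUri
}

$site = 'https://www.pi.infn.it/~rizzo/ingegneria/appunti_fisII_ing_mecc/'

# Made-up sample hrefs, not read from the real page.
$samples = 'lezione1.pdf', 'extra/lezione2.pdf', '../altro.pdf', 'https://example.org/other.pdf'
$full = @($samples | ForEach-Object { Resolve-LinkUrl -BaseUrl $site -Href $_ })
$full
```

If you adopt this, consider passing only the last path segment to -OutFile (for example with Split-Path -Leaf, as the question's code does), since an href such as extra/lezione2.pdf points into a subfolder that may not exist locally.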

But you can simplify this by using the "textcontent" or "pathname" property of each link:

$site = "https://www.pi.infn.it/~rizzo/ingegneria/appunti_fisII_ing_mecc/"
$psPage = Invoke-WebRequest $site
$urls = $psPage.ParsedHtml.getElementsByTagName("A")
$urls |  where {$_.pathname -like "*pdf"} | % {Invoke-WebRequest -Uri "$site$($_.pathname)" -OutFile $_.pathname }
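Both versions above still read ParsedHtml, which Windows PowerShell builds with the Internet Explorer engine; that is what the first error in the question complains about. With -UseBasicParsing, ParsedHtml is empty (hence "You cannot call a method on a null-valued expression"), and PowerShell 6 and later do not provide it at all. A minimal sketch that avoids the IE engine, assuming Windows PowerShell 5.1 or PowerShell 7, calls Invoke-WebRequest with -UseBasicParsing and reads the response's Links property, whose entries carry an href. The helper names Get-PdfUrl and Save-PdfFile are hypothetical, and the filter simply keeps hrefs ending in .pdf.

```powershell
# Hypothetical helper: from link objects (anything with an .href property,
# such as the .Links of an Invoke-WebRequest response), return the absolute
# URLs of the links that end in ".pdf" (-like is case-insensitive).
function Get-PdfUrl {
    param(
        [Parameter(Mandatory)] [string] $BaseUrl,
        [Parameter(Mandatory)] [AllowEmptyCollection()] [object[]] $Links
    )
    $base = [System.Uri]::new($BaseUrl)
    $Links |
        Where-Object { $_.href -like '*.pdf' } |
        ForEach-Object { [System.Uri]::new($base, [string]$_.href).AbsoluteUri } |
        Select-Object -Unique
}

# Hypothetical helper: download each URL into the current directory,
# naming the file after the last segment of the URL path.
function Save-PdfFile {
    param([Parameter(Mandatory, ValueFromPipeline)] [string] $Url)
    process {
        $name = [System.IO.Path]::GetFileName(([System.Uri]$Url).LocalPath)
        Invoke-WebRequest -Uri $Url -OutFile $name -UseBasicParsing
    }
}

# Offline check with made-up link objects shaped like the .Links entries.
$site = 'https://www.pi.infn.it/~rizzo/ingegneria/appunti_fisII_ing_mecc/'
$sample = @(
    [pscustomobject]@{ href = 'lezione1.pdf' },
    [pscustomobject]@{ href = 'index.html' },
    [pscustomobject]@{ href = 'LEZIONE2.PDF' }
)
$pdfs = @(Get-PdfUrl -BaseUrl $site -Links $sample)
$pdfs
```

Against the real page, usage would be along the lines of $page = Invoke-WebRequest -Uri $site -UseBasicParsing followed by Get-PdfUrl -BaseUrl $site -Links $page.Links | Save-PdfFile, which saves the files into the current directory.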