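Problem description
I am using urllib.request.urlopen to query the URL http://dblp.org/db/conf/lak/index. For some reason I cannot access the site with Python's urllib module; the request fails with the following HTTP status error:
HTTPError: HTTP Error 406: Not Acceptable
This is the code I am using to make the request: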
from urllib.request import urlopen
from bs4 import BeautifulSoup
url = 'http://dblp.org/db'
html = urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')
print(soup.prettify())
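Solution
I am looking into the 406 error code. It occurs when the server cannot produce a response that matches the Accept headers sent in the request. If I can get urlopen to work, I will post that answer as well.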
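To make the role of the Accept header concrete, here is a minimal sketch that builds a urllib request with an explicit Accept header and reports the status code the server returns. The helper names build_request and status_of, and the Accept value itself, are assumptions chosen for illustration; they are not taken from the question.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen

# Assumed, browser-like Accept value used only for illustration.
DEFAULT_ACCEPT = "text/html,application/xhtml+xml,*/*;q=0.8"


def build_request(url, accept=DEFAULT_ACCEPT):
    """Return a Request that states which content types the client accepts."""
    return Request(url, headers={"Accept": accept})


def status_of(request):
    """Return the HTTP status code for the request (performs a network call)."""
    try:
        with urlopen(request) as response:
            return response.status
    except HTTPError as err:
        # HTTPError exposes the status code (for example 406) as err.code.
        return err.code
```

Calling status_of(build_request('http://dblp.org/db/conf/lak/index')) would show whether the server still answers 406 once an Accept header is present; the result depends on the server's configuration.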
This error does not occur when using the Python requests library, as in the code below. urlopen can also be made to work without producing the 406 error.
import requests
from bs4 import BeautifulSoup
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'
# Send the browser-like User-Agent with the request
raw_html = requests.get('http://dblp.org/db/conf/lak/index', headers={'User-Agent': user_agent})
soup = BeautifulSoup(raw_html.content,'html.parser')
print(soup.prettify())
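A urlopen-based variant might look like the sketch below. It assumes the server rejects the default headers that urllib sends, so it supplies a browser-like User-Agent and an Accept header. The names BROWSER_HEADERS, make_request and fetch_html are hypothetical, and whether both headers are needed for this site is an assumption, not something confirmed above.

```python
from urllib.request import Request, urlopen

# Assumed browser-like headers; the exact values are illustrative.
BROWSER_HEADERS = {
    "User-Agent": (
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) "
        "Gecko/20100101 Firefox/78.0"
    ),
    "Accept": "text/html,application/xhtml+xml,*/*;q=0.8",
}


def make_request(url, headers=BROWSER_HEADERS):
    """Build a Request that carries explicit headers instead of urllib's defaults."""
    return Request(url, headers=dict(headers))


def fetch_html(url):
    """Download the page and return it as text (performs a network call)."""
    with urlopen(make_request(url)) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

The text returned by fetch_html('http://dblp.org/db/conf/lak/index') can be passed to BeautifulSoup(..., 'html.parser') in the same way as raw_html.content in the requests version.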