How to encode, decode and unquote the same URL

Problem description

I am working with 3 different URLs that involve spaces, accented characters, and decoding. In the first one, the file name in the URL contains a space:

https://url-example-with-space/Table%201.5.b.xls

The second URL contains an accented character: https://url-example-with-accent-inundaci%C3%B3n_WFL/FeatureServer/16

The third URL is already in encoded form: http://url-example-with-encode-format/Personal&Page=Personal%20Link%20a%20Project%20of%20I%2BD&NQUser=smartlink&NQPassword=p7bFn0H6udk

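I use unquote on link number 1 because if I encode it again, it gets re-encoded: every extra quoting pass turns % into %25, so Table%201.5.b.xls ends up as Table%2525201.5.b.xls. The same happens with the accented URL, which becomes _inundaci%2525C3%2525B3n_WFL/ instead of keeping inundaci%C3%B3n_WFL.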
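To see where the %25 sequences come from, here is a minimal sketch that quotes the file name from the first URL repeatedly. It uses Python 3's urllib.parse and falls back to Python 2.7's urllib. The variable names are only illustrative. quote() cannot tell that its input is already encoded, so each extra pass escapes the % sign itself. Decoding once and encoding once restores the original form.

```python
try:
    from urllib.parse import quote, unquote  # Python 3
except ImportError:
    from urllib import quote, unquote  # Python 2.7

name = 'Table%201.5.b.xls'

# quote() does not know the text is already encoded, so every extra
# pass escapes the '%' sign itself as '%25'.
once = quote(name)
twice = quote(once)
print(once)   # Table%25201.5.b.xls
print(twice)  # Table%2525201.5.b.xls

# Decoding first and then encoding exactly once gives back the original.
print(quote(unquote(name)))  # Table%201.5.b.xls
```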

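I have managed to get my code to handle these cases and keep the links in their original format. With the third URL, however, it does not work. The result contains Link%20a%20Project%20of%20I%20D&NQUser= instead of keeping Link%20a%20Project%20of%20I%2BD&NQUser= (the difference is %2BD&NQUser versus %20D&NQUser).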
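The following sketch offers a possible explanation of the %2B case. It isolates the query-style part of the third URL. The URL as posted contains no `?`, so treating `Page=...&NQUser=smartlink` as the query is an assumption here. Unquoting first turns %2B into a literal +. parse_qsl then follows the form-encoding convention, in which + stands for a space, so the plus sign is lost before quote() runs again.

```python
try:
    from urllib.parse import parse_qsl, quote, unquote  # Python 3
except ImportError:
    from urllib import quote, unquote  # Python 2.7
    from urlparse import parse_qsl

# Query-style part of the third URL (which part is the query is assumed).
query = 'Page=Personal%20Link%20a%20Project%20of%20I%2BD&NQUser=smartlink'

# 1. Unquoting the whole string first turns '%2B' into a literal '+'.
decoded = unquote(query)
print(decoded)  # Page=Personal Link a Project of I+D&NQUser=smartlink

# 2. parse_qsl treats '+' as an encoded space, so the plus sign is lost.
pairs = parse_qsl(decoded)
print(quote(pairs[0][1]))  # Personal%20Link%20a%20Project%20of%20I%20D

# Parsing the original, still-encoded query keeps it: parse_qsl replaces
# '+' with a space *before* decoding '%2B', so the escape survives as '+'.
pairs_ok = parse_qsl(query)
print(quote(pairs_ok[0][1]))  # Personal%20Link%20a%20Project%20of%20I%2BD
```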

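I suspect these steps conflict with each other. unquote replaces each %xx escape with its single-character equivalent, and when the URL is encoded again that character somehow turns into a space. I don't know what else to try to make it work. Here is my code: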

# coding: utf-8
import urllib
import urlparse
import re

# Splits a netloc into (username, password, host, port).
netloc_re = re.compile(r'^(?:([^:]*)[:]([^@]*)@)?([^:]+)(?:[:](\d+))?$')
# durl = 'https://url-example-with-space/Table%201.5.b.xls'
# durl = 'https://url-example-with-accent-inundaci%C3%B3n_WFL/FeatureServer/16'
durl = 'http://url-example-with-encode-format/Personal&Page=Personal%20Link%20a%20Project%20of%20I%2BD&NQUser=smartlink&NQPassword=p7bFn0H6udk'

unquote_url = urllib.unquote((durl).encode('utf-8'))
print('unquote_url value')
print(unquote_url)

# Second unquote pass, this time on the decoded unicode string.
unquote = urllib.unquote((unquote_url).decode('utf-8'))
print('unquote value')
print(unquote.encode('utf-8'))


parsed_url = urlparse.urlparse(unquote.encode('utf-8').strip())
print('parsed_url value')
print(parsed_url)

netloc_m = netloc_re.match(parsed_url.netloc)
username,password,host,port = (
    urllib.quote(g) if g else g for g in netloc_m.groups())
netloc = ('{}:{}@'.format(username,password) if username and password else '') + \
    host + (':' + port if port else '')

path = urllib.quote(parsed_url.path)

query = [(k,urllib.quote(v)) for k,v in urlparse.parse_qsl(
    parsed_url.query.replace(';',urllib.quote(';')),keep_blank_values=True)]
query = u'&'.join(
    map(lambda qi: qi[0] + (u'=' + qi[1] if qi[1] else u''),query))

final_url = urlparse.urlunparse((parsed_url.scheme,netloc,path,parsed_url.params,query,parsed_url.fragment))
                                
print('-----')
print(final_url)
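One possible way around both problems is sketched below under assumptions; it is not a definitive fix. It skips the unquote/re-quote round trip entirely and escapes only characters that are not legal in a URL. It lists % and the reserved delimiters as safe so that existing %XX escapes pass through unchanged. This is similar in spirit to the requote_uri helper in the requests library. The helper name requote_url and the SAFE_CHARS set are illustrative. The sketch assumes every % in the input already begins a valid escape, so a stray literal % would be left as-is.

```python
try:
    from urllib.parse import quote  # Python 3
except ImportError:
    from urllib import quote  # Python 2.7

# Hypothetical set of characters left untouched: '%' (so existing %XX
# escapes survive) plus the reserved URL delimiters.
SAFE_CHARS = "!#$%&'()*+,/:;=?@[]~"


def requote_url(url):
    """Escape only characters that are not allowed in a URL (e.g. spaces).

    Hypothetical helper: assumes every '%' in `url` already starts a
    valid %XX escape.
    """
    return quote(url, safe=SAFE_CHARS)


urls = [
    'https://url-example-with-space/Table%201.5.b.xls',
    'https://url-example-with-accent-inundaci%C3%B3n_WFL/FeatureServer/16',
    'http://url-example-with-encode-format/Personal&Page=Personal%20Link'
    '%20a%20Project%20of%20I%2BD&NQUser=smartlink&NQPassword=p7bFn0H6udk',
]
for u in urls:
    print(requote_url(u))  # each URL comes back unchanged

# A raw space is still escaped.
print(requote_url('https://url-example-with-space/Table 1.5.b.xls'))
```

The output contains only characters the helper treats as safe, so applying it a second time changes nothing. This means it cannot double-encode a URL that has already passed through it.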

Thanks!


Tags: encode, encoding, python, python-2.7