How to return the image with the largest dimensions

Problem description

I have been able to filter out all the image URLs on the page and display them one by one:

import shutil

import cv2
import numpy as np
import requests
from bs4 import BeautifulSoup


article_URL = "https://medium.com/bhavaniravi/build-your-1st-python-web-app-with-flask-b039d11f101c"
response = requests.get(article_URL)
soup = BeautifulSoup(response.text,'html.parser')
images = soup.find('body').find_all('img')
i = 0
image_url = []
for im in images:
    print(im)
    i+=1
    url = im.get('src')
    image_url.append(url)
    print('Downloading: ',url) 
    try:
        response = requests.get(url,stream=True)
        with open(str(i) + '.jpg','wb') as out_file:
            shutil.copyfileobj(response.raw,out_file)
            del response
    except:
        print('Could not download: ',url)

new = [x for x in image_url if x is not None]
for url in new:
    resp = requests.get(url,stream=True).raw
    image = np.asarray(bytearray(resp.read()),dtype="uint8")
    image = cv2.imdecode(image,cv2.IMREAD_COLOR)
    height,width = image.shape[:2]  # shape is (height, width, channels)
    dimension = []
    for items in height,width:
        dimension.append(items)
#     print(height,width)
    print(dimension)

I want to print the largest image from the list of URLs.

This is the result I get from the list, which is not good enough:

[72,72]
[95,96]
[13,60]
[227,973]
[17,60]
[229,771]

Solution

I see two problems.

  1. You create dimension = [] inside the loop, so the previous values are removed on every iteration. You have to create it before the loop and, inside the loop, use dimension.append( (width,height) )

    dimension = []

    After the loop you can use max(dimension) to get the pair with the largest width.

  2. You keep only width,height in dimension, so you don't know which file has this size. You should keep all the information

    dimension.append( (width,height,url,filename) )
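The two points above can be sketched as follows. The sizes are the ones printed in the question, rewritten as (width, height); the URLs and file names are made-up placeholders, not values taken from the page.

```python
# Sizes from the question's output, stored as (width, height, url, filename).
# The URLs and file names are hypothetical placeholders.
found = [
    (72, 72, "https://example.com/a.png", "1.png"),
    (96, 95, "https://example.com/b.png", "2.png"),
    (60, 13, "https://example.com/c.png", "3.png"),
    (973, 227, "https://example.com/d.png", "4.png"),
    (60, 17, "https://example.com/e.png", "5.png"),
    (771, 229, "https://example.com/f.png", "6.png"),
]

dimension = []  # created once, before the loop, so nothing is thrown away

for width, height, url, filename in found:
    dimension.append((width, height, url, filename))

# Tuples compare element by element, so max() looks at the width first.
largest = max(dimension)
print(largest)  # (973, 227, 'https://example.com/d.png', '4.png')
```

Because the URL and file name travel with the size, the result says which image is the largest, not only how large it is.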

My version.

I use a dictionary to keep all the information

data.append({ 'url': url,'path': filename,'width': width,'height': height,})

Then I use key in max() to get the item with the largest width

max(data,key=lambda x:x['width'])

But I could also use x['height'] or

x['width'] * x['height']
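To show how the choice of key changes the answer, here is a small sketch using three of the sizes from the question. The url and path values are hypothetical placeholders.

```python
# Three entries in the same shape as the dictionaries described above.
# 'url' and 'path' are placeholder values for illustration only.
data = [
    {'url': 'https://example.com/b.png', 'path': '2.png', 'width': 96, 'height': 95},
    {'url': 'https://example.com/d.png', 'path': '4.png', 'width': 973, 'height': 227},
    {'url': 'https://example.com/f.png', 'path': '6.png', 'width': 771, 'height': 229},
]

widest = max(data, key=lambda x: x['width'])
tallest = max(data, key=lambda x: x['height'])
largest_area = max(data, key=lambda x: x['width'] * x['height'])

print(widest['path'])        # 4.png (width 973)
print(tallest['path'])       # 6.png (height 229)
print(largest_area['path'])  # 4.png (973 * 227 = 220871)
```

The widest and the tallest image are different files here, so it is worth deciding up front which notion of "largest" is wanted.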

By the way: the code below uses only src to get each image.

import requests
from bs4 import BeautifulSoup
import shutil
import cv2

article_URL = "https://medium.com/bhavaniravi/build-your-1st-python-web-app-with-flask-b039d11f101c"
response = requests.get(article_URL)
soup = BeautifulSoup(response.text,'html.parser')
images = soup.find('body').find_all('img')

# --- loop ---

data = []
i = 0

for img in images:
    print('HTML:',img)
    url = img.get('src')

    if url:  # skip `url` with `None`
        print('Downloading:',url)
        try:
            response = requests.get(url,stream=True)

            i += 1
            url = url.rsplit('?',1)[0]   # remove ?opt=20 after filename
            ext = url.rsplit('.',1)[-1]  # .png,.jpg,.jpeg
            filename = f'{i}.{ext}'
            print('Filename:',filename)

            with open(filename,'wb') as out_file:
                shutil.copyfileobj(response.raw,out_file)

            image = cv2.imread(filename)
            height,width = image.shape[:2]

            data.append({
                'url': url,
                'path': filename,
                'width': width,
                'height': height,
            })
        except Exception as ex:
            print('Could not download: ',url)
            print('Exception:',ex)

        print('---')

# --- after loop ---

print('max:',max(data,key=lambda x:x['width']))

all_sorted = sorted(data,key=lambda x:x['width'],reverse=True)
print('Top 3:',all_sorted[:3])
# or
for item in all_sorted[:3]:
    print(item['width'],item['url'])
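Taking the extension with rsplit('.') assumes every image URL ends in a file extension, which may not hold for every page. A possible alternative, sketched here with the standard library only, parses the URL first. The helper name extension_from_url and the '.jpg' fallback are assumptions made for this example, not part of the answer above.

```python
import os
from urllib.parse import urlparse


def extension_from_url(url, default=".jpg"):
    """Return the file extension found in the URL path, or `default`.

    Hypothetical helper: the fallback extension is an assumption.
    """
    path = urlparse(url).path        # leaves out ?query and #fragment
    ext = os.path.splitext(path)[1]  # '.png', '.jpeg', or '' if there is none
    return ext.lower() or default


print(extension_from_url("https://example.com/img/photo.PNG?opt=20"))  # .png
print(extension_from_url("https://example.com/max/1600/picture"))      # .jpg
```

With this helper the file name could be built as f'{i}{ext}', since the returned extension already includes the dot.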

Right after creating the new list (new), make the following changes in your code:

images = []
for url in new:
    resp = requests.get(url,stream=True).raw
    image = np.asarray(bytearray(resp.read()),dtype="uint8")
    image = cv2.imdecode(image,cv2.IMREAD_COLOR)
    images.append((image.shape,image))
# sort images by area (largest to smallest)
images.sort(key=lambda x: x[0][0] * x[0][1],reverse=True)

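The largest image is now at index 0: it can be accessed with images[0][1], and its shape can be printed with images[0][0]. You can also change the lambda function to x[0][0] (sort by height) or x[0][1] (sort by width).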
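One thing to watch for: cv2.imdecode returns None when the downloaded bytes are not a decodable image, and None has no shape. The sketch below shows the same sort by area with such entries filtered out. To keep it runnable without OpenCV it uses plain tuples and strings as stand-ins for the shapes and decoded arrays; the function name sort_by_area is made up for this example.

```python
def sort_by_area(pairs):
    """Return (shape, image) pairs sorted by area, largest first.

    Entries whose shape is None (a failed decode) are left out.
    """
    usable = [(shape, image) for shape, image in pairs if shape is not None]
    return sorted(usable, key=lambda pair: pair[0][0] * pair[0][1], reverse=True)


# Stand-in data: shapes are (height, width, channels) as OpenCV reports them;
# the strings are placeholders for the decoded arrays.
sample = [
    ((72, 72, 3), "image-1"),
    (None, "image-2"),           # pretend this one could not be decoded
    ((227, 973, 3), "image-4"),
    ((229, 771, 3), "image-6"),
]

ranked = sort_by_area(sample)
print(ranked[0])  # ((227, 973, 3), 'image-4')
```

In the real loop the pair could be built as (image.shape if image is not None else None, image) before it is appended.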