How to get the full_text field value from the Twitter API with TwythonStreamer

Question

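I'm trying to get the full text of each Tweet with the code below. I know the tweet_mode parameter should be set to 'extended', but because this is a streaming call rather than a standard query, I don't know where it goes. The text field always gives me partial text, cut off with "…" and followed by a URL. With this setup, how do I get the full Tweet?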

from twython import Twython,TwythonStreamer
import json
import pandas as pd
import csv

def process_tweet(tweet):
    d = {}
    d['hashtags'] = [hashtag['text'] for hashtag in tweet['entities']['hashtags']]
    d['text'] = tweet['text']
    d['user'] = tweet['user']['screen_name']
    d['user_loc'] = tweet['user']['location']
    return d
    
    
# Create a class that inherits TwythonStreamer
class MyStreamer(TwythonStreamer):     

    # Received data
    def on_success(self,data):

        # Only collect tweets in English
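        # Use .get(): the stream can also deliver non-Tweet messages
        # (e.g. limit notices) that have no 'lang' key
        if data.get('lang') == 'en':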
            tweet_data = process_tweet(data)
            self.save_to_csv(tweet_data)

    # Problem with the API
    def on_error(self,status_code,data):
        print(status_code,data)
        self.disconnect()
        
    # Save each tweet to csv file
    def save_to_csv(self,tweet):
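        # newline='' stops csv from writing blank rows on Windows;
        # utf-8 avoids encoding errors on emoji and non-ASCII text
        with open('tweets_about_california.csv', 'a', newline='', encoding='utf-8') as file: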
            writer = csv.writer(file)
            writer.writerow(list(tweet.values()))

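# Placeholder credentials: replace with your own app's keys
creds = {
    'CONSUMER_KEY': '...',
    'CONSUMER_SECRET': '...',
    'ACCESS_TOKEN': '...',
    'ACCESS_SECRET': '...',
}

# Instantiate from our streaming class
stream = MyStreamer(creds['CONSUMER_KEY'], creds['CONSUMER_SECRET'],
                    creds['ACCESS_TOKEN'], creds['ACCESS_SECRET'])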
# Start the stream
stream.statuses.filter(track='california',tweet_mode='extended')

Answer

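The tweet_mode=extended parameter has no effect on the v1.1 streaming API, because every Tweet is already delivered in both the extended format and the default (140-character) format.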

If a Tweet object has truncated: true, the payload contains an additional element, extended_tweet, and that is where the full_text value is stored.
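A minimal sketch of how process_tweet could read the full text, assuming the v1.1 compatibility-mode payload described above. The helper name get_full_text is hypothetical. The sketch also assumes extended_tweet carries its own entities object (as the v1.1 docs describe), so hashtags past the truncation point are not lost:

```python
def get_full_text(tweet):
    """Return the untruncated text of a v1.1 streaming Tweet payload."""
    # Truncated Tweets keep the complete text in extended_tweet.full_text
    if tweet.get('truncated') and 'extended_tweet' in tweet:
        return tweet['extended_tweet']['full_text']
    # Otherwise the ordinary text field already holds the whole Tweet
    return tweet['text']


def process_tweet(tweet):
    # For truncated Tweets, read entities from extended_tweet so they
    # match the full text rather than the truncated version
    if tweet.get('truncated') and 'extended_tweet' in tweet:
        source = tweet['extended_tweet']
    else:
        source = tweet
    return {
        'hashtags': [h['text'] for h in source['entities']['hashtags']],
        'text': get_full_text(tweet),
        'user': tweet['user']['screen_name'],
        'user_loc': tweet['user']['location'],
    }


# Usage with a hand-made, heavily trimmed sample payload (not real API output)
sample = {
    'truncated': True,
    'text': 'Wildfire update for the Bay Area… https://t.co/abc',
    'entities': {'hashtags': []},
    'extended_tweet': {
        'full_text': 'Wildfire update for the Bay Area: evacuation orders lifted #california',
        'entities': {'hashtags': [{'text': 'california'}]},
    },
    'user': {'screen_name': 'example_user', 'location': 'San Francisco'},
}
print(process_tweet(sample)['text'])
```

This drops into the question's code as a replacement for process_tweet; no change to the filter() call is needed.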

Note that this answer applies only to the v1.1 Twitter API. In v2, the streaming API returns the full Tweet text by default (Twython does not currently support v2).