In this blog, we will use a Twitter developer account to collect data from the Twitter API v2. We will collect some recent tweets and convert them into a pandas DataFrame. We will also collect recent tweets from a particular Twitter user.
We need a Twitter developer account. If you already have a Twitter account, you can log in at https://developer.twitter.com/en.
I don't have a Twitter account, so I am creating one.
Click on Sign up.
We can sign up with Google, with Apple, or with a phone number or email. I am using Sign up with phone or email.
Enter your name, email, and date of birth, then click on Next.
Click on Next.
Click on Sign up.
A verification code will be sent to your email. Enter the verification code, then click on Next.
Enter a password, then click on Next.
It will ask you to upload a profile picture. You can upload one or skip this step for now.
The same goes for the bio: write one if you want, or skip it for now.
The Twitter account is now created.
Go to https://developer.twitter.com/en and click on Developer Portal.
Because this Twitter account is new, it cannot be verified right away; I have to wait at least one week for verification.
After verification, go to https://developer.twitter.com/ and create a project and an app. I have created project1, and my app name is xv-app.
When we create the app, we get keys and tokens like the ones below. Copy the keys and tokens, because we will need them later.
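Since the keys and tokens are secrets, one option is to keep them out of the notebook and read them from an environment variable. The sketch below is only an illustration: the helper name load_bearer_token and the variable name TWITTER_BEARER_TOKEN are my own choices, not something Twitter or tweepy requires.

```python
import os


def load_bearer_token(env_var="TWITTER_BEARER_TOKEN"):
    """Return the bearer token stored in an environment variable.

    The variable name is a hypothetical convention chosen for this example.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fail early with a clear message instead of sending an empty token
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return token
```

After setting the variable in the shell, the token can be loaded with bearer_token = load_bearer_token() instead of pasting it into the code.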
pip install tweepy
import tweepy
tweepy.__version__
On my system, the latest version of tweepy, 4.10.0, is installed. If you have already installed tweepy, you can upgrade it with this pip command.
pip install --upgrade tweepy
dir(tweepy)
help(tweepy.Client)
class Client(BaseClient)
| Client(bearer_token=None, consumer_key=None, consumer_secret=None, access_token=None, access_token_secret=None, *, return_type=<class 'tweepy.client.Response'>, wait_on_rate_limit=False)
|
| Client( bearer_token=None, consumer_key=None, consumer_secret=None, access_token=None, access_token_secret=None, *, return_type=Response, wait_on_rate_limit=False )
|
| Twitter API v2 Client
|
| .. versionadded:: 4.0
|
| Parameters
| ----------
| bearer_token : str | None
| Twitter API OAuth 2.0 Bearer Token / Access Token
| consumer_key : str | None
| Twitter API OAuth 1.0a Consumer Key
| consumer_secret : str | None
| Twitter API OAuth 1.0a Consumer Secret
| access_token : str | None
| Twitter API OAuth 1.0a Access Token
| access_token_secret : str | None
| Twitter API OAuth 1.0a Access Token Secret
| return_type : type[dict | requests.Response | Response]
| Type to return from requests to the API
| wait_on_rate_limit : bool
| Whether to wait when rate limit is reached
|
| Attributes
| ----------
| session : requests.Session
| Requests Session used to make requests to the API
| user_agent : str
| User agent used when making requests to the API
|
| Method resolution order:
| Client
| BaseClient
| builtins.object
|
| Methods defined here:
|
| add_list_member(self, id, user_id, *, user_auth=True)
| Enables the authenticated user to add a member to a List they own.
|
| .. versionadded:: 4.2
|
| .. versionchanged:: 4.5
| Added ``user_auth`` parameter
|
| Parameters
| ----------
| id : int | str
| The ID of the List you are adding a member to.
| user_id : int | str
| The ID of the user you wish to add as a member of the List.
| user_auth : bool
| Whether or not to use OAuth 1.0a User Context to authenticate
|
| Returns
| -------
| dict | requests.Response | Response
|
bearer_token = 'AAAAAAAAAAAAAAAAAAAAANvYfwEAAAAAD6QJD%2FFEorvUIwOusfma61Vzxe8%3DVcIrTho7pJBqmuvh95U1LWyKK7T9NclTGPyj137auvIF0VuUPs'
client = tweepy.Client(bearer_token = bearer_token)
client
We can see that a lot of methods are available on client by using the dir() function.
dir(client)
search_recent_tweets(query, *, user_auth=False, **params) method of tweepy.client.Client instance
search_recent_tweets(query, *, end_time=None, expansions=None, max_results=None, media_fields=None, next_token=None, place_fields=None, poll_fields=None, since_id=None, sort_order=None, start_time=None, tweet_fields=None, until_id=None, user_fields=None, user_auth=False)
The recent search endpoint returns Tweets from the last seven days that match a search query.
The Tweets returned by this endpoint count towards the Project-level
`Tweet cap`_.
.. versionchanged:: 4.6
Added ``sort_order`` parameter
Parameters
----------
query : str
One rule for matching Tweets. If you are using a
`Standard Project`_ at the Basic `access level`_, you can use the
basic set of `operators`_ and can make queries up to 512 characters
long. If you are using an `Academic Research Project`_ at the Basic
access level, you can use all available operators and can make
queries up to 1,024 characters long.
end_time : datetime.datetime | str | None
YYYY-MM-DDTHH:mm:ssZ (ISO 8601/RFC 3339). The newest, most recent
UTC timestamp to which the Tweets will be provided. Timestamp is in
second granularity and is exclusive (for example, 12:00:01 excludes
the first second of the minute). By default, a request will return
Tweets from as recent as 30 seconds ago if you do not include this
parameter.
expansions : list[str] | str | None
:ref:`expansions_parameter`
max_results : int | None
The maximum number of search results to be returned by a request. A
number between 10 and 100. By default, a request response will
return 10 results.
media_fields : list[str] | str | None
:ref:`media_fields_parameter`
next_token : str | None
This parameter is used to get the next 'page' of results. The value
used with the parameter is pulled directly from the response
provided by the API, and should not be modified.
place_fields : list[str] | str | None
:ref:`place_fields_parameter`
poll_fields : list[str] | str | None
:ref:`poll_fields_parameter`
since_id : int | str | None
Returns results with a Tweet ID greater than (that is, more recent
than) the specified ID. The ID specified is exclusive and responses
will not include it. If included with the same request as a
``start_time`` parameter, only ``since_id`` will be used.
sort_order : str | None
This parameter is used to specify the order in which you want the
Tweets returned. By default, a request will return the most recent
Tweets first (sorted by recency).
start_time : datetime.datetime | str | None
YYYY-MM-DDTHH:mm:ssZ (ISO 8601/RFC 3339). The oldest UTC timestamp
(from most recent seven days) from which the Tweets will be
provided. Timestamp is in second granularity and is inclusive (for
example, 12:00:01 includes the first second of the minute). If
included with the same request as a ``since_id`` parameter, only
``since_id`` will be used. By default, a request will return Tweets
from up to seven days ago if you do not include this parameter.
tweet_fields : list[str] | str | None
:ref:`tweet_fields_parameter`
until_id : int | str | None
Returns results with a Tweet ID less than (that is, older than) the
specified ID. The ID specified is exclusive and responses will not
include it.
user_fields : list[str] | str | None
:ref:`user_fields_parameter`
user_auth : bool
Whether or not to use OAuth 1.0a User Context to authenticate
Returns
-------
dict | requests.Response | Response
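The help text above says that start_time and end_time use the YYYY-MM-DDTHH:mm:ssZ format, that recent search only covers the last seven days, and that max_results must be between 10 and 100. A minimal sketch of helpers that prepare such values could look like this. The function names are hypothetical, and the 30-second margin on end_time is my assumption, modelled on the default mentioned in the help text.

```python
from datetime import datetime, timedelta, timezone


def recent_search_window(hours_back, now=None):
    """Return (start_time, end_time) as YYYY-MM-DDTHH:mm:ssZ strings.

    `now` must be a UTC datetime; it defaults to the current UTC time.
    """
    # Recent search only reaches back seven days (168 hours)
    if not 1 <= hours_back < 7 * 24:
        raise ValueError("hours_back must be at least 1 and less than 168")
    now = now or datetime.now(timezone.utc)
    # Assumption: stay 30 seconds behind "now", like the default end_time
    end = now - timedelta(seconds=30)
    start = now - timedelta(hours=hours_back)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return start.strftime(fmt), end.strftime(fmt)


def clamp_max_results(n):
    """Force n into the 10..100 range accepted by max_results."""
    return max(10, min(100, n))
```

The returned strings can then be passed as the start_time and end_time arguments of search_recent_tweets, together with max_results=clamp_max_results(50).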
query = "datascience"
tweets = client.search_recent_tweets(query=query, tweet_fields=['context_annotations', 'created_at'], max_results=10)
tweets
for tweet in tweets.data:
    print("Text: ", tweet.text)
    print("Created at: ", tweet.created_at)
    print("\n")
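One thing to watch out for: as far as I can tell, tweepy sets the data attribute of the response to None when the query matches no tweets, so looping over it directly raises a TypeError. A small guard like the one below avoids that. The helper name iter_tweets is hypothetical.

```python
def iter_tweets(response):
    """Yield tweets from a response object, or nothing when data is None."""
    # `response.data or []` turns a missing result set into an empty list
    yield from (response.data or [])
```

With this helper, the loop becomes for tweet in iter_tweets(tweets): and it simply does nothing for an empty result.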
Annotations have been added to the Tweet object from all v2 endpoints that return a Tweet object. Tweet annotations offer a way to understand contextual information about the Tweet itself. Though 100% of Tweets are reviewed, due to the contents of Tweet text, only a portion are annotated.
1. Entity annotations (NER): Entities are comprised of people, places, products, and organizations. Entities are delivered as part of the entity payload section. They are programmatically assigned based on what is explicitly mentioned (named-entity recognition) in the Tweet text.
2. Context annotations: Derived from the analysis of a Tweet’s text and will include a domain and entity pairing which can be used to discover Tweets on topics that may have been previously difficult to surface. At present, we’re using a list of 80+ domains to categorize Tweets.
Tweet annotation types
- Entities
Entity annotations are programmatically defined entities that are nested within the entities field and are reflected as annotations in the payload. Each annotation has a confidence score and an indication of where in the Tweet text the entities were identified (start and end fields).
The entity annotations can have the following types:
Person - Barack Obama, Daniel, or George W. Bush
Place - Detroit, Cali, or "San Francisco, California"
Product - Mountain Dew, Mozilla Firefox
Organization - Chicago White Sox, IBM
Other - Diabetes, Super Bowl 50
- Context
Context annotations are delivered as a context_annotations field in the payload. These annotations are inferred based on semantic analysis (keywords, hashtags, handles, etc) of the Tweet text and result in domain and/or entity labels. Context annotations can yield one or many domains.
for tweet in tweets.data:
    print(tweet.text)
    if len(tweet.context_annotations) > 0:
        print("\n")
        print(tweet.context_annotations)
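Each context annotation is a dictionary with a domain and an entity part, so printing the raw list can be hard to read. The sketch below pulls out just the domain and entity names. The helper name annotation_pairs is hypothetical, and the sample data is made up for illustration; it only imitates the shape of the payload.

```python
def annotation_pairs(context_annotations):
    """Return (domain name, entity name) tuples from context annotations."""
    pairs = []
    for annotation in context_annotations or []:
        # Either part may be missing, so fall back to None
        domain = annotation.get("domain", {}).get("name")
        entity = annotation.get("entity", {}).get("name")
        pairs.append((domain, entity))
    return pairs


# Made-up sample shaped like a context_annotations list (IDs are not real)
sample = [
    {
        "domain": {"id": "1", "name": "Example Domain", "description": "..."},
        "entity": {"id": "2", "name": "Data science"},
    }
]
print(annotation_pairs(sample))
```

For a real tweet, the call would be annotation_pairs(tweet.context_annotations).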
import requests
This time, we pass return_type = requests.Response, so the client returns the raw requests.Response object instead of tweepy's own Response type. We can then convert it to JSON.
client1 = tweepy.Client(bearer_token = bearer_token, return_type = requests.Response)
client1
new_tweets = client1.search_recent_tweets(query=query, tweet_fields=['context_annotations', 'created_at'], max_results=10)
new_tweets
tweets_json = new_tweets.json()
tweets_json
import pandas as pd
tweets_data = tweets_json['data']
df = pd.json_normalize(tweets_data)
df
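When no tweets match the query, the JSON response contains only a meta section and no data key, so tweets_json['data'] would raise a KeyError. A defensive way to prepare the rows, sketched here with plain Python, is shown below. The helper name tweets_to_rows and the chosen field list are my own assumptions.

```python
def tweets_to_rows(payload, fields=("id", "text", "created_at")):
    """Turn a recent-search JSON payload into a list of flat row dicts."""
    rows = []
    # .get() returns an empty list when the payload has no "data" key
    for tweet in payload.get("data", []):
        rows.append({field: tweet.get(field) for field in fields})
    return rows


# Payload shaped like an empty search result
print(tweets_to_rows({"meta": {"result_count": 0}}))
```

The resulting list can be handed to pandas with pd.DataFrame(tweets_to_rows(tweets_json)), which also works when the list is empty.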
We have converted the recent tweets into a pandas DataFrame. Now that the data is in a DataFrame, we can analyze it however we want. For example, we can select just the tweet text.
df['text']
Suppose we want to collect tweets from a particular user. In this example, I want to get Rahul Gandhi's tweets. We need the Twitter ID of Rahul Gandhi's account to fetch the tweets. To get it, I went to https://tweeterid.com/, entered the Twitter username, and got the Twitter ID of the account.
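The ID can probably also be looked up through the API itself, because tweepy's Client has a get_user method that accepts a username. Here is a minimal sketch. The helper name lookup_user_id is hypothetical, and it assumes a client created with the default return type.

```python
def lookup_user_id(client, username):
    """Return the numeric ID for a username via client.get_user()."""
    # A leading "@" is not part of the username, so strip it
    response = client.get_user(username=username.lstrip("@"))
    if response.data is None:
        raise LookupError(f"No user found for username {username!r}")
    return response.data.id
```

With the client created earlier, the call would be lookup_user_id(client, "some_username"), where the username is the account's handle.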
user_id = '3171712086'
tweets = client.get_users_tweets(id=user_id, tweet_fields=['context_annotations','created_at','geo'])
for tweet in tweets.data:
    print("\nTweet text: ", tweet.text)
    print("Created at: ", tweet.created_at)
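A single call only returns one page of tweets. To collect more, we can follow the next_token that comes back in the response's meta section and pass it as pagination_token on the next call. The sketch below shows the idea. The helper name collect_user_tweets and its limit and page_size parameters are my own, and it assumes a client with the default return type.

```python
def collect_user_tweets(client, user_id, limit=250, page_size=100):
    """Collect up to `limit` tweets for a user, following next_token pages."""
    tweets = []
    next_token = None
    while len(tweets) < limit:
        params = {
            "id": user_id,
            "max_results": page_size,
            "tweet_fields": ["created_at"],
        }
        if next_token:
            # Ask for the page that follows the previous response
            params["pagination_token"] = next_token
        response = client.get_users_tweets(**params)
        tweets.extend(response.data or [])
        next_token = (response.meta or {}).get("next_token")
        if not next_token:
            break  # no more pages available
    return tweets[:limit]
```

For example, collect_user_tweets(client, '3171712086', limit=200) would make up to two requests of 100 tweets each. tweepy also ships a Paginator helper that can do this kind of looping for us.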