Can twitter API be bypassed? A new methodology for collecting chronological information without restrictions

Aldo Hernandez-Suarez, Gabriel Sanchez-Perez, Karina Toscano-Medina, Rocio Toscano-Medina, Victor Martinez-Hernandez, Jesus Olivares-Mercado, Hector Perez-Meana, Victor Sanchez

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

1 Scopus citation

Abstract

Retrieving information from social networks is an essential first step in many data analysis fields, such as Natural Language Processing and Machine Learning. Important data science tasks rely on gathering historical data to produce predictive results. Recent works collect public streams of information through public platforms such as the Twitter API, which only allows querying chronological tweets from periods no longer than three weeks. In this paper, we present Twitter Scrapy, a new methodology for collecting historical tweets from time periods of arbitrary duration using web scraping techniques that bypass Twitter API restrictions.

Original language: English
Title of host publication: New Trends in Intelligent Software Methodologies, Tools and Techniques - Proceedings of the 17th International Conference, SoMeT 2018
Editors: Hamido Fujita, Enrique Herrera-Viedma
Publisher: IOS Press BV
Pages: 453-462
Number of pages: 10
ISBN (Electronic): 9781614998990
State: Published - 2018
Event: 17th International Conference on New Trends in Intelligent Software Methodology Tools and Techniques, SoMeT 2018 - Granada, Spain
Duration: 26 Sep 2018 to 28 Sep 2018

Publication series

Name: Frontiers in Artificial Intelligence and Applications
Volume: 303
ISSN (Print): 0922-6389
ISSN (Electronic): 1879-8314

Conference

Conference: 17th International Conference on New Trends in Intelligent Software Methodology Tools and Techniques, SoMeT 2018
Country/Territory: Spain
City: Granada
Period: 26/09/18 to 28/09/18

Keywords

  • Twitter bots
  • Twitter scrapy
  • Web crawling
  • Web scraping
  • Web spiders
