
DATA DOUBLE CONFIRM

Reading CSV data from GitHub - Python

Today I decided to poke around a little to see whether it is possible to read CSV files directly from GitHub, and the answer is yes. As I have published numerous CSV datasets on GitHub, I thought it would be easier for people to access them without downloading the datasets or cloning the repository, and as always (or as I had hoped), there is an answer on the internet.

So, for example, to read in the dataset 'arrivals2018.csv' hosted in the datadoubleconfirm repository, we need to get the link to the raw file and then run the code below. The screenshots show how to obtain the raw link: open the file on GitHub and click the Raw button, which leads to a URL starting with https://raw.githubusercontent.com/.
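If you already have the normal GitHub page URL for a file, the raw link can usually be derived from it by string manipulation. The sketch below assumes the common layout https://github.com/<owner>/<repo>/blob/<branch>/<path>; the helper name to_raw_url is hypothetical and not part of pandas or any GitHub tool.

```python
from urllib.parse import urlparse


def to_raw_url(blob_url: str) -> str:
    """Hypothetical helper: turn a github.com 'blob' page URL into a raw.githubusercontent.com URL.

    Assumes the layout https://github.com/<owner>/<repo>/blob/<branch>/<path>.
    """
    parsed = urlparse(blob_url)
    if parsed.netloc != "github.com":
        raise ValueError(f"not a github.com URL: {blob_url}")
    # '/owner/repo/blob/branch/path/to/file.csv' -> ['owner', 'repo', 'blob', 'branch', 'path', ...]
    parts = parsed.path.strip("/").split("/")
    if len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"expected .../<owner>/<repo>/blob/<branch>/<path>: {blob_url}")
    owner, repo, _, branch, *path = parts
    # The raw host drops the 'blob' segment and keeps everything else in order.
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{'/'.join(path)}"


print(to_raw_url("https://github.com/hxchua/datadoubleconfirm/blob/master/datasets/arrivals2018.csv"))
```

This treats the first segment after "blob" as the branch, so a branch name that itself contains a slash would be split incorrectly; clicking Raw on GitHub remains the reliable way to get the link.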


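import pandas as pd

# Use the raw-file URL, not the github.com page URL (which returns HTML)
url = 'https://raw.githubusercontent.com/hxchua/datadoubleconfirm/master/datasets/arrivals2018.csv'
# Skip malformed rows; on_bad_lines replaces error_bad_lines, which was removed in pandas 2.0
df = pd.read_csv(url, on_bad_lines='skip')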
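To see what skipping bad lines actually does without touching the network, here is a small offline sketch. It assumes pandas 1.3 or later (where on_bad_lines exists) and uses a made-up in-memory CSV in place of the GitHub file; with the default on_bad_lines='error', the same input would raise a parser error instead.

```python
import io

import pandas as pd

# Made-up CSV standing in for a remote file: the row "3,4,5" has one field too many.
csv_text = "a,b\n1,2\n3,4,5\n6,7\n"

# on_bad_lines="skip" silently drops rows with too many fields instead of raising.
df = pd.read_csv(io.StringIO(csv_text), on_bad_lines="skip")
print(df)
print(len(df), "rows kept")
```

Skipping is convenient for a quick look, but it hides data problems, so it is worth comparing the row count against the source file if completeness matters.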

List of datasets published on my GitHub

References:

https://stackoverflow.com/questions/32400867/pandas-read-csv-from-url