Data Vis Practice – The Tasty Road (Ver.Seoul)

*This falls under the #100DaysOfCode challenge.

** Source code references belong to Lucy Park (D3 code) and the Seoul open source project (Seoul TopoJSON map). This post is a note on how D3 and TopoJSON work.

Data visualisation, especially the application of data science to storytelling and journalism, has long fascinated me. I had been thinking about starting my own project along my career path, but there was not much demand for it. Everyone wants to hire developers to work on SPAs, full SaaS products, or whatever the product is, but not on storytelling. Perhaps because I come from an artistic background, or because I am very active on political issues, web development and new tech for up-to-date, service-based products did not fascinate me. Don’t get me wrong. I love web stuff and all the innovative new approaches. I am just not into the hype that forces me to hop on board. In 2015 I was a kind of useless developer because I was not a seasoned Angular developer. Now, in 2016, I am again an incompetent developer because I am not an experienced Angular 2 or React-with-Redux developer.

What is a framework for? Isn’t it supposed to be a tool that simply helps you turn your abstract ambitions into a real product? No one has answered that for me, and I don’t want to waste my time learning things that could become useless at any moment. Again, don’t get me wrong. I think those tools are very useful, and there are a lot of reasons why people are at such a fever pitch about them. I am... I am just not into it. I need something more... more like me. Something more inspirational that drives me to put my hands on it and dig into the craftsmanship.

Between 2016 and 2017, I watched three big elections: the presidential elections in the USA, France, and South Korea. While following them through the media, I saw a ton of delicate charts and data tables on journalism platforms. Reading the articles, which tell a story alongside their insights, I was really into it. I leaned in, put my emotions into the reading, and watched the responses from crowds on the web. And finally, things changed. Some results were bitter and some were sweet to watch. That was the moment that gave me motivation.

I decided to study data visualisation more deeply. It is going to be 100 challenges. Sometimes it will be an interactive infographic; other times it could be an assignment after following a tutorial or a book. Whatever it turns out to be, I think it is a good way to make myself do something steady.

So, here is challenge one.

I used to go to various tasty restaurants and pinned their locations on Google Maps. The thing is, each pin has an expiration date, and I just lost my pins when they expired.


So I decided to keep a note of where I have been in Seoul and tried to map those places with D3.

There was a blog post explaining how to put pins on a map using TopoJSON and D3. I will put all the references at the end of this post.

http://bl.ocks.org/DigitalSpaceCat/ee4aebbc1acf83e47abbb39560a77fb4#file-index-html

[Screenshot, 2017-06-02 15.43.02]

I took what Lucy built for her own chart. The difference between my chart and hers is how the restaurants are collected and pinned. The main goal of this chart is learning D3 and how to use it. I didn’t use Python to generate the JSON as she does, because my main aim was to see how coordinate-based pins work on a D3 chart. Instead, I looked up each restaurant on Google and entered its coordinates into the CSV file by hand.
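The pins land in the right place because D3 projects each longitude/latitude pair onto screen pixels before drawing. Here is a minimal Python sketch of the spherical Mercator formula, which I believe is the same maths `d3.geoMercator()` applies with its `center`, `scale` and `translate` settings. The CSV columns (`name`, `lat`, `lon`), the two placeholder restaurants, the scale of 40000 and the map centre are hypothetical values for illustration, not Lucy’s code or my actual data.

```python
import csv
import io
import math


def mercator(lon, lat, scale, center, size):
    """Project (lon, lat) in degrees to (x, y) pixels with spherical Mercator.

    Raw projection: x = lambda, y = ln(tan(pi/4 + phi/2)), in radians.
    The map centre lands in the middle of the drawing area, and y is
    flipped because SVG's y axis grows downwards.
    """
    def raw(lo, la):
        lam = math.radians(lo)
        phi = math.radians(la)
        return lam, math.log(math.tan(math.pi / 4 + phi / 2))

    cx, cy = raw(*center)
    px, py = raw(lon, lat)
    width, height = size
    x = width / 2 + scale * (px - cx)
    y = height / 2 - scale * (py - cy)  # north is up on screen
    return x, y


# Hypothetical CSV in the same spirit as the one I filled in by hand.
CSV_TEXT = """name,lat,lon
Restaurant A,37.5665,126.9780
Restaurant B,37.5796,126.9770
"""


def load_pins(text, scale=40000, center=(126.98, 37.55), size=(800, 600)):
    """Read name/lat/lon rows and return (name, x, y) pixel positions."""
    pins = []
    for row in csv.DictReader(io.StringIO(text)):
        x, y = mercator(float(row["lon"]), float(row["lat"]), scale, center, size)
        pins.append((row["name"], round(x, 1), round(y, 1)))
    return pins


if __name__ == "__main__":
    for name, x, y in load_pins(CSV_TEXT):
        print(name, x, y)
```

Restaurant B is further north, so it gets a smaller y value; that is why the formula subtracts from the vertical centre.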

What I learned from this work:

  1. D3 is just a tool for drawing data. It is a drawing tool driven by JavaScript code; it does not analyse the data for you.
  2. To prepare a map and convert it to TopoJSON, you need some Python or R, or other command-line tools (the TopoJSON command-line tools, for example, run on Node.js). It is not something the D3 chart code does for you.
  3. For cleaning the data and sketching visualisation ideas, you need at least some other tools, such as MS Excel or Tableau.
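On point 2: part of what makes TopoJSON compact is that shared borders are stored once as "arcs", and in a quantized file each arc is a list of integer deltas that a top-level `transform` turns back into longitude/latitude. Below is a minimal Python sketch of that decoding, following my reading of the TopoJSON specification; the small `topology` fragment is made up and is not the Seoul map file.

```python
def decode_arc(arc, transform):
    """Turn one delta-encoded, quantized TopoJSON arc into (lon, lat) pairs."""
    sx, sy = transform["scale"]
    tx, ty = transform["translate"]
    x = y = 0
    points = []
    for dx, dy in arc:
        # Each position is a delta from the previous one (the first from 0, 0).
        x += dx
        y += dy
        # Undo quantization: integer grid position -> real coordinates.
        points.append((x * sx + tx, y * sy + ty))
    return points


# Hypothetical quantized topology fragment (not real Seoul data).
topology = {
    "type": "Topology",
    "transform": {"scale": [0.001, 0.001], "translate": [126.7, 37.4]},
    "arcs": [[[100, 200], [50, 0], [0, 30]]],
}

for lon, lat in decode_arc(topology["arcs"][0], topology["transform"]):
    print(round(lon, 3), round(lat, 3))
```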

I will try to make Sydney and Busan versions next time.

Reference: Lucy Park’s tech blog (https://www.lucypark.kr/blog//)


Stock Market analysis practice

*This is not a tutorial post, just my notes from practising with the following tech materials.

I have been studying the Python data visualization course taught by Jose Portilla. I highly recommend this course if you are interested in Python drills. In this post, I am writing notes while I practise stock market analysis.


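from __future__ import division  # must be the first statement in the file (only matters on Python 2)

import pandas as pd
from pandas import Series, DataFrame
import numpy as np

import matplotlib.pyplot as plt
import seaborn as sns
sns.set_style('whitegrid')

# pandas.io.data was removed from pandas; the reader now lives in the
# separate pandas-datareader package
from pandas_datareader.data import DataReader

from datetime import datetime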

I import some libraries to generate graphs. In Jose’s tutorial, pandas is used to grab stock data from Yahoo.

For more detail, you will probably need to go to Udemy and see the lecture notes. In this post, I am just writing down what I have practised.

I want to see a historical view of the closing price.


AAPL['Adj Close'].plot(legend=True,figsize=(10,4))

[Figure: AAPL adjusted closing price over time]

And let’s see the trading volume.

[Figure: trading volume over time]

I still have only a sloppy understanding of why the graph looks like this. Perhaps this is the point where I need to study financial analysis.


AAPL[['Adj Close','MA for 10 days','MA for 20 days','MA for 50 days']].plot(subplots=False,figsize=(10,4))

I’ve got this moving average graph.

[Figure: adjusted close with 10-, 20- and 50-day moving averages]
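The plot above assumes columns named "MA for 10 days" and so on, which my snippet never creates. A minimal sketch of one way to build them with current pandas is `Series.rolling(window=...).mean()` (older tutorials used `pd.rolling_mean`, which has since been removed from pandas). The steadily rising price series is made up and stands in for the real AAPL data.

```python
import numpy as np
import pandas as pd

# Hypothetical price series standing in for AAPL's 'Adj Close' column.
dates = pd.date_range("2017-01-02", periods=60, freq="B")
AAPL = pd.DataFrame({"Adj Close": np.linspace(100.0, 130.0, num=60)}, index=dates)

for ma in [10, 20, 50]:
    column_name = "MA for %s days" % ma
    # Mean of the current and previous (ma - 1) prices; NaN until a full window exists.
    AAPL[column_name] = AAPL["Adj Close"].rolling(window=ma).mean()

print(AAPL.tail())
```

Each moving average stays NaN until it has a full window of prices, so the 50-day line starts later than the 10-day one.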

I want to compare the daily percentage returns of two stocks to check how correlated they are. I expect that comparing Google to itself will show a perfectly linear relationship.


# recent seaborn versions require x, y and data as keyword arguments
sns.jointplot(x='GOOG', y='GOOG', data=tech_rets, kind='scatter', color='seagreen')

sns.jointplot(x='GOOG', y='MSFT', data=tech_rets, kind='scatter')

[Figure: jointplot of GOOG vs GOOG daily returns]

[Figure: jointplot of GOOG vs MSFT daily returns]

Wow. Compared with itself, Google’s daily returns are perfectly correlated: the points fall on a straight line.

The blue one is a comparison between Google and Microsoft.
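The returns in `tech_rets` are daily percentage changes. A minimal sketch, assuming a DataFrame of closing prices with one column per ticker (the numbers below are made up, not real market data): `pct_change()` computes (today - yesterday) / yesterday, and a series correlated with itself always comes out at 1.0, which is why the GOOG vs GOOG plot is a perfect line.

```python
import pandas as pd

# Hypothetical closing prices for two tickers (not real market data).
closing_df = pd.DataFrame({
    "GOOG": [100.0, 102.0, 101.0, 105.0, 104.0],
    "MSFT": [50.0, 50.5, 51.5, 51.0, 52.0],
})

# Daily percentage return: (today - yesterday) / yesterday. The first row is NaN.
tech_rets = closing_df.pct_change()

# Pearson correlation; a series against itself is exactly 1 (up to rounding).
print(tech_rets["GOOG"].corr(tech_rets["GOOG"]))
print(tech_rets["GOOG"].corr(tech_rets["MSFT"]))
```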

It seems that seaborn and pandas make it easy to lay out a comparison of all the data at once. I couldn’t do this with Excel or any other program.

[Figure: seaborn pairplot comparing the stocks]
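The pairplot draws every pair of columns at once. If I want the same comparison as numbers, pandas’ `DataFrame.corr()` returns the pairwise correlation matrix (Pearson by default). The returns below are made up for illustration.

```python
import pandas as pd

# Hypothetical daily returns for three tickers (made-up numbers).
tech_rets = pd.DataFrame({
    "AAPL": [0.010, -0.004, 0.006, 0.002, -0.008],
    "GOOG": [0.012, -0.002, 0.004, 0.003, -0.010],
    "MSFT": [-0.003, 0.005, 0.001, -0.002, 0.004],
})

# Correlation of every column against every other column.
corr_matrix = tech_rets.corr()
print(corr_matrix.round(2))
# With seaborn installed, sns.heatmap(corr_matrix, annot=True) draws it as a grid.
```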

I wonder if I can generate a similar result in R. I am quite happy with Python since I am subscribed to some Python courses (though most of them are about web development). Hopefully, I can pick up some R later along with D3.

Titanic survivor practice

*This is not a tutorial but just my personal practice notes following tech materials.


As with all my data analysis practice, I went to Kaggle to grab some data. Kaggle is awesome for finding data and for inspiring me to figure out how to analyse it by reading other people’s code.

I am not very experienced in Python. Most people who learn Python go on to develop with Django, but I just couldn’t get myself into a back-end mindset. I will get there soon, though.

The reason I started learning Python is that there are a lot of science-related articles and materials for it. Compared to Ruby on Rails, Django is not very attractive to me (this is a personal opinion), but given Python’s massive set of libraries, I can see a lot of potential in integrating a web framework with scientific methods.

There are a couple of practice materials on Kaggle, and I dove into this one since I am currently taking Jose Portilla’s course.

https://www.kaggle.com/c/titanic


import pandas as pd
from pandas import Series,DataFrame

titanic_df = pd.read_csv('train.csv')

titanic_df.head()

I loaded the Titanic CSV file into a pandas DataFrame.

I thought there were many survivors, but obviously not.
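A quick way to check the overall number of survivors is to count the `Survived` column and take its mean. A minimal sketch, assuming (as in Kaggle’s train.csv) that `Survived` is coded 1 for survived and 0 for did not survive; the five-row frame is a made-up stand-in for `titanic_df`, and `survival_summary` is a hypothetical helper name.

```python
import pandas as pd


def survival_summary(df):
    """Return (counts of 0/1 in 'Survived', overall survival rate)."""
    counts = df["Survived"].value_counts().sort_index()
    # The mean of a 0/1 column is the fraction of 1s, i.e. the survival rate.
    rate = df["Survived"].mean()
    return counts, rate


# Tiny made-up stand-in for titanic_df; with the real data use
# survival_summary(pd.read_csv('train.csv')).
sample = pd.DataFrame({"Survived": [0, 1, 0, 0, 1]})
counts, rate = survival_summary(sample)
print(counts)
print("survival rate: %.2f" % rate)
```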

The code above brings up the data table, but I couldn’t quite get my head around it. To visualise it, I imported numpy, matplotlib, and seaborn.

Those three libraries are the most used ones, and I am quite happy with them so far.

I wondered how many people survived, and how children should be counted alongside the two genders. Each passenger also travelled in a different class. Together, these might sum up the factors behind surviving the sinking. In this case, I did not use the gender of passengers younger than 16; they are simply labelled as children.

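import seaborn as sns

def male_female_child(passenger):
    # passenger is one row holding (Age, Sex); anyone under 16 counts as a child
    age, sex = passenger
    if age < 16:
        return 'child'
    else:
        return sex

titanic_df['person'] = titanic_df[['Age', 'Sex']].apply(male_female_child, axis=1)

# factorplot was renamed catplot in seaborn 0.9; kind='count' draws bar counts
sns.catplot(x='Pclass', data=titanic_df, hue='person', kind='count')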
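`apply` with `axis=1` calls the Python function once per row. A vectorised alternative that should give the same labels uses `numpy.where`; this is a technique I am swapping in, not what the course does, and the rows below are made up with the same `Age`/`Sex` columns as the Kaggle file.

```python
import numpy as np
import pandas as pd

# Made-up rows with the same 'Age'/'Sex' columns as the Kaggle file.
passengers = pd.DataFrame({
    "Age": [22.0, 8.0, np.nan, 15.0, 40.0],
    "Sex": ["male", "female", "female", "male", "female"],
})

# Same rule as male_female_child(): under 16 -> 'child', otherwise keep the sex.
# A missing age compares as False, so those passengers keep their sex label.
passengers["person"] = np.where(passengers["Age"] < 16, "child", passengers["Sex"])
print(passengers)
```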

I get this graph.

[Figure: passenger counts per class, split into male, female and child]
IPython (I may need to say Jupyter from now on) has a lot of functions for visualising data easily. I also installed the R kernel for Jupyter but haven’t used it yet.

Learning a new language is a pain, so I may stick to Python for a while.


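# one age-density curve per sex, stretched wide with aspect=4
fig = sns.FacetGrid(titanic_df, hue="Sex", aspect=4)

# fill=True shades under each curve (seaborn versions before 0.11 call this shade=True)
fig.map(sns.kdeplot, 'Age', fill=True)

oldest = titanic_df['Age'].max()
# set the x lower limit at 0 and the upper limit at the oldest passenger
fig.set(xlim=(0, oldest))

fig.add_legend()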

What I got from this code is a beautiful FacetGrid graph.

[Figure: age distribution by sex]

So far, I have a good picture of the passengers by gender, class, and age, but I haven’t broken them down by cabin deck yet.

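# keep only passengers with a recorded cabin
deck = titanic_df['Cabin'].dropna()

# the first character of the cabin string is the deck letter (A, B, C, ...)
levels = [level[0] for level in deck]

cabin_df = DataFrame(levels)
cabin_df.columns = ['Cabin']
sns.catplot(x='Cabin', data=cabin_df, palette='winter_d', kind='count')

# drop the 'T' cabin, which doesn't fit the lettered decks
cabin_df = cabin_df[cabin_df.Cabin != 'T']

sns.catplot(x='Cabin', data=cabin_df, palette='summer', kind='count')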

[Figure: passenger counts per cabin deck]
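The loop above takes the first character of each cabin string. pandas’ string accessor can do the same in one step, and `value_counts()` gives the numbers behind the bar chart. The cabin values below are made up in the same "letter + number" format as the Kaggle file.

```python
import pandas as pd

# Made-up Cabin values in the Kaggle format (deck letter + room number, or missing).
titanic_like = pd.DataFrame({"Cabin": ["C85", None, "C123", "E46", None, "T", "B28"]})

# The first character of each non-missing cabin is the deck letter.
decks = titanic_like["Cabin"].dropna().str[0]

# Drop the 'T' deck, as in the plot above, then count passengers per deck.
decks = decks[decks != "T"]
deck_counts = decks.value_counts().sort_index()
print(deck_counts)
```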

Cool.

I wondered whether class, gender, and age affected the number of survivors.


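# kind='point' matches the old factorplot default: mean survival with error bars
sns.catplot(x='Pclass', y='Survived', data=titanic_df, kind='point')

sns.catplot(x='Pclass', y='Survived', hue='person', data=titanic_df, kind='point')

# use these ages as bin centres so lmplot draws one averaged point per bin
generations = [10, 20, 40, 60, 80]
sns.lmplot(x='Age', y='Survived', hue='Pclass', data=titanic_df, palette='winter', x_bins=generations)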

[Figure: survival rate by passenger class]

Survival rates for 3rd class are substantially lower, but considering the previous graphs, it seems that more men were in 3rd class.

[Figure: survival rate by age, split by class]
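The point plots are built on the mean of the 0/1 `Survived` column per group, so the rates themselves can be read off with `groupby` or `pivot_table`, and `pd.crosstab` counts how many of each person type were in each class, which is one way to check the "more men in 3rd class" observation numerically. The eight rows below are made up; with the real data, pass `titanic_df` instead.

```python
import pandas as pd

# Made-up passengers; the real data would come from train.csv.
df = pd.DataFrame({
    "Pclass":   [1, 1, 2, 3, 3, 3, 3, 2],
    "person":   ["female", "male", "child", "male", "male", "female", "child", "male"],
    "Survived": [1, 0, 1, 0, 0, 1, 0, 1],
})

# Mean of a 0/1 column = survival rate per group.
by_class = df.groupby("Pclass")["Survived"].mean()
by_class_and_person = df.pivot_table(
    index="Pclass", columns="person", values="Survived", aggfunc="mean"
)

# Head counts: how many males, females and children were in each class.
counts_by_class = pd.crosstab(df["Pclass"], df["person"])

print(by_class)
print(by_class_and_person)
print(counts_by_class)
```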

So far I have followed the instructions in Jose’s data visualisation lecture, and Python’s libraries covered pretty much everything I wanted to see.

Later, I will practise stock market analysis following the next part of Jose’s data visualisation materials.

Python env problem

I have installed Anaconda and switch between Python 2 and 3 depending on what I am practising. What I am mostly interested in is visualising data and turning massive sets of figures into graphs that sum them up.

I had no problem setting Anaconda’s Python as my default python, but it keeps reverting to the default Mac OS X Python, which doesn’t have any of the package management set up.

I am currently using Anaconda Python, set up by following this Stack Overflow post:

http://stackoverflow.com/questions/22773432/mac-using-default-python-despite-anaconda-install

This Stack Overflow post helped me solve the problem, but what annoys me is that the default python keeps going back to the Mac OS X Python (2.7.6, not conda’s 2.7.10).

hm…


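# Each export prepends directories to PATH, so the line that runs last wins:
# after both lines, $HOME/anaconda/bin comes first and its python is found first.
export PATH="/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:$PATH"

export PATH="$HOME/anaconda/bin:$PATH"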
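Since the shell kept handing me a different python than I expected, it helps to ask the interpreter directly. A small sketch using only the standard library: run it with plain `python` in a fresh terminal, and `sys.executable` shows which binary PATH actually picked. The "looks like conda" check (the word "conda" in the path, or a `conda-meta` folder under `sys.prefix`) is my own heuristic, not an official Anaconda API, and `describe_interpreter` is a hypothetical helper name.

```python
import os
import sys


def describe_interpreter():
    """Report which Python binary is running, its version, and a conda guess."""
    exe = sys.executable
    # Heuristic only: Anaconda paths usually contain 'conda', and conda
    # environments keep a 'conda-meta' folder in their prefix.
    looks_like_conda = "conda" in exe.lower() or os.path.exists(
        os.path.join(sys.prefix, "conda-meta")
    )
    return {
        "executable": exe,
        "version": sys.version.split()[0],
        "looks_like_conda": looks_like_conda,
    }


if __name__ == "__main__":
    info = describe_interpreter()
    for key, value in info.items():
        print("%s: %s" % (key, value))
```

If this prints /usr/bin/python with version 2.7.6, the Mac OS X Python is still winning the PATH lookup.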