*This is not a tutorial, just my personal practice notes from following technical materials.
As with all my data analysis practice, I went to Kaggle to grab some data. Kaggle is a great place to find datasets and to get inspired by other people's code when figuring out how to analyse data.
I am not very experienced in Python. Most people who learn Python go on to develop with Django, but I just couldn’t get myself into a back-end mindset. I will get there soon though.
The reason I started learning Python is that there are a lot of science-oriented articles and materials for it. Compared to Ruby on Rails, Django is not very attractive to me (this is a personal opinion), but I can see a lot of potential in integrating a web framework with scientific methods, given the massive number of libraries available for Python.
There are a couple of practice materials on Kaggle, and I dove into them since I am currently watching Jose Portilla's course.
import pandas as pd
from pandas import Series, DataFrame

titanic_df = pd.read_csv('train.csv')
titanic_df.head()
I opened the Titanic CSV file with pandas and loaded it into a DataFrame.
I thought there were many survivors, but obviously not.
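To check that impression with numbers rather than by eye, a minimal sketch like the one below counts survivors. It uses a tiny made-up sample in place of train.csv: the rows and the name sample_df are hypothetical, and only the Survived and Sex column names come from the Kaggle data, so the counts are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for titanic_df; the real data comes from train.csv.
sample_df = pd.DataFrame({
    'Survived': [0, 1, 1, 0, 0, 0],
    'Sex': ['male', 'female', 'female', 'male', 'male', 'female'],
})

# Number of passengers who died (0) and who survived (1).
survival_counts = sample_df['Survived'].value_counts()

# Survived is coded 0/1, so its mean is the share of survivors.
survival_rate = sample_df['Survived'].mean()

print(survival_counts)
print(f"Survival rate: {survival_rate:.0%}")
```

Running the same two lines on titanic_df['Survived'] should give the real figures.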
The code above brings up the data table, but I can't quite take it in just by reading it. To visualise the data, I imported numpy, matplotlib, and seaborn.
Those three libraries are the most commonly used ones, and I am quite happy with them so far.
I wondered how many people survived and how we should treat children across genders. Each passenger also has a different class. These things might sum up the factors of surviving the sinking. In this case, I didn’t separate passengers younger than 16 by gender.
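import seaborn as sns

def male_female_child(passenger):
    # passenger is one row holding the (Age, Sex) pair
    age, sex = passenger
    if age < 16:
        return 'child'
    else:
        return sex

# Label every passenger as 'male', 'female' or 'child'
titanic_df['person'] = titanic_df[['Age','Sex']].apply(male_female_child, axis=1)
sns.factorplot('Pclass', data=titanic_df, hue='person')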
I get this graph.
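The numbers behind a graph like this can also be tabulated directly. The sketch below is an assumption-laden alternative, not the course's code: it rebuilds the person column without apply and counts passengers per class with pd.crosstab. The sample rows and the name sample_df are made up; only the column names match the Titanic data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for titanic_df with a missing age included.
sample_df = pd.DataFrame({
    'Age': [22.0, 38.0, 4.0, 35.0, np.nan, 14.0],
    'Sex': ['male', 'female', 'female', 'male', 'male', 'male'],
    'Pclass': [3, 1, 3, 1, 3, 2],
})

# Same rule as male_female_child: anyone younger than 16 becomes 'child',
# everyone else keeps their sex. A missing age (NaN) is not < 16,
# so those passengers keep their sex as well.
is_child = sample_df['Age'] < 16
sample_df['person'] = sample_df['Sex'].where(~is_child, 'child')

# Passenger counts per class, split by person type.
counts = pd.crosstab(sample_df['Pclass'], sample_df['person'])
print(counts)
```

The table has one row per class and one column per person type, which is the same information the plot shows as points or bars.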
Learning a new language is a pain, so I may stick with Python for a while.
# One density curve of Age per sex, on a wide figure
fig = sns.FacetGrid(titanic_df, hue="Sex", aspect=4)
fig.map(sns.kdeplot, 'Age', shade=True)

oldest = titanic_df['Age'].max()

# set the x lower limit at 0
fig.set(xlim=(0, oldest))
fig.add_legend()
What I get from this code is a beautiful FacetGrid graph.
So far, I’ve gotten a good picture of the passengers by gender, class, and age, but I haven’t yet broken them down by cabin deck.
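# Drop passengers with no cabin recorded
deck = titanic_df['Cabin'].dropna()

# Keep only the first character of each cabin, which is the deck letter
levels = []
for level in deck:
    levels.append(level[0])

cabin_df = DataFrame(levels)
cabin_df.columns = ['Cabin']
sns.factorplot('Cabin', data=cabin_df, palette='winter_d')

# Drop the odd 'T' deck value and plot again
cabin_df = cabin_df[cabin_df.Cabin != 'T']
sns.factorplot('Cabin', data=cabin_df, palette='summer')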
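The same deck counts can be had without a loop. This is a small sketch using the pandas string accessor, assuming the deck is the first character of the Cabin value. The cabin values and the name sample_df are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the Cabin column, including missing values.
sample_df = pd.DataFrame(
    {'Cabin': ['C85', None, 'E46', 'C123', 'T', None, 'B28']}
)

# Drop missing cabins, then keep only the first character (the deck letter).
deck_letters = sample_df['Cabin'].dropna().str[0]

# Leave out the lone 'T' value, as in the filtered plot.
deck_letters = deck_letters[deck_letters != 'T']

# Passengers per deck, sorted by deck letter.
deck_counts = deck_letters.value_counts().sort_index()
print(deck_counts)
```

On the real data, titanic_df['Cabin'] would take the place of sample_df['Cabin'].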
I wondered whether class, gender, and age affected the number of survivors.
sns.factorplot('Pclass', 'Survived', data=titanic_df)
sns.factorplot('Pclass', 'Survived', hue='person', data=titanic_df)

generations = [10, 20, 40, 60, 80]
sns.lmplot('Age', 'Survived', hue='Pclass', data=titanic_df, palette='winter')
Survival rates for the 3rd class are substantially lower, but considering the previous graphs, it seems that more men were in 3rd class.
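One way to check this reading is to put the survival rate and the share of men side by side for each class. The sketch below does that with groupby on a made-up sample; the rows and the names sample_df, rate_by_class and men_share_by_class are hypothetical, so the output only shows the shape of the check and says nothing about the real passengers.

```python
import pandas as pd

# Hypothetical stand-in for titanic_df after the 'person' column was added.
sample_df = pd.DataFrame({
    'Pclass':   [1, 1, 2, 3, 3, 3, 3, 3],
    'person':   ['female', 'male', 'female', 'male',
                 'male', 'male', 'female', 'child'],
    'Survived': [1, 0, 1, 0, 0, 1, 0, 1],
})

# Survived is 0/1, so the group mean is the survival rate per class.
rate_by_class = sample_df.groupby('Pclass')['Survived'].mean()

# Share of adult men in each class (1 where person is 'male', else 0).
is_man = (sample_df['person'] == 'male').astype(int)
men_share_by_class = is_man.groupby(sample_df['Pclass']).mean()

summary = pd.DataFrame({
    'survival_rate': rate_by_class,
    'men_share': men_share_by_class,
})
print(summary)
```

If the class with the lowest survival rate also has the highest share of men, gender may explain part of the gap between classes.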
So far I have followed the instructions in Jose’s data visualisation lecture, and Python’s libraries pretty much covered what I wanted to see.
Later, I will practice stock market analysis, following the next part of Jose’s data visualisation materials.