Activity 1: Basic Usage of Data Visualization
In this worksheet, we will learn
First, we need to import a plotting library (Matplotlib) and give it some data. A few functions then draw and label the graph of that data.
“.plot” adds data to the graph that will be shown.
“.title” sets the title of the graph as a whole.
“.xlabel” names the x coordinate, which holds the independent value of the graph.
“.ylabel” names the y coordinate, which holds the dependent value of the graph.
You can pass as many x and y values as you like to experiment.
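The four functions above can be tried out with a short sketch like the one below. The numbers and the label texts are placeholders chosen for illustration only.

```python
import matplotlib.pyplot as plt

# Illustrative values only; replace them with any numbers you like.
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 8, 7]

plt.plot(x, y)               # add the data to the graph
plt.title("My first graph")  # title of the whole graph
plt.xlabel("x values")       # name of the x coordinate
plt.ylabel("y values")       # name of the y coordinate
```

Jupyter Notebook displays the graph when the cell runs. In a plain Python script, finish with plt.show() to open the graph window.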
Now, let’s read some data from a comma-separated values (CSV) file, which is a useful format for storing tabular data.
Before reading data from the file, download it from the link below and move it to the same location as the Python file you are writing.
The data file contains a list of countries with their population in certain years. (Note: You can learn data analysis and visualization with Python from that link.)
If you want to read one specific column, use dataset["columnname"] (or dataset.columnname).
You can specify the number of lines you want to read from the top by using “.head(numberOfLines)”.
Or you can take just a random sample of the data with “.sample(numberOfLines)”.
To get the last lines of the data, use “.tail(numberOfLines)”.
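Here is a minimal sketch of these reading steps with the pandas library. Because the real data file is not reproduced here, the sketch reads a small made-up table from a string. The column names, the values and the file name "data.csv" are all assumptions for illustration.

```python
import io
import pandas as pd

# Stand-in for the downloaded file: placeholder rows, not real population figures.
csv_text = """country,year,population
A,2000,100
A,2010,120
B,2000,200
B,2010,230
C,2000,50
C,2010,55
"""

# With a real file you would pass its name instead, e.g. pd.read_csv("data.csv").
dataset = pd.read_csv(io.StringIO(csv_text))

print(dataset["population"])  # one column only
print(dataset.head(2))        # first 2 rows
print(dataset.sample(2))      # 2 rows picked at random
print(dataset.tail(2))        # last 2 rows
```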
To create a bar chart with the Matplotlib library, we use the function .bar(x, y, label = “ ”).
The color argument sets the color of the bars.
(g for green, r for red, y for yellow, b for blue, and so on. If you give no color, the bars are drawn in blue.)
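A bar chart along these lines might look like the sketch below. The values and the label text are placeholders.

```python
import matplotlib.pyplot as plt

# Placeholder values for illustration.
x = [1, 2, 3, 4]
y = [5, 7, 3, 8]

plt.figure()                                    # start a fresh, empty figure
plt.bar(x, y, label="Example bars", color="g")  # g = green
plt.legend()                                    # show the label text in a legend box
plt.title("Bar chart")
```

Try changing color to "r" or "y" to see the effect.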
To draw a histogram, we use the function .hist(x, bins, histtype, rwidth).
x is the data that you want to show in the histogram, and bins sets how the values are grouped (either a number of bins or a list of bin edges).
histtype is the type of histogram you want to use; in this example, we will just go with ‘bar’.
rwidth is the width of each bar, relative to the width of its bin.
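The sketch below tries the histogram function on a made-up list of ages. Note that in Matplotlib the second argument of hist is the bins (here a list of bin edges). The ages and the edges are assumptions for illustration.

```python
import matplotlib.pyplot as plt

# Placeholder ages for illustration.
ages = [22, 25, 31, 35, 38, 41, 44, 47, 52, 58, 63, 67]
bins = [20, 30, 40, 50, 60, 70]  # edges of the groups

plt.figure()  # start a fresh, empty figure
# hist returns the count per bin, the bin edges and the drawn bars.
counts, edges, bars = plt.hist(ages, bins, histtype="bar", rwidth=0.8)
plt.xlabel("age")
plt.ylabel("count")
```

With rwidth=0.8 each bar fills 80% of its bin, which leaves a small gap between neighbouring bars.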
Scatter plots are used to compare two or three variables, depending on how many variables your dataset has.
To draw a scatter plot, use the function .scatter(x, y, label = “ ”, color = “ ”, s, marker = “ ”).
label is the label of the plotted data.
color is the color of the points.
s is the size of the points.
marker is the symbol used for every point.
(There are a number of different symbols, for example, “s” for square, “h” for hexagon, “8” for octagon, and so on.)
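As a quick sketch of these arguments, the example below draws five red hexagons. The coordinates and the label text are placeholders.

```python
import matplotlib.pyplot as plt

# Placeholder coordinates for illustration.
x = [1, 2, 3, 4, 5]
y = [5, 3, 6, 2, 7]

plt.figure()  # start a fresh, empty figure
points = plt.scatter(x, y, label="Example points", color="r", s=60, marker="h")
plt.legend()  # show the label text in a legend box
```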
Activity 2: Application on Online Datasets
In this activity, you will learn -
First, download the datasets from the links below.
For Iris Dataset,
https://archive.ics.uci.edu/ml/datasets/iris
For Wine Dataset,
https://www.kaggle.com/zynicide/wine-reviews[1]
To get the Iris dataset, go to the Iris webpage that is provided above. Then click “Data Folder”.
Next, click “Iris.data” to download it.
Then change the file extension from “.data” to “.csv” in Jupyter Notebook.
(The data file should be in the same location as the Python file you are going to write in.)
To import the datasets and read them, we have to use the pandas library.
You will see the rows and columns of data from the dataset.
Then do the same for the second dataset.
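A minimal sketch of reading the Iris data with pandas is shown below. The downloaded Iris file has no header row, so we pass our own column names; those names are an assumption, as is the file name "iris.csv". To keep the sketch self-contained, it reads a few rows from a string instead of from the file.

```python
import io
import pandas as pd

# A few rows in the layout of the Iris file (no header row).
iris_text = """5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""

# Column names are our own choice; the file itself does not contain them.
columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class"]

# With the downloaded file, pass its name (for example "iris.csv") instead.
iris = pd.read_csv(io.StringIO(iris_text), names=columns)
print(iris.head())
```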
The types of graphs we are going to apply to the datasets are:
1. Bar chart
2. Histogram
3. Scatter plot
4. Line chart (Don’t worry, you will get this eventually)
We will create these graphs with both libraries, Matplotlib and Pandas. Each graph can be created using only one of them, so you will see how the two approaches differ.
Before plotting the data, we have to create a figure and an axis by using the plt.subplots() method.
Then we take the ‘points’ column from the wine-review dataset, count how often each value occurs, and store the result in 2 variables, points and frequency. We show them in a bar chart by using the bar method.
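These two steps could look like the sketch below. The small table stands in for the wine-review dataset and its scores are placeholders; counting with value_counts() is one possible way to obtain points and frequency.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the wine-review table; the scores are placeholders.
wine_reviews = pd.DataFrame({"points": [88, 90, 88, 87, 90, 88, 92]})

fig, ax = plt.subplots()  # create the figure and one axis

data = wine_reviews["points"].value_counts()
points = data.index       # the distinct scores
frequency = data.values   # how many times each score occurs

ax.bar(points, frequency)
ax.set_title("Wine Review Scores")
ax.set_xlabel("Points")
ax.set_ylabel("Frequency")
```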
To draw a scatter plot using Matplotlib, we can use the scatter method. We create a figure and an axis using plt.subplots so that we can give our plot a title and labels.
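A sketch of this with two Iris measurements follows. The table holds only a few rows in the layout of the Iris data, and the column names are our own assumption.

```python
import matplotlib.pyplot as plt
import pandas as pd

# A few rows in the layout of the Iris data; column names are our own choice.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.3],
    "sepal_width": [3.5, 3.0, 3.2, 3.3],
})

fig, ax = plt.subplots()  # create the figure and one axis
ax.scatter(iris["sepal_length"], iris["sepal_width"])
ax.set_title("Iris Dataset")
ax.set_xlabel("sepal_length")
ax.set_ylabel("sepal_width")
```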
With Matplotlib, we can create a line chart by calling the plot method. We can also plot multiple columns in one graph by looping over the columns we want and plotting each column on the same axis.
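The loop described above can be sketched as follows, again with a few Iris-style rows and assumed column names.

```python
import matplotlib.pyplot as plt
import pandas as pd

# A few rows in the layout of the Iris data; column names are our own choice.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.3],
    "sepal_width": [3.5, 3.0, 3.2, 3.3],
    "petal_length": [1.4, 1.4, 4.7, 6.0],
    "petal_width": [0.2, 0.2, 1.4, 2.5],
})

fig, ax = plt.subplots()

# Draw every column on the same axis, one line per column.
for column in iris.columns:
    ax.plot(iris.index, iris[column], label=column)

ax.set_title("Iris Dataset")
ax.legend()  # tells the lines apart by column name
```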
A histogram can be created using the hist method. If you pass it data like the points from the wine-review dataset, it will group the values into bins and count how many values fall into each bin.
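As a small sketch, the example below passes a placeholder ‘points’ column to hist. The scores are made up and the table only stands in for the wine-review dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for the wine-review table; the scores are placeholders.
wine_reviews = pd.DataFrame({"points": [88, 90, 88, 87, 90, 88, 92]})

fig, ax = plt.subplots()
# hist returns the count per bin, the bin edges and the drawn bars.
counts, edges, bars = ax.hist(wine_reviews["points"])
ax.set_title("Wine Review Scores")
ax.set_xlabel("Points")
ax.set_ylabel("Frequency")
```

When no bins are given, Matplotlib splits the data range into 10 equal bins.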
Before plotting a bar chart with Pandas, we have to get our data first. Use the value_counts() method to count how often each value occurs, and then sort the result from the smallest value to the largest using sort_index().
To plot the bar chart, we can use the plot.bar() method.
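The Pandas version of the bar chart might look like the sketch below. The scores are placeholders standing in for the wine-review dataset.

```python
import pandas as pd

# Stand-in for the wine-review table; the scores are placeholders.
wine_reviews = pd.DataFrame({"points": [88, 90, 88, 87, 90, 88, 92]})

# Count each score, then order the result by score (the index).
counts = wine_reviews["points"].value_counts().sort_index()

ax = counts.plot.bar()  # Pandas draws the bars with Matplotlib behind the scenes
ax.set_title("Wine Review Scores")
```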
To create a scatter plot in Pandas, we can call the <dataset>.plot.scatter() method and pass three arguments: the names of the x-column and the y-column, and the title of the graph.
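A short sketch of this call is given below, with a few Iris-style rows and column names that are our own assumption.

```python
import pandas as pd

# A few rows in the layout of the Iris data; column names are our own choice.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.3],
    "sepal_width": [3.5, 3.0, 3.2, 3.3],
})

# x and y are column names; title is the text shown above the graph.
ax = iris.plot.scatter(x="sepal_length", y="sepal_width", title="Iris Dataset")
```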
Plotting a line chart is a bit easier than in Matplotlib. You have to use the <dataframe>.plot.line() method. While in Matplotlib we needed to loop through each column we wanted to plot, in Pandas we don’t need to do this because it automatically plots all available numeric columns (at least if we don’t specify particular columns).
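The sketch below illustrates this on a few Iris-style rows (column names assumed). The text column "class" is left in on purpose to show that only the numeric columns are drawn.

```python
import pandas as pd

# A few rows in the layout of the Iris data; column names are our own choice.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.3],
    "sepal_width": [3.5, 3.0, 3.2, 3.3],
    "petal_length": [1.4, 1.4, 4.7, 6.0],
    "petal_width": [0.2, 0.2, 1.4, 2.5],
    "class": ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-virginica"],
})

# One call draws a line for each numeric column; "class" is skipped.
ax = iris.plot.line(title="Iris Dataset")
```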
In Pandas, we can create a histogram with the plot.hist method. There aren’t any required arguments, but we can optionally pass some, like the number of bins.
It’s also easy to plot multiple histograms at a time with the Pandas library, by using .plot.hist(subplots=True, layout=(x, y), figsize=(10,10), bins=20), where layout=(x, y) gives the number of rows and columns of small graphs.
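Both histogram calls can be tried with the sketch below. The two small tables stand in for the wine-review and Iris datasets; their values and column names are assumptions for illustration.

```python
import pandas as pd

# Stand-ins for the two datasets; values and column names are placeholders.
wine_reviews = pd.DataFrame({"points": [88, 90, 88, 87, 90, 88, 92]})
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.3],
    "sepal_width": [3.5, 3.0, 3.2, 3.3],
    "petal_length": [1.4, 1.4, 4.7, 6.0],
    "petal_width": [0.2, 0.2, 1.4, 2.5],
})

# A single histogram of one column, grouped into 5 bins.
ax = wine_reviews["points"].plot.hist(bins=5)

# One histogram per column, arranged in a grid of 2 rows and 2 columns.
axes = iris.plot.hist(subplots=True, layout=(2, 2), figsize=(10, 10), bins=20)
```

The layout must have at least as many cells as there are columns to plot; here 2 x 2 fits the four measurements.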