Activity 1: Basic usage of Data Visualization

 

In this worksheet, we will learn:

  1. How to import a library
  2. The basics of reading data from a dataset file
  3. The basics of plotting data (bar chart, histogram and scatter plot)

 

Lesson 1: Importing a library and plotting your own data

 

First, we need to import a plotting library and give it some data. We then plot a graph of that data using a few functions.

 

“.plot” adds data to the graph.

“.title” sets the title of the whole graph.

“.xlabel” names the x-axis, which shows the independent variable.

“.ylabel” names the y-axis, which shows the dependent variable.

 

You can pass as many x and y values as you like to experiment.
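The four functions above can be sketched as follows. This is a minimal example that assumes Matplotlib is installed and imported under the usual alias plt; the x and y values and the label texts are made up for illustration.

```python
import matplotlib.pyplot as plt

# Made-up sample values; replace them with your own numbers.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.plot(x, y)               # add the data to the graph
plt.title("My first graph")  # title of the whole graph
plt.xlabel("x values")       # name of the x-axis
plt.ylabel("y values")       # name of the y-axis
```

In a Jupyter notebook the graph appears under the cell. In a plain Python script, add plt.show() as the last line to open the graph window.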

           

  

 

Lesson 2: Reading data from an existing file (.csv)

 

Now, let’s read some data from a comma-separated values (CSV) file, a common format for storing tabular data.

 

Before reading data from the file, download it from the link below and move it into the same folder as the Python file you are writing.

 

The data file contains a list of countries with their population in given years. (Note: You can learn data analysis and visualization with Python from that link.)
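Reading a CSV file with pandas can be sketched as follows. Because the real file comes from the download link, this example reads from an in-memory text that stands in for it. The column names (country, year, population) and every value are invented for illustration and may not match the real file.

```python
import io
import pandas as pd

# Stand-in for the downloaded file: invented columns and values.
csv_text = """country,year,population
Country A,2000,100
Country A,2010,120
Country B,2000,250
Country B,2010,240
"""

# pd.read_csv accepts a file name or a file-like object.
dataset = pd.read_csv(io.StringIO(csv_text))
print(dataset)
```

With the real file, you would pass its name instead, for example pd.read_csv("countries.csv"), where "countries.csv" is a hypothetical file name.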

 

 

 

 

 

 

 

If you want to read one specific column, use dataset["columnname"] or dataset.columnname

 

You can specify the number of rows you want to read from the top by using .head(numberOfLines)
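Selecting a column and reading the first rows can be sketched as follows. The small table is an invented stand-in for the real dataset, and its column names are assumptions.

```python
import pandas as pd

# Invented stand-in data; the column names are assumptions.
dataset = pd.DataFrame({
    "country": ["Country A", "Country B", "Country C", "Country D"],
    "population": [100, 250, 75, 310],
})

populations = dataset["population"]  # one column (same as dataset.population)
first_two = dataset.head(2)          # the first 2 rows

print(populations)
print(first_two)
```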

 

 

 

 

Or read just a random sample of the data by using .sample(numberOfLines)

 

 

 

To get the last rows of the data, use .tail(numberOfLines)
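Taking a random sample and reading the last rows can be sketched with the pandas methods sample and tail. The table is again an invented stand-in for the real dataset.

```python
import pandas as pd

# Invented stand-in data; the column names are assumptions.
dataset = pd.DataFrame({
    "country": ["Country A", "Country B", "Country C", "Country D"],
    "population": [100, 250, 75, 310],
})

random_rows = dataset.sample(2)  # 2 rows picked at random
last_two = dataset.tail(2)       # the last 2 rows

print(random_rows)
print(last_two)
```

The rows returned by sample change each time you run the code, because they are chosen at random.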

 

 

 

 

 

 

 

 

 

 

 

Lesson 3: Bar charts, Histograms and Scatter plots (using Matplotlib)

 

 

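To create a bar chart with the Matplotlib library, we use the function

 .bar(x, height, label = “ ”, color = “ ”) .

 

x gives the position (or name) of each bar, height gives its height, and label names the data for the legend.

The color argument sets the color of the bars (g for green, r for red, y for yellow, b for blue, and so on). If you leave it out, Matplotlib uses its default blue.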
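A bar chart can be sketched as follows. The names and scores are invented for illustration.

```python
import matplotlib.pyplot as plt

# Invented example values.
names = ["A", "B", "C"]
scores = [5, 9, 3]

plt.bar(names, scores, label="Scores", color="g")  # green bars
plt.legend()                                       # show the label
plt.title("Bar chart example")
```

Try changing color to "r" or "y" to see the difference.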

 

 

 

 

 

To draw a histogram, we use the function .hist(x, bins, histtype, rwidth)

x is the data that you want to show in the histogram, and bins is the number of intervals (or a list of interval edges) the data is grouped into.

histtype is the type of histogram you want to use; in this example, we will just go with ‘bar’.

rwidth is the width of each bar as a fraction of the interval width.
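A histogram can be sketched as follows. The ages are invented, and the interval edges are chosen by hand for this example.

```python
import matplotlib.pyplot as plt

# Invented example values.
ages = [22, 25, 31, 35, 38, 41, 44, 47, 52, 58, 63, 67]
bins = [20, 30, 40, 50, 60, 70]  # edges of the intervals

# rwidth=0.8 makes each bar 80% as wide as its interval.
counts, edges, bars = plt.hist(ages, bins, histtype="bar", rwidth=0.8)
plt.xlabel("Age")
plt.ylabel("Number of people")
```

The returned counts tell you how many values fell into each interval, which is exactly what the height of each bar shows.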

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Scatter plots are used to compare 2 or 3 variables, depending on how many variables your dataset has.

 

To draw a scatter plot, use the function

 .scatter(x, y, label = “ ”, color = “ ”, s = size, marker = “ ”)

 

 

 

label names the data for the legend.

color sets the color of the points.

s sets the size of the points.

marker sets the symbol drawn for each point.

 

(There are a number of different symbols, for example, “s” for square, “h” for hexagon, “8” for octagon, and so on.)
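A scatter plot using all four arguments can be sketched as follows. The study hours and marks are invented for illustration.

```python
import matplotlib.pyplot as plt

# Invented example values.
hours = [1, 2, 3, 4, 5]
marks = [52, 58, 65, 71, 80]

# Red squares of size 60.
points = plt.scatter(hours, marks, label="Students", color="r", s=60, marker="s")
plt.legend()
plt.xlabel("Hours studied")
plt.ylabel("Marks")
```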

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


 

 

 

Activity 2: Application to online datasets

 

In this activity, you will learn:

  1. How to visualize a large set of data from the internet
  2. The differences between using 2 libraries (Matplotlib & Pandas)

 

 

First, download the datasets from the links below.

 

For Iris Dataset,

https://archive.ics.uci.edu/ml/datasets/iris

 

For Wine Dataset,

https://www.kaggle.com/zynicide/wine-reviews [1]

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

How to download the “iris.csv” dataset?

 

First, go to the Iris webpage provided above. Then click “Data Folder”.

 


 

 

 

 

 

 

 

 

 

 

 

Second, click “Iris.data” to download.

 


 

 

 

 

Then rename the file from the “.data” format to the “.csv” format in Jupyter Notebook.

(The data file should be in the same folder as the Python file you are going to write.)

 


 

 

 

 

 

Step 1. Import the datasets and libraries

 

To import the datasets and read them, we use the pandas library.

 

 

 

You will see the data from the dataset, arranged in columns and rows.

Then do the same for the second dataset.
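Loading both datasets can be sketched as follows. The example reads from short in-memory texts that stand in for the downloaded files, and their values are illustrative only. It assumes that the Iris file has no header row (so we supply column names ourselves, and the names used here are our own choice) and that the first column of the wine file is a row number.

```python
import io
import pandas as pd

# Stand-ins for the downloaded files; the values are illustrative only.
iris_text = """5.1,3.5,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
"""
wine_text = """,country,points,price
0,Italy,87,20.0
1,Portugal,90,15.0
2,US,87,14.0
"""

# The Iris file has no header row, so we name the columns ourselves.
iris_columns = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class"]
iris = pd.read_csv(io.StringIO(iris_text), names=iris_columns)

# index_col=0 uses the first column (the row number) as the index.
wine_reviews = pd.read_csv(io.StringIO(wine_text), index_col=0)

print(iris.head())
print(wine_reviews.head())
```

With the real files, replace io.StringIO(...) with the file name, for example pd.read_csv("iris.csv", names=iris_columns).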

 

 

 

 

 

 

 

Step 2. Let’s start plotting the data

 

The types of graphs we are going to apply to the datasets are:

  1. Bar chart
  2. Histogram
  3. Scatter plot
  4. Line chart (Don’t worry, you will get this eventually)

 

 

We will create each of these graphs with both libraries. Each graph can be created with either library on its own, so you will see how the two approaches differ.

 

 

1.1  Bar Chart (using Matplotlib)

 

      Before plotting the data, we have to create a figure by using the plt.subplots() method.

      Then we take the ‘points’ column from the dataset, count how often each value occurs, and store the result in 2 variables, points and frequency. Finally, we show them in a bar chart by using the bar method.
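These steps can be sketched as follows. The small wine_reviews table is an invented stand-in for the real dataset, and counting with value_counts() is one possible way to get the frequencies.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Invented stand-in for the wine-review dataset.
wine_reviews = pd.DataFrame({"points": [87, 88, 87, 90, 88, 87, 91]})

fig, ax = plt.subplots()                      # create a figure and an axis
data = wine_reviews["points"].value_counts()  # how often each score occurs
points = data.index                           # the different scores
frequency = data.values                       # how often each one occurs

ax.bar(points, frequency)
ax.set_title("Wine Review Scores")
ax.set_xlabel("Points")
ax.set_ylabel("Frequency")
```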

 

 

 

 

 

 

 

 

 

1.2  Scatter Plot (using Matplotlib)

 

To draw a scatter plot with Matplotlib, we use the scatter method. We first create a figure and an axis with plt.subplots() so that we can give our plot a title and axis labels.
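A minimal sketch of this, using a few hand-typed rows in the style of the Iris data as a stand-in for the real file. The column names are assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hand-typed stand-in rows; the column names are assumptions.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
})

fig, ax = plt.subplots()  # create a figure and an axis
ax.scatter(iris["sepal_length"], iris["sepal_width"])
ax.set_title("Iris Dataset")
ax.set_xlabel("sepal_length")
ax.set_ylabel("sepal_width")
```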

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

1.3  Line Chart (using Matplotlib)

 

In Matplotlib, we can create a line chart by calling the plot method. We can also plot multiple columns in one graph by looping over the columns we want and plotting each column on the same axis.
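Looping over the columns can be sketched as follows. The table holds a few hand-typed rows in the style of the Iris data, and the column names are assumptions.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hand-typed stand-in rows; the column names are assumptions.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "class": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
})

columns = iris.columns.drop("class")  # keep only the numeric columns
x_data = range(len(iris))             # row numbers for the x-axis

fig, ax = plt.subplots()
for column in columns:                # one line per column, same axis
    ax.plot(x_data, iris[column], label=column)
ax.set_title("Iris Dataset")
ax.legend()
```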

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

1.4  Histogram (using Matplotlib)

 

A histogram can be created using the hist method. If you pass it data like the points from the wine-review dataset, it will count how many values fall into each interval (bin).
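A minimal sketch of this, with an invented stand-in for the wine-review dataset. When no number of bins is given, hist uses 10.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Invented stand-in for the wine-review dataset.
wine_reviews = pd.DataFrame({"points": [87, 88, 87, 90, 88, 87, 91]})

fig, ax = plt.subplots()
counts, edges, bars = ax.hist(wine_reviews["points"])  # 10 bins by default
ax.set_title("Wine Review Scores")
ax.set_xlabel("Points")
ax.set_ylabel("Frequency")
```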

 

 

 

 

 

 

 

 

 

 

 

 

2.1  Bar Chart (using Pandas)

 

Before plotting a bar chart, we have to prepare our data. Use the value_counts() method to count how often each value occurs, and then sort the result by value from smallest to largest using sort_index().

To plot the bar chart, we can use the plot.bar() method.
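These two steps can be sketched as follows, again with an invented stand-in for the wine-review dataset.

```python
import pandas as pd

# Invented stand-in for the wine-review dataset.
wine_reviews = pd.DataFrame({"points": [87, 88, 87, 90, 88, 87, 91]})

# Count each score, then sort by score from smallest to largest.
counts = wine_reviews["points"].value_counts().sort_index()
ax = counts.plot.bar()
ax.set_xlabel("Points")
ax.set_ylabel("Frequency")
```

Compared with the Matplotlib version, we do not need to create the figure ourselves; Pandas does it for us and returns the axis.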

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

2.2  Scatter Plot (using Pandas)

 

To create a scatter plot in Pandas, we can call the <dataset>.plot.scatter() method and pass three arguments: the name of the x column, the name of the y column, and the title of the graph.
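A minimal sketch of this call, using a few hand-typed rows in the style of the Iris data. The column names are assumptions.

```python
import pandas as pd

# Hand-typed stand-in rows; the column names are assumptions.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
})

ax = iris.plot.scatter(x="sepal_length", y="sepal_width", title="Iris Dataset")
```

Pandas labels the axes with the column names automatically.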

           

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

2.3  Line Chart (using Pandas)

 

Plotting a line chart in Pandas is a bit easier than in Matplotlib. You use the <dataframe>.plot.line() method. In Matplotlib we needed to loop through each column we wanted to plot; in Pandas we don’t need to do this, because it automatically plots all available numeric columns (unless we specify particular columns).
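A minimal sketch of the Pandas version, with hand-typed stand-in rows. Here the text column is dropped explicitly first, which is an extra step chosen for this example so that only numeric columns remain.

```python
import pandas as pd

# Hand-typed stand-in rows; the column names are assumptions.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "class": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
})

# One call draws a line for every remaining column; no loop is needed.
ax = iris.drop(columns="class").plot.line(title="Iris Dataset")
```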

 

 

 

2.4  Histogram (using Pandas)

 

In Pandas, we can create a histogram with the plot.hist method. There aren’t any required arguments, but we can optionally pass some, such as the number of bins.
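A minimal sketch of a single histogram, with an invented stand-in for the wine-review dataset and 5 bins chosen for this example.

```python
import pandas as pd

# Invented stand-in for the wine-review dataset.
wine_reviews = pd.DataFrame({"points": [87, 88, 87, 90, 88, 87, 91]})

ax = wine_reviews["points"].plot.hist(bins=5)  # group the scores into 5 bins
ax.set_xlabel("Points")
```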

           

 

 

 

It’s also easy to plot multiple histograms at a time with the Pandas library, by calling .plot.hist(subplots=True, layout=(x, y), figsize=(10,10), bins=20) on a dataframe. Here layout=(x, y) gives the number of rows and columns of small graphs.
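Plotting several histograms at once can be sketched as follows, with hand-typed stand-in rows in the style of the Iris data. The text column is dropped first so that each of the four numeric columns gets its own small graph in a 2 by 2 grid.

```python
import pandas as pd

# Hand-typed stand-in rows; the column names are assumptions.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal_width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal_length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal_width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "class": ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"],
})

numeric = iris.drop(columns="class")  # keep only the numeric columns
axes = numeric.plot.hist(subplots=True, layout=(2, 2), figsize=(10, 10), bins=20)
```

The result is a grid of axes, one per column, so you can still change each small graph separately.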

 


[1] The instructor should have this dataset file and share it with the students. The platform requires an email address the first time you use it.