
Importance of Statistics for Data Science | Statistics and Data Science

In the modern world, everything has become data-driven. The amount of data produced every second in the world multiplies into terabytes, which means the field of data science has grown at a similar pace. Analyzing such large amounts of data requires capable data scientists too, so it is no surprise that there is a huge demand for great data analysts and that it has become such a lucrative field today.

The main aim of data science is to analyse the unstructured data being produced today, but this is often impossible to do qualitatively – it has to be done quantitatively. After analyzing this data, organisations need to obtain real insights about their customers and their needs, so that these insights can be translated into proper business value quickly. Therefore, the onus is on data scientists to carry out their analyses properly, so as to improve and optimise the way business is conducted. Organisations in a variety of fields, ranging from health care to entertainment, currently follow this model.

Data scientists must have a deep understanding of statistical concepts in order to carry out quantitative analysis on the available data. Therefore, they must learn statistics for data science to be successful – this is a given. However, there are a lot of statistics for data science tutorials available online, and the ones by Acadgild are comprehensive enough to provide you with a thorough understanding of what is discussed here.

Let us take a look at some statistical concepts that every data scientist must know to make their job easier.

Linear Regression

Linear regression is a linchpin of statistics and is used to predict the value of a variable based on the values of the other variables present in the analysis. This is done by fitting the best linear relationship to the scatter plot of the values of two variables – the dependent and the independent one. The best fit is obtained by ensuring that the sum of the squared vertical distances between the fitted line and the data points is as small as possible.

There are two types of linear regressions – simple and multiple. In the former, there are only two variables used – a dependent one and an independent one. In the latter, more than one independent variable is used in a bid to predict the value of the dependent variable more accurately.
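For illustration, a simple linear regression can be fitted in a few lines of plain Python. The least-squares formulas below are the standard ones, and the small data set is invented for demonstration.

```python
# Simple linear regression by ordinary least squares (stdlib only).
def fit_line(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Slope that minimizes the sum of squared vertical distances.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

x = [1, 2, 3, 4, 5]                 # independent variable
y = [3.1, 4.9, 7.2, 9.0, 10.8]      # dependent variable, roughly y = 2x + 1
slope, intercept = fit_line(x, y)
print(slope, intercept)
```

The fitted slope and intercept come out close to the underlying 2 and 1 of the toy data, despite the noise in the observations.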

Classification

This is a general term that refers to data mining methods which categorize the available data in order to obtain correct and accurate analyses and predictions from it. Common classification methods include Decision Trees, Logistic Regression, and Discriminant Analysis. For more information on these statistical methods, you should check out the course by Acadgild.
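As a sketch of one such method, here is a minimal logistic regression classifier written from scratch in plain Python. The tiny one-dimensional data set, learning rate, and epoch count are illustrative choices, not anything prescribed by a particular course or library.

```python
import math

# Minimal logistic regression trained by stochastic gradient descent.
# Labels are 0/1; the model learns P(y = 1 | x) = sigmoid(w*x + b).
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(xs, ys, lr=0.5, epochs=2000):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            # Gradient of the log-loss for a single example.
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b

xs = [0.0, 0.5, 1.0, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]             # class boundary around x = 2
w, b = train(xs, ys)
predict = lambda x: 1 if sigmoid(w * x + b) >= 0.5 else 0
print([predict(v) for v in xs])
```

After training, the learned decision boundary sits between the two clusters, so all six training points are classified correctly.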

Resampling Methods

In this method, samples are drawn repeatedly from the original data to build up a sampling distribution that reflects the actual data set. This is usually done when the data set is far too large to be analysed entirely, as is the case in most big data analysis. The estimates obtained from this method are unbiased, as they come from samples drawn across all the possible results of the data that the researcher has.
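The bootstrap, one of the most common resampling methods, can be sketched in a few lines. The data values, resample count, and seed below are all illustrative.

```python
import random
import statistics

# Bootstrap: resample the observed data with replacement many times to
# approximate the sampling distribution of a statistic (here, the mean).
def bootstrap_means(data, n_resamples=5000, seed=42):
    rng = random.Random(seed)
    return [statistics.mean(rng.choices(data, k=len(data)))
            for _ in range(n_resamples)]

data = [4.2, 5.1, 3.8, 6.0, 5.5, 4.9, 5.2, 4.4]
means = bootstrap_means(data)
means.sort()
# A rough 95% interval from the 2.5th and 97.5th percentiles.
lo, hi = means[int(0.025 * len(means))], means[int(0.975 * len(means))]
print(lo, hi)
```

Sorting the resampled means and reading off the 2.5th and 97.5th percentiles gives a rough 95% confidence interval for the mean without any distributional assumptions.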

In order to learn more about basic statistics for data science, the best thing to do would be to enroll for an online course and complete it. Acadgild offers high quality and highly rated data science courses which can put you on your way to a successful career as a data scientist.


China’s ‘Big Brother’ surveillance technology isn’t nearly as all-seeing as the government wants you to think

Harrison Jacobs Jul. 15, 2018, 9:01 AM

It’s definitely impressive — but it’s not quite “Big Brother” yet.
  • The Chinese government is working to combine its 170+ million security cameras with artificial intelligence and facial recognition technology to create a vast surveillance state.
  • The government isn’t quiet about its efforts, often playing up technological successes in state media to convince the populace of its impressive capabilities.
  • But recent reports suggest that the technology is not as ubiquitous or useful as the government wants its citizens to believe.
  • For example, an exec at one of the main Chinese artificial intelligence startups powering the surveillance technology told Business Insider that its platform cannot handle searching for more than 1,000 people at a time due to technological limitations.

The Chinese government is working to create a techno-authoritarian state powered by artificial intelligence and facial recognition to track and monitor its 1.4 billion citizens.

The government has big plans for a ubiquitous surveillance network, which has led the country to become the biggest market in the world for video surveillance – worth $6.4 billion in 2016, according to estimates from IHS Markit Ltd. China already has 170 million security cameras in use for its so-called Skynet surveillance system, with 400 million more on the way in the coming years.

Far from hiding its wide-reaching abilities from the public, the government has frequently touted its high-tech surveillance successes in recent months.


Last September, the English-language state newspaper China Daily touted how police in Qingdao used facial recognition technology to catch 25 would-be criminals. In March, Beijing police began using facial recognition and AI-powered glasses to catch criminals – just a couple of months after police in Henan and Zhengzhou began testing the glasses at train stations.

In Xiangyang, a giant screen was set up over a crosswalk to display the names and faces of jaywalkers and other lawbreakers that cameras caught at the intersection. And in December, China demonstrated its sophisticated “Skynet” system by having it track down a BBC reporter in just 7 minutes.

But all of these successes belie a simple reality: the surveillance tech is not nearly as pervasive or effective as the government or the media purports it to be.

Face++ isn’t all-powerful yet

The entrance to Megvii’s offices is managed by its Face++ software.

On a recent visit to the offices of Megvii, a leading artificial intelligence startup and one of the main providers behind the facial recognition tech used by Chinese police, I met with Xie Yinan, the company’s vice president.

Despite notions that Chinese police’s facial recognition capabilities can track down anyone, anywhere, that’s simply not what the technology is capable of, according to Xie.

He said Megvii’s Face++ platform, which numerous police departments in China have used to help them arrest 4,000 people since 2016, has serious technological limitations.

For example, even if China had facial scans of every one of its citizens uploaded to its system, it would be impossible to identify everyone passing in front of a Face++-linked camera. While the Face++ algorithm is more than 97% accurate, it can only search a limited number of faces at a time.

In order to work, police would have to upload the faces they want to track to a local server at the train station or command center where they intend to look. Face++ would then use its algorithm to match those faces to the ones it encounters in the real world.
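The workflow described above can be made concrete with a toy sketch of watchlist matching by embedding similarity. The names, vectors, and threshold are entirely made up; real systems like Face++ compare high-dimensional learned face embeddings, not three-number lists, but the core idea is nearest-neighbour search against a bounded local watchlist.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

watchlist = {                       # person -> embedding (hypothetical)
    "suspect_a": [0.9, 0.1, 0.2],
    "suspect_b": [0.1, 0.8, 0.3],
}

def match(embedding, threshold=0.95):
    # Return the best watchlist hit, or None if nothing is close enough.
    best = max(watchlist, key=lambda name: cosine(watchlist[name], embedding))
    score = cosine(watchlist[best], embedding)
    return best if score >= threshold else None

print(match([0.88, 0.12, 0.21]))    # close to suspect_a's embedding
print(match([0.5, 0.5, 0.5]))       # resembles no one on the watchlist
```

Because every probe face is compared against the whole watchlist, cost grows with watchlist size, which is one intuition for why a cap of roughly 1,000 tracked faces arises without heavier hardware.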


Xie said it wouldn’t be feasible to have the system search for more than 1,000 faces at a time — the data and processing power required for an operation larger than that would require a supercomputer. Plus, Xie said they can’t run the system 24/7 today. It’s the kind of thing police will have to activate proactively when a situation is underway.

While it is possible that the system could be connected to a supercomputer over the cloud to amplify computing power, it would be too dangerous from a security perspective. The system has to stay offline and local.

When I asked whether Xie or the company have any concerns over how police could misuse the Face++ platform, he essentially said it’s up to the government to write the legal framework on when and how law enforcement can use it.

“We don’t have access to the data,” he said. “What we do is sell them a server [loaded with Face++]. That’s all.”

Exaggerating technological advancements

Facial recognition isn’t the only area where China’s techno-authoritarian capabilities have been exaggerated, by both the media and the government.

At the crosswalk in Xiangyang, there is a 5- to 6-day delay between when someone commits crime and when their face appears on the billboard. Local officials told The New York Times that humans, not an algorithm, look through the photos the crosswalk camera captures to match them with people’s identities.

Meanwhile, the smart glasses police are using in Beijing and Zhengzhou only work if a target stands still for several seconds. They are used less to spot criminals than to verify travelers’ identities.

Police in China monitor feeds from vast numbers of surveillance cameras at central command hubs.

But, in some ways, it hardly matters. Those nuances are often lost on the public, particularly when state media has gone to such lengths to convince its populace of its technological prowess.

In Zhengzhou, a heroin smuggler confessed after police showed the suspect their smart glasses and said it could incriminate him, The Times recently reported.

“The whole point is that people don’t know if they’re being monitored, and that uncertainty makes people more obedient,” Martin Chorzempa, a fellow at the Peterson Institute for International Economics, told The Times.

Of course, it’s likely only a matter of time before the technology gets better. The Chinese government and the country’s tech investors are pouring money into facial recognition startups like Megvii.

Megvii raised $460 million last November, much of which came from a state-owned venture fund. While the valuation hasn’t been disclosed, it is likely close to or above $2 billion. Smaller Chinese competitors include DeepGlint and Yitu Technology, the latter of which raised $380 million last year.


The Why of Data Visualization in Analytics – Can we do without Data Visualization?

The CEO of one popular company once proclaimed that executives today are like children; they like to see their reports in the form of pictures. A picture is, after all, worth a thousand words! Satire apart, as per recent research, the brain comprehends creative work best when it is tired. That is to say, the brain thinks and perceives through its right half more clearly when the left half has switched off! No doubt even Archimedes was using his right brain when he came up with his famous principle in a bathtub!! The popularly acclaimed “eureka, eureka” moment happened when his left brain had almost dozed off!! Data scientists have observed this phenomenon vividly while studying huge amounts of data, or Big Data, and trying to come up with meaningful reports and crisp, intelligible dashboards. They now use the concept of Data Visualization in Analytics, which appeals more to the right brain for easy comprehension even when the left hemisphere of the brain is overworked!

What is Data Visualization?
By definition, Data Visualization is the visual communication of information that has been abstracted in some schematic form. It is a feature incorporated in Business Analytics to enable executives to better understand the reports extracted out of tons of data by building a visual context around them. To put it simply, Data Visualization makes complicated data accessible, intelligible, and usable.

Importance of Data Visualization:
As stated by Michael Dell, Big Data Analytics is the next trillion-dollar market, as most organizations have to manage huge amounts of data. Hence they need tools to not only analyze it but also churn out graphical dashboard summaries for quick analysis. Scrolling through table-based summaries derived from Big Data can cause an information overload. This is precisely the stage where Data Visualization steps in and makes the information more comprehensible through diagrammatic shapes, sizes, and the varying thickness of lines. The graphic nature of the dashboards and reports enables businesses to act on the data more quickly than table-based reports allow – the ‘analytics part’ and the ‘visualization part’ actually go hand-in-hand.

The data-to-intelligence journey gets complete only with Data Visualization. Visual sense is the most acute of the five senses. As per research, human beings readily grasp differences in colors, hues, length, width, shape, and orientation without much effort. Hence, communicating a complicated data set in an intuitive manner becomes easy with Data Visualization.

The feature of Data Visualization clearly brings forth patterns and correlations that may be deeply buried in the data and unnoticeable in table-based reports. The images and charts also include ‘interactive capability’, which enables an executive to drill down into the underlying analytics tables for deeper analysis.
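The contrast with table-based reports can be felt even with a deliberately crude text chart: the April spike below is easy to miss in a flat list of numbers but jumps out the moment the values are drawn as bars. The sales figures are made up for illustration.

```python
# A deliberately crude "visualization": numbers rendered as bars of '#'.
sales = {"Jan": 12, "Feb": 14, "Mar": 15, "Apr": 31, "May": 16, "Jun": 13}
lines = [f"{month} {'#' * value} ({value})" for month, value in sales.items()]
print("\n".join(lines))
```

Real dashboarding tools do far more, of course, but the principle is the same: encoding magnitude as length lets the eye do the comparison.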

Today, BI Analytics tools have also made it possible to leverage current IT investments, streamline existing data in any data source, and use plug-and-play interfaces to derive colorful dashboards. Tableau, Netezza, nSights, and Cognos are some of the products that are making headway in the market.

Now, it is also possible to host the BI Analytics and Data Visualization framework on a mobility platform. This arrangement not only allows executives to generate crisp and colorful dashboards at the click of a button but also lets them tap deeper into their business intelligence while on the go, on their handheld devices.

Data Visualization is indispensable, especially when it comes to Analytics of Big Data. However, Data Visualization alone does not produce any valuable insights of its own – that is the role of analytics. Data Visualization just summarizes and conveys the analytics that are produced, in a pictorial summary with added features for drill-down. To make the journey of Data Analytics and Data Visualization more effective, effort has to be put into ‘how the data is collected’ and ‘what data is collected’. In the absence of such effort, executives may waste time analyzing and finding meaning in their existing data in different ways without really getting to the bigger picture. Thus a meaningful dataset, Data Analytics, and Data Visualization have to work together in order to gain quick, actionable insights even from Big Data. Only then can the executive relate to the ‘big picture’ and take informed actions.

– Vikram Kole
Vice President
Datamatics Global Services Limited

Data Science vs. Big Data vs. Data Analytics

By Shivam Arora

Data is everywhere. In fact, the amount of digital data that exists is growing at a rapid rate, doubling every two years, and changing the way we live. According to IBM, 2.5 billion gigabytes (GB) of data was generated every day in 2012.

An article by Forbes states that data is growing faster than ever before, and that by the year 2020, about 1.7 megabytes of new information will be created every second for every human being on the planet.

This makes it extremely important to at least know the basics of the field. After all, this is where our future lies.

In this article, we will differentiate between Data Science, Big Data, and Data Analytics, based on what each one is, where it is used, the skills you need to become a professional in the field, and the salary prospects in each field.

Let’s first start off with understanding what these concepts are.

What They Are

Data Science: Dealing with unstructured and structured data, Data Science is a field that comprises everything related to data cleansing, preparation, and analysis.

Data Science is the combination of statistics, mathematics, programming, problem-solving, capturing data in ingenious ways, the ability to look at things differently, and the activity of cleansing, preparing and aligning the data.

In simple terms, it is the umbrella of techniques used when trying to extract insights and information from data.

Big Data: Big Data refers to humongous volumes of data that cannot be processed effectively with the traditional applications that exist. The processing of Big Data begins with the raw data that isn’t aggregated and is most often impossible to store in the memory of a single computer.

A buzzword that is used to describe immense volumes of data, both unstructured and structured, Big Data inundates a business on a day-to-day basis. Big Data is something that can be used to analyze insights which can lead to better decisions and strategic business moves.

The definition of Big Data, given by Gartner is, “Big data is high-volume, and high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation”.
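A standard tactic for data that cannot fit in a single computer's memory is to stream it and keep only running aggregates, rather than loading everything at once. In the sketch below, a generator stands in for a file or network feed far larger than RAM; the choice of statistic and data source is illustrative.

```python
# Stream a data source in one pass, keeping O(1) memory regardless of size.
def running_mean(stream):
    total, count = 0.0, 0
    for value in stream:
        total += value
        count += 1
    return total / count if count else 0.0

# A generator standing in for a source far larger than memory.
mean = running_mean(x % 7 for x in range(1_000_000))
print(mean)
```

The same one-pass pattern underlies distributed frameworks: each worker aggregates its own shard, and the partial totals and counts are combined at the end.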

Data Analytics: Data Analytics is the science of examining raw data with the purpose of drawing conclusions about that information.

Data Analytics involves applying an algorithmic or mechanical process to derive insights – for example, running through a number of data sets to look for meaningful correlations among them.
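That correlation hunt can be sketched in plain Python with a hand-rolled Pearson coefficient. The three columns below are invented for illustration; a real analysis would pull them from a database or files.

```python
import math
from itertools import combinations

def pearson(x, y):
    # Pearson correlation coefficient between two equal-length columns.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

columns = {
    "ad_spend":  [10, 20, 30, 40, 50],
    "visits":    [110, 205, 290, 420, 500],
    "temp_degc": [21, 19, 23, 20, 22],
}
# Score every pair of columns and report the strongest relationship.
pairs = {(a, b): pearson(columns[a], columns[b])
         for a, b in combinations(columns, 2)}
strongest = max(pairs, key=lambda p: abs(pairs[p]))
print(strongest, round(pairs[strongest], 3))
```

Here ad spend and site visits move almost in lockstep, while temperature is only weakly related to either, which is exactly the kind of lead an analyst would then investigate for causation.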

It is used in a number of industries to allow organizations and companies to make better decisions, as well as to verify or disprove existing theories and models.

The focus of Data Analytics lies in inference, which is the process of deriving conclusions that are solely based on what the researcher already knows.

The Applications of Each Field

Applications of Data Science:

  • Internet search: Search engines make use of data science algorithms to deliver the best results for search queries in a fraction of a second.
  • Digital Advertisements: The entire digital marketing spectrum uses data science algorithms – from display banners to digital billboards. This is the main reason digital ads achieve a higher CTR than traditional advertisements.
  • Recommender systems: Recommender systems not only make it easy to find relevant products among the billions of products available but also add a lot to the user experience. A lot of companies use these systems to promote their products and suggestions in accordance with the user’s demands and relevance of information. The recommendations are based on the user’s previous search results.

Applications of Big Data:

  • Big Data for financial services: Credit card companies, retail banks, private wealth management advisories, insurance firms, venture funds, and institutional investment banks use big data for their financial services. The common problem among them all is the massive amounts of multi-structured data living in multiple disparate systems, a problem big data can solve. Thus big data is used in a number of ways, such as: 
    • Customer analytics
    • Compliance analytics
    • Fraud analytics
    • Operational analytics
  • Big Data in communications: Gaining new subscribers, retaining customers, and expanding within current subscriber bases are top priorities for telecommunication service providers. The solutions to these challenges lie in the ability to combine and analyze the masses of customer-generated data and machine-generated data that is being created every day.
  • Big Data for Retail: Whether a brick-and-mortar store or an online e-tailer, the answer to staying in the game and being competitive is understanding the customer better in order to serve them. This requires the ability to analyze all the disparate data sources that companies deal with every day, including weblogs, customer transaction data, social media, store-branded credit card data, and loyalty program data.

Applications of Data Analysis:

  • Healthcare: The main challenge for hospitals, as cost pressures tighten, is to treat as many patients as they can efficiently, while keeping in mind the improvement of the quality of care. Instrument and machine data is increasingly being used to track and optimize patient flow, treatment, and equipment use in hospitals. It is estimated that a 1% efficiency gain could yield more than $63 billion in global healthcare savings.
  • Travel: Data analytics can optimize the buying experience through mobile/weblog and social media data analysis. Travel sites can gain insights into the customer’s desires and preferences. Products can be up-sold by correlating current sales to subsequent browsing, increasing browse-to-buy conversions via customized packages and offers. Personalized travel recommendations can also be delivered by data analytics based on social media data.
  • Gaming: Data Analytics helps game companies collect data to optimize spending within as well as across games. Companies gain insight into the likes, dislikes, and relationships of their users.
  • Energy Management: Many firms use data analytics for energy management, including smart-grid management, energy optimization, energy distribution, and building automation in utility companies. The applications here center on controlling and monitoring network devices, dispatching crews, and managing service outages. Utilities gain the ability to integrate millions of data points on network performance, and analytics lets engineers monitor the network.

The Skills you Require

To become a Data Scientist:

  • Education: 88% have a Master’s Degree and 46% have PhDs
  • In-depth knowledge of SAS and/or R: For Data Science, R is generally preferred.
  • Python coding: Python is the most common coding language that is used in data science along with Java, Perl, C/C++.
  • Hadoop platform: Although not always a requirement, knowing the Hadoop platform is still preferred for the field. Having a bit of experience in Hive or Pig is also a huge selling point.
  • SQL database/coding: Though NoSQL and Hadoop have become a major part of the Data Science background, it is still preferred if you can write and execute complex queries in SQL.
  • Working with unstructured data: It is most important that a Data Scientist is able to work with unstructured data be it on social media, video feeds, or audio.
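The SQL bullet above can be made concrete with Python's built-in sqlite3 module: an in-memory database, a couple of inserts, and an aggregate query. The table and figures are invented for illustration.

```python
import sqlite3

# In-memory SQLite database: create, populate, and query with plain SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("alice", 30.0), ("bob", 12.5), ("alice", 7.5)])
# Total spend per customer, highest first.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total "
    "FROM orders GROUP BY CUSTOMER ORDER BY total DESC").fetchall()
print(rows)
conn.close()
```

Interviewers typically probe beyond this – joins, subqueries, window functions – but the GROUP BY/aggregate pattern shown here is the backbone of most analytical SQL.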

To become a Big Data professional:

  • Analytical skills: The ability to make sense of the piles of data that you get. With analytical abilities, you will be able to determine which data is relevant to your solution – it is much like problem-solving.
  • Creativity: You need to have the ability to create new methods to gather, interpret, and analyze data as part of a data strategy. This is an extremely valuable skill to possess.
  • Mathematics and statistical skills: Good, old-fashioned “number crunching”. This is extremely necessary, be it in data science, data analytics, or big data.
  • Computer science: Computers are the workhorses behind every data strategy. Programmers will have a constant need to come up with algorithms to process data into insights.
  • Business skills: Big Data professionals will need to have an understanding of the business objectives that are in place, as well as the underlying processes that drive the growth of the business as well as its profit.

To become a Data Analyst:

  • Programming skills: Knowledge of programming languages such as R and Python is extremely important for any data analyst.
  • Statistical skills and mathematics: Descriptive and inferential statistics and experimental design are a must for data analysts.
  • Machine learning skills
  • Data wrangling skills: The ability to map raw data and convert it into another format that allows for a more convenient consumption of the data.
  • Communication and Data Visualization skills
  • Data Intuition: It is extremely important for a professional to be able to think like a data analyst.
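The data-wrangling bullet above can be illustrated in a few lines: messy raw CSV text is mapped into clean, typed records using only the standard library. The names and figures are made up for demonstration.

```python
import csv
import io

# Raw text with stray whitespace and inconsistent capitalization.
raw = """name,salary,city
 Alice ,123000,Boston
Bob, 60476 , austin
"""

# Wrangle: parse the CSV, then normalize each field into a typed record.
reader = csv.DictReader(io.StringIO(raw))
records = [
    {
        "name": row["name"].strip().title(),
        "salary": int(row["salary"].strip()),
        "city": row["city"].strip().title(),
    }
    for row in reader
]
print(records)
```

In practice the same mapping step is done at scale with tools like pandas or Spark, but the shape of the work – parse, clean, retype, emit a convenient format – is identical.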

Now let’s talk about salaries!

Though in the same domain, each of these professionals, data scientists, big data specialists, and data analysts, earn varied salaries.

The average salary a data scientist earns today, according to Indeed.com, is $123,000 a year. According to Glassdoor, the average salary for a Data Scientist is $113,436 per year.

The average salary of a Big Data specialist according to Glassdoor is $62,066 per year.

The average salary for a data analyst according to Glassdoor is $60,476 per year.

Now that you know the differences, which one do you think is most suited for you – Data Science? Big Data? Or Data Analytics?


Academic Service to the Office of the National Digital Economy and Society Commission (ONDE)

(Updated!!)

November 2018, the final report (in both Thai and English) was submitted to ONDE. This project has been carried out successfully.

May 2018, a workshop on Reviewing and Enhancing the ASEAN ICT Skills Standard is to be hosted on 2-3 July 2018, with an aim to consider a skill standard in 3 areas (Big Data, Social Business, and Internet of Things) and to discuss an approach to promote the use of the ICT standard within the region.

March 2018, Thanachai Thumthawatworn (a member of the IDL Laboratory), on behalf of Assumption University, signed a project contract entitled “Reviewing and Enhancing ASEAN ICT Skills Standard” with the ONDE, with an aim to review and develop an ICT skills standard for ASEAN countries. Details can be found here.

Academic service to the Office of the University Registrar (Assumption University)

March 2018, IDL presented data analytics results (based on AU’s student intakes for the past five years) to the University Registrar, Dr. Soonthorn Pibooncharoensit. Details can be found here.

For this academic service, IDL plans to deliver the results every academic semester, so that the Office of the University Registrar can utilize them for its recruitment planning and management.

Big Data with 8 V’s – 5 Things You Need to Know About Big Data

Big data is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processing application software. Data with many cases (rows) offer greater statistical power, while data with higher complexity (more attributes or columns) may lead to a higher false discovery rate.[2] Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy, and data source. Big data was originally associated with three key concepts: volume, variety, and velocity.[3] Other concepts later attributed to big data are veracity (i.e., how much noise is in the data)[4] and value.[5]

https://www.kdnuggets.com/2018/03/5-things-big-data.html

Licence Plate Number Detection

The project aims to assist police officers in enforcing traffic laws. A sample use case is detecting driving in the wrong direction. A camera captures the motion of objects; if an object travels in the wrong direction, the camera snaps a photo and feeds it to the licence plate character recognition subsystem, which extracts the characters of the licence plate.
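The wrong-direction check can be sketched as a simple rule over an object's tracked positions across frames. The coordinates, permitted direction, and frame threshold below are made-up stand-ins for what the real camera subsystem would produce.

```python
# Toy wrong-direction detector over per-frame x positions of one object.
PERMITTED = +1          # traffic must move toward increasing x
TRIGGER_FRAMES = 3      # consecutive wrong-way frames before snapping

def wrong_direction(x_positions):
    wrong = 0
    for prev, curr in zip(x_positions, x_positions[1:]):
        if (curr - prev) * PERMITTED < 0:   # moving against traffic
            wrong += 1
            if wrong >= TRIGGER_FRAMES:
                return True                 # would snap photo, run plate OCR
        else:
            wrong = 0                       # reset on any forward motion
    return False

print(wrong_direction([50, 48, 45, 41, 38]))   # steadily moving backward
print(wrong_direction([10, 14, 13, 17, 22]))   # forward motion with jitter
```

Requiring several consecutive wrong-way frames filters out tracking jitter, so a single noisy measurement does not trigger a false snapshot.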

Project Contributors

  • Prateep Dharaan
  • Roshan Pandey
  • Vitchayut Cheravinich

Interesting Fact


Data Scientist is a hot new role, as is evident when one searches for data scientist openings. The role requires the knowledge to deal with data in ways that eventually add value to the business.