WEBVTT
1
00:00:00.005 --> 00:00:04.002
- Usually when we're faced with a large amount of data
2
00:00:04.002 --> 00:00:08.000
one of the first steps is to compute summary statistics
3
00:00:08.000 --> 00:00:11.000
for the data under consideration.
4
00:00:11.000 --> 00:00:14.009
NumPy helps us do that with a set of functions
5
00:00:14.009 --> 00:00:18.004
for calculating aggregates over NumPy arrays,
6
00:00:18.004 --> 00:00:21.000
which take an array as input
7
00:00:21.000 --> 00:00:24.009
and, by default, return a scalar as output.
8
00:00:24.009 --> 00:00:29.002
These include statistics such as the average and standard deviation,
9
00:00:29.002 --> 00:00:31.003
as well as functions for calculating the sum
10
00:00:31.003 --> 00:00:33.008
and the product of the elements in an array.
11
00:00:33.008 --> 00:00:38.004
Let's jump into our first example and see how we can compute
12
00:00:38.004 --> 00:00:41.002
the sum of all values in an array.
13
00:00:41.002 --> 00:00:44.008
There is a simple function to achieve this called sum.
14
00:00:44.008 --> 00:00:49.000
First, let's import numpy as np
15
00:00:49.000 --> 00:00:53.003
and create a one-dimensional array called first_arr
16
00:00:53.003 --> 00:00:58.006
that contains the tens from 10 to 100.
17
00:00:58.006 --> 00:01:03.001
Then we create two two-dimensional arrays: second_arr,
18
00:01:03.001 --> 00:01:09.002
which has dimensions 3 by 3,
19
00:01:09.002 --> 00:01:16.000
and third_arr, which has dimensions 2 by 5.
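A minimal sketch of the three arrays, assuming NumPy is installed. The lesson gives only the shapes of second_arr and third_arr, so the consecutive-integer contents below are an assumption chosen for illustration.

```python
import numpy as np

# The tens from 10 to 100; arange's stop value is exclusive, so we pass 101
first_arr = np.arange(10, 101, 10)

# Assumed contents: only the shapes (3 x 3 and 2 x 5) come from the lesson
second_arr = np.arange(1, 10).reshape(3, 3)
third_arr = np.arange(1, 11).reshape(2, 5)

print(first_arr)          # [ 10  20  30  40  50  60  70  80  90 100]
print(second_arr.shape)   # (3, 3)
print(third_arr.shape)    # (2, 5)
```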
20
00:01:16.000 --> 00:01:19.007
Let's calculate the sum for all three arrays.
21
00:01:19.007 --> 00:01:23.009
We just need to type first_arr.sum()
22
00:01:23.009 --> 00:01:28.001
to calculate the sum of all values in first_arr,
23
00:01:28.001 --> 00:01:34.007
and similarly for the second and third arrays.
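A short sketch of calling sum() on all three arrays. The contents of second_arr and third_arr are assumed (only their shapes are given), so the totals for those two arrays follow from that assumption. Without an axis argument, each call reduces the whole array to one scalar.

```python
import numpy as np

first_arr = np.arange(10, 101, 10)
# Assumed contents for the 3 x 3 and 2 x 5 arrays
second_arr = np.arange(1, 10).reshape(3, 3)
third_arr = np.arange(1, 11).reshape(2, 5)

# With no axis argument, sum() collapses the whole array to a single scalar
total_first = first_arr.sum()    # 550
total_second = second_arr.sum()  # 45
total_third = third_arr.sum()    # 55

print(total_first, total_second, total_third)
```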
24
00:01:34.007 --> 00:01:38.002
What if you want to calculate the sum of each column
25
00:01:38.002 --> 00:01:40.007
in the second array, second_arr?
26
00:01:40.007 --> 00:01:43.007
Then we have to pass the axis as an argument,
27
00:01:43.007 --> 00:01:46.008
in this case axis=0.
28
00:01:46.008 --> 00:01:52.001
We can do this by typing second_arr.sum
29
00:01:52.001 --> 00:01:54.003
with axis=0.
30
00:01:54.003 --> 00:01:56.002
For the sum of each row,
31
00:01:56.002 --> 00:02:00.007
we type second_arr.sum
32
00:02:00.007 --> 00:02:03.007
with axis=1.
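A sketch of the two axis choices on a 3 by 3 array, again assuming the values 1 through 9 for second_arr. The axis you pass is the one that gets collapsed: axis=0 runs down the rows and leaves one sum per column, while axis=1 runs across the columns and leaves one sum per row.

```python
import numpy as np

# Assumed contents for the 3 x 3 array
second_arr = np.arange(1, 10).reshape(3, 3)
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]

# axis=0 collapses the rows, giving one sum per column
column_sums = second_arr.sum(axis=0)  # [12 15 18]

# axis=1 collapses the columns, giving one sum per row
row_sums = second_arr.sum(axis=1)     # [ 6 15 24]

print(column_sums, row_sums)
```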
33
00:02:03.007 --> 00:02:07.000
Next, let's look at a function called prod.
34
00:02:07.000 --> 00:02:11.009
The prod function finds the product of all elements in an array.
35
00:02:11.009 --> 00:02:17.005
We will calculate the product for all three arrays.
36
00:02:17.005 --> 00:02:18.008
Just as with sum,
37
00:02:18.008 --> 00:02:22.003
we can also calculate the product for each column.
38
00:02:22.003 --> 00:02:26.001
Let's try it out on third_arr by typing
39
00:02:26.001 --> 00:02:30.004
third_arr.prod(axis=0).
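A sketch of prod over whole arrays and down the columns of the 2 by 5 array. The array contents for second_arr and third_arr are assumed. The explicit dtype=np.int64 is a precaution we added, not something from the lesson: products grow quickly, and on platforms where NumPy's default integer is 32-bit, the product of first_arr would overflow.

```python
import numpy as np

# dtype=np.int64 guards against overflow where the default integer is 32-bit
first_arr = np.arange(10, 101, 10, dtype=np.int64)
# Assumed contents for the 3 x 3 and 2 x 5 arrays
second_arr = np.arange(1, 10, dtype=np.int64).reshape(3, 3)
third_arr = np.arange(1, 11, dtype=np.int64).reshape(2, 5)

# Product of every element in each array
prod_first = first_arr.prod()    # 36288000000000000
prod_second = second_arr.prod()  # 362880, i.e. 9!
prod_third = third_arr.prod()    # 3628800, i.e. 10!

# axis=0 multiplies down each column of the 2 x 5 array
column_products = third_arr.prod(axis=0)  # [ 6 14 24 36 50]
print(prod_first, prod_second, prod_third, column_products)
```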
40
00:02:30.004 --> 00:02:33.007
Now we'll use another statistical function
41
00:02:33.007 --> 00:02:35.004
to find the average.
42
00:02:35.004 --> 00:02:36.007
The average function
43
00:02:36.007 --> 00:02:39.008
will return the average of a given array.
44
00:02:39.008 --> 00:02:44.001
If you pass an axis, it returns the average for each column
45
00:02:44.001 --> 00:02:45.009
or for each row instead.
46
00:02:45.009 --> 00:02:48.008
Let's calculate the average for the three arrays
47
00:02:48.008 --> 00:02:51.009
by typing np.average
48
00:02:51.009 --> 00:02:57.000
with first_arr for the first array,
49
00:02:57.000 --> 00:03:04.002
and similarly for the second and third arrays.
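A sketch of np.average on the three arrays, with the same assumed contents for second_arr and third_arr. Unlike sum and prod, which are called as array methods here, np.average is a module-level function that takes the array as its first argument. The axis argument works the same way as before.

```python
import numpy as np

first_arr = np.arange(10, 101, 10)
# Assumed contents for the 3 x 3 and 2 x 5 arrays
second_arr = np.arange(1, 10).reshape(3, 3)
third_arr = np.arange(1, 11).reshape(2, 5)

# np.average is a function, so the array is passed as the first argument
avg_first = np.average(first_arr)    # 55.0
avg_second = np.average(second_arr)  # 5.0
avg_third = np.average(third_arr)    # 5.5

# Passing an axis gives one average per column (axis=0) or per row (axis=1)
col_avgs = np.average(second_arr, axis=0)  # [4. 5. 6.]
row_avgs = np.average(second_arr, axis=1)  # [2. 5. 8.]
print(avg_first, avg_second, avg_third, col_avgs, row_avgs)
```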
50
00:03:04.002 --> 00:03:07.002
Two extremely useful statistics functions
51
00:03:07.002 --> 00:03:11.002
are min and max, which find the minimum value
52
00:03:11.002 --> 00:03:14.003
and maximum value of a given array.
53
00:03:14.003 --> 00:03:17.009
Let's find min and max for our first array.
54
00:03:17.009 --> 00:03:24.009
We'll just type np.min(first_arr)
55
00:03:24.009 --> 00:03:27.007
and np.max(first_arr).
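A sketch of np.min and np.max on first_arr. It also shows the equivalent array methods and, as an aside, an axis-based minimum on an assumed 3 by 3 second_arr. The axis argument behaves as it does for sum.

```python
import numpy as np

first_arr = np.arange(10, 101, 10)
# Assumed contents for the 3 x 3 array
second_arr = np.arange(1, 10).reshape(3, 3)

smallest = np.min(first_arr)  # 10
largest = np.max(first_arr)   # 100

# The same results are available as array methods
print(first_arr.min(), first_arr.max())

# With axis=0, np.min returns the smallest value in each column
column_mins = np.min(second_arr, axis=0)  # [1 2 3]
print(smallest, largest, column_mins)
```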
56
00:03:27.007 --> 00:03:31.008
Our last step is to learn the functions for calculating the mean
57
00:03:31.008 --> 00:03:35.002
and standard deviation of a given input array.
58
00:03:35.002 --> 00:03:38.006
To get the mean for our first array,
59
00:03:38.006 --> 00:03:43.004
type np.mean(first_arr).
60
00:03:43.004 --> 00:03:49.005
And for the standard deviation, type
61
00:03:49.005 --> 00:03:51.002
np.std(first_arr).
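A sketch of np.mean and np.std on first_arr. By default, np.std divides by n (ddof=0), which gives the population standard deviation. Passing ddof=1 divides by n - 1 to get the sample standard deviation. The ddof line is an extra we added beyond the lesson, included because the two conventions are easy to confuse.

```python
import numpy as np

first_arr = np.arange(10, 101, 10)

mean_value = np.mean(first_arr)  # 55.0

# By default np.std uses ddof=0, i.e. the population standard deviation
pop_std = np.std(first_arr)             # sqrt(825), about 28.72
# ddof=1 divides by n - 1 instead, giving the sample standard deviation
sample_std = np.std(first_arr, ddof=1)  # sqrt(8250 / 9), about 30.28
print(mean_value, pop_std, sample_std)
```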
62
00:03:51.002 --> 00:03:55.001
NumPy provides many other aggregate functions
63
00:03:55.001 --> 00:03:57.007
that we won't cover here,
64
00:03:57.007 --> 00:04:00.002
but you can easily find them
65
00:04:00.002 --> 00:04:02.002
in the NumPy documentation.