NumPy is a powerful library in Python for numerical and scientific computing. The 'numpy.var()' function is used to compute the variance of an array or a sequence of numbers. Variance is a measure of the spread or dispersion of data points. Here are step-by-step examples of how to use the 'numpy.var()' function:
Step 1: Import NumPy
You need to import the NumPy library before using it.
import numpy as np
Step 2: Create a NumPy Array
You can create a NumPy array, which is an efficient data structure for numerical operations, to calculate the variance. Here's an example of creating an array:
data = np.array([10, 20, 30, 40, 50])
Step 3: Calculate Variance
Now that you have your data in a NumPy array, you can use the 'numpy.var()' function to calculate the variance:
variance = np.var(data)
By default, this calculates the population variance, which divides the sum of squared deviations by N, the number of data points. To calculate the sample variance instead, set the ddof (Delta Degrees of Freedom) parameter to 1, so that the divisor becomes N - 1:
sample_variance = np.var(data, ddof=1)
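To make the effect of ddof concrete, here is a minimal sketch that recomputes both values by hand and compares them with 'numpy.var()'. The variable names (n, sum_sq, pop, samp) are chosen only for this illustration.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])
n = data.size

# Sum of squared deviations from the mean (illustrative helper value)
sum_sq = np.sum((data - data.mean()) ** 2)

pop = sum_sq / n         # divisor N, matches ddof=0 (the default)
samp = sum_sq / (n - 1)  # divisor N - 1, matches ddof=1

print(np.isclose(pop, np.var(data)))           # True
print(np.isclose(samp, np.var(data, ddof=1)))  # True
```

In general the divisor is N - ddof, so larger ddof values give a larger variance for the same data.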
Step 4: Print the Result
You can print the calculated variance:
print("Population Variance:", variance)
print("Sample Variance:", sample_variance)
Here's the complete code:
import numpy as np
data = np.array([10, 20, 30, 40, 50])
variance = np.var(data)
sample_variance = np.var(data, ddof=1)
print("Population Variance:", variance)
print("Sample Variance:", sample_variance)
Output:
Population Variance: 200.0
Sample Variance: 250.0
This code will calculate and print both the population variance and the sample variance of the data array.
Remember that the 'numpy.var()' function can be used with arrays of any dimension, and it can also calculate the variance along specific axes if you're working with multi-dimensional arrays.
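As a short sketch of the axis behavior, the example below uses a small 2x3 array invented for illustration. With no axis argument the variance is taken over all elements; axis=0 works down each column and axis=1 works across each row.

```python
import numpy as np

# Hypothetical 2x3 array used only for this illustration
matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])

overall = np.var(matrix)             # variance of all six values
per_column = np.var(matrix, axis=0)  # one value per column
per_row = np.var(matrix, axis=1)     # one value per row

print("Overall:", overall)
print("Per column:", per_column)  # [2.25 2.25 2.25]
print("Per row:", per_row)
```

The ddof parameter can be combined with axis in the same call, for example np.var(matrix, axis=0, ddof=1).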
Understand Variance
The population variance of a dataset is the average of the squared differences between each data point and the dataset's mean. The formula for population variance is:
$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$
Where:
- N is the number of data points.
- xᵢ represents each data point.
- µ is the mean of the data points.
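As a worked example, applying the population variance definition to the values 10, 20, 30, 40, 50 (the same data used in the steps above) gives:

```latex
\mu = \frac{10 + 20 + 30 + 40 + 50}{5} = 30

\sigma^2 = \frac{(-20)^2 + (-10)^2 + 0^2 + 10^2 + 20^2}{5}
         = \frac{400 + 100 + 0 + 100 + 400}{5}
         = \frac{1000}{5}
         = 200
```

This agrees with the 200.0 that 'numpy.var()' reports for the same array.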
The same calculation can be written step by step in NumPy:
import numpy as np
data = np.array([10, 20, 30, 40, 50])
mean = np.mean(data)
N = len(data) # Number of data points
sum_squared_diff = np.sum((data - mean) ** 2)
population_variance = sum_squared_diff / N
print("Population Variance:", population_variance)
Output:
Population Variance: 200.0
In this code:
- 'N' is the number of data points.
- 'data - mean' calculates the difference between each data point and the mean.
- '(data - mean) ** 2' squares these differences.
- 'np.sum()' calculates the sum of the squared differences.
- Finally, you divide the sum by 'N' to get the population variance.
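Another Method: Using a List Comprehension
You can get the same result with a list comprehension and Python's built-in sum() in place of NumPy's vectorized operations: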
import numpy as np
data = np.array([10, 20, 30, 40, 50])
mean = np.mean(data)
N = len(data) # Number of data points
squared_diffs = [(x - mean) ** 2 for x in data]
population_variance = sum(squared_diffs) / N
print("Population Variance:", population_variance)
Output:
Population Variance: 200.0
Here's how it works:
- N is the number of data points.
- The list comprehension [(x - mean) ** 2 for x in data] calculates the squared differences for each data point.
- sum(squared_diffs) sums up the squared differences.
- Finally, you divide the sum by N to get the population variance.
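As a quick sanity check, the sketch below compares both manual approaches with 'numpy.var()'. It uses np.isclose rather than == because floating-point results can differ in the last digits; the names vectorized, looped, and builtin are illustrative only.

```python
import numpy as np

data = np.array([10, 20, 30, 40, 50])
mean = np.mean(data)
N = len(data)

# Manual calculation with vectorized NumPy operations
vectorized = np.sum((data - mean) ** 2) / N

# Manual calculation with a generator expression and built-in sum()
looped = sum((x - mean) ** 2 for x in data) / N

# Reference value from NumPy
builtin = np.var(data)

print(np.isclose(vectorized, builtin))  # True
print(np.isclose(looped, builtin))      # True
```

For large arrays the vectorized form or 'numpy.var()' itself is usually the better choice, since the Python-level loop processes one element at a time.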