Computing C.I. given the underlying distribution using lineplot()

The lineplot() function in Seaborn, a data visualization library for Python, is best suited to showing trends over a period of time; it can also draw a confidence interval around the plotted line.

Syntax:

sns.lineplot(x=None, y=None, hue=None, size=None, style=None, data=None, palette=None, hue_order=None, hue_norm=None, sizes=None, size_order=None, size_norm=None, dashes=True, markers=None, style_order=None, units=None, estimator='mean', ci=95, n_boot=1000, sort=True, err_style='band', err_kws=None, legend='brief', ax=None, **kwargs)

Parameters:

  • x, y: Input data variables; must be numeric. Can pass data directly or reference columns in data.
  • hue: Grouping variable that will produce lines with different colors. Can be either categorical or numeric, although color mapping will behave differently in the latter case.
  • style: Grouping variable that will produce lines with different dashes and/or markers. Can have a numeric dtype but will always be treated as categorical.
  • data: Tidy ("long-form") dataframe where each column is a variable and each row is an observation.
  • markers: Object determining how to draw the markers for different levels of the style variable.
  • legend: How to draw the legend. If "brief", numeric hue and size variables will be represented with a sample of evenly spaced values.

Return: The Axes object containing the plot.

By default, the plot aggregates over multiple y values at each value of x and shows an estimate of the central tendency and a confidence interval for that estimate.

Example:

# import libraries
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
  
# generate random data
np.random.seed(0)
x = np.random.randint(0, 30, 100)
y = x+np.random.normal(0, 1, 100)
  
# create lineplot (recent seaborn versions require x and y as keyword arguments)
ax = sns.lineplot(x=x, y=y)
plt.show()


In the above code, x stores 100 random integers from 0 (inclusive) to 30 (exclusive), and y stores each value of x plus noise drawn from a Gaussian (normal) distribution centred at 0 with a standard deviation of 1. NumPy operations are usually done on pairs of arrays on an element-by-element basis; in the simplest case the two arrays must have exactly the same shape, as in the above example. Finally, a line plot is created with seaborn, which draws a 95% confidence interval by default. The confidence level can be changed through the 'ci' parameter, which takes a value in the range [0, 100]; it is not passed here, so the default of 95 is used. Note that seaborn 0.12 and later deprecate 'ci' in favour of the errorbar parameter, for example errorbar=("ci", 95).

The light blue shaded band shows the confidence interval around the estimated mean at each value of x. A wider band means more uncertainty about the mean at that point, and a narrower band means less.
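To make the default aggregation more concrete, here is a minimal sketch in plain Python of the kind of computation involved: the y values are grouped by x, the mean of each group gives the line, and a bootstrap interval gives the band. It assumes a simple percentile bootstrap of the mean; the helper name bootstrap_ci is hypothetical, and seaborn's internal implementation may differ in its details.

```python
import random
import statistics
from collections import defaultdict


def bootstrap_ci(values, n_boot=1000, level=95, rng=None):
    """Percentile bootstrap interval for the mean of `values` (illustrative helper)."""
    rng = rng or random.Random(0)
    n = len(values)
    # Resample with replacement and record the mean of each resample
    boot_means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    # Cut off (100 - level) / 2 percent in each tail
    tail = (100 - level) / 200
    lo = boot_means[int(tail * (n_boot - 1))]
    hi = boot_means[int((1 - tail) * (n_boot - 1))]
    return lo, hi


# Same kind of data as the example: integer x, y = x + Gaussian noise
rng = random.Random(0)
groups = defaultdict(list)
for _ in range(100):
    x = rng.randrange(0, 30)
    groups[x].append(x + rng.gauss(0, 1))

# One entry per distinct x: the mean (the line) and the interval (the band)
summary = {}
for x in sorted(groups):
    ys = groups[x]
    summary[x] = (statistics.fmean(ys), *bootstrap_ci(ys, rng=rng))

for x, (mean, lo, hi) in summary.items():
    print(f"x={x:2d}  n={len(groups[x]):2d}  mean={mean:6.2f}  CI=({lo:6.2f}, {hi:6.2f})")
```

Values of x that occur only once have an interval of zero width, because every resample is identical; values with several observations get a wider band.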

How to Plot a Confidence Interval in Python?

A confidence interval is a type of estimate computed from the statistics of the observed data. It gives a range of values that is likely to contain a population parameter with a particular level of confidence.

A confidence interval for the mean is a range of values within which the population mean plausibly lies. If I predict that tomorrow's temperature will be somewhere between -100 degrees and +100 degrees, I can be 100% sure that this will be correct. However, if I predict that it will be between 20.4 and 20.5 degrees Celsius, I'm much less confident. Note how the confidence decreases as the interval narrows. The same applies to statistical confidence intervals, but they also depend on other factors, such as the sample size and the variability of the data.
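The trade-off between confidence and interval width can be checked numerically. The following sketch uses made-up summary statistics (the mean, standard deviation and sample size are assumptions for illustration only) and a normal approximation to compute intervals at three confidence levels.

```python
from statistics import NormalDist

# Hypothetical summary statistics for a sample of daily temperatures
sample_mean = 20.45   # degrees Celsius
sample_sd = 2.0
n = 40
standard_error = sample_sd / n ** 0.5

widths = {}
for level in (0.80, 0.95, 0.99):
    # Two-sided critical value of the standard normal distribution
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * standard_error
    widths[level] = 2 * half_width
    print(f"{level:.0%} interval: "
          f"{sample_mean - half_width:.2f} to {sample_mean + half_width:.2f}")
```

The interval widens as the requested confidence level goes up, which mirrors the weather example.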

A 95% confidence interval tells us that if we repeatedly drew samples from the population and calculated the interval each time, about 95% of those intervals would contain the true population mean. So, with one sample we can calculate the sample mean and build an interval around it that will most likely contain the true population mean.
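This repeated-sampling interpretation can be illustrated with a small simulation. The sketch below assumes a normally distributed population with a known mean (the parameter values are arbitrary) and uses a normal-approximation interval; it draws many samples, builds an interval from each one, and counts how often the interval covers the true mean.

```python
import random
from statistics import NormalDist, fmean, stdev

rng = random.Random(42)
true_mean, true_sd = 10.0, 3.0   # population parameters (assumed known here)
n, n_experiments = 50, 2000
z = NormalDist().inv_cdf(0.975)  # about 1.96 for a 95% interval

hits = 0
for _ in range(n_experiments):
    sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
    m = fmean(sample)
    half_width = z * stdev(sample) / n ** 0.5
    # Count the experiment if the interval covers the true mean
    if m - half_width <= true_mean <= m + half_width:
        hits += 1

coverage = hits / n_experiments
print(f"{coverage:.1%} of the intervals contain the true mean")
```

The printed share should land close to 95%. It will not match exactly, because the number of experiments is finite and the normal approximation is slightly optimistic for small samples.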

Area under the two black lines shows the 95% confidence interval

The confidence interval as a concept was put forth by Jerzy Neyman in a paper published in 1937. There are various types of confidence interval; some of the most commonly used are the CI for the mean, the CI for the median, the CI for the difference between means, the CI for a proportion and the CI for the difference in proportions.

Let's have a look at how this goes with Python.
