Statistics Interview Questions for Basic Level

1. What is the difference between Descriptive Statistics and Inferential Statistics?

Category	Descriptive Statistics	Inferential Statistics
Definition	These statistics are used to summarize the main features of a dataset	These statistics are used to draw conclusions about a larger population by using sample data
Relies on	Descriptive Statistics relies mostly on summary measures and graphical representation to get meaningful information	Inferential Statistics relies on probability distributions and mathematical formulas to reach meaningful conclusions
Techniques used	Mean, Median, Mode, Standard Deviation, Range, Histogram, Box Plot, etc.	Hypothesis Test (t-test, z-test, Chi-square test), ANOVA, confidence interval, etc.
Assumptions	Descriptive Statistics does not involve any kind of assumptions about the population.	Inferential Statistics is often associated with assumptions like Normality, Independence and Random Sampling.
Example Scenarios	Median salary in a university placement record	Estimating the average flipper length of all the penguins in the world from a measured sample

2. What is the difference between a Population and a Sample?

Category	Population	Sample
Definition	A population is the entire set of items or individuals that we are interested in.	A sample is a subset of the population that is actually observed or measured.
Size	A population includes every member of the group being studied, so it is usually large.	A sample is relatively smaller than the population it is drawn from.
Representation	A population represents the complete data about the group we are interested in.	A sample represents a subset of the population and should ideally reflect the features of the entire population.

3. What is Random Sampling? What is its use?

Random Sampling is the process of selecting a subset of a population such that every member of the population has an equal chance of being selected. Random Sampling:

helps in making generalizations about the population
helps in reducing selection bias
helps in drawing meaningful statistical inferences
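
As an illustration, simple random sampling can be sketched with Python's standard library. The population of 1,000 ID numbers, the sample size of 50 and the seed are made-up values chosen only for this example.

```python
import random

# Hypothetical population: ID numbers of 1,000 members (illustrative only).
population = list(range(1, 1001))

rng = random.Random(42)  # fixed seed so the draw is reproducible

# Simple random sample of 50 members, drawn without replacement,
# so every member has the same chance of being selected.
sample = rng.sample(population, k=50)

population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

print(f"population mean = {population_mean:.1f}")
print(f"sample mean     = {sample_mean:.1f}")
```

The sample mean will usually land close to the population mean, which is what makes generalizing from a random sample reasonable.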

4. What is Qualitative Data and Quantitative Data?

Qualitative Data: Qualitative data describes qualities or categories rather than numerical measurements. It is also called Categorical Data. It can be divided into groups and classes. Example: Gender, Color, Age category, etc.
Quantitative Data: Quantitative data, on the other hand, is numerical data. It gives information about the measure of something and can be used in mathematical operations. Example: Sales of a car company, Bitcoin Value, etc.

5. What is meant by Probability Distribution?

Probability Distribution is a function that describes the likelihood of each possible outcome of a random experiment. That means it assigns a probability to every value that a random variable can take.

6. What are nominal data and ordinal data?

Nominal Data: It is a type of qualitative data which has no inherent order or ranking. That means the categories are just labels and have no numerical significance. Example: Types of Colors, Animal Species, etc.
Ordinal Data: It is a type of qualitative data which has a defined order or ranking associated with it, so some categories rank higher than others. Example: Education Level, Likert Scale in Survey response, etc.

7. What is the Central Limit Theorem?

Central Limit Theorem states that:

“The sampling distribution of the sample mean approaches a normal distribution as the sample size increases, irrespective of the shape of the population distribution.”

As a rule of thumb, the approximation is considered good for sample sizes greater than 30, although heavily skewed populations may need larger samples. For a sampling distribution that follows the CLT:

The mean of the sampling distribution ( [Tex]\mu_{\overline{x}} [/Tex] ) is equal to the population mean ( [Tex]\mu [/Tex] )
The standard deviation of the sampling distribution ( [Tex]\sigma_{s} [/Tex] ), also called the standard error, is equal to the standard deviation of the population distribution ( [Tex]\sigma_{p} [/Tex] ) divided by the square root of the sample size ( n ).
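
The two properties can be checked with a small simulation. This is a rough sketch: the exponential population (mean 1, standard deviation 1), the sample size of 40 and the 5,000 repetitions are assumptions picked for illustration.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed for reproducibility

n = 40               # size of each sample
num_samples = 5000   # number of samples drawn

# Assumed population: exponential distribution with rate 1
# (mean 1, standard deviation 1), which is strongly right-skewed.
population_mean = 1.0
population_sd = 1.0

# Draw many samples and record the mean of each one.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
expected_se = population_sd / math.sqrt(n)

print(f"mean of sample means: {mean_of_means:.3f} (population mean {population_mean})")
print(f"sd of sample means:   {sd_of_means:.3f} (sigma / sqrt(n) = {expected_se:.3f})")
```

Even though the population is far from normal, the sample means cluster around the population mean with a spread close to the population standard deviation divided by the square root of n.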

8. Explain Skewness in Distribution. Why does it happen?

Skewness in a distribution refers to asymmetry in its shape: one tail is longer than the other. There are two types of skewness:

Left/Negative Skewness: The longer tail is on the left side, and the mean is typically less than the median
Right/Positive Skewness: The longer tail is on the right side, and the mean is typically greater than the median

Skewness is often caused by outliers or by a natural bound on one side of the data (for example, incomes cannot fall below zero). The side on which the extreme values lie decides the direction of skewness (positive or negative).
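
The direction of skewness can be measured numerically. The sketch below uses the moment coefficient of skewness (third central moment divided by the 1.5 power of the second central moment), which is one common definition among several. The datasets are invented for the example.

```python
import statistics

def skewness(data):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    mean = statistics.fmean(data)
    n = len(data)
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]   # one large value stretches the right tail
left_skewed = [-x for x in right_skewed]   # mirror image: long tail on the left

for name, data in [("symmetric", symmetric),
                   ("right-skewed", right_skewed),
                   ("left-skewed", left_skewed)]:
    print(f"{name:13s} skewness = {skewness(data):+.3f}")
```

A positive value indicates right skew, a negative value indicates left skew, and a value near zero indicates a roughly symmetric dataset.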

9. What is Normal Distribution? How is it different from a Uniform Distribution in Terms of Measure of Central Tendency?

Category	Normal Distribution	Uniform Distribution
Definition	It is a continuous probability distribution which is symmetric about the mean, with values near the mean being the most likely.	It is a continuous probability distribution where every value within a given range is equally likely to occur.
Formula	[Tex]f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{(x-\mu)^2}{2\sigma^2}} [/Tex] where, [Tex]f(x) [/Tex] = Normal probability density function, [Tex]x [/Tex] = value of the random variable, [Tex]\mu [/Tex] = Mean of the Normal Distribution, [Tex]\sigma [/Tex] = Standard Deviation of the Normal Distribution	[Tex]f(x) = \frac{1}{b-a} , a\leq{x}\leq{b} [/Tex] where, a = minimum of the distribution, b = maximum of the distribution, x = value of the random variable
Shape	It is a bell-shaped curve	It is a rectangular-shaped curve
Measure of Central tendency	For Normal Distribution, mean = median = mode.	For Uniform Distribution, mean = median = average of the maximum and minimum of the distribution, and there is no unique mode because every value is equally likely.

10. What is Binomial Distribution?

It is a discrete probability distribution that models the number of successes in a fixed number of independent Bernoulli trials, where each trial results in either success or failure and has the same probability of success. The probability mass function of the Binomial Distribution is given as:

[Tex]P(X=k)=\binom{n}{k}p^k(1-p)^{(n-k)} [/Tex], where

n = number of trials conducted

k = number of successes

p = probability of success in a single trial
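
The formula translates directly into code. This is a minimal sketch; the function name `binomial_pmf` and the coin-flip numbers are chosen only for illustration.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k) for n independent trials with success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

# Example: probability of exactly 3 heads in 10 flips of a fair coin.
prob_three_heads = binomial_pmf(3, 10, 0.5)
print(f"P(X = 3) = {prob_three_heads:.4f}")

# The probabilities over all possible k (0..n) should add up to 1.
total = sum(binomial_pmf(k, 10, 0.5) for k in range(11))
print(f"sum over k = {total:.4f}")
```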

11. What is an Outlier?

An Outlier is a data point that is significantly different from the other data points. Usually, outliers lie at the extremes of the distribution and stand out from the rest of the data.

12. What is the Measure of Center/ Measure of Central Tendency? Explain in brief about it.

A Measure of Center (or Measure of Central Tendency) describes the “center” of a dataset or probability distribution (PD) with a single representative value. The three most common measures are:

Mean: The average of all the data points present in the dataset.
Median: The middle data point of the sorted dataset/PD (for an even number of data points, the average of the two middle values).
Mode: The data point which occurs most frequently in a dataset/PD.
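
All three measures are available in Python's `statistics` module. The small dataset below is made up for the example.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # illustrative values, already sorted

mean_value = statistics.mean(data)      # sum of the values divided by their count
median_value = statistics.median(data)  # even count: average of the two middle values
mode_value = statistics.mode(data)      # most frequent value

print("mean:  ", mean_value)
print("median:", median_value)
print("mode:  ", mode_value)
```

Here the mean (5) is larger than the median (4) because the value 10 pulls the average upward, while the mode (3) simply reports the most frequent value.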

13. What is the Measure of Dispersion? Explain in brief about it.

A Measure of Dispersion (or Measure of Spread) describes how far the data points are spread out from a reference point, usually the mean of the dataset. Several metrics describe the dispersion of a dataset, among which the most used ones are:

Range: The difference between the maximum and minimum values in the dataset
Variance: The average of the squared differences of each data point from the mean
Standard Deviation: The square root of the variance
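
A short sketch of these metrics with the `statistics` module follows. It uses the population versions (`pvariance`, `pstdev`), which divide by the number of data points as in the definition above. The dataset is invented for the example.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values with mean 5

data_range = max(data) - min(data)

# Population variance: average squared deviation from the mean (divides by n).
pop_variance = statistics.pvariance(data)
pop_std_dev = statistics.pstdev(data)

# Sample variance divides by n - 1 instead, which is the usual choice
# when the data is a sample used to estimate the population variance.
sample_variance = statistics.variance(data)

print("range:              ", data_range)
print("population variance:", pop_variance)
print("population std dev: ", pop_std_dev)
print("sample variance:    ", round(sample_variance, 3))
```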

14. What is the complement rule in probability?

The Complement Rule in Probability states that:

“The probability that an event does not occur is one minus the probability of the event occurring”

(Note: In symbols, [Tex]P(A') = 1 - P(A) [/Tex]. The Complement Rule holds true for every event; it does not require independence.)
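
The rule is most useful for "at least one" questions. The sketch below is an illustrative example that is not taken from the text above: the probability of rolling at least one six in four rolls of a fair die, computed with the rule and then checked by counting every outcome.

```python
from fractions import Fraction
from itertools import product

# Complement rule: P(at least one six) = 1 - P(no six in four rolls).
p_no_six = Fraction(5, 6) ** 4
p_at_least_one_six = 1 - p_no_six

# Cross-check by enumerating all 6**4 equally likely outcomes.
outcomes = list(product(range(1, 7), repeat=4))
favourable = sum(1 for rolls in outcomes if 6 in rolls)
p_by_counting = Fraction(favourable, len(outcomes))

print("via complement rule:", p_at_least_one_six, "=", float(p_at_least_one_six))
print("via enumeration:    ", p_by_counting)
```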

15. What are Non probability sampling methods? Name a few of them.

Non-probability sampling methods select members based on the judgment or convenience of the researcher rather than on random selection, so not every member of the population has a known chance of being chosen. Some of the methods are:

Convenience sample: A non-probability sampling method where the sample members are chosen based on how easy they are to reach or contact.
Snowball sample: A method where the initially approached participants are asked to recruit further participants, so the sample grows like a rolling snowball.

16. What is Dependent Event and Independent Event?

Category	Dependent Event	Independent Event
Definition	Two events are dependent when the outcome of one event is influenced by the outcome of the other event.	Two events are independent when the outcome of one event does not affect the outcome of the other event.
Formula	[Tex]P(A\cap{B}) = P(A) \cdot P(B \mid A) [/Tex]	[Tex]P(A\cap{B}) = P(A) \cdot P(B) [/Tex]
Example	Drawing two cards from a deck without replacement	Rolling a fair six-sided die twice
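
Both multiplication formulas can be worked through with exact fractions. The specific questions (two aces in a row, two sixes in a row) are illustrative choices built on the examples in the table.

```python
from fractions import Fraction

# Dependent events: drawing two aces from a 52-card deck without replacement.
# The second probability is conditional on the first draw being an ace.
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)
p_two_aces = p_first_ace * p_second_ace_given_first   # P(A) * P(B | A)

# Independent events: rolling a six on each of two rolls of a fair die.
# The first roll does not change the probabilities for the second roll.
p_six = Fraction(1, 6)
p_two_sixes = p_six * p_six                            # P(A) * P(B)

print("P(two aces, no replacement) =", p_two_aces)
print("P(two sixes)                =", p_two_sixes)
```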

17. What is margin of error?

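It is defined as the maximum expected difference between the population parameter and the sample estimate at a given confidence level. For example, a poll result of 52% with a margin of error of ±3% at 95% confidence means the true value is expected to lie between 49% and 55%.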
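
One common way to compute it for a sample mean is the z-based formula, margin of error = z × σ / √n. This is a sketch under stated assumptions: the formula is not given in the text above, it assumes a known (or well-estimated) standard deviation and a reasonably large sample, and the function name and input numbers are hypothetical.

```python
import math
from statistics import NormalDist

def margin_of_error(std_dev, n, confidence=0.95):
    """z-based margin of error for a sample mean (assumes a normal approximation)."""
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * std_dev / math.sqrt(n)

# Hypothetical inputs: standard deviation 15, sample size 100.
moe = margin_of_error(std_dev=15.0, n=100, confidence=0.95)
print(f"margin of error = ±{moe:.2f}")
```

Because n sits under a square root, quadrupling the sample size only halves the margin of error.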

18. What is the difference between Poisson Distribution and Bernoulli Distribution?

Category	Poisson Distribution	Bernoulli Distribution
Definition	A discrete probability distribution used to model the number of events occurring within a fixed interval of time.	A discrete probability distribution used to model a single trial with two possible outcomes, success and failure
Probability Mass Function	[Tex]P(X=x) = \frac{e^{-\lambda}\lambda^{x}}{x!} [/Tex] where, X = random variable, x = number of times the event occurs (0, 1, 2, …), e = Euler’s number (≈ 2.718), [Tex]\lambda [/Tex] = average number of times the event occurs in the interval	[Tex]P(X=x) = p^x(1-p)^{(1-x)} [/Tex] where, x = 0, 1, X = random variable, p = probability of success
Independence	Used for independent events that occur at a constant average rate	Describes a single trial, so independence only matters when several Bernoulli trials are combined (for example, in a Binomial Distribution).
Example	Number of phone calls at a call center in an hour	Success or failure in a product quality test
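
The two probability mass functions can be sketched in a few lines. The function names and the example parameters (an average of 4 calls per hour, a 30% success probability) are assumptions made for illustration.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) when events occur at an average rate of lam per interval."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def bernoulli_pmf(x, p):
    """P(X = x) for a single trial, where x is 1 (success) or 0 (failure)."""
    return p ** x * (1 - p) ** (1 - x)

# Hypothetical call center receiving 4 calls per hour on average.
for calls in range(0, 9, 2):
    print(f"P({calls} calls in an hour) = {poisson_pmf(calls, 4.0):.4f}")

# Hypothetical quality test with a 30% chance of success.
print("P(success) =", bernoulli_pmf(1, 0.3))
print("P(failure) =", round(bernoulli_pmf(0, 0.3), 4))
```

The Poisson variable can take any non-negative integer value, while the Bernoulli variable takes only the values 0 and 1.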
