How to create crosstabs from a Dictionary in Python?
In this article, we are going to see how to create crosstabs from dictionaries in Python. The pandas crosstab function builds a cross-tabulation table that can show the frequency with which certain groups of data appear.
The function computes a simple cross-tabulation of two (or more) factors. By default, it computes a frequency table of the factors unless an array of values and an aggregation function are passed.
Syntax: pandas.crosstab(index, columns, values=None, rownames=None, colnames=None, aggfunc=None, margins=False, margins_name='All', dropna=True, normalize=False)
Arguments :
- index : array-like, Series, or list of arrays/Series, Values to group by in the rows.
- columns : array-like, Series, or list of arrays/Series, Values to group by in the columns.
- values : array-like, optional, array of values to aggregate according to the factors. Requires `aggfunc` be specified.
- rownames : sequence, default None, If passed, must match number of row arrays passed.
- colnames : sequence, default None, If passed, must match number of column arrays passed.
- aggfunc : function, optional, If specified, requires `values` be specified as well.
- margins : bool, default False, Add row/column margins (subtotals).
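- margins_name : str, default 'All', Name of the row/column that will contain the totals when margins is True.
- dropna : bool, default True, Do not include columns whose entries are all NaN.
- normalize : bool, {'all', 'index', 'columns'}, or {0, 1}, default False, Normalize by dividing all values by the sum of values.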
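As a rough illustration of how `values` and `aggfunc` change the result, the sketch below builds two tables from the same data: a default frequency table and one that averages a numeric column inside each cell. The toy data and the names `toy`, `counts`, and `mean_hp` are assumptions made up for this example and are not part of the article's dictionary.

```python
import pandas as pd

# Hypothetical toy data, not taken from the article's dictionary.
toy = pd.DataFrame({
    "stage": ["Baby", "Baby", "Adult", "Adult", "Adult"],
    "attribute": ["Fire", "Fire", "Fire", "Plant", "Plant"],
    "hp": [10, 20, 30, 40, 60],
})

# Default behaviour: count how often each (stage, attribute) pair occurs.
counts = pd.crosstab(toy["stage"], toy["attribute"])

# With values and aggfunc: aggregate hp inside each cell instead of counting.
mean_hp = pd.crosstab(
    toy["stage"], toy["attribute"], values=toy["hp"], aggfunc="mean"
)

print(counts)
print(mean_hp)
```

In the second table, a combination that never occurs (here Baby with Plant) has no values to average, so its cell is NaN rather than 0.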
Stepwise implementation:
Step 1: Create a dictionary.
Python3
# Each key becomes a column name; each list holds that column's values
raw_data = {
    'Digimon': ['Kuramon', 'Pabumon', 'Punimon', 'Botamon', 'Poyomon',
                'Koromon', 'Tanemon', 'Tsunomon', 'Tsumemon', 'Tokomon'],
    'Stage': ['Baby', 'Baby', 'Baby', 'Baby', 'Baby',
              'In-Training', 'In-Training', 'In-Training', 'In-Training', 'In-Training'],
    'Type': ['Free', 'Free', 'Free', 'Free', 'Free',
             'Free', 'Free', 'Free', 'Free', 'Free'],
    'Attribute': ['Neutral', 'Neutral', 'Neutral', 'Neutral', 'Neutral',
                  'Fire', 'Plant', 'Earth', 'Dark', 'Neutral'],
    'Memory': [2, 2, 2, 2, 2, 3, 3, 3, 3, 3],
    'Equip Slots': [0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
    'Lv 50 HP': [324, 424, 5343, 52, 63, 42, 643, 526, 42, 75],
    'Lv50 SP': [86, 75, 64, 43, 86, 64, 344, 24, 24, 12],
    'Lv50 Atk': [86, 74, 6335, 421, 23, 36436, 65, 75, 86, 52]
}

print(raw_data)
Output:
{'Digimon': ['Kuramon', 'Pabumon', 'Punimon', 'Botamon', 'Poyomon', 'Koromon', 'Tanemon', 'Tsunomon', 'Tsumemon', 'Tokomon'], 'Stage': ['Baby', 'Baby', 'Baby', 'Baby', 'Baby', 'In-Training', 'In-Training', 'In-Training', 'In-Training', 'In-Training'], 'Type': ['Free', 'Free', 'Free', 'Free', 'Free', 'Free', 'Free', 'Free', 'Free', 'Free'], 'Attribute': ['Neutral', 'Neutral', 'Neutral', 'Neutral', 'Neutral', 'Fire', 'Plant', 'Earth', 'Dark', 'Neutral'], 'Memory': [2, 2, 2, 2, 2, 3, 3, 3, 3, 3], 'Equip Slots': [0, 0, 1, 1, 1, 1, 1, 1, 1, 1], 'Lv 50 HP': [324, 424, 5343, 52, 63, 42, 643, 526, 42, 75], 'Lv50 SP': [86, 75, 64, 43, 86, 64, 344, 24, 24, 12], 'Lv50 Atk': [86, 74, 6335, 421, 23, 36436, 65, 75, 86, 52]}
Step 2: Create a DataFrame from the dictionary by using the pandas DataFrame constructor.
Python3
import pandas as pd

# Build the DataFrame; the columns list fixes the column order
raw_data_df = pd.DataFrame(raw_data, columns=[
    'Digimon', 'Stage', 'Type', 'Attribute', 'Memory',
    'Equip Slots', 'Lv 50 HP', 'Lv50 SP', 'Lv50 Atk'])
print(raw_data_df)
Output:
Step 3: Using crosstab.
Python3
# Cross-tabulate Attribute (rows) against Digimon (columns);
# margins=True adds an 'All' row and column holding the totals
raw_data_fd = pd.crosstab(raw_data_df['Attribute'],
                          raw_data_df['Digimon'],
                          margins=True)
print(raw_data_fd)
Output:
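For a table small enough to read at a glance, the following sketch cross-tabulates only the Stage and Attribute columns, copied from the dictionary in Step 1. The names `subset`, `subset_df`, and `stage_by_attribute` are assumptions introduced for this example, and the choice of these two columns is only one possible illustration.

```python
import pandas as pd

# Only the two columns needed here, copied from the dictionary in Step 1.
subset = {
    "Stage": ["Baby"] * 5 + ["In-Training"] * 5,
    "Attribute": ["Neutral"] * 5 + ["Fire", "Plant", "Earth", "Dark", "Neutral"],
}
subset_df = pd.DataFrame(subset)

# Rows are stages, columns are attributes; margins=True appends "All" totals.
stage_by_attribute = pd.crosstab(
    subset_df["Stage"], subset_df["Attribute"], margins=True
)
print(stage_by_attribute)
```

Each cell counts the rows that share that Stage and Attribute. All five Baby entries are Neutral, so the Baby row has 5 under Neutral, and the bottom-right "All" cell equals the total number of rows, 10.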
You can add multiple indices (rows) to a crosstab as well. This is done by passing a list of variables to the index parameter of the crosstab function. For example, to break the items down by Attribute and Memory, pass both columns as a list.
Python3
# A list of Series as the index gives the rows two levels: Attribute, then Memory
raw_data_fd = pd.crosstab(
    [raw_data_df['Attribute'], raw_data_df['Memory']],
    raw_data_df['Digimon'],
    margins=True
)
print(raw_data_fd)
Output:
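To see the shape of a multi-index crosstab on a smaller scale, the sketch below uses hypothetical toy data (the frame `toy` and its lowercase column names are assumptions for this example only) and shows one way to read a single cell back out of the result.

```python
import pandas as pd

# Hypothetical toy data, smaller than the article's dictionary.
toy = pd.DataFrame({
    "attribute": ["Neutral", "Neutral", "Fire", "Fire", "Fire"],
    "memory": [2, 2, 2, 3, 3],
    "stage": ["Baby", "Baby", "Baby", "In-Training", "In-Training"],
})

# A list passed as index produces a two-level MultiIndex on the rows.
table = pd.crosstab([toy["attribute"], toy["memory"]], toy["stage"])
print(table)

# A single cell is addressed with a tuple of row labels plus a column label.
fire_3_in_training = table.loc[("Fire", 3), "In-Training"]
print(fire_3_in_training)
```

Only combinations of attribute and memory that actually occur in the data appear as rows, so this table has three rows: (Fire, 2), (Fire, 3), and (Neutral, 2).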