Create a DataFrame in pandas

One of the simplest ways to create a DataFrame in pandas is to build it from a dictionary. The example below creates a DataFrame from a dictionary of lists.

import pandas as pd

dict1 = {

    'Ford': [120, 230, 120, 431],

    'Renault': [320, 233, 547, 622],

    'Audi': [230, 123, 457, 232],

    'Toyota': [230, 123, 457, 232],

    'Opel': [230, 123, 457, 232]

}

print(dict1.keys())

print("Ford key is:", end=' ')

print(dict1['Ford'])

sells = pd.DataFrame(dict1) # create sells dataframe from 'dict1' dictionary

print('DataFrame is:')

print(sells)


#output:

dict_keys(['Ford', 'Renault', 'Audi', 'Toyota', 'Opel'])

Ford key is: [120, 230, 120, 431]

DataFrame is:

   Ford  Renault  Audi  Toyota  Opel

0   120      320   230     230   230

1   230      233   123     123   123

2   120      547   457     457   457

3   431      622   232     232   232

Comments on the code above: the dictionary dict1 holds a car dealer's sales figures for the first 4 months of the year.

The dictionary keys are 'Ford', 'Renault', 'Audi', 'Toyota' and 'Opel'.

The dictionary values are lists; for example, the value for the 'Ford' key is the list [120, 230, 120, 431].

As a reminder from dictionary basics, print(dict1.keys()) shows:

dict_keys(['Ford', 'Renault', 'Audi', 'Toyota', 'Opel'])

and print(dict1['Ford']) shows [120, 230, 120, 431].

Finally, the call that creates the sells DataFrame from the dictionary is sells = pd.DataFrame(dict1). Each key becomes a column, and the items of each list become the rows, labeled 0 to 3.
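
As a small extension of the example above, the rows can carry meaningful labels instead of 0 to 3. The sketch below assumes the four rows stand for January through April (the month names are an illustrative assumption, not part of the original data) and passes them through the index parameter of pd.DataFrame. Only two of the car brands are kept, for brevity.

```python
import pandas as pd

# Shortened version of dict1 from the example above
dict1 = {
    'Ford': [120, 230, 120, 431],
    'Renault': [320, 233, 547, 622],
}

# Assumed month labels for the four rows (illustrative only)
months = ['Jan', 'Feb', 'Mar', 'Apr']

# index= replaces the default row labels 0..3
sells = pd.DataFrame(dict1, index=months)
print(sells)

# A column is selected by its key, a row by its label
print(sells['Ford'])
print(sells.loc['Feb'])
```

With labeled rows, a single value can be read as sells.loc['Feb', 'Renault'].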


A DataFrame can also be created from a dictionary with the from_dict class method of DataFrame. It works like the previous example but offers additional options, such as orient.

Example: 

import pandas as pd

dict2 = {

    'candy':['80%', '60%', '45%'],

    'chocolate':['12%', '24%', '7%'],

    'wafer':['14%', '18%', '16%']

}


sells2 = pd.DataFrame.from_dict(dict2)

print(sells2)


#output: 

  candy chocolate wafer

0   80%       12%   14%

1   60%       24%   18%

2   45%        7%   16%

As we can see, calling DataFrame.from_dict with only the source dictionary creates a DataFrame in which the dictionary keys become the column names.
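
To make the relation to the first example concrete, here is a minimal sketch that builds the same data both ways and compares the results. The variable names from_constructor and from_classmethod are introduced only for this illustration.

```python
import pandas as pd

dict2 = {
    'candy': ['80%', '60%', '45%'],
    'chocolate': ['12%', '24%', '7%'],
    'wafer': ['14%', '18%', '16%'],
}

# Same dictionary, two ways of building the DataFrame
from_constructor = pd.DataFrame(dict2)
from_classmethod = pd.DataFrame.from_dict(dict2)

# The dictionary keys are the column names in both cases
print(list(from_classmethod.columns))

# equals() compares shape, labels and values
print(from_constructor.equals(from_classmethod))
```

With the default orientation the two calls are interchangeable; from_dict becomes useful once other options such as orient are needed.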


Calling from_dict with the parameter orient='index' creates a DataFrame in which the keys become the row labels (the index) and each list fills one row, as in the example below:

import pandas as pd

dict2 = {

    'candy':['80%', '60%', '45%'],

    'chocolate':['12%', '24%', '7%'],

    'wafer':['14%', '18%', '16%']

}


sells2i = pd.DataFrame.from_dict(dict2, orient='index')

print(sells2i)


#output:

             0    1    2

candy      80%  60%  45%

chocolate  12%  24%   7%

wafer      14%  18%  16%
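
With orient='index' the columns are numbered 0, 1, 2 by default. The following sketch names them through the columns parameter of from_dict. The labels 'first', 'second' and 'third' are hypothetical placeholders, since the original data does not say what the three values per product represent.

```python
import pandas as pd

dict2 = {
    'candy': ['80%', '60%', '45%'],
    'chocolate': ['12%', '24%', '7%'],
    'wafer': ['14%', '18%', '16%'],
}

# Hypothetical column labels, chosen only for illustration
labels = ['first', 'second', 'third']

# Keys become row labels; columns= names the positions in each list
sells2i = pd.DataFrame.from_dict(dict2, orient='index', columns=labels)
print(sells2i)

# A single cell is addressed by row label and column label
print(sells2i.loc['chocolate', 'second'])
```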


Creating a DataFrame from a list of tuples.

This can be done with the DataFrame.from_records method.

Example: 

import pandas as pd

marks = [('Mike', 9), ('Debora', 10), ('Steve', 9), ('Tim', 8)]

marks_df = pd.DataFrame.from_records(marks, columns=['Student', 'Mark'])

print(marks_df)


#Output:

  Student  Mark

0    Mike     9

1  Debora    10

2   Steve     9

3     Tim     8


Another example, building a DataFrame from a list of tuples with more elements:

import pandas as pd

marks = [('Mike', 9, 8), ('Debora', 10, 9), ('Steve', 9, 10), ('Tim', 8, 9)]

marks_df = pd.DataFrame.from_records(marks, columns=['Student', 'Mark1','Mark2'])

print(marks_df)


#output:

  Student  Mark1  Mark2

0    Mike      9      8

1  Debora     10      9

2   Steve      9     10

3     Tim      8      9
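
For a plain list of tuples, the general pd.DataFrame constructor accepts the same data and columns arguments. The sketch below builds the frame both ways and checks that the rows match. The names via_records and via_constructor are introduced only for this comparison.

```python
import pandas as pd

marks = [('Mike', 9, 8), ('Debora', 10, 9), ('Steve', 9, 10), ('Tim', 8, 9)]
columns = ['Student', 'Mark1', 'Mark2']

# Each tuple becomes one row; columns= names the tuple positions
via_records = pd.DataFrame.from_records(marks, columns=columns)
via_constructor = pd.DataFrame(marks, columns=columns)

# Compare the rows of both frames as plain Python lists
same_rows = via_records.values.tolist() == via_constructor.values.tolist()
print(same_rows)

print(list(via_records.columns))
```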


from_records can be used in the same way to create a DataFrame from a list of dictionaries. The dictionary keys become the column names, so the columns parameter is not needed.

Example:

import pandas as pd

marks = [{'Student': 'Mike', 'Mark': 9},

         {'Student': 'Debora', 'Mark': 10},

         {'Student': 'Steve', 'Mark': 9},

         {'Student': 'Tim', 'Mark': 8}]

marks_df1 = pd.DataFrame.from_records(marks)

print(marks_df1)


#Output

  Student  Mark

0    Mike     9

1  Debora    10

2   Steve     9

3     Tim     8
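
One practical difference from tuples is that the dictionaries do not all need the same keys. The sketch below uses a hypothetical record without a 'Mark' key (not part of the original data) to show that the missing entry is filled with NaN.

```python
import pandas as pd

marks = [
    {'Student': 'Mike', 'Mark': 9},
    {'Student': 'Debora'},            # hypothetical record with no 'Mark' key
    {'Student': 'Steve', 'Mark': 9},
]

marks_df1 = pd.DataFrame.from_records(marks)
print(marks_df1)

# isna() flags the cell that had no value in the source dictionaries
missing = marks_df1['Mark'].isna().tolist()
print(missing)
```

Because NaN is a floating-point value, the Mark column is stored as floats in this case.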


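We can also create a DataFrame from a NumPy array. Note that NumPy converts the mixed values to strings, and because no column names are passed, the tuple ('Student', 'Mark') ends up as an ordinary row 0 rather than as the header.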

Example:

import pandas as pd

import numpy as np

data = np.array([('Student', 'Mark'), ('Mike', 9), ('Debora', 10), ('Steve', 9), ('Tim', 8)])

marks_df2 = pd.DataFrame.from_records(data)

print(marks_df2)


#output:

         0     1

0  Student  Mark

1     Mike     9

2   Debora    10

3    Steve     9

4      Tim     8
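
If the first row of the array is meant to be the header, one possible approach is to split it off and pass it as columns. The sketch below does this with the general pd.DataFrame constructor and then converts the marks back to numbers with pd.to_numeric, since NumPy stored every value as a string. The names header and rows are introduced only for this illustration.

```python
import numpy as np
import pandas as pd

data = np.array([('Student', 'Mark'), ('Mike', 9), ('Debora', 10), ('Steve', 9), ('Tim', 8)])

# First row holds the column names, the remaining rows hold the data
header = data[0].tolist()
rows = data[1:]

marks_df2 = pd.DataFrame(rows, columns=header)

# Mixed str/int input made NumPy store strings, so convert the marks to numbers
marks_df2['Mark'] = pd.to_numeric(marks_df2['Mark'])

print(marks_df2)
print(marks_df2['Mark'].sum())
```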