Shared Flashcard Set

Details

Pandas
using pandas data structures
39
Computer Science
Professional
06/21/2016


Cards

Term
create a series
Definition

my_series = pd.Series(data, index=my_index)

 

data: 1-D array, list, tuple, or dict (map)

my_index: 1-D array, tuple, or list
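
A minimal sketch of the three most common ways to build a Series. The variable names (from_list, from_dict, from_array) are illustrative and not part of the original card.

```python
import numpy as np
import pandas as pd

# From a list, with explicit index labels.
from_list = pd.Series([10, 11, 12], index=["a", "b", "c"])

# From a dict: the keys become the index labels.
from_dict = pd.Series({"a": 10, "b": 11, "c": 12})

# From a 1-D NumPy array: with no index given, labels default to 0..n-1.
from_array = pd.Series(np.arange(10, 13))

print(from_list)
print(from_array.index.tolist())
```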

Term

print all contents of a Series

print all the indices of a Series

Definition

my_series.values

my_series.index

Term
index a Series
Definition

my_series = pd.Series(np.arange(10, 13), index=['a', 'b', 'c'])


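my_series.iloc[1]  # by position

my_series['b']  # by label

my_series[['c','b']]  # list of labels

my_series['a': 'c']  # label slice, end point included

my_series.iloc[1: 3]  # position slice, end point excluded

my_series.iloc[[0, 2]]  # list of positions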
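
A short sketch contrasting label-based and position-based selection on the same Series, assuming the explicit .loc and .iloc accessors are used. The result names are illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(10, 13), index=["a", "b", "c"])

by_label = s.loc["b"]          # select by index label
by_position = s.iloc[1]        # select by integer position
label_slice = s.loc["a":"c"]   # label slices include the end label
position_slice = s.iloc[1:3]   # position slices exclude the end position

print(len(label_slice), len(position_slice))
```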

Term

boolean indexing a Series

arithmetic operation

applying math function

Definition

my_series = pd.Series(np.arange(10, 13), index=['a', 'b', 'c'])


my_series[my_series > 10]

my_series * 2

np.exp(my_series)

Term
check if a pd data structure has NaN
Definition

pd.isnull(my_pd_data_struct)

pd.notnull(my_pd_data_struct)

Term
name attribute of a Series
Definition

my_series.name

my_series.index.name

Term
arithmetic operations on Series
Definition

A critical Series feature is that it automatically aligns differently indexed data in arithmetic operations.


series_1

a    10

b    11

c    12

series_2

c    1

a    2

k    3

series_1 + series_2

a    12.0

b     NaN

c    13.0

k     NaN
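
The alignment shown in this card can be reproduced with a small sketch. The values mirror series_1 and series_2 from the card; total is an illustrative name.

```python
import pandas as pd

series_1 = pd.Series([10, 11, 12], index=["a", "b", "c"])
series_2 = pd.Series([1, 2, 3], index=["c", "a", "k"])

# Values are matched by label, not by position.
# Labels present in only one operand ('b' and 'k') become NaN.
total = series_1 + series_2

print(total)
```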

 
 
Term
Think of a Series as ...
Definition
A fixed length ordered dictionary
Term
change the index of a Series
Definition
my_series.index = my_list
Term
Think of a DataFrame as ...
Definition

map of lists, tuples, 1darrays or series

map_1 = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],'year': [2000, 2001, 2002, 2001, 2002], 'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}

df_1 = pd.DataFrame(map_1)

 

map of maps

map_2 = {'Nevada': {2001: 2.4, 2002: 2.9}, 'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}

df_2 = pd.DataFrame(map_2)

Term
Change the column names and indices of a DataFrame
Definition

my_df.index = list_1

my_df.columns = list_2

Term
Indexing from DataFrame
Definition

my_df.col_name

my_df['col_name']

my_df[['col_1', 'col_2']]

 

The column returned when indexing a DataFrame may be a view on the underlying data rather than a copy. Call .copy() on it when you need an independent Series.


df.loc[val]  # select a single row by label

df.loc[:, val]  # select a single column by label

 

df.loc[val1, val2]  # select both rows and columns

 
Term
Assign a single value or a sequence to a column in DataFrame
Definition

my_df['col_name'] = my_value

my_df['col_name'] = my_sequence

Term
What is the difference between assigning a list/1darray and a series to a column in a DataFrame
Definition

When assigning a list or array to a column, its length must match the length of the DataFrame. If you assign a Series, it is instead conformed exactly to the DataFrame's index, inserting missing values in any holes.
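
A minimal sketch of the difference, using a made-up three-row frame. The column names and the length_error variable are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]}, index=["a", "b", "c"])

# A list is assigned by position, so its length must equal len(df).
df["from_list"] = [10, 20, 30]

# A Series is aligned on the index; the missing label 'b' becomes NaN.
df["from_series"] = pd.Series([100, 300], index=["a", "c"])

# A list of the wrong length is rejected.
length_error = None
try:
    df["too_short"] = [1, 2]
except ValueError as err:
    length_error = err

print(df)
```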

Term
Transpose a DataFrame
Definition
my_df.T
Term
DataFrame's index and columns can have name attribute
Definition

df_1

      Nevada  Ohio

2000     NaN   1.5

2001     2.4   1.7

2002     2.9   3.6


df_1.index.name = 'year'

df_1.columns.name = 'state'


df_1

state  Nevada  Ohio

year               

2000      NaN   1.5

2001      2.4   1.7

2002      2.9   3.6

 
Term
Convert values in a DataFrame into a 2darray
Definition
my_df.values
Term
Recollect everything about index objects
Definition

Any array or other sequence of labels used when constructing a Series or DataFrame is internally converted to an Index object.


Index objects are immutable


In addition to being array-like, an Index also functions as a fixed size set
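
The three points above can be checked with a short sketch. The names labels, merged and common are illustrative.

```python
import pandas as pd

s = pd.Series([1, 2, 3], index=["a", "b", "c"])
labels = s.index  # the list of labels was converted to an Index object

# Index objects are immutable: item assignment raises TypeError.
mutation_error = None
try:
    labels[0] = "z"
except TypeError as err:
    mutation_error = err

# Set-like behaviour: membership tests, union and intersection.
has_b = "b" in labels
merged = labels.union(pd.Index(["c", "d"]))
common = labels.intersection(pd.Index(["b", "c", "x"]))

print(merged.tolist(), common.tolist())
```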

Term

Which pandas method creates a new object

with the data conformed to a new index.

Definition
reindex()
Term
reindex used on Series
Definition

obj = Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])


obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])
In [82]: obj2
Out[82]:
a -5.3
b 7.2
c 3.6
d 4.5
e NaN

obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0)
 
 

Term
reindex used on a DataFrame
Definition

frame = DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'c', 'd'], columns=['Ohio', 'Texas', 'California'])


 

frame2 = frame.reindex(['a', 'b', 'c', 'd'])


 

states = ['Texas', 'Utah', 'California']


 

frame.reindex(columns=states)


 

frame.reindex(index=['a', 'b', 'c', 'd'], columns=states)

 
 
 
Term
Remove values from a Series
Definition

obj = Series(np.arange(5.), index=['a', 'b', 'c', 'd', 'e'])

obj.drop('c')

obj.drop(['d', 'c'])

 
 

 

Term
Remove rows or columns from a DataFrame
Definition

df = DataFrame(np.arange(16).reshape((4, 4)), index=['Ohio', 'Colorado', 'Utah', 'New York'], columns=['one', 'two', 'three', 'four'])


 

df.drop(['Colorado', 'Ohio'])

 

df.drop(['two', 'four'], axis=1)


 




 
Term

s1 = Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd','e'])

s2 = Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])


What is s1 + s2

Definition

In [130]: s1 + s2

Out[130]:

a    5.2
c    1.1
d    NaN
e    0.0
f    NaN
g    NaN


When adding objects whose indexes differ, the index of the result is the union of the two indexes. The internal data alignment introduces NA values at the labels that do not overlap.

Term

Arithmetic and data alignment for DataFrames ...

 

What happens when two DataFrames with non-identical index/columns are added

Definition

Alignment is performed on both the rows and the columns. The index and columns of the result are the unions of those of the two DataFrames, and NaN appears wherever a row or column label is missing from either one.


df1 + df2
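
For two DataFrames, alignment happens on rows and columns at once. A minimal sketch, where the frames df1 and df2 and their labels are made up for illustration:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(9.0).reshape((3, 3)),
                   columns=list("bcd"),
                   index=["Ohio", "Texas", "Colorado"])
df2 = pd.DataFrame(np.arange(12.0).reshape((4, 3)),
                   columns=list("bde"),
                   index=["Utah", "Ohio", "Texas", "Oregon"])

# The result has the union of both row labels and both column labels.
# A cell is NaN unless its row and column exist in both frames.
total = df1 + df2

print(total)
```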

 
Term
How to replace NaN values created when adding two data_frames of non-identical structure, with some_value?
Definition

df1.add(df2, fill_value=0)


add, sub, div, mul are the methods where fill_value can be used
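
A small sketch of how fill_value behaves, using two made-up frames. Note that a cell missing from both operands still comes out as NaN.

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1.0, 2.0]}, index=["x", "y"])
df2 = pd.DataFrame({"a": [10.0], "b": [20.0]}, index=["y"])

# fill_value stands in for a value that is missing from one operand only.
filled = df1.add(df2, fill_value=0)

print(filled)
```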


Term

frame = DataFrame(np.arange(12.).reshape((4, 3)),  columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])


series = frame.iloc[0]

 

series2 = Series(range(3), index=['b', 'e', 'f'])


What is the output of:

frame - series

frame - series2



 
Definition

By default, arithmetic between DataFrame and Series matches the index of the Series on the DataFrame's columns, broadcasting down the rows.


If an index value is not found in either the DataFrame’s columns or the Series’s index, the objects will be reindexed to form the union
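
Both subtractions from the question can be sketched as follows. The result names by_row and partial are illustrative.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.0).reshape((4, 3)),
                     columns=list("bde"),
                     index=["Utah", "Ohio", "Texas", "Oregon"])
series = frame.iloc[0]                              # first row; labels b, d, e
series2 = pd.Series(range(3), index=["b", "e", "f"])

# The Series labels match the columns; the subtraction repeats for every row.
by_row = frame - series

# Labels found on only one side ('d' and 'f') give all-NaN columns.
partial = frame - series2

print(by_row)
print(partial)
```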



 



Term
In operations between a DataFrame and Series, how do we make the Series broadcast over columns instead of rows?
Definition

frame = DataFrame(np.arange(12.).reshape((4, 3)),  columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])


series3 = frame['d']


 

frame.sub(series3, axis=0)
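
A runnable sketch of the same call, showing that axis=0 matches the Series on the row labels. The name centered is illustrative.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.0).reshape((4, 3)),
                     columns=list("bde"),
                     index=["Utah", "Ohio", "Texas", "Oregon"])
series3 = frame["d"]

# axis=0 aligns series3 with the row index, so column 'd' is
# subtracted from every column of the frame.
centered = frame.sub(series3, axis=0)

print(centered)
```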

Term

How do numpy ufuncs work on pandas objects?


How does a function that takes 1darray as input work on DataFrame?


How does a function (call it func_2) that takes single_element as input work on DataFrame?


How does func_2 work on a Series?


Definition

np.abs(my_df)

np.abs(my_series)


func_1  # some function that takes 1darray as input

my_df.apply(func_1)  # default is axis=0

my_df.apply(func_1, axis=1)


func_2  # some function that takes one_element as input

 

my_df.applymap(func_2)  # named my_df.map(func_2) in pandas 2.1 and later


my_series.map(func_2)
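
A minimal sketch of the column-wise, row-wise and element-wise cases. value_range and fmt are hypothetical stand-ins for func_1 and func_2. The element-wise DataFrame case is written with apply plus Series.map, an assumption chosen so that it does not depend on which pandas version provides applymap.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, -2.0, 3.0], "b": [-4.0, 5.0, -6.0]})

def value_range(values):
    """Takes a 1-D Series (one column or one row) and returns a scalar."""
    return values.max() - values.min()

def fmt(x):
    """Takes a single element and returns a formatted string."""
    return f"{x:.1f}"

col_ranges = df.apply(value_range)            # once per column (axis=0)
row_ranges = df.apply(value_range, axis=1)    # once per row
formatted = df["a"].map(fmt)                  # element-wise on a Series
formatted_df = df.apply(lambda column: column.map(fmt))  # element-wise on a frame

print(col_ranges)
print(formatted_df)
```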



Term

How to sort indices of a Series?

 

How to sort indices of a DataFrame along both axes?

Definition

my_series.sort_index()

 

my_df.sort_index(axis=0)

my_df.sort_index(axis=1)

Term

How to sort values in a series?

 

What happens to NaN values?

Definition

my_series.sort_values()


NaN values are placed at the end of the Series by default.
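
A short sketch of where missing values end up when sorting, with made-up data. na_position is the sort_values parameter that controls this.

```python
import numpy as np
import pandas as pd

s = pd.Series([4, np.nan, 7, np.nan, -3, 2])

ascending = s.sort_values()                     # NaN values go last by default
nan_first = s.sort_values(na_position="first")  # or ask for them first

print(ascending.tolist())
```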

Term
How do I sort the rows of a DataFrame according to values in a particular column?
Definition

frame = DataFrame({'b': [4, 7, -3, 2], 'a': [0, 1, 0, 1]})


 

frame.sort_values(by='b')

Term

Is it mandatory to have unique indices for pandas objects?


How do we know if an object has a unique index?


What is the output when indexing a pandas object with a non-unique index?

Definition

No.


my_series.index.is_unique


obj = Series(range(5), index=['a', 'a', 'b', 'b', 'c'])

 

obj['a']
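
A sketch of what the selection returns, using .loc for label-based access. The result depends on whether the label is duplicated.

```python
import pandas as pd

obj = pd.Series(range(5), index=["a", "a", "b", "b", "c"])

unique = obj.index.is_unique   # False, because 'a' and 'b' repeat
repeated = obj.loc["a"]        # duplicated label: returns a Series
single = obj.loc["c"]          # unique label: returns a scalar

print(unique, repeated.tolist(), single)
```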

 

Term
What does my_df.describe() do?
Definition
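It returns summary statistics for each numeric column: count, mean, std, min, the 25%, 50% and 75% quantiles, and max. Try it.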
Term

How do methods of pandas data-structures deal with missing values?


Methods to check if a pandas object has missing values?


How do I remove NaN from a Series?


How do I remove rows from a DataFrame if they have even a single NaN?


How to remove rows which only have NaN values?


How do I do the above two with columns of DF?


How do I remove rows or columns from a DF that have less than a certain no. of non-NA elements?

Definition

Descriptive statistics methods such as sum() and mean() skip NA values by default.


my_series.isnull()  or my_series.notnull()

my_df.isnull()


my_series.dropna()


my_df.dropna()


my_df.dropna(how='all')


my_df.dropna(axis=1);  my_df.dropna(how='all', axis=1)


my_df.dropna(thresh=some_int)
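
The dropna variants above can be compared on a small made-up frame. The result names are illustrative.

```python
import numpy as np
import pandas as pd

nan = np.nan
df = pd.DataFrame([[1.0, 6.5, 3.0],
                   [1.0, nan, nan],
                   [nan, nan, nan],
                   [nan, 6.5, 3.0]])

complete_rows = df.dropna()              # drop rows with any NaN
not_all_nan_rows = df.dropna(how="all")  # drop only rows that are entirely NaN
at_least_two = df.dropna(thresh=2)       # keep rows with at least 2 non-NA values

# The same options apply to columns with axis=1.
df["empty"] = nan
without_empty_cols = df.dropna(axis=1, how="all")

print(complete_rows.index.tolist())
```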

Term

How to fill NA with a certain value?


Does the above operation make changes to the object or create a new object?

Definition

my_df.fillna(my_value)


fillna() returns a new object and leaves the original unchanged. To modify the existing object instead, pass inplace=True:


my_df.fillna(some_value, inplace=True)
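
A minimal sketch showing that the default call leaves the original untouched. It also shows that a dict can supply a different fill value per column. The frame and the fill values are made up.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, np.nan], "b": [np.nan, 4.0]})

filled = df.fillna(0)                            # new object; df still has its NaNs
per_column = df.fillna({"a": -1.0, "b": 99.0})   # one fill value per column

print(filled)
print(df.isna().sum().sum())
```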

Term

Hierarchical Indexing

 


data = Series(np.random.randn(10), index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd', 'd'], [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]])


frame = DataFrame(np.arange(12).reshape((4, 3)), index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]], columns=[['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']])

 
 
Definition

data.index  # returns a MultiIndex, which behaves like a sequence of 2-element tuples


 

data['b']; data['b':'c']; data.loc[['b', 'd']];  data.loc[:, 2]


 

data.unstack()  # creates a DataFrame from a multi-index Series


 

data.unstack().stack()  # gets back the Series


frame.index.names = ['key1', 'key2']

 

frame.columns.names = ['state', 'color']


# create a multi-index

MultiIndex.from_arrays([['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']],names=['state', 'color'])
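
A short sketch of selection on a two-level index. It uses np.arange instead of random values so the results are predictable, which is an assumption made for this example only.

```python
import numpy as np
import pandas as pd

data = pd.Series(np.arange(10.0),
                 index=[["a", "a", "a", "b", "b", "b", "c", "c", "d", "d"],
                        [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]])

outer_b = data.loc["b"]    # every row whose outer label is 'b'
inner_2 = data.loc[:, 2]   # every row whose inner label is 2
wide = data.unstack()      # inner level becomes the columns

print(type(data.index).__name__)
print(wide)
```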

Term
How do I add a scalar or another Series to a Series element-wise?
Definition

my_series.add(my_data, fill_value=0)

# my_data could be a scalar value or another series

Term
How to produce a histogram of unique values in a series or column-of-DataFrame?
Definition

my_series.value_counts()
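
A minimal sketch with made-up data. value_counts returns a Series whose index holds the unique values and whose values hold the counts.

```python
import pandas as pd

s = pd.Series(["c", "a", "d", "a", "a", "b", "b", "c", "c"])

counts = s.value_counts()   # sorted by count, largest first

print(counts)
```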

