Term
|
Definition
my_series = pd.Series(data, index=my_index)
data: 1darray, list, tuple, or dict
my_index: 1darray, tuple, or list |
|
|
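A minimal sketch of the constructor above, assuming pandas and NumPy are installed. The values and the labels 'a', 'b', 'c' are illustrative and not taken from the card.

```python
import numpy as np
import pandas as pd

# The same three values built from a list, a 1-D array and a dict.
from_list = pd.Series([10, 11, 12], index=['a', 'b', 'c'])
from_array = pd.Series(np.arange(10, 13), index=('a', 'b', 'c'))
from_dict = pd.Series({'a': 10, 'b': 11, 'c': 12})  # dict keys become the index

print(from_dict)
```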
Term
print all contents of a Series
print all the indices of a Series |
|
Definition
my_series.values
my_series.index |
|
|
Term
|
Definition
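my_series = pd.Series(np.arange(10, 13), index=['a', 'b', 'c'])
my_series.iloc[1] # by position
my_series['b'] # by label
my_series[['c','b']]
my_series['a': 'c'] # label slice, end point included
my_series.iloc[1: 3] # positional slice, end point excluded
my_series.iloc[[0, 2]] |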
|
|
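The difference between label-based and position-based selection can be sketched as follows. This assumes a recent pandas release, where `.loc` and `.iloc` are the explicit accessors; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

my_series = pd.Series(np.arange(10, 13), index=['a', 'b', 'c'])

by_label = my_series.loc['b']           # select by index label
by_position = my_series.iloc[1]         # select by integer position
label_slice = my_series.loc['a':'c']    # label slices include the end point
position_slice = my_series.iloc[1:3]    # positional slices exclude the end point
picked = my_series.iloc[[0, 2]]         # a list of positions
```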
Term
boolean indexing a Series
arithmetic operation
applying math function |
|
Definition
my_series = pd.Series(np.arange(10, 13), index=['a', 'b', 'c'])
my_series[my_series > 10]
my_series * 2
np.exp(my_series) |
|
|
Term
check if a pd data structure has NaN |
|
Definition
pd.isnull(my_pd_data_struct)
pd.notnull(my_pd_data_struct) |
|
|
Term
name attribute of a Series |
|
Definition
my_series.name
my_series.index.name |
|
|
Term
arithmetic operations of Series |
|
Definition
A critical Series feature is that it automatically aligns differently indexed data in arithmetic operations.
series_1
a 10
b 11
c 12
series_2
c 1
a 2
k 3
series_1 + series_2
a 12.0
b NaN
c 13.0
k NaN
|
|
|
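The alignment shown above can be reproduced with a short sketch. The `add(..., fill_value=0)` line is an extra illustration of how to avoid the NaN results and is not part of the card.

```python
import pandas as pd

series_1 = pd.Series([10, 11, 12], index=['a', 'b', 'c'])
series_2 = pd.Series([1, 2, 3], index=['c', 'a', 'k'])

# Values are matched by label, not by position.
total = series_1 + series_2

# Treat a label missing on one side as 0 instead of producing NaN.
filled = series_1.add(series_2, fill_value=0)

print(total)
```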
Term
|
Definition
A fixed length ordered dictionary |
|
|
Term
change the index of a Series |
|
Definition
my_series.index = my_list |
|
|
Term
Think of a DataFrame as ... |
|
Definition
dict of lists, tuples, 1darrays or Series
map_1 = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],'year': [2000, 2001, 2002, 2001, 2002], 'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
df_1 = DataFrame(map_1)
dict of dicts
map_2 = {'Nevada': {2001: 2.4, 2002: 2.9}, 'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
df_2 = DataFrame(map_2) |
|
|
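A runnable sketch of both construction styles, assuming `import pandas as pd`. The first dict is a shortened version of the card's data.

```python
import pandas as pd

# Dict of lists: keys become columns, rows get a default integer index.
map_1 = {'state': ['Ohio', 'Ohio', 'Nevada'],
         'year': [2000, 2001, 2001],
         'pop': [1.5, 1.7, 2.4]}
df_1 = pd.DataFrame(map_1)

# Dict of dicts: outer keys become columns, inner keys become row labels.
map_2 = {'Nevada': {2001: 2.4, 2002: 2.9},
         'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}
df_2 = pd.DataFrame(map_2)

# Nevada has no entry for 2000, so that cell is NaN.
print(df_2)
```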
Term
Change the column names and indices of a DataFrame |
|
Definition
my_df.index = list_1
my_df.columns = list_2 |
|
|
Term
|
Definition
my_df.col_name
my_df['col_name']
my_df[['col_1', 'col_2']]
The column returned when indexing a DataFrame is a view on the underlying data, not a copy.
df.loc[val] # select single row by label (.ix was removed in pandas 1.0; use .iloc for positions)
df.loc[:, val] # select single col
df.loc[val1, val2] # select both rows and cols
|
|
|
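Row and column selection can be sketched with the `.loc` (label) and `.iloc` (position) accessors. The frame and its labels 'r1', 'col_1', and so on are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape((3, 2)),
                  index=['r1', 'r2', 'r3'], columns=['col_1', 'col_2'])

row = df.loc['r2']              # single row by label, returned as a Series
col = df.loc[:, 'col_2']        # single column by label
cell = df.loc['r3', 'col_1']    # one row label and one column label
first_row = df.iloc[0]          # single row by integer position
two_cols = df[['col_1', 'col_2']]  # a list of column names returns a DataFrame
```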
Term
Assign a single value or a sequence to a column in DataFrame |
|
Definition
my_df['col_name'] = my_value
my_df['col_name'] = my_sequence |
|
|
Term
What is the difference between assigning a list/1darray and a series to a column in a DataFrame |
|
Definition
When assigning lists or arrays to a column, the value’s length must match the length of the DataFrame. If you assign a Series, it is instead conformed exactly to the DataFrame’s index, with missing values inserted in any holes. |
|
|
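The contrast between the two kinds of assignment can be illustrated with a small sketch. The frame and the column names are hypothetical.

```python
import pandas as pd

my_df = pd.DataFrame({'x': [1, 2, 3]}, index=['a', 'b', 'c'])

# A list is assigned by position, so its length must equal len(my_df).
my_df['from_list'] = [10, 20, 30]

# A Series is aligned on the index; the missing label 'b' becomes NaN.
my_df['from_series'] = pd.Series([1.5, 2.5], index=['c', 'a'])

# A list of the wrong length is rejected.
try:
    my_df['too_short'] = [1, 2]
except ValueError as err:
    print('length mismatch:', err)
```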
Term
|
Definition
|
|
Term
DataFrame's index and columns can have name attribute |
|
Definition
df_1
Nevada Ohio
2000 NaN 1.5
2001 2.4 1.7
2002 2.9 3.6
df_1.index.name = 'year'
df_1.columns.name = 'state'
df_1
state Nevada Ohio
year
2000 NaN 1.5
2001 2.4 1.7
2002 2.9 3.6
|
|
|
Term
Convert values in a DataFrame into a 2darray |
|
Definition
my_df.values |
|
Term
Recollect everything about index objects |
|
Definition
Any array or other sequence of labels used when constructing a Series or DataFrame is internally converted to an Index object.
Index objects are immutable
In addition to being array-like, an Index also functions as a fixed size set |
|
|
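A short sketch of the two properties listed above, immutability and set-like behaviour. The labels are illustrative.

```python
import pandas as pd

my_series = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
idx = my_series.index

# Item assignment on an Index is not allowed.
try:
    idx[0] = 'z'
except TypeError as err:
    print('immutable:', err)

has_b = 'b' in idx                       # set-like membership test
both = idx.union(pd.Index(['c', 'd']))   # set-like union of labels
```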
Term
Which pandas method creates a new object with the data conformed to a new index? |
|
Definition
reindex |
|
Term
|
Definition
obj = Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])
obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])
In [82]: obj2
Out[82]:
a -5.3
b 7.2
c 3.6
d 4.5
e NaN
obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0)
|
|
|
Term
reindex used on a DataFrame |
|
Definition
frame = DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'c', 'd'], columns=['Ohio', 'Texas', 'California'])
frame2 = frame.reindex(['a', 'b', 'c', 'd'])
states = ['Texas', 'Utah', 'California']
frame.reindex(columns=states)
frame.reindex(index=['a', 'b', 'c', 'd'], columns=states)
|
|
|
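The three reindexing calls can be run as the sketch below, assuming `import numpy as np` and `import pandas as pd`. Labels that are new to the frame ('b' and 'Utah') are filled with NaN.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(9).reshape((3, 3)),
                     index=['a', 'c', 'd'],
                     columns=['Ohio', 'Texas', 'California'])
states = ['Texas', 'Utah', 'California']

frame2 = frame.reindex(['a', 'b', 'c', 'd'])    # rows only
by_cols = frame.reindex(columns=states)         # columns only
both = frame.reindex(index=['a', 'b', 'c', 'd'], columns=states)  # rows and columns
```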
Term
Remove values from a Series |
|
Definition
obj = Series(np.arange(5.), index=['a', 'b', 'c', 'd', 'e'])
obj.drop('c')
obj.drop(['d', 'c'])
|
|
|
Term
|
Definition
df = DataFrame(np.arange(16).reshape((4, 4)), index=['Ohio', 'Colorado', 'Utah', 'New York'], columns=['one', 'two', 'three', 'four'])
df.drop(['Colorado', 'Ohio'])
df.drop(['two', 'four'], axis=1)
|
|
|
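A runnable version of the two calls above. Note that `drop` returns a new object and leaves the original frame unchanged.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape((4, 4)),
                  index=['Ohio', 'Colorado', 'Utah', 'New York'],
                  columns=['one', 'two', 'three', 'four'])

rows_dropped = df.drop(['Colorado', 'Ohio'])       # axis=0 is the default
cols_dropped = df.drop(['two', 'four'], axis=1)    # drop column labels
```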
Term
s1 = Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd','e'])
s2 = Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])
What is s1 + s2 |
|
Definition
In [130]: s1 + s2
Out[130]:
a 5.2
c 1.1
d NaN
e 0.0
f NaN
g NaN
When objects are added, the index of the result is the union of the two indices. Labels that appear in only one of the objects get NA values in the result.
|
|
|
Term
Arithmetic and data alignment for DataFrames ...
What happens when two DataFrames with non-identical index/columns are added |
|
Definition
Alignment is performed on both the rows and the columns. The index and columns of the result are the unions of those of the two DataFrames, and any position not present in both is NaN.
df1 + df2
|
|
|
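A small sketch of DataFrame alignment, using two made-up frames that overlap only at row 'y' and column 'b'. The `fill_value` variant is included for comparison.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(4.).reshape((2, 2)),
                   index=['x', 'y'], columns=['a', 'b'])
df2 = pd.DataFrame(np.ones((2, 2)),
                   index=['y', 'z'], columns=['b', 'c'])

# Only the cell present in both frames, ('y', 'b'), gets a value.
total = df1 + df2

# With fill_value, a cell missing from one frame is treated as 0;
# a cell missing from both frames is still NaN.
filled = df1.add(df2, fill_value=0)
```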
Term
How to replace NaN values created when adding two data_frames of non-identical structure, with some_value? |
|
Definition
df1.add(df2, fill_value=0)
add, sub, div, mul are the methods where fill_value can be used
|
|
|
Term
frame = DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])
series = frame.iloc[0]
series2 = Series(range(3), index=['b', 'e', 'f'])
What is the output of:
frame - series
frame - series2
|
|
Definition
By default, arithmetic between DataFrame and Series matches the index of the Series on the DataFrame's columns, broadcasting down the rows.
If an index value is not found in either the DataFrame’s columns or the Series’s index, the objects will be reindexed to form the union
|
|
|
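The two subtractions can be checked with the sketch below, which rebuilds the objects from the card using `iloc` for the positional row lookup.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.).reshape((4, 3)),
                     columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])
series = frame.iloc[0]                              # first row, indexed by b, d, e
series2 = pd.Series(range(3), index=['b', 'e', 'f'])

# The Series index is matched to the columns and subtracted from every row.
diff = frame - series

# Columns become the union b, d, e, f; 'd' and 'f' have no partner, so they are NaN.
diff2 = frame - series2
```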
Term
In operations between a DataFrame and Series, how do we make the Series broadcast over columns instead of rows? |
|
Definition
frame = DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])
series3 = frame['d']
frame.sub(series3, axis=0) |
|
|
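A runnable version of the call above. With `axis=0` the Series is matched on the row labels, so each column has `series3` subtracted from it.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.).reshape((4, 3)),
                     columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])
series3 = frame['d']

# Match on the row index and broadcast across the columns.
result = frame.sub(series3, axis=0)
```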
Term
How do numpy ufuncs work on pandas objects?
How does a function that takes 1darray as input work on DataFrame?
How does a function (call it func_2) that takes single_element as input work on DataFrame?
How does func_2 work on a Series?
|
|
Definition
np.abs(my_df)
np.abs(my_series)
func_1 # some function that takes 1darray as input
my_df.apply(func_1) # default is axis=0
my_df.apply(func_1, axis=1)
func_2 # some function that takes one_element as input
my_df.applymap(func_2) # renamed to my_df.map(func_2) in pandas 2.1
my_series.map(func_2)
|
|
|
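A hedged sketch of the calls above. The helpers `value_range` and `as_text` are hypothetical stand-ins for func_1 and func_2, and the `hasattr` check is one way to cope with the rename of `applymap` to `DataFrame.map` in pandas 2.1.

```python
import numpy as np
import pandas as pd

my_df = pd.DataFrame({'x': [1.0, -2.0, 3.0], 'y': [-4.0, 5.0, -6.0]})

def value_range(values):
    """Stand-in for func_1: takes a 1-D column or row."""
    return values.max() - values.min()

def as_text(value):
    """Stand-in for func_2: takes a single element."""
    return '%.2f' % value

absolute = np.abs(my_df)                      # ufuncs work element-wise
per_column = my_df.apply(value_range)         # axis=0: one result per column
per_row = my_df.apply(value_range, axis=1)    # one result per row
formatted = my_df['x'].map(as_text)           # element-wise on a Series

# Element-wise on a DataFrame: use whichever method this pandas version has.
elementwise = my_df.map if hasattr(my_df, 'map') else my_df.applymap
formatted_df = elementwise(as_text)
```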
Term
How to sort indices of a Series?
How to sort indices of a DataFrame along both axes? |
|
Definition
my_series.sort_index()
my_df.sort_index(axis=0)
my_df.sort_index(axis=1) |
|
|
Term
How to sort values in a series?
What happens to NaN values? |
|
Definition
my_series.sort_values() # order() was removed from pandas
NaN values are placed at the end of the Series |
|
|
Term
How do I sort the rows of a DataFrame according to values in a particular column? |
|
Definition
frame = DataFrame({'b': [4, 7, -3, 2], 'a': [0, 1, 0, 1]})
frame.sort_values(by='b') |
|
|
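Sorting by values can be sketched with `sort_values`, which works on both a Series and a DataFrame in current pandas. The two-column sort and the Series containing NaN are added for illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({'b': [4, 7, -3, 2], 'a': [0, 1, 0, 1]})

by_b = frame.sort_values(by='b')                  # rows ordered by column 'b'
by_a_then_b = frame.sort_values(by=['a', 'b'])    # ties in 'a' broken by 'b'

my_series = pd.Series([4, np.nan, 7, -3])
ordered = my_series.sort_values()                 # NaN goes last by default
```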
Term
Is it mandatory to have unique indices for pandas objects?
How do we know if an object has a unique index?
What is the output when indexing a pandas object with a non-unique index? |
|
Definition
No.
my_series.index.is_unique
obj = Series(range(5), index=['a', 'a', 'b', 'b', 'c'])
obj['a'] # returns a Series with both 'a' entries; a label that occurs once returns a scalar
|
|
|
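The behaviour with duplicate labels can be checked with this short sketch, which reuses the object from the card.

```python
import pandas as pd

obj = pd.Series(range(5), index=['a', 'a', 'b', 'b', 'c'])

unique = obj.index.is_unique   # False, because 'a' and 'b' repeat
dupes = obj['a']               # duplicated label: a Series of all matches
single = obj['c']              # label that occurs once: a scalar
```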
Term
What does my_df.describe() do? |
|
Definition
Produces summary statistics (count, mean, std, min, quartiles, max) for each numeric column. |
|
Term
How do methods of pandas data-structures deal with missing values?
Methods to check if a pandas object has missing values?
How do I remove NaN from a Series?
How do I remove rows from a DataFrame if they contain even a single NaN?
How do I remove rows that contain only NaN values?
How do I do the above two with the columns of a DataFrame?
How do I remove rows or columns from a DataFrame that have fewer than a certain number of non-NA elements? |
|
Definition
NA values are ignored by methods for computation.
my_series.isnull() or my_series.notnull()
my_df.isnull()
my_series.dropna()
my_df.dropna()
my_df.dropna(how='all')
my_df.dropna(axis=1); my_df.dropna(how='all', axis=1)
my_df.dropna(thresh=some_int) |
|
|
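The `dropna` variants can be compared on a small frame. The data below is made up so that each option keeps a different set of rows.

```python
import numpy as np
import pandas as pd

my_df = pd.DataFrame([[1.0, 6.5, 3.0],
                      [1.0, np.nan, np.nan],
                      [np.nan, np.nan, np.nan],
                      [np.nan, 6.5, 3.0]])

complete_rows = my_df.dropna()             # drop rows with any NaN
not_all_nan = my_df.dropna(how='all')      # drop rows that are entirely NaN
at_least_two = my_df.dropna(thresh=2)      # keep rows with at least 2 non-NA values
complete_cols = my_df.dropna(axis=1)       # drop columns with any NaN
```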
Term
How to fill NA with a certain value?
Does the above operation make changes to the object or create a new object? |
|
Definition
my_df.fillna(my_value)
fillna() creates a new object. To make changes to the object without creating a new object do the following:
_ = my_df.fillna(some_value, inplace=True) |
|
|
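A minimal sketch of the copy versus in-place behaviour described above. The frame and the fill value 0 are illustrative.

```python
import numpy as np
import pandas as pd

my_df = pd.DataFrame({'x': [1.0, np.nan], 'y': [np.nan, 4.0]})

filled = my_df.fillna(0)                          # returns a new object
missing_before = int(my_df.isna().sum().sum())    # original still has its NaNs

my_df.fillna(0, inplace=True)                     # modifies my_df itself
missing_after = int(my_df.isna().sum().sum())
```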
Term
Hierarchical Indexing
data = Series(np.random.randn(10), index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd', 'd'], [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]])
frame = DataFrame(np.arange(12).reshape((4, 3)), index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]], columns=[['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']])
|
|
Definition
data.index # a MultiIndex; each entry is a 2-element tuple
data['b']; data['b':'c']; data.loc[['b', 'd']]; data[:, 2]
data.unstack() # creates a DataFrame from a multi-index Series
data.unstack().stack() # gets back the Series
frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']
# create a multi-index
MultiIndex.from_arrays([['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']],names=['state', 'color']) |
|
|
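Partial indexing and `unstack` can be tried with the sketch below. It uses `np.arange` instead of random values so the results are predictable, and calls `dropna()` after `stack()` because whether `stack()` itself drops the NaN placeholders depends on the pandas version.

```python
import numpy as np
import pandas as pd

data = pd.Series(np.arange(10.),
                 index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd', 'd'],
                        [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]])

outer = data.loc['b']       # all entries under outer label 'b'
inner = data.loc[:, 2]      # entries with inner label 2, across outer labels
wide = data.unstack()       # inner level becomes the columns; gaps are NaN
round_trip = wide.stack().dropna()   # back to a Series with a two-level index
```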
Term
How do I add elements to a Series |
|
Definition
my_series.add(my_data, fill_value=0)
# element-wise addition; my_data can be a scalar value or another Series |
|
|
Term
How to produce a histogram of unique values in a series or column-of-DataFrame? |
|
Definition
my_series.value_counts()
my_df['col_name'].value_counts() |
|
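One way to tabulate how often each unique value occurs is `value_counts()`, sketched here on made-up data. The result is a Series indexed by the unique values, with the most frequent first.

```python
import pandas as pd

my_series = pd.Series(['c', 'a', 'd', 'a', 'a', 'b', 'b', 'c', 'c'])

counts = my_series.value_counts()   # unique value -> number of occurrences
print(counts)
```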