Shared Flashcard Set

Details

Title

Pandas

Description

using pandas data structures

Total Cards

Subject

Computer Science

Level

Professional

Created

06/21/2016


Term

create a series

Definition

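import numpy as np

import pandas as pd

from pandas import Series, DataFrame  # several later cards use Series/DataFrame unqualified

my_series = pd.Series(data, index=my_index)

data: 1-D array, list, tuple, dict (map), or scalar

my_index (optional): 1-D array, list, or tuple of labels; defaults to 0, 1, ..., n-1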
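A minimal runnable sketch of the constructor with a few kinds of `data`; the labels and numbers are made up for illustration.

```python
import numpy as np
import pandas as pd

# From a list: pandas assigns a default integer index 0..n-1
s_list = pd.Series([4, 7, -5])

# From a 1-D array with explicit index labels
s_arr = pd.Series(np.array([1.5, 2.5]), index=['x', 'y'])

# From a dict (map): the keys become the index labels
s_dict = pd.Series({'Ohio': 35000, 'Texas': 71000})

# From a dict plus an explicit index: labels missing from the dict get NaN
s_subset = pd.Series({'Ohio': 35000, 'Texas': 71000}, index=['Texas', 'Utah'])

print(s_list.index.tolist())  # [0, 1, 2]
print(s_subset)
```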

Term

get all values of a Series

get the index (labels) of a Series

Definition

my_series.values  # the values as a NumPy array

my_series.index  # an Index object holding the labels

Term

index a Series

Definition

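my_series = pd.Series(np.arange(10, 13), index=['a', 'b', 'c'])

my_series.iloc[1]  # by position -> 11 (plain my_series[1] is deprecated in recent pandas)

my_series['b']  # by label -> 11

my_series[['c', 'b']]  # list of labels

my_series['a':'c']  # label slice: the end label is included

my_series.iloc[1:3]  # positional slice: the end position is excluded

my_series.iloc[[0, 2]]  # list of positions (the Series has only positions 0-2)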
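A small sketch contrasting label-based and position-based access on the card's three-element Series:

```python
import numpy as np
import pandas as pd

my_series = pd.Series(np.arange(10, 13), index=['a', 'b', 'c'])

by_label = my_series['b']          # 11
by_position = my_series.iloc[1]    # 11
label_slice = my_series['a':'c']   # labels a, b, c (end label included)
pos_slice = my_series.iloc[0:2]    # positions 0, 1 (end position excluded)
picked = my_series[['c', 'b']]     # result follows the order of the list

print(label_slice.tolist(), pos_slice.tolist(), picked.tolist())
```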

Term

boolean indexing a Series

arithmetic operation

applying a math function

Definition

my_series = pd.Series(np.arange(10, 13), index=['a', 'b', 'c'])

my_series[my_series > 10]

my_series * 2

np.exp(my_series)

Term

check which values in a pandas object are missing (NaN)

Definition

pd.isnull(my_pd_data_struct)  # True where a value is missing

pd.notnull(my_pd_data_struct)  # True where a value is present

Term

name attributes of a Series and of its index

Definition

my_series.name  # name of the Series itself

my_series.index.name  # name of its index

Term

arithmetic operations on Series

Definition

A critical Series feature is that it automatically aligns differently indexed data in arithmetic operations.

series_1

a    10
b    11
c    12

series_2

c    1
a    2
k    3

series_1 + series_2

a    12.0
b     NaN
c    13.0
k     NaN
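The alignment in this card can be reproduced directly; the sketch checks only labels and values, not the printed row order:

```python
import pandas as pd

series_1 = pd.Series([10, 11, 12], index=['a', 'b', 'c'])
series_2 = pd.Series([1, 2, 3], index=['c', 'a', 'k'])

# Values are matched by label, not by position; labels present in only
# one operand ('b' and 'k') produce NaN
total = series_1 + series_2
print(total)
```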

Term

Think of a Series as ...

Definition

A fixed-length, ordered dictionary that maps index labels to values

Term

change the index of a Series

Definition

my_series.index = my_list  # my_list must have the same length as my_series

Term

Think of a DataFrame as ...

Definition

a dict (map) of equal-length lists, tuples, 1-D arrays, or Series, one entry per column

map_1 = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],'year': [2000, 2001, 2002, 2001, 2002], 'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}

df_1 = DataFrame(map_1)

a dict (map) of dicts: outer keys become the columns, inner keys become the row index

map_2 = {'Nevada': {2001: 2.4, 2002: 2.9}, 'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}

df_2 = DataFrame(map_2)
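A sketch of the dict-of-dicts constructor with the card's population data; the rows come from the union of the inner keys, so Nevada gets NaN for 2000:

```python
import pandas as pd

map_2 = {'Nevada': {2001: 2.4, 2002: 2.9},
         'Ohio': {2000: 1.5, 2001: 1.7, 2002: 3.6}}

# Outer keys -> columns, inner keys -> row index (union of all inner keys)
df_2 = pd.DataFrame(map_2)
print(df_2)
```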

Term

Change the column names and index labels of a DataFrame

Definition

my_df.index = list_1  # length must equal the number of rows

my_df.columns = list_2  # length must equal the number of columns

Term

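Indexing a DataFrame

Definition

my_df.col_name  # attribute access; only works if col_name is a valid Python identifier that does not clash with a DataFrame attribute

my_df['col_name']

my_df[['col_1', 'col_2']]

Depending on the pandas version, the column returned by my_df['col_name'] may be a view on the underlying data rather than a copy; call .copy() if you need an independent Series.

df.loc[row_label]  # select a single row by label (df.iloc[i] selects by position)

df.loc[:, col_label]  # select a single column

df.loc[row_labels, col_labels]  # select rows and columns together

(.ix, used in older pandas, was removed in pandas 1.0; use .loc for labels and .iloc for positions.)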
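A minimal sketch of label (`.loc`) and position (`.iloc`) selection; the frame `df` and its labels are hypothetical:

```python
import numpy as np
import pandas as pd

# Rows: r1 = [0, 1, 2], r2 = [3, 4, 5]
df = pd.DataFrame(np.arange(6).reshape((2, 3)),
                  index=['r1', 'r2'], columns=['x', 'y', 'z'])

row = df.loc['r2']          # one row by label -> Series
col = df.loc[:, 'y']        # one column by label -> Series
cell = df.loc['r1', 'z']    # a single value by row and column label
block = df.iloc[:, [0, 2]]  # columns chosen by position

col_copy = df['x'].copy()   # an explicit, independent copy
col_copy.iloc[0] = 100      # does not change df
```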

Term

Assign a single value or a sequence to a column in a DataFrame

Definition

my_df['col_name'] = my_value

my_df['col_name'] = my_sequence

Term

What is the difference between assigning a list/1-D array and assigning a Series to a column of a DataFrame?

Definition

When you assign a list or array to a column, its length must match the length of the DataFrame. When you assign a Series, it is instead aligned on the DataFrame's index: labels missing from the Series become NaN, and labels that are not in the DataFrame are dropped.
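A sketch of the two assignment behaviors on a hypothetical three-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]}, index=['one', 'two', 'three'])

# A list must have exactly len(df) elements
df['b'] = [10, 20, 30]

# A Series is aligned on labels: 'two' is missing, so it gets NaN,
# and the extra label 'four' is ignored
df['c'] = pd.Series([-1.2, -1.5, 9.9], index=['one', 'three', 'four'])

try:
    df['d'] = [1, 2]  # wrong length
except ValueError as err:
    print('ValueError:', err)
```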

Term

Transpose a DataFrame

Definition

my_df.T

Term

A DataFrame's index and columns can each have a name attribute

Definition

df_1

      Nevada  Ohio
2000     NaN   1.5
2001     2.4   1.7
2002     2.9   3.6

df_1.index.name = 'year'

df_1.columns.name = 'state'

df_1

state  Nevada  Ohio
year
2000      NaN   1.5
2001      2.4   1.7
2002      2.9   3.6

Term

Convert values in a DataFrame into a 2darray

Definition

my_df.values  # newer pandas also offers my_df.to_numpy(), which the documentation recommends

Term

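Key facts about Index objects

Definition

Any array or other sequence of labels used when constructing a Series or DataFrame is internally converted to an Index object.

Index objects are immutable: assigning to an element (my_index[1] = 'd') raises a TypeError, which makes it safe to share an Index between objects.

In addition to being array-like, an Index behaves like a fixed-size set (membership tests with in, union, intersection), although unlike a Python set it can contain duplicate labels.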
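A short sketch of Index immutability and set-like operations, using illustrative labels:

```python
import pandas as pd

obj = pd.Series(range(3), index=['a', 'b', 'c'])
idx = obj.index

try:
    idx[1] = 'd'  # Index objects are immutable
except TypeError as err:
    print('TypeError:', err)

# Set-like operations return new Index objects
combined = idx.union(pd.Index(['c', 'd']))
common = idx.intersection(pd.Index(['b', 'c', 'z']))
print(combined, common, 'b' in idx)
```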

Term

Which pandas method creates a new object with the data conformed to a new index?

Definition

reindex()

Term

reindex used on Series

Definition

obj = Series([4.5, 7.2, -5.3, 3.6], index=['d', 'b', 'a', 'c'])

obj2 = obj.reindex(['a', 'b', 'c', 'd', 'e'])

obj2

a   -5.3
b    7.2
c    3.6
d    4.5
e    NaN

obj.reindex(['a', 'b', 'c', 'd', 'e'], fill_value=0)  # the new label 'e' gets 0 instead of NaN

Term

reindex used on a DataFrame

Definition

frame = DataFrame(np.arange(9).reshape((3, 3)), index=['a', 'c', 'd'], columns=['Ohio', 'Texas', 'California'])

frame2 = frame.reindex(['a', 'b', 'c', 'd'])

states = ['Texas', 'Utah', 'California']

frame.reindex(columns=states)

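frame.reindex(index=['a', 'b', 'c', 'd'], columns=states)  # rows and columns at once; .ix was removed in pandas 1.0, and .loc raises KeyError for missing labels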

Term

Remove values from a Series

Definition

obj = Series(np.arange(5.), index=['a', 'b', 'c', 'd', 'e'])

obj.drop('c')

obj.drop(['d', 'c'])

Term

Remove rows or columns from a DataFrame

Definition

df = DataFrame(np.arange(16).reshape((4, 4)), index=['Ohio', 'Colorado', 'Utah', 'New York'], columns=['one', 'two', 'three', 'four'])

df.drop(['Colorado', 'Ohio'])

df.drop(['two', 'four'], axis=1)  # drop columns; equivalently df.drop(columns=['two', 'four'])
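A runnable version of the card's `drop` calls; `drop` returns new objects and leaves `df` unchanged:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(16).reshape((4, 4)),
                  index=['Ohio', 'Colorado', 'Utah', 'New York'],
                  columns=['one', 'two', 'three', 'four'])

fewer_rows = df.drop(['Colorado', 'Ohio'])     # rows (axis=0) by default
fewer_cols = df.drop(columns=['two', 'four'])  # same as axis=1
print(df.shape, fewer_rows.shape, fewer_cols.shape)
```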

Term

s1 = Series([7.3, -2.5, 3.4, 1.5], index=['a', 'c', 'd','e'])

s2 = Series([-2.1, 3.6, -1.5, 4, 3.1], index=['a', 'c', 'e', 'f', 'g'])

What is s1 + s2?

Definition

s1 + s2

a    5.2
c    1.1
d    NaN
e    0.0
f    NaN
g    NaN

When adding objects whose indexes differ, the index of the result is the union of the two indexes. Labels that appear in only one of the objects get NaN in the result, because the internal alignment has no value to pair them with.

Term

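Arithmetic and data alignment for DataFrames

What happens when two DataFrames with non-identical index/columns are added?

Definition

df1 = DataFrame(np.arange(9.).reshape((3, 3)), columns=list('bcd'), index=['Ohio', 'Texas', 'Colorado'])

df2 = DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])

df1 + df2

Alignment happens on both axes: the result's index and columns are the unions of those of df1 and df2, and every cell whose row and column labels are not present in both DataFrames is NaN (here only rows Ohio and Texas in columns b and d have values).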

Term

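How do you fill in a value (for example 0) for labels missing from one DataFrame, instead of getting NaN, when adding two DataFrames of different shapes?

Definition

df1.add(df2, fill_value=0)

fill_value is accepted by the flexible arithmetic methods add, sub, mul, and div (and relatives such as radd, rsub, floordiv, and pow). A cell missing from both inputs is still NaN.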
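A sketch of `fill_value` with two small hypothetical DataFrames that differ only in one column:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.arange(4.).reshape((2, 2)), columns=['a', 'b'])
df2 = pd.DataFrame(np.arange(6.).reshape((2, 3)), columns=['a', 'b', 'c'])

plain = df1 + df2                    # column 'c' is all NaN: df1 has no 'c'
filled = df1.add(df2, fill_value=0)  # df1's missing 'c' is treated as 0
print(filled)
```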

Term

frame = DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])

series = frame.iloc[0]  # first row (Utah); .ix was removed in pandas 1.0

series2 = Series(range(3), index=['b', 'e', 'f'])

What is the output of:

frame - series

frame - series2

Definition

By default, arithmetic between a DataFrame and a Series matches the Series' index against the DataFrame's columns and broadcasts down the rows.

If a label is found in only one of the DataFrame's columns or the Series' index, the objects are reindexed to form the union of the labels, and the unmatched columns are filled with NaN.
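Running the card's example makes the broadcasting rules concrete; the column order of the union is not asserted here:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])
series = frame.iloc[0]                               # Utah row: b=0, d=1, e=2
series2 = pd.Series(range(3), index=['b', 'e', 'f'])

diff = frame - series    # the Utah row is subtracted from every row
diff2 = frame - series2  # columns become the union b, d, e, f
print(diff)
print(diff2)
```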

Term

In operations between a DataFrame and Series, how do we make the Series broadcast over columns instead of rows?

Definition

frame = DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])

series3 = frame['d']

frame.sub(series3, axis=0)
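A sketch of `axis=0` broadcasting with the same `frame`: column `d` is subtracted from every column, matching on the row labels:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12.).reshape((4, 3)), columns=list('bde'),
                     index=['Utah', 'Ohio', 'Texas', 'Oregon'])
series3 = frame['d']

# axis=0 matches series3's index against the DataFrame's row index
result = frame.sub(series3, axis=0)
print(result)
```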

Term

How do NumPy ufuncs work on pandas objects?

How does a function that takes a 1-D array as input work on a DataFrame?

How does a function (call it func_2) that takes a single element as input work on a DataFrame?

How does func_2 work on a Series?

Definition

np.abs(my_df)  # ufuncs apply element-wise and keep the labels

np.abs(my_series)

def func_1(x): return x.max() - x.min()  # example function that takes a 1-D array/Series

my_df.apply(func_1)  # default axis=0: called once per column

my_df.apply(func_1, axis=1)  # called once per row

def func_2(x): return '%.2f' % x  # example function that takes a single element

my_df.map(func_2)  # pandas >= 2.1; older versions use my_df.applymap(func_2)

my_series.map(func_2)
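A sketch of the three mechanisms on a tiny DataFrame; `spread` and `fmt` are hypothetical stand-ins for `func_1` and `func_2`, and the `DataFrame.map`/`applymap` fallback is an assumption about which pandas version is installed:

```python
import numpy as np
import pandas as pd

my_df = pd.DataFrame([[1.0, -2.0], [3.5, 4.0]], columns=['x', 'y'])


def spread(values):
    """Takes a 1-D Series (a whole column or row) and returns max - min."""
    return values.max() - values.min()


def fmt(value):
    """Takes a single element and returns it formatted with two decimals."""
    return f'{value:.2f}'


abs_df = np.abs(my_df)                 # ufunc, element-wise
per_column = my_df.apply(spread)       # axis=0: one result per column
per_row = my_df.apply(spread, axis=1)  # one result per row
formatted_col = my_df['x'].map(fmt)    # element-wise on a Series

# DataFrame.map exists from pandas 2.1; older versions only have applymap
elementwise = my_df.map if hasattr(pd.DataFrame, 'map') else my_df.applymap
formatted = elementwise(fmt)
```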

Term

How to sort indices of a Series?

How to sort indices of a DataFrame along both axes?

Definition

my_series.sort_index()

my_df.sort_index(axis=0)  # sort rows by index labels

my_df.sort_index(axis=1)  # sort columns by column names

Term

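How to sort the values in a Series?

What happens to NaN values?

Definition

my_series.sort_values()  # replaces the old Series.order(), which no longer exists

NaN values are placed at the end of the sorted Series by default (na_position='last').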
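A quick sketch on an illustrative Series with two missing values:

```python
import numpy as np
import pandas as pd

obj = pd.Series([4, np.nan, 7, np.nan, -3, 2])

ascending = obj.sort_values()                     # NaN placed last by default
nan_first = obj.sort_values(na_position='first')  # NaN placed first
print(ascending.tolist())
```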

Term

How do I sort the rows of a DataFrame according to values in a particular column?

Definition

frame = DataFrame({'b': [4, 7, -3, 2], 'a': [0, 1, 0, 1]})

frame.sort_values(by='b')  # sort_index(by=...) no longer exists in current pandas
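The card's frame sorted by one column, and by two columns where ties in `a` are broken by `b`:

```python
import pandas as pd

frame = pd.DataFrame({'b': [4, 7, -3, 2], 'a': [0, 1, 0, 1]})

by_b = frame.sort_values(by='b')                # rows ordered by column b
by_a_then_b = frame.sort_values(by=['a', 'b'])  # first by a, then by b
print(by_a_then_b)
```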

Term

Is it mandatory to have unique index labels in pandas objects?

How do we know if an object has a unique index?

What is the output when indexing a pandas object by a duplicated label?

Definition

No, duplicate labels are allowed.

my_series.index.is_unique

obj = Series(range(5), index=['a', 'a', 'b', 'b', 'c'])

obj['a']  # a duplicated label returns a Series of all matches; a unique label returns a scalar
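Using the card's Series, whether `obj[label]` returns a Series or a scalar depends on whether the label is duplicated:

```python
import pandas as pd

obj = pd.Series(range(5), index=['a', 'a', 'b', 'b', 'c'])

print(obj.index.is_unique)  # False
dup = obj['a']              # duplicated label -> Series of both matches
single = obj['c']           # unique label -> scalar
```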

Term

What does my_df.describe() do?

Definition

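It computes summary statistics for each numeric column: count, mean, std, min, the 25%, 50%, and 75% percentiles, and max. For non-numeric data it reports count, unique, top (the most frequent value), and freq.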
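A small sketch on a hypothetical DataFrame with one numeric and one text column:

```python
import pandas as pd

my_df = pd.DataFrame({'x': [1.0, 2.0, 3.0, 4.0],
                      'label': ['p', 'q', 'p', 'p']})

numeric_summary = my_df.describe()        # numeric columns only by default
text_summary = my_df['label'].describe()  # count, unique, top, freq
print(numeric_summary)
print(text_summary)
```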

Term

How do methods of pandas data structures deal with missing values?

How do I check whether a pandas object has missing values?

How do I remove NaN from a Series?

How do I remove rows from a DataFrame that contain even a single NaN?

How do I remove rows that contain only NaN values?

How do I do the above two for the columns of a DataFrame?

How do I remove rows or columns from a DataFrame that have fewer than a certain number of non-NA values?

Definition

Reductions and summary statistics such as sum() and mean() skip NA values by default (skipna=True).

my_series.isnull() or my_series.notnull()

my_df.isnull()

my_series.dropna()

my_df.dropna()

my_df.dropna(how='all')

my_df.dropna(axis=1); my_df.dropna(how='all', axis=1)

my_df.dropna(thresh=some_int)  # keep rows with at least some_int non-NA values (add axis=1 for columns)
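A sketch of the `dropna` variants on a small hypothetical DataFrame whose rows have 3, 1, 0, and 2 non-missing values:

```python
import numpy as np
import pandas as pd

NA = np.nan
my_df = pd.DataFrame([[1.0, 6.5, 3.0],
                      [1.0, NA, NA],
                      [NA, NA, NA],
                      [NA, 6.5, 3.0]])

any_na_dropped = my_df.dropna()                # only row 0 has no NaN
all_na_dropped = my_df.dropna(how='all')       # only row 2 is entirely NaN
at_least_two = my_df.dropna(thresh=2)          # rows with >= 2 non-NA values
cols_all_na = my_df.dropna(axis=1, how='all')  # no column is entirely NaN
```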

Term

How to fill NA with a certain value?

Does the above operation modify the object or create a new one?

Definition

my_df.fillna(my_value)

fillna() returns a new object and leaves my_df unchanged. To modify the existing object instead:

my_df.fillna(some_value, inplace=True)  # returns None, so do not assign the result
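A sketch contrasting the copying and in-place forms, plus a per-column fill given as a dict; the data is illustrative:

```python
import numpy as np
import pandas as pd

my_df = pd.DataFrame({'x': [1.0, np.nan], 'y': [np.nan, 4.0]})

filled = my_df.fillna(0)                     # new DataFrame; my_df still has NaN
per_col = my_df.fillna({'x': -1, 'y': 99})   # a dict gives a fill value per column
result = my_df.fillna(0, inplace=True)       # modifies my_df itself
print(result)                                # None
```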

Term

Hierarchical indexing

data = Series(np.random.randn(10), index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd', 'd'], [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]])

frame = DataFrame(np.arange(12).reshape((4, 3)), index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]], columns=[['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']])

Definition

data.index  # a MultiIndex; each element is a tuple such as ('a', 1)

data['b']; data['b':'c']; data.loc[['b', 'd']]; data.loc[:, 2]  # data.loc[:, 2] selects inner label 2 across all outer labels

data.unstack()  # creates a DataFrame from a multi-index Series

data.unstack().stack().dropna()  # gets back the Series; dropna() removes the NaN holes unstack() introduced

frame.index.names = ['key1', 'key2']

frame.columns.names = ['state', 'color']

# create a MultiIndex directly

pd.MultiIndex.from_arrays([['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']], names=['state', 'color'])
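A sketch of the hierarchical-indexing operations with deterministic data; using `np.arange(10)` in place of `np.random.randn(10)` is an assumption of this example so that the values can be checked:

```python
import numpy as np
import pandas as pd

data = pd.Series(np.arange(10),
                 index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd', 'd'],
                        [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]])

b_part = data['b']                # inner level remains as the index
inner_2 = data.loc[:, 2]          # inner label 2 across all outer labels
wide = data.unstack()             # outer labels -> rows, inner labels -> columns
round_trip = wide.stack().dropna()  # back to a long Series without the NaN holes
```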

Term

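How do I add values element-wise to a Series, treating labels missing from one side as 0?

Definition

my_series.add(my_data, fill_value=0)

# my_data could be a scalar value or another Series; with a Series, a label present in only one of the two is treated as 0 on the other side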

Term

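How to count how often each unique value occurs (a histogram-style frequency table) in a Series or a DataFrame column?

Definition

my_series.value_counts()  # or my_df['col_name'].value_counts(); sorted by count, most frequent first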
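A sketch with an illustrative Series; `normalize=True` returns proportions instead of counts:

```python
import pandas as pd

obj = pd.Series(['c', 'a', 'd', 'a', 'a', 'b', 'b', 'c', 'c'])

counts = obj.value_counts()                # most frequent values first
shares = obj.value_counts(normalize=True)  # fractions that sum to 1
print(counts)
```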

Flashcard Machine - create, study and share online flash cards
