pandas - How to Add a New Column With Selected Values from Another Column in Python


I have been trying to figure this out all day. I am new to Python.

I have a table with 50,000 records. The table below explains what I am trying to do.

I want to add a third column called category. This column should contain values based on the results of conditions set on the movies column.

-----------------------------------------
n        | movies
-----------------------------------------
1        | save last dance
-----------------------------------------
2        | love , other drugs
-----------------------------------------
3        | dance me
-----------------------------------------
4        | love
-----------------------------------------
5        | high school musical
-----------------------------------------

The condition is this: search through the movies column for these words (dance, love, and musical). If a word is found in the string, return that word in the category column.

This should produce a new dataframe at the end:

-----------------------------------------
n        | movies               | category
-----------------------------------------
1        | save last dance      | dance
-----------------------------------------
2        | love , other drugs   | love
-----------------------------------------
3        | dance me             | dance
-----------------------------------------
4        | love                 | love
-----------------------------------------
5        | high school musical  | musical
-----------------------------------------

Thanks in advance!!


A faster way is to create a mask for each of the categories, assuming you have a smallish number of them:

In [22]:
dance_mask = df['movies'].str.contains('dance')
love_mask = df['movies'].str.contains('love')
musical_mask = df['movies'].str.contains('musical')
df[dance_mask]

Out[22]:
   n           movies
0  1  save last dance
2  3         dance me

[2 rows x 2 columns]

In [26]:
# set the category for rows matching the mask
df.loc[dance_mask, 'category'] = 'dance'
df

Out[26]:
   n               movies  category
0  1      save last dance     dance
1  2   love , other drugs       NaN
2  3             dance me     dance
3  4                 love       NaN
4  5  high school musical       NaN

[5 rows x 3 columns]

In [28]:
# repeat for the remaining masks
df.loc[love_mask, 'category'] = 'love'
df.loc[musical_mask, 'category'] = 'musical'
df

Out[28]:
   n               movies  category
0  1      save last dance     dance
1  2   love , other drugs      love
2  3             dance me     dance
3  4                 love      love
4  5  high school musical   musical

[5 rows x 3 columns]
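The mask approach can also be written as a self-contained script. This is only a sketch under a few assumptions: the DataFrame is rebuilt from the five sample rows in the question, the `keywords` list and the loop over it are one possible way to avoid writing each mask by hand, and `.loc` is used for the assignment.

```python
import pandas as pd

# Sample data copied from the question (titles exactly as they appear there).
df = pd.DataFrame({
    'n': [1, 2, 3, 4, 5],
    'movies': [
        'save last dance',
        'love , other drugs',
        'dance me',
        'love',
        'high school musical',
    ],
})

# Hypothetical keyword list. If a title contains several keywords,
# the one that comes last in this list wins.
keywords = ['dance', 'love', 'musical']

# Start with an empty category column so unmatched rows stay None.
df['category'] = None

for word in keywords:
    # regex=False treats the keyword as a literal substring.
    mask = df['movies'].str.contains(word, regex=False)
    df.loc[mask, 'category'] = word

print(df)
```

Each pass over the column is vectorised, so the cost grows with the number of keywords rather than with a Python-level loop over all 50,000 rows.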
Thanks a lot for taking the time to answer my question. This is what I needed. – ian Apr 7 '14 at 10:55
    
@ian No problem. You can accept the answer; there should be a tick mark underneath the voting arrows. – edchum Apr 7 '14 at 10:59

If you have the data as a 2D list, you can do this:

def add_category(record):
    # record is a list such as [n, movie_title]
    movie = record[1]
    categories = []
    for category in ['dance', 'love', 'musical']:
        if category in movie:
            categories.append(category)
    # list.append() returns None, so return a new list instead
    return record + [', '.join(categories)]

database = [add_category(record) for record in database]

You can change how the values for the category column are calculated by changing the add_category() function.
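As an illustration of that kind of change, here is a hypothetical variant that stores only the first matching keyword instead of a comma-separated list. The `KEYWORDS` constant and the sample `database` rows are assumptions made for the example; the rows are the five titles from the question.

```python
# Keywords taken from the question; the constant name is made up for this sketch.
KEYWORDS = ['dance', 'love', 'musical']


def add_category(record):
    """Return a copy of [n, title] with the first matching keyword appended."""
    movie = record[1]
    # next() gives the first keyword found in the title, or '' if none match.
    category = next((word for word in KEYWORDS if word in movie), '')
    return record + [category]


# Sample rows in the assumed [n, title] layout.
database = [
    [1, 'save last dance'],
    [2, 'love , other drugs'],
    [3, 'dance me'],
    [4, 'love'],
    [5, 'high school musical'],
]

database = [add_category(record) for record in database]

for row in database:
    print(row)
```

Note that this works on plain Python lists only; a pandas DataFrame would need to be converted to rows first.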

Hi scorpion_god, thanks for your help. When I tried this, I got an error: 'str' object has no attribute '_data'. I have used the other solution, and it worked. – ian Apr 7 '14 at 10:56
