sql - How to calculate TF-IDF in Oracle SQL?


This is a text mining project. The purpose of the project is to see how each word is weighted differently in different documents.

I now have 2 tables: one table with the TF information (word | wordfrequency_in_eachfile) and one table with the IDF information (word | howmanyfile_have_eachword). I am not sure which query to use for the calculation.

The math I am trying to do here is: wordfrequency_in_eachfile*(log(n/howmanyfile_have_eachword)+1), where n is the total number of documents. Below is my code:

create table tf_idf (word, tf*idf) as select a.frequency*((log(10,132366/b.totalcount)+1)) from term_frequency a, document_frequency b where a.word=b.word;

Here 1323266 is the total number of documents, and totalcount is how many documents the word shows up in.
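To make the formula above concrete, here is a minimal Python sketch of the same arithmetic, tf * (log10(n / df) + 1). The function name tf_idf and all counts are hypothetical, chosen only so the result is easy to check by hand; they are not taken from the tables in the question.

```python
import math


def tf_idf(term_frequency, docs_with_word, total_docs):
    """Return tf * (log10(N / df) + 1), the weighting described in the question."""
    return term_frequency * (math.log10(total_docs / docs_with_word) + 1)


# Hypothetical counts: 1000 documents in total, both words occur 3 times in one file.
total_docs = 1000
rare = tf_idf(term_frequency=3, docs_with_word=10, total_docs=total_docs)
common = tf_idf(term_frequency=3, docs_with_word=1000, total_docs=total_docs)

print(rare)    # 3 * (log10(1000 / 10) + 1)
print(common)  # 3 * (log10(1000 / 1000) + 1)
```

With these made-up numbers, the word found in only 10 documents gets a weight of about 9, while the word found in every document keeps a weight of about 3, which is the "same word count, different weight" effect the project is after.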

Since I am new to SQL, I would appreciate a little explanation of the code. Thanks a lot!

The calculation looks good, but there is invalid syntax.

The right variant may look like the one below:

create table tf_idf as
select
  a.word                                           word,
  a.frequency*( log(10, 132366/b.totalcount) + 1)  tfidf
from
  term_frequency     a,
  document_frequency b
where
  a.word=b.word
;

In a create ... as select ... statement you don't need column specifications. The column names and types are derived from the field aliases. Also, you must provide values for the word column in the new table. And one more point: there is one excess pair of brackets in the expression.
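The point about column names being derived from the aliases can be tried out without an Oracle instance. The sketch below is only an approximation: it uses Python's built-in sqlite3 module as a stand-in engine, not Oracle. The sample rows and the total of 1000 documents are hypothetical, and a two-argument log(base, x) is registered by hand so the expression reads like Oracle's LOG(10, x). The total is written as 1000.0 because SQLite, unlike Oracle, truncates integer division.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")

# Stand-in for Oracle's LOG(base, x); registered explicitly so the sketch does
# not depend on which math functions the local SQLite build provides.
conn.create_function("log", 2, lambda base, x: math.log(x) / math.log(base))

# Hypothetical sample data shaped like the two tables in the question.
conn.executescript("""
    create table term_frequency (word text, frequency integer);
    create table document_frequency (word text, totalcount integer);
    insert into term_frequency values ('oracle', 3), ('the', 3);
    insert into document_frequency values ('oracle', 10), ('the', 1000);
""")

# Same shape as the corrected statement: no column list after the table name,
# the aliases "word" and "tfidf" become the column names of the new table.
conn.execute("""
    create table tf_idf as
    select a.word as word,
           a.frequency * (log(10, 1000.0 / b.totalcount) + 1) as tfidf
    from term_frequency a, document_frequency b
    where a.word = b.word
""")

# Column names of the new table, then its contents.
columns = [row[1] for row in conn.execute("pragma table_info(tf_idf)")]
rows = conn.execute("select word, tfidf from tf_idf order by word").fetchall()

print(columns)
for word, tfidf in rows:
    print(word, round(tfidf, 6))
```

The new table ends up with exactly the two aliased columns, word and tfidf, and one row per word that appears in both source tables.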

