sql - How to calculate TF-IDF in Oracle SQL?
This is a text mining project. The purpose of the project is to see how every word is weighted differently in different documents.
I now have 2 tables: one table with tf information (word | wordfrequency_in_eachfile) and one table with idf information (word | howmanyfile_have_eachword). I am not sure which query to use for the calculation.
The math I am trying to do here is: wordfrequency_in_eachfile*(log(n/howmanyfile_have_eachword)+1)
where n is the total number of documents. Below is my code:
create table tf_idf (word, tf*idf) as select a.frequency*((log(10,132366/b.totalcount)+1)) from term_frequency a, document_frequency b where a.word=b.word;
Here 1323266 is the total number of documents, and totalcount is how many documents the word shows up in.
Since I am new to SQL, I would appreciate a little explanation of the code. Thanks a lot!
Your calculation looks good, but the statement has invalid syntax.
A correct variant may look like this:
create table tf_idf as select a.word word, a.frequency*( log(10, 132366/b.totalcount) + 1) tfidf from term_frequency a, document_frequency b where a.word=b.word ;
In a create ... as select ... statement you don't need column specifications: the column names and types are derived from the field aliases. Also, you must provide values for the word column in the new table, which is why a.word appears in the select list. And one more point: there was one excess pair of brackets in your expression.
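If you want to sanity-check the arithmetic outside the database, here is a minimal Python sketch of the same formula applied to toy data. The rows, the document total of 1000, and the helper name tf_idf are made up for illustration; they only mirror the shape of term_frequency and document_frequency and are not taken from the real tables.

```python
import math

def tf_idf(frequency, total_docs, doc_count):
    # Same formula as in the question:
    # frequency * (log10(total_docs / doc_count) + 1)
    return frequency * (math.log10(total_docs / doc_count) + 1)

# Hypothetical stand-ins for the two tables.
term_frequency = [("oracle", 3), ("the", 50)]       # (word, frequency)
document_frequency = {"oracle": 10, "the": 1000}    # word -> totalcount
TOTAL_DOCS = 1000                                   # assumed document total

# Equivalent of joining the tables on a.word = b.word:
# words missing from document_frequency are dropped, as in an inner join.
tf_idf_rows = [
    (word, tf_idf(freq, TOTAL_DOCS, document_frequency[word]))
    for word, freq in term_frequency
    if word in document_frequency
]

for word, score in tf_idf_rows:
    print(word, score)
```

With this weighting, a word that occurs in every document gets a log term of 0, so its score is just its raw frequency; rarer words get their frequency multiplied by a factor greater than 1.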