
I'll stand by my statement: a model based on a corpus of PR, scholarly, license, and similar texts. That is, if they are into real statistical NLP.

Or just esthetic rules + word dictionary.



If I were to make the software, a corpus of PR, licenses, etc. would be the way I'd go. But "they did it statistically" doesn't answer the question "what is the model?" There are many different statistical models one could use. My other post has a few things we've figured out.

But I'm starting to think a rule-based lexicon isn't out of the question, given these >1 scores on some texts.
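To illustrate why scores above 1 point that way: a classifier's probability stays in [0, 1], while a lexicon score that sums word weights and normalizes by something like sentence count has no such ceiling. This is only a sketch of that idea, not their actual method. The lexicon entries, the weights, and the per-sentence normalization are all my own assumptions.

```python
import re

# Hypothetical lexicon: these words and weights are invented for illustration.
LEXICON = {"synergy": 1.0, "leverage": 0.5, "hereinafter": 1.0, "paradigm": 0.5}

def lexicon_score(text: str) -> float:
    """Sum the lexicon weights of all words, divided by the number of sentences."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    total = sum(LEXICON.get(w, 0.0) for w in words)
    # Nothing bounds this ratio: a single dense sentence can score above 1.
    return total / len(sentences)
```

With this toy lexicon, one sentence containing "leverage", "synergy", and "paradigm" already scores 2.0, which a probability never could.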



