
You could go a step further by putting the suffixes themselves into the trie and then identifying identical subtrees, so each one only needs to be stored once.
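To make the subtree idea concrete, here's a rough Python sketch. It assumes a plain dict-based trie with an empty-string key as the end-of-word marker; the function names (`build_trie`, `merge_subtrees`, `count_nodes`, `contains`) and the word list are made up for illustration, not taken from the original post.

```python
def build_trie(words):
    """Build a dict-based trie; the "" key marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[""] = {}  # end-of-word marker (an empty leaf)
    return root


def merge_subtrees(node, registry):
    """Return one shared node per distinct subtree (bottom-up).

    Children are canonicalised first, so two nodes are identical exactly
    when they map the same keys to the same canonical child objects.
    """
    for key in list(node):
        node[key] = merge_subtrees(node[key], registry)
    signature = tuple(sorted((key, id(child)) for key, child in node.items()))
    # First node seen with this signature becomes the canonical copy.
    return registry.setdefault(signature, node)


def count_nodes(node, seen=None):
    """Count distinct node objects reachable from `node`."""
    if seen is None:
        seen = set()
    if id(node) in seen:
        return 0
    seen.add(id(node))
    return 1 + sum(count_nodes(child, seen) for child in node.values())


def contains(node, word):
    """Check membership; works on both the plain and the merged structure."""
    for ch in word:
        if ch not in node:
            return False
        node = node[ch]
    return "" in node


words = ["walking", "talking", "walked", "talked"]
trie = build_trie(words)
nodes_before = count_nodes(trie)
merged = merge_subtrees(trie, {})
nodes_after = count_nodes(merged)
```

With this word list the "alking"/"alked" branches under `w` and `t` collapse into one shared subtree, so `nodes_after` comes out smaller than `nodes_before` while lookups still behave the same. How much you actually save depends on the data and on how the nodes get serialised.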

If you can use gzip, there's bound to be a clever way of using a suffix array as well. That might end up being better, unless you can use an optimised binary format for the tree.


