Application of some Retrieved Information Method on Internet
This paper compares several methods of information extraction on the internet. The internet has become a
treasure trove of knowledge: every year, thousands of pieces of
new information are posted on it. Extracting
information from the internet for many different purposes has
therefore become an important problem. Users can extract
information with existing tools such as Lapis, Risk,
Rapier, Wien, and Stalker. However, these tools have a
disadvantage: we must update the training data when the
website changes. SVM and CRF combined with natural
language processing are therefore proposed as the best solution to this
problem. Experiments on information extraction from online Vietnamese news
websites with SVM and CRF give very promising results:
accuracy reaches nearly 90%,
and the processing time is less than one minute per
site when the number of link levels followed within the
base site is set to 1.
Keywords: RI (Retrieved Information), CRFs
(Conditional Random Fields), SVM (Support Vector Machine), ECT (Embedded Catalog Tree)