mina-lis2600: Week12 Reading Note

This week’s readings are about how to index and aggregate information on the web. As you can imagine, the data on the web is increasing exponentially, and it is challenging for an information specialist to index and archive it for future use. The articles show real attempts at this process and explain the algorithms behind them. In Web Search Engines, David Hawking tries to explain how to index information across “the whole of the Web”. In the first part of the article, he reveals that search engines such as Google, Yahoo, and MSN establish data centers that run crawlers and collect data with them. He then describes issues that occur during the crawling process. They are related to the speed of the crawling machines, the crawler’s politeness while archiving information from a given web site, and how to handle excluded and duplicate content. In the second part of the article, Hawking explains the indexing and query processing algorithms. Although the content of the article was unfamiliar and hard to understand at first, I could finally follow it thanks to the author’s clarification of the terms used in the process.

OAI (Open Archives Initiative) has developed standards for the effective distribution of content that is interoperable among diverse digital environments. OAI also promotes open access, and its projects are extending to various fields.
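To make the indexing and query processing part more concrete for myself, I wrote a small sketch of an inverted index. This is only my own toy illustration, not the algorithm from Hawking's article: the function names (`tokenize`, `build_index`, `search`) and the three sample documents are made up, and a real search engine would add ranking, compression, and much more.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return set()
    # .get() avoids creating empty entries for unknown terms.
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)


# Hypothetical sample documents, standing in for crawled pages.
docs = {
    1: "Web search engines crawl the web",
    2: "Crawlers must be polite",
    3: "Search engines index pages",
}
index = build_index(docs)
print(sorted(search(index, "search engines")))  # [1, 3]
```

The index is built once, and each query only has to look up the posting sets for its own terms and intersect them, instead of scanning every document again.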


Sunday, April 1, 2012

