Authors: Yukitaka Kusumura, Y. Hijikata, S. Nishida
Venue: 17th International Conference on Advanced Information Networking and Applications, 2003. AINA 2003.
Published: 2003-03-27
DOI: 10.1109/AINA.2003.1192919 (https://doi.org/10.1109/AINA.2003.1192919)
Extracting fixed information from miscellaneous documents on net auction
Net auctions have become widely used with the recent development of the Internet. However, they list so many items that bidders have difficulty selecting the most suitable one. We aim to support bidders by automatically extracting item features from net-auction Web pages and generating a table of those features so that several items can be compared. Because descriptions in net auctions are not uniform, feature extraction faces two problems. First, descriptions are written in several different formats. Second, the keywords that name the features are sometimes omitted. We propose a solution to each problem. For the first, we determine whether a description is formatted as a table, as an itemized list, or as sentences, and extract the feature values in the way best suited to that format. For the second, the system learns keywords while extracting from descriptions that contain feature keywords, and then uses the learned keywords when extracting from descriptions that lack them. Finally, we built a system that collects item information, extracts features from the items' text using text mining methods, and generates a table of the extracted features.
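The two-step idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`detect_format`, `extract_pairs`, `learn_values`, `extract_without_keywords`), the plain-text input, the delimiter heuristics, and the simple substring matching are all assumptions made for illustration, and the abstract does not specify what the learned keywords look like. Here the sketch assumes they are feature values seen next to a feature keyword.

```python
import re

# Hypothetical pattern for an itemized line such as "- CPU: Celeron".
ITEM_LINE = re.compile(r"\s*[-*]?\s*([^:]+):\s*(.+)")


def detect_format(text):
    """Guess whether a description is a table, an itemized list, or sentences."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if lines and all("|" in ln or "\t" in ln for ln in lines):
        return "table"
    if lines and all(ITEM_LINE.match(ln) for ln in lines):
        return "items"
    return "sentences"


def extract_pairs(text):
    """Extract (feature keyword, value) pairs using a format-specific rule."""
    fmt = detect_format(text)
    pairs = []
    for ln in text.splitlines():
        if fmt == "table":
            # Assumed layout: first cell is the keyword, second is the value.
            cells = [c.strip() for c in re.split(r"[|\t]", ln) if c.strip()]
            if len(cells) >= 2:
                pairs.append((cells[0], cells[1]))
        elif fmt == "items":
            m = ITEM_LINE.match(ln)
            if m:
                pairs.append((m.group(1).strip(), m.group(2).strip()))
    # Sentences carry no explicit keyword, so no pairs are returned for them.
    return pairs


def learn_values(descriptions):
    """Collect the values seen for each feature in keyword-bearing descriptions."""
    learned = {}
    for text in descriptions:
        for feature, value in extract_pairs(text):
            values = learned.setdefault(feature, [])
            if value not in values:
                values.append(value)
    return learned


def extract_without_keywords(text, learned):
    """Find learned values in a description that omits the feature keywords."""
    found = {}
    for feature, values in learned.items():
        for value in values:
            if re.search(re.escape(value), text, re.IGNORECASE):
                found[feature] = value
                break
    return found
```

For example, after learning from a tab-separated description and an itemized one that both name a `CPU` and a `Memory` feature, `extract_without_keywords` can assign a value to `CPU` in a free-text sentence that mentions a previously seen processor name without the keyword. A real system would need more robust matching than case-insensitive substring search.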