{"title":"通过连体转换器进行语义专利嵌入,用于现有技术搜索","authors":"Konrad Vowinckel, Volker D. Hähnke","doi":"10.1016/j.wpi.2023.102192","DOIUrl":null,"url":null,"abstract":"<div><p><span><span><span>The identification of relevant prior art for patent applications is of key importance for the work of patent examiners. The recent advancements in the field of </span>natural language processing in the form of </span>language models<span><span> such as BERT enable the creation of the next generation of </span>prior art search tools. These models can generate vectorial representations of input text, enabling the use of vector similarity as proxy for semantic text similarity. We fine-tuned a patent-specific BERT model for prior art search on a large set of real-world examples of patent claims, corresponding passages prejudicing novelty or inventive step, and random text fragments, creating the SEARCHFORMER. We show in retrospective ranking experiments that our model is a real improvement. For this purpose, we compiled an evaluation collection comprising 2014 pairs of patent application and related potential prior art documents. We employed two representative baselines for comparison: (i) an optimized combination of automatically built queries and the BM25 ranking function, and (ii) several state-of-the-art language models, including SentenceTransformers optimized for semantic retrieval. Ranking performance was measured as rank of the first relevant result. Using t-tests, we show that the achieved ranking improvements of the SEARCHFORMER over the baselines are statistically significant (</span></span><span><math><mrow><mi>α</mi><mo>=</mo><mn>0</mn><mo>.</mo><mn>01</mn></mrow></math></span>).</p></div>","PeriodicalId":51794,"journal":{"name":"World Patent Information","volume":"73 ","pages":"Article 102192"},"PeriodicalIF":2.2000,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"SEARCHFORMER: Semantic patent embeddings by siamese transformers for prior art search\",\"authors\":\"Konrad Vowinckel, Volker D. Hähnke\",\"doi\":\"10.1016/j.wpi.2023.102192\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p><span><span><span>The identification of relevant prior art for patent applications is of key importance for the work of patent examiners. The recent advancements in the field of </span>natural language processing in the form of </span>language models<span><span> such as BERT enable the creation of the next generation of </span>prior art search tools. These models can generate vectorial representations of input text, enabling the use of vector similarity as proxy for semantic text similarity. We fine-tuned a patent-specific BERT model for prior art search on a large set of real-world examples of patent claims, corresponding passages prejudicing novelty or inventive step, and random text fragments, creating the SEARCHFORMER. We show in retrospective ranking experiments that our model is a real improvement. For this purpose, we compiled an evaluation collection comprising 2014 pairs of patent application and related potential prior art documents. We employed two representative baselines for comparison: (i) an optimized combination of automatically built queries and the BM25 ranking function, and (ii) several state-of-the-art language models, including SentenceTransformers optimized for semantic retrieval. Ranking performance was measured as rank of the first relevant result. 
Using t-tests, we show that the achieved ranking improvements of the SEARCHFORMER over the baselines are statistically significant (</span></span><span><math><mrow><mi>α</mi><mo>=</mo><mn>0</mn><mo>.</mo><mn>01</mn></mrow></math></span>).</p></div>\",\"PeriodicalId\":51794,\"journal\":{\"name\":\"World Patent Information\",\"volume\":\"73 \",\"pages\":\"Article 102192\"},\"PeriodicalIF\":2.2000,\"publicationDate\":\"2023-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"World Patent Information\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0172219023000224\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"INFORMATION SCIENCE & LIBRARY SCIENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"World Patent Information","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0172219023000224","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"INFORMATION SCIENCE & LIBRARY SCIENCE","Score":null,"Total":0}
SEARCHFORMER: Semantic patent embeddings by siamese transformers for prior art search
The identification of relevant prior art for patent applications is of key importance for the work of patent examiners. Recent advancements in the field of natural language processing, in the form of language models such as BERT, enable the creation of the next generation of prior art search tools. These models can generate vectorial representations of input text, enabling the use of vector similarity as a proxy for semantic text similarity. We fine-tuned a patent-specific BERT model for prior art search on a large set of real-world examples of patent claims, corresponding passages prejudicing novelty or inventive step, and random text fragments, creating the SEARCHFORMER. We show in retrospective ranking experiments that our model is a real improvement. For this purpose, we compiled an evaluation collection comprising 2014 pairs of patent applications and related potential prior art documents. We employed two representative baselines for comparison: (i) an optimized combination of automatically built queries and the BM25 ranking function, and (ii) several state-of-the-art language models, including SentenceTransformers optimized for semantic retrieval. Ranking performance was measured as the rank of the first relevant result. Using t-tests, we show that the achieved ranking improvements of the SEARCHFORMER over the baselines are statistically significant (α = 0.01).
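The following is a minimal sketch of the siamese bi-encoder retrieval setup the abstract describes, using the open-source sentence-transformers library. The checkpoint name and the toy claim/candidate texts are illustrative assumptions, not the authors' actual SEARCHFORMER model or evaluation data; in the paper, a patent-specific BERT fine-tuned on (claim, prior-art passage) pairs would take the place of the generic model.

```python
# Sketch only: illustrates vector similarity as a proxy for semantic
# text similarity, as in the abstract. Model and texts are hypothetical.
from sentence_transformers import SentenceTransformer, util

# Any bi-encoder checkpoint works here; SEARCHFORMER itself is not public
# in this record, so a generic model stands in.
model = SentenceTransformer("all-MiniLM-L6-v2")

claim = "A rechargeable battery comprising a solid-state electrolyte layer."
candidates = [
    "The cell uses a ceramic solid electrolyte between the electrodes.",
    "A method for roasting coffee beans at variable temperature.",
]

# Siamese setup: both sides are encoded by the same transformer.
claim_vec = model.encode(claim, convert_to_tensor=True)
cand_vecs = model.encode(candidates, convert_to_tensor=True)

# Rank candidates by descending cosine similarity to the claim.
scores = util.cos_sim(claim_vec, cand_vecs)[0]
ranking = scores.argsort(descending=True)

# The paper's metric is the rank of the first relevant result; with
# hypothetical relevance labels it can be computed like this:
relevant = [True, False]
first_rank = next(i + 1 for i, idx in enumerate(ranking) if relevant[idx])
print(f"rank of first relevant result: {first_rank}")
```

The BM25 baseline mentioned in the abstract would rank the same candidates by lexical term weighting over automatically built queries instead of embedding similarity; the metric computation at the end would be identical.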
Journal introduction:
The aim of World Patent Information is to provide a worldwide forum for the exchange of information between people working professionally in the field of Industrial Property information and documentation and to promote the widest possible use of the associated literature. Regular features include: papers concerned with all aspects of Industrial Property information and documentation; new regulations pertinent to Industrial Property information and documentation; short reports on relevant meetings and conferences; bibliographies, together with book and literature reviews.