Despite recent dramatic successes, natural language processing (NLP) is not ready to address a variety of real-world problems. Its reliance on large standard corpora, a training and evaluation paradigm that favors the learning of shallow heuristics, and large computational resource requirements make domain-specific application of even the most successful NLP techniques difficult. This paper proposes technical language processing (TLP), which brings engineering principles and practices to NLP specifically for the purpose of extracting actionable information from language generated by experts in their technical tasks, systems, and processes. TLP envisages NLP as a socio-technical system rather than as an algorithmic pipeline. We describe how the TLP approach to meaning and generalization differs from that of NLP, how data quantity and quality can be addressed in engineering technical domains, and the potential risks of not adapting NLP for technical use cases. Engineering problems can benefit immensely from the inclusion of knowledge held in unstructured data, which is currently inaccessible due to the shortcomings of out-of-the-box NLP packages. We illustrate the TLP approach using maintenance in industrial organizations as a case study.
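To make the failure mode of out-of-the-box NLP on technical text concrete, the following is a minimal, purely illustrative sketch of the kind of lightweight, domain-specific preprocessing TLP advocates: expanding technician shorthand in a maintenance work order before any downstream analysis. The lexicon entries and the example work order are invented for illustration, not taken from the paper.

```python
import re

# Illustrative (invented) domain lexicon: shorthand technicians actually write
# mapped to canonical terms. Generic NLP tokenizers treat these as unknown tokens.
LEXICON = {
    "pmp": "pump",
    "brg": "bearing",
    "chk": "check",
    "lkg": "leaking",
    "c/o": "change out",
}

def normalise(work_order: str) -> list[str]:
    """Lowercase, tokenise, and expand known abbreviations in a maintenance work order."""
    tokens = re.findall(r"[a-z0-9/]+", work_order.lower())
    return [LEXICON.get(tok, tok) for tok in tokens]

print(normalise("CHK lube oil PMP - brg LKG"))
# ['check', 'lube', 'oil', 'pump', 'bearing', 'leaking']
```

The point of the sketch is that such lexicons are built and maintained with the domain experts themselves, which is what makes the pipeline a socio-technical system rather than a purely algorithmic one.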
More accurate predictions of the biological properties of chemical compounds would guide the selection and design of new compounds in drug discovery and help to address the enormous cost and low success rate of pharmaceutical R&D. However, this domain presents a significant challenge for AI methods due to the sparsity of compound data and the noise inherent in results from biological experiments. In this paper, we demonstrate how data imputation using deep learning provides substantial improvements over quantitative structure-activity relationship (QSAR) machine learning models that are widely applied in drug discovery. We present the largest-to-date successful application of deep-learning imputation to datasets that are comparable in size to the corporate data repository of a pharmaceutical company (678 994 compounds by 1166 endpoints). We demonstrate this improvement for three areas of practical application linked to distinct use cases: (a) target activity data compiled from a range of drug discovery projects, (b) a high-value and heterogeneous dataset covering complex absorption, distribution, metabolism, and elimination properties, and (c) high-throughput screening data, testing the algorithm's limits on early-stage noisy and very sparse data. Achieving median coefficients of determination, R2, of 0.69, 0.36, and 0.43, respectively, across these applications, the deep learning imputation method offers an unambiguous improvement over random forest QSAR methods, which achieve median R2 values of 0.28, 0.19, and 0.23, respectively. We also demonstrate that robust estimates of the uncertainties in the predicted values correlate strongly with the accuracies in prediction, enabling greater confidence in decision-making based on the imputed values.
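The abstract does not detail the imputation architecture, so the sketch below is a generic stand-in rather than the authors' model: a small masked-reconstruction network in PyTorch that takes the observed entries of a sparse compound-by-endpoint activity matrix (plus an observation mask) as input and predicts every cell, training only on cells that were actually measured. The matrix sizes, sparsity level, and network shape are invented for illustration.

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical sparse activity matrix Y (compounds x endpoints); NaN marks missing assays.
rng = np.random.default_rng(0)
n_compounds, n_endpoints = 500, 20
Y = rng.normal(size=(n_compounds, n_endpoints))
Y[rng.random(Y.shape) < 0.8] = np.nan              # ~80% sparsity, as in early-stage data

mask = ~np.isnan(Y)                                 # True where a measurement exists
Y_filled = np.where(mask, Y, 0.0)                   # zero-fill missing entries for the input

# Input = observed values concatenated with the mask, so the network knows what is missing.
X = torch.tensor(np.concatenate([Y_filled, mask.astype(np.float32)], axis=1),
                 dtype=torch.float32)
target = torch.tensor(Y_filled, dtype=torch.float32)
m = torch.tensor(mask, dtype=torch.float32)

model = nn.Sequential(
    nn.Linear(2 * n_endpoints, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, n_endpoints),                     # reconstruct all endpoints
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(200):
    opt.zero_grad()
    pred = model(X)
    loss = (((pred - target) ** 2) * m).sum() / m.sum()   # loss only on observed cells
    loss.backward()
    opt.step()

# Dense imputed matrix: a prediction for every compound/endpoint cell.
# (For assessment, hold out a fraction of observed cells and compute R2 per endpoint.)
Y_imputed = model(X).detach().numpy()
```

Unlike a per-endpoint QSAR model, an imputation model of this kind exploits correlations between endpoints, which is where the reported gains over random forest QSAR come from.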
Predicting equipment failure is important because it can improve availability and reduce operating costs. Previous literature has attempted to model failure rates with bathtub-shaped functions, Weibull distributions, Bayesian networks, or the analytic hierarchy process. However, these models perform well only when sufficient data are available and cannot incorporate two salient characteristics of the problem: imbalanced categories and shared structure. Hierarchical models offer the advantage of partial pooling. The proposed model is based on Bayesian hierarchical B-splines. Failure-rate time series of 99 Republic of Korea Navy ships are modeled hierarchically, with layers corresponding to ship engine, engine type, and engine archetype. In the analysis, the proposed model accurately predicted failure rates over an entire lifetime under multiple conditions, such as differing levels of prior knowledge of the engine.
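As a rough illustration of the modeling idea (not the authors' implementation), the sketch below fits a Bayesian hierarchical B-spline in PyMC with the hierarchy simplified to two levels: each engine's spline coefficients are partially pooled toward the mean coefficients of its engine type. The data are simulated, and the basis size, priors, and group counts are assumptions chosen for brevity.

```python
import numpy as np
from scipy.interpolate import BSpline
import pymc as pm

def bspline_basis(x, n_basis=6, degree=3):
    """Evaluate a clamped B-spline basis on x; returns an array of shape (len(x), n_basis)."""
    inner = np.linspace(x.min(), x.max(), n_basis - degree + 1)
    knots = np.concatenate([[inner[0]] * degree, inner, [inner[-1]] * degree])
    B = np.empty((len(x), n_basis))
    for j in range(n_basis):
        coeffs = np.zeros(n_basis)
        coeffs[j] = 1.0
        B[:, j] = BSpline(knots, coeffs, degree)(x)
    return B

# Hypothetical data: yearly failure rates for a few engines grouped by engine type.
rng = np.random.default_rng(1)
n_engines, n_types, n_years = 12, 3, 20
engine_type = rng.integers(0, n_types, size=n_engines)       # type of each engine
t = np.tile(np.arange(n_years, dtype=float), n_engines)
engine_idx = np.repeat(np.arange(n_engines), n_years)
y = rng.normal(0.5 + 0.02 * t, 0.1)                          # stand-in failure rates

B = bspline_basis(t, n_basis=6)

with pm.Model() as model:
    # Engine-type level: mean spline coefficients per type.
    mu_type = pm.Normal("mu_type", 0.0, 1.0, shape=(n_types, B.shape[1]))
    sigma_engine = pm.HalfNormal("sigma_engine", 0.5)
    # Engine level: coefficients partially pooled toward their type's mean.
    beta = pm.Normal("beta", mu=mu_type[engine_type], sigma=sigma_engine,
                     shape=(n_engines, B.shape[1]))
    sigma_obs = pm.HalfNormal("sigma_obs", 0.5)
    # Each observation uses the spline coefficients of its own engine.
    mu = pm.math.sum(B * beta[engine_idx], axis=1)
    pm.Normal("y_obs", mu=mu, sigma=sigma_obs, observed=y)
    idata = pm.sample(1000, tune=1000, target_accept=0.9)
```

Partial pooling is what handles the imbalanced categories: an engine with few recorded failures borrows strength from the other engines of its type instead of being fit in isolation.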
In elite sports, there is an opportunity to take advantage of rich and detailed datasets generated across multiple threads of the sporting business. Challenges currently exist due to the time constraints on analysing the data, as well as the quantity and variety of data available to assess. Artificial Intelligence (AI) techniques can be a valuable asset in assisting decision makers in tackling such challenges, but deep AI skills are generally not held by those with rich experience in sporting domains. Here, we describe how certain commonly available AI services can be used to provide analytic assistance to sports experts in exploring, and gaining insights from, typical data sources. In particular, we focus on the use of Natural Language Processing and Conversational Interfaces to provide users with an intuitive and time-saving toolkit to explore their datasets and the conclusions arising from analytics performed on them. We show the benefit of presenting powerful AI and analytic techniques to domain experts, demonstrating the potential for impact not only at the elite level of sports, where AI and analytic capabilities may be more available, but also at a more grass-roots level where there is generally little access to specialist resources. The work described in this paper was trialled with Leatherhead Football Club, a semi-professional team that, at the time, played in the seventh tier of English football.
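To make the conversational-interface idea tangible, the following is a deliberately tiny, rule-based toy rather than the AI services used in the work: it maps a natural-language question to a pandas aggregation over match-event data. The dataset, column names, and phrasing are invented for illustration.

```python
import re
import pandas as pd

# Hypothetical match-event data of the kind a club analyst might export.
events = pd.DataFrame({
    "player": ["Smith", "Jones", "Smith", "Khan", "Jones"],
    "event":  ["shot", "pass", "shot", "shot", "pass"],
    "half":   [1, 2, 2, 2, 1],
})

def answer(question: str) -> str:
    """Very small keyword-based intent matcher: counts events, optionally filtered by half."""
    q = question.lower()
    event = next((e for e in events["event"].unique() if e in q), None)
    if event is None:
        return "Sorry, I can only count shots and passes in this sketch."
    subset = events[events["event"] == event]
    half = re.search(r"(first|second) half", q)
    if half:
        subset = subset[subset["half"] == (1 if half.group(1) == "first" else 2)]
    return f"{len(subset)} {event}(s) found."

print(answer("How many shots did we have in the second half?"))  # -> "2 shot(s) found."
```

A production conversational interface would replace the keyword matching with a trained intent classifier and entity extraction, but the interaction pattern — a plain-English question routed to a query over the club's own data — is the one the paper describes putting in front of domain experts.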