ISSN: 2229-371X
Data Mining 2016: Scientific discovery by machine intelligence: A new avenue for drug discovery - Carlo Trugenberger - InfoCodex AG-Semantic Technologies
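Most big data is unstructured, and the largest part of that unstructured majority is text. While data mining techniques are mature and standardized for structured, numerical data, unstructured data remains largely unexplored territory. The general focus lies on information extraction, which attempts to retrieve known information from text. The Holy Grail, however, is knowledge discovery, in which machines unearth entirely new facts and relations not previously known to any human expert. Indeed, understanding the meaning of text is often considered one of the main characteristics of human intelligence. The ultimate goal of semantic artificial intelligence is to devise software that understands the meaning of free text, at least in the practical sense of providing new, actionable information condensed out of a body of documents. As a stepping stone on the road to this vision, I will introduce an entirely new approach to drug research: identifying relevant information by using a self-organizing semantic engine to text mine large repositories of biomedical research papers, a technique pioneered by Merck with the InfoCodex software. I will describe the methodology and a first successful experiment in discovering new biomarkers and phenotypes for diabetes and obesity from PubMed abstracts, public clinical trials and Merck internal documents. The reported approach shows much promise and could fundamentally change pharmaceutical research by shortening the time to market of novel drugs and by recognizing dead ends early.

Understanding written language is a key component of human intelligence. Correspondingly, doing something useful with quantities of text documents too large for human analysis unavoidably requires some form of artificial intelligence [5]. This is why handling unstructured data is harder than analyzing its numerical counterpart, for which well-defined and well-developed mathematical methods are readily available. Indeed, there is as yet no standard approach to text mining, the unstructured counterpart of data mining.

There are several approaches to teaching a machine to comprehend text [6-8]. The vast bulk of research and applications focuses on natural language processing (NLP) techniques for information extraction (IE). Information extraction aims to identify mentions of named entities (e.g. “genes” in bioscience applications) and relationships between these entities (as in “is a” or “is caused by”). Entities and their relations are often called “triples”, and databases of identified triples are called “triple stores”. Such triple stores are the basis of the Web 3.0 vision, in which machines will be able to automatically recognize the meaning of online documents and, correspondingly, interact intelligently with human end users. IE techniques are also the main tool used to curate domain-specific terminologies and ontologies extracted from large document corpora. Information extraction, however, is not meant for discovery.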
By its very design, it is limited to identifying semantic relationships that are explicitly lexicalized in a document: by definition, these relations are already known to the human expert who formulated them. The “Holy Grail” [9] of text mining is instead knowledge discovery from large corpora of text. Here one expects machines to generate novel hypotheses by uncovering previously unnoticed correlations in information distributed over very large pools of documents. These hypotheses must then be tested experimentally. Knowledge discovery is about unearthing implicit information, as opposed to the explicit relations recovered by information extraction. The present paper is about machine knowledge discovery within the biomedical and pharmacogenomics literature.
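The contrast between explicit triples and implicit links can be illustrated with a minimal Python sketch. It is not the InfoCodex method, which the text describes only as a self-organizing semantic engine. It swaps in a much simpler idea: chaining two stated relations to propose an unstated one. All entity names, relation names and function names below are hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict

# Toy triple store: (subject, relation, object).
# Every entity and relation here is a made-up placeholder,
# not a fact taken from the biomedical literature.
TRIPLES = {
    ("gene_X", "regulates", "protein_Y"),
    ("protein_Y", "is_associated_with", "phenotype_Z"),
    ("gene_W", "is_a", "gene"),
}


def explicit_objects(triples, subject):
    """Information extraction view: objects stated outright for a subject."""
    return {o for s, _, o in triples if s == subject}


def candidate_links(triples):
    """Simplified discovery view: find (a, b, c) where a -> b and b -> c
    are stated but a -> c is not. Results are hypotheses to be tested,
    not established relations."""
    outgoing = defaultdict(set)
    for s, _, o in triples:
        outgoing[s].add(o)

    candidates = set()
    for a, mids in outgoing.items():
        for b in mids:
            # .get avoids inserting new keys while iterating
            for c in outgoing.get(b, ()):
                if c != a and c not in mids:
                    candidates.add((a, b, c))
    return candidates


if __name__ == "__main__":
    print(explicit_objects(TRIPLES, "gene_X"))
    print(candidate_links(TRIPLES))
```

In this toy store, the only explicit object of `gene_X` is `protein_Y`, while the chaining step proposes a link from `gene_X` to `phenotype_Z` through `protein_Y`. Such a candidate corresponds to the kind of hypothesis that, as stated above, must then be tested experimentally.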
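Biography:
Carlo Trugenberger earned his PhD in theoretical physics in 1988 at the Swiss Federal Institute of Technology, Zurich, and his Master's degree in economics in 1997 from Bocconi University, Milan. An international academic career in theoretical physics (MIT, Los Alamos National Laboratory, CERN Geneva, Max Planck Institute Munich) led him to the position of Associate Professor of Theoretical Physics at the University of Geneva. In 2001 he decided to leave academia and apply his expertise in information theory, neural networks and machine intelligence to the design of an innovative semantic technology, and he founded the company InfoCodex AG-Semantic Technologies, Switzerland.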
Carlo A Trugenberger