Knowledge extraction

Knowledge Extraction is the creation of knowledge from structured (relational databases, XML) and unstructured (text, documents, images) sources. The resulting knowledge must be in a machine-readable and machine-interpretable format and must represent knowledge in a manner that facilitates inferencing. Although it is methodologically similar to Information Extraction (NLP) and ETL (Data Warehouse), the main criterion is that the extraction result goes beyond the creation of structured information or the transformation into a relational schema. It requires either the reuse of existing formal knowledge (reusing identifiers or ontologies) or the generation of a schema based on the source data.

The RDB2RDF W3C group^[1] is currently standardizing a language for the extraction of RDF from relational databases. Another popular example of Knowledge Extraction is the transformation of Wikipedia into structured data and its mapping to existing knowledge (see DBpedia, Freebase and ^[2]).

It is possible that the texts gathered in the Psychology Wiki can be usefully treated in the same way.

Overview

After the standardization of knowledge representation languages such as RDF and OWL, much research has been conducted in the area, especially regarding transforming relational databases into RDF, Entity resolution, Knowledge Discovery and Ontology Learning. The general process uses traditional methods from Information Extraction and ETL, which transform the data from the sources into structured formats.

The following criteria can be used to categorize approaches in this area (some of them apply only to extraction from relational databases):^[3]

Source	Which data sources are covered: text, relational databases, XML, CSV?
Exposition	How is the extracted knowledge made explicit (ontology file, semantic database)? How can it be queried?
Synchronization	Is the knowledge extraction process executed once to produce a dump (static), or is the result synchronized with the source (dynamic)? Are changes to the result written back (bi-directional)?
Reuse of vocabularies	Is the tool able to reuse existing vocabularies in the extraction? For example, the table column 'firstName' can be mapped to foaf:firstName. Some automatic approaches are not capable of mapping vocabularies.
Automatisation	To what degree is the extraction assisted or automated: manual, GUI, semi-automatic, automatic?
Requires a Domain Ontology	Is a pre-existing ontology needed to map to? Either a mapping is created or a schema is learned from the source (Ontology learning).

Examples

Entity Linking

DBpedia Spotlight, OpenCalais, the Zemanta API, Extractiv and PoolParty Extractor analyze free text via Named Entity Recognition, then disambiguate candidates via Name Resolution and link the found entities to the DBpedia knowledge repository^[4] (see the DBpedia Spotlight web demo or the PoolParty Extractor demo).

President Obama called Wednesday on Congress to extend a tax break for students included in last year's economic stimulus package, arguing that the policy provides more generous assistance.

As President Obama is linked to a DBpedia LinkedData resource, further information can be retrieved automatically, and a Semantic Reasoner can, for example, infer that the mentioned entity is of the type Person (using FOAF) and of the type Presidents of the United States (using YAGO). Counterexamples are methods that only recognize entities, or that link to Wikipedia articles and other targets that do not provide further retrieval of structured data and formal knowledge.

Relational Databases to RDF

Triplify, D2R Server, Ultrawrap, and Virtuoso RDF Views are tools that transform relational databases to RDF. They allow existing vocabularies and ontologies to be reused during the conversion. When transforming a typical relational table named users, one column (e.g. name) or an aggregation of columns (e.g. first_name and last_name) has to provide the URI of the created entity. Normally the primary key is used. Every other column can be extracted as a relation with this entity.^[5] Properties with formally defined semantics are then used (and reused) to interpret the information. For example, a column in a user table called marriedTo can be defined as a symmetric relation, and a column homepage can be converted to the property foaf:homepage from the FOAF Vocabulary, thus qualifying it as an inverse functional property. Each entry of the user table can then be made an instance of the class foaf:Person (Ontology Population). Additionally, domain knowledge (in the form of an ontology) could be created from the status_id, either by manually created rules (if status_id is 2, the entry belongs to the class Teacher) or by (semi-)automated methods (Ontology Learning). Here is an example transformation:

Name	marriedTo	homepage	status_id
Peter	Mary	http://example.org/Peters_page	1
Claus	Eva	http://example.org/Claus_page	2

:Peter :marriedTo :Mary .  
:marriedTo a owl:SymmetricProperty .  
:Peter foaf:homepage  <http://example.org/Peters_page> .  
:Peter a foaf:Person .   
:Peter a :Student .  
:Claus a :Teacher .
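
The transformation above could be reproduced with a few lines of Python. This is only a minimal sketch, not the way Triplify or D2R Server work internally: the function names and the STATUS_CLASSES rule table are hypothetical, and the rule that status_id 1 means Student is an assumption read off the example output (the text only states the rule for status_id 2).

```python
# Hypothetical rule table. The text only states that status_id 2 means Teacher;
# mapping 1 to Student is an assumption taken from the example output.
STATUS_CLASSES = {1: ":Student", 2: ":Teacher"}


def row_to_triples(row):
    """Turn one row of the users table into (subject, predicate, object) triples."""
    subject = ":" + row["Name"]  # the Name column supplies the entity identifier
    triples = [
        (subject, "a", "foaf:Person"),  # ontology population
        (subject, ":marriedTo", ":" + row["marriedTo"]),
        (subject, "foaf:homepage", "<" + row["homepage"] + ">"),  # reused vocabulary
    ]
    status_class = STATUS_CLASSES.get(row["status_id"])
    if status_class is not None:  # manually created rule on status_id
        triples.append((subject, "a", status_class))
    return triples


def to_turtle(triples):
    """Serialize triples one per line in a Turtle-like notation (prefixes omitted)."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


users = [
    {"Name": "Peter", "marriedTo": "Mary",
     "homepage": "http://example.org/Peters_page", "status_id": 1},
    {"Name": "Claus", "marriedTo": "Eva",
     "homepage": "http://example.org/Claus_page", "status_id": 2},
]

# Schema-level statement about the marriedTo column, followed by the row data.
schema_triples = [(":marriedTo", "a", "owl:SymmetricProperty")]
all_triples = schema_triples + [t for row in users for t in row_to_triples(row)]

if __name__ == "__main__":
    print(to_turtle(all_triples))
```

Keeping the status rules in a separate table mirrors the distinction made in the text between the generic column mapping and the manually created domain rules.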

Extraction from structured sources to RDF

1:1 Mapping from RDB Tables/Views to RDF Entities/Attributes/Values

When building an RDB representation of a problem domain, the starting point is frequently an entity-relationship diagram (ERD). Typically, each entity is represented as a database table, each attribute of the entity becomes a column in that table, and relationships between entities are indicated by foreign keys. Each table typically defines a particular class of entity, each column one of its attributes. Each row in the table describes an entity instance, uniquely identified by a primary key. The table rows collectively describe an entity set. In an equivalent RDF representation of the same entity set:

Each column in the table is an attribute (i.e., predicate)
Each column value is an attribute value (i.e., object)
Each row key represents an entity ID (i.e., subject)
Each row represents an entity instance
Each row (entity instance) is represented in RDF by a collection of triples with a common subject (entity ID).

So, to render an equivalent view based on RDF semantics, the basic mapping algorithm would be as follows:

create an RDFS class for each table
convert all primary keys and foreign keys into IRIs
assign a predicate IRI to each column
assign an rdf:type predicate for each row, linking it to an RDFS class IRI corresponding to the table
for each column that is part of neither a primary nor a foreign key, construct a triple containing the primary key IRI as the subject, the column IRI as the predicate and the column's value as the object.
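
The steps of this basic mapping could be sketched in Python as follows. This is an illustration under stated assumptions, not a standardized implementation: the base IRI, the IRI layout (table/key for rows, table#column for predicates) and the function names are hypothetical, literals are emitted as plain quoted strings without escaping or datatypes, and the triple that links a foreign key column to the referenced row's IRI is an addition the list above does not spell out.

```python
BASE = "http://example.org/"  # hypothetical base IRI


def row_iri(table, key):
    """Build the IRI that identifies one row (entity instance) of a table."""
    return f"<{BASE}{table}/{key}>"


def direct_mapping(table, primary_key, foreign_keys, rows):
    """Map the rows of one table to triples.

    foreign_keys maps a column name to the name of the table it references.
    """
    class_iri = f"<{BASE}{table}>"
    # Step 1: one RDFS class per table.
    triples = [(class_iri, "rdf:type", "rdfs:Class")]
    for row in rows:
        # Step 2: the primary key becomes the subject IRI.
        subject = row_iri(table, row[primary_key])
        # Step 4: type each row with the class of its table.
        triples.append((subject, "rdf:type", class_iri))
        for column, value in row.items():
            if column == primary_key or value is None:
                continue
            # Step 3: one predicate IRI per column.
            predicate = f"<{BASE}{table}#{column}>"
            if column in foreign_keys:
                # Assumption: a foreign key value points at the referenced row's IRI.
                obj = row_iri(foreign_keys[column], value)
            else:
                # Step 5: non-key columns become literal-valued triples.
                obj = f'"{value}"'
            triples.append((subject, predicate, obj))
    return triples


if __name__ == "__main__":
    example_rows = [{"id": 7, "name": "Peter", "dept_id": 3}]
    for triple in direct_mapping("users", "id", {"dept_id": "departments"}, example_rows):
        print(" ".join(triple), ".")
```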

An early mention of this basic or direct mapping can be found in Tim Berners-Lee's comparison of the ER model to the RDF model.^[5]

Complex mappings of relational databases to RDF

The 1:1 mapping mentioned above exposes the legacy data as RDF in a straightforward way; additional refinements can be employed to improve the usefulness of the RDF output with respect to the given use cases. Normally, information is lost during the transformation of an entity-relationship diagram (ERD) to relational tables (details can be found in Object-relational impedance mismatch) and has to be reverse engineered. From a conceptual view, approaches for extraction can come from two directions. The first direction tries to extract or learn an OWL schema from the given database schema. Early approaches used a fixed number of manually created mapping rules to refine the 1:1 mapping.^[6]^[7]^[8] More elaborate methods employ heuristics or learning algorithms to induce schematic information (these methods overlap with Ontology learning). While some approaches try to extract the information from the structure inherent in the SQL schema^[9] (analysing e.g. foreign keys), others analyse the content and the values in the tables to create conceptual hierarchies^[10] (e.g. columns with few distinct values are candidates for becoming categories). The second direction tries to map the schema and its contents to a pre-existing domain ontology (see also: Ontology alignment). Often, however, a suitable domain ontology does not exist and has to be created first.
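
The content-based heuristic mentioned above (columns with few values as category candidates) might look roughly like the sketch below. The function name and both thresholds are hypothetical choices for illustration and are not taken from the cited work.

```python
def category_candidates(rows, max_distinct=5, max_ratio=0.5):
    """Return columns whose values repeat often enough to suggest categories.

    max_distinct and max_ratio are illustrative thresholds, not published values.
    A column qualifies if it has at most max_distinct distinct values and the
    ratio of distinct values to non-null values is at most max_ratio.
    """
    if not rows:
        return {}
    candidates = {}
    for column in rows[0]:
        values = [row[column] for row in rows if row[column] is not None]
        distinct = set(values)
        if (values
                and len(distinct) <= max_distinct
                and len(distinct) / len(values) <= max_ratio):
            candidates[column] = sorted(distinct, key=str)
    return candidates


if __name__ == "__main__":
    names = ["Ann", "Bob", "Cem", "Dee", "Eva", "Fay"]
    table = [
        {"id": i, "name": name, "status": "teacher" if i % 3 == 0 else "student"}
        for i, name in enumerate(names)
    ]
    print(category_candidates(table))
```

Each value of a returned column could then be proposed as a subclass of the table's class, a proposal that would still need review by a person.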

XML

As XML is structured as a tree, any data can be easily represented in RDF, which is structured as a graph. XML2RDF is one example of an approach that uses RDF blank nodes and transforms XML elements and attributes to RDF properties. The topic, however, is more complex than in the case of relational databases. In a relational table the primary key is an ideal candidate for becoming the subject of the extracted triples. An XML element, however, can be transformed, depending on the context, into a subject, a predicate or an object of a triple. XSLT can be used as a standard transformation language to manually convert XML to RDF.
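
A generic blank-node mapping of this kind could be sketched with Python's standard xml.etree.ElementTree module. This is not the actual XML2RDF algorithm, only an illustration of the idea: every element with children or attributes becomes a blank node, attributes and leaf elements become properties with literal values, property names are left as bare tag names rather than IRIs, and the tag of the root element is dropped.

```python
import itertools
import xml.etree.ElementTree as ET


def xml_to_triples(xml_text):
    """Map an XML document to triples, using one blank node per complex element."""
    counter = itertools.count(1)
    triples = []

    def visit(element):
        node = f"_:b{next(counter)}"  # fresh blank node for this element
        for name, value in element.attrib.items():
            triples.append((node, name, f'"{value}"'))
        for child in element:
            if len(child) == 0 and not child.attrib:
                # Leaf element: its tag is the property, its text the literal value.
                text = (child.text or "").strip()
                triples.append((node, child.tag, f'"{text}"'))
            else:
                # Nested element: its tag is the property, a new blank node the object.
                child_node = visit(child)
                triples.append((node, child.tag, child_node))
        return node

    visit(ET.fromstring(xml_text))
    return triples


if __name__ == "__main__":
    doc = '<person id="p1"><name>Peter</name><address><city>Berlin</city></address></person>'
    for triple in xml_to_triples(doc):
        print(" ".join(triple), ".")
```

In this sketch the same element kind ends up as a property name in one place and as a blank-node object in another, which illustrates the context dependence described above.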

Survey of Methods / Tools

Name	Data Source	Data Exposition	Data Synchronisation	Mapping Language	Vocabulary Reuse	Mapping Automat.	Req. Domain Ontology	Uses GUI
A Direct Mapping of Relational Data to RDF	Relational Data	SPARQL/ETL	dynamic	N/A	false	automatic	false	false
CSV2RDF4LOD	CSV	ETL	static	RDF	true	manual	false	false
Convert2RDF	Delimited text file	ETL	static	RDF/DAML	true	manual	false	true
D2R Server	RDB	SPARQL	bi-directional	D2R Map	true	manual	false	false
DartGrid	RDB	own query language	dynamic	Visual Tool	true	manual	false	true
DataMaster	RDB	ETL	static	proprietary	true	manual	true	true
Google Refine's RDF Extension	CSV, XML	ETL	static	none		semi-automatic	false	true
Krextor	XML	ETL	static	xslt	true	manual	true	false
MAPONTO	RDB	ETL	static	proprietary	true	manual	true	false
METAmorphoses	RDB	ETL	static	proprietary xml based mapping language	true	manual	false	true
MappingMaster	CSV	ETL	static	MappingMaster	true	GUI	false	true
ODEMapster	RDB	ETL	static	proprietary	true	manual	true	true
OntoWiki CSV Importer Plug-in - DataCube & Tabular	CSV	ETL	static	The RDF Data Cube Vocabulary	true	semi-automatic	false	true
Poolparty Extraktor (PPX)	XML, Text	LinkedData	dynamic	RDF (SKOS)	true	semi-automatic	true	false
RDBToOnto	RDB	ETL	static	none	false	automatic, the user furthermore has the chance to fine-tune results	false	true
RDF 123	CSV	ETL	static	false	false	manual	false	true
RDOTE	RDB	ETL	static	SQL	true	manual	true	true
Relational.OWL	RDB	ETL	static	none	false	automatic	false	false
T2LD	CSV	ETL	static	false	false	automatic	false	false
The RDF Data Cube Vocabulary	Multidimensional statistical data in spreadsheets			Data Cube Vocabulary	true	manual	false
TopBraid Composer	CSV	ETL	static	SKOS	false	semi-automatic	false	true
Triplify	RDB	LinkedData	dynamic	SQL	true	manual	false	false
Ultrawrap	RDB	SPARQL/ETL	dynamic	R2RML	true	semi-automatic	false	true
Virtuoso RDF Views	RDB	SPARQL	dynamic	Meta Schema Language	true	semi-automatic	false	true
Virtuoso Sponger	structured and semi-structured data sources	SPARQL	dynamic	Virtuoso PL & XSLT	true	semi-automatic	false	false
VisAVis	RDB	RDQL	dynamic	SQL	true	manual	true	true
XLWrap: Spreadsheet to RDF	CSV	ETL	static	TriG Syntax	true	manual	false	false
XML to RDF	XML	ETL	static	false	false	automatic	false	false

Extraction from natural language sources

The largest portion of the information contained in business documents (about 80%^[11]) is encoded in natural language and is therefore unstructured. Because unstructured data are poorly suited to knowledge extraction, more complex methods have to be applied, and these generally deliver worse results than are possible for structured data. The potential for acquiring extracted knowledge on a massive scale, however, should compensate for the increased complexity and decreased quality of extraction. In the following, natural language sources are understood as sources of information in which the data are given in an unstructured fashion as plain text. The text may additionally be embedded in a markup document (e.g. an HTML document), because most systems remove the markup elements automatically.

Traditional Information Extraction (IE)

Traditional Information Extraction^[12] is a natural language processing technology that extracts information, typically from natural language texts, and structures it in a suitable manner. The kinds of information to be identified must be specified in a model before the process begins, which is why the whole process of Traditional Information Extraction is domain-dependent. IE is split into the following five subtasks.

Named Entity Recognition (NER)
Coreference Resolution (CO)
Template Element Construction (TE)
Template Relation Construction (TR)
Template Scenario Production (ST)

The task of Named Entity Recognition is to recognize and categorize all named entities contained in a text (assigning each named entity to a predefined category). This is done by applying grammar-based methods or statistical models.

Coreference Resolution identifies equivalent entities, as recognized by NER, within a text. There are two relevant kinds of equivalence relationship. The first is the relationship between two differently represented entities (e.g. IBM Europe and IBM), and the second is the relationship between an entity and its anaphoric references (e.g. it and IBM). Both kinds should be recognized by Coreference Resolution.

During Template Element Construction, the IE system identifies descriptive properties of the entities recognized by NER and CO. These properties correspond to ordinary qualities such as red or big.

Template Relation Construction identifies relations that exist between the template elements. These relations can be of several kinds, such as works-for or located-in, with the restriction that both domain and range correspond to entities.

In Template Scenario Production, events described in the text are identified and structured with respect to the entities recognized by NER and CO and the relations identified by TR.

Ontology-Based Information Extraction (OBIE)

Ontology-Based Information Extraction^[11] is a subfield of Information Extraction in which at least one ontology is used to guide the process of information extraction from natural language text. The OBIE system uses methods of Traditional Information Extraction to identify concepts, instances and relations of the used ontologies in the text, which are structured into an ontology after the process. Thus, the input ontologies constitute the model of the information to be extracted.

Ontology Learning (OL)

Ontology Learning^[13] is the semi-automatic extraction of whole ontologies from natural language text. It can therefore be applied to support ontology engineering. It is usually split into the following eight subtasks, which are not necessarily supported by all Ontology Learning (OL) systems.

Domain Terminology Extraction
Concept Discovery
Concept Hierarchy Derivation
Learning of non-taxonomic relations
Rule Discovery
Ontology Population
Concept Hierarchy Extension
Frame and event detection

During Domain Terminology Extraction, domain-specific terms are extracted, which are used in the subsequent Concept Discovery step to derive concepts. Relevant terms can be determined, e.g., by calculating TF/IDF values or by applying the C-value / NC-value method. The resulting list of terms has to be filtered by a domain expert. Subsequently, similarly to Coreference Resolution in IE, the OL system determines synonyms, because they share the same meaning and therefore correspond to the same concept. The most common methods for this are clustering and the application of statistical similarity measures.
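
As an illustration of the TF/IDF step, the following sketch scores the terms of each document in a small tokenized corpus. It uses one common variant of the measure (relative term frequency multiplied by the natural logarithm of the inverse document frequency); other weightings exist, and the function name is hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Score terms per document.

    documents is a list of token lists; the result holds one {term: score}
    dictionary per document. Terms that occur in every document score 0.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for doc in documents for term in set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


if __name__ == "__main__":
    corpus = [
        ["the", "ontology", "learning"],
        ["the", "database", "schema"],
        ["the", "ontology", "axiom"],
    ]
    for doc_scores in tf_idf(corpus):
        print(sorted(doc_scores.items(), key=lambda item: -item[1]))
```

Terms with the highest scores in the domain corpus would be the candidates handed to the domain expert for filtering.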

In Concept Discovery, terms are grouped into meaning-bearing units, which correspond to an abstraction of the world and therefore to concepts. The grouped terms are the domain-specific terms and their synonyms that were identified during Domain Terminology Extraction.

In Concept Hierarchy Derivation, the OL system tries to arrange the extracted concepts in a taxonomic structure. This is mostly achieved by unsupervised hierarchical clustering methods. Because the result of such methods is often noisy, a supervision step, e.g. evaluation by the user, is integrated. A further method for deriving a concept hierarchy is the use of patterns that indicate a sub- or supersumption relationship. Patterns such as “X, which is a Y” or “X is a Y” indicate that X is a subclass of Y. Such patterns can be analyzed efficiently, but they occur too infrequently to extract enough sub- or supersumption relationships. Instead, bootstrapping methods have been developed, which learn these patterns automatically and therefore ensure a higher coverage.
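
The pattern-based method can be illustrated with a single regular expression that covers the two patterns named above. This is a deliberately naive sketch: it only handles one-word concepts, the function name is hypothetical, and it will also match sentences such as "This is a test", which is one reason why the results need supervision.

```python
import re

# Matches "X is a Y" and "X, which is a Y", with an optional leading article.
IS_A_PATTERN = re.compile(
    r"\b(?:an?\s+)?(\w+)(?:,\s+which)?\s+is\s+an?\s+(\w+)",
    re.IGNORECASE,
)


def extract_subclass_pairs(text):
    """Return (subconcept, superconcept) pairs found by the is-a patterns."""
    return [
        (match.group(1).lower(), match.group(2).lower())
        for match in IS_A_PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    sample = "A teacher is a person. Berlin, which is a city, lies in Germany."
    print(extract_subclass_pairs(sample))
```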

During the Learning of non-taxonomic relations, relationships are extracted that do not express any sub- or supersumption, e.g. works-for or located-in. There are two common approaches to this subtask. The first is based on the extraction of anonymous associations, which are named appropriately in a second step. The second extracts verbs, which indicate a relationship between the entities represented by the surrounding words. The result of either approach has to be evaluated by an ontologist.

In Rule Discovery,^[14] axioms (formal descriptions of concepts) are generated for the extracted concepts. This can be achieved, e.g., by analyzing the syntactic structure of a natural language definition and applying transformation rules to the resulting dependency tree. The result of this process is a list of axioms, which are afterwards combined into a concept description. This description has to be evaluated by an ontologist.

During Ontology Population, the ontology is augmented with instances of concepts and properties. Instances of concepts are added using methods based on the matching of lexico-syntactic patterns. Instances of properties are added by applying bootstrapping methods, which collect relation tuples.

In Concept Hierarchy Extension, the OL system tries to extend the taxonomic structure of an existing ontology with further concepts. This can be done in a supervised way with a trained classifier or in an unsupervised way by applying similarity measures.
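
The unsupervised variant might be sketched as follows, under several assumptions that are not from the source: each concept is described by a set of context terms, similarity is measured with the Jaccard coefficient, and a new concept is proposed as a sibling of its most similar existing concept. All names in the example are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two term collections (0.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def propose_parent(new_context, concept_contexts, parent_of):
    """Propose where to attach a new concept in an existing taxonomy.

    concept_contexts maps each existing concept to its set of context terms,
    parent_of maps a concept to its parent in the taxonomy.
    Returns (proposed_parent, most_similar_concept). If the most similar
    concept has no parent, that concept itself is proposed as the parent.
    """
    if not concept_contexts:
        raise ValueError("the existing ontology has no concepts to compare with")
    best = max(
        concept_contexts,
        key=lambda concept: jaccard(new_context, concept_contexts[concept]),
    )
    return parent_of.get(best, best), best


if __name__ == "__main__":
    contexts = {
        "teacher": {"school", "class", "teach"},
        "car": {"engine", "wheel", "drive"},
    }
    parents = {"teacher": "person", "car": "vehicle"}
    print(propose_parent({"university", "teach", "class"}, contexts, parents))
```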

In Frame/Event Detection, the OL system tries to extract complex relationships from text, e.g. who departed from where to what place and when. Approaches range from applying SVM with kernel methods to Semantic Role Labelling (SRL) ^[15] to deep semantic parsing techniques.^[16]

Semantic Annotation (SA)

In the Semantic Annotation^[17] of natural language text, the text is augmented with metadata (often represented in RDFa) that should make the semantics of the contained terms machine-understandable. In this process, which is generally semi-automatic, knowledge is extracted in the sense that a link is established between lexical terms and, e.g., concepts from ontologies. Thus, knowledge is gained about which meaning of a term was intended in the processed context. Semi-automatic Semantic Annotation can be split into the following two subtasks.

Terminology Extraction
Entity Linking

During Terminology Extraction, lexical terms are extracted from the text. For this purpose, a tokenizer first determines the word boundaries and resolves abbreviations. Afterwards, terms from the text that correspond to a concept are extracted with the help of a domain-specific lexicon, so that they can be linked during Entity Linking.

In Entity Linking,^[18] a link is established between the lexical terms extracted from the source text and the concepts of an ontology. For this, candidate concepts matching the several meanings of a term are detected with the help of a lexicon. Finally, the context of the terms is analyzed to determine the most appropriate disambiguation and to assign each term to the correct concept.
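
The two steps of Entity Linking (candidate lookup in a lexicon, then disambiguation by context) can be illustrated with a toy example. The lexicon, the ex: identifiers and the overlap-counting heuristic are hypothetical and far simpler than what the tools listed in this article do.

```python
# Hypothetical lexicon: term -> candidate concepts, each with typical context words.
LEXICON = {
    "jaguar": {
        "ex:Jaguar_(animal)": {"cat", "jungle", "predator", "animal"},
        "ex:Jaguar_Cars": {"car", "engine", "drive", "manufacturer"},
    },
}


def link_entity(term, context_tokens, lexicon=LEXICON):
    """Pick the candidate concept whose context words overlap most with the text.

    Returns None if the lexicon has no candidates for the term. On a tie, the
    candidate listed first in the lexicon wins.
    """
    candidates = lexicon.get(term.lower(), {})
    if not candidates:
        return None
    context = {token.lower() for token in context_tokens}
    return max(candidates, key=lambda concept: len(candidates[concept] & context))


if __name__ == "__main__":
    print(link_entity("Jaguar", "the jaguar is a predator of the jungle".split()))
    print(link_entity("Jaguar", "I drive a Jaguar car".split()))
```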

Tools

The following criteria can be used to categorize tools that extract knowledge from natural language text.

Source	Which input formats can be processed by the tool (e. g. plain text, HTML or PDF)?
Access Paradigm	Can the tool query the data source, or does it require a whole dump for the extraction process?
Data Synchronization	Is the result of the extraction process synchronized with the source?
Uses Output Ontology	Does the tool link the result with an ontology?
Mapping Automation	How automated is the extraction process (manual, semi-automatic or automatic)?
Requires Ontology	Does the tool need an ontology for the extraction?
Uses GUI	Does the tool offer a graphical user interface?
Approach	Which approach (IE, OBIE, OL or SA) is used by the tool?
Extracted Entities	Which types of entities (e. g. named entities, concepts or relationships) can be extracted by the tool?
Applied Techniques	Which techniques are applied (e. g. NLP, statistical methods, clustering or machine learning)?
Output Model	Which model is used to represent the result of the tool (e. g. RDF or OWL)?
Supported Domains	Which domains are supported (e. g. economy or biology)?
Supported Languages	Which languages can be processed (e. g. English or German)?

The following table characterizes some tools for Knowledge Extraction from natural language sources.

Name	Source	Access Paradigm	Data Synchronization	Uses Output Ontology	Mapping Automation	Requires Ontology	Uses GUI	Approach	Extracted Entities	Applied Techniques	Output Model	Supported Domains	Supported Languages
AeroText ^[19]	plain text, HTML, XML, SGML	dump	no	yes	automatic	yes	yes	IE	named entities, relationships, events	linguistic rules	proprietary	domain-independent	english, spanish, arabic, chinese, indonesian
AlchemyAPI ^[20]	plain text, HTML				automatic		yes	SA					multilingual
ANNIE ^[21]	plain text	dump				yes	yes	IE		finite state algorithms			multilingual
ASIUM ^[22]	plain text	dump			semi-automatic		yes	OL	concepts, concept hierarchy	NLP, clustering
Attensity Exhaustive Extraction ^[23]					automatic			IE	named entities, relationships, events	NLP
DBpedia Spotlight ^[24]	plain text, HTML	dump, SPARQL	yes	yes	automatic	no	yes	SA	annotation to each word, annotation to non-stopwords	NLP, statistical methods, machine learning	RDFa	domain-independent	english
FRED	plain text	dump	no	yes	automatic	no	yes	OL+IE+SA	concepts, concept hierarchy, frames, events, relationships, named entities	NLP, DRT, heuristical rules	RDF-OWL	domain-independent	english
iDocument ^[25]	HTML, PDF, DOC	SPARQL		yes			yes	OBIE	instances, property values	NLP		personal, business
NetOwl Extractor ^[26]	plain text, HTML, XML, SGML, PDF, MS Office	dump	no	yes	automatic	yes	yes	IE	named entities, relationships, events	NLP	XML, RDF-OWL, others	multiple domains	English, Arabic, Chinese (Simplified and Traditional), French, Korean, Persian (Farsi and Dari), Russian, Spanish
OntoGen ^[27]					semi-automatic		yes	OL	concepts, concept hierarchy, non-taxonomic relations, instances	NLP, machine learning, clustering
OntoLearn ^[28]	plain text, HTML	dump	no	yes	automatic	yes	no	OL	concepts, concept hierarchy, instances	NLP, statistical methods	proprietary	domain-independent	english
OntoLearn Reloaded	plain text, HTML	dump	no	yes	automatic	yes	no	OL	concepts, concept hierarchy, instances	NLP, statistical methods	proprietary	domain-independent	english
OntoSyphon ^[29]	HTML, PDF, DOC	dump, search engine queries	no	yes	automatic	yes	no	OBIE	concepts, relations, instances	NLP, statistical methods	RDF	domain-independent	english
ontoX ^[30]	plain text	dump	no	yes	semi-automatic	yes	no	OBIE	instances, datatype property values	heuristic-based methods	proprietary	domain-independent	language-independent
OpenCalais	plain text, HTML, XML	dump	no	yes	automatic	yes	no	SA	annotation to entities, annotation to events, annotation to facts	NLP, machine learning	RDF	domain-independent	english, french, spanish
PoolParty Extractor ^[31]	plain text, HTML, DOC, ODT	dump	no	yes	automatic	yes	yes	OBIE	named entities, concepts, relations, concepts that categorize the text, enrichments	NLP, machine learning, statistical methods	RDF, OWL	domain-independent	english, german, spanish, french
SCOOBIE	plain text, HTML	dump	no	yes	automatic	no	no	OBIE	instances, property values, RDFS types	NLP, machine learning	RDF, RDFa	domain-independent	english, german
SemTag ^[32]^[33]	HTML	dump	no	yes	automatic	yes	no	SA		machine learning	database record	domain-independent	language-independent
smart FIX	plain text, HTML, PDF, DOC, e-Mail	dump	yes	no	automatic	no	yes	OBIE	named entities	NLP, machine learning	proprietary	domain-independent	english, german, french, dutch, polish
Text2Onto ^[34]	plain text, HTML, PDF	dump	yes	no	semi-automatic	yes	yes	OL	concepts, concept hierarchy, non-taxonomic relations, instances, axioms	NLP, statistical methods, machine learning, rule-based methods	OWL	domain-independent	english, german, spanish
Text-To-Onto ^[35]	plain text, HTML, PDF, PostScript	dump			semi-automatic	yes	yes	OL	concepts, concept hierarchy, non-taxonomic relations, lexical entities referring to concepts, lexical entities referring to relations	NLP, machine learning, clustering, statistical methods			german
The Wiki Machine ^[36]	plain text, HTML, PDF, DOC	dump	no	yes	automatic	yes	yes	SA	annotation to proper nouns, annotation to common nouns	machine learning	RDFa	domain-independent	english, german, spanish, french, portuguese, italian, russian
ThingFinder ^[37]								IE	named entities, relationships, events				multilingual
Zemanta	plain text, HTML	dump	yes	no	automatic	no	yes	SA	named entities, concepts	NLP, statistical methods	RDF	domain-independent	english

Knowledge discovery

Knowledge discovery describes the process of automatically searching large volumes of data for patterns that can be considered knowledge about the data.^[38] It is often described as deriving knowledge from the input data. Knowledge discovery developed out of the Data mining domain, and is closely related to it both in terms of methodology and terminology.^[39]

The most well-known branch of data mining is knowledge discovery, also known as Knowledge Discovery in Databases (KDD). Like many other forms of knowledge discovery, it creates abstractions of the input data. The knowledge obtained through the process may become additional data that can be used for further analysis and discovery.

Another promising application of knowledge discovery is in the area of software modernization, weakness discovery and compliance, which involves understanding existing software artifacts. This process is related to the concept of reverse engineering. Usually the knowledge obtained from existing software is presented in the form of models to which specific queries can be made when necessary. An entity-relationship model is a frequent format for representing knowledge obtained from existing software. The Object Management Group (OMG) developed the Knowledge Discovery Metamodel (KDM) specification, which defines an ontology for software assets and their relationships for the purpose of performing knowledge discovery on existing code. Knowledge discovery from existing software systems, also known as software mining, is closely related to data mining, since existing software artifacts contain enormous value for risk management and business value, which is key for the evaluation and evolution of software systems. Instead of mining individual data sets, software mining focuses on metadata, such as process flows (e.g. data flows, control flows and call maps), architecture, database schemas, and business rules/terms/processes.

Input data

Databases
- Relational data
- Database
- Document warehouse
- Data warehouse
Software
- Source code
- Configuration files
- Build scripts
Text
- Concept mining
Graphs
- Molecule mining
Sequences
- Data stream mining
- Learning from time-varying data streams under concept drift
Web

Output formats

Data model
Metadata
Metamodels
Ontology
Knowledge representation
Knowledge tags
Business rule
Knowledge Discovery Metamodel (KDM)
Business Process Modeling Notation (BPMN)
Intermediate representation
Resource Description Framework (RDF)
Software metrics

Ontology Learning

Main article: Ontology learning

References

↑ RDB2RDF Working Group, Website: http://www.w3.org/2001/sw/rdb2rdf/ , charter: http://www.w3.org/2009/08/rdb2rdf-charter, R2RML: RDB to RDF Mapping Language: http://www.w3.org/TR/r2rml/
↑ K. Nakayama, M. Pei, M. Erdmann, M. Ito, M. Shirakawa, T. Hara, S. Nishio: Wikipedia Mining - Wikipedia as a Corpus for Knowledge Extraction, Proc. of Wikimania (Jul. 2008). http://wikipedia-lab.org/en/images/0/06/Wikimania2008.pdf
↑ LOD2 EU Deliverable 3.1.1 Knowledge Extraction from Structured Sources http://static.lod2.eu/Deliverables/deliverable-3.1.1.pdf
↑ Life in the Linked Data Cloud. www.opencalais.com. URL accessed on 2009-11-10.
↑ Tim Berners-Lee (1998), "Relational Databases on the Semantic Web". Retrieved: February 20, 2011.
↑ Hu et al. (2007), “Discovering Simple Mappings Between Relational Database Schemas and Ontologies”, In Proc. of 6th International Semantic Web Conference (ISWC 2007), 2nd Asian Semantic Web Conference (ASWC 2007), LNCS 4825, pages 225‐238, Busan, Korea, 11‐15 November 2007. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.97.6934&rep=rep1&type=pdf
↑ R. Ghawi and N. Cullot (2007), "Database-to-Ontology Mapping Generation for Semantic Interoperability". In Third International Workshop on Database Interoperability (InterDB 2007). http://le2i.cnrs.fr/IMG/publications/InterDB07-Ghawi.pdf
↑ Li et al. (2005) "A Semi-automatic Ontology Acquisition Method for the Semantic Web", WAIM, volume 3739 of Lecture Notes in Computer Science, page 209-220. Springer. http://dx.doi.org/10.1007/11563952_19
↑ Tirmizi et al. (2008), “Translating SQL Applications to the Semantic Web”, Lecture Notes in Computer Science, Volume 5181/2008 (Database and Expert Systems Applications). http://citeseer.ist.psu.edu/viewdoc/download;jsessionid=15E8AB2A37BD06DAE59255A1AC3095F0?doi=10.1.1.140.3169&rep=rep1&type=pdf
↑ Farid Cerbah (2008). "Learning Highly Structured Semantic Repositories from Relational Databases", The Semantic Web: Research and Applications, volume 5021 of Lecture Notes in Computer Science, Springer, Berlin / Heidelberg http://www.tao-project.eu/resources/publications/cerbah-learning-highly-structured-semantic-repositories-from-relational-databases.pdf
↑ Wimalasuriya, Daya C.; Dou, Dejing (2010). "Ontology-based information extraction: An introduction and a survey of current approaches", Journal of Information Science, 36(3), p. 306 - 323, http://ix.cs.uoregon.edu/~dou/research/papers/jis09.pdf (retrieved: 18.06.2012).
↑ Cunningham, Hamish (2005). "Information Extraction, Automatic", Encyclopedia of Language and Linguistics, 2, p. 665 - 677, http://gate.ac.uk/sale/ell2/ie/main.pdf (retrieved: 18.06.2012).
↑ Cimiano, Philipp; Völker, Johanna; Studer, Rudi (2006). "Ontologies on Demand? - A Description of the State-of-the-Art, Applications, Challenges and Trends for Ontology Learning from Text", Information, Wissenschaft und Praxis, 57, p. 315 - 320, http://people.aifb.kit.edu/pci/Publications/iwp06.pdf (retrieved: 18.06.2012).
↑ Völker, Johanna; Hitzler, Pascal; Cimiano, Philipp (2007). "Acquisition of OWL DL Axioms from Lexical Resources", Proceedings of the 4th European conference on The Semantic Web, p. 670 - 685, http://smartweb.dfki.de/Vortraege/lexo_2007.pdf (retrieved: 18.06.2012).
↑ Coppola B.; Gangemi A.; Gliozzo A.; Picca D.; Presutti V. (2009). "Frame Detection over the Semantic Web", Proceedings of the European Semantic Web Conference (ESWC2009), Springer, 2009.
↑ Presutti V.; Draicchio F.; Gangemi A. (2012). "Knowledge extraction based on Discourse Representation Theory and Linguistic Frames", Proceedings of the Conference on Knowledge Engineering and Knowledge Management (EKAW2012), LNCS, Springer, 2012.
↑ Erdmann, M.; Maedche, Alexander; Schnurr, H.-P.; Staab, Steffen (2000). "From Manual to Semi-automatic Semantic Annotation: About Ontology-based Text Annotation Tools", Proceedings of the COLING, http://www.ida.liu.se/ext/epa/cis/2001/002/paper.pdf (retrieved: 18.06.2012).
↑ Rao, Delip; McNamee, Paul; Dredze, Mark (2011). "Entity Linking: Finding Extracted Entities in a Knowledge Base", Multi-source, Multi-lingual Information Extraction and Summarization, http://www.cs.jhu.edu/~delip/entity-linking.pdf (retrieved: 18.06.2012).
↑ Rocket Software, Inc. (2012). "technology for extracting intelligence from text", http://www.rocketsoftware.com/products/aerotext (retrieved: 18.06.2012).
↑ Orchestr8 (2012): "AlchemyAPI Overview", http://www.alchemyapi.com/api (retrieved: 18.06.2012).
↑ The University of Sheffield (2011). "ANNIE: a Nearly-New Information Extraction System", http://gate.ac.uk/sale/tao/splitch6.html#chap:annie (retrieved: 18.06.2012).
↑ ILP Network of Excellence. "ASIUM (LRI)", http://www-ai.ijs.si/~ilpnet2/systems/asium.html (retrieved: 18.06.2012).
↑ Attensity (2012). "Exhaustive Extraction", http://www.attensity.com/products/technology/semantic-server/exhaustive-extraction/ (retrieved: 18.06.2012).
↑ Mendes, Pablo N.; Jakob, Max; García-Silva, Andrés; Bizer, Christian (2011). "DBpedia Spotlight: Shedding Light on the Web of Documents", Proceedings of the 7th International Conference on Semantic Systems, p. 1 - 8, http://www.wiwiss.fu-berlin.de/en/institute/pwo/bizer/research/publications/Mendes-Jakob-GarciaSilva-Bizer-DBpediaSpotlight-ISEM2011.pdf (retrieved: 18.06.2012).
↑ Adrian, Benjamin; Maus, Heiko; Dengel, Andreas (2009). "iDocument: Using Ontologies for Extracting Information from Text", http://www.dfki.uni-kl.de/~maus/dok/AdrianMausDengel09.pdf (retrieved: 18.06.2012).
↑ SRA International, Inc. (2012). "NetOwl Extractor", http://www.sra.com/netowl/entity-extraction/ (retrieved: 18.06.2012).
↑ Fortuna, Blaz; Grobelnik, Marko; Mladenic, Dunja (2007). "OntoGen: Semi-automatic Ontology Editor", Proceedings of the 2007 conference on Human interface, Part 2, p. 309 - 318, http://analytics.ijs.si/~blazf/papers/OntoGen2_HCII2007.pdf (retrieved: 18.06.2012).
↑ Missikoff, Michele; Navigli, Roberto; Velardi, Paola (2002). "Integrated Approach to Web Ontology Learning and Engineering", Computer, 35(11), p. 60 - 63, http://wwwusers.di.uniroma1.it/~velardi/IEEE_C.pdf (retrieved: 18.06.2012).
↑ McDowell, Luke K.; Cafarella, Michael (2006). "Ontology-driven Information Extraction with OntoSyphon", Proceedings of the 5th international conference on The Semantic Web, p. 428 - 444, http://turing.cs.washington.edu/papers/iswc2006McDowell-final.pdf (retrieved: 18.06.2012).
↑ Yildiz, Burcu; Miksch, Silvia (2007). "ontoX - A Method for Ontology-Driven Information Extraction", Proceedings of the 2007 international conference on Computational science and its applications, 3, p. 660 - 673, http://publik.tuwien.ac.at/files/pub-inf_4769.pdf (retrieved: 18.06.2012).
↑ semanticweb.org (2011). "PoolParty Extractor", http://semanticweb.org/wiki/PoolParty_Extractor (retrieved: 18.06.2012).
↑ Dill, Stephen; Eiron, Nadav; Gibson, David; Gruhl, Daniel; Guha, R.; Jhingran, Anant; Kanungo, Tapas; Rajagopalan, Sridhar; Tomkins, Andrew; Tomlin, John A.; Zien, Jason Y. (2003). "SemTag and Seeker: Bootstrapping the Semantic Web via Automated Semantic Annotation", Proceedings of the 12th international conference on World Wide Web, p. 178 - 186, http://www2003.org/cdrom/papers/refereed/p831/p831-dill.html (retrieved: 18.06.2012).
↑ Uren, Victoria; Cimiano, Philipp; Iria, José; Handschuh, Siegfried; Vargas-Vera, Maria; Motta, Enrico; Ciravegna, Fabio (2006). "Semantic annotation for knowledge management: Requirements and a survey of the state of the art", Web Semantics: Science, Services and Agents on the World Wide Web, 4(1), p. 14 - 28, http://staffwww.dcs.shef.ac.uk/people/J.Iria/iria_jws06.pdf (retrieved: 18.06.2012).
↑ Cimiano, Philipp; Völker, Johanna (2005). "Text2Onto - A Framework for Ontology Learning and Data-Driven Change Discovery", Proceedings of the 10th International Conference of Applications of Natural Language to Information Systems, 3513, p. 227 - 238, http://www.cimiano.de/Publications/2005/nldb05/nldb05.pdf (retrieved: 18.06.2012).
↑ Maedche, Alexander; Volz, Raphael (2001). "The Ontology Extraction & Maintenance Framework Text-To-Onto", Proceedings of the IEEE International Conference on Data Mining, http://users.csc.calpoly.edu/~fkurfess/Events/DM-KM-01/Volz.pdf (retrieved: 18.06.2012).
↑ Machine Linking. "We connect to the Linked Open Data cloud", http://thewikimachine.fbk.eu/html/index.html (retrieved: 18.06.2012).
↑ Inxight Federal Systems (2008). "Inxight ThingFinder and ThingFinder Professional", http://inxightfedsys.com/products/sdks/tf/ (retrieved: 18.06.2012).
↑ Frawley, William J. et al. (1992), "Knowledge Discovery in Databases: An Overview", AI Magazine (Vol 13, No 3), 57-70 (online full version: http://www.aaai.org/ojs/index.php/aimagazine/article/viewArticle/1011)
↑ Fayyad U. et al. (1996), "From Data Mining to Knowledge Discovery in Databases", AI Magazine (Vol 17, No 3), 37-54 (online full version: http://www.aaai.org/ojs/index.php/aimagazine/article/viewArticle/1230)


This page uses Creative Commons Licensed content from Wikipedia (view authors).
