Research with DEX: TreeBank browser, the specialized search engine for syntactic structures

We are glad to share a new DEX-powered application for searching syntactic structures: the TreeBank browser, implemented by the Institut Universitari de Lingüística Aplicada (IULA) at the Universitat Pompeu Fabra (Barcelona).

The TreeBank browser is a tool for linguists. It contains a Spanish treebank of more than 42,000 syntactically annotated sentences. Dependency grammar is the formalism used to represent the syntactic information. This formalism allows a sentence to be seen as a graph, so all the syntactic information in the corpus is represented as a DEX directed graph in which the nodes are the words in the corpus and the relationship edges are the dependencies. A dependency is the annotation that links two related words; for instance, the fragment “Sr. Salvatori vendió” would be represented by the following subgraph:

[Figure: TreeBank browser view of the subgraph for “Sr. Salvatori vendió”]

“Sr. Salvatori” and “vendió” are two nodes in a sentence of the corpus, and the relationship between them is a dependency called “SUBJ” (subject). It means that the main verb of the sentence is “vendió” (sold) and “Sr. Salvatori” is the subject who sold something.
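The word-and-dependency graph described above can be sketched in a few lines of plain Python. This is only an illustration and does not use the DEX API: the `DependencyGraph` class and its methods are hypothetical names, and the direction of the edge (from the verb to its subject) is an assumption, since the article does not state it.

```python
from dataclasses import dataclass, field


@dataclass
class DependencyGraph:
    """Directed graph whose nodes are words and whose edges are labelled dependencies."""
    words: list = field(default_factory=list)  # a node id is its index in this list
    edges: list = field(default_factory=list)  # (head id, label, dependent id)

    def add_word(self, form):
        """Add a word node and return its id."""
        self.words.append(form)
        return len(self.words) - 1

    def add_dependency(self, head, label, dependent):
        """Add a labelled edge from the head word to the dependent word (assumed direction)."""
        self.edges.append((head, label, dependent))

    def dependents(self, head, label):
        """Return the word forms attached to `head` through edges labelled `label`."""
        return [self.words[d] for h, l, d in self.edges if h == head and l == label]


graph = DependencyGraph()
subject = graph.add_word("Sr. Salvatori")
verb = graph.add_word("vendió")
graph.add_dependency(verb, "SUBJ", subject)

print(graph.dependents(verb, "SUBJ"))  # prints ['Sr. Salvatori']
```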

All the sentences in the corpus have been semi-automatically analysed using a grammar with a predefined set of dependency relations such as subject, direct object, specifier, modifier and punctuation. For a more complete example, consider the sentence “Además, la memoria es la base para el aprendizaje”. It is analysed and shown by the TreeBank browser as follows:

[Figure: TreeBank browser view of the analysis of “Además, la memoria es la base para el aprendizaje”]

The TreeBank browser allows searching for sentences in the corpus that satisfy user-defined patterns. Such patterns take into account both dependencies and word information; the latter may include any combination of part of speech, word form and lemma. As an example, we may query for all the sentences in the corpus whose main verb is “establecer”, has a modifier and has a common noun as subject.
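To make the example query concrete, here is a minimal sketch in plain Python of how such a pattern could be checked against dependency-annotated sentences. It is not the TreeBank browser's query language or the DEX API. The `Token` layout, the tag names (`NOUN_COMMON`, `VERB`, ...), the dependency labels (`ROOT`, `MOD`, `SPEC`, `DO`) and the three toy sentences are all made up for illustration and are not taken from the IULA corpus.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    form: str
    lemma: str
    pos: str             # hypothetical part-of-speech tag
    head: Optional[int]  # index of the governing token; None marks the main verb
    dep: str             # hypothetical dependency label


def matches(sentence):
    """True if the main verb is 'establecer' with a modifier and a common-noun subject."""
    for i, tok in enumerate(sentence):
        if tok.head is None and tok.lemma == "establecer":
            children = [t for t in sentence if t.head == i]
            has_modifier = any(t.dep == "MOD" for t in children)
            has_noun_subject = any(
                t.dep == "SUBJ" and t.pos == "NOUN_COMMON" for t in children
            )
            if has_modifier and has_noun_subject:
                return True
    return False


# Toy sentences (invented): "La ley establece claramente límites",
# "Ella establece normas", "Así lo establece la ley".
corpus = [
    [Token("La", "el", "DET", 1, "SPEC"),
     Token("ley", "ley", "NOUN_COMMON", 2, "SUBJ"),
     Token("establece", "establecer", "VERB", None, "ROOT"),
     Token("claramente", "claramente", "ADV", 2, "MOD"),
     Token("límites", "límite", "NOUN_COMMON", 2, "DO")],
    [Token("Ella", "él", "PRON", 1, "SUBJ"),
     Token("establece", "establecer", "VERB", None, "ROOT"),
     Token("normas", "norma", "NOUN_COMMON", 1, "DO")],
    [Token("Así", "así", "ADV", 2, "MOD"),
     Token("lo", "lo", "PRON", 2, "DO"),
     Token("establece", "establecer", "VERB", None, "ROOT"),
     Token("la", "el", "DET", 4, "SPEC"),
     Token("ley", "ley", "NOUN_COMMON", 2, "SUBJ")],  # subject after the verb
]

for sentence in corpus:
    if matches(sentence):
        print(" ".join(t.form for t in sentence))
```

The check only follows edges from the main verb to its dependents, so the first and third toy sentences both match even though their subjects sit on opposite sides of the verb.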

Because the corpus is stored as a graph, there are no restrictions on the position of the elements in a search. Therefore, the previous query finds a match regardless of the relative position of each item in the sentence (e.g. subjects or modifiers in preverbal or postverbal position). The user may download each result in a standard tabulated form or as a graph by exporting it to the standard GraphML format, which makes the information more attractive and readable.
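As a rough idea of what the two export forms could look like, the sketch below writes one toy result as tab-separated text and as GraphML using only the Python standard library. The column layout (`id`, `form`, `head`, `dep`), the `ROOT` label and the GraphML key names are assumptions made for illustration; the files actually produced by the TreeBank browser may be organised differently.

```python
import csv
import io
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

# Toy result: (token id, word form, head id or None, dependency label).
rows = [
    (0, "Sr. Salvatori", 1, "SUBJ"),
    (1, "vendió", None, "ROOT"),
]


def to_tsv(rows):
    """Serialize the tokens as tab-separated text, one token per line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["id", "form", "head", "dep"])
    for tok_id, form, head, dep in rows:
        writer.writerow([tok_id, form, "" if head is None else head, dep])
    return buf.getvalue()


def to_graphml(rows):
    """Serialize the tokens as a directed GraphML graph (edges go from head to dependent)."""
    root = ET.Element("graphml", xmlns=GRAPHML_NS)
    # Declare the data attached to nodes (word form) and edges (dependency label).
    ET.SubElement(root, "key", {"id": "form", "for": "node",
                                "attr.name": "form", "attr.type": "string"})
    ET.SubElement(root, "key", {"id": "dep", "for": "edge",
                                "attr.name": "dep", "attr.type": "string"})
    graph = ET.SubElement(root, "graph", id="G", edgedefault="directed")
    for tok_id, form, head, dep in rows:
        node = ET.SubElement(graph, "node", id=f"n{tok_id}")
        ET.SubElement(node, "data", key="form").text = form
    for tok_id, form, head, dep in rows:
        if head is not None:
            edge = ET.SubElement(graph, "edge", source=f"n{head}", target=f"n{tok_id}")
            ET.SubElement(edge, "data", key="dep").text = dep
    return ET.tostring(root, encoding="unicode")


print(to_tsv(rows))
print(to_graphml(rows))
```

The GraphML output can be opened in graph tools that read that format, while the tab-separated form suits spreadsheets and scripts.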

Treebanks are a resource for developing a number of useful tools in the Natural Language Processing area, such as training parsers and taggers, and for work on machine translation and speech recognition, among many others.

For more information about the tool please take a look at the documentation available in the TreeBank browser website.

If you would also like to use DEX to power your research, do not hesitate to contact us and request a free research license to join the research program!


—-

* This article appeared first on DZone
