Matches in Nanopublications for { ?s ?p ?o <http://purl.org/np/RAf1cWkSXWFTHc88kqWkV_YbJrSVOu4tcfjEMj2cMieL8#assertion>. }
Showing items 1 to 2 of 2 with 100 items per page.
- paragraph type Paragraph assertion.
- paragraph hasContent "The unordered set of extracted verbs is the subject of a further analysis, which aims at discovering the most representative verbs with respect to the corpus. Two measures are combined to generate a score for each verb lemma, thus enabling the creation of a rank. We first compute the term frequency–inverse document frequency (TF-IDF) of each verb lexicalization (i.e., the occurring tokens) over each document in the corpus: this weighting measure is intended to capture the lexicographical relevance of a given verb, namely how important it is with respect to other terms in the whole corpus. Then, we determine the standard deviation value out of the TF-IDF scores list: this statistical measure is meant to catch heterogeneously distributed verbs, in the sense that the higher the standard deviation is, the more variably the verb is used, thus helping to understand its overall usage signal over the corpus. Ultimately, we produce the final score and assign it to a verb lemma by averaging all its lexicalizations scores. The top-N lemmas serve as candidate LUs, each evoking one or more frames according to the definitions of a given frame repository." assertion.
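
The ranking described in the paragraph above can be sketched roughly as follows. This is only an illustrative reading, not the authors' implementation: the paragraph does not state the exact TF-IDF variant or how the two measures are combined, so the sketch assumes an unsmoothed TF-IDF, takes the population standard deviation of a token's per-document TF-IDF values as that token's score, and averages token scores per lemma. The names `tfidf_scores`, `rank_lemmas`, and `lemma_of` are hypothetical.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def tfidf_scores(docs, token):
    """Return the TF-IDF of `token` in each document (one value per document)."""
    n_docs = len(docs)
    doc_freq = sum(1 for doc in docs if token in doc)
    if doc_freq == 0:
        return [0.0] * n_docs
    # Assumption: plain log(N / df) with no smoothing; the source names no variant.
    idf = math.log(n_docs / doc_freq)
    return [doc.count(token) / len(doc) * idf if doc else 0.0 for doc in docs]


def rank_lemmas(docs, lemma_of, top_n):
    """Rank verb lemmas by the mean score of their lexicalizations.

    docs     -- list of tokenized documents (each a list of tokens)
    lemma_of -- mapping from verb token (lexicalization) to its lemma
    top_n    -- number of (lemma, score) pairs to return
    """
    by_lemma = defaultdict(list)
    for token, lemma in lemma_of.items():
        # Assumed token score: spread of the token's TF-IDF values over the corpus.
        by_lemma[lemma].append(pstdev(tfidf_scores(docs, token)))
    ranked = sorted(
        ((lemma, mean(scores)) for lemma, scores in by_lemma.items()),
        key=lambda pair: (-pair[1], pair[0]),  # highest score first, ties by lemma
    )
    return ranked[:top_n]
```

Under these assumptions, a token that occurs in every document gets an IDF of zero and therefore a score of zero, so its lemma sinks in the rank unless other lexicalizations of the same lemma are distributed more unevenly. The returned top-N lemmas would then play the role of the candidate LUs.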