ROMJIST Volume 26, No. 1, 2023, pp. 3-20, DOI: 10.59277/ROMJIST.2023.1.01
Bob CHEN, Weiming PENG, Jihua SONG
A Frequent Construction Mining Scheme Based on Syntax Tree
ABSTRACT: Natural language processing (NLP) is one of the main research directions in artificial intelligence. One of its goals is to identify the semantic information carried by a text. Current mainstream semantic recognition tasks rely mostly on the semantic information of each individual word to analyze the meaning of the whole sentence. However, research on semantics in cognitive linguistics indicates that meaning is determined both by the words a sentence contains and by the way those words are arranged. Linguists call arrangements and combinations of words that carry particular semantic information constructions. Because constructions play an essential role in conveying semantic information, identifying the constructions in a text is a crucial part of semantic recognition. Against this background, the main contributions of this paper are as follows: 1) a definition and program representation of constructions, together with the corresponding constraints, are proposed for NLP tasks; 2) a frequent construction mining algorithm is proposed to extract, from the grammar structure tree, frequent structures that satisfy the construction requirements. With these, a construction database can be extracted from a specified natural language corpus, supporting more effective semantic analysis of text.
KEYWORDS: Construction; data mining; semantic recognition; sequential pattern mining
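The abstract does not spell out the mining algorithm itself. As a rough illustration only, the Python sketch below shows one simple way to mine frequent structures from syntax trees: it treats contiguous runs of sibling labels under a parent node as candidate patterns and keeps those whose support (the number of trees containing the pattern) meets a threshold. The tuple-based tree representation, the function names `child_label_ngrams` and `mine_frequent_patterns`, and the `min_support` and `max_len` parameters are hypothetical choices for this example; this is not the authors' scheme and it omits their construction constraints.

```python
from collections import Counter
from typing import Tuple

# Hypothetical tree representation: (label, children); leaves have an empty children list.
Tree = Tuple[str, list]


def child_label_ngrams(tree: Tree, max_len: int) -> set:
    """Collect contiguous runs of sibling labels (length 2..max_len) under every node,
    each tagged with its parent label, e.g. ("VP", ("V", "NP"))."""
    found = set()
    stack = [tree]
    while stack:
        label, children = stack.pop()
        labels = [child[0] for child in children]
        for n in range(2, max_len + 1):
            for i in range(len(labels) - n + 1):
                found.add((label, tuple(labels[i:i + n])))
        stack.extend(children)
    return found


def mine_frequent_patterns(trees: list, min_support: int, max_len: int = 3) -> dict:
    """Return patterns occurring in at least min_support trees (support counted once per tree)."""
    support = Counter()
    for tree in trees:
        support.update(child_label_ngrams(tree, max_len))
    return {pattern: count for pattern, count in support.items() if count >= min_support}


if __name__ == "__main__":
    # Three toy parse trees (illustrative data, not from the paper).
    trees = [
        ("S", [("NP", [("N", [])]), ("VP", [("V", []), ("NP", [("N", [])])])]),
        ("S", [("NP", [("D", []), ("N", [])]),
               ("VP", [("V", []), ("NP", [("N", [])]),
                       ("PP", [("P", []), ("NP", [("N", [])])])])]),
        ("S", [("NP", [("N", [])]), ("VP", [("V", [])])]),
    ]
    result = mine_frequent_patterns(trees, min_support=2)
    print(sorted(result.items()))
    # → [(('S', ('NP', 'VP')), 3), (('VP', ('V', 'NP')), 2)]
```

Counting support per tree rather than per occurrence keeps a pattern repeated many times inside one sentence from dominating; a real construction miner would additionally filter candidates against the construction constraints the paper defines.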
