ROMANIAN JOURNAL OF INFORMATION SCIENCE AND TECHNOLOGY
Volume 4, Numbers 3-4, 2001, 353 - 372

The Trigram Statistical Structure in Printed Romanian

Adriana VLAD, Adrian MITREA, Mihai MITREA
"POLITEHNICA" University of Bucharest
Faculty of Electronics and Telecommunications

Abstract.
The main objective of this paper was to verify the stationarity hypothesis for the printed Romanian language on the basis of its trigram structure. This was done by extending a statistical approach that we advanced in a previous study of letter and digram structures. As a result, representative 95% confidence intervals have been obtained for each trigram in various corpora. The stationarity hypothesis was strengthened by a mathematical comparison among various natural texts. The statistical inferences we used were: estimation theory with multiple confidence intervals, a test of the hypothesis that a probability belongs to an interval, and a test of the equality of two probabilities. Evaluating the probability of a type II statistical error allowed us to assess the accuracy of our measurements and to design a new corpus for mathematical purposes. The overall results point to the stationarity of printed Romanian.

Keywords: natural language stationarity, trigram structure, multiple confidence intervals for probability, two types of statistical errors.
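
As a rough illustration of the quantities named in the abstract, the following minimal Python sketch counts overlapping trigrams in a text, builds a 95% confidence interval for a trigram probability, and tests whether two corpora give equal probabilities for the same trigram. It is not the procedure used in this paper: the normal-approximation (Wald) interval, the pooled two-proportion z-test, the fixed quantile 1.96, and all function names are assumptions made for illustration, and no correction for multiple simultaneous intervals is applied.

```python
import math
from collections import Counter

Z_95 = 1.96  # assumed two-sided 95% quantile of the standard normal


def trigram_counts(text):
    """Count overlapping three-character sequences in text."""
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def confidence_interval(count, total, z=Z_95):
    """Normal-approximation (Wald) interval for a probability (assumed method)."""
    p = count / total
    half = z * math.sqrt(p * (1 - p) / total)
    return max(0.0, p - half), min(1.0, p + half)


def equal_probabilities(c1, n1, c2, n2, z=Z_95):
    """Pooled two-proportion z-test; True if equality is not rejected."""
    pooled = (c1 + c2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        # Both relative frequencies are 0 or both are 1, so they coincide.
        return True
    return abs(c1 / n1 - c2 / n2) / se <= z
```

In use, one would call trigram_counts on each corpus, take the total number of trigrams as the sample size, and pass the count of a given trigram to confidence_interval. Overlapping intervals or a True result from equal_probabilities across corpora would be consistent with stationarity for that trigram, under the assumptions stated above.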