Abstract.
The main objective of this paper was to verify the stationarity hypothesis for the printed Romanian
language on the basis of the trigram structure. This was carried out by extending a
statistical approach that we advanced in a previous study for letter and digram
structures. As a result, representative 95\% confidence intervals were obtained for each trigram
in various corpora. The stationarity hypothesis was strengthened by a
mathematical comparison among and between various natural texts. The statistical
inferences we used were: estimation theory with multiple confidence intervals, the test of the
hypothesis that a probability belongs to an interval, and the test of equality between two
probabilities. Evaluating the type II statistical error probability made it possible to ensure the
accuracy of our measurements and to design a new corpus for mathematical
purposes. Overall, the results point to the stationarity of printed Romanian.
Keywords: natural language stationarity, trigram structure, multiple confidence
intervals for probability, two types of statistical errors.