Text and Language Technology Group
Text-Tech

A Definition of Document Analysis

Document analysis, or text analysis, is the computer-assisted analysis of large numbers of documents in order to answer questions about the content of a document set. Its goal is to determine content, locate specific document types, or extract language features without the expense of reading every document in the set. Reliable document analysis is not a wholly automated process. Initially, computers perform large numbers of comparisons that exploit linguistic differences between a norm and an experimental document or document set. These comparisons are then subjected to statistical analysis, which helps reduce the data to a manageable level for interpretation by an experienced linguist. That is, the primary analysis is done by the linguist, who uses a computer to reduce the data to an interpretable level. Once the primary analysis is done, it is often possible to develop more automated algorithms.
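As a rough illustration of the comparison step described above, the following minimal sketch scores each word by how strongly its frequency in an experimental document departs from its frequency in a norm. The statistic used here (a two-term log-likelihood, or G2, score) and the names `tokenize`, `log_likelihood`, and `distinctive_words` are assumptions chosen for the example, not the group's actual method. The ranked output stands in for the reduced data that a linguist would then interpret.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def log_likelihood(a, b, total_a, total_b):
    """Two-term log-likelihood (G2) score for one word.

    a, b: counts of the word in the experimental and norm texts.
    total_a, total_b: total token counts of those two texts.
    """
    combined = total_a + total_b
    if combined == 0:
        return 0.0
    # Expected counts if the word were equally frequent in both texts.
    expected_a = total_a * (a + b) / combined
    expected_b = total_b * (a + b) / combined
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / expected_a)
    if b > 0:
        g2 += b * math.log(b / expected_b)
    return 2.0 * g2


def distinctive_words(norm_text, test_text, top_n=5):
    """Rank words by how much their frequency differs between the two texts."""
    norm = Counter(tokenize(norm_text))
    test = Counter(tokenize(test_text))
    n_norm = sum(norm.values())
    n_test = sum(test.values())
    scored = []
    for word in set(norm) | set(test):
        score = log_likelihood(test[word], norm[word], n_test, n_norm)
        scored.append((score, word))
    # Highest score first; ties broken alphabetically for stable output.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [(word, round(score, 2)) for score, word in scored[:top_n]]


if __name__ == "__main__":
    norm_sample = "the cat sat on the mat"
    test_sample = "the dog dog dog sat"
    for word, score in distinctive_words(norm_sample, test_sample):
        print(word, score)
```

A high score only flags a word whose frequency differs between the two texts. Deciding whether that difference is linguistically meaningful remains the analyst's job, which is consistent with the point that the process is not wholly automated.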




700 Oglethorpe Ave. • Athens, Georgia 30606 • Phone: 706-549-5519 • Fax: 706-549-1228