AI:LLM:GREP: Regular expressions using "grep"

The regular expression [^a-zA-Z], which we used to avoid embedded instances of "the", implies that there must be some single (although non-alphabetic) character before "the", so a "The" at the very beginning of a line is missed. We can avoid this by requiring that "the" be preceded either by the beginning of the line or by a non-alphabetic character, and followed either by a non-alphabetic character or by the end of the line:

grep -E '(^|[^a-zA-Z])[tT]he([^a-zA-Z]|$)' wizard_of_oz
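To see the corrected pattern at work without the wizard_of_oz file, here is a minimal sketch that runs the same pattern over a few made-up lines. The sample text is hypothetical and not taken from the novel. The first line starts with "The", the second contains "the" only inside "another" and "there", and the third ends with "the".

```shell
#!/bin/sh
# Hypothetical stand-in for the wizard_of_oz file (made-up lines).
sample='The wizard lived in a green city.
Dorothy met another witch there.
They walked down the
yellow brick road.'

# "the"/"The" must be preceded by a line start or a non-letter,
# and followed by a non-letter or a line end.
printf '%s\n' "$sample" | grep -E '(^|[^a-zA-Z])[tT]he([^a-zA-Z]|$)'
```

Only the first and third lines are printed. The second line is rejected because in both "another" and "there" a letter touches "the" on one side.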

The process we just went through was based on fixing two kinds of errors: false positives, strings that we incorrectly matched, like "other" or "there", and false negatives, strings that we incorrectly missed, like "The". Addressing these two kinds of errors comes up again and again in implementing speech and language processing systems. Reducing the overall error rate for an application thus involves two antagonistic efforts:

• Increasing precision (minimizing false positives)

• Increasing recall (minimizing false negatives)
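Both measures can be computed for a small example. The sketch below assumes a tiny set of made-up lines, each labelled by hand (1 if it contains the word "the" or "The", 0 otherwise). It scores three patterns at the line level using the usual definitions precision = TP / (TP + FP) and recall = TP / (TP + FN). The helper name score and the sample lines are hypothetical.

```shell
#!/bin/sh
# Hypothetical hand-labelled lines: gold|text, where gold is 1 if the text
# contains the word "the"/"The" and 0 otherwise.
labelled='1|The wizard lived in a green city.
0|Dorothy met another witch there.
1|They walked down the road.
0|Toto barked at them.
1|Where is the Tin Man?'

# score PATTERN: line-level precision and recall of an ERE against the labels.
score() {
  tp=0 fp=0 fn=0
  while IFS='|' read -r gold text; do
    if printf '%s\n' "$text" | grep -qE "$1"; then hit=1; else hit=0; fi
    if [ "$hit" = 1 ] && [ "$gold" = 1 ]; then tp=$((tp + 1)); fi  # true positive
    if [ "$hit" = 1 ] && [ "$gold" = 0 ]; then fp=$((fp + 1)); fi  # false positive
    if [ "$hit" = 0 ] && [ "$gold" = 1 ]; then fn=$((fn + 1)); fi  # false negative
  done <<EOF
$labelled
EOF
  # precision = tp/(tp+fp), recall = tp/(tp+fn); awk does the division
  awk -v p="$1" -v tp="$tp" -v fp="$fp" -v fn="$fn" \
    'BEGIN { printf "%s precision=%.2f recall=%.2f\n", p, tp / (tp + fp), tp / (tp + fn) }'
}

score 'the'
score '[tT]he'
score '(^|[^a-zA-Z])[tT]he([^a-zA-Z]|$)'
```

On this sample, the naive pattern the gets precision 0.50 and recall 0.67 because it misses "The" and matches inside "another" and "them". The pattern [tT]he raises recall to 1.00 but reaches a precision of only 0.60. The corrected pattern scores 1.00 on both.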

Some aliases for common ranges, which can be used mainly to save typing:
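One caveat when trying aliases with the grep -E command used above: POSIX extended regular expressions do not portably support Perl-style shorthands such as \d. GNU grep's -P option is one non-portable way to get them. A minimal sketch, assuming a POSIX-compatible grep, uses the POSIX named character classes instead. These also save typing: [[:digit:]] replaces [0-9] and, in the C locale, [[:alpha:]] replaces [a-zA-Z].

```shell
#!/bin/sh
# Use the C locale so that [[:alpha:]] covers exactly the ASCII letters a-z and A-Z.
export LC_ALL=C

# [[:alpha:]] in place of a-zA-Z inside the corrected "the" pattern
printf '%s\n' 'The end.' 'another one' 'over the' |
  grep -E '(^|[^[:alpha:]])[tT]he([^[:alpha:]]|$)'

# [[:digit:]] in place of [0-9]: print lines containing a run of digits
printf '%s\n' 'Chapter 12' 'no numbers here' | grep -E '[[:digit:]]+'
```

The first command prints "The end." and "over the", and the second prints "Chapter 12". Setting LC_ALL=C matters because in other locales [[:alpha:]] may also match accented letters.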

elaprendiz0000
