Publication date: 2010-6  Publisher: Southeast University Press  Authors: Bird (UK), Klein (UK), Loper (US)  Pages: 479
Preface
This is a book about Natural Language Processing. By "natural language" we mean a language that is used for everyday communication by humans; languages such as English, Hindi, or Portuguese. In contrast to artificial languages such as programming languages and mathematical notations, natural languages have evolved as they pass from generation to generation, and are hard to pin down with explicit rules. We will take Natural Language Processing (or NLP for short) in a wide sense to cover any kind of computer manipulation of natural language. At one extreme, it could be as simple as counting word frequencies to compare different writing styles. At the other extreme, NLP involves "understanding" complete human utterances, at least to the extent of being able to give useful responses to them.
Technologies based on NLP are becoming increasingly widespread. For example, phones and handheld computers support predictive text and handwriting recognition; web search engines give access to information locked up in unstructured text; machine translation allows us to retrieve texts written in Chinese and read them in Spanish. By providing more natural human-machine interfaces, and more sophisticated access to stored information, language processing has come to play a central role in the multilingual information society.
This book provides a highly accessible introduction to the field of NLP. It can be used for individual study or as the textbook for a course on natural language processing or computational linguistics, or as a supplement to courses in artificial intelligence, text mining, or corpus linguistics. The book is intensely practical, containing hundreds of fully worked examples and graded exercises.
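The simplest task the preface mentions, counting word frequencies to compare writing styles, can be sketched in a few lines of standard-library Python. This is only an illustrative sketch and is not taken from the book: the function name `word_frequencies`, the crude regular-expression tokenizer, and the two sample texts are all assumptions made for the example.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Return the relative frequency of each lowercased word in `text`."""
    # Crude tokenizer (an assumption): runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    counts = Counter(words)
    return {word: count / total for word, count in counts.items()}


# Two tiny made-up "texts" standing in for different writing styles.
text_a = "The cat sat on the mat. The cat slept."
text_b = "A dog ran in a park. A dog barked."

freq_a = word_frequencies(text_a)
freq_b = word_frequencies(text_b)

# Compare how often each text uses the articles "the" and "a".
for word in ("the", "a"):
    print(word, round(freq_a.get(word, 0.0), 3), round(freq_b.get(word, 0.0), 3))
```

Relative frequencies are used instead of raw counts so that texts of different lengths can be compared directly.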
Summary
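Natural Language Processing with Python (Reprint Edition) offers a highly accessible introduction to natural language processing, a field that spans language technologies ranging from predictive text and email filtering to automatic summarization and translation. With this book you will learn to write Python programs that work with large collections of unstructured text. You will also access richly annotated datasets using a comprehensive range of linguistic data structures, and come to understand the main algorithms for analyzing the content and structure of written communication.
Packed with examples and exercises, Natural Language Processing with Python will help you: extract information from unstructured text, either to guess the topic or to identify "named entities"; analyze linguistic structure in text, including parsing and semantic analysis; access popular linguistic databases, including WordNet and treebanks; and integrate techniques drawn from fields as diverse as linguistics and artificial intelligence.
Natural Language Processing with Python (Reprint Edition) will help you gain practical skills in natural language processing using the Python programming language and the Natural Language Toolkit (NLTK). If you are interested in developing web applications, analyzing multilingual news sources, or documenting endangered languages, or if you simply want a programmer's perspective on how human language works, you will find Natural Language Processing with Python both fascinating and immensely useful.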
About the Authors
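Steven Bird is an associate professor in the Department of Computer Science and Software Engineering at the University of Melbourne, and a senior research associate at the Linguistic Data Consortium, University of Pennsylvania.
Ewan Klein is a professor of language technology in the School of Informatics at the University of Edinburgh.
Edward Loper recently received a PhD in machine learning for natural language processing from the University of Pennsylvania and is currently a researcher at BBN Technologies in Boston.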
Table of Contents
Preface
1. Language Processing and Python
  1.1 Computing with Language: Texts and Words
  1.2 A Closer Look at Python: Texts as Lists of Words
  1.3 Computing with Language: Simple Statistics
  1.4 Back to Python: Making Decisions and Taking Control
  1.5 Automatic Natural Language Understanding
  1.6 Summary
  1.7 Further Reading
  1.8 Exercises
2. Accessing Text Corpora and Lexical Resources
  2.1 Accessing Text Corpora
  2.2 Conditional Frequency Distributions
  2.3 More Python: Reusing Code
  2.4 Lexical Resources
  2.5 WordNet
  2.6 Summary
  2.7 Further Reading
  2.8 Exercises
3. Processing Raw Text
  3.1 Accessing Text from the Web and from Disk
  3.2 Strings: Text Processing at the Lowest Level
  3.3 Text Processing with Unicode
  3.4 Regular Expressions for Detecting Word Patterns
  3.5 Useful Applications of Regular Expressions
  3.6 Normalizing Text
  3.7 Regular Expressions for Tokenizing Text
  3.8 Segmentation
  3.9 Formatting: From Lists to Strings
  3.10 Summary
  3.11 Further Reading
  3.12 Exercises
4. Writing Structured Programs
  4.1 Back to the Basics
  4.2 Sequences
  4.3 Questions of Style
  4.4 Functions: The Foundation of Structured Programming
  4.5 Doing More with Functions
  4.6 Program Development
  4.7 Algorithm Design
  4.8 A Sample of Python Libraries
  4.9 Summary
  4.10 Further Reading
  4.11 Exercises
5. Categorizing and Tagging Words
  5.1 Using a Tagger
  5.2 Tagged Corpora
  5.3 Mapping Words to Properties Using Python Dictionaries
  5.4 Automatic Tagging
  5.5 N-Gram Tagging
  5.6 Transformation-Based Tagging
  5.7 How to Determine the Category of a Word
  5.8 Summary
  5.9 Further Reading
  5.10 Exercises
6. Learning to Classify Text
  6.1 Supervised Classification
  6.2 Further Examples of Supervised Classification
  6.3 Evaluation
  6.4 Decision Trees
  6.5 Naive Bayes Classifiers
  6.6 Maximum Entropy Classifiers
  6.7 Modeling Linguistic Patterns
  6.8 Summary
  6.9 Further Reading
  6.10 Exercises
7. Extracting Information from Text
  7.1 Information Extraction
  7.2 Chunking
  7.3 Developing and Evaluating Chunkers
  7.4 Recursion in Linguistic Structure
  7.5 Named Entity Recognition
  7.6 Relation Extraction
  7.7 Summary
  7.8 Further Reading
  7.9 Exercises
8. Analyzing Sentence Structure
  8.1 Some Grammatical Dilemmas
  8.2 What's the Use of Syntax?
  8.3 Context-Free Grammar
  8.4 Parsing with Context-Free Grammar
  8.5 Dependencies and Dependency Grammar
  8.6 Grammar Development
  8.7 Summary
  8.8 Further Reading
  8.9 Exercises
9. Building Feature-Based Grammars
  9.1 Grammatical Features
  9.2 Processing Feature Structures
  9.3 Extending a Feature-Based Grammar
  9.4 Summary
  9.5 Further Reading
  9.6 Exercises
10. Analyzing the Meaning of Sentences
  10.1 Natural Language Understanding
  10.2 Propositional Logic
  10.3 First-Order Logic
  10.4 The Semantics of English Sentences
  10.5 Discourse Semantics
  10.6 Summary
  10.7 Further Reading
  10.8 Exercises
11. Managing Linguistic Data
  11.1 Corpus Structure: A Case Study
  11.2 The Life Cycle of a Corpus
  11.3 Acquiring Data
  11.4 Working with XML
  11.5 Working with Toolbox Data
  11.6 Describing Language Resources Using OLAC Metadata
  11.7 Summary
  11.8 Further Reading
  11.9 Exercises
Afterword: The Language Challenge
Bibliography
NLTK Index
General Index
Chapter Excerpt
Back in elementary school you learned the difference between nouns, verbs, adjectives, and adverbs. These "word classes" are not just the idle invention of grammarians, but are useful categories for many language processing tasks. As we will see, they arise from simple analysis of the distribution of words in text. The goal of this chapter is to answer the following questions:
1. What are lexical categories, and how are they used in natural language processing?
2. What is a good Python data structure for storing words and their categories?
3. How can we automatically tag each word of a text with its word class?
Along the way, we'll cover some fundamental techniques in NLP, including sequence labeling, n-gram models, backoff, and evaluation. These techniques are useful in many areas, and tagging gives us a simple context in which to present them. We will also see how tagging is the second step in the typical NLP pipeline, following tokenization.
The process of classifying words into their parts-of-speech and labeling them accordingly is known as part-of-speech tagging, POS tagging, or simply tagging. Parts-of-speech are also known as word classes or lexical categories. The collection of tags used for a particular task is known as a tagset. Our emphasis in this chapter is on exploiting tags, and tagging text automatically.
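The ideas in this excerpt (a dictionary that stores words with their categories, automatic tagging, backoff, and evaluation) can be illustrated with a short sketch. It uses only the Python standard library and is not the book's code or NLTK's API. The helper names `train_unigram_tagger`, `tag`, and `accuracy`, the toy training data, and the simplified tagset (DET, NOUN, VERB, ADJ, ADV) are hypothetical choices made for the example.

```python
from collections import Counter, defaultdict


def train_unigram_tagger(tagged_words):
    """Map each word to the tag it carries most often in the training data."""
    tag_counts = defaultdict(Counter)
    for word, word_tag in tagged_words:
        tag_counts[word.lower()][word_tag] += 1
    return {word: counts.most_common(1)[0][0] for word, counts in tag_counts.items()}


def tag(tokens, model, default_tag="NOUN"):
    """Tag tokens by dictionary lookup, backing off to default_tag for unseen words."""
    return [(token, model.get(token.lower(), default_tag)) for token in tokens]


def accuracy(predicted, gold):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    correct = sum(1 for (_, p), (_, g) in zip(predicted, gold) if p == g)
    return correct / len(gold)


# Toy training data (made up for illustration): (word, tag) pairs.
training = [
    ("the", "DET"), ("dog", "NOUN"), ("barks", "VERB"),
    ("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB"),
    ("a", "DET"), ("quick", "ADJ"),
]
model = train_unigram_tagger(training)

tokens = ["The", "quick", "cat", "barks", "loudly"]
predicted = tag(tokens, model)

# Hand-written gold tags; "loudly" is unseen, so the backoff tag NOUN is wrong here.
gold = [("The", "DET"), ("quick", "ADJ"), ("cat", "NOUN"),
        ("barks", "VERB"), ("loudly", "ADV")]

print(predicted)
print(accuracy(predicted, gold))
```

A plain dictionary from word to most frequent tag is about the simplest data structure that answers the excerpt's second question. The default-tag backoff is a stand-in for the richer backoff chains used with n-gram taggers, where a tagger that lacks evidence for a context defers to a simpler one.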
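Media Coverage and Reviews
"Rarely does a book tackle such a difficult computing problem with so clear an approach and such clean code... This is an excellent introduction to natural language processing." - Ken Getz, Senior Consultant, MCW Technologies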