Shared Flashcard Set

Details

Title

Patterns in Language Final

Description

Introductory course in computational linguistics

Total Cards

Subject

Computer Science

Level

Undergraduate 3

Created

12/13/2015

Term

Document Classification

Definition

Sort documents into user-defined classes.

Term

Sentiment Analysis

Definition

Automatically identify the positive and negative terms in a document to judge its overall attitude. Useful for political polls and marketing.
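
A minimal sketch of the term-counting idea, assuming tiny hand-picked word lists. The names POSITIVE, NEGATIVE, sentiment_score, and sentiment_label are hypothetical and not from the course; a real system would use a much larger lexicon.

```python
# Hypothetical mini-lexicons, chosen only for illustration.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "awful", "hate", "poor"}


def sentiment_score(text):
    """Return (number of positive terms) minus (number of negative terms)."""
    # Lowercase, split on whitespace, and strip simple punctuation.
    words = [w.strip(".,!?;:") for w in text.lower().split()]
    pos = sum(1 for w in words if w in POSITIVE)
    neg = sum(1 for w in words if w in NEGATIVE)
    return pos - neg


def sentiment_label(text):
    """Map the score to a coarse label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

Counting terms this way ignores negation ("not good") and sarcasm, so it is only a starting point.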

Term

Spam Identification

Definition

Deciding whether a message is spam, for example by calculating how often it contains n-grams that usually occur in spam.
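
A small sketch of n-gram counting, assuming whitespace tokenization. The function names ngrams and ngram_counts are hypothetical.

```python
from collections import Counter


def ngrams(text, n):
    """Return the list of word n-grams (as tuples) in the text."""
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def ngram_counts(messages, n=2):
    """Count how often each n-gram occurs across a list of messages."""
    counts = Counter()
    for message in messages:
        counts.update(ngrams(message, n))
    return counts
```

Running ngram_counts over a collection of known spam gives the n-grams that are most typical of spam.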

Term

Rule based spam identification

Definition

Filters spam using hand-written rules that assign weights to certain n-grams. Once a message's total weight passes some threshold, it is identified as spam.
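
A minimal sketch of weights plus a threshold. The phrases, weights, and threshold value below are made up for illustration, and RULE_WEIGHTS, THRESHOLD, spam_score, and is_spam are hypothetical names.

```python
# Hypothetical rules: each phrase adds its weight to the message's score.
RULE_WEIGHTS = {
    "free money": 3.0,
    "act now": 2.0,
    "click here": 1.5,
}
THRESHOLD = 4.0  # Assumed cut-off; a real filter would tune this.


def spam_score(message):
    """Sum the weights of every rule phrase found in the message."""
    text = message.lower()
    return sum(w for phrase, w in RULE_WEIGHTS.items() if phrase in text)


def is_spam(message):
    """Flag the message once its score reaches the threshold."""
    return spam_score(message) >= THRESHOLD
```

A single weak match such as "click here" stays under the threshold, while several matches together push a message over it.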

Term

Statistical approach spam identification

Definition

Statistical filters learn from a large set of examples: one spam set and one ham (legitimate mail) set. They can adapt based on what emails are marked as spam by all users or by specific users.

Term

Rule based identification drawbacks

Definition

They are, by nature, one step behind spammers because a pattern has to be identified first and by that time, the spam is already out.

Term

Supervised learning

Definition

Learning from a training set and evaluating on a test set, both labeled in advance with the correct answers.

Term

Supervised learning method

Definition

1. Label a corpus of articles with the desired categories to make training and test sets.
2. Apply machine learning software to the labeled training set to produce a model that summarizes what has been learned.
3. Use the model to generate predictions for the test set and compare them with the correct labels.
4. Deploy the model on new, unlabeled data.
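
The four steps above can be sketched end to end with a toy corpus. Everything here is an illustrative assumption: the six sentences, the STOPWORDS set, and the simple word-voting learner (train_model, predict) stand in for real data and real machine learning software.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a"}  # Hypothetical list of words to ignore.

# Step 1: a tiny hand-labeled corpus, split into training and test sets.
labeled = [
    ("the team won the match", "sports"),
    ("the striker scored a goal", "sports"),
    ("the senate passed the bill", "politics"),
    ("voters elected a new mayor", "politics"),
    ("the coach praised the team", "sports"),
    ("the mayor signed the bill", "politics"),
]
train_set, test_set = labeled[:4], labeled[4:]


def tokens(text):
    return [w for w in text.lower().split() if w not in STOPWORDS]


# Step 2: "train" a model that records which labels each word occurred with.
def train_model(examples):
    model = defaultdict(Counter)
    for text, label in examples:
        for word in tokens(text):
            model[word][label] += 1
    return model


# Step 3: predict by letting each known word vote for its labels.
def predict(model, text):
    votes = Counter()
    for word in tokens(text):
        votes.update(model.get(word, {}))
    return votes.most_common(1)[0][0] if votes else None


model = train_model(train_set)
accuracy = sum(predict(model, t) == y for t, y in test_set) / len(test_set)

# Step 4: deploy on new, unlabeled text.
new_prediction = predict(model, "the goal won the match")
```

The test set is held out so that accuracy reflects how the model does on articles it has not seen during training.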

Term

Unsupervised learning

Definition

There are no pre-assumed categories. Instead, the system clusters articles that have similar properties, like being about sports. It's less costly because you don't have to sit someone down to label every single document, but the clusters may not be intuitive and clustering solutions are difficult to evaluate.

Term

Feature-engineering

Definition

Deciding which properties of a document (for example, of a spam message) are most relevant and should be given to the classifier.

Term

Kitchen sink feature engineering

Definition

Use many features in the hope that some will be relevant and useful. Make every word a feature and choose a machine learning method that is good at focusing on a few important features and ignoring irrelevant ones.

Term

Hand-crafted strategy of feature engineering

Definition

Carefully and thoughtfully identify a small set of features that are likely to be relevant. The downside is that you have to choose the features yourself.

Term

Naive Bayes for document classification

Definition

Take a word. Count how often that word occurs in spam and how often in ham, and calculate that ratio. Then calculate the odds ratio (ham/total over spam/total). Combine the
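
A rough sketch of the per-word ratio from this card. The card's text is cut off before the combining step, so the way ratios are combined here (summing log ratios over the words of a message, as Naive Bayes classifiers commonly do) is an assumption. Add-one smoothing and equal priors for spam and ham are also assumptions, and all function names are hypothetical.

```python
import math
from collections import Counter


def count_words(docs):
    """Count word occurrences across a list of documents."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.lower().split())
    return counts


def train(spam_docs, ham_docs):
    """Return word counts for spam, word counts for ham, and the vocabulary."""
    spam, ham = count_words(spam_docs), count_words(ham_docs)
    return spam, ham, set(spam) | set(ham)


def log_odds_ham(word, spam, ham, vocab):
    """log((ham count / ham total) / (spam count / spam total)), smoothed."""
    # Add-one smoothing (an assumption) avoids dividing by zero.
    p_ham = (ham[word] + 1) / (sum(ham.values()) + len(vocab))
    p_spam = (spam[word] + 1) / (sum(spam.values()) + len(vocab))
    return math.log(p_ham / p_spam)


def classify(text, spam, ham, vocab):
    """Sum the log ratios of known words; a positive total favors ham."""
    total = sum(log_odds_ham(w, spam, ham, vocab)
                for w in text.lower().split() if w in vocab)
    return "ham" if total >= 0 else "spam"
```

Summing logs is equivalent to multiplying the ratios, and it avoids very small numbers on long messages.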

Term

Bag of words assumption

Definition

Pretend a document is an unstructured collection of words, ignoring syntax and topic structure. Put all the words of a document in a bag, draw a word, and calculate which document it is most likely to have come from.
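
A short sketch of the bag idea, assuming whitespace tokenization. The names bag_of_words and most_likely_source are hypothetical, and "most likely" is taken here to mean the bag in which the word has the highest relative frequency.

```python
from collections import Counter


def bag_of_words(text):
    """Represent a document as word counts; word order is thrown away."""
    return Counter(text.lower().split())


def most_likely_source(word, bags):
    """Return the name of the bag where `word` has the highest relative frequency."""
    def rel_freq(bag):
        total = sum(bag.values())
        return bag[word] / total if total else 0.0
    return max(bags, key=lambda name: rel_freq(bags[name]))
```

Because order is discarded, "dog bites man" and "man bites dog" produce the same bag.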

Term

Perceptron

Definition

Error-driven learning. It predicts outcomes and then adjusts the weights when it makes the wrong prediction. Initially the weights are uninformative, but over time it builds up an ability to associate features with outcomes. It's a network with two layers: one node for each possible input feature and one node for each possible outcome (spam and ham).
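
A minimal sketch of the error-driven update, assuming each word is one input feature and each outcome keeps its own weight per feature. The update size of 1.0, the number of epochs, and the names LABELS, features, predict, and train are illustrative assumptions.

```python
from collections import defaultdict

LABELS = ("ham", "spam")  # Ties go to the first label, "ham".


def features(text):
    """One binary feature per distinct word."""
    return set(text.lower().split())


def predict(weights, feats):
    """Pick the outcome whose weights sum highest over the active features."""
    scores = {label: sum(weights[label][f] for f in feats) for label in LABELS}
    return max(LABELS, key=lambda label: scores[label])


def train(examples, epochs=5):
    """Start from all-zero weights and change them only after a mistake."""
    weights = {label: defaultdict(float) for label in LABELS}
    for _ in range(epochs):
        for text, gold in examples:
            feats = features(text)
            guess = predict(weights, feats)
            if guess != gold:
                for f in feats:
                    weights[gold][f] += 1.0   # Reward the correct outcome.
                    weights[guess][f] -= 1.0  # Penalize the wrong guess.
    return weights
```

When the prediction is already right, nothing changes, which is what makes the learning error-driven.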

Term

Past tense debate

Definition

How do people learn regular and irregular forms of words?

Term

U-shaped curve

Definition

Start with good performance on some task, then get substantially worse, and then gradually get better again.

Term

Wug test

Definition

A test in which kids are given a made-up noun, "wug," to see whether they can produce its plural form.

Term

Gricean Maxims

Definition

Quantity: Give as much information as is needed, and no more. Not TMI.
Quality: Say only what you believe to be true. Don't lie or be sarcastic.
Relation: Say things that are pertinent to the question.
Manner: Be clear, brief, and orderly.

Term

SHRDLU

Definition

A program that controlled a simulated robot arm and was an expert at moving shapes around in response to typed English commands. Like, REALLY good. It showed that AI can be successful, but only in a very controlled setting and within a specific domain.

Term

The Chinese room

Definition

A man who knows no Chinese sits in a room with a rule book written in English. The input is in Chinese; he follows the rule book to manipulate the symbols and outputs replies in perfect Chinese. Does he know Chinese? Does the room know Chinese?

Term

Eliza

Definition

A program that played the role of a therapist and wasn't very good at her job.

Term

Semantics

Definition

The logical aspects of language and its meaning

Term

Pragmatics

Definition

How context contributes to meaning
