Skip to main content

22.02.22 - Machine Learning_ Techniques

Types of Tasks

Supervised Learning - Function from labelled training data, each example consisting of an input and a desired output (Classifications, regression)

Unsupervised Learning - Function to describe hidden structure from unlabelled data. There is basically no output

Classification - Learn to predict to which set a instance belongs to based on pre-labeled (classified) instances Supervised learning, many approaches: regression, decision trees, neural networks

Supervised Learning

Regression

Estimated relationship between variable Y and variable(s) X Function: Based on the given data to minimise its mean squared error (MSE) to 'fit' the data

Decision Tree

Internal Nodes: decision rules on features (decision variables, input) Leaf Nodes: a predicted class label(output) |Pros|Cons| |----|----| |Reasonable training time| Only simple decision boundaries| |Handle large features|Problem with lots of missing data| |Easy to implement|Cannot handle complicated relationship| |Easy to interpret|

Neural Networks

Set of neurons connected by directed weighted edges |Pros|Cons| |----|----| |Cam learn more complicated class boundaries|Hard to implement| |Can be more accurate| Slow training time| |Can handle large number of features|Can over fit the data| | |Hard to interpret|

Supervised Learning: Applications

Clustering

  • Given: Un-labeled data set D and Similarity/distance metric
  • Goal: Find 'natural' partitioning, or groups of similar data points
  • Application: Divide a market into distinct subsets of customers

Association Rules

Discover correlation between any two or more variables

  • Given: Set of records containing items
  • Goal: Produce dependency rules, to predict occurrence of one variable based on occurrences of another variable
  • Application: Market basket analysis

Correlation \ne Causation