Showing posts with label algorithms.

2/22/15

Frequent ItemSets : Apriori Algorithm, Support and Confidence Part II

The Part I tutorial covered the Apriori algorithm and briefly touched on association rules. Today, we will look at association rules, confidence and support.

Association Rule

In our previous post we defined learning association rules as finding those items which were bought together most often, i.e. single items, pairs of items, triples, etc.

In technical terms, association rules are if-then rules about the contents of a basket. An example is below:

The rule {i1, i2, i3, ..., iN} -> j means: "if a basket contains all of i1, ..., iN, then it is likely to also contain the item j".

Confidence

The confidence of the association rule is the probability of j given i1, ..., iN. In simple terms, it is the ratio of the support of I ∪ {j} to the support of I, where the support of I is the number of baskets/transactions containing the itemset I.

Example

Our Transactions/Baskets
Now suppose we want to check the association rule {2, 4} -> 5.

The confidence is the ratio of the support of {2, 4} ∪ {5} to the support of {2, 4}. Therefore,

Confidence = 3 / 3 = 1

We can say that {2, 4} -> {5} has a confidence of 1. But we also want to know how interesting the rule is. For this, we have a new measure called interest.

The interest of an association rule is the difference between its confidence and the fraction of baskets which contain the item j.

I({2, 4} -> 5) = conf({2, 4} -> 5) - Fr(5)
               = 1 - (3/4)
               = 1 - 0.75
               = 0.25

Therefore, the interest is just 0.25 (25%), so it is not a very interesting rule.

Interesting rules are those with a high positive or negative interest value, since a high positive or negative value means the presence of I strongly encourages or discourages the presence of j.
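
To make this concrete, here is a minimal Java sketch that computes support, confidence and interest. The basket contents below are illustrative, chosen only so that they reproduce the numbers in this example; the class and method names are mine.

import java.util.*;

public class AssociationRuleDemo {
    // Support of an itemset: number of baskets containing all of its items.
    static long support(List<Set<Integer>> baskets, Set<Integer> itemset) {
        return baskets.stream().filter(b -> b.containsAll(itemset)).count();
    }

    public static void main(String[] args) {
        // Illustrative baskets: three of the four contain {2, 4, 5}.
        List<Set<Integer>> baskets = Arrays.asList(
                new HashSet<>(Arrays.asList(1, 2, 4, 5)),
                new HashSet<>(Arrays.asList(2, 4, 5, 6)),
                new HashSet<>(Arrays.asList(2, 3, 4, 5)),
                new HashSet<>(Arrays.asList(1, 4, 6)));

        Set<Integer> I = new HashSet<>(Arrays.asList(2, 4));        // antecedent I
        Set<Integer> IandJ = new HashSet<>(Arrays.asList(2, 4, 5)); // I ∪ {j}

        double confidence = (double) support(baskets, IandJ) / support(baskets, I);
        double frJ = (double) support(baskets, Collections.singleton(5)) / baskets.size();
        double interest = confidence - frJ;

        System.out.println("conf({2,4} -> 5) = " + confidence); // 1.0
        System.out.println("interest = " + interest);           // 0.25
    }
}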

Frequent ItemSets : Apriori Algorithm and Example Part I

This is the start of our new tutorial topic, "Data Mining". The Apriori algorithm is one of the classic algorithms used in data mining to find association rules. On an initial reading, Apriori might look complex, but it's not. Let me give an example and try to explain it:

Suppose we have transactions of a shopping centre as below:


Learning association rules means finding those items which were bought together most often, i.e. single items, pairs of items, triples, etc.

So, as I mentioned earlier, Apriori is a classic and the most basic algorithm when it comes to finding association rules. A lot of resources are available on the internet, but here I will try to make it intuitive and easy.

Algorithm:

- A two-pass algorithm which limits the need for main memory.
- One of the key ideas behind Apriori is monotonicity: if a set of items I appears at least s times, so does every subset J of I.

Pass 1: Read the baskets and count in main memory the occurrence/frequency of each item.

After Pass 1 is completed, check the count of each item. If the count of an item is greater than or equal to s, i.e. Count(i) >= s, then the item i is frequent. Save these frequent items for the next pass.

Pass 2: Read the baskets again and count in main memory the occurrence/frequency of pairs of items formed from the frequent items (which we got from Pass 1).

After Pass 2 ends, check the count of each pair of items; if it is greater than or equal to s, the pair is considered frequent, i.e. Count(i, j) >= s.
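
Below is a minimal Java sketch of the two passes described above, assuming baskets are represented as sets of integer item IDs; the class and method names are mine, not from any library.

import java.util.*;

public class AprioriTwoPass {

    /** Pass 1: count single items, keep those with count >= s. */
    static Set<Integer> frequentItems(List<Set<Integer>> baskets, int s) {
        Map<Integer, Integer> counts = new HashMap<>();
        for (Set<Integer> basket : baskets)
            for (int item : basket)
                counts.merge(item, 1, Integer::sum);

        Set<Integer> frequent = new HashSet<>();
        counts.forEach((item, c) -> { if (c >= s) frequent.add(item); });
        return frequent;
    }

    /** Pass 2: count pairs of frequent items, keep those with count >= s. */
    static Map<List<Integer>, Integer> frequentPairs(List<Set<Integer>> baskets,
                                                     Set<Integer> frequent, int s) {
        Map<List<Integer>, Integer> pairCounts = new HashMap<>();
        for (Set<Integer> basket : baskets) {
            List<Integer> items = new ArrayList<>(basket);
            items.retainAll(frequent);          // only frequent items can form frequent pairs
            Collections.sort(items);
            for (int i = 0; i < items.size(); i++)
                for (int j = i + 1; j < items.size(); j++)
                    pairCounts.merge(Arrays.asList(items.get(i), items.get(j)), 1, Integer::sum);
        }
        pairCounts.values().removeIf(c -> c < s);
        return pairCounts;
    }

    public static void main(String[] args) {
        // Illustrative baskets (not the table from the post).
        List<Set<Integer>> baskets = Arrays.asList(
                new HashSet<>(Arrays.asList(1, 2, 4)),
                new HashSet<>(Arrays.asList(2, 4, 5)),
                new HashSet<>(Arrays.asList(2, 4, 5)),
                new HashSet<>(Arrays.asList(1, 4, 5)));
        Set<Integer> freq = frequentItems(baskets, 3);       // frequent singletons: 2, 4, 5
        System.out.println(frequentPairs(baskets, freq, 3)); // frequent pairs: {2,4} and {4,5}, each 3 times
    }
}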



Example: 

We will assume a few things:

- Our support threshold s is 3.

Our Transaction Table: 


Step 1: Count the occurrence of each item.



Step 2: Remember, the algorithm says an item is considered frequent if it is bought at least as many times as the support threshold, i.e. Count(i) >= 3. Therefore, below is the list of frequent singletons.



Step 3: We start making pairs out of the frequent itemsets we got in the above step.


Step 4: After generating the candidate pairs, we count the occurrence of each of these pairs in the transaction set.


Step 5: Now again, follow the golden rule and discard the non-frequent pairs.



Now we have a table of frequent item pairs. Suppose we want to find frequent triples. We take the above table and form all the possible combinations.

Step 6: Make combinations of triples using the frequent Item pairs.

To make triples, the rule is: if {1, 2} and {1, 3} are frequent, then the candidate triple is {1, 2, 3}. Similarly, if {2, 4} and {2, 6} are frequent, then the candidate triple is {2, 4, 6}.

So, using the above logic and our Frequent ItemPairs table, we get the below triples:


Step 7: Get the count of the above triples (Candidates).


After this, if we can form candidate quadruples, we count their occurrence/frequency in the same way.

If we had 123, 124, 134, 135 and 234 and we wanted to generate candidate quadruples, they would be 1234 and 1345. After generating the quadruples, we would again count their occurrence/frequency and repeat the same process until no new frequent itemsets are found.
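
Here is a small Java sketch of this candidate-generation (join) step, assuming itemsets are kept as sorted lists of item IDs; the class, method names and sample data are illustrative.

import java.util.*;

public class CandidateGeneration {

    /** Join sorted frequent k-itemsets that agree on their first k-1 items
        to form (k+1)-item candidates, e.g. {2,4} + {2,5} -> {2,4,5}. */
    static Set<List<Integer>> join(List<List<Integer>> frequentK) {
        Set<List<Integer>> candidates = new HashSet<>();
        for (int a = 0; a < frequentK.size(); a++) {
            for (int b = a + 1; b < frequentK.size(); b++) {
                List<Integer> x = frequentK.get(a), y = frequentK.get(b);
                int k = x.size();
                if (x.subList(0, k - 1).equals(y.subList(0, k - 1))) {
                    List<Integer> candidate = new ArrayList<>(x);
                    candidate.add(y.get(k - 1));
                    Collections.sort(candidate);
                    candidates.add(candidate);
                }
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        // The frequent pairs from the example above.
        List<List<Integer>> pairs = Arrays.asList(
                Arrays.asList(1, 4), Arrays.asList(2, 4), Arrays.asList(2, 5),
                Arrays.asList(4, 5), Arrays.asList(4, 6));
        // Candidates: {2, 4, 5} and {4, 5, 6}; the latter is discarded later
        // because its subset {5, 6} is not frequent.
        System.out.println(join(pairs));
    }
}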

Thus, the frequent ItemSets are:

- Frequent Itemsets of Size 1: 1, 2, 4, 5, 6
- Frequent Itemsets of Size 2: 14, 24, 25, 45, 46
- Frequent Itemsets of Size 3: 245

To learn how to measure how good an association rule is, i.e. how to calculate confidence and an explanation of support, please click here for Part II of this tutorial.



2/10/13

QuickSort Algorithm Tutorial

We have already done a tutorial on Merge Sort and a tutorial on Heap Sort (array based), both having a time complexity of O(n log n). Here is another algorithm with an average-case time complexity of O(n log n): QuickSort.


QuickSort, as we all know, takes a similar approach to Merge Sort, i.e. it is a recursive divide-and-conquer sorting algorithm. The difference is that it is an in-place sorting algorithm.


Basically, an in-place algorithm is one that transforms the input using only a small, constant amount of extra storage space.
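
Below is a minimal in-place QuickSort sketch in Java using the Lomuto partition scheme; the pivot choice and names are mine, and the full tutorial may use a different variant.

import java.util.Arrays;

public class QuickSort {

    // Sorts arr[lo..hi] in place, using the last element as the pivot.
    static void quickSort(int[] arr, int lo, int hi) {
        if (lo >= hi) return;
        int p = partition(arr, lo, hi);
        quickSort(arr, lo, p - 1);   // sort elements smaller than the pivot
        quickSort(arr, p + 1, hi);   // sort elements larger than the pivot
    }

    static int partition(int[] arr, int lo, int hi) {
        int pivot = arr[hi];
        int i = lo;                  // boundary of the "less than pivot" region
        for (int j = lo; j < hi; j++) {
            if (arr[j] < pivot) {
                int tmp = arr[i]; arr[i] = arr[j]; arr[j] = tmp;
                i++;
            }
        }
        int tmp = arr[i]; arr[i] = arr[hi]; arr[hi] = tmp;  // place pivot in its final position
        return i;
    }

    public static void main(String[] args) {
        int[] values = {9, 4, 7, 1, 3, 8};
        quickSort(values, 0, values.length - 1);
        System.out.println(Arrays.toString(values)); // [1, 3, 4, 7, 8, 9]
    }
}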

Binary Search Tree (BST) Algorithm Tutorial

Earlier we had a tutorial on Binary Search Tree Basics, which you can check to refresh your knowledge of it. Today we will take a look at the BST algorithm and implement it using Java.

A Binary Search Tree is a node-based binary tree data structure with the following properties:

* The left subtree of a node contains only nodes with keys less than the node's key.
* The right subtree of a node contains only nodes with keys greater than the node's key.
* Both the left and right subtrees must also be binary search trees.
* There should not be any duplicate nodes.

We have implemented the following Binary Search Tree operations:

* Searching
* Insert Node
* MinValue
* MaxValue

We will look at each operation along with the corresponding Java code.
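
As a quick reference, here is a minimal Java sketch of the four operations listed above; this is my own condensed version, not necessarily the code from the full post.

public class BinarySearchTree {

    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    Node root;

    // Insert a key, keeping the BST property; duplicates are ignored.
    Node insert(Node node, int key) {
        if (node == null) return new Node(key);
        if (key < node.key) node.left = insert(node.left, key);
        else if (key > node.key) node.right = insert(node.right, key);
        return node;
    }

    // Search: go left for smaller keys, right for larger ones.
    boolean search(Node node, int key) {
        if (node == null) return false;
        if (key == node.key) return true;
        return key < node.key ? search(node.left, key) : search(node.right, key);
    }

    // The minimum key is in the left-most node; the maximum is in the right-most node.
    int minValue(Node node) {
        while (node.left != null) node = node.left;
        return node.key;
    }

    int maxValue(Node node) {
        while (node.right != null) node = node.right;
        return node.key;
    }

    public static void main(String[] args) {
        BinarySearchTree tree = new BinarySearchTree();
        for (int k : new int[]{8, 3, 10, 1, 6, 14}) tree.root = tree.insert(tree.root, k);
        System.out.println(tree.search(tree.root, 6));   // true
        System.out.println(tree.minValue(tree.root));    // 1
        System.out.println(tree.maxValue(tree.root));    // 14
    }
}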


5/5/12

Algorithm Tutorials - Part 1 : Ad-Hoc Problems

I will start with the very basics of algorithms.

Before moving on to the different algorithms, let's first see what an algorithm actually means.



Definition


A process or set of rules to be followed in calculations or other problem-solving operations, esp. by a computer.  


Aho, Hopcroft and Ullman stated in their book on algorithms, "The Design and Analysis of Computer Algorithms", that "Perhaps the most important principle for the good algorithm designer is to refuse to be content."



1/27/12

Minimax Algorithm Tutorial

There are plenty of applications of AI, but games are among the most prominent today, and nowadays every OS comes with two-player games like chess, checkers, etc. There are algorithms that help in designing these games. These algorithms not only help in making the games, they also help in making the game-play tough for the player (i.e. the user). The Minimax algorithm is one such algorithm.
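
As a rough illustration of the idea, here is a minimal Java sketch of minimax on a small, complete game tree with precomputed leaf scores; the tree shape, scores and names are purely illustrative.

public class Minimax {

    // Scores a two-player zero-sum game tree: the maximizing player picks the child
    // with the highest value, the minimizing player the one with the lowest.
    static int minimax(int[] leafScores, int nodeIndex, int depth, boolean maximizing) {
        if (depth == 0) return leafScores[nodeIndex];   // leaf: static evaluation
        int left  = minimax(leafScores, nodeIndex * 2,     depth - 1, !maximizing);
        int right = minimax(leafScores, nodeIndex * 2 + 1, depth - 1, !maximizing);
        return maximizing ? Math.max(left, right) : Math.min(left, right);
    }

    public static void main(String[] args) {
        // A complete binary game tree of depth 3 with 8 leaf evaluations.
        int[] leaves = {3, 5, 2, 9, 12, 5, 23, 23};
        System.out.println(minimax(leaves, 0, 3, true)); // 12
    }
}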



1/16/12

Counting Sort Algorithm with Example

After a long interval of time I am writing a post on ALGORITHMS. After writing posts on Heap Sort, Merge Sort and Insertion Sort, I decided to write on Counting Sort.

Counting sort is an algorithm for sorting a collection of objects according to keys that are small integers; that is, it is an integer sorting algorithm. It is a linear time sorting algorithm used to sort items when they belong to a fixed and finite set. 
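
A minimal Java sketch of the idea, assuming non-negative integer keys with a known maximum value (the class and method names are mine):

import java.util.Arrays;

public class CountingSort {

    // Sorts an array of small non-negative integer keys in O(n + k) time,
    // where k is the largest key value.
    static int[] countingSort(int[] input, int maxKey) {
        int[] counts = new int[maxKey + 1];
        for (int value : input) counts[value]++;          // count each key

        int[] sorted = new int[input.length];
        int pos = 0;
        for (int key = 0; key <= maxKey; key++)           // emit keys in increasing order
            for (int c = 0; c < counts[key]; c++)
                sorted[pos++] = key;
        return sorted;
    }

    public static void main(String[] args) {
        int[] data = {4, 2, 2, 8, 3, 3, 1};
        System.out.println(Arrays.toString(countingSort(data, 8))); // [1, 2, 2, 3, 3, 4, 8]
    }
}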

9/24/11

HeapSort (array based) implementation in Java

There are two types of heaps: the max heap and the min heap. A heap (max or min) is a special type of binary tree. In a max heap, the root of every subtree is greater than or equal to its children; in a min heap, the root of every subtree is less than or equal to its children. We can sort array values using the heap sort algorithm, in which the heap is repeatedly rebuilt after the root is removed.
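
Here is a minimal array-based heap sort sketch in Java; it is a condensed illustration of the idea, not necessarily the exact code from the full post.

import java.util.Arrays;

public class HeapSort {

    static void heapSort(int[] arr) {
        int n = arr.length;
        // Build a max heap: every parent is >= its children.
        for (int i = n / 2 - 1; i >= 0; i--) siftDown(arr, i, n);
        // Repeatedly move the max (root) to the end and rebuild the heap on the rest.
        for (int end = n - 1; end > 0; end--) {
            int tmp = arr[0]; arr[0] = arr[end]; arr[end] = tmp;
            siftDown(arr, 0, end);
        }
    }

    // Restore the max-heap property for the subtree rooted at i, within arr[0..size-1].
    static void siftDown(int[] arr, int i, int size) {
        while (true) {
            int largest = i, left = 2 * i + 1, right = 2 * i + 2;
            if (left < size && arr[left] > arr[largest]) largest = left;
            if (right < size && arr[right] > arr[largest]) largest = right;
            if (largest == i) return;
            int tmp = arr[i]; arr[i] = arr[largest]; arr[largest] = tmp;
            i = largest;
        }
    }

    public static void main(String[] args) {
        int[] data = {12, 11, 13, 5, 6, 7};
        heapSort(data);
        System.out.println(Arrays.toString(data)); // [5, 6, 7, 11, 12, 13]
    }
}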



7/22/11

Insertion sort with program code and tutorial

Insertion sort is divided into two parts:
     1. Straight Insertion Sort
     2. Shell Sort 

Insertion sort is a simple sorting algorithm, a comparison sort in which the sorted array (or list) is built one entry at a time. It is much less efficient on large lists than more advanced algorithms such as Quick Sort, Merge Sort or Heap Sort.
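
A minimal Java sketch of straight insertion sort (my own condensed version, not necessarily the code from the full post):

import java.util.Arrays;

public class InsertionSort {

    // Straight insertion sort: grow a sorted prefix one element at a time,
    // shifting larger elements right to make room for the current value.
    static void insertionSort(int[] arr) {
        for (int i = 1; i < arr.length; i++) {
            int current = arr[i];
            int j = i - 1;
            while (j >= 0 && arr[j] > current) {
                arr[j + 1] = arr[j];
                j--;
            }
            arr[j + 1] = current;
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 2, 9, 1, 5, 6};
        insertionSort(data);
        System.out.println(Arrays.toString(data)); // [1, 2, 5, 5, 6, 9]
    }
}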


7/5/11

Merge Sort using Java with program code

In computer science, merge sort or mergesort is a sorting algorithm for rearranging lists (or any such linear sequential data storage structure) into a specified order. It is a particularly good example of the divide and conquer algorithmic paradigm. It is a comparison sort. Merging is the process of combining two or more sorted files into a third sorted file.


Conceptually, merge sort works as follows:


1. Divide the unsorted list into two sublists of about half the size
2. Sort each of the two sublists
3. Merge the two sorted sublists back into one sorted list
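
The three steps above can be sketched in Java as follows; this is a minimal illustration, and the class and method names are mine.

import java.util.Arrays;

public class MergeSort {

    static int[] mergeSort(int[] arr) {
        if (arr.length <= 1) return arr;                              // base case: already sorted
        int mid = arr.length / 2;
        int[] left = mergeSort(Arrays.copyOfRange(arr, 0, mid));      // step 1: divide
        int[] right = mergeSort(Arrays.copyOfRange(arr, mid, arr.length));
        return merge(left, right);                                    // steps 2-3: sort and merge
    }

    // Merge two sorted arrays into one sorted array.
    static int[] merge(int[] left, int[] right) {
        int[] result = new int[left.length + right.length];
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length)
            result[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        while (i < left.length) result[k++] = left[i++];
        while (j < right.length) result[k++] = right[j++];
        return result;
    }

    public static void main(String[] args) {
        int[] data = {38, 27, 43, 3, 9, 82, 10};
        System.out.println(Arrays.toString(mergeSort(data))); // [3, 9, 10, 27, 38, 43, 82]
    }
}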