Sunday, 30 March 2014

Week 11: Sorting and Efficiency

We focused on sorting algorithms in the past two weeks. When judging the sorting algorithms, we usually look at their efficiency and memory requirement. Efficiency affects the overall performance of a program. To evaluate efficiency, we compare the processing time to produce the desired output for different algorithms, and use Big-O to denote the worst case or the upper bound for the number of steps required for a sorting algorithm. The number of steps required for sorting an input often vary dependent on the size of the input. The exact number does not matter, but how the number varies when the size of the input increases does.

We have learned bubble, insertion, and selection sort in csc108. These three sorting algorithms are relatively more time-consuming. Bubble sort compares adjacent items and keep "bubbling" the larger item to the right (the largest to its final place) and then sort the remaining n-1 items. Insertion sort takes the first unsorted item and inserts it into the proper position in the sorted list. Selection sort selects the smallest item from the unsorted side and swap it to the end of the sorted side. They all have a performance of O(n^2) in the worst scenarios.

One thing to notice is that Big-O notation abstracts away the difference between the similar worst-case performance. Constant coefficients are dropped out, but they actually vary across different algorithms, so many algorithms with the same efficiency do not have the same running time on the same input. For example, insertion sort and selection sort have the same efficiency (Big-O) in their worst cases, but insertion sort runs often faster than selection sort. Big-O is useful for the big picture.

In the lectures, quick sort and merge sort were introduced. In quick sort, we pick a pivot, partition the other items into a "smaller than pivot" sublist and a "larger than pivot" sublist, quick sort the sublists, and then combine them into the sorted list. Processing time for quick sort depends on how the pivots were chosen. It performs well in O(n*log(n))when pivot is randomly chosen that help avoid the possible performance of O(n^2) in the worst case when sorting a sorted list. In merge sort, we divide the list in half, mergesort the two halves, and then merge them in linear time. Merge sort runs in O(n*log(n)) in the best and the worst scenarios.

When evaluating sorting algorithms, we usually need to consider all their best, worst, and average performance. For instance, insertion sort has running time of O(n) when dealing with a sorted list and running time of O(n^2) when sorting a reverse sorted list. Its average performance is O(n^2), However, while quick sort also has a performance of O(n^2) in the worst case, its average running time is O(n*log(n)), so overall quick sort is a more efficient sorting algorithm than insertion sort.

Efficiency of a sorting algorithm also depends on the data structure that algorithm runs on. Time for merge sort to run on a Linked List will be much less than the time needed for merge sort to process on a list object.

Week 10: Assignment 2

This week I was mainly dealing with assignment two, which is on hierarchy of classes and recursion. The difficult point is to split the string into regexes and get the most outward symbol to decide the sort of regex tree we need to do with.

is_regex and build_regex_tree are implemented in very similar way of thinking. So are all_regex_permutations and regex_match. With the help of the gen_perm funcion we discussed in the previous lecture, all_regex_permutaions is not hard to implement. What we are doing with regex_match is also getting all possible combinations of strings and filterring to see if there is a meaningful match. In terms of StarTree, the hint given in class to think of the breakdown of n strings equivalently as two separate strings helped a lot.

One last thing we should be careful with is to reduce redundancies in the code. How we organize the order of our conditions to sort out the type of regex tree to deal with needs consideration.

Sunday, 16 March 2014

Week 9: Binary Search Tree

This week the focus is on Binary Search Tree. We have been introduced this structure last week and it is further explored with emphasis on efficiency. In the lecture, we firstly discussed the different cases of deletion of data from BST rooted at node root. The most complex case is the deletion of the node itself. We need to search for the leaf at the right end of the left subtree or the leaf at the left end of the right subtree to replace the root and link it to the remaining subtrees. This search and replacement is efficient in a binary search tree. Then we developed the topic a bit to the measure of performance of sorting algorithms. When evaluating the complexity, we ignore the difference between the steps not dependent on the list size n.

During the lab, we implemented the count_less method in different classes regarding binary search trees. This actually raised my attention on use of helper functions. I seldom used helper functions in my assignments and exercises. However, they are quite useful when the recursion steps are done on another type of structure.

Monday, 10 March 2014

Week 8: LinkedList

This week's topic is still LinkedList. Before the lecture and during the lab, I thought of LinkedList as an object containing different nodes and each node has a reference to the next node. This created great difficulty in communication with my partner during this week's lab, since he thought of LinkedList as a node that has a reference to the remaining list. The confusion was cleared in the evening's lecture. Dustin introduced to us two ways of thinking about LinkedList:

           1. lists made of item (value) and remaining list (rest)
           2. objects (nodes) with a value and a reference to other similar objects

We went through some stuff on Binary Search Tree as well. We talked about the implementation of methods like insertion, deletion, search an item on both LinkedList and BST. In terms of search of an item in these two structures, it is quite efficient to quick search an item in a BST. However, you have to go through every item in a LinkedList to find an item, which is not inefficient compared to BST.

Sunday, 2 March 2014

Week 7: Recursion

Recursion, by definition, is a function that calls itself.  A recursive function is usually composed of a base case, the recursive step, and the actual work. Recursion simplifies the problem by dividing it into the same but smaller versions of the problem, and you just need to do a tiny bit of work and than repeat the same thing again and again to solve the problem. Thus recursion is the key to many important and complex algorithms.

Monday, 17 February 2014

Week 5 & Week 6: Assignment 1 and Recursion

These two weeks I worked hard on recursion and assignment 1. Though I really had a hard time on writing recursive functions, fortunately my codes work at the end.

The first 4 steps of the assignment are rather simple and we need to be careful when dealing with things related to starter code. (by the way the GUI window is impressively cool!) Step 5 is about using recursive functions to move cheeses between 4 stools within the least steps.

In last week's lab, we first practiced writing recursion, and I found it is actually difficult even though I could read and understand the recursion examples shown in class. I redid the problems and the class examples after the lab and lecture and improved. Through hard try I found a tip: don't trace too far. Do not trace a very complex example. It might frustrate you. I was usually thinking too complicated and tracing to even smaller subproblems when writing the recursive step, confusing myself, and often I lost my ideas on what to do to solve the problem. Finding the base case is not a problem to me, what I should do in this stage is simply following the instruction on how a problem is brought into smaller subproblem by calling the function itself and interpret it using codes. (P.S., If you really want to trace a function step by step, Visualizer is a helpful tool.)

To step 5 of the assignment, it's the same procedure. First moving n-i cheeses to an intermediate stool, then moving the remaining cheeses to the destination, and finally moving the n-i cheeses from the intermediate stool to the destination solves the problem. The example of Tower of Ann Hoy using three stools also gave me a good insight to implement the function.

This week we learned a new concept Tree. It's very like the SearchList we did in the lab before. It is relatively easy to understand after my practice of recursion for assignment one. The lab is not hard this week. We were asked to reimplement functions using for loop instead of list comprehension, like what we did in csc108.

Sunday, 2 February 2014

Week 4: Unit test and first glimpse of recursion

This week we did some exercise of inheritance and exception in the lab. My partner and I were both not familiar with writing unittest cases, especially on how to check whether or not an exception was correctly raised. We checked the documentation and learned to use assertraises, and with the clarification later by the TA we successfully got it done. Checking the slides posted on past course websites can sometimes help as well. During the lecture, we saw several examples on recursion. We actually have seen examples of recursion since the very first lecture so it is becoming much easier to understand now. Dustin gave a very specific one with each execution clearly explained, which is helpful. He also introduced functional programming and functions with lambda to help us understand.

Thursday, 30 January 2014

Week 3: Object-Oriented Programming

Object-oriented programming is a programming paradigm that uses objects to represent data and functionality. OOP can be view as a collection of interactive objects that correspond to complex concepts in the real world which combine data with the sets of methods for accessing and managing data. Object-oriented programming is important for computer scientists because it keeps data safer by making it not accessible by the rest of the program. Abstract datatypes are implemented while details are hidden. Also, new objects with small difference to existing ones can be easily created since OOP makes it easy to maintain and modify existing code.