Skip to main content

11. The Vector Data Type, and Amortised Analysis

10/03/23

Vector ADT

The Vector is an ADT, corresponding to generalising the notion of the Array Key Ideas:

  • The "index" of an entry in an array can be though of as the "number of elements preceding it"

  • E.g. in an array A, then A[2] has two elements, A[0], A[1] that precede it

  • It is then called a "rank"

  • The notion of "rank" can be more general than the idea of index

  • The Vector ADT is based on the array CDT, and stores a sequence of arbitrary objects

  • An element can be accessed, inserted or removed by specifying its rank (number of elements preceding it)

  • An exception is thrown if an incorrect rank is specified

Vector as a Stack

  • Common usage of a Vector is as a Stack
    • Add and remove elements at the end
  • This is very useful if reading elements from a file, and not knowing how many there will be

Applications

  • There is not an automatic limit on the storage size

  • Direction applications - sorted collection of objects

  • Indirect applications - Auxiliary data structure for many algorithms. Components of other data structures

  • Array-based Vector - Operation elemAtRank(r) is implemented in O(1)O(1) time by returning V[r]V[r]

  • Insertion - Need to make room for the new element by shifting forward the nrn-r elements. In the worst case (r=0)(r=0), this takes O(n)O(n) time

  • Deletion - Need to fill the hole left by removed element by shifting backward the nr1n-r-1 elements. In the worst case (r=0)(r=0) this takes O(n)O(n) time

Performance

  • Space used by the data structure is O(n)O(n)
  • size, isEmpty, elemAtRank and replaceAtRank run in O(1)O(1)
  • insertAtRank and removeAtRank run in O(n)O(n) time
  • push runs in O(1)O(1) time, as do not need to move elements
  • pop runs in O(1)O(1) time

Growable Array-based Vector

  • When the array is full, instead of throwing an exception, can replace the array with a larger one. Resizing has a high cost of O(n)O(n), as needs to copy at all nn elements
  • Compare the incremental strategy and the doubling strategy by analysing the total time T(n)T(n) needed to perform a series of nn push operations
  • Call amortized time of a push operation the average time taken by a push over the series of operations T(n)/nT(n)/n

Amortize

Meaning - Refers to writing off, or paying off, debts over a period of time. Similar to the way a mortgage for a house is paid back over many years, as opposed to need to pay all in one go.

Remarks on Amortised Analysis

  • Suppose individual operation takes time TT in the worst-case
  • Suppose do a sequence of operations:
    • Suppose s such operations take total time TsT_s
    • Then sTsT is an upper-bound to the total time TsT_s
    • But, such an upper-bound might not ever occur
  • The time TsT_s might well be o(sT)o(sT) even in the worst-case
    • The average time per operation Ts/sT_s/s can be the most relevant quantity in practice

Amortised - (long) real sequence of dependent operations Average - Set of (possibly independent) operations

Describing amortised costs

  • Usual big-Oh family is still used to described amortised analysis
  • Have different measures of the runtime cost:
    • Worse case - Cost per operation of a sequence not just worst case of a single operation

Incremental Strategy Analysis

  • Replace the array k=n/ck=n/c times
  • Each "replace" costs the current size
  • The total time T(n)T(n) of a series of nn push operations is proportional to
  • n+c+2c+3c+...+kc=n+c(1+2+3+...+k)=n+ck(k+1)/2n+c+2c+3c+...+kc=n+c(1+2+3+...+k)=n+ck(k+1)/2
  • Since cc is a constant, T(n)T(n) is O(n+k2)O(n+k^2), i.e. O(n2)O(n^2)
  • The amortized time of a push operation is O(n)O(n)
  • This is bad as the normal cost of a push is O(1)O(1)

Doubling Strategy Analysis

  • Assume n is a power of 2. Replace the array k=log2nk=\log_2n times
  • The total time T(n)T(n) of a series of nn push operations is proportional to
  • n+1+2+3+4+8+...+2k1n+1+2+3+4+8+...+2^{k-1}

Sometimes have to use geometric sequences. Have to set T(n)T(n) is O(n)O(n) and end up doing induction.