July 24, 2016

SDET Prepwork: What is an SDET? Intro to the Efficiency of Sorting Algorithms: Insertion sort vs Merge sort

This blog post is part of a series as I research how to move from Automation Development to being a Software Developer in Test. These next few posts will deal with algorithms.

What is the Difference between Automation Developers and Software Developers in Test?


Automation Developers Use Code to Create Tests


Have you ever had to perform browser testing on your web application just before a release to production, taking a week or so to do regression testing, making sure that after the new functionality was added that all of the old stuff still worked? And you have ended up staring at the same twenty pages in a variety of Internet Explorer browsers, Firefox, and Chrome, and attempted to make sure the elements on the page are there? That you can still log in? That you can still perform the same actions in all of the browsers? Don't you wish that someone could just write a computer program to do this testing for you?

Create Browser User Interface Tests: When it comes to browser testing, automation developers solve this problem, by pairing Selenium WebDriver with a computer language such as Java to write clean, readable test code that uses basic object-oriented design principles such as:
  • Factoring out any commonalities in your tests into separate methods so it is easier to maintain.  
  • Grouping the web elements and methods that interact with the web elements on a page into Page Objects
  • Creating wrappers for Selenium objects to provide better diagnostics when something goes wrong... Or using the Page Factory to lazy load the web elements initializing them just before you need them.
  • How to use RemoteWebDriver and implement different browsers into your testing with Selenium Grid, Browser Stack or Sauce Labs.
Write Performance Tests: An Automation Developer deals with Stress Testing, Performance Testing and Load Testing using tools such as Apache JMeter.

Script Database Tests: Want to make sure that the data entered in the browser user interface is stored correctly in the database? Use libraries in your favorite programming language that handle opening connections to the database, and SQL scripts that query that the data was placed in the correct table.

As an Automation Developer, you might be working with Arrays, Lists, and ArrayList to assemble and store data, for loops and foreach loops, but their focus is on creating the tests.

Software Developers in Test Use Code to Write Test Frameworks



Software developers in Test take what the Automation Developer does one step further.

From the Microsoft Developer article, What is an SDET? 

"The SDET role can be compared with that of the SDE (Software Development Engineer) role. The latter is what most people would think of when they say 'Developer'. An SDE (depending on level and seniority) will Architect, Design, and Implement software systems that will be used by the target end users. SDEs are responsible for the quality of their software and will implement testing (often unit tests), as well as seek reviews for both their designs and their code. SDEs will often have influence on the software processes employed, and be responsible for following best practices in software development.

"The SDET is also a developer. The SDET must have knowledge of what represents good software design. The SDET must be able to create high quality, maintainable [...] code. The code generally created by the SDET however are for automated test cases and the frameworks to execute and report them. An SDET’s knowledge of software design is often focused on testability, robustness, and performance, and they usually play a contributory or reviewer role in the creation of designs for production software. An SDETs skill set will often include more experience in software processes and how to test software. Testing software is generally  done so as to assess and mitigate risk, and SDETs need to be expert in this. So an SDET can be summarized as having development skills and software quality skills. While SDEs should have this also, SDEs balance more towards the former while SDETs balance more towards the latter".


From talking to other Software Developers -- both in Test and in general -- I've gathered that if you want to make the switch, you really have to start thinking more about performance.

They need to focus on:
  • Figuring out how to store data more efficiently in data structures such as stacks, queues, linked lists, hash maps, heaps, and binary search trees.
  • Learning what a linear sort is and why it is inefficient, and figuring out different ways to search through data studying known search algorithms and sorting algorithms such as binary search, quicksort or mergesort. 
  • Understanding where you want to focus on: If memory is at a premium, you want your solution to use the minimum amount of space possible. If time the major factor, you want to make your solution as quick as possible. Your solution should take into consideration these space complexities or time complexities
  • Comparing and contrasting how fast your solution is using Big-O notation, and figuring out if your solution you are using will scale.


Worrying About Software Performance? Study Data Structures and Algorithms


To help me along, I decided to dive off the deep end this weekend with Thomas M. Cormen's Introduction to Algorithms, Third Edition (2009), and to help stick what I am reading into my mind, I decided to blog about it. I'll try to paraphrase the math the best I can, though it has been decades since my courses in Discrete Mathematics.

This isn't the first time I have encountered search algorithms: I was a Computer Science major in the early 1990s, and finished my Masters of Software Engineering in 2008.

... I'm still a bit of a greenhorn, though, when it comes to actually being Software Engineering... up until a few years ago, my work was in Software Testing and Quality Assurance. If you happen to actually be a software engineer, and I say something wacky, or completely untrue out of ignorance, please feel free to correct me in the comments section on the bottom of this page.

data structure is a way to store and organize data so it easier accessing, modifying and using it. I think of a search or sorting algorithm like a tried-and-true recipe others can use to figure out how to locate and organize data. Once you understand the recipes others came up with, then you can come up with little tweaks here and there, building your own algorithms.

Measuring Efficiency of Algorithms


Introduction to Algorithms, Third Edition is not an easy read. For instance, allow me to take an introductory problem as shown on Chapter 1: The Role of Algorithms in Computing, pages 12 and 13, in a paragraph called "Efficiency" and attempt to walk you through solving this problem:
  • Imagine two computer: a fast computer running a very slow sorting algorithm such as insertion sort, and a slow computer running a fast sorting algorithm such as mergesort. How long would it take to sort 10 million numbers? What difference would the better algorithm make? 

First, let's see how long these algorithms takes to sort an unknown amount of numbers, we'll call "n".

The Inefficiency of Insertion Sort

The slow sorting algorithm insertion sort, the book mentions, to sort n numbers, it takes n2 (or n * n ) long to sort through numbers. To sort two numbers it is two * two = four. It uses exponents, raising each number to the second power:
  • If n = 2, sorting two numbers, the time it takes is 22: 4
  • If n = 3, sorting three numbers, the time it takes is 32: 9
  • If n = 10, 102: 100
  • If n = 20, 202: 400

The Efficiency of Merge Sort


The faster sorting algorithm merge sort, the book mentions, to sort n numbers, it takes n log n to search.

Logarithms are sort of like the inverse of exponents.
  • If n = 8, the log n is 3, since 23 (2*2*2): 8.
  • If n = 16, the log n is 4, since 24: (2*2*2*2): 16.
  • If n = 32, the log n is 5, since 25: 32.
  • If n = 64, the log n is 6, since 26: 64.
  • If n = 128, the log n is 7, since 27: 128.
  • If n = a thousand, the log n is roughly 10, since 210: 1024.
  • If n = a million, log n is roughly 20 since 220 is roughly a million.

You can plot the efficiency of the two algorithms on a graph, like so, where n can be the amount of numbers that are being sorted. Plot on the x-axis how many numbers you are sorting. The y-axis will plot the factor of time:
From Khan Academy, Algorithm comparison
If you are using an insertion sort, you can see that with small numbers, the insertion sort is quicker. But if you are trying to sort, say, ten million numbers, the better algorithm, mergesort will win out.

Let's use the example in the book, and say we are using two different computers, a fast one using the slow insertion sort and a slow computer using mergesort. Which would win sorting ten million numbers?

Sure, ten million numbers seems like a lot, but the book mentions if each number is only stored as an 8 byte integer, then the amount of space to store those numbers would only take up 80 MB (megabytes) in RAM when doing calculations.

Getting down to specifics...
  • Fast computer: Executes 10 billion instructions per second. 
  • Slow computer: Executes 10 million instructions per second.

In case you are not familiar with the powers of ten:
  • Ten is 101
  • A hundred is 102, or 100
  • A thousand is 103, or 1000
  • Ten thousand is 104, or 10,000
  • A hundred thousand is 105, or 100,000
  • A million is 106, or 1,000,000
  • Ten million is 107, or 10,000,000
  • Ten billion is 1010, or 10,000,000,000
We are sorting 107 numbers on a fast computer/ slow algorithm that runs 1010 instructions a second and a slow computer / faster algorithm that runs 107 instructions a second.

Let's complicate the issue further... why? ... because it will make the math easier with the two different algorithms:
  • On the fast computer, an expert programmer writes the slow insertion sort in the machine language, so that it only takes 2n2 instructions to sort n numbers.
  • On the slow computer, an average programmer uses a programming language and compiler that takes 50 log n instructions to sort n numbers for the faster mergesort.  

Calculating Runtime of Algorithms

Slow Algorithm / Fast Computer

Given that:
  • Insertion sort has an poor efficiency of only n2, where n is 107 (an array of ten million numbers)
  • The insertion sort is implemented at the fast rate of 2n2 to sort n numbers, where n is 107.
  • We are dividing everything by 1010 (10 billion) instructions per second.
This gives us: 2 * (107)2 instructions divided by 1010 instructions per second.

With Exponent Math, lets calculate (107)2:
  • Since 7 * 2 = 14, (107)2 gives us 1014.
  • This gives us 2 * (1014) instructions divided by 1010 instructions per second.

Now let's figure out how many seconds this operation will actually take:
  • Take 1014 and divide it by 1010
  • Since 14 minus 10 is 4, then 1014 divided by 1010 is 104
  • We are left with 2 * 104: 20,000 seconds
  • Since there is 60 seconds in a minute and 60 minutes in an hour, that means there is 3600 seconds in an hour.
  • Dividing 20,000 minus 3600 is roughly 5.5. 
To sort ten million numbers, it is going to take...
... even on the extremely fast computer ...
... using a very quick method to implement the very slow algorithm...

... Five And A Half Hours!

Um... Wow.

Let's take a look at what is happening with the other computer...

Fast Algorithm / Slow Computer


Given that:
  • Merge sort has a better efficiency of n log n, where n is 107 (an array of ten million numbers)
  • The merge sort in this example is implemented at the slower rate of 50n log n to sort n numbers, where n is 107.
  • We are dividing everything on the slower computer by 107 (one million) instructions per second.
This gives us: 50 * 107 log 107 instructions divided by 107 instructions per second.

According to the book, this equals approximately 1163 seconds.
  • There is 60 seconds in a minute.
  • Dividing 1163 by 60 is 19.38 minutes
To sort ten million numbers, it is going to take...
... on the slow computer ...
... using a slow method to implement the faster algorithm...

A bit less than twenty minutes.


Where We Go From Here


Over the next few weeks, we will be looking at topics such as:
  • Big-O Notation and more about Logarithms
  • Data structures such as trees and hashmaps
  • More searching and sorting algorithms such as mergesort, quicksort, and other algorithms, and how to implement them in Java.

Or, if you want to do some independent study, you can jump to Eric Drowell and his awesome Big O Cheat Sheet at http://bigocheatsheet.com/

Big-O Complexity Chart

ExcellentGoodFairBadHorrible

Array Sorting Algorithms

AlgorithmTime ComplexitySpace Complexity
BestAverageWorstWorst
QuicksortO(n log(n))O(n log(n))O(n^2)O(log(n))
MergesortO(n log(n))O(n log(n))O(n log(n))O(n)
TimsortO(n)O(n log(n))O(n log(n))O(n)
HeapsortO(n log(n))O(n log(n))O(n log(n))O(1)
Bubble SortO(n)O(n^2)O(n^2)O(1)
Insertion SortO(n)O(n^2)O(n^2)O(1)
Selection SortO(n^2)O(n^2)O(n^2)O(1)
Tree SortO(n log(n))O(n log(n))O(n^2)O(n)
Shell SortO(n log(n))O(n(log(n))^2)O(n(log(n))^2)O(1)
Bucket SortO(n+k)O(n+k)O(n^2)O(n)
Radix SortO(nk)O(nk)O(nk)O(n+k)
Counting SortO(n+k)O(n+k)O(n+k)O(k)
CubesortO(n)O(n log(n))O(n log(n))O(n)
The Big O Cheatsheet By Eric Drowell



Happy Testing!

-T.J. Maher
Sr. QA Engineer,
Fitbit-Boston

// BSCS, MSE, and QA Engineer since Aug. 1996
// Automation developer for [ 1.5 ] years and still counting!
// Check out Adventures in Automation and Like us on Facebook!

No comments: