
Thursday, August 4, 2016

Write Tests That Matter: Tackle The Most Complex Code First

There are a lot of discussions, articles, and blogs around the topic of code quality. People say: use test-driven techniques! Tests are a “must have” before you start any refactoring! That’s all cool, but it’s 2016 and there is a massive volume of products and code bases still in production that were created ten, fifteen, or even twenty years ago. It’s no secret that a lot of them have legacy code with low test coverage.
While I’d like to always be at the leading, or even bleeding, edge of the technology world – engaged with cool new projects and technologies – unfortunately it’s not always possible and I often have to deal with old systems. I like to say that when you develop from scratch, you act as a creator, mastering new matter. But when you’re working on legacy code, you’re more like a surgeon – you know how the system works in general, but you never know for sure whether the patient will survive your “operation”. And since it’s legacy code, there are not many up-to-date tests for you to rely on. This means that one of the very first steps is frequently to cover it with tests. More precisely, not merely to provide coverage, but to develop a test coverage strategy.
Coupling and Cyclomatic Complexity: Metrics for Smarter Test Coverage
Forget 100% coverage. Test smarter by identifying classes that are more likely to break.
Basically, what I needed to determine was which parts (classes / packages) of the system we needed to cover with tests in the first place, where unit tests were needed, where integration tests would be more helpful, etc. There are admittedly many ways to approach this type of analysis, and the one I’ve used may not be the best, but it is largely automatic. Once the approach is implemented, it takes minimal time to actually do the analysis itself and, more importantly, it brings some fun into legacy code analysis.
The main idea here is to analyse two metrics – coupling (i.e., afferent coupling, or CA) and complexity (i.e. cyclomatic complexity).
The first one measures how many classes use our class, so it basically tells us how close a particular class is to the heart of the system; the more classes there are that use our class, the more important it is to cover it with tests.
On the other hand, if a class is very simple (e.g., contains only constants), then even if it’s used by many other parts of the system, it’s not nearly as important to cover with tests. Here is where the second metric can help: if a class contains a lot of logic, its cyclomatic complexity will be high.
The same logic can also be applied in reverse; i.e., even if a class is not used by many classes and represents just one particular use case, it still makes sense to cover it with tests if its internal logic is complex.
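To make the intuition concrete, here is a small illustrative Ruby sketch (not taken from the code base under discussion): a class that only holds constants has a cyclomatic complexity of 1, while a branch-heavy method pushes the number – and the value of testing it – up quickly.

# Trivial class: no branching, so there is essentially nothing worth unit testing.
class OrderStatus
  PENDING   = 'pending'
  SHIPPED   = 'shipped'
  CANCELLED = 'cancelled'
end

# Branch-heavy method: every if/elsif adds an independent path through the code,
# so its cyclomatic complexity is much higher and tests are far more valuable.
def shipping_cost(order)
  if order[:express] && order[:weight] > 10
    25
  elsif order[:express]
    15
  elsif order[:weight] > 10
    10
  elsif order[:international]
    20
  else
    5
  end
end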
There is one caveat though: let’s say we have two classes – one with CA 100 and complexity 2, and the other with CA 60 and complexity 20. Even though the sum of the metrics is higher for the first one, we should definitely cover the second one first. This is because the first class, while used by a lot of other classes, contains very little logic, whereas the second class is still used by plenty of classes and contains considerably more logic – exactly the combination we want to cover first.
To summarize: we need to identify classes with high CA and Cyclomatic complexity. In mathematical terms, a fitness function is needed that can be used as a rating - f(CA,Complexity) - whose values increase along with CA and Complexity.
Generally speaking, the classes with the smallest differences between the two metrics should be given the highest priority for test coverage.
Finding tools to calculate CA and Complexity for the whole code base, and provide a simple way to extract this information in CSV format, proved to be a challenge. During my search, I came across two tools that are free so it would be unfair not to mention them:

A Bit Of Math

The main problem here is that we have two criteria – CA and cyclomatic complexity – so we need to combine them and convert them into one scalar value. If we had a slightly different task – e.g., to find a class with the worst combination of our criteria – we would have a classical multi-objective optimization problem:
We would need to find a point on the so-called Pareto front (shown in red in the picture above). What is interesting about the Pareto set is that every point in it is a solution to the optimization task. Whenever we move along the red line we have to make a compromise between our criteria – if one gets better, the other one gets worse. Converting the multiple criteria into a single scalar value is called scalarization, and the final result depends on how we do it.
There are a lot of techniques we can use here, each with its own pros and cons, but the most popular ones are linear scalarization and scalarization based on a reference point. Linear is the easiest one: our fitness function is a linear combination of CA and Complexity:
f(CA, Complexity) = A×CA + B×Complexity
where A and B are some coefficients.
The point which represents a solution to our optimization problem will lie on the line (blue in the picture below). More precisely, it will be at the intersection of the blue line and red Pareto front. Our original problem is not exactly an optimization problem. Rather, we need to create a ranking function. Let’s consider two values of our ranking function, basically two values in our Rank column:
A×CA + B×Complexity = R1 and A×CA + B×Complexity = R2
Both of the formulas written above are equations of lines, moreover these lines are parallel. Taking more rank values into consideration we’ll get more lines and therefore more points where the Pareto line intersects with the (dotted) blue lines. These points will be classes corresponding to a particular rank value.
Unfortunately, there is an issue with this approach. For any line (rank value), we’ll have points with a very small CA and a very big complexity (and vice versa) lying on it. For example, with A = B = 1, a class with CA 1 and complexity 101 gets the same rank as a class with CA 51 and complexity 51. This immediately puts points with a big difference between the metric values at the top of the list, which is exactly what we wanted to avoid.
The other way to do the scalarization is based on a reference point. The reference point is the point with the maximum values of both criteria:
(max(CA), max(Complexity))
The fitness function will be the distance between the Reference point and the data points:
f(CA, Complexity) = √((CA − max(CA))² + (Complexity − max(Complexity))²)
We can think about this fitness function as a circle with the center at the reference point. The radius in this case is the value of the Rank. The solution to the optimization problem will be the point where the circle touches the Pareto front. The solution to the original problem will be sets of points corresponding to the different circle radii as shown in the following picture (parts of circles for different ranks are shown as dotted blue curves):
This approach deals better with extreme values, but there are still two issues. First, I’d like to have more points near the reference point to better overcome the problem we faced with the linear combination. Second, CA and cyclomatic complexity are inherently different and have different value ranges, so we need to normalize them (e.g., so that all the values of both metrics run from 1 to 100).
Here is a small trick we can apply to solve the first issue: instead of looking at CA and cyclomatic complexity directly, we can look at their inverted values. The reference point in this case will be (0, 0). To solve the second issue, we can normalize the metrics using their minimum values. Here is how it looks:
Inverted and normalized complexity – NormComplexity:
NormComplexity = (1 + min(Complexity)) / (1 + Complexity) × 100
Inverted and normalized CA – NormCA:
NormCA = (1 + min(CA)) / (1 + CA) × 100
Note: I added 1 to make sure that there is no division by 0.
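As a minimal sketch, the inversion and normalization step could be expressed in Ruby like this (the input structure is illustrative – the actual scripts live in the repository linked below):

# Illustrative sketch of the normalization step. `metrics` is assumed to be an
# array of { name:, ca:, complexity: } hashes, e.g. loaded from a CSV export.
def normalize(metrics)
  min_ca         = metrics.map { |m| m[:ca] }.min
  min_complexity = metrics.map { |m| m[:complexity] }.min

  metrics.map do |m|
    m.merge(
      norm_ca:         (1.0 + min_ca)         / (1 + m[:ca])         * 100,  # +1 avoids division by zero
      norm_complexity: (1.0 + min_complexity) / (1 + m[:complexity]) * 100
    )
  end
end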
The following picture shows a plot with the inverted values:

Final Ranking

We are now coming to the last step: calculating the rank. As mentioned, I’m using the reference point method, so the only thing we need to do is calculate the length of the vector, normalize it, and make it increase with the importance of creating a unit test for a class. Here is the final formula:
Rank(NormComplexity, NormCA) = 100 − √(NormComplexity² + NormCA²) / √2
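Continuing the sketch from the previous step, the rank can be computed and the classes sorted like this (again, the data structure is an assumption for illustration, not the author’s exact code):

# Illustrative sketch: rank every class and sort so the best candidates for
# unit-test coverage come first. A rank close to 100 means high CA and high
# complexity, i.e. close to the (0, 0) reference point of the inverted metrics.
def rank(norm_complexity, norm_ca)
  100 - Math.sqrt(norm_complexity**2 + norm_ca**2) / Math.sqrt(2)
end

# `metrics` as in the normalization sketch above.
ranked = normalize(metrics)
           .map { |m| m.merge(rank: rank(m[:norm_complexity], m[:norm_ca])) }
           .sort_by { |m| -m[:rank] }   # highest rank = cover with tests first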

More Statistics

There is one more thought that I’d like to add, but let’s first have a look at some statistics. Here is a histogram of the Coupling metrics:
What is interesting about this picture is the number of classes with low CA (0-2). Classes with CA 0 are either not used at all or are top level services. These represent API endpoints, so it’s fine that we have a lot of them. But classes with CA 1 are the ones that are directly used by the endpoints and we have more of these classes than endpoints. What does this mean from architecture / design perspective?
In general, it means that we have a kind of script-oriented approach – we script every business case separately (we can’t really reuse the code as the business cases are too diverse). If that is the case, then it’s definitely a code smell and we need to refactor. Otherwise, it means the cohesion of our system is low, in which case we also need refactoring, but architectural refactoring this time.
Additional useful information we can get from the histogram above is that we can completely filter out classes with low coupling (CA in {0,1}) from the list of the classes eligible for coverage with unit tests. The same classes, though, are good candidates for the integration / functional tests.
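In terms of the ranking sketch above, that filtering is a one-liner (again purely illustrative):

# Classes with CA of 0 or 1 are dropped from the unit-test candidates
# but kept as candidates for integration / functional tests.
unit_test_candidates, integration_candidates = ranked.partition { |m| m[:ca] > 1 }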
You can find all the scripts and resources that I have used in this GitHub repository: ashalitkin/code-base-stats.

Does It Always Work?

Not necessarily. First of all, it’s all about static analysis, not runtime. If a class is linked from many other classes, it can be a sign that it’s heavily used, but it’s not always true. For example, we don’t know whether the functionality is really heavily used by end users. Second, if the design and the quality of the system are good enough, then most likely different parts / layers of it are decoupled via interfaces, so static analysis of the CA will not give us a true picture. I guess it’s one of the main reasons why CA is not that popular in tools like Sonar. Fortunately, it’s totally fine for us since, if you remember, we are interested in applying this specifically to old, ugly code bases.
In general, I’d say that runtime analysis would give much better results, but unfortunately it’s much more costly, time consuming, and complex, so our approach is a potentially useful and lower cost alternative.
This article was written by Andrey Shalitkin, a Toptal Java developer.

Thursday, May 26, 2016

HTTP Request Testing: A Developer's Survival Tool

What To Do When A Testing Suite Isn’t Feasible

There are times that we – programmers and/or our clients – have limited resources with which to write both the expected deliverable and the automated tests for that deliverable. When the application is small enough you can cut corners and skip tests because you remember (mostly) what happens elsewhere in the code when you add a feature, fix a bug, or refactor. That said, we won’t always work with small applications, plus, they tend to get bigger and more complex over time. This makes manual testing difficult and super annoying.
For my last few projects, I was forced to work without automated testing and honestly, it was embarrassing to have the client email me after a code push to say that the application was breaking in places where I hadn’t even touched the code.
So, in cases where my client either had no budget or no intention of adding any automated test framework, I started testing the whole website’s basic functionality by sending an HTTP request to each individual page, parsing the response headers and looking for the ‘200’ response. It sounds plain and simple, but there is a lot you can do to ensure fidelity without actually having to write any tests – unit, functional, or integration.
While there are merits to all three types of testing, they don’t get written in most projects. Tests don’t get written in a lot of contract projects for a variety of reasons, so what can you do?

Automated Testing

In web development, automated tests consist of three major test types: unit tests, functional tests, and integration tests. We often combine unit tests with functional and integration tests to make sure everything runs smoothly as a whole application. When these tests are run in unison, or sequentially (preferably with a single command or click), we start calling them automated tests, unit or not.
Largely the purpose of these tests (at least in web dev) is to make sure all application pages are rendered without trouble, free from fatal (application halting) errors or bugs.

Unit Testing

Unit testing is a software development process in which the smallest parts of code – units – are independently tested for correct operation. Here’s an example in Ruby:
test "should return active users" do
  active_user = create(:user, active: true)
  non_active_user = create(:user, active: false)
  result = User.active

  assert_equal result, [active_user]
end

Functional Testing

Functional testing is a technique used to check the features and functionality of the system or software, designed to cover all user interaction scenarios, including failure paths and boundary cases.
Note: all our examples are in Ruby.
test "should get index" do
get :index
assert_response :success
assert_not_nil assigns(:object)
end

Integration Testing

Once the modules are unit tested, they are integrated one by one, sequentially, to check the combinational behavior, and to validate that the requirements are implemented correctly.
test "login and browse site" do
# login via https
https!
get "/login"
assert_response :success

post_via_redirect "/login", username: users(:david).username, password: users(:david).password
assert_equal '/welcome', path
assert_equal 'Welcome david!', flash[:notice]

https!(false)
get "/articles/all"
assert_response :success
assert assigns(:articles)
end

Tests in an Ideal World

Testing is widely accepted in the industry and it makes sense; good tests let you:
  • Quality assure your whole application with the least human effort
  • Identify bugs more easily because you know exactly where your code is breaking from test failures
  • Create automatic documentation for your code
  • Avoid ‘coding constipation’, which, according to some dude on Stack Overflow, is a humorous way of saying, “when you don’t know what to write next, or you have a daunting task in front of you, start by writing small.”
I could go on and on about how awesome tests are, and how they changed the world and yada yada yada, but you get the point. Conceptually, tests are awesome.

Tests in the Real World

While there are merits to all three types of testing, they don’t get written in most projects. Why? Well, let me break it down:

Time/Deadlines

Everyone has deadlines, and writing fresh tests can get in the way of meeting one. It can take time and a half (or more) to write an application and its respective tests. Now, some of you will not agree with this, citing the time saved in the long run, but I don’t think this is the case, and I’ll explain why in ‘Difference in Opinion’.

Client Issues

Often, the client doesn’t really understand what testing is, or why it has value for the application. Clients tend to be more concerned with rapid product delivery and therefore see programmatic testing as counterproductive.
Or, it may be as simple as the client not having the budget to pay for the extra time needed to implement these tests.

Lack of Knowledge

There is a sizeable tribe of developers in the real world that doesn’t know testing exists. At every conference, meetup, concert, (even in my dreams), I meet developers that don’t know how to write tests, don’t know what to test, don’t know how to setup the framework for testing, and so on. Testing isn’t exactly taught in schools, and it can be a hassle to set up/learn the framework to get them running. So yes, there’s a definite barrier to entry.

‘It’s a Lot of Work’

Writing tests can be overwhelming for both new and experienced programmers, even for those world-changer genius types, and to top it off, writing tests isn’t exciting. One may think, “Why should I engage in unexciting busywork when I could be implementing a major feature with results that will impress my client?” It’s a tough argument.
Last, but not least, it is hard to write tests and computer-science students are not trained for it.
Oh, and refactoring with unit tests is no fun.

Difference in Opinion

In my opinion, unit testing makes sense for algorithmic logic but not so much for coordinating living code.
People claim that even though you’re investing extra time up front in writing tests, it saves you hours later when debugging or changing code. I beg to differ and offer one question: Is your code static, or ever changing?
For most of us, it’s ever changing. If you are writing successful software, you’re always adding features, changing existing ones, removing them, eating them, whatever, and to accommodate these changes, you must keep changing your tests, and changing your tests takes time.

But, You Need Some Kind Of Testing

No one will argue that lacking any sort of testing is the worst possible case. After making changes in your code, you need to confirm that it actually works. A lot of programmers try to manually test the basics: Is the page rendering in the browser? Is the form being submitted? Is the correct content being displayed? And so on, but in my opinion, this is barbaric, inefficient and labour intensive.

What I Use Instead

The purpose of testing a web app, be it manually or automated, is to confirm that any given page is rendered in the user’s browser without any fatal errors, and that it shows its content correctly. One way (and in most cases, an easier way) to achieve this is by sending HTTP requests to the endpoints of the app and parsing the response. The response code tells you whether the page was delivered successfully. It’s easy to test for content by parsing the response body of the HTTP request and searching for specific text string matches, or you can be one step fancier and use a web scraping library such as Nokogiri.
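As a rough sketch of that idea (the URL and the expected text are placeholders; Net::HTTP is Ruby’s standard library HTTP client and Nokogiri is the scraping gem mentioned above):

require 'net/http'
require 'nokogiri'

# Fetch a page and check both the status code and a piece of expected content.
uri      = URI('http://localhost:3000/articles')   # placeholder endpoint
response = Net::HTTP.get_response(uri)

puts "Unexpected status: #{response.code}" unless response.code == '200'

doc = Nokogiri::HTML(response.body)
puts "Expected heading not found" if doc.css('h1').none? { |h| h.text.include?('Articles') }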
If some endpoints require a user login, you can use libraries designed for automating interactions (ideal when doing integration tests), such as Mechanize, to log in or click on certain links. Really, in the big picture of automated testing, this looks a lot like integration or functional testing (depending on how you use it), but it’s a lot quicker to write and can be included in an existing project, or added to a new one, with less effort than setting up a whole testing framework.
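For pages behind a login, a Mechanize-based version might look like the following sketch (the URLs, form field names, and credentials are assumptions for illustration, not from the article’s project):

require 'mechanize'

# Log in through the login form, then request a protected page.
agent      = Mechanize.new
login_page = agent.get('http://localhost:3000/login')   # placeholder URL

form = login_page.forms.first          # assumes the login form is the first form on the page
form['username'] = 'david'             # assumed field names and credentials
form['password'] = 'secret'
agent.submit(form)

page = agent.get('http://localhost:3000/articles/all')
puts "Unexpected status: #{page.code}" unless page.code == '200'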
Edge cases present another problem when dealing with large databases with a wide range of values; testing whether our application is working smoothly across all anticipated datasets can be daunting.
One way to go about it is to anticipate all the edge cases (which is not merely difficult, it’s often impossible) and write a test for each one. This could easily become hundreds of lines of code (imagine the horror) and cumbersome to maintain. Yet, with HTTP requests and just one line of code, you can test such edge cases directly on the data from production, downloaded locally on your development machine or on a staging server.
Now of course, this testing technique is not a silver bullet and has lots of shortcomings, the same as any other method, but I find these types of tests faster, and easier, to write and modify.
With HTTP requests & one line of code, you can test such edge cases directly on data from production.

In Practice: Testing with HTTP requests

Since we’ve already established that writing code without any kind of accompanying tests isn’t a good idea, my very basic go-to test for an entire application is to send HTTP requests to all its pages locally and parse the response headers for a 200 (or desired) code.
For example, if we were to write the above tests (the ones looking for specific content and a fatal error) with an HTTP request instead (in Ruby), it would be something like this:
# testing for fatal error
http_code = `curl -X #{route[:method]} -s -o /dev/null -w "%{http_code}" #{Rails.application.routes.url_helpers.articles_url(host: 'localhost', port: 3000)}`
if http_code !~ /200/
  return "articles_url returned with #{http_code} http code."
end

# testing for content
active_user = create(:user, name: "user1", active: true)
non_active_user = create(:user, name: "user2", active: false)
content = `curl #{Rails.application.routes.url_helpers.active_user_url(host: 'localhost', port: 3000)}`
if content !~ /#{active_user.name}/
  return "Content mismatch: active user #{active_user.name} not found in text body" # You can customise the message to your liking
end
if content =~ /#{non_active_user.name}/
  return "Content mismatch: non-active user #{non_active_user.name} found in text body" # You can customise the message to your liking
end
The line curl -X #{route[:method]} -s -o /dev/null -w "%{http_code}" #{Rails.application.routes.url_helpers.articles_url(host: 'localhost', port: 3000)} covers a lot of test cases; any method raising an error on the article’s page will be caught here, so it effectively covers hundreds of lines of code in one test.
The second part, which catches the content error specifically, can be used multiple times to check the content on a page. (More complex requests can be handled using mechanize, but that’s beyond the scope of this blog.)
Now, in cases where you want to test if a specific page works on a large, varied set of database values (for example, your article page template is working for all the articles in the production database), you could do:
ids = Article.all.select { |post|
  `curl -s -o /dev/null -w "%{http_code}" #{Rails.application.routes.url_helpers.article_url(post, host: 'localhost', port: 3000)}`.to_i != 200
}.map(&:id)
return ids
This will return an array of IDs of all the articles in the database that were not rendered, so now you can manually go to the specific article page and check out the problem.
Now, I understand that this way of testing might not work in certain cases, such as testing a standalone script or sending an email, and it is undeniably slower than unit tests because we are making direct calls to an endpoint for each test, but when you can’t have unit tests, or functional tests, or both, this is better than nothing.
How would you go about structuring these tests? With small, non-complex projects, you can write all your tests in one file and run that file each time before you commit your changes, but most projects will require a suite of tests.
I usually write two to three tests per endpoint, depending on what I’m testing. You can also try testing individual pieces of content (similar to unit testing), but I think that would be redundant and slow, since you would be making an HTTP call for every unit. On the other hand, such tests would be cleaner and easier to understand.
I recommend putting your tests in your regular test folder, with each major endpoint having its own file (in Rails, for example, each model/controller would have one file each), and this file can be divided into three parts according to what we are testing. I often have at least three tests:

Test One

Check that the page returns without any fatal errors.
The test builds a list of all the endpoints for Post and iterates over it to check that each page is rendered without any error (the full code is in the GitHub project linked at the end). Assuming everything went well and all the pages were rendered, you will see something like this in the terminal:

➜ sample_app git:(master) ✗ ruby test/http_request/post_test.rb
List of failed url(s) -- []

If any page is not rendered, you will see something like this (in this example, the posts/index page has an error and hence is not rendered):

➜ sample_app git:(master) ✗ ruby test/http_request/post_test.rb
List of failed url(s) -- [{:url=>"posts_url", :params=>[], :method=>"GET", :http_code=>"500"}]
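Since the original listing isn’t reproduced here, a minimal sketch of that kind of check might look like this (the route list and the level of detail are illustrative, not the author’s exact code; it assumes the Rails test environment is loaded):

# Illustrative sketch of "Test One": hit every Post endpoint and collect the failures.
host_opts = { host: 'localhost', port: 3000 }
endpoints = [
  { url: 'posts_url',    params: [],           method: 'GET' },
  { url: 'new_post_url', params: [],           method: 'GET' },
  { url: 'post_url',     params: [Post.first], method: 'GET' }
]

failed_urls = endpoints.select do |route|
  url  = Rails.application.routes.url_helpers.send(route[:url], *route[:params], host_opts)
  code = `curl -X #{route[:method]} -s -o /dev/null -w "%{http_code}" #{url}`
  route[:http_code] = code   # keep the code so failures are easy to read
  code != '200'
end

puts "List of failed url(s) -- #{failed_urls}"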

Test Two

Confirm that all the expected content is there:
If all the content we expect is found on the page (in this example we make sure posts/:id shows a post title, description and a status), the result looks like this:

➜ sample_app git:(master) ✗ ruby test/http_request/post_test.rb
List of content(s) not found on Post#show page with post id: 1 -- []

If any expected content is not found on the page (here we expect the page to show the status of the post: ‘Active’ if the post is active, ‘Disabled’ if it is disabled), the result looks like this:

➜ sample_app git:(master) ✗ ruby test/http_request/post_test.rb
List of content(s) not found on Post#show page with post id: 1 -- ["Active"]
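Again, the original listing lives in the linked project; a minimal sketch of this kind of content check might look like the following (the attribute names and the Active/Disabled labels are assumptions for the example):

# Illustrative sketch of "Test Two": check that expected content appears on Post#show.
post = Post.first
url  = Rails.application.routes.url_helpers.post_url(post, host: 'localhost', port: 3000)
body = `curl -s #{url}`

expected = [post.title, post.description, post.active? ? 'Active' : 'Disabled']
missing  = expected.reject { |text| body.include?(text) }

puts "List of content(s) not found on Post#show page with post id: #{post.id} -- #{missing}"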

Test Three

Check that the page renders across all datasets (if any):
If all the pages are rendered without any error, we will get an empty list:

➜ sample_app git:(master) ✗ ruby test/http_request/post_test.rb
List of post(s) with error in rendering -- []

If some of the records have a problem rendering (in this example, the pages with IDs 2 and 5 are giving an error), the result looks like this:

➜ sample_app git:(master) ✗ ruby test/http_request/post_test.rb
List of post(s) with error in rendering -- [2, 5]
If you want to fiddle around with the above demonstration code, here’s my github project.

So Which Is Better? It Depends…

HTTP Request testing might be your best bet if:
  • You’re working with a web app
  • You’re in a time crunch and want to write something fast
  • You’re working with a big, pre-existing project where tests were not written, but you still want some way to check the code
  • Your code involves simple request and response
  • You don’t want to spend a large portion of your time maintaining tests (I read somewhere that unit tests = maintenance hell, and I partially agree)
  • You want to test if an application works across all the values in an existing database
Traditional testing is ideal when:
  • You’re dealing with something other than a web application, such as scripts
  • You’re writing complex, algorithmic code
  • You have time and budget to dedicate to writing tests
  • The business requires bug-free software or a low error rate (finance, a large user base)
Thanks for reading the article; you should now have a method for testing you can default to, one you can count on when you’re pressed for time.
This article was written by Bhushan Lodha, a Toptal Ruby developer.