Makeup-Comparator

makeup-comparator terminal output

Hello everyone! After several months of working on this project, I am very pleased to finally be able to share the results.

The makeup-Comparator project was initially intended to serve as both a backend for a website and a command-line interface (CLI). As of today, September 2023, only the CLI program has been released to offer users a quick method for searching for products across different websites and comparing them. This is achieved through web scraping techniques applied to various websites to extract their product data.

First and foremost, installing Makeup-Comparator is a straightforward process using Cargo. You can follow the instructions below:

cargo install makeup-comparator

Also, you can find the entire repository in my Github.

After all, personal projects aren’t necessarily aimed at becoming a major success or a highly profitable service. In my case, I undertake them to learn and enhance my knowledge of various technologies and programming methods that I may not have the opportunity to explore in my daily work. That’s why I’d like to take this opportunity to explain what I’ve learned during this process and, perhaps more importantly, the challenges I’ve encountered along the way.

Rust

I chose Rust as the language for this project primarily because I wanted to delve deeper and learn more about this relatively new programming language, which I had previously used in other projects like: Easy_GA.

Rust allowed me to develop this project quite easily, thanks to its versatility and robustness. It is a language that makes it simple to start a project quickly and get things done efficiently.

Error handling is another powerful aspect of Rust. In this project, it was essential because web scraping relies heavily on whether we can retrieve the desired information or not, and it’s crucial to find solutions when this information is not available.

Testing

One of the most crucial aspects of this project was system testing, even more so than unit testing. This is significant because webpages can change frequently, and in a project that heavily relies on HTML structure, it was common to wake up one day and find that some parts of the web scraping were broken due to recent changes on the websites used to retrieve information.

Thanks to system testing, I was able to quickly identify the sources of these problems and address them promptly. It’s important to note that testing serves not only to ensure a high level of code reliability but also to define specific situations that, when altered, should trigger notifications or alerts.

Thanks to this project I also wrote an article about that: The reason you should test your personal projects.

Code coverage

Testing is indeed crucial, but its effectiveness depends on the coverage of the parts we expect to test. In this project, I paid special attention to the code coverage of my testing efforts. I designed a script to calculate the coverage generated by my tests and utilized various tools to visualize this information. I also set a goal of maintaining test coverage above 90% of the lines of code to ensure thorough testing.

The same way as with testing, this work gave me the idea of writing a article that was really popular online: Code coverage in Rust.

CI/CD

Continuous Integration/Continuous Deployment (CI/CD) pipelines are common in the industry but not often seen in personal projects. In my case, it was more about exploring this aspect and understanding what it could offer me.

Despite the fact that this project was developed by a single programmer (myself), I structured the repository to follow a Gitflow pattern of integration. I prohibited direct pushes to the master branch and enforced passing the tests defined by Github Actions before any changes were merged.

Before implementing the CI/CD pipeline, I established Git hooks to ensure that the code being pushed didn’t contain any warnings, didn’t fail static analysis, was well-formatted, and that all tests were passing.

You can find an article explaining more about that: What are Git Hooks and how to use them in Rust.

Deployment

Finally, the deployment process provided by Cargo is very straightforward and easy. I divided my project into two crates: one for web scraping, which can be reused in other projects, and the other for visualizing the results using the first crate.

The reason you should test your personal projects

Why do we underestimate testing?

Testing in software development tend to be the forgotten and underestimated part. It usually takes time and do not produce real products or concrete features, this is why it usually get relegated in order to prioritize other aspects in a development cycle.

There is a rule that says you have to have one line of testing for one line of code. In practice, this is usually not fulfilled, usually due to time, money and resources. Companies ofter underestimate how powerfull testing is and spend a lot of hours in finding and solving bugs that shouldn’t even been in production.

Testing in big projects

Testing in big projects is always a must. Even thought, you have more or less tests, better or worse tests, but there are always some testing in huge projects where a lot of people are involved and one minor change can downfall into a bug issue. Companies are aware of that and tend to force their developers to implement testing in the code, but let’s be honest, testing is usually the part that developers hate to implement and they tend to get rid of it as soon as possible, letting a lot of cases uncovered.

Testing in small and personal projects

But obviously, if there is a place when testing is absolutely forgotten is in small and mainly personal projects. Ey! I do not say that you have to “waste” time testing you small project used by absolutely nobody in the world. But this trend of not implement testing in small projects tend to extend to libraries or small software products used by users.

Almost 6 months ago, I started learning Rust and trying to heavily participate in adding my small grain of sand to the ecosystem the comunity is building arround it. That includes libraries, frameworks, external tools and also projects.

When you add a library made by someone in your project, you are trusting this piece of software. You take for granted that eveything should work as intended. And you, as developer, are in the moral responsability that this is true, otherwise, you are not only developing a bad software but making other developers to waste their time by your fault.

Hidden gems in testing

Programmers usually understand testing as a way to ensure that his code is well constructed for any possible scenarios that may happen in the execution.

Meanwhile this is true, it is only one of the reasons for testing. A part for covering all the possible edge cases that your program can have, there are other things that testing hold your back.

Development speed in the future

Adding test to your software gives you the ability to quickly ensure that any modification you are making will not break the existing logic. This is the main reason testing is done in big projects. You want to implement a new feature or fix a bug and if testing is properly done, you do not have to mind if you are breaking other stuffs with your new implementation because the tests will ensure that for you.

Third party libraries

If we only focus on us as the only people that can break our software we are wrong. Usually, in a product development we use a bunch of third party libraries that are constantly updating and changing its logic. Maybe that third party library that you used for a function called getDollars(float euros) have change in its last release and now its new implementation is more like this: getDollars(float pounds).

This change in a third party library will produce a silent bug in our code. Now, all of our conversions will return an incorrect value since we thought we are converting our euros to dollars. But the program will not crash in any way possible unless we made the propers tests for that.

External changes

Sometimes, our software depends in some public data or values that we take for granted as inmutable. This is the parciular case that made me make this post.

Recently, I have been working in a Rust project that I will talk more about it soon. The project consist in web scrapping some website in order to retrieve information and process it. For this reason, the implementation of my logic heavily depends on how the web page structure is done and any minor change in the website made by their developers can easily break all my code.

Obviously, I can not anticipate on how and when the web programmers and UI/UX teams will update the site. In this case, aswell as tests that tries to cover all the possibles fails if the page layout changes, I also want that my tests fails if someday this happens. For example:

HTML
<!-- Version 1.0 -->
<html>
	<body>
        <p> Valuable data $$ </p>
    </body>
</html>
C++
// logic.cpp
std::string data = get_first_element_in_body();

In this case our logic is simple, we want to get the first element inside the body in order to retrieve our valuable data. But then, the webpage made a change and this is the new code:

HTML
<!-- Version 1.1 -->
<html>
	<body>
        <h1> Stupid title </h1>
        <p> Valuable data $$ </p>
    </body>
</html>

With this small change, all our code is absolutely and silently broken. The developers decided that they wanted to add a title before that information for stetics reasons and now our webscrapping does not provide the same result as before. This would be so difficult to find without tests for that. Imagine we are storing this directly in a database or sending it to another resource. We wouldn’t know anything until the user tell us.

This can be prevented with a simple small test

C++
// test.cpp
std::string data = get_first_element_in_body();
static_assert(is_this_data_valuable(data));

Conclusion

Testing is as important as real product implementations. We have to start thinking as live saver and not as time wasted. In this article, I also proved that testing is not only important for breaking changes that you may introduce in your code but also for silent bugs introducen not even by yourself. I hope after this read you start seeing testing this way. You can not imagine how many stress take away from you when developing to know that your product is working as intended just by running a single command that executes the tests or even a CI flow that ensure that nobody can change anything that breaks the existing implementation.