Hello everyone! After several months of working on this project, I am very pleased to finally be able to share the results.
The Makeup-Comparator project was initially intended to serve as both a backend for a website and a command-line interface (CLI). As of today, September 2023, only the CLI has been released; it offers users a quick way to search for products across different websites and compare them. This is achieved through web scraping techniques applied to various websites to extract their product data.
First and foremost, installing Makeup-Comparator is a straightforward process using Cargo. You can follow the instructions below:
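Assuming the crate is published on crates.io under the project's name (an assumption on my part; check the repository for the exact crate name), installation is a single command:

Bash

cargo install makeup-comparator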
After all, personal projects aren’t necessarily aimed at becoming a major success or a highly profitable service. In my case, I undertake them to learn and enhance my knowledge of various technologies and programming methods that I may not have the opportunity to explore in my daily work. That’s why I’d like to take this opportunity to explain what I’ve learned during this process and, perhaps more importantly, the challenges I’ve encountered along the way.
Rust
I chose Rust as the language for this project primarily because I wanted to delve deeper into this relatively new programming language, which I had previously used in other projects such as Easy_GA.
Rust allowed me to develop this project quite easily, thanks to its versatility and robustness. It is a language that makes it simple to start a project quickly and get things done efficiently.
Error handling is another powerful aspect of Rust. In this project it was essential: web scraping depends heavily on whether the desired information can actually be retrieved, and it is crucial to handle the cases where it cannot.
Testing
One of the most crucial aspects of this project was system testing, even more so than unit testing. This is significant because webpages can change frequently, and in a project that heavily relies on HTML structure, it was common to wake up one day and find that some parts of the web scraping were broken due to recent changes on the websites used to retrieve information.
Thanks to system testing, I was able to quickly identify the sources of these problems and address them promptly. It’s important to note that testing serves not only to ensure a high level of code reliability but also to define specific situations that, when altered, should trigger notifications or alerts.
Testing is indeed crucial, but its effectiveness depends on the coverage of the parts we expect to test. In this project, I paid special attention to the code coverage of my testing efforts. I designed a script to calculate the coverage generated by my tests and utilized various tools to visualize this information. I also set a goal of maintaining test coverage above 90% of the lines of code to ensure thorough testing.
Just as with testing, this work gave me the idea for an article that became quite popular online: Code coverage in Rust.
CI/CD
Continuous Integration/Continuous Deployment (CI/CD) pipelines are common in the industry but not often seen in personal projects. In my case, it was more about exploring this aspect and understanding what it could offer me.
Despite the fact that this project was developed by a single programmer (myself), I structured the repository to follow a Gitflow integration pattern. I prohibited direct pushes to the master branch and required the tests defined in GitHub Actions to pass before any changes were merged.
Before implementing the CI/CD pipeline, I established Git hooks to ensure that the code being pushed didn’t contain any warnings, didn’t fail static analysis, was well-formatted, and that all tests were passing.
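As a sketch, a pre-push hook enforcing those checks could look like the following (the exact commands in my hooks may have differed):

Bash

#!/bin/bash
# .git/hooks/pre-push: abort the push if any check fails
set -e

cargo fmt --all -- --check   # the code is well-formatted
cargo clippy -- -D warnings  # static analysis passes with no warnings
cargo test                   # all tests pass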
Finally, the deployment process provided by Cargo is very straightforward and easy. I divided my project into two crates: one for web scraping, which can be reused in other projects, and the other for visualizing the results using the first crate.
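For reference, once you have a crates.io account and API token, publishing each crate is essentially a one-liner (token placeholder below):

Bash

# Authenticate once with your crates.io API token
cargo login <your-token>
# Publish the current crate from its own directory
cargo publish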
Code coverage

Code coverage is a metric that measures how much of your code is exercised by your tests. It is especially useful when writing unit tests, to ensure that the tests cover the code conditions you actually care about. Code coverage is conventionally represented as a percentage, where a lower value indicates that less of the code is exercised.
Metrics used for code coverage
While lines of code serve as one metric for assessing code coverage, it is crucial to acknowledge that they are not the sole determinant of comprehensive code protection. Various units of measurement contribute to achieving well-rounded coverage, such as function coverage, statement coverage, branch coverage and condition coverage.
Function coverage: a vital metric that quantifies the extent to which the defined functions are actually invoked during program execution. By measuring function coverage, developers can assess the effectiveness of their tests in exercising all the functions and identify any potential gaps in code execution.
Statement coverage: a fundamental metric that evaluates the degree to which statements are executed at run-time. It measures the proportion of statements that are traversed and executed by the test suite. By examining statement coverage, developers can gain insight into how thoroughly their tests explore different code paths and identify any unexecuted or potentially problematic statements.
Branch coverage: a crucial metric that assesses the extent to which the different branches of code bifurcations are executed by the test suite. It specifically measures whether all possible branches, such as those within if-else or if-elseif-else conditions, are exercised during program execution. By analyzing branch coverage, developers can determine whether their tests adequately explore the various code paths, ensuring comprehensive validation of all possible branch outcomes.
Condition coverage: a metric that evaluates the adequacy of tests in terms of covering all possible outcomes of boolean sub-expressions. It measures whether the different possibilities, such as true and false evaluations of each condition, are effectively tested. By assessing condition coverage, developers can ensure that all potential combinations within boolean expressions are examined, mitigating the risk of undetected issues tied to specific condition outcomes.
Given the following code snippet:
Rust
pub fn add(x: usize, y: usize) -> usize {
    let mut z = 0;
    if x > 0 && y > 0 {
        z = x;
    }
    z
}
Function coverage will be achieved when the function add is executed.
Statement coverage will be achieved when the function add is called, such as add(1, 1), ensuring that all the lines within the function are executed.
Branch coverage will be achieved when the function is called with both add(1, 0) and add(1, 1): the first call evaluates the condition but never enters the if body (so z = x; remains unexecuted), while the second call does enter it.
Condition coverage will be achieved when the function is called with add(1, 0), add(0, 1), and add(1, 1), encompassing all possible conditions within the if statement.
Rust's coverage instrumentation builds on LLVM's source-based code coverage. It is called source-based because it operates on the AST (abstract syntax tree) and preprocessor information.
Code coverage relies on three basic steps:
Compiling with coverage enabled: enabling code coverage during compilation with clang requires two specific flags: -fprofile-instr-generate and -fcoverage-mapping (in Rust, setting RUSTFLAGS='-C instrument-coverage' plays the same role, as we will see later).
Running the instrumented program: when the instrumented program finishes executing, it writes a raw profile file. The path of this file is determined by the LLVM_PROFILE_FILE environment variable. If the variable is not defined, the file is created as default.profraw in the program's current directory; if the specified folder does not exist, it is created. The program replaces the following pattern strings in the path with the corresponding values:
%p: Process ID.
%h: Hostname of the machine.
%t: The value of the TMPDIR environment variable.
%Nm: Instrumented binary signature. If N is not specified (i.e., the pattern is just %m), it defaults to N=1.
%c: This string does not expand to any value; it is a marker that enables continuous mode, in which the profile is continuously synced to disk during execution. Consequently, if the program crashes, the coverage results collected so far are retained.
Creating coverage reports: raw profile files have to be indexed before coverage reports can be generated. This indexing is performed by llvm-profdata, as shown in the sketch below.
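Here is a minimal sketch of the three steps using clang and the LLVM tools directly (foo.c is a placeholder source file):

Bash

# Step 1: compile with coverage instrumentation enabled
clang -fprofile-instr-generate -fcoverage-mapping foo.c -o foo

# Step 2: run the instrumented binary; LLVM_PROFILE_FILE sets the raw profile path
LLVM_PROFILE_FILE="foo-%p.profraw" ./foo

# Step 3: index the raw profile(s) and generate a report
llvm-profdata merge -sparse foo-*.profraw -o foo.profdata
llvm-cov report ./foo -instr-profile=foo.profdata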
To generate coverage reports for our tests, we can follow the documentation provided by Rust and LLVM. The first step is to install the LLVM tools.
Bash
rustup component add llvm-tools-preview
After installing the LLVM tools, we can proceed to generate the code coverage. It is highly recommended to delete any previous results to avoid potential issues. To do so, we need to execute the following sequence of commands:
Bash
# Remove possible existing coverages
cargo clean && mkdir -p coverage/ && rm -r coverage/*
CARGO_INCREMENTAL=0 RUSTFLAGS='-Cinstrument-coverage' LLVM_PROFILE_FILE='coverage/cargo-test-%p-%m.profraw' cargo test
This will run cargo test and record coverage for all the tests executed, generating the *.profraw files that contain the coverage information.
Generate HTML reports
To visualize the coverage results more effectively, we can utilize the grcov tool, which generates HTML static pages. Installing grcov is straightforward and can be done using cargo.
Bash
cargo install grcov
Once grcov is installed, we can proceed to generate the HTML files that will display the coverage results.
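The command below is the invocation used in this project (it also appears in the full script later in this article); each flag is explained afterwards:

Bash

grcov . --binary-path ./target/debug/deps/ -s . -t html --branch --ignore-not-existing --ignore '../*' --ignore "/*" -o target/coverage/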
--binary-path: Set the path to the compiled binary that will be used.
-s: Specify the root directory of the source files.
-t: Set the desired output type. Options include:
html for an HTML coverage report.
coveralls for the Coveralls-specific format.
lcov for the lcov INFO format.
covdir for the covdir recursive JSON format.
coveralls+ for the Coveralls-specific format with function information.
ade for the ActiveData-ETL-specific format.
files to only return a list of files.
markdown for an easily readable Markdown summary.
--branch: Enable parsing of branch coverage information.
--ignore-not-existing: Ignore source files that cannot be found on disk.
--ignore: Ignore files or directories specified as globs.
-o: Specify the output path.
Upon successful execution of the aforementioned command, an index.html file will be generated inside the target/coverage/ directory. The resulting HTML file will provide a visual representation of the coverage report, presenting the coverage information in a structured and user-friendly manner.
Generate lcov files
In addition to HTML reports, generating an lcov file can be useful for visualizing the coverage results in external tools such as Visual Studio Code. To generate an lcov file with grcov, use the same command as before but replace the output type html with lcov. The resulting file can be imported into various coverage-analysis tools, providing an overview of the code coverage in a standardized format.
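Concretely, the lcov variant used in this project looks like this:

Bash

grcov . --binary-path ./target/debug/deps/ -s . -t lcov --branch --ignore-not-existing --ignore '../*' --ignore "/*" -o target/coverage/lcov.info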
Finally, we can install an extension to interpret this information; in my case, I use Coverage Gutters. With this in place, VS Code shows visually and dynamically which lines of our code are covered by tests, along with the total coverage percentage of the current file.
Here is the script I recommend using to generate the code coverage with both HTML and LCOV output:
Bash
#!/bin/bash

# Define color variables (RED was missing from the original script; it is needed by the error messages below)
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
RED='\033[0;31m'
NC='\033[0m'

function cleanup() {
    echo -e "${YELLOW}Cleaning up previous coverages...${NC}"
    cargo clean && mkdir -p coverage/ && rm -r coverage/*
    echo -e "${GREEN}Success: Crate cleaned successfully${NC}"
}

function run_tests() {
    echo -e "${YELLOW}Compiling and running tests with code coverage...${NC}"
    CARGO_INCREMENTAL=0 RUSTFLAGS='-Cinstrument-coverage' LLVM_PROFILE_FILE='coverage/cargo-test-%p-%m.profraw' cargo test --workspace
    if [[ $? -ne 0 ]]; then
        echo -e "${RED}Error: Tests failed to execute${NC}"
        exit 1
    fi
    echo -e "${GREEN}Success: All tests were executed correctly!${NC}"
}

function generate_coverage() {
    echo -e "${YELLOW}Generating code coverage...${NC}"
    grcov . --binary-path ./target/debug/deps/ -s . -t html --branch --ignore-not-existing --ignore '../*' --ignore "/*" -o target/coverage/ && \
    grcov . --binary-path ./target/debug/deps/ -s . -t lcov --branch --ignore-not-existing --ignore '../*' --ignore "/*" -o target/coverage/lcov.info
    if [[ $? -ne 0 ]]; then
        echo -e "${RED}Error: Failed to generate code coverage${NC}"
        exit 1
    fi
    echo -e "${GREEN}Success: Code coverage generated correctly!${NC}"
}

echo -e "${GREEN}========== Running test coverage ==========${NC}"
echo

cleanup
run_tests
generate_coverage
Using Tarpaulin
There is an existing powerful tool called Tarpaulin that can do all this for us. We can install it with cargo:
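Tarpaulin is distributed as the cargo-tarpaulin crate and runs as a cargo subcommand:

Bash

cargo install cargo-tarpaulin
# Then, from the project root:
cargo tarpaulin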
Branch and condition coverage are the two main features currently missing from Tarpaulin. Apart from that, I do think Tarpaulin is a good choice if you want to quickly assess your code coverage.