Makeup-Comparator

makeup-comparator terminal output

Hello everyone! After several months of working on this project, I am very pleased to finally be able to share the results.

The Makeup-Comparator project was initially intended to serve as both a backend for a website and a command-line interface (CLI). As of today, September 2023, only the CLI program has been released, offering users a quick way to search for products across different websites and compare them. This is achieved through web-scraping techniques applied to various websites to extract their product data.

First and foremost, installing Makeup-Comparator is a straightforward process using Cargo. You can follow the instructions below:

cargo install makeup-comparator

Also, you can find the entire repository on my GitHub.

After all, personal projects aren’t necessarily aimed at becoming a major success or a highly profitable service. In my case, I undertake them to learn and enhance my knowledge of various technologies and programming methods that I may not have the opportunity to explore in my daily work. That’s why I’d like to take this opportunity to explain what I’ve learned during this process and, perhaps more importantly, the challenges I’ve encountered along the way.

Rust

I chose Rust as the language for this project primarily because I wanted to delve deeper and learn more about this relatively new programming language, which I had previously used in other projects such as Easy_GA.

Rust allowed me to develop this project quite easily, thanks to its versatility and robustness. It is a language that makes it simple to start a project quickly and get things done efficiently.

Error handling is another powerful aspect of Rust. In this project, it was essential because web scraping relies heavily on whether we can retrieve the desired information or not, and it’s crucial to find solutions when this information is not available.
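
For example, when a selector does not match anything on a page, the scraping layer can surface that as a Result instead of panicking. The following is a minimal sketch of the idea; the function and error names are illustrative, not the project's actual API:

```rust
#[derive(Debug, PartialEq)]
enum ScrapeError {
    FieldNotFound(&'static str),
}

// Extract the text between a marker and the next ';', e.g. a product
// name after "name:". Returns an error instead of panicking when the
// field is missing from the scraped page.
fn extract_field(html: &str, marker: &str) -> Result<String, ScrapeError> {
    html.split(marker)
        .nth(1)
        .and_then(|rest| rest.split(';').next())
        .map(|s| s.trim().to_string())
        .ok_or(ScrapeError::FieldNotFound("product name"))
}

fn main() {
    assert_eq!(
        extract_field("name: Lipstick; price: 9", "name:"),
        Ok("Lipstick".to_string())
    );
    // The missing field is reported as an Err value the caller can react to.
    assert!(extract_field("price: 9", "name:").is_err());
    println!("field extraction handled without panicking");
}
```

The same pattern scales up: every scraping helper returns `Result`, and the CLI decides whether a missing field is fatal or just means "this website does not carry that product".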

Testing

One of the most crucial aspects of this project was system testing, even more so than unit testing. This is significant because webpages can change frequently, and in a project that heavily relies on HTML structure, it was common to wake up one day and find that some parts of the web scraping were broken due to recent changes on the websites used to retrieve information.

Thanks to system testing, I was able to quickly identify the sources of these problems and address them promptly. It’s important to note that testing serves not only to ensure a high level of code reliability but also to define specific situations that, when altered, should trigger notifications or alerts.

This project also led me to write an article on the subject: The reason you should test your personal projects.

Code coverage

Testing is indeed crucial, but its effectiveness depends on the coverage of the parts we expect to test. In this project, I paid special attention to the code coverage of my testing efforts. I designed a script to calculate the coverage generated by my tests and utilized various tools to visualize this information. I also set a goal of maintaining test coverage above 90% of the lines of code to ensure thorough testing.

As with testing, this work gave me the idea of writing an article that became quite popular online: Code coverage in Rust.

CI/CD

Continuous Integration/Continuous Deployment (CI/CD) pipelines are common in the industry but not often seen in personal projects. In my case, it was more about exploring this aspect and understanding what it could offer me.

Despite the fact that this project was developed by a single programmer (myself), I structured the repository to follow a Gitflow integration pattern. I prohibited direct pushes to the master branch and enforced passing the tests defined in GitHub Actions before any changes were merged.

Before implementing the CI/CD pipeline, I established Git hooks to ensure that the code being pushed didn’t contain any warnings, didn’t fail static analysis, was well-formatted, and that all tests were passing.

You can find an article explaining more about that: What are Git Hooks and how to use them in Rust.

Deployment

Finally, the deployment process provided by Cargo is very straightforward and easy. I divided my project into two crates: one for web scraping, which can be reused in other projects, and the other for visualizing the results using the first crate.
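
Cargo workspaces make this kind of split straightforward. A hypothetical top-level layout (the crate names here are illustrative, not the repository's exact ones):

TOML
[workspace]
members = [
    "scraping",           # reusable web-scraping library crate
    "makeup-comparator",  # CLI crate that depends on the library
]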

Code coverage in Rust

What is code coverage

Code coverage is a metric that measures how much of your code is exercised by your tests. It is especially useful with unit testing to ensure that your tests cover the code paths you actually intend to.
Code coverage is conventionally expressed as a percentage, where a lower value indicates that less of the code is exercised.

Metrics used for code coverage

While lines of code serve as one metric for assessing code coverage, it is crucial to acknowledge that they are not the sole determinant of comprehensive code protection. Various units of measurement contribute to achieving well-rounded coverage, such as function coverage, statement coverage, branch coverage and condition coverage.

  • Function coverage: Is a vital metric that quantifies the extent to which the defined functions are actually invoked or called during program execution. By measuring function coverage, developers can assess the effectiveness of their tests in exercising all the functions and identifying any potential gaps in code execution.
  • Statement coverage: Is a fundamental metric used to evaluate the degree to which statements are executed during program run-time. It measures the proportion of statements that are traversed and executed by the test suite. By examining statement coverage, developers can gain insights into the thoroughness of their tests in terms of exploring different code paths and identifying any unexecuted or potentially problematic statements.
  • Branch coverage: Is a crucial metric that assesses the extent to which different branches of code bifurcations are executed by the test suite. It specifically measures whether all possible branches, such as those within if-else or if-elseif-else conditions, are exercised during program execution. By analyzing branch coverage, developers can determine whether their tests adequately explore various code paths, ensuring comprehensive validation of all possible branch outcomes. This helps identify potential gaps in testing and increases confidence in the reliability and robustness of the code.
  • Condition coverage: Is a metric used to evaluate the adequacy of tests in terms of covering all possible outcomes of boolean sub-expressions. It measures whether different possibilities, such as true or false evaluations of conditions, are effectively tested. By assessing condition coverage, developers can ensure that all potential combinations and variations within boolean expressions are thoroughly examined, mitigating the risk of undetected issues related to specific condition outcomes.

Given the next code snippet:

Rust
pub fn add(x: usize, y: usize) -> usize {
    let mut z = 0;

    if x > 0 && y > 0 {
        z = x;
    }
    z
}

  • Function coverage will be achieved when the function add is executed.
  • Statement coverage will be achieved when the function add is called, such as add(1, 1), ensuring that all the lines within the function are executed.
  • Branch coverage will be achieved when the function is called with both add(1, 0) and add(1, 1): the first call skips the if branch, leaving z = x; unexecuted, while the second call enters it.
  • Condition coverage will be achieved when the function is called with add(1, 0), add(0, 1), and add(1, 1), encompassing all possible conditions within the if statement.
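
These four levels map directly onto a test suite. The following sketch reproduces the add function from above so it is self-contained, and shows the three calls that together take it from function coverage all the way to full condition coverage:

```rust
pub fn add(x: usize, y: usize) -> usize {
    let mut z = 0;
    if x > 0 && y > 0 {
        z = x;
    }
    z
}

#[cfg(test)]
mod tests {
    use super::add;

    #[test]
    fn both_positive() {
        // Enters the `if` branch: gives function and statement coverage.
        assert_eq!(add(1, 1), 1);
    }

    #[test]
    fn first_condition_false() {
        // `x > 0` is false, so the branch is skipped: branch coverage.
        assert_eq!(add(0, 1), 0);
    }

    #[test]
    fn second_condition_false() {
        // `x > 0` is true but `y > 0` is false: needed for condition
        // coverage of the two `&&` sub-expressions.
        assert_eq!(add(1, 0), 0);
    }
}

fn main() {
    // The same assertions as the test cases, runnable as a binary.
    assert_eq!(add(1, 1), 1);
    assert_eq!(add(0, 1), 0);
    assert_eq!(add(1, 0), 0);
    println!("all coverage cases exercised");
}
```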

LLVM-based coverage

In Rust 1.60.0, support for LLVM-based coverage instrumentation was stabilized in rustc.

It is called source-based because it operates on AST (abstract syntax tree) and preprocessor information.

Code coverage relies on three basic steps:

  1. Compiling with coverage enabled: Enabling code coverage during compilation in clang requires the inclusion of two specific flags: -fprofile-instr-generate and -fcoverage-mapping.
  2. Running the instrumented program: When the instrumented program concludes its execution, it generates a raw profile file. The path for this file is determined by the LLVM_PROFILE_FILE environment variable. If the variable is not defined, the file will be created as default.profraw in the program’s current directory. If the specified folder does not exist, it will be generated accordingly. The program replaces specific pattern strings with the corresponding values:
    • %p: Process ID.
    • %h: Hostname of the machine.
    • %t: The value of TMPDIR env variable.
    • %Nm: Instrumented binary signature. If N is not specified (run as %m), it is assumed to be N=1.
    • %c: This string does not expand to any specific value but serves as a marker to indicate that execution is constantly synchronized. Consequently, if the program crashes, the coverage results will be retained.
  3. Creating coverage reports: Raw profile files have to be indexed before generating coverage reports. This indexing is performed by llvm-profdata.
Bash
llvm-profdata merge -sparse foo.profraw -o foo.profdata

Generate code coverage in Rust

To generate coverage reports for our tests, we can follow the documentation provided by Rust and LLVM. The first step is to install the LLVM tools.

Bash
rustup component add llvm-tools-preview

After installing the LLVM tools, we can proceed to generate the code coverage. It is highly recommended to delete any previous results to avoid potential issues. To do so, we need to execute the following sequence of commands:

Bash
# Remove possible existing coverages
cargo clean && mkdir -p coverage/ && rm -r coverage/*
CARGO_INCREMENTAL=0 RUSTFLAGS='-Cinstrument-coverage' LLVM_PROFILE_FILE='coverage/cargo-test-%p-%m.profraw' cargo test

This will execute cargo test and record coverage for all the tests executed. The process generates the *.profraw files that contain the coverage information.

Generate HTML reports

To visualize the coverage results more effectively, we can utilize the grcov tool, which generates HTML static pages. Installing grcov is straightforward and can be done using cargo.

Bash
cargo install grcov


Once grcov is installed, we can proceed to generate the HTML files that will display the coverage results.

Bash
grcov . --binary-path ./target/debug/deps/ -s . -t html --branch --ignore-not-existing --ignore '../*' --ignore "/*" -o target/coverage/

  • --binary-path: Set the path to the compiled binaries that will be used.
  • -s: Specify the root directory of the source files.
  • -t: Set the desired output type. Options include:
    • html for a HTML coverage report.
    • coveralls for the Coveralls specific format.
    • lcov for the lcov INFO format.
    • covdir for the covdir recursive JSON format.
    • coveralls+ for the Coveralls specific format with function information.
    • ade for the ActiveData-ETL specific format.
    • files to only return a list of files.
    • markdown for an easy-to-read Markdown summary.
  • --branch: Enable parsing of branch coverage information.
  • --ignore-not-existing: Ignore source files that cannot be found on the disk.
  • --ignore: Ignore files or directories specified as globs.
  • -o: Specifies the output path.

Upon successful execution of the aforementioned command, an index.html file will be generated inside the target/coverage/ directory. The resulting HTML file will provide a visual representation of the coverage report, presenting the coverage information in a structured and user-friendly manner.

Generate lcov files

Indeed, in addition to HTML reports, generating an lcov file can be beneficial for visualizing the coverage results with external tools like Visual Studio Code. To generate an lcov file using grcov, use the same command as before but replace the output type html with lcov. The resulting file can be imported into various coverage-analysis tools, providing a comprehensive overview of the code coverage in a standardized format.

Bash
grcov . --binary-path ./target/debug/deps/ -s . -t lcov --branch --ignore-not-existing --ignore '../*' --ignore "/*" -o target/coverage/lcov.info

Finally, we can install an extension to interpret this information. In my case, I use Coverage Gutters.

Our VS Code will then look something like this, letting us see visually and dynamically which lines of our code are being tested, along with the total percentage for the current file.

This is the script I recommend using to generate the code coverage with both HTML and LCOV output:

Bash
#!/bin/bash

# Define color variables
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

function cleanup() {
  echo -e "${YELLOW}Cleaning up previous coverages...${NC}"
  cargo clean && mkdir -p coverage/ && rm -r coverage/*
  echo -e "${GREEN}Success: Crate cleaned successfully${NC}" 
}

function run_tests() {
  echo -e "${YELLOW}Compiling and running tests with code coverage...${NC}"
  CARGO_INCREMENTAL=0 RUSTFLAGS='-Cinstrument-coverage' LLVM_PROFILE_FILE='coverage/cargo-test-%p-%m.profraw' cargo test --workspace
  if [[ $? -ne 0 ]]; then
    echo -e "${RED}Error: Tests failed to execute${NC}"
    exit 1
  fi
  echo -e "${GREEN}Success: All tests were executed correctly!${NC}"
}

function generate_coverage() {
  echo -e "${YELLOW}Generating code coverage...${NC}"
  grcov . --binary-path ./target/debug/deps/ -s . -t html --branch --ignore-not-existing --ignore '../*' --ignore "/*" -o target/coverage/ && \
  grcov . --binary-path ./target/debug/deps/ -s . -t lcov --branch --ignore-not-existing --ignore '../*' --ignore "/*" -o target/coverage/lcov.info
  if [[ $? -ne 0 ]]; then
    echo -e "${RED}Error: Failed to generate code coverage${NC}"
    exit 1
  fi
  echo -e "${GREEN}Success: Code coverage generated correctly!${NC}"
}

echo -e "${GREEN}========== Running test coverage ==========${NC}"
echo

cleanup

run_tests

generate_coverage

Using Tarpaulin

There is an existing powerful tool called Tarpaulin that can do all this for us. We can install it with cargo:

Bash
cargo install cargo-tarpaulin

And finally, execute with a single command:

Bash
cargo tarpaulin
Jun 18 16:54:56.341  INFO cargo_tarpaulin::config: Creating config
Jun 18 16:54:56.433  INFO cargo_tarpaulin: Running Tarpaulin
Jun 18 16:54:56.433  INFO cargo_tarpaulin: Building project
Jun 18 16:54:56.433  INFO cargo_tarpaulin::cargo: Cleaning project
   Compiling rust_sandbox v0.1.0 (/home/imrubensi/rust_sandbox)
    Finished test [unoptimized + debuginfo] target(s) in 0.35s
Jun 18 16:54:56.907  INFO cargo_tarpaulin::process_handling::linux: Launching test
Jun 18 16:54:56.907  INFO cargo_tarpaulin::process_handling: running /home/imrubensi/rust_sandbox/target/debug/deps/rust_sandbox-0bab96c8aa79a774

running 1 test
test tests::it_works ... ok

test result: ok. 1 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.02s

Jun 18 16:54:57.137  INFO cargo_tarpaulin::report: Coverage Results:
|| Uncovered Lines:
|| Tested/Total Lines:
|| src/lib.rs: 4/4
|| 
100.00% coverage, 4/4 lines covered

Branch and condition coverage are the two main features Tarpaulin currently lacks. Apart from that, I do think Tarpaulin is a good choice if you want to quickly assess your code coverage.

What are Git Hooks and how to use them in your Rust projects

What is a Git Hook?

Git hooks are script files that can be executed in response to specific events or operations. They can be classified into two distinct categories: client-side hooks and server-side hooks.

How to create a Git Hook

Git hooks are located within the .git/hooks directory. During the initialization of a repository using git init, a set of example hooks is generated, covering various types of hooks. These examples are suffixed with .sample, and in order to utilize them, the suffix must be removed. The provided examples are predominantly written as shell scripts, incorporating some Perl, but any appropriately named executable scripts can be used—be it Ruby, Python, or any other language you are familiar with.

Each hook possesses its own unique interpretation, necessitating a review of the provided examples or documentation to determine whether specific parameters are required, if a return value is expected, and other relevant specifications.
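
As a concrete illustration, here is a minimal sketch of a commit-msg hook. Git calls this script with one argument, the path of the file holding the commit message, and a non-zero exit status aborts the commit; the length rule itself is an invented example:

```shell
#!/bin/sh
# Hypothetical commit-msg hook: reject commit messages whose first
# line is shorter than 10 characters.

check_message() {
  first_line=$(head -n 1 "$1")
  if [ "${#first_line}" -lt 10 ]; then
    echo "commit-msg: first line too short (minimum 10 characters)" >&2
    return 1
  fi
}

# Demonstration with a temporary file instead of a real commit;
# installed as .git/hooks/commit-msg it would run `check_message "$1"`.
tmp=$(mktemp)
printf 'Add coverage script for CI\n' > "$tmp"
check_message "$tmp" && echo "message accepted"
rm -f "$tmp"
```

Make the file executable (chmod +x .git/hooks/commit-msg) or Git will silently skip it.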

List of Git Hooks

  • applypatch-msg: Checks the commit log message taken by applypatch from an e-mail message. This script can modify the commit message. Exit with a non-zero status to stop the commit.
  • post-update: Prepares a packed repository for use over dumb transports.
  • pre-applypatch: Executed by git am after the patch is applied but before the resulting commit is made.
  • prepare-commit-msg: Called by git commit to prepare the default commit message. It receives the name of the file that holds the commit message and, optionally, the source of the message and a commit SHA-1. If the script returns a non-zero status, the commit is aborted.
  • commit-msg: Called with one argument, the name of the file that holds the commit message. This script can modify the commit message. Exit with a non-zero status to stop the commit.
  • pre-commit: Executed when you run git commit. It takes no arguments, but it is useful for aborting the commit by returning a non-zero status with a message, e.g. stopping the commit if the code is not properly formatted or if tests are failing.
  • post-commit: Runs once git commit has finished. It takes no arguments and is usually used to print custom messages.
  • pre-merge-commit: Executed when you are about to perform a git merge. It takes no arguments; return a non-zero status to stop the merge.
  • pre-push: Executed when you run git push, after checking the remote status but before anything is pushed. It takes two arguments: the name of the remote and the URL to which the push will be done. If this script returns a non-zero status, the push is stopped.
  • pre-rebase: Executed before git rebase starts. It takes two arguments: the upstream the series was forked from and the branch being rebased (empty when rebasing the current branch). If it returns a non-zero status, the rebase is aborted.
  • post-checkout: Executed after a successful git checkout.

You can check the complete list at the following URL: Official list with all the git hooks

Setup Git Hooks with Rust

A wide range of freely available crates exists to enhance the implementation of git hooks in Rust projects. Personally, I have chosen to utilize the rusty-hook crate.

There are two methods for incorporating this crate into our project. The first approach involves adding it as a dev dependency in our Cargo.toml file. Upon building the project for the first time, it will be automatically initialized:

TOML
[dev-dependencies]
rusty-hook = "^0.11.2"

Or install and initialize it:

Bash
cargo install rusty-hook
rusty-hook init

How to use it

Once the crate is initialized, a new file named .rusty-hook.toml will appear at the root of our project. This file is all we need to work with rusty-hook.

The first time we open the file, everything is already set up:

TOML
[hooks]
pre-commit = "cargo test"

[logging]
verbose = true

Under the [hooks] section, we have the ability to assign a custom script that will be executed for a specific git hook. The following example, extracted from a personal project, showcases a script that formats the code, runs clippy (a Rust linter), and executes all tests in the project before committing the staged changes.

TOML
[hooks]
pre-commit = "cargo fmt && git add . && cargo check && cargo test && cargo test --workspace -- --include-ignored --show-output && cargo clippy"
pre-push = "cargo fmt -- --check || cargo fmt && cargo fmt -- --check"

References:

About the keyword generics initiative in Rust

Introduction

In July 2022, a new initiative was presented by Yoshua Wuyts on the Inside Rust Blog. This initiative aims to make it possible to generalize between async and non-async traits and functions, which means we would no longer have to split traits and functions in two to handle both contexts.

I am not going to focus on the first published document, because a week ago a second update on the initiative was released, this time also including the goal of making const and non-const traits and functions generic.

At the time I am writing this article, Rust 1.69 is the latest nightly version. Keep that in mind, since this initiative focuses on changes that, at this moment, are not even available in standard Rust, such as async traits.

Async traits in Rust

When it comes to defining async functions in traits in Rust, the only way to do it is with the async_trait crate. If you are curious about why async trait methods are not yet standard in Rust, I highly recommend reading this article.

If we try to declare a trait function as async, the following error message is shown when compiling our crate:

Rust
error[E0706]: functions in traits cannot be declared `async`
 --> src/main.rs:2:5
  |
2 |     async fn Foo() {
  |     -----^^^^^^^^^
  |     |
  |     `async` because of this
  |
  = note: `async` trait functions are not currently supported
  = note: consider using the `async-trait` crate: https://crates.io/crates/async-trait
  = note: see issue #91611 <https://github.com/rust-lang/rust/issues/91611> for more information
  = help: add `#![feature(async_fn_in_trait)]` to the crate attributes to enable

But even using the recommended crate for asynchronous functions, we still lack generalization. The following code snippet fails to compile:

Rust
#[async_trait]
trait MyTrait {
    async fn bar(&self);
}

struct FooAsync;
struct FooNotAsync;

#[async_trait]
impl MyTrait for FooAsync {
    async fn bar(&self) {}
}

impl MyTrait for FooNotAsync {
    fn bar(&self) {} 
}

//////////////////////////////////////
  --> src/main.rs:17:11
   |
5  |     async fn bar(&self);
   |     -----    ---------- lifetimes in impl do not match this method in trait
   |     |
   |     this bound might be missing in the impl
...
17 |     fn bar(&self) {}
   |           ^ lifetimes do not match method in trait

The only options, then, are to create two different traits or to declare two different functions.

What is the KGI proposing?

The Keyword Generics Initiative (KGI from now on) is proposing a new syntax for declaring traits and functions generic over asyncness and constness, the same way we already have generalization over types.

Throughout this article, I will focus only on asyncness generalization, since the explanation is the same for constness.

This is the proposed syntax at this moment:

Rust
trait ?async Read {
    ?async fn read(&mut self, buf: &mut [u8]) -> Result<usize>;
    ?async fn read_to_string(&mut self, buf: &mut String) -> Result<usize> { ... }
}

/// Read from a reader into a string.
?async fn read_to_string(reader: &mut impl ?async Read) -> std::io::Result<String> {
    let mut string = String::new();
    reader.read_to_string(&mut string).await?;
    Ok(string)
}

Yay! Finally we have our generalization over asyncness. The ?async syntax was chosen to match existing examples like ?Sized. If we call read_to_string in an asynchronous context, it will suspend at the .await? call until the future completes. If we call it in a non-async context, the .await? simply acts as a no-op.

Rust
// `read_to_string` is inferred to be `!async` because
// we didn't `.await` it, nor expected a future of any kind.
#[test]
fn sync_call() {
    let _string = read_to_string("file.txt")?;
}

// `read_to_string` is inferred to be `async` because
// we `.await`ed it.
#[async_std::test]
async fn async_call() {
    let _string = read_to_string("file.txt").await?;
}

In addition to that, the KGI is also proposing a new built-in function called is_async(). This would let the user create control-flow branches depending on the calling context.
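
Under the proposal, such a branch might look like the following sketch. Note that this is hypothetical proposal syntax that does not compile in any Rust version today, and the function body is illustrative:

Rust
?async fn read_to_string(reader: &mut impl ?async Read) -> std::io::Result<String> {
    if is_async() {
        // Asynchronous context: we could, e.g., read with a timeout.
    } else {
        // Synchronous context: plain blocking behaviour.
    }
    // ...
}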

Advantages

The advantages are pretty obvious: this is a powerful way to avoid code duplication. It is almost like magic: just one function that users can call in different contexts.

This would help many crates, and even the standard library, reduce their existing split implementations for async and non-async contexts. It would even let library authors declare all their traits as “maybe async” and let users choose which version to implement.

Disadvantages

Syntax complexity

The main disadvantage raised by the community is the syntax. I also think that Rust’s syntax is becoming more and more complex, and with one of the Rust team’s main objectives being to make Rust easier to learn, this will not help at all.

I said that I didn’t want to include the “maybe const” signature in this article, but it is important to mention that the two can be used together, e.g.:

Rust
trait ?const ?async Read {
    ?const ?async fn read(&mut self, buf: &mut [u8]) -> Result<usize>;
    ?const ?async fn read_to_string(&mut self, buf: &mut String) -> Result<usize> { .. }
}

/// Read from a reader into a string.
?const ?async fn read_to_string(reader: &mut impl ?const ?async Read) -> io::Result<String> {
    let mut string = String::new();
    reader.read_to_string(&mut string).await?;
    Ok(string)
}

The KGI is also considering a combined syntax for this, called ?effect/.do, but I will not go into it here because it is not part of the actual proposal.

So… hard to understand that code for “just” a “maybe async and maybe const” trait and functions, right? Let’s make it even more fun: let’s add generic types, e.g.:

Rust
trait ?const ?async Read {
    ?const ?async fn read<T: Display + Add>(&mut self, buf: &mut ?const ?async T) -> Result<T, Box<dyn Error>>;
}

This is getting out of hand… I would spend half of my working day just reading function signatures. For those curious, that is 111 characters just to declare a function with one input parameter.

Low implementation efforts

It is easy to imagine a lot of people using this feature to declare functions vaguely: “I don’t know if this will be async or not, so let’s declare it as maybe.” This would cause a lot of half implementations and smelly code, where a function is marked ?async and called from both async and non-async contexts without genuinely supporting both.

I think this is where the compiler should step in. We should avoid at all costs the possibility of a user calling a function in an async context only to later realize that the implementation is completely synchronous. The compiler should check that both an async and a non-async flow are defined. This could be difficult if we want to keep everything opaque to the user by simply interpreting .await? as a no-op in a synchronous context, but it becomes easier with the proposed is_async() and is_const() built-ins.

My proposed changes

Since I would really like to see this running in the future, I would also like to propose some of my thoughts on how we could improve it.

Syntax

I really like the ?async syntax, and I prefer it over the async? form that some other users were proposing. The question-mark prefix is already used for trait bounds such as ?Sized, and I think it fits the “maybe async” context well.

Even though ?effect/.do is not formally proposed, I hope the combination of ?async and ?const does not go through. It would add a great deal of syntactic complexity, and Rust is aiming for exactly the opposite at the moment.

Implementation restrictions

It is important to ensure that the user never faces a moment where they are calling a function in an asynchronous context while the function works entirely synchronously. The compiler must check that the requirements are satisfied for both contexts.

Another option is to split the implementation into two functions, e.g.:

Rust
trait ?async Read {
    ?async fn read(&mut self, buf: &mut [u8]) -> Result<usize>;
    ?async fn read_to_string(&mut self, buf: &mut String) -> Result<usize> { ... }
}

/// Async function.
async fn read_to_string(reader: &mut impl async Read) -> std::io::Result<String> {
    let mut string = String::new();
    reader.read_to_string(&mut string).await?;
    Ok(string)
}

/// Non-async function.
fn read_to_string(reader: &mut impl Read) -> std::io::Result<String> {
    let mut string = String::new();
    reader.read_to_string(&mut string)?;
    Ok(string)
}

Even though this still produces some code duplication, it has the advantage of declaring the trait once and dispatching to the correct function without user action.

Swap_remove in Rust

Introduction

The other day, while looking at the std Vec documentation in Rust, I found an interesting function called swap_remove.

First, we have to talk about the obvious difference between the standard remove, which everybody has used before, and swap_remove, a trickier implementation suited to certain use cases:

Remove

Remove, as defined in its documentation, has the following function signature: pub fn remove(&mut self, index: usize) -> T. It receives the vector itself (mutably) and the index of the element to be removed, and returns that element.

It is important to mention how Rust (and almost every other language) handles vectors. Vectors in computer science are defined as a collection of elements contiguous in memory; that means the distance between two consecutive elements of a vector is the size of one element.

4 blocks with the values 1, 2, 3, 4, split into 4 bytes each since they are 4-byte signed integers

Once we delete an element in the middle of the structure (if you want to delete the first or last element, Rust recommends using VecDeque instead), Rust has to shift the remaining elements to avoid empty spots in the collection. This work has O(n) complexity in the worst case, which means that if we have one million elements and we want to remove the second one, the vector has to shift 999,998 elements!
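
We can observe both properties directly: elements sit a fixed number of bytes apart, and remove shifts every later element one slot to the left. A quick sketch:

```rust
fn main() {
    let mut v = vec![1, 2, 3, 4];

    // `remove` preserves order: every element after index 1 is
    // shifted one position to the left, an O(n) operation.
    let removed = v.remove(1);
    assert_eq!(removed, 2);
    assert_eq!(v, vec![1, 3, 4]);

    // Elements of a Vec are contiguous: consecutive `i32`s are
    // exactly size_of::<i32>() = 4 bytes apart.
    let first = &v[0] as *const i32 as usize;
    let second = &v[1] as *const i32 as usize;
    assert_eq!(second - first, std::mem::size_of::<i32>());

    println!("order preserved after remove: {:?}", v);
}
```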

Swap_remove

The function swap_remove is useful for handling this problem. The operation swaps the last element with the one to be erased and then pops it from the back to remove it. E.g., if we remove the value 2, the vector will look like the following:

3 blocks with the values 1, 4, 3, split into 4 bytes each since they are 4-byte signed integers
Since accesses and back insertions in a vector are O(1), this gives Vec::swap_remove a complexity of O(1), which means we perform 999,997 fewer operations than with the remove method. Cool, right?

When to use it or not

After that explanation, it seems like we should always use swap_remove instead of remove, right? That’s wrong!

Vectors are a powerful structure when we keep their values sorted. That invalidates the whole point of using swap_remove, since we would have to sort the values again.

Rust
let mut v: Vec<i32> = vec![1, 2, 3, 4];
assert!(v.is_sorted());
v.swap_remove(1); // v = [1, 4, 3]
assert!(!v.is_sorted());

We also have to be careful when working with raw pointers in Rust, since dereferencing them is unsafe; below are some errors we can run into.

Rust
let mut v: Vec<i32> = vec![1, 2, 3, 4];
let points_to_four = &v[3] as *const i32;
let points_to_two = &v[1] as *const i32;
assert_eq!(unsafe{*points_to_four}, 4);
assert_eq!(unsafe{*points_to_two}, 2);
v.swap_remove(1);
assert_eq!(unsafe{*points_to_four}, 4); // May become false at some point we cannot control
assert_ne!(unsafe{*points_to_two}, 2); // Now points to the value 4 at v[1]

This test was very interesting. It seems that Rust copies the element to swap instead of moving it, which is understandable since i32 implements the Copy and Clone traits.

These are unsafe operations and we have to be very careful about what we are doing: if our Vec<T> held a T that is !Copy, the assert might or might not hold. By the way, even though the assert is true now because the value was copied, the slot past the new length is logically uninitialized, and reading it is undefined behavior that could stop working as soon as something reuses that memory. The same applies to vec::remove.

Rust
let mut v: Vec<i32> = vec![1, 2, 3, 4];
let last_ptr = &v[3] as *const i32;
assert_eq!(unsafe{last_ptr.read()}, 4);
v.swap_remove(1);
assert_eq!(unsafe{last_ptr.read()}, 4); // Still true
v.insert(1, 10);
assert_eq!(unsafe{last_ptr.read()}, 4); // FAILS: left: 3, right: 4. The insert shifted the value 3 one slot to the right, into the slot last_ptr points to.

Conclusion

We should use vec::remove when we care about the order and vec::swap_remove otherwise. Working with raw pointers while removing elements from a vector is highly unsafe but, at worst… we have the same as C++, right?

EasyGA 1.2.0 is out!

Current Crates.io Version

Hello rustaceans! Version 1.2.0 of the Easy_GA crate is finally out! As always, you can find the repository on Github and the crate on crates.io

The full CHANGELOG is available in CHANGELOG.md

[1.2.0]

  • Added new SelectionAlgorithm::Random.
  • Added new SelectionAlgorithm::Stochastic.
  • SelectionAlgorithm now derives PartialEq, Eq, PartialOrd and Ord.
  • Added unit testing for the Selection trait.
  • Logs are now saved inside target/easy_ga/logs instead of target/tmp/.

Memory alignment and layout in Rust

Introduction

When working with data values, the way we align them is not arbitrary. At first glance we might think we should just store one value next to the other no matter what. This is often true, but not always; in this article I am going to explain the different ways Rust gives us to explicitly define how we want our structures to be aligned.

Layout alignment in Rust

The Rust compiler does not guarantee that it generates the same layout on every compilation. This means that the layout of our data structures may change every time we change the code. Sometimes this is good (there is obviously a reason why Rust does it), but sometimes it can lead to performance issues we do not want. It is always a fight between two properties: size vs speed.

Size vs Speed

This is the main reason to choose one layout representation or another. Sometimes we are working with low-capacity microcontrollers and we need that extra space optimization to store our structures. In those situations it is better to lay out our data structures as sequentially as possible.

On the other hand, sometimes we face optimization issues where our code is not fast enough to perform the task we need. The cache and the way we access memory are the keys to a well-organized data layout.

We are not going to discuss how the cache and memory access work in computers, as that is beyond the scope of this article.

Default representation

With the #[repr] attribute we can specify the way we want our values to be represented in the layout. If no attribute is present, Rust uses its own default representation, which does not guarantee any particular data layout.
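As a quick illustration (the struct names here are my own), we can compare the default representation with #[repr(C)]; only the latter's layout is guaranteed:

```rust
use std::mem;

// Same fields, different representation guarantees.
struct DefaultLayout {
    a: u8,
    b: u32,
    c: u8,
}

#[repr(C)]
struct CLayout {
    a: u8,
    b: u32,
    c: u8,
}

fn main() {
    // repr(C) must keep declaration order: 1 + 3 (padding) + 4 + 1 + 3 (padding) = 12.
    assert_eq!(mem::size_of::<CLayout>(), 12);
    // The default representation is free to reorder fields to shrink the
    // struct; its exact size is unspecified, so we only print it.
    println!("default layout size: {}", mem::size_of::<DefaultLayout>());
}
```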

C representation

The #[repr(C)] attribute lets us represent our data structures the way C does, which allows Rust to interoperate with C. The C representation has some well-defined properties:

  • Structure alignment: The alignment of the struct is the alignment of its most-aligned field. E.g. if we have a struct T with two fields, bool and u32, the structure alignment will be 4 bytes (based on u32).
  • Size and offset: For each field in declaration order in the struct, first determine the size and alignment of the field. If the current offset is not a multiple of the field’s alignment, then add padding bytes to the current offset until it is a multiple of the field’s alignment. The offset for the field is what the current offset is now. Then increase the current offset by the size of the field. Finally, the size of the struct is the current offset rounded up to the nearest multiple of the struct’s alignment.

Pseudocode:

Rust
/// Returns the amount of padding needed after `offset` to ensure that the
/// following address will be aligned to `alignment`.
fn padding_needed_for(offset: usize, alignment: usize) -> usize {
    let misalignment = offset % alignment;
    if misalignment > 0 {
        // round up to next multiple of `alignment`
        alignment - misalignment
    } else {
        // already a multiple of `alignment`
        0
    }
}

struct.alignment = struct.fields().map(|field| field.alignment).max();

let current_offset = 0;

for field in struct.fields_in_declaration_order() {
    // Increase the current offset so that it's a multiple of the alignment
    // of this field. For the first field, this will always be zero.
    // The skipped bytes are called padding bytes.
    current_offset += padding_needed_for(current_offset, field.alignment);

    struct[field].offset = current_offset;

    current_offset += field.size;
}

struct.size = current_offset + padding_needed_for(current_offset, struct.alignment);

Simple and small example extracted from allaboutcircuits:

Rust
#[repr(C)]
struct T {
    c: u32,
    d: bool,
}
Image from: https://www.allaboutcircuits.com/technical-articles/understanding-memory-structures-in-embedded-c-language/
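We can verify those numbers directly with std::mem; this is a small self-contained check of the example above:

```rust
use std::mem;

#[repr(C)]
struct T {
    c: u32,
    d: bool,
}

fn main() {
    // The most-aligned field is the u32, so the struct alignment is 4.
    assert_eq!(mem::align_of::<T>(), 4);
    // Offsets: c at 0 (4 bytes), d at 4 (1 byte) -> 5 bytes,
    // rounded up to the next multiple of the alignment -> 8.
    assert_eq!(mem::size_of::<T>(), 8);
}
```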

#[repr(C)] for unions and enums

Unions

With #[repr(C)], a union has the size of its largest field rounded up to the union's alignment, and the alignment of its most-aligned field.

Rust
#[repr(C)]
union MyUnion {
    f1: u64,
    f2: [u32; 8],
}

fn main() {
    assert_eq!(std::mem::size_of::<MyUnion>(), std::mem::size_of::<u32>() * 8);
    assert_eq!(std::mem::align_of::<MyUnion>(), std::mem::align_of::<u64>());
}

Enums with fields

The representation of an enum with fields is a struct with two fields, also called a “tagged union” in C:

Rust
// This Enum has the same representation as ...
#[repr(C)]
enum MyEnum {
    A(u32),
    B(f32, u64),
}

// ... this struct.
#[repr(C)]
struct MyEnumRepr {
    tag: MyEnumDiscriminant,
    payload: MyEnumFields,
}

// This is the discriminant enum.
#[repr(C)]
enum MyEnumDiscriminant { A, B }

// This is the variant union.
#[repr(C)]
union MyEnumFields {
    A: MyAFields,
    B: MyBFields,
}

#[repr(C)]
#[derive(Copy, Clone)]
struct MyAFields(u32);

#[repr(C)]
#[derive(Copy, Clone)]
struct MyBFields(f32, u64);

fn main() {
    assert_eq!(std::mem::size_of::<MyEnum>(), std::mem::size_of::<MyEnumRepr>());
}

References

  1. https://doc.rust-lang.org/reference/type-layout.html
  2. Rust for Rustaceans: Idiomatic Programming for Experienced Developers (B0957SWKBS)

Easy_GA 1.1.0 is out!

Current Crates.io Version

Hello rustaceans! Version 1.1.0 of the Easy_GA crate is finally out! As always, you can find the repository on Github and the crate on crates.io

The full CHANGELOG is available in CHANGELOG.md

[1.1.0]

  • Added benchmarks. You can run them using cargo bench. The benches are all inside benches/.
  • Added a logger feature to get information about the execution.
  • Added logger::VerbosityLevel to filter the output.
  • Added logger::VerbosityType to choose between LOG, SAVE and LOG_AND_SAVE.

Logger

The logger is a very useful tool to measure and retrieve data from the execution. By default the logger is disabled; you can enable it this way:

use easy_ga::VerbosityLevel; // Verbosity level {DISABLED, LOW, MID, HIGH}
use easy_ga::VerbosityType; // Verbosity type {LOG, SAVE, LOG_AND_SAVE}
use easy_ga::LOG_verbosity; // Sets the verbosity level.
use easy_ga::LOG_verbosity_type; // Sets the verbosity type.

LOG_verbosity(VerbosityLevel::LOW); // VerbosityLevel::DISABLED by default
LOG_verbosity_type(VerbosityType::LOG_AND_SAVE); // VerbosityType::LOG by default
  • VerbosityLevel:
    • DISABLED: The logs are disabled.
    • LOW: Only the most useful information.
    • MID: Less essential but still useful information.
    • HIGH: All logs are available, including tracing logs.
  • VerbosityType:
    • LOG: Only terminal logs.
    • SAVE: Saves the logs into target/tmp/.
    • LOG_AND_SAVE: Both.

Benchmarking

Benchmarking was added in version 1.1.0; you can run the benchmarks by downloading the repository and running cargo bench from the command line. The benchmarks are placed inside the benches/ folder.

Easy_GA

Current Crates.io Version

Crates: https://crates.io/crates/easy_ga

Github repository: https://github.com/RubenRubioM/easy_ga

Easy_GA is a genetic algorithm library made for Rust projects. It provides full customization for your own genotype definitions and a genetic algorithm implementation to wrap all the common logic within a genetic algorithm.

Features

  • trait Gene: Definition to implement for your custom genotypes.
  • trait Selection: Definition for your custom selection algorithms.
    • Roulette: Selection algorithm already implemented.
    • Tournament: Selection algorithm implementation with n members.
  • GeneticAlgorithm: The main class to wrap the business logic in the genetic algorithm execution.

Usage

In your Cargo.toml you have to add the Easy_GA dependency:

[dependencies]
easy_ga = "*"

Now I will show you a basic example of Easy_GA that you can find in main.rs

Files to include in order to use features:

use easy_ga::Gene; // For defining our own gene.
use easy_ga::GeneticAlgorithm; // To create a GeneticAlgorithm.
use easy_ga::SelectionAlgorithms; // To specify a concrete SelectionAlgorithm.

Definition of a custom Gene implementing easy_ga::Gene trait:

#[derive(Clone, Copy)]
struct MyGene {
    // Fields.
    fitness: f64 // Recommended to avoid recalculating the fitness in `get_fitness`
}

impl Gene for MyGene {
    fn init() -> Self {
        // Gene constructor.
    }

    fn calculate_fitness(&mut self) -> f64 {
        // Fitness function.
    }

    fn crossover(&self, other: &Self) -> Self {
        // Crossover implementation.
    }

    fn mutate(&mut self) {
        // Mutation implementation.
    }

    fn get_fitness(&self) -> f64 {
        // Returns the fitness
    }
}

At this moment, we need to implement the Clone & Copy traits for our Gene. I will try to remove that requirement in future versions.


Initialization of our GeneticAlgorithm:

let genetic_algorithm = GeneticAlgorithm::<MyGene>::new()
            .population_size(20)
            .iterations(50)
            .mutation_rate(0.10)
            .selection_rate(0.90)
            .selection_algorithm(Box::new(SelectionAlgorithms::Tournament(10)))
            .fitness_goal(100.0)
            .init().unwrap();

We have other ways to initialize our GeneticAlgorithm, such as GeneticAlgorithm::new_with_values, if we don't want method chaining.


Now that we have defined our genotype and initialized our GeneticAlgorithm, we have 2 ways of running it:

  • GeneticAlgorithm::run: This method runs the algorithm until the end and returns a tuple with (Gene, StopCriteria) that represents the best Gene in the execution and the reason to stop the execution.
let (gene, stop_criteria) = genetic_algorithm.run();
  • Iteration by iteration: We also have the possibility of running the algorithm generation by generation and making modifications while the execution is running.
while genetic_algorithm.is_running() {
    let new_generation: &Vec<MyGene> = genetic_algorithm.next_iteration();
}

Logger

The logger is a very useful tool to measure and retrieve data from the execution. By default the logger is disabled; you can enable it this way:

use easy_ga::VerbosityLevel; // Verbosity level {DISABLED, LOW, MID, HIGH}
use easy_ga::VerbosityType; // Verbosity type {LOG, SAVE, LOG_AND_SAVE}
use easy_ga::LOG_verbosity; // Sets the verbosity level.
use easy_ga::LOG_verbosity_type; // Sets the verbosity type.

LOG_verbosity(VerbosityLevel::LOW); // VerbosityLevel::DISABLED by default
LOG_verbosity_type(VerbosityType::LOG_AND_SAVE); // VerbosityType::LOG by default
  • VerbosityLevel:
    • DISABLED: The logs are disabled.
    • LOW: Only the most useful information.
    • MID: Less essential but still useful information.
    • HIGH: All logs are available, including tracing logs.
  • VerbosityType:
    • LOG: Only terminal logs.
    • SAVE: Saves the logs into target/tmp/.
    • LOG_AND_SAVE: Both.

Benchmarking

Benchmarking was added in version 1.1.0; you can run the benchmarks by downloading the repository and running cargo bench from the command line. The benchmarks are placed inside the benches/ folder.

Next steps

This is a personal side project, mainly for me, so any further implementations will be done in my spare time as a good way to learn more about Rust.

  • Multithreading
  • Add verbosity for debugging ✔
  • More unit testing and system testing
  • New default Selection algorithms
  • CSV and JSON result export
  • Fix some quality of life problems with references and chain calling
  • Add benchmarks ✔

License

Easy_GA is licensed under Mozilla Public License 2.0.