Do nuke your local repository!

Tamás Cservenák

January 3, 2025

Yes, so let me repeat: “Do nuke your local repository” (regularly). Or to rephrase: do not stick to your local repository (irrelevant are we talking about Maven 3 or Maven 4).

Maven local repository is “mish-mash” of things: is a cache, but also some “staging” area as well, and it is very, very easy to get it corrupted. Simplest fix? Nuke it regularly.

Yes, things like split local repository may help, but you still need to exercise huge self-discipline. Again; simplest is to regularly nuke it.

And despite Maven 4 is coming, it will not bring any (radical) improvement in this area, mostly due backward compatibility with Maven 3 plugins and extensions.

But, there is some ray of light…

IMPORTANT: This below is mainly for workstations, later may be extended, but read it while you focus on your own workstation(s).

During holidays had some time to tinker, and one of topics were “what if we introduce a new cache layer between Resolver Local Repository and Resolver Transport?”. At least on workstation, we could have some single reliable cache, that should be reusable no matter how many or what kind of Local Repositories user on workstation uses. Hopefully, this would make user nuke regularly his local repository, no? It would be really enough for start, to handle release artifacts only (so no snapshots, nor metadata), as most often I hear as counter-argument that is time-consuming after nuking to rebuild things, that initial population of local repository takes long time (not with MRM, but more about that later).

Then later followup idea came: okay, so all workstations can have central piece of cache, but what if they publish and share cache contents across LAN? I usually work on 3 computers (1 desktop and 2 laptops), and if I downloaded one huge artifact once, I’d really like to have it shared across LAN, instead to re-download it again after I switched workstation. But let’s go in order.

The current layout of relevant Resolver bits, that are unchanged between Resolver 1.x and 2.x is following:

block-beta
  columns 1
  artifactResolver("Artifact Resolver")
  block
    columns 2
    localRepository("Local Repository")
    connector("Repository Connector")
  end
  artifactResolver --> localRepository
  artifactResolver --> connector
  block
    columns 2
    space
    transport("Transport")
  end
  connector --> transport

What happens (simplified), is that Artifact Resolver checks for Local Repository, and if searched thing not found, will ask for Repository Connector to get it. Repository Connector handles all the “transport related bookkeeping”, so response from it will contain Artifacts that are validated according to effective policies (ie. checksums), and finally, the connector asks Transport to get the content from some remote repository.

For simplicity, I am showing only one (relevant) detail, as similar setup is in Metadata Resolver as well, but it is irrelevant for current discussion.

Meet Mímir

Mímir could sit between Artifact Resolver and Repository Connector, and provide a new layer of cache. For simplicity’s sake currently it handles only release artifacts originating from HTTP-ish remote repositories.

Mímir has two types of “nodes”: Local Node that is user-wide (workstation wide?) and always present, and may have 0 or more Remote Nodes.

block-beta
  columns 1
  artifactResolver("Artifact Resolver")
  block
    localRepository("Local Repository")
    mimir("Mímir Connector")
  end
  artifactResolver --> localRepository
  artifactResolver --> mimir
  block
    session("Mímir session")
    connector("Repository Connector")
  end
  mimir --> session
  mimir --> connector
  block
    lnode("Local node")
    rnode("Remote node")
    transport("Transport")
  end
  session --> lnode
  session --> rnode
  connector --> transport

What happens, is that Mímir can fetch artifacts from the local cache, moreover, if there are node publisher(s) on LAN, will consult them as well.

Mímir In Action

This idea is drafted here: https://github.com/maveniverse/mimir

Note: this is just Proof of Concept, and there is a lot of TODOs, like config and other things. But, it “works for me”.

In short, current Mímir holds Local node under ~/.mimir/local directory, and hardlinks to effective local repository the files (if possible, otherwise plain file copy happens). It also provides one Remote node implementation using JGroups 5.x that will discover “Node publishers” on LAN. Right now, a Node publisher publishes workstation Local node cache via LAN, and has to be manually started and kept alive. Protocol is trivial and remains to be improved: consumer multicasts a lookup message, and nodes will report back with “OK” (hit) or “KO” (miss) messages. The hit messages also contain the IP address and port and a txID, where receiver can connect to to download the file. Consumer will use his own Local node as “write through” cache and will finally offer it to Connector. This means that cache contents on single workstation are tried to keep in minimal copies (via hard-linking) but they do replicate across workstations. Currently, the main intent was to keep dependencies minimum.

Example session

On consumer side: This build uses Mímir extension, and starts with empty local repository (with a cheat: Mímir is present, as it is unreleased yet). The build finishes quickly, and without any remote transport, as Mímir served cached content via LAN (can be seen with -X): https://asciinema.org/a/696900

On publisher side: Here, the workstation has pre-populated Mímir local cache. The publisher is started, that listens on LAN, and serves local cache content to remote. Assuming serving it via LAN is faster that letting LAN neighbor builder go remote: https://asciinema.org/a/696902

There are still rough edges, but for now I intentionally keep everything dead simple.

Update: Version 0.1.0 is released and is on Maven Central! To use it, either add to project in file .mvn/extensions.xml or, if using Maven 4, you can make it user-wide extension by adding it to ~/.m2/extensions.xml:

<?xml version="1.0" encoding="UTF-8"?>
<extensions>
    <extension>
        <groupId>eu.maveniverse.maven.mimir</groupId>
        <artifactId>extension3</artifactId>
        <version>0.1.0</version>
    </extension>
</extensions>

To publish your local cache on LAN, download publisher from here and run it:

$ java -jar jgroups-0.1.0-cli.jar

To stop it, use Ctrl+C in console.