Multiple hash algorithms by pks-t · Pull Request #4438 · libgit2/libgit2

pks-t · 2017-12-08T13:33:37Z

A first step towards an abstraction for the use of multiple of hash algorithms. It lays the groundwork for having git_hash_ctx dispatch to different hash algorithms based on which one is being requested. Next steps:

extend git_repository by a git_hash_algorithm field
implement required new algorithms
extend git_hash_init and git_hash_ctx_init to accept an algorithm which is then used for hashing by that particular context only
extend OIDs to be a union of all possible hashes
scavenge through the tree and adjust all callers. In most cases we'd want to retain SHA1 (e.g. pack trailers, patch IDs, filebuffers), for others we'd simply use whatever the repository format states

pks-t · 2018-06-22T10:28:19Z

Rebased to fix a whole lot of conflicts

pks-t · 2018-06-22T12:12:27Z

Todo: remove this weird separation between git_hash_init and git_hash_ctx_init. It is only used by Win32 to initialize crypto providers, but this API simply sucks. We should just make sure to always call git_hash_global_init and git_hash_sha1_global_init and initialize the crypto provider there.

tiennou · 2018-06-22T20:18:25Z

src/hash.h

+	union {
+		git_hash_sha1_ctx sha1;
+	} ctx;
+	git_hash_algorithm alg;


Maybe put the algo first ? I'm thinking we might want a GIT_HASH_ALGORITHM_CUSTOM, and having that appear first would allow a user-provided struct to be overlayed transparently ?

ethomson · 2018-06-26T14:07:35Z

Todo: remove this weird separation between git_hash_init and git_hash_ctx_init.

It may be weird, but the goal is to avoid allocations in git_hash_init. Building a git_hash_ctx may perform an allocation of an object that contains the hash state / accumulator registers / etc. So this cannot be shared across multiple hashes at once (ie, across threads). On Windows, the BCryptCreateHash can't be moved into the global initialization. (The Windows hashing stuff does load libraries and such in the global startup.)

The current implementation was chosen to be as similar to the existing init / update / finish pattern as possible, just adding a reusable object on top that you need to allocate and free. (I called that object a context, because our constructors are usually called "init" and needed to disambiguate between hash initialization.)

Indeed we could make git_hash_init do any necessary allocation work, and introduce a git_reset to clean it (ie, call the underlying hash't init method) and introduce a git_hash_dispose. (Now that we've settle on that name, we should fix the cleanup name that we're using.)

Certainly moving to having a "hash context" that uses the init / update / finish paradigm was less code churn and the changes were more obvious when we introduced them. I don't love the current API, but I wanted to at least give you some background on why it is the way it is.

pks-t · 2018-06-29T10:05:36Z

Thanks for the clarafication, @ethomson. I didn't realize that hash_ctx_cng_init actually allocates memory, so you've convinced me that the API makes sense. That being said, I think that the names suck and don't quite help to grasp the situation. My proposal would be to have:

git_hash_ctx_init to initialize the context and potentially allocate memory (same)
git_hash_ctx_dispose to free the context's memory (previously git_hash_ctx_free)
git_hash_update to add bytes to compute the hash
git_hash_final to compute the final hash
git_hash_clear to reset the hash to its initial state (previously git_hash_init)

I think that renaming git_hash_init to git_hash_clear would make it a lot more obvious what the distinction between git_hash_init and git_hash_ctx_init are.

pks-t · 2019-02-22T13:49:43Z

Rebased to fix conflicts. I've tried to reduce the scope a bit by only doing the split between generic and SHA1-specific hashing layers. I didn't do any API changes at all

pks-t · 2019-02-22T13:55:04Z

By the way: I noticed that our "Generic" SHA1 implementation doesn't compile right now. This is being fixed in here, too, but it shows that we're not as thorough with testing the different implementations as we maybe should be. Or, alternatively, that we have too many backends and that some aren't used at all.

pks-t · 2019-04-05T09:04:11Z

Rebased to fix conflicts

pks-t · 2019-06-14T08:07:46Z

Rebased to fix conflicts. @ethomson, @tiennou: could we maybe get this merged soonish? This doesn't introduce any behaviour changes yet, but it's required to get started with the actual transition to SHA256. The goal was to get going in small chunks to make later parts easier to review.

tiennou

A few minor, completely ignorable comments. It's pretty unintrusive as well (for now), so 👍.

tiennou · 2019-06-14T10:14:58Z

src/hash.h

 } git_buf_vec;

+typedef enum {
+	GIT_HASH_ALGO_UNDEF = 0,


Nitpick: UNKNOWN ? It's never really undefined, as it's either SHA1 or something we don't know. Semantics 😉.

tiennou · 2019-06-14T10:16:12Z

src/hash.h

+typedef enum {
+	GIT_HASH_ALGO_UNDEF = 0,
+	GIT_HASH_ALGO_SHA1,
+} git_hash_algo;


(Remark: we've started adding _t suffixes to some of these, but I've yet to grasp the rules.)

No idea. From a quick glance it looks like _t prefixes are more common, so I'll just add it

tiennou · 2019-06-14T10:22:01Z

src/hash/sha1/collisiondetect.h

+ * a Linking Exception. For full terms see the included COPYING file.
+ */
+
+#ifndef INCLUDE_hash_sha1_collisiondetect_h__


Any chance this could be subsumed as "private implementation detail" inside their respective C files (or sha1.h) ? I don't think we really care about accessing that hashing context, so apart from a struct-sizing/compilation POV, that would reduce the amount of "small files". Do ignore that if you have a compelling reason though.

Unfortunately, no. We allocate these structs on the stack, so their size needs to be known at compile time :/ Would've loved that, too

What about moving the required include + struct inside the #ifdef cascade in sha1.h ? At least the ones that are 10-20 lines (ie. keep win32.h separate) ? I think I prefer more "complete", straightforward #define blocks vs 3-4 smaller ones.

I'd definitely agree if all of those struct declarations were small. But as you correctly point out, the "win32.h" one is a rather complex beast. I do have the goal of refactoring it in the future to internalize more of its implementation, but I'd like to keep that separate from this PR.

So for now, I think we should keep those headers separate to not have a hybrid of everything in "sha1.h" but not "win32.h".

tiennou · 2019-06-14T10:49:02Z

src/hash/sha1/mbedtls.h


-#ifndef INCLUDE_hash_mbedtld_h__
-#define INCLUDE_hash_mbedtld_h__
+#ifndef INCLUDE_hash_sha1_mbedtld_h__


mbedtld 🙄.

Heh, in fact I can point at you here :P

(the eyesballs were my own blame-acceptance 😉 )

I cannot see any smileys, thus I didn't get that :P

pks-t · 2019-06-14T11:15:54Z

Fixed comments by @tiennou. Thanks for your review!

pks-t · 2019-06-14T12:22:16Z

Rebased to fix conflicts with #5055

The structure `git_hash_prov` is only ever used by the Win32 SHA1 backend. As such, it doesn't make much sense to expose it via the generic "hash.h" header, as it is an implementation detail of the Win32 backend only. Move the typedef of `git_hash_prov` into "hash/sha1/win32.h" to fix this.

The hash source files have circular include dependencies right now, which shows by our broken generic hash implementation. The "hash.h" header declares two functions and the `git_hash_ctx` typedef before actually including the hash backend header and can only declare the remaining hash functions after the include due to possibly static function declarations inside of the implementation includes. Let's break this cycle and help maintainability by creating a real implementation file for each of the hash implementations. Instead of relying on the exact include order, we now especially avoid the use of `GIT_INLINE` for function declarations.

pks-t · 2019-06-24T16:32:56Z

Rebased to fix SHA1DC breakage

As we will include additional hash algorithms in the future due to upstream git discussing a move away from SHA1, we should accomodate for that and prepare for the move. As a first step, move all SHA1 implementations into a common subdirectory. Also, create a SHA1-specific header file that lives inside the hash folder. This header will contain the SHA1-specific header includes, function declarations and the SHA1 context structure.

As a preparatory step to allow multiple hashing APIs to exist at the same time, split the hashing functions into one layer for generic hashing and one layer for SHA1-specific hashing. Right now, this is simply an additional indirection layer that doesn't yet serve any purpose. In the future, the generic API will be extended to allow for choosing which hash to use, though, by simply passing an enum to the hash context initialization function. This is necessary as a first step to be ready for Git's move to SHA256.

Create a separate `git_hash_sha1_ctx` structure that is specific to the SHA1 implementation and move all SHA1 functions over to use that one instead of the generic `git_hash_ctx`. The `git_hash_ctx` for now simply has a union containing this single SHA1 implementation, only, without any mechanism to distinguish between different algortihms.

Create an enum that allows us to distinguish between different hashing algorithms. This enum is embedded into each `git_hash_ctx` and will instruct the code to which hashing function the particular request shall be dispatched. As we do not yet have multiple hashing algorithms, we simply initialize the hash algorithm to always be SHA1. At a later point, we will have to extend the `git_hash_init_ctx` function to get as parameter which algorithm shall be used.

pks-t · 2019-06-27T09:38:00Z

I'm just going ahead now and merge this. It's been open for a rather long time, and I finally want to continue working on this. There's still the one commen from @tiennou about inlining struct decls into "sha1.h", but we might as well tackle that in a future iteration.

pks-t force-pushed the pks/hash-algorithm branch from ccf7733 to a4fe12f Compare December 15, 2017 13:25

pks-t force-pushed the pks/hash-algorithm branch from a4fe12f to cb79aa1 Compare June 22, 2018 10:26

pks-t force-pushed the pks/hash-algorithm branch 3 times, most recently from d590a31 to 62690ae Compare June 22, 2018 12:10

pks-t force-pushed the pks/hash-algorithm branch 2 times, most recently from 7eafcf9 to 3b4409e Compare June 22, 2018 13:18

tiennou reviewed Jun 22, 2018

View reviewed changes

pks-t force-pushed the pks/hash-algorithm branch from 3b4409e to d72f112 Compare February 22, 2019 13:49

pks-t mentioned this pull request Mar 29, 2019

Each hash implementation should define git_hash_global_init #5026

Merged

pks-t force-pushed the pks/hash-algorithm branch from d72f112 to 2c3cfcc Compare April 5, 2019 09:02

pks-t mentioned this pull request Apr 5, 2019

sha1: don't inline git_hash_global_init for win32 #5039

Merged

pks-t force-pushed the pks/hash-algorithm branch from 2c3cfcc to 30927c0 Compare April 12, 2019 10:21

pks-t force-pushed the pks/hash-algorithm branch from 30927c0 to 2ee01f8 Compare June 14, 2019 08:05

tiennou approved these changes Jun 14, 2019

View reviewed changes

pks-t force-pushed the pks/hash-algorithm branch from 2ee01f8 to 362557a Compare June 14, 2019 11:15

pks-t force-pushed the pks/hash-algorithm branch from 362557a to 359017b Compare June 14, 2019 12:22

pks-t added 2 commits June 24, 2019 18:26

pks-t force-pushed the pks/hash-algorithm branch from 359017b to b7187ed Compare June 24, 2019 16:32

pks-t added 4 commits June 24, 2019 18:33

pks-t merged commit e9102de into libgit2:master Jun 27, 2019

pks-t deleted the pks/hash-algorithm branch June 27, 2019 09:38

implausible mentioned this pull request Jul 23, 2019

Bump libgit2 nodegit/nodegit#1705

Merged

snyk-bot mentioned this pull request Feb 23, 2020

[Snyk] Upgrade nodegit from 0.4.1 to 0.26.4 saurabharch/Breezeblocks#1

Open

snyk-bot mentioned this pull request Apr 22, 2020

[Snyk] Upgrade nodegit from 0.24.3 to 0.26.5 aminatakonate000/Graviton-App#4

Open

snyk-bot mentioned this pull request May 5, 2020

[Snyk] Upgrade nodegit from 0.24.3 to 0.26.5 Barnstorm-Online/ngp-openapi-generator#1

Open

Conversation

pks-t commented Dec 8, 2017

Uh oh!

pks-t commented Jun 22, 2018

Uh oh!

pks-t commented Jun 22, 2018

Uh oh!

tiennou Jun 22, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ethomson commented Jun 26, 2018

Uh oh!

pks-t commented Jun 29, 2018

Uh oh!

pks-t commented Feb 22, 2019

Uh oh!

pks-t commented Feb 22, 2019

Uh oh!

pks-t commented Apr 5, 2019

Uh oh!

pks-t commented Jun 14, 2019

Uh oh!

tiennou left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

pks-t commented Jun 14, 2019

Uh oh!

pks-t commented Jun 14, 2019

Uh oh!

pks-t commented Jun 24, 2019

Uh oh!

pks-t commented Jun 27, 2019

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

tiennou Jun 22, 2018 •

edited

Loading