Conversation
ccf7733 to
a4fe12f
Compare
a4fe12f to
cb79aa1
Compare
|
Rebased to fix a whole lot of conflicts |
d590a31 to
62690ae
Compare
|
Todo: remove this weird separation between |
7eafcf9 to
3b4409e
Compare
src/hash.h
Outdated
| union { | ||
| git_hash_sha1_ctx sha1; | ||
| } ctx; | ||
| git_hash_algorithm alg; |
There was a problem hiding this comment.
Maybe put the algo first ? I'm thinking we might want a GIT_HASH_ALGORITHM_CUSTOM, and having that appear first would allow a user-provided struct to be overlayed transparently ?
It may be weird, but the goal is to avoid allocations in The current implementation was chosen to be as similar to the existing init / update / finish pattern as possible, just adding a reusable object on top that you need to allocate and free. (I called that object a context, because our constructors are usually called "init" and needed to disambiguate between hash initialization.) Indeed we could make Certainly moving to having a "hash context" that uses the |
|
Thanks for the clarafication, @ethomson. I didn't realize that
I think that renaming |
3b4409e to
d72f112
Compare
|
Rebased to fix conflicts. I've tried to reduce the scope a bit by only doing the split between generic and SHA1-specific hashing layers. I didn't do any API changes at all |
|
By the way: I noticed that our "Generic" SHA1 implementation doesn't compile right now. This is being fixed in here, too, but it shows that we're not as thorough with testing the different implementations as we maybe should be. Or, alternatively, that we have too many backends and that some aren't used at all. |
d72f112 to
2c3cfcc
Compare
|
Rebased to fix conflicts |
2c3cfcc to
30927c0
Compare
30927c0 to
2ee01f8
Compare
tiennou
left a comment
There was a problem hiding this comment.
A few minor, completely ignorable comments. It's pretty unintrusive as well (for now), so 👍.
src/hash.h
Outdated
| } git_buf_vec; | ||
|
|
||
| typedef enum { | ||
| GIT_HASH_ALGO_UNDEF = 0, |
There was a problem hiding this comment.
Nitpick: UNKNOWN ? It's never really undefined, as it's either SHA1 or something we don't know. Semantics 😉.
src/hash.h
Outdated
| typedef enum { | ||
| GIT_HASH_ALGO_UNDEF = 0, | ||
| GIT_HASH_ALGO_SHA1, | ||
| } git_hash_algo; |
There was a problem hiding this comment.
(Remark: we've started adding _t suffixes to some of these, but I've yet to grasp the rules.)
There was a problem hiding this comment.
No idea. From a quick glance it looks like _t prefixes are more common, so I'll just add it
| * a Linking Exception. For full terms see the included COPYING file. | ||
| */ | ||
|
|
||
| #ifndef INCLUDE_hash_sha1_collisiondetect_h__ |
There was a problem hiding this comment.
Any chance this could be subsumed as "private implementation detail" inside their respective C files (or sha1.h) ? I don't think we really care about accessing that hashing context, so apart from a struct-sizing/compilation POV, that would reduce the amount of "small files". Do ignore that if you have a compelling reason though.
There was a problem hiding this comment.
Unfortunately, no. We allocate these structs on the stack, so their size needs to be known at compile time :/ Would've loved that, too
There was a problem hiding this comment.
What about moving the required include + struct inside the #ifdef cascade in sha1.h ? At least the ones that are 10-20 lines (ie. keep win32.h separate) ? I think I prefer more "complete", straightforward #define blocks vs 3-4 smaller ones.
There was a problem hiding this comment.
I'd definitely agree if all of those struct declarations were small. But as you correctly point out, the "win32.h" one is a rather complex beast. I do have the goal of refactoring it in the future to internalize more of its implementation, but I'd like to keep that separate from this PR.
So for now, I think we should keep those headers separate to not have a hybrid of everything in "sha1.h" but not "win32.h".
src/hash/sha1/mbedtls.h
Outdated
|
|
||
| #ifndef INCLUDE_hash_mbedtld_h__ | ||
| #define INCLUDE_hash_mbedtld_h__ | ||
| #ifndef INCLUDE_hash_sha1_mbedtld_h__ |
There was a problem hiding this comment.
Heh, in fact I can point at you here :P
There was a problem hiding this comment.
(the eyesballs were my own blame-acceptance 😉 )
There was a problem hiding this comment.
I cannot see any smileys, thus I didn't get that :P
2ee01f8 to
362557a
Compare
|
Fixed comments by @tiennou. Thanks for your review! |
362557a to
359017b
Compare
|
Rebased to fix conflicts with #5055 |
The structure `git_hash_prov` is only ever used by the Win32 SHA1 backend. As such, it doesn't make much sense to expose it via the generic "hash.h" header, as it is an implementation detail of the Win32 backend only. Move the typedef of `git_hash_prov` into "hash/sha1/win32.h" to fix this.
The hash source files have circular include dependencies right now, which shows by our broken generic hash implementation. The "hash.h" header declares two functions and the `git_hash_ctx` typedef before actually including the hash backend header and can only declare the remaining hash functions after the include due to possibly static function declarations inside of the implementation includes. Let's break this cycle and help maintainability by creating a real implementation file for each of the hash implementations. Instead of relying on the exact include order, we now especially avoid the use of `GIT_INLINE` for function declarations.
359017b to
b7187ed
Compare
|
Rebased to fix SHA1DC breakage |
As we will include additional hash algorithms in the future due to upstream git discussing a move away from SHA1, we should accomodate for that and prepare for the move. As a first step, move all SHA1 implementations into a common subdirectory. Also, create a SHA1-specific header file that lives inside the hash folder. This header will contain the SHA1-specific header includes, function declarations and the SHA1 context structure.
As a preparatory step to allow multiple hashing APIs to exist at the same time, split the hashing functions into one layer for generic hashing and one layer for SHA1-specific hashing. Right now, this is simply an additional indirection layer that doesn't yet serve any purpose. In the future, the generic API will be extended to allow for choosing which hash to use, though, by simply passing an enum to the hash context initialization function. This is necessary as a first step to be ready for Git's move to SHA256.
Create a separate `git_hash_sha1_ctx` structure that is specific to the SHA1 implementation and move all SHA1 functions over to use that one instead of the generic `git_hash_ctx`. The `git_hash_ctx` for now simply has a union containing this single SHA1 implementation, only, without any mechanism to distinguish between different algortihms.
Create an enum that allows us to distinguish between different hashing algorithms. This enum is embedded into each `git_hash_ctx` and will instruct the code to which hashing function the particular request shall be dispatched. As we do not yet have multiple hashing algorithms, we simply initialize the hash algorithm to always be SHA1. At a later point, we will have to extend the `git_hash_init_ctx` function to get as parameter which algorithm shall be used.
|
I'm just going ahead now and merge this. It's been open for a rather long time, and I finally want to continue working on this. There's still the one commen from @tiennou about inlining struct decls into "sha1.h", but we might as well tackle that in a future iteration. |
A first step towards an abstraction for the use of multiple of hash algorithms. It lays the groundwork for having
git_hash_ctxdispatch to different hash algorithms based on which one is being requested. Next steps:git_repositoryby agit_hash_algorithmfieldgit_hash_initandgit_hash_ctx_initto accept an algorithm which is then used for hashing by that particular context only