How to Utilize Submodules within Git Repos

One Solution When the Primary Code Can be Open Source, but Specific Content Needs to be Private

Paige Niedringhaus
Bits and Pieces



The Dev Community’s Willingness to Share Almost Anything is One of the Many Reasons I ❤️ Being a Web Developer

From the first day I started learning to code at my bootcamp, I saw how open and helpful the web development community is, and how freely information is shared within it. It’s awesome and inspiring (and at times a little overwhelming). So. Much. Information!

But truly, I have such respect for the people who are open source maintainers of code, the people who answer questions on Stack Overflow, the people who speak at conferences, make courses, write books and blogs — and (often) do so on their own time, outside of their day jobs.

What other industry can you point to where such open collaboration and sharing (for no other benefit than to build cool things and enable others to do so too), is not only encouraged but also celebrated?

This is a reason why I, myself, write now. I’ve been lucky enough to learn so much between the problems I get to solve at my day job and the problems I’ve run into and overcome in my side projects, that I wanted to give back to the community I’ve benefitted so much from.

So let’s talk about a topic I learned a good bit about recently: Git submodules.

Imagine, if you will, a scenario where you want to share the majority of a code base with everyone, but perhaps, keep one folder within that codebase private. I encountered this very scenario when building my personal website. I used a popular, albeit barebones, Gatsby starter site to build it, which I’m happy to share the source code for (the styling, the colors, the React components and structure — have at it, no trade secrets there).

However, my site also houses my personal articles on web development and technology (like the one you’re reading now), and I’d rather not make it so easy that anyone could fork my whole repo, deploy it live, and have all of the articles I’ve put the time and effort into creating under their own name on a copycat site (it sounds a little ridiculous to think this would happen so blatantly, but I assure you, it’s not as far-fetched as it seems).

And so, meet Git submodules, a way to have the best of both worlds: the public repo with the source code anyone can see, and inside of it, a private repo with the protected information only specific people can access.

Essentially, Git submodules are a way to have two or more repos work together as a single unit. One project (or more), entirely separate from, yet used within, another project.

It is true the same can be achieved by authoring independent components and composing them together using Bit (with some components exported to a private remote scope and others to a public remote scope) but I guess that’s more of a matter of preferred methodology than anything else.

Enter Git Submodules

There’s a fantastic Git version control book (also available for free online), that does a wonderful job of explaining the reason for Git submodules:

It often happens that while working on one project, you need to use another project from within it. Perhaps it’s a library that a third party developed or that you’re developing separately and using in multiple parent projects. A common issue arises in these scenarios: you want to be able to treat the two projects as separate yet still be able to use one from within the other. — Git SCM

And yes, you could just copy the source code from one project and drop it into the other. But if you later update the copied code and want those same updates reflected in the original library (or, in my case, want some code in a repo to be private and other code to be public), it gets messy and hard to keep track of pretty quickly.

Git addresses this issue using submodules. Submodules allow you to keep a Git repository as a subdirectory of another Git repository. This lets you clone another repository into your project but still keep your commits to that project separate.

Sounds handy, right? Let me show you how it’s done — I’ll use my own site code as an example.

Create a Submodule (Another repo in GitHub)

So, first things first, create the repo to be imported into the other project (the submodule).

In my case, I created a repo in GitHub I named pn-blog-posts. I named it that way, since this submodule I wanted to make private in my personal website contains all the contents of the blog posts that would be displayed there.

After creating the new repo, set it to private (if applicable to you), add any content and folder structures you want and you should be almost done with this step. Last thing to do: copy the URL of the repo in GitHub — you’ll need it in the next step when the submodule is added to the main project.

Here’s a screenshot of what my submodule repo looks like. It’s currently very simple: an images/ folder, a samplePosts/ folder and (eventually) a posts/ folder will be included too.

Screenshot of the structure and contents of the private repo in GitHub that I’ll be adding as a submodule to my main project. It’s simple: an images folder and a sample-posts folder, side by side.

Add a Submodule to an Existing Repo

Ok, now that the repo to be imported is created, it’s time to add it as a submodule to the main project.

Before doing anything with submodules, I recommend running this command to set submodule.recurse to true in your Git config, which makes commands like git pull and git checkout update submodules automatically. One caveat: git clone ignores this setting, so it still needs the --recurse-submodules flag. (Trust me, this definitely makes life easier later on.)

To Prepare your Main Project Run this Submodule Command First

git config --global submodule.recurse true
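If you’d rather not change your global Git config, the same setting can be scoped to a single repo. Here’s a minimal sketch, assuming a throwaway directory standing in for your real project, that sets the value with --local and reads it back to confirm it stuck:

```shell
# Sketch: set submodule.recurse for one repository only and read it back.
# The temporary directory is a made-up stand-in for your real project.
set -e
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q .

# --local writes to this repo's .git/config; use --global to apply it everywhere.
git config --local submodule.recurse true

# Read the value back to confirm it took effect.
recurse_setting="$(git config --get submodule.recurse)"
echo "submodule.recurse = $recurse_setting"
```

A repo-level value overrides the global one, so this is also a handy way to switch the behavior off for one project.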

If you skipped this step before adding the submodule, it’s OK: you can run the command below at any point, and it will initialize, fetch and check out your submodules (including any nested ones) for you. Nice! 😄

If You Forgot to Run the Above Command First, Run this Command After the Fact

git submodule update --init --recursive

Right, the main project should be ready, so let’s add that submodule.

Add the Submodule Command

To add a new submodule to an existing project, you’ll open the project up and use the following command in the terminal:

git submodule add [URL of project to add in GitHub] [name of directory you want to see in the project]

By default, git submodule add places the subproject in a directory with the same name as the repository. You can add a different path at the end of the command if you want it to go elsewhere. (Also, remove the brackets; I only added those to make it easier to see the different parts of the git command.)

So, although my repo is named pn-blog-posts in GitHub, I wanted it to be referenced as a folder named content within my project (because that’s what my Gatsby project looks for to generate blog posts from any markdown files present).

Here’s what my full git submodule command looked like:

git submodule add https://github.com/paigen11/pn-blog-posts content

At this point, there should be a new folder present in the project named content/ and if you run git status in the command line, there should be a couple of new files present. First is a .gitmodules file, and second is a folder named content/.
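If you’d like to try this before touching a real project, here’s a minimal sketch that uses two throwaway local repos in place of GitHub. The directory names, the demo author identity and the sample-post.md file are placeholders I made up for illustration, and the protocol.file.allow setting is only there because recent versions of Git may refuse to clone a submodule from a local path.

```shell
# Sketch: add a submodule using two throwaway local repos instead of GitHub.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
work_dir="$(mktemp -d)"

# 1. Stand-in for the private pn-blog-posts repo, with one sample file.
git init -q "$work_dir/pn-blog-posts"
cd "$work_dir/pn-blog-posts"
echo "# Sample post" > sample-post.md
git add sample-post.md
git commit -q -m "add sample post"

# 2. Stand-in for the main site repo.
git init -q "$work_dir/site"
cd "$work_dir/site"
git commit -q --allow-empty -m "initial commit"

# 3. Add the first repo as a submodule in a folder named content/.
#    protocol.file.allow is only needed because the "remote" is a local path.
git -c protocol.file.allow=always submodule --quiet add "$work_dir/pn-blog-posts" content

# 4. List what changed: .gitmodules and content should both show up as staged.
status_output="$(git status --short)"
echo "$status_output"
```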

Let’s take a closer look at these two new files.

.gitmodules File

If you look at the content inside the .gitmodules file, you’ll see something like the following:

[submodule "content"]
path = content
url = https://github.com/paigen11/pn-blog-posts
branch = main

This is a configuration file that stores the mapping between the submodule project’s URL and the local subdirectory you’ve pulled it into, and if you have multiple submodules, you’ll have multiple entries here with the name of each submodule and its URL.
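Because .gitmodules uses the same syntax as any other Git config file, you can query it with git config instead of opening it by hand. A small sketch, assuming a scratch directory containing a copy of the file shown above:

```shell
# Sketch: read submodule settings out of a .gitmodules file with git config.
set -e
demo_dir="$(mktemp -d)"
cd "$demo_dir"

# Recreate the .gitmodules file from the article in a scratch directory.
cat > .gitmodules <<'EOF'
[submodule "content"]
    path = content
    url = https://github.com/paigen11/pn-blog-posts
    branch = main
EOF

# -f points git config at a specific file instead of the usual config locations.
submodule_path="$(git config -f .gitmodules --get submodule.content.path)"
submodule_url="$(git config -f .gitmodules --get submodule.content.url)"
echo "$submodule_path -> $submodule_url"
```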

This is how other people who clone this project know where to get the submodule projects from. The same holds true for deploying this project somewhere not on your local machine (like Netlify, for instance), but we’ll get to that a little later in this article.

content/ Folder

The other output from the git status command, content/ is the project folder entry.

Although content is a subdirectory in your working directory, Git sees it as a submodule and doesn’t track its contents when you’re not in that directory (stay with me here, this fact will be important later). Instead, Git sees it as a particular commit from that repository.

And when you run the commands:

git commit -am 'adding content submodule' 
git push origin HEAD

Your new submodule will be added to your GitHub project like the screenshot below:

Screenshot of my main blog repo in GitHub, with the content submodule present, as delineated by the “content @ <HASH>”.

When you click on the content @ <HASH HERE> folder in GitHub, instead of being taken to a folder of files inside of the main paigeniedringhaus repo, you’ll be linked over to the pn-blog-posts repo. And since the repo is set to private, only those with access can see the contents of the repo; all others will see a 404 page instead.
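You can see the same content @ <HASH> pointer locally, too. This sketch (again using made-up throwaway repos and a demo identity rather than my real GitHub repos) commits the submodule and then asks Git how it stored the content entry:

```shell
# Sketch: inspect how the main repo records a submodule after committing it.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
work_dir="$(mktemp -d)"

# Stand-in for the private posts repo.
git init -q "$work_dir/pn-blog-posts"
cd "$work_dir/pn-blog-posts"
echo "# Sample post" > sample-post.md
git add sample-post.md
git commit -q -m "add sample post"

# Stand-in for the main site repo, with the submodule added and committed.
git init -q "$work_dir/site"
cd "$work_dir/site"
git -c protocol.file.allow=always submodule --quiet add "$work_dir/pn-blog-posts" content
git commit -q -m "adding content submodule"

# Mode 160000 and type "commit" mean the entry is a pointer to a commit
# in another repo, not a folder of tracked files.
tree_entry="$(git ls-tree HEAD content)"
echo "$tree_entry"

# The hash in that entry is the commit currently checked out inside content/.
pinned_commit="$(git -C content rev-parse HEAD)"
echo "content is pinned to $pinned_commit"
```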

Bottom Line on Submodules:

Remember when I wrote “Git sees [the newly added subdirectory] as a submodule and doesn’t track its contents when you’re not in that directory” just a few lines ago?

What this means in practice is that all updates to the files within the content/ directory must be done inside the pn-blog-posts repo, pushed to GitHub, pulled into the other project where it’s a submodule and finally pushed as a commit to the main repo’s GitHub source. 🥵 Yes, it’s extra work, but the benefits of public and private make it worth it to me.

If you think about it, working with submodules is a lot like working with the node_modules that are installed from your package.json. You know that edits made directly inside node_modules don’t get committed to your repo, and something similar is true here: the main project only records which commit of the submodule it points to, so uncommitted edits made inside content/ won’t be captured when you commit in the main project. Changing the submodule from inside the main project also makes it easy for the two to fall out of sync, which is a Git nightmare none of us want to deal with.

Time to move on to updating the contents of the submodules now.

Update the Contents of the Submodule

Let’s say you’ve made some additions to the submodule repo. Maybe a new image has been added, or a new post has been written and you’re ready to preview it and deploy it inside of your main project.

First thing is to push the change in the submodule repo to GitHub. Next, open the main project locally and pull in the changes of the submodule. You can do that with the following command:

Pull Updates to the Submodule in the Main Project

git submodule update --remote

With this command, Git will go into each of your submodules, fetch the latest changes from its remote and update the checkout for you. If you want to update just one submodule in a project, add the name of the submodule folder after --remote to target it in particular.

So if I only wanted to update my submodule housing my blog posts, my command would look like this:

git submodule update --remote content
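Here’s the whole update loop as a minimal sketch, using throwaway local repos in place of GitHub. The repo paths, the demo identity and new-post.md are assumptions made for illustration; committing in the local posts repo stands in for pushing to GitHub.

```shell
# Sketch: publish a change in the submodule repo, then pull it into the main repo.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
work_dir="$(mktemp -d)"

# Stand-in for the posts repo, on a branch named main.
git init -q "$work_dir/pn-blog-posts"
cd "$work_dir/pn-blog-posts"
git symbolic-ref HEAD refs/heads/main
echo "# Sample post" > sample-post.md
git add sample-post.md
git commit -q -m "add sample post"

# Stand-in for the main site repo; -b main records the branch in .gitmodules.
git init -q "$work_dir/site"
cd "$work_dir/site"
git -c protocol.file.allow=always submodule --quiet add -b main "$work_dir/pn-blog-posts" content
git commit -q -m "adding content submodule"

# 1. Write a new post in the submodule's own repo.
cd "$work_dir/pn-blog-posts"
echo "# New post" > new-post.md
git add new-post.md
git commit -q -m "add new post"
latest_post_commit="$(git rev-parse HEAD)"

# 2. Pull that change into the main project's copy of the submodule.
cd "$work_dir/site"
git -c protocol.file.allow=always submodule --quiet update --remote content

# 3. Record the new pointer in the main project.
git add content
git commit -q -m "update content submodule"
pinned_commit="$(git rev-parse HEAD:content)"
echo "main project now points at $pinned_commit"
```

Step 3 is the one that’s easy to forget: until the main project commits the new pointer, anyone else cloning it still gets the old version of the posts.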

Change the Branch the Submodule Updates From

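Note: By default, git submodule update --remote updates the checkout to the default branch of the submodule’s remote (the branch its HEAD points to); older versions of Git assumed master instead. If you want to track a specific branch (like main), you can set it in your .gitmodules file so everyone who clones the project gets it, as the command below does, or only for yourself in the local .git/config file by leaving out the -f .gitmodules flag: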

Set Branch Git Submodule Update Command Pulls From

git config -f .gitmodules submodule.content.branch main
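To make the difference between the two locations concrete, here’s a small sketch in a scratch repo. The drafts branch name is purely hypothetical; the point is that the .gitmodules value is shared with everyone, while the .git/config value stays on your machine and takes precedence there.

```shell
# Sketch: shared vs. local branch settings for a submodule named "content".
set -e
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q .

# A minimal .gitmodules entry to attach the setting to.
cat > .gitmodules <<'EOF'
[submodule "content"]
    path = content
    url = https://github.com/paigen11/pn-blog-posts
EOF

# Shared setting: written to .gitmodules, which gets committed with the project.
git config -f .gitmodules submodule.content.branch main

# Local-only override: written to .git/config ("drafts" is a made-up branch name).
git config --local submodule.content.branch drafts

shared_branch="$(git config -f .gitmodules --get submodule.content.branch)"
local_branch="$(git config --local --get submodule.content.branch)"
echo "shared: $shared_branch, local override: $local_branch"
```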

Deploy to Netlify (or another Host) with Private Submodules

Another thing I want to cover (because this tripped me up initially): how to authorize a cloud host to access your private submodule when it tries to build and deploy your site.

I’ll be using Netlify as the example host because it is actually where I host my site, and it’s a joy to work with.

When I pushed my Gatsby project to Netlify for the first time, I got the following error during build time:

Error checking out submodules: Submodule 'content' (https://github.com/paigen11/pn-blog-posts) registered for path 'content'
Cloning into '/opt/build/repo/content'...
fatal: could not read Username for 'https://github.com': No such device or address
fatal: clone of 'https://github.com/paigen11/pn-blog-posts' into submodule path '/opt/build/repo/content' failed
Failed to clone 'content'.

When I googled “netlify error checking out submodules”, the first Stack Overflow answer I found advised changing the url in .gitmodules from the HTTPS address I’d initially used to the submodule’s SSH address.

So I did. I went back to GitHub and copied the SSH address for my private repo (which I’ve taken a screenshot of here for reference).

Notice the screenshot above shows me copying the SSH address for my private repo to replace the original HTTPS address in my main project’s .gitmodules file.

Then, I updated the .gitmodules url as shown below in my main project.

Update Your .gitmodules File to Use the SSH GitHub Link for Your Submodule

[submodule "content"]
path = content
url = git@github.com:paigen11/pn-blog-posts.git
branch = main

Notice that my URL is no longer https://github.com/paigen11/pn-blog-posts; it’s now git@github.com:paigen11/pn-blog-posts.git.
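One thing worth knowing: a fresh clone (like the one a build server makes) reads the new URL straight from .gitmodules, but a copy of the project that already has the submodule checked out keeps using the old URL until you run git submodule sync. Here’s a minimal sketch with throwaway local repos and a demo identity (both assumptions for illustration); nothing is fetched, so the SSH URL is never actually contacted.

```shell
# Sketch: change a submodule's URL in .gitmodules and sync existing checkouts.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
work_dir="$(mktemp -d)"

# Stand-in for the posts repo.
git init -q "$work_dir/pn-blog-posts"
cd "$work_dir/pn-blog-posts"
echo "# Sample post" > sample-post.md
git add sample-post.md
git commit -q -m "add sample post"

# Stand-in for the main site repo with the submodule already added.
git init -q "$work_dir/site"
cd "$work_dir/site"
git -c protocol.file.allow=always submodule --quiet add "$work_dir/pn-blog-posts" content
git commit -q -m "adding content submodule"

# 1. Point .gitmodules at the SSH address instead.
ssh_url="git@github.com:paigen11/pn-blog-posts.git"
git config -f .gitmodules submodule.content.url "$ssh_url"

# 2. Copy the new URL into .git/config and into the submodule's own remote.
git submodule --quiet sync

local_url="$(git config --get submodule.content.url)"
checkout_url="$(git -C content config --get remote.origin.url)"
echo "main repo config: $local_url"
echo "submodule remote: $checkout_url"
```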

I pushed to Netlify again, and my project failed to build again. But this time the error message led me to a Netlify forum thread that cleared up what was happening.

Since my submodule was set to private in GitHub, I needed to grant permissions for Netlify to access the private submodule via SSH key.

I followed the Netlify instructions to generate a deploy key and added it to my private repo in GitHub as a read-only key (you can find this setting in a repo at github.com/github-username/repo-name/settings/keys).

Screenshot of the deploy key (with actual key contents blanked out) I added to my pn-blog-posts repo in GitHub so Netlify could access the submodule’s contents at build time.

And I pushed it to Netlify to build one more time and… voila! My site went live — submodules and all! 🙌

Note: I only had to switch my .gitmodules file to the SSH URL and add the deploy key because my submodule repository is private. If the submodule you link to is public, you should be fine deploying with no changes at all to those files.

Cloning a Project with Submodules

Ok, the last thing I’ll talk about in this blog is cloning a project with submodules present.

When you clone such a project, by default you get the directories that contain submodules, but none of the files within them yet.

To download the submodules’ contents, you must run two commands: git submodule init to initialize your local configuration file, and git submodule update to fetch all the data from that project and check out the appropriate commit listed in your main project.

After that, your submodules should be in exactly the same state as when you committed them earlier.

To Simplify Cloning a Project with Submodules, Use Recursion

The trick I listed above about recursion works here too. If you pass --recurse-submodules to the git clone command, it will automatically initialize and update each submodule in the repository, including nested submodules if any of the submodules in the repository have submodules themselves. 😎

git clone [repo URL from github] --recurse-submodules
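Both approaches can be compared side by side with a minimal sketch. As before, the repos are throwaway local stand-ins for GitHub, and the clone-two-step and clone-one-step folder names and the demo identity are made up for illustration.

```shell
# Sketch: clone a project with a submodule, first in two steps, then in one.
set -e
export GIT_AUTHOR_NAME="Demo" GIT_AUTHOR_EMAIL="demo@example.com"
export GIT_COMMITTER_NAME="Demo" GIT_COMMITTER_EMAIL="demo@example.com"
work_dir="$(mktemp -d)"

# Stand-in for the posts repo.
git init -q "$work_dir/pn-blog-posts"
cd "$work_dir/pn-blog-posts"
echo "# Sample post" > sample-post.md
git add sample-post.md
git commit -q -m "add sample post"

# Stand-in for the main site repo with the submodule committed.
git init -q "$work_dir/site"
cd "$work_dir/site"
git -c protocol.file.allow=always submodule --quiet add "$work_dir/pn-blog-posts" content
git commit -q -m "adding content submodule"

# Option 1: plain clone. The content/ folder exists but is empty...
git clone -q "$work_dir/site" "$work_dir/clone-two-step"
cd "$work_dir/clone-two-step"
file_count_before="$(ls -A content | wc -l)"
echo "files in content/ right after a plain clone: $file_count_before"

# ...until the submodule is initialized and updated.
git submodule --quiet init
git -c protocol.file.allow=always submodule --quiet update

# Option 2: one command that clones and fills in submodules together.
git -c protocol.file.allow=always clone -q --recurse-submodules \
  "$work_dir/site" "$work_dir/clone-one-step"
```

Either way, both clones end up with content/ checked out at the commit the main project has pinned.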

Again, it’s kind of like cloning a repo from GitHub and then installing all the node_modules listed in the package.json locally. We do it so often as developers that we don’t even have to think much about it, and I like that submodules operate in a similar fashion.

Conclusion

Git submodules are a less talked-about, but very powerful, tool to have in your developer tool belt.

Want to share third-party libraries with multiple projects? Want to open-source the majority of a project but keep some code proprietary? Git submodules might just be the answer you’re looking for.

Before I needed them to help me keep my website code open source except for the blog post content, I didn’t know they existed. Now, I’m excited to see what other uses I can come up with for them in my projects.

Check back in a few weeks — I’ll be writing more about JavaScript, React, ES6, or something else related to web development. If you’d like to make sure you never miss an article I write, sign up for my newsletter here: https://paigeniedringhaus.substack.com

Thanks for reading. I hope this article gives you a better understanding of what submodules are in git, and how you can use them in your own projects. I’d love to hear how you can make them work for you and your team.


Staff Software Engineer at Blues, previously a digital marketer. Technical writer & speaker. Co-host of Front-end Fire & LogRocket podcasts