
Free Public Speaking Workshop For Women

We're hosting our first ever free public speaking workshop for women in San Francisco! If you're interested in leveling up your public speaking skills, join us on Saturday, February 22nd for a day of inspiring talks from women who rock, workshopping with incredible mentors from the tech community, and (only if you're up for it) getting on stage to deliver your first lightning talk.


Conferences are notable not only for the prominent people on stage, but also for those who are missing.

— Sarah Milstein in Putting An End To Conferences Dominated By White Men

Changing the ratio starts with increasing the visibility of those people who are missing from tech conference lineups. With this workshop, we're hoping to give you the tools not only to feel comfortable talking about the work you do, but also to increase your own visibility within the community.

Meet Our Keynote Speakers:

  • Denise Jacobs, Speaker, Author, Creativity Evangelist, Passionate Diversity Advocate
  • Diana Kimball, Expert Novice, Bright Soul, and Harvard MBA Set Out on Making the World A Better Place

Our Awesome Mentors For The Day:

  • Ana Hevesi, Community Developer at StackExchange, Conference Organizer, Brilliant Wordsmith, So Damn Well-Spoken
  • Andi Galpern, Expert Web Designer, Rockin' Musician, and Passionate Tech Educator
  • Alexis Finch, Sketch Artist, Has Probably Seen More Conference Talks Than Ted Himself, Badass Women's Advocate
  • Alice Lee, Designer and Illustrator at Dropbox, Super Talented Letterer, and Organizer of Origins
  • Anita Sarkeesian, Creator and Host of Feminist Frequency, Pop Culture Trope Expert, Probably the Most Hilarious Human Alive
  • Angelina Fabbro, Engineer/Developer and Developer Advocate at Mozilla. Writes Code/Writes Words About Code/Speaks About Code
  • Ash Huang, Designer at Pinterest, Really Quite Handy with Gifs IRL
  • C J Silverio, Cats, Weightlifting, and Node.js, Not Necessarily In That Order.
  • Divya Manian, Crazy Talented Speaker, Avid Coder, and Armchair Anarchist
  • Garann Means, JavaScript Developer, Incredible Writer, Proud Austin-ite, and Beyond Powerful Speaker
  • Emily Nakashima, Resides in the East Bay, Programs at GitHub
  • Jackie Balzer, Writes CSS Like It's Her Job (It Is), Leads An Army of CSS Badasses at Behance
  • Jen Myers, Former Passion Projects Speaker, Dev Bootcamp Instructor, Fantastic Keynoter, and Starter of Brilliant Things
  • Jesse Toth, Developer at GitHub, Cal CS Grad
  • Jessica Dillon, Lover, Fighter, Javascript Writer
  • Jessica Lord, Open Sourcerer, Former Code For America Fellow, Changing The Way The World Interacts With GitHub/Code/Javascript
  • Julie Ann Horvath, Passion Projects Creator, Developer, and Designer of Websites and Also Slides
  • Kelly Shearon, All Things Marketing and Content Strategy at GitHub, Could Write You Under A Table, Super Cool Mom
  • Luz Bratcher, Helvetica-loving UX designer at Design Commission, Event Admin for Seattle Creative Mornings
  • Mina Markham, Badass Lady Dev, Girl Develop It Founder/Instructor, Generally Rad Person
  • Netta Marshall, Lead Designer at Watsi, Formerly Rdio, Professional Ninja, Owner Of Best Website Footer On The Internet
  • Raquel Vélez, Hacker of The Web (node.js), Robotics Engineer, Polyglot, (Cal)Techer
  • Sara Pyle, Supportocat at GitHub, Amateur Shapeshifter, and Professional Superhero
  • Sonya Green, Chief Empathy Officer, Leads Support at GitHub
  • Tatiana Simonian, VP of Music at Nielsen, Formerly Music at Twitter and Disney
  • Willo O'Brien, Heart-Centered Entrepreneur, Speaker, Coach, Seriously Positive Person

The Pertinent Details:

  • GitHub’s First Public Speaking Workshop For Women
  • At GitHub HQ in San Francisco, CA
  • Saturday, February 22nd, from 11:00am-4:00pm
  • Food, beverages, moral support and also plenty of fun provided.
  • You must register interest here if you'd like to attend. The last day to register interest is Sunday, February 16th. You will be notified on Monday, February 17th if* you've been selected to participate.

*Because we can only host so many people in our space, we're using a lottery system to select participants to ensure the process is fair and balanced.

If you can't make our workshop but are interested in leveling up as a speaker, here are a few resources:

If you're a conference organizer who is looking for some resources to help diversify your lineups this year, these are all great places to start:

Video from Passion Projects Talk #10 with Dana McCallum

Dana McCallum joined us in January of 2014 for the 10th installment of our Passion Projects talk series. Dana's talk revealed how she brought her non-tech passions to life through programming. Check out the full video of her talk and our panel discussion below.

Photos from the event

Thanks to everyone who came out for Dana's talk, and to our musical performers for the evening, Running in the Fog.


Photos courtesy of our fab photog :sparkles: Mona Brooks :sparkles: of Mona Brooks Photography.

Proxying User Images

A while back, we started proxying all non-HTTPS images through a custom Node server called camo to avoid mixed-content warnings. Today we're making a small change and proxying HTTPS images as well.

Proxying these images helps protect your privacy: your browser information won't be leaked to third-party services. And since we're also routing images through our CDN, you should see faster overall load times across GitHub, as well as fewer broken images in the future.
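For illustration, here's a rough sketch of how an application can generate a camo-style proxied URL, written in Node since camo itself is a Node server. The host, key handling, and digest details are assumptions for the sake of the example, not our production code:

// Sign the original image URL with a shared secret so the proxy only fetches
// URLs the application vouched for. Host, key, and URL layout are illustrative.
var crypto = require('crypto');

var CAMO_HOST = 'https://camo.example.com';  // hypothetical proxy host
var CAMO_KEY  = process.env.CAMO_KEY;        // shared secret the camo server also knows

function proxiedImageUrl(imageUrl) {
  var digest  = crypto.createHmac('sha1', CAMO_KEY).update(imageUrl).digest('hex');
  var encoded = new Buffer(imageUrl, 'utf8').toString('hex');
  return CAMO_HOST + '/' + digest + '/' + encoded;
}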

Related open source patches

DNS Outage Post Mortem

Last week on Wednesday, January 8th, GitHub experienced an outage of our DNS infrastructure. As a result of this outage, our customers experienced 42 minutes of downtime of services along with an additional 1 hour and 35 minutes of downtime within a subset of repositories as we worked to restore full service. I would like to apologize to our customers for the impact to your daily operations as a result of this outage. Unplanned downtime of any length is unacceptable to us. In this case we fell short of both our customers' expectations and our own. For that, I am truly sorry.

I would like to take a moment and explain what caused the outage, what happened during the outage, and what we are doing to help prevent events like this in the future.

Some background…

For some time we've been working to identify places in our infrastructure that are vulnerable to Distributed Denial of Service (DDoS) attacks. One of the things we specifically investigated was options for improving our defenses against DNS amplification attacks, which have become very common across the internet. In order to simplify our access control rules, we decided to reduce the number of hosts which are allowed to make DNS requests and receive DNS replies to a very small number of name servers. This change allows us to explicitly reject DNS traffic that we receive for any address that isn't explicitly whitelisted, reducing our potential attack surface area.

What happened...

In order to roll out these changes, we had prepared changes to our firewall and router configuration to update the IP addresses our name servers used to send queries and receive responses. In addition, we prepared similar changes to our DNS server configuration to allow them to use these new IP addresses. The plan was to roll out this set of changes for one of our name servers, validate the new configuration worked as expected, and proceed to make the same change to the second server.

Our rollout began on the afternoon of the 8th at 13:20 PST. Changes were deployed to the first DNS server, and an initial verification led us to believe the changes had been rolled out successfully. We proceeded to deploy to the second name server at 13:29 PST, and again performed the same verification. However, problems began manifesting almost immediately.

We began to observe that certain DNS queries were timing out. We quickly investigated, and discovered a bug in our rollout procedure. We expected that when our change was applied, both our caching name servers and authoritative name servers would receive updated configuration - including their new IP addresses - and restart to apply this configuration. Both name servers received the appropriate configuration changes, but only the authoritative name server was restarted due to a bug in our Puppet manifests. As a result, our caching name server was requesting authoritative DNS records from an IP that was no longer serving DNS. This bug created the initial connection timeouts we observed, and began a cascade of events.

Our caching and authoritative name servers were reloaded at 13:49 PST, resolving DNS query timeouts. However, we observed that certain queries were now incorrectly returning NXDOMAIN. Further investigation found that our DNS zone files had become corrupted due to a circular dependency between our internal provisioning service and DNS.

During the investigation of the first phase of this incident, we triggered a deployment of our DNS system, which performs an API call against our internal provisioning system and uses the result of this call to construct a zone file. However, this query requires a functioning DNS infrastructure to complete successfully. Further, the output from this API call was not adequately checked for sanity before being converted into a zone file. As a result, this deployment removed a significant number of records from our name servers, causing the NXDOMAIN results we observed. The missing DNS records were restored by performing the API call manually, validating the output, and updating the affected zones.
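As a hypothetical illustration of the kind of check that was missing, a deploy step could refuse to build a zone file when the API result looks implausibly small compared to the zone it replaces (names and thresholds here are made up):

// Refuse to deploy a zone whose record count drops sharply versus the current
// zone. Purely illustrative; not our actual tooling.
function assertSaneRecordCount(newRecords, currentRecords) {
  var allowedMinimum = currentRecords.length * 0.9; // tolerate at most a 10% drop
  if (newRecords.length < allowedMinimum) {
    throw new Error('refusing to deploy zone: record count dropped from ' +
                    currentRecords.length + ' to ' + newRecords.length);
  }
}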

Many of our servers recovered gracefully once DNS service began responding appropriately. However, we quickly noted that github.com performance had not returned to normal, and our error rates were far higher than normal. Further investigation found that a subset of our fileservers were actively refusing connections due to what we later found was memory exhaustion, exacerbated by the spawning of a significant number of processes during the DNS outage.

Total number of processes across fileservers


Total memory footprint across fileservers


The failing fileservers began creating back pressure in our routing layer that prevented connections to healthy fileservers. Our team began manually removing all misbehaving fileservers from the routing layer, restoring service for the fileservers that had survived the spike in processes and memory during the DNS event.

The team split up the pool of disabled fileservers and triaged their status. Each node fell into one of two scenarios: either it had calmed down enough after DNS service was restored that one of our engineers could log into the box and forcefully kill hung processes to restore service, or it had become so exhausted that our HA daemon kicked in to STONITH the active node and bring up the secondary node. In both situations, our team then performed checks against our low-level DRBD block devices to ensure there were no inconsistencies or errors in data replication. Full service was restored for all of our customers by 15:47 PST.

What we’re doing about it...

This small problem uncovered quite a bit about our infrastructure that we will be critically reviewing over the next few weeks. This includes:

  1. We are investigating further decoupling of our internal and external DNS infrastructure. While the pattern of forwarding requests to an upstream DNS server is not uncommon, the tight dependency that exists between our internal name servers and our external name servers needs to be broken up to allow changes to happen independently of each other.
  2. We are reviewing our configuration management code for other service restart bugs. In many cases, this means the improvement of our backend testing. We will be reviewing critical code for appropriate tests using rspec-puppet, as well as looking at integration tests to ensure that service management behaves as intended.
  3. We are reviewing the cyclic dependency between our internal provisioning system and our DNS resolvers, and have already updated the deployment procedure to verify the results returned from the API call before removing a large number of records.
  4. We are reviewing and testing all of the designed safety release valves in our fileserver management systems and routing layers. During the failure, when fileservers became so exhausted that the routing layer failed due to back pressure, we should have seen several protective measures kick in to automatically remove these servers from service. These mechanisms did not fire as designed and need to be revisited.
  5. We are implementing process accounting controls to appropriately limit the resources consumed by our application processes. Specifically, we are testing Linux cgroups to further isolate application processes from administrative system functionality. In the event of a similar event in the future, this should allow us to restore full access much more quickly.
  6. We are reviewing the code deployed to our fileservers to analyze for tight dependencies on DNS. We reviewed the DNS time-outs on our fileservers and found that DNS requests should have timed out after 1 second and retried only 2 times in total. This analysis, along with the cgroup implementation, should provide a better barrier to avoid runaway processes in the first place, and a safety valve to manage them if processing becomes unruly in the future.

Summary

We realize that GitHub is an important part of your development and workflow. Again, I would like to take a moment to apologize for the impact that this outage had on your operations. We take great pride in providing the best possible service quality to our customers. Occasionally, we run into problems as detailed above. These incidents further drive us to continually improve the quality of our own internal operations and ensure that we are living up to the trust you have placed in us. We are working diligently to provide you with a stable, fast, and pleasant GitHub experience. Thank you for your continual support of GitHub!

Optimizing large selector sets

CSS selectors are to frontend development as SQL statements are to the backend. Aside from their origin in CSS, we use them all over our JavaScript. Importantly, selectors are declarative, which makes them prime candidates for optimizations.

Browsers have a number of ways of dealing with parsing, processing, and matching large numbers of CSS selectors. Modern web apps are now using thousands of selectors in their stylesheets. In order to calculate the styles of a single element, a huge number of CSS rules need to be considered. Browsers don't just iterate over every selector and test it. That would be way too slow.

Most browsers implement some kind of grouping data structure to sort out obvious rules that would not match. In WebKit, it's called a RuleSet.

SelectorSet

SelectorSet is a JavaScript implementation of the grouping technique browsers already use. If you have a set of selectors known upfront, it makes matching and querying elements against that set of selectors much more efficient.

Selectors added to the set are quickly analyzed and indexed under a key. This key is derived from a significant part of the rightmost side of the selector. If the selector targets an id, the id name is used as the key. If there's a class, the class name is used, and so forth. The selector is then put into a map indexed by this key. Looking up a key is constant time.
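A minimal sketch of that indexing step might look like the following; this is purely illustrative, not SelectorSet's actual source:

// Derive a key from the right-most compound selector and file the selector
// under it. Illustrative sketch only.
var index = {};

function keyFor(selector) {
  var last = selector.split(/[\s>+~]+/).pop(); // right-most compound selector
  var id = last.match(/#([\w-]+)/);
  if (id) return '#' + id[1];
  var cls = last.match(/\.([\w-]+)/);
  if (cls) return '.' + cls[1];
  var tag = last.match(/^[\w-]+/);
  return tag ? tag[0].toLowerCase() : '*';
}

function add(selector) {
  var key = keyFor(selector);
  (index[key] = index[key] || []).push(selector);
}

add('.js-menu-target');    // indexed under '.js-menu-target'
add('#site-nav .octicon'); // indexed under '.octicon'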

When it's time to match an element against the group, the element's properties are examined for possible keys. These keys are then looked up in the map, which returns a much smaller set of selectors; only those candidates undergo a full matches test against the element.
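Continuing the same sketch, the lookup side gathers candidate keys from the element and only runs the expensive match against that small candidate set:

// Gather candidate keys from the element, then run a full matches() test
// (or a prefixed matchesSelector in older browsers) on the candidates only.
function matchingSelectors(el) {
  var keys = ['*', el.tagName.toLowerCase()];
  if (el.id) keys.push('#' + el.id);
  for (var i = 0; i < el.classList.length; i++) {
    keys.push('.' + el.classList[i]);
  }

  var results = [];
  keys.forEach(function (key) {
    (index[key] || []).forEach(function (selector) {
      if (el.matches(selector)) results.push(selector);
    });
  });
  return results;
}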

Speeding up document delegated events

jQuery’s original $.fn.live function (and its modern form, $.fn.on) are probably the most well known delegation APIs. The main advantage of using the delegated event handler over a directly bound one is that new elements added after DOMContentLoaded will trigger the handler. A technique like this is essential when using a pattern such as pjax, where the entire page never fully reloads.
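For instance, a handler bound once on document keeps working for matching elements injected long after the initial page load; the .js-expand-context class below is a hypothetical example, not one of our real selectors:

// Delegated handler bound once on document; elements matching the selector
// that are added later (e.g. by pjax) still trigger it.
$(document).on('click', '.js-expand-context', function (event) {
  event.preventDefault();
  $(this).closest('.context').toggleClass('open');
});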

Extensive use of document delegated event handlers is considered controversial. This includes applications with a large number of $('.foo').live('click') or $(document).on('click', '.foo') registrations. The common performance argument is that the selector has to be matched against the entire ancestor chain of the event target. On an application with a large, deeply nested DOM like github.com, this chain could be as deep as 15 elements. However, depth is likely not the most significant factor; the real cost appears when the number of delegated selectors themselves is large. GitHub has 100+ and Basecamp has 300+ document delegated events.

Using the selector set technique described above, installing this jQuery patch could massively speed up your app's event dispatch. Here's a fun little jsPerf test using real GitHub selectors and markup to demonstrate how much faster the patched jQuery is.

Conclusion

Both of these libraries should be unnecessary and hopefully obsoleted by browsers someday. Browsers already implement techniques like this to process CSS styles efficiently. It's still unfortunate we have no native implementation of declarative event handlers, even though people have been doing this since 2006.

References

Video from Passion Projects Talk #7 with Jen Myers

Jen Myers joined us in December of 2013 for the 7th installment of our Passion Projects talk series. Jen taught us the importance of not being an expert and how to be responsible for our own learning and personal and professional growth. Check out the full video of her talk and our panel discussion below.

Photos from the event


Improving our SSL setup

As we announced previously, we've improved our SSL setup by deploying forward secrecy and improving the list of supported ciphers. Deploying forward secrecy and up-to-date cipher lists comes with a number of considerations that make doing it properly non-trivial.

This is why we thought it would be worth expanding on the discussions we've had, the choices we've made, and the feedback we've received.

Support newer versions of TLS

A lot of the internet's traffic is still secured by TLS 1.0. This version has been attacked numerous times and also doesn't support newer algorithms that you'd want to deploy.

We were glad that we were already on a recent enough OpenSSL version to support TLS 1.1 and 1.2 as well. If you're looking at improving your SSL setup, making sure you can support TLS 1.2 is the first step you should take, because it makes the other improvements possible. TLS 1.2 is supported in OpenSSL 1.0.1 and newer.

To BEAST or not to BEAST

When the BEAST attack was first published, the recommended way to mitigate this attack vector was to switch to RC4. Since then, though, additional attacks against RC4 have been devised.

This has led more and more people, like the people behind SSL Labs and Mozilla, to recommend moving away from RC4.

Attacks against RC4 will only get better over time, and the vast majority of browsers have implemented client-side protections against BEAST. This is why we have decided to move RC4 to the bottom of our cipher prioritization, keeping it only for backwards compatibility.

The only cipher that is relatively broadly supported and that hasn't been compromised by attacks is AES GCM. This mode of AES doesn't suffer from keystream bias like RC4, or from the attacks on CBC mode that resulted in BEAST and Lucky 13.

Currently AES GCM is supported in Chrome, and it's in the works for other browsers like Firefox. We've given priority to these ciphers and, given our usage patterns, we now see a large majority of our connections secured by this cipher.

Forward secrecy pitfalls

So the recommendations on which ciphers to use are fairly straightforward. But choosing the right ciphers is only one step to ensuring forward secrecy. There are some pitfalls that can cause you to not actually provide any additional security to customers.

In order to explain these potential problems, we first need to introduce the concept of session resumption. Session resumption is a mechanism used to significantly shorten the handshake when a new connection is opened. If a client connects again to the same server, we can do a shorter setup and greatly reduce the time it takes to set up a secure connection.

There are two mechanisms for implementing session resumption: the first uses session IDs, the second uses session tickets.
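Before looking at each mechanism, here is one way to observe resumption from the client side, sketched with Node's tls module; the connection details are illustrative:

// Connect once, save the session handed back by the server, then reconnect
// offering that session and check whether it was reused.
var tls = require('tls');

var options = { host: 'github.com', port: 443, servername: 'github.com' };

var first = tls.connect(options, function () {
  var session = first.getSession(); // opaque session state from the server
  first.end();

  var second = tls.connect({
    host: options.host,
    port: options.port,
    servername: options.servername,
    session: session
  }, function () {
    console.log('session reused:', second.isSessionReused());
    second.end();
  });
});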

SSL Session IDs

Using session IDs means that the server keeps track of state and if a client reconnects with a session ID the server has given out, it can reuse the existing state it tracked there. Let's see how that looks when we connect to a server supporting session IDs.

openssl s_client -tls1_2 -connect github.com:443 < /dev/null

...

New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-GCM-SHA384
Server public key is 2048 bit
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : ECDHE-RSA-AES256-GCM-SHA384
    Session-ID: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX

...

What you can see here is that the server hands out a Session-ID that the client can present when it reconnects. The downside, of course, is that the server needs to keep track of this state.

This state tracking also means that if your site has multiple front ends for SSL termination, you might not get the benefits you expect. If a client ends up on a different front end the second time, that front end doesn't know about the session ID and will have to set up a completely new connection.

SSL Session tickets

SSL session tickets are described in RFC 5077 and provide a mechanism that avoids keeping this state on the server.

This mechanism works by having the server encrypt the session state and hand it to the client. This means the server doesn't have to keep track of all this state in memory. It does mean, however, that the key used to encrypt session tickets needs to be tracked server-side. This is how it looks when we connect to a server supporting session tickets.


New, TLSv1/SSLv3, Cipher is ECDHE-RSA-AES256-GCM-SHA384
Server public key is 2048 bit
Secure Renegotiation IS NOT supported
Compression: NONE
Expansion: NONE
SSL-Session:
    Protocol  : TLSv1.2
    Cipher    : ECDHE-RSA-AES256-GCM-SHA384
    ...
    TLS session ticket:
    0000 - XX XX XX XX XX XX XX XX-XX XX XX XX XX XX XX XX   ................
    0010 - XX XX XX XX XX XX XX XX-XX XX XX XX XX XX XX XX   ................
    0020 - XX XX XX XX XX XX XX XX-XX XX XX XX XX XX XX XX   ................
    0030 - XX XX XX XX XX XX XX XX-XX XX XX XX XX XX XX XX   ................
    0040 - XX XX XX XX XX XX XX XX-XX XX XX XX XX XX XX XX   ................
    0050 - XX XX XX XX XX XX XX XX-XX XX XX XX XX XX XX XX   ................
    0060 - XX XX XX XX XX XX XX XX-XX XX XX XX XX XX XX XX   ................
    0070 - XX XX XX XX XX XX XX XX-XX XX XX XX XX XX XX XX   ................
    0080 - XX XX XX XX XX XX XX XX-XX XX XX XX XX XX XX XX   ................
    0090 - XX XX XX XX XX XX XX XX-XX XX XX XX XX XX XX XX   ................

With a shared session ticket key, it is possible to use this ticket across multiple front ends. This way you can get the performance benefits of session resumption even across different servers. If you don't share the ticket key, it has the same performance benefits as using session IDs.

How this applies to GitHub

Not carefully considering the session resumption mechanism can lead to not getting the benefits of forward secrecy. If you keep track of the state for too long, it can be used to decrypt prior sessions, even when deploying forward secrecy.

This is described well by Adam Langley on his blog. Twitter also did a technical deep dive describing how they deployed a setup with sharing the session ticket key.

So we had to decide whether developing a secure means of sharing ticket keys (à la Twitter) was necessary to maintain acceptable performance given our current traffic patterns. We found that clients usually end up on the same load balancer when they make a new connection shortly after a previous one. As a result, we decided that we can rely on session IDs as our resumption mechanism and still maintain a sufficient level of performance for clients.

This is also where we got tripped up. We currently use HAProxy for SSL termination, which ends up using the default OpenSSL settings if you don't specify any additional options. This means that both session IDs and session tickets are enabled by default.

The problem here lies with session tickets being enabled. Even though we didn't set up sharing of the key across servers, HAProxy still uses an in-memory key to encrypt session tickets. This encryption key is initialized when the process starts up and stays the same for the process lifetime.

This means that if an HAProxy process runs for a long time, an attacker who obtains the session ticket key can decrypt traffic from any prior session whose ticket was encrypted using that key. This of course doesn't provide the forward secrecy properties we were aiming for.

Session IDs don't have this problem, since they have a lifetime of 5 minutes (on our platform), making the window for this attack only 5 minutes wide instead of the entire process lifetime.

Given that session tickets don't provide any additional value for us at this point, we decided to disable them and only rely on session IDs. This way we get the benefits of forward secrecy while also maintaining an acceptable level of performance for clients.

Acknowledgements

We would like to thank Jeff Hodges for reaching out to us and pointing us at what we'd missed in our initial setup.

Introducing Forward Secrecy and Authenticated Encryption Ciphers

As of yesterday we've updated our SSL setup on the systems that serve traffic for GitHub. The changes introduce support for Forward Secrecy and Authenticated Encryption Ciphers.

So what is Forward Secrecy? The EFF provides a good explanation of what it is and why it is important. Authenticated Encryption means that we provide ciphers that are much less vulnerable to attacks. These are already supported in Chrome.

Also check SSL Labs if you want to know more details of the setup we've deployed.

Since this article was published, we've also written a more extensive post on what we've done.

The Ghost of Issues Past

The end of the year is fast approaching, and this is a good time to review open issues from long ago. A great way to find older issues and pull requests is our wonderful search system. Here are a few examples:

That last group, the ones not touched in the past year, should probably just be closed. If it's remained untouched in 2013, it probably won't be touched in 2014. There are 563,600 open issues across GitHub that have not been touched in the past year.

So go forth and close with impunity!

Join our Octostudy!

There are a lot of interesting people on GitHub today. Since we can't meet everyone at a conference, drinkup, or charity dodgeball game, we are hoping you can tell us a little more about yourself.

Please take a minute to fill out this short survey. You'll be helping us learn how we can make GitHub even better for you.


Cheers & Octocats!

(Also: tell your friends.)

Weak passwords brute forced

Some GitHub user accounts with weak passwords were recently compromised due to a brute force password-guessing attack. I want to take this opportunity to talk about our response to this specific incident and account security in general.

We sent an email to users with compromised accounts letting them know what to do. Their passwords have been reset and personal access tokens, OAuth authorizations, and SSH keys have all been revoked. Affected users will need to create a new, strong password and review their account for any suspicious activity. This investigation is ongoing and we will notify you if at any point we discover unauthorized activity relating to source code or sensitive account information.

Out of an abundance of caution, some user accounts may have been reset even if a strong password was being used. Activity on these accounts showed logins from IP addresses involved in this incident.

The Security History page logs important events involving your account. Even if you had a strong password or GitHub's two-factor authentication enabled, you may still have seen failed attempts to access your account.

This is a great opportunity for you to review your account, ensure that you have a strong password and enable two-factor authentication.

While we aggressively rate-limit login attempts and passwords are stored properly, this incident involved the use of nearly 40K unique IP addresses. These addresses were used to slowly brute force weak passwords or passwords used on multiple sites. We are working on additional rate-limiting measures to address this. In addition, you will no longer be able to log in to GitHub.com with commonly-used weak passwords.

If you have any questions or concerns please let us know.

An African hack trip

GitHub has a long tradition of supporting developer communities throughout the world. We throw drinkups, speak at and sponsor conferences, and host training events in most corners of the globe.

However, with the exception of a few talks and drinkups in Cape Town, we've not yet had much of a chance to see what's going on in the burgeoning sub-Saharan tech scene.

We had heard that several African countries were rapidly growing new and innovative tech communities, but very little is being said about how they operate. So last month @nrrrdcore, @luckiestmonkey and I decided to check it out for ourselves.

It just so happened that a group of like-minded developers and designers from Europe, the AfricaHackTrip, was also planning a very similar trip. So we reached out to them to see if we could tag along and help out in any way. We ended up sponsoring the hackathons and BarCamps they had organised in four African cities and participating in a couple of them as well.

Rwanda

Our first stop was Kigali, Rwanda. Here we joined the AfricaHackTrip halfway into their adventures and took part in their BarCamp and Hackathon at the Office co-working space.


We met a ton of awesome techies and had great discussions about topics ranging from Open Source Software schools to time travel. On the hackathon day we discovered how some Rwandan hardware hackers are using Arduinos to solve rural farming problems and that reliance on decent internet connectivity is a big problem for developers there. Big enough that one group created a hack project that would monitor different Rwandan wifi networks at the same time:


Tanzania

In Dar es Salaam the BarCamp and Hackathon events were held at the awesome Buni co-working space. Across the trip we noticed how technology hubs are working together to create vibrant communities. It wasn't uncommon to see the manager of one hub helping out at events at another.

Dar es Salaam was just as exciting as Kigali. People discussed local online payment systems and hacked on mapping solutions for Tanzanian health initiatives. At a Git & GitHub workshop we held at the Kinu co-working space, 50 people suddenly turned up, excited and ready to learn.


Kenya

Once the AfricaHackTrip events in Kigali and Dar es Salaam had finished, I travelled on to Nairobi to check out the DEMO Africa conference and meet some of the inspiring startups coming out of Africa.

I also spent a day at the iHub, Nairobi's centre for everything tech. Here I met teams from Ushahidi and BRCK, as well as Akirachix – who are taking underprivileged women from Nairobi, teaching them how to code, and mentoring them into programming jobs.

A developing world

We'd like to continue supporting developer communities throughout Africa and the developing world. If you're putting on a conference, hackathon or meet-up and you're in search of a sponsor, please get in touch through our community page. Or, if you're running an innovative tech space or developer-community project, get in touch and we'll see how we can help!

Disabling old IP addresses

We've made some significant upgrades to the network infrastructure powering GitHub, and it's time to turn off some of the old gear. We've updated DNS records to point at our new IP space, but continue to see a steady trickle of requests to IP addresses long since removed from DNS.

On Tuesday, November 5th 2013, at 12pm Pacific Time, we'll stop serving all HTTP, Git, and SSH requests to IP addresses that aren't returned from DNS queries for the following domains:

  • github.com
  • gist.github.com
  • api.github.com
  • raw.github.com
  • ssh.github.com
  • wiki.github.com
  • assets.github.com

This won't affect you if you don't have any /etc/hosts entries for any of the above domains. However, if you've added github.com or any of the listed domains to /etc/hosts over the last few years, you'll need to remove those entries or GitHub will stop working for you next Tuesday at noon. Take a quick look at your /etc/hosts and/or your Puppet/Chef manifests to make sure you're ready to go!

Please note that our DNS servers are configured to automatically return the IP address of a random, healthy load balancer for queries for the above records. If you have an existing /etc/hosts entry, we highly recommend not replacing it, but rather removing it entirely.

Update: If you're on Windows, you'll want to check %SystemRoot%\system32\drivers\etc\hosts for anything matching github.com. If there are no entries there and you're still seeing a warning on GitHub.com, please send your network administrator a link to this blog post!

Modeling your App's User Session

If you've been keeping an eye on your cookies, you may have noticed some recent changes GitHub has made to how we track your session. You shouldn't notice any difference in session behavior (beyond the new ability to revoke sessions), but we'd like to explain what prompted the change.

Replay attacks on stateless session stores have been known and documented for quite some time in Rails and Django. Signed cookie sessions are still incredibly easy to use and scale to high-traffic web apps; you just need to understand their limitations. When implementing authentication, simply storing a user ID in the session cookie leaves you open to replay attacks and provides no means for revocation.

The other option is to switch to persisted storage for sessions, using a database, memcache, or redis. On a high-traffic site this may be a performance concern, since a session may be allocated even for anonymous browsing traffic. Another downside is that there is no clear insight into these sessions: they are stored as serialized objects, so there's no way to query the store to see whether a user has any sessions. It's all abstracted away by Rails.

Hybrid Cookie Store / DB approach

After ruling out Rails' built-in method for DB-backed sessions, we decided that the concept of user sessions ought to be treated as a first-class domain concern: something with a real application API we can query, test, and extend with other app concerns.

The UserSession class is just a normal ActiveRecord class like any other, with no excess Rails abstraction layer around it. We've extended it with concerns such as manual revocation and sudo mode tracking, and with data like IP and user agent to help users identify sessions on the active sessions page.

class UserSession < ActiveRecord::Base
  belongs_to :user

  before_validation :set_unique_key

  # Active sessions: used within the last two weeks and not revoked.
  scope :active, lambda {
    where("accessed_at >= ? AND revoked_at IS NULL", 2.weeks.ago)
  }

  def self.authenticate(key)
    self.active.find_by_key(key)
  end

  def revoke!
    self.revoked_at = Time.now
    save!
  end

  def sudo?
    # Guard against a session that has never entered sudo mode.
    sudo_enabled_at && sudo_enabled_at > 1.hour.ago
  end

  def sudo!
    self.sudo_enabled_at = Time.now
    save!
  end

  def access(request)
    self.accessed_at = Time.now
    self.ip          = request.ip
    self.user_agent  = request.user_agent
    save
  end

  private
    def set_unique_key
      self.key = SecureRandom.urlsafe_base64(32)
    end
end

Staying true to the restful authentication spirit, SessionsController#create creates a new UserSession and SessionsController#destroy deletes it.

A separate cookie called user_session is set, referencing the record's unique random key. Only signed-in users allocate this record. Anonymous traffic to GitHub never creates junk data in our sessions table.

We still have our signed cookie store around as session in our controllers. This handles non-sensitive data like flash notices and multi-step form state. Then we have a separate user_session helper that references the current user’s session record.

This infrastructure change took a few months. For a month, we ran both the old session code path and the new user session path at once. This allowed users to transition over to the new cookie without noticing.

Overall, we are pretty happy with the change. It has made our authentication logic much more clear and explicit. This opens up some new potential now that we have the data on the server.

Git Internals PDF Open Sourced

Over five years ago, shortly after GitHub first launched, Chris pointed out in one of our earliest blog posts the Peepcode PDF on Git internals that I had just written, a 121-page book on Git's inner workings.

Well, today Pluralsight has agreed to open source the book under the Creative Commons Attribution-ShareAlike license, and the source is on GitHub. You can now download and read the book for free. Get it on its GitHub releases page and maybe learn a bit about how Git works under the covers.
