Fastest Thread-safe Singleton in Java

Recently, we looked at different ways to implement a thread-safe “lazily initializing” singleton in Java.

The simplest approach is with ‘synchronized’ — but there were several other suggestions, some right, and some not-so-right (double-checked locking).

One approach, however, was 25 times faster.

Lazy Singleton

Our basic requirement here is for a singleton within the application (perhaps some service) to be lazily initialized. Generally this is useful for services which are costly to set up, and only sometimes needed.

We’re not concerned with setup costs here — those are specific to the service. What we are interested in is the performance cost of accessing that singleton (i.e., getting it in a thread-safe way) once it has been set up.

A Variety of Approaches

This task, while fundamentally simple, is interesting for the number of different ways it can be done.

My first approach was that simple is good — use the feature provided by the language, and synchronize the method. Other people contributed a variety of different approaches — not all of them necessarily ideal or correct!

  • ‘synchronized’ method
  • AtomicReference fast-path before a ‘synchronized’ section
  • AtomicReference with a spinlock
  • double-checked locking  (not reliable in Java)
  • double-checked locking using a ‘volatile’ field

Using a simple ‘synchronized’ method is obviously simplest, and gives good performance.

A read through an AtomicReference is lighter-weight and faster than entering a ‘synchronized’ block, so it can potentially offer some performance benefit — at the cost of complexity. Spinlocks should almost certainly be avoided, though.
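As a rough sketch of the fast-path idea — the class names here are illustrative, and ExpensiveService stands in for whatever costly service is being created:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a costly-to-create service.
class ExpensiveService {}

class AtomicSingleton {
    private static final AtomicReference<ExpensiveService> REF = new AtomicReference<>();

    public static ExpensiveService getInstance() {
        ExpensiveService s = REF.get();      // fast path: a lock-free read
        if (s != null) {
            return s;
        }
        synchronized (AtomicSingleton.class) {
            s = REF.get();                   // re-check under the lock
            if (s == null) {
                s = new ExpensiveService();
                REF.set(s);
            }
            return s;
        }
    }
}
```

Once initialized, every call takes the lock-free fast path; the ‘synchronized’ block is only entered during the initialization race.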

Double-checked locking, as the pattern was originally popularized, is unsafe in Java: another thread can observe the stored reference before the constructor has finished executing. JVM optimizations & code reordering are specified to respect ‘synchronized’ boundaries, but double-checked locking deliberately skips the lock on its fast path. Not recommended. Since Java 5, however, you can use a ‘volatile’ field to get around this problem.
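A sketch of the repaired pattern, assuming Java 5+ memory-model semantics (class and field names are illustrative):

```java
// Double-checked locking made safe by a 'volatile' field: the volatile
// write/read ordering guarantees the constructor has completed before
// another thread can see the reference.
class MySingleton {
    private static volatile MySingleton instance;

    private MySingleton() {}

    public static MySingleton getInstance() {
        MySingleton local = instance;        // one volatile read on the fast path
        if (local == null) {
            synchronized (MySingleton.class) {
                local = instance;            // second check, under the lock
                if (local == null) {
                    instance = local = new MySingleton();
                }
            }
        }
        return local;
    }
}
```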

The ‘Inner Class’ Approach

In Java, class initialization is ‘on-demand’ & performed the first time the class is used. Normally, this underlying behavior is of little interest. But can we use it?

The approach here is to create a ‘holder’ as an inner class, which will statically initialize the singleton.

This pattern is known as the “initialization-on-demand holder” idiom:

public class Example {
    private static class StaticHolder {
        static final MySingleton INSTANCE = new MySingleton();
    }

    public static MySingleton getSingleton() {
        return StaticHolder.INSTANCE;
    }
}
Calling getSingleton() references the inner class, triggering the JVM to load & initialize it. This is thread-safe, since the JVM performs class initialization under a lock.

For subsequent calls, the JVM resolves our already-loaded inner class & returns the existing singleton.  Thus — a cache.

And thanks to the magic of JVM optimizations, a very very efficient one.


Benchmarking

Performance benchmarking in Java is a difficult area — it requires warmup, stable conditions, and care to avoid the JIT optimizing the benchmark away in its entirety.

For this benchmark, we used 20,000 loops of warmup and measured 10 million loops. To prevent our test code from being optimized away, we used our returned singletons (by summing their hash-codes). I’ve updated the table to also show loop & hashcode overheads, and include a per-operation cost to highlight exactly how efficient Java is at optimizing this form.
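A simplified sketch of that measurement idea — not the original harness, and the stub MySingleton here merely stands in for the article’s service class:

```java
// Stub standing in for the article's MySingleton.
class MySingleton {}

// The "initialization-on-demand holder" from the article.
class Example {
    private static class StaticHolder {
        static final MySingleton INSTANCE = new MySingleton();
    }

    public static MySingleton getSingleton() {
        return StaticHolder.INSTANCE;
    }
}

// Sum the singletons' hash codes so the JIT cannot treat the calls
// as dead code and eliminate the loop entirely.
class BenchSketch {
    static long run(int loops) {
        long sum = 0;
        for (int i = 0; i < loops; i++) {
            sum += Example.getSingleton().hashCode();
        }
        return sum;
    }

    public static void main(String[] args) {
        run(20_000);                           // warmup
        long start = System.nanoTime();
        long sum = run(10_000_000);            // measured loops
        long elapsed = System.nanoTime() - start;
        System.out.println((elapsed / 10_000_000.0) + " ns/op (sum=" + sum + ")");
    }
}
```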

The figures:

Technical Approach                          Total Time   Minus Overhead   Per Operation
‘synchronized’ method                       858 ms       834 ms           83.4 ns
double-checked locking, ‘volatile’ field    39.27 ms     15.79 ms         1.58 ns
inner-class static init                     33.4 ms      9.92 ms          0.99 ns
loop & hashcode overhead                    23.48 ms     –                2.35 ns

This is over 25 times faster on our benchmark!

Thanks to the JVM, the inner-class reference, class-loading & thread-safety checks are all JIT’d away. All that is left for the CPU to execute is essentially a memory read from the static field.

On our 2.4 GHz test CPU, the ‘synchronized’ method — without thread contention — cost 206 cycles per access (including loop overhead). By comparison, a loop iteration plus an ‘inner class’ singleton access takes just 8 CPU cycles.

This pattern is singleton-specific, and not really helpful for a map-based cache. But for singleton services — is it fast, or what! Kudos to the JVM developers & those who came up with this technique.

What do you think of this approach? Add your comment now.

- Java Concurrency blog:  Double-checked locking
- Wikipedia: Initialization-on-demand holder idiom

13 thoughts on “Fastest Thread-safe Singleton in Java”

  1. There’s no apostrophe in “its” (when used as the possessive of it), so change:
    “in it’s entirety” to “in its entirety” and “on it’s own” to “on its own”. “It’s” is a contraction of “it is”.

  2. Hey, you said what is right. I am searching for a job, that’s why I’m not concentrating on the blog. Thanks for giving me advice — what you provide is really good.


  3. In Java 5.0 and above it is enough for a singleton to do this:

    public class MySingleton {
        private static final MySingleton INSTANCE = new MySingleton();

        private MySingleton() {
        }

        public static MySingleton getInstance() {
            return INSTANCE;
        }
    }
    Have you tested how fast this implementation is?

    1. Good question, Kovi! Speed will be equivalent, but using the inner class provides the “lazy initialization” pattern. Not using the inner-class “holder” may allow class initialization to be triggered early & lose laziness.

      Spring uses the inner-class strategy quite commonly to break classloading dependencies on, e.g., optional libraries. This is effective at preventing component scans & other early activity from triggering premature loading — so, for lazy use, I definitely prefer the inner “holder” pattern.

  4. Why don’t we go with enums for singletons? I believe for Java 5+ this is the best approach, as mentioned in Effective Java by Joshua Bloch.

    1. An enum on its own is not always lazy. You can use an enum as the inner class, though — but that is equivalent to the “inner class” approach described above.
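One way to sketch that enum-as-inner-holder variant — all class names here are illustrative, with ExpensiveService standing in for the real service:

```java
// Hypothetical stand-in for a costly-to-create service.
class ExpensiveService {}

// The enum constant (and its field) are created when the nested enum
// class initializes, i.e. on first use -- so laziness is preserved.
class EnumHolderExample {
    private enum Holder {
        INSTANCE;
        final ExpensiveService service = new ExpensiveService();
    }

    public static ExpensiveService getService() {
        return Holder.INSTANCE.service;
    }
}
```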

    1. The big conclusion is that the JVM implements the class singleton for us — and is very efficient at doing so.

      Benchmark takes basic microbenchmarking factors into account, though not using that framework. We’ll definitely consider using JMH for future benchmarks. Thanks for the suggestion!
