Battle in the clouds: The NoSQL movement

By Mia Cathell

As consumers, we’ve kept our heads out of the digital cloud. Unknown to most of us, a war is raging overhead over how the data stored there is structured.

We don’t care how, as long as our cloud-native applications keep running faster and better. But a relatively new way to structure data has rocked the computer science world, with major implications for data security. And now that the coronavirus pandemic has shuttered businesses and forced people to work from home, we’re relying on the cloud more than ever.

“[T]he COVID-19 outbreak rapidly pushed demand for cloud services,” Microsoft said in an emailed statement.

For decades, relational databases have been the traditional way to organize data. They store information in tables of rows and columns and are accessed through Structured Query Language, or SQL, a behind-the-scenes language that acts as a middleman, retrieving data in response to queries. As you read this article, you might have a few tabs open for banking, online shopping, and email; those web applications probably use relational databases.


Screenshots of an example database demonstrating how a SELECT statement retrieves all columns (denoted by the asterisk) FROM the People table, keeping only the rows that meet the WHERE condition. Credit: Mia Cathell
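For readers who want to see that kind of query in action, here is a minimal sketch using Python’s built-in sqlite3 module. The People table, its columns, and the WHERE condition (city = 'Boston') are hypothetical, invented for illustration because the screenshots’ actual data isn’t reproduced here; the ORDER BY clause is added only so the output order is predictable.

```python
import sqlite3

# Hypothetical People table, invented to mirror the kind of query in the screenshots.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE People (name TEXT NOT NULL, city TEXT NOT NULL, age INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO People (name, city, age) VALUES (?, ?, ?)",
    [("Ana", "Boston", 34), ("Ben", "Santa Cruz", 27), ("Cleo", "Boston", 41)],
)

# SELECT * returns every column; WHERE keeps only the rows that match.
# ORDER BY name just makes the output order deterministic for this example.
rows = conn.execute(
    "SELECT * FROM People WHERE city = ? ORDER BY name", ("Boston",)
).fetchall()

for row in rows:
    print(row)
# ('Ana', 'Boston', 34)
# ('Cleo', 'Boston', 41)
```

The `?` placeholder lets the database fill in the value safely rather than pasting it directly into the query text.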

Then a new player emerged: non-relational databases, which do not require SQL (pronounced “sequel”) to operate. These cloud-friendly systems are known as NoSQL databases. Because people want information here and now, non-relational databases took center stage.

Leading tech companies have dipped their toes in NoSQL waters. Facebook, Twitter, and Google have adopted NoSQL systems to process enormous amounts of unstructured data.

In 2018, the NoSQL market was valued at about $2.41 billion, and it is projected to reach about $22.09 billion by 2026, according to Allied Market Research.

This critical, data-intensive period accelerated the world’s move to the cloud as people sought to stay connected and hungered for the latest information about a novel virus, Microsoft explained via email.

“From Microsoft’s perspective, we want to make sure that our technologies can help people adapt to this new world of living and learning and working online,” said Mark Russinovich, chief technology officer of Microsoft Azure. Microsoft updated its Azure Cosmos DB database at the Microsoft Build 2020 conference in May.

Couchbase, a self-described early pioneer of the movement, is among the NoSQL champions with a growing cloud footprint. Amid the COVID-19 pandemic, the company announced in May that investors had committed $105 million to fund development of its latest products. Couchbase Cloud debuted in June, moving the beta unveiled earlier this year onto the cloud battleground.

Customers are evolving, too, and moving to the cloud. “The users of these applications expect millisecond response times regardless of where they are or what device they are using,” noted Couchbase spokesperson Christina Knittel.

It’s OK if programming languages are all Greek to you, but NoSQL’s much-debated Achilles’ heel can hurt consumers.

NoSQL software developers have prioritized scalability, a quick business decision to get the most bang for their buck, even when it means cutting programming corners that lead to misconfigured applications and data breaches.

Over the last few years, a string of data breaches has been traced to one type of database: open source NoSQL, most notoriously the MongoDB database program, according to WIRED. Open source code is freely distributed, and anyone can implement it. That means the developers of open source database programs like MongoDB have no control over how users set up and secure their databases.

This gap has led to historic fallout. Memorable database breaches include the 2015 MacKeeper leak, which exposed more than 13 million customers’ usernames, passwords, and other private information. A year later, MacKeeper’s lead security researcher discovered another misconfigured MongoDB database, hosted on Amazon, that contained the full names, addresses, birthdays, and voter registration numbers of all 93.4 million Mexican voters. That same month, hackers stole the user data of 1.1 million people from the dating website BeautifulPeople.com. And these are only the hacks that were made public.

Worse, the attacks haven’t stopped; they’ve evolved. In 2017, an attacker spent more than a week targeting unprotected MongoDB databases, deleting their data files and demanding a Bitcoin ransom worth a few hundred dollars to restore them.

It’s also a tradeoff: NoSQL offers speed and simplicity in exchange for the guarantees of SQL. Relational databases favor accuracy, while non-relational databases favor innovation through flexibility in design.
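To make that tradeoff concrete, the sketch below contrasts the two approaches in Python. The relational half uses the built-in sqlite3 module, whose declared schema rejects an incomplete row. The NoSQL half is only a toy stand-in: a plain list of JSON documents, not the API of MongoDB, Couchbase, or any real product, and the Orders table and its fields are hypothetical. Real document databases add indexing, replication, and often optional validation rules on top of this basic idea.

```python
import json
import sqlite3

# --- Relational side: schema declared up front and enforced by the database ---
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Orders ("
    "id INTEGER PRIMARY KEY, "
    "customer TEXT NOT NULL, "
    "total REAL NOT NULL CHECK (total >= 0))"
)
conn.execute("INSERT INTO Orders (customer, total) VALUES (?, ?)", ("Ana", 19.99))

relational_rejected = False
try:
    # This row omits the required 'total' column, so SQLite refuses it.
    conn.execute("INSERT INTO Orders (customer) VALUES (?)", ("Ben",))
except sqlite3.IntegrityError as err:
    relational_rejected = True
    print("relational store rejected row:", err)

# --- Document side: hypothetical toy stand-in for a NoSQL collection ---
# Each record is a free-form JSON document; nothing checks its shape.
collection = []
collection.append(json.dumps({"customer": "Ana", "total": 19.99}))
collection.append(json.dumps({"customer": "Ben"}))  # missing total: stored anyway
collection.append(
    json.dumps({"customer": "Cleo", "total": 5, "gift_wrap": True})  # new field, no schema change
)

docs = [json.loads(d) for d in collection]
print(len(docs), "documents stored; fields per document:", [sorted(d) for d in docs])
```

The document side accepted a record with no total and another with an extra field, with no schema change at all. The cost is that nothing caught the missing total, so the application itself must check for it; that is the kind of guarantee the relational side enforces automatically.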

"NoSQL is not exactly a subject of study like SQL that you can teach a course on,” said Peter Alvaro, a computer science professor at the University of California, Santa Cruz.

As an academic and a database researcher, Alvaro admitted that he’s a skeptic of the NoSQL movement.

Alvaro described SQL as a general-purpose Swiss Army knife, while NoSQL is about “finding the right tool for the right job.” NoSQL calls for a specialized tool to fit a specific job, like a hammer to pound in a nail.

"Engineers like to build new things more than they like to use existing things," Alvaro stated.

Even though the world already has an abundance of databases, Alvaro said, programmers still face a dilemma: Should they use an existing relational database or invent a new one from scratch?

This new generation of database management won’t be leaving the old relational folk in the dust just yet.

“I do believe that NoSQL — interpreted as ‘relaxing the many rigid restrictions of SQL’ — has a definite future,” said Michael Carey, a database thought leader and the consulting chief architect for Couchbase.

However, SQL will live “forever,” Carey continued. “It has a solid place in the world.”

Alvaro also sees a future in which special-purpose databases live alongside their mature predecessors. So perhaps it isn’t an either-or choice between the two database structures.

Pitting SQL and NoSQL against each other as “enemies” reflects a marketplace mindset held by the sharks that swim and feast on competition. Rather, the two communities have influenced each other, Alvaro explained.

“Even though they were set up as opposites of each other, they have gone on to cross-pollinate each other,” Alvaro said.

As these specialized databases gain features, they begin to resemble SQL systems. Some support SQL-like query languages; others layer SQL on top of their non-relational code, merging into hybrid models.

Alvaro teaches his students that NoSQL is a “historical movement” and “a phenomenon that has changed the course of databases.”

Mia Cathell is a senior at Boston University majoring in journalism with a minor in computer science. Her data-driven work has been published in The Boston Globe. She is a reporter for The Post Millennial, the co-founder of the student-run political news show Government Center, and a producer of the award-winning BUTV10 newscast “Primary Focus.” Follow her on Twitter @MiaCathell or email her at miacat@bu.edu.

This story was produced as part of NASW's David Perlman Summer Mentoring Program, which was launched in 2020 by our Education Committee. Cathell was mentored by Erin Ross.

Hero image by Wynn Pointaux from Pixabay.


