Even When the Lights Go Out in Government, Data Never Sleeps

Government shutdowns are a costly business. The 2013 shutdown cost $24 billion in lost economic output, while the 1996 shutdown resulted in $2.1 billion in government costs. We have yet to learn the impact, if any, of the three-day 2018 shutdown.

What we do know, however, is that shutdowns are not universal. For many critical government employees, the lights never go out. Here’s just a short list:

Veterans Affairs (VA) remained operational.

A skeleton staff at the Centers for Disease Control and Prevention (CDC) stayed in place to support its annual seasonal flu program and maintain the agency’s outbreak detection function.

NOAA continued to forecast and monitor weather and the climate, maintain critical charts, and monitor the impacts of the 2010 Deepwater Horizon oil spill, although weather research ceased.

U.S. Geological Survey (USGS) staff who monitor and respond to natural disasters, such as earthquakes, were spared in a bureau that otherwise shuddered to a halt.

IRS employees dealing with the new tax law and beginning to process tax returns remained on the job.

Many of NASA’s Jet Propulsion Lab (JPL) staff were exempt from furlough.

Military operations remained in place and military personnel continued working.

One Commonality Across Shutdown Agencies – Big Data Never Sleeps

Despite their varying missions, these agencies, and others that continue to keep the lights on in certain functions, have one thing in common – they must still collect and attempt to parse and analyze big data. VA gathered patient data across its hospital network, engineers at JPL were monitoring and measuring International Space Station instrument controls, the USGS Earthquake Hazards Program was still collecting earthquake data, the CDC continued to track data on nationwide flu cases, and so on.

Yet big data insights can be a challenge in a shutdown or sequestered environment.

When the lights are off across the rest of the agency, or budgets are squeezed by sequestration, it is essential to streamline connections across applications, databases, APIs, and clouds and to keep that data accessible and relevant to users at any given time.

Shutdown or not, search relevancy is a big problem for many agencies dealing with cumbersome and slow data environments. We all know how good Amazon’s and Google’s search and recommendation engines are, and we expect the same in the workplace. But given how complex government data is, gathered from millions of sensors and potentially hundreds of data systems within each agency, achieving a Google-like experience is a monumental effort.

Another challenge for data analysis is speed. We’ve come to expect search results in 100ms or less, a big ask when dealing with huge data sets. Consider this: the U.S. Census Bureau’s legacy data system took seven seconds to return a data query. When you’re trying to do more with less, that’s an unacceptable delay.

Propel Your Data Insights

But that’s not the situation at the Census Bureau today. With the help of DLT partner Elastic, the Bureau is now using Elasticsearch to easily search and disseminate data.

Elasticsearch is a distributed, RESTful search and analytics engine capable of solving a growing number of use cases. Query structured, unstructured, geo, and metric data with just a simple question, then step back and analyze. It’s one thing to find the 10 best documents to match your query. But how do you make sense of, say, a billion log lines? Elasticsearch aggregations let you zoom out to explore trends and patterns in your data. And it does this without operational overhead.
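To make the "zoom out" idea concrete, here is a minimal sketch of an aggregation request body, built as a plain Python dictionary so it runs without a cluster. The field names (`message`, `@timestamp`, `level`) are hypothetical and not taken from any real index, and the `calendar_interval` parameter assumes Elasticsearch 7.2 or later.

```python
import json


def build_log_trend_query(search_text, top_levels=5):
    """Build a search body that buckets matching log lines by day and by level.

    Field names are illustrative assumptions, not taken from a real index.
    """
    return {
        # size 0: return only aggregation buckets, not individual hits
        "size": 0,
        "query": {"match": {"message": search_text}},
        "aggs": {
            "per_day": {
                # One bucket per calendar day of the (hypothetical) timestamp field
                "date_histogram": {
                    "field": "@timestamp",
                    "calendar_interval": "day",
                },
                "aggs": {
                    # Within each day, the most common values of the level field
                    "by_level": {
                        "terms": {"field": "level", "size": top_levels}
                    }
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_log_trend_query("timeout"), indent=2))
```

A body like this would typically be sent as the JSON payload of a `_search` request against the index that holds the log lines.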

Elasticsearch addressed two key challenges for the Bureau. The first was query times. Thanks to query matching, response times improved by ~25% to reach the Bureau’s 100ms goal. The second was the relevancy of search results. The Bureau’s legacy system was a minefield of mismatched results. For example, a search for “VA” would fail to distinguish between “Veterans Affairs” and “Virginia”. Thanks to the powerful scoring infrastructure available in Elasticsearch, search results are now far more relevant.
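One way the “VA” ambiguity could be approached is with field boosts and an optional scoring clause. The sketch below is an illustration only, not the Bureau’s actual implementation: the field names (`title`, `abbreviation`, `body`, `topic`) and the boost values are hypothetical.

```python
def build_boosted_query(user_text, preferred_topic=None):
    """Build a bool query that weights some fields above others.

    All field names and boost values here are hypothetical.
    """
    query = {
        "bool": {
            "must": [
                {
                    "multi_match": {
                        "query": user_text,
                        # field^N multiplies that field's contribution to the score
                        "fields": ["title^3", "abbreviation^2", "body"],
                    }
                }
            ]
        }
    }
    if preferred_topic is not None:
        # Alongside a must clause, a should clause is optional: documents that
        # match it score higher, documents that do not are still returned.
        query["bool"]["should"] = [
            {"term": {"topic": {"value": preferred_topic, "boost": 2.0}}}
        ]
    return {"query": query}


# A search for "VA" from a geography page could favor Virginia results
geo_search = build_boosted_query("VA", preferred_topic="geography")
```

The design choice here is to nudge ranking rather than filter: a user who really wants Veterans Affairs still sees those documents, just lower in the list when the context suggests geography.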

Checking the Scalability Box

With such huge data sets and geographic reach, the Census Bureau also needed a solution that would scale easily. Elasticsearch checked this box too. Agencies can go from prototype to production seamlessly; you talk to Elasticsearch running on a single node the same way you would in a 300-node cluster. Elasticsearch can scale horizontally to handle very large volumes of events per second, while automatically managing how indices and queries are distributed across the cluster for smooth operations.
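The "same way on one node or 300" point can be sketched by building the REST requests by hand. In this illustrative example the host names, the index name `census-demo`, and the `title` field are all hypothetical; the only things that change between a laptop and a cluster are the base URL and the shard and replica counts chosen when the index is created.

```python
import json


def build_create_index_request(base_url, index_name, shards=1, replicas=0):
    """Return (method, url, body) for creating an index with the given shard counts."""
    body = {
        "settings": {
            "index": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        }
    }
    return "PUT", f"{base_url.rstrip('/')}/{index_name}", json.dumps(body)


def build_search_request(base_url, index_name, text):
    """Return (method, url, body) for a simple match search on a hypothetical field."""
    body = {"query": {"match": {"title": text}}}
    return "POST", f"{base_url.rstrip('/')}/{index_name}/_search", json.dumps(body)


# Prototype on a single local node (hypothetical address)
dev = build_search_request("http://localhost:9200", "census-demo", "population")
# Production cluster behind a different, equally hypothetical address
prod = build_search_request("https://search.example.internal:9200", "census-demo", "population")

# Same method and same JSON body; only the URL differs
print(dev[0] == prod[0] and dev[2] == prod[2])
```

The client never names individual shards or nodes in the request; distribution across the cluster is handled on the server side.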

Getting Over the Shutdown Crunch with Fast Data Insights

Forward-leaning agencies that are still gathering and analyzing critical data sets, even under the threat of shutdown and sequestration, can learn more about making data accessible and relevant to users in this insightful talk from the team that implemented the Census Bureau’s Elasticsearch platform.