Why and How to Shed Light on Your Agency’s Dark Data

“Dark data,” meaning data that has been stored away in some IT system and left untouched for years, is about to come out of the shadows.

Thanks to big data technologies, it’s become easier than ever to extract new business value from information assets that have been collected, processed and stored away (either because they’d served their purpose or were required to be kept for compliance reasons).

Not using, analyzing, or even accessing that data not only inflates storage costs, it can also leave sensitive information vulnerable to mismanagement.

Could your agency be sitting on a data goldmine – and you don’t even know it?

In The Data Goldmine that’s Hiding in Plain Sight, DLT partner Informatica offers four reasons why organizations struggle to use all the data at their disposal:

  • They don’t realize the data exists – Too often, business teams attempting to answer difficult questions or improve the way they work avoid the challenge of seeking out and analyzing datasets they aren’t familiar with. But if you were to store this data in a way that is accessible to the whole organization, more teams would be empowered to make better-informed decisions and test more hypotheses.
  • They know it exists, but don’t know where – If your organizational structure or data architecture gets in the way, accessing data becomes difficult.
  • It’s too expensive to use the data – The additional costs involved in processing data on legacy systems, or in replicating the data on cheaper hardware using new software frameworks like Hadoop, are often too high for a single project.
  • There are legitimate compliance concerns about some of the data – One of the most important reasons that certain data isn’t made available to analytics tools is the fear of security breaches and concerns about regulatory compliance (such as HIPAA). However, with clearly defined processes and tools in place, the security and in some cases the anonymity of that data can be guaranteed.

So how do agencies shine a light on dark data?

Opening up dark government data without proper controls can have significant impacts – both good and bad. So what’s the best approach?

In his blog, Dark Data in Government: Sounds Sinister, Bobby Caudill, Informatica’s Director of Public Sector Marketing, offers up some initial tips for getting you thinking in the right direction. Here are his suggestions:

  • Begin with the end in mind – identify quantitative business benefits of exposing certain dark data.
  • Determine what’s truly available – perform a discovery project – seek out data hidden in the corners of your agency – databases, documents, operational systems, live streams, logs, etc.
  • Create an extraction plan – determine how you will get access to the data, how often the data updates, and how you will handle varied formats.
  • Ingest the data – transform the data if needed, integrate if needed, capture as much metadata as possible (never assume you won’t need a metadata field, that’s just about the time you will be proven wrong).
  • Govern the data – establish standards for quality, access controls, security protections, semantic consistency, etc. – don’t skimp here, the impact of bad data can never really be quantified.
  • Store it – it’s interesting how often agencies think this is the first step.
  • Get the data ready to be useful to people, tools and applications – think about how to minimize the need for users to manipulate data – reformatting, parsing, filtering, etc. – to better enable self-service.
  • Make it available – at this point, the data should be easily accessible, easily discoverable, easily used by people, tools and applications.
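
As a rough illustration of the discovery step in the list above, here is a minimal Python sketch that walks a file share and records basic metadata for files nobody has changed in years. It is not taken from Informatica’s guidance: the function name find_dark_files, the FileRecord fields and the three-year threshold are all assumptions made for the example, and it uses last-modified time as a stand-in for “untouched” because last-access times are not reliably updated on many systems.

```python
import os
import time
from dataclasses import dataclass
from pathlib import Path

SECONDS_PER_YEAR = 365 * 24 * 60 * 60


@dataclass
class FileRecord:
    """Basic metadata captured for each candidate file (hypothetical schema)."""
    path: str
    size_bytes: int
    extension: str
    modified_epoch: float


def find_dark_files(root, min_age_years=3.0, now=None):
    """Return a FileRecord for every file under root whose last-modified
    time is older than min_age_years (an assumed threshold)."""
    now = time.time() if now is None else now
    cutoff = now - min_age_years * SECONDS_PER_YEAR
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = Path(dirpath) / name
            try:
                info = full.stat()
            except OSError:
                # Unreadable or vanished file: skip it rather than abort the scan.
                continue
            if info.st_mtime < cutoff:
                records.append(
                    FileRecord(
                        path=str(full),
                        size_bytes=info.st_size,
                        extension=full.suffix.lower(),
                        modified_epoch=info.st_mtime,
                    )
                )
    return records
```

Calling find_dark_files on a shared drive path would give a starting inventory (location, size, file type, age) to feed into the extraction plan. Databases, logs and live streams would each need their own discovery approach.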

To learn more about what’s possible when you shine a light on dark data, read The Data Goldmine that’s Hiding in Plain Sight.