6 Steps to Turn Data into Revenue

Blog Project work

Who’s posting: Stephanie Burton

Which Company: GoodData (Stephanie Burton is a Product Marketing Manager there)

Post was about: This post first explains that companies that utilize business intelligence historically move from "reactive" to "proactive" use of analytics: they start by using data to instigate internal changes that improve company efficiency before eventually launching customer-facing analytics offerings that create growth. It then sets out a recommended step-by-step approach to data monetization based on serving the needs of customers while creating additional revenue streams.

What did I get from the post: When looking to deliver analytics to customers, a company can embed its analytical creations in software-as-a-service (SaaS) products or create company-branded data portals. This provides an add-on service to the customer while also generating an additional revenue stream for the company.


6 Steps to Turn Data into Revenue

Stephanie Burton

Product Marketing Manager

DECEMBER 17, 2014

The impact of analytics is always expanding. In just a few years, companies that once didn’t know what to do with the data they collected about their customers are now utilizing that data to build and deliver customer-facing advanced analytics offerings that have become new streams of revenue.


You could be sitting on treasure troves of data you may not have even noticed before. According to recent research from CITO Research, companies that utilize business intelligence historically move from “reactive” to “proactive” use of analytics—first using data to instigate internal changes that impact company efficiencies before eventually launching customer-facing analytics offerings that create growth.


The 4 Stages of Analytics


Reactive Analytics: In this stage, a company may collect data about how customers use its products, but doesn’t engage customers with this data beyond alerting people to problems.


Descriptive Analytics: By visualizing data, you’ll start to uncover patterns and trends. Through visibility into business operations, you’ll start to understand current situations, see where problems are occurring, and know where to focus your attention.


Diagnostic Analytics: When you start to recognize the value of your data, it’s natural to want to share this value with clients. Whether this happens as an epiphany or a request, this stage leads to ‘productization.’


Proactive Analytics: In this stage, companies evaluate their situation to see what new revenue can be generated by converting an analytics platform into a new product or embedding analytics into existing products.


A Roadmap to Data Monetization — Growing a Cash Crop with Your Data


“It can be difficult to determine which data sets offer the most value to customers,” cautions CITO Research. To help determine which will prove to be the most lucrative, here is a recommended step-by-step approach to data monetization based on serving the needs of customers while creating additional revenue streams.


Step 1: Perform business intelligence internally.

Look at the usage of your systems and products. Create dashboards for internal use so you can analyze who’s using your products and services the most. Determine anomalies and decide how you should change things based on what you are learning.
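The internal analysis this step describes can be sketched in a few lines. As a minimal, hypothetical illustration (the usage figures and the z-score threshold below are my own, not from the post), flagging anomalous usage might look like:

```python
from statistics import mean, stdev

def flag_anomalies(daily_usage, threshold=2.0):
    """Return the indices of days whose usage is more than `threshold`
    standard deviations from the mean -- a crude first pass at anomalies."""
    mu, sigma = mean(daily_usage), stdev(daily_usage)
    return [i for i, v in enumerate(daily_usage) if abs(v - mu) > threshold * sigma]

# Illustrative usage counts per day; day 5 is an obvious spike.
usage = [102, 98, 110, 95, 105, 480, 99, 101]
print(flag_anomalies(usage))  # -> [5]
```

Real dashboards would sit on top of richer queries, but the idea is the same: compute a baseline from your own usage data and surface whatever deviates from it.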


Step 2: Share visibility with your clients and partners.

Embed dashboards and analytics to provide some basic visibility free of charge to customers. This provides visibility into their own usage and helps spark the beginning of data monetization.


Step 3: Add self-service or extensibility.

You may not realize how valuable your data is until you see others using it. Based on your free offerings, customers will begin to ask for new views, new angles, and may ask to white label your dashboards for their own users or stakeholders. Your answer is “Yes, for a fee.”


Step 4: Look at information you can aggregate.

You have benchmarking data that customers want—information on how they are performing compared with others. With no other objective way to gain this type of information, the aggregate data you can provide becomes extremely valuable to customers and partners.
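The benchmarking idea in Step 4 amounts to aggregating a metric across customers so each one can compare itself against the group. A hedged sketch (the customer names and figures are invented for illustration):

```python
from statistics import mean

# Invented per-customer metric, e.g. monthly transactions processed.
usage_by_customer = {"acme": 1200, "globex": 800, "initech": 1000}

benchmark = mean(usage_by_customer.values())  # the anonymised group average

# Each customer sees only its own number plus the aggregate benchmark.
report = {
    name: {
        "own": v,
        "benchmark": benchmark,
        "vs_benchmark_pct": round(100 * (v - benchmark) / benchmark, 1),
    }
    for name, v in usage_by_customer.items()
}
print(report["acme"])  # acme's own figure plus how it compares with the group
```

The point of aggregating is that no single customer could compute this benchmark alone, which is what makes the data product valuable.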


Step 5: Find ways to personalize the data.

The more specific and personalized you can make the analytics you deliver, the better. Consider mixing in external data sources like geodata, address enhancement, machine, weather, demographic, or business data to enrich the data you already have.


Step 6: Keep listening to your customers.

As your customers request new types of analytics, such as churn analysis, internal activity reports, or longer views of historical information, you’ll get ideas for additional data products.


If you’re looking to deliver analytics to your customers, you can embed your analytical creations in software-as-a-service products or create company-branded data portals. The Ultimate Guide to Embedded Analytics provides a plan for taking the GoodData platform and building amazing data products that will solve mission-critical business problems for your users.

The Data Science Puzzle, Explained

Blog Project work

Who’s posting: Matthew Mayo, KDnuggets

Which Company: N/A

Post was about: An opinion piece examining the data science field through broad definitions of its key concepts: Data Science, Big Data, Data Mining, Machine Learning, Artificial Intelligence, and Deep Learning.


What did I get from the post: This post enabled me to better visualise the interdependencies (and distinctions) between Data Science, Big Data, Data Mining, Machine Learning, Artificial Intelligence, and Deep Learning, and how the processes and tools associated with each assist in the extraction of potentially valuable patterns held within datasets.

The Data Science Puzzle, Explained

The puzzle of data science is examined through the relationship between several key concepts in the data science realm. As we will see, far from being concrete concepts etched in stone, divergent opinions are inevitable; this is but another opinion to consider.

By Matthew Mayo, KDnuggets.


There is no dearth of articles around the web comparing and contrasting data science terminology. There are all sorts of articles written by all types of people relaying their opinions to anyone who will listen. It’s almost overwhelming.

So let me set the record straight, for those wondering if this is one of those types of posts. Yes. Yes it is.

Why another one? I think that, while there may be an awful lot of opinion pieces defining and comparing these related terms, the fact is that much of this terminology is fluid, is not entirely agreed-upon, and, frankly, being exposed to other people’s views is one of the best ways to test and refine your own.

So, while one may not agree entirely (or even minimally) with my opinion on much of this terminology, there may still be something one can get out of this. Several concepts central to data science will be examined. Or, at least, central in my opinion. I will do my best to put forth how they relate to one another and how they fit together as individual pieces of a larger puzzle.

As an example of somewhat divergent opinions, and prior to considering any of the concepts individually, KDnuggets’ Gregory Piatetsky-Shapiro has put together the following Venn diagram which outlines the relationship between the very same data science terminology we will be considering herein. The reader is encouraged to compare this Venn diagram with Drew Conway’s now famous data science Venn diagram, as well as my own discussion below and modified process/relationship diagram near the bottom of the post. I think that, while differences exist, the concepts line up with some degree of similarity (see the previous few paragraphs).

We will now give treatment to the same 6 selected core concepts as depicted in the above Venn diagram, and provide some insight as to how they fit together into a data science puzzle. First, we quickly dispense with one of the biggest buzz terms of the past decade.

Big Data

There are all sorts of articles available defining big data, and I won’t spend much time on this concept here. I will simply state that big data could very generally be defined as datasets of a size “beyond the ability of commonly used software tools to capture, manage, and process.” Big data is a moving target; this definition is both vague and accurate enough to capture its central characteristic.

As for the remaining concepts we will investigate, it’s good to gain some initial understanding of their search term popularities and N-gram frequencies, in order to help separate the hard fact from the hype. Given that a pair of these concepts are relatively new, the N-gram frequencies for our ‘older’ concepts from 1980 to 2008 are shown above.


The more recent Google Trends show the rise of 2 new terms, the continued upward trend of 2 others, and the gradual, but noticeable, decline of the last. Note that big data was not included in the above graphics due to it already being quantitatively analyzed to death. Read on for further insights into the observations.

Machine Learning

According to Tom Mitchell in his seminal book on the subject, machine learning is “concerned with the question of how to construct computer programs that automatically improve with experience.” Machine learning is interdisciplinary in nature, and employs techniques from the fields of computer science, statistics, and artificial intelligence, among others. The main artifacts of machine learning research are algorithms which facilitate this automatic improvement from experience, algorithms which can be applied in a variety of diverse fields.

I don’t think there is anyone who would doubt that machine learning is a central aspect of data science. I give the term data science detailed treatment below, but if you consider that at a very high level its goal is to extract insight from data, machine learning is the engine which allows this process to be automated. Machine learning has a lot in common with classical statistics, in that it uses samples to infer and make generalizations. Where statistics has more of a focus on the descriptive (though it can, by extrapolation, be predictive), machine learning has very little concern with the descriptive, and employs it only as an intermediate step in order to be able to make predictions. Machine learning is often thought to be synonymous with pattern recognition; while that really won’t get much disagreement from me, I believe that the term pattern recognition implies a much less sophisticated and more simplistic set of processes than machine learning actually is, which is why I tend to shy away from it.
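To make “improving with experience” concrete, here is a toy one-nearest-neighbour classifier (my own illustration, not from the post): its predictions generalise directly from the labelled samples it has seen, and more samples generally mean better generalisation.

```python
def nearest_neighbour(train, query):
    """Predict the label of `query` from labelled 1-D samples.
    `train` is a list of (value, label) pairs; the prediction is simply
    the label of the closest value seen so far."""
    value, label = min(train, key=lambda pair: abs(pair[0] - query))
    return label

# Toy 'experience': small values labelled 'low', large ones 'high'.
train = [(1.0, "low"), (2.0, "low"), (9.0, "high"), (10.0, "high")]
print(nearest_neighbour(train, 1.5))  # -> low
print(nearest_neighbour(train, 8.0))  # -> high
```

Adding more (value, label) pairs is exactly the “experience” Mitchell’s definition refers to: the program’s behaviour changes, and usually improves, without its code changing.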

Machine learning has a complex relationship with data mining

Data Mining

Fayyad, Piatetsky-Shapiro & Smyth define data mining as “the application of specific algorithms for extracting patterns from data.” This demonstrates that, in data mining, the emphasis is on the application of algorithms, as opposed to on the algorithms themselves. We can define the relationship between machine learning and data mining as follows: data mining is a process, during which machine learning algorithms are utilized as tools to extract potentially-valuable patterns held within datasets.
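As a toy illustration of “applying specific algorithms for extracting patterns from data” (my example, not the authors’), counting frequent item pairs across transactions is about the simplest data-mining procedure there is:

```python
from itertools import combinations
from collections import Counter

def frequent_pairs(transactions, min_support=2):
    """Count co-occurring item pairs and keep those appearing in at least
    `min_support` transactions -- the core move of market-basket mining."""
    counts = Counter()
    for basket in transactions:
        counts.update(combinations(sorted(set(basket)), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}

baskets = [["bread", "milk"], ["bread", "milk", "eggs"], ["eggs", "milk"]]
print(frequent_pairs(baskets))  # ('bread', 'milk') and ('eggs', 'milk') each appear twice
```

The emphasis, as the definition says, is on the application: the same counting algorithm is only useful once it is pointed at real transaction data and the surviving pairs are interpreted.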

Data mining, as a sister term of machine learning, is also critical to data science. Before the explosion of the term data science, in fact, data mining enjoyed much greater success as a Google search term. Having a look at Google Trends stretching back a further 5 years than those shown in the above graphic, data mining was once much more popular. Today, however, data mining seems to be split as a concept between machine learning and data science itself. If one were to endorse the above explanation, that data mining is a process, then it makes sense to view data science as both a superset of data mining as well as a successor term.

Deep Learning

Deep learning is a relatively new term, although it has existed prior to the dramatic uptick in online searches of late. Enjoying a surge in research and industry, due mainly to its incredible successes in a number of different areas, deep learning is the process of applying deep neural network technologies – that is, neural network architectures with multiple hidden layers – to solve problems. Deep learning is a process, like data mining, which employs deep neural network architectures, which are particular types of machine learning algorithms.
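The “multiple hidden layers” idea can be sketched as a forward pass through stacked layers. This is purely illustrative: the weights below are arbitrary made-up numbers and no training is shown.

```python
def relu(xs):
    """The standard rectifier non-linearity, applied elementwise."""
    return [max(0.0, v) for v in xs]

def layer(inputs, weights, biases):
    """One dense layer: a weighted sum plus bias for each output neuron."""
    return [sum(w * i for w, i in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def forward(x, layers):
    """A 'deep' network is just several hidden layers applied in sequence."""
    for weights, biases in layers:
        x = relu(layer(x, weights, biases))
    return x

# Two tiny hidden layers with arbitrary weights.
net = [
    ([[0.5, -0.2], [0.1, 0.8]], [0.0, 0.1]),  # hidden layer 1: 2 in -> 2 out
    ([[1.0, -1.0]], [0.0]),                   # hidden layer 2: 2 in -> 1 out
]
print(forward([1.0, 2.0], net))
```

Real deep learning frameworks add training (backpropagation), many more neurons, and specialised layer types, but the layered structure is the defining feature the term refers to.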

Deep learning has racked up an impressive collection of accomplishments of late. In light of this, it’s important to keep a few things in mind, at least in my opinion:

  • Deep learning is not a panacea – it is not an easy one-size-fits-all solution to every problem out there
  • It is not the fabled master algorithm – deep learning will not displace all other machine learning algorithms and data science techniques, or, at the very least, it has not yet proven so
  • Tempered expectations are necessary – while great strides have recently been made in all types of classification problems, notably computer vision and natural language processing, as well as reinforcement learning and other areas, contemporary deep learning does not scale to working on very complex problems such as “solve world peace”
  • Deep learning and artificial intelligence are not synonymous

Deep learning can provide an awful lot to data science in the form of additional processes and tools to help solve problems, and when observed in that light, deep learning is a very valuable addition to the data science landscape.

Artificial Intelligence

Most people find a precise, and oftentimes even a broad, definition of artificial intelligence difficult to put their finger on. I am not an artificial intelligence researcher, and so my answer here may wildly differ from someone who is, or may even upset folks in other fields. I have philosophized on the idea of AI a lot over the years, and I have come to the conclusion that artificial intelligence, at least the concept of it which we generally think of when we do think of it, does not actually exist.

In my opinion, AI is a yardstick, a moving target, an unattainable goal. Whenever we get on a path toward AI achievements, somehow these accomplishments seem to morph into being referred to as something else.

I once read something like the following: if you had asked an AI researcher in the 1960s for their idea of AI, they would probably have agreed that a small device that fit in our pockets, could help anticipate our next moves and desires, and had the entirety of human knowledge readily available at will was true AI. But we all carry smartphones today, and very few of us would refer to them as artificial intelligence.

Where does AI fit into data science? Well, as I have stated that I don’t believe that AI is really anything tangible, I guess it’s hard to say that it fits in anywhere. But there are a number of areas related to data science and machine learning where AI has provided motivation, which at times is just as valuable as the tangible; computer vision certainly comes to mind, as does contemporary deep learning research, which have both benefited from the Artificial Intelligence Ethos at some point, if not indefinitely.

AI may well be the research and development apparatus with the deepest pockets which never actually produces anything in its namesake industry. While I would say that drawing a straight line from AI to data science may not be the best way to view the relationship between the 2, many of the intermediary steps between the 2 entities have been developed and refined by AI in some form.

Data Science

So, after discussing these related concepts and their place within data science, what exactly is data science? To me, this is the toughest concept of the lot to attempt to define precisely. Data science is a multifaceted discipline which encompasses machine learning and other analytic processes, statistics and related branches of mathematics, and increasingly borrows from high performance scientific computing, all in order to ultimately extract insight from data and use this new-found information to tell stories. These stories are often accompanied by pictures (we call them visualizations), and are aimed at industry, research, or even just at ourselves, with the purpose of gleaning some new idea from The Data.

Data science employs all sorts of different tools from a variety of related areas (see everything you’ve read above here). Data science is treated both as synonymous with data mining and as a superset of concepts that includes data mining.

Data science yields all sorts of different outcomes, but they all share the common aspect of insight. Data science is all of this and more, and to you it may be something else completely… and we haven’t even covered acquiring, cleaning, wrangling, and pre-processing data yet! And by the way, what even is data? And is it always big?

I think my idea of the data science puzzle, at least the version of it which can be represented by the above diagram, jibes well with Piatetsky-Shapiro’s Venn diagram at the top of this post. I would also suggest that it is mostly in agreement with Drew Conway’s data science Venn diagram, though I would add one caveat: I believe his very well-reasoned and useful graphic is actually referring to data scientists, as opposed to data science. This may be splitting hairs, but I don’t think the { field | discipline | concept } of data science, itself, encompasses hacking skills; I believe this is a skill that scientists possess in order to allow them to do data science. Admittedly, this may be quibbling over semantics, but it makes sense in my mind.

Of course, this is not a full picture of the landscape, which is constantly evolving. For example, I recall reading, not very long ago, that data mining was a sub-field of business intelligence! Even with differences in opinions, I really can’t imagine this being a valid idea today (it was difficult to accept a few years ago, to be honest).

And there you have it: some of your favorite terms bent out of shape in new ways you won’t forgive me for. If you’re furious right now and can’t wait to tell me how wrong I am, remember the point of this post: you have just read one man’s opinion. In that spirit, feel free to sound off in the comments with your (potentially heated and sharp) contrasting views. Otherwise, I hope that this has either exposed new readers to the puzzle which is data science or forced them to look at their own version of this puzzle in their heads.

Top 10 Data Analysis Tools for Business

Blog Project work

Who’s posting: Alex Jones

Which Company: N/A (Alex Jones is a graduate student in Business Analytics)

Post was about: List of what Alex Jones considers the Top 10 Data analysis tools for Business. He chose these because of their free availability (for personal use), ease of use (no coding and intuitively designed), powerful capabilities (beyond basic excel), and well-documented resources 

What did I get from the post: This piece presented me with a broad list of the various tools currently in use in the data analysis field (some I was aware of, some not). It presented a high-level summary of the strengths and weaknesses of these tools, which I found informative, and it made me curious to investigate some of these tools in more detail.

Top 10 Data Analysis Tools for Business

Ten free, easy-to-use, and powerful tools to help you analyze and visualize data, analyze social networks, do optimization, search more efficiently, and solve your data analysis problems.


By Alex Jones, June 2014.

As a graduate student in Business Analytics, I have worked the better part of a year to become a predictive analytics architect. While the skills I have developed have been invaluable, taking a year of computer science, advanced mathematics, engineering and business classes is simply not feasible for most people.

Although the challenge of collecting and analyzing “Big Data” requires some complex and technical solutions, the fact is that most businesses do not realize what they are currently capable of.

Specifically, there are a number of exceptionally powerful analytical tools that are free and open source that you can leverage today to enhance your business and develop skills that can genuinely propel your career.

Rather than just leave you to navigate the frightening and giant world of IT tools and software, I have put together a list of what I see as the Top 10 Data Analysis Tools for Business. I picked these because of their free availability (for personal use), ease of use (no coding and intuitively designed), powerful capabilities (beyond basic Excel), and well-documented resources (if you get stuck, you can Google your way through).

  1. Tableau Public: Tableau democratizes visualization in an elegantly simple and intuitive tool. It is exceptionally powerful in business because it communicates insights through data visualization. Although great alternatives exist, Tableau Public’s million row limit provides a great playground for personal use and the free trial is more than long enough to get you hooked. In the analytics process, Tableau’s visuals allow you to quickly investigate a hypothesis, sanity check your gut, and just go explore the data before embarking on a treacherous statistical journey.
  2. OpenRefine: Formerly Google Refine, OpenRefine is data cleaning software that allows you to get everything ready for analysis. What do I mean by that? Well, let’s look at an example. Recently, I was cleaning up a database that included chemical names and noticed that rows had different spellings, capitalization, spaces, etc., that made it very difficult for a computer to process. Fortunately, OpenRefine contains a number of clustering algorithms (which group together similar entries) and makes quick work of an otherwise messy problem.
    **Tip- Increase Java Heap Space to run large files (Google the tip for exact instructions!)
  3. KNIME: KNIME allows you to manipulate, analyze, and model data in an incredibly intuitive way through visual programming. Essentially, rather than writing blocks of code, you drop nodes onto a canvas and drag connection points between activities. More importantly, KNIME can be extended to run R, Python, text mining, chemistry data, etc., which gives you the option to dabble in more advanced, code-driven analysis.
    **TIP- Use “File Reader” instead of CSV reader for CSV files. Strange quirk of the software.
  4. RapidMiner: Much like KNIME, RapidMiner operates through visual programming and is capable of manipulating, analyzing and modeling data. Most recently, RapidMiner won the KDnuggets software poll, demonstrating that data science does not need to be a counter-intuitive coding endeavor.
  5. Google Fusion Tables: Meet Google Spreadsheets’ cooler, larger, and much nerdier cousin. Google Fusion Tables is an incredible tool for data analysis, large-dataset visualization, and mapping. Not surprisingly, Google’s incredible mapping software plays a big role in pushing this tool onto the list. Take for instance this map, which I made to look at oil production platforms in the Gulf of Mexico. With just a quick upload, Google Fusion Tables recognized the latitude and longitude data and got to work.
  6. NodeXL: NodeXL is visualization and analysis software for networks and relationships. Think of the giant friendship maps you see representing LinkedIn or Facebook connections. NodeXL takes that a step further by providing exact calculations. If you’re looking for something a little less advanced, check out the node graph on Google Fusion Tables, or for a little more visualization try out Gephi.
  7. Import.io: Web scraping and pulling information off of websites used to be something reserved for the nerds. Now with Import.io, everyone can harvest data from websites and forums. Simply highlight what you want and in a matter of minutes Import.io walks you through and “learns” what you are looking for. From there, Import.io will dig, scrape, and pull data for you to analyze or export.
  8. Google Search Operators: Google is an undeniably powerful resource and search operators just take it a step up. Operators essentially allow you to quickly filter Google results to get to the most useful and relevant information. For instance, say you’re looking for a Data science report published this year from ABC Consulting. If we presume that the report will be in PDF we can search

“Data Science Report” site:ABCConsulting.com filetype:pdf
then, underneath the search bar, use the “Search Tools” to limit the results to the past year. The operators can be even more useful for discovering new information or market research.

  9. Solver: Solver is an optimization and linear programming tool in Excel that allows you to set constraints (don’t spend more than this many dollars, be completed in that many days, etc.). Although advanced optimization may be better suited for another program (such as R’s optim package), Solver will make quick work of a wide range of problems.
  10. WolframAlpha: Wolfram Alpha’s search engine is one of the web’s hidden gems and helps to power Apple’s Siri. Beyond snarky remarks, Wolfram Alpha is the nerdy Google: it provides detailed responses to technical searches and makes quick work of calculus homework. For business users, it presents information in charts and graphs, and is excellent for high-level pricing history, commodity information, and topic overviews.
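The clustering idea behind the OpenRefine entry above can be approximated in a few lines with a “fingerprint” key (lowercase, strip punctuation, sort the tokens), similar in spirit to one of the methods OpenRefine offers. This standalone sketch is my own, not part of the post:

```python
import re
from collections import defaultdict

def fingerprint(s):
    """Normalise a messy string to a comparison key: lowercase,
    drop punctuation, de-duplicate and sort the remaining tokens."""
    tokens = re.sub(r"[^\w\s]", "", s.lower()).split()
    return " ".join(sorted(set(tokens)))

def cluster(values):
    """Group raw values that share a fingerprint -- likely duplicates."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return {k: vs for k, vs in groups.items() if len(vs) > 1}

names = ["Sodium Chloride", "sodium chloride", "Chloride, Sodium", "Ethanol"]
print(cluster(names))  # the three sodium-chloride variants land in one group
```

A tool like OpenRefine then lets you review each cluster and merge its members into a single canonical value, which is the tedious part this automation removes.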

One of my favorite data related quotes is:

Data matures like wine, applications like fish

—James Governor, Founder of Redmonk.

Although these tools make analysis easier, they’re only as valuable as the information put in and analysis that you conduct. So take a moment to learn a few new tricks, challenge yourself, and let these tools enhance and complement the logic and reasoning skills that you already have.

How to tell a great analyst from a good analyst

Who’s posting: Quandl commentary

Which Company: Quandl, a company committed to delivering financial and economic data in the format that analysts need

Post was about: What separates a great analyst from a good analyst.

What did I get from the post: This post highlights the extra steps needed to truly understand the patterns emerging from a review of the data, but also how this information should be presented to the key decision makers to ensure action ensues. A great analyst utilises all the current and emerging tools but also critically examines the results with a questioning eye, sense-checking that they present the true outcome rather than proving a pre-conceived idea.

How to tell a great analyst from a good analyst

A good analyst helps a business stay competitive, but a great analyst sets the business apart from its competition. Learn more about how to be a great analyst by walking that extra mile.


The following insights are courtesy of Quandl, a company committed to delivering financial and economic data in the format that analysts need. Their collections offering can help data analysts save time and effort. Financial analysts will find that in addition to the following insights, Quandl’s futures collection can be a great resource.

1) A good analyst looks for answers, a great analyst reveals the truth.

A great analyst goes beyond developing reports. She digs deeper — she doesn’t just find out what the number is, she figures out why it is that way. She asks the bigger questions that lead to actionable insights and drive strategy.


2) A good analyst is detail-oriented, a great analyst is a master of nuance.
Good analysts can spot minute details and subtle patterns, but first-rate analysts can also place those nuances within the bigger picture. They immerse themselves in the data, but they don’t get lost in it. Because they’re plugged into the wider strategy, they’re better at knowing what to focus on and what to set aside.


3) A good analyst is analytical, a great analyst is also synthetical.
If analysis is reverse engineering, synthesis is engineering. It’s the ability to build things, to combine data points, patterns, and themes into a coherent story. Good analysts can take a number and deconstruct it into its most minute components. A synthesizer can create a unifying pattern for those data points and patterns.


4) A good analyst is dubious, a great analyst is an outright sceptic.
While decent analysts guard against their own biases, great analysts enjoy questioning their conclusions. They will seek out devil’s advocates, inviting colleagues to scrutinize any beliefs. Because no process is perfect, they don’t try to hide the flaws of their approach. Instead, they try to expose them.


5) A good analyst presents insights, a great analyst tells stories.
Great analysts know how to make their findings digestible to a wide variety of audiences. They can help any part of the organization understand why the data is meaningful. They’re not just number crunchers, they have the ability to make people believe in the results. Insights are only meaningful if they inspire action. Ultimately, great analysts drive strategy.

Top 10 Threats to SME Data Security

Blog Project work

Who’s posting: Scott Pinzon

Which Company: WatchGuard Live Security team

Post was about: This paper lists the top 10 most common areas of data compromise for small and medium-sized enterprises (SMEs), whilst also offering practical techniques and defences to counter each threat.

What did I get from the post: The post provided a relevant listing of key threats in the data security field for SMEs and practical pointers on how these threats can be mitigated.

Top 10 Threats to SME Data Security

The following article is excerpted from “Top 10 Threats to SME Data Security (and what to do about them),” a white paper written by Scott Pinzon, CISSP, of the WatchGuard® LiveSecurity® team.

This summary lists the ten threats and one countermeasure for each. For more details on how we selected the threats we did, what type of network we are addressing, and at least two more countermeasures for each threat, download a free copy of the full white paper at www.watchguard.com/whitepapers.

It’s difficult to find reality-based, accurate reporting on what the network security threat really is today, for the average business. Since 1999, the WatchGuard LiveSecurity team has monitored emerging network security threats every day, with a special focus on issues that affect small to medium sized enterprises (SMEs). When we spot an issue that could impact SMEs negatively, we alert our subscribers with email broadcasts. Because our subscribers are time-constrained, overworked IT professionals, we alert only when we know an attack is not merely feasible, but likely. This emphasis on business context and practicality makes our service nearly unique. This approach is constantly refined by input from our tens of thousands of subscribers, field trips to customer sites, focus groups, and “security over beer” bull sessions.

The result: this paper lists the top 10 most common vectors of data compromise in our experience as security analysts for SMEs. We also suggest practical techniques and defences to counter each vector.

Threat # 10: Insider attacks

Verizon’s Intrusion Response Team investigated 500 intrusions in 4 years and could attribute 18% of the breaches to corrupt insiders. Of that 18%, about half arose from the IT staff itself.

Implement the principle of dual control.

Implementing dual control means that for every key resource, you have a fallback. For example, you might choose to have one technician primarily responsible for configuring your Web and SMTP servers. But at the very least, login credentials for those servers must be known or available to another person.


Threat # 9: Lack of contingency

Businesses that pride themselves on being “nimble” and “responsive” oftentimes achieve that speed by abandoning standardization, mature processes, and contingency planning. Many SMEs have found that an otherwise survivable data failure or compromise turns disastrous when there is no Business Continuity Plan, Disaster Recovery Plan, Intrusion Response Policy, up-to-date backup system from which you can actually restore, or off-site storage.

Mitigation for lack of planning

Certainly, if you have budget for it, hire an expert to help you develop sound information assurance methodologies. If you don’t have much money to work with, leverage the good work others have done and modify it to fit your organization. The SANS Security Policy Project offers free templates and other resources that can help you write your own policies. For more, visit http://www.sans.org/resources/policies/.


Threat # 8: Poor configuration leading to compromise

Inexperienced or underfunded SMEs often install routers, switches, and other networking gear without involving anyone who understands the security ramifications of each device. In this scenario, an amateur network administrator is simply happy to get everything successfully sending traffic back and forth. It doesn’t occur to him that he should change the manufacturer’s default username and password.

Mitigation for poor configuration choices

Perform an automated vulnerability scan. If you can’t afford to hire consultants, you probably can afford a one-time, automated scan of your network. There are many “vulnerability management” products on the market at all price points, and regular use of one should be part of your network maintenance routine.

1. Summarized at http://www.infosectoday.com/Articles/2008_Data_Breach_Investigations_Report.htm. For a PDF of the report, visit http://www.verizonbusiness.com/resources/security/databreachreport.pdf.
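Real vulnerability scanners do far more, but the core idea of an automated sweep can be sketched in a few lines. This is a crude reachability probe, not a substitute for a scanning product; "gateway.example" is a placeholder hostname standing in for a device on your own network:

```python
import socket

def port_open(host, port, timeout=0.2):
    """Crude reachability check: can we complete a TCP handshake?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Probe a few ports that factory-default gear often leaves exposed.
for port in (21, 22, 23, 80, 443, 8080):
    if port_open("gateway.example", port):
        print(f"port {port} is open -- review its configuration")
```

Only scan hosts you own or have written permission to test.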

Threat # 7: Reckless use of hotel networks and kiosks

Hotel networks are notoriously lousy with viruses, worms, spyware, and malware, and are often run with poor security practices overall. Public kiosks make a convenient place for an attacker to leave a keylogger, just to see what falls into his net. Laptops that don’t have up-to-date personal firewall software, anti-virus, and anti-spyware can get compromised on the road. Traditional defenses can be rendered useless when the user literally carries the laptop around the gateway firewall, and connects from inside the Trusted zone.

Mitigating reckless use of hotel networks

Set and enforce a policy forbidding employees from turning off defenses. According to a survey commissioned by Fiberlink, 1 in 4 “road warriors” admitted to altering or disabling security settings on their laptops. Your policy should be that workers are never to turn off defenses unless they call and receive authorization from you. Many popular anti-virus solutions can be configured so that they cannot be turned off, even by a user with local administrator privileges; check for such capabilities in your current solution.

Threat # 6: Reckless use of Wi-Fi hot spots

Public wireless hot spots carry all the same risks as hotel networks — and then some. Attackers commonly put up an unsecured wireless access point which broadcasts itself as “Free Public WiFi.” Then they wait for a connection-starved road warrior to connect. With a packet sniffer enabled, the attacker can see everything the employee types, including logins. This attack is particularly nefarious because the attacker pulls the data out of the air, leaving absolutely no trace of compromise on the victim computer.

Mitigating reckless use of Wi-Fi

Teach users to always choose encrypted connections. Have them connect via a Virtual Private Network (VPN). This encrypts the data stream, so that even if eavesdroppers listen in wirelessly, what they receive is gibberish.

Threat # 5: Data lost on a portable device

Much sensitive data is compromised every year when workers accidentally leave their smart phone in a taxi, their USB stick in a hotel room, or their laptop on a commuter train. When data is stored on small devices, it’s wiser for administrators to stop thinking about what they’ll do “if that device ever gets lost…” and instead think, “when it gets lost…”

Mitigating data lost on portable devices

Manage mobile devices centrally. Consider investing in servers and software that centrally manage mobile devices. RIM’s BlackBerry Enterprise Server can help you ensure transmissions are encrypted; and if an employee notifies you of a lost phone, you can remotely wipe data from the lost BlackBerry. Such steps go a long way toward minimizing the negative impact of lost devices.

Threat # 4: Web server compromise

The most common botnet attack today is against web sites, and the fatal flaw in most web sites is poorly written custom application code. Attackers have compromised hundreds of thousands of servers in a single stroke with automated SQL injection attacks. Legitimate sites are then made to serve malware, unwittingly spreading the bot master’s empire.

Mitigating web server compromise

Audit your web app code. If (for instance) a web form has a field for a visitor to supply a phone number, the web application should discard excess characters. If the web application doesn’t know what to do with data or a command, it should reject it, not process it. Seek the best code-auditing solution you can afford (whether a team of experts or an automated tool), with emphasis on finding out whether your code does proper input validation.
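The phone-number example above can be sketched concretely. This hedged illustration (the table and helper names are invented) shows the two habits that defeat most SQL injection: validate input against a strict pattern, and keep user data out of the SQL text with parameterized queries:

```python
import re
import sqlite3

# Accept only characters that belong in a phone number.
PHONE_RE = re.compile(r"^\+?[0-9 ()\-]{7,20}$")

def clean_phone(raw):
    """Reject anything that doesn't look like a phone number."""
    raw = raw.strip()
    if not PHONE_RE.fullmatch(raw):
        raise ValueError("invalid phone number")
    return raw

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, phone TEXT)")

# Parameterized queries keep user input out of the SQL text entirely.
conn.execute("INSERT INTO contacts VALUES (?, ?)",
             ("Alice", clean_phone("+1 (555) 123-4567")))

# A classic injection payload fails validation before it reaches SQL.
try:
    clean_phone("555'; DROP TABLE contacts; --")
except ValueError:
    print("rejected")
```

Either measure alone helps; together they make the form field a dead end for automated injection tools.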

Threat # 3: Reckless web surfing by employees

A 2006 study by the University of Washington found that the sites that spread the most spyware were, in order:

1. Celebrity fan sites (the type that give updates on the follies of Paris Hilton and Britney Spears)
2. Casual gaming sites (where you can play checkers against a stranger)
3. Porn sites (coming in at a surprising third place)

Since then, social networking sites such as MySpace and Facebook have taken the lead as virtual cesspools of spam, trojans, and spyware. Employees who surf to non-business-related sites end up inviting into the corporate network bot clients, Trojans, spyware, keyloggers, spambots… the entire gamut of malware.

Mitigating reckless web surfing

Implement web content filtering. Use web filtering software such as WatchGuard’s WebBlocker. Web filtering solutions maintain databases (updated daily) of blocked URLs in scores of categories; more categories means more nuance. Such tools help you enforce your Acceptable Use Policy with technology.
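The category-lookup idea behind web filtering products can be sketched simply. This toy version (the hostnames and categories are invented; real products ship far larger, daily-updated databases) shows the basic check:

```python
from urllib.parse import urlparse

# Hypothetical category database standing in for a vendor's list.
BLOCKED_CATEGORIES = {
    "celebgossip.example": "celebrity-fan",
    "checkersfun.example": "casual-gaming",
}

def filter_url(url):
    """Return ('blocked', category) or ('allowed', None) for a URL."""
    host = urlparse(url).hostname or ""
    category = BLOCKED_CATEGORIES.get(host)
    return ("blocked", category) if category else ("allowed", None)

print(filter_url("http://celebgossip.example/latest"))
print(filter_url("https://supplier.example/invoice"))
```

A real solution adds wildcarding, HTTPS inspection, and per-group policy, but the lookup above is the heart of it.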

Threat # 2: Malicious HTML email

The most common email attack now arrives as an HTML email that links to a malicious, booby-trapped site. One wrong click can trigger a drive-by download. The hazards are the same as in Threat # 3, “Reckless web surfing,” but the attacker uses email to lure the victim to his malicious website.

Mitigating malicious HTML email

Implement an outbound web proxy. You can set up your LAN so that all HTTP requests and responses redirect to a web proxy server, which provides a single choke point where all web traffic can be monitored for appropriateness. The web proxy won’t catch an inbound malicious email, but if a user on your network clicks a link in that HTML email, the click generates an HTTP request that the proxy can catch. If the user’s HTTP request never reaches the attacker’s booby-trapped web site, your user does not become a victim.

Threat # 1: Automated exploit of a known vulnerability

Verizon’s 2008 Data Breach Investigations Report compiles factual evidence from more than 500 data breaches occurring over four years. Verizon’s RISK Team found that 73% of the breaches came from external sources. Negligent SMEs get victimized when they don’t install Windows patches during the same month a patch is published. But your network contains much more than Microsoft products, so your patching routine needs to extend systematically to all the applications and OS components on your network.

Mitigating automated exploits

Invest in patch management. Patch management software will help you scan your network, identify missing patches and software updates, and distribute patches from a central console, greatly increasing your chance of keeping your entire network up to date.

Build an inexpensive test network. Even reputable companies can slip up, so we recommend installing a patch on a test system and seeing how it behaves before deploying it throughout your network. If you don’t have a test network now, the next time you replace outmoded desktop computers and servers, hang onto them and dedicate them to being your test network.


The countermeasures we’ve suggested above can go a long way toward mitigating your risk and protecting your network. But these are only a sampling of the steps that a diligent IT administrator could implement to increase network security. For more practical advice on hardening your network against common problems, download a free copy of the complete Top Ten Security Threats for SMEs (and what to do about them) white paper from the WatchGuard web site.

WatchGuard® provides extensible threat management (XTM) gateway security appliances that address nine of the ten threats listed herein. (Sadly, our appliances cannot stop your employees from losing portable devices.) We can help you secure your wireless network, check the integrity of clients requesting access to your network, filter spam, proxy web services, minimize insider threats, create VPNs, and much more. For information about WatchGuard security solutions and the protection they provide against botnets and other network threats, visit us at www.watchguard.com or contact your reseller.

Data Science Skills for 2016

Blog Project work

Who’s posting: Seamus Breslin

Which Company: Solas Consulting

Post was about: This post sets out the skill set required by Data Scientists to land a Data Science role in 2016.

What did I get from the post: This post sets out the key skills which will be in high demand for any would-be Data Scientist (DS) looking to land that dream job in 2016. It specifically references data visualisation, SQL and R, which I found to be the most interesting items on the Data Analytics course. It cites Statistics as a cornerstone of any DS’s skill set while emphasising creativity as the “differentiator” between a good and a great DS.

Python is also something I would like to gain more experience of over the summer months.

Data Science Skills for 2016

Tags: Apache Spark, CrowdFlower, Data Science, Python, Skills, SQL

As demand for the year’s hottest job keeps growing, so does the skill set required for it. Here we discuss the skills that will be in high demand for data scientists, including data visualization, Apache Spark, R, Python, and more.


By Seamus Breslin, Solas Consulting. 

Data scientists are in high demand for 2016. Now that “data scientist” has been named this year’s hottest job, do you have the skills it takes? Securing a data scientist job isn’t easy, especially at big organisations where many similar candidates may be going for the same role. Nonetheless, there are certain skills that employers look for that could put you at the top of the pecking order.

Here are, in my opinion, the top Data Science skills needed in 2016:


SQL

SQL is still one of the most important tools required to be a successful data scientist, as the majority of data stored by businesses sits in SQL databases.

CrowdFlower recently analysed 3,490 data science job postings on LinkedIn; SQL was specifically named in more than half of the listings they analysed, making it the most commonly cited skill.

Fig 1. Most Common Data Science Job Skills, according to CrowdFlower
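As a toy illustration of the bread-and-butter SQL a data scientist writes daily (the table and figures are invented), using Python’s built-in sqlite3 module:

```python
import sqlite3

# An in-memory toy table standing in for a business database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount REAL);
    INSERT INTO orders VALUES ('acme', 120.0), ('acme', 80.0), ('globex', 50.0);
""")

# A typical aggregate: revenue per customer, biggest spender first.
rows = conn.execute(
    "SELECT customer, SUM(amount) AS total FROM orders "
    "GROUP BY customer ORDER BY total DESC"
).fetchall()
print(rows)  # → [('acme', 200.0), ('globex', 50.0)]
```

GROUP BY, joins, and window functions make up most of the SQL that listings like these are asking for.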

Data Visualization

Companies that make data-driven decisions depend immensely on a data scientist’s ability to visualize data and convey a story through it, since the Data Scientist needs to communicate data-driven insights to both technical and non-technical people in the company.
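In practice that means turning numbers into a picture. A stdlib-only sketch with made-up figures (real work would reach for matplotlib, ggplot2, or Tableau) shows the idea with a text bar chart:

```python
# Made-up monthly figures, purely for illustration.
monthly_signups = {"Jan": 42, "Feb": 58, "Mar": 91}

def bar_chart(data, width=30):
    """Render each value as a bar scaled against the largest value."""
    peak = max(data.values())
    return [f"{label} | {'#' * round(value / peak * width)} ({value})"
            for label, value in data.items()]

for line in bar_chart(monthly_signups):
    print(line)
```

Even this crude chart makes the March jump obvious at a glance, which is the point: the visual carries the story that the raw numbers hide.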

Communication Skills

Communication is still as important as ever and is another skill needed by a Data Science professional. For instance, communication is vital when sharing results via presentations or publications. Data scientists should be able to engage with senior management, speak their language, and translate the data into decisions and actions.


Hadoop

Learning Hadoop coupled with Big Data analytics will make you stand out from the crowd. One of the most pressing barriers to Big Data adoption in the enterprise is the lack of skills around Hadoop.


Apache Spark

Spark is at the forefront of technologies that have evolved to meet the ever-growing need to model and analyse large quantities of data.

Spark gets more and more attention in the big data space because of its speed and its ease of use.



Python

Python is another skill seeing a rise in demand. The language lets engineers realise concepts with less code than Java or C++, so it is seen as more efficient and less bug-prone, with the potential to create clearer programs.

Not convinced by Python yet? Not only is Python easy to learn, it also has a large, active community that is still growing. If you get stuck on a coding problem, there is a wide range of experts in the community to help you. According to one study, Python remains the No. 1 tool for data science.
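The conciseness claim is easy to demonstrate. A word-frequency count, a task that takes considerably more ceremony in Java or C++, fits in a couple of lines:

```python
from collections import Counter

# Count word frequencies and report the two most common words.
text = "data science needs data and science"
counts = Counter(text.split())
print(counts.most_common(2))  # → [('data', 2), ('science', 2)]
```

The brevity comes from the standard library doing the bookkeeping, which is exactly why the language reads clearly.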


Statistics

This is the core of what you can offer as a data scientist. Statistics will always be an essential component of a Data Scientist’s toolkit, so it’s very important to be able to choose the most fitting statistical techniques for different classes of problems and to apply them correctly.
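A small example of choosing the fitting technique, with illustrative numbers: when a sample contains a large outlier, the median describes the “typical” value far better than the mean does.

```python
import statistics

# Illustrative salaries with one large outlier.
salaries = [40_000, 42_000, 45_000, 47_000, 250_000]

print("mean:  ", statistics.mean(salaries))    # dragged up by the outlier
print("median:", statistics.median(salaries))  # robust to the outlier
```

Knowing when a summary statistic misleads is exactly the judgment this section is describing.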


R

It’s often said that 80% of the work in data science is data manipulation. Fortunately, with R, data manipulation is easy, as R has some of the best data-management tools you’ll find.

R is an essential tool in finance and at analytics-driven companies such as Google, Facebook, and LinkedIn, so it’s worthwhile looking into R more closely if you have not done so yet.


Creativity

Anyone can be formulaic. Businesses in 2016 want innovation that will set them apart from their competitors, both in sales and in the image they present to consumers.

Creativity is the ability to apply the technical skill sets mentioned above to produce something of worth, rather than simply following a pre-established formula.
