21 tricks to crack data science

I am David, and this is the best list for data science right now. I hope you love it.

1. Understand the business before you start solving problems

I know you are an analyst and all you care about is numbers. But what differentiates an awesome business analyst from an average data analyst? It's their ability to understand the business. You should try to understand the business even before you take up your first project. Here are a few things you should definitely explore:

a. Customer-level information: total number of active customers, month-on-month customer attrition, and the segments the business has defined on the portfolio.

b. Business strategies: How do we acquire new customers, and through which channels? How do we retain valuable customers?

c. Product information: How do your customers interact with your products? How do you earn money through your product? Is your product a direct revenue maker, or is it just an engagement tool?

If you can answer all these questions, you are in good shape to start your first project.

2. Think hard about whether you are solving an underlying problem or just an outcome

I have observed that analysts often aim for objectives that are not the real concern of the problem. For instance, let's imagine we found that the more a customer calls customer care, the higher their propensity to abandon the service.

Now, if we start solving for ways to minimize calls to customer care, we probably won't reduce the attrition rate. The calls are a symptom of dissatisfaction, not its cause. If customers can no longer reach a human to explain your faults, I would expect them to become even more dissatisfied. This is an easy trap to spot, and you will probably steer clear of it. But in real life, such problems are much harder to find. I would say it is much easier to solve a defined problem than to find the right problem to solve.
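
To see why capping calls does nothing for attrition, here is a toy simulation. It is built on an assumption of mine, not the author's data: a hidden dissatisfaction score drives both the number of calls and the decision to churn. In that world, calls "predict" churn well, yet deflecting calls leaves every churn outcome exactly as it was. The names `simulate` and `pearson` are illustrative helpers.

```python
import random

def simulate(n=10_000, seed=42, cap_calls=None):
    """Toy model (assumption): a hidden dissatisfaction score drives BOTH
    customer-care calls and churn; the calls themselves cause nothing."""
    rng = random.Random(seed)
    calls, churned = [], []
    for _ in range(n):
        d = rng.random()                              # latent dissatisfaction in [0, 1)
        c = sum(rng.random() < d for _ in range(5))   # 0-5 calls, more likely when d is high
        if cap_calls is not None:
            c = min(c, cap_calls)                     # "fix" the outcome: deflect extra calls
        calls.append(c)
        churned.append(rng.random() < d)              # churn depends only on d, never on c
    return calls, churned

def pearson(xs, ys):
    """Plain Pearson correlation; booleans count as 0/1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

calls, churned = simulate()
capped_calls, churned_after_cap = simulate(cap_calls=1)

print(f"corr(calls, churn) = {pearson(calls, churned):.2f}")     # clearly positive
print("churn unchanged after capping calls:", churned == churned_after_cap)
```

Because the same seed produces the same random draws, the only thing that changes between the two runs is the call count. The churn column comes out identical, which is exactly the trap described above.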

3. Spend more time finding the right evaluation metric and how much of it is required for implementation

This is probably the easiest puzzle for an analyst to solve, yet a simple trap to fall into. Let me explain with an example.

Suppose you are trying to build a targeting model for a marketing campaign. Which metric will you choose to gauge your model?

  • KS stats
  • Lift on 1st decile
  • AUC-ROC
  • Log-Likelihood

I will always choose KS in this case. Lift only gives you an estimate for a particular decile, so it probably won't help us find the total target population and the break point. AUC-ROC is an estimate over the overall population, which is not our intent in this case. Log-Likelihood is probably the biggest misfit, as all that matters to us is the rank order, not the actual probability.
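
The difference between KS and first-decile lift is easier to see in code. Below is a minimal sketch in plain Python. It ranks customers by model score, then scans every cutoff for the largest gap between the cumulative share of responders and the cumulative share of non-responders. That largest gap is the KS statistic, and the score where it occurs is the break point. Lift, by contrast, only looks at the top 10%. The scores and labels are made up for illustration, and both functions assume the data contains at least one responder and one non-responder.

```python
def ks_statistic(scores, labels):
    """KS = max gap between the cumulative share of responders and of
    non-responders as you walk down the list ranked by score (highest first).
    Returns (ks, cutoff_score). labels: 1 = responder, 0 = non-responder."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    total_pos = sum(labels)
    total_neg = len(labels) - total_pos
    cum_pos = cum_neg = 0
    best_ks, best_cutoff = 0.0, None
    for i, (score, label) in enumerate(ranked):
        if label:
            cum_pos += 1
        else:
            cum_neg += 1
        # Only evaluate at the end of a run of tied scores.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        gap = abs(cum_pos / total_pos - cum_neg / total_neg)
        if gap > best_ks:
            best_ks, best_cutoff = gap, score
    return best_ks, best_cutoff

def lift_top_decile(scores, labels):
    """Response rate in the top 10% by score divided by the overall response rate."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    k = max(1, len(ranked) // 10)
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate

# Hypothetical campaign scores and outcomes, for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1,   1,   0,   1,   0,   0,   1,   0,   0,   0]

ks, cutoff = ks_statistic(scores, labels)
print(f"KS = {ks:.3f}, target everyone scoring >= {cutoff}")
print(f"Lift on 1st decile = {lift_top_decile(scores, labels):.2f}")
```

On this toy data, KS peaks at 0.583 at a cutoff score of 0.6, which tells you how many customers to target. The first-decile lift of 2.50 describes only the single top-scored customer.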

4. Follow the diverge-converge thinking process to avoid premature convergence

I have seen premature convergence as the biggest problem in many functions and industries. Today, business leaders seek innovation in everything they do.

To truly innovate, you need a systematic way to diverge and converge. How far you need to diverge will become clearer as you gain experience with this approach. The idea is to think of every possible way to crack the problem, without bias toward feasibility, development time or traditional approaches. Once you are convinced that you have covered the universe of options, apply all your constraints to narrow down the approaches.

5. Break industry silos to think of alternate solutions

Analytics is being used in every possible industry. So why don't we go beyond our traditional approaches and look for solutions in other industries?

For instance, a video recommendation solution implemented in the e-commerce industry could very well be used on a blogging portal like Analytics Vidhya. The only way to do this is to interact with people working in other industries and learn how they use analytics.

6. Engage with business counterparts throughout the process

Right from the first day of your analysis, you should interact with business partners. One thing I have commonly seen go wrong is that the analyst and the business partner discuss the solution only infrequently. Business partners want to stay away from the technical details, and analysts want to stay away from the business. This does the project no good. It is essential to stay in constant contact, so that you understand how the model will be implemented while you are still building it.

7. Think of the simplest implementation levers to bring your idea to life

I know you are a statistician and love to confuse business people with complex formulations. Bringing such complexity into discussions with business people might help you get out of the immediate conversation, but it decreases the chances of successful implementation.

Here is what you need to do: once you have the output variables, try to find a simple lever that makes them easier for the business to understand. Let me give you an example of this approach. We were trying to identify the agents who would become top performers once onboarded. We came up with the stratified population and the expected performance of each stratum. However, we still needed a lever that could change the population mix. What we did was very simple: we implemented a differential fee strategy to change the mix of applicants, and hence the mix of our population.
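
The reason a mix lever works is plain arithmetic. The expected performance of the whole population is a weighted average of each stratum's expected performance, weighted by that stratum's share. Shift the shares and the average moves. The sketch below uses hypothetical strata and rates, not the numbers from the project above. It also does not model how a fee changes the mix; it simply assumes a resulting mix.

```python
def blended_performance(mix, performance):
    """Expected performance of the whole population: the mix-weighted
    average of each segment's expected performance."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("segment shares must sum to 1")
    return sum(share * performance[segment] for segment, share in mix.items())

# Hypothetical strata and expected top-performer rates (not the author's data).
performance = {"high": 0.9, "medium": 0.6, "low": 0.3}

current_mix = {"high": 0.2, "medium": 0.5, "low": 0.3}
# Assumed mix after a differential fee discourages low-stratum applicants.
mix_after_fee = {"high": 0.3, "medium": 0.5, "low": 0.2}

print(round(blended_performance(current_mix, performance), 2))
print(round(blended_performance(mix_after_fee, performance), 2))
```

Moving ten points of share from the low stratum to the high one lifts the expected rate from 0.57 to 0.63. That is a story a business partner can follow without any of the model's internals.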

8. While making a business deck, make sure you lay it out in their language

The target variable is never the end product of your analysis. It is always a business deck! So put a lot of effort into presenting your idea in a crisp and effective way. Learn the terminology your audience connects with, and think about what you would look for if you were in your business partner's shoes.

9. Learn to speak business language while presenting to business leaders

I recently started learning Chinese for one of my projects. The project itself was extremely easy, but I found that even with a robust model, I was doing a bad job of selling it to the business. The reason was my gap in understanding their internal discussions. It is essential to speak the language of your audience. I have seen very simple models being appreciated and the smartest models being rejected, the only difference being the analyst's ability to speak business while presenting the model.

10. Actively follow up on the implementation plan

Now, what happens once everyone is convinced of the effectiveness of your model? Your job is still not done. Set up monthly follow-ups with the business to understand how the project was implemented and whether it is being used in the right sense.

11. Actively participate in Data Hackathons

One thing you will realize with time is that the analytics industry is extremely dynamic. If you like to stay in your comfort zone, you will soon find your skill set redundant. One thing I found extremely useful is participating in data science competitions, competing with and learning from your peers. Kaggle and Analytics Vidhya are good places to take up a few challenges.

12. Read blogs and books on upcoming tools and techniques on analytics

I believe this is another way to keep yourself up to date.

13. Learn upcoming tools to know what is possible and what is not

Get out of your comfort zone of programming in SAS, R or Python. Try learning upcoming technologies that handle big data. Spark and Java would be my recommendation to start with.

14. Learn to Program

With data science already heavily dependent on computing resources, and machine learning quickly becoming the top way to derive insights, coding skills have never been more important. Fortunately, you don't have to be a full-fledged application developer. Several programming languages are being increasingly tailored to serve those who need to build their own data analysis tools. Two of the biggest languages worth keeping up with are:

  • Python
  • R

If you’re looking to perform work using modern machine learning systems like TensorFlow, you’ll likely want to steer toward Python, as it has the largest set of supported libraries for ML. R, however, is very handy for quickly mocking up models and processing data. It’s also prudent to pick up some understanding of database queries.
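
As a taste of what database queries look like in practice, here is a minimal sketch using Python's built-in `sqlite3` module. It counts active customers per segment, the kind of customer-level question from trick 1. The table, its columns and its rows are hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical customers table (illustrative data only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, segment TEXT, active INTEGER)")
conn.executemany(
    "INSERT INTO customers (segment, active) VALUES (?, ?)",
    [("retail", 1), ("retail", 0), ("retail", 1), ("corporate", 1), ("corporate", 1)],
)

# Active vs. total customers per segment.
rows = conn.execute(
    "SELECT segment, SUM(active) AS active_customers, COUNT(*) AS total "
    "FROM customers GROUP BY segment ORDER BY segment"
).fetchall()
for segment, active, total in rows:
    print(f"{segment}: {active} active of {total}")
conn.close()
```

The same `SELECT ... GROUP BY` pattern carries over almost unchanged to production databases. This makes SQL a good complement to whichever of Python or R you pick.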

15. Develop a Rigid Workflow for Each Project

One of the biggest challenges in the world of data analytics is keeping your data as clean as possible. The best way to meet this challenge head-on is to have a rigid workflow in place. Most people in the field follow some version of these steps:

  • Gather and store data
  • Verify integrity
  • Clean the data and format it for processing
  • Explore it briefly to get a sense of the dataset’s apparent strengths and weaknesses
  • Run analysis
  • Verify integrity again
  • Confirm statistical relevance
  • Build end products, such as visualizations and reports
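
The steps above can be sketched as a chain of small functions, one per step, so that no step gets skipped. This is a minimal illustration, not a prescribed framework. The records, the function names and the checks are all hypothetical. The "statistical relevance" step is a crude normal-approximation interval, which is only a placeholder for whatever test your problem actually needs.

```python
import statistics

def gather():
    # Hypothetical raw records; in practice these come from files, APIs or a database.
    return [
        {"id": 1, "value": "10.5"},
        {"id": 2, "value": "9.8"},
        {"id": 3, "value": None},   # missing value, dropped during cleaning
        {"id": 4, "value": "11.2"},
        {"id": 5, "value": "10.1"},
    ]

def verify_integrity(records):
    # Every record needs an id, and ids must be unique.
    ids = [r.get("id") for r in records]
    if None in ids or len(ids) != len(set(ids)):
        raise ValueError("missing or duplicate ids")
    return records

def clean(records):
    # Drop rows without a value and convert the rest to floats.
    return [{"id": r["id"], "value": float(r["value"])}
            for r in records if r["value"] is not None]

def explore(records):
    values = [r["value"] for r in records]
    return {"n": len(values), "min": min(values), "max": max(values)}

def analyze(records):
    values = [r["value"] for r in records]
    return statistics.mean(values), statistics.stdev(values)

def confirm_relevance(mean, sd, n, z=1.96):
    # Crude normal-approximation 95% interval for the mean; with a sample
    # this small, a t-based interval would be wider.
    half_width = z * sd / n ** 0.5
    return mean - half_width, mean + half_width

def report(summary, mean, interval):
    low, high = interval
    return (f"n={summary['n']} range=[{summary['min']}, {summary['max']}] "
            f"mean={mean:.2f} approx. 95% CI=[{low:.2f}, {high:.2f}]")

def run_pipeline():
    raw = verify_integrity(gather())           # gather, then verify integrity
    cleaned = clean(raw)                       # clean and format
    summary = explore(cleaned)                 # brief exploration
    mean, sd = analyze(cleaned)                # run analysis
    verify_integrity(cleaned)                  # verify integrity again
    if not summary["min"] <= mean <= summary["max"]:
        raise ValueError("mean outside observed range: something broke")
    interval = confirm_relevance(mean, sd, summary["n"])
    return report(summary, mean, interval)     # end product

print(run_pipeline())
```

Keeping each step as its own function means a failed check stops the run at the step that caused it. You never find out three charts later.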

16. Find a Focus

The expanding nature of the data analytics world makes trying to know and explore it all as impossible as getting to the edge of the universe. It might be fun to explore machine vision to identify human faces, for example, but that skill likely isn't going to translate well if your life's work is doing plagiarism detection.

In order to find a focus, you need to look at the real-world problems that interest you. This will then allow you to check out the data analysis tools that are commonly used to solve those problems.

17. Always Think About Design

How you choose to analyze data will have a lot of bearing on how a project turns out. From a design standpoint, this means confronting questions like:

  • What metrics will be used?
  • Is this model appropriate for this job?
  • Can the compute time be optimized more?
  • Are the right formats being used for input and output?

18. Make Data Scientist Friends with GitHub

GitHub is a wonderful source of code, and it can help you avoid needlessly reinventing the wheel. Register an account, and then learn the culture of GitHub and source code sharing. That means making a point of providing attribution in your work. Likewise, try to contribute to the community rather than just taking from it.

19. Curate Data Well

One of the absolute keys to getting the most mileage out of data is to curate it competently. This means maintaining copies of original sources in order to allow others to track down issues later. You also need to provide and preserve unique identifiers for all your entries to permit tracking of data across database tables. This will ensure that you can distinguish duplicates from mere doppelgängers. When someone asks you to answer questions about oddities in the data or insights, you’ll be glad you left yourself a trail of breadcrumbs to follow.
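
Here is a minimal sketch of that breadcrumb trail, assuming each record carries a unique `id`. The records and helper names are hypothetical. It keeps an untouched copy of the source and flags true duplicates, meaning the same id loaded twice. It also flags doppelgängers, meaning different ids with identical details. Without the ids, the two cases would be indistinguishable.

```python
import copy

def content_key(record):
    # Everything except the unique id, in a stable order.
    return tuple(sorted((k, v) for k, v in record.items() if k != "id"))

def find_duplicates_and_lookalikes(records):
    """Duplicates: the same id seen more than once (one entity loaded twice).
    Look-alikes (doppelgangers): different ids with identical content."""
    ids_seen = set()
    first_id_by_content = {}
    duplicates, lookalikes = [], []
    for r in records:
        if r["id"] in ids_seen:
            duplicates.append(r["id"])
            continue
        ids_seen.add(r["id"])
        key = content_key(r)
        if key in first_id_by_content:
            lookalikes.append((first_id_by_content[key], r["id"]))
        else:
            first_id_by_content[key] = r["id"]
    return duplicates, lookalikes

def deduplicate(records):
    # Keep the first record per id; build a new list, never mutate the source.
    seen, result = set(), []
    for r in records:
        if r["id"] not in seen:
            seen.add(r["id"])
            result.append(dict(r))
    return result

# Hypothetical customer records.
raw = [
    {"id": "C001", "name": "Ana Silva", "city": "Porto"},
    {"id": "C002", "name": "Ana Silva", "city": "Porto"},  # a different customer
    {"id": "C001", "name": "Ana Silva", "city": "Porto"},  # same customer loaded twice
]
original = copy.deepcopy(raw)  # untouched copy of the source for later audits

duplicates, lookalikes = find_duplicates_and_lookalikes(raw)
cleaned = deduplicate(raw)
print(duplicates, lookalikes, len(cleaned))
```

Deduplication drops the repeated C001 but keeps both Ana Silvas, and `original` still matches the source exactly. If someone later questions a number, you can trace it back.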

20. Know When to Cut Losses

Digging into a project can be fun, and there’s a lot to be said for grit and work ethic when confronting a problem. Spending forever fine-tuning a model that isn’t working, though, carries the risk of wasting a significant portion of the time you have available. Sometimes, the most you can learn from a particular approach is that it doesn’t work.
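
One way to make "cut losses" concrete is to agree on a stopping rule before you start tuning. The sketch below borrows the patience idea behind early stopping in many ML training loops. If none of the last few attempts beat the earlier best by a meaningful margin, stop. The thresholds and scores are hypothetical; pick ones that fit your own problem and time budget.

```python
def should_stop(history, patience=3, min_delta=0.001):
    """Stop when none of the last `patience` attempts beat the best earlier
    score by at least `min_delta` (higher score = better)."""
    if len(history) <= patience:
        return False
    best_before = max(history[:-patience])
    best_recent = max(history[-patience:])
    return best_recent < best_before + min_delta

# Hypothetical validation scores from successive tuning rounds.
print(should_stop([0.70, 0.72, 0.74, 0.76]))                     # False: still improving
print(should_stop([0.70, 0.72, 0.725, 0.7255, 0.7252, 0.7251]))  # True: plateaued
```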

21. Learn How to Delegate

Most great discoveries and innovations in the modern world are the final work products of teams. For example, Nobel Prizes in the sciences now rarely go to a single winner. While the media may enjoy telling the stories of single founders of companies, the reality is that nearly every successful startup of the internet age was a team project.

If you don’t have a team, find one. Recruit them in-house or go on the web and find people of similar interests. Don’t be afraid to use novel methods to find team members, too, such as holding contests or putting puzzles on websites.

