Data Science and Sports: A Symbiotic Relationship

The FIFA World Cup, the ICC Cricket World Cup, the Olympics, the Grand Slams, Formula 1: the list goes on and on. So many disciplines, so many rules, so many styles of performance, and so many athletes with different physiques competing at the same time. This is reason enough for a data scientist to get excited, and rightfully so. Sport at the international level doesn’t just bring out the best of the human condition; it also brings out the best of human technology in pursuit of a spectacular performance. Metrics drawn from all of these (rules, styles and the athletes’ bodies) serve as the foundation upon which companies such as Nike, Reebok, Adidas and Puma, to name a few, build their top-of-the-line sports apparel and accessories. Sports Analytics (SA), as the field has come to be known, has become critical for companies aiming to be the best in the business of delivering world-class performances. But how does SA work? To understand its role, we’ll take up two sports I love talking about: swimming and running.

Speedo, one of the world’s best swimming apparel brands, brought out the LZR Racer line of swimwear for competitive swimming. With help from NASA, sports institutes and swimmers the world over, Speedo delivered a swimsuit that was not only a fine blend of Artificial Intelligence (AI) and Human Intelligence (HI) but also set records at the 2008 Beijing Olympics. Legendary swimmer Michael Phelps won his Olympic gold medals wearing the suit. The team at Speedo spent close to 55,000 man-hours developing the radical suit. To hit their target, the R&D team spent considerable time with experts in swimming, kinesiology and biomechanics, and even built 3-D models of the forces the human body has to overcome when swimming against the clock at competitive events. The result was a suit that not only proved to be a breakthrough at the Olympics but also became the template on which other brands based their own designs.

Another independent research team, from the University of Regina in Canada, used accelerometers for swimming biomechanics research and performance analysis. Accelerometers are motion sensors: smartphone makers use them to enable screen rotation, and automobile companies use them to trigger airbags. They are accurate enough to capture the drag and speed of each swimmer across the different strokes, namely butterfly, backstroke, breaststroke and freestyle.
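To give a feel for how raw accelerometer readings can become a speed estimate, here is a minimal sketch that integrates acceleration samples over time using the trapezoidal rule. It is not the Regina team's method, which the article does not describe; the function name `integrate_speed`, the sample values and the 10 Hz sampling rate are all assumptions made for illustration.

```python
def integrate_speed(accel_ms2, dt_s, v0_ms=0.0):
    """Turn evenly spaced acceleration samples (m/s^2) into speed estimates (m/s).

    Uses the trapezoidal rule: each step adds the average of two
    neighbouring samples multiplied by the time between them.
    """
    speeds = [v0_ms]
    for a_prev, a_next in zip(accel_ms2, accel_ms2[1:]):
        speeds.append(speeds[-1] + 0.5 * (a_prev + a_next) * dt_s)
    return speeds


# Hypothetical forward-acceleration samples recorded at 10 Hz (dt = 0.1 s)
samples = [0.0, 1.5, 2.0, 1.0, 0.0]
speeds = integrate_speed(samples, dt_s=0.1)
print(f"Estimated speed gain: {speeds[-1]:.2f} m/s")
```

Real sensor data is noisier than this: gravity has to be removed and integration drift corrected before the numbers mean much, which is part of what makes the research hard.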

The world of competitive running isn’t far behind in applying analytics to sports apparel. Nike, for instance, worked closely with Kenyan long-distance runner Eliud Kipchoge in his attempt to break the 2 hour marathon barrier. By studying the athlete’s stance, body dynamics and speed, Nike and Kipchoge were able to set a new world record at the 2018 Berlin Marathon with a time of 2:01:39 (2 hours, 1 minute and 39 seconds)! That left him just 1 minute and 39 seconds away from the coveted 2 hour mark. Not content with this, Nike has also launched the Nike+Running application for iOS and Android users. Through its detailed run schedules and event calendars, the app not only helps runners of all levels and ages set personal bests, it also lets users see real-time data on their performance in solo or competitive runs. One enthusiastic person even used the app to write an enviable piece of intricate code!
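To put the Berlin time in perspective, the gap to the 2 hour mark and the average pace can be worked out directly from the finishing time. The sketch below does just that; the function names are made up for illustration, and the only outside fact it relies on is the standard marathon distance of 42.195 km.

```python
from datetime import timedelta

MARATHON_KM = 42.195  # standard marathon distance in kilometres


def gap_to_barrier(finish, barrier=timedelta(hours=2)):
    """Return how far a finishing time is from the barrier (positive = slower)."""
    return finish - barrier


def pace_per_km(finish, distance_km=MARATHON_KM):
    """Return the average time taken per kilometre."""
    return timedelta(seconds=finish.total_seconds() / distance_km)


berlin_2018 = timedelta(hours=2, minutes=1, seconds=39)
gap = gap_to_barrier(berlin_2018)
pace = pace_per_km(berlin_2018)

print(f"Gap to 2 hours: {int(gap.total_seconds())} seconds")
print(f"Average pace: {pace.total_seconds():.1f} seconds per km")
```

Holding a pace of just under three minutes for every one of 42 kilometres is exactly the kind of target that stance, body dynamics and speed data help an athlete chase.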

As technology and human willpower keep progressing, the limits to sports and records will be redefined from time to time. These are interesting times for sports indeed!

Political campaigning and data science

Politics and sentiment are deeply related. Only leaders who can feel and understand the pulse of the people can be effective guides for their country. Public debates and speeches were, and still are, the most effective way for politicians to have a tête-à-tête with audiences, but thanks to advancements in information technology, building meaningful relationships with the audience has taken on new forms and mediums. Today, politicians can gauge the mood of their voter base via social media platforms, online campaigns and ‘Sentiment Analysis’ of carefully and comprehensively prepared surveys.

If we turn our attention to the world’s biggest democracy, India, the above scenario has special relevance. In the run-up to the 2014 Lok Sabha elections, the Bharatiya Janata Party (BJP) and the Indian National Congress (INC) used data science in various ways to gauge public perception of their respective ideologies and performance.

The BJP roped in techie and entrepreneur Arvind Gupta as its digital campaign manager. Using data from a host of national and international surveys, Gupta put together an electoral campaign that factored in the normal and critical time periods during which campaigning would yield the maximum results. ‘Key words’ spoken by leaders and by the public were paired against each other to measure their coherence and connectivity. The higher the coherence, the easier it was for the party to establish a firm footing. The trick, Gupta says, was to identify the issues that people most wanted to talk about and then gather data relevant to those issues. The top leadership could then frame the party’s agenda around those issues at rallies and in advertisements, and woo the voter base.
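The article does not say how coherence between the two sets of key words was measured. One simple way to picture it, purely as an assumption, is the Jaccard overlap: the share of distinct key words that leaders and the public have in common. The sketch below uses made-up sentences, a tiny hypothetical stop-word list and illustrative function names.

```python
import re

# Tiny, hypothetical stop-word list; a real analysis would use a fuller one
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "need",
             "of", "on", "our", "the", "to", "we", "will"}


def keywords(texts):
    """Collect the distinct non-stop-words used across a list of texts."""
    found = set()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        found.update(w for w in words if w not in STOPWORDS)
    return found


def coherence(leader_texts, public_texts):
    """Jaccard overlap between leaders' and the public's key words (0.0 to 1.0)."""
    leaders = keywords(leader_texts)
    public = keywords(public_texts)
    union = leaders | public
    return len(leaders & public) / len(union) if union else 0.0


# Made-up example sentences, not real campaign data
leader_speeches = ["We will build roads and create jobs"]
public_posts = ["We need jobs and clean water"]
print(f"Coherence: {coherence(leader_speeches, public_posts):.2f}")
```

A score near 1 would mean leaders and voters are talking about the same issues; a score near 0 would suggest the message is missing what people care about.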

Key details such as the Internet penetration rate, the mobile subscriber base and the electricity penetration rate, amongst others, were researched, catalogued and entered into a system through which BJP leaders could connect with their voter base. Hashtags and viral videos, disseminated at the right place and the right time to the right audience, were key to the BJP’s success. Voters who were not, or could not be, reached by these means were approached face to face.

Against this backdrop, it was a big moment when the BJP won 282 out of 543 seats in the elections, an absolute majority. The party has employed data science this time around to win a special place in people’s hearts. But can it secure another absolute majority in its second innings? That’s an answer we’ll have to wait patiently for.

IRCTC and Data Science

One of the most popular and convenient ways to travel in India is by train. Truly, few modes of transport can match the nostalgia and atmosphere that train journeys evoke. There was a time when people had to queue up to buy tickets for their journeys, but with the advent of the Indian Railway Catering and Tourism Corporation (IRCTC), booking tickets became easier and faster. IRCTC marked the beginning of the transformation of Indian Railways (IR).


Founded on 27 September 1999, the organisation will be celebrating its 20th anniversary this year. But the journey to becoming one of the best ticket booking platforms hasn’t been easy. As data costs fell and smartphones became readily available, IRCTC had to deal with a huge influx of passengers, and it has handled this task quite well. In fact, in the process of serving India’s 1.3 billion plus population, it has set quite a few interesting records.


From only 27 tickets booked when the server first came online to more than 13 lakh tickets booked on April 1, 2015, the organisation has come a long way in handling the demands of its customers. IRCTC, along with the Centre for Railway Information Systems (CRIS), has changed the face of not just how tickets are booked but also how goods and services are booked, confirmed and transported across the country. But along with its string of noteworthy achievements, IRCTC has had to grapple with a lot of data pertaining to its customers.


On average, the site handles close to 5 lakh tickets per day. In data terms, that means the system sees about 5 lakh sets of addresses, names, ages, sexes, quotas, berth preferences, meal preferences and travel insurance preferences, either as new entries or as part of individual passenger histories. This happens 24x7, 365 days a year, with few to no server crashes. The system rests for only 45 minutes, between 2345 hrs and 0030 hrs. And if we go a step further and estimate the transaction volume of a single day, the result is simply astounding. Taking a hypothetical figure of Rs 100 per ticket and 5 lakh transactions gives us a notional daily revenue of Rs 5 crore.
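The back-of-the-envelope estimate above is easy to reproduce. The sketch below uses the same hypothetical fare of Rs 100 per ticket, and additionally works out the average booking rate on the assumption that bookings are spread evenly over the hours the system is up, which real traffic certainly is not.

```python
LAKH = 100_000
CRORE = 10_000_000

MINUTES_PER_DAY = 24 * 60
DOWNTIME_MINUTES = 45  # daily maintenance window, 2345 hrs to 0030 hrs


def daily_revenue_rs(tickets_per_day, avg_fare_rs):
    """Notional daily revenue in rupees for a flat average fare."""
    return tickets_per_day * avg_fare_rs


def tickets_per_minute(tickets_per_day, downtime_minutes=DOWNTIME_MINUTES):
    """Average booking rate, assuming an even spread over the uptime."""
    return tickets_per_day / (MINUTES_PER_DAY - downtime_minutes)


tickets = 5 * LAKH
revenue = daily_revenue_rs(tickets, avg_fare_rs=100)  # hypothetical fare
rate = tickets_per_minute(tickets)

print(f"Daily revenue: Rs {revenue / CRORE:.0f} crore")
print(f"Average rate: {rate:.0f} tickets per minute")
```

Even under these simplified assumptions, the system is booking several tickets every second throughout the day.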


But the big question is: how safe is all this data? Though IRCTC assures its partners and customers that the data they enter is protected using industry-leading encryption standards, there have been instances when IRCTC faced major breaches of its system. Thanks to alert and responsible hackers, the situation was defused before it could escalate into a nationwide crisis.


More than that, the Govt. of India is also considering monetising the tremendous amount of data IRCTC has collected. From a data scientist’s point of view, IRCTC is sitting on a treasure trove. Not only would the data help in understanding the travel demographics of India, it would also help in ideating and implementing schemes, upgrading routes and doing business in a more sustainable and environmentally responsible way at all levels. But the major question is: to whom will the data be sold? Other questions abound:


  1. Will the data be stored within India indefinitely or be kept on third party servers?
  2. How will customer privacy be maintained in the face of changing data usage policies?
  3. How will data be used to empower allied transport and tourism services? 


While IR is coming up with strategies to deal with the above concerns and more, these are interesting times for the national railway carrier. 

