Rerunning Patterns of Members of Legislative Assembly in India's State Elections, 1985-2018

Dataset

The dataset covers regional assembly (Vidhan Sabha) elections in India from 1985 to 2018. We first divided the data into two sets: pre-delimitation (1985-2007) and post-delimitation (2008-2018). We used the stringdist package in R, developed by Van der Loo (2014), to name-match candidates and identify which of them reran for the same party in the same seat. The stringdist package offers a uniform interface to a number of well-known string distance measures, including edit-based, q-gram and heuristic distances. Following an iterative process, we used a combination of Damerau-Levenshtein, q-gram, cosine, Jaccard, and Jaro-Winkler distances to identify candidates who reran for political office. We manually checked a large portion of the data and corrected any measurement errors.

For the pre-delimitation period, there were 91260 candidates in 17961 constituency-years across 4424 Vidhan Sabha constituencies after excluding independents. We then subset the dataset to the top four candidates, because candidates further down the list rarely received many votes. This left 58842 candidates in 17959 constituency-years across 4424 Vidhan Sabha constituencies. We further subset the data to observations where incumbent and/or challenger parties reran for elections in the same seat, which left 20362 candidates in 12310 constituency-years across 4098 Vidhan Sabha constituencies. For the pre-delimitation period, we therefore have rerunning data for 10920 incumbent party candidates and 9442 challenger party candidates.

For the post-delimitation period, there were 27723 candidates in 9096 constituency-years across 4067 Vidhan Sabha constituencies after removing independents and subsetting the dataset to the top four candidates. Subsetting further to observations where incumbent and/or challenger parties reran for elections in the same seat left 7543 candidates in 4652 constituency-years across 3792 Vidhan Sabha constituencies. For the post-delimitation period, we therefore have rerunning data for 3953 incumbent party candidates and 3590 challenger party candidates.

Past research on incumbency in India has identified an incumbency disadvantage using Regression Discontinuity Designs (RDD). We argue, however, that these findings are hampered by selection bias, because political parties in India do not renominate all their candidates, nor do they renominate them at random. Instead, parties strategically select candidates from constituencies where they outperformed the state-wide party average. This is true for both incumbent and challenger parties, although the selection effect is larger for challenger parties, as they are able to exercise greater discretion over whom to renominate. Moreover, we show that incumbent and challenger party candidates who rerun for elections perform better than newly selected candidates. Part of this advantage is due to strategic selection by political parties. However, the rerunning advantage holds even when we account for the selection effects.
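The sample-construction steps in the dataset summary above (dropping independents and keeping the top four candidates in each constituency-year) can be illustrated with a minimal Python sketch. It is not the authors' code. The field names (`constituency`, `year`, `party`, `votes`), the independent party code `"IND"`, and the choice to rank candidates on the full field before dropping independents are all assumptions made for illustration; the record does not specify them.

```python
from collections import defaultdict


def subset_top_party_candidates(rows, independent_code="IND", keep=4):
    """Keep party candidates who finished in the top `keep` of their contest.

    Assumption: rank is the candidate's position among ALL candidates in the
    constituency-year, and independents are dropped after ranking.
    """
    by_contest = defaultdict(list)
    for row in rows:
        by_contest[(row["constituency"], row["year"])].append(row)

    kept = []
    for contest in sorted(by_contest):
        ranked = sorted(by_contest[contest], key=lambda r: r["votes"], reverse=True)
        kept.extend(r for r in ranked[:keep] if r["party"] != independent_code)
    return kept


def summarise(rows):
    """Count candidates, constituency-years and constituencies in a subset."""
    return {
        "candidates": len(rows),
        "constituency_years": len({(r["constituency"], r["year"]) for r in rows}),
        "constituencies": len({r["constituency"] for r in rows}),
    }
```

Calling `summarise(subset_top_party_candidates(rows))` on a list of candidate records yields the three kinds of counts reported for each stage. The later restriction to parties that reran in the same seat requires linking contests across elections and is not shown here.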

In name-matching candidates across elections, the easiest cases to code were those with a distance score of zero on all the distance measures. There were 4643 such cases. There were an additional 4036 cases where the names were only marginally different across election years. For example, J.C. Divakar Reddy of INC in Tadpatri constituency for the 1989 Vidhan Sabha elections in Andhra Pradesh was recorded as J.C. Diwakara Reddy in the same constituency for the 1994 Vidhan Sabha elections. A Levenshtein distance of less than or equal to 2 allows us to account for such cases.

Moreover, given the way the Election Commission of India recorded names across election years, we used a combination of distance measures to name-match candidates. For example, in Birapur constituency of Uttar Pradesh, Prof. Shivakant Ojha reran for elections in 2007 for the Bharatiya Janata Party (BJP), but was entered as Shiva Kant Ojha, without the prefix and with an added space. A simple Levenshtein distance score would not be able to capture this. Instead, we used a combination of Levenshtein and cosine distance to capture such cases. Another common problem was inconsistent ordering of candidates' names across elections. For example, in Kagal constituency of Maharashtra, Mushrif Hasan Miyalal reran for elections in 2004 for the Nationalist Congress Party (NCP), but was entered as Hasan Miyalal Mushrif in 1999. Once again, a simple Levenshtein distance score would not record this accurately. Instead, we used a combination of Levenshtein and q-gram distance measures to code such cases. There were also some cases that showed a very high Levenshtein distance but were captured because of a distance of zero on the cosine, Jaccard or q-gram distance measure. For example, Aqbal Hasan Alias Aqbal Husain of Gainsari constituency of Uttar Pradesh was coded as Aqbal Husain for the 1991 Vidhan Sabha elections. The two entries have a Levenshtein distance of 18 but a Jaccard distance of 0.

To code candidates who did not rerun for elections in the same seat for the same party, we applied a cut-off of a Levenshtein distance greater than 15 in combination with high scores on the other distance measures. Of course, there were many cases where candidates did not rerun for the same party in the same seat across elections, yet the Levenshtein distance was less than 15. These cases were captured by combining Levenshtein distance with other distance measures. For example, the Telugu Desam Party (TDP) ran Godam Rama Rao in Boath constituency of Andhra Pradesh in 1989 but replaced him with Godem Nagesh for the 1994 Vidhan Sabha election. The two strings have a Levenshtein distance of 9, but a cosine distance of 0.408, a Jaccard distance of 0.363, a q-gram distance of 13 and a Jaro-Winkler distance of 0.450. In most cases, a Levenshtein distance greater than 5 combined with a high Jaro-Winkler and/or cosine distance helped us capture cases where candidates did not rerun for elections for the same party in the same seat.
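The way the distance measures complement one another can be illustrated with a short Python sketch. It is not the stringdist package used for the dataset: it is a hand-written, standard-library approximation of four of the measures (plain Levenshtein, q-gram, cosine and Jaccard), with Damerau-Levenshtein and Jaro-Winkler omitted for brevity. The q-gram size of 1, the `classify` function, its labels and the `high_cosine` threshold of 0.4 are assumptions for illustration only; the text above gives the Levenshtein cut-offs of 2 and 5 but does not define what counts as a "high" score. Names are assumed to be non-empty.

```python
from collections import Counter
from math import sqrt


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def profile(s: str, q: int = 1) -> Counter:
    """Counts of the overlapping q-grams in s (q = 1 is an assumption)."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def qgram_distance(a: str, b: str, q: int = 1) -> int:
    """Sum of absolute differences between the two q-gram count profiles."""
    pa, pb = profile(a, q), profile(b, q)
    return sum(abs(pa[g] - pb[g]) for g in pa.keys() | pb.keys())


def cosine_distance(a: str, b: str, q: int = 1) -> float:
    """One minus the cosine similarity of the q-gram count profiles."""
    pa, pb = profile(a, q), profile(b, q)
    dot = sum(pa[g] * pb[g] for g in pa.keys() & pb.keys())
    norm = sqrt(sum(v * v for v in pa.values())) * sqrt(sum(v * v for v in pb.values()))
    return 1.0 - dot / norm


def jaccard_distance(a: str, b: str, q: int = 1) -> float:
    """One minus the share of distinct q-grams that the two strings have in common."""
    sa, sb = set(profile(a, q)), set(profile(b, q))
    return 1.0 - len(sa & sb) / len(sa | sb)


def classify(old: str, new: str, high_cosine: float = 0.4) -> str:
    """Hypothetical decision rule loosely following the cut-offs in the text."""
    lev = levenshtein(old, new)
    cos = cosine_distance(old, new)
    zero_on_a_qgram_measure = (
        qgram_distance(old, new) == 0
        or jaccard_distance(old, new) == 0
        or cos < 1e-12  # tolerate floating-point rounding
    )
    if lev <= 2 or zero_on_a_qgram_measure:
        return "rerun"
    if lev > 5 and cos >= high_cosine:
        return "new candidate"
    return "manual review"


print(levenshtein("J.C. Divakar Reddy", "J.C. Diwakara Reddy"))          # 2
print(qgram_distance("Hasan Miyalal Mushrif", "Mushrif Hasan Miyalal"))  # 0
print(levenshtein("Aqbal Hasan Alias Aqbal Husain", "Aqbal Husain"))     # 18
print(jaccard_distance("Aqbal Hasan Alias Aqbal Husain", "Aqbal Husain"))  # 0.0
```

Under these assumptions, all three of the name pairs above are classified as reruns: the first through a small edit distance, the second because reordering the words leaves the character counts unchanged, and the third because both entries draw on the same set of characters. Pairs that satisfy neither rule fall through to manual review, mirroring the manual checking described in the dataset summary.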

Identifier
DOI	https://doi.org/10.5255/UKDA-SN-854706
Metadata Access	https://datacatalogue.cessda.eu/oai-pmh/v0/oai?verb=GetRecord&metadataPrefix=oai_ddi25&identifier=ee8233a46260fc6176c06ea233c030a6857c9c7fbf4d713d75f3d1a436782c10

Provenance
Creator	Shrimankar, D, Royal Holloway, University of London
Publisher	UK Data Service
Publication Year	2021
Funding Reference	Economic and Social Research Council
Rights	Dishil Shrimankar, Royal Holloway, University of London; The UK Data Archive has granted a dissemination embargo. The embargo will end on 01 March 2022 and the data will then be available in accordance with the access level selected.
OpenAccess	true

Representation
Resource Type	Numeric
Discipline	Social Sciences
Spatial Coverage	India