Machine Commits Suicide: Suicide Study


Suicide is a major cause of death, and one that is both premature and preventable. This paper outlines cross-country disparities and patterns in suicide rates in different countries between 1985 and 2016. The data are from the World Health Organization (2018). They show that suicide rates differ widely from country to country, often within the same region or at similar levels. The findings indicate that over ninety percent of the variation in suicide rates is due to variation between countries. The paper also includes a basic review of the recorded 1985 to 2016 suicide rate dataset, based on three machine learning methods: K-Nearest Neighbours (KNN), Support Vector Machine (SVM) and Logistic Regression (LR).


In a period of rapid growth in IT, information on how many people are dying from different causes can be obtained through social sites. Evidence indicates [1] that mental illness is the greatest risk factor for suicide and that more than ninety percent of suicidal people have psychiatric or addiction disorders. Depression is one of the most common illnesses associated with suicide: of the hundreds of thousands of people concerned, approximately 60 percent suffer from it. Mental illness alone can be enough to lead to suicide, but there are further causes, e.g. marital breakdown, economic problems, declining physical health, a major loss or a lack of social support. The dataset lists current suicide rate figures, broken down by class, age and marital status. The principal source of data is the suicide rate overviews for different years.

This paper analyses a dataset that summarises suicide rates and addresses irregularities in the collected data. Three ML methods were applied to the dataset to construct various models, and the predicted outcomes of the three methods were then compared.

Related literature is reviewed in Section II. The data collection is explored in greater detail in Section III. The ML approach to the dataset is introduced and discussed in section

Literature Review

The research [2] comprised thousands of subjects from 1985 to 2016, drawn from an NHS Service sample dataset. The dataset was split at random into a training group and a validation group. To improve the efficiency of suicide prediction, we applied SVM, machine learning and LR.
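The random split into training and validation groups can be sketched as follows. This is only an illustration using the Python standard library: the function name `train_validation_split`, the `validation_fraction` and `seed` parameters, and the stand-in records are assumptions, not details taken from the study.

```python
import random


def train_validation_split(rows, validation_fraction=0.2, seed=0):
    """Shuffle a copy of rows and split it into (train, validation)."""
    if not 0.0 < validation_fraction < 1.0:
        raise ValueError("validation_fraction must be between 0 and 1")
    shuffled = list(rows)
    # A seeded generator keeps the random split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_validation = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


# Hypothetical stand-in for dataset rows.
records = list(range(10))
train, validation = train_validation_split(records, validation_fraction=0.3, seed=42)
```

Every record ends up in exactly one of the two groups, so the validation group stays unseen during training.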

Andrea Cipriani and John Geddes have discussed machine learning algorithms. They used patient-reported data from depressed patients at level 1 of a sequenced care approach to depression relief, detected the features that were most predictive of treatment outcome, and used those variables to train an ML model for predicting clinical remission.

From thousands of patient-reportable variables, 25 variables were found to be the most predictive of treatment outcome.

Dataset Description and Preparation:

A: The dataset description:

The dataset was obtained from the Kaggle website [3]. It was gathered and analysed during a Worldline and machine learning research collaboration on suicide rates in different countries globally. It contains an overview of the years from 1985 to 2015, showing how many people died by suicide, by age and gender. The figures are shocking. A total of 961 cases are included, covering deaths in different years and in different countries. Female and male records each account for 50 percent of all records. By age, the 15 to 24 group accounts for approximately 17 percent of the records, the 35 to 54 group for the same share, and the remaining age groups for 67 percent. The format of the dataset is shown in Table 1. The features are country, year, sex, age, suicides_no, population, suicides/100k pop, country-year, HDI (Human Development Index) for year, gdp (Gross Domestic Product) for year, gdp per capita and generation.

In this dataset, the five countries with the highest suicide rates are Sri Lanka, Lithuania, the Russian Federation, Hungary and Belarus.
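A ranking of this kind can be sketched by pooling suicides and population per country and converting to a rate per 100,000 people, which is what the suicides/100k pop feature expresses. The row layout `(country, suicides_no, population)`, the function names and the placeholder countries and numbers below are illustrative assumptions. They are not values from the dataset, and this is not necessarily how the ranking above was produced.

```python
from collections import defaultdict


def rate_per_100k(suicides, population):
    """Suicides per 100,000 people."""
    return suicides * 100_000 / population


def top_countries(rows, n=5):
    """rows: iterable of (country, suicides_no, population) tuples."""
    totals = defaultdict(lambda: [0, 0])
    for country, suicides, population in rows:
        totals[country][0] += suicides
        totals[country][1] += population
    rates = {c: rate_per_100k(s, p) for c, (s, p) in totals.items()}
    # Highest pooled rate first.
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)[:n]


# Placeholder rows, not real figures.
rows = [
    ("A", 30, 100_000),
    ("A", 10, 100_000),
    ("B", 5, 50_000),
    ("C", 60, 200_000),
]
ranking = top_countries(rows, n=2)
```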

B: Preparation of Data:

First, we carry out a basic analysis of the records. Fig. 1 illustrates that the class feature is significantly imbalanced in the dataset. Using the dataset directly can cause machine learning algorithms to produce models that do not give valid results on real data.
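A quick way to quantify such a disparity is to count how often each class label occurs. The sketch below uses hypothetical labels `0` and `1`; the function names are illustrative assumptions, and the actual class feature is the one plotted in Fig. 1.

```python
from collections import Counter


def class_shares(labels):
    """Fraction of records that fall in each class."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def imbalance_ratio(labels):
    """Size of the largest class divided by the size of the smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Hypothetical labels: nine records of class 0, one of class 1.
labels = [0] * 9 + [1]
shares = class_shares(labels)
ratio = imbalance_ratio(labels)
```

A ratio far above 1 is a sign that a model can score well simply by always predicting the majority class.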

The data show the ages of people who died by suicide in different countries, as given below. For suicide cases in different years, the y-axis shows GDP per capita (a measure of a country's economic performance that accounts for its number of citizens, obtained by dividing the country's gross domestic product by its total population) and the x-axis shows generation. The generation groups are Generation X, Silent, G.I. Generation, Boomers, Generation Z and Millennials.

Fig. 2 shows features such as age, GDP, gender and group. Some conclusions can be reached here:

There is no correlation between GDP and suicide rate.

Experimental Setup:

All of the algorithms in this paper are written in the Python programming language. A range of scientific computing libraries is also used. The project was run on Ubuntu 18.04.3 in a VMware virtual machine on a Dell laptop running Windows, and all of the packages used can be installed quickly and easily through the 2019.10 version of Anaconda.

The research adopts three methods of supervised machine learning:

  • Support Vector Machine
  • K-Nearest Neighbours
  • Logistic Regression

K-Nearest Neighbours

One of the simplest methods for pattern classification is the k-nearest neighbours rule. Nonetheless, it often produces favourable outcomes, in some cases when intelligently paired with prior knowledge.

Its aim is to use a database in which the data points are divided into several classes to predict the class of a new sample point. KNN calculates the distance between the feature values of the points. For example, for two points p and q, a formula computes the distance d.
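The source does not say which distance is used. Assuming the common Euclidean distance, d(p, q) is the square root of the sum of squared differences between the coordinates of p and q, which can be sketched as:

```python
import math


def euclidean_distance(p, q):
    """Euclidean distance between two points with the same number of coordinates."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of coordinates")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


d = euclidean_distance((0, 0), (3, 4))
```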

In a classification problem, the selection of the value of k (the number of neighbours chosen) is important. If k is too small, noise in the neighbourhood will have a more pronounced effect. If k is large, the prediction is effectively made from training instances in a wider neighbourhood, and the error grows.

A common approach is to use the test set to estimate the classifier's error rate starting from k = 1, and to repeat this process with k incremented by 1 each time. In this way, a k with a small error can be found.
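The search over k described above can be sketched with a small hand-written classifier. Everything here is an illustrative assumption: the function names, the majority-vote tie handling and the toy one-dimensional points, one of which is deliberately mislabelled to act as noise.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k):
    """Majority vote among the k training points nearest to query."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(train_points[i], query))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def error_rate(train_points, train_labels, test_points, test_labels, k):
    """Fraction of test points that are misclassified for a given k."""
    wrong = sum(knn_predict(train_points, train_labels, x, k) != y
                for x, y in zip(test_points, test_labels))
    return wrong / len(test_points)


def best_k(train_points, train_labels, test_points, test_labels, max_k):
    """Try k = 1, 2, ..., max_k and return the smallest k with the lowest error."""
    errors = {k: error_rate(train_points, train_labels, test_points, test_labels, k)
              for k in range(1, max_k + 1)}
    return min(errors, key=errors.get), errors


# Toy data; the point (1.4,) is mislabelled on purpose.
train_points = [(0.0,), (1.0,), (2.0,), (1.4,), (10.0,), (11.0,), (12.0,)]
train_labels = ["low", "low", "low", "high", "high", "high", "high"]
test_points = [(1.5,), (10.5,)]
test_labels = ["low", "high"]

k_star, errors = best_k(train_points, train_labels, test_points, test_labels, max_k=5)
```

With k = 1 the mislabelled neighbour decides the vote for the first test point, while k = 3 outvotes it, which is the noise effect of a small k described above.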

Support Vector Machine

SVM is a two-class classification model. Its basic model is the linear classifier with the maximum margin defined in the feature space. The maximum margin distinguishes it from the perceptron. SVM also includes kernel techniques. The SVM learning approach is to maximise the margin, which can be formalised as a convex quadratic programming problem. This is also equivalent to minimising the regularised hinge loss function. Examples of linear separation and of non-linear separation using a kernel are shown in Fig. 3 and Fig. 4.
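The regularised hinge loss mentioned above can be written out for a linear classifier as a minimal sketch. It assumes labels coded as +1 and -1, a weight vector `w`, a bias `b` and a regularisation strength `reg`; these names and the exact scaling of the penalty are illustrative choices, not taken from the source.

```python
def hinge_loss(w, b, points, labels, reg=0.0):
    """Mean hinge loss max(0, 1 - y * (w.x + b)) plus reg * ||w||^2.

    labels must be +1 or -1.
    """
    total = 0.0
    for x, y in zip(points, labels):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        # Points beyond the margin (y * score >= 1) contribute nothing.
        total += max(0.0, 1.0 - y * score)
    return total / len(points) + reg * sum(wi * wi for wi in w)


# Both points lie outside the margin, so the loss is zero.
separated = hinge_loss([1.0], 0.0, [[2.0], [-2.0]], [1, -1])
# This point is on the correct side but inside the margin.
inside_margin = hinge_loss([1.0], 0.0, [[0.5]], [1], reg=0.1)
```

Only points inside the margin or on the wrong side add to the loss, which is why minimising it pushes the separating line away from both classes.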

Logistic Regression (LR)

Unlike linear regression, logistic regression handles the two-class problem well. LR [7] is an ML classification algorithm used to estimate the probability of a categorical dependent variable. The dependent variable in LR is a binary variable containing data coded as either 1 (yes, success) or 0 (no, failure). Put another way, the LR model predicts P(y = 1) as a function of X.

The model should contain only the relevant variables.

Logistic regression requires a very large sample size.

When using an LR algorithm, various functions need to be built for different problems. Three types of function are mostly used in logistic regression: the hypothesis function, the cost function and the activation function.

  • The hypothesis function of the LR method maps an input value x to a prediction, where θ is the parameter.
  • The cost function measures the error of the hypothesis on the training data.
  • Over several iterations of the gradient descent algorithm, the value of the cost function is continually reduced, and the desired model is eventually obtained.
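The three functions can be sketched together. The forms below are the standard textbook ones, assumed here because the source does not spell them out: the hypothesis is the sigmoid activation applied to θ·x, and the cost is the average cross-entropy. The names `learning_rate` and `iterations` and the toy data are illustrative.

```python
import math


def sigmoid(z):
    """Activation function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def hypothesis(theta, x):
    """Predicted P(y = 1 | x) for parameter vector theta."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))


def cost(theta, X, y):
    """Average cross-entropy between predictions and 0/1 labels."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for xi, yi in zip(X, y):
        h = min(max(hypothesis(theta, xi), eps), 1.0 - eps)
        total += -yi * math.log(h) - (1 - yi) * math.log(1.0 - h)
    return total / len(X)


def gradient_descent(X, y, learning_rate=0.1, iterations=500):
    """Batch gradient descent; returns theta and the cost after each step."""
    theta = [0.0] * len(X[0])
    history = []
    for _ in range(iterations):
        errors = [hypothesis(theta, xi) - yi for xi, yi in zip(X, y)]
        grad = [sum(e * xi[j] for e, xi in zip(errors, X)) / len(X)
                for j in range(len(theta))]
        theta = [t - learning_rate * g for t, g in zip(theta, grad)]
        history.append(cost(theta, X, y))
    return theta, history


# Toy data: the first column is a constant 1 acting as the intercept.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0, 0, 1, 1]
theta, history = gradient_descent(X, y)
```

The recorded `history` makes the continual reduction of the cost visible from one iteration to the next.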

However, the whole of the problem cannot be analysed with LR.


K-Nearest Neighbours

The dataset is processed with year on the x-axis and suicides/100k on the y-axis. The KNN model is trained using scikit-learn's default KNeighborsClassifier.

Fig. 6 shows the K-Nearest Neighbours model output for different k values on the data collected from 1985 to 2016. It shows that the suicide rate goes up from 1985 to 1994, with the rise ending in 1995. With the passage of time the suicide rate then slowly decreases until 2015, but increases again in 2016.

Support Vector Machine

We use a support vector machine classifier from the scikit-learn library. The test set contains all of the suicides/100k pop data indexed from 1985 to 2016. We trained two SVM models: one contains all of the features, and the other contains only the top features with the highest absolute correlation values. Fig. 7 presents an overview of the suicide rate data collected in different countries across different years.
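Selecting the features for the second model can be sketched by ranking each feature by the absolute value of its Pearson correlation with the target. This is an assumption about how the selection might be done, since the source does not give the procedure; the function names, the column names and the toy values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def top_features(columns, target, k):
    """Names of the k columns with the highest absolute correlation to target."""
    scores = {name: abs(pearson(values, target))
              for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical feature columns and target.
target = [1, 2, 3, 4]
columns = {
    "a": [2, 4, 6, 8],    # perfectly positively correlated
    "b": [4, 3, 2, 1],    # perfectly negatively correlated
    "c": [1, -1, 1, -1],  # weakly correlated
    "d": [5, 5, 5, 5],    # constant, no correlation
}
selected = top_features(columns, target, k=2)
```

Using the absolute value keeps strongly negative features such as `b`, which carry as much information as strongly positive ones.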

Logistic Regression

We prepared the raw data features for logistic regression. It seems that the relationship between GDP and suicide rate is not obvious. Nonetheless, one thing is clear: over this period GDP gradually increases. Fig. 8 is the LR ROC curve on the original dataset.
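An ROC curve plots the true positive rate against the false positive rate as the decision threshold is lowered. The sketch below shows how the points and the area under the curve can be computed from predicted scores. The scores and labels are hypothetical, not the values behind Fig. 8, and the code assumes both classes are present.

```python
def roc_points(scores, labels):
    """(false positive rate, true positive rate) points for 0/1 labels."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only once all records sharing this score are counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical scores: higher means more likely to be class 1.
perfect = auc(roc_points([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 0]))
mixed = auc(roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

An area of 1.0 means every positive record is ranked above every negative one, while 0.5 corresponds to random ranking.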


Overall, the suicide rate dataset is not very complex to work with, so we do not carry out much data preprocessing. Because the records differ across the dataset, for example across the years from 1985 to 2016 that we process, different machine learning methods can have varying effects on model building.

We worked with three ML algorithms and obtained good results. KNN is one of the best algorithms on this dataset, and the main part of KNN is finding the right k value.

SVM is another of the best algorithms, and it also gave a good result. In this paper, however, it is easier to pick only the 10 features with the highest correlation values than to use all the features.

In LR, we obtained a result, yet there was a considerable disparity in the number of instances between different features. This can lead to overfitting of the models.


  1. (2020) [online] available from [4 April 2020]
  2. (2020) available from [4 April 2020]
  3. Suicide Rates Overview 1985 To 2016 (2020) available from [4 April 2020]
  4. Analysis Of Suicide Dataset Via 10 Questions (2020) available from [4 April 2020]
  5. (2020) [online] available from [5 April 2020]
  6. Keller, J., Gray, M. and Givens, J. (2020) A Fuzzy K-Nearest Neighbor Algorithm [online] available from [5 April 2020]
  7. Building A Logistic Regression In Python, Step By Step (2020) available from [5 April 2020]

Cite this page

Machine Commits Suicide: Suicide Study. (2022, Feb 07). Retrieved December 4, 2023 , from