Suicide is the second most common cause of death for people ages 15-29. Research has found that one of the best ways to prevent suicide is for those in distress to hear from people who care about them. Facebook is well positioned — through friendships on the site — to help connect a person in distress with people who can support them. It’s part of our ongoing effort to help build a safe community on and off Facebook.
We recently announced an expansion of our existing suicide prevention tools that use artificial intelligence to identify posts with language expressing suicidal thoughts. We'd like to share more details about this, as we know that there is growing interest in Facebook's use of AI and in the nuances associated with working in such a sensitive space.
In the past, we've relied on loved ones to report concerning posts to us, since they are in the best position to know when someone is struggling. However, many posts expressing suicidal thoughts are never reported to Facebook, or are not reported fast enough. We saw this as an opportunity to use machine learning to better detect posts indicating that someone may be at risk. When the system finds a post and a trained member of our Community Operations team determines that the poster might be at risk, we show that person resources with support options the next time they open Facebook.
To start, we worked with experts to identify specific keywords or phrases known to be associated with suicide. However, we found that we needed to do more than rely on simple rules, particularly because people often use the same words in different contexts (e.g., “Ugh, I have so much homework I just wanna kill myself”). As an alternative, we moved toward a machine learning approach, which excels at aggregating a series of weak signals to classify content. (You can read more about how we use this kind of artificial intelligence at Facebook here.) We were able to use posts previously reported to Facebook by friends and family, along with the decisions made by our trained reviewers (based on our Community Standards), as our training data set. We used FBLearner, our internally developed machine learning engine, to train a classifier to recognize posts that include keywords or phrases indicating thoughts of self-harm.
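To make the limitation of simple rules concrete, here is a toy keyword matcher. The keyword list, the posts, and the code itself are illustrative assumptions, not our actual lists or systems; the point is only that the same phrase trips a rule-based filter in both a joking and a serious context.

```python
# Illustrative only: a naive keyword rule of the kind described above.
# The keyword list and posts are hypothetical examples.

KEYWORDS = {"kill myself", "want to die", "end it all"}  # hypothetical list

def keyword_rule(post: str) -> bool:
    """Flag a post if it contains any keyword or phrase from the list."""
    text = post.lower()
    return any(phrase in text for phrase in KEYWORDS)

posts = [
    "Ugh, I have so much homework I just wanna kill myself",  # hyperbole
    "I can't do this anymore, I want to die",                 # possible distress
]

for post in posts:
    print(keyword_rule(post), "->", post)
# Both posts are flagged, even though only the second suggests real distress --
# exactly the kind of false positive that pushes you toward a learned classifier.
```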
We rely on human review by specially trained members of our Community Operations team to make the determination that a post expresses suicidal thoughts. It's important to minimize the number of false-positive reports sent to the team so they can get help to people as quickly as possible. The best way to do this is to add features to the model that strengthen the signal. Aggregated, de-identified comments on posts provide a crucial layer of context: for example, comments like “Are you OK?” or “Can I help you?” are a strong indicator that friends and family are taking a potential threat seriously. Additional research shows that both the time of day and the day of the week are important predictors of when someone is in crisis, so we include these variables in the model.
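Below is a minimal sketch of what combining these kinds of features might look like. The field names, the concern phrases, and the feature layout are assumptions made up for illustration, not our actual schema or model inputs.

```python
# A minimal sketch of extra context features: aggregated comment signals plus
# time-of-day and day-of-week. All names and values here are illustrative.

from datetime import datetime
from typing import List

CONCERN_PHRASES = ["are you ok", "can i help", "please call me"]  # hypothetical

def extract_features(comments: List[str], created_at: datetime) -> List[float]:
    """Combine comment and timing signals into one feature vector."""
    comment_text = " ".join(c.lower() for c in comments)

    # Fraction of concern phrases that appear anywhere in the comments --
    # a rough stand-in for "friends and family are taking this seriously."
    concern_score = sum(p in comment_text for p in CONCERN_PHRASES) / len(CONCERN_PHRASES)

    return [
        concern_score,
        float(len(comments)),          # how much discussion the post has drawn
        created_at.hour / 23.0,        # time of day, scaled to [0, 1]
        created_at.weekday() / 6.0,    # day of week, scaled to [0, 1]
    ]

features = extract_features(
    ["Are you OK?", "Can I help you?"],
    datetime(2017, 11, 27, 2, 30),     # 2:30 a.m. on a Monday
)
print(features)
```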
In our first attempt at text classification, we used a learning algorithm known as n-gram-based linear regression. This algorithm divides a piece of text into word chunks (n-grams) of varying lengths, then looks at how often each chunk appears in the positive training samples (that is, posts that express suicidal thoughts) versus the negative training samples (posts that do not). Chunks that appear much more often in the positive samples are given a positive weight, while chunks that appear much more often in the negative samples are given a negative weight.
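The toy example below walks through that weighting idea end to end. The tiny corpus and the exact smoothed log-ratio formula are assumptions for illustration only; they are not the production classifier.

```python
# A toy version of the weighting idea described above: count how often each
# n-gram appears in positive vs. negative posts and derive a weight from the
# smoothed log-ratio. Corpus and formula are illustrative assumptions.

import math
from collections import Counter
from typing import Dict, List

def ngrams(text: str, n_max: int = 2) -> List[str]:
    """Break a post into word chunks (n-grams) of varying lengths."""
    words = text.lower().split()
    return [" ".join(words[i:i + n])
            for n in range(1, n_max + 1)
            for i in range(len(words) - n + 1)]

def train_weights(positive: List[str], negative: List[str]) -> Dict[str, float]:
    """Weight each chunk positively if it is more common in positive samples,
    negatively if it is more common in negative samples."""
    pos_counts = Counter(g for post in positive for g in ngrams(post))
    neg_counts = Counter(g for post in negative for g in ngrams(post))
    vocab = set(pos_counts) | set(neg_counts)
    # Add-one smoothing keeps unseen chunks from producing infinite weights.
    return {g: math.log((pos_counts[g] + 1) / (neg_counts[g] + 1)) for g in vocab}

def score(post: str, weights: Dict[str, float]) -> float:
    """Sum the weights of the chunks that appear in the post."""
    return sum(weights.get(g, 0.0) for g in ngrams(post))

positive = ["i want to end my life", "i cannot go on anymore"]    # made-up samples
negative = ["so much homework tonight", "great game last night"]  # made-up samples
weights = train_weights(positive, negative)
print(round(score("i want to end it", weights), 2))  # positive score -> flagged for review
```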
Here's a simple visualization of this process based on a made-up sentence and simulated weights: