We can easily drive the false-positive rate down over time, but I'm personally unsure about fully automating it for now. Better would be to have it ping admins on Discord whenever it flags a piece of player chat, so they can check the full context themselves.
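A minimal sketch of that flow, assuming a Discord webhook on the admin channel (the webhook URL, the context window size, and the `build_admin_alert` helper are all my own placeholders, not anything we have set up yet):

```python
import json

CONTEXT_LINES = 3  # assumed: lines of surrounding chat to show on each side


def build_admin_alert(chat_log, flagged_index, reason):
    """Build a Discord-style webhook payload for one flagged chat line,
    including surrounding lines so a mod can judge the full context
    before muting anyone."""
    start = max(0, flagged_index - CONTEXT_LINES)
    end = min(len(chat_log), flagged_index + CONTEXT_LINES + 1)
    context = "\n".join(
        ("> " if i == flagged_index else "  ") + chat_log[i]
        for i in range(start, end)
    )
    # Discord webhooks accept a JSON body with a "content" field;
    # the triple backticks render the context as a code block.
    return {"content": f"Flagged ({reason}):\n```{context}```"}


chat = ["gg", "nice shot", "I hope you die in a ditch", "lol", "report him"]
payload = build_admin_alert(chat, 2, "toxicity")
print(json.dumps(payload))
```

Sending it would then just be a POST of that payload to the webhook URL; the point is that the alert carries the context instead of making the bot act on its own.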
I just went through 2000 lines of chat samples and flagged 11 as toxic by my own judgment. Examples: "I hope you die in a ditch", "ez dom", "I hope all your family members get slowly dismembered infront of you", "I hope all your pets will suffer", "I hope they all will die painfully", "Kill that fucking child", "Noob down", "Big nose like israeli", "fag", "ez kill", "fuck that trash".
I feel like, given enough chat to process, we can make this far more reliable than the Wikipedia-based AI, which is decent on its own but not good enough. I muted 7 people today using it, but many of those were for racism, homophobia, etc., not just general toxicity.