Subject: How to detect poisoned data in machine learning datasets
Almost anyone can poison a machine learning (ML) dataset, substantially and permanently altering the model's behavior and output. With careful, proactive detection efforts, organizations can save the weeks, months or even years of work they would otherwise spend undoing the damage caused by poisoned data sources.

What is data poisoning and why does it matter?
Data poisoning is a type of adversarial ML attack that maliciously tampers with datasets to mislead or confuse the model. The goal is to make it respond inaccurately or behave in unintended ways. Realistically, this threat could harm the future of AI.
As AI adoption expands, data poisoning becomes more common. Model hallucinations, inappropriate responses and misclassifications caused by intentional manipulation have increased in frequency. Public trust is already degrading — only 34% of people strongly believe they can trust technology companies with AI governance.
Examples of machine learning dataset poisoning
In one real-world case, user input permanently altered an ML algorithm. Microsoft launched its chatbot "Tay" on Twitter in 2016, attempting to mimic a teenage girl's conversational style. After only 16 hours, it had posted more than 95,000 tweets — most of which were hateful, discriminatory or offensive. The company quickly discovered that people were mass-submitting inappropriate input to alter the model's output.
Ways to detect a poisoned machine learning dataset

The good news is that organizations can take several measures to secure training data, verify dataset integrity and monitor for anomalies to minimize the chances of poisoning.
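Two of those measures — verifying dataset integrity and monitoring for anomalies — can be illustrated with a minimal sketch. This is not any particular tool's method: the function names, the checksum baseline and the robust z-score threshold of 3.5 are all illustrative assumptions. A median-based score is used because a poisoned point inflates the ordinary mean and standard deviation enough to hide itself.

```python
# Minimal sketch of two detection measures: checksum verification and
# robust outlier flagging. Names and thresholds are illustrative only.
import hashlib
import statistics

def sha256_of(data: bytes) -> str:
    """Fingerprint a dataset file so later tampering is detectable."""
    return hashlib.sha256(data).hexdigest()

def flag_outliers(values, threshold=3.5):
    """Flag points with a large robust z-score (median/MAD based),
    which resists the skew a poisoned point itself introduces."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        return []
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Integrity check: recompute the hash and compare to a trusted baseline.
baseline = sha256_of(b"label,feature\n0,1.0\n1,1.1\n")
current = sha256_of(b"label,feature\n0,1.0\n1,9.9\n")  # tampered copy
print("dataset modified:", baseline != current)   # prints: dataset modified: True

# Anomaly monitoring: a poisoned point stands out statistically.
clean = [1.0, 1.1, 0.9, 1.05, 0.95, 1.02]
poisoned = clean + [50.0]
print("suspicious values:", flag_outliers(poisoned))  # flags [50.0]
```

In practice the checksum would be computed when the dataset is first vetted and stored somewhere the training pipeline cannot overwrite; the statistical check is a first-pass filter, since subtle poisoning attacks are deliberately crafted to stay inside normal-looking ranges.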
* BONUS *
How about poisoning algorithmic trade data? https://www.investopedia.com/articles/active-trading/101014/basics-algorithmic-trading-concepts-and-examples.asp
[h/t Sabrina] 2023 was a year of recovery for cryptocurrency, as the industry rebounded from the scandals, blowups, and price declines of 2022. With crypto assets rebounding and market activity growing over the course of 2023, many believe that crypto winter is ending, and a new growth phase may soon be upon us.
But what did all of that mean for crypto crime? Let's look at the high-level trends.

Ransomware and darknet market activity on the rise
Ransomware and darknet markets are two of the most prominent forms of crypto crime, and both saw revenues rise in 2023, in contrast with overall trends. The growth of ransomware revenue is disappointing following the sharp declines we covered last year, and suggests that ransomware attackers may have adjusted to organizations' cybersecurity improvements, a trend we first reported earlier this year.
Similarly, this year’s growth in darknet market revenue also comes after a 2022 decline in revenue. That decline was driven largely by the shutdown of Hydra, which was once the world’s most dominant market by far, capturing over 90% of all darknet market revenue at its peak. While no single market has yet emerged to take its place, the sector as a whole is rebounding, with total revenue climbing back towards its 2021 highs.
Cyberattacks on health care providers in the U.S. have gone up steadily over the last decade, exposing the personal health data of millions of patients.
By and large, hospitals and clinics hit by these attacks kept running, albeit with some tweaks.
“Early in the cyberattack, the first two days, we didn’t have a phone system because our phone is on the internet. We literally went to Best Buy and bought every walkie-talkie they had,” Leffler said.
Laws have also changed. In 2023, the Food and Drug Administration implemented a rule that says medical device manufacturers have to follow stricter guidelines on how to keep their products secure from hackers.
There have been so many ransomware attacks on hospitals that most are better prepared for them by now, said Pam Dixon, cybersecurity expert and founder of World Privacy Forum.
One report released in 2023 found that almost 60% of health care IT professionals say they restored their data from backups after a ransomware attack, without paying a ransom.
She said if hackers get one person’s medical data, they could use it for identity theft. If they get a lot of health care data, they could work with shady doctors to change someone’s medical record, so a patient now has an expensive disease like diabetes or hepatitis C. The shady doctor could tell an insurance provider that they treated the patient for this nonexistent condition.
Source: NPR PBS via WHYY
Meta and others in the industry have been working to develop invisible markers, including watermarks and metadata, indicating that a piece of content has been created by AI. Meta said it will begin using those markers to apply labels in multiple languages on its apps, so users of its platforms will know whether what they’re seeing is real or fake.
“As the difference between human and synthetic content gets blurred, people want to know where the boundary lies,” Nick Clegg, Meta’s president of global affairs, wrote in a company blog post. “People are often coming across AI-generated content for the first time and our users have told us they appreciate transparency around this new technology. So it’s important that we help people know when photorealistic content they’re seeing has been created using AI.”
For now, Meta is relying on users to fill the void. On Tuesday, the company said that it will start requiring users to disclose when they post “a photorealistic video or realistic-sounding audio that was digitally created or altered” and that it may penalize accounts that fail to do so.
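The metadata-marker idea above can be sketched in a toy form: attach a signed provenance record to a piece of content so that an "AI-generated" label survives only if neither the content nor the label has been altered. This is purely illustrative and is not Meta's actual scheme (real provenance standards use public-key signatures and embed records in the file itself, not a shared demo key).

```python
# Toy sketch of metadata-based provenance labeling. The shared HMAC key
# and record format are illustrative assumptions, not a real standard.
import hashlib
import hmac
import json

KEY = b"demo-signing-key"  # hypothetical; real schemes use PKI, not a shared key

def label_content(content: bytes, generator: str) -> dict:
    """Attach a signed 'made with AI' record to a piece of content."""
    record = {"generator": generator, "ai_generated": True}
    payload = content + json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(KEY, payload, hashlib.sha256).hexdigest()
    return record

def verify_label(content: bytes, record: dict) -> bool:
    """Check that the provenance record matches the content and wasn't forged."""
    claimed = record.get("signature", "")
    body = {k: v for k, v in record.items() if k != "signature"}
    payload = content + json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(claimed, expected)

img = b"\x89PNG...fake image bytes"
rec = label_content(img, "example-image-model")
print(verify_label(img, rec))          # True: label intact
print(verify_label(img + b"x", rec))   # False: content was altered
```

The hard part in practice is exactly what the article describes: markers like this only help when generators reliably attach them and platforms reliably check them, which is why Meta is also asking users to disclose manually.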
Source: Krebs on Security
In 2021, the exclusive Russian cybercrime forum Mazafaka was hacked. The leaked user database shows one of the forum’s founders was an attorney who advised Russia’s top hackers on the legal risks of their work, and what to do if they got caught. A review of this user’s hacker identities shows that during his time on the forums he served as an officer in the special forces of the GRU, the foreign military intelligence agency of the Russian Federation.

Launched in 2001 under the tagline “Network terrorism,” Mazafaka would evolve into one of the most guarded Russian-language cybercrime communities. The forum’s member roster includes a Who’s Who of top Russian cybercriminals, and it featured sub-forums for a wide range of cybercrime specialties, including malware, spam, coding and identity theft.
Source: Krebs on Security
Krebs on Security: “Google continues to struggle with cybercriminals running malicious ads on its search platform to trick people into downloading booby-trapped copies of popular free software applications. The malicious ads, which appear above organic search results and often precede links to legitimate sources of the same software, can make searching for software on Google a dicey affair. Google says keeping users safe is a top priority, and that the company has a team of thousands working around the clock to create and enforce their abuse policies. And by most accounts,…
Today, CISA partnered with the Open Source Security Foundation (OpenSSF) Securing Software Repositories Working Group to publish the Principles for Package Repository Security framework. Recognizing the critical role package repositories play in securing open source software ecosystems, this framework lays out voluntary security maturity levels for package repositories. This publication supports Objective 1.2 of CISA’s Open Source Software Security Roadmap, which states the goal of “working collaboratively [with relevant working groups] to develop security principles for package managers.”[more]
Source: Ars Technica
Ars Technica: “Health insurance companies cannot use algorithms or artificial intelligence to determine care or deny coverage to members on Medicare Advantage plans, the Centers for Medicare & Medicaid Services (CMS) clarified in a memo sent to all Medicare Advantage insurers. The memo—formatted like an FAQ on Medicare Advantage (MA) plan rules—comes just months after patients filed lawsuits claiming that UnitedHealth and Humana have been using a deeply flawed, AI-powered tool to deny care to elderly patients on MA plans. The lawsuits, which seek class-action status, center on the same AI tool, called nH Predict, used by both insurers and developed by NaviHealth, a UnitedHealth subsidiary. According to the lawsuits,…

See also: https://arstechnica.com/health/2023/11/ai-with-90-error-rate-forces-elderly-out-of-rehab-nursing-homes-suit-claims/
The CMS also openly worried that the use of either of these types of tools can reinforce discrimination and biases—which has already happened with racial bias. The CMS warned insurers to ensure any AI tool or algorithm they use “is not perpetuating or exacerbating existing bias, or introducing new biases.”
Abstracted from beSpacific
Copyright © 2024 beSpacific, All rights reserved.