#reconnaissance

Articles spotted by Hervé Le Crosnier @hlc CC BY 28/07/2020

MIT apologizes, permanently pulls offline huge dataset that taught AI systems to use racist, misogynistic slurs • The Register
▻https://www.theregister.com/2020/07/01/mit_dataset_removed
https://regmedia.co.uk/2019/03/23/facial_recog.jpg
The dataset holds more than 79,300,000 images, scraped from Google Images, arranged in 75,000-odd categories. A smaller version, with 2.2 million images, could be searched and perused online from the website of MIT’s Computer Science and Artificial Intelligence Lab (CSAIL). This visualization, along with the full downloadable database, were removed on Monday from the CSAIL website after El Reg alerted the dataset’s creators to the work done by Prabhu and Birhane.
The key problem is that the dataset includes, for example, pictures of Black people and monkeys labeled with the N-word; women in bikinis, or holding their children, labeled whores; parts of the anatomy labeled with crude terms; and so on – needlessly linking everyday imagery to slurs and offensive language, and baking prejudice and bias into future AI models.
Screenshot from the MIT AI training dataset
A screenshot of the 2.2m dataset visualization before it was taken offline this week. It shows some of the dataset’s examples for the label ’whore’, which we’ve pixelated for legal and decency reasons. The images ranged from a headshot photo of a woman and a mother holding her baby with Santa to porn actresses and a woman in a bikini.
Antonio Torralba, a professor of electrical engineering and computer science at CSAIL, said the lab wasn’t aware these offensive images and labels were present within the dataset at all. “It is clear that we should have manually screened them,” he told The Register. “For this, we sincerely apologize. Indeed, we have taken the dataset offline so that the offending images and categories can be removed.”
In a statement on its website, however, CSAIL said the dataset will be permanently pulled offline because the images were too small for manual inspection and filtering by hand. The lab also admitted it automatically obtained the images from the internet without checking whether any offensive pics or language were ingested into the library, and it urged people to delete their copies of the data:
“The dataset contains 53,464 different nouns, directly copied over from WordNet,” Prof Torralba said, referring to Princeton University’s database of English words grouped into related sets. “These were then used to automatically download images of the corresponding noun from internet search engines at the time, using the available filters at the time, to collect the 80 million images.”
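As a rough illustration of the kind of label check that could run before any images are downloaded, here is a minimal sketch. The excerpt does not describe any such step; the function name `screen_labels`, the blocklist and the sample nouns are all hypothetical placeholders.

```python
def screen_labels(nouns, blocklist):
    """Split nouns into (kept, rejected) using a case-insensitive blocklist.

    Hypothetical helper for illustration only; it is not part of the
    Tiny Images pipeline or of WordNet.
    """
    blocked = {term.casefold() for term in blocklist}
    kept, rejected = [], []
    for noun in nouns:
        # Route each noun to the matching list, preserving input order.
        (rejected if noun.casefold() in blocked else kept).append(noun)
    return kept, rejected


# Placeholder data: a real blocklist would have to be curated by people.
candidate_nouns = ["cat", "Offensive_Term", "umbrella"]
kept, rejected = screen_labels(candidate_nouns, ["offensive_term"])
```

Only the nouns in `kept` would then be sent to an image search. Exact matching of this kind misses spelling variants and context-dependent terms, so it could only complement manual review, not replace it.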
WordNet was built in the mid-1980s at Princeton’s Cognitive Science Laboratory under George Armitage Miller, one of the founders of cognitive psychology. “Miller was obsessed with the relationships between words,” Prabhu told us. “The database essentially maps how words are associated with one another.”
For example, the words cat and dog are more closely related than cat and umbrella. Unfortunately, some of the nouns in WordNet are racist slang and insults. Now, decades later, with academics and developers using the database as a convenient silo of English words, those terms haunt modern machine learning.
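The "cat and dog versus cat and umbrella" comparison can be sketched with a toy "is-a" taxonomy. This is a minimal sketch under stated assumptions: the `HYPERNYM` table is a hypothetical, heavily simplified stand-in (real WordNet has far more levels and is not loaded here), and the path-based score is one common way to measure relatedness in such a hierarchy, not necessarily the one any particular tool uses.

```python
# Toy hypernym ("is-a") table: child -> parent.
# Hypothetical and simplified; this is not real WordNet data.
HYPERNYM = {
    "cat": "carnivore",
    "dog": "carnivore",
    "carnivore": "animal",
    "animal": "entity",
    "umbrella": "artifact",
    "artifact": "entity",
}


def path_to_root(word):
    """Return the chain [word, parent, ..., root]."""
    path = [word]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def path_distance(a, b):
    """Count the edges between a and b via their lowest common ancestor."""
    depth_a = {node: i for i, node in enumerate(path_to_root(a))}
    for j, node in enumerate(path_to_root(b)):
        if node in depth_a:
            return depth_a[node] + j
    raise ValueError("no common ancestor")


def path_similarity(a, b):
    """Map distance to a score in (0, 1]; closer words score higher."""
    return 1.0 / (1.0 + path_distance(a, b))


print(path_similarity("cat", "dog") > path_similarity("cat", "umbrella"))
```

In this toy table, cat and dog meet at "carnivore" after two edges, while cat and umbrella only meet at the root after five, so the first pair scores higher.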
“When you are building huge datasets, you need some sort of structure,” Birhane told El Reg. “That’s why WordNet is effective. It provides a way for computer-vision researchers to categorize and label their images. Why do that yourself when you could just use WordNet?”
WordNet may not be so harmful on its own, as a list of words, though when combined with images and AI algorithms, it can have upsetting consequences. “The very aim of that [WordNet] project was to map words that are close to each other,” said Birhane. “But when you begin associating images with those words, you are putting a photograph of a real actual person and associating them with harmful words that perpetuate stereotypes.”
The fraction of problematic images and labels in these giant datasets is small, and it’s easy to brush them off as anomalies. Yet this material can lead to real harm if it’s used to train machine-learning models that are deployed in the real world, Prabhu and Birhane argued.
“The absence of critical engagement with canonical datasets disproportionately negatively impacts women, racial and ethnic minorities, and vulnerable individuals and communities at the margins of society,” they wrote in their paper.
#Intelligence_artificielle #Images #Reconnaissance_image #WordNet #Tiny_images #Deep_learning

Articles spotted by Hervé Le Crosnier @hlc CC BY 8/03/2018

Google’s AI scans and tags millions of ’Life’ magazine photos
▻https://www.engadget.com/2018/03/07/googles-ai-scans-and-tags-millions-of-life-magazine-photos
https://o.aolcdn.com/images/dims?thumbnail=1200%2C630&quality=80&image_uri=https%3A%2F%2Fs.aolcdn.com%2Fhss%2Fstorage%2Fmidas%2Fb652f513f687f33b8f7ad50443088606%2F206190471%2FLifeMag.png&client=cbc79c14efcebee57402&signature=4120d08ae4b60b5992f7f670bda13416b6bf3c2d
Google is pretty big on art. Its technology has turned clumsy doodles into masterpieces, transformed smartphones into virtual exhibitions and, in a move that caused momentary internet hysteria, helped selfie-takers find their fine art doppelganger. Now it’s unveiled a new set of machine-learning experiments that not only make exploring art more engaging, but help solve some of the biggest challenges faced by curators and museums.
First up is Art Palette, which lets you choose a group of colors and then matches your selection to artworks from institutions around the world. Handy if you’re after some prints for your newly-decorated apartment, or if you’re wondering what masterpieces your outfit is channelling today. Then there’s Life Tags. Life magazine’s 70-year run saw millions upon millions of photos taken, but only five percent ever published. This tool unveils four million photographs from its archives and makes them instantly searchable via thousands of automatically created labels, from “astronauts” to “zombies”.
Finally, there’s the MoMA tool, which is big news for art curators and museums. The Museum of Modern Art in New York has been taking photos of its exhibitions since its first in 1929, but many of them were missing corresponding information. Identifying the art in each photo (and there are 30,000 of them) would have taken months, if not years. Google’s MoMA identification tool automatically recognizes the artworks in each photo, and has helped turn the pictures into an interactive archive of the museum’s exhibitions.
Much of Google’s art-focused machine learning technology has been directed at consumers — fun ways to immerse them in a world they may not otherwise have access to. But as these latest tools demonstrate, the practical applications are significant, saving curators hours of manual, tedious tasks — and everyone gets to enjoy the result.
#Google #Art #Reconnaissance_image #Photographie
- #Google
Articles spotted by Hervé Le Crosnier @hlc CC BY 30/11/2017

Amazon Focuses on Machine Learning to Beat Cloud Rivals - Bloomberg
▻https://www.bloomberg.com/news/articles/2017-11-29/amazon-shows-new-cloud-services-in-bid-to-stay-ahead-of-rivals
https://assets.bwbx.io/images/users/iqjWHBFdfxIU/iqi4_xEqHleE/v0/1200x317.jpg
Amazon.com Inc. unveiled new machine-learning tools, including algorithms that automate decisions and speech recognition, seeking to solidify its dominant position over Microsoft Corp. and Alphabet Inc. in the fast-growing and profitable cloud-computing market.
While customers are interested in machine learning, many lack the resources and expertise that the cloud companies can provide.
The products introduced Wednesday further the evolution of AWS from its origins. Cloud-computing began as a way to cheaply gain computing power and data storage, letting customers rent space in data centers accessed via the internet rather than maintaining their own servers. The industry has turned into a race to provide customers tools and functions to use that data in new ways. Those tools are helping speed the transition to the cloud, since companies that don’t have access to them will be at a competitive disadvantage, Jassy said.
Amazon also showed off AWS DeepLens, a $249 device to help developers understand and experiment with machine learning. In a demonstration, the camera recognized a smile to be a positive reaction to a music album cover and a frown to be a negative reaction, enabling it to fine-tune a customized playlist for the user. It can also program a garage door to open when the camera recognizes a license plate number. The device, which is intended to inspire developers to experiment with machine learning, also gives Amazon a look into how image-recognition technology is being used.
#Cloud #Intelligence_Artificielle #Amazon #Reconnaissance_images
Nicolas Hoizey @nhoizey CC BY-NC-SA 21/03/2016

Extracting image metadata at scale at Netflix
▻http://techblog.netflix.com/2016/03/extracting-image-metadata-at-scale.html
#reconnaissance_image_photo_algorithme_visage_détection_crop_texte_placement
- #Netflix
Nicolas Hoizey @nhoizey CC BY-NC-SA 21/03/2016

Shutterstock Has Trained A Computer To Find You The Perfect Photos
▻http://www.popsci.com/shutterstock-is-visualizing-images-in-whole-new-way
#reconnaissance_image_photo_algorithme

- Nicolas🌱 @nicolasm CC BY-SA 22/03/2016
  
  a small problem in your tags
- Nicolas Hoizey @nhoizey CC BY-NC-SA 22/03/2016
  
  @nicolasm known issue, unfortunately: ►http://seenthis.net/messages/324311
- Nicolas🌱 @nicolasm CC BY-SA 22/03/2016
  
  Ah, odd, I hadn’t noticed the problem before today
0gust1 @0gust1 CC BY-NC 22/05/2015

The Unreasonable Effectiveness of Recurrent Neural Networks
▻http://karpathy.github.io/2015/05/21/rnn-effectiveness
#à_lire #réseaux_de_neurones #dev #reconnaissance_images

- 0gust1 @0gust1 CC BY-NC 23/05/2015
  
  Honestly, this article is really quite good. It just brought me back up to speed on neural networks (a subject I haven’t touched since my master’s: the Donahoe & Palmer model).
  Pedagogical and accessible, yet still rigorous.

#reconnaissance_image