Lino Galiana on X @LinoGaliana
Insee: detailed census data in #Parquet format
This guide presents a few examples of how to use the population census data released in Parquet format. It is an HTML version that enriches the guide published on insee.fr for the Python and R languages, with interactive examples that can be built with Quarto Markdown and Observable.
What does “#Libre” (or “#open_source”) mean for a large language #model?
The blurring of open source and libre, long-standing and persistent in the information technology industry, takes on new importance now that companies are joining the #AI race… Explanations, decanting and clarification by Stéphane Bortzmeyer, …
Using the .fr API with Python (episode 2 of 4) ▻https://www.afnic.fr/en/observatory-and-resources/expert-papers/using-the-fr-api-with-python-2-4
French version: ▻https://www.afnic.fr/observatoire-ressources/papier-expert/utiliser-lapi-de-fr-en-python-2-4
A small #python script to decode the X-VR-SPAMCAUSE header of #mails flagged as #SPAM. It is quite useful, since the online tools for this are no longer available; see ▻https://wiki.visionduweb.fr/index.php?title=Installer_Exim#D.C3.A9crypter_le_contenu_de_la_vari
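import sys

# The function name "decode" is assumed; its "def" line is missing in the source.
def decode(msg):
    # Decode the header value two characters at a time.
    text = []
    for i in range(0, len(msg), 2):
        text.append(unrot(msg[i: i + 2]))
    return str.join('', text)

def unrot(pair, key=ord('x')):
    # A letter from 'cdefgh' in the pair selects an offset that is a multiple of 16.
    offset = 0
    for c in 'cdefgh':
        if c in pair:
            offset = (ord('g') - ord(c)) * 16
    return chr(sum(ord(c) for c in pair) - key - offset)

if __name__ == '__main__':
    # Assumed entry point (the body is missing in the source): decode the first argument.
    print(decode(sys.argv[1]))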
Can remove the background from a remote image, a local file or all images in a folder.
Python in the browser...
from browser import document
document <= "Hello !"
Extracting data to CSV from a PDF
Cell detection is used to locate how the tables are laid out on the pages
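# pdf_to_csv.py
import camelot

file = "file.pdf"
# Look for tables only inside the given page region, on every page
tables = camelot.read_pdf(file, table_regions=['81,475,761,86'], pages="1-end")
# compress=True packs the exported CSV files into a zip archive (extract it before merging)
tables.export("def.csv", f="csv", compress=True)

$ python pdf_to_csv.py
$ cat *.csv >merged.csv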
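The merge step can also be done in Python, which avoids one pitfall of the shell glob: on a second run, merged.csv itself matches *.csv. This is a minimal sketch using only the standard library; the function name `merge_csv_files` and its parameters are illustrative and not part of the original script.

```python
from pathlib import Path


def merge_csv_files(directory, output_name="merged.csv"):
    """Concatenate every CSV in `directory` into one file, in sorted name order."""
    directory = Path(directory)
    output_path = directory / output_name
    # Skip the output file itself so a second run does not re-ingest it.
    parts = sorted(p for p in directory.glob("*.csv") if p.name != output_name)
    with output_path.open("w", encoding="utf-8", newline="") as out:
        for part in parts:
            text = part.read_text(encoding="utf-8")
            out.write(text)
            # Make sure the next file starts on its own line.
            if text and not text.endswith("\n"):
                out.write("\n")
    return output_path
```

Like cat, this copies rows as they are and does not deduplicate header lines.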
A solution for quickly deploying and administering a complete mail server, compatible with the latest standards and optimized for deliverability and reputation protection.
Modoboa is a #mail hosting and management platform including a modern and simplified Web User Interface. It provides useful components such as an administration panel or a webmail.
Modoboa integrates with well known software such as Postfix or Dovecot. A SQL database (MySQL, PostgreSQL or SQLite) is used as a central point of communication between all components.
Modoboa is developed with modularity in mind, so extending it is easy. In fact, all current features are extensions.
It is written in #Python 3 and uses the Django, jQuery and Bootstrap frameworks.
“Researchers find bug in #Python script may have affected hundreds of [scientific] studies [in biology]”
The “#Willoughby_Hoye” scripts used an OS call that caused incorrect measurements on some platforms.
The paper showing the problem: ▻https://pubs.acs.org/doi/full/10.1021/acs.orglett.9b03216
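As the paper describes it, the scripts relied on the order in which the operating system returned file names, and that order is not the same on every platform. A minimal sketch of the usual defensive fix is to sort explicitly instead of trusting the order returned by glob. The function name `list_output_files` and the `*.out` pattern are illustrative assumptions, not taken from the original scripts.

```python
import glob
import os


def list_output_files(directory, pattern="*.out"):
    """Return matching file paths in a deterministic, sorted order."""
    # glob.glob makes no promise about ordering: it depends on the file system.
    # Sorting makes any later pairing of files reproducible across platforms.
    return sorted(glob.glob(os.path.join(directory, pattern)))
```

Any code that later matches files by position (first file with first value, and so on) then behaves the same on every machine.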
Python’s creator thinks it has a diversity problem — Quartz
In a rare interview with the programmer in October last year, which was recently published on YouTube, he was asked about the lack of diversity among the people working on open-source programming languages. He noted that it was an issue, and said that those who ignore it, because open-source projects are available for anyone to contribute, are not seeing the full picture.
“It’s not just joining a project that’s the problem, it’s staying in the project, which means you have to feel comfortable exchanging emails and code reviews… with people that you don’t know personally but you communicate frequently with online,” he said. Van Rossum thinks that these exchanges can be difficult for women because of unconscious bias and male-driven cultural norms within open-source communities.
“It’s not just about writing the code, but you have stand up for your code and defend your code, and there is a certain male attitude that is endemic in many projects where a woman would just not feel comfortable claiming that she is right,” he explained. “A guy who knows less than that woman might honestly believe [he is right], so they present a much more confident image.” In his experience, van Rossum sees incompetent men’s ideas gaining acceptance more often than merited because they are more forceful in how they present them.
Van Rossum believes that the different attitudes of women and men in programming communities are due to wider societal problems that we need to fix from the bottom up. “I’ve always felt that feminism was right and we need to change the whole society,” he said. In the meantime, he feels a responsibility to act in the places he has influence, like in the Python community.
He believes the key to making open-source communities more inclusive is establishing (and enforcing) codes of conduct and mentoring. Van Rossum says that he now mentors women and underrepresented minority programmers. “But white guys can forget it,” he said. “They are not the ones who need it most.” (In typical programmer speak, he calls mentoring a “completely distributed, democratic approach.”)
Rather, he thinks it’s important that men are educated about their biases. “[There are] some guys who are super defensive when you tell about this shit, but the majority of guys just don’t know any better,” he said. “The first time I heard the term unconscious bias was maybe five years ago and it was an eye opener.” It’s changed him, and he thinks it could change others.
Using computer vision to rectify images
In an image-comparison module, it is frustrating when two photographs are not framed the same way and cannot be superimposed. Here we propose to fix this by rectifying the images with a homography.
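A homography is a 3x3 matrix that maps points of one image plane onto another. The sketch below only shows how a point is mapped once the matrix is known; the matrix values are hypothetical and chosen for illustration. In practice the matrix is estimated from matching points in the two photographs, for instance with OpenCV's findHomography.

```python
def apply_homography(H, point):
    """Map an (x, y) point through a 3x3 homography given as nested lists."""
    x, y = point
    # Lift the point to homogeneous coordinates (x, y, 1) and multiply by H.
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    # Divide by w to come back to ordinary image coordinates.
    return (u / w, v / w)


# Hypothetical matrix for illustration: a shift plus a slight perspective term.
H_EXAMPLE = [
    [1.0, 0.0, 10.0],
    [0.0, 1.0, 5.0],
    [0.001, 0.0, 1.0],
]

# Corners of a 100x50 image, mapped into the reference frame.
corners = [(0, 0), (100, 0), (100, 50), (0, 50)]
warped = [apply_homography(H_EXAMPLE, p) for p in corners]
```

Rectifying a whole image applies the same mapping (or its inverse) to every pixel, with interpolation.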
How to Hire a Python Developer With Right Skill Set?
Bram Cohen summed up Python in a nutshell as “simple, clean syntax, object encapsulation, good library support and optional named parameters”. Hence, hiring a Python developer is a sound approach for any company looking to grow its business. Technology pioneers such as YouTube, Reddit, NASA, PayPal, Spotify and Quora have built popular projects with Python. Hire a Python developer to benefit from the compelling features of the language.
Why is Python a preferred language among companies?
In the era of Artificial Intelligence and Machine Learning certain programming languages always have a standard demand in the market irrespective of the evolution of other niche (...)
#python is First Step to Data Science
The steadily increasing importance of data science across industries has led to a rapid rise in demand for data scientists. It’s been said that data scientist is the 21st century’s sexiest job title. If you wonder why it has become such a sought-after position, the short answer is that there has been a huge explosion in the data generated and captured by organizations and ordinary people, and data scientists are the ones who derive valuable insights from that data and figure out what can be done with it.
If you go through some job advertisements for data scientists, you’ll see that expertise in data science and Python are listed as two of the most crucial skills.
In this post, we’re going to discuss why these skills are considered a must for data scientists.
1- What (...)
Pyodide: Bringing the scientific #Python stack to the browser - Mozilla Hacks - the Web developer blog
Pyodide is an experimental project from Mozilla to create a full Python data science stack that runs entirely in the browser.
Our 25 Favorite Data Science Courses From Harvard To Udemy
Learning every facet of data science takes time. We have written pieces on different resources before, but here we wanted to focus on courses, including video courses on YouTube.
There are so many options that it can be nice to have a list of classes worth taking.
We are going to start with the free data science options so you can decide whether or not you want to start investing more in courses.
Tip: Coursera can make it seem like the only option is to purchase the course, but there is an audit button at the very bottom. Now, if you appreciate Coursera, by all means, you should purchase their specialization; I am still uncertain how I feel about it. But I do love taking Coursera courses.
Select the audit course option to not pay for the course
Bootcamps and (...)
Semantic Versioning 101
Semantic Versioning 2.0.0 (semver.org) is a robust and elementary standard that encapsulates a wealth of information about the software you’re publishing or consuming.
Open source veterans know and understand the importance of this standard. If you’ve run a project in long-term maintenance mode, you come to realize its power one way or another. Still, enthusiastic, fast-moving dev teams like to find ways around this standard. I’ve seen more than a few engineers decide to invent their own ideas around major, minor, and patch increments. Their rationale is rooted in aesthetics or their own release schedule.
A key principle
Aside from the concise and complete information at semver.org, it is critical to understand:
Semantic versioning is for your consumers. It’s not for your release schedule or (...)
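To make the major, minor, and patch increments concrete, here is a minimal sketch of the rules from semver.org: a breaking change bumps major and resets the rest, a backward-compatible feature bumps minor and resets patch, and a backward-compatible fix bumps patch. The helper names `parse` and `bump` and the change labels are illustrative assumptions, and the sketch deliberately ignores pre-release and build metadata.

```python
def parse(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def bump(version, change):
    """Return the next version for a given kind of change.

    `change` is one of 'breaking', 'feature' or 'fix' (labels chosen for this sketch).
    """
    major, minor, patch = parse(version)
    if change == "breaking":
        # Incompatible API change: bump major, reset minor and patch.
        return f"{major + 1}.0.0"
    if change == "feature":
        # Backward-compatible addition: bump minor, reset patch.
        return f"{major}.{minor + 1}.0"
    if change == "fix":
        # Backward-compatible bug fix: bump patch only.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")
```

Comparing the parsed tuples rather than the raw strings gives the numeric ordering consumers expect, so 1.10.0 sorts after 1.9.3.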
10 Great Articles On Data Science And Data Engineering
Data science and #programming are such rapidly expanding specialities that it is hard to keep up with all the articles that come out from Google, Uber, Netflix and one-off engineers. We have been reading several over the past few weeks and wanted to share some of our top blog posts for this week, April 2019!
We hope you enjoy these articles.
Building and Scaling Data Lineage at Netflix
By: Di Lin, Girish Lingappa, Jitender Aswani
Imagine yourself in the role of a data-inspired decision maker staring at a metric on a dashboard about to make a critical business decision but pausing to ask a question: “Can I run a check myself to understand what data is behind this metric?”
Now, imagine yourself in the role of a software engineer responsible for a micro-service which publishes data consumed by few critical (...)
Building a #serverless Data Pipeline with #aws S3 Lambda and DynamoDB
AWS Lambda plus Layers is one of the best solutions for managing a data pipeline and for implementing a serverless architecture. This post shows how to build a simple data pipeline using AWS Lambda functions, S3 and DynamoDB.
What does this pipeline accomplish?
Every day an external data source exports data to S3, and the data is imported into an AWS DynamoDB table.
Prerequisites
Serverless framework
Python 3.6
Pandas
Docker
How this pipeline works
On a daily basis, an external data source exports the previous day’s data in CSV format to an S3 bucket. The S3 event triggers an AWS Lambda function that runs the ETL process and saves the data to DynamoDB.
Install Serverless Framework
Before getting started, install the Serverless Framework. Open a terminal and type npm install -g serverless.
Create (...)
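The flow described in the excerpt (S3 event, ETL, DynamoDB write) can be sketched as a Lambda handler. This is not the article's code, which is cut off above: the table name `daily_export`, the helper names, and the use of the standard csv module instead of Pandas are all assumptions made for illustration.

```python
import csv
import io
from urllib.parse import unquote_plus

TABLE_NAME = "daily_export"  # hypothetical table name


def s3_objects_from_event(event):
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def csv_to_items(csv_text):
    """Turn CSV text into one dict per row, keyed by the header line."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def handler(event, context, s3_client=None, table=None):
    """Lambda entry point: load each new CSV from S3 and write its rows to DynamoDB."""
    if s3_client is None:
        import boto3  # imported lazily so the helpers above work without boto3
        s3_client = boto3.client("s3")
    if table is None:
        import boto3
        table = boto3.resource("dynamodb").Table(TABLE_NAME)
    written = 0
    for bucket, key in s3_objects_from_event(event):
        body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
        # batch_writer groups the put requests into batches.
        with table.batch_writer() as batch:
            for item in csv_to_items(body.decode("utf-8")):
                batch.put_item(Item=item)
                written += 1
    return {"written": written}
```

Keeping the event parsing and the CSV transform as plain functions makes them testable locally, while the clients can be injected or created inside Lambda.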