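Automating CAPTCHA Recognition with Selenium and the ddddocr Library
The 带带弟弟OCR Library
Recently (November 2023) I came across a Python CAPTCHA-recognition library called 带带弟弟OCR; its package name is ddddocr.
The ddddocr project page is on PyPI.
Combined with Selenium, this library can be used in web automation to get past CAPTCHA checks.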
Installation
Run the following command to install the ddddocr library:
pip3 install ddddocr
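At the moment the library depends on pillow but does not pin a pillow version, and it is incompatible with the latest pillow 10 release.
Run the following command to install pillow 9.5 explicitly: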
pip3 install pillow==9.5
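Before running any OCR code, it can help to confirm which pillow version is actually installed. The following is a minimal sketch using only the standard library; the helper names `pillow_is_compatible` and `installed_pillow_version` are made up for this example, and the "older than 10" rule is simply the constraint described above for the ddddocr release discussed here, not something checked against the library itself.

```python
from importlib import metadata


def pillow_is_compatible(version):
    """Return True if a pillow version string is older than 10 (assumed rule)."""
    major = int(version.split(".")[0])
    return major < 10


def installed_pillow_version():
    """Return the installed pillow version string, or None if it is missing."""
    try:
        return metadata.version("pillow")
    except metadata.PackageNotFoundError:
        return None


version = installed_pillow_version()
if version is None:
    print("pillow is not installed")
elif pillow_is_compatible(version):
    print("pillow", version, "is older than 10")
else:
    print("pillow", version, "is 10 or newer; run: pip3 install pillow==9.5")
```

The check only reads package metadata, so it never imports PIL and cannot trigger the ANTIALIAS error itself.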
Usage
For details on using the library, see its documentation on PyPI.
Here is a sample script:
# The installed pillow version must be 9.5, not 10; otherwise you get the error:
#   module 'PIL.Image' has no attribute 'ANTIALIAS'
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
import ddddocr

ocr = ddddocr.DdddOcr()

options = webdriver.ChromeOptions()
options.add_experimental_option('excludeSwitches', ['enable-logging'])  # suppress Chrome's console warnings and errors
driver = webdriver.Chrome(options=options)
driver.implicitly_wait(4)

driver.get('http://127.0.0.1/1.html')
time.sleep(2)  # the CAPTCHA takes a while to appear, so wait 2 seconds

while True:
    # Screenshot the element; the result is PNG image data (bytes)
    png_data = driver.find_element(By.ID, 'captcha').screenshot_as_png
    # with open('d:/tmp1.png', 'wb') as f:
    #     f.write(png_data)
    res = ocr.classification(png_data)
    print('The CAPTCHA is', res)

    # Press Enter to refresh the page and try again; type anything else to quit
    ch = input('')
    if ch != '':
        break
    driver.refresh()
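The script above only prints the recognized text. To actually pass the check, the text has to be typed into the input box and submitted, with a retry when the OCR result is wrong. Below is a rough sketch of such a retry loop. `solve_captcha` and its three callback parameters are hypothetical names introduced here, and the Selenium wiring in the comments is an untested assumption based on the demo page used in this article (element IDs `captcha` and `cpatchaTextBox`, result shown through `alert()`).

```python
def solve_captcha(read_image, recognize, submit, max_attempts=5):
    """Try to pass a CAPTCHA, re-reading the image after each failed attempt.

    read_image() -> bytes    returns the current CAPTCHA image
    recognize(bytes) -> str  OCR function, for example ocr.classification
    submit(str) -> bool      types the text, submits, returns True on success

    Returns (text, attempt_number) on success, or (None, max_attempts) on failure.
    """
    for attempt in range(1, max_attempts + 1):
        text = recognize(read_image()).strip()
        # Skip empty OCR results instead of submitting them
        if text and submit(text):
            return text, attempt
    return None, max_attempts


# Hypothetical wiring for the demo page (assumes driver, By and ocr from the
# script above; not executed here):
#
#   def read_image():
#       return driver.find_element(By.ID, 'captcha').screenshot_as_png
#
#   def submit(text):
#       box = driver.find_element(By.ID, 'cpatchaTextBox')
#       box.clear()
#       box.send_keys(text)
#       driver.find_element(By.CSS_SELECTOR, 'button[type=submit]').click()
#       alert = driver.switch_to.alert   # the demo page reports the result via alert()
#       ok = alert.text == 'Valid Captcha'
#       alert.accept()
#       return ok
#
#   solve_captcha(read_image, ocr.classification, submit)
```

Passing the three steps in as functions keeps the loop independent of Selenium, so it can be exercised with fake callbacks before pointing it at a real browser.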
The HTML for the page being automated is as follows:
<html>
<head>
<meta charset="utf-8">
<style>
input[type=text] {
padding: 12px 20px;
display: inline-block;
border: 1px solid #ccc;
border-radius: 4px;
box-sizing: border-box;
}
button{
background-color: #4CAF50;
border: none;
color: white;
padding: 12px 30px;
text-decoration: none;
margin: 4px 2px;
cursor: pointer;
}
canvas{
/*prevent interaction with the canvas*/
pointer-events:none;
}
</style>
</head>
<body onload="createCaptcha()">
<form onsubmit="validateCaptcha()">
<div id="captcha">
</div>
<input type="text" placeholder="Captcha" id="cpatchaTextBox"/>
<button type="submit">Submit</button>
</form>
</body>
<script>
var code;
function createCaptcha() {
//clear the contents of captcha div first
document.getElementById('captcha').innerHTML = "";
var charsArray =
"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ你我他们大小多少";
var lengthOtp = 6;
var captcha = [];
for (var i = 0; i < lengthOtp; i++) {
//below code will not allow Repetition of Characters
var index = Math.floor(Math.random() * charsArray.length); //pick a random index from 0 to charsArray.length - 1
if (captcha.indexOf(charsArray[index]) == -1)
captcha.push(charsArray[index]);
else i--;
}
var canv = document.createElement("canvas");
canv.id = "captcha";
canv.width = 100;
canv.height = 50;
var ctx = canv.getContext("2d");
ctx.font = "25px Georgia";
ctx.strokeText(captcha.join(""), 0, 30);
//store the captcha text for validation; you can save it somewhere else depending on your requirements
code = captcha.join("");
document.getElementById("captcha").appendChild(canv); // adds the canvas to the captcha div
}
function validateCaptcha() {
event.preventDefault();
if (document.getElementById("cpatchaTextBox").value == code) {
alert("Valid Captcha")
}else{
alert("Invalid Captcha. try Again");
createCaptcha();
}
}
</script>
</html>
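The sample script opens http://127.0.0.1/1.html, so the page has to be served locally. One possible way to do that with only the Python standard library is sketched below. It assumes the HTML above is saved as 1.html in the directory being served; the function name `start_static_server` is made up for this example, and binding to port 80 may require administrator rights on some systems.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def start_static_server(directory, host="127.0.0.1", port=80):
    """Serve the files in `directory` over HTTP from a background thread.

    Returns the server object; call shutdown() and server_close() to stop it.
    Passing port=0 lets the operating system pick a free port.
    """
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer((host, port), handler)
    # Daemon thread: the server stops automatically when the main script exits
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

Calling `start_static_server('.')` at the top of the Selenium script, before `driver.get(...)`, would match the URL used in the sample. If another port is chosen, the URL passed to `driver.get` has to include it.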