Importing a Small CSV Dataset into Elasticsearch with Python


Environment:

Elasticsearch 6.2.2

Kibana 6.2.2

x-pack 6.2.2

Python 2.7

elasticsearch-analysis-ik 6.2.2 (Chinese word-segmentation plugin)

Now let's create the index and import the data.

First, create the Index:

curl -X PUT elastic:changeme@localhost:9200/santak -H 'Content-Type:application/json' -d'
{
  "mappings": {
    "information": {
      "properties": {
        "userid": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        },
        "password": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        },
        "mail": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        },
        "phone": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        },
        "username": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        },
        "registration_time": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        },
        "ip": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"
        }
      }
    }
  }
}'

Response:

{
  "acknowledged":true,
  "shards_acknowledged":true
}

The acknowledged: true in the response means the index was created successfully.
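If you prefer to do this step from Python instead of curl, here is a minimal sketch using requests. It sends the same mapping JSON as the curl command above (abbreviated here to a single field) and assumes the same host and credentials:

# coding: utf-8
# Sketch: create the santak index from Python; body abbreviated to one field.
import json
import requests

mapping = {
    "mappings": {
        "information": {
            "properties": {
                "userid": {"type": "text",
                           "analyzer": "ik_max_word",
                           "search_analyzer": "ik_max_word"},
                # ... the remaining six fields, same settings ...
            }
        }
    }
}

res = requests.put('http://localhost:9200/santak',
                   data=json.dumps(mapping),
                   headers={"Content-Type": "application/json"},
                   auth=('elastic', 'changeme'))
print res.text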

This creates an Index named santak, which contains a Type named information with the following seven fields:

userid

password

mail

phone

username

registration_time

ip

These fields may contain Chinese text (the ip field, for instance, ends with a Chinese location), and all of them are typed as text, so a Chinese analyzer has to be specified instead of the default standard analyzer. In Elasticsearch the word-segmentation component is called an analyzer, and it is set per field:

        "ip": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_max_word"

Check the newly created Index:

You can send the request with the Postman tool, or with curl "elastic:changeme@localhost:9200/_cat/indices?v"
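The same check from Python, if you'd rather script it (same credentials assumed):

import requests

# List all indices; the new santak index should appear in the output
res = requests.get('http://localhost:9200/_cat/indices?v',
                   auth=('elastic', 'changeme'))
print res.text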



Now let's import the CSV file.

First, inspect the CSV format and contents:

root@localhost:/home/data# wc -l alluser.csv
43636 alluser.csv
root@localhost:/home/data# tail -n 1 alluser.csv
xiaoming,89805340e40543f85fa69384a21d9032,test@test.com,19951978816,xiaoming,2018-03-29 12:55:23,"153.3.56.76,中国江苏南京"

There are 43,636 rows in total, and the seven comma-separated columns map, in order, to the fields defined in the Elasticsearch mapping above. Note that the last column is quoted because its value itself contains a comma.
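Python's csv module handles that quoting automatically, so the quoted ip field stays a single column. A quick standalone check against the sample row:

# coding: utf-8
# Verify that csv.reader keeps the quoted ip field as a single column
import csv

sample = ['xiaoming,89805340e40543f85fa69384a21d9032,test@test.com,'
          '19951978816,xiaoming,2018-03-29 12:55:23,"153.3.56.76,中国江苏南京"']
row = csv.reader(sample).next()
print len(row)   # 7, not 8: the comma inside the quotes is preserved
print row[6]     # 153.3.56.76,中国江苏南京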

After a quick look at the data, here is the Python script; about 20 lines does the job.

#!/usr/bin/env python2.7
# coding: utf-8
import csv
import json
import requests

url = 'http://localhost:9200/santak/information/'
headers = {"Content-Type": "application/json"}

csvfile = open('/home/data/alluser.csv', 'rb')
reader = csv.reader(csvfile)

for line in reader:
    # csv.reader yields UTF-8 byte strings in Python 2; decode them first
    row = [field.decode('utf-8') for field in line]
    data = {"userid": row[0], "password": row[1], "mail": row[2],
            "phone": row[3], "username": row[4],
            "registration_time": row[5], "ip": row[6]}
    # index one document per row
    res = requests.post(url, data=json.dumps(data), headers=headers,
                        auth=('elastic', 'changeme'))
    print res.text

csvfile.close()

Usage: change auth=('elastic', 'changeme') to your actual Elasticsearch username and password, change /home/data/alluser.csv to the absolute path of your CSV file, then run it.
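One POST per row is fine at this scale, but for a larger file it would get slow. A sketch of the same import through Elasticsearch's _bulk API, which takes batches of newline-delimited JSON (the batch size of 500 documents is an arbitrary choice, not from the original script):

#!/usr/bin/env python2.7
# coding: utf-8
# Bulk variant (a sketch): batch documents through _bulk instead of
# sending one POST per row.
import csv
import json
import requests

FIELDS = ["userid", "password", "mail", "phone", "username",
          "registration_time", "ip"]
ACTION = json.dumps({"index": {"_index": "santak", "_type": "information"}})

def flush(lines):
    # _bulk expects newline-delimited JSON with a trailing newline
    res = requests.post('http://localhost:9200/_bulk',
                        data='\n'.join(lines) + '\n',
                        headers={"Content-Type": "application/x-ndjson"},
                        auth=('elastic', 'changeme'))
    print res.status_code

lines = []
with open('/home/data/alluser.csv', 'rb') as csvfile:
    for row in csv.reader(csvfile):
        doc = dict(zip(FIELDS, [field.decode('utf-8') for field in row]))
        lines.append(ACTION)
        lines.append(json.dumps(doc))
        if len(lines) >= 1000:      # flush every 500 documents
            flush(lines)
            lines = []
if lines:
    flush(lines)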


I only do what I like, and that is the ideal life.