SVD=>MF=>ALS=>FM=>FFM

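Posted on 2018-02-09 | Category: ml

Preface
This is a summary of my recent study of the path from matrix factorization to factorization machines. I had meant to typeset all the formulas, but I am not fluent enough in MathJax, so the formulas are shown as images instead.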

1. SVD

$R=PSV$
where $S$ is a diagonal matrix

2. MF(LMF)

$R=P^T*Q$

BasicMF:
loss:$ \sum_{i=1}^n $
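
The loss in this excerpt is cut off, so the following is only a minimal sketch of a basic MF fit. It assumes the commonly used squared-error objective with L2 regularisation, optimised by SGD, which may differ from the post's own loss. The names `factorize`, `lr`, `reg` and the toy ratings are hypothetical.

```python
import random

def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=200, seed=0):
    """Fit R ~ P^T Q on observed (user, item, rating) triples with SGD."""
    rng = random.Random(seed)
    # P[u] and Q[i] hold the k-dimensional latent vectors of user u and item i
    P = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # gradient step on the assumed loss (r - p.q)^2 + reg * (|p|^2 + |q|^2)
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def squared_error(ratings, P, Q):
    """Sum of squared residuals over the observed ratings."""
    return sum((r - sum(a * b for a, b in zip(P[u], Q[i]))) ** 2 for u, i, r in ratings)

ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0), (2, 1, 2.0), (2, 2, 5.0)]
P0, Q0 = factorize(ratings, 3, 3, epochs=0)    # untrained starting point
P1, Q1 = factorize(ratings, 3, 3, epochs=200)  # same start, after training
before = squared_error(ratings, P0, Q0)
after = squared_error(ratings, P1, Q1)
print(before, after)
```

Only observed entries of $R$ enter the loss, which is the main practical difference from a plain SVD of the full matrix.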


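Implementing ItemCF and UserCF in Spark

Posted on 2018-01-18 | Category: recsys

Preface
A summary of the ItemCF and UserCF recall strategies used in a real production environment.

1. Item-based CF with scores in Spark

1.1 Ignoring differences in how users score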

1.item , (user,score)
2.item , list[(user,score)] ==> item , sqrt(|item|) --> sqrt_item
3.item , (user,score,sqrt_item) ==> user , list[(item,score,sqrt_item)]
4.cal_score  (item1,item2), item1_score * item2_score/(sqrt_item1 * sqrt_item2)
5.group by key, sum => (item1,item2) , similar_score   ---this is the cosine similarity: sim=dot(d1,d2)/(sqrt(|d1|)*sqrt(|d2|))
6.item1, list[(item2,similar_score)]
7.item1, list[topK]
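
As a rough illustration, the seven steps can be traced in plain Python on a handful of records, without Spark. This sketch reads |item| as the sum of squared scores for the item, assumes each user scores an item at most once, and emits each pair in both directions. The function name and sample data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

def item_similarity(records, top_k=2):
    """records: iterable of (user, item, score) -> {item: [(other_item, sim), ...]}."""
    records = list(records)
    # steps 1-2: per-item norm, with |item| read as the sum of squared scores
    sq_sum = defaultdict(float)
    for _, item, score in records:
        sq_sum[item] += score * score
    sqrt_item = {item: sqrt(total) for item, total in sq_sum.items()}

    # step 3: regroup the records by user
    by_user = defaultdict(list)
    for user, item, score in records:
        by_user[user].append((item, score))

    # steps 4-5: one term per co-rated pair, summed per (item1, item2)
    sim = defaultdict(float)
    for rated in by_user.values():
        for (i1, s1), (i2, s2) in combinations(rated, 2):
            term = s1 * s2 / (sqrt_item[i1] * sqrt_item[i2])
            sim[(i1, i2)] += term
            sim[(i2, i1)] += term

    # steps 6-7: collect neighbours per item and keep the top K
    neighbours = defaultdict(list)
    for (i1, i2), value in sim.items():
        neighbours[i1].append((i2, value))
    return {i: sorted(n, key=lambda p: -p[1])[:top_k] for i, n in neighbours.items()}

records = [("u1", "a", 1.0), ("u1", "b", 1.0),
           ("u2", "a", 1.0), ("u2", "b", 1.0), ("u2", "c", 2.0)]
result = item_similarity(records)
print(result["a"])
```

In Spark the same grouping steps would typically be expressed with key-value transformations such as groupByKey and reduceByKey; the arithmetic stays the same.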

1.2 Accounting for differences in how users score

-1.user , (item,score)
-2.user , list[(item,score)] ==> user, avg(score) --> avg_score
-3.user , (item,score,avg_score) ==> user , (item,adj_score)
The score used below is the adj_score computed above
1.item , (user,score)
2.item , list[(user,score)] ==> item , sqrt(|item|) --> sqrt_item
3.item , (user,score,sqrt_item) ==> user , list[(item,score,sqrt_item)]
4.cal_score  (item1,item2), item1_score * item2_score/(sqrt_item1 * sqrt_item2)
5.group by key, sum => (item1,item2) , similar_score   ---this is the cosine similarity: sim=dot(d1,d2)/(sqrt(|d1|)*sqrt(|d2|))
6.item1, list[(item2,similar_score)]
7.item1, list[topK]
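
The post does not spell out how adj_score is derived. A minimal sketch of the three preliminary steps, assuming adj_score = score - avg_score (mean-centering per user), could look like this; the function name and sample data are hypothetical.

```python
from collections import defaultdict

def center_scores(records):
    """records: list of (user, item, score) -> list of (user, item, adj_score)."""
    totals = defaultdict(lambda: [0.0, 0])
    for user, _, score in records:
        totals[user][0] += score   # running sum of the user's scores
        totals[user][1] += 1       # number of scores the user gave
    avg = {user: total / count for user, (total, count) in totals.items()}
    # assumed adjustment: subtract the user's own average score
    return [(user, item, score - avg[user]) for user, item, score in records]

records = [("u1", "a", 5.0), ("u1", "b", 3.0), ("u2", "a", 2.0), ("u2", "b", 1.0)]
adjusted = center_scores(records)
print(adjusted)
```

With this adjustment, a generous scorer and a strict scorer who rank items the same way produce comparable signals before the cosine step.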

2. Item-based CF without scores in Spark

1.item , user
2.item , list[user] ==> item , sqrt(item_cnt) --> sqrt_item
3.item , (user,sqrt_item) ==> user , list[(item,sqrt_item)]
4.cal_score  (item1,item2), 1 * 1/(sqrt_item1 * sqrt_item2)
5.group by key, sum => (item1,item2) , similar_score   ---this is the cosine similarity: sim=d1_d2_cnt/(sqrt(d1_cnt)*sqrt(d2_cnt))
6.item1, list[(item2,similar_score)]
7.item1, list[topK]

Notes:

This method simply sets every score in the cosine computation above to 1.
In the end it is the cosine similarity of the co-occurrence matrix: d1_d2_cnt/sqrt(d1_cnt*d2_cnt)
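
A small plain-Python sketch of this co-occurrence form, under the assumption that each (user, item) pair is counted once; the function name and data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

def cooccurrence_similarity(pairs):
    """pairs: iterable of (user, item) -> {(item1, item2): similarity}, item1 < item2."""
    users_of = defaultdict(set)   # item -> users who touched it (d_cnt = set size)
    items_of = defaultdict(set)   # user -> items the user touched
    for user, item in pairs:
        users_of[item].add(user)
        items_of[user].add(item)
    both = defaultdict(int)       # (item1, item2) -> number of shared users (d1_d2_cnt)
    for items in items_of.values():
        for i1, i2 in combinations(sorted(items), 2):
            both[(i1, i2)] += 1
    return {(i1, i2): n / sqrt(len(users_of[i1]) * len(users_of[i2]))
            for (i1, i2), n in both.items()}

pairs = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"), ("u3", "a"), ("u3", "c")]
sims = cooccurrence_similarity(pairs)
print(sims)
```

Here item a has three users and item b has two, with two users in common, so their similarity is 2/sqrt(3*2).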

3. The cal_score similarity function

def cal_score(line):
    """
    Args: line: one user's aggregated rating behaviour, as the tuple (user, list[(item, score, sqrt_item)])
    Returns: item-to-item similarity terms, as tuples (key, similar_score), where key is item1 and item2 joined by chr(1)
    """
    user, item = line
    index1 = 0
    while index1 < len(item):
        index2 = index1 + 1
        while index2 < len(item):
            # skip a pair made of the same item
            if item[index1][0] == item[index2][0]:
                index2 += 1
                continue

            item1_score = float(item[index1][1])
            item2_score = float(item[index2][1])
            sqrt_item1 = float(item[index1][2])
            sqrt_item2 = float(item[index2][2])
            try:
                similar = item1_score * item2_score / (sqrt_item1 * sqrt_item2)
                if item[index1][0] is not None and item[index2][0] is not None:
                    key = item[index1][0] + "\1" + item[index2][0]
                    yield (key, similar)
            except (ZeroDivisionError, TypeError):
                # skip pairs with a zero norm or with item ids that are not strings
                pass
            index2 += 1
        index1 += 1

4. UserCF works on the same principle as ItemCF

Simply replace user , list[(item,score,sqrt_item)] with item , list[(user,score,sqrt_user)].


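Python Regular Expressions

Posted on 2015-02-06 | Category: python

Notes on using regular expressions in Python

1. A guide to Python regular expressions

http://www.cnblogs.com/huxi/archive/2010/07/04/1771073.html

2. Regular expressions: a 30-minute introductory tutorial

http://www.jb51.net/tools/zhengze.html

3. Example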

import re

# compile once, then search anywhere in the target string
pattern = re.compile(r'Regulator')
result = ''
p = pattern.search("str")
if p:
    result = p.group()
print(result)
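
For comparison, here is a slightly fuller sketch with a pattern that does match, showing capture groups and the difference between search and match. The date pattern and sample text are made up for illustration.

```python
import re

pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
text = "posted on 2015-02-06 in python"

found = pattern.search(text)     # scans the whole string for the first match
anchored = pattern.match(text)   # only succeeds if the match starts at position 0

whole = found.group() if found else ""
year, month, day = found.groups() if found else (None, None, None)
print(whole, year, month, day, anchored)
```

search returns a match object here, while match returns None because the text does not begin with a date.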


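Using Python's ConfigParser

Posted on 2015-02-06 | Category: python

Notes on using Python's ConfigParser

1. Basic reading of a configuration file

-read(filename) reads the contents of an ini file directly
-sections() returns all sections as a list
-options(section) returns all options of the section
-items(section) returns all key-value pairs of the section
-get(section,option) returns the value of option in section as a string
-getint(section,option) returns the value of option in section as an int; there are also the corresponding getboolean() and getfloat() functions.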
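
A quick sketch of these read calls. It is written for Python 3, where the module is named configparser, and it uses read_string on an in-memory sample instead of a file; the section and option names are hypothetical.

```python
import configparser

SAMPLE = """
[server]
host = 127.0.0.1
port = 8080
debug = yes
timeout = 2.5
"""

cf = configparser.ConfigParser()
cf.read_string(SAMPLE)            # read(filename) does the same for a file on disk

sections = cf.sections()          # list of section names
options = cf.options("server")    # option names within one section
items = cf.items("server")        # (option, value) pairs, values as strings
host = cf.get("server", "host")
port = cf.getint("server", "port")
debug = cf.getboolean("server", "debug")
timeout = cf.getfloat("server", "timeout")
print(sections, options, host, port, debug, timeout)
```

The typed getters raise ValueError when the stored text cannot be converted, so get is the safer choice for free-form values.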

2. Basic writing of a configuration file

-add_section(section) adds a new section
-set(section, option, value) sets option in section; write must then be called to write the contents to the configuration file.
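
A minimal Python 3 sketch of the write path, using an in-memory buffer in place of a real file; the section and option names are hypothetical.

```python
import configparser
import io

cf = configparser.ConfigParser()
cf.add_section("sec_new")
cf.set("sec_new", "key", "value")   # in Python 3 the value must be a string

buffer = io.StringIO()
cf.write(buffer)                    # nothing is persisted until write() is called
text = buffer.getvalue()
print(text)
```

Passing an open file object to write() in the same way persists the changes to disk.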

3. A basic example

test.conf

[sec_a]
a_key1 = i'm value
a_key2 = 22

[sec_b]
b_key1 = 121
b_key2 = b_value2
b_key3 = $r
b_key4 = 127.0.0.1

parse_test_conf.py

#!/usr/bin/env python
#coding:utf8

import ConfigParser

cf = ConfigParser.ConfigParser()

#read config
cf.read("test.conf")

# return all section
secs = cf.sections()
print 'sections:', secs

opts = cf.options("sec_a")
print 'options:', opts

kvs = cf.items("sec_a")
print 'sec_a:', kvs

#read by type
str_val = cf.get("sec_a", "a_key1")
int_val = cf.getint("sec_a", "a_key2")

print "value for sec_a's a_key1:", str_val
print "value for sec_a's a_key2:", int_val

#write config
#update value
cf.set("sec_b", "b_key3", "new-$r")
#set a new value
cf.set("sec_b", "b_newkey", "new-value")
#create a new section
cf.add_section('a_new_section')
cf.set('a_new_section', 'new_key', 'new_value')

#write back to configure file
with open("test.conf", "w") as f:
    cf.write(f)

Terminal output:
sections: ['sec_b', 'sec_a']
options: ['a_key1', 'a_key2']
sec_a: [('a_key1', "i'm value"), ('a_key2', '22')]
value for sec_a's a_key1: i'm value
value for sec_a's a_key2: 22

The updated test.conf

[sec_b]
b_newkey = new-value
b_key4 = 127.0.0.1
b_key1 = 121
b_key2 = b_value2
b_key3 = new-$r

[sec_a]
a_key1 = i'm value
a_key2 = 22

[a_new_section]
new_key = new_value

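4. Python's ConfigParser module

Python's ConfigParser module defines three classes for working with INI files: RawConfigParser, ConfigParser, and SafeConfigParser. RawConfigParser is the most basic INI reader; ConfigParser and SafeConfigParser additionally resolve %(value)s variables.

Given the configuration file test2.conf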

[portal]
url = http://%(host)s:%(port)s/Portal
host = localhost
port = 8080

Using RawConfigParser:

import ConfigParser

cf = ConfigParser.RawConfigParser()

print "use RawConfigParser() read"
cf.read("test2.conf")
print cf.get("portal", "url")

print "use RawConfigParser() write"
cf.set("portal", "url2", "%(host)s:%(port)s")
print cf.get("portal", "url2")

Terminal output:
use RawConfigParser() read
http://%(host)s:%(port)s/Portal
use RawConfigParser() write
%(host)s:%(port)s

Switching to ConfigParser:

import ConfigParser

cf = ConfigParser.ConfigParser()

print "use ConfigParser() read"
cf.read("test2.conf")
print cf.get("portal", "url")

print "use ConfigParser() write"
cf.set("portal", "url2", "%(host)s:%(port)s")
print cf.get("portal", "url2")

Terminal output:
use ConfigParser() read
http://localhost:8080/Portal
use ConfigParser() write
localhost:8080

Switching to SafeConfigParser:

import ConfigParser

cf = ConfigParser.SafeConfigParser()

print "use SafeConfigParser() read"
cf.read("test2.conf")
print cf.get("portal", "url")

print "use SafeConfigParser() write"
cf.set("portal", "url2", "%(host)s:%(port)s")
print cf.get("portal", "url2")

Terminal output (same result as with ConfigParser):
use SafeConfigParser() read
http://localhost:8080/Portal
use SafeConfigParser() write
localhost:8080
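
The listings above target Python 2. As a hedged sketch of the same comparison on Python 3: the module is named configparser there, and SafeConfigParser became a deprecated alias that was removed in Python 3.12, so only RawConfigParser and ConfigParser are compared. The sample mirrors test2.conf but is read from a string.

```python
import configparser

SAMPLE = """
[portal]
url = http://%(host)s:%(port)s/Portal
host = localhost
port = 8080
"""

raw = configparser.RawConfigParser()
raw.read_string(SAMPLE)
interpolating = configparser.ConfigParser()
interpolating.read_string(SAMPLE)

raw_url = raw.get("portal", "url")                         # placeholders left untouched
full_url = interpolating.get("portal", "url")              # %(host)s and %(port)s expanded
unexpanded = interpolating.get("portal", "url", raw=True)  # opt out for a single call
print(raw_url, full_url, unexpanded)
```

The raw=True argument is handy when a value legitimately contains percent signs and should not be interpolated.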


Git usage notes

Posted on 2014-09-09 | Category: Hexo

Notes on using git

1. Common commands

1.1 git clone

git clone https://code.jd.com/jd_49526c7409da9/jae_liuhongbin_2.git

1.2 git status

git status

After making changes:

1.3 git add

git add .

1.4 git commit

git commit -m 'new'

1.5 git push

git push

1.6 git rm

git rm index.txt

1.7 git commit

git commit -m 'remove'

1.8 git push

git push

1.9 Common git commands for hexo

git init
git checkout -b hexo
vi .gitignore
git add .
git commit -m 'first'
git push origin :hexo
git push origin hexo


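The project from my second year of graduate school

Posted on 2014-06-30 | Category: Small projects from those years

Preface
I spent my whole second year of graduate school in Nantong, working on my supervisor's project (written in VB; I was speechless that he still insists on something this old, and there were other frustrations too, about character, you know what I mean, but I won't vent about them here).
Below I simply record some technical takeaways from more than a year on this project.

1. Building the VB project

Use File -> Make in VB 6.0 directly

2. Packaging the VB project

Package it with Setup Factory 6.0

3. Deploying the project

3.1 SqlServer2005

3.2 Set up an application pool or a site directory on IIS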


A first look at Spark: some reflections worth sharing

Posted on 2014-06-24 | Category: Spark

This post shares some reflections on Spark


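Setting up a Mahout development environment and a first look at recommender systems

Posted on 2014-06-18 | Category: Small projects from those years

1. Building a Mahout project with Maven

1.1 Introducing and installing Maven

Download Maven: http://maven.apache.org/download.cgi

Download the latest xxx-bin.zip file and, on Windows, unzip it to F:\ProgramFiles\maven-3.2.1,
then add the maven/bin directory to the PATH environment variable:

Then open a command prompt and run mvn -version to see the mvn command at work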

Apache Maven 3.2.1 (ea8b2b07643dbb1b84b6d16e1f08391b666bc1e9; 2014-02-15T01:37:52+08:00)
Maven home: F:\ProgramFiles\maven-3.2.1\bin\..
Java version: 1.6.0_20, vendor: Sun Microsystems Inc.
Java home: F:\Program Files\Java\jdk1.6.0_20\jre
Default locale: zh_CN, platform encoding: GBK
OS name: "windows xp", version: "5.1", arch: "x86", family: "windows"

Install the Maven plugin for Eclipse: http://www.eclipse.org/m2e/

2. Building the Mahout development environment with Maven

2.1 Creating a standard Java project with Maven

D:\Code\workspace>mvn archetype:generate -DarchetypeGroupId=org.apache.maven.archetypes -DgroupId=org.lhb.mymahout -DartifactId=myMahout -DpackageName=org.lhb.mymahout -Dversion=1.0-SNAPSHOT -DinteractiveMode=false

Enter the project directory and run the mvn command

D:\Code\workspace>cd myMahout
D:\Code\workspace\myMahout>mvn clean install

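2.2 Importing the project into Eclipse

We now have a basic Maven project, which we import into Eclipse. It is best to have the Maven plugin installed already.

2.3 Adding the hadoop dependency

Here I use hadoop-1.0.3. Edit the file pom.xml: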

vi pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.lhb.mymahout</groupId>
  <artifactId>myMahout</artifactId>
  <packaging>jar</packaging>
  <version>1.0-SNAPSHOT</version>
  <name>myMahout</name>
  <url>http://maven.apache.org</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<mahout.version>0.8</mahout.version>
</properties>

<dependencies>
<dependency>
<groupId>org.apache.mahout</groupId>
<artifactId>mahout-core</artifactId>
<version>${mahout.version}</version>
</dependency>
<dependency>
<groupId>org.apache.mahout</groupId>
<artifactId>mahout-integration</artifactId>
<version>${mahout.version}</version>
<exclusions>
<exclusion>
<groupId>org.mortbay.jetty</groupId>
<artifactId>jetty</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.cassandra</groupId>
<artifactId>cassandra-all</artifactId>
</exclusion>
<exclusion>
<groupId>me.prettyprint</groupId>
<artifactId>hector-core</artifactId>
</exclusion>
</exclusions>
</dependency>
</dependencies>
</project>

2.4 Downloading the dependencies

mvn clean install

Refresh the project in Eclipse:

2.5 Downloading the hadoop configuration files from the Hadoop cluster

core-site.xml
hdfs-site.xml
mapred-site.xml

vim core-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://Master.Hadoop:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/hadoop/tmp</value>
  </property>
  <property>
    <name>io.sort.mb</name>
    <value>1024</value>
  </property>
</configuration>

vim hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/usr/hadoop/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>

vim mapred-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hdfs://Master.Hadoop:9001</value>
  </property>
</configuration>

2.6 Configuring the local hosts file so that the master's domain name resolves

vi c:/Windows/System32/drivers/etc/hosts

192.168.1.210 Master.Hadoop

3. Implementing user-based collaborative filtering (userCF) with Mahout

3.1 Creating the data file: item.csv

mkdir datafile
vi datafile/item.csv

1,101,5.0
1,102,3.0
1,103,2.5
2,101,2.0
2,102,2.5
2,103,5.0
2,104,2.0
3,101,2.5
3,104,4.0
3,105,4.5
3,107,5.0
4,101,5.0
4,103,3.0
4,104,4.5
4,106,4.0
5,101,4.0
5,102,3.0
5,103,2.0
5,104,4.0
5,105,3.5
5,106,4.0

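About the data: each row has three columns. The first is the user ID, the second is the item ID, and the third is the user's score for the item.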

3.2 Creating the Java class: org.lhb.mymahout.UserCF.java

package org.lhb.mymahout;

import java.io.File;
import java.io.IOException;
import java.util.List;

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.impl.common.LongPrimitiveIterator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.EuclideanDistanceSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.recommender.RecommendedItem;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

/**
 * @author LHB
 *
 */
public class UserCF {

  final static int NEIGHBORHOOD_NUM=2;
  final static int RECOMMENDER_NUM=3;

  /**
   * @param args
   * @throws IOException
   * @throws TasteException
   */
  public static void main(String[] args) throws IOException, TasteException {
    String file= "datafile/item.csv";
    // build the data model
    DataModel model=new FileDataModel(new File(file));
    // build the user similarity model; EuclideanDistanceSimilarity: similarity based on Euclidean distance
    UserSimilarity similar=new EuclideanDistanceSimilarity(model);
    // neighbourhood: the NEIGHBORHOOD_NUM nearest neighbours
    NearestNUserNeighborhood neighbor=new NearestNUserNeighborhood(NEIGHBORHOOD_NUM,similar,model);
    // build the recommender
    Recommender recommender=new GenericUserBasedRecommender(model,neighbor,similar);

    LongPrimitiveIterator iter=model.getUserIDs();

    while(iter.hasNext()){
      long uid=iter.nextLong();
      List<RecommendedItem> list=recommender.recommend(uid, RECOMMENDER_NUM);
      System.out.printf("uid:%s",uid);
      for(RecommendedItem item:list){
        System.out.printf("(%s,%f)",item.getItemID(),item.getValue());
      }
      System.out.println();
    }

  }

}

3.3 Console output of the program:

SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
uid:1(104,4.274336)(106,4.000000)
uid:2(105,4.055916)
uid:3(103,3.360987)(102,2.773169)
uid:4(102,3.000000)
uid:5

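3.4 Interpreting the recommendations

For user ID 1, up to three most relevant items are requested (RECOMMENDER_NUM=3); two are returned, 104 and 106
For user ID 2, up to three items are requested, but only one is returned, 105
For user ID 3, up to three items are requested; two are returned, 103 and 102
For user ID 4, up to three items are requested, but only one is returned, 102
For user ID 5, up to three items are requested, but none qualifies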
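
To make the mechanics more concrete, here is a rough plain-Python sketch of a user-based recommender over the same item.csv rows. It is not Mahout's implementation: the similarity 1/(1+distance) and the similarity-weighted average are simplifying assumptions, so the estimated values will generally differ from the console output above. All function names are hypothetical.

```python
from math import sqrt

DATA = """1,101,5.0 1,102,3.0 1,103,2.5
2,101,2.0 2,102,2.5 2,103,5.0 2,104,2.0
3,101,2.5 3,104,4.0 3,105,4.5 3,107,5.0
4,101,5.0 4,103,3.0 4,104,4.5 4,106,4.0
5,101,4.0 5,102,3.0 5,103,2.0 5,104,4.0 5,105,3.5 5,106,4.0"""

def load(text):
    """Parse 'user,item,score' records into {user: {item: score}}."""
    prefs = {}
    for record in text.split():
        user, item, score = record.split(",")
        prefs.setdefault(int(user), {})[int(item)] = float(score)
    return prefs

def similarity(a, b):
    """Assumed similarity: 1 / (1 + Euclidean distance over co-rated items)."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    return 1.0 / (1.0 + sqrt(sum((a[i] - b[i]) ** 2 for i in shared)))

def recommend(prefs, user, n_neighbours=2, how_many=3):
    """Rank items the user has not rated by the neighbours' weighted average score."""
    ranked_users = sorted(((similarity(prefs[user], prefs[o]), o)
                           for o in prefs if o != user), reverse=True)
    totals, weights = {}, {}
    for sim, other in ranked_users[:n_neighbours]:
        for item, score in prefs[other].items():
            if item not in prefs[user]:
                totals[item] = totals.get(item, 0.0) + sim * score
                weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(((totals[i] / weights[i], i) for i in totals if weights[i] > 0),
                    reverse=True)
    return [(item, value) for value, item in ranked[:how_many]]

prefs = load(DATA)
for uid in sorted(prefs):
    print(uid, recommend(prefs, uid))
```

Even in this simplified form, user 5 gets no recommendations: the only item user 5 has not rated is 107, and neither of the two nearest neighbours has rated it.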


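Developing Hive UDFs

Posted on 2014-06-10 | Category: Hive

Hive study notes series: developing Hive UDFs

1. Create a new Java project in Eclipse

Then create a new class with package com.jd.lhb and name UDFLower

2. Add the jar libraries

Add the two files hadoop-core-1.0.3.jar and hive-exec-0.8.1.jar to the project's build path

3. Write the code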

package com.jd.lhb;

import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

@Description(name="my_lower",
		value="_FUNC_(String) - convert an uppercase string to lowercase",
		extended="Example:\n"
			+ " > select _FUNC_(String) from src;\n")
public class UDFLower extends UDF{

	/**
	 * Converts the uppercase string s to a lowercase string.
	 * @author lhb
	 * @param s the input string
	 */
	public Text evaluate(final Text s){
		if(null==s){
			return null;
		}
		return new Text(s.toString().toLowerCase());
	}
}

4. Compile and package the code as HiveUDF.jar

5. Load it into hive

hive> add jar /home/hadoop/udf/HiveUDF.jar;
Added udf_hive.jar to class path
Added resource: udf_hive.jar

6. Test preparation

6.1 Contents of data.txt

WHO
AM
I
HELLO

6.2 Create the test table

hive> create table dual (info string);

6.3 Load the data file data.txt

hive> load data local inpath '/home/hadoop/data.txt' into table dual;

6.4 Create the UDF

hive> create temporary function my_lower as 'com.jd.lhb.UDFLower';

7. Test

hive> select info from dual;
Using the UDF:
hive> select my_lower(info) from dual;
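
The post does not show the query results. As a rough illustration of what the evaluate method does to each row of data.txt (this mimics the logic in Python and is not Hive output; the extra None row is added only to exercise the null check):

```python
def my_lower(s):
    """Null-safe lowercase, mirroring the logic of UDFLower.evaluate."""
    return None if s is None else s.lower()

rows = ["WHO", "AM", "I", "HELLO", None]
lowered = [my_lower(row) for row in rows]
print(lowered)
```

Returning null for null input, as the Java code does, keeps the function from failing on rows where the column is empty.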


Hive study notes, Chapter 6: HQL queries

Posted on 2014-05-27 | Category: Hive

Hive study notes series, HQL: queries.

1.select…from

1.1 Using regular expressions to specify columns

select symbol,`price.*` from stocks;
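
To see which columns such a pattern would pick up, here is a small Python sketch. The column list is a hypothetical schema for the stocks table, and the sketch assumes the pattern must match the whole column name. In newer Hive releases, regex column specifications may additionally require setting hive.support.quoted.identifiers to none.

```python
import re

# hypothetical columns of the stocks table
columns = ["symbol", "ymd", "price_open", "price_high", "price_low", "price_close", "volume"]

pattern = re.compile(r"price.*")
# symbol is named explicitly; the remaining columns come from the regular expression
selected = ["symbol"] + [c for c in columns if pattern.fullmatch(c)]
print(selected)
```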

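1.2 Arithmetic operators

A+B
A-B
A*B
A/B     A divided by B
A%B     remainder of A divided by B
A&B     bitwise AND of A and B
A|B     bitwise OR of A and B
A^B     bitwise XOR of A and B
~A      bitwise NOT of A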
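
Python happens to use the same symbols, so the operators can be tried quickly on two small non-negative integers. This is only an analogy: result types and the sign of the remainder for negative operands may differ in Hive.

```python
a, b = 13, 5   # binary 1101 and 0101

results = {
    "A+B": a + b,
    "A-B": a - b,
    "A*B": a * b,
    "A/B": a / b,    # true division, gives a floating-point result
    "A%B": a % b,
    "A&B": a & b,    # 1101 & 0101 = 0101
    "A|B": a | b,    # 1101 | 0101 = 1101
    "A^B": a ^ b,    # 1101 ^ 0101 = 1000
    "~A": ~a,        # two's complement: -(A + 1)
}
print(results)
```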

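1.3 Using functions

1.3.1 Mathematical functions

round(Double d)                   returns the BIGINT approximation of d
round(Double d,INT n)             returns the Double approximation of d, rounded to n decimal places
floor(Double d)                   returns the largest BIGINT value that is <= d
ceil(Double d)/ceiling(Double d)  returns the smallest BIGINT value that is >= d
rand() / rand(Int seed)           returns a random Double; the integer seed is the random seed
exp(Double d)                     returns e raised to the power d, as a Double
pow/power(Double d,Double p)      returns d raised to the power p
sqrt(Double d)                    returns the square root of d
bin(BIGINT i)                     returns the binary representation of i as a string
hex(BIGINT i)                     returns the hexadecimal representation of i as a string
hex(string str)                   returns the hexadecimal representation of the string str as a string
hex(BINARY b)                     returns the hexadecimal representation of the binary value b as a string
unhex(string i)                   the inverse of hex(string i)
conv(string num,Int from_base,Int to_base)  converts the string num from base from_base to base to_base and returns the result as a string
pmod(INT i1,INT i2)               INT i1 modulo INT i2; the result is also an INT
pmod(Double i1,Double i2)         Double i1 modulo Double i2; the result is also a Double
e()                               the mathematical constant e
pi()                              the mathematical constant pi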
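
A few of these can be mimicked in Python to get a feel for the results. The helpers conv and pmod below are hypothetical re-implementations for illustration only (non-negative inputs for conv, a positive divisor for pmod); rounding of exact .5 cases may differ between Python and Hive.

```python
import math

def conv(num, from_base, to_base):
    """Re-express the string num, given in from_base, as a string in to_base (bases 2..36)."""
    digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    value = int(num, from_base)
    if value == 0:
        return "0"
    out = []
    while value:
        value, rem = divmod(value, to_base)
        out.append(digits[rem])
    return "".join(reversed(out))

def pmod(a, b):
    """Remainder that is never negative when the divisor b is positive."""
    return ((a % b) + b) % b

examples = {
    "round(2.567, 2)": round(2.567, 2),
    "floor(-1.5)": math.floor(-1.5),
    "ceil(-1.5)": math.ceil(-1.5),
    "pow(2, 10)": math.pow(2, 10),
    "sqrt(16)": math.sqrt(16),
    "bin(5)": format(5, "b"),
    "hex(255)": format(255, "X"),
    "conv('ff', 16, 2)": conv("ff", 16, 2),
    "pmod(-7, 3)": pmod(-7, 3),
}
print(examples)
```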

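1.3.2 Aggregate functions

sum(DISTINCT col)            returns the sum of the distinct values of the column
variance(col)/var_pop(col)   returns the variance of the values in col
var_samp(col)                returns the sample variance of the values in col
stddev_pop(col)              returns the standard deviation of the values in col
stddev_samp(col)             returns the sample standard deviation of the values in col
covar_pop(col1,col2)         returns the covariance of the two sets of values
covar_samp(col1,col2)        returns the sample covariance of the two sets of values
corr(col1,col2)              returns the correlation coefficient of the two sets of values
percentile(BIGINT int_expr,p)                      returns the value of int_expr at percentile p (in the range [0,1]), where p is a double
percentile(BIGINT int_expr,ARRAY(p1[,p2]...))      returns the values of int_expr at percentiles p1, p2, ... (each in the range [0,1]), where each p is a double
histogram_numeric(col,NB)    returns an array of NB histogram bins; in the result, x is the centre of a bin and y is its height
collect_set(col)             returns an array of the distinct elements of col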
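
The population and sample variants are easy to mix up. A short Python sketch with the standard statistics module shows the difference on a made-up column; the dictionary keys only borrow the Hive names for readability.

```python
import statistics

col = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

summary = {
    "var_pop": statistics.pvariance(col),    # divides by n
    "var_samp": statistics.variance(col),    # divides by n - 1
    "stddev_pop": statistics.pstdev(col),
    "stddev_samp": statistics.stdev(col),
    "sum_distinct": sum(set(col)),           # each distinct value counted once
    "collect_set": sorted(set(col)),         # sorted here only for a stable display
}
print(summary)
```

The mean is 5 and the squared deviations add up to 32, so the population variance is 32/8 and the sample variance is 32/7.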

Note: aggregation performance can be improved by setting the property hive.map.aggr to true

hive> set hive.map.aggr=true;

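1.3.3 Table-generating functions

explode(ARRAY array)     returns zero or more rows, one for each element of the input array
explode(MAP map)         returns zero or more rows, one for each key-value pair of the map
explode(ARRAY<TYPE> a)   for each element of a, explode() produces one row containing that element
inline(ARRAY <struct[,struct]>)   extracts an array of structs and inserts them into a table
json_tuple(String jsonStr,p1,p2,...,pn)   accepts multiple tag names and processes the input JSON string
parse_url_tuple(url,partname1,partname2,...,partnameN)   parses N parts out of a URL
stack(INT n,col1,...,colM)   turns M columns into N rows of M/N fields each; n must be a constant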
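
The row-generating behaviour of explode and stack can be pictured with two small Python helpers. These are hypothetical stand-ins for illustration; the stack sketch assumes M is divisible by n.

```python
def explode(values):
    """One output row per array element, or per (key, value) pair of a map."""
    if isinstance(values, dict):
        return [(key, value) for key, value in values.items()]
    return [(value,) for value in values]

def stack(n, *cols):
    """Reshape the M given values into n rows of M/n fields each."""
    width = len(cols) // n
    return [tuple(cols[row * width:(row + 1) * width]) for row in range(n)]

array_rows = explode(["a", "b", "c"])
map_rows = explode({"k1": 1, "k2": 2})
stacked = stack(2, "a", 1, "b", 2)
print(array_rows, map_rows, stacked)
```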

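1.3.4 Other built-in functions

cast(<expr> as <type>)                    converts expr to the type type
concat(BINARY s1,BINARY s2)               concatenates the binary values s1 and s2 into one string
concat(String s1,String s2)               concatenates the strings s1 and s2 into one string
concat_ws(String separator,String s1,String s2)   like concat, but joins the strings with the given separator
decode(BINARY bin,String charset)         decodes the binary value bin into a string using the character set charset
encode(String src,String charset)         encodes the string src into a binary value using the character set charset
find_in_set(String s,String commaSeparatedString)   returns the position of s in the comma-separated string
format_number(NUMBER x,INT d)             converts the number x to a string in the format '#,###,###.##', keeping d decimal places
in (val1,val2)                            for example, test in (val1,val2) returns true if test equals any value in the list
in_file(String s,String filename)         returns true if the file named filename contains a whole line identical to the string s
instr(String str,String substr)           returns the position of the first occurrence of substr in str
length(String s)                          returns the length of the string s
locate(String substr,String str [,INT pos])   returns the position of the first occurrence of substr in str after position pos
lpad(String s,INT len,String pad)         pads the string s on the left with the string pad until it reaches length len
ltrim(String s)                           removes all leading spaces from the string s
parse_url(String url,String partname [,String key])   extracts the specified part from a URL
printf(String format,Obj ... args)        formats the input according to a printf-style format string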
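
Several of the string functions use 1-based positions, which is easy to get wrong. The following Python helpers are hypothetical re-implementations that illustrate the described behaviour (positions start at 1, and 0 means not found); they are a sketch, not Hive's code.

```python
def find_in_set(s, comma_separated):
    """1-based position of s in a comma-separated list, 0 if absent."""
    parts = comma_separated.split(",")
    return parts.index(s) + 1 if s in parts else 0

def instr(string, substr):
    """1-based position of the first occurrence of substr, 0 if absent."""
    return string.find(substr) + 1

def locate(substr, string, pos=1):
    """Like instr, but the search starts at the 1-based position pos."""
    return string.find(substr, pos - 1) + 1

def lpad(s, length, pad):
    """Left-pad s with pad up to length characters; a longer s is cut to length."""
    if len(s) >= length:
        return s[:length]
    return (pad * length)[:length - len(s)] + s

examples = {
    "concat": "foo" + "bar",
    "concat_ws": "-".join(["a", "b", "c"]),
    "find_in_set": find_in_set("ab", "abc,b,ab,c"),
    "instr": instr("foobar", "bar"),
    "locate": locate("o", "foo boo", 4),
    "lpad": lpad("hi", 5, "xy"),
    "ltrim": "  hi ".lstrip(" "),
    "length": len("hello"),
}
print(examples)
```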

